Numpy Percentile error

GrantPalmer · ‎10-09-2019

Hi all, I'm running into an error and I'm not sure why. I am trying to rank the attribute field of a shapefile. I have a polyline shapefile of some streams with an attribute field of normalized data (data type 'Double'), and I am trying to rank these values by quartile and store their rank in another attribute field. I know that you can use Symbology > Graduated Colors > Method: Quantile with 4 classes, and that displays my data correctly. However, I need an attribute field holding the rank so I can use the data down the line.

I have been using Python for my processes so far, but I am currently running into an error and I'm not sure why, and I can't find an answer anywhere online. Here is a sample of my code:

maximum = max(row[0] for row in arcpy.da.SearchCursor(NHDFlowline_HUC12, ['Normalized_Linear']))
print(maximum)
minimum = min(row[0] for row in arcpy.da.SearchCursor(NHDFlowline_HUC12, ['Normalized_Linear']))
print(minimum)
arr = arcpy.da.FeatureClassToNumPyArray(NHDFlowline_HUC12, ('Normalized_Linear'))
p1 = np.percentile(arr, 25)
p2 = np.percentile(arr, 50)
p3 = np.percentile(arr, 75)
p4 = np.percentile(arr, 100)

with arcpy.da.UpdateCursor(NHDFlowline_HUC12, ['Linear_Rank', 'Normalized_Linear']) as cursor:
 for row in cursor:
 if minimum <= row[1] <= p1:
 row[0] = 1
 elif p1 < row[1] <= p2:
 row[0] = 2
 elif p2 < row[1] <= p3:
 row[0] = 3
 elif row[1] > p3:
 row[0] = 4
 cursor.updateRow(row)

The first step should store a max and min value for the normalized data attribute and then create an array containing the values of my shapefile's attribute field 'Normalized_Linear'. The next steps assign values to p1 through p4 as the breaks for the quartiles and then use UpdateCursor to store the rank. The resulting error is:

Traceback (most recent call last):
File "script path", line 143, in <module>
p1 = np.percentile(arr, 25, axis=None, out=None, overwrite_input=False, interpolation='linear', keepdims=False)
File "C:\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\numpy\lib\function_base.py", line 4269, in percentile
interpolation=interpolation)
File "C:\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\numpy\lib\function_base.py", line 4011, in _ureduce
r = func(a, **kwargs)
File "C:\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\numpy\lib\function_base.py", line 4386, in _percentile
x1 = take(ap, indices_below, axis=axis) * weights_below
TypeError: invalid type promotion

I am unsure of how to go about fixing this TypeError: invalid type promotion. I feel like it may have something to do with the data type being double but if so, I would like to know how to work around this.

Any help would be much appreciated

DanPatterson_Retired · ‎10-09-2019

I suspect you will have to wait for 'stu' and 'uts'.

Far less exciting and safe is...

np.version.version

Out[10]: '1.16.4'

b = arcpy.da.FeatureClassToNumPyArray(in_fc, 'Doubles')

b
array([(1.2,), (1.4,), (1.8,), (1.6,), (1.9,), (1.1,), (1.3,), (1.5,), (1.7,)],
      dtype=[('Doubles', '<f8')])

a = b.view(dtype=np.float64)  # ---- a view with a float64 dtype

np.percentile(a, (25, 50, 75))

array([1.3, 1.5, 1.7])
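
Carrying those breaks through to a rank, as a rough sketch: the snippet below uses made-up sample values in place of the real 'Normalized_Linear' field and leaves out arcpy entirely, so the sample array, the `breaks` and `ranks` names, and the use of np.searchsorted are illustrative assumptions and not part of the workflow above. It is meant to mirror the `<=` / `>` comparisons in the if/elif chain from the question.

```python
import numpy as np

# Hypothetical stand-in for what FeatureClassToNumPyArray returns:
# a structured array with a single float64 field.
b = np.array(
    [(0.0,), (0.1,), (0.25,), (0.4,), (0.5,), (0.6,), (0.75,), (0.9,), (1.0,)],
    dtype=[("Normalized_Linear", "<f8")],
)

# Plain float64 view of the same memory (valid because there is only one f8 field).
a = b.view(dtype=np.float64)

# Quartile breaks at the 25th, 50th and 75th percentiles.
breaks = np.percentile(a, (25, 50, 75))

# side="left" puts a value equal to a break into the lower class,
# matching the "p1 < value <= p2" comparisons; +1 makes ranks run 1..4.
ranks = np.searchsorted(breaks, a, side="left") + 1

print(breaks)
print(ranks)
```

The ranks come out in the same order as the rows in the array, so they would still need to be matched back to features before being written to 'Linear_Rank'.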


JoeBorgione · ‎10-09-2019

You may want to use the syntax highlighter to post your code rather than a screen capture. One thing that jumps out at me is the lack of indents after 'for row in cursor:' as well as in all your if/elif statements, but that might be a function of your screen capture....

That should just about do it....

GrantPalmer · ‎10-09-2019

Sorry, I had not used the syntax highlighter before. Here is what that shows; the indents are fine in PyCharm. The error I'm receiving is on line 6.

maximum = max(row[0] for row in arcpy.da.SearchCursor(NHDFlowline_HUC12, ['Normalized_Linear']))
print(maximum)
minimum = min(row[0] for row in arcpy.da.SearchCursor(NHDFlowline_HUC12, ['Normalized_Linear']))
print(minimum)
arr = arcpy.da.FeatureClassToNumPyArray(NHDFlowline_HUC12, ('Normalized_Linear'))
p1 = np.percentile(arr, 25)
p2 = np.percentile(arr, 50)
p3 = np.percentile(arr, 75)
p4 = np.percentile(arr, 100)

with arcpy.da.UpdateCursor(NHDFlowline_HUC12, ['Linear_Rank', 'Normalized_Linear']) as cursor:
    for row in cursor:
        if minimum <= row[1] <= p1:
            row[0] = 1
        elif p1 < row[1] <= p2:
            row[0] = 2
        elif p2 < row[1] <= p3:
            row[0] = 3
        elif row[1] > p3:
            row[0] = 4
        cursor.updateRow(row)

JoeBorgione · ‎10-09-2019

And my guess is that if you comment out line 6, it then errors at line 7, and so on?


GrantPalmer · ‎10-09-2019

Yes, this is true

DanPatterson_Retired · ‎10-09-2019

Grant, could you use /blogs/dan_patterson/2016/08/14/script-formatting

Your indentation looks a bit off, and the actual line numbers would help.

1. Any nodata in your data set? If so, use np.nanpercentile …

2. Your problem is that when you use da.FeatureClassToNumPyArray, you still have a structured array

b = arcpy.da.FeatureClassToNumPyArray(in_fc, 'Doubles')

b

array([(1.2,), (1.4,), (1.8,), (1.6,), (1.9,), (1.1,), (1.3,), (1.5,), (1.7,)],
      dtype=[('Doubles', '<f8')])

np.percentile(b, (25, 50, 75))
Traceback (most recent call last):

# ----- big error traceback snip

TypeError: invalid type promotion
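
A quick way to confirm this is to look at the dtype. The sketch below builds the same kind of array by hand (no arcpy involved; the values are the ones shown above) and pulls the field out by name, which should yield an ordinary float64 array that np.percentile accepts. The `values` and `quartiles` names are only for illustration.

```python
import numpy as np

# Hand-built stand-in for the FeatureClassToNumPyArray result shown above.
b = np.array(
    [(1.2,), (1.4,), (1.8,), (1.6,), (1.9,), (1.1,), (1.3,), (1.5,), (1.7,)],
    dtype=[("Doubles", "<f8")],
)

print(b.dtype.names)  # field names of the structured dtype
print(b.dtype.kind)   # 'V' (void) marks a structured/record dtype

# Indexing by field name returns a regular 1-D float64 array.
values = b["Doubles"]
print(values.dtype)

quartiles = np.percentile(values, (25, 50, 75))
print(quartiles)
```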

Now recent versions of numpy have added a couple of helper functions. My fav is 'stu' (read up on it on GitHub or in the help for numpy)

from numpy.lib.recfunctions import structured_to_unstructured as stu

# ---- from before

b
array([(1.2,), (1.4,), (1.8,), (1.6,), (1.9,), (1.1,), (1.3,), (1.5,), (1.7,)],
      dtype=[('Doubles', '<f8')])

a = stu(b)  # ---- let the magic happen

np.percentile(a, (25, 50, 75))  # ---- you can use a tuple for the percentiles

array([1.3, 1.5, 1.7])

And 'stu' has an alter ego … 'uts' to go the other way, making an ndarray into a structured array. There are some technical details about repacking fields for some types of arrays, but read up on the discussion in the numpy discussion archive or on GitHub.

# ---- the other helpers
from numpy.lib.recfunctions import unstructured_to_structured as uts
from numpy.lib.recfunctions import repack_fields

# ---- ps... the naming of 'stu' and 'uts' is solely mine ;)
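
As a small illustration of the round trip, assuming a NumPy version that ships both helpers: the sample array is hand-built, and passing `names=["Doubles"]` to rebuild the field name is just one way to call `unstructured_to_structured`.

```python
import numpy as np
from numpy.lib.recfunctions import (
    structured_to_unstructured as stu,
    unstructured_to_structured as uts,
)

# Hand-built structured array with one float64 field.
b = np.array([(1.2,), (1.4,), (1.8,)], dtype=[("Doubles", "<f8")])

# Structured -> plain ndarray: one column per field, so shape is (3, 1) here.
a = stu(b)
print(a.shape, a.dtype)

# Plain ndarray -> structured again, naming the single column.
c = uts(a, names=["Doubles"])
print(c.dtype.names, c.shape)
```

Note that 'stu' returns a 2-D array even for a single field, so a trailing `[:, 0]` or `.ravel()` may be wanted before further 1-D work.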

GrantPalmer · ‎10-09-2019

Sorry about the formatting, see the above reply for the proper use of syntax highlighter.

1. I do not have any NoData, every entry has a value or a 0.

2. When I try to

from numpy.lib.recfunctions import structured_to_unstructured as stu

PyCharm says it cannot find reference 'structured_to_unstructured' in 'recfunctions.py'

JoeBorgione · ‎10-09-2019

When I use Python 3.x I can make that import, but with 2.x I cannot. What version of Python does your PyCharm point to?


GrantPalmer · ‎10-09-2019

Python 3.6 (arcgispro-py3) I believe

JoeBorgione · ‎10-09-2019

That's strange. I use Spyder for 3.x and IDLE for 2.x. Here is the IDLE error, which seems to be what you are getting:

Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    from numpy.lib.recfunctions import structured_to_unstructured as stu
ImportError: cannot import name structured_to_unstructured
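
One possible explanation, offered as a guess and not a confirmed diagnosis: structured_to_unstructured was added in NumPy 1.16, so whether the import works depends on the NumPy version in the active environment more than on the Python version. The sketch below prints the version and falls back to indexing the field by name, which does not need the helper. The `field_as_float` name and the sample array are made up for illustration.

```python
import numpy as np

print(np.__version__)  # the helper is expected from 1.16 onward

try:
    from numpy.lib.recfunctions import structured_to_unstructured as stu
except ImportError:
    stu = None  # older NumPy: helper not available


def field_as_float(arr, name):
    """Return one field of a structured array as a plain 1-D float64 array."""
    if stu is not None:
        # stu gives shape (n, 1) for a single field; flatten it to 1-D.
        return stu(arr[[name]]).ravel().astype(np.float64)
    # Fallback that works without the helper: index the field by name.
    return np.asarray(arr[name], dtype=np.float64)


# Hypothetical sample standing in for the FeatureClassToNumPyArray result.
sample = np.array([(1.2,), (1.4,)], dtype=[("Doubles", "<f8")])
out = field_as_float(sample, "Doubles")
print(out)
```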
