Numpy Percentile error

10-09-2019 12:47 PM
GrantPalmer
New Contributor II

Hi all, I'm running into an error and I'm not sure why. I am trying to rank the values in an attribute field of a shapefile. I have a polyline shapefile of streams with an attribute field of normalized data (data type 'Double'), and I am trying to rank these values by quartile and store the rank in another attribute field. I know I can use Symbology > Graduated Colors > Method: Quantile with 4 classes, and that displays my data correctly. However, I need an attribute field holding the rank so I can use the data further down the line.

I have been using Python for my processing so far, but I am now running into an error and I'm not sure why; I can't find an answer anywhere online. Here is a sample of my code:

maximum = max(row[0] for row in arcpy.da.SearchCursor(NHDFlowline_HUC12, ['Normalized_Linear']))
print(maximum)
minimum = min(row[0] for row in arcpy.da.SearchCursor(NHDFlowline_HUC12, ['Normalized_Linear']))
print(minimum)
arr = arcpy.da.FeatureClassToNumPyArray(NHDFlowline_HUC12, ('Normalized_Linear'))
p1 = np.percentile(arr, 25)
p2 = np.percentile(arr, 50)
p3 = np.percentile(arr, 75)
p4 = np.percentile(arr, 100)

with arcpy.da.UpdateCursor(NHDFlowline_HUC12, ['Linear_Rank', 'Normalized_Linear']) as cursor:
for row in cursor:
if minimum <= row[1] <= p1:
row[0] = 1
elif p1 < row[1] <= p2:
row[0] = 2
elif p2 < row[1] <= p3:
row[0] = 3
elif row[1] > p3:
row[0] = 4
cursor.updateRow(row)

The first step stores the max and min values of the normalized data attribute. Next, it creates an array containing the values of my shapefile's attribute field 'Normalized_Linear'. Then it assigns values to p1 through p4 as the quartile breaks and uses an UpdateCursor to store the rank. The resulting error is:

Traceback (most recent call last):
File "script path", line 143, in <module>
p1 = np.percentile(arr, 25, axis=None, out=None, overwrite_input=False, interpolation='linear', keepdims=False)
File "C:\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\numpy\lib\function_base.py", line 4269, in percentile
interpolation=interpolation)
File "C:\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\numpy\lib\function_base.py", line 4011, in _ureduce
r = func(a, **kwargs)
File "C:\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\numpy\lib\function_base.py", line 4386, in _percentile
x1 = take(ap, indices_below, axis=axis) * weights_below
TypeError: invalid type promotion

I am unsure how to go about fixing this TypeError: invalid type promotion. I feel like it may have something to do with the data type being Double, but if so, I would like to know how to work around it.

Any help would be much appreciated


Accepted Solutions
DanPatterson_Retired
MVP Esteemed Contributor

I suspect you will have to wait for 'stu' and 'uts'.

Far less exciting and safe is...

np.version.version

Out[10]: '1.16.4'

b = arcpy.da.FeatureClassToNumPyArray(in_fc, 'Doubles')

b
array([(1.2,), (1.4,), (1.8,), (1.6,), (1.9,), (1.1,), (1.3,), (1.5,), (1.7,)],
      dtype=[('Doubles', '<f8')])

a = b.view(dtype=np.float64)  # ---- a view with a float64 dtype

np.percentile(a, (25, 50, 75))

array([1.3, 1.5, 1.7])
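
Once the values are a plain float64 array, the quartile ranks from the question can also be computed in one step. This is only a sketch under assumptions: the array below is a made-up stand-in for the output of FeatureClassToNumPyArray (no arcpy involved), and using np.searchsorted for the class lookup is a suggestion, not something from the original script.

```python
import numpy as np

# Stand-in for the structured array returned by FeatureClassToNumPyArray.
# The field name comes from the question; the values are invented.
arr = np.array([(0.0,), (0.2,), (0.4,), (0.6,), (0.8,), (1.0,), (0.1,), (0.9,)],
               dtype=[('Normalized_Linear', '<f8')])

values = arr.view(dtype=np.float64)            # plain float64 view, shape (8,)
breaks = np.percentile(values, (25, 50, 75))   # the three quartile breaks

# side='left' places a value equal to a break in the lower class,
# which mirrors the `p1 < value <= p2` tests in the question.
ranks = np.searchsorted(breaks, values, side='left') + 1
print(ranks)
```

The ranks come back in the same order as the array rows, so they could be written to the rank field with an UpdateCursor, provided the cursor visits the rows in that same order.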


19 Replies
JoeBorgione
MVP Esteemed Contributor

You may want to use the syntax highlighter to post your code rather than a screen capture.  One thing that jumps out at me is the lack of indents after 'for row in cursor:' as well as in all your if/elif statements, but that might be a function of your screen capture....

That should just about do it....
GrantPalmer
New Contributor II

Sorry, I had not used the syntax highlighter before. Here is what that shows; the indents are fine in PyCharm. The error I'm receiving is on line 6.

maximum = max(row[0] for row in arcpy.da.SearchCursor(NHDFlowline_HUC12, ['Normalized_Linear']))
print(maximum)
minimum = min(row[0] for row in arcpy.da.SearchCursor(NHDFlowline_HUC12, ['Normalized_Linear']))
print(minimum)
arr = arcpy.da.FeatureClassToNumPyArray(NHDFlowline_HUC12, ('Normalized_Linear'))
p1 = np.percentile(arr, 25)
p2 = np.percentile(arr, 50)
p3 = np.percentile(arr, 75)
p4 = np.percentile(arr, 100)

with arcpy.da.UpdateCursor(NHDFlowline_HUC12, ['Linear_Rank', 'Normalized_Linear']) as cursor:
    for row in cursor:
        if minimum <= row[1] <= p1:
            row[0] = 1
        elif p1 < row[1] <= p2:
            row[0] = 2
        elif p2 < row[1] <= p3:
            row[0] = 3
        elif row[1] > p3:
            row[0] = 4
        cursor.updateRow(row)
JoeBorgione
MVP Esteemed Contributor

And my guess is that if you comment out line 6, it then errors at line 7, and so on?

That should just about do it....
GrantPalmer
New Contributor II

Yes, this is true

DanPatterson_Retired
MVP Esteemed Contributor

Grant, could you use /blogs/dan_patterson/2016/08/14/script-formatting 

your indentation looks a bit off and the actual line numbers would help.

1.  Any nodata in your data set?  If so, use np.nanpercentile … 

2.  Your problem is that when you use da.FeatureClassToNumPyArray, what you get back is still a structured array, not a plain float array

b = arcpy.da.FeatureClassToNumPyArray(in_fc, 'Doubles')

b

array([(1.2,), (1.4,), (1.8,), (1.6,), (1.9,), (1.1,), (1.3,), (1.5,), (1.7,)],
      dtype=[('Doubles', '<f8')])

np.percentile(b, (25, 50, 75))
Traceback (most recent call last):

# ----- big error traceback snip

TypeError: invalid type promotion
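
As a side note on points 1 and 2, here is a minimal sketch of one more way around the structured dtype: indexing the array by field name, which returns a plain float64 array that np.nanpercentile accepts. The array is a made-up stand-in with one NaN added for illustration; it is not data from the thread.

```python
import numpy as np

# Hypothetical structured array shaped like FeatureClassToNumPyArray output,
# with one NaN standing in for a nodata value.
b = np.array([(1.2,), (np.nan,), (1.8,), (1.6,), (1.4,)],
             dtype=[('Doubles', '<f8')])

vals = b['Doubles']                         # plain float64 array, shape (5,)
q = np.nanpercentile(vals, (25, 50, 75))    # NaN entries are ignored
print(vals.dtype, q)
```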

Now recent versions of numpy have added a couple of helper functions.  My fav is 'stu' (read up on it on GitHub or in the help for numpy)

from numpy.lib.recfunctions import structured_to_unstructured as stu

# ---- from before

b
array([(1.2,), (1.4,), (1.8,), (1.6,), (1.9,), (1.1,), (1.3,), (1.5,), (1.7,)],
      dtype=[('Doubles', '<f8')])

a = stu(b)  # ---- let the magic happen

np.percentile(a, (25, 50, 75))  # ---- you can use a tuple for the percentiles

array([1.3, 1.5, 1.7])

And 'stu' has an alter ego … 'uts', to go the other way, making an ndarray into a structured array.  There are some technical details about repacking fields for some types of arrays, but read up on those in the numpy discussion archive or on GitHub.  

# ---- the other helpers
from numpy.lib.recfunctions import unstructured_to_structured as uts
from numpy.lib.recfunctions import repack_fields

# ---- ps... the naming of 'stu' and 'uts' is solely mine ;)
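
For anyone who wants to see the round trip, here is a small sketch using the same sample array as above. It assumes a numpy version that ships both helpers; passing dtype=b.dtype to 'uts' is my choice for the illustration so the field name is restored.

```python
import numpy as np
from numpy.lib.recfunctions import (
    structured_to_unstructured as stu,
    unstructured_to_structured as uts,
)

b = np.array([(1.2,), (1.4,), (1.8,), (1.6,), (1.9,), (1.1,), (1.3,), (1.5,), (1.7,)],
             dtype=[('Doubles', '<f8')])

a = stu(b)                     # plain ndarray; the single field becomes a last axis of size 1
back = uts(a, dtype=b.dtype)   # structured again, with the 'Doubles' field restored
print(a.shape, back.dtype)
```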
GrantPalmer
New Contributor II

Sorry about the formatting, see the above reply for the proper use of syntax highlighter. 

1. I do not have any NoData, every entry has a value or a 0. 

2. When I try to run

from numpy.lib.recfunctions import structured_to_unstructured as stu

PyCharm says it cannot find reference 'structured_to_unstructured' in 'recfunctions.py'

JoeBorgione
MVP Esteemed Contributor

When I use Python 3.x I can make that import, but with 2.x it fails.  What version of Python does your PyCharm point to?

That should just about do it....
GrantPalmer
New Contributor II

Python 3.6 (arcgispro-py3) I believe 

JoeBorgione
MVP Esteemed Contributor

That's strange.  I use Spyder for 3.x and IDLE for 2.x.  Here is the IDLE error, which seems to be what you are getting:

Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    from numpy.lib.recfunctions import structured_to_unstructured as stu
ImportError: cannot import name structured_to_unstructured
That should just about do it....
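
One possible reading of this ImportError, offered as an assumption rather than a diagnosis: as far as I know, structured_to_unstructured was added in NumPy 1.16, so an environment with an older numpy would fail on the import regardless of the Python version. Below is a hedged sketch that checks the version and falls back to plain field indexing when the helper is missing. The helper name field_values is hypothetical and the sample array is the one used earlier in the thread.

```python
import numpy as np

print(np.__version__)  # the helper is expected from NumPy 1.16 onward

try:
    from numpy.lib.recfunctions import structured_to_unstructured as stu
except ImportError:
    stu = None  # older numpy: fall back to indexing by field name


def field_values(arr, field):
    """Return one field of a structured array as a plain 1-D array.

    `field_values` is a hypothetical helper written for this sketch.
    """
    if stu is not None:
        return stu(arr[[field]])[:, 0]
    return arr[field]


b = np.array([(1.2,), (1.4,), (1.8,), (1.6,), (1.9,), (1.1,), (1.3,), (1.5,), (1.7,)],
             dtype=[('Doubles', '<f8')])
quartiles = np.percentile(field_values(b, 'Doubles'), (25, 50, 75))
print(quartiles)
```

Either branch hands np.percentile a plain float array, which is what avoids the invalid type promotion error.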