Sorting and ranking... arcpy and numpy play nice...

2873
9
09-01-2016 07:08 AM
Labels (1)
DanPatterson_Retired
MVP Emeritus
1 9 2,873

I had made reference before of using numpy and arcpy together to perform particular tasks.  Sometimes the combination of the two enables the user to do things they wouldn't otherwise be able to do.  This ditty presents a very simple way to sort a dataset and produce a permantly sorted version while maintaining the original feature identification numbers.  So to begin with, I will remind/introduce the reader to the arcpy data access module's functions as well as comparable ones in numpy.

On the arcmap side, the user should become familiar with:

  • arcpy.Describe
  • arcpy.Sort
  • arcpy.da.FeatureClassToNumPyArray and
  • arcpy.da.NumPyArrayToFeatureClass. 

From the  numpy side:

  • np.setprintoptions
  • np.arange and
  • np.sort

The latter performs the same work as arcpy.Sort, but it doesn't have any license restrictions as the Sort tool does.  It also allows you to produce a sequential list of numbers like the oft used Autoincrement field calculator function.

The script will perform the following tasks:

  • import the required modules,
  • specify the input file,
  • the spatial reference of the input,
  • the fields used to sort on, in the order of priority,
  • a field to place the rank values, and
  • specify the output file.

One can add the field to place the ranks into ahead of time or add a line of code to do it within the script (see arcpy.AddField_management).

# imports ......

import numpy as np
import arcpy

np.set_printoptions(edgeitems=4,linewidth=75,precision=2,
                    suppress=True,threshold=10)

arcpy.env.overwriteOutput = True

# inputs  ......

input_shp = r'c:\!Scripts\Shapefiles\RandomPnts.shp'
all_fields = [fld.name for fld in arcpy.ListFields(input_shp)]
SR = arcpy.Describe(input_shp).spatialReference

sort_fields = ['X','Y']
order_field = 'Sort_id'

output_shp = r'c:\!Scripts\Shapefiles\SortedPnts.

# outputs ......

arr = arcpy.da.FeatureClassToNumPyArray(input_shp, "*", spatial_reference=SR)
arr_sort = np.sort(arr, order=sort_fields)

arr_sort['Sort_id'] = np.arange(arr_sort.size)

arcpy.da.NumPyArrayToFeatureClass(arr_sort, output_shp, ['Shape'], SR)


# summary .....

print('\nInput file: {}\nField list: {}'.format(input_shp,all_fields))

print('Sort by: {},  ranks to: {}'.format(sort_fields,orderfield))

‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

result

Input file: c:\!Scripts\Shapefiles\RandomPnts.shp

Field list: [u'FID', u'Shape', u'Id', u'X', u'Y', u'Sort_id']

Sort by: ['X', 'Y'],  ranks to: Sort_id

Sort_np_arc.png 

Give it a try with your own files.

9 Comments
AndrewHargreaves
Occasional Contributor II

Hi Dan

I need to perform some ranking of features in a Hosted Feature service - before i spent time trying to figure out how to get Numpy can you confirm if that is possible using Numpy?

Speaking of which, I've scoured the blog but can't find a "Starting point" for how to install Numpy and the Official Numpy page is not exactly user friendly..any pointers on how to easily get it up and running with something like PyScripter?

DanPatterson_Retired
MVP Emeritus

I know nothing about Hosted Feature services, so I will have to defer to others.

As for NumPy, it is installed by default when you install ArcMap.  To check, you can simply go into PyScripter and in the interactive window type the following at the prompt

>>> import numpy as np   # np is the common knickname for numpy

KevinHibma
Esri Regular Contributor

Andrew,

Give a read of this here: Quick Tips: Consuming Feature Services with Geoprocessing | ArcGIS Blog  Once you see how a feature service (hosted or straight from Server) can be used in arcpy/geoprocessing, you can apply Dan's techniques.

AndrewHargreaves
Occasional Contributor II

I've followed Kevin's advice and am working with a FeatureSet in Python - Thanks @Kevin Hibma.

I have my data being sorted, then reverse sorted. However, what I can't glean from Dan Patterson's code is how he populates the [sort_id] values...       

DanPatterson_Retired
MVP Emeritus
  • sort_fields = ['X','Y'] 
  • order_field = 'Sort_id' 
  • output_shp = r'c:\!Scripts\Shapefiles\SortedPnts.shp' 
  • # outputs ...... 
  • arr = arcpy.da.FeatureClassToNumPyArray(input_shp,"*",spatial_reference=SR) 
  • arr_sort = np.sort(arr, order=sort_fields) 
  • arr_sort['Sort_id'] = np.arange(arr_sort.size) 
  • arcpy.da.NumPyArrayToFeatureClass(arr_sort, output_shp,['Shape'],SR) 

change sort_fields = ['Consumption']

I sorted by X and Y field since I wanted lexicographical sorting of the coordinates...in your case you don't care.

order_field is  where the orde goes the data goes and this clever line

arr_sort['Sort_id'] = np.arange(arr_sort.size)   simply puts a sequence of numbers from 0 to the size of the array/table into the column called 'Sort_id' in the array arr_sort, WHICH ultimately becomes a table back in Arcl.

AndrewHargreaves
Occasional Contributor II

My Code is below - but it's failing on the np.arrange line.

What has me confused is you don't switch your array back to a FeatureClass until line 20. So, at line 19 you're still working with an array which has no field names - so how does specifying "sort_id" in line 19 of your code populate that field?

        sort_field = 'CONSUMPTION'

        order_field = 'RANK'

        rankfs = arcpy.FeatureSet()

        arr = arcpy.da.FeatureClassToNumPyArray(fs,"*")  #Create a NumPy Array

        arr_sort = np.sort(arr, order='CONSUMPTION')  #Sort by Consumption values

        arr_sort_rev =arr_sort[::-1]  #Reverse it so Highest is top

        arr_sort_rev['RANK'] = np.arange( arr_sort_rev.size)

        arcpy.da.NumPyArrayToFeatureClass(arr_sort, rankfs)

DanPatterson_Retired
MVP Emeritus

Love a lesson

>>> import numpy as np
>>> 
>>> X = [342004.0, 342056.0, 341674.0, 341547.0, 341936.0]
>>> Y = [5023921.0, 5023947.0, 5023846.0, 5023635.0, 5023331.0]
>>> IDs = np.arange(5)
>>> Results = np.zeros((5,))  #just making a blank field in Numpy
>>> Results
array([ 0.,  0.,  0.,  0.,  0.])
>>> temp = [ (IDs[i],X[i],Y[i],Results[i]) for i in range(5)]
>>> dt_tmp = [('IDs','i4'),('X','<f8'), ('Y','<f8'),('Results','i4')]
>>> a = np.array(temp,dt_tmp)
>>> a
array([(0, 342004.0, 5023921.0, 0), (1, 342056.0, 5023947.0, 0),
       (2, 341674.0, 5023846.0, 0), (3, 341547.0, 5023635.0, 0),
       (4, 341936.0, 5023331.0, 0)], 
      dtype=[('IDs', '<i4'), ('X', '<f8'), ('Y', '<f8'), ('Results', '<i4')])
>>> fields = a.dtype.names
>>> fields
('IDs', 'X', 'Y', 'Results')
>>> 
>>> # now sort the table on the X field
>>> arr_sort = np.sort(a, order=['X']) # sort_fields is always a list, even if one
>>> arr_sort
array([(3, 341547.0, 5023635.0, 0), (2, 341674.0, 5023846.0, 0),
       (4, 341936.0, 5023331.0, 0), (0, 342004.0, 5023921.0, 0),
       (1, 342056.0, 5023947.0, 0)], 
      dtype=[('IDs', '<i4'), ('X', '<f8'), ('Y', '<f8'), ('Results', '<i4')])
>>> # now fill in the Results field in decending order
>>> 
>>> vals = np.arange(0,5)
>>> vals
array([0, 1, 2, 3, 4])
>>> vals = vals[::-1]  # reverse the sort...a trick
>>> vals
array([4, 3, 2, 1, 0])
>>> 
>>> # now set the Results field to the 'vals'
>>> arr_sort['Results'] = vals   #shove 'em in
>>> arr_sort
array([(3, 341547.0, 5023635.0, 4), (2, 341674.0, 5023846.0, 3),
       (4, 341936.0, 5023331.0, 2), (0, 342004.0, 5023921.0, 1),
       (1, 342056.0, 5023947.0, 0)], 
      dtype=[('IDs', '<i4'), ('X', '<f8'), ('Y', '<f8'), ('Results', '<i4')])
>>> for i in arr_sort:
...  print i
...  
(3, 341547.0, 5023635.0, 4)
(2, 341674.0, 5023846.0, 3)
(4, 341936.0, 5023331.0, 2)
(0, 342004.0, 5023921.0, 1)
(1, 342056.0, 5023947.0, 0)
>>>

Now you can send it back to arcmap.

AndrewHargreaves
Occasional Contributor II

Dan Patterson​...thanks for your persistence. I have a confession - the issue wasn't your code.

When I followed @Kevin Hibma's advice to bring this in a FS I had forgotten to specify the required field I was trying to push the results into ('Results' in line 38 of your example above). The good news is, it all works great now! The bad news is that my ranking begins at 0 , not 1....is there a sneaky way to achieve that on the fly? Or must I specify a 'start' and 'stop' value like you do in line 30 of your code above? (I'm trying to avoid hard coding 'stop' values in so this would work with different data sets)

DanPatterson_Retired
MVP Emeritus

of course

>>> import numpy as np
>>> np.arange(5)
array([0, 1, 2, 3, 4])
>>> np.arange(1,6)
array([1, 2, 3, 4, 5])
>>> np.arange(4,-1,-1)
array([4, 3, 2, 1, 0])
>>> np.arange(0,10,2)
array([0, 2, 4, 6, 8])
>>>

lots of options

About the Author
Retired Geomatics Instructor at Carleton University. I am a forum MVP and Moderator. Current interests focus on python-based integration in GIS. See... Py... blog, my GeoNet blog...
Labels