How can you calculate Q1, Q2, Q3, and IQR using ESRI?

03-18-2016 02:00 PM
DanielAmrine
Occasional Contributor

I hope the answer does not require Geostatistical Analyst!

To calculate these statistical values you have to first sort the values in ascending order.

Q2 is the median, and its position depends on the total number of values (n). So if you had 100 values, the median is the (value at position 50) plus the (value at position 51), divided by 2. If you had 101 values, the median would be the value at position 51.

I have several hundred shapefiles with values that I want to apply this calculation to.

I would like to use a combination of Python and ModelBuilder to calculate these, calculate an upper and lower "fence", and then find the values that are above and below these "fences".
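To spell out the arithmetic I'm after, here is a rough sketch in plain Python on made-up numbers (using the median-of-halves quartile convention); it's the GIS side of this that I'm stuck on:

# made-up sample values standing in for one shapefile's target field
values = sorted([3, 7, 8, 5, 12, 14, 21, 13, 18, 2])

def median(vals):                              # vals must already be sorted ascending
    n = len(vals)
    mid = n // 2
    return vals[mid] if n % 2 else (vals[mid - 1] + vals[mid]) / 2.0

q2 = median(values)                            # the median
q1 = median(values[:len(values) // 2])         # median of the lower half
q3 = median(values[(len(values) + 1) // 2:])   # median of the upper half
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr                   # the usual 1.5 * IQR fences
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]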

I had the workflow all worked out until I tried to create a new sequential number field based on the ascending sort.

I found this code:

import arcpy

def autoIncrement(start=0, step=1):
    # generator yielding start, start+step, start+2*step, ...
    i = start
    while True:
        yield i
        i += step

table_name = "Name_of_shapefile.shp"   # the shapefile (or table) to update
field = "SEQ_ID"                       # the field that will hold the sequence

incrementCursor = arcpy.UpdateCursor(table_name)  # there is no guarantee of row order here
incrementer = autoIncrement(10, 2)
for row in incrementCursor:
    row.setValue(field, next(incrementer))  # advance the generator once per row
    incrementCursor.updateRow(row)
del row, incrementCursor  # release the lock on the data

I changed table_name to the name of the shapefile with the target field.

I changed field to the name of the field in that shapefile.

I selected the column in the open attribute table.

When I pressed Enter at the end of (row) it just went down one line. I pressed it again and it went back to >>>.

None of the rows were updated. I thought the cursor was whatever you have selected in your map document.

So I need to:

1. Tell each shapefile to sort the target values in ascending order (I used the Sort tool with an iterator in ModelBuilder).

2. Create a script that populates a new field with sequential numbers based on the new sort.

Any help on this is much appreciated!

Dan


4 Replies
DanPatterson_Retired
MVP Emeritus

Sort—Help | ArcGIS for Desktop (with an Advanced license) produces a new output file, which I am sure you don't want.

Summary Statistics—Help | ArcGIS for Desktop doesn't include the median, let alone the Q's.

Field Statistics has some code you can steal to at least get the median and the sorting, but it doesn't produce a field.

There are other means, with built-in tools... but it would then require you to:

  • read the column
  • calculate the values
  • reclassify the initial column into the Q's

>>> import numpy as np
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> p_75 = np.percentile(a, 75.0, axis=0)
>>> p_50 = np.percentile(a, 50.0, axis=0)
>>> p_25 = np.percentile(a, 25.0, axis=0)
>>> p_25
3.25
>>> p_50
5.5
>>> p_75
7.75

I have some reclassification code, or I am sure you can emulate it.

So the big problem is: you need to read the column via a search cursor, use numpy (which you have) to determine Q25, Q50 (the median) and Q75, then... reclass the initial values into a new column.
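Something along these lines for the reclass and fence part, once the column is in an array (a rough, untested sketch continuing with the a from above):

import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])   # the column you read in
q = np.percentile(a, [25.0, 50.0, 75.0])         # Q1, Q2 (median), Q3
quartile_class = np.digitize(a, q)               # 0-3... which quarter each value falls in
iqr = q[2] - q[0]
low_fence = q[0] - 1.5 * iqr                     # the "fences" from the original question
high_fence = q[2] + 1.5 * iqr
flagged = (a < low_fence) | (a > high_fence)     # boolean column marking the outliers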

DanielAmrine
Occasional Contributor

Dan Patterson,

This is very helpful in knowing there are tools to calculate these! I guess the next step is how you take the values from a column in the attribute table and write them to the format you have shown [n, n+1, etc...]. Is this an array?

I did find some code that wrote the values of two fields into a dictionary, and then you could do a "VLOOKUP" kind of deal (drawing from Excel): insert the value of one field and have it print the value of the lookup field.

This still doesn't solve how to add a numbered "ID" field based on the ascending sort of the values.

For example, I could use ModelBuilder to get the row count of each shapefile, then use that to calculate Q1, Q2, and Q3, and then with the dictionaries use Python to pull the appropriate value from the target field.

In a shapefile with 380 records, Q2 (or P_50) would equal (n+1)*0.5, or 190.5, so I would have to pull the target value at position 190, add it to the target value at position 191, and divide by 2. That would give me the median target value. You round down and take the next value when n is even.

That's essentially the workflow, and I could do it in Excel pretty quickly, but I'd have to do it for each individual shapefile after exporting each individual table!

So it all boils down to having the target value field and then another field with a sequential number based on the ascending sort.

Unfortunately, ESRI made everything dependent on the "FID" field, so all of their autoIncrements are based on this field regardless of how the table is sorted.

Dan Amrine

DanPatterson_Retired
MVP Emeritus (Accepted Solution)

The trick is to read the column of data, as Darren Wiens pointed out in his search cursor approach (you can use plain NumPy if you are using Python 2.7, or SciPy if you have Python 3.4 installed).

In pure Python/NumPy/SciPy there is the FeatureClassToNumPyArray or TableToNumPyArray method... you just specify the FC or table that you are using and the field or fields that you want. It is read into memory and you process it in numpy or scipy (scipy isn't needed for most things)... then you do one of two things.

  1. Using Darren's approach, you would have to read the field in, do the processing, then use an update cursor to write the results out.
  2. Using the numpy approach, you use NumPyArrayToTable to make a table of the results, which you join back to the original FC or table using the OBJECTID or FID field as the key field.

Comparison   

  • Option 1 is slower since you are reading the table twice... no biggie perhaps.
  • Option 2, the results are joined, hence not permanent, but... of course... arcpy has a method to join numpy table results to a FC or shapefile. Alternately you can just make the join permanent and save to a new FC or table.

My preference is to use the join approach. If I want to skip the join and make a permanent FC or table, I just read the whole FC/table in, specify the field for processing, do the work, then send it back out using NumPyArrayToFeatureClass or NumPyArrayToTable.
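A rough, untested sketch of that numpy route (the path and field names are made up, so substitute your own; ExtendTable is the arcpy join method I have in mind, and for a file geodatabase the key field would be OBJECTID rather than FID):

import arcpy
import numpy as np

fc = r"C:\temp\test.shp"     # your shapefile or feature class
fld = "Target_Val"           # the field holding the values

arr = arcpy.da.FeatureClassToNumPyArray(fc, ["FID", fld])   # read key + value field into memory
vals = arr[fld]

q1, q2, q3 = np.percentile(vals, [25.0, 50.0, 75.0])
iqr = q3 - q1

out = np.zeros(len(arr), dtype=[("FID", "<i4"), ("Q_CLASS", "<i4")])  # results keyed on FID
out["FID"] = arr["FID"]
out["Q_CLASS"] = np.digitize(vals, [q1, q2, q3])              # 0-3 quartile class per record
arcpy.da.ExtendTable(fc, "FID", out, "FID")                   # join the results back to the shapefile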

I haven't used a searchcursor, updatecursor or insertcursor in quite a while.

So the choice is yours; sorting data on the OID or FID field is no sweat in numpy (or SciPy), so all your work can be unravelled back to the original order. I use this approach regularly to produce shapefiles sorted by geometric properties for some of the work I do.
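For example, getting a 1 to n sequence from the ascending sort of a column, expressed back in the original row order, is just an argsort (quick sketch with made-up numbers):

import numpy as np

vals = np.array([7.0, 2.0, 9.0, 4.0])        # values in original FID order
order = vals.argsort()                       # positions that would sort the values ascending
rank = np.empty(len(vals), dtype=np.int32)
rank[order] = np.arange(1, len(vals) + 1)    # 1..n, unravelled back to the original order
# rank is now array([3, 1, 4, 2])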

The tools are there. Esri has quietly, without fanfare, given access to a lot of things in the vector and raster world that would otherwise require a higher license level, simply because you can do the work in numpy or SciPy and bring it back into ArcMap. The nice thing is, once you have an array, you can work in a variety of other environments or languages which offer different capabilities.

So, again, the choice is yours... you can use a maximized combination of arcpy and numpy (i.e. via cursors) or a minimized combination of arcpy/numpy with most of the emphasis on numpy.

Good luck... have a look at Darren's approach should you have immediate needs.

PS

NumPy Repository is a place where I house musings and works related to array work. If you already have a background in that kind of work, it may be of interest to you at some stage (it assumes a certain level of Python familiarity, and the works presented require Python 3.4+, which you can get by installing ArcGIS Pro or by using an Anaconda distribution, which includes the whole stack).

DarrenWiens2
MVP Honored Contributor

You can use scipy.stats.percentileofscore() for this (not sure if I installed it separately or if it came with ArcGIS).

>>> import arcpy, numpy, scipy.stats
... fc = 'points'          # feature class (or shapefile) to update
... field = 'val'          # field holding the values
... rankfield = 'rank'     # existing numeric field to receive the rank
... values = [row[0] for row in arcpy.da.SearchCursor(fc, field)]  # load values to a list
... with arcpy.da.UpdateCursor(fc, [field, rankfield]) as cursor:
...     for row in cursor:
...         row[1] = scipy.stats.percentileofscore(values, row[0])  # percentile rank of this value
...         cursor.updateRow(row)
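And if what you actually want is a sequential 1 to n ID based on the ascending sort (per the original question), a similar untested sketch, keyed on the object ID so duplicate values don't matter (same assumed fc/field/rankfield names as above):

>>> pairs = [(row[0], row[1]) for row in arcpy.da.SearchCursor(fc, ['OID@', field])]
... rank_by_oid = {oid: i for i, (oid, val) in enumerate(sorted(pairs, key=lambda p: p[1]), 1)}
... with arcpy.da.UpdateCursor(fc, ['OID@', rankfield]) as cursor:
...     for row in cursor:
...         row[1] = rank_by_oid[row[0]]  # position of this record in the ascending sort
...         cursor.updateRow(row)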