Query and summarize revisited...

DanPatterson_Retired · ‎02-03-2016

---- Topic: Query and summarize revisited ----

This post https://community.esri.com/thread/171629 brought up the sometimes difficult task of finding information within tabular data. As usual, there are many ways to do the same thing... this is but one.

Pre-tasks: Use TableToNumPyArray to convert your tabular information into an array. It is pretty fool-proof

TableToNumPyArray—Help | ArcGIS for Desktop

The trick to remember is that each record in an array is treated as unique unless found otherwise. This makes it particularly easy to sort records on multiple columns and summarize and/or extract what you need from within. In this example, I concentrated on printing.

I used a simple list comprehension to show how to pull the records out according to the first column which we can treat as a class field. Once the classes were determined the list comprehension grouped the data into the classes and each class could have further information extracted/summarized.

The output produced can be expanded upon. Should want to partition the data into separate datasets, you can do it while it is into array form. Hope you get some ideas on breaking down problems into its constituent parts and using the tools available to you. For ArcGIS Pro, the new python suite, contains Pandas which provides an other alternate to the same process.

I am also sure that someone can come up with an sql statement that does some...but not all of the tasks outlined here.

Summarizing data

>>> # ---- simplify a step or two ----
>>> # - The data... just in array format
>>> 
>>> arr_data = [('a', 50, 4), ('c', 20, 1),
                ('a', 15, 5), ('e', 40, 4), 
                ('a', 35, 2),('b', 100, 5),
                ('c', 80, 3), ('d', 100, 3), ('e', 60, 2)]
>>> dt =[('col_0', np.str, 5),
         ('col_1','<i4'),
         ('col_2','<i4')]
>>> a = np.array(arr_data, dtype=dt)
>>> a.reshape((-1,1))
array([[('a', 50, 4)],
       [('c', 20, 1)],
       [('a', 15, 5)],
       [('e', 40, 4)],
       [('a', 35, 2)],
       [('b', 100, 5)],
       [('c', 80, 3)],
       [('d', 100, 3)],
       [('e', 60, 2)]], 
      dtype=[('col_0', '<U5'),... ])
>>>
>>> uni_info = np.unique(a, True, True)
>>> vals, idx, inv = uni_info  
>>> # vals is the data,
>>> # the others are for reshaping the array    
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Now for some output

>>> vals.reshape((-1,1))
array([[('a', 15, 5)],
       [('a', 35, 2)],
       [('a', 50, 4)],
       [('b', 100, 5)],
       [('c', 20, 1)],
       [('c', 80, 3)],
       [('d', 100, 3)],
       [('e', 40, 4)],
       [('e', 60, 2)]], 
      dtype=[('col_0', ... ])
>>> # returns array(['a', 'b', 'c', 'd', 'e'],dtype='<U5') 
>>> uni = np.unique(vals['col_0'])      
>>> subs = [ vals[vals['col_0']==i] for i in uni ]
>>>
>>> for sub in subs:
...  n = len(sub['col_0'])
...  t = sub['col_0'][0]
...  val_max = np.max(sub['col_1'])
...  val_min = np.min(sub['col_1'])
...  frmt = "type {} -N: {} -max: {} -min: {}\n  -sub {}"
...  print(frmt.format(t, n, val_max, val_min, sub))
...  
type a -N: 3 -max: 50 -min: 15
  -sub [('a', 15, 5) ('a', 35, 2) ('a', 50, 4)]
type b -N: 1 -max: 100 -min: 100
  -sub [('b', 100, 5)]
type c -N: 2 -max: 80 -min: 20
  -sub [('c', 20, 1) ('c', 80, 3)]
type d -N: 1 -max: 100 -min: 100
  -sub [('d', 100, 3)]
type e -N: 2 -max: 60 -min: 40
  -sub [('e', 40, 4) ('e', 60, 2)]
>>>
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

NOTE:

A pdf version was added. It contains more commentary on the process. A second example was added on 2015-02-18