# Numpy Pivot: Summary Statistics Results

4824
11
05-06-2016 02:41 AM Regular Contributor

!I recently posted the following discussion Convert Summary Table Structure (Python)​ which I thought I'd resolved with the help from Dan Patterson​. Please refer to my previous post for the explanation and the reasoning behind the following. Where I'm currently at is that I have been able to pivot the column Time and populate it with the values from the statistics count results.

The last hurdle that I'm trying to achieve is to merge the results so that there isn't a new record for each time interval as shown below: Incorrect Pivot Results: Correct Pivot Results:

```'''
Created on April 6, 2016

Summarise Number of Buildings

per Time Interval

(5, 10, 15, 25, 30, 60)

@author: PeterW
'''
# import site-packages and modules
from pathlib import Path
import numpy.lib.recfunctions as rfn
import pandas as pd  # Pandas version 0.13.0
import arcpy

# set arguments
saa_stats_table = r"E:\Projects\2016\G112224\Models\Schools\Schools_Combined_160505.gdb\Botrivier_Prim_SAA_Stats"

# environment settings
arcpy.env.overwriteOutput = True
fgdb = Path(saa_stats_table).parents

def pivot_table(saa_stats_table, fgdb):
fields = [f.name for f in arcpy.ListFields(saa_stats_table)]
table_recarray = arcpy.da.TableToNumPyArray(saa_stats_table, fields)  # @UndefinedVariable
print table_recarray
df = pd.DataFrame(table_recarray[fields])
pivot = df.pivot(index="OBJECTID",
columns="TIME",
values="FREQUENCY").fillna(0, downcast="infer")
pivot_fields = pivot.columns.values
# rename pivot fields with prefix "TIME"
pivot.columns = [("{0}{1}".format("TIME", field)) for field in pivot_fields]
# convert pandas dataframe to record array
pivot_recarray = pivot.to_records(index=False)
pivot_type = pivot_recarray.dtype.descr
pivot_type_new = [(x, x.replace(x, "<i2")) for x in pivot_type]
# change pivot record array data type to short integer
pivot_recarray = pivot_recarray.astype(pivot_type_new)
fields2 = ["TOWN", "SETTLEMENTNAME", "NAME"]
table_type_new = [(str(x), "<U25") for x in fields2]
# change table array data type to unicode 50 characters
table_recarray = table_recarray[fields2].astype(table_type_new)
recarray_list = [table_recarray, pivot_recarray]
# merge table and pivot record array
summary_table = str(Path(fgdb, "SAA_Stats_Test"))
# convert merged record array to file geodatabase table
if arcpy.Exists(summary_table):
arcpy.Delete_management(summary_table)
arcpy.da.NumPyArrayToTable(summary_array, summary_table)  # @UndefinedVariable
else:
arcpy.da.NumPyArrayToTable(summary_array, summary_table)  # @UndefinedVariable

pivot_table(saa_stats_table, fgdb)
#```

Python Code:

Any advice in how I can achieve the following. I suspect it would need to be done either before or after I have merged the two arrays (line 50). Any other alternative that won't require me having to change all my code would really be appreciated as time is a factor at the moment. I urgently need to get my results out.

NB. Please note I dont have an Advance Licence so Pivot_Table is not an option.

Thanks in advance. 11 Replies by MVP Esteemed Contributor

Peter, in Column C in the both images... they are different and the latter image has 2 unique rows, which is correct.  In order to consolidate and add the rows together column C must contain the same unique values so you are going to have to truncate Botrivier Prim to Botrivier, or append Prim to the second row.  The classification needs to be unique, not unique-ish Regular Contributor

Hi Dan

Column C would be the same. The reason that the second image is different was just to show that there could be values from Time5 - Time60 depending of the statistics results. The second record within the second image would have been four records within the first image. I've replaced the second image to show only one record and filled the empty columns to depict that the Time range of possible results depending on the statistics results. Regular Contributor III

If you are now getting figure 1 could you run Dissolve—Help | ArcGIS for Desktop  and sum fields D-J Regular Contributor

Hi Wes

The following is an output from Summary Statistics so you can't use Dissolve, thanks for the suggestion though.  Regular Contributor

I've found a temporary solution to the following, using Summary Statistics. I'd really appreciate any advice in how to alter my NumPy Function to get the results out within the correct format. The following will allow me to get my results out today. Thanks for the advice so far. by MVP Esteemed Contributor

Peter, check my documentation closely, I successfully got it to summarize within a time period, I don't know why yours is different, I haven't looked at the Pandas shroud over the base numpy implementation.  I will look through when I get a chance but there is definitely no need to summarize but for now that will have to be the work around.

Looking at your last example, the implementation I am referring to does a np.unique and np.where on whatever field, then sums for the remaining numeric columns (or specified numeric columns) by MVP Esteemed Contributor

Peter it is called Observation_Summary_1.pdf in Numpy Repository... I won't post the link here to save getting a whole bunch of "the link doesn't work" questions. Regular Contributor

Hi Dan

Thanks Dan, will have a look at the following. Regular Contributor

Hi Dan

I've used the code example within your "Observation_Summary_1.pdf" and understand the most of your code, but if you wouldn't mind unpacking from  Line 45 to 50. I'm unsure if line 45 is a list comprehension of some sort and not understanding the how its working.

```# coding: utf-8
"""
Dan Patterson:

Numpy Pivot Table Summary

16/05/2016

"""
# import site-packages and modules
import numpy as np
import arcpy

# set input summary table
input_table = r"E:\Projects\2016\G112224\Models\Schools\Schools_Combined_160505.gdb\De_Villiers_Graaff_Hs_SAA_Stats"
output_gdb = r"E:\Python\Testing\dan_patterson_numpy\SAA_Summary_Report_Testing.gdb"

# numpy pivot table function
def pivot_summary(input_table, output_gdb):
# convert summary table to structured numpy array
numpy_fields = ("OBJECTID", "TOWN", "SETTLEMENTNAME",
"NAME", "TIME", "FREQUENCY")
sum_array = arcpy.da.TableToNumPyArray(input_table, numpy_fields)  # @UndefinedVariable
# obtain unique records based on first three columns
unique_records = np.unique(sum_array[['TOWN', 'SETTLEMENTNAME', 'NAME']])
# number of unique rows
shp = len(unique_records)
# construct the output array
dt = [('TOWN', 'U20'), ('SETTLEMENTNAME', 'U20'), ('NAME', 'U20'),
('TIME5', np.int32), ('TIME10', np.int32), ('TIME15', np.int32),
('TIME20', np.int32), ('TIME25', np.int32), ('TIME30', np.int32),
('TIME60', np.int32)]
# populate array with zeros
pivot_array = np.zeros(shp, dtype=dt)
# assign the first three columns
pivot_array['TOWN'] = unique_records['TOWN']
# the values from the unique test
pivot_array['SETTLEMENTNAME'] = unique_records['SETTLEMENTNAME']
# everything is sorted
pivot_array['NAME'] = unique_records['NAME']
# loop through unique records array
for i in range(shp):
# pull out the rows that match
row_match = sum_array[sum_array[['TOWN', 'SETTLEMENTNAME',
'NAME']] == unique_records]
for j in range(len(row_match)):
column = 'TIME' + str(row_match['TIME'])
buildings = row_match['FREQUENCY']
pivot_array[column] = buildings
pivot_table = "{0}\\{1}".format(output_gdb, "Pivot_Table_Summary")
arcpy.da.NumPyArrayToTable(pivot_array, pivot_table)  # @UndefinedVariable
return pivot_table

pivot_table = pivot_summary(input_table, output_gdb)``` Thanks for your help Dan  