How to convert a python "collection" to a gdb table? (1-to-many)

1113
9
Jump to solution
07-25-2017 07:10 PM
RebeccaStrauch__GISP
MVP Esteemed Contributor

I have a attribute table that contains the results of a SpatialJoin giving me multiple records for each of my target polygon.

I'm trying to convert the above to a one-to-many table; one record for each unique HuntID w/a field listing all the uniqid.e.g.

   'RL450',   '50, 57, 70, 71, 81, 175, 177'

   'DM448,  '72, 76, ...'

   etc. 

Actual formatting (i.e. comma or space separated, quotes, etc, ) is yet tbd and not important at this point

I've done quite a bit of searching and have found two different simple functions to give me unique values for one field, and and other method using collections for giving me the 1-to-many "collection", but I haven't been able to figure out a simple way to get this back into a table. 

I think I am missing something obvious, and although I have done quite a few searchs and looked thru many of the  /blogs/dan_patterson/2016/05/09/the-links?sr=search&searchId=ceb1d9a4-61d3-4411-b4b9-2e06544edf3c&se...‌  I haven't figure it out.

#  the various snippets I've narrowed it down to so far
import arcpy
import os
import numpy

inDS = r'Database Connections\__me@wcgis_test2.sde\sde_wcgis_test2.DBO.FandA'
inDB = r'Database Connections\__me@wcgis_test2.sde'
inFC = r'Database Connections\__me@wcgis_test2.sde\sde_wcgis_test2.DBO.FandA\sde_wcgis_test2.DBO.SJ_hunt2flat'

outTable = r'Database Connections\__me@wcgis_test2.sde\testtable'

huntFld = 'HuntID'
uniqfld = 'uniqid'

#  Function to output a list of of unique values for a field
def unique_values(table, field):  
    with arcpy.da.SearchCursor(table, [field]) as cursor:
        return sorted({row[0] for row in cursor})
        
#  Function to output a array of unique values for a field
def unique_values_2(table, field):
    data = arcpy.da.TableToNumPyArray(table, [field])
    return numpy.unique(data[field])

huntList = unique_values(inFC, huntFld)       
huntList2 = unique_values_2(inFC, huntFld)

# ###  Collection portion
import collections
uniqvals = collections.defaultdict(list)

# output a collection of unique Field1 and the its unique Field2 values
for field1, field2 in arcpy.da.SearchCursor(inFC, (huntFld, uniqfld)):
    uniqvals[field1].append(field2)

# sample output
test = uniqvals.items()[95]
test
# (u'DM448', [72, 76, 77, 78, 79, 80, 81, 83, 86, 87, 88, 89, 90, 91, 92, 93, 105, 112, 116, 122])
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

BTW - I have tried various combinations of arcpy summary, frequency, pivot tables, Pro's (yes Pro) "summarize within" -- instead of SpatialJoin, but open to other suggestions (although this python solution will be fast enough, I believe). 

Thanks

This seems right up your alley Dan Patterson‌ 

Tags (3)
0 Kudos
1 Solution

Accepted Solutions
JoshuaBixby
MVP Esteemed Contributor

Although I work with 9.7. itertools — Functions creating iterators for efficient looping — Python 2.7.13 documentation quite a bit, and hence would lean towards an itertools.groupby solution, I think the best approach is using a pandas data frame now that pandas is bundled with the ArcGIS Desktop Python distribution:

import pandas
tbl = # path to spatial join table
csv = # path to output csv

idx, rollup = "HuntID", "uniqid"

df = pandas.DataFrame.from_records(
    arcpy.da.TableToNumPyArray(tbl, [idx, rollup])
)
df[rollup] = df[rollup].apply(str)
df.groupby(idx)[rollup].apply(lambda x: ", ".join(x)).to_csv(csv, header=[rollup])

View solution in original post

0 Kudos
9 Replies
JoshuaBixby
MVP Esteemed Contributor

It seems like your current approach works, you just need to convert the list of items to a string.  What about join to create the string field you are looking for:

>>> l = [72, 76, 77, 78, 79, 80]
>>> ", ".join(str(i) for i in l)
'72, 76, 77, 78, 79, 80'
>>> 
RebeccaStrauch__GISP
MVP Esteemed Contributor

Thanks Joshua.  I may go that route, although I was hoping for a NumpyArrayToTable type solution, but with my initial test of that it didn't like the format of the array.  Thought maybe, since it was a nice function to create my collection, maybe there was a simple function to do the last step.  My head was spinning by the end of the day yesterday, but I might give it another shot/look.  Although Domenico's SQL solution might also work for my purposes. 

0 Kudos
nicogis
MVP Frequent Contributor

It isn't in python but you can also use sql creating a view.

See for example in sql server : https://www.mssqltips.com/sqlservertip/2914/rolling-up-multiple-rows-into-a-single-row-and-column-fo... 

RebeccaStrauch__GISP
MVP Esteemed Contributor

Thanks Domenico.  This may work for my needs.  I'm working on a very small subset right now for testing, and I think a pure SQL process might work in this case to get my table.  Then I can read it back into my gdb if needed.  I'll take a look at the sample and give it a try.

0 Kudos
RebeccaStrauch__GISP
MVP Esteemed Contributor

Not perfect, and not in a GDB attribute table, but I was able to get the "rolled up" list to a comma delimited file with two fields, one with my key field (huntID) and the other with a list of the unique polygon id's as a string.  For the purposes of this workflow development and test, I think this will work.

The issue with getting it into a GDB attribute file is the length of the polygon ID list.  Right now I have a list of integers, but in production, the id's will be much larger and possibly using the list of globids.  In that case, I may revisit Domenico Ciavarella 's SQL sample and/or get the list into a variable length field.

Although I did not use the SQL solution, it did give me the "roll-up" term which I then searched and found Advice on "roll up" / consolidation tool   which might be helpful for others.  So between that post, and the advice from both Domenico and Joshua Bixby , I decided the fastest solution was writing the output of collection.defaultdict to a .csv with a modified version of Joshua's formatting string. 

Sample output:

Here is the code I used to produce that,

import arcpy
import os
import collections 
uniqvals = collections.defaultdict(list)

theWorkspace = r"c:\__temp"    
arcpy.env.workspace = theWorkspace
arcpy.env.overwriteOutput = True

inFC = r'Database Connections\__me@wcgis_test2.sde\sde_wcgis_test2.DBO.FandA\sde_wcgis_test2.DBO.SJ_hunt2flat'


for field1, field2 in arcpy.da.SearchCursor(inFC, (huntFld, uniqfld)):
    uniqvals[field1].append(field2)
    #print("hunt {0} poly {1}".format(field1, field2))

outFile = "hunt2poly" 
outFileCSV = os.path.join(theWorkspace, ("{0}.csv".format(outFile)))

# open the file, verify it opened, and write the field/header names
csvFile = open(outFileCSV, 'w')
print(  "File {0} is open? {1}".format(str(outFileCSV), str(not csvFile.closed)))
csvFile.write("huntID, polys\n")

for hunt, plist in uniqvals.items():
    csvFile.write('{0},"{1}"\n'.format(hunt, ", ".join(str(pID) for pID in plist)))

# close the file and verify it closed properly
csvFile.close()
print(  "File {0} is open? {1}".format(str(outFileCSV), str(not csvFile.closed)))

I'm marking this comment as correct to close the post and hope it may help others.

JoshuaBixby
MVP Esteemed Contributor

Although I work with 9.7. itertools — Functions creating iterators for efficient looping — Python 2.7.13 documentation quite a bit, and hence would lean towards an itertools.groupby solution, I think the best approach is using a pandas data frame now that pandas is bundled with the ArcGIS Desktop Python distribution:

import pandas
tbl = # path to spatial join table
csv = # path to output csv

idx, rollup = "HuntID", "uniqid"

df = pandas.DataFrame.from_records(
    arcpy.da.TableToNumPyArray(tbl, [idx, rollup])
)
df[rollup] = df[rollup].apply(str)
df.groupby(idx)[rollup].apply(lambda x: ", ".join(x)).to_csv(csv, header=[rollup])
0 Kudos
RebeccaStrauch__GISP
MVP Esteemed Contributor

Thanks bixb0012 .  I'll switch the "correct answer" to you, so others will know that it works and is a much better way to go, assuming two things: 

  1. you are using a version where Pandas is included like 10.5.x (not included with 10.3.1) or you download/install manually, and
  2. you have a GDB table that is compatible with that version.  (i.e. SDE for SQL 2012+   or a fgdb)

Not only is your code more efficient, the output has both the HuntID and the uniqid polys within each sorted, my code did not.

Unfortunately, I'm still using 10.3.1 and SQL 2008 so the initial test would not work.  However, I did test on a ArcCatalog 10.5.1 install with my table copied to a fgdb, and it worked slick.  So, I'll keep this solution for the future since we are looking to upgrade soon(er or later) and can definitely use it for my prototype testing!

Thanks for taking the time!

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

Most of the SciPy stack, including NumPy and Pandas, started getting bundled with ArcGIS 10.4.  Esri initially trumpeted SciPy modules would be bundled with 10.3.1, but it ended up not making that release.

0 Kudos
RebeccaStrauch__GISP
MVP Esteemed Contributor

But wouldn't you know, after I got all this figured out (which was an interesting exercise) the format they really wanted was what already had and was using as the input to the above.  So, good info to know, but ended up being a moot point (although I did prove that this is not a good option in the long run). 

0 Kudos