Join a dataframe to a layer

11-03-2022 12:49 PM
WillemVan_Riet1
New Contributor III

Hello all,

I will try to summarize this as best I can. I am essentially looking to join survey data to a shapefile, which shouldn't be so hard, but I have been left stumped as to why the Join tool gives me duplicated rows. I've tried dissolving the data on the repeated values (such as Shape_Area), but then I lose all the columns from the survey data in the output; I did another join to the newly dissolved output and data was still missing.

For reference, the shapefile has 104 rows and the csv of survey data has 178 rows, so essentially I want to do a one-to-many match on the unique identifier, which in my example is responde (essentially a respondent id). I thought about doing this in pandas, but it doesn't look like arcpy and pandas are friends yet, though I found one of @DanPatterson's suggestions here.
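For what it's worth, the one-to-many match I'm after is easy to express in plain pandas once both tables are DataFrames; here is a minimal sketch with made-up values (the column names are the ones from my data):

```python
import pandas as pd

# Shapefile attribute table as a DataFrame (one row per polygon, unique "responde")
shp_df = pd.DataFrame({"responde": [1, 2], "Shape_Area": [10.5, 22.3]})

# Survey csv (several rows per respondent)
survey = pd.DataFrame({"respondent,C,254": [1, 1, 2],
                       "category": ["roads", "water", "roads"]})

# One-to-many join: each shapefile row is repeated once per matching survey row
joined = shp_df.merge(survey, left_on="responde",
                      right_on="respondent,C,254",
                      how="left", validate="one_to_many")
print(len(joined))  # 3
```

The `validate="one_to_many"` argument makes pandas raise if the shapefile side isn't actually unique, which is a quick way to confirm the join direction.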

Following his instructions, I think the arcpy.da.ExtendTable function could work well. However, I'm not well versed in numpy, and I'm having problems setting up my column names, or what numpy calls the "dtype".

The example ESRI provides is pretty basic, with just three values in the array, whereas my csv has 178 rows and quite a lot of columns. I have been trying to pass the dtype names and formats with no success.

ESRI Example:

WillemVan_Riet1_0-1667504007537.png

This is my code:

 

import arcpy
import numpy

# df = the csv of survey data

df2 = df[['respondent,C,254', 'category', 'value', 'value_cat', 'unique_ID']]
array = df2.to_records()
print(array)

 

WillemVan_Riet1_1-1667504178320.png

 

# apply the dtype attribute
result = df2.dtypes

print("Output:")
print(result)

 

 

WillemVan_Riet1_2-1667504220944.png

 

# create a dictionary of my column names - this step may be redundant
# because I have the result variable above
dts = {'names': ('Unnamed: 0', 'respondent,C,254', 'category', 'value', 'value_cat', 'unique_ID'),
       'formats': (numpy.int64, numpy.object, numpy.object, numpy.object, numpy.object,
                   numpy.int64)}
array_table = numpy.rec.fromrecords(array, dtype=dts)

# Convert array to a geodatabase table
table = filepath + '\\table_test'
#check that this is a valid filepath
print(table)

arcpy.da.NumPyArrayToTable(array_table, table)

# Append the array to an existing table on responde (shp) to respondent,c,254(csv)
arcpy.da.ExtendTable(cultural_shp, "responde", array, "respondent,C,254")
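Side note on something I noticed while debugging: DataFrame.to_records(index=False) appears to build the structured dtype straight from the DataFrame, so my hand-written dts dictionary may be redundant; a quick sketch with dummy data:

```python
import pandas as pd

df2 = pd.DataFrame({"respondent,C,254": ["r1", "r2"],
                    "unique_ID": [1, 2]})

# to_records maps object (string) columns to 'O' and integers to 'i8',
# so no separate {'names': ..., 'formats': ...} dictionary is required
arr = df2.to_records(index=False)
print(arr.dtype.names)  # ('respondent,C,254', 'unique_ID')
```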

 

These are my errors - have I not formatted my array correctly? 

WillemVan_Riet1_3-1667504699617.png

Many thanks in advance!

 

3 Replies
DanPatterson
MVP Esteemed Contributor
ExtendTable (in_table, table_match_field, in_array, array_match_field, {append_only})

Which would be:

 

arcpy.da.ExtendTable(cultural_shp, "responde", array, "respondent")

 

This assumes the field names and the feature class name reference are correct.

 

 


... sort of retired...
WillemVan_Riet1
New Contributor III

Hi Dan, 

Appreciate your prompt response!

Still getting an error. Have I not referenced the correct field name in the array, or is it some sort of type error on the field name? The field I'm trying to join to the shapefile is called "respondent,C,254" (the name was copied from the original csv).

Code attached; the errors and outputs are in the screenshot below.

 

 

# try to create a dictionary of my column names
dts = {'names': ('Unnamed: 0','respondent,C,254','category', 'value', 'value_cat', 'unique_ID'),
       'formats':(numpy.int64, numpy.object, numpy.object, numpy.object, numpy.object,
                 numpy.int64)}
array_table = numpy.rec.fromrecords(array, dtype=dts)

# Convert array to a geodatabase table
table = filepath + '\\table_test'
arcpy.da.NumPyArrayToTable(array_table, table)

# Append the array to an existing table on responde (shp) to respondent,c,254(csv)
# Dan's comment: ExtendTable (in_table, table_match_field, in_array, array_match_field, {append_only})

arcpy.da.ExtendTable(cultural_shp, "responde", array_table, "respondent,C,254")

 

 


WillemVan_Riet1_0-1667519776066.png

 

DanPatterson
MVP Esteemed Contributor

The warning is just that, a warning: numpy.object isn't used any more; 'O' or object will suffice.

Your field names are horrible but workable, BUT they won't be usable in ExtendTable because there is punctuation in them. Check the actual data; it should be a list of tuples.

# ---- with horrible field names
# ---- Note the method I use to create dtypes

np.asarray([(1, "test")], dtype=[('Unnamed: 0', 'i8'), ('respondent,C,254', 'O')])
array([(1, 'test')], dtype=[('Unnamed: 0', '<i8'), ('respondent,C,254', 'O')])

# ---- better
np.asarray([(1, "test")], dtype=[('Unnamed', 'i8'), ('respondent', 'O')])
array([(1, 'test')], dtype=[('Unnamed', '<i8'), ('respondent', 'O')])
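And if the array already exists with the punctuated names, the fields can be renamed in place before handing it to ExtendTable; assigning to dtype.names does the rename without copying the data:

```python
import numpy as np

arr = np.asarray([(1, "test")],
                 dtype=[("Unnamed: 0", "i8"), ("respondent,C,254", "O")])

# Renaming the fields strips out the punctuation that ExtendTable rejects
arr.dtype.names = ("Unnamed", "respondent")
print(arr.dtype.names)  # ('Unnamed', 'respondent')
```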

... sort of retired...