Join a dataframe to a layer

11-03-2022 12:49 PM
WillemVan_Riet1
New Contributor III

Hello all,

I will try to summarize this as best I can. I am essentially looking to join survey data to a shapefile, which shouldn't be so hard, but I have been left stumped as to why the Join tool gives me duplicated rows. I've tried dissolving the data on the repeated values (such as Shape_Area), but then I lose all the columns from the survey data in the output; I did another join to the newly dissolved output and data was still missing.

For reference, the shapefile has 104 rows and the csv of survey data has 178 rows, so essentially I want to do a one-to-many match on the unique identifier, which in my example is responde (essentially a respondent id). I thought about doing this in pandas, but it doesn't look like arcpy and pandas are friends yet, though I found one of @DanPatterson's suggestions here.
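For what it's worth, the one-to-many match I'm after is easy to express in plain pandas once both tables are DataFrames; here is a minimal sketch with made-up values (the column names are the ones from my data):

```python
import pandas as pd

# Shapefile attribute table as a DataFrame (one row per polygon, unique "responde")
shp_df = pd.DataFrame({"responde": [1, 2], "Shape_Area": [10.5, 22.3]})

# Survey csv (several rows per respondent)
survey = pd.DataFrame({"respondent,C,254": [1, 1, 2],
                       "category": ["roads", "water", "roads"]})

# One-to-many join: each shapefile row is repeated once per matching survey row
joined = shp_df.merge(survey, left_on="responde",
                      right_on="respondent,C,254",
                      how="left", validate="one_to_many")
print(len(joined))  # 3
```

The `validate="one_to_many"` argument makes pandas raise if the shapefile side isn't actually unique, which is a quick way to confirm the join direction.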

Following his instructions, I think the arcpy.da.ExtendTable function could work well. However, I'm not well versed in numpy, and I'm having problems setting up my column names, or what numpy calls the "dtype".

The example ESRI provides is pretty basic, with just three values in the array, whereas my csv has 178 rows and quite a lot of columns. I have been trying to pass the dtype names and formats with no success.

ESRI Example:

WillemVan_Riet1_0-1667504007537.png

This is my code:

 

import arcpy
import numpy

# df = the csv of survey data

df2 = df[['respondent,C,254', 'category', 'value', 'value_cat', 'unique_ID']]
array = df2.to_records()
print(array)

 

WillemVan_Riet1_1-1667504178320.png

 

# apply the dtype attribute
result = df2.dtypes

print("Output:")
print(result)

 

 

WillemVan_Riet1_2-1667504220944.png

 

# create a dictionary of my column names - this step may be redundant
# because I have the result variable above
dts = {'names': ('Unnamed: 0', 'respondent,C,254', 'category', 'value', 'value_cat', 'unique_ID'),
       'formats': (numpy.int64, numpy.object, numpy.object, numpy.object, numpy.object,
                   numpy.int64)}
array_table = numpy.rec.fromrecords(array, dtype=dts)

# Convert array to a geodatabase table
table = filepath + '\\table_test'
#check that this is a valid filepath
print(table)

arcpy.da.NumPyArrayToTable(array_table, table)

# Append the array to an existing table on responde (shp) to respondent,c,254(csv)
arcpy.da.ExtendTable(cultural_shp, "responde", array, "respondent,C,254")
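Side note on something I noticed while debugging: DataFrame.to_records(index=False) appears to build the structured dtype straight from the DataFrame, so my hand-written dts dictionary may be redundant; a quick sketch with dummy data:

```python
import pandas as pd

df2 = pd.DataFrame({"respondent,C,254": ["r1", "r2"],
                    "unique_ID": [1, 2]})

# to_records maps object (string) columns to 'O' and integers to 'i8',
# so no separate {'names': ..., 'formats': ...} dictionary is required
arr = df2.to_records(index=False)
print(arr.dtype.names)  # ('respondent,C,254', 'unique_ID')
```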

 

These are my errors - have I not formatted my array correctly? 

WillemVan_Riet1_3-1667504699617.png

Many thanks in advance!

 

3 Replies
DanPatterson
MVP Esteemed Contributor
ExtendTable (in_table, table_match_field, in_array, array_match_field, {append_only})

Which would be:

 

arcpy.da.ExtendTable(cultural_shp, "responde", array, "respondent")

 

This assumes the field names and the feature class name reference are correct.

 

 


... sort of retired...
WillemVan_Riet1
New Contributor III

Hi Dan, 

Appreciate your prompt response!

Still getting an error. Have I not referenced the correct field name in the array, or is it some sort of type error on the field name? The field I'm trying to join to the shapefile is called "respondent,C,254" (the name was copied from the original csv).

Code attached; the errors and outputs are in the screenshot below.

 

 

# try to create a dictionary of my column names
dts = {'names': ('Unnamed: 0','respondent,C,254','category', 'value', 'value_cat', 'unique_ID'),
       'formats':(numpy.int64, numpy.object, numpy.object, numpy.object, numpy.object,
                 numpy.int64)}
array_table = numpy.rec.fromrecords(array, dtype=dts)

# Convert array to a geodatabase table
table = filepath + '\\table_test'
arcpy.da.NumPyArrayToTable(array_table, table)

# Append the array to an existing table on responde (shp) to respondent,c,254(csv)
# Dan's comment: ExtendTable (in_table, table_match_field, in_array, array_match_field, {append_only})

arcpy.da.ExtendTable(cultural_shp, "responde", array_table, "respondent,C,254")

 

 


WillemVan_Riet1_0-1667519776066.png

 

DanPatterson
MVP Esteemed Contributor

The warning is just that, a warning: numpy.object isn't used any more; 'O' or object will suffice.

Your field names are horrible but workable, BUT they won't be usable in ExtendTable because there is punctuation in them. Check the actual data; it should be a list of tuples.

# ---- with horrible field names
# ---- Note the method I use to create dtypes

np.asarray([(1, "test")], dtype=[('Unnamed: 0', 'i8'), ('respondent,C,254', 'O')])
array([(1, 'test')], dtype=[('Unnamed: 0', '<i8'), ('respondent,C,254', 'O')])

# ---- better
np.asarray([(1, "test")], dtype=[('Unnamed', 'i8'), ('respondent', 'O')])
array([(1, 'test')], dtype=[('Unnamed', '<i8'), ('respondent', 'O')])
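And if the array already exists with the punctuated names, the fields can be renamed in place before handing it to ExtendTable; assigning to dtype.names does the rename without copying the data:

```python
import numpy as np

arr = np.asarray([(1, "test")],
                 dtype=[("Unnamed: 0", "i8"), ("respondent,C,254", "O")])

# Renaming the fields strips out the punctuation that ExtendTable rejects
arr.dtype.names = ("Unnamed", "respondent")
print(arr.dtype.names)  # ('Unnamed', 'respondent')
```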

... sort of retired...