When I perform a merge on two spatial dataframes, the SHAPE field converts to "object". Is there a way to re-set one of the SHAPE fields as a geometry field after the merge, or is there a way to maintain the SHAPE field as geometry?
e.g.
def overlapRows(in_sdf, master_sdf, key):
    """
    Merge new attributes and point location with old attributes that aren't in the new dataset.
    Includes new rows.
    """
    join_sdf = pd.merge(left=in_sdf, right=master_sdf, how='outer', on=key, indicator=True)
    print("Number of all records: ", len(join_sdf))
    print("Number of new records: ", len(join_sdf[join_sdf['_merge'] == 'left_only']))
    # if there's a duplicate, remove old attributes (y) and keep new attributes (x)
    for fd in list(join_sdf.columns.values):
        if fd.endswith("_x"):
            join_sdf.rename(columns={fd: fd.replace("_x", "")}, inplace=True)
        elif fd.endswith("_y"):
            del join_sdf[fd]
    return join_sdf
mergeRows = overlapRows(dwhPt, dwhPl, 'prj_id')
print(mergeRows['SHAPE'])
Out:
1331 {'x': 517459.9827999994, 'y': -1082018.6127000...
1332 {'x': 524068.436499998, 'y': -998965.120000000...
1333 {'x': 600713.8193000033, 'y': -986437.45459999...
Name: SHAPE, dtype: object
Based on your code, you're keeping all of the in_sdf attributes no matter what, whether they're updates to the master_sdf or new additions. If that's the case, there's really no need to merge any columns besides your key.
join_sdf = in_sdf.merge(master_sdf[[key]], how='left', on=key, indicator=True)
All the input columns from in_sdf will be unchanged, including the geometry, and the key, being used in the join, will not have any suffixes to bother with.
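To make that concrete, here is a minimal sketch using plain pandas only. The two small frames and their values are made up for illustration, plain dicts stand in for the SHAPE geometries, and how='left' is my assumption for the join type. It only shows that the column names come through without suffixes and that the indicator still separates new rows from updates.

```python
import pandas as pd

# Hypothetical stand-ins for the two spatial dataframes; plain dicts play the
# role of the SHAPE geometries so the example runs with pandas alone.
in_sdf = pd.DataFrame({
    "prj_id": [1, 2, 3],
    "name": ["a-new", "b-new", "c-new"],
    "SHAPE": [{"x": 0.0, "y": 0.0}, {"x": 1.0, "y": 1.0}, {"x": 2.0, "y": 2.0}],
})
master_sdf = pd.DataFrame({
    "prj_id": [2, 3, 4],
    "name": ["b-old", "c-old", "d-old"],
    "SHAPE": [{"x": 9.0, "y": 9.0}] * 3,
})

key = "prj_id"

# Only the key column of master_sdf takes part in the merge, so no other
# column name overlaps and pandas has nothing to suffix with _x / _y.
# how="left" is an assumption: keep every in_sdf row, add nothing from master.
join_sdf = in_sdf.merge(master_sdf[[key]], how="left", on=key, indicator=True)

print(list(join_sdf.columns))        # column names of in_sdf plus "_merge"
print(join_sdf["_merge"].tolist())   # "left_only" = new row, "both" = update
```

If SHAPE still comes back as object on a real spatially enabled dataframe, it may be worth trying the ArcGIS API for Python's join_sdf.spatial.set_geometry('SHAPE'); I have not tested that here.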
EDIT: I re-read your post a bit more carefully. Can you clarify what the ultimate goal for this script is? You mention that master_sdf may have data not present in in_sdf. How does your process preserve that data? Every row in the merged dataframe gets the "some_column_x" and "some_column_y" columns, not just the rows with duplicates. For rows found only in master_sdf, the values live in the "_y" columns, so removing those columns drops that data.
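Here is a small sketch of what I mean, again with plain pandas and made-up data (the frames and the "name" column are hypothetical). It shows the suffixed columns after an outer merge, what is lost when the "_y" column is dropped, and one possible alternative that fills gaps from the old values before dropping. The fillna step is my suggestion, not something from your script.

```python
import pandas as pd

# Hypothetical data: prj_id 1 is new, 2 is an update, 3 exists only in master.
in_sdf = pd.DataFrame({"prj_id": [1, 2], "name": ["a-new", "b-new"]})
master_sdf = pd.DataFrame({"prj_id": [2, 3], "name": ["b-old", "c-old"]})

join_sdf = pd.merge(in_sdf, master_sdf, how="outer", on="prj_id", indicator=True)
merged_columns = list(join_sdf.columns)
print(merged_columns)  # "name" is split into name_x / name_y for every row

# Dropping "_y" and renaming "_x" leaves the master-only row without a name.
dropped = join_sdf.drop(columns=["name_y"]).rename(columns={"name_x": "name"})
lost = dropped.loc[dropped["_merge"] == "right_only", "name"]
print(lost.isna().all())

# One possible alternative: prefer the new value, fall back to the old one,
# and only then remove the suffixed columns.
join_sdf["name"] = join_sdf["name_x"].fillna(join_sdf["name_y"])
kept = join_sdf.drop(columns=["name_x", "name_y"])
print(kept.set_index("prj_id")["name"].to_dict())
```

The same fallback would have to be applied to each shared column, including SHAPE, if master-only rows need to keep their geometry.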