How to maintain geometry when merging sdfs?

JaneSkillman · ‎11-02-2020

When I perform a merge on two spatial dataframes, the SHAPE field converts to "object". Is there a way to re-set one of the SHAPE fields as a geometry field after the merge, or is there a way to maintain the SHAPE field as geometry?

e.g.

import pandas as pd

def overlapRows(in_sdf, master_sdf, key):
    """
    Merge new attributes and point location with old attributes that aren't in the new dataset.
    Includes new rows.
    """
    
    join_sdf = pd.merge(left=in_sdf, right=master_sdf , how='outer', on=key, indicator=True)

    print("Number of all records: ", len(join_sdf))
    print("Number of new records: ", len(join_sdf[join_sdf['_merge']=='left_only']))
    
    # if there's a duplicate, remove old attributes (_y) and keep new attributes (_x)
    for fd in list(join_sdf.columns.values):
        if fd.endswith("_x"):
            # strip only the trailing suffix; str.replace would also remove "_x" elsewhere in the name
            join_sdf.rename(columns={fd: fd[:-2]}, inplace=True)
        elif fd.endswith("_y"):
            del join_sdf[fd]

    return join_sdf

mergeRows = overlapRows(dwhPt, dwhPl, 'prj_id')

print(mergeRows['SHAPE'])

Out:

1331    {'x': 517459.9827999994, 'y': -1082018.6127000...
1332    {'x': 524068.436499998, 'y': -998965.120000000...
1333    {'x': 600713.8193000033, 'y': -986437.45459999...
Name: SHAPE, dtype: object

jcarlson · ‎06-02-2021

Based on your code, you're keeping all of the in_sdf attributes no matter what, whether they're updates to the master_sdf or new additions. If that's the case, there's really no need to merge any columns besides your key.

join_sdf = in_sdf.merge(master_sdf[[key]], how='outer', on=key, indicator=True)

All the input columns from in_sdf will be unchanged, including the geometry, and the key, being used in the join, will not have any suffixes to bother with.
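To illustrate the key-only merge, here is a minimal sketch using plain pandas DataFrames. The sample data and the column name "name" are made up, plain dicts stand in for the SHAPE geometry, and how="outer" is assumed from the code in the question. It only shows that no "_x"/"_y" columns are created; it does not reproduce the ArcGIS geometry dtype.

```python
import pandas as pd

# Hypothetical stand-ins for the two spatial dataframes.
# Plain dicts play the role of the SHAPE geometry objects.
in_sdf = pd.DataFrame({
    "prj_id": [1, 2],
    "name": ["new A", "new B"],
    "SHAPE": [{"x": 0.0, "y": 0.0}, {"x": 1.0, "y": 1.0}],
})
master_sdf = pd.DataFrame({
    "prj_id": [2, 3],
    "name": ["old B", "old C"],
    "SHAPE": [{"x": 1.5, "y": 1.5}, {"x": 2.0, "y": 2.0}],
})

key = "prj_id"

# Only the key column is taken from master_sdf, so no other column names
# overlap and pandas has nothing to suffix.
join_sdf = in_sdf.merge(master_sdf[[key]], how="outer", on=key, indicator=True)

print(list(join_sdf.columns))
print(join_sdf[[key, "_merge"]])
```

Note that with an outer join, any key found only in master_sdf still produces a row, but its in_sdf columns (including SHAPE) are empty.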

EDIT: I was re-reading your post a bit more carefully. Can you clarify what the ultimate goal for this script is? You mention that the master_sdf may have data not present in in_sdf. How does your process preserve the data? The entire dataframe is going to get "some_column_x" and "some_column_y" columns, not just those with duplicates. Removing the "_y" columns will drop those items only in master_sdf in that situation.
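A small sketch of that concern, again with made-up data in plain pandas (the "status" column is hypothetical). It shows that the suffixes apply to the whole column, and that dropping the "_y" side leaves master-only rows without values. The last step is just one possible way to keep those values, not necessarily what your workflow needs.

```python
import pandas as pd

# Hypothetical sample data: prj_id 3 exists only in master_sdf.
in_sdf = pd.DataFrame({"prj_id": [1, 2], "status": ["new A", "new B"]})
master_sdf = pd.DataFrame({"prj_id": [2, 3], "status": ["old B", "old C"]})

join_sdf = pd.merge(in_sdf, master_sdf, how="outer", on="prj_id", indicator=True)

# Every non-key column shared by both frames is suffixed, for all rows.
suffixed_cols = [c for c in join_sdf.columns if c.endswith(("_x", "_y"))]
print(suffixed_cols)

# Keeping only the "_x" side, as the function in the question does.
kept = join_sdf.drop(columns=["status_y"]).rename(columns={"status_x": "status"})
master_only = kept[kept["_merge"] == "right_only"]
print(master_only)  # the master-only row has no status left

# One possible alternative: prefer "_x", fall back to "_y" where "_x" is missing.
coalesced = join_sdf["status_x"].fillna(join_sdf["status_y"])
print(coalesced.tolist())
```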

- Josh Carlson
Kendall County GIS