I created a spatially enabled DataFrame from a FGDB feature class using pandas.DataFrame.spatial.from_featureclass(). After that, my code does (non-spatial) field calculations with pandas, but the structure of the data doesn't fundamentally change. It is easier that way than jamming a complex Python code block into the field calculator.
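For reference, the workflow looks roughly like this (the path and field names are placeholders, not my actual data):

import pandas as pd
from arcgis.features import GeoAccessor  # registers the DataFrame.spatial accessor

fc = r"C:\data\demo.gdb\parcels"  # hypothetical FGDB feature class
sdf = pd.DataFrame.spatial.from_featureclass(fc)
# non-spatial field calculations stay in pandas
sdf["ratio"] = sdf["value_a"] / sdf["value_b"]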
My question is: once I'm done with what I need to do, is there a way to save the result back to the original feature class? The ArcGIS API for Python docs have a section on saving the spatially enabled DataFrame, with instructions for saving to a FGDB feature class, but the options are really more like export (or save as) than save. I'd prefer to keep working in the same feature class I started in.
Right now, I'm thinking the easiest way is to convert to NumPy and use arcpy.da.ExtendTable. Is there a more straightforward way to do this?
Your last sentence is the easiest for all feature class geometries and the only way for poly* features. Just make sure you include the matching ID field (e.g. OBJECTID) to enable the join when saving the structured NumPy array.
An oldish missive:
/blogs/dan_patterson/2017/11/01/extendtable-permanent-joins-arcpy-and-numpy-play-nice
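A bare-bones sketch of that pattern (path and field names here are placeholders):

import arcpy

fc = r"C:\data\demo.gdb\parcels"        # hypothetical feature class
oid = arcpy.Describe(fc).OIDFieldName   # the matching ID field
# pull the attributes you need, including the OID, into a structured array
arr = arcpy.da.FeatureClassToNumPyArray(fc, [oid, "value_a"])
# ... modify values or append fields to ``arr`` here ...
# join the array back on the OID; append_only=False lets existing
# fields be updated rather than only appended
arcpy.da.ExtendTable(fc, oid, arr, oid, append_only=False)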
Thanks, Dan Patterson. If anyone at Esri is listening: I really like how seamless it is to go from ArcGIS to pandas; now we're just missing the round-trip functionality. I know we can save as (i.e., export), but I'd like to save back to the original feature class. This would be especially helpful since NumPy doesn't handle NULL values.
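For what it's worth, something like this (hypothetical column names) works around the NULL problem on the pandas side before converting:

import numpy as np

# fill pandas nulls with sentinels that survive the trip
# through a structured array
sdf3["some_int"] = sdf3["some_int"].fillna(-999).astype("int64")
sdf3["some_text"] = sdf3["some_text"].fillna("None")
# floats are fine as-is: NaN passes through as np.nan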
Just to add some closure, here is the code I used.
import arcpy
import numpy as np

# convert the dataframe values to a structured array and restore column names
numpy_array = np.array(np.rec.fromrecords(sdf3.values))
names = sdf3.dtypes.index.tolist()
numpy_array.dtype.names = tuple(names)
# join the array back to the feature class on the shared GEOID field;
# append_only=False overwrites existing fields with the same names
arcpy.da.ExtendTable(<feature_class>, 'GEOID', numpy_array, 'GEOID',
                     append_only=False)
David... to give you some ideas on how I handle nulls:
For float/double, np.nan is fine. For string, None. For time, np.datetime64('NaT').
Sadly, no one can agree on nulls for integers (e.g., there is no np.nint).
# Change FC <null> to a useful nodata value
import numpy as np
from arcpy.da import Describe  # the da version of Describe returns a dict


def make_nulls(in_fc, include_oid=True, int_null=-999):
    """Return null values for a list of field objects.

    This excludes objectid and geometry related fields.
    Throw in whatever else you want.

    Parameters
    ----------
    in_fc : featureclass or featureclass table
        Uses arcpy.ListFields to get a list of featureclass/table fields.
    include_oid : boolean
        Whether to prepend the OBJECTID field to the results.
    int_null : integer
        A default to use for integer nulls since there is no ``nan``
        equivalent. Other options include

    >>> np.iinfo(np.int32).min  # -2147483648
    >>> np.iinfo(np.int16).min  # -32768
    >>> np.iinfo(np.int8).min   # -128

    or replace them on the fly while reading with a cursor, ``cur``

    >>> [i for i in cur.__iter__()]
    >>> [[j if j else -999 for j in i] for i in cur.__iter__()]
    """
    nulls = {'Double': np.nan, 'Single': np.nan, 'Float': np.nan,
             'Short': int_null, 'SmallInteger': int_null, 'Long': int_null,
             'Integer': int_null, 'String': str(None), 'Text': str(None),
             'Date': np.datetime64('NaT'), 'Geometry': np.nan}
    #
    desc = Describe(in_fc)
    if desc['dataType'] not in ('FeatureClass', 'Table'):
        print("Only Featureclasses and tables are supported")
        return None, None
    in_flds = desc['fields']
    good = [f for f in in_flds if f.editable and f.type != 'Geometry']
    fld_dict = {f.name: f.type for f in good}
    fld_names = list(fld_dict.keys())
    null_dict = {f: nulls[fld_dict[f]] for f in fld_names}
    # ---- insert the OBJECTID field
    if include_oid and desc['hasOID']:
        oid_name = 'OID@'  # desc['OIDFieldName']
        oi = {oid_name: -999}
        null_dict = dict(list(oi.items()) + list(null_dict.items()))
        fld_names.insert(0, oid_name)
    return null_dict, fld_names
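A hypothetical usage sketch, feeding the returned dictionary to pandas to replace nulls column by column before converting back (the path is a placeholder):

import pandas as pd

null_dict, fld_names = make_nulls(r"C:\data\demo.gdb\parcels")
for fld, nodata in null_dict.items():
    # skip NaN/NaT sentinels; they are already pandas' null representation
    if fld in sdf3.columns and pd.notna(nodata):
        sdf3[fld] = sdf3[fld].fillna(nodata)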
Dan Patterson - The reason I'm doing this via pandas is that the field calculator's code block is pretty unwieldy for complex calculations. But even for simple things, the field calculator can be really slow. I don't have much experience with the update cursor, but my general impression is that it's slow, too. Granted, it's file-based data, but even with indexing, it takes at least 100x as long as I would expect a SQL database to take for the same non-spatial field calculations.
I'm seeing decent performance on exporting to Pandas and processing the data, but ExtendTable can be super slow to get the results back into ArcGIS.
Have you ever compared the speed of the field calculator or an update cursor to a round trip from ArcGIS to numpy/pandas?
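If it helps, a rough harness along these lines (path and field names hypothetical) is what I'd use to compare one leg of it:

import time
import arcpy

fc = r"C:\data\demo.gdb\parcels"  # placeholder

t0 = time.perf_counter()
with arcpy.da.UpdateCursor(fc, ["value_a", "value_b"]) as cur:
    for a, b in cur:
        cur.updateRow([a, a * 2 if a is not None else None])
print("UpdateCursor:", time.perf_counter() - t0)
# ...then time the pandas/ExtendTable round trip the same way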