Add support for Pandas Dataframes to arcpy

02-14-2018 02:12 PM
Status: Under Consideration
martinschaefer1
New Contributor III

pandas is a great asset for any data scientist. Data manipulation using pandas DataFrames is powerful and easy. At the moment, the only way to read feature class data into pandas for manipulation is to load it as a structured NumPy array with arcpy.da.FeatureClassToNumPyArray and then convert that to a DataFrame. That is quite straightforward, but the reverse is more difficult because of the data types DataFrames use. Strings are usually stored as objects, which arcpy.da.NumPyArrayToFeatureClass doesn't support, so each column's dtype has to be checked and converted if necessary.

It'd be great to have an arcpy.da.FeatureClassToDataFrame and an arcpy.da.DataFrameToFeatureClass.
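
For readers who need the workaround today, the dtype check described above can be sketched roughly as follows. This is a minimal illustration, not an Esri-provided method: the helper name `dataframe_to_structured_array` is hypothetical, and it assumes that missing strings can be written as empty strings.

```python
import numpy as np
import pandas as pd


def dataframe_to_structured_array(df):
    """Build a structured NumPy array from a DataFrame, avoiding object dtypes.

    Hypothetical helper: object columns (typically Python strings) are cast
    to fixed-width unicode; other columns keep their NumPy dtype.
    """
    fields = []
    columns = []
    for name in df.columns:
        arr = df[name].to_numpy()
        if arr.dtype == object:
            # Assumption: missing values become empty strings.
            strings = ["" if pd.isna(v) else str(v) for v in arr]
            width = max((len(s) for s in strings), default=1) or 1
            arr = np.array(strings, dtype=f"<U{width}")
        fields.append((str(name), arr.dtype))
        columns.append(arr)

    # Allocate the structured array, then fill it field by field.
    out = np.empty(len(df), dtype=fields)
    for (name, _), arr in zip(fields, columns):
        out[name] = arr
    return out


df = pd.DataFrame({"name": ["a", "bcd"], "value": [1.5, 2.5]})
print(dataframe_to_structured_array(df).dtype)
```

The resulting array no longer contains object dtypes, so it is the kind of input arcpy.da.NumPyArrayToFeatureClass expects, provided the array also carries the coordinate fields that function needs.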

10 Comments
BruceHarold

Hi, I'm no pandas expert (never used it!) but there is a simple example of creating a DataFrame from a cursor here:

Summary Statistics—Help | ArcGIS Desktop 

martinschaefer1

Cheers for the suggestion. However, creating DataFrames is not the issue. Creating a feature class from a DataFrame is, due to the need to go via NumPy arrays.

alex_friant

Have you tried the Spatially Enabled DataFrame? Introduction to the Spatially Enabled DataFrame | ArcGIS for Developers 

  1. Import the necessary modules
  2. Read from a feature class into a DataFrame
  3. Write from the DataFrame to a feature class
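
The three steps above might look roughly like this with the ArcGIS API for Python. It is a sketch under assumptions: the function name `feature_class_round_trip` and both path arguments are hypothetical, and the `arcgis` import is deferred so that it only runs in an environment where that package is installed.

```python
import pandas as pd


def feature_class_round_trip(in_fc, out_fc):
    """Read a feature class into an SEDF, then write it back out."""
    # 1. Importing GeoAccessor registers the `.spatial` accessor on DataFrames.
    from arcgis.features import GeoAccessor  # noqa: F401

    # 2. Read from a feature class into a DataFrame.
    sedf = pd.DataFrame.spatial.from_featureclass(in_fc)

    # ... manipulate `sedf` with ordinary pandas operations here ...

    # 3. Write from the DataFrame to a feature class.
    return sedf.spatial.to_featureclass(location=out_fc)
```

Here `in_fc` and `out_fc` stand for full paths to feature classes, for example inside a file geodatabase.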


simoxu
MVP Regular Contributor

Roughly when will this feature be released? It will make Jupyter notebooks in ArcGIS Pro more useful for data analysis.

HannesZiegler
Status changed to: In Product Plan

Hello all,

Thank you for your interest and feedback! As pointed out by @alex_friant, it is already possible to convert a DataFrame to a feature class and vice versa using the ArcGIS API for Python.

However, we are in the planning stages for the ArcGIS Pro 3.1 release and investigating supporting the use of an ArcGIS API for Python Spatially Enabled Data Frame (SEDF) directly as input in ArcPy. Please keep in mind that this does not guarantee that this feature will make it into the product, only that the possibility is currently being explored.

HannesZiegler
Status changed to: Under Consideration

We realize that ArcGIS Pro 3.1 has come and gone and there has been no update on this. While we missed the 3.1 release and are likely to miss the 3.2 release as well, this is still actively being pursued. To better reflect the current status of this idea, I am reneging on the "In Product Plan" status and setting it back a notch to "Under Consideration".

In the meantime, consider using Arrow Tables as the middleman. Arrow Tables can be read directly into Geoprocessing tools. 

With the sedf.spatial.to_arrow() method you can convert your SEDF to an arrow table that is compatible with Geoprocessing tools. For example, try this:

import os
import arcpy
patable = sedf.spatial.to_arrow()  # geometry encoding is WKB
arcpy.management.CopyFeatures(patable, os.path.join(out_dir, 'new_fc'))

 

JulianaSpector1

@HannesZiegler I'm trying to convert a dataframe to an arrow table and then run a geoprocessing tool on the arrow table (converting the arrow table to a standalone table in the working geodatabase using the Export Table tool). However, I get a fatal ArcGIS Pro error when running the line of code that calls the geoprocessing tool and the entire program shuts down. Have you seen something like this before?

 

 

HannesZiegler

Hi @JulianaSpector1, this seems to be a bug you are running into. If you haven't already, or if you are able to reproduce the crash, please submit an error report and include as much information as possible in the comment there. Specifically, it would be helpful to know the method you used to convert the DataFrame into an Arrow Table. Ideally, a comprehensive and reproducible workflow would help us to accurately understand what is happening.

 

HannesZiegler

@JulianaSpector1 In the upcoming Pro 3.3 release we fixed a crash that sometimes occurred when the Arrow Table has an unsupported schema specification; it may be the same crash you were experiencing. 

HannesZiegler

Hello all,

It's been a while since this idea was posted, and there have been some developments since (as noted in the previous responses). We wanted to summarize the currently available paths for moving data from a FeatureClass to a Pandas DataFrame, and back again to a FeatureClass.

 

1) Use Apache Arrow

As described in the blog Leverage Apache Arrow in ArcGIS Pro (esri.com), you can convert a FeatureClass to an Arrow Table, and from there convert to a Pandas DataFrame. You can also go the other way around, converting from a Pandas DataFrame to an Arrow Table, and from there to a FeatureClass.

Note that if there is a geometry column, the conversion from an Arrow Table back to a FeatureClass requires properly specifying Esri's required schema for Arrow Tables. Here's a flexible function that can help with that step:

 

import pyarrow as pa


def pandas_to_arrow(pdf, geom_fld, geom_encoding, geom_sr):
    """
    Convert a Pandas DataFrame to an Arrow Table with Esri required schema.

    Parameters
    ----------
    pdf : pandas.DataFrame
      The Pandas DataFrame to be converted to an Arrow Table with the Esri required schema applied.
    geom_fld : str
      The name of the geometry column.
    geom_encoding : str
      The geometry encoding of the geometry column. Valid values are "EsriShape", "EsriJSON", "WKB", "WKT", or "GeoJSON".
    geom_sr : str
      The spatial reference (as a WKT CRS string).

    Returns
    -------
    pyarrow.Table
        Arrow Table with Esri required schema applied.
    """

    # Create Arrow Table from PDF
    patable = pa.Table.from_pandas(pdf)

    # Grab the field that contains geometry & define Esri required schema
    fld = patable.field(geom_fld)
    fld = fld.with_metadata({'esri.encoding': geom_encoding, 'esri.sr_wkt': geom_sr})

    # Update the schema of the geometry field
    schema = patable.schema
    schema = schema.set(schema.get_field_index(geom_fld), fld)
    patable = patable.cast(schema)

    return patable

 

Note also: If you're just moving around a table with no geometry, you don't need to worry about the schema.

 

2) Use the ArcGIS API for Python's Spatially Enabled DataFrame (SEDF)

This is the easiest way to move data between Pandas (SEDF) and a FeatureClass.

An SEDF is just a Pandas DataFrame extended with additional spatial capabilities. The blog linked above also shows how to round-trip from a FeatureClass to/from an SEDF. SEDF is part of ArcGIS, so there are easy-to-use built-in methods that take care of everything.

You can also use an SEDF directly as an input to ArcPy geoprocessing tools.

 

While these don't provide direct ArcPy to/from DataFrame methods, we believe these options should cover most needs. Because of these existing options and other technical reasons, we are currently not considering direct arcpy.da.FeatureClassToDataFrame and arcpy.da.DataFrameToFeatureClass methods. However, we will keep this idea open for additional feedback for the time being.

 

Thank you for your time and feedback.