topic Stand-Alone Table to Pandas Data Frame in Python Questions

Stand-Alone Table to Pandas Data Frame

KellyTaylor — Wed, 15 Nov 2023 20:04:26 GMT

Hello,

I'm looking for the most efficient way to convert a stand-alone table to a pandas data frame. This table will ultimately be geocoded and saved as a feature class, but I need to manipulate the data and fields quite a bit before that. I find pandas to the easiest and most efficient way to do the data manipulation.

Currently, I am using arcpy.conversion.TableToTable() to first convert the table to a csv file, and then pandas.read_csv() to convert to a data frame. The table has roughly 63,000 records and it is taking over an hour to do the TableToTable portion of the conversion.

Is there a better way to do this? Pandas is so quick so read the csv and I so wish I could read the table directly into pandas without the intermediate of a csv.

Many Thanks

Re: Stand-Alone Table to Pandas Data Frame

DanPatterson — Wed, 15 Nov 2023 20:27:31 GMT

Try

TableToNumPyArray—ArcGIS Pro | Documentation

TableToArrowTable—ArcGIS Pro | Documentation

to see if they are faster since Pandas has conversions for both

Re: Stand-Alone Table to Pandas Data Frame

LaurencePerry — Wed, 15 Nov 2023 21:44:12 GMT

Here's a function that should do the trick...

Originally sourced from: https://gist.github.com/d-wasserman/e9c98be1d0caebc2935afecf0ba239a0?permalink_comment_id=3623359

def feature_class_to_dataframe(input_fc: str, input_fields: list = None, query: str = ""): """Converts a feature class to a pandas dataframe. If no input fields are specified, all fields will be included. If a query is specified, only those features will be included in the dataframe. This is an excellent function to use when exploring data without having to queue up ArcGIS Pro. Particularly good for using pandas to generate unique field values. Args: input_fc (string): path to the input feature class input_fields (list, optional): List of fields for dataframe. Defaults to None. query (str, optional): Pandas query. Defaults to "". Returns: Pandas Dataframe: Dataframe of feature class """ from arcpy import Describe, ListFields from arcpy.da import SearchCursor from pandas import DataFrame # get list of fields if desired fields specified OIDFieldName = Describe(input_fc).OIDFieldName if input_fields: final_fields = [OIDFieldName] + input_fields # use all fields if no fields specified else: final_fields = [field.name for field in ListFields(input_fc)] # build dataframe row by row using search cursor data = [row for row in SearchCursor( input_fc, final_fields, where_clause=query)] fc_dataframe = DataFrame(data, columns=final_fields) # set index to object id fc_dataframe = fc_dataframe.set_index(OIDFieldName, drop=True) return fc_dataframe

Re: Stand-Alone Table to Pandas Data Frame

DavidSolari — Thu, 16 Nov 2023 01:21:06 GMT

This is a solid script, one minor tweak that can be handy with massive datasets is:

fc_dataframe = DataFrame((row for row in SearchCursor(input_fc, final_fields, query)), columns=final_fields)

This avoids creating a list for the data before it hits the DataFrame, saving a decent chunk of memory. Might even be faster in some cases.

Re: Stand-Alone Table to Pandas Data Frame

LaurencePerry — Thu, 16 Nov 2023 19:32:33 GMT

For my own knowledge mostly - I think that should still be the same in terms of memory expense no? It's going to run that tuple comprehension and build a DataFrame from it before the tuple is removed from memory I believe...

I think if you really wanted it to take as little memory as possible you might have to read a row in the cursor, convert that row to a temp dataframe, then concat that row into a main dataframe, and then delete the temp dataframe. Might be real slow...