<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Stand-Alone Table to Pandas Data Frame in Python Questions</title>
    <link>https://community.esri.com/t5/python-questions/stand-alone-table-to-pandas-data-frame/m-p/1349915#M69240</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I'm looking for the most efficient way to convert a stand-alone table to a pandas data frame. This table will ultimately be geocoded and saved as a feature class, but I need to manipulate the data and fields quite a bit before that. I find pandas to be the easiest and most efficient way to do the data manipulation.&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Currently, I am using&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;arcpy.conversion.TableToTable(&lt;/SPAN&gt;&lt;SPAN&gt;) to first convert the table to a CSV file, and then pandas.read_csv() to convert it to a data frame. The table has roughly 63,000 records, and the TableToTable portion of the conversion is taking over an hour.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Is there a better way to do this? Pandas is so quick to read the CSV, and I wish I could read the table directly into pandas without the intermediate CSV step.&lt;/P&gt;&lt;P&gt;Many Thanks&lt;/P&gt;</description>
    <pubDate>Wed, 15 Nov 2023 20:04:26 GMT</pubDate>
    <dc:creator>KellyTaylor</dc:creator>
    <dc:date>2023-11-15T20:04:26Z</dc:date>
    <item>
      <title>Stand-Alone Table to Pandas Data Frame</title>
      <link>https://community.esri.com/t5/python-questions/stand-alone-table-to-pandas-data-frame/m-p/1349915#M69240</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I'm looking for the most efficient way to convert a stand-alone table to a pandas data frame. This table will ultimately be geocoded and saved as a feature class, but I need to manipulate the data and fields quite a bit before that. I find pandas to be the easiest and most efficient way to do the data manipulation.&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Currently, I am using&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN&gt;arcpy.conversion.TableToTable(&lt;/SPAN&gt;&lt;SPAN&gt;) to first convert the table to a CSV file, and then pandas.read_csv() to convert it to a data frame. The table has roughly 63,000 records, and the TableToTable portion of the conversion is taking over an hour.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;Is there a better way to do this? Pandas is so quick to read the CSV, and I wish I could read the table directly into pandas without the intermediate CSV step.&lt;/P&gt;&lt;P&gt;Many Thanks&lt;/P&gt;</description>
      <pubDate>Wed, 15 Nov 2023 20:04:26 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/stand-alone-table-to-pandas-data-frame/m-p/1349915#M69240</guid>
      <dc:creator>KellyTaylor</dc:creator>
      <dc:date>2023-11-15T20:04:26Z</dc:date>
    </item>
    <item>
      <title>Re: Stand-Alone Table to Pandas Data Frame</title>
      <link>https://community.esri.com/t5/python-questions/stand-alone-table-to-pandas-data-frame/m-p/1349943#M69241</link>
      <description>&lt;P&gt;Try&lt;/P&gt;&lt;P&gt;&lt;A href="https://pro.arcgis.com/en/pro-app/latest/arcpy/data-access/tabletonumpyarray.htm" target="_blank"&gt;TableToNumPyArray—ArcGIS Pro | Documentation&lt;/A&gt;&lt;/P&gt;&lt;P&gt;or&lt;/P&gt;&lt;P&gt;&lt;A href="https://pro.arcgis.com/en/pro-app/latest/arcpy/data-access/tabletoarrowtable.htm" target="_blank"&gt;TableToArrowTable—ArcGIS Pro | Documentation&lt;/A&gt;&lt;/P&gt;&lt;P&gt;to see if they are faster since Pandas has conversions for both&lt;/P&gt;</description>
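      <!--
A minimal sketch of the TableToNumPyArray route suggested in the reply above, assuming an ArcGIS Pro Python environment where arcpy and pandas are importable. The helper names pick_fields and table_to_dataframe are hypothetical and not part of arcpy. Field types that TableToNumPyArray cannot convert would have to be left out through the wanted list, and whether this is actually faster than the CSV round trip would need to be timed on the real table.

```python
def pick_fields(all_fields, wanted=None):
    """Return the fields to read: every field, or only the wanted ones that exist."""
    if not wanted:
        return list(all_fields)
    available = set(all_fields)
    return [name for name in wanted if name in available]


def table_to_dataframe(table_path, wanted=None, where=None, null_value=None):
    """Hypothetical helper: read a stand-alone table straight into pandas."""
    # Imported lazily so this module still loads outside an ArcGIS Pro environment.
    import arcpy
    import pandas as pd

    all_fields = [field.name for field in arcpy.ListFields(table_path)]
    fields = pick_fields(all_fields, wanted)

    # Only pass the optional arguments that were actually supplied.
    options = {}
    if where:
        options["where_clause"] = where
    if null_value is not None:
        # Integer fields that contain nulls need a substitute value,
        # otherwise the conversion raises an error.
        options["null_value"] = null_value

    array = arcpy.da.TableToNumPyArray(table_path, fields, **options)
    # pandas builds one column per field of the structured array.
    return pd.DataFrame(array)
```

With a hypothetical table path, usage would look like: df = table_to_dataframe(r"C:\data\example.gdb\addresses", null_value=-9999)
      -->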
      <pubDate>Wed, 15 Nov 2023 20:27:31 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/stand-alone-table-to-pandas-data-frame/m-p/1349943#M69241</guid>
      <dc:creator>DanPatterson</dc:creator>
      <dc:date>2023-11-15T20:27:31Z</dc:date>
    </item>
    <item>
      <title>Re: Stand-Alone Table to Pandas Data Frame</title>
      <link>https://community.esri.com/t5/python-questions/stand-alone-table-to-pandas-data-frame/m-p/1349973#M69243</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here's a function that should do the trick...&amp;nbsp;&lt;/P&gt;&lt;P&gt;Originally sourced from:&amp;nbsp;&lt;A href="https://gist.github.com/d-wasserman/e9c98be1d0caebc2935afecf0ba239a0?permalink_comment_id=3623359" target="_blank"&gt;https://gist.github.com/d-wasserman/e9c98be1d0caebc2935afecf0ba239a0?permalink_comment_id=3623359&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;def feature_class_to_dataframe(input_fc: str, input_fields: list = None, query: str = ""):
    """Converts a feature class to a pandas dataframe. If 
    no input fields are specified, all fields
    will be included. If a query is specified, only those
    features will be included in the dataframe.

    This is an excellent function to use when exploring data
    without having to queue up ArcGIS Pro. Particularly good
    for using pandas to generate unique field values.

    Args:
        input_fc (string): path to the input feature class
        input_fields (list, optional): List of fields for dataframe. 
            Defaults to None.
        query (str, optional): SQL where clause passed to the search cursor. Defaults to "".

    Returns:
        Pandas Dataframe: Dataframe of feature class
    """

    from arcpy import Describe, ListFields
    from arcpy.da import SearchCursor
    from pandas import DataFrame

    # get list of fields if desired fields specified
    OIDFieldName = Describe(input_fc).OIDFieldName
    if input_fields:
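        # put the object id first, skipping it if the caller already listed it
        final_fields = [OIDFieldName] + [f for f in input_fields if f != OIDFieldName]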

    # use all fields if no fields specified
    else:
        final_fields = [field.name for field in ListFields(input_fc)]

    # read all rows with a search cursor; the with-block releases the cursor when done
    with SearchCursor(input_fc, final_fields, where_clause=query) as cursor:
        data = [row for row in cursor]
    fc_dataframe = DataFrame(data, columns=final_fields)

    # set index to object id
    fc_dataframe = fc_dataframe.set_index(OIDFieldName, drop=True)
    
    return fc_dataframe&lt;/LI-CODE&gt;</description>
      <pubDate>Wed, 15 Nov 2023 21:44:12 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/stand-alone-table-to-pandas-data-frame/m-p/1349973#M69243</guid>
      <dc:creator>LaurencePerry</dc:creator>
      <dc:date>2023-11-15T21:44:12Z</dc:date>
    </item>
    <item>
      <title>Re: Stand-Alone Table to Pandas Data Frame</title>
      <link>https://community.esri.com/t5/python-questions/stand-alone-table-to-pandas-data-frame/m-p/1350100#M69250</link>
      <description>&lt;P&gt;This is a solid script. One minor tweak that can be handy with massive datasets is:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;fc_dataframe = DataFrame((row for row in SearchCursor(input_fc, final_fields, query)), columns=final_fields)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This avoids creating a list for the data before it hits the DataFrame, which can save a decent chunk of memory. It might even be faster in some cases.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Nov 2023 01:21:06 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/stand-alone-table-to-pandas-data-frame/m-p/1350100#M69250</guid>
      <dc:creator>DavidSolari</dc:creator>
      <dc:date>2023-11-16T01:21:06Z</dc:date>
    </item>
    <item>
      <title>Re: Stand-Alone Table to Pandas Data Frame</title>
      <link>https://community.esri.com/t5/python-questions/stand-alone-table-to-pandas-data-frame/m-p/1350426#M69252</link>
      <description>&lt;P&gt;Mostly for my own knowledge: I think that should still be the same in terms of memory expense, no? It's going to run that generator expression and build a DataFrame from it before the intermediate data is removed from memory, I believe...&lt;/P&gt;&lt;P&gt;I think if you really wanted it to take as little memory as possible, you might have to read a row from the cursor, convert that row to a temp dataframe, concat that row into a main dataframe, and then delete the temp dataframe. That might be really slow, though...&lt;/P&gt;</description>
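      <!--
A rough sketch of a middle ground between the two ideas in the reply above: instead of one temporary dataframe per row, rows are read from the cursor in chunks and each chunk becomes a small DataFrame, so the plain Python row tuples are held for only one chunk at a time. The helper names iter_chunks and cursor_to_dataframe and the default chunk size are assumptions made for illustration. The final concat still needs room for the combined frame, so any memory saving would have to be measured.

```python
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield lists of up to chunk_size rows from any iterable of rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def cursor_to_dataframe(table_path, fields, chunk_size=10_000):
    """Hypothetical helper: build the DataFrame from per-chunk frames."""
    # Imported lazily so iter_chunks can be used without arcpy or pandas.
    from arcpy.da import SearchCursor
    import pandas as pd

    with SearchCursor(table_path, fields) as cursor:
        frames = [pd.DataFrame(chunk, columns=fields)
                  for chunk in iter_chunks(cursor, chunk_size)]

    # pandas.concat rejects an empty list, so handle an empty table first.
    if not frames:
        return pd.DataFrame(columns=fields)
    return pd.concat(frames, ignore_index=True)
```

One caveat with this design: a column that has nulls in only some chunks can end up with a different dtype per chunk, which pandas reconciles at concat time.
      -->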
      <pubDate>Thu, 16 Nov 2023 19:32:33 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/stand-alone-table-to-pandas-data-frame/m-p/1350426#M69252</guid>
      <dc:creator>LaurencePerry</dc:creator>
      <dc:date>2023-11-16T19:32:33Z</dc:date>
    </item>
  </channel>
</rss>

