<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to use parquet and SEDF in Python Questions</title>
    <link>https://community.esri.com/t5/python-questions/how-to-use-feather-parquet-arrow-and-sedf/m-p/1569675#M73376</link>
    <description>&lt;P&gt;Got it.&lt;/P&gt;&lt;P&gt;As far as I understand, the entire conversion is based on pyarrow.&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.esri.com/arcgis-blog/products/arcgis-pro/developers/leverage-apache-arrow-in-arcgis-pro/" target="_blank" rel="noopener"&gt;https://www.esri.com/arcgis-blog/products/arcgis-pro/developers/leverage-apache-arrow-in-arcgis-pro/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;The article above was a great help in understanding this.&lt;/P&gt;&lt;P&gt;So the code would be something like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import pandas as pd
from arcgis.features import GeoAccessor, GeoSeriesAccessor
import pyarrow.parquet as pq

arrow_table = sdf.spatial.to_arrow()
pq.write_table(arrow_table, path_dir + 'test.parquet')
retrieved_arrow_table = pq.read_table(path_dir + 'test.parquet')
sdf_pq = pd.DataFrame.spatial.from_parquet(path_dir + 'test.parquet')&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have not yet figured out how to use DASK in this combination. It seems to me that DASK has no geometry data type (yet).&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Edit:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Constructing the sdf from parquet &amp;amp; arrow is quite slow: it takes my machine about 5 min 30 sec.&lt;/P&gt;</description>
    <pubDate>Thu, 19 Dec 2024 12:28:02 GMT</pubDate>
    <dc:creator>Mer-lin</dc:creator>
    <dc:date>2024-12-19T12:28:02Z</dc:date>
    <item>
      <title>How to use feather/parquet/arrow and SEDF</title>
      <link>https://community.esri.com/t5/python-questions/how-to-use-feather-parquet-arrow-and-sedf/m-p/1569336#M73364</link>
      <description>&lt;P&gt;Hi there,&lt;/P&gt;&lt;P&gt;I use an &lt;A href="https://developers.arcgis.com/python/latest/guide/part1-introduction-to-sedf/" target="_self"&gt;SEDF&lt;/A&gt; in ArcGIS Pro 3.1.0 on a regular basis. So far I have been rebuilding it every day from an Excel file, which is slow. Now I would like to save the data as a .parquet file, so that I can quickly write the SEDF to disk and load it again whenever I need it.&lt;/P&gt;&lt;P&gt;I tried two approaches.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;1. With DASK&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I used &lt;EM&gt;ddf = from_pandas(sdf, npartitions=1)&lt;/EM&gt;&amp;nbsp;and &lt;EM&gt;ddf.to_parquet(path_dir + 'test.parquet')&lt;BR /&gt;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;This throws an error:&lt;/P&gt;&lt;PRE&gt;&lt;SPAN class=""&gt;ValueError&lt;/SPAN&gt;: Failed to convert partition to expected pyarrow schema:
    `ArrowInvalid("Could not convert bytearray(b'\\x01\\x01\\x00\\x00\\x00\\x00\\x00\\x00 \\xf2\\xbd!A\\x00\\x00\\x00@^\\x9cVA') with type bytearray: converting to null type", 'Conversion failed for column SHAPE with type geometry')`&lt;/PRE&gt;&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;Any idea how I could get past this error?&lt;BR /&gt;I found the GeoParquet file type, but no way to use DASK with it.&lt;BR /&gt;Apparently geopandas has some compatibility, but at the moment I can't install it as a package ...&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;2. With the ArcGIS API&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;As described &lt;A href="https://developers.arcgis.com/python/latest/api-reference/arcgis.features.toc.html#arcgis.features.GeoAccessor.to_parquet" target="_self"&gt;here&lt;/A&gt;, you can apparently write parquet files containing geometry with the ArcGIS API, but I couldn't read them back with the &lt;EM&gt;sdf = pd.DataFrame.spatial.from_parquet('test.parquet')&amp;nbsp;&lt;/EM&gt;&lt;A href="https://developers.arcgis.com/python/latest/api-reference/arcgis.features.toc.html#arcgis.features.GeoAccessor.from_parquet" target="_self"&gt;function&lt;/A&gt;, which tells me:&lt;/P&gt;&lt;PRE&gt;&lt;SPAN class=""&gt;ValueError&lt;/SPAN&gt;: Missing geo metadata in Parquet/Feather file.
            Use pandas.read_parquet/read_feather() instead.&lt;/PRE&gt;&lt;P&gt;The doc already tells me:&lt;BR /&gt;if no geometry columns are read, this will raise a &lt;SPAN class=""&gt;ValueError&lt;/SPAN&gt; - you should use the pandas read_parquet method instead.&lt;/P&gt;&lt;P&gt;From my perspective this doesn't make a lot of sense, since a geometry column is present ...&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Edit1:&lt;/STRONG&gt;&lt;BR /&gt;Changed the title to specify parquet.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Edit2:&lt;/STRONG&gt;&lt;BR /&gt;Changed the title to specify feather/parquet/arrow.&lt;/P&gt;&lt;P&gt;--------&lt;/P&gt;&lt;P&gt;In case this is useful, here is the output of sdf.info():&lt;/P&gt;&lt;PRE&gt;&amp;lt;class 'pandas.core.frame.DataFrame'&amp;gt;
Int64Index: 293321 entries, 0 to 293324
Data columns (total 23 columns):
 #   Column           Non-Null Count   Dtype         
---  ------           --------------   -----         
 0   Date_Time_Alarm  293321 non-null  datetime64[ns]
 1   HE_Event_Num     292795 non-null  string        
 2   Call_Sign        293321 non-null  string        
 3   Station          293309 non-null  string        
 4   E_Event_Num      293321 non-null  string        
 5   Event_type       293321 non-null  string        
 6   Destination      274531 non-null  string        
 7   Orga_Name        293321 non-null  string        
 8   Orga_Group       293321 non-null  string        
 9   Event_min        293278 non-null  float32       
 10  Transit_min      267680 non-null  float32       
 11  Deployment_sec   291714 non-null  float32       
 12  X                293321 non-null  float32       
 13  Y                293321 non-null  float32       
 14  Street           293219 non-null  string        
 15  Housenumber      266766 non-null  string        
 16  Address          275289 non-null  string        
 17  DAT_TD_ND_CT     293321 non-null  string        
 18  DAT_Weekday      293321 non-null  string        
 19  Hour_CT_EVENT    293321 non-null  int16         
 20  Month_CT_EVENT   293321 non-null  int16         
 21  Year_CT_EVENT    293321 non-null  int16         
 22  SHAPE            293321 non-null  geometry      
dtypes: datetime64[ns](1), float32(5), geometry(1), int16(3), string(13)
memory usage: 51.1 MB&lt;/PRE&gt;</description>
      <pubDate>Thu, 19 Dec 2024 12:43:57 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/how-to-use-feather-parquet-arrow-and-sedf/m-p/1569336#M73364</guid>
      <dc:creator>Mer-lin</dc:creator>
      <dc:date>2024-12-19T12:43:57Z</dc:date>
    </item>
    <item>
      <title>Re: How to store an SEDF on disk</title>
      <link>https://community.esri.com/t5/python-questions/how-to-use-feather-parquet-arrow-and-sedf/m-p/1569400#M73367</link>
      <description>&lt;P&gt;Some options:&lt;/P&gt;&lt;P&gt;&lt;A href="https://developers.arcgis.com/python/latest/guide/part3-data-io-writing-data/" target="_blank"&gt;Part-3 Data IO with SeDF - Exporting Data | ArcGIS API for Python&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 17 Dec 2024 16:55:27 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/how-to-use-feather-parquet-arrow-and-sedf/m-p/1569400#M73367</guid>
      <dc:creator>DanPatterson</dc:creator>
      <dc:date>2024-12-17T16:55:27Z</dc:date>
    </item>
    <item>
      <title>Re: How to store an SEDF on disk</title>
      <link>https://community.esri.com/t5/python-questions/how-to-use-feather-parquet-arrow-and-sedf/m-p/1569659#M73375</link>
      <description>&lt;P&gt;Cheers. These are options, but unfortunately they are very slow compared with a working DASK + parquet combination.&lt;/P&gt;</description>
      <pubDate>Wed, 18 Dec 2024 08:20:08 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/how-to-use-feather-parquet-arrow-and-sedf/m-p/1569659#M73375</guid>
      <dc:creator>Mer-lin</dc:creator>
      <dc:date>2024-12-18T08:20:08Z</dc:date>
    </item>
    <item>
      <title>Re: How to use parquet and SEDF</title>
      <link>https://community.esri.com/t5/python-questions/how-to-use-feather-parquet-arrow-and-sedf/m-p/1569675#M73376</link>
      <description>&lt;P&gt;Got it.&lt;/P&gt;&lt;P&gt;As far as I understand, the entire conversion is based on pyarrow.&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.esri.com/arcgis-blog/products/arcgis-pro/developers/leverage-apache-arrow-in-arcgis-pro/" target="_blank" rel="noopener"&gt;https://www.esri.com/arcgis-blog/products/arcgis-pro/developers/leverage-apache-arrow-in-arcgis-pro/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;The article above was a great help in understanding this.&lt;/P&gt;&lt;P&gt;So the code would be something like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import pandas as pd
from arcgis.features import GeoAccessor, GeoSeriesAccessor
import pyarrow.parquet as pq

arrow_table = sdf.spatial.to_arrow()
pq.write_table(arrow_table, path_dir + 'test.parquet')
retrieved_arrow_table = pq.read_table(path_dir + 'test.parquet')
sdf_pq = pd.DataFrame.spatial.from_parquet(path_dir + 'test.parquet')&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have not yet figured out how to use DASK in this combination. It seems to me that DASK has no geometry data type (yet).&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Edit:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Constructing the sdf from parquet &amp;amp; arrow is quite slow: it takes my machine about 5 min 30 sec.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Dec 2024 12:28:02 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/how-to-use-feather-parquet-arrow-and-sedf/m-p/1569675#M73376</guid>
      <dc:creator>Mer-lin</dc:creator>
      <dc:date>2024-12-19T12:28:02Z</dc:date>
    </item>
    <item>
      <title>Re: How to use parquet and SEDF</title>
      <link>https://community.esri.com/t5/python-questions/how-to-use-feather-parquet-arrow-and-sedf/m-p/1569687#M73377</link>
      <description>&lt;P&gt;&lt;A href="https://dask-geopandas.readthedocs.io/en/stable/" target="_blank"&gt;dask-geopandas documentation — dask-geopandas&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://dask-geopandas.readthedocs.io/en/stable/docs/reference/api/dask_geopandas.GeoDataFrame.html" target="_blank"&gt;dask_geopandas.GeoDataFrame — dask-geopandas&lt;/A&gt;&lt;/P&gt;&lt;P&gt;geopandas seems to be the link for geometry needs&lt;/P&gt;</description>
      <pubDate>Wed, 18 Dec 2024 11:00:48 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/how-to-use-feather-parquet-arrow-and-sedf/m-p/1569687#M73377</guid>
      <dc:creator>DanPatterson</dc:creator>
      <dc:date>2024-12-18T11:00:48Z</dc:date>
    </item>
    <item>
      <title>Re: How to use feather/parquet/arrow and SEDF</title>
      <link>https://community.esri.com/t5/python-questions/how-to-use-feather-parquet-arrow-and-sedf/m-p/1570133#M73395</link>
      <description>&lt;P&gt;The fastest solution I have found so far is feather. The format is not as highly compressed as parquet, but using&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import pandas as pd
from arcgis.features import GeoAccessor, GeoSeriesAccessor  # registers the .spatial accessor
import pyarrow.feather as feather
feather.write_feather(sdf, path_dir +'test.feather')
sdf_f = pd.DataFrame.spatial.from_feather(path_dir +'test.feather', spatial_column='SHAPE', columns=None, use_threads=True)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I could write and read my sdf (sdf.shape is (293321, 23)) in about 45 sec, as feather allows reading to be parallelized across multiple threads.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;To use feather:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Feather is built on pyarrow as well, so your data needs to be structured accordingly.&lt;/P&gt;&lt;P&gt;[note edit below]&lt;/P&gt;&lt;P&gt;The biggest issue I had was that a pandas DataFrame accepts the "object" dtype, meaning that a single column can hold mixed values. None of the formats (parquet, feather, arrow) will accept these. Therefore you need to clean the data and eliminate the "object" dtype. If you deal with NULL values, use pd.NA for string columns and np.nan for numerical ones.&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.esri.com/arcgis-blog/products/arcgis-pro/developers/leverage-apache-arrow-in-arcgis-pro/" target="_self"&gt;Introduction to Apache Arrow&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://pro.arcgis.com/de/pro-app/latest/arcpy/get-started/working-with-arrow-in-arcgis.htm" target="_self"&gt;More about the data types&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import pandas as pd
from arcgis.features import GeoAccessor, GeoSeriesAccessor
import pyarrow.parquet as pq
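# --- Illustrative sketch, not part of the original post ---
# Assumption: "object" columns holding mixed values are what Arrow rejects,
# so every object column outside `skip` is cast to the nullable pandas
# "string" dtype (missing values become pd.NA, other values are stringified).
# normalize_object_columns is a hypothetical helper name; one might run it
# on the sdf before the to_arrow() call further down.
import pandas as pd


def normalize_object_columns(df, skip=("SHAPE",)):
    """Return a copy of df whose object columns (except skip) are strings."""
    out = df.copy()
    for col in out.columns:
        if col in skip:
            continue
        if pd.api.types.is_object_dtype(out[col]):
            out[col] = out[col].astype("string")
    return out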

arrow_table = sdf.spatial.to_arrow()
pq.write_table(arrow_table, path_dir + 'test.parquet')
retrieved_arrow_table = pq.read_table(path_dir + 'test.parquet')
sdf_pq = pd.DataFrame.spatial.from_parquet(path_dir + 'test.parquet')&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Edit&lt;/STRONG&gt;:&lt;/P&gt;&lt;P&gt;Apparently one can use the object data type; that in itself appears to be no problem. I guess the data it holds may pose issues ... just a guess really.&lt;/P&gt;&lt;P&gt;Furthermore, from_feather only returns a pandas DataFrame. To reconstruct an SEDF one needs to use&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;sdf = pd.DataFrame.spatial.from_df(df, sr=25832, geometry_column='SHAPE')&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2024-12-23 094550.png" style="width: 400px;"&gt;&lt;img src="https://community.esri.com/t5/image/serverpage/image-id/122349i72263893FAFB8300/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Screenshot 2024-12-23 094550.png" alt="Screenshot 2024-12-23 094550.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt; &lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Edit 2:&lt;/STRONG&gt;&lt;BR /&gt;&lt;BR /&gt;If you are wondering how to deal with projection and re-projection, see &lt;A href="https://community.esri.com/t5/python-questions/re-project-geometry-after-spatial-from-feather/m-p/1578423/highlight/true#M73627" target="_blank" rel="noopener"&gt;this post&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Aug 2025 08:25:57 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/how-to-use-feather-parquet-arrow-and-sedf/m-p/1570133#M73395</guid>
      <dc:creator>Mer-lin</dc:creator>
      <dc:date>2025-08-07T08:25:57Z</dc:date>
    </item>
  </channel>
</rss>

