
How to save a File Geodatabase using Databricks?

11-01-2023 09:02 AM
DiegoMeira
New Contributor III

Hello,


Would it be possible to save a geodataframe as a File Geodatabase through Databricks?

My Databricks environment is connected to an Azure Blob Storage. ArcGIS Python API and GeoAnalytics Engine are installed on Databricks cluster. 

The following link clearly states that: "GeoAnalytics Engine doesn't support saving data into file geodatabases."

https://developers.arcgis.com/geoanalytics/data/data-sources/filegdb/

I know that other file formats, such as GeoJSON and Shapefile, are available through Spark/pandas, but my question is about File Geodatabases in particular.

Thank you,

 

1 Solution

Accepted Solutions
DerekGourley
Esri Contributor

Hi,

 

Thanks for the question. At this time, it's not possible to save a DataFrame as a File Geodatabase from Databricks using GeoAnalytics Engine.

 

Outside of a Databricks environment, you could convert your DataFrame to a Spatially Enabled DataFrame: https://developers.arcgis.com/geoanalytics/api-reference/geoanalytics.extensions.html#to-pandas-sdf and then use the ArcGIS API for Python, geopandas, GDAL, or another solution to save it to a File Geodatabase, but this will not perform well with larger datasets.
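To illustrate that small-data workaround, here is a minimal sketch. The `sdf` variable stands for the Spatially Enabled DataFrame produced by `to_pandas_sdf()` in the linked docs; the path and layer name are hypothetical, and writing into a `.gdb` via `to_featureclass` generally requires an `arcpy` install (ArcGIS Pro/Server):

```python
# Sketch of the small-data workaround: Spatially Enabled DataFrame -> File Geodatabase.
# `sdf` is assumed to come from GeoAnalytics Engine's to_pandas_sdf() (see link above).
# The path and layer name are placeholders for your own environment.

def save_sdf_to_filegdb(sdf, gdb_path: str, layer_name: str) -> str:
    """Write a Spatially Enabled DataFrame into a feature class of a File Geodatabase."""
    if not gdb_path.endswith(".gdb"):
        # File Geodatabases are directories conventionally named with a .gdb suffix
        raise ValueError(f"expected a .gdb path, got: {gdb_path}")
    # sdf.spatial is the ArcGIS API for Python GeoAccessor on a pandas DataFrame;
    # to_featureclass needs arcpy available to write into a .gdb
    return sdf.spatial.to_featureclass(location=f"{gdb_path}/{layer_name}")
```

Keep in mind this collects the whole DataFrame into memory on one machine, which is why it is only suitable for smaller datasets.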

 

In general, we recommend saving a DataFrame using one of the other data source formats such as Parquet, GeoParquet, ORC, or CSV. This is especially true when saving larger datasets.
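As a sketch of that recommended route: the `df` variable and the output path below are placeholders for your own workspace, and `geoparquet` is the format string GeoAnalytics Engine documents for its data sources (the Engine registers it when installed on the cluster):

```python
# Sketch: save a Spark DataFrame in one of the recommended formats.
# `df` and the output path are placeholders for your own workspace.

SUPPORTED_FORMATS = {"parquet", "geoparquet", "orc", "csv"}

def save_dataframe(df, path: str, fmt: str = "geoparquet") -> None:
    """Write `df` to `path` using a Spark-native (and scalable) format."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    # Standard Spark DataFrameWriter call; each executor writes its own
    # partition, so this scales to big datasets unlike the .gdb workaround.
    df.write.format(fmt).mode("overwrite").save(path)
```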


Thanks,
Derek Gourley
GeoAnalytics Product Engineer

3 Replies
DiegoMeira
New Contributor III

Thank you for the suggestions.

Our team will explore some of the alternative formats you've recommended. The primary concern is that, in the future, we intend to publish these files as layers in our ArcGIS Enterprise portal. The portal supports only some formats, and others, like CSV, have size limitations.

Trying to publish big data directly using the ArcGIS API for Python is also giving us trouble, often crashing ArcGIS Server.

We are considering a few other approaches, but thank you for your input.

DerekGourley
Esri Contributor

Hi,

Thanks for sharing more information about your workflow.

The good news is that GeoAnalytics Engine supports writing a DataFrame directly to an ArcGIS Enterprise Portal as a feature service: https://developers.arcgis.com/geoanalytics/tutorials/data/write-to-feature-services/

Our documentation also gives a general overview of using GeoAnalytics Engine to work with feature services: https://developers.arcgis.com/geoanalytics/data/data-sources/feature-service/

We also have documentation that covers reading data from a feature service into a GeoAnalytics Engine DataFrame: https://developers.arcgis.com/geoanalytics/tutorials/data/read-from-feature-services/
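Pulling those links together, the round trip might look roughly like this sketch. The `feature-service` format string follows the linked tutorials, but the portal URL, credentials, and option names here are assumptions; check the docs for your Engine version:

```python
# Rough sketch of writing a DataFrame to an ArcGIS Enterprise portal as a
# feature service with GeoAnalytics Engine. The option names, service name,
# and GIS registration details are placeholders/assumptions (see tutorials above).

def publish_to_portal(df, service_name: str, gis_name: str = "myGIS") -> None:
    """Write a DataFrame to an ArcGIS Enterprise portal as a hosted feature service."""
    if not service_name:
        raise ValueError("service_name must be non-empty")
    # 'feature-service' is the data source GeoAnalytics Engine uses for reading
    # and writing hosted feature layers; `gis_name` refers to a portal connection
    # previously registered with the Engine (e.g. via geoanalytics.register_gis).
    (df.write.format("feature-service")
        .option("gis", gis_name)             # assumed option name
        .option("serviceName", service_name)  # assumed option name
        .save())
```

This keeps the whole pipeline distributed, avoiding the intermediate file formats and the server crashes mentioned above when publishing big data through the ArcGIS API for Python.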


Thanks,
Derek Gourley
GeoAnalytics Product Engineer