Add Support for Apache Sedona as a Connector in ArcGIS Data Pipelines

a month ago
Status: Open
VenkataKondepati
Occasional Contributor

Idea:
ArcGIS Data Pipelines currently supports connections to sources like Snowflake, BigQuery, and cloud storage platforms. However, many organizations (including those working with large-scale spatial data in energy, utilities, and environmental sectors) rely on Apache Sedona for distributed geospatial processing on Apache Spark clusters.

It would be very valuable if ArcGIS Data Pipelines could directly connect to Apache Sedona outputs or integrate with Spark environments that use Sedona. This would allow users to:

  • Seamlessly bring massive spatial datasets processed with Sedona into ArcGIS for visualization and analysis.

  • Avoid intermediate export/import steps (GeoJSON, CSV, shapefiles) that add complexity and cost.

  • Build end-to-end workflows in which Sedona handles large-scale spatial joins, indexing, and processing, while ArcGIS Data Pipelines and ArcGIS Online handle publishing, dashboards, and sharing.

  • Work in hybrid cloud and enterprise environments where Spark and Sedona are already widely used for big data analytics.
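
To make the "intermediate export/import" point concrete, here is a rough sketch of the kind of glue code that ends up sitting between Sedona and ArcGIS today. It is only an illustration, not anyone's production code: the rows, the `asset_id` and `wkt` field names, and both helper functions are hypothetical, and it assumes a small Sedona result has already been collected with point geometries serialized as WKT. Real jobs deal with many geometry types and far more rows, which is where this approach gets costly.

```python
import json
import re

# Hypothetical rows, as they might look after collecting a small Sedona result
# whose geometry column was serialized to WKT text.
rows = [
    {"asset_id": "A1", "wkt": "POINT (-97.74 30.27)"},
    {"asset_id": "B2", "wkt": "POINT (-95.37 29.76)"},
]

# Matches a plain 2D WKT point such as "POINT (-97.74 30.27)".
_POINT = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def wkt_point_to_geojson(wkt):
    """Convert a 2D WKT point string to a GeoJSON Point geometry dict."""
    match = _POINT.match(wkt)
    if match is None:
        raise ValueError(f"Unsupported or malformed WKT point: {wkt!r}")
    x, y = float(match.group(1)), float(match.group(2))
    return {"type": "Point", "coordinates": [x, y]}


def rows_to_feature_collection(rows, geom_key="wkt"):
    """Build a GeoJSON FeatureCollection from rows carrying a WKT point column."""
    features = []
    for row in rows:
        # Everything except the geometry column becomes a feature property.
        properties = {k: v for k, v in row.items() if k != geom_key}
        features.append(
            {
                "type": "Feature",
                "geometry": wkt_point_to_geojson(row[geom_key]),
                "properties": properties,
            }
        )
    return {"type": "FeatureCollection", "features": features}


# This text is what would then be written to a file and uploaded by hand.
geojson_text = json.dumps(rows_to_feature_collection(rows))
```

A direct connector would make this whole conversion and upload step unnecessary.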

Proposed Features:

  • Native connector for Apache Sedona datasets (via Spark SQL, Parquet, or Delta Lake integration).

  • Ability to register a Spark/Sedona endpoint as a Data Pipeline input.

  • Support for both batch ingestion and near-real-time streaming pipelines.

Benefits:
This would bridge the gap between distributed geospatial processing (Sedona) and enterprise GIS visualization and sharing (ArcGIS), making ArcGIS Data Pipelines a much more practical option for organizations dealing with terabytes of spatial data.