Idea:
ArcGIS Data Pipelines currently supports inputs from Snowflake, BigQuery, Databricks, cloud storage (S3, Azure), and ArcGIS layers. However, many organizations are modernizing their data lakes with Apache Iceberg as the open table format of choice. Iceberg provides schema evolution, time travel, partition pruning, and ACID transactions at scale, and it is increasingly adopted across cloud platforms (AWS, Azure, GCP) and engines (Spark, Flink, Trino, Snowflake).
It would be highly valuable if ArcGIS Data Pipelines could natively connect to Apache Iceberg tables to ingest geospatial and tabular data directly into ArcGIS for visualization, analysis, and dashboards.
Proposed Features:
Native connector for Iceberg tables stored in cloud object storage (S3, ADLS, GCS).
Ability to query Iceberg tables via SQL engines (Spark, Trino, Flink) and expose them as inputs to pipelines.
Support for incremental loads (append-only ingestion, or change detection based on Iceberg snapshot IDs and time travel).
Schema mapping options to automatically align Iceberg fields with ArcGIS field types.
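The schema mapping feature above could default to a type table like the following sketch, which maps Iceberg primitive type names to ArcGIS field types (the esriFieldType names come from the ArcGIS REST API; the specific default choices here are assumptions, not Esri's published behavior).

```python
# Possible default mapping from Iceberg primitive types to ArcGIS field
# types. The esriFieldType names are real ArcGIS REST API values, but
# which Iceberg type maps to which is an assumption for illustration.
ICEBERG_TO_ARCGIS = {
    "boolean": "esriFieldTypeSmallInteger",  # ArcGIS has no boolean type
    "int": "esriFieldTypeInteger",
    "long": "esriFieldTypeBigInteger",
    "float": "esriFieldTypeSingle",
    "double": "esriFieldTypeDouble",
    "date": "esriFieldTypeDate",
    "timestamp": "esriFieldTypeDate",
    "timestamptz": "esriFieldTypeDate",
    "string": "esriFieldTypeString",
    "uuid": "esriFieldTypeGUID",
    "binary": "esriFieldTypeBlob",
}


def map_iceberg_type(iceberg_type: str) -> str:
    """Map an Iceberg primitive type name to an ArcGIS field type.

    Parameterized types such as decimal(10, 2) fall back to double;
    unrecognized types fall back to string as a lossless default.
    """
    t = iceberg_type.strip().lower()
    if t.startswith("decimal"):
        return "esriFieldTypeDouble"
    return ICEBERG_TO_ARCGIS.get(t, "esriFieldTypeString")
```

In practice the connector would also need user overrides (e.g. treating a string column as a date), which is why the feature asks for mapping options rather than a fixed table.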
Benefits:
Simplifies workflows by eliminating the need to export Iceberg data into CSV/Parquet before loading into ArcGIS.
Supports large-scale, cloud-native geospatial analytics by bridging open data lakehouse formats with ArcGIS Online.
Aligns with modern enterprise architectures where Iceberg is becoming a standard for governed, high-volume analytical data.
Expands ArcGIS’s interoperability with open-source and vendor-neutral data ecosystems.
Use Cases:
Energy, utilities, and environmental organizations managing terabytes of sensor and IoT data in Iceberg.
Governments and nonprofits using Iceberg for open data lakes and needing to share subsets via ArcGIS Hub or Dashboards.
Financial and supply chain analytics teams combining ArcGIS spatial analysis with Iceberg-powered data lakes.