
Cloud native GeoParquet ingest with ArcGIS Pro 3.5+ just got easier

05-14-2025 01:24 PM
BruceHarold
Esri Frequent Contributor

With the release of ArcGIS Pro 3.5, the stars align a little more when it comes to the use of GeoParquet.  You can now work with local GeoParquet files for your mapping and analysis needs, but it is also much easier to ingest big GeoParquet data from an S3-API-compliant object store!

This post is about how simple it is to bring remote GeoParquet data into your project.

The enabling technology is DuckDB, now included in the default Python environment in ArcGIS Pro 3.5 - no more package management just for this spectacularly useful client technology.

Here is an example: the entire Overture Maps Foundation divisions dataset, accessed from their AWS S3 object store and written to my project home geodatabase.

Overture Divisions

Automation is key to GIS happiness, so to access this data I created a simple notebook which you can find in the post download.  You'll need ArcGIS Pro 3.5 to run it, or an earlier release with your Python environment extended with DuckDB 1.1+.

It takes me about 6 minutes to download the 1m+ features to my project home geodatabase, but a big chunk of that goes to a couple of best-practice steps: sorting the features on area (descending) and repairing any geometry issues.  The sort is so small features display on top of large features; the geometry repair is commonly needed for point-rich data that "tiles the plane" like these divisions do.
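For readers who want to see what those two post-processing steps might look like outside the notebook, here is a minimal sketch. It is not the notebook's code. It assumes the downloaded polygons already sit in a geodatabase feature class with the standard `Shape_Area` field, and the function name and its `gp` parameter are hypothetical (the parameter only exists so the logic can be exercised without ArcGIS installed).

```python
def sort_and_repair(in_fc, out_fc, area_field="Shape_Area", gp=None):
    """Copy in_fc to out_fc sorted largest-first, then repair geometries.

    in_fc / out_fc are feature class paths. gp defaults to arcpy and is
    imported lazily; passing a stand-in object is an assumption made here
    for testing only.
    """
    if gp is None:
        import arcpy as gp  # available in the ArcGIS Pro Python environment

    # Largest polygons are written first so they draw first, leaving
    # smaller features visible on top of them.
    gp.management.Sort(in_fc, out_fc, [[area_field, "DESCENDING"]])

    # Repair in place on the sorted copy; features whose geometry is
    # null after repair are deleted.
    gp.management.RepairGeometry(out_fc, "DELETE_NULL")
    return out_fc
```

In a Pro notebook this would be called with just the two paths, for example `sort_and_repair("divisions_raw", "divisions")`, where both names are placeholders.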

The lift and shift itself is fast.

I'll let you inspect the notebook for yourselves, but note the option to apply an attribute or spatial filter on the features you download, for example a bounding box in lat/long or the name of a country.  Instead of manually downloading a set of very large parquet files from S3, you now have a simple tool to go get what you want, any time you like!
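To give a feel for that kind of filter, here is a small sketch of a DuckDB query against remote GeoParquet. It is an illustration, not the notebook's code. The assumptions: the Overture path layout and the `bbox` and `country` column names follow Overture's published schema, `<release>` is a placeholder you must replace with a real release name, and the function names are hypothetical. This sketch filters on a country code rather than a country name.

```python
# Placeholder path: replace <release> with an actual Overture release name.
OVERTURE_DIVISIONS = (
    "s3://overturemaps-us-west-2/release/<release>/"
    "theme=divisions/type=division_area/*"
)


def build_divisions_query(source, bbox=None, country=None):
    """Build a DuckDB SQL string with optional spatial/attribute filters.

    bbox is (xmin, ymin, xmax, ymax) in lon/lat degrees.
    country is a two-letter ISO country code.
    """
    where = []
    if bbox is not None:
        xmin, ymin, xmax, ymax = (float(v) for v in bbox)
        # Compare against the per-feature bbox struct so that only rows
        # whose extent overlaps the window are kept.
        where += [
            f"bbox.xmax >= {xmin}",
            f"bbox.xmin <= {xmax}",
            f"bbox.ymax >= {ymin}",
            f"bbox.ymin <= {ymax}",
        ]
    if country is not None:
        if not (len(country) == 2 and country.isalpha()):
            raise ValueError("country must be a two-letter code")
        where.append(f"country = '{country.upper()}'")

    sql = f"SELECT * FROM read_parquet('{source}', hive_partitioning=1)"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql


def count_rows(sql, region="us-west-2"):
    """Run the query remotely and return the matching row count."""
    import duckdb  # bundled with the default ArcGIS Pro 3.5 environment

    con = duckdb.connect()
    con.install_extension("httpfs")  # S3 access
    con.load_extension("httpfs")
    con.execute(f"SET s3_region='{region}'")
    return con.execute(f"SELECT count(*) FROM ({sql})").fetchone()[0]
```

Counting rows first, for example `count_rows(build_divisions_query(OVERTURE_DIVISIONS, country="NZ"))`, is a cheap way to check a filter before committing to a full download.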

3 Comments
Youssef-Harby
Emerging Contributor

@BruceHarold This is big news! I have tried the new version, and the ability to explore Parquet files in ArcGIS Pro with a simple drag-and-drop is impressive. However, I do have some performance concerns.

Shouldn't ArcGIS Pro natively read Arrow data in memory so it can be seamlessly shared with other Python libraries? Or am I missing something? By caching the data in a different in-memory format instead of Arrow, doesn't that go against the core idea behind using Parquet in the first place?

BruceHarold
Esri Frequent Contributor

@Youssef-Harby Hi, thanks for reading my post and for the comment!

The notebook attached to this post reads remote GeoParquet and writes to a project home file geodatabase feature class, so it isn't an in-memory view of data.  I have another blog that does create in-memory data, including from GeoParquet (requires ArcGIS Data Interoperability extension).

Core ArcGIS Pro caches GeoParquet behind the scenes, so it isn't an in-memory experience either.

There are many Arrow fans at Esri; you can see how to use the format via Python here and here.

We're still pretty early in the GeoParquet journey in ArcGIS.  For example, how nested data types are going to be used to push complex data models down into the base table (for sharing purposes) is a topic that interests me, as is how people might use GeoParquet to share constantly evolving big data managed in a branch versioned enterprise geodatabase.  So the more feedback from customers the better!

RodneyConger1
Occasional Contributor

Thank you for creating and updating these samples Bruce. That saves us all who are working with Overture data a lot of time trying to figure out how to do it ourselves.
