
Subscribing To Overture Data With One Click Refresh

2 weeks ago
BruceHarold
Esri Frequent Contributor

In this post I'm revisiting the ingest of Overture Maps Foundation data via ArcGIS Pro notebooks, for a few reasons:

  • To show a subscription pattern for the latest data in an area of interest
  • To show the value of unnesting complex columns in parquet files
  • To show geocoding locator creation in addition to geodatabase object creation
  • To show metadata creation simultaneously with data creation
  • To show how sharing notebooks via OneDrive gives your colleagues Overture processing capability

You're not here for the cartography, and to prove it here's a map of my subject matter data:

Division Areas and Places in greater London, GB

On top of a Colored Pencil basemap are global-extent Division Area polygons, plus Places point features within an area of interest (greater London, UK) defined by selected division area features. What you can't see (but we'll get to it) is that separate POI-role locators, made from the division area and places features, are active ArcGIS Pro project locators in the map.

The division area polygon layer (transparent with green outlines) is global, with over 1M features. Division areas range in size from country down to microhood, so to help with display I sort them by area (descending). They look quite busy in the map, but that range gives you flexibility in defining your area of interest. For the places points, there are over 417K features (dark blue dots) in the area of interest, which at the map scale shown (1:500,000) makes the symbol density very high.

Together, the division area features, places points, relationships to alternate value tables and associated locators for each layer are my information products, which will be maintained by one-click automation on demand in two notebooks shared on OneDrive, one each for division areas and places.

These products will all be made in the ArcGIS Pro project default geodatabase or home folder.

Let's work through each of the aspects I want to show in this post.

Subscribing to the latest Overture data in an area of interest

The subscription concept I'm going for here is based on an area of interest for which you want to receive data refreshes on demand. The area is defined by any number of division area features selected by a SQL where clause (you work out the where clause after first translating the division area features and doing some map-based exploration). Here is mine:

"country = 'GB' and id in ('e8e3f6e2-2c45-4708-805c-41d08ab38de1','89c092f8-4287-4401-b72f-4a5a067eee22','2d0e78fb-f7fa-4c8f-be95-69a60527fc97')"
Tip: Including the country value 'GB' in the where clause isn't technically necessary, but it reduces the workload back in S3, allowing the system to skip reading parquet files that contain no British data.
The SQL query reads hive-partitioned GeoParquet files in a public S3 bucket. Overture data is published monthly, and the notebooks automatically determine the latest release to read.
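The release detection and partitioned read described above can be sketched in plain Python. This is a minimal sketch, not the notebooks' code: it assumes release identifiers of the form `YYYY-MM-DD.N`, the public bucket layout `s3://overturemaps-us-west-2/release/<release>/theme=<theme>/type=<type>/*`, and DuckDB's `read_parquet(..., hive_partitioning = true)`. The helper names and the release listing are hypothetical; the notebooks themselves determine the latest release from the Overture registry.

```python
import re

# Assumed release ID format, e.g. "2025-09-24.0" (date plus minor version).
RELEASE_PATTERN = re.compile(r"^(\d{4})-(\d{2})-(\d{2})\.(\d+)$")
# Public Overture bucket prefix (assumed; check the Overture docs for your region).
BUCKET = "s3://overturemaps-us-west-2/release"


def latest_release(release_ids):
    """Return the newest well-formed release ID, ignoring anything else."""
    parsed = []
    for rid in release_ids:
        m = RELEASE_PATTERN.match(rid)
        if m:
            # Compare numerically so a minor version of ".10" sorts after ".9".
            parsed.append((tuple(int(g) for g in m.groups()), rid))
    if not parsed:
        raise ValueError("no valid release identifiers found")
    return max(parsed)[1]


def parquet_glob(release, theme, type_):
    """Build the hive-partitioned path for one Overture theme/type."""
    return f"{BUCKET}/{release}/theme={theme}/type={type_}/*"


def aoi_query(release, where_clause):
    """Compose a DuckDB query selecting the division areas that define the AOI."""
    path = parquet_glob(release, "divisions", "division_area")
    return (
        f"SELECT * FROM read_parquet('{path}', hive_partitioning = true) "
        f"WHERE {where_clause}"
    )


# Hypothetical listing of release folders (not real registry output).
releases = ["2025-08-20.0", "2025-09-24.0", "2025-07-23.0", "README"]
where = ("country = 'GB' and id in ('e8e3f6e2-2c45-4708-805c-41d08ab38de1',"
         "'89c092f8-4287-4401-b72f-4a5a067eee22',"
         "'2d0e78fb-f7fa-4c8f-be95-69a60527fc97')")

release = latest_release(releases)
print(release)
print(aoi_query(release, where))
```

In a notebook, the query string would be handed to DuckDB with the httpfs and spatial extensions loaded and `s3_region` set to `us-west-2`. Because only the release segment of the path changes from month to month, re-running the notebook is all a refresh requires.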

 

The value of unnesting complex columns in parquet files

Overture data is available in GeoParquet format. You may have seen discussion of GeoParquet as a candidate to be the "new shapefile", the de facto format for sharing georelational data. While GeoParquet isn't editable like shapefiles are (except by replacement), it has many attractive features, one being support for complex column types. See, for example, the schema for the places theme:

Places theme schema as seen by DuckDB

Note that several columns are struct type, with the structs containing complex properties. The data isn't flat, and to get the most value from it (division area translated names, place point alternate categories, address components, etc.) you must unnest the structs. You'll see in the notebooks that I don't unnest them all (some were empty), but no useful data was left behind. So when working with parquet, be prepared for a higher information density than shapefiles offer.
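To make "unnesting" concrete, here is a minimal, stdlib-only sketch that turns nested records into a flat parent table plus one-to-many child tables, which is the shape a relationship class then joins back together. It assumes records shaped like the Overture schema, where `categories` is a struct with a `primary` value and an `alternate` list, and `names` is a struct whose `common` member maps language codes to names. The function name and sample rows are hypothetical; in practice this is usually done in the query itself (DuckDB has an `unnest` function), not in Python loops.

```python
def unnest_features(records):
    """Split nested records into a flat table and two child tables keyed by id."""
    features, alt_categories, name_translations = [], [], []
    for rec in records:
        names = rec.get("names") or {}
        cats = rec.get("categories") or {}
        # Parent row: promote scalar struct members to ordinary columns.
        features.append({
            "id": rec["id"],
            "name": names.get("primary"),
            "category": cats.get("primary"),
        })
        # List member: one child row per element.
        for alt in cats.get("alternate") or []:
            alt_categories.append({"place_id": rec["id"], "category": alt})
        # Map member: one child row per (language, name) pair.
        for lang, value in (names.get("common") or {}).items():
            name_translations.append(
                {"place_id": rec["id"], "language": lang, "name": value}
            )
    return features, alt_categories, name_translations


# Hypothetical sample records, not real Overture rows.
sample = [
    {"id": "p1",
     "names": {"primary": "Scoops", "common": {"fr": "Boules"}},
     "categories": {"primary": "ice_cream_shop",
                    "alternate": ["dessert_shop", "cafe"]}},
    {"id": "p2",
     "names": {"primary": "King's Cross", "common": None},
     "categories": {"primary": "train_station", "alternate": None}},
]

places, alts, translations = unnest_features(sample)
for row in alts:
    print(row)
```

Keeping the list and map members in separate child tables, rather than widening the parent with numbered columns, preserves every value no matter how many alternates or translations a feature has.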

You'll see in the blog post notebooks that a couple of structs are unnested and relationship classes are created to the resultant tables. For example, searching for ice cream shops in the places data lets you see what other categories the outlets identify as...

Ice cream shops also offering...

or identifying translated names for a division area feature...

Division Area name translations

... and translated names are a good segue into my next topic!

Geocoding locators made from the data

Many division areas have common names in other languages. Below, we identify the Japanese name for Wellington, New Zealand (ウェリントン) and use it in the division areas locator.

Geocoding from the Japanese name for Wellington, New Zealand

One of the higher-value themes from Overture is places data, and the locator built from that data provides a compelling map navigation experience, using place category as a hint to refine a geocode.

For example, hinting that I want to find a train station:

Train Stations

Or a restaurant in my map extent:

 

Restaurants
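Conceptually, indexing alternate-language names is what lets a query like ウェリントン succeed, and a category hint narrows the candidate set before names are matched. The toy sketch below illustrates only those two ideas with exact matching; it is not how ArcGIS Pro locators are implemented, and the feature list, category labels, and the `build_index` and `find` functions are hypothetical.

```python
import unicodedata


def normalize(text):
    """NFKC-normalize and case-fold so lookups ignore case and width variants."""
    return unicodedata.normalize("NFKC", text).casefold()


def build_index(features):
    """Map every known name (primary and translated) to the features carrying it."""
    index = {}
    for feat in features:
        for name in [feat["name"], *feat.get("translations", {}).values()]:
            index.setdefault(normalize(name), []).append(feat)
    return index


def find(index, query, category=None):
    """Return ids of features matching the query, optionally filtered by category."""
    candidates = index.get(normalize(query), [])
    if category is not None:
        candidates = [f for f in candidates if f.get("category") == category]
    return [f["id"] for f in candidates]


# Hypothetical features: one division area and two places sharing a name.
features = [
    {"id": "d1", "name": "Wellington", "category": "locality",
     "translations": {"ja": "ウェリントン"}},
    {"id": "p1", "name": "Wellington", "category": "train_station"},
    {"id": "p2", "name": "Wellington", "category": "restaurant"},
]

index = build_index(features)
print(find(index, "ウェリントン"))
print(find(index, "wellington", category="restaurant"))
```

Without the hint, "wellington" returns all three features; with `category="restaurant"` only the restaurant survives, which is the kind of narrowing the screenshots above show the real locator doing.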

Before this, I had no idea that custom categories from your own data could be used this way. There will be many more use cases for this rich places data. This brings me to...

Creating metadata simultaneously with data

Since we're using a notebook approach, it is straightforward to use the arcpy metadata class to automate writing metadata to output objects. At a minimum it is important to record the release and processing timestamps, and you might also record feature counts and other observations so they travel with the data. I will not clutter the post with the cell code that does the job; you can browse the notebooks for that. Which brings me to...
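For readers who want the general shape without opening the notebooks, here is a minimal sketch of the pattern, assuming the `arcpy.metadata` module that ships with ArcGIS Pro: build the text fields first (release, processing timestamp, feature count), then copy them onto the output item. The helper names, tags, and wording are hypothetical, not the notebooks' code; the arcpy import sits inside `write_metadata` so the field-building part runs anywhere.

```python
from datetime import datetime, timezone


def build_metadata_fields(release, feature_count, theme, processed_at=None):
    """Compose metadata text so release and run details travel with the data."""
    processed_at = processed_at or datetime.now(timezone.utc)
    stamp = processed_at.strftime("%Y-%m-%d %H:%M:%S UTC")
    return {
        "title": f"Overture {theme} ({release})",
        "tags": f"Overture, {theme}, {release}",
        "summary": f"Overture Maps Foundation {theme} data, release {release}.",
        "description": (
            f"Source release: {release}. Processed: {stamp}. "
            f"Feature count: {feature_count:,}."
        ),
        "credits": "Overture Maps Foundation",
    }


def write_metadata(item_path, fields):
    """Copy the fields onto an output item (works only inside ArcGIS Pro)."""
    from arcpy import metadata as md  # deferred import: arcpy exists only in Pro

    new_md = md.Metadata()
    for name, value in fields.items():
        setattr(new_md, name, value)  # title, tags, summary, description, credits
    target = md.Metadata(item_path)
    if not target.isReadOnly:
        target.copy(new_md)
        target.save()
```

In a notebook cell you might call `write_metadata(out_fc, build_metadata_fields(release, count, "places"))`, where the hypothetical `out_fc` and `count` come from earlier cells (for example `count = int(arcpy.management.GetCount(out_fc)[0])`).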

Sharing notebooks to OneDrive

The notebooks are suitable for external consumption. They automatically detect the changing input path at run time, so they are good candidates for sharing to OneDrive, from where your colleagues can run them on demand: the "one click" experience. OK, maybe two clicks: open the notebook, then run it 😉.

To share the notebooks, create a folder in OneDrive, copy the notebooks into it, then share the folder and notebook source files with anyone, with edit permission. People invited to the folder can use OneDrive's browser-based controls to add the folder to their local files, then drag the folder from Windows Explorer into an ArcGIS Pro project as a folder connection.

Notebooks on OneDrive

Ready-to-use notebooks are in the post attachment. You'll need ArcGIS Pro 3.3 or later with a Standard or Advanced license. ImportCurrentDivisionAreas takes about 30 minutes to run and ImportPlacesByDivisionArea about 15 minutes for the area of interest shown; run times will vary with the area you use. Do not run both notebooks simultaneously, as they have variable names that will collide. Do comment on the post with any experiences you want to share or questions you have. For example, who would like the Addresses theme supported with a locator output? I'm guessing many people...

Have fun with your Overture subscriptions!

 

6 Comments
Marc_Graham
Frequent Contributor

Hi @BruceHarold ,

Thanks for this. I am getting an error installing the DuckDB extensions:

[Screenshot: error installing the DuckDB extensions]

I can't work out why it is trying to get the extension from v0.0.0.

Have you seen this, and do you know how to force a version?

Thanks,

Marc

BruceHarold
Esri Frequent Contributor

Hi Marc, do you get this error when importing duckdb, or when adding DuckDB to a pre-3.5 Pro Python conda environment? The spatial extension is already available in Pro 3.5+.

BTW, Overture is broken at the moment; people are looking into an issue in the registry, which the notebook uses to determine the latest release.

Marc_Graham
Frequent Contributor

Thanks for the update about Overture.  I might park this for a bit to give them time to sort it.

The errors come at the very beginning of the notebook.  This is in Pro 3.6 beta btw.

If I comment out a few lines to just try and set the region I get the following similar error for the httpfs extension:

[Screenshot: similar error for the httpfs extension]

I have searched online and cannot find any reference to why it does not pull the latest version and instead defaults to v0.0.0.

Thanks,

Marc

BruceHarold
Esri Frequent Contributor

Thanks, you might have hit a bug - I'll get this to the right people.

I'll remove the Spoiler in the post re. the Overture registry issue when it's sorted.

BruceHarold
Esri Frequent Contributor

The team tell me DuckDB was broken in Pro 3.6 Beta 1; it's fixed in later builds.

Marc_Graham
Frequent Contributor

OK, I will update to the latest Beta. Thanks.
