
Using Overture Maps Data in GeoAnalytics Engine

SBattersby | Esri Contributor | 10-31-2023

Recently, the Overture Maps Foundation released the second version of its worldwide open map data (the 2023-10-19-alpha.0 release). In this blog post, we’ll explore these newly released data and how you can access and use them for spatial analysis with ArcGIS GeoAnalytics Engine.

In the most recent data release, a new theme has been added, so that there are now five data themes available: Places of Interest (POIs), Buildings, Transportation Network, Administrative Boundaries, and a base theme (land, land use, and water).   

These datasets have an incredible amount of detail and great coverage across the world. For instance, this is the distribution of just the Places dataset: 

Overture Places theme data distribution

In addition to the new theme, the recent release introduces Global Entity Reference System (GERS) IDs.  GERS is a system of encoding map data to a shared universal reference. Using the GERS ID will make it easier to match data from different data providers. You can read more about the new Overture data and the benefit of GERS in “Enriching Overture Data with GERS.” 

 

About Overture Maps 

The Overture Maps Foundation is a collaboration founded in December 2022 by Amazon Web Services (AWS), Meta, Microsoft, and TomTom. Overture’s mission is to create reliable, easy-to-use, and interoperable open map data. Overture builds on the work of other open data projects, such as OpenStreetMap (OSM), to create high-quality, comprehensive, curated datasets designed for use in building map products and services.

Esri is a member of the Overture Maps Foundation, and is committed to expanding access to ready-to-use map data and to building geospatial tools that help users leverage these data for the analytics that drive their business and research. 

Accessing and licensing data 

Overture Maps currently provides five data themes. Each is available under an open data license: 

  • Places (57M+ place records):  CDLA Permissive v 2.0 
  • Buildings (1.39B+ building footprints):  ODbL 
  • Transportation (highways, footways, cycleways, railways, ferry routes, and public transportation): ODbL 
  • Administrative Boundaries (multiple locality types): ODbL 
  • Base theme (land, land use, and water): ODbL 

The data is distributed in Parquet format and is available via AWS and Azure file stores. Details on accessing the data are available from Overture Maps.

Esri is also hosting some of the layers, such as the Overture Buildings with GERS and USA Structures with GERS IDs, as feature layers in Overture’s ArcGIS Online Organization, and intends to publish more Overture data as feature layers with future releases.   

In this blog post, we’ll walk through accessing the data and bringing it into ArcGIS GeoAnalytics Engine to perform spatial calculations. If you have AWS or Azure credentials, you can access the data directly from within your cloud analytics environment; otherwise, you can use the AWS command line interface (CLI) to download the data locally. To download the Overture Places theme, for instance, you can use the CLI to grab the data from the October 19 alpha release like this:

aws s3 cp --region us-west-2 --no-sign-request --recursive s3://overturemaps-us-west-2/release/2023-10-19-alpha.0/theme=places <output location>

If you are working in an environment where you can access the data directly from an AWS or Azure file store, you can bring the data into GeoAnalytics Engine without having to download it locally. 
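
The examples below assume the Spark SQL functions and the GeoAnalytics Engine SQL functions have been imported. Because the Overture bucket allows anonymous reads, you may also need to point the S3A connector at the anonymous credentials provider. This is a sketch only; the exact configuration, and whether you use the s3:// or s3a:// URI scheme, depends on your Spark environment, and some managed platforms handle S3 access for you:

# Spark SQL functions and GeoAnalytics Engine SQL functions used throughout this post
from pyspark.sql import functions as F
from geoanalytics.sql import functions as ST

# allow unauthenticated reads from the public Overture bucket (environment-dependent)
spark.conf.set("fs.s3a.aws.credentials.provider",
               "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider")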

Read in the data 

Reading the data in GeoAnalytics Engine is as easy as pointing Spark at the Parquet files. First, set the path to the Overture Places theme in the S3 bucket:

 

 

 

# path to the Overture Places theme in the Overture S3 bucket
s3_place_parquet = "s3://overturemaps-us-west-2/release/2023-10-19-alpha.0/theme=places"

 

 

 

A great aspect of accessing the data from the AWS or Azure data store is that, if you don’t need the entire dataset (they can be quite large), you can subset the data on ingestion. For instance, with this bit of code we can pull out just the locations that are categorized as “park.”

 

 

 

df_places_park = spark.read.format("parquet").load(s3_place_parquet)\
    .where(F.col("categories.main") == "park")

 

 

 

Or, if you have a bounding box for your region of interest, you can use ST_BboxIntersects to subset the data. For instance:

 

 

 

# set the coordinates for the bounding box around Seattle, WA 
aoi_bbox = (-122.442, 47.466, -122.222, 47.771) 

# read in the Overture data and select only the data inside the bounding box 
# note that we need to create a point geometry from binary 
df_places_aoi = spark.read.format("parquet").load(s3_place_parquet)\ 
    .where(ST.bbox_intersects(ST.srid(ST.point_from_binary("geometry"), 4326), *aoi_bbox)) 

 

 

 

You could add additional where clauses to further subset the data, for instance adding in the filter we used earlier to select the places where the category is “park.” By combining the two filters you would spatially restrict to your bounding box of interest, and then restrict to just the locations with your attribute(s) of interest. 
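
For example, here is a sketch that combines the two filters into a single read, reusing the s3_place_parquet path and aoi_bbox defined above (df_parks_aoi is just an illustrative name):

# restrict to the Seattle bounding box, then to places categorized as "park", in one read
df_parks_aoi = spark.read.format("parquet").load(s3_place_parquet)\
    .where(ST.bbox_intersects(ST.srid(ST.point_from_binary("geometry"), 4326), *aoi_bbox))\
    .where(F.col("categories.main") == "park")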

 

Inspect the schema and create geometries 

The data schemas and information about attributes are described on the Overture Maps Schema Themes page.

We can also look at the data directly in Spark to explore the structure. The Overture Places theme includes 37 attributes, many of which contain nested sets of values (e.g., common, official, alternate, and short versions of the place name, as well as language-specific variants). There are a lot of attributes available; however, not every record has information provided for every attribute.

The full schema can be explored using: 

 

 

 

df_places_park.printSchema() 

 

 

 

A selection of attributes for one record looks like this:

Subset of attributes for one record in the Places theme
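
A view like this can be produced with something like the following sketch (the specific attributes selected here are illustrative and may not match the screenshot exactly):

# show a handful of attributes, including the nested names column, for a single record
df_places_park.select("id", "names", "categories", "confidence", "sources")\
    .show(1, truncate=False, vertical=True)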

 

Note the "geometry" listed in the schema: it is stored in a binary form in the Parquet file. Before we can use it for spatial analysis with GeoAnalytics Engine, we need to convert it to a geometry type. There is a general ST_GeomFromBinary function, but if the geometry type is known there are also specific functions for creating point (ST_PointFromBinary), line (ST_LineFromBinary), and polygon (ST_PolygonFromBinary) geometries. When we create the geometry from binary, we also need to explicitly define the coordinate system. We can do this using ST_SRID, or simply pass it as an additional parameter to one of the geometry-from-binary functions. We can also explicitly set the geometry field using set_geometry_field so that it will be automatically recognized when we use it in analytics or plot the data.

This can all be done at the same time like this: 

 

 

 

df_places_park = df_places_park\ 
    .withColumn("geometry", ST.srid(ST.point_from_binary("geometry"), 4326))\ 
    .st.set_geometry_field("geometry") 

 

 

 

If you look at the schema again after converting the binary, the column will now be listed as "geometry", or as the specific geometry type if you used ST_PointFromBinary, etc.

Point geometry attribute shown in the schema

At this point, the data frame can be visualized (for example, with the quick plot sketched below) or used in analytics workflows. We’ll walk through a few quick examples of using Overture data for analytics.
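
Here is a minimal plotting sketch; the styling keyword arguments are optional, and note that plotting the full worldwide parks subset may take a little while:

# quick plot of the park locations (worldwide subset)
df_places_park.st.plot(marker_size=10, color="green", edgecolor="white")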

 

Using the data for analytics 

Let's dig in and look at how we can incorporate the Overture data into our analytics! 

We'll start with just the Overture Places dataset to explore some data in the Seattle area and look at how many parks are within 1 mile of a school. We might want to do this sort of analysis to identify any schools where students lack access to public greenspaces, or to find nearby locations for running after-school sports programs.

All of the data that we need for this analysis is in the Overture Places dataset, and we can use the Nearest Neighbors tool and the ST_DWithin (within distance) function in GeoAnalytics Engine to find our answers.

If we didn’t do it when ingesting the data (as shown above), we can select out just the points that are inside a Seattle area bounding box using the ST_BboxIntersects function. Then, we will transform that geometry to a local coordinate system using ST_Transform. Using a local coordinate system will allow us to use planar calculations when we do our proximity analysis. 

 

 

 

# set the coordinates for the bounding box around Seattle, WA 
seattle_extent = (-122.442, 47.466, -122.222, 47.771) 

# read in the Overture data, keep only the points inside the bounding box, and project them
# to NAD 1983 StatePlane Washington North (EPSG:2285, US feet) for planar measurements
df_places_seattle = spark.read.format("parquet").load(s3_place_parquet)\
    .where(ST.bbox_intersects(ST.srid(ST.point_from_binary("geometry"), 4326), *seattle_extent))\
    .withColumn("geometry", ST.transform("geometry", 2285))

# plot the results (sea_zoom_style is a dictionary of plot styling keyword arguments defined elsewhere)
df_places_seattle.st.plot(**sea_zoom_style, marker_size=20, color="black", edgecolor="white")

 

 

 

When we subset the data, we end up with about 33,000 points of interest. Here is a look at the distribution of all of the Overture Places data points, along with a zoomed-in view of a portion of the results near the University of Washington campus:

Overture Places theme; overview and zoom
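
To confirm the size of the subset, a quick count can be run (a minimal check):

# count the Seattle-area places
df_places_seattle.count()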

Now that we have the initial subset of Seattle data, we need to subset further to create two new data frames for our analyses – one for parks and one for schools: 

 

 

 

# Seattle schools from four different categories 
seattle_schools = df_places_seattle\ 
    .filter(F.col("categories.main").isin(["public_school", "elementary_school", "middle_school", "high_school"])) 

# Seattle parks 
seattle_parks = df_places_seattle\ 
    .where(F.col("categories.main") == "park") 

 

 

 

We can look at the distribution to see the resulting data points for the schools (black) and parks (green). From the map, we can see that some schools are much closer to parks and some are farther away; some have many parks nearby, and some have few. We’ll do a bit of analysis to quantify these relationships.

Overture Places theme; park and school locations in Seattle

First, let’s identify the distance from each school to its nearest park by using the Nearest Neighbors tool to find the single closest neighbor. It’s as straightforward as setting the number of nearest neighbors to 1 and running the tool.

 

 

 

# nearest park to each school 
from geoanalytics.tools import NearestNeighbors 

parks_near_schools = NearestNeighbors()\ 
    .setNumNeighbors(1)\ 
    .setSearchDistance(1, "mile")\ 
    .setResultLayout("long")\ 
    .run(seattle_schools, seattle_parks) 

# Show the table with nearest park for each school 
# Note that the "names" column is a map, so we pull out the “common” name and take the first element in the list of options 

parks_near_schools.select("id",
     F.col("names.common")[0].getItem("value").alias("SchoolName"), 
     F.round(F.col("near_distance"), 0).alias("near_distance (meters)"), 
     F.col("names1.common")[0].getItem("value").alias("Park_name"))\ 
    .sort(F.col("SchoolName").asc(), F.col("near_distance").desc())\ 
    .show(10, truncate=False) 

 

 

 

This gives us a result data frame where we can see the name of the nearest park as well as the distance for each of our schools: 

Table of park locations near to schools

But let’s go one step further and identify the total number of parks within 1 mile of each school. In this case, we can simply use the ST_DWithin function to identify all parks near each school. Because both data frames contain columns named "geometry" and "names", we first pull the park geometry and name out under new names (geometry_park and park_names) so they don’t collide after the join.

 

 

 

# keep just the park geometry and name, renamed so they don't collide with the school columns
parks_for_join = seattle_parks.select(F.col("geometry").alias("geometry_park"),
                                      F.col("names").alias("park_names"))

# find all parks within 1 mile (5,280 ft; EPSG:2285 is in US feet) of each school
parks_1_mi_school = seattle_schools.join(parks_for_join, ST.dwithin("geometry", "geometry_park", 5280))

 

 

 

Then we could either look at the list of parks near each school: 

 

 

 

parks_1_mi_school.select("id",
     F.col("names.common")[0].getItem("value").alias("SchoolName"),
     F.col("park_names.common")[0].getItem("value").alias("ParkName"),
     ST.distance("geometry", "geometry_park").alias("Distance"))\
     .sort(F.col("SchoolName"))\
     .show(10, truncate=False)

 

 

 

Table with list of parks near each school

 

Or quickly group by our school name to get a total count of parks near each school: 

 

 

 

parks_1_mi_school\ 
     .groupby(F.col("names.common")[0].getItem("value").alias("SchoolName"))\ 
     .count()\ 
     .sort("count", ascending=False)\ 
     .show(10, truncate=False) 

 

 

 

Table with count of parks near schools

We can also look at the results as a map. For instance, here is the result for all parks near Adams Elementary School (ID = "67319476705"). The city parks are in green, and the parks within the 1-mile search area around the school are in white. We can see that this particular school is well situated with respect to public greenspace opportunities for the students to enjoy.

 

 

 

# plot the buffer around the school of interest 
plt_school = seattle_schools.filter(F.col("ID") == "67319476705").select(ST.buffer("geometry", 5280))\ 
    .st.plot(**seattle_style, edgecolor="black", alpha=0.3) 

# add all parks 
seattle_parks.st.plot(ax=plt_school, marker_size=20, color="green", edgecolor="white") 

# add the school of interest 
seattle_schools.filter(F.col("ID") == "67319476705")\
    .st.plot(ax=plt_school, marker_size=30, color="black", edgecolor="white") 

# add only the parks _near_ selected school 
parks_1_mi_school.filter(F.col("ID") == "67319476705")\ 
    .st.plot(ax=plt_school, geometry="geometry_park", marker_size=20, color="white", edgecolor="black") 

 

 

 

Parks near one school

 

Visualizing the data 

In addition to spatial analytic calculations, we can also use the Overture data for visualization.  As a quick example, the Buildings layer (originally more than 1 billion polygons!!) can be narrowed down to just the buildings within a Seattle bounding box (~290k polygons).   
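
The buildings data frame has not been read in yet in this walkthrough, so here is a sketch of how it could be loaded, following the same pattern used for Places above (the release path layout and the use of the general ST_GeomFromBinary function for the building footprints are assumptions; adjust for your release and geometry types):

# path to the Overture Buildings theme, assumed to follow the same layout as the Places path above
s3_building_parquet = "s3://overturemaps-us-west-2/release/2023-10-19-alpha.0/theme=buildings"

# read the buildings, convert the binary geometry with the general geometry-from-binary function,
# and set the geometry field for plotting and analysis
df_buildings = spark.read.format("parquet").load(s3_building_parquet)\
    .withColumn("geometry", ST.srid(ST.geom_from_binary("geometry"), 4326))\
    .st.set_geometry_field("geometry")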

 

 

 

df_buildings_seattle = df_buildings.filter(ST.bbox_intersects("geometry", *seattle_extent)) 

 

 

 

The result can then be written out to a spatial file that can be brought into ArcGIS Pro for 3D rendering using the Buildings dataset’s height attribute:

 

 

 

df_buildings_seattle\ 
    .coalesce(1)\ 
    .select("id", "updatetime", F.col("names.common")[0].getItem("value").alias("Name"), "level", "height", "numfloors", "class", "geometry")\ 
    .write.format("shapefile").save(f"{data_local}/buildings_seattle.shp") 

 

 

 

3D rendering of building heights

 

The sky is the limit! 

In this post we’ve looked at a few ways to bring Overture Maps data into Esri’s ArcGIS GeoAnalytics Engine for analytics and to fuel visualization in GeoAnalytics Engine and other Esri products. The open data provided by the Overture Maps Foundation is a great way to add value to your analytics workflows and to supplement the existing datasets across your organization.

Let us know what you’re doing with Overture Maps data and how you’re adding new power to your analytic workflows with GeoAnalytics Engine! 
