It is always satisfying to share powerful new ways to solve problems, especially when the solution has been "hiding in plain sight" for a while. This time I'm showing how combining ArcGIS Enterprise branch versioning with cloud native data sharing delivers not only fast data access for people without portal access, but also the ability to ask the accessed data to travel back in time to when it was younger. Take these parcels: you can see a previously undivided parcel and, later, its three subdivisions.
Parcel subdivision
Picture a dataset with millions of features under heavy daily maintenance, exactly what branch versioning is built to handle. Your customers can access all or any part of the default version for any moment in time. Forever. Without extra load on your Enterprise portal.
So, how did I get there? I simply noticed that the insert-only transaction model of branch versioning is a natural fit for incrementally creating GeoParquet files in cloud storage. Together those files preserve the data's state over time and can be queried spatially and temporally to build local data on demand for your area and time of interest.
It is, however, a very fancy query! The good news is you don't have to figure it out: the blog download includes a notebook with examples for my parcel subject matter, so just plug yours in.
I didn't have to invent the query approach either; Esri publishes workshop materials on the topic. For example, if you go to around minute 18 in this presentation you'll see what such a query looks like.
I did have to make GeoParquet files that I can query, plus a maintenance workflow for initial and incremental parquet file creation. It all starts with the source branch versioned Enterprise geodatabase feature class. Normally you can't see the system fields that power branch versioning of a feature class, but if you add the archive class to the map they become available:
Archive class added to the map
A couple of things to note in the fields map: ObjectID is demoted to an ordinary long integer (its values are no longer unique) and various fields named GDB_* are added. Those fields power viewing the data at a moment in time, which is how branch versioning works: the latest state for a feature wins, which may be a deleted state, but the data history isn't lost (unless you drop it). That is what makes time travel possible.
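To make the "latest state wins" idea concrete, here is a minimal sketch of a moment-in-time query over the parquet files. It assumes DuckDB and field names such as GDB_BRANCH_ID, GDB_FROM_DATE and GDB_IS_DELETE (verify the exact GDB_* names in your own archive class); the notebook in the blog download has the real template.

```python
# A minimal sketch of a moment-in-time query, assuming DuckDB and field names
# like GDB_BRANCH_ID, GDB_FROM_DATE and GDB_IS_DELETE (verify the GDB_* names
# in your own archive class). For each OBJECTID it keeps the newest row at or
# before the requested moment, then drops features whose latest state is a delete.
import duckdb

con = duckdb.connect()

moment = "2024-06-30 23:59:59"            # hypothetical timestamp of interest
pq_glob = "C:/data/parcels/*.parquet"     # hypothetical path to the bulk + delta files

sql = f"""
WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY OBJECTID
               ORDER BY GDB_FROM_DATE DESC
           ) AS rn
    FROM read_parquet('{pq_glob}')
    WHERE GDB_BRANCH_ID = 0                       -- assuming 0 is the default branch
      AND GDB_FROM_DATE <= TIMESTAMP '{moment}'
)
SELECT * FROM ranked
WHERE rn = 1               -- latest state wins...
  AND GDB_IS_DELETE = 0    -- ...unless that latest state is a delete
"""
parcels_as_of_moment = con.execute(sql).df()
print(f"{len(parcels_as_of_moment)} parcels as of {moment}")
```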
The archive class is also good for discovering which edit moments exist in your data.
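The same discovery works against the parquet files themselves. A tiny sketch, again assuming DuckDB and a GDB_FROM_DATE field, lists the distinct edit moments so you can pick a timestamp worth travelling back to:

```python
# A tiny discovery sketch, assuming DuckDB and a GDB_FROM_DATE field: list the
# distinct edit moments captured in the parquet files.
import duckdb

moments = duckdb.sql(
    "SELECT DISTINCT GDB_FROM_DATE AS edit_moment "
    "FROM read_parquet('C:/data/parcels/*.parquet') "
    "ORDER BY edit_moment"
).df()
print(moments.tail())   # the most recent edit moments in the data
```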
With the archive class providing visibility of all fields, the sharing and maintenance workflow became possible. It goes like this:
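In broad strokes, it is an initial bulk copy of the default version's archive rows to GeoParquet, followed by periodic delta files holding only the rows edited since the previous export. My ETL tools (mentioned at the end of this post) are not in the blog download, so purely as a rough illustration of one incremental step, here is a sketch assuming arcpy in ArcGIS Pro, a GDB_FROM_DATE field, and pandas/pyarrow for the write; a production GeoParquet writer would also record the geo metadata and coordinate system, which this sketch skips.

```python
# A rough sketch of one incremental maintenance step: pull only rows edited
# since the last export from the archive class and write them as a new delta
# file. Paths, field names and the date literal syntax (which varies by DBMS)
# are assumptions; only a few fields are carried here for brevity.
import arcpy
import pandas as pd

archive_class = r"C:\connections\gis.sde\Parcels_archive"   # hypothetical archive class path
last_export = "2024-06-01 00:00:00"                         # moment of the previous export
fields = ["OBJECTID", "GDB_FROM_DATE", "GDB_IS_DELETE", "SHAPE@WKB"]

rows = []
for oid, from_date, is_delete, wkb in arcpy.da.SearchCursor(
        archive_class,
        fields,
        where_clause=f"GDB_FROM_DATE > timestamp '{last_export}'"):
    rows.append({
        "OBJECTID": oid,
        "GDB_FROM_DATE": from_date,
        "GDB_IS_DELETE": is_delete,
        "geometry": bytes(wkb),     # WKB geometry stored as plain bytes
    })

pd.DataFrame(rows).to_parquet(r"C:\data\parcels\delta_2024_06.parquet")
```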
Now, I'm advertising this as cloud native data distribution, but at the time of writing I'm still setting up my AWS account, so the attached notebook uses a local filesystem path; I'll update it when I have a public S3 URL available. In the meantime you can download sample data for testing here, here, here and here. The files are the initial bulk version copy plus a few incremental delta files, each containing a few days of edits. Change the notebook's pqPath variable to suit your environment until I get the S3 path in place.
You'll see in the notebook that I supply a template for extent and time travel queries. I find I can extract all 2.7 million parcels in my data in a little over 3 minutes from local disk. I expect access from S3 to be a little slower; we'll see when I have that set up. Try out the notebook for yourself.
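If you want the extent side of that template illustrated, here is a short sketch assuming the DuckDB spatial extension and a GeoParquet geometry column named geometry holding WKB (check your files for the actual column name). Combine the envelope predicate with the latest-state window logic from the earlier sketch to get a true moment-in-time extract for an area of interest.

```python
# A sketch of the extent filter, assuming the DuckDB spatial extension and a
# WKB "geometry" column. The bounding box values are placeholders; add the
# latest-state window logic from the earlier sketch for a true moment-in-time
# extract.
import duckdb

con = duckdb.connect()
con.install_extension("spatial")
con.load_extension("spatial")

xmin, ymin, xmax, ymax = 500000, 5400000, 510000, 5410000   # hypothetical extent
sql = f"""
SELECT *
FROM read_parquet('C:/data/parcels/*.parquet')
WHERE GDB_FROM_DATE <= TIMESTAMP '2024-06-30 23:59:59'
  AND ST_Intersects(ST_GeomFromWKB(geometry),
                    ST_MakeEnvelope({xmin}, {ymin}, {xmax}, {ymax}))
"""
subset = con.execute(sql).df()
print(f"{len(subset)} candidate rows in the area and time of interest")
```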
You might have some questions about the notebook; I'll see if I can anticipate a few:
Now I'm going to be a little selfish. To make my sample data and parquet files I built a few ETL tools (Pro 3.4), which I could have scripted. These tools are not in the blog download; if you are interested in them, please message me and I can share them. It will help the team here if we hear how many people are interested in this data sharing paradigm, so please help us to help you.