
Change Data Capture Using An ArcGIS Online Notebook

BruceHarold
Esri Frequent Contributor

In a previous blog I explored the performance of no-code change data capture versus view source swap in a speed test when the goal is maintaining a hosted feature layer - and declared it a tie!  Now I have a new entrant from the coded solutions world - using an ArcGIS Online hosted notebook to calculate and apply the delta transaction for a hosted feature layer where the source data changes daily.

My subject matter data is the same as the previous blog, street address points for the city of Los Angeles, updated daily.

Los Angeles Address Points

There are a little over a million points in the dataset, with a few hundred changes daily: inserts, updates and deletes.  I've been keen to write about an upsert use case (the combination of insert and update in a single transaction) for a while, as it is now supported by the Append geoprocessing tool in ArcGIS Pro and hence by ArcPy in the notebook advanced runtime.
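To make the upsert idea concrete, here is a minimal sketch of the call, assuming the match_fields and update_geometry parameters exposed by recent releases of the Append tool; the feature layer URL and memory feature class are placeholders, not the blog's actual values, so check the tool documentation for your release before relying on the exact signature.

import arcpy

# Placeholders - substitute your own hosted layer URL and staged input
target = "https://services.arcgis.com/<org>/arcgis/rest/services/LA_Addresses/FeatureServer/0"
incoming = r"memory\incoming_addresses"

# Rows whose House_Number_ID matches an existing record are updated (including
# geometry); unmatched rows are inserted - an upsert in a single transaction.
arcpy.management.Append(
    inputs=incoming,
    target=target,
    schema_type="NO_TEST",
    match_fields=[["House_Number_ID", "House_Number_ID"]],
    update_geometry="UPDATE_GEOMETRY",
)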

I'm getting ahead of myself, so let's first set the scene.  To consider a coded ETL workflow you need to be confident your data is well managed, with little or no need to make fixes or apply transformations during the ETL process - because, while you can view data in a notebook, it is very difficult to do deep data inspection and discover data problems there.  If you can't trust your data then you should be using ArcGIS Data Pipelines or ArcGIS Data Interoperability.

In this case the city is delivering well-curated data so I'm happy to recommend a coded lift-and-shift process.

Back to the tools used.  In the blog download you'll find an ArcGIS Pro 3.6 toolbox with a model.  The model creates a file geodatabase feature class named Addresses using CSV data it downloads from the city's Open Data site.  The feature class has a tuned schema (see the field map control in the Export Features tool) and also a primary key field, House_Number_ID, with a not-null constraint and an index - requirements for using upsert transactions.
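The model handles the schema itself, but before publishing it is worth confirming the key field really can serve as an upsert key (no nulls, no duplicates).  This quick check is my own addition rather than part of the blog download, and the feature class path is a placeholder.

import arcpy
from collections import Counter

fc = r"C:\Data\LA_Addresses.gdb\Addresses"  # placeholder path to the model's output

# Read every key value and report anything that would break a unique, not-null constraint
keys = [row[0] for row in arcpy.da.SearchCursor(fc, ["House_Number_ID"])]
null_count = sum(k is None for k in keys)
dupes = [k for k, n in Counter(keys).items() if n > 1]
print(f"{len(keys):,} rows, {null_count} null keys, {len(dupes)} duplicated keys")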

Model Making Addresses Data

I published my feature service from ArcGIS Pro after applying some symbology and popup behavior and made sure that House_Number_ID has a unique index in the feature service.  Now for the notebook that maintains the service.
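If you prefer to script that last step, here is a hedged sketch of one way to add the unique index to the service with the ArcGIS API for Python; the layer URL is a placeholder and the index JSON follows the REST addToDefinition convention, so adjust it to your own layer (and only run it if the index does not already exist).

from arcgis.gis import GIS
from arcgis.features import FeatureLayer

gis = GIS("home")  # in an ArcGIS Online notebook this signs in as the notebook owner
layer = FeatureLayer("https://services.arcgis.com/<org>/arcgis/rest/services/LA_Addresses/FeatureServer/0", gis)  # placeholder URL

# Run once: adds a unique index on the upsert key field
layer.manager.add_to_definition({
    "indexes": [{
        "name": "House_Number_ID_unique",
        "fields": "House_Number_ID",
        "isUnique": True,
        "isAscending": True,
        "description": "Unique key for upserts"
    }]
})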

I'll let you step through the notebook code (in the blog download), but the processing steps are as follows (a condensed sketch appears after the list):

  • Use DuckDB to read the source CSV file from the open data site download URL
  • While reading, enforce a schema and make geometry in a memory relation (table)
  • Using ArcPy, create a memory feature class from data in the DuckDB relation
  • Merge existing and incoming address data into another memory feature class
    • This contains both old and new records
  • Use Find Identical to group identical records in the merged data across geometry and all fields
    • The data is in Web Mercator so a tolerance of 1m allows for coordinate precision differences
  • Run Frequency to support finding rows that are unique
    • Edited, inserted and deleted features have no identical match
  • Use set mathematics to determine the upsert and delete records
  • Run the Append and Delete Rows tools with the relevant records
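The notebook in the blog download is the working reference; the condensed sketch below simply mirrors the listed steps, with hypothetical CSV column names, field list, and URLs (all placeholders), so treat it as the shape of the solution rather than a drop-in replacement.

import arcpy
import duckdb

CSV_URL = "https://data.lacity.org/<dataset>.csv"                   # placeholder download URL
TARGET  = "https://services.arcgis.com/<org>/arcgis/rest/services/LA_Addresses/FeatureServer/0"  # placeholder
KEY     = "House_Number_ID"

# 1-2. Read the CSV with an enforced schema and stage it in a memory feature class.
#      (The real notebook builds geometry in the DuckDB relation and works in Web Mercator.)
con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")  # needed to read the CSV over https
rows = con.execute(f"""
    SELECT CAST(house_number_id AS BIGINT) AS key,     -- placeholder source column names
           full_address,
           CAST(lon AS DOUBLE) AS x,
           CAST(lat AS DOUBLE) AS y
    FROM read_csv_auto('{CSV_URL}')
""").fetchall()

incoming = arcpy.management.CreateFeatureclass(
    "memory", "incoming", "POINT", spatial_reference=arcpy.SpatialReference(4326))[0]
arcpy.management.AddField(incoming, KEY, "LONG")
arcpy.management.AddField(incoming, "Full_Address", "TEXT")
with arcpy.da.InsertCursor(incoming, [KEY, "Full_Address", "SHAPE@XY"]) as cur:
    for key, addr, x, y in rows:
        cur.insertRow((key, addr, (x, y)))

# 3. Merge a snapshot of the service's current rows with the incoming rows.
existing = arcpy.conversion.ExportFeatures(TARGET, r"memory\existing")[0]
merged = arcpy.management.Merge([existing, incoming], r"memory\merged")[0]

# 4-5. Unchanged rows appear twice in the merged data and share a FEAT_SEQ;
#      changed, new or deleted rows appear only once.
ident = arcpy.management.FindIdentical(
    merged, r"memory\identical", ["Shape", KEY, "Full_Address"], xy_tolerance="1 Meters")[0]
freq = arcpy.analysis.Frequency(ident, r"memory\freq", ["FEAT_SEQ"])[0]
unique_seqs = {r[0] for r in arcpy.da.SearchCursor(freq, ["FEAT_SEQ"], "FREQUENCY = 1")}
changed_oids = {r[0] for r in arcpy.da.SearchCursor(ident, ["IN_FID", "FEAT_SEQ"]) if r[1] in unique_seqs}

# 6. Set mathematics: changed keys present in the incoming data are upserts,
#    changed keys missing from the incoming data are deletes.
oid_to_key = {r[0]: r[1] for r in arcpy.da.SearchCursor(merged, ["OID@", KEY])}
incoming_keys = {r[0] for r in arcpy.da.SearchCursor(incoming, [KEY])}
changed_keys = {oid_to_key[oid] for oid in changed_oids}
upsert_keys = changed_keys & incoming_keys
delete_keys = changed_keys - incoming_keys

# 7. Apply the small delta to the hosted layer.
if upsert_keys:
    upserts = arcpy.management.MakeFeatureLayer(
        incoming, "upserts", f"{KEY} IN ({','.join(map(str, upsert_keys))})")[0]
    arcpy.management.Append(upserts, TARGET, "NO_TEST",
                            match_fields=[[KEY, KEY]], update_geometry="UPDATE_GEOMETRY")
if delete_keys:
    deletes = arcpy.management.MakeFeatureLayer(
        TARGET, "deletes", f"{KEY} IN ({','.join(map(str, delete_keys))})")[0]
    arcpy.management.DeleteRows(deletes)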

If you inspect the notebook cell messages you'll see the whole job took 9 1/2 minutes (a fairly large dataset is read into the notebook and geoprocessed), but the actual write commits were small and took only a few seconds - pretty good in my book.

After setting up scheduled processing for weekday mornings, I now have a continuously maintained information product!

Please do comment on this post with any observations or questions.
