
Change Detect or View Source Swap? The Winner Is....

BruceHarold
Esri Frequent Contributor

The answer is both for me, but you be the judge for your situation!  Read on for the decision criteria...

If you want to maintain a large hosted feature service from external data it is best practice to avoid complete overwrites at each refresh, for two reasons:

  1. Large write transactions can be fragile
  2. Large write transactions can have significant service downtime

To avoid both issues it is preferable to implement a CDC (change data capture) approach and write only deltas to the target feature service.  This blog will describe two ways to do this:

  1. Writing deltas directly to the target feature service
  2. Maintaining a hosted feature layer view that points alternately at two services
    • Write delta transactions to the service that is not currently the view's source, then swap it in as the source
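
The CDC idea behind both approaches can be sketched as a pure diff on a stable key: compare the fresh snapshot against the current service state and keep only the adds, updates, and deletes. This is a minimal illustration, not the workspace's logic; the record shapes and field names are hypothetical, and writing the result would go through your ETL tool or the service's edit endpoint.

```python
# Minimal CDC sketch: diff a fresh snapshot against the current service
# state on a stable key, and keep only the delta. Record shapes are
# hypothetical; the real workspaces use FME's ChangeDetector transformer.

def compute_delta(current: dict, snapshot: dict) -> dict:
    """current/snapshot map a stable key -> attribute dict."""
    adds    = [snapshot[k] for k in snapshot.keys() - current.keys()]
    deletes = [current[k]  for k in current.keys() - snapshot.keys()]
    updates = [snapshot[k] for k in snapshot.keys() & current.keys()
               if snapshot[k] != current[k]]
    return {"adds": adds, "updates": updates, "deletes": deletes}
```

The point of the diff is that a typical daily run produces a few hundred records to write instead of a million, which is what makes both write modes fast.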

In the usual situation, where a period's delta is a small fraction of the data, a direct delta write might take several seconds of downtime, while for a view source swap the downtime can be milliseconds, at twice the storage cost.  We'll work through an example so you can choose between the approaches, but either way you're a winner using CDC!

Here is my subject matter data, about a million street address points in Los Angeles, California, maintained daily:

Los Angeles Address Points

The job is to calculate and apply the daily delta transaction (typically in the low hundreds of features) with low downtime.  While both candidate write modes (direct, view source swap) insulate the job's service downtime from the time taken to calculate the delta, it's always good to build in any optimizations you can.  The city's open data site supports CSV download, and CSV is a performant format in spatial ETL tools, so that covers half the delta calculation step.  The other half is reading the current state of the feature service/view.

Here is my optimization for feature service reading, in LAChangeDetection.fmw (in the blog download):

Direct Write After Change Detection

While the Esri ArcGIS Connector package supplies a feature service reader, in the quest for speed I implemented reading the target service using multiple concurrent Query calls with HTTP.  I found that the default maximum record count per call (2000) in 4 concurrent requests gave optimal performance, roughly double the packaged reader's rate.  The ChangeDetector transformer calculates the delta in seconds once it has the data, then writing the delta takes 3-4 seconds for a typical daily changeset (if you inspect the workspace you'll see I instrumented it with Emailer transformers to call home with some timestamp information).
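
The concurrent read could be sketched as below, assuming the standard ArcGIS REST query parameters (resultOffset/resultRecordCount) and the 2000-record pages in 4 concurrent requests described above.  The layer URL and total record count are placeholders, and this is a rough stand-in for the HTTP calls in the workspace, not its actual implementation.

```python
# Sketch of paging a feature service query with concurrent requests.
# Page size (2000) and worker count (4) follow the post; the query
# parameters are standard ArcGIS REST "query" operation parameters.
from concurrent.futures import ThreadPoolExecutor
import json, urllib.parse, urllib.request

PAGE = 2000  # default maximum record count per call

def page_params(offset: int, page: int = PAGE) -> dict:
    """Query parameters for one page of features."""
    return {"where": "1=1", "outFields": "*", "f": "json",
            "resultOffset": offset, "resultRecordCount": page}

def fetch_page(layer_url: str, offset: int) -> list:
    qs = urllib.parse.urlencode(page_params(offset))
    with urllib.request.urlopen(f"{layer_url}/query?{qs}") as resp:
        return json.load(resp).get("features", [])

def read_all(layer_url: str, total: int, workers: int = 4) -> list:
    # Issue the page requests concurrently, then flatten the results.
    offsets = range(0, total, PAGE)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = pool.map(lambda o: fetch_page(layer_url, o), offsets)
    return [f for page in pages for f in page]
```

For a million points this issues about 500 page requests, four at a time, which is where the roughly doubled read rate comes from.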

For people not satisfied with a few seconds' service downtime, implementing view source swap is only slightly more challenging; see LAViewSourceSwap.fmw in the blog download:

View Source Swapping

You'll see logic in the workspace to toggle between the "A" and "B" services for reading, writing, and source swapping.  For this reason, changes are detected a little differently: the same public URL is read for the address data as CSV, but the delta is calculated against the hosted feature layer that is not the current source of the hosted feature layer view, and the delta is then applied to that layer.

Then the updated feature layer must be swapped into being the source for the feature layer view.  How?

The answer requires some detective work, inspecting how ArcGIS natively handles view source swap in item settings:

View Source Swap

What you're looking at above is me manually doing a source swap, but with the browser developer tools active, filtered to record POST transactions in the large request rows view.  As I clicked through the view source swap I could see the system makes two calls, deleteFromDefinition and addToDefinition.  Even better, if I inspect either POST call I can see the JSON payload it uses, which is lucky because the REST API documentation is a bit challenging for a no-code person like me 😉.
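
Given those two observed calls, the swap could be scripted as the same pair of admin requests.  This is a sketch under assumptions: the admin URL pattern, token handling, and helper names are mine, and only the operation names and payload shapes come from the browser capture.

```python
# Sketch of the two admin calls observed in the browser devtools:
# deleteFromDefinition drops the view's current layer, addToDefinition
# adds it back pointing at the other source service. URL pattern and
# token handling are assumptions, not captured from the post.
import json, urllib.parse, urllib.request

def with_source(layer_json: dict, source_name: str) -> dict:
    """Return a copy of the layer JSON retargeted at source_name."""
    layer = json.loads(json.dumps(layer_json))  # cheap deep copy
    layer["adminLayerInfo"]["viewLayerDefinition"]["sourceServiceName"] = source_name
    return layer

def post(url: str, payload: dict, token: str) -> dict:
    data = urllib.parse.urlencode(
        {"f": "json", "token": token,
         **{k: json.dumps(v) for k, v in payload.items()}}).encode()
    with urllib.request.urlopen(url, data) as resp:
        return json.load(resp)

def swap_view_source(admin_view_url: str, layer_json: dict,
                     next_source: str, token: str):
    # 1) drop the existing layer from the view's definition
    post(f"{admin_view_url}/deleteFromDefinition",
         {"deleteFromDefinition": {"layers": [{"id": 0}]}}, token)
    # 2) add it back with adminLayerInfo pointing at the other service
    post(f"{admin_view_url}/addToDefinition",
         {"addToDefinition": {"layers": [with_source(layer_json, next_source)]}},
         token)
```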

The deleteFromDefinition payload is trivial, but the addToDefinition JSON payload is huge.  However, as I made my services with default settings I'm not looking to change, I cut the JSON down to the objects I thought worth keeping, plus of course the required pointer to the desired source.  Here is the JSON:

{
  "layers": [
    {
      "currentVersion": 11.5,
      "id": 0,
      "name": "LosAngelesAddresses",
      "type": "Feature Layer",
      "cacheMaxAge": 30,
      "displayField": "Street_Name",
      "description": "",
      "copyrightText": "",
      "defaultVisibility": true,
      "adminLayerInfo": {
        "viewLayerDefinition": {
          "sourceServiceName": "@Value(_nextSourceName)",
          "sourceLayerId": 0,
          "sourceLayerFields": "*"
        }
      },
      "geometryType": "esriGeometryPoint",
      "objectIdField": "OBJECTID",
      "uniqueIdField": {
        "name": "OBJECTID",
        "isSystemMaintained": true
      },
      "useStandardizedQueries": true,
      "minScale": 0,
      "maxScale": 0,
      "extent": {
        "xmin": -13210040.1828,
        "ymin": 3989386.3054,
        "xmax": -13153020.1132,
        "ymax": 4073637.6182,
        "spatialReference": {
          "wkid": 102100,
          "latestWkid": 3857
        }
      },
      "spatialReference": {
        "wkid": 102100,
        "latestWkid": 3857
      },
      "globalIdField": "",
      "maxRecordCount": 2000,
      "standardMaxRecordCount": 32000,
      "standardMaxRecordCountNoGeometry": 32000,
      "tileMaxRecordCount": 8000,
      "maxRecordCountFactor": 1,
      "capabilities": "Query"
    }
  ]
}

In production I could edit the JSON to tweak things if desired, like extent or display field, but it's probably a better investment to get your layer design right before the fact.

One key thing I learned about the payload is at line 15, where I inject a feature attribute into the JSON at runtime: sourceServiceName is the property that keys the service being swapped in; there is no reference to its item ID or its service URL.  In my case the source service name toggles between "LosAngelesAddressesA" and "LosAngelesAddressesB" on consecutive runs.  If a delta transaction contains no edits then no source swap occurs.
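
The A/B toggle itself reduces to a pure function plus a lookup of the view's current source.  The helper names here are mine; the JSON path matches the adminLayerInfo block in the payload, and the two service names follow the post.

```python
# Sketch of the A/B toggle: read which hosted service is currently the
# view's source, and target the other one. Service names follow the post;
# helper names are hypothetical.
A = "LosAngelesAddressesA"
B = "LosAngelesAddressesB"

def current_source(view_layer_json: dict) -> str:
    """Pull the current source service name out of the view layer JSON."""
    return (view_layer_json["adminLayerInfo"]
                           ["viewLayerDefinition"]["sourceServiceName"])

def next_source(current: str) -> str:
    """The service to write the delta to, and then swap in as the source."""
    return B if current == A else A
```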

So now we have squeezed as much downtime as we can out of a feature service update.  It's your call whether the average period's delta transaction is big enough (many thousands of features?) to justify the extra storage cost of view source swap in return for guaranteed minimal downtime.

While I'm focusing here on minimizing downtime, not the run time of the whole job, if anyone is curious it takes 3-5 minutes to refresh the million points I'm dealing with.  I'm guessing the variability comes from server load at both the data's source and destination.

Acknowledgements:  I was inspired to write this post by my Esri colleague @sashal, who first explored this workflow and to whom I'm grateful, and by some prior art in a related workflow where file geodatabases are republished; see the first presentation here.
