Change is certain - unless it's fake!

BruceHarold | Esri Regular Contributor | 09-01-2022

My first flight since the pandemic hit was a trip to Vancouver, Canada last week, to attend our partner Safe Software's 2022 FME User Conference, where the no-code integration & ETL community could connect and learn.  I took a question at the event that prompted this blog - why does my change detection workspace think every feature has changed when I know only a few features really changed?

In case you're new to this, ArcGIS Data Interoperability inherits a very powerful function from FME, namely change detection, embodied in the ChangeDetector transformer, or less frequently, the Matcher transformer.  Change detection is usually done between a source dataset and a derived dataset that is maintained as a mirror on a schedule, and the ability to write only data changes (inserts, updates, deletes) without downtime is a great advantage, especially for targets like feature services, which can remain in use during maintenance.
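
If you're picturing what change detection does conceptually, here is a minimal plain-Python sketch (keyed on a hypothetical feature ID, and nothing to do with the transformer's internals): features only in the source become inserts, features only in the mirror become deletes, and shared features with differing values become updates.

# Minimal sketch of change detection between a source and its mirror,
# keyed on a hypothetical feature ID
source = {1: {"name": "Oak Fire", "acres": 120}, 3: {"name": "Rum Creek", "acres": 900}}
mirror = {1: {"name": "Oak Fire", "acres": 80}, 2: {"name": "Old Fire", "acres": 15}}

inserts = [source[k] for k in source.keys() - mirror.keys()]
deletes = [mirror[k] for k in mirror.keys() - source.keys()]
updates = [source[k] for k in source.keys() & mirror.keys() if source[k] != mirror[k]]

print(len(inserts), len(updates), len(deletes))  # 1 1 1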

This blog is about avoiding surprises when performing change detection.

Back to my story: I had a ticketing problem that caused a delay at the gate for my northbound flight; the airline offered me a seat upgrade, so I chose a port-side window seat to enjoy the view and do some ground truthing on some data I was working on, namely fire locations in California.  My flight was in daytime, but here is a dark-theme map of fire locations during my trip.

Fires in California

 

There are a few places to get near real-time fire data; the source isn't relevant to the topic and what I relate isn't tied to any data format.  I happen to be reading JSON and maintaining a hosted feature service.  Let's tackle the #1 cause of fake change detection - geometry storage and retrieval.

First I will illustrate the issue.  The blog attachment contains a CSV file with a single row.  Add the CSV table to a map, then use the table context menu's Display XY Data tool to create a layer from the XY fields, using the WGS84 coordinate system.  Throw a little Python at the resulting layer and you will see what happens to the XY coordinates used to create the layer:

import arcpy  # required when not already running in the ArcGIS Pro Python window

# Read every row from the layer created by Display XY Data
with arcpy.da.SearchCursor('Fires_XYTableToPoint', "*") as cursor:
    for row in cursor:
        pass
# The last row read - the stored shape no longer matches the input XY fields exactly
row
(1, (-120.70132999999998, 39.331022000000075), 1169, -120.70133, 39.331022)

So, what went in as -120.70133, 39.331022 was stored as -120.70132999999998, 39.331022000000075.  The data isn't being randomly shifted; it's just an artefact of how ArcGIS manages coordinates when it has to make some assumptions, and it has no practical implications - except in our change detection case, because unless you say otherwise, coordinate changes will be detected with total strictness.
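
To see why total strictness bites, here is a tiny illustration in plain Python (not FME) comparing the stored X value with the input value, first exactly and then with the tolerance I use below:

import math

x_in, x_stored = -120.70133, -120.70132999999998

print(x_in == x_stored)                            # False - exact comparison reports a "change"
print(math.isclose(x_in, x_stored, abs_tol=1e-6))  # True - within a 0.000001 tolerance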

I'm maintaining a mirror of the source fire event JSON feed as a feature service and I have a Spatial ETL workspace built to do it.  I'm reading the JSON on a schedule and writing only changes to the target feature service; the process uses a ChangeDetector.  Below is a screenshot of my ChangeDetector properties - take note of the geometry handling.  I am checking for 2D differences; my data are points, so lenient point order isn't relevant; I am not checking coordinate system names (note that identical coordinate systems on the inputs using different names will cause "fake change"); but most importantly, I use a vector geometry tolerance of 0.000001.  If I used the default of zero, any difference right down to subatomic particle size would be detected as a change, and we have seen above how that kind of number change can be introduced.

Change Detector

 

How did I arrive at a vector tolerance of 0.000001?  My "personal defaults" for what counts as a real geometry change are 3 decimal places for projected data and 6 decimal places for geographic data.  You may work to tighter tolerances, but I have never seen such precision in production use.  My data are in WGS84 geographic coordinates, so I use a 6 decimal place tolerance.  If you prefer to make this aspect more visible in a workspace you can use a CoordinateRounder in each input data stream with the same tolerances and use a vector tolerance of 0 in the ChangeDetector.
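
For intuition on the CoordinateRounder approach, rounding both sides to the same precision before comparing makes a strict comparison harmless; here is a minimal Python sketch of the idea (the helper function is mine, not a CoordinateRounder API):

def rounded_xy(x, y, places=6):
    """Round a coordinate pair, e.g. to 6 decimal places for geographic data."""
    return (round(x, places), round(y, places))

# The stored and original coordinates agree once both are rounded to 6 places
print(rounded_xy(-120.70132999999998, 39.331022000000075) ==
      rounded_xy(-120.70133, 39.331022))  # True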

So that lets you avoid the #1 change detection pitfall, fake geometry "changes".  Are there similar precision-based pitfalls out there?  Yes!  You'll see in my ChangeDetector that I have set a name for the Generate Detailed Changes List parameter.  This causes the transformer to output a list that exposes the actual data values driving each change.  I have a field "date_created" in my data, and I notice every feature seems to change, but only at the microsecond level - definitely fake change.  Here is a log excerpt:

Attribute(string: UTF-8) : `delta{1}.action' has value `modified'
Attribute(string: UTF-8) : `delta{1}.attributeName' has value `date_created'
Attribute(string: UTF-8) : `delta{1}.originalValue' has value `20220816155935.732000'
Attribute(string: UTF-8) : `delta{1}.revisedValue' has value  `20220816155935.732935'

I have no idea why this timestamp field is moving around, and I have no way to find out.  You'll find yourself in this situation occasionally too, so just figure out a way to work around it.

The simplest approach is to drop the fractional seconds from each date_created value.  It may be tempting to round the value instead, but datetimes are tricky things; they are not always valid if rounded up (think of 59.999999 seconds rounding up to 60 seconds, which should really be 0 seconds in the next minute).

So I chop off the fractional seconds with a StringReplacer defined as below in both input streams to my ChangeDetector:

Remove fractional seconds

 

Sorry to throw regex at you but sometimes it's necessary.
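
If it helps to see the same idea outside the transformer, here is a minimal Python sketch of truncating fractional seconds from an FME-style datetime string; the regular expression is illustrative, not my exact StringReplacer settings:

import re

# Drop a trailing ".fraction" from FME-style datetimes (YYYYMMDDhhmmss.ffffff)
strip_fraction = re.compile(r"\.\d+$")

print(strip_fraction.sub("", "20220816155935.732000"))  # 20220816155935
print(strip_fraction.sub("", "20220816155935.732935"))  # 20220816155935 - both sides now match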

So that was another precision-based issue we avoided!

There are other things to remember when performing change detection.  The detailed change list is your friend; it will help you track down what is really changing.  Be sparing with the fields you use to detect changes, too: accidentally leaving in format attributes, for example, will cause bulk change to be detected when in fact the changes aren't really in your data.

Once you have mastered change detection you'll be able to author very efficient workspaces; for example, here is mine finding real changes in the fire data and writing only the updates.  You may spot that all features changed in this run (29 out of 29), which is a hint that I may have more work to do...

Fire feature service maintenance with change detection

One more key point.  The ChangeDetector transformer outputs a format attribute, fme_db_operation, with values of INSERT, UPDATE or DELETE; when writing to feature services or databases, this row-level attribute can be used by the writer to determine feature handling.  Set this behavior with the Feature Operation property in the writer - you don't have to have multiple writers, one for each mode.

Using fme_db_operation
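
If you were applying the same edits yourself with the ArcGIS API for Python instead of letting the writer do it, the equivalent idea is routing features by operation into a single edit_features call.  Here is a hedged sketch; the layer URL and feature payloads are placeholders, not my actual service:

from arcgis.gis import GIS
from arcgis.features import FeatureLayer

# Placeholder URL - in the Spatial ETL workspace the feature service writer does this routing for you
gis = GIS("home")
fires = FeatureLayer("https://services.arcgis.com/<org>/arcgis/rest/services/Fires/FeatureServer/0", gis)

adds = [{"attributes": {"name": "Oak Fire"},
         "geometry": {"x": -120.70133, "y": 39.331022,
                      "spatialReference": {"wkid": 4326}}}]       # features tagged INSERT
updates = [{"attributes": {"objectid": 7, "name": "Rum Creek"}}]  # features tagged UPDATE
deletes = "12,13"                                                 # object IDs of features tagged DELETE

result = fires.edit_features(adds=adds, updates=updates, deletes=deletes)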

 

Don't put up with fake change!