Cache-busting Auto-Increment fields - Observations from Technical Support

Leo_Lerner · ‎08-29-2024

Pre-population

Hello Survey123 users! Checking in from technical support here, with an issue that we’ve seen increasing volumes of tickets for. Many users want to produce convenient, unique, identifiers for their Survey123 records, and have them pre-populate a form. This post outlines some preliminary things to consider, outlines a commonly requested and increasingly fashionable workflow, and explains how to circumvent the tendency of the pulldata() function to draw on cached information.

For a more detailed and perfectly reasonable explanation of why not to do what follows, please see Ismael’s blog post from 2017 on this topic, linked here: Creating 'Ticket Numbers' in Survey123 for ArcGIS.

Unique IDs

The app builds in a field to each submission called Global ID, which is a universally unique identifier (UUID). These are hexadecimals with 32 digits, grouped into five sections separated by hyphens. They are great for universal uniqueness, but not so great for humans reading the forms.

Take a look at these UUIDs, and time how long it takes to tell the difference between them:

550e8400-e29b-41d4-a716-446655440000
550e8400-e29b-41d4-a716-446655440001
550e8400-e28b-41d4-a716-446655440001
550e8400-e29b-42d4-a716-446645440001
550o8400-e29b-41d4-a716-446655440001

Unless you are fluent in hexadecimal, probably more time than its worth. Fieldworkers might not have the capacity to pore over the string of letters and numbers to check whether they are filling out the right form.

ObjectIDs

So, we need something user friendly. How about sequential numbers. Much quicker.

Survey123 calculates these too, in the first field of the results, called ObjectID. You might recognize these from any other ArcGIS product. They’re pretty handy for a few reasons:

Unique – within a single dataset, with each ID corresponding to a specific record
Automatic generation – assigned by ArcGIS when a new feature is created
Data integrity – ensure that each record is uniquely identified

But. There is limited control. ObjectIDs are managed by the system, and users are not able to control their generation or format. This can be restrictive in cases where custom IDs or specific ID formats are required for data management.

They are also only created once the feature is input – you can’t de facto have them pre-populate a form within Survey123. And there is a very good reason for this, explained again by Ismael nearly seven years ago:

ObjectIDs only get created when the record has been submitted to the feature service. This is a problem when the life cycle of the event starts even before there is a chance to add a record into the feature service. Think for example that a city employee wants to use Survey123 to record an infraction of the city's code. The person receiving the ticket must know what the Ticket Number is even if it has not yet been officially recorded (submitted) into the City's database (feature service). If the city employee happens to be working on a remote area with no connectivity, there will be no chances of waiting for ArcGIS to give back a Ticket Number...

So. What if you know all this, are aware of the dangers of auto-incrementation, but are still dead set on a pre-populated number in your form which goes up by one with every survey?

It’s possible.

Pulldata(@layer)

Let’s use pulldata(@layer) to retrieve a value. In this case, the feature layer which you are collecting records for. All we need to do is get a count of the number of records already in the layer – how about just adding up all the ObjectIDs using an excel-friendly statistic definition? That looks like this:

pulldata("@layer", "getValue", "attributes.TotalCount", concat("**INSERT REST URL OF FEATURE LAYER HERE**", '?outStatistics=[{"statisticType": "count","onStatisticField": "objectId","outStatisticFieldName": "TotalCount"}]'))

What is that doing?

pulldata("@layer", "getValue", "attributes.TotalCount", ...):

This function is used to retrieve data from a specified layer.

The parameters used here are:

@layer: Specifies that the data is being pulled from the current layer.
"getValue": Indicates that the function is retrieving a specific value.
"attributes.TotalCount": Specifies the attribute field from which the value is being retrieved, in this case, "TotalCount".

2. concat("INSERT REST URL OF FEATURE LAYER HERE", ...):

This function is used to concatenate (or combine) multiple strings together.

The parameters used are:

**“INSERT REST URL OF FEATURE LAYER HERE**: Put your URL here…
'?outStatistics=[{"statisticType": "count","onStatisticField": "objectId","outStatisticFieldName": "TotalCount"}]': This is a query string that defines the statistics to be calculated on the feature layer. It specifies that a count of the object IDs should be calculated and returned as "TotalCount".

In short, the formula is pulling data from the current layer in Survey123. It concatenates the REST URL of the feature layer with a string that calculates the count of values in the ObjectID column.

Sounds good so far. With this though, you’re just going to get the number of surveys submitted up to this point. For the prepopulated answer, you need to add one. Just create a new row in the .xls form, and put ${*Name of your total count field*}+1.

The layout for this workflow in an .xls form

AutoID Form.png

How it looks in the form - the feature class has 22 records, and the second question is the "+1"

It’s probably best to hide the first field and set the second one to read only (which will get rid of the blue Xes). Your use case may vary, but usually the simplest option here is going to be best.

Cached values

This is the bare bones of the good-enough way to get auto-incrementation. But there is a prevailing issue, which was at one point registered as a defect; BUG-000158969. The bug was closed for expected behaviour but heralds an important clue in its title: Calculation does not recognize that a new feature is submitted unless the ArcGIS Survey123 mobile app is fully closed and re-opened.

The public explanation is even more elucidatory: The pulldata("@layer") function intentionally caches query responses to optimize network requests and more efficiently work with the data.

Golly! Pulldata() uses caching?! That’s so efficient!

This will mess up our auto-incrementation. Instead of checking the most current total count of ObjectIDs, it will just use the last time it cared to pull data. Multiple users were calling in to Esri technical support remarking that this workflow was resulting in duplicate values in their pre-populated question.

We need to bust this cache.

The fix is straightforward: append the query to make it non-duplicative. That way, pulldata() cannot rely on using the most recent pull, and won’t be able to rely on now outdated values. Rather, if we append the string to add a timestamp, it will have to go and look up the time, every time.

The extra parameter listed in the bug documentation is ("layerUrl?t=", now()). This makes sense – now() requires the string to collect the time… right now! We’re going to force the pull to check the time whenever a new survey is brought up, and whilst its there, check the latest ObjectID counts too. Appending it to the string we have so far looks like this:

pulldata("@layer", "getValue", "attributes.totalCount", concat('**INSERT FEATURE URL HERE**?outStatistics=[{"statisticType": "count","onStatisticField": "objectid","outStatisticFieldName": "totalCount"}]&t=',now()))

Cache busted query.png

The cachebusting query in an .xls form

We now have a non-duplicative query. Including the timestamp prevents the caching mechanism from serving the same results for the formerly identical query. This tiny crumb of variability, which makes each pulldata(@layer) unique, is perfect for ensuring that our query fetches fresh, non-duplicative information from the dataset.

In sum

Note that the information returned is not guaranteed to be non-duplicative. We’re working with a count still, rather than any unique IDs, or even ObjectIDs themselves. If you took this to a mobile device, and the device lost connection before sending in the form, or anything similar, you’d start getting duplicate values. The count would reset once those features had been added to the layer, as their ObjectIDs would be added to the totalCount eventually, but until then, their pre-populated values would all be the same.

This workflow does work in some fairly specific cases and is becoming increasingly popular. It has distinct limitations which affect how suitable it may be, and we are seeing increasing numbers of support tickets accordingly. This should only really be done when you are sure that the survey will be recorded and uploaded immediately – i.e., when you will definitely have network connection. It breaks down as soon as you introduce any asynchronous collection, as soon as things start piling up in the outbox, or as soon as there are multiple users submitting surveys. Definitely consider all of that before putting this workflow into practice.

If you’ve used something like this, or have any ideas/improvements to this workflow, please do share your thoughts in the comments here!