Polling feature services for "Incremental Updates" (Updated August 2023)

08-20-2015 03:25 PM
RJSunderman
Esri Regular Contributor

I've configured a 'Poll an ArcGIS Server for Features' input to 'Get Incremental Updates'. Is there a way to prevent the input from polling all of the features in a feature class when GeoEvent Server is restarted?

(Updated August 2023) 

The short answer is: No. When GeoEvent Server's services are restarted (or the server on which GeoEvent Server is running is rebooted), inputs which poll an ArcGIS Server map/feature service for feature records lose a key value, maintained in memory, which the input uses to determine which feature records are new or recently edited. Following a system restart the input must retrieve a complete feature record set from the source map/feature service in order to iterate through the data records, find the greatest object identifier or date/time value, and cache this value for use when making the next query. This means that feature records ingested, adapted, and processed previously will be processed a second time.
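To make the failure mode concrete, here is a minimal Python sketch, assuming an input that polls incrementally on OBJECTID and keeps its key only in memory. The class name `IncrementalPoller` and the record shape are hypothetical; this models the behavior described above, not GeoEvent Server's actual implementation.

```python
class IncrementalPoller:
    """Hypothetical model of an input that polls incrementally on OBJECTID.

    The cached key lives only in memory, so a new instance (standing in for a
    restart or a reconfigured input) starts with no key at all.
    """

    def __init__(self):
        self.last_key = None  # greatest OBJECTID seen so far

    def poll(self, records):
        if self.last_key is None:
            # No key: take the complete record set.
            batch = list(records)
        else:
            # With a key: take only records newer than the cached value.
            batch = [r for r in records if r["OBJECTID"] > self.last_key]
        if batch:
            self.last_key = max(r["OBJECTID"] for r in batch)
        return batch


features = [{"OBJECTID": i} for i in (1, 2, 3)]
poller = IncrementalPoller()
first = poller.poll(features)       # no key yet: all 3 records
features.append({"OBJECTID": 4})
second = poller.poll(features)      # key is 3: only OBJECTID 4
restarted = IncrementalPoller()     # "restart": the in-memory key is gone
third = restarted.poll(features)    # all 4 records, including 3 already processed
print(len(first), len(second), len(third))  # 3 1 4
```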

This key value can also be lost simply by editing the input's configuration. Suppose that an existing Poll an ArcGIS Server for Features input were modified, or deleted and replaced with a new input. Either action may cause the in-memory key used to poll for feature records incrementally to be lost, resulting in previously processed feature records being ingested, adapted, and processed a second time.

When is this likely to be a problem?

The capability to 'Get Incremental Updates' is unique to the Poll an ArcGIS Server for Features inbound connector. Do not confuse this input's incremental polling capability with the 'Receive New Data Only' parameter available when configuring, for example, a Poll an External Website for JSON input. The latter polls an external server's web services (vs. an ArcGIS Server's map/feature services) and relies on the external web service to include a specific HTTP header (Last-Modified) in its response. (More information on this is available in comments in the thread Re: Receive RSS Inbound Connector.)
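For contrast, the Last-Modified mechanism can be sketched as below. This is a minimal Python illustration, assuming the client simply remembers the last Last-Modified value it accepted and skips responses that are not newer. The class name, and the choice to accept responses that lack the header, are assumptions of this sketch, not documented GeoEvent Server behavior.

```python
from email.utils import parsedate_to_datetime


class LastModifiedGate:
    """Hypothetical helper: accept a response only if Last-Modified advanced."""

    def __init__(self):
        self.last_seen = None  # datetime of the last accepted response

    def is_new(self, headers):
        # Plain dict lookup, so the header name must match exactly here.
        value = headers.get("Last-Modified")
        if value is None:
            # Assumption of this sketch: no header means nothing to compare.
            return True
        stamp = parsedate_to_datetime(value)
        if self.last_seen is not None and stamp <= self.last_seen:
            return False
        self.last_seen = stamp
        return True


gate = LastModifiedGate()
earlier = {"Last-Modified": "Wed, 21 Oct 2015 07:28:00 GMT"}
later = {"Last-Modified": "Wed, 21 Oct 2015 08:00:00 GMT"}
results = [gate.is_new(earlier), gate.is_new(earlier), gate.is_new(later)]
print(results)  # [True, False, True]
```

The point of the contrast: this scheme only works if the external server sends the header, whereas the feature service input derives its key from the feature records themselves.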

The issue we are exploring here deals only with the Poll an ArcGIS Server for Features input, or potentially a custom input you develop using the GeoEvent Server SDK which uses the FeatureService transport to poll an ArcGIS Server map/feature service. The input's unique ability to poll only for newly added or recently updated feature records can be useful when you do not want to ingest, adapt, and process feature records which have been processed previously and you do not want to have to delete previously processed feature records from the data source. This capability is limited, however. The ability to poll incrementally is not resilient when it comes to service restart or server machine reboot.

Not all real-time event record processing solutions you configure will exhibit a problem if a server machine is rebooted or edits to an input obliterate a key value cached in memory. Solution architects must recognize, though, that when an input loses its cached key, all available feature records from a feature service must be polled and processed a second time. This duplicative event record processing might not be a problem for a solution configured to update feature records using data from processed event records. Such a solution is simply going to update the target feature records with data already held in those feature records’ attributes.

A solution which sends e-mail notifications, on the other hand, is different. Suppose a GeoEvent Server input is configured to poll only newly added (or recently updated) feature records and the server machine is rebooted. The input will ingest and adapt the feature service’s complete feature record set a second time, and re-processing some number of dozen (or hundred, or thousand) feature records would generate duplicate e-mail notifications for every data record that was re-processed. Sending duplicate e-mail notifications every time a server machine is rebooted is obviously not ideal.

Are system reboots the only time polling incrementally is likely to be a problem?

No. The Poll an ArcGIS Server for Features input is also vulnerable when using object identifiers to determine which feature records have been recently added. A solution configured to poll only newly added feature records based on a database feature record’s object identifier can fail when ArcGIS Server invokes a mitigation strategy intended to support concurrent editing.

In order to support multiple concurrent editors, ArcGIS Server assigns each editor a different block of object identifiers. One editor might create feature records with object identifiers in the range 1, 2, 3, ... 100. A second editor will be assigned a different range of identifiers, allowing feature records to be created with object identifiers 401, 402, 403, and so on. A third concurrent editor will be allowed to create feature records with object identifiers 801, 802, 803. Assigning blocks this way avoids race conditions in which each editor asks what the next available OBJECTID is and proceeds to create a feature record with an identifier (potentially) being used by another editor concurrently.

If you configure a Poll an ArcGIS Server for Features input to poll incrementally based on OBJECTID, and the value ‘803’ from the third editor is cached as the key to use when determining newly added feature records, it is possible that GeoEvent Server’s input will never poll feature records created by the first or second editors, whose assigned object identifier ranges are less than the cached key.
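The following Python sketch reproduces the scenario with the identifier blocks from the example above, assuming the incremental query behaves like `OBJECTID > cached_key`. The function name and the specific OBJECTID values are illustrative only.

```python
def poll_after(object_ids, cached_key):
    """Model an incremental query of the form OBJECTID > cached_key."""
    return [oid for oid in object_ids if oid > cached_key]


# Illustrative blocks: editor 1 uses 1-100, editor 2 uses 401+, editor 3 uses 801+.
existing = [1, 2, 401, 402, 801, 802, 803]
cached_key = max(existing)  # 803, taken from the third editor's block

# All three editors keep working inside their own blocks.
new_edits = [3, 403, 804]
polled = poll_after(new_edits, cached_key)
missed = [oid for oid in new_edits if oid not in polled]
print(polled)  # [804]
print(missed)  # [3, 403]
```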

Recommended Approach

When a solution needs to be more resilient to system restart, or to operate independently of feature record object identifier values, the recommended approach, rather than configuring a Poll an ArcGIS Server for Features input to conduct incremental polling, is to write data into the feature records being polled which marks individual feature records as having been processed.

This way, regardless of whether or when GeoEvent Server’s services are restarted, its server machine is rebooted, or edits are made to feature records by concurrent editors, the history of which feature records have been processed is stored in the geodatabase.

What you want to do, essentially, is add an attribute field named something like hasBeenProcessed to your map/feature service’s schema. Configure the feature service to write a default value ‘0’ into this field when new feature records are created. Then, as part of a GeoEvent Service, configure your event record processing to overwrite the hasBeenProcessed attribute field’s value with a ‘1’, marking it as a feature record you do not want a GeoEvent Server input to include in a future poll. To exclude these marked records, change the input’s Query Definition from its default 1=1 to hasBeenProcessed < 1. If you ever find that you want to re-process a feature record, perhaps because some important attributes of the data record have been updated or changed, just make sure the feature editing workflow returns the hasBeenProcessed attribute to its initial/default value ‘0’; GeoEvent Server’s input will then automatically include that feature record in its next poll.
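Here is a minimal Python sketch of the flag-based approach, with an in-memory list standing in for the feature service. The `where`, `outFields`, and `f` keys are standard parameters of the ArcGIS REST API query operation; the helper function names and the `notify` callback are hypothetical. Because the processed state is written back to the records themselves, running the same poll again (as after a restart) finds nothing new.

```python
FLAG_FIELD = "hasBeenProcessed"  # field name suggested in the text


def query_params():
    """Query parameters equivalent to the input's Query Definition."""
    return {"where": f"{FLAG_FIELD} < 1", "outFields": "*", "f": "json"}


def poll_unprocessed(features):
    """Local stand-in for the query: records whose flag is still below 1."""
    return [f for f in features if f[FLAG_FIELD] < 1]


def process(features, notify):
    for feature in poll_unprocessed(features):
        notify(feature)          # e.g. send an e-mail notification
        feature[FLAG_FIELD] = 1  # write the mark back to the feature record


features = [{"OBJECTID": 1, FLAG_FIELD: 0}, {"OBJECTID": 2, FLAG_FIELD: 0}]
sent = []
process(features, lambda f: sent.append(f["OBJECTID"]))
process(features, lambda f: sent.append(f["OBJECTID"]))  # e.g. after a restart
features[0][FLAG_FIELD] = 0  # an editor resets the flag to request re-processing
process(features, lambda f: sent.append(f["OBJECTID"]))
print(query_params()["where"])  # hasBeenProcessed < 1
print(sent)                     # [1, 2, 1]
```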

- - -
If you have other approaches you have developed to deal with this particular behavior, your comments are welcome. Please also consider information and user comments in these other threads:


As always, I hope this information helps.
- RJ

10 Comments
KDeVogelaere
Occasional Contributor

I found this blog article helpful and would like to share how this can be used in other ways such as creating statistics to pass into the output.  Read more on Grouping and Summarizing GeoEvents

Best Regards,

K

KDeVogelaere
Occasional Contributor

Has the GeoEvent team considered adding a raw table input type? While working through this exercise it became apparent the notifications would be stored in a raw table format, not in a feature class table. Could a raw table inbound/outbound transport be used for future planned GeoEvent operations (e.g. caching, statistics, or storing metadata)?

Best Regards,

K

TimHensley
New Contributor II

For several weeks I’ve been experimenting with the Citizen Service Requests (CSR) template and GeoEvent Processor (GEP) to send email and SMS notifications whenever someone adds, updates or comments on an incident.

To get this to work without re-sending ALL notifications whenever the GeoEvent Processor is rebooted, I added a FLAG integer field to a CSR Hosted Feature Service.

Then I slightly modified the Citizen Service Request Template to add -1 to the FLAG field whenever a new incident is added.

I have a GEP Service polling for new incidents. Once found, the feature is immediately evaluated for STATUS = Unassigned and FLAG = -1. The -1 is then recalculated to zero (0) using Update a Feature and notifications are sent out.

I have two other GEP Services, one for Assigned and one for Closed, Polling for TimeStamp.

If STATUS = Assigned and FLAG != 1, FLAG is recalculated to 1 using Update a Feature and appropriate Notifications are sent out.

If STATUS = Closed and FLAG != 2, FLAG is recalculated to 2 using Update a Feature and appropriate Notifications are sent out.

When GEP is restarted, NO incidents match these initial conditions, and NO notifications are sent out.
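The FLAG rules in this comment amount to a small state machine. Below is a minimal Python sketch, assuming the STATUS values and FLAG codes exactly as described; the three GEP Services are collapsed into one hypothetical function for illustration, and this is not a GeoEvent Processor configuration.

```python
def next_flag(status, flag):
    """Return (new FLAG value, whether to send notifications).

    FLAG codes as described: -1 new, 0 'Unassigned' notice sent,
    1 'Assigned' notice sent, 2 'Closed' notice sent.
    """
    if status == "Unassigned" and flag == -1:
        return 0, True
    if status == "Assigned" and flag != 1:
        return 1, True
    if status == "Closed" and flag != 2:
        return 2, True
    return flag, False


def evaluate(incident):
    """Apply one poll's worth of rules to an incident and report notification."""
    incident["FLAG"], notify = next_flag(incident["STATUS"], incident["FLAG"])
    return notify


incident = {"STATUS": "Unassigned", "FLAG": -1}
log = [evaluate(incident), evaluate(incident)]  # second call ~ a restart re-poll
incident["STATUS"] = "Assigned"
log += [evaluate(incident), evaluate(incident)]
incident["STATUS"] = "Closed"
log.append(evaluate(incident))
print(log)  # [True, False, True, False, True]
```

Re-polling an incident whose FLAG already matches its STATUS yields no notification, which is why a restart sends nothing.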

RJSunderman
Esri Regular Contributor

Hey K -

What I think you're asking for is an ODBC/JDBC connector capable of retrieving data directly from an RDBMS table. No, there are no plans to offer such a connector. Connectors such as these would be database specific, and we are working to keep GeoEvent database agnostic.

GeoEvent's existing inbound connector, which polls an Esri feature/map service to retrieve data from a feature layer, uses the feature service as an interface to the database. There are a couple of alternatives described in the threads cross-linked below.

- RJ

See Also:

Direct connection to RDBMS

GeoEvent process Oracle/SQL connector

BrianLocke
Occasional Contributor II

So we have been using Poll an ArcGIS Server for existing features. The main service has 1.2 million records, and we are finding that once we process around 250K to 300K, things start really bogging down. We can see Java.exe using around 5 GB. We still poll the newest and latest features; just trying to get the historical records has proven difficult. The question is: does this feature service just have too many records?

Another question: we are doing things this way so we can circumvent their 1000-record limit. If we could put a dynamic time in the query, we would be able to use the Poll an External Website for JSON input.

KDeVogelaere
Occasional Contributor

Brian, do you need all features to display on your map?  We have found by summarizing the data into smaller datasets and publishing a new FeatureService to ArcGIS Server this greatly helps to convey the info we are interested in.  We have also set zoom limits for viewing the detailed data points, so we do not query the full (in our case 65 million) features all at once.  Adding filters as you suggest, is a good way to limit the number of returned features. Hope this helps.

-K

MarcGraham1
Occasional Contributor

Hi RJ Sunderman and GeoEvent Team, I am experiencing some very odd behaviour with GeoEvent Extension 10.4.1 Patch 1. I have polled a REST service for JSON and stored it in a hosted feature service in Portal. I have 1168 features. Once I have polled the service once, I turn off the input, service, and output so no more data is flowing through. That dataset is now, to all intents and purposes, static. I have then configured another input that polls the first hosted feature service for features. This is set to incremental updates using the updated_time field of the feature service it is polling, and it runs every 10 seconds. After I start this input, it polls all the features, as you would expect, so it can build its cache of datetimes to look for new features. However, from this point on, every 10 seconds it collects exactly 14 features. Even though this data is static and unchanging, it keeps grabbing 14 features, and for the life of me I don't know why. Any ideas?

TimHensley
New Contributor II

I have slightly modified this procedure so that the initial FLAG field value defaults to -1.  Consequently, there is no need to modify the Citizen Service Request Template to add -1 to the FLAG field, as described above.

RobertMarros
Occasional Contributor II

RJ,

Has there been any advancement in developing a solution/fix within GeoEvent to resolve this specific issue or do we still need to employ the "work around" options listed in the posts above?

Thanks

Sorry, no, there have not been any changes made to the Get Incremental Updates capability for the Poll an ArcGIS Server for Features input. The input still relies on a key (either a date or objectid) that it maintains in memory and uses when querying the feature service for a set of feature records. So if GeoEvent Server is restarted the key is lost, the input polls, receives all feature records and establishes a new key value from the feature set it receives.

There hasn't been a lot of interest in changing this behavior until just recently. Several customers have contacted the product team asking if the input can at least persist its key once a minute to disk or to the ZooKeeper configuration store and then, if/when the input finds that it has lost its key, retrieve the last-known value rather than polling all feature records and deciding on a new key. We're re-evaluating whether we can implement this behavior for the 10.7 release.
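The behavior customers asked for, persisting the key so it survives a restart, can be sketched as follows. This is a minimal Python illustration under stated assumptions: a local JSON file stands in for disk or the ZooKeeper configuration store, the class name and file layout are hypothetical, and the caller is assumed to call `save()` periodically (for example once a minute). It is not how GeoEvent Server currently behaves.

```python
import json
import os
import tempfile


class CheckpointedKey:
    """Hypothetical: an incremental-polling key that is also saved to a file."""

    def __init__(self, path):
        self.path = path
        self.key = self._load()

    def _load(self):
        try:
            with open(self.path) as fh:
                return json.load(fh)["last_key"]
        except (FileNotFoundError, KeyError, json.JSONDecodeError):
            return None  # no usable checkpoint: fall back to a full poll

    def save(self):
        # Write to a temporary file, then atomically replace the checkpoint,
        # so an interrupted write never leaves a truncated file behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as fh:
            json.dump({"last_key": self.key}, fh)
        os.replace(tmp, self.path)


with tempfile.TemporaryDirectory() as work_dir:
    path = os.path.join(work_dir, "last_key.json")
    before = CheckpointedKey(path)
    first_load = before.key        # None: nothing saved yet
    before.key = 803               # key established by an earlier poll
    before.save()                  # e.g. called once a minute by a timer
    after = CheckpointedKey(path)  # "restart": load the last-known key
    restored = after.key
print(first_load, restored)  # None 803
```

Any key saved between checkpoints can still be lost, so a restart may re-process at most the records seen since the last `save()`, rather than the whole feature set.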

In the meantime, please see my comments to Dale, below, logged in the Re: Question regarding "Incremental Update" workarounds, custom components? thread.

- RJ

daleward
New Contributor II

If I'm understanding the workarounds correctly, I don't think they'll work for us. We're not handling streams of real-time data, but are trying to respond to changes in the data sources: "Somebody edited a field, which changed the contents of this FeatureData Service". We are potentially looking at millions of records; even retrieving these from the feature data services so that we can apply the workarounds will take a large amount of time.

As a way around this, so that the GeoEvent Service remembers the 'last polled' datetime between service restarts, would you recommend that we write a custom component that serializes the datetime?

If so - which component? A custom transport component?

Thank you.

Reply posted to Question regarding "Incremental Update" workarounds, custom components?