Polling feature services for "Incremental Updates"

08-20-2015 03:25 PM
Esri Regular Contributor

Another recurring question:

I've configured a 'Poll an ArcGIS Server for Features' input to 'Get Incremental Updates'. Is there a way to prevent the input from polling all of the features in a feature class when GeoEvent Server is restarted?

The short answer is: No. When GeoEvent Server is restarted (or the server on which it is running is rebooted), inputs which use the out-of-the-box FeatureService transport to poll an ArcGIS Server map service (or feature service) lose the value they have cached, which is what enables them to query only for features which are "new" relative to the input's last poll.

The capability to 'Get Incremental Updates' is unique to the 'Poll an ArcGIS Server for Features' input connector and should not be confused with the 'Receive New Data Only' parameter, exposed by the HTTP transport, which requires that event data include the HTTP "Last-Modified" header. (Refer to comments in the thread Re: Receive RSS Inbound Connector.)

The issue we're exploring here deals only with the 'Poll an ArcGIS Server for Features' input connector -- or a custom input you may have configured which uses the FeatureService transport to poll an ArcGIS Server map / feature service.

An input configured to poll a map / feature service and retrieve only incremental feature updates maintains an in-memory cache. The value in this cache depends on whether your 'Method to Identify Incremental Updates' is ObjectID or Timestamp. In either case the input incorporates the largest value observed in its last poll into a WHERE clause, so that the next query returns only features whose OBJECTID (or date/time) is greater than that value.
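
To make this concrete, here is a minimal sketch of the kind of incremental query the input issues on each poll interval. The service URL, the requests library, and the use of OBJECTID as the key are assumptions for illustration only; the actual input builds its WHERE clause from whichever attribute and method you configure.

```python
# Minimal sketch (not GeoEvent code): poll a feature layer for records whose
# OBJECTID is greater than the value cached from the previous poll.
import requests

# Placeholder URL -- substitute your own feature layer's query endpoint.
SERVICE_QUERY_URL = "https://myserver/arcgis/rest/services/Incidents/FeatureServer/0/query"

def poll_incremental(last_seen_oid):
    """Return new features and the updated cache value for the next poll."""
    params = {
        "where": f"OBJECTID > {last_seen_oid}",  # cached value folded into the WHERE clause
        "outFields": "*",
        "f": "json",
    }
    features = requests.get(SERVICE_QUERY_URL, params=params).json().get("features", [])
    # The largest OBJECTID observed becomes the new in-memory cache value.
    new_last_seen = max(
        (f["attributes"]["OBJECTID"] for f in features),
        default=last_seen_oid,
    )
    return features, new_last_seen
```

Because the cached value lives only in process memory in this sketch, restarting the process loses it - which is exactly the behavior described in the next paragraph.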

If you stop the input, the cache is maintained, so when the input is restarted it will be able to poll for features whose specified attribute value is greater than the value in the cache. If you stop the ArcGIS GeoEvent Server Windows service, or reboot the server, the cache is destroyed and the input has no way of knowing which features were polled previously. The next poll conducted by the input will retrieve all of the items in the map / feature service.

This becomes painfully obvious when one of the notification outputs (e.g. 'Send a Text Message' or 'Send an Email') is included in a GeoEvent Service which polls a map / feature service for event data. When the GIS Server is rebooted, an e-mail recipient can potentially receive hundreds of messages if event data polled by the input satisfy filtering and/or processing criteria designed into a GeoEvent Service.

The motivation behind this behavior is that a cache persisted within a system file on disk could be difficult to find, might only be editable by a user with administrative credentials, and would unnecessarily involve file I/O in a potentially high-volume event processing scenario. Locating and deleting a system file-based cache was deemed more burdensome than requiring that GeoEvent Server outputs be stopped in order to prevent unwanted notifications from being sent. Basically, this behavior is by design.

As a best practice, if you find you are frequently restarting GeoEvent Server (or having to reboot your server), make sure to stop all notification outputs -- or any outputs you do not want processing events based on "old" features -- whenever GeoEvent Server is restarted.

You can also employ a strategy of writing notification messages to a secondary feature layer, rather than directly to a notification output. The secondary feature layer acts as a notification message cache. You could then design a second GeoEvent Service (or extend your original GeoEvent Service) to poll this "notification message cache" and, as event messages are sent to a notification output, use an 'Update a Feature' output to flag the notification message as having been sent. This will enable a filter to discard messages which have already been sent and avoid sending repeat notifications.
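
As an illustration of that pattern, here is a rough sketch of the two pieces: polling the "notification message cache" for unsent messages and flagging each message once it has been sent. In GeoEvent Server itself this is expressed with a filter and an 'Update a Feature' output rather than with code; the layer URL and the SENT field below are assumptions made up for the example.

```python
# Illustrative sketch only -- GeoEvent Server implements this with a filter
# (e.g. SENT = 0) and an 'Update a Feature' output, not with Python.
import json
import requests

# Placeholder URL for the secondary "notification message cache" layer.
CACHE_LAYER_URL = "https://myserver/arcgis/rest/services/NotificationCache/FeatureServer/0"

def fetch_unsent_notifications():
    """Poll the cache layer for notification messages that have not yet been sent."""
    params = {"where": "SENT = 0", "outFields": "*", "f": "json"}
    return requests.get(f"{CACHE_LAYER_URL}/query", params=params).json().get("features", [])

def mark_as_sent(object_id):
    """Flag a notification record so the SENT = 0 filter discards it on later polls."""
    updates = json.dumps([{"attributes": {"OBJECTID": object_id, "SENT": 1}}])
    requests.post(f"{CACHE_LAYER_URL}/applyEdits", data={"updates": updates, "f": "json"})
```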

If you have other approaches you have developed to deal with this particular behavior, your comments are welcome.

As always, I hope this information helps.

- RJ

10 Comments
Occasional Contributor

I found this blog article helpful and would like to share how this can be used in other ways, such as creating statistics to pass into the output. Read more on Grouping and Summarizing GeoEvents.

Best Regards,

K

Occasional Contributor

Has the GeoEvent team considered adding a raw table input type? While working through this exercise it became apparent that the notifications would be stored in a raw table format, not in a feature class table. Could a raw table inbound/outbound transport be used for future planned GeoEvent operations (e.g. caching, statistics, or storing metadata)?

Best Regards,

K

New Contributor II

For several weeks I’ve been experimenting with the Citizen Service Requests (CSR) template and GeoEvent Processor (GEP) to send email and SMS notifications whenever someone adds, updates, or comments on an incident.

To get this to work without re-sending ALL notifications whenever the GeoEvent Processor is rebooted, I added a FLAG integer field to a CSR Hosted Feature Service.

Then I slightly modified the Citizen Service Request Template to set the FLAG field to -1 whenever a new incident is added.

I have a GEP Service polling for new incidents. Once found, the feature is immediately evaluated for STATUS = Unassigned and FLAG = -1. The -1 is then recalculated to zero (0) using Update a Feature, and notifications are sent out.

I have two other GEP Services, one for Assigned and one for Closed, polling by TimeStamp.

If STATUS = Assigned and FLAG != 1, FLAG is recalculated to 1 using Update a Feature and appropriate Notifications are sent out.

If STATUS = Closed and FLAG != 2, FLAG is recalculated to 2 using Update a Feature and appropriate Notifications are sent out.

When GEP is restarted, NO incidents match these initial conditions, and NO notifications are sent out.
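
For readers following along, the FLAG workflow described above can be summarized as a simple state check. This is only a sketch of the logic; in practice it is implemented with GeoEvent filters and 'Update a Feature' outputs, and the STATUS values are those used by the Citizen Service Requests template.

```python
# Sketch of the FLAG-based notification logic described above (not GeoEvent code).
def next_flag(status, flag):
    """Return the new FLAG value if a notification should be sent, else None."""
    if status == "Unassigned" and flag == -1:
        return 0   # new incident: send notification, recalculate FLAG to 0
    if status == "Assigned" and flag != 1:
        return 1   # newly assigned: send notification, recalculate FLAG to 1
    if status == "Closed" and flag != 2:
        return 2   # newly closed: send notification, recalculate FLAG to 2
    return None    # already processed -- no notification (e.g. after a restart)
```

After a restart, re-polled features already carry the recalculated FLAG values, so none of the conditions match and no duplicate notifications go out.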

Esri Regular Contributor

Hey K -

What I think you're asking for is an ODBC / JDBC connector capable of retrieving data directly from an RDBMS table. No, there are no plans to offer such a connector. Connectors such as these would be database-specific, and we are working to keep GeoEvent database-agnostic.

The existing GeoEvent inbound connector, which polls an Esri feature/map service to retrieve data from a feature layer, uses the feature service as an interface to the database. There are a couple of alternatives described in the threads cross-linked below.

- RJ

See Also:

Direct connection to RDBMS

GeoEvent process Oracle/SQL connector

Occasional Contributor II

So we have been using 'Poll an ArcGIS Server for Features' to retrieve existing features. The main service has 1.2 million records, and we are finding that once we process around 250K to 300K records things start really bogging down. We can see java.exe running at around 5 GB. We can still poll the newest and latest features; just trying to get the historical data has proven difficult. The question is: does this feature service just have too many records?

Another question: we are doing things this way so we can circumvent the 1,000-record limit. If we could put a dynamic time in the query, we would be able to use the Poll an External Website for JSON input instead.
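
For what it's worth, a "dynamic time in the query" might look something like the sketch below - a WHERE clause recomputed from the current time on each poll. The field name and interval are assumptions, and this is not a capability of the out-of-the-box connector.

```python
# Hypothetical sketch: build a time-windowed WHERE clause each poll cycle.
from datetime import datetime, timedelta, timezone

def build_where(poll_interval_minutes=5):
    """Return a WHERE clause covering edits made within the last poll interval."""
    since = datetime.now(timezone.utc) - timedelta(minutes=poll_interval_minutes)
    # ArcGIS feature services accept TIMESTAMP 'yyyy-mm-dd hh:mm:ss' in WHERE clauses.
    return f"last_edited_date > TIMESTAMP '{since:%Y-%m-%d %H:%M:%S}'"
```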

Occasional Contributor

Brian, do you need all features to display on your map? We have found that summarizing the data into smaller datasets and publishing a new feature service to ArcGIS Server greatly helps to convey the information we are interested in. We have also set zoom limits for viewing the detailed data points, so we do not query the full (in our case 65 million) features all at once. Adding filters, as you suggest, is a good way to limit the number of returned features. Hope this helps.

-K

Occasional Contributor

Hi RJ Sunderman and GeoEvent Team, I am experiencing some very odd behaviour with GeoEvent Extension 10.4.1 Patch 1. I have polled a REST service for JSON and stored the results in a hosted feature service in Portal. I have 1,168 features. Once I have polled the service once, I turn off the input, service, and output so no more data is flowing through. That dataset is now, to all intents and purposes, static.

I have then configured another input that polls the first hosted feature service for features. This is set to incremental updates using the updated_time field of the feature service it is polling, and it runs every 10 seconds. After I start this input, it polls all the features, as you would expect, so it can build its cache of datetimes to look for new features. However, from this point on, every 10 seconds it collects exactly 14 features. Even though this data is static and unchanging, it keeps grabbing 14 features, and for the life of me I don't know why. Any ideas?

New Contributor II

I have slightly modified this procedure so that the initial FLAG field value defaults to -1. Consequently, there is no need to modify the Citizen Service Request Template to set the FLAG field to -1, as described above.

Occasional Contributor II

RJ,

Has there been any advancement in developing a solution/fix within GeoEvent to resolve this specific issue, or do we still need to employ the "workaround" options listed in the posts above?

Thanks

Sorry, no, there have not been any changes made to the Get Incremental Updates capability for the Poll an ArcGIS Server for Features input. The input still relies on a key (either a date or an ObjectID) that it maintains in memory and uses when querying the feature service for a set of feature records. So if GeoEvent Server is restarted, the key is lost, the input polls, receives all feature records, and establishes a new key value from the feature set it receives.

There hasn't been a lot of interest in changing this behavior until just recently. Several customers have contacted the product team asking if the input can at least persist its key once a minute to disk or to the ZooKeeper configuration store and then - if/when the input finds that it has lost its key - retrieve the last-known value rather than polling all feature records and deciding on a new key. We're re-evaluating whether we can implement this behavior for the 10.7 release.
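
To illustrate what that request amounts to, here is a minimal sketch of persisting the incremental-update key (a watermark) and recovering it after a restart. This is a hypothetical illustration of the requested behavior, not something GeoEvent Server does today; the file name is a placeholder.

```python
# Hypothetical sketch only -- GeoEvent Server does not currently persist this key.
import json
import os
import time

WATERMARK_FILE = "incremental_key.json"   # placeholder path

def save_key(key_value):
    """Persist the last-known key (ObjectID or timestamp), e.g. once a minute."""
    with open(WATERMARK_FILE, "w") as f:
        json.dump({"key": key_value, "saved_at": time.time()}, f)

def load_key(default=None):
    """Recover the last-known key after a restart, if one was persisted."""
    if os.path.exists(WATERMARK_FILE):
        with open(WATERMARK_FILE) as f:
            return json.load(f).get("key", default)
    return default
```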

In the meantime, please see my comments to Dale, below, logged in the Re: Question regarding "Incremental Update" workarounds, custom components? thread.

- RJ

New Contributor II

If I'm understanding the workarounds correctly, I don't think they'll work for us. We're not handling streams of real-time data, but are trying to respond to changes in the data sources - "Somebody edited a field, which changed the data contents of this feature service". We are potentially looking at millions of records - even retrieving these from the feature services so that we can apply the workarounds will take a large amount of time.

As a way around this, so that GeoEvent Server remembers the 'last polled' datetime between service restarts, would you recommend that we write a custom component that serializes the datetime?

If so - which component? A custom transport component?

Thank you.

Reply posted to Question regarding "Incremental Update" workarounds, custom components?