Using a GeoTagger Processor with more than 1000 unique track identifiers

ChrisBeyett · 10-25-2016

When processing event records that include a large number of unique track identifiers, you might notice that some spatial relationships evaluated by GeoEvent filters and processors do not behave as expected. We will focus on the “Enter Any” and “Exit Any” spatial operators as they apply to a GeoTagger Processor when more than 1000 unique TRACK_ID values are present. I will explain in detail the behavior reported to me by a few users and a product configuration change you can make to better accommodate your data.

Consider, for example, a GeoTagger Processor configured to enrich event records with the name of a geofence. The processor will evaluate a set of geofences and add a new field with the name of a geofence to an event record whenever the event’s geometry enters or exits an area. As you observe the output from a GeoEvent Service, however, you notice that events appear to be dropped at random by the processor. Events that you observed several minutes ago are either severely delayed or never appear in the output at all. While there may be other reasons for your observations, we’ll assume that the GeoTagger Processor is the root cause.

"Enter Any" and "Exit Any" Spatial Operators maintain state

The majority of spatial operators do not require GeoEvent to maintain state information. The “Enter Any” and “Exit Any” operators are the exception: they require GeoEvent to track both the geometry and the track identifier of events as they move through the processor. Maintaining state requires prior knowledge of each event’s location; each event is evaluated relative to the location previously observed for the same track identifier. Without state, the previous positions of events remain unknown to GeoEvent and every event is treated independently. This means that a GeoTagger Processor configured with the “Enter Any” or “Exit Any” operator must maintain a cache, keyed by unique track identifier, for the events it observes. In short, this is what is known as a cache‑aware processor node.

GeoEvent enforces a maximum cache size so that state can be maintained as efficiently as possible; the default value for this property is 1000 events. When an event arrives at a cache-aware node and the event’s TRACK_ID is not in the cache, one of the previously observed TRACK_ID values must be discarded if the cache is full. This makes room for the newly observed event while respecting the 1000 event limit.
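To make the discard behavior concrete, here is a minimal sketch of a bounded cache keyed by TRACK_ID. The class name TrackCache and its methods are hypothetical, and the least-recently-seen eviction policy is my assumption; the product only guarantees that some previously observed TRACK_ID is discarded when the cache is full.

```python
from collections import OrderedDict


class TrackCache:
    """Bounded cache of last-known state per TRACK_ID (illustrative only)."""

    def __init__(self, max_size=1000):
        self.max_size = max_size
        self._entries = OrderedDict()

    def observe(self, track_id, state):
        """Store the latest state; return the discarded TRACK_ID, if any."""
        evicted = None
        if track_id in self._entries:
            # Known track: refresh its position in the eviction order.
            self._entries.move_to_end(track_id)
        elif len(self._entries) >= self.max_size:
            # Cache is full: discard the least recently seen track
            # (assumed policy, not confirmed product behavior).
            evicted, _ = self._entries.popitem(last=False)
        self._entries[track_id] = state
        return evicted

    def last_state(self, track_id):
        """Return the cached state, or None if the track was never seen or was discarded."""
        return self._entries.get(track_id)


cache = TrackCache(max_size=3)
for tid in ["A", "B", "C"]:
    cache.observe(tid, "outside")
evicted = cache.observe("D", "outside")  # cache is full, so "A" is discarded
```

After the last call, the cache no longer has any record of track "A", even though events for it were processed moments earlier.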

What does this mean for your data? Conceptually it means that the processor will forget that it has ever encountered an event with the discarded TRACK_ID. The next event received with that TRACK_ID is treated as if it were the first one ever observed, so no prior location is available for an “enter” or “exit” evaluation. When observing the behavior in real time, events will appear to be dropped spontaneously by the processor, and certain events that you observed several minutes ago may not be displayed at all in the output destination.

Though the logic remains the same for both, the definitions used for “Enter Any” and “Exit Any” are in fact different. In order for GeoEvent to recognize that an event’s geometry has entered a geofence, its prior location must have been observed outside that geofence. Conversely, GeoEvent will not recognize that an event’s geometry has exited a geofence unless the geometry’s prior location was observed inside that geofence. These definitions are honored by default, but can produce different results if an additional property that GeoEvent uses to evaluate the spatial relationships is changed.
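The two definitions can be sketched as a comparison between consecutive positions of a single track. This is only an illustration: the rectangular geofence, the function names inside and transitions, and the sample coordinates are all hypothetical, and real geofences are arbitrary polygons evaluated by GeoEvent itself.

```python
def inside(fence, point):
    """Return True if point (x, y) lies within the rectangle (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = fence
    x, y = point
    return xmin <= x <= xmax and ymin <= y <= ymax


def transitions(fence, positions):
    """Classify each step of a track as 'enter', 'exit', or None."""
    events = []
    for prev, curr in zip(positions, positions[1:]):
        was_in, is_in = inside(fence, prev), inside(fence, curr)
        if not was_in and is_in:
            events.append("enter")   # prior location outside, current inside
        elif was_in and not is_in:
            events.append("exit")    # prior location inside, current outside
        else:
            events.append(None)      # no boundary crossing
    return events


fence = (0, 0, 10, 10)
track = [(-5, 5), (2, 5), (8, 5), (15, 5)]
result = transitions(fence, track)   # ["enter", None, "exit"]
```

Both decisions depend on the prior location, which is why the very first observation of a track needs special handling.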

"First GeoEvent triggers Enter" and "First GeoEvent triggers Exit"

When a cache-aware node must make an “enter or exit” decision, it respects the settings of two properties within the GeoEvent Manager. These are the "First GeoEvent triggers Enter" and "First GeoEvent triggers Exit" properties, which determine the relative importance of the “enter” and “exit” operations. By default GeoEvent assumes that “entry” is more important than “exit” and that most event geometries are already outside of a geofence. The defaults for these properties are true and false respectively. You can change either value, but the change should be made only with deliberate care. If you are not careful, you could easily configure GeoEvent to start generating unwanted analysis and notifications, particularly after a restart of the server machine. For example, even if event geometries are expected to move around inside a geofence such as an administrative boundary, they will be located outside every other geofence that is registered with GeoEvent. If you change the "First GeoEvent triggers Exit" property from false to true, you may unexpectedly get a significant number of “exit” evaluations every time the server machine is rebooted.

We’ll look at an example of the default configuration, in which GeoEvent gives “entry” priority. Let’s assume that a GeoTagger is set to an “Enter Any” operation and the event’s TRACK_ID is not in the cache. If the event’s geometry lies inside of a geofence, then the GeoTagger will read the event as having entered. Alternatively, if this same GeoTagger is set to “Exit Any” and the event’s geometry is already outside of a geofence, the processor will determine that the point did not exit. Sound tricky? Just remember that GeoEvent assumes that geofences are empty 99% of the time, and that points are expected to enter at some future point. That doesn’t mean that exits are ignored, but they are given less importance and require more observations to determine.
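The decision described above can be sketched as a small function. This is my reading of the behavior, not the product's implementation: the function name evaluate, the operator strings, and the use of None to mean “TRACK_ID not in the cache” are all hypothetical. The two keyword arguments stand in for the "First GeoEvent triggers Enter" and "First GeoEvent triggers Exit" properties.

```python
def evaluate(operator, prior_inside, current_inside,
             first_triggers_enter=True, first_triggers_exit=False):
    """Decide whether an event satisfies an enter or exit operator.

    prior_inside is None when the TRACK_ID is not in the cache, either
    because it is new or because it was discarded to make room.
    """
    if operator == "ENTER":
        if prior_inside is None:
            # No prior location: fall back on the first-event setting.
            return first_triggers_enter and current_inside
        return (not prior_inside) and current_inside
    if operator == "EXIT":
        if prior_inside is None:
            return first_triggers_exit and (not current_inside)
        return prior_inside and (not current_inside)
    raise ValueError("operator must be 'ENTER' or 'EXIT'")


# Defaults: a first observation inside a geofence counts as an entry ...
first_inside_enters = evaluate("ENTER", None, True)
# ... but a first observation outside a geofence does not count as an exit.
first_outside_exits = evaluate("EXIT", None, False)
```

Under this reading, a track whose TRACK_ID was discarded from the cache while it sat inside a geofence would be reported as entering again on its next event.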

Changing the maximum cache size for cache‑aware nodes

The default setting for the maximum cache size is exposed through a specific product configuration file. It is important to note that this cache size is a maximum for each cache-aware node, and it is not a system-wide limit. You must have administrative privilege to edit this file. Changing the default value can result in your GeoEvent Server consuming significantly more RAM. The Java process in which the GeoEvent Server runs, by default, is limited to 4GB of system RAM. If every cache-aware node begins caching event data for significantly more than 1000 unique track identifiers, a larger portion of the 4GB will be consumed leaving less room for other more basic functions.

To promote system stability, it is recommended that you estimate the total number of unique track identifiers expected from your event data and set the maximum cache size slightly higher than that estimate (to allow for more features to be added over time). Keep the value as small as possible and do not specify an arbitrarily high maximum cache size.
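As a rough worked example of that estimate, the sketch below adds a fixed percentage of headroom to the expected number of track identifiers and then shows how the worst case grows with the number of cache-aware nodes, since the limit applies per node. The function names, the 10% headroom, and the sample figures are assumptions chosen for illustration, not product recommendations.

```python
def suggested_cache_size(expected_track_ids, headroom_percent=10):
    """Expected unique TRACK_ID count plus headroom, rounded up (integer math)."""
    extra = -(-expected_track_ids * headroom_percent // 100)  # ceiling division
    return expected_track_ids + extra


def worst_case_entries(cache_aware_nodes, cache_size):
    """Upper bound on cached entries, because the maximum applies to each node."""
    return cache_aware_nodes * cache_size


size = suggested_cache_size(2400)       # 2400 tracks + 10% headroom = 2640
total = worst_case_entries(4, size)     # 4 cache-aware nodes -> 10560 entries
```

The second figure is the one to keep in mind when weighing a larger cache against the memory available to the GeoEvent Server process.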

The cache value is contained within the com.esri.ges.manager.servicemanager.cfg file located in the following directory on a default system:

C:\Program Files\ArcGIS\Server\GeoEvent\etc

Keep in mind that this location may change if GeoEvent Server was installed to directories other than the product’s default system folder.

Incident Detection and the 1000 event cache limit

Within GeoEvent, you could also observe that open incidents are being dropped from the output destination in a similar manner to how the GeoTagger discards events. The Incident Detector Processor is another cache-aware node in GeoEvent, and utilizes an incident cache particularly when evaluating conditions for concurrently open incidents. Much like the GeoTagger, the Incident Detector will discard open/ongoing incidents if its incident cache is full and a new event triggers an incident to open. The opening/closing of incidents is managed by the Incident Manager program, which runs in the background to maintain state. The Incident Manager has a 1000 incident cache limit enabled by default, but it is exposed through the GeoEvent Manager rather than a configuration file. This default size can also be changed, but the same recommendations as the GeoTagger Processor apply. Obtain an estimate first of how many unique features GeoEvent will be processing, and then determine how many probable incidents could be open at one time. Set the number of Open and Closed incidents to slightly higher than the calculated maximum.