GeoEvent Server: Field Enricher (Feature Service)

EricIronside · 05-13-2019

I often get questions about the settings for the Feature Service Field Enricher, specifically the cache options available with that processor. Below is a discussion of these properties and how various settings may affect your processor.

Cache Operation

When an event record is received by the processor, the processor checks its cache to see if it has a feature record matching the event record’s TRACK_ID (or primary key). If so, and if that feature record is not old/expired, then the enrichment is performed using the cached value.

If no such feature record exists in the cache, or the existing cache item is old/expired, then the processor makes a focused query to obtain just that one feature record from the feature service. Each cached item maintains its own expiration time. An item in the cache is either:

1. Retrieved as is, because an event needs the data and the cached item has not expired.
2. Refreshed via the feature service, because an event needs the data but the cached item has expired.
3. Removed from the cache, because the cache size has exceeded its limit and the item is the least recently used one (whether or not it has expired).

In cases #1 & #2, the cached item is promoted to the top of the cache queue (regardless of expiration time or whether the item was fetched from the feature service or not). In case #3 above, the cache queue is pruned from the bottom (so records that haven't been used to enrich an event recently are removed first).
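The lookup, refresh, and pruning behavior described above can be sketched as a least-recently-used (LRU) cache with a per-item expiration time. This is a minimal Python sketch under stated assumptions, not GeoEvent Server's implementation: the `EnricherCache` class, its `fetch` callback (standing in for the focused feature service query), and the injectable `clock` are hypothetical names introduced for illustration.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Dict, Hashable


class EnricherCache:
    """Hypothetical LRU cache with a per-item expiration time.

    Mirrors the behavior described above; it is not GeoEvent Server code.
    """

    def __init__(
        self,
        fetch: Callable[[Hashable], Dict[str, Any]],
        max_size: int = 1000,
        expire_minutes: int = 1,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.fetch = fetch                  # stands in for the feature service query
        self.max_size = max(1, max_size)    # a cache size of 0 falls back to 1
        self.expire_seconds = expire_minutes * 60
        self.clock = clock
        # TRACK_ID -> (record, expires_at); expires_at is None when it never expires.
        # Key order doubles as recency: the last key is the "top" of the queue.
        self.items = OrderedDict()

    def get(self, track_id: Hashable) -> Dict[str, Any]:
        now = self.clock()
        entry = self.items.get(track_id)
        if entry is not None and (entry[1] is None or now < entry[1]):
            record = entry[0]                  # case 1: retrieved as is
        else:
            record = self.fetch(track_id)      # miss or case 2: query the service
            expires = None if self.expire_seconds == 0 else now + self.expire_seconds
            self.items[track_id] = (record, expires)
        self.items.move_to_end(track_id)       # promote to the top of the queue
        while len(self.items) > self.max_size:
            self.items.popitem(last=False)     # case 3: prune least recently used,
                                               # regardless of expiration time
        return record
```

For example, with `max_size=2`, requesting A, B, A, C fetches A, B, and C once each and evicts B, because A was promoted when it was reused. With `expire_minutes=0`, an item, once fetched, is never refreshed.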

Cache Memory Management

From a memory management perspective, the processor loads feature records on demand rather than batch-loading records into memory “just in case they are needed”. This also avoids the nasty problem of deciding which records to load when the cache size is smaller than the total number of records in the feature service (for example, when the default cache size of 1,000 is used but the feature service holds tens of thousands of feature records).

There are three downsides to this approach: initialization, short cache expiration times, and large enrichment pools.

- Initialization: filling the cache can be expensive because, on startup, each event causes a call to the feature service. Once the cache is loaded, however, the processor operates very quickly.
- Short expiration times: when the cache item expiration time is short (because the data being enriched changes often), the processor must reach out to the feature service each time an item expires. If you find yourself in this situation, factor this into your performance expectations (for example, on my test machine a request to a feature service took around 100 ms on average).
- Large enrichment pools: the cache size is too small for the set of records being used. If you have a large set of data to enrich from, increase your cache size while monitoring your memory usage.
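A back-of-envelope estimate shows how warm-up and short expiration times translate into feature service load. It assumes the ~100 ms average query time quoted above, that queries run one at a time, and that every cached record is used at least once per expiration period; the function names are hypothetical.

```python
def warmup_seconds(distinct_ids: int, avg_query_ms: float = 100.0) -> float:
    # Filling an empty cache costs one feature service query per distinct ID.
    return distinct_ids * avg_query_ms / 1000.0


def refresh_seconds_per_minute(active_ids: int, expire_minutes: int,
                               avg_query_ms: float = 100.0) -> float:
    # Assumes every active record is used at least once per expiration period,
    # so each one is re-queried once per period, and queries run one at a time.
    if expire_minutes == 0:
        return 0.0  # 0 means items never expire, so no refreshes after warm-up
    fetches_per_minute = active_ids / expire_minutes
    return fetches_per_minute * avg_query_ms / 1000.0


print(warmup_seconds(1000))                   # 100.0
print(refresh_seconds_per_minute(1000, 1))    # 100.0
print(refresh_seconds_per_minute(1000, 10))   # 10.0
```

Under these assumptions, 1,000 active records with a 1-minute expiration call for about 100 seconds of query time every minute, which a one-at-a-time stream of queries cannot keep up with; a 10-minute expiration brings that down to about 10 seconds per minute.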

Disabling the Cache?

As mentioned above, I've seen situations where the enrichment data needs to be read from the feature service every time (regardless of the performance penalty), because the underlying enrichment data changes just as frequently as the event data running through the GeoEvent Service. In this case, you cannot set the cache expiration to anything less than 1 minute (0.5 won't work, and 0 causes the cache to never expire). You can, however, set the cache size to 1. Assuming your events are all mixed up (e.g. the TRACK_ID is not the same for two successive events), the cache will never hold the right value and the processor will have to fetch from the feature service every time.
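To see why a cache size of 1 forces a fetch for interleaved events, the hypothetical `count_fetches` helper below replays a sequence of TRACK_IDs against an LRU cache and counts the feature service queries. It models only size-based pruning and ignores expiration; that simplification is an assumption of the sketch, not a description of GeoEvent internals.

```python
from collections import OrderedDict
from typing import Hashable, Iterable


def count_fetches(track_ids: Iterable[Hashable], cache_size: int) -> int:
    """Count feature service queries for a stream of events.

    Models only size-based LRU pruning; expiration is ignored.
    """
    size = max(1, cache_size)        # a cache size of 0 falls back to 1
    cache = OrderedDict()
    fetches = 0
    for tid in track_ids:
        if tid in cache:
            cache.move_to_end(tid)   # hit: promote to the top of the queue
        else:
            fetches += 1             # miss: query the feature service
            cache[tid] = True
            if len(cache) > size:
                cache.popitem(last=False)  # prune the least recently used record
    return fetches


print(count_fetches(["A", "B", "A", "B"], 1))   # 4: every event fetches
print(count_fetches(["A", "A", "B", "B"], 1))   # 2: repeats hit the cache
print(count_fetches(["A", "B", "C"] * 3, 2))    # 9: cycle larger than cache
print(count_fetches(["A", "B", "C"] * 3, 3))    # 3: only the warm-up misses
```

The third call also illustrates the large enrichment pool case: when events cycle through more distinct IDs than the cache can hold, LRU pruning evicts each record just before it is needed again, so every event triggers a query.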

Field Enricher Cache Notes:

Cache expiration time (in minutes) is:
- Stored and consumed as an Integer value.
  - So 0.5 (30 seconds) is not a valid value.
- It can be set to 0:
  - Data never expires.
  - Once an item is read in, it will not be refreshed.
  - To reset the cache, you can restart or re-publish your GeoEvent Service.
Each cache item (values, expire time) is maintained separately
- If a value is not found or has expired, it is queried directly from the feature service ("where <uniqueID>=<value>")
- When a new value is retrieved it is assigned its own expire time (now()+x minutes)
The cache size can be any integer value > 0
- Setting the value to 0 will result in a default value of 1.
- Setting the cache size to something small (like 1) can force cache refreshes more often than the 1-minute minimum expiration would allow
  - Assuming successive events in your stream don't share the same TRACK_ID, each new event will have to be fetched from the feature service.
  - If your enrichment data changes very often, then this can be a valid strategy to use to force the enricher to get new data every time.
  - This will impact performance, so test to determine how large the impact is in your case.
- When the number of cached records exceeds the maximum cache size, the least recently used records are pruned from the list.
  - The expiration time of a record has nothing to do with cache pruning.
  - Each time a record is used to enrich an event, it is promoted to the top of the queue.
  - Records that are not used to enrich an event fall to the bottom of the list.
  - The records at the bottom of the list are pruned first.
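Several of the notes above can be restated as small helper functions: integer-only expiration, 0 meaning "never expire", a cache size of 0 falling back to 1, a per-item expire time of now()+x minutes, and the focused "<uniqueID>=<value>" query. These are hypothetical sketches of the documented behavior; the function names, the rejection of non-integer input, clamping negative cache sizes to 1, and the single-quote escaping of string IDs are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional, Union


def normalize_expiration(minutes: object) -> int:
    # Hypothetical check: the expiration is stored as an integer number of
    # minutes, so fractional values such as 0.5 are rejected here.
    if isinstance(minutes, bool) or not isinstance(minutes, int):
        raise ValueError(f"expiration must be whole minutes, got {minutes!r}")
    return minutes


def normalize_cache_size(size: int) -> int:
    # A cache size of 0 falls back to 1 (per the notes); clamping negative
    # values to 1 as well is an assumption of this sketch.
    return max(1, size)


def expires_at(now: datetime, minutes: int) -> Optional[datetime]:
    # Each item gets its own expire time: now() + x minutes.
    # None stands for "never expires" when the expiration is 0.
    if minutes == 0:
        return None
    return now + timedelta(minutes=minutes)


def where_clause(unique_field: str, value: Union[str, int]) -> str:
    # Focused query for a single record: "<uniqueID>=<value>".
    # Quoting string IDs with doubled single quotes is an assumption.
    if isinstance(value, str):
        escaped = value.replace("'", "''")
        return f"{unique_field}='{escaped}'"
    return f"{unique_field}={value}"
```

For instance, `where_clause("TRACK_ID", "truck-7")` yields `TRACK_ID='truck-7'`, and `expires_at(now, 0)` returns `None`, matching the rule that an expiration of 0 means an item is never refreshed.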