I am wondering about the best practice for recovering from a feed outage.
For example, imagine an external feed delivering data to the GEP every minute (an input). This external feed goes offline for 15 minutes. There is a gap in the data of 15 minutes, and the system needs to recover this data. Can the GEP handle this?
On the other hand, imagine GEP is sending data to an output every minute. The network connection goes offline for an hour and the data cannot be sent. What is the best way to get the two systems in sync again?
When questioning the availability and reliability of systems or servers external to GeoEvent Processor, please keep in mind that GEP was developed as a processor of real-time event data.
Considering your first example, when a data provider goes off-line, recovery depends on the actions taken by the data provider when they come back on-line. GEP listens for and polls data providers in real-time, receiving and processing immediately whatever a provider sends or has published. GEP does not manage its inputs with a record of the date/time a particular input last received event data; it is not designed to observe a lapse and issue a request for data it thinks it might have missed.
If your GEP input were watching a system folder for files and a provider elected to send multiple files to account for a period of time the provider was offline, then GEP will receive the files and process their data as "new" events.
If your GEP input were binding to a TCP or UDP port listening for event data, then GEP will receive whatever data the provider sends to that network port. If the provider is offline and not broadcasting then there is nothing for GEP to receive.
If your GEP input were polling an external HTTP or RSS feed and the site/feed was unavailable when GEP conducted its poll, GEP simply receives no events for that polling interval. On the next polling interval it will poll and receive whatever data the site/feed has published once the provider comes back online.
Considering your second example, when a network outage prevents a data sink from receiving data sent by GEP, the only recovery would be for GEP to resend event data it previously sent -- but you would have to assume that the receiver would be able to identify and discard duplicate events it received from GEP. There is no way for GEP to know whether the data it previously sent was received successfully, or to determine whether data it has sent should be resent; no acknowledgement is returned to GEP by a data receiver. For example, suppose you were using GEP to push JSON to an external website, but the server hosting your IIS went down. The GEP output would attempt to POST the data over HTTP, but the requests containing the data from GEP would not be received by the external server and there is no way for GEP to know this.
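Since GEP cannot know whether a resend is redundant, the burden falls on the receiver. A minimal sketch of receiver-side de-duplication, assuming each event carries a unique identifier (the "event_id" field here is a hypothetical name, not part of GEP):

```python
# Receiver-side de-duplication sketch. GEP gets no delivery acknowledgement,
# so if it (or an operator) resends events, the receiver must discard repeats.
seen_ids = set()

def handle_event(event: dict) -> bool:
    """Process an event once; return False if it is a duplicate resend."""
    event_id = event["event_id"]   # hypothetical unique key on each event
    if event_id in seen_ids:
        return False               # already processed -- discard silently
    seen_ids.add(event_id)
    # ... apply the event to the target system here ...
    return True
```

In practice the `seen_ids` set would need to be bounded or persisted, but the idea is the same: make the receiving endpoint idempotent so that resending is always safe.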
I suppose you could design some redundancy into your GeoEvent Services by having GEP cache a copy of events it outputs, using the 'Publish GeoEvents on a REST endpoint' output. Then you could have a second GeoEvent Service poll this cache to re-ingest the cached data as new events and periodically re-send the data ... but there's no mechanism to determine whether resending the data was actually necessary, or whether the second time you sent the data was any more successful than the first time you sent it. Also, you'd have to manage the items in the GEP cache so that you didn't send a particular event more than 'N' times. Worse, in high volume cases, you would have to worry about GEP's cache maintenance discarding events as the cache fills up such that a particular set of events are not available to be resent.
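The cache-management concerns above (capping resends at 'N' and losing events to cache eviction under high volume) can be made concrete with a small sketch. Everything here -- the capacity, the limit, the function names -- is illustrative, not a GEP API:

```python
from collections import OrderedDict

MAX_RESENDS = 3        # assumed cap 'N' on how many times one event is resent
CACHE_CAPACITY = 1000  # once full, the oldest cached events are discarded

cache = OrderedDict()  # event_id -> (event, resend_count), insertion-ordered

def cache_event(event_id, event):
    """Cache an outgoing event, evicting the oldest entry when full."""
    if len(cache) >= CACHE_CAPACITY:
        cache.popitem(last=False)   # eviction: this event can never be resent
    cache[event_id] = (event, 0)

def events_to_resend():
    """Yield cached events still under the resend limit, counting each attempt."""
    for event_id, (event, count) in list(cache.items()):
        if count < MAX_RESENDS:
            cache[event_id] = (event, count + 1)
            yield event
        else:
            del cache[event_id]     # resend budget exhausted; drop it
```

The eviction in `cache_event` is exactly the failure mode described above: in a high-volume scenario an event can be pushed out of the cache before it is ever resent, and nothing in this scheme can tell you whether that mattered.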
If we used the future ActiveMQ input connector, wouldn't this allow for caching new events during an outage?
I currently have two GeoEvent Processor services in production and would like the capability to cache events for a short period of time. For example, when I have to add a column to one of the feature classes used in the feature service, I have to stop the feature service in order to remove the locks so that I can add the column. I would like to stop the GeoEvent Service and allow it to cache any inputs while I am working on the feature class.
During one of my outages, I did write the events to a text file, but I found this to be awkward. Also, while restarting the service to add the new text file output, I wondered whether I had missed any events.
Am I missing something? Is there a better workflow?
I have a workflow in mind, but it's not perfect. Perhaps if I share it with you, you or someone else can suggest an improvement.
I'm assuming that what we are trying to address is a proposed maintenance cycle which requires a service admin to stop a feature service (for a period of time) which a GeoEvent Service is actively updating. I'm further assuming that the maintenance will be to add attribute fields to the feature service's schema -- not to change data types of existing fields or remove existing fields.
What we want to do is actively cache events received by GeoEvent Processor so that when the admin brings the feature service back on-line it will automatically receive any events which came in during the (planned) outage.
If you were to incorporate a second Output connector into the GeoEvent Service which was updating features in the feature service, you could start/stop that Output independently, effectively turning caching "on" / "off". You could use any Output, but I'll stick with a CSV or JSON file Output for purposes of this discussion.
So the workflow might be that you start the "cache" Output so that whatever event data is being sent to the Output updating the feature service gets copied to a system file. Then stop the feature service, which will result in event data sent to the Output updating the feature service (e.g., fs-out) failing to reach the target feature layer. But the event data is being cached in the system file -- so we're OK so far.
Once the planned maintenance to the feature service is complete, the admin can restart the feature service, which will resume feature service updates, and then stop the secondary "cache" Output so that data is no longer being written to the system file. The admin would then copy the system file to a folder being watched by a different GeoEvent Service so that the "cached" event data would get read into GEP and used to update the target feature service.
But we have a race condition. Depending on the rate at which feature data is being received, I can imagine a situation in which "live" data output to the feature layer gets overwritten by older "cached" event data from the system file -- once the system file is copied and subsequently read.
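One way to reason about mitigating that race is to guard every feature update with a per-track "newest timestamp applied so far" check, so a stale cached event can never overwrite fresher live data. A minimal sketch, assuming each event carries a track identifier and an event timestamp (neither of which GEP guarantees in every feed):

```python
# Guard against the cache/live race: apply an update only if it is newer
# than the most recent update already applied for that track.
last_applied = {}  # track_id -> timestamp of the newest update applied

def apply_update(track_id, timestamp, attributes) -> bool:
    """Return True and record the update if it is newer; reject stale events."""
    if timestamp <= last_applied.get(track_id, float("-inf")):
        return False   # stale "cached" event -- keep the newer "live" data
    last_applied[track_id] = timestamp
    # ... update the target feature with `attributes` here ...
    return True
```

This still does not guarantee no data loss (rejected cached events are simply dropped), but it does give "live" data priority over the replayed cache, which addresses the overwrite half of the problem.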
I'm not sure that we'll be able to identify a solution which guarantees both (a) no data loss, and (b) that the most recent "live" event data sent by a data provider is given priority over data from a "cache". Perhaps you or someone else can suggest a modification which would address this concern.
If you have an input on GeoEvent Processor that is consuming events off of a JMS queue from ActiveMQ, then it would be plausible to stop your GeoEvent input during your maintenance window and then restart it when your maintenance is complete. Your JMS broker, in this case ActiveMQ, would accumulate events on the queue during the time the input is stopped, and when the input is restarted it would consume all pending messages off of the queue.
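The buffering semantics that make this work can be illustrated with a toy in-memory queue (a stand-in for the broker, not ActiveMQ or JMS code): the producer keeps publishing while the consumer is stopped, and nothing is lost because the broker, not the consumer, owns the backlog.

```python
from collections import deque

broker_queue = deque()   # stand-in for the JMS queue held by the broker

def publish(event):
    """The provider keeps publishing whether or not the consumer is running."""
    broker_queue.append(event)

def drain():
    """On restart, the input consumes every message that accumulated."""
    delivered = []
    while broker_queue:
        delivered.append(broker_queue.popleft())
    return delivered
```

The key property is that delivery order and completeness survive the consumer outage; with a real broker you would additionally want persistent messages so the backlog also survives a broker restart.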
Hope this information helps,
Adam Mollenkopf
ArcGIS GeoEvent Processor for Server - Product Lead