GeoEvent and High Availability

07-25-2017 03:47 PM
AzinSharaf
Occasional Contributor

This would be our Portal and ArcGIS Server high availability deployment.

We want to add a HA GeoEvent Server to this architecture. What would be the best solution based on the following limitations/recommendations?

1- Don't use multi-machine sites with GeoEvent

2- Don't install GeoEvent in your base GIS site

If we create two independent GeoEvent sites, would we have to sync them manually? Is there a better solution?

24 Replies
RahulMetangale2
New Contributor II

Hi RJ Sunderman 

I went through this entire post, thank you for sharing detailed information!

This is regarding an ArcGIS GeoEvent Server 10.7.x deployment. Could you please review the diagram and comments below and let me know if my understanding is correct?

The points below assume a multi-machine, single-site GeoEvent Server deployment.

  1. When an external source sends data to GeoEvent Server over a REST endpoint, my assumption is that if that GeoEvent Server instance goes down, the other instances will not receive the data. We will have to bring in an external load balancer to address this. The load balancer will be responsible for forwarding each request to an individual GeoEvent Server / ArcGIS Server machine rather than to the ArcGIS Server site.
  2. When GeoEvent Server fetches data from external message brokers such as an ESB or RabbitMQ, we don't need a load balancer, because GeoEvent Server / ArcGIS Server will be able to start the adapter on an available instance.
  3. In this deployment (multiple instances, single site), the stream service runs in the JVM of the GeoEvent Server instance used to publish it. Even so, when that instance goes down, stream service broadcasts are forwarded to the other available GeoEvent Server instances in the ArcGIS Server site.
  4. Mapping clients should consume the stream service through a reverse proxy so that they do not have to worry about a specific GeoEvent Server instance going down.

Thank you,

Rahul 

RJSunderman
Esri Regular Contributor

Hello Rahul -

My apologies for the delayed reply.  Generally, yes, the understanding expressed in the points you call out is correct.

>> In a data push scenario, when an external web server, web service, or data provider sends data to GeoEvent Server, the data is sent to a specific resource endpoint via a fully-qualified hostname and port. If this resource is one of three GeoEvent Server instances you have configured – that represents a single point of failure. You will need a load balancer which is smart enough to continue sending data to a specific endpoint as long as that endpoint is available and responding HTTP/200 when data is received. The load balancer will be the solution component responsible for redirection to a different resource / endpoint when the primary receiver is not responding.
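The "sticky primary, fail over on error" behavior described above could be sketched roughly as follows. This is only an illustration of the decision logic, not a real load balancer and not anything shipped with GeoEvent Server. The class name, the `send` callable, and the endpoint strings are all hypothetical; in practice you would configure this behavior in whatever load balancer product you deploy.

```python
from typing import Callable, Optional, Sequence


class StickyFailoverForwarder:
    """Forward each payload to one preferred endpoint, failing over on error.

    Hypothetical sketch: `send` stands in for an HTTP POST and must return
    the HTTP status code, or raise OSError if the endpoint is unreachable.
    """

    def __init__(self, endpoints: Sequence[str], send: Callable[[str, bytes], int]):
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self.endpoints = list(endpoints)
        self.send = send
        self.current = 0  # index of the endpoint currently preferred

    def forward(self, payload: bytes) -> Optional[str]:
        """Return the endpoint that accepted the payload, or None if all failed."""
        for offset in range(len(self.endpoints)):
            index = (self.current + offset) % len(self.endpoints)
            endpoint = self.endpoints[index]
            try:
                status = self.send(endpoint, payload)
            except OSError:
                status = None  # connection refused, timeout, and similar
            if status == 200:
                self.current = index  # keep using this endpoint from now on
                return endpoint
        return None
```

Note that the forwarder stays on whichever endpoint last answered HTTP/200 rather than switching back as soon as the original primary recovers. That is one possible design choice; a real load balancer may offer others.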

>> In a data pull scenario, when a GeoEvent Server inbound connector is polling an external web server or web service, GeoEvent Server (in a "site" deployment) should detect when a machine has left the site's configuration and allow another GeoEvent Server instance to adopt and begin running the inbound connector's polls. This resilience, allowing event record ingest to fail-over to another instance, is one advantage to the "site" approach. You will want to administratively monitor and determine why an instance has failed or left the ArcGIS Server site configuration and confirm that another running instance of GeoEvent Server has adopted the running input. This is one of those "trust but verify" scenarios.

>> Yes, each instance of GeoEvent Server runs within its own JVM, and the web socket used to broadcast data for a stream service is run from within the GeoEvent Server's JVM container (not by ArcGIS Server as a SOC process). The stream service outbound connector implements a fan-out strategy which uses an internal message bus to forward copies of processed event records to other stream service web sockets so that client applications can subscribe to any GeoEvent Server's web socket and get all of the event records regardless of which instance(s) actually processed the event data. You will need to monitor the event record velocity / volume of each subscribing client. Too many clients subscribing to any single web socket instance will reduce data throughput to all subscribing clients.
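As a toy model of the fan-out idea described above, the sketch below shows why a client subscribed to one instance still sees records processed by another. It is an in-process simulation with hypothetical class names (`MessageBus`, `Instance`); it says nothing about how GeoEvent Server's internal message bus is actually implemented.

```python
from typing import Any, List


class MessageBus:
    """Toy bus: every published record reaches every registered instance."""

    def __init__(self) -> None:
        self.instances: List["Instance"] = []

    def register(self, instance: "Instance") -> None:
        self.instances.append(instance)

    def publish(self, record: Any) -> None:
        for instance in self.instances:
            instance.broadcast(record)


class Instance:
    """Stand-in for one server instance and its stream service web socket."""

    def __init__(self, name: str, bus: MessageBus) -> None:
        self.name = name
        self.bus = bus
        self.subscribers: List[List[Any]] = []  # one inbox list per client
        bus.register(self)

    def subscribe(self) -> List[Any]:
        """Attach a client; the returned list collects what it receives."""
        inbox: List[Any] = []
        self.subscribers.append(inbox)
        return inbox

    def process(self, record: Any) -> None:
        """Process a record, then hand a copy to the bus for fan-out."""
        self.bus.publish(record)

    def broadcast(self, record: Any) -> None:
        for inbox in self.subscribers:
            inbox.append(record)
```

In this model, a client that subscribes to instance B receives a record that instance A processed, because A publishes to the bus and the bus delivers to every instance's subscribers.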

>> Implementing some sort of reverse proxy to allow web mapping applications and clients to subscribe to stream service web sockets without knowing the specific resource they are connecting to is one way for you to take control of client subscription distribution. The ArcGIS Server stream service, as I understand it, directs client subscription requests to an available server within the site using a round-robin mechanism. But once a subscription connection has been made the client web application is communicating with the web socket GeoEvent Server is running, not with the stream service. The problem here is that if / when a GeoEvent Server instance fails there is no notification to a client application to signal it to unsubscribe and re-subscribe, which would give it an opportunity to connect to an available / running GeoEvent Server instance. A brute-force workaround one of our distributors decided to try was to have their client apps actively, periodically, unsubscribe and re-subscribe. That way, if a client were connected to a "dead" web socket the periodic unsubscribe / re-subscribe would allow automatic recovery and reconnection. Whether this is practical obviously depends on the velocity and volume of event records being broadcast in your solution. A second mitigation the distributor adopted was to train users that, if they had any reason to feel that their web map's display was stale, they should manually refresh the web page, which explicitly causes the same unsubscribe / re-subscribe to occur.
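The periodic unsubscribe / re-subscribe workaround could look roughly like the sketch below. Everything here is an assumption for illustration: `connect` is a hypothetical callable that opens a stream service subscription and returns an object with a `close()` method, and `max_age_s` is an interval you would tune to your own event velocity. It is not the distributor's actual code.

```python
import time
from typing import Any, Callable, Optional


class ResubscribingClient:
    """Drop and re-open a subscription once it reaches a maximum age."""

    def __init__(
        self,
        connect: Callable[[], Any],
        max_age_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.connect = connect        # hypothetical: opens a new subscription
        self.max_age_s = max_age_s    # how long to keep one subscription
        self.clock = clock            # injectable so the logic can be tested
        self.connection: Optional[Any] = None
        self.connected_at: float = 0.0

    def tick(self) -> bool:
        """Call periodically. Returns True if a (re)subscription happened."""
        now = self.clock()
        if self.connection is not None and now - self.connected_at < self.max_age_s:
            return False  # current subscription is still young enough
        if self.connection is not None:
            self.connection.close()   # unsubscribe from the old web socket
        self.connection = self.connect()
        self.connected_at = now
        return True
```

A client application would call `tick()` from a timer. Because each re-subscription goes back through the stream service (or a reverse proxy), a client that was attached to a failed instance gets a chance to land on a running one.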

I am not a system architect. You will probably want to work with your Esri Technical Advisor or request to speak with a Technical Account Manager through Esri Technical Support for help identifying a resource in Esri Professional Services which can help you with system / solution architecture best practices.

Best Regards --

RJ

EduardoFernandez1
Esri Contributor

Hello

The GeoEvent tutorial says that for high availability, a minimum of three machines and a maximum of five machines are recommended. I am curious to know why five machines is the maximum.

Cheers Ed

RJSunderman
Esri Regular Contributor

Hello Eduardo Fernandez –

The reason we recommend a maximum of five machines has more to do with system administrative burden than any hard technical limit in the GeoEvent Server implementation. Inherent administrative challenges with a 'site' deployment (where you deploy multiple ArcGIS Server instances, each running GeoEvent Server, in a single ArcGIS Server site) multiply with the number of machines you add. Also, testing has shown that GeoEvent Server does not scale linearly and you approach the point of diminishing returns when trying to architect a solution with more than five machines.

Hope this information is helpful –
RJ

EduardoFernandez1
Esri Contributor

Thank you RJ, this information is helpful and insightful.