I'd like to get some input from the community and from Esri's RJ Sunderman on my proposed architecture for one of my clients. This client is looking to implement a highly available GIS architecture that includes GeoEvent Server. Below is a simplified architecture diagram showing the anticipated flow of traffic between users, servers, clusters, and load balancers. Note that we will not be using web adaptors but instead will be relying on the load balancers.
My main questions are around the GeoEvent Server portion of the diagram. I'd like to make GeoEvent Server "highly available", or as close to it as possible. Would it make sense to create a multi-node ArcGIS Server site and configure GeoEvent Server the exact same way on all of them? Will the GeoEvent Gateway properly handle the traffic between nodes in this case? We're specifically talking about the 10.7.1 version of the software, by the way. The goal here is redundancy, not necessarily scalability, so I wouldn't need to have different data inputs/outputs on each GeoEvent node.
Could I then federate that entire site of GeoEvent Servers with the Portal and still see expected behavior? If I take this approach, will I see two sets of services (one from each GeoEvent Server) within my Portal once I publish a service from GeoEvent Server? I've been reading, in previous versions of the software, that the recommendation was to create siloed ArcGIS Server sites each containing GeoEvent Server where they would run independently from one another. I'm trying to avoid having multiple services for the same content and I'm trying to get as close to a single endpoint for users as possible.
Let me know if I need to clarify anything in particular. Thanks for your time and input.
Thanks for your inquiry about GeoEvent high availability. There is a wealth of information about the concepts and implementation steps for a multiple-machine site approach to high availability in the following tutorial: GeoEvent Server 10.6.x Multiple-Machine Site. Note: the tutorial is labeled as 10.6.x but applies to later versions as well. This tutorial answers many of your questions in detail, but I've addressed them directly below:
Would it make sense to create a multi-node ArcGIS Server site and configure GeoEvent Server the exact same way on all of them? Will the GeoEvent Gateway properly handle the traffic between nodes in this case?
Yes, GeoEvent Server has supported multi-machine sites since ArcGIS Server 10.6.1. In this configuration, the GeoEvent Gateway acts as a distributed configuration store and message broker for all the machines in the site. The minimum recommended number of machines is three, which provides the best consistency in the event of a single machine's failure.
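To illustrate why three machines is the usual minimum, here is a small sketch of majority-quorum arithmetic. It assumes the Gateway's distributed configuration store follows the common majority rule used by ZooKeeper-style ensembles; the function names are made up for this example and are not part of any Esri API.

```python
def majority(machines: int) -> int:
    """Smallest number of machines that forms a strict majority."""
    return machines // 2 + 1


def tolerated_failures(machines: int) -> int:
    """Machines that can fail while a majority is still reachable."""
    return machines - majority(machines)


# Assumption: the site stays consistent only while a majority is up.
for n in (1, 2, 3, 5):
    print(f"{n} machine(s): majority={majority(n)}, "
          f"can lose {tolerated_failures(n)}")
```

Under that assumption, a two-machine site tolerates no failures at all, while a three-machine site can lose one machine and still keep a majority.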
Could I then federate that entire site of GeoEvent Servers with the Portal and still see expected behavior?
Yes, you could federate the site with Portal to take advantage of the Portal security/SSO, though it is not required.
If I take this approach, will I see two sets of services (one from each GeoEvent Server) within my Portal once I publish a service from GeoEvent Server?
No. Stream services will only run on a single machine in the GeoEvent site (see "Other considerations regarding GeoEvent Server multiple-machine sites" in the tutorial). Other services, i.e., hosted feature services or federated services, should be published to the hosting server.
Feel free to reach out to me if you have additional questions.
Esri Professional Services
Hello William –
To add to what Brad says above, there are two approaches to a multi-machine GeoEvent Server deployment. I refer to the first as the 'site' approach and to the other as the 'silo' approach.
There are pros and cons to both the 'site' and 'silo' approaches. When choosing one over the other you should carefully consider your specific objectives – resiliency, scalability, fault-tolerance, reliability. Architects need to decouple these specific objectives from a more generic "high-availability" objective.
Brad is correct that, with the introduction of the GeoEvent Gateway in the 10.6.x release, architects have the option to follow a 'site' deployment and allow GeoEvent Server to handle fail-over when a machine participating in a site fails. We've found on the product team, however, that when a multiple-machine approach is necessary, accepting the technical debt of learning how to deploy, configure, and administer Apache Kafka and Zookeeper and following a 'silo' deployment model gives administrators better visibility into operational failures and more control over recovery. For this reason, more than any other, I am more comfortable recommending the 'silo' approach over the 'site' approach.
I have written up some thoughts and advice on resiliency, scalability, reliability, high availability, and pros / cons to consider when taking on a multiple machine deployment. Brad or I can share this with you if you schedule some time with one of us to discuss your approach, concerns, and objectives. I will offer that, realistically, folks who are happy with GeoEvent Server are those who are able to get what they need out of a single machine deployment; folks who are unhappy with GeoEvent Server are those who try to architect solutions which push the technology on which GeoEvent Server was built beyond what it’s able to do by trying to design “highly available” solutions with multiple machines.
Customers looking for more resilient solutions with auto-scaling and built-in fault-tolerance are encouraged to look at the new ArcGIS Analytics for IoT – a SaaS offering for ArcGIS Online. Its architecture and implementation are completely different from GeoEvent Server. You can read more about ArcGIS Analytics for IoT at the following links:
Moving forward, the GeoEvent Server product team is not going to be recommending multiple machine deployments. We won't be taking away an architect's options to deploy using a 'site' or 'silo' approach, but we will be encouraging customers who have needs beyond what a single instance of GeoEvent Server can support to consider the ArcGIS Analytics for IoT SaaS offering over the on-premises ArcGIS GeoEvent Server.
Hope this information is helpful –
Hi RJ, Brad,
William's initial post was about ArcGIS GeoEvent Server 10.7.1, but what about the new version, 10.8.1? The new documentation discusses deployment strategies only in terms of a 'silo' architecture and makes no reference to a multi-machine site. Does that mean the multi-machine site approach is no longer recommended for this version?
Thanks for your help.
Hello Rene --
You are correct. The documentation newly updated with the 10.8.1 release, particularly the on-line help topics Load Balancing and strategies for scalability, reliability, and resiliency, no longer discusses what I have referred to as a 'site' deployment approach (where multiple ArcGIS Server instances, each with a GeoEvent Server, are organized in a single ArcGIS Server site). Please take a look at the newly prepared on-line documentation; we've added quite a bit and have plans for more by the end of the year.
Moving forward, the product team is recommending that all GeoEvent Server deployments follow a 'silo' deployment pattern. The distinction comes when you decide whether you have to deploy the same configuration to each machine, so you can use an external mechanism to route or distribute event records to two otherwise identical instances in parallel, or whether you can deploy a different configuration to each machine (allowing one instance to ingest event records of one type and another instance to handle event records of some other type).
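The two 'silo' variations can be sketched as a delivery plan computed by whatever external mechanism sits in front of the instances. This is only an illustration: the instance names, the `type` field, and both functions are hypothetical, and nothing here calls a GeoEvent Server API.

```python
from collections import defaultdict

# Hypothetical instance names and event types, for illustration only.
SILOS = ["geoevent-a", "geoevent-b"]
TYPE_TO_SILO = {"vehicle": "geoevent-a", "weather": "geoevent-b"}


def plan_identical(records, silos):
    """Same configuration on every machine: each silo gets every record."""
    return {silo: list(records) for silo in silos}


def plan_partitioned(records, type_to_silo):
    """Different configuration per machine: route each record by its type.

    Raises KeyError for a record type that no silo is configured to handle.
    """
    plan = defaultdict(list)
    for record in records:
        plan[type_to_silo[record["type"]]].append(record)
    return dict(plan)


records = [
    {"type": "vehicle", "id": 1},
    {"type": "weather", "id": 2},
    {"type": "vehicle", "id": 3},
]
print(plan_identical(records, SILOS))
print(plan_partitioned(records, TYPE_TO_SILO))
```

The first plan buys redundancy at the cost of processing every record twice; the second spreads load but leaves each event type dependent on a single instance.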
Update: February 2021
Beginning with the 10.9 release, every instance of GeoEvent Server you deploy must run beneath its own ArcGIS Server with its own ArcGIS Server site. This extends the single-machine high-availability active/active and single-machine high-availability active/passive deployment patterns promoted by ArcGIS Server.
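As a rough sketch of the difference between the two patterns, the function below picks which independent instances should receive traffic given a health map. The instance names, the health map, and the function itself are assumptions made for illustration; in practice a load balancer or similar external component would make this decision.

```python
def select_targets(health: dict, mode: str) -> list:
    """Return the instances that should receive traffic.

    health maps instance name -> True if healthy, in priority order
    (dicts preserve insertion order in Python 3.7+).
    """
    healthy = [name for name, ok in health.items() if ok]
    if mode == "active/active":
        # Every healthy instance receives traffic.
        return healthy
    if mode == "active/passive":
        # Only the highest-priority healthy instance receives traffic.
        return healthy[:1]
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical state: the primary is down, the standby is up.
state = {"primary": False, "standby": True}
print(select_targets(state, "active/passive"))
print(select_targets({"primary": True, "standby": True}, "active/active"))
```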
You will still be able to deploy multiple instances of GeoEvent Server which run independently from one another. These do not share a common configuration which GeoEvent Gateway must synchronize across a “cluster” of GeoEvent Server instances. (Recall that GeoEvent Gateway encapsulates the Apache Kafka message handler and the Zookeeper distributed configuration store used by GeoEvent Server.)
We made the decision to remove support for multiple-machine / single-site deployments because we have found over time that GeoEvent Server deployments which coordinate through a single ArcGIS Server site do not meet reliability objectives. In rare cases of complete hardware failure -- where a single server node in a multi-machine deployment went permanently offline -- the deployment pattern we have deprecated did provide fault-tolerance. More frequently, however, when a deployment was challenged by a disadvantaged network, or a machine was temporarily unavailable, or servers were restarted out of sync, the whole deployment could become effectively unusable.
System recovery for multiple-machine / single-site deployments was tedious and error prone, so system architectures that promised high availability often failed to meet expectations. We therefore refactored the GeoEvent Gateway to provide better resiliency and overall system stability for the majority of our users by removing cluster leader election and in-sync replication between peer brokers/consumers. This means that multiple instances of GeoEvent Server (beginning with the 10.9 release) will no longer be able to synchronize a shared configuration or support a "clustered computing" architecture. But in the end, we achieve a better, more resilient, and more stable product.
If you would like an informal write-up I prepared looking at some concerns surrounding this, please e-mail me directly: rsunderman .at. esri.com
Hope this information is helpful,