GeoEvent Server High Availability

02-25-2020 07:02 AM
MVP Regular Contributor

I'd like to get some input from the community and from Esri's RJ Sunderman on my proposed architecture for one of my clients. This client is looking to implement a highly available GIS architecture that includes GeoEvent Server. Below is a simplified architecture diagram showing the anticipated flow of traffic between users, servers, clusters, and load balancers. Note that we will not be using web adaptors but instead will be relying on the load balancers.

My main questions are around the GeoEvent Server portion of the diagram. I'd like to make GeoEvent Server "highly available", or as close to it as possible. Would it make sense to create a multi-node ArcGIS Server site and configure GeoEvent Server the exact same way on all of them? Will the GeoEvent Gateway properly handle the traffic between nodes in this case? We're specifically talking about the 10.7.1 version of the software, by the way. The goal here is redundancy, not necessarily scalability, so I wouldn't need to have different data inputs/outputs on each GeoEvent node.

Could I then federate that entire site of GeoEvent Servers with the Portal and still see expected behavior? If I take this approach, will I see two sets of services (one from each GeoEvent Server) within my Portal once I publish a service from GeoEvent Server?

I've been reading that, in previous versions of the software, the recommendation was to create siloed ArcGIS Server sites, each containing GeoEvent Server, where they would run independently from one another. I'm trying to avoid having multiple services for the same content and I'm trying to get as close to a single endpoint for users as possible. Let me know if I need to clarify anything in particular. Thanks for your time and input.

4 Replies
Esri Contributor

Hi William,

Thanks for your inquiry about GeoEvent high availability. There is a wealth of information about the concepts and implementation steps for a multiple-machine site approach to high availability in the following tutorial: GeoEvent Server 10.6.x Multiple-Machine Site. Note: the tutorial is labeled as 10.6.x but applies to later versions as well. This tutorial answers many of your questions in detail, but I've addressed them directly below:

Would it make sense to create a multi-node ArcGIS Server site and configure GeoEvent Server the exact same way on all of them?  Will the GeoEvent Gateway properly handle the traffic between nodes in this case? 

Yes, GeoEvent Server has supported multi-machine sites since ArcGIS Server 10.6.1. In this configuration, the GeoEvent Gateway acts as a distributed configuration store and message broker for all the machines in the site. The minimum recommended number of machines is three, to provide the best consistency in the event of a single machine's failure.
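
One way to see why three machines is a sensible minimum is simple majority arithmetic. The sketch below assumes the site's coordination layer needs a strict majority of machines to stay consistent; that is an assumption for illustration, not something stated in this thread, and the function name is hypothetical.

```python
def tolerated_failures(machines: int) -> int:
    """Number of machines that can fail while a strict majority survives.

    ASSUMPTION: consistency requires more than half of the machines to be up.
    """
    if machines < 1:
        raise ValueError("a site needs at least one machine")
    majority = machines // 2 + 1  # smallest group that is more than half
    return machines - majority


for n in (1, 2, 3, 5):
    print(n, "machines ->", tolerated_failures(n), "tolerated failure(s)")
```

Under that assumption, a two-machine site tolerates no failures at all, while a three-machine site tolerates one, which is the smallest site that survives a single machine's failure.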

Could I then federate that entire site of GeoEvent Servers with the Portal and still see expected behavior? 

Yes, you could federate the site with Portal to take advantage of the Portal security/SSO, though it is not required.

If I take this approach, will I see two sets of services (one from each GeoEvent Server) within my Portal once I publish a service from GeoEvent Server? 

No. Stream services will only run on a single machine in the GeoEvent site (see "Other considerations regarding GeoEvent Server multiple-machine sites" in the tutorial). Other services, such as hosted feature services or federated services, should be published to the hosting server.

Feel free to reach out to me if you have additional questions.

Regards,

Brad

Esri Professional Services

Esri Regular Contributor

Hello William

To add to what Brad says above, there are two approaches to a multi-machine GeoEvent Server deployment. The first I refer to as the 'site' approach, the other as the 'silo' approach.

  • The ‘site’ approach deploys multiple ArcGIS Server instances, each with a GeoEvent Server, in a single ArcGIS Server site. The GeoEvent Gateway is utilized more heavily in this configuration as the component responsible for event record distribution across the site.
  • The ‘silo’ approach relies on an external broker or load balancing component such as Apache Kafka for message distribution. In this approach you are essentially configuring multiple independent GeoEvent Server instances and taking on the challenge of routing a portion of the inbound data you need to process to different instances.
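
As a rough illustration of what "routing a portion of the inbound data to different instances" means in the 'silo' approach, here is a minimal sketch. It is not how Apache Kafka or GeoEvent Server implements distribution; the endpoint URLs, the `pick_silo` function, and the idea of keying on a track identifier are all assumptions made for the example.

```python
import hashlib

# Hypothetical endpoints for three independent ('silo') GeoEvent Server instances.
SILO_ENDPOINTS = [
    "https://geoevent-a.example.com",
    "https://geoevent-b.example.com",
    "https://geoevent-c.example.com",
]


def pick_silo(track_id: str, endpoints=SILO_ENDPOINTS) -> str:
    """Map a track identifier to one endpoint using a stable hash.

    A stable hash keeps every record for the same track on the same
    instance, so no instance needs to know about the others.
    """
    digest = hashlib.sha256(track_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(endpoints)
    return endpoints[index]


# Group a small batch of hypothetical track IDs by destination instance.
batch = ["truck-17", "truck-18", "bus-4", "truck-17"]
assignments = {}
for track_id in batch:
    assignments.setdefault(pick_silo(track_id), []).append(track_id)
print(assignments)
```

The point of the sketch is the operational burden RJ describes: in a 'silo' deployment, something outside GeoEvent Server has to own this routing decision and cope with an instance going away.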

Brad mentioned one tutorial, GeoEvent Server 10.6.x Multiple-Machine Site, which covers the 'site' approach. There is another tutorial, GeoEvent Server Resiliency, which covers the 'silo' approach.

There are pros and cons to both the 'site' and 'silo' approach. When choosing one over the other you should carefully consider your specific objectives – resiliency, scalability, fault-tolerance, reliability. Architects need to decouple these specific objectives from a more generic "high-availability" objective.

Brad is correct that, with the introduction of the GeoEvent Gateway in the 10.6.x release, architects have the option to follow a 'site' deployment and allow GeoEvent Server to handle machine fail-over when a machine participating in a site fails. We've found on the product team, however, that when a multiple-machine approach is necessary, accepting the technical debt of learning how to deploy, configure, and administer Apache Kafka and Zookeeper and following a 'silo' deployment model gives administrators better visibility into operational failures and more control over recovery. For this reason, more than any other, I am more comfortable recommending a 'silo' approach over the 'site' approach.

I have written up some thoughts and advice on resiliency, scalability, reliability, high availability, and pros / cons to consider when taking on a multiple machine deployment. Brad or I can share this with you if you schedule some time with one of us to discuss your approach, concerns, and objectives. I will offer that, realistically, folks who are happy with GeoEvent Server are those who are able to get what they need out of a single machine deployment; folks who are unhappy with GeoEvent Server are those who try to architect solutions which push the technology on which GeoEvent Server was built beyond what it’s able to do by trying to design “highly available” solutions with multiple machines.

Customers looking for more resilient solutions with auto-scaling and built-in fault-tolerance are encouraged to look at the new ArcGIS Analytics for IoT – a SaaS offering for ArcGIS Online. Its architecture and implementation are completely different from GeoEvent Server. You can read more about ArcGIS Analytics for IoT at the following links:

Moving forward, the GeoEvent Server product team is not going to be recommending multiple machine deployments. We won't be taking away an architect's options to deploy using a 'site' or 'silo' approach, but we will be encouraging customers who have needs beyond what a single instance of GeoEvent Server can support to consider the ArcGIS Analytics for IoT SaaS offering over the on-premises ArcGIS GeoEvent Server.  

Hope this information is helpful –
RJ

New Contributor

Hi RJ, Brad,


William's initial post was about ArcGIS GeoEvent Server 10.7.1, but what about the new version, 10.8.1? The new documentation discusses deployment strategies only in terms of the 'silo' architecture, with no reference to a multi-machine site. Does that mean the multi-machine site approach is no longer recommended for this version?

Thanks for your help.

Regards,


RM

Esri Regular Contributor

Hello René

You are correct. The documentation newly updated with the 10.8.1 release, particularly the on-line help topics Load Balancing and Strategies for scalability, reliability, and resiliency, no longer discusses what I have referred to as a 'site' deployment approach (where multiple ArcGIS Server instances, each with a GeoEvent Server, are organized in a single ArcGIS Server site). Please take a look at the newly prepared on-line documentation; we've added quite a bit and have plans for more by the end of the year.

Moving forward, the product team is recommending that all GeoEvent Server deployments follow a 'silo' deployment pattern. The distinction comes when you decide whether you have to deploy the same configuration to each machine, so you can use an external mechanism to route or distribute event records to two otherwise identical instances in parallel, or whether you can deploy a different configuration to each machine (allowing one instance to ingest event records of one type and another instance to handle event records of some other type).
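
The two 'silo' variations described above can be sketched as follows. This is only an illustration under assumptions: it reads "in parallel" as every identical instance receiving every record, the `Sender` callables stand in for whatever external mechanism actually delivers records, and none of these names come from GeoEvent Server.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Sender = Callable[[Record], None]  # hypothetical: delivers one record to one instance


def broadcast(record: Record, senders: List[Sender]) -> int:
    """Same configuration on every machine: send the record to each instance.

    Returns how many instances accepted it, so a caller can notice
    when one instance is down while the others keep working.
    """
    delivered = 0
    for send in senders:
        try:
            send(record)
            delivered += 1
        except Exception:
            continue  # one failed instance must not block the others
    return delivered


def route_by_type(record: Record, senders_by_type: Dict[str, Sender]) -> None:
    """Different configuration per machine: each record type has one owner."""
    kind = str(record["type"])
    senders_by_type[kind](record)  # raises KeyError for an unconfigured type


# Usage with in-memory lists standing in for two instances.
instance_a: List[Record] = []
instance_b: List[Record] = []
broadcast({"type": "vehicle", "id": 1}, [instance_a.append, instance_b.append])
route_by_type({"type": "sensor", "id": 2},
              {"vehicle": instance_a.append, "sensor": instance_b.append})
print(len(instance_a), len(instance_b))
```

The first pattern buys redundancy at the cost of duplicate processing; the second spreads load but leaves each record type dependent on a single machine.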

Nothing in the product's implementation changed with regard to allowing system architects to pursue the previously documented strategies of deploying multiple ArcGIS Server instances, each with a GeoEvent Server, in a single ArcGIS Server site. However, the added complexity in system administration, and in the occasional recovery required when hardware or software components fail, tends to outweigh system design objectives for resiliency and high availability. If you would like an informal write-up I prepared looking at some concerns surrounding this, please e-mail me directly:  rsunderman .at. esri.com 

Hope this information is helpful –
RJ