System scalability and performance considerations

11-11-2015 11:12 AM
RJSunderman
Esri Regular Contributor

What is GeoEvent doing when it receives event data

When an inbound connector (input) receives an event payload, an adapter parses the event data and constructs a GeoEvent, using a GeoEvent Definition (which specifies the data structure of each event). An input passes the GeoEvents it constructs to a GeoEvent Service for processing over a message bus; the message queuing implementation is internal to the product and abstracted from the user (the 10.3 / 10.3.1 releases use RabbitMQ). The GeoEvent Service applies event filtering and processing to the events it receives, then delivers the processed GeoEvents to an outbound connector (output) for broadcast.
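
As a concrete (and simplified) illustration, the sketch below shows how an adapter might pair a delimited text payload with a GeoEvent Definition to build an event record. The payload, field names, and the parsing code are all hypothetical; the product's actual adapters are internal and considerably more sophisticated.

```javascript
// Hypothetical payload arriving on a TCP/Text input:
const payload = "TRUCK-017,2015-11-11T11:12:00Z,-117.19,34.05,42.7";

// A hypothetical GeoEvent Definition telling the adapter what each value means:
const vehiclePositionDefinition = {
  name: "vehicle-position",
  fields: [
    { name: "trackId",   type: "String" },
    { name: "timestamp", type: "Date" },
    { name: "longitude", type: "Double" },
    { name: "latitude",  type: "Double" },
    { name: "speedMph",  type: "Double" }
  ]
};

// Pair each delimited value with its field to construct an event record:
function parseEvent(text, definition) {
  const values = text.split(",");
  const event = {};
  definition.fields.forEach((field, i) => { event[field.name] = values[i]; });
  return event;
}

console.log(parseEvent(payload, vehiclePositionDefinition));
```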

To add or update features in a Geodatabase, a GeoEvent outbound connector leverages REST endpoints exposed by a published feature service. The features themselves are stored in an underlying feature class in a Geodatabase, so every add or update involves both HTTP calls and backend database transactions.
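
For a sense of what that looks like on the wire, here is a minimal sketch of the kind of call an output ultimately makes. The server URL, layer index, and attribute names are made up for illustration; the addFeatures operation and its features / f parameters are part of the ArcGIS REST API for feature services.

```javascript
// Hypothetical feature layer endpoint:
const layerUrl =
  "https://myserver.example.com/arcgis/rest/services/Vehicles/FeatureServer/0";

// One feature to add, with a point geometry and a couple of attributes:
const features = [{
  geometry: { x: -117.19, y: 34.05, spatialReference: { wkid: 4326 } },
  attributes: { TRACK_ID: "TRUCK-017", SPEED_MPH: 42.7 }
}];

// addFeatures expects form-encoded parameters: f=json and a JSON 'features' array.
const body = new URLSearchParams({
  f: "json",
  features: JSON.stringify(features)
});

fetch(`${layerUrl}/addFeatures`, { method: "POST", body })
  .then((response) => response.json())
  .then((result) => console.log(result.addResults))
  .catch((err) => console.error(err));
```

Each such request is an HTTP round trip plus a database transaction, which is why feature service throughput becomes the bottleneck discussed below.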

When the GeoEvent extension was first released, event data had to be persisted in a Geodatabase and exposed through a published feature service before a client could visualize the data on, say, a Web Map. At the 10.3 release of the product we introduced the concept of a Stream Service … but more on that in a little bit.

Event Throughput / System Performance

When considering system scalability and the GeoEvent extension, most folks are thinking about the number of event messages they need to process each second. A simple GeoEvent service with a single TCP/Text input sending data out to a TCP/Text output can process thousands of events per second. But such a simple service isn’t doing very much … it isn’t undertaking any computational analytics or spatial relationship comparisons (which can be CPU intensive), it isn’t transacting with a feature service to enrich event data with attributes from a related feature (which involves network latency), and it isn’t doing anything with GeoFences (which can use a lot of available RAM).

Improvements made to the product for the 10.3 release focused on maintaining a high level of GeoEvent throughput when real-time analytics such as spatial filtering (e.g. GeoEvent INSIDE GeoFence) were incorporated into a GeoEvent Service.

Just as important as the number of events you expect to receive each second, or the filtering/processing you plan on performing, is the overall size of each event in bytes. You can easily process 1000 events per second on a single server when the events are fairly small – say just a few simple attributes with a point geometry. If your events include several polygon geometries, each with many dozens of vertices, and/or complex hierarchical attribute structures with lists/groups of values, then each event's "size" will be much larger. More system resources will be required to handle the same number of events, and your events-per-second throughput will be reduced.
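
To get a feel for the difference, here is a quick sketch (with hypothetical payloads) comparing the UTF-8 size of a small point event against a polygon event with many vertices:

```javascript
// A small event: one attribute and a point geometry.
const pointEvent = {
  trackId: "TRUCK-017",
  geometry: { x: -117.19, y: 34.05 }
};

// A larger event: a polygon with 100 vertices (generated synthetically here).
const ring = [];
for (let i = 0; i < 100; i++) {
  ring.push([-117.19 + Math.cos(i) * 0.01, 34.05 + Math.sin(i) * 0.01]);
}
const polygonEvent = {
  trackId: "ZONE-42",
  geometry: { rings: [ring] }
};

// Measure each event's serialized size in bytes.
const bytes = (obj) => new TextEncoder().encode(JSON.stringify(obj)).length;
console.log("point event:  ", bytes(pointEvent), "bytes");
console.log("polygon event:", bytes(polygonEvent), "bytes");
```

The polygon event is an order of magnitude larger, so the same events-per-second rate moves far more data through the message bus, the analytics, and the output.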

Most folks want to use the event data to update features in a feature class through a published feature service. Transactional throughput with a feature service is an acknowledged bottleneck. When you configure a GeoEvent output to add/update features in a feature service you can expect to be limited to just a couple hundred events per second. There are a couple of parameters you can adjust on your GeoEvent outbound connector to throttle event output as you prototype your event transactions, but you typically don’t need to if you are below the 200 – 300 events per second threshold we recommend for updating features in a locally hosted feature service. Please note that event throughput is further restricted when updating a hosted feature service within an ArcGIS Online Organization account.

Regarding system performance, say you were to receive a burst of 1000 events on an input. If you knew, from testing you had conducted, that your local server and database could only update 240 events per second, then you can assume that GeoEvent will need to create a backlog of events and work to process events off the backlog. The 10.3 product release does a better job of this than the 10.2.2 release.

Say your data was not a single burst, but that you expected to receive a sustained 1000 events per second, and your output was still only handling 240 events per second. At 10.3 you can expect that GeoEvent's performance will degrade sharply as the backlog begins to grow. GeoEvent will work to clear the backlog, but it will continue to grow until a system resource becomes saturated and the system becomes unresponsive. This is the behavior we observed during the GeoEvent 10.3 Holistic Event in DC. It could be that you run out of system RAM, or it could be that you saturate the network card. If you are not processing events as fast as you are receiving them, you will have a problem.
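
The arithmetic behind both scenarios is worth making explicit; a quick sketch using the numbers above:

```javascript
// Back-of-the-envelope backlog math using the rates from the text.
const inputRate  = 1000; // events received per second
const outputRate = 240;  // events the feature service output can handle per second

// Burst case: a single burst of 1000 events drains in a few seconds.
const burstSize = 1000;
console.log("seconds to drain burst:", (burstSize / outputRate).toFixed(1)); // ~4.2

// Sustained case: the backlog grows without bound.
const growthPerSecond = inputRate - outputRate; // 760 events/s
console.log("backlog after one minute:", growthPerSecond * 60, "events"); // 45,600
```

A burst is recoverable; a sustained deficit is not, and some resource will eventually give out.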

Stream Services

Stream Services, available with the GeoEvent 10.3 release, provide an alternative to updating features in a feature service. The key when thinking about Stream Services is to separate data visualization from data persistence. We have another tutorial, Stream Services in GeoEvent, which you might want to take a look at if you are interested in streaming data out for visualization without persisting it in a geodatabase.

Stream Services rely on WebSockets. The JSON broadcast by a Stream Service can be received by a custom JavaScript application; the Stream Service concept is supported by a developer API. Stream Services can also be added to Web Maps as feature layers to display event data in near real-time.

The reason behind the development of Stream Services was that, without them, your only option for data visualization was to first persist your event data as features in a Geodatabase feature class, then have your client application query the feature class through a published feature service. An ArcGIS Server instance running on a local GIS Server with a local RDBMS supports up to about 240 events per second. That's about one tenth of what a typical GeoEvent Service is able to process each second. Streaming the event data out as JSON was one way to provide data visualization for folks who didn't require data persistence.

WebSockets also allow a client to specify spatial and/or attribute filters, which the server will honor. This preserves network bandwidth by limiting the data passed over the socket connection. It is up to the client application to know what it can support and to specify filters that limit the amount of data it will receive.
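
As an illustration, the sketch below shows a browser client opening a WebSocket to a Stream Service and requesting server-side filtering. The host, port, service name, and the exact shape of the filter message are assumptions made for the sketch; consult the Stream Services documentation for the precise endpoint and filter syntax.

```javascript
// Hypothetical Stream Service subscription endpoint:
const url =
  "wss://myserver.example.com:6143/arcgis/ws/services/Vehicles/StreamServer/subscribe";
const socket = new WebSocket(url);

socket.onopen = () => {
  // Ask the server to send only events matching an attribute filter and
  // falling inside a bounding box, so the browser isn't overwhelmed.
  // (Illustrative message shape - not a documented contract.)
  socket.send(JSON.stringify({
    filter: {
      where: "speedMph > 50",
      geometry: { xmin: -118.0, ymin: 33.5, xmax: -116.5, ymax: 34.5,
                  spatialReference: { wkid: 4326 } }
    }
  }));
};

socket.onmessage = (message) => {
  const event = JSON.parse(message.data);
  console.log("received event:", event);
};

socket.onclose = () => console.log("stream closed");
```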

In the case of a custom JavaScript web application, if GeoEvent sends more data than the browser can handle, the browser will simply freeze and may crash. The socket connection will close when that client dies, and GeoEvent will continue broadcasting events through the Stream Service for receipt by other clients (currently connected or in the process of connecting).

System Resources / System Sizing

There’s actually quite a bit you need to consider when thinking about scalability.

  • Physical Server
    When leveraging virtualization, in my experience, it is most important that the physical server hosting the virtual machines adequately supports the virtualization. The physical server is going to need considerable memory and a substantial processor if you plan on instantiating multiple VMs, each with 4 CPU cores and 16GB of RAM. Also, the more physical servers you incorporate in your architecture, the more important the network card will become. At the 10.3 product release we support high-availability and high-velocity concepts by configuring an ArcGIS Server cluster with multiple servers and installing GeoEvent on each server in the cluster. We have a Clustering in GeoEvent tutorial which introduces these concepts.

  • RAM
    In a Holistic lab we conducted back in September, we discovered that the type of RAM is as important as the amount. It's not sufficient to consider a server as simply having 16GB of RAM. Premium DDR4 SDRAM will provide 2133 – 4266 MT/s (million transfers per second) whereas DDR3 RAM from just a couple of years ago will provide only 800 to 2133 MT/s. You may not have any control over the type of RAM in the physical server hosting the VMs being provided to you – but it matters.

    If you are importing GeoFences into the GeoEvent processor, know that these geometries are all maintained in memory. If you have thousands of complex GeoFences with many vertices, that is going to consume a lot of RAM. Significant improvements were made at the 10.3 product release to allow GeoEvent to better handle the spatial processing needed to determine the spatial relationship between an event's geometry and a GeoFence, so event throughput is much better – but a high volume of spatial processing can consume significant CPU (see the sketch after this list).

  • CPUs
    The number of CPU cores is generally important once you begin designing GeoEvent Services with actual processing – it is not as important when benchmarking raw event throughput. For example, a benchmark taking event data from a TCP socket and sending the data to a TCP socket doesn't require much CPU; a large amount of premium RAM is more important in this case. Projecting an event's geometry, enriching events, calculating values – these analytics will all place load on your CPU resource.
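
To illustrate why the GeoFence spatial processing mentioned above is CPU-intensive, here is a naive ray-casting point-in-polygon sketch. This is not GeoEvent's implementation; it simply shows that checking one event against many GeoFences touches, in the worst case, every vertex of every fence.

```javascript
// Naive ray-casting test: is point (x, y) inside a polygon ring?
function pointInRing(x, y, ring) {
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    const intersects =
      (yi > y) !== (yj > y) &&
      x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (intersects) inside = !inside;
  }
  return inside;
}

// Checking one event against N GeoFences is O(N * vertices) work in the
// worst case - and that cost is paid for every event received each second.
function fencesContaining(x, y, geoFences) {
  return geoFences.filter((fence) => pointInRing(x, y, fence.ring));
}
```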

I wouldn't be surprised to learn that a physical server with only 4 cores and 32GB of premium RAM outperformed a virtual cluster of three VMs each with 4 cores and 16GB of RAM. The hosting server might be an older machine with DDR3 or DDR2 generation RAM. The hosting server might be supporting other virtualization. Network connections between physical machines might benefit from an upgrade.

Given the above, you can probably understand why we recommend that you dedicate an ArcGIS Server site for real-time event processing. You might have heard this referred to as a "silo'd" approach to your system architecture, in which one ArcGIS Server site is set up to handle your map service, feature service, geoprocessing, and map tile caching, with a second ArcGIS Server site set up for real-time event processing and stream service publication.

There are many factors you will need to consider when making system architecture and hardware purchasing decisions. Videos from technical workshops presented at our Developer Summit in Palm Springs as well as the International User Conference in San Diego are available online at videos.esri.com ... search for "geoevent best practices" to find a video which presents some of these considerations.

The above is, of course, specific to the ArcGIS GeoEvent Extension for Server. A much more comprehensive look at system architecture design strategies is provided by Dave Peters in his blog: System Architecture Design Strategies training reference material.

Hope this information is helpful -

RJ

3 Comments
ErikLanhammar
Esri Contributor

Wise words RJ, much appreciated!

/Erik

RJSunderman
Esri Regular Contributor

When conducting our benchmarks for the upcoming 10.5 release we encountered an unexpected drop in the raw number of events per second we can ingest in a TCP/Text input and send out a TCP/Text output (without conducting any processing or filtering on the event data).

The issue turned out to be that we had accidentally staged the software on a different type of Amazon EC2 instance than what we had used in previous benchmark tests.

It turns out that the c4.2xlarge class supports Intel® AVX2 whereas the c3.2xlarge class does not. Support for AVX2 can double the number of floating-point operations per second an application can perform. For our benchmarks, running without AVX2 meant that event throughput for simple events which included coordinates for a point geometry fell from over 4,000 events per second to well below 3,000 events per second.

| Instance Type | vCPU | Memory (GiB) | Storage (GB) | Networking Performance | Physical Processor | Clock Speed (GHz) | Intel AVX† | Intel AVX2† | Intel Turbo | EBS OPT | Enhanced Networking† |
| ------------- | ---- | ------------ | ------------ | ---------------------- | ------------------ | ----------------- | ---------- | ----------- | ----------- | ------- | -------------------- |
| c3.2xlarge    | 8    | 15           | 2 x 80 SSD   | High                   | Intel Xeon E5-2680 v2 | 2.8            | Yes        | -           | Yes         | Yes     | Yes                  |
| c4.2xlarge    | 8    | 15           | EBS Only     | High                   | Intel Xeon E5-2666 v3 | 2.9            | Yes        | Yes         | Yes         | Yes     | Yes                  |

I mention this because, looking at the two instance specifications in the table above, it would be very easy to overlook something "minor" like support for Intel AVX2 technology (note that the c3.2xlarge instance does not support AVX2).

As with our discovery that different generations of RAM (e.g. DDR3 vs. DDR4) make a real difference, the capabilities supported by your chosen server's processor can be very significant.

Cross Reference:

- https://aws.amazon.com/ec2/instance-types

- http://www.intel.com/content/www/us/en/benchmarks/performance-xeon-e5-v3-advanced-vector-extensions-...

JohnDye
Occasional Contributor III

RJ, this is such an incredibly valuable write up. Thank you so much for posting it.