ArcGIS Server Performance Strategies

AaronLopez
Esri Contributor
10-06-2023

Performance: Challenges and Strategies

ArcGIS Enterprise provides a robust and scalable platform for delivering GIS resources to users through services and web applications. Sometimes, however, deployments may experience slow performance from the published resource endpoints.
 
Since ArcGIS is versatile, there are many ways, configurations, and options for making these resources available for consumption.
Obtaining good performance is not always as easy and straightforward as toggling a "fast = true" setting.
What is a handy feature for some administrators could be a configuration that, while helpful, interferes with performance for others.
 
In the real world, performance is often a function of several factors and configuration settings. Having strategies to understand and overcome the most common ones can help put the deployment on the road to faster performance.

What is Performance?

Performance is a description of how fast (or slow) a server or service operates for a particular function of interest. How long it takes an ArcGIS Server service to complete an operation like query, applyEdits, or export and then send the response back to the client that requested it is an example of performance.
 
This duration is measured in seconds or milliseconds and is commonly referred to as the response time.
[Image: Grayson_running_the_4x100.jpg]
Although called "response time", there are several important steps that make up the overall time a client like a web browser or ArcGIS Pro spends waiting for a requested server resource:
  1. DNS lookup of the server's hostname
  2. TCP/IP connection between client and server
  3. SSL/TLS handshake between client and server
  4. Sending the request to the server
    4.1. Server processing the request
  5. Receiving the response from the server
    5.1. For large responses, time-to-first-byte (TTFB) can be used instead to measure response time

Typically, the bulk of the time is spent at 4.1. This is where the server is working on the response.
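As an illustration, a few lines of Python with the pycurl library (which reports per-phase timings for a request) can measure each of these steps for a single call. This is a minimal sketch; the service URL is a placeholder for an endpoint in your own deployment.

```python
import io
import pycurl

# Placeholder URL: substitute a REST endpoint from your own deployment
url = "https://gisserver.example.com/arcgis/rest/services/SampleWorldCities/MapServer?f=json"

buffer = io.BytesIO()
c = pycurl.Curl()
c.setopt(pycurl.URL, url)
c.setopt(pycurl.WRITEDATA, buffer)
c.perform()

# Each value is cumulative time (seconds) from the start of the request
print(f"1. DNS lookup:          {c.getinfo(pycurl.NAMELOOKUP_TIME):.3f}s")
print(f"2. TCP connect:         {c.getinfo(pycurl.CONNECT_TIME):.3f}s")
print(f"3. TLS handshake:       {c.getinfo(pycurl.APPCONNECT_TIME):.3f}s")
print(f"5.1 Time to first byte: {c.getinfo(pycurl.STARTTRANSFER_TIME):.3f}s")
print(f"Total response time:    {c.getinfo(pycurl.TOTAL_TIME):.3f}s")
c.close()
```

The gap between time-to-first-byte and the TLS handshake is roughly the server processing time (step 4.1).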

This Community Article will explore areas that can impact this portion of the response time.

Why is Performance Important?

Simply put: the faster the performance, the lower the response time. 

The lower the response time, the more requests the server can support at one time.
This higher concurrency of requests translates to greater scalability, which ultimately means support for more users.
 
Note: Performance is measured as the time needed to complete a single operation (e.g., 0.238 seconds to execute a feature query request).
Scalability, on the other hand, is frequently measured as transactions or operations over time (e.g., requests/sec or operations/hour).
 
Note: When talking about getting better or "more" performance, the implication is achieving lower response times. Better scalability, on the other hand, implies a higher rate of throughput (e.g., more operations/hour).
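To make the relationship between the two concrete, here is a back-of-the-envelope calculation in Python. The numbers are illustrative only, assuming each instance handles one request at a time:

```python
# Illustrative figures: a single busy instance can sustain roughly
# 1 / response_time requests per second.
response_time_s = 0.238   # seconds per feature query (example from above)
instances = 4             # concurrently running service instances

per_instance_rps = 1 / response_time_s
total_rps = per_instance_rps * instances

print(f"~{per_instance_rps:.1f} requests/sec per instance")
print(f"~{total_rps:.1f} requests/sec across {instances} instances")
print(f"~{total_rps * 3600:,.0f} operations/hour")
```

Halving the response time doubles the achievable throughput with the same hardware, which is why performance work pays off twice.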

What is Acceptable Performance?

It depends. 

The criteria for classifying an item as having fast (or slow) performance can vary greatly by organization, by the published services, and by the operations users are expected to call.

It is not uncommon to have different response time goals for ArcGIS Server functions or user application workflows (several requests grouped together to represent one operation).

Any number of seconds is fine for a requirement, but keep in mind that aggressive goals may take more hardware as well as more extensive tuning and strategies (the subject of this Article) to achieve.

How is Performance Measured?

Response time is the key metric for determining if performance is meeting or staying within or under a target requirement.

Common strategies for measuring are:

  • Single user interaction
    • Through the web browser or ArcGIS Pro
      • This is the easiest place to start
        • If there is no understanding yet on performance, start here
  • Statistically analyzing large volumes of response times
    • Through log analysis tools, the ArcGIS Server Manager Statistics page, or other observability utilities
      • These approaches have the benefit of analyzing real-world requests that users have already executed against the deployment
    • This is discussed in more depth later in the Article
  • Load Testing

When analyzing logs or running tests, a common strategy is to use statistics to break down large numbers of response times. The Average, the 90th (or 95th) percentile, the Minimum, and the Maximum help provide an understanding of the performance users may have been experiencing.
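For example, a short Python sketch can summarize a batch of response times into exactly these statistics. The sample values are made up for illustration; in practice they would come from logs or a test run.

```python
import statistics

# Hypothetical response times in seconds, e.g., parsed from access logs
times = [0.21, 0.25, 0.19, 0.33, 1.40, 0.28, 0.22, 0.95, 0.24, 0.30]

def percentile(data, pct):
    """Nearest-rank percentile of a dataset."""
    ranked = sorted(data)
    index = max(0, round(pct / 100 * len(ranked)) - 1)
    return ranked[index]

print(f"Average:         {statistics.mean(times):.3f}s")
print(f"90th percentile: {percentile(times, 90):.3f}s")
print(f"95th percentile: {percentile(times, 95):.3f}s")
print(f"Min: {min(times):.3f}s  Max: {max(times):.3f}s")
```

Note how the average (0.44s here) hides the slow outliers that the 90th and 95th percentiles expose; that is why percentiles are typically part of a response time goal.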

Capturing Response Times -- Web Browser

How response times are captured through single user interaction is a fun topic of discussion.

The easiest approach to capture response times of REST requests from a web application is with the browser's "developer tools" functionality. All major browsers offer some view of the requests, responses, and times being sent and received. This duration can give an idea of how fast a request or operation (potentially multiple requests) performed. Decisions can then be made as to whether this is acceptable or needs to be improved.
 
[Image: chrome_devtools1.png]

Capturing Response Times -- ArcGIS Pro

While ArcGIS Pro also communicates with ArcGIS Enterprise via REST, it does not have a built-in equivalent of the developer tools. For capturing response times, you'll need a separate HTTP debugger. There are many available; a popular choice is Fiddler.

With Fiddler installed on the same machine as ArcGIS Pro, it can be configured to intercept the traffic. Request parameters, response content, and times can be similarly captured and examined.

[Image: arcgispro_fiddler1.png]

Are Goals Required for Improving Performance?

Absolutely not. GIS Administrators can always analyze, tune and apply best practices to the system even if no official performance requirements are in place.
However, it is highly recommended to understand what performance your system is initially delivering (typically referred to as baseline response time numbers) before adjustments are made. This way you can determine whether the applied changes are having a positive effect.

Common Performance Challenges and Potential Strategies

Service Pool Types and Instances

One of the most frequent areas where ArcGIS administrators encounter performance challenges is setting the appropriate number of instances for dedicated services. But first, let's review the different types. There are three service instance types, each with its own strengths.

  • Dedicated
  • Hosted
  • Shared

As a GIS Administrator it is important to be able to identify the instance type for a service.

This can be easily viewed within ArcGIS Server Manager, under Manage Services:
 

[Image: ArcGISServer_Manager_Dedicated_Instances1.png]

Selecting the Appropriate Type

Choosing a dedicated service type is ideal when the most control over performance and scalability is desired. With this type, the administrator can:

  • Set the maximum number of instances to the number of CPU cores to take full advantage of the available processing capability of the ArcGIS Server machine (via the maximum)
  • Conserve memory when idle (via the minimum)
  • Adjust for predictable performance by setting the min and max to the same value

Dedicated services are ideal for heavily requested services or services where performance is paramount.

Hosted services do not utilize dedicated ArcSOC instances and auto-scale as needed. However, the ArcGIS capabilities available to hosted services are limited, as they are used primarily for feature queries.

Shared services are great for accessing items that are requested less frequently. They typically have more ArcGIS capabilities available to them, but not all ArcGIS functionality is available (e.g., branch versioning). A shared instance pool is the default type when publishing a service with ArcGIS Pro.

Note: It is important to reiterate that if the selected service type is set to dedicated, the (minimum and maximum) number of instances should be evaluated to ensure they are optimal. The default when publishing in ArcGIS Pro is to only use a maximum of 2 (instances). This might be too low and inadequate for services where performance/scalability are important.
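The minimum and maximum can be changed in ArcGIS Server Manager, or scripted against the ArcGIS Server Administrator API. The sketch below is a minimal illustration of the latter; the server URL, credentials, and service name are placeholders, and error handling is omitted.

```python
import json
import requests

# Placeholder values: substitute your own server URL, credentials, and service
ADMIN = "https://gisserver.example.com:6443/arcgis/admin"
SERVICE = "SampleWorldCities.MapServer"

# 1. Generate an administrative token (built-in identity store assumed)
token = requests.post(f"{ADMIN}/generateToken", data={
    "username": "siteadmin", "password": "changeme",
    "client": "requestip", "f": "json"}).json()["token"]

# 2. Read the current service definition
service = requests.post(f"{ADMIN}/services/{SERVICE}", data={
    "token": token, "f": "json"}).json()

# 3. Raise the dedicated instance pool (the Pro publishing default max is 2)
service["minInstancesPerNode"] = 2
service["maxInstancesPerNode"] = 4   # e.g., match the machine's CPU cores

# 4. Apply the edit; note that editing restarts the service
result = requests.post(f"{ADMIN}/services/{SERVICE}/edit", data={
    "service": json.dumps(service), "token": token, "f": "json"}).json()
print(result)  # expect {"status": "success"}
```

Because the edit operation restarts the service, apply changes like this during a maintenance window for heavily used services.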

Focus the Map

Optimizing the map is not a new strategy but a relatively easy one to follow for getting better performance from your deployment.
When the map is focused on its primary purpose and presentation, the system does not have to do unnecessary work. Remember, the web is a multiuser platform. Making the display of the dynamic data as streamlined as possible is key to good performance (and scalability). With potentially many requests occurring at the same time, the content being shared needs to be as efficient as possible.

[Image: Kurvimeter_1_fcm.jpg]

Map Strategies

  • Choose an Optimal Default Extent
    • If the map is providing data on Los Angeles, the default extent should not be showing all of California
    • If many different map scales are required, use scale dependencies and generalization to limit detail to the scales where it is needed most (e.g., the largest scales)
  • Remove unneeded layers
    • Consider removing nice-to-have data layers
      • At the very least, unselect them and have users opt in to enable them
  • Purposely limit what users can do with a service
  • Avoid projecting on the fly
    • Use the same coordinate system for the data frame and the data
  • Definition queries
    • Ensure indexes are in place if comparison logic is applied to attribute columns (see the sketch after this list)

Note: "Focusing the map" applies to maps, apps and services

Software Releases

A particular version of ArcGIS Enterprise (and its related solutions) can have a handful of patches after its initial base release. These patches can offer performance improvements as well as functionality and security fixes. 

It is highly recommended to periodically check the Esri Patches and Updates site or run the "Check for ArcGIS Enterprise Updates" tool. Then, apply the updates at the appropriate time.

Resource Contention and Expansion

There are times when best practices and strategies for performance are applied but lower response times and higher scalability are still required. Perhaps the hardware currently running ArcGIS Server is simply exhausted, with processing power or memory having become the bottleneck for improving the user experience.

For such situations, you need to consider expansion and/or upgrading the hardware.
[Image: My_Opera_Server.jpg]

Scalability

For the ArcGIS Server and ArcGIS Web Adaptor tiers of the deployment, you generally have the following options for improving scalability using hardware: 
  • Scaling up
    • Adding more resources to the existing machine (e.g., additional processing cores)
      • Additional memory can also assist with scaling by allowing the deployment to have more ArcSOC instances running concurrently 
    • Ideal for deployments such as: cloud, virtualization, Kubernetes
  • Scaling out
    • Adding more machines of equal resource capacity
    • Ideal for deployments such as: on-premise, cloud, virtualization, Kubernetes

Performance

To improve performance with hardware, there is typically just one option:

  • Obtaining faster processing cores 
    • Ideal for deployments such as: cloud, Kubernetes

The system's paging configuration can also impact scalability, even when ample memory has been added to a system. Although this is an operating system setting and not hardware, it can play a crucial part in running many concurrent ArcSOC instances. Be sure to set it accordingly to handle a workload with many instances or with instances that have a large memory footprint.
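As a rough way to see how close a machine is to needing more memory or page-file headroom, a sketch like the following tallies the running ArcSOC processes and their combined footprint. It uses the psutil package, and the process name assumes a Windows ArcGIS Server machine:

```python
import psutil

# Count ArcSOC processes and sum their resident memory (Windows name assumed)
socs = [p for p in psutil.process_iter(["name", "memory_info"])
        if p.info["name"] == "ArcSOC.exe"]
used_gib = sum(p.info["memory_info"].rss for p in socs) / 2**30

vm, swap = psutil.virtual_memory(), psutil.swap_memory()
print(f"ArcSOC instances: {len(socs)}, using ~{used_gib:.1f} GiB")
print(f"Physical memory: {vm.percent}% used; page file: {swap.percent}% used")
```

If the page file is heavily used while physical memory is already saturated, adding instances will hurt rather than help.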

There can be situations where performance is limited and appears to be CPU bound (e.g., a bottleneck due to limited processing power). Further inspection may reveal the "culprit" to be one or more slow queries that were suboptimal to begin with, or an expensive operation called too frequently (e.g., through a periodic administrative task). In such cases, it may be more effective to address the "bad" queries to improve performance and scalability.

Note: These expansion strategies are for overcoming general processing and memory limitations. Disk and network (bandwidth, latency) resources can also be bottlenecks; for some environments, these can be more complicated to expand, requiring additional upgrade steps.

A General Approach to Scaling

[Image: Samsung-1GB-DDR2-Laptop-RAM.jpg]

For ArcGIS Server, it can be easier to scale up first, then out. The reason is the Site configuration and directories, which would need to be migrated to shared storage if the architecture is switched from a single-machine to a multi-machine deployment.

How much should you scale up...two servers, five servers? Without details on average response times and the anticipated number of users to support, it’s difficult to provide a concise answer.

However, a simplistic, general approach would be to just try doubling the current amount (memory and/or physical processing cores) and observing the impact.

For many cases, this probably works well up to 16 CPUs. At that point, adding another machine may be more advantageous.
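One rough way to reason about "how much" is Little's Law: the number of requests in flight at once is approximately the arrival rate multiplied by the average response time. The sketch below applies it with made-up numbers; measure your own rates and response times before using a result like this for sizing.

```python
import math

# Illustrative inputs: measure these for your own deployment
target_rps = 25          # expected peak requests per second
avg_response_s = 0.4     # average response time per request, in seconds

# Little's Law: concurrent requests in flight = arrival rate x response time
concurrent = target_rps * avg_response_s
instances_needed = math.ceil(concurrent)   # one busy instance per in-flight request

print(f"~{concurrent:.1f} requests in flight -> ~{instances_needed} instances/cores")
```

A result of 10 in-flight requests, for example, suggests roughly 10 busy instances (and cores) at peak, before any headroom is added.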

 
Note: Your current product license may impact how many physical cores you can use with ArcGIS Server. Check with your Esri Account Manager for more details.
 
Note: For support with capacity planning or architecture design (which can help provide a detailed understanding and estimate on the required hardware resources for a given set of workflows), contact Todd Jarrard (tjarrard@esri.com) in Esri Professional Services.

Observability

"Quantifying ArcGIS" is a great strategy...a personal favorite. It defines what resources were being requested and how fast were the responses to fulfill these requests. It is important for obtaining an understanding of general system performance. If system resource utilization can also be captured, the analysis can be further elevated.

[Image: windows_resource_tool.png]

There are many utilities available for periodically examining your system. Some tools read the access logs and focus primarily on the statistical performance of requests made by users. Others may poll Server's statistics page or capture the CPU utilization (of ArcGIS Server or the database) over a duration of time. Which approach is best? If observability is currently not taking place, then most likely any one of them would be a good addition. They all help provide some insight into the performance and health of the deployment.
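As one concrete example of the log-reading approach, ArcGIS Server's Administrator API exposes a logs/query operation. The minimal sketch below (placeholder URL; token generated as in the instance-pool sketch earlier) pulls recent messages and prints the ones that report an elapsed time:

```python
import requests

ADMIN = "https://gisserver.example.com:6443/arcgis/admin"   # placeholder
token = "..."  # generate an admin token as shown earlier

resp = requests.post(f"{ADMIN}/logs/query", data={
    "level": "FINE",     # FINE-level messages include request timing details
    "pageSize": 100,
    "token": token, "f": "json"}).json()

# Print messages that carry an elapsed time (in seconds) for a request
for msg in resp.get("logMessages", []):
    if msg.get("elapsed"):
        print(msg["source"], msg["elapsed"], msg["message"][:80])
```

Feeding these elapsed values into the percentile sketch from earlier in the Article turns raw logs into the baseline statistics discussed above.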

Once analysis has been conducted for a deployment, reports can typically be generated that highlight which map services might be of interest due to:

  • The observed response times being slower than expected
  • The number of requests issued for the resource
  • Both response time and number of requests

For an ArcGIS Site with many services, knowing which ones are statistically slow or are consuming the most resources helps focus tuning efforts. With such reports, the GIS administrator has turned data into valuable information and is now better informed when making decisions for improving the user experience. That said, examining logs and statistics is just one (important) slice of the analysis pie.

A Challenge with Common Observability Tools

Many tools for system observability and monitoring focus the analysis on requests and responses for services. This is a good approach and definitely assists administrators with quantifying ArcGIS, but it can have a limitation. The limitation appears with the assumption that a slow map service can be "fixed" by simply adding processing cores. More cores might improve some aspects of the situation, but it is recommended that the service be examined (or reexamined) in more depth before more resources are obtained.

This goes back to the "Focus the Map" section, for example:

  • Ensure the data for the service is not being drawn at too small a scale
  • Avoid suboptimal queries

Detailed query analysis can help reveal these behaviors, which can get masked by general service reporting. However, while breaking down service request parameters and the underlying queries can improve the analysis, it can add complexity to the reporting itself (e.g., more time to execute, more views to look at, and the need to understand those views). Additionally, not all observability tools perform this type of inspection.

Some recent efforts that are gaining traction attempt to tackle this issue. They are based on a bottom-up approach to analysis where the starting point is the underlying database queries themselves, through a mechanism known as "query datastore". Query datastore analysis is powerful and does not impact database performance the way a trace can, but it does require some knowledge of the queries themselves and their purpose. Look to this type of analysis capability in the future to help get the most from your observability tools.

Conclusion

There is no single item to easily adjust for boosting performance and scalability of an ArcGIS Enterprise Site. However, this Article lists some common strategies that can be applied together for improving it. It is also important to understand these are items that should be periodically revisited and acted upon. User habits change over time as does the popularity of a web application or service. Resources that were assigned to a particular service can be reevaluated or reduced to make room for the next featured item in your Site.

ArcGIS performance analysis can be fun, but it is also a continuous effort for maintaining the best user experience.

 

 

Attribution

 

Resource: File:Grayson_running_the_4x100.jpg

Description: Grayson running the first leg of the 4x100 at the 2010 Tigered invite

Author: Graysonbay

Created: 02:02, 29 November 2010

License: This file is licensed under the Creative Commons Attribution 3.0 Unported license

 

Resource: File:Kurvimeter_1_fcm.jpg

Author: Frank C. Müller, Baden-Baden

License: This file is licensed under the Creative Commons Attribution-Share Alike 4.0 International license.

 

Resource: File:My_Opera_Server.jpg

Description: A server used for the My Home

Author: William Viker, william.viker@gmail.com (c) 2006

License: The copyright holder of this file allows anyone to use it for any purpose, provided that the copyright holder is properly attributed. Redistribution, derivative work, commercial use, and all other use is permitted.

 

Resource: File:Samsung-1GB-DDR2-Laptop-RAM.jpg

Description: A 1 gigabyte stick of DDR2 667 MHz (PC2-5300) laptop RAM, made by Samsung and pulled from a 2007 MacBook laptop.

Author: Evan-Amos

Created: 1 August 2018

License: Public Domain

 

4 Comments
ZachBodenner
MVP Regular Contributor

Hi @AaronLopez, thanks for putting all of this in writing! I'm having some server performance issues and I'm using some of your suggestions to help out. I have a question that I hope you can answer in regards to optimal web map config. Let's say I have a web map that has 20 layers. 10 of them are from web service A, which is used exclusively in this map, and 10 are from web service B, which is used in a few more different maps. I have the ability to re-publish/overwrite/otherwise change the configuration of the services themselves. Which scenario makes more sense:

1. Leave the set up as-is; consuming the two different web services in one map does not meaningfully impact performance.

2. Republish web service A to also contain the layers present in web service B; it boosts performance to have all of the layers coming from the same service.

AaronLopez
Esri Contributor

Hi @ZachBodenner,
My take is that it would be more efficient (and potentially faster) to have all of the layers coming from the same service. This assumes all the layers are using the same connection to the data under the hood.
With this approach the web map can be more easily managed as there is just one service to optimize and tune (e.g., number of instances). Granting permissions in Portal for ArcGIS should also be simpler.

Of course, the elephant in the room is 20 layers. lol.
If all 20 layers are required to be there for functionality, that is one thing. But, if possible, consider making some of them opt-in or enabling some based on the map scale.

Hope that helps.
Aaron

ZachBodenner
MVP Regular Contributor

It does thanks (20 is hypothetical - we don't have many maps with that many services, and those that do definitely have scale dependencies). You're definitely right about the easier to tune/manage part, though I wonder if you could possibly expand on why it would be "potentially faster." Is that just because it's easier to devote dedicated instances to the service?

AaronLopez
Esri Contributor

Hi @ZachBodenner,

> could possibly expand on why it would be "potentially faster."
> Is that just because it's easier to devote dedicated instances to the service?
Yes, but I think it has more to do with not splitting time across different services' instances for retrieving the same data. If all 20 dedicated instances belong to one service, then there is a greater chance of improved performance from the benefit of "cache hits". There is "cache" all over, but the one I am thinking of is at the ArcSOC level (depending on the service, there can be a workspace cache that can be taken advantage of).
In the end, the performance of both configurations is probably really close, but if I were to go with one (without performance testing the differences of the two), I would pick the layers coming from the same service.

Aaron