Recommended Strategies for Load Testing an ArcGIS Server Deployment
Load testing of ArcGIS Server and its services can be conducted for several reasons: to understand how published services perform, to determine whether there are hardware limitations or bottlenecks, to observe the stability of a service over time, or to measure the scalability of a service.
The following list of testing strategies is a common practice within Esri Professional Services and is followed by many analysts.
This list can be used with any load testing software and acts as a guide for the tester/analyst to achieve the best results and to maximize the testing effort.
- Start with a test plan
Most testing software can produce some type of report once the test has completed. This report should answer the questions the test plan posed, for example:
Could the ArcGIS service utilize all the CPU (e.g. CPU-bound)?
Could the ArcGIS service deliver enough throughput?
Did the highest level of throughput deliver a response time that met our performance requirement?
Defining a purpose for a test helps keep the testing effort focused on a goal.
- Interact with your application (and ArcGIS services) before load testing them
If you are the only user on your system and the ArcGIS services respond very slowly, there is no need to conduct a test under load. The next step should be to tune and optimize the ArcGIS services for performance.
- Start a load test at step 1
Using 1 as an initial load step can assist your post-test analysis.
Step 1 (or one concurrent test thread) represents the best-case scenario for your test. This is your baseline and is a good measuring stick to understand how well or poorly the ArcGIS service scaled as pressure increased (e.g. when additional test threads were added).
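As a sketch of this idea, the stepped load below starts at one test thread so the first step records the baseline. The `send_request` callable is a hypothetical stand-in for an actual HTTP request to the service, and the step values and durations are assumptions for illustration, not settings from any particular test tool:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_step_load(send_request, steps=(1, 2, 4, 8), step_duration_s=300):
    """Drive an increasing number of concurrent test threads, starting at 1
    so the first step captures the best-case (baseline) response time."""
    results = []
    for threads in steps:
        latencies = []          # list.append is thread-safe in CPython
        deadline = time.monotonic() + step_duration_s

        def worker():
            while time.monotonic() < deadline:
                start = time.monotonic()
                send_request()  # hypothetical: e.g. an HTTP GET to the map service
                latencies.append(time.monotonic() - start)

        with ThreadPoolExecutor(max_workers=threads) as pool:
            for _ in range(threads):
                pool.submit(worker)
        results.append({
            "threads": threads,
            "requests": len(latencies),
            "avg_response_s": sum(latencies) / len(latencies),
        })
    return results
```

Comparing `avg_response_s` at each step against the step-1 baseline shows how the service scaled as pressure increased.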
- Collect hardware utilization from all machines involved in the test
Most test software provides the ability to collect hardware utilization from the servers and from the test client itself. This can be valuable for understanding resource consumption and identifying bottlenecks (e.g. the resources on the test client can also be a limiting factor).
However, despite this feature, collecting hardware utilization through WMI is not always possible due to permission restrictions or limited network access (e.g. when connecting through firewalls/routers).
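When the test tool cannot reach the servers over WMI, a minimal fallback is to sample utilization locally on each machine. The sketch below polls the Unix 1-minute load average with the standard library; it is an illustrative stand-in, not a replacement for a proper monitoring agent:

```python
import os
import time

def collect_load_samples(count, interval_s=5.0):
    """Poll the OS 1-minute load average at a fixed interval while a test
    runs. os.getloadavg() is Unix-only; a Windows deployment would read the
    equivalent performance counters (e.g. through WMI) instead."""
    samples = []
    for _ in range(count):
        samples.append({"timestamp": time.time(),
                        "load_1m": os.getloadavg()[0]})
        time.sleep(interval_s)
    return samples
```

Running a sampler like this on the server and on the test client gives the per-machine utilization history to correlate with the test timeline.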
- Test individual ArcGIS services first
If a specific web application uses more than one ArcGIS service, test and tune each one separately.
This approach makes it easier to identify which services may have bottlenecks or limitations that prevent them from utilizing the available hardware.
If an ArcGIS service cannot utilize all the available CPU hardware of ArcGIS Server, the tester/analyst should notify the appropriate person that a tuning opportunity exists in the deployment.
Also, avoid starting the testing effort with the full application workflow as it can be difficult to spot potential bottlenecks when many ArcGIS services are tested at once.
- Test as physically close to the deployment as possible
Try not to have the test “simulate” the Internet. Testing as physically close to the deployment as possible can help provide the best understanding of what the server hardware can deliver.
Purposely introducing network latency or poor bandwidth will add noise to a test and make it difficult to recognize the full capability of the ArcGIS services and the servers they are running on.
- Step and test duration
Tests do not need to run for 8 hours to provide useful information on the ArcGIS service in question. However, it is recommended to avoid running too short a load test. This comes down to choosing a step duration that provides the right amount of information; in other words, recording enough request samples to get a “good” average.
The appropriate length is typically tied to the response time: a fast service can deliver many requests within a 5-minute step, while a slow service may need a 15-minute step to record just as many values.
As a tester, you may not always get the step and test duration right with your first estimation and may need to adjust and rerun the test.
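The arithmetic behind that estimate is simple. For a closed-loop test, where each thread waits for a response before sending its next request, the expected sample count per step is roughly:

```python
def samples_per_step(avg_response_s, threads, step_duration_s):
    """Rough sample count for one step of a closed-loop test, where each
    test thread waits for a response before issuing its next request."""
    return int(step_duration_s / avg_response_s * threads)

# With one thread, a 0.5 s response records ~600 samples in a 5-minute step,
# while a 3 s response needs a 15-minute step to record ~300.
```

If the first run records too few samples per step for a stable average, lengthen the step duration and rerun.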
- Be mindful of the ArcGIS Server log level
Although the ArcGIS Server logs can provide a great deal of information to analyze, it is important to understand that the higher log levels of VERBOSE and DEBUG can slow down the performance of a very busy site and are not recommended settings for a production environment, whereas WARNING (the default) provides the best possible service performance.
However, a setting of FINE is a good compromise between useful analytical information and speed.
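For reference, the log level can be changed through the ArcGIS Server Administrator API's `logs/settings/edit` operation. The sketch below only builds the POST request rather than sending it; the endpoint path and the `logLevel`/`f`/`token` parameters follow the Admin REST API as documented, but verify them (and any additional required settings) against your Server version. The host name is hypothetical:

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_log_level_edit(admin_url, token, log_level="FINE"):
    """Build (but do not send) the POST for the Admin API's
    logs/settings/edit operation. Sending it requires a valid admin token
    obtained from the generateToken endpoint."""
    body = urlencode({"logLevel": log_level, "f": "json", "token": token})
    return Request(f"{admin_url}/logs/settings/edit",
                   data=body.encode("utf-8"), method="POST")
```

Remember to set the level back to WARNING (or FINE) after the analysis is complete.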
- Traditional ArcGIS services can be tuned within (ArcGIS) Server
Before load testing a traditional ArcGIS service to tune it or understand how it performs, try setting the value of its ArcSOC instance maximum to the number of CPU cores available on the ArcGIS Server machine.
After restarting the service, the setting will allow it to take advantage of the available hardware and can help show it in the best possible light (this assumes the service is CPU-bound).
Increasing the value of the ArcSOC instance maximum will also allow the service to utilize more memory. Please ensure there is adequate memory available on the ArcGIS Server machine to accommodate the adjustment.
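As a sketch of that adjustment, the helper below edits a copy of a service's Admin API JSON. `maxInstancesPerNode` and `minInstancesPerNode` are the property names used in the service JSON per the Admin API documentation; confirm them against your Server version, and check available memory first since each additional instance consumes more:

```python
import os

def tune_soc_instances(service_json, cores=None):
    """Return a copy of a dedicated-instance service's JSON with the ArcSOC
    instance maximum set to the machine's core count, so a CPU-bound
    service can take advantage of all available CPU."""
    cores = cores or os.cpu_count()
    tuned = dict(service_json)
    tuned["maxInstancesPerNode"] = cores
    # keep the minimum consistent if it would exceed the new maximum
    if tuned.get("minInstancesPerNode", 0) > cores:
        tuned["minInstancesPerNode"] = cores
    return tuned
```

The edited JSON would then be submitted through the service's `edit` operation, after which the service restarts with the new instance maximum.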
- Not all ArcGIS services are CPU bound
If an ArcGIS service is CPU-bound, it means the amount of throughput it can deliver (or capacity it can support) is limited only by the number of CPUs on the ArcGIS Server machine(s). In many ways, this can be a good thing.
However, this is not always the case. Sometimes, you can have bottlenecks in other hardware like network, for example. Occasionally, you can encounter a bottleneck in a software component which can be by design or unintended.
Therefore, the collection of hardware metrics during the test is very important. Observing utilization of the CPU, Memory, Network and Disk can help provide the tester/analyst vital information for understanding if there is something limiting the scalability of the ArcGIS Server and whether it is the server’s (or test client’s) hardware.
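A first-pass reading of those collected averages can be sketched as below; the 85% saturation threshold is an illustrative assumption, not Esri guidance:

```python
def classify_bottleneck(avg_cpu_pct, avg_mem_pct, avg_net_pct, avg_disk_pct,
                        threshold=85.0):
    """Rough first-pass interpretation of average utilization percentages
    collected from the server (or test client) during a load test."""
    metrics = {"cpu": avg_cpu_pct, "memory": avg_mem_pct,
               "network": avg_net_pct, "disk": avg_disk_pct}
    saturated = [name for name, value in metrics.items() if value >= threshold]
    if saturated == ["cpu"]:
        return "CPU-bound"
    if saturated:
        return "possible bottleneck: " + ", ".join(saturated)
    return "no hardware saturated; look for a software limit (e.g. instance caps)"
```

The last branch is the interesting one: if throughput has plateaued but nothing is saturated, the limit is likely in software configuration rather than hardware.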
- It’s all about throughput (not users)
Throughput is measured; users are calculated. They are two different artifacts of a test.
In a test, throughput is typically defined as transactions/second (or operations/second), and this is a value that should be measured by the test client software. On the other hand, the definition of a “user” can vary but is something that is usually calculated off throughput.
Since throughput is observed directly from the results of a load test, it is one of the best metrics for determining the scalability of a deployment.
On a related note, a test thread (i.e. the unit that is increased to apply more pressure in a load test) is not the same as a user either. The number of test threads that are utilized and their duration are typically configured in the step load definition of a test.
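One common way to calculate users from measured throughput is Little's Law, sketched below. The think time is an assumption the analyst supplies, which is exactly why the "user" number varies between deployments while the throughput number does not:

```python
def users_from_throughput(throughput_tps, avg_response_s, think_time_s=0.0):
    """Little's Law: concurrent users = throughput x (response time + think
    time). Throughput and response time are measured by the test client;
    the think time is an assumed pause between a user's requests."""
    return throughput_tps * (avg_response_s + think_time_s)
```

For example, 10 transactions/second at a 0.5 s response time represents 5 concurrent test threads, but with an assumed 9.5 s think time the same throughput is reported as 100 "users".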
- Verify that the test was successful
The completion of a test does not necessarily mean it was a “good” test capable of successfully answering the questions in the test plan. It is important to verify and validate that the test was sending the right requests where it was supposed to and getting the expected responses.
A quick manual quality control (QC) check of the request composition in the test can help with the former, while monitoring the average content length (per response) can help with the latter.
Most test software provides a way to capture the average content length. The general rule of thumb is that the average value for this metric should hold constant throughout the test. If it increases or dips drastically, further investigation is recommended, as the expected response for the requests may not be coming back (e.g. errors instead of content).
Additionally, it is important to determine whether the requests themselves were successful (e.g. HTTP 200). Some test software may allow the analyst to configure validation checks on the responses within the test itself. That said, profiling the average content length metric usually provides a more accurate view of the expected response from the server.
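The content-length check above can be automated with a simple drift test; the 20% drift tolerance is an illustrative default, not a standard:

```python
def content_length_is_stable(avg_lengths, max_drift=0.20):
    """Flag a run whose per-step average response content length drifts more
    than max_drift from the first step's average, which often means errors
    (small error payloads) are coming back instead of expected content."""
    baseline = avg_lengths[0]
    return all(abs(length - baseline) / baseline <= max_drift
               for length in avg_lengths)
```

A sudden dip usually points to small error responses; a sudden rise can point to unexpectedly large (e.g. uncached or fallback) responses. Either way, investigate before trusting the throughput numbers.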
- Test results are not a guarantee of the support for X number of users
Test results only validate the tested workflow. This tested workflow will show throughput for a specific type of request with a corresponding response time. It does not promise or guarantee that the deployment will support X number of users.
Remember, the definition of a user can vary, and can mean different things for different deployments.
- Avoid testing shared resources like ArcGIS Online or Google Maps
Free and public service offerings from ArcGIS Online or Google Maps are there for the “community”. Such resources are quite robust and scalable but cannot be performance-tuned for any individual user.
Since they are not directly part of an on-premises deployment, they should be considered an “external” resource. As a result, requests to them should be removed from a load test, as the test should focus solely on the capabilities of its own hardware.
- Repeatable test results
If the results for a load test against an ArcGIS Server service show similar trend lines across test runs (e.g. the same throughput is achieved around the same point during the test), the resource is generally considered “stable”. Being able to repeat the results for a test is a good characteristic.
When results are not immediately repeatable, the tester/analyst must look deeper and try to understand the inconsistent behavior. It could be that the hardware was servicing requests other than the test's (e.g. another user on the system). Or, if the deployment was on shared infrastructure (e.g. virtualization), the underlying hardware may have been utilized for another purpose (other virtual machines executing resource-intensive tasks). In such cases, conducting the load tests during off-peak hours might yield more reproducible results, showing that the service has the potential to be stable.
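A quick repeatability check can compare per-step throughput from two runs of the same test; the 10% tolerance below is an illustrative assumption:

```python
def runs_are_repeatable(run_a_tps, run_b_tps, tolerance=0.10):
    """Return True if every step's throughput (transactions/second) from two
    runs of the same stepped test agrees within the given tolerance."""
    if len(run_a_tps) != len(run_b_tps):
        return False
    return all(abs(a - b) / max(a, b) <= tolerance
               for a, b in zip(run_a_tps, run_b_tps))
```

If this check fails across several reruns, look for the external causes described above (other users, other virtual machines) before concluding the service itself is unstable.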
- Keep it simple
Sometimes the most informative tests are simple and not overly complex.