Hello
We are running ArcGIS Enterprise 10.9.1 on Windows Server 2016 Standard (single machine deployment) with all available patches installed.
We have been experiencing intermittent performance issues for the past several months. These performance drops appear to have no correlation with usage, and seem to randomly happen at any given time of the day.
We have been working with our IT network team and, thus far, all of our available tools tell us everything is fine in terms of the server/network load. We were able to install ArcGIS Monitor (2023.2), which has given us some metrics quantifying how bad the performance gets but we're still unable to identify the cause. I have attached a graph from Monitor showing the average request response time versus request rate for all services over the past 7 days.
Does anyone have suggestions where to go with this in terms of pinpointing the issue?
Michael,
Are your users noticing performance problems? If all you have is some graphs showing some slow response times, you might still be satisfying 99% of your users, with just 1% of your users doing stupid things and causing their own problems / slow responses.
To look into the problem I would look at the web adaptor logs. My ArcGIS Server deployment has the web adaptors running on a separate Linux server under Apache. On your single machine deployment your web adaptor is likely running under IIS, but I am not certain. Go into the IIS logs and look at the incoming requests. The web adaptor log file should include the duration of each request. Examine the long-running requests and identify what your user was trying to do. The web adaptor log records should include the REST URL of the request. Are they requesting a lot of data? Are they trying to match a search term against all attributes in a map service instead of just querying against a single attribute? Is there one service that is causing most of the slow requests? Are all of your slow requests coming from the same web app? Is someone trying to scrape data from your map services? You should be able to get a lot of info from your web adaptor logs.
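As a rough sketch of that log review, assuming the web adaptor sits behind IIS with W3C logging and the `time-taken` and `cs-uri-stem` fields enabled (IIS records `time-taken` in milliseconds), something like the Python below would pull out the slow requests and count them per service. The function names and the 60-second default are my own choices for illustration, not anything ArcGIS provides.

```python
from collections import Counter


def parse_w3c_log(lines):
    """Yield one dict per request from IIS W3C extended log lines."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the columns for the lines that follow.
            fields = line.split()[1:]
        elif line and not line.startswith("#") and fields:
            values = line.split()
            if len(values) == len(fields):
                yield dict(zip(fields, values))


def slow_requests(lines, threshold_ms=60_000):
    """Return requests whose time-taken meets or exceeds the threshold."""
    return [
        row for row in parse_w3c_log(lines)
        if row.get("time-taken", "").isdigit()
        and int(row["time-taken"]) >= threshold_ms
    ]


def slow_by_service(rows):
    """Count slow requests per service (path up to MapServer, FeatureServer, ...)."""
    counts = Counter()
    for row in rows:
        stem = row.get("cs-uri-stem", "")
        parts = stem.split("/")
        key = stem
        for i, part in enumerate(parts):
            if part.endswith("Server"):
                key = "/".join(parts[: i + 1])
                break
        counts[key] += 1
    return counts.most_common()
```

You would pass an open log file straight in, for example `slow_by_service(slow_requests(open_file))`, and start with whichever service tops the list.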
I find the web adaptor logs very useful for identifying long running requests. Every day I run a report to identify all requests that exceed 60 seconds. I usually have to examine about 100 records each day. We get close to 1 million requests each day. Most of the time I just look at the time stamp on the request:
If I see a slow request outside of these times I will look more closely and try to determine why the request was slow. I'll copy the ArcGIS REST URL from the log file and run it in my browser. I'll examine all of the parameters in the URL and determine what the user was trying to do. This helped me identify a Web App Builder app with a poorly configured Search widget. It was one that I had configured myself.
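For picking apart the parameters of a logged REST URL, a small helper along these lines can save some squinting. It is only a sketch: `where`, `outFields` and the other query parameters are standard ArcGIS REST query parameters, but the `describe_request` name and the three "flags" are heuristics I made up for illustration, so adjust them to whatever patterns you actually see.

```python
from urllib.parse import urlsplit, parse_qs


def describe_request(url):
    """Split a REST URL into path and decoded parameters, and flag costly patterns."""
    parts = urlsplit(url)
    params = {
        key: values[0]
        for key, values in parse_qs(parts.query, keep_blank_values=True).items()
    }
    where = params.get("where", "")
    flags = []
    if where.replace(" ", "") == "1=1":
        flags.append("where=1=1 selects every row")
    if params.get("outFields") == "*":
        flags.append("outFields=* returns every attribute")
    like_count = where.upper().count("LIKE")
    if like_count >= 2:
        # Typical of a search configured to match against many fields at once.
        flags.append(f"LIKE used {like_count} times in where clause")
    return {"path": parts.path, "params": params, "flags": flags}
```

Running a slow URL through this gives you the decoded `where` clause and a quick hint as to whether the request was pulling everything or searching across many fields.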
I hope this helps,
Bernie.
What type of services do you have running on the server? Are they all map services, or do you have print/geoprocessing services as well?
If our servers are creating map images or returning JSON/PBF most of the time, then we're typically dealing with lots of quick-fire, short-turnaround requests all day and everything looks good. But imagine that you have a big geoprocessing task that can be called and that takes 'an age' to crunch some data. While that is running, it's hogging resources and making it difficult for maps to be produced, because the map requests can't get the resources that are now locked. So things bottleneck, slow down, and in worst-case scenarios they'll time out.
It's not bad performance per se, it's just a consequence of trying to do everything on a single server.
A single machine deployment of ArcGIS Enterprise is definitely doable for clients of a certain size, and many clients of mine are in the same boat - but they have additional ArcGIS Servers to deal with longer running or heavy lift processes. Throwing everything into one basket is a recipe for disaster.
I'd strongly recommend finding a trusted partner that can perform an architectural and health review for you.
Our Enterprise deployment was plagued by intermittent performance issues earlier this year. Turns out some of our feature services had some pretty hefty definition queries running against them while others had these crazy long values in some procedurally generated fields that weren't really needed. Once they were killed off performance got right back to normal.
Hi Michael,
How do the "performance issues" present themselves?
There isn't enough of a time window in your graphs to make any specific / meaningful diagnosis.
I'd suggest a 14-day (maybe up to 21-day) window to see if there's a pattern (for example, a spike every Tuesday?), then drill down with more granular graphs (hourly resolution) to see if the utilization spikes have a consistent start time. A consistent start time is often a strong indicator of a scheduled system/human process. If there's no pattern, pick the worst spike and pull all resource consumption (CPU, memory, disk I/O) for a +/- 12 hour window around the spike, along with the related ArcGIS Enterprise component logs and database server logs for the same period.
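If you can export the response-time samples (from Monitor or from the web server logs), a quick way to check for a weekly pattern is to bucket them by weekday and hour and rank the buckets. This is a minimal sketch that assumes you already have `(timestamp, response_seconds)` pairs; the `slowest_slots` name and the input shape are hypothetical, not a Monitor export format.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def slowest_slots(samples, top=3):
    """Rank (weekday, hour) slots by mean response time, slowest first."""
    buckets = defaultdict(list)
    for timestamp, seconds in samples:
        buckets[(DAYS[timestamp.weekday()], timestamp.hour)].append(seconds)
    ranked = sorted(
        ((mean(values), day, hour, len(values))
         for (day, hour), values in buckets.items()),
        reverse=True,
    )
    # Each result: (weekday, hour, mean seconds, number of samples in the slot)
    return [(day, hour, round(avg, 2), n) for avg, day, hour, n in ranked[:top]]
```

A slot that tops the list with samples from more than one week points to a scheduled process; a slot backed by a single sample is more likely a one-off worth the +/- 12 hour deep dive.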
Good luck!
-Derek