Does anyone know of any major issues with the ArcGIS for Server javaw.exe process consuming massive amounts of memory and almost bringing down an entire server instance?
There is a bug at 10.2.2 that's resolved with a patch:
ArcGIS for Server Security 2016 Update 2 Patch
BUG-000082423 - Under consistent load, the javaw.exe process at ArcGIS 10.2.2 for Server consumes 25% of the server's RAM, and any further request forces the process to use 100% of the machine's CPU.
Do you have this patch installed?
Thank you so much for the help, will get this patch in place ASAP!
Do you know if this bug was fixed in 10.3.1?
Yes, it does appear to be fixed at 10.3.1. Definitely install the patch, hopefully it'll help.
We are seeing similar behavior in 10.4.0 (build 5524): javaw.exe (with logging on the command line) taking 70-90% of the CPU (it climbs over time but seems to start at 70%) and 2.5 GB of memory (which is not really a big deal). We are running on VMware and have around 60 services. I recently took over the server and honestly am unsure how long it has been doing that, although there haven't been any major changes recently that I know of.
Rebooting seems to fix it for a couple of hours and then it starts again. From what I can tell, there is nothing in the error logs that coincides with the problem starting.
Has anyone seen something similar or is there a fix or a patch?
I would be very interested in knowing if you figure out more.
As you can see above, this issue was supposed to be fixed in 10.3.x, and in 10.2.2 by the recent security patch. We have installed the patch for 10.2.2, but I have still noticed some spikes, upwards of 2 GB of memory, after the patch was installed and the machine was rebooted.
Please keep us updated.
Will do Dave - thanks for the additional info!
Regarding the javaw.exe processes, increasing RAM usage isn't a problem as long as it stays at or below 25% of the available physical RAM on the machine. The main problem is the CPU usage, which shouldn't happen even if the RAM usage is at the 25% mark. It'd be useful to get a better understanding of the load on the server before and during the time that your CPU is at 70-90%. Is the server handling a lot of requests at that time? If so, what type? Export map requests, GP services, etc.
The server load, in my opinion, is not over the top, but I could be wrong as I am not entirely sure what constitutes a large load on ArcGIS Server (sorry, I am new to the admin side of ArcGIS Server - long time developer). I have uploaded an image of the last 30 days on our server to give you an idea of what our usage looks like.
Our services are almost entirely pre-cached Map Services (the cache is built on a different server), plus a couple of Dynamic and Editable Map Services and a handful of Image services. We are also running ArcGIS Portal on this server and access it via AGOL. I have confirmed that the copy of javaw.exe that is giving us the problem is the one running for ArcGIS Server, not Portal.
Our logging was previously set to warning, although I have bumped it up to Fine to try and determine if there is a specific service causing the issue. It honestly appears to be irrespective of use (although I have nothing empirical to prove this) and the CPU usage just grows over time.
We have gone through our service catalog and shut down about 25% of the services that were running but no longer being used, and I will wait to see what happens with a restart. I am totally open to suggestions on how to collect more data or try and fix things. Also totally open to opening a support ticket.
Thank you again Jonathan - much appreciated!
Quick update: I believe I have tracked down the culprit. It appears to be Googlebot requesting KMZ files for our services that is driving the CPU usage of the javaw.exe process through the roof. Interestingly enough, some of the links were throwing ASP.NET errors, and I was able to perform a reverse DNS lookup of the IP to trace the requests back to Google. From there I was able to repeatedly call the KMZ request URL from my browser and watch the CPU usage climb by 10%, even for a request that didn't throw an error and actually returned the KMZ.
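For anyone who wants to repeat the reverse DNS check on IPs pulled from their own web logs, here is a minimal sketch in Python. It assumes the commonly described way of confirming a Google crawler: reverse-resolve the IP, check that the host name ends in googlebot.com or google.com, then forward-resolve that name and confirm it maps back to the same IP. The function names are my own, the forward lookup used here is IPv4 only, and the list of suffixes may not cover every Google fetcher, so treat it as a starting point rather than a definitive check.

```python
import socket

# Host name suffixes assumed to indicate a Google crawler.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def looks_like_google_host(hostname):
    """Return True if the host name ends in one of the assumed Google suffixes."""
    host = hostname.rstrip(".").lower()
    return host.endswith(GOOGLE_SUFFIXES)


def is_verified_googlebot(ip):
    """Reverse-resolve ip, check the suffix, then forward-confirm the name."""
    try:
        hostname = socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return False  # no reverse DNS record for this address
    if not looks_like_google_host(hostname):
        return False
    try:
        # gethostbyname_ex returns (hostname, aliases, ipv4_addresses)
        forward_ips = socket.gethostbyname_ex(hostname)[2]
    except socket.gaierror:
        return False
    return ip in forward_ips
```

Calling is_verified_googlebot() with a client IP from the web server log needs working DNS on the machine running it. The forward-confirmation step matters because anyone can publish a reverse DNS record that merely claims to be Google.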
Jonathan - is there a way to disable KML/KMZ downloading? The KML checkbox in the "Select and configure capabilities" section of "Capabilities" in the Service edit screens is not checked, yet it still allows downloading the KMZ. Or perhaps somewhere I can put a robots.txt file to tell Google to stop trolling our services? Or??
As it stands I feel pretty good that this is the culprit, but like all troubleshooting of this nature, I could be wrong...
I apologize as I feel like I have sort of hi-jacked this thread and that wasn't my intent!
Thanks for reading,
Great find! The first thing you should do is consider adding a robots.txt file, like you mentioned, within your web server.
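As a rough illustration, here is a sketch of what such a robots.txt could contain, plus a quick check with Python's standard urllib.robotparser that the rule blocks what you expect. The /arcgis/rest/services path assumes a web adaptor named "arcgis", and the host and service names are made up. Crawlers only look for the file at the root of the site (for example http://yourserver/robots.txt), so it needs to live in the web server's root folder rather than somewhere under the ArcGIS Server install. Also keep in mind that honoring robots.txt is voluntary on the crawler's part.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the path assumes a web adaptor named "arcgis".
ROBOTS_TXT = """\
User-agent: *
Disallow: /arcgis/rest/services
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Made-up URLs, only to exercise the rule above.
    for url in (
        "https://gis.example.com/arcgis/rest/services/Parcels/MapServer",
        "https://gis.example.com/apps/viewer/index.html",
    ):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

Checking the rules this way before deploying the file makes it easier to tell a badly written rule apart from a file that is simply in the wrong place.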
On the software side, I think it would be a good idea to get in touch with Support and have them investigate why you can still download a KMZ when the capability is not checked and also, why the CPU usage seems to jump so much when making those requests. Those seem like areas for improvement within the software.
I know that if you have something on OpenData, there is no way for you to prevent KML as an option, even if you have it turned off on the ArcGIS Server service side. (say that 10 times fast!) It would be nice if that was an option to turn off.
Justin, I'm not sure if you have your services in OpenData or not, but it's something to keep in mind. I haven't run into this issue yet since we don't have much out there, but my guess is we will at some point.
Our VM's CPU spiked to 98-99% and was sustained at that usage. (We were forced to kill the ArcGIS services and reboot the machine.)
At the time of the spike we were running GP services and our script failed:
ExecuteError: ERROR 001470: Failed to retrieve the job status from server. The Job is running on the server, please use the above URL to check the job status.
Failed to execute (ManageMapServerCacheTiles).
Like I said above, this is concerning since we installed the 2016 Security Patch Update 2 last week.
What is the memory usage of the javaw.exe process when the CPU usage is high? The symptoms do sound like BUG-000082423 (Under consistent load, the javaw.exe process at ArcGIS 10.2.2 for Server consumes 25% of the server's RAM, and any further request forces the process to use 100% of the machine's CPU), but you're right that it should be resolved by that patch.
Quick follow up for anyone interested.
I would say with 99.9% certainty that the KMZ download combined with Google's relentless crawling of our services resulted in our incredibly high CPU usage and interestingly enough our high RAM usage. The javaw.exe process now spikes sometimes to 10-12% but immediately drops back down and is consistently consuming under 1GB of memory at the moment.
The fix for me was a bit drastic - I tried the robots.txt approach but was unsure where the file should go on the hard drive. I put a copy all over the place but it seemed to have no effect. I then blocked, in a software firewall, the IP range that Google uses to crawl sites. Fortunately, we really couldn't care less if Google crawls this server, as the few web applications on it are all reached by direct links and the server's primary function is as an ArcGIS Server.
I have to believe that this is a bug in the ArcGIS Server platform, but perhaps it is a configuration or data issue on our end... I will try and submit a report as Jonathan suggests above, but unfortunately this has already consumed several days of my time that I didn't have.
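If you go the firewall route, it can help to check which of the client IPs in your logs the blocked ranges would actually cover before and after adding the rule. Below is a small sketch using Python's standard ipaddress module. The ranges shown are documentation placeholders, not Google's real crawler ranges, and the function name is my own, so substitute whatever ranges you actually block.

```python
import ipaddress

# Placeholder CIDR ranges (reserved documentation blocks), NOT real crawler ranges.
BLOCKED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def is_blocked(ip, ranges=BLOCKED_RANGES):
    """Return True if ip falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ranges)


if __name__ == "__main__":
    # Made-up client addresses, as they might appear in a web server log.
    for client in ("192.0.2.15", "203.0.113.7"):
        print(client, "blocked" if is_blocked(client) else "not blocked")
```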
I hope this helps someone experiencing (whether known or not) a similar issue.
This can also be addressed by disabling the HTML representation of the services directory.