
ArcGIS (10.1 SP1) Site and Web Adapter randomly crash and stop responding

Question asked by geonetadmin on May 21, 2013
Latest reply on Feb 19, 2015 by Buddhatown
Original User: btelliot

We have been struggling with two main problems since moving from ArcGIS Server 10.0 to 10.1:

1. Poor Performance
2. Constant Server downtime / general site instability.

Our ideal server architecture would be a multi-machine site running on virtual machines, with 2 clusters and a single web adapter that serves traffic over SSL only.  See attached image for configuration.
[Attached image: server configuration]

We currently have about 350 services running on our site. 

  • ~300 are configured with a minimum of 0 instances (so they should shut themselves down when idle) and a maximum of 2 instances.

  • ~20-30 are cached.

  • All are running in high isolation.

  • We are licensed for 4 cores per machine, plus an additional staging license (12 cores total).

  • Each machine has 16GB of RAM.

  • The web adapter is running with 1 core.

Performance

Our main issue with performance comes from administering / publishing services.  Since we have multiple machines, we need to reference the config store from a UNC path.  Slow administration with a UNC config store is a known bug that should be fixed in SP2.  (Why they haven't released a hotfix for this is beyond me).  For more details see thread:  http://forums.arcgis.com/threads/66388-Slow-performance-administering-services-in-ArcCatalog-and-ArcGIS-Server-Manager-10.1

However, we also have performance issues on the web client side of things.  These issues are intermittent and difficult to replicate.  We can measure this latency using the Network tab of the "Developer Tools" in Google Chrome.  It will sometimes take 3-5 minutes for the server to return the data to the web browser, even on cached services that are already running.
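For anyone who wants to capture the same latency outside the browser, here is a minimal sketch in Python that times repeated requests to a service endpoint. The URL, the sample count, and the function names are assumptions for illustration only; they are not part of our setup, and the script simply measures wall-clock time to download the full response.

```python
import time
import urllib.request
from statistics import median


def time_request(url, timeout=600, opener=urllib.request.urlopen,
                 clock=time.perf_counter):
    """Return the seconds taken to fetch url and read the whole body."""
    start = clock()
    with opener(url, timeout=timeout) as response:
        response.read()
    return clock() - start


def summarize(samples):
    """Summarize a non-empty list of timings (in seconds)."""
    ordered = sorted(samples)
    return {
        "count": len(ordered),
        "min": ordered[0],
        "median": median(ordered),
        "max": ordered[-1],
    }


def main():
    # Hypothetical endpoint: substitute a real service URL from your site.
    url = "https://gis.example.com/arcgis/rest/services/SampleMap/MapServer?f=json"
    samples = []
    for _ in range(5):
        samples.append(time_request(url))
        time.sleep(10)  # space the requests out to catch intermittent stalls
    print(summarize(samples))
```

Calling `main()` on a schedule and logging the output should show whether the multi-minute responses cluster around particular times or services.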

Depending on our configuration and the complexity of the MXD, publishing a service usually takes around 5 minutes at the best of times.  At the worst, republishing an existing map document can take up to 30 minutes.  If anyone else has experienced any of these issues please let me know!

We have monitored system resources on the virtual machines, and CPU usage rarely exceeds 30% unless we are caching or restarting the machines.

Stability

Since moving to 10.1, the longest we have gone without a server outage / issue is about 1 week.  As we are growing as a company, more people are relying on our services in their workflows, and downtime becomes less and less bearable. In theory, a multiple machine site should be more stable: if one server goes offline, the web adapter recognizes this and redirects the traffic to a different server.

Main Issue:

  • We have noticed that ArcServer running on one of the machines will periodically crash and stop working.

  • We don't see a spike in system resources, or any other telltale signs on the VMs; it just stops responding as it should.

  • We experience this at least once a day.

Our normal fix is:

  • Check to see if the web adapter is responding; if not, restart the VM

  • Check to see if each individual machine is responding (try to log in to ArcGIS Server Manager); if not, restart the VM

  • Reboot whichever server is crapping out; if that doesn't work, try the other one(s).

  • If it still doesn't work, try stopping the machine from https://[machinename]:6443/arcgis/admin, and then starting it again.
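The first two checks above could be automated along these lines. This is only a sketch under stated assumptions: the host names are hypothetical, the probe treats any HTTP status below 500 as "responding", and it only reports which VM to restart rather than restarting anything itself.

```python
import urllib.error
import urllib.request


def is_responding(url, timeout=30, context=None,
                  opener=urllib.request.urlopen):
    """Return True if url answers with an HTTP status below 500 in time.

    Pass an ssl.SSLContext as context if the site uses a certificate the
    default trust store rejects; otherwise such hosts are reported as down.
    """
    try:
        with opener(url, timeout=timeout, context=context) as response:
            return response.status < 500
    except urllib.error.HTTPError as err:
        # The server answered, just not with a success code.
        return err.code < 500
    except OSError:
        # Covers URLError, timeouts, refused connections, and TLS failures.
        return False


def triage(web_adapter_url, machine_urls, probe=is_responding):
    """Return (target, action) pairs in the order of the manual checklist."""
    actions = []
    if not probe(web_adapter_url):
        actions.append(("web adapter", "restart VM"))
    for name, url in machine_urls.items():
        if not probe(url):
            actions.append((name, "restart VM"))
    return actions


def main():
    # Hypothetical hosts; the :6443/arcgis/admin path is the one quoted above.
    machines = {
        "gisserver1": "https://gisserver1:6443/arcgis/admin",
        "gisserver2": "https://gisserver2:6443/arcgis/admin",
    }
    for target, action in triage("https://gis.example.com/arcgis/rest/services", machines):
        print(f"{target}: {action}")
```

Running `main()` every few minutes would at least tell us which machine stopped responding and when, instead of finding out from users.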

If anyone has some insight into what may be causing this issue, please let us know.

Thank you for taking the time to read this!

Brett



TL;DR: ArcServer 10.1 SP1 is still very buggy.  ArcServer will randomly stop working, and we will need to reboot the virtual machine it is running on.  We have to do this A LOT.
