10.6.1 HA ArcGIS Server Site Crashing

01-30-2019 08:30 AM
Frequent Contributor

Since upgrading to 10.6.1, we have seen machines in our ArcGIS Server site crash (it's not always the same machine).  After digging into the logs and ArcGIS Monitor, we are seeing spikes in available memory (going from 5GB available to almost the full 32GB).  Usually these are just spikes, but on occasion the available memory flatlines until we go in and restart the service on the server.  I have correlated these spikes with warnings that the machine is synchronizing with the site.  When this happens, it typically stops all the services and brings them all back up, which appears to be what synchronizing with the site does.  The issue seems to occur when one of the SOC processes hangs; killing this process (or restarting the ArcGIS Server service) brings the machine back online. Unfortunately, it is never the same map service that hangs.
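For anyone wanting to repeat the correlation, here is a minimal sketch of the idea in Python. It assumes you have already exported available-memory samples and the timestamps of the synchronization warnings yourself; the function names, the 10GB jump threshold, and the 2-minute window are all hypothetical choices for illustration, not anything provided by ArcGIS Server or ArcGIS Monitor.

```python
from datetime import datetime, timedelta


def find_spikes(samples, jump_gb=10.0):
    """Return timestamps where available memory jumps by at least jump_gb.

    samples: list of (datetime, available_gb) tuples sorted by time.
    The threshold is an assumption; tune it to your own machines.
    """
    spikes = []
    for (_, prev_gb), (ts, cur_gb) in zip(samples, samples[1:]):
        if cur_gb - prev_gb >= jump_gb:
            spikes.append(ts)
    return spikes


def correlate(spikes, warnings, window=timedelta(minutes=2)):
    """Map each spike to the warning timestamps that fall within the window."""
    return {
        spike: [w for w in warnings if abs(w - spike) <= window]
        for spike in spikes
    }
```

A spike with an empty list is one that had no synchronization warning nearby, which would be the interesting case to look at more closely.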

I do have a premium support ticket open for this, but wanted to see if anyone else has experienced this.  Premium support's recommendation was to increase the number of cores on both servers, which was done, but we continue to see the crashing.

Since we are coming from 10.3.1, I am not familiar with the new optimized app server architecture that was introduced at 10.6.  I am thinking the synchronizing with site function was part of this.  I have not found any documentation, or anything in the logs, as to when or why the server needs to synchronize with the site.

Some details on the site:

ArcGIS Server 10.6.1 running on RHEL 7.5

32GB RAM and 6 CPU cores

Federated with Portal as the hosting server

Running about 52 map image services (currently no hosted feature services yet)

There are no reports of synchronizing during the night or weekends, so it is definitely happening when there is heavy traffic.

54 Replies
Frequent Contributor

Hi Peoples,

We took a fair bit of time to look into the active-active deployment.  For the most part it worked.  However, the Geoprocessing services and the Sync process are asynchronous, which was not easy to solve with sticky sessions.  Given the time frames, we had to ditch this approach.



Occasional Contributor

Hi there

We upgraded to 10.6.1 back in June on brand new servers. We have 3 sites, each a multi-server cluster. One site is very flaky: the ArcGIS Server service keeps stopping and the servers keep crashing. From reading the above, our situation sounds very similar.

I originally thought I could tie this down to a time and day, but recently the crashes have become more frequent. The physical memory on the machine spikes, then dives until everything crashes.

One thing I have noticed when checking through the Windows Event Viewer is that there is often a Windows update that has somehow gotten into trouble and wants to reboot but can't. Has anyone else noticed this?

by Anonymous User

Hi, have you applied the ArcGIS Server Unintended Service Restart Patch? We experienced similar issues where we would see a memory spike then dump. Initially this was when services were published, however it started to occur more randomly on one of our sites. The patch appears to have resolved the problem.

New Contributor

Hey Mitch

Yes, we have.  We have caught up with all server patching now and everything seems to be running smoothly at present. Thanks

Frequent Contributor

We are seeing this issue frequently when publishing and replacing Vector Tile Services on a multi-machine federated Windows 10.8.1 ArcGIS Server Site. It only occurs on the second machines.

After synchronization, the service folders are sometimes missing (\arcgisserver\config-store\services\Hosted and \arcgisserver\directories\arcgiscache\VectorCache\Hosted), which corrupts the whole service.
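A rough sketch of how the missing folders could be detected after each publish, in Python. The two relative paths come from the locations above; the `root` argument (the full path to the arcgisserver directory on each machine) and the function name are assumptions for illustration only.

```python
from pathlib import Path

# Folders expected under the arcgisserver root after a successful sync.
EXPECTED = [
    Path("config-store", "services", "Hosted"),
    Path("directories", "arcgiscache", "VectorCache", "Hosted"),
]


def missing_folders(root, expected=EXPECTED):
    """Return the expected relative folders that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]
```

Running this against each machine right after a publish would at least show whether the folders disappear on only one machine.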

On the ArcGIS Server JRE side, I always see some sun.nio.fs.WindowsException exceptions (probably on config store files?).

Is there any patch for 10.8.1?

