
ArcGIS Server: Missing layers in services... Cleared Rest Cache Fixed Issue (v10.3.1)

Discussion created by pfoppe on Feb 18, 2016
Latest reply on Feb 1, 2017 by tstrelt@sfwmd.gov_sfwmd

FYI only... posting to community in case others run into the issue...

 

Problem:

Every once in a while (every month or two) we get reports from users that some layers in some services are missing.  Our monitoring does not catch this condition since it's very sporadic.  Here is an example of the available layers in a service (normally):

service_layers.PNG

The layers as they looked today during this condition:

service_layers_trimmed.PNG

Resolution:

Bottom line... Clearing the rest-cache in the admin API resolved our issue:

rest_cache_cleared.PNG
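
For anyone who would rather script this than click through the admin directory, here is a minimal Python sketch of the same step. It is not what we ran (we used the admin page shown above). It assumes the admin API is reachable directly on port 6443, that a token-based admin login works there (for example with the primary site administrator account), and that Python trusts the server certificate. The host name, credentials and function names are placeholders. The two admin operations it calls are generateToken and system/handlers/rest/cache/clear.

```python
import json
import urllib.parse
import urllib.request


def admin_url(base, path):
    """Join an admin root (e.g. https://host:6443/arcgis/admin) with a sub-path."""
    return base.rstrip("/") + "/" + path.lstrip("/")


def post_json(url, params, timeout=30):
    """POST form-encoded parameters (plus f=json) and decode the JSON reply."""
    data = urllib.parse.urlencode(dict(params, f="json")).encode("utf-8")
    request = urllib.request.Request(url, data=data)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def clear_rest_cache(base, username, password, post=post_json):
    """Request an admin token, then ask the site to clear its REST cache.

    Returns True when the reply reports status "success".
    Assumption: token login is allowed on this admin endpoint.
    """
    token_reply = post(
        admin_url(base, "generateToken"),
        {"username": username, "password": password, "client": "requestip"},
    )
    token = token_reply["token"]
    reply = post(admin_url(base, "system/handlers/rest/cache/clear"), {"token": token})
    return reply.get("status") == "success"
```

Usage would look something like `clear_rest_cache("https://ags.example.com:6443/arcgis/admin", "siteadmin", "secret")`, where the host and account are made up. The `post` argument is only there so the HTTP call can be swapped out when testing without a live server.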

Background:

We have quite a few ArcGIS Server (AGS) deployments.  A mix of web-tier and token-based security...  This problem has occurred on both (confirmed on v10.3.0 using token and v10.3.1 using web-tier).  All AGS deployments are virtual machines (VMs) running Windows Server 2008 R2 and configured with access from Web Adaptors hosted in IIS.  Not federated with Portal (stand-alone).

 

We have our config-store/directories on a dedicated file server (running multi-machine sites)... and configure ArcGIS Server to use a DFS path (we have had to move it around a few times... and this makes it really easy for us to move it).

 

We normally publish web-services with the source data in a File Geodatabase (FGDB) residing on disk, mapped with a DFS path as well (but those are usually on a different file server). 

 

User-store is configured to use "Windows Domain" and role-store is built-in. 

 

We are doing basic host-based monitoring (every 60 seconds) using the Ipswitch WhatsUp Gold (WUG) product, based on standard PING, Windows services (ArcGIS Server) and some HTTP content monitors (for some high-use services and the root of the rest-endpoint over port 6443).

 

Further Thoughts:

I really do not like relying on server/service reboots to solve our issues... Our servers are already rebooted too often (mostly for patches being applied) and I think that is what ultimately caused this issue.  I would prefer to find the degraded 'component' and fix that individually without an aggressive reboot.  The site impacted today has 2 back-end AGS hosts, both of which went offline for ~6 min around 3:30am:

wug_ags_node1.PNG

wug_ags_node2.PNG

The back-end file servers (both config-store/directories and FGDB hosting) were still online during this time period:

wug_config_store.PNG

wug_fgdb_hosting.PNG

Before clearing the rest cache... I stopped both of the machines from the AGS Manager (web-based).  I watched all the ArcSOC.exe processes go away, and it left a handful of .exe's behind including 1 javaw.exe.  Starting both machines back up brought back all the ArcSOC.exe processes, but testing from multiple machines still showed the missing layers.  I did not restart the Windows service, and suspect that would have resolved the issue (since all the .exe's would have disappeared).

 

If this continues to be a problem we will most likely script a REST cache clear for all the ArcGIS Server deployments and schedule it to run in the early morning.  This is a hard condition to identify since we have so many services hosted and IT does not know what layers are in what services (that is managed by the GIS users).
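
One way to make the condition easier to spot, sketched here in Python and not something we have in place: record the layer names each service reports while things are healthy, then compare later snapshots against that baseline. It relies on a map service's REST endpoint returning a "layers" list when requested with f=json. The service URLs, layer names and function names below are hypothetical, and secured services would also need a token on the request.

```python
import json
import urllib.request


def fetch_json(url, timeout=30):
    """GET a REST endpoint with f=json appended and decode the reply."""
    sep = "&" if "?" in url else "?"
    with urllib.request.urlopen(url + sep + "f=json", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def layer_names(service_url, fetch=fetch_json):
    """Return the sorted layer names a service currently reports."""
    info = fetch(service_url)
    return sorted(layer["name"] for layer in info.get("layers", []))


def missing_layers(baseline, current):
    """Return {service_url: [names present in the baseline but absent now]}."""
    missing = {}
    for url, expected in baseline.items():
        gone = sorted(set(expected) - set(current.get(url, [])))
        if gone:
            missing[url] = gone
    return missing
```

The baseline could be saved as a JSON file per site when the services are known to be good, with a scheduled job building the current snapshot via `layer_names` and alerting whenever `missing_layers` returns anything. That way IT does not need to know what layers belong in which service, only what was there yesterday.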
