
ArcGIS for Server - Adding Machine to Site Error: 'Unable to access the config store' (SOLVED)

Discussion created by pfoppe on Nov 7, 2016

We resolved this issue and wanted to share with the community (Knowledge Base)...

 

Our agency has quite a few ArcGIS for Server (AGS) deployments, ranging from single-machine sites with a local config store to multi-machine sites with a shared configuration store.  We recently went to add a machine to our *DEVELOPMENT* environment and ran into an error reporting that the new machine was unable to access the configuration store:

 

{"status":"error","messages":["Failed to register the server machine '<HOSTNAME>'. Server machine 'http://<HOSTNAME>:6080/arcgis/admin' returned an error. 'Unable to access the config store on '\\\\<server>\\<share>\\config-store'.'"],"code":500}
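If you are scripting the "add machine" step, the failing path can be pulled out of a response like the one above.  This is only a minimal sketch: it assumes the response body is already available as text, the function name `config_store_path_from_error` is made up for illustration, and the regular expression has only been matched against the message format quoted in this post.

```python
import json
import re
from typing import Optional

# Error body as quoted above (placeholders left untouched).
SAMPLE = r"""{"status":"error","messages":["Failed to register the server machine '<HOSTNAME>'. Server machine 'http://<HOSTNAME>:6080/arcgis/admin' returned an error. 'Unable to access the config store on '\\\\<server>\\<share>\\config-store'.'"],"code":500}"""


def config_store_path_from_error(body: str) -> Optional[str]:
    """Return the config-store path named in an 'Unable to access' error, if any."""
    data = json.loads(body)
    if data.get("status") != "error":
        return None
    for message in data.get("messages", []):
        # The path sits between single quotes and is followed by a period.
        match = re.search(r"Unable to access the config store on '(.+?)'\.", message)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    print(config_store_path_from_error(SAMPLE))
```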

 

This site was originally built with 4 machines, 1 in each cluster (with a shared config store), but over the years they were all yanked out for other purposes, and we now needed some additional capacity for the number of services hosted.  We are currently running Esri ArcGIS for Server v10.3.1 with the Security Update 2 patch (July 2016) applied.

 

We spent a fair amount of time troubleshooting this.  Most of our AGS deployments access the config-store on a DFS path that is served from a file share on a Windows server.  This has worked pretty well for us in the past, as our file servers are rolled into our backup procedures; we have successfully restored configuration stores to an alternate location and brought ArcGIS Server back up at a previous point in time.

 

To troubleshoot, we started by restoring our config-store to an alternate server.  We were able to get the site up and operational without issue, but had the same problem when trying to add a new server: "Unable to access the config store...".  We loosened all share and NTFS permissions, which did not resolve it.  We sifted through all the known logging locations and could not find anything useful to chase.  We logged into our AGS machine with the service account used to run AGS and connected to the config-store successfully.  We shut down the site and cleaned up any existing .lock files (which have been troublesome for us in the past).
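For anyone who wants to script the .lock cleanup, here is a rough sketch.  It assumes the site is already stopped, that lock files can sit anywhere under the config-store (hence the recursive search), and it defaults to a dry run; the function names and the `dry_run` parameter are made up for illustration and are not part of any Esri tooling.

```python
from pathlib import Path
from typing import List


def find_lock_files(config_store: str) -> List[Path]:
    """Return every *.lock file under the config-store, sorted for stable output."""
    return sorted(p for p in Path(config_store).rglob("*.lock") if p.is_file())


def remove_lock_files(config_store: str, dry_run: bool = True) -> List[Path]:
    """List the lock files and delete them only when dry_run is False."""
    locks = find_lock_files(config_store)
    for lock in locks:
        if not dry_run:
            lock.unlink()
    return locks
```

Run it once with the default `dry_run=True` and review the returned list before deleting anything.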

 

Assuming there was some corruption in the config-store, and not having had much success troubleshooting it over a few days, we were about to give up and make our GIS business side rebuild all their dev services from scratch (this is a shared development environment with ~120 services and ~30 publishers).  As one last-ditch effort, we decided to look at the network level using Wireshark.  We ran a packet capture while attempting to add the machine to the site and found the AGS machine communicating with the file server.  On the SMB Create Request for the file, the Disposition was specifically set to FILE_OPEN:

FILE_OPEN (0x00000001): If the file already exists, return success; otherwise, fail the operation. MUST NOT be used for a printer object.

And the file server responded with STATUS_OBJECT_NAME_NOT_FOUND.  At that point the AGS machine responded with an HTTP 500 status code (quoted above).  
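To make the FILE_OPEN behavior concrete, here is a small local analogy, not a capture of what AGS actually does.  Opening a file without any "create" flag succeeds only if the file already exists.  The returned strings simply borrow the status names from the capture for readability, and the function name is made up for illustration.

```python
import os


def open_existing_only(path: str) -> str:
    """Try to open `path` without ever creating it, like a FILE_OPEN disposition."""
    try:
        fd = os.open(path, os.O_RDONLY)  # no O_CREAT: a missing file is an error
    except FileNotFoundError:
        return "STATUS_OBJECT_NAME_NOT_FOUND"
    os.close(fd)
    return "STATUS_SUCCESS"
```

Pointing this at the UNC path of the file seen in the capture, while logged in as the AGS service account, should show the same "not found" result the file server returned.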

 

We repeated the attempt a few times and noticed that it tried to open the same file every time.  Reviewing other config-stores showed that each one has a similar file present (with a different value in the file name), and all of those files appear to be empty.
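A quick way to make the same comparison across several config-stores is to list their .dat files with sizes.  This is a sketch under assumptions: it searches recursively because this post does not pin down where in the config-store the file lives, and the function name is made up for illustration.

```python
from pathlib import Path
from typing import Dict


def dat_file_sizes(config_store: str) -> Dict[str, int]:
    """Map each *.dat file (path relative to the config-store) to its size in bytes."""
    root = Path(config_store)
    return {
        p.relative_to(root).as_posix(): p.stat().st_size
        for p in sorted(root.rglob("*.dat"))
        if p.is_file()
    }
```

A healthy store, going by what we saw, would return an entry with a size of 0, while the broken one returned nothing for the name the server was requesting.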

 

THE FIX:

We copied a <obscure_number>.dat file from a different installation into this config-store and renamed it to the file name the server was trying to open: 0350a677b607f5f86226be7b50ca4073d4710b8f.dat
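If you would rather script the copy than do it by hand, something like the following would work.  Treat it as a sketch: `donor`, `target_dir`, and `target_name` are placeholders you would take from your own healthy install and your own packet capture, the function name is made up for illustration, and it refuses to overwrite an existing file as a precaution.

```python
import shutil
from pathlib import Path


def seed_dat_file(donor: str, target_dir: str, target_name: str) -> Path:
    """Copy a donor .dat file into target_dir under the name the server asks for."""
    target = Path(target_dir) / target_name
    if target.exists():
        raise FileExistsError(f"{target} already exists; refusing to overwrite")
    shutil.copyfile(donor, target)  # copies contents only, not permissions
    return target
```

Back up the config-store first and stop the site before touching it.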

 

After that, attempting to add the ArcGIS Server machine to the site no longer threw that error message.  (We ended up with a different error message about not being able to validate data stores, but we have run into that frequently before and just delete the data stores that are no longer valid.)

 

I hope someone finds this useful if they run into a similar issue.  We will keep watching the site for stability, but a quick cursory test was successful and we have high confidence in this fix.
