
Failed to get the configuration of the server machine...

05-22-2014 06:58 AM
RoyceSimpson
Frequent Contributor
Hi All,
For the past couple weeks we've been getting some AGS 10.2.2 errors that cripple our map services until I reboot the servers.  Have a look at the attached log screenshot.

We have two AGS servers that share their config-store and other directories on a network appliance.  I've configured both servers to use UNC pathing as in:  \\netapp\gisserverdata\arcgisserver\config-store.  This has been working great with no issues for well over a year.  I upgraded from 10.2 to 10.2.2 in early May and for the past couple weeks, we've been sporadically getting this "can't connect to the config-store" error, which in turn totally borks our map services.  The only remedy I've found is to reboot the two AGS servers.  The problem is resolved and everything works great, until it happens again... usually within a day or two.

I've reinstalled ArcGIS Server on both servers, to no avail.  I've also checked numerous times with our sys admins to see whether they are seeing connectivity issues with our network appliance, and they report none.

Thanks for helping out.
17 Replies
MichaelVolz
Esteemed Contributor
Maybe you can check with your admins how the dev/test environment differs from your production environment.  They could have applied an OS patch to dev/test that was not applied to production (or vice versa).

Do you know if your dev/test servers are absolutely identical to your production servers?  I ask because my organization builds our servers at different times so there could be slight differences (Not an ideal situation for a controlled environment) between the environments which could explain why dev/test is working as expected and you have a problem with only your production environment.

Do you have a heavy load on the production servers when this issue occurs?
RoyceSimpson
Frequent Contributor
Load is never at a point where I'd expect the servers to struggle to communicate properly with the config-store, but I'm monitoring things very closely to make sure of that.

I'll also look more closely at our test/dev env vs our production env.  This is a very tough nut to crack.
RoyceSimpson
Frequent Contributor
On Friday, we found that our AD ArcGIS Server account had been removed from our server admin group.  We added it back in and haven't had the "config-store" communication issue since.  I'll update this thread in a week or so with a "yay or nay" on whether this was the fix.
RoyceSimpson
Frequent Contributor
This didn't fix the issue.  It still recurs about every 4 days.  I'm beginning to think this isn't a config-store issue at all, but rather that Server 10.2.2 is getting gummed up and not serving map service cache tiles in a timely manner.  I've been monitoring our web server, and before the AGS servers lose their connection to the config-store, the web server's "active requests" count starts going through the roof.  What that means is that when someone opens one of our map service maps, the requested tiles are not retrieved and the request stays "active" on the web server.  The only fix has been to reboot our AGS servers.  Once that is done, "active requests" drops to a very low number and the site is peppy again, until a few days pass and the issue rears up again.

Also, one of the AGS errors that comes up when this happens is:
"SEVERE May 31, 2014, 5:33:49 PM Failed to get the configuration of the server machine 'COL.LC.GOV'. Too many open files"

Time to open a ticket.
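
For anyone who wants to scan exported log text for the same symptom, here is a minimal Python sketch. It assumes each entry is a single line laid out like the message quoted above (level, then a timestamp such as "May 31, 2014, 5:33:49 PM", then the message); the log files ArcGIS Server writes to disk may be formatted differently, so the pattern would need adjusting. The names `LogRecord`, `parse_log_line` and `is_open_files_error` are made up for this example and are not part of any Esri API.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class LogRecord(NamedTuple):
    level: str
    timestamp: datetime
    message: str


# Assumed layout: "<LEVEL> <Mon D, YYYY, H:MM:SS AM/PM> <message>"
_LINE = re.compile(
    r"^(?P<level>[A-Z]+) "
    r"(?P<ts>[A-Z][a-z]{2} \d{1,2}, \d{4}, \d{1,2}:\d{2}:\d{2} [AP]M) "
    r"(?P<msg>.*)$"
)


def parse_log_line(line: str) -> Optional[LogRecord]:
    """Parse one log line; return None if it does not match the assumed layout."""
    match = _LINE.match(line.strip().strip('"'))
    if match is None:
        return None
    # %b and %p rely on English month/AM-PM names (Python's default C locale).
    ts = datetime.strptime(match.group("ts"), "%b %d, %Y, %I:%M:%S %p")
    return LogRecord(match.group("level"), ts, match.group("msg"))


def is_open_files_error(record: LogRecord) -> bool:
    """True for SEVERE entries that mention the 'Too many open files' symptom."""
    return (
        record.level == "SEVERE"
        and "too many open files" in record.message.lower()
    )
```

Running `parse_log_line` over each line and keeping the records where `is_open_files_error` is true gives a list of timestamps that can be lined up against reboot times or storage metrics.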
deleted-user-UxcAu2PHrDQp
Deactivated User
Did you get this resolved?  Same issue here.
RoyceSimpson
Frequent Contributor
No resolution to the issue.  But, we are onto a new lead...  We are thinking that AGS 10.2.2 is not releasing files properly (or something like that)... such as our cached map service tile files, which are stored on a "netapp" network drive appliance. 

We are noticing that the netapp shows a constantly rising "open files" number for each server in our two-server AGS site.  Every other computer connected to the netapp shows zero to a few open files at any given time; the number goes up for a bit, then goes back down.  For our AGS servers, the number just keeps going up and up and up.  As I write this, AGS server 1 is at 60,000+ open files and server 2 is at almost exactly the same, for a total of about 120,000 open files for the two servers.

Our netapp sys admin has been on the phone with netapp tech support and apparently the netapp doesn't like it when that number gets to (I'm not sure of the exact number but we'll say...) 260,000.

So, we are watching those numbers and waiting.  In the past it has taken about 4 or so days to have this "config-store" crash.  That meshes with what we are seeing with the netapp open-files thing.  We'll see.
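
The "watch the number and wait" approach above can be turned into a rough estimate of when the limit will be reached. The Python sketch below fits a straight line to a few (elapsed hours, total open files) readings and extrapolates to a threshold. The sample readings in the usage note are invented, the 260,000 limit is only the approximate figure mentioned above, and linear growth is an assumption, so treat the result as a ballpark. The function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (elapsed hours, total open files)


def growth_rate_per_hour(samples: Sequence[Sample]) -> float:
    """Least-squares slope of open-file count against elapsed hours."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    cov = sum((t - mean_t) * (c - mean_c) for t, c in samples)
    return cov / var_t


def hours_until_threshold(
    samples: Sequence[Sample], threshold: float
) -> Optional[float]:
    """Hours from the latest sample until the count reaches the threshold.

    Returns 0.0 if the latest count is already at or over the threshold,
    and None if the count is flat or falling (no crossing expected).
    """
    rate = growth_rate_per_hour(samples)
    latest_count = samples[-1][1]
    if latest_count >= threshold:
        return 0.0
    if rate <= 0:
        return None
    return (threshold - latest_count) / rate
```

For example, `hours_until_threshold([(0, 0), (24, 65000), (48, 130000)], 260000)` uses three made-up daily readings; if the estimate comes out at a couple of days, that would be consistent with the roughly 4-day cycle described above.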

Anyway, we are looking at two choices... downgrade to 10.2 and see if that clears this up or move all our shared files (config-store and cached tiles) off to a different form of shared storage, such as Compellent.

I'll keep this thread updated as we go.
RoyceSimpson
Frequent Contributor
I downgraded one of our servers to 10.2, added it back to the site and now we have no locked files on the netapp for the 10.2 machine and a ton for the 10.2.2 machine. 

That pretty much wraps this up.  Time to revert the other machine back to 10.2.

Conclusion:  AGS 10.2 = 🙂

AGS 10.2.2 = 😞

JustinGreco
Frequent Contributor

I just got alerted that a patch was released that addresses the "Too many open files" issue.  The issue affects 10.2.2 and 10.3 and is resolved at 10.3.1.

http://support.esri.com/en/downloads/patches-servicepacks/view/productid/66/metaid/2197