I have a Python script running on a 64-bit Windows Server 2012 R2 machine with ArcGIS Server 10.2.2. The script has a list of twelve services (stored as a Python dictionary) that it stops and starts. 90% of the time, all services in the list stop and start successfully. The other 10%: not so much. The JSON error message returned says:
Could not undeploy services from one or more machines. 'com.esri.arcgis.discovery.admin.AdminException'.
The corresponding message in the ArcGIS Server Manager logs is at SEVERE level and says essentially the same thing. I haven't noticed any pattern in how often it fails or which service it fails on. Sometimes (like this most recent time) it failed to stop a service in the middle of the list, then went on to successfully stop the rest of the services. Later in the script the same services are started, and all of them started successfully. So I know the script works; it just fails intermittently. My script does the following:
The code I use to stop and start services is based on this work from Kevin Hibma:
ArcGIS Server Administration Toolkit - 10.1+
AdministeringArcGISServerwithPython_DS2014
Here are the Python functions I came up with:
getToken()
```python
import contextlib
import json
import urllib
import urllib2


def getToken(adminUser, adminPass, server, port, expiration):
    # Build URL
    url = "http://{}:{}/arcgis/admin/generateToken?f=json".format(server, port)

    # Encode the query string
    query_dict = {
        'username': adminUser,
        'password': adminPass,
        'expiration': str(expiration),  # Token timeout in minutes; default is 60 minutes.
        'client': 'requestip'
    }
    query_string = urllib.urlencode(query_dict)

    try:
        # Request the token (passing query_string as data sends a POST)
        with contextlib.closing(urllib2.urlopen(url, query_string)) as jsonResponse:
            getTokenResult = json.loads(jsonResponse.read())

        # Validate result: check for an empty response before looking up keys
        if not getTokenResult or "token" not in getTokenResult:
            messages = getTokenResult.get('messages') if getTokenResult else None
            raise Exception("Failed to get token: {}".format(messages))
        return getTokenResult['token']
    except urllib2.URLError as e:
        raise Exception("Could not connect to machine {} on port {}\n{}".format(server, port, e))
```
serviceStartStop()
```python
def serviceStartStop(server, port, svc, action, token):
    # Build URL, e.g. .../arcgis/admin/services/<folder>/<name>.<type>/stop
    url = "http://{}:{}/arcgis/admin".format(server, port)
    requestURL = url + "/services/{}/{}".format(svc, action)

    # Encode the query string
    query_dict = {
        "token": token,
        "f": "json"
    }
    query_string = urllib.urlencode(query_dict)

    # Send the server request (POST) and return the parsed JSON response
    with contextlib.closing(urllib2.urlopen(requestURL, query_string)) as jsonResponse:
        return json.loads(jsonResponse.read())
```
I get the token once at the beginning of the main script and call the serviceStartStop() function repeatedly in a for loop iterating through a list of services.
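A minimal sketch of that loop, assuming a hypothetical wrapper `call(svc, action)` that invokes serviceStartStop() with the server, port and token already bound and returns the parsed JSON. It also assumes that a successful admin operation reports "status": "success", so anything else is collected as a failure instead of being silently skipped; `run_action`, `pause_seconds` and `sleep` are illustrative names, not part of the original script.

```python
import time


def run_action(services, action, call, pause_seconds=0, sleep=time.sleep):
    """Apply `action` ("start" or "stop") to every service in `services`.

    Assumption: `call(svc, action)` is a hypothetical wrapper around
    serviceStartStop() with server, port and token already bound, and it
    returns the parsed JSON response as a dict.
    Returns a dict mapping each failed service to its error messages.
    """
    failures = {}
    for svc in services:
        result = call(svc, action)
        # Assumption: successful admin operations report {"status": "success"};
        # anything else is treated as a failure and its messages are kept.
        if result.get("status") != "success":
            failures[svc] = result.get("messages", [])
        # Optional pause between calls (0 means no pause).
        if pause_seconds:
            sleep(pause_seconds)
    return failures
```

With the functions above, `call` could be `lambda svc, action: serviceStartStop(server, port, svc, action, token)`, with `token` obtained once from getToken(). Returning the failures, rather than stopping at the first one, makes it easier to see which services failed on a given run.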
Have you tried putting in a sleep call to pause the script between calls to serviceStartStop()? I wonder if the server sometimes gets bogged down and file locks aren't released right away. Have you tried restarting or re-stopping the service that just failed? Does it work on a second attempt made right after the first one fails?
I did try adding a wait (5 seconds and 12 seconds) between calls, but it didn't seem to make a difference. However, I haven't tried it again since moving the urlopen calls into with statements, though I don't expect that to change much. I haven't tried every combination of these things.
I have tried wrapping each call in another for loop over range(3) so that it attempts the action up to three times. Whenever the first attempt failed, the second and third attempts never succeeded either (whether starting or stopping). Again, though, I haven't tried this in combination with the wait time or with the refactored functions posted above.
What I notice is that, when something fails, it has usually gotten through at least the first three services. Once one service in the list fails, every remaining service in the list fails as well. The same was true when retrying three times. It also usually seems to fail on start rather than on stop.
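The retry-with-wait combination discussed above (up to three attempts, pausing between them) can be sketched as follows. This is a minimal illustration under the same assumptions as before: `call(svc, action)` is a hypothetical wrapper around serviceStartStop() that returns the parsed JSON, and "status": "success" marks a successful response. The wait is applied only between attempts, not after the last one.

```python
import time


def call_with_retries(call, svc, action, attempts=3, wait_seconds=5, sleep=time.sleep):
    """Try `call(svc, action)` up to `attempts` times, pausing between tries.

    `call` is a hypothetical wrapper around serviceStartStop() that returns
    the parsed JSON response. Returns (last_response, attempts_used).
    """
    result = None
    for attempt in range(1, attempts + 1):
        result = call(svc, action)
        # Assumption: a successful admin operation reports "status": "success".
        if result.get("status") == "success":
            return result, attempt
        # Wait before the next attempt, but not after the last one.
        if attempt < attempts:
            sleep(wait_seconds)
    return result, attempts
```

Passing `sleep` as a parameter keeps the default behavior (time.sleep) while allowing the waits to be observed or skipped when testing the loop without a server.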
Were you ever able to find a solution to this problem?
If not, did you re-architect your environment so that this functionality was no longer needed and the issue went away?
I've since retired these services, and with them the script that interacted with them. However, I did try republishing all the services with the option to automatically acquire locks disabled, and I don't recall having the issue after that (or maybe I just stopped paying attention; I can't remember).
How exactly do you set up a service with the option to acquire locks disabled?
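One possible route when republishing from a script is to edit the service definition draft (.sddraft) before staging it. The sketch below assumes the locking option is stored there as a key/value pair named schemaLockingEnabled inside a PropertySetProperty element with Key and Value children; that key name and layout are assumptions, so check your own .sddraft before relying on it. The function name and its return value are illustrative.

```python
import xml.dom.minidom as DOM


def _first_child(element, tag):
    """Return the first direct child element named `tag`, or None."""
    for node in element.childNodes:
        if node.nodeType == node.ELEMENT_NODE and node.tagName == tag:
            return node
    return None


def _text(element):
    """Return the stripped text held directly by `element`."""
    return "".join(n.data for n in element.childNodes
                   if n.nodeType == n.TEXT_NODE).strip()


def disable_schema_locking(sddraft_xml, key_name="schemaLockingEnabled"):
    """Set the value of the `key_name` property to "false" in .sddraft XML.

    Assumption: the setting is stored as
    <PropertySetProperty><Key>schemaLockingEnabled</Key><Value>true</Value>
    Returns (modified_xml, changed).
    """
    doc = DOM.parseString(sddraft_xml)
    changed = False
    for prop in doc.getElementsByTagName("PropertySetProperty"):
        key = _first_child(prop, "Key")
        value = _first_child(prop, "Value")
        if key is None or value is None or _text(key) != key_name:
            continue
        # Replace whatever the <Value> element held with the text "false".
        while value.firstChild is not None:
            value.removeChild(value.firstChild)
        value.appendChild(doc.createTextNode("false"))
        changed = True
    return doc.toxml(), changed
```

For example, the .sddraft written by arcpy.mapping.CreateMapSDDraft could be read, passed through disable_schema_locking(), written back, and then staged and uploaded as usual with arcpy.StageService_server and arcpy.UploadServiceDefinition_server.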