10.4.1 vs 10.5

DaveTenney · 04-03-2017

All,

Looking for some real-world input from those who have used 10.4.1 for Server and have recently upgraded to 10.5.

We have been trying to vet 10.4.1 in our staging environment for use in our production environment, and at this point in time we cannot sign off on 10.4.1. I don't want to get into the details here; we are providing a lengthy response on that directly to ESRI.

What I would like to know is how people feel about 10.5 so far. Were you experiencing issues with 10.4.1 that seem to be resolved with 10.5? I would assume we are fairly close to a 10.5.1 release (around the time of the UC). Again, just looking for input from real-world users.

Thanks,

Dave

MichaelVolz · 08-23-2017

David:

Are you still using this process to rebuild your address locators? If so, are your geocode services load balanced in AGS across at least 2 servers, or do you serve your geocode services from only 1 server?

DavidColey · 08-23-2017

Hi Michael, yes. I stop my locator services, perform the rebuild, and then start the services. And yes, the services are load balanced in my 2-machine mapCluster.
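The stop, rebuild, start cycle David describes can be sketched in Python as below. This is a minimal sketch, not his actual script: `rebuild_locators` and the `stop`, `rebuild`, and `start` callbacks are hypothetical names, and in practice the callbacks might wrap agsAdmin.exe, manageservice.py, or direct admin REST calls. The one design point it illustrates is the `try`/`finally`, which restarts any service that was stopped even if the rebuild step fails.

```python
from typing import Callable, Iterable, List


def rebuild_locators(
    services: Iterable[str],
    stop: Callable[[str], None],
    rebuild: Callable[[], None],
    start: Callable[[str], None],
) -> None:
    """Stop every locator service, rebuild, then restart the services.

    `stop`, `rebuild`, and `start` are hypothetical callbacks supplied by
    the caller (e.g. wrappers around agsAdmin.exe or admin REST calls).
    """
    names: List[str] = list(services)
    stopped: List[str] = []
    try:
        for name in names:
            stop(name)
            stopped.append(name)
        rebuild()
    finally:
        # Restart whatever was actually stopped, even if the rebuild raised,
        # so a failed run does not leave the geocode services down.
        for name in stopped:
            start(name)
```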

MichaelVolz · 08-23-2017

How do you stop the locator services? Do you use the AGS admin Python tools, found at the following path in a default install?

C:\Program Files\ArcGIS\Server\tools\admin\manageservice.py

If not, can you go into detail on how the services are stopped? That is the current problem: orphaned SOC processes for the locator service are still running even after the locator service has been stopped. This is still occurring at 10.5.1 in my organization.
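For reference, here is a sketch of the kind of request a tool like manageservice.py sends to stop or start a service. It assumes the ArcGIS Server Administrator API layout as I understand it (`/arcgis/admin/services/[folder/]Name.Type/stop` or `/start`, POSTed with `f=json` and a token, on port 6080 for HTTP); verify the paths against the Admin API docs for your version. `service_action_request` and the host names are hypothetical, and the function only builds the request rather than sending it.

```python
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request


def service_action_request(
    host: str,
    service: str,
    action: str,
    token: str,
    folder: Optional[str] = None,
    port: int = 6080,
    scheme: str = "http",
) -> Request:
    """Build (but do not send) a start/stop POST to the Administrator API.

    Assumed URL shape:
    {scheme}://{host}:{port}/arcgis/admin/services/[{folder}/]{service}/{action}
    """
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action!r}")
    path = "/arcgis/admin/services/"
    if folder:
        path += quote(folder) + "/"
    path += quote(service) + "/" + action
    # Form-encoded body carrying the JSON output flag and the admin token.
    body = urlencode({"f": "json", "token": token}).encode("ascii")
    return Request(f"{scheme}://{host}:{port}{path}", data=body, method="POST")
```

Sending it would be `urllib.request.urlopen(req)` followed by reading the JSON reply; a successful call typically returns `{"status": "success"}`. Note that a success reply only means the server accepted the stop, which, as this thread shows, is not the same as every ArcSOC having exited.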

What are the minimum and maximum number of instances per machine set to for your locator services?
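The instance settings Michael asks about live in each service's JSON definition. The sketch below reads them and turns them into an upper bound on how many instances (one ArcSOC process each under high isolation) the service can have across the cluster, which is a handy sanity check when counting leftover processes. It assumes the definition carries `minInstancesPerNode` and `maxInstancesPerNode`, as Administrator API service JSON normally does; the function names are hypothetical.

```python
import json
from typing import Tuple


def instance_limits(service_json: str) -> Tuple[int, int]:
    """Return (min, max) instances per machine from a service's admin JSON.

    Assumes the JSON has "minInstancesPerNode" and "maxInstancesPerNode".
    """
    props = json.loads(service_json)
    return int(props["minInstancesPerNode"]), int(props["maxInstancesPerNode"])


def max_instances_in_cluster(service_json: str, machines: int) -> int:
    """Upper bound on running instances of this service across the cluster.

    Under high isolation each instance is its own ArcSOC process; orphaned
    processes left over from earlier restarts are not counted here.
    """
    _, max_per_machine = instance_limits(service_json)
    return max_per_machine * machines
```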

DavidColey · 08-23-2017

No, not exactly. I run agsAdmin.exe from a .bat file as a scheduled task:

chdir /D V:\BatchDBScripts

agsAdmin.exe servername 6080 username password start CompositeLocator.GeocodeServer

I can use either machine name since they are clustered.
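Because either machine name works in a clustered site, a scripted version of David's scheduled task could fall back to the second machine if the first doesn't respond. A sketch in Python, assuming (unverified) that agsAdmin.exe exits with a nonzero code on failure; `ags_admin_argv`, `run_on_any_machine`, `run_subprocess`, and the host names are hypothetical, and the argument order simply mirrors the .bat line.

```python
import subprocess
from typing import Callable, List, Sequence


def ags_admin_argv(host: str, port: int, user: str, password: str,
                   action: str, service: str) -> List[str]:
    """Mirror the argument order of the agsAdmin.exe line in the .bat file."""
    return ["agsAdmin.exe", host, str(port), user, password, action, service]


def run_on_any_machine(
    hosts: Sequence[str],
    build_argv: Callable[[str], List[str]],
    run: Callable[[List[str]], int],
) -> str:
    """Send the command to each clustered machine in turn until one succeeds.

    Returns the host that accepted the command; raises if none did.
    """
    for host in hosts:
        if run(build_argv(host)) == 0:
            return host
    raise RuntimeError(f"command failed on all machines: {list(hosts)}")


def run_subprocess(argv: List[str]) -> int:
    # cwd mirrors the "chdir /D V:\BatchDBScripts" line; adjust as needed.
    return subprocess.run(argv, cwd=r"V:\BatchDBScripts").returncode
```

For a scheduled task you would call something like `run_on_any_machine(["gis1", "gis2"], lambda h: ags_admin_argv(h, 6080, user, pw, "start", "CompositeLocator.GeocodeServer"), run_subprocess)`.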

DaveTenney · 04-27-2017

Something I would really like to know is how long it takes the ESRI server to completely stop a service and truly recognize that the service is down.

Below is a snippet of output from our script, which stops a service; a PowerShell component then looks for any instances of that service still running, 30 seconds after the server notified us that the service had "stopped".

2017-04-27 13:50:03.903000 getting token for siteadmin on "esri server instance"
200
{"token":"really long esri token number goes here..","expires":"1493319003933"}
Processed folder information successfully. Now processing services...
Service Roads.GeocodeServer STOP successfully.
2017-04-27 13:50:05.456000 Thread sleeping for 30 seconds
2017-04-27 13:50:35.458000 Checking for zombie ArcSOCs on "esri server instance" for Roads.GeocodeServer
2017-04-27 13:50:35.458000 Checking if "esri server instance" has Roads.GeocodeServer processes still running
4/27/2017 1:50:35 PM
4/27/2017 1:50:35 PM Looking for processes under name *Roads.GeocodeServer*
4/27/2017 1:50:36 PM 3 process(es) found
4/27/2017 1:50:36 PM 3556 5816 11316
4/27/2017 1:50:36 PM Looping process(es)
4/27/2017 1:50:36 PM Getting process 3556
4/27/2017 1:50:36 PM Found and terminating 3556
4/27/2017 1:50:36 PM System.Management.ManagementBaseObject
4/27/2017 1:50:36 PM Terminated 3556
4/27/2017 1:50:36 PM Getting process 5816
4/27/2017 1:50:36 PM Found and terminating 5816
4/27/2017 1:50:36 PM System.Management.ManagementBaseObject
4/27/2017 1:50:36 PM Terminated 5816
4/27/2017 1:50:36 PM Getting process 11316
4/27/2017 1:50:36 PM Found and terminating 11316
4/27/2017 1:50:36 PM System.Management.ManagementBaseObject
4/27/2017 1:50:36 PM Terminated 11316

So as you can see, even after waiting 30 seconds something still has a hold of the process, even though we got confirmation that the service stopped. Obviously, you can't move forward with updating or removing data if the server still has a lock on it, so we were forced to create the PowerShell script to kill any "zombies," as we call them. We don't really want to have to forcefully kill the processes, because that could cause unforeseen issues.
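The zombie check matches processes against a `*Roads.GeocodeServer*` wildcard. A Python equivalent of that matching step (not Dave's PowerShell) might look like this; it assumes, as the wildcard implies, that the service name appears in each ArcSOC.exe command line. `find_zombies` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Tuple


def find_zombies(processes: Iterable[Tuple[int, str, str]], service: str) -> List[int]:
    """Return PIDs of ArcSOC processes whose command line mentions `service`.

    `processes` holds (pid, executable name, command line) tuples. Both sides
    are lowercased so matching is case-insensitive, like a Windows wildcard.
    """
    pattern = f"*{service}*".lower()
    return [
        pid
        for pid, exe, cmdline in processes
        if exe.lower() == "arcsoc.exe" and fnmatchcase(cmdline.lower(), pattern)
    ]
```

On a live server the `(pid, name, cmdline)` tuples could come from the third-party psutil package, e.g. `psutil.process_iter(["pid", "name", "cmdline"])`, joining each `cmdline` list into a string. Processes you lack rights to inspect report `None` for those fields, so treat them as empty strings.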

Again, we run this same process in a 10.2.2 environment without any issues at all, but for some reason it does not work reliably in 10.3.x, 10.4.x, or 10.5.

Dave
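On the question of how long a stop really takes, one option short of killing is to poll until both the server and the OS agree the service is down, with a timeout after which you decide what to do. A sketch assuming two hypothetical callbacks: `real_time_state()` returning the service's reported state (the Administrator API's `.../status` resource reports a `realTimeState` field, as far as I know) and `live_process_count()` returning the number of matching ArcSOCs. The clock and sleep are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


def wait_until_really_stopped(
    real_time_state: Callable[[], str],
    live_process_count: Callable[[], int],
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll until the server reports STOPPED *and* no matching ArcSOCs remain.

    Returns the seconds waited; raises TimeoutError after `timeout_s`, at
    which point the caller can decide whether to fall back to killing.
    """
    start = clock()
    while True:
        if real_time_state() == "STOPPED" and live_process_count() == 0:
            return clock() - start
        if clock() - start >= timeout_s:
            raise TimeoutError(f"service still has live processes after {timeout_s:.0f} s")
        sleep(poll_s)
```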

MichaelVolz · 04-28-2017

Have you tested how long it takes for these "zombie" processes to end on their own, without killing them programmatically (i.e., waiting longer than 30 seconds: 1 minute, 5 minutes, 10 minutes, etc.)?
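Michael's suggested test (check at 1, 5, and 10 minutes without killing anything) could be scripted roughly as follows. A sketch with a hypothetical `live_process_count` callback; it returns the first checkpoint at which no leftover processes were found, or `None` if some survived the last one.

```python
import time
from typing import Callable, Optional, Sequence


def first_clear_checkpoint(
    live_process_count: Callable[[], int],
    checkpoints_s: Sequence[float] = (60, 300, 600),
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Wait until each checkpoint (seconds after the stop) and check for leftovers.

    Returns the first checkpoint at which no processes remain, or None if
    some were still alive at the last one. Nothing is killed.
    """
    elapsed = 0.0
    for checkpoint in sorted(checkpoints_s):
        # Sleep only the gap since the previous checkpoint.
        sleep(checkpoint - elapsed)
        elapsed = checkpoint
        if live_process_count() == 0:
            return checkpoint
    return None
```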

DaveTenney · 04-28-2017

I can tell you we tested it a lot in 10.2.2.

There it took anywhere from 5 minutes to 24 hours! That's why we decided to create our own process for killing them.

Yesterday was the first time I actually watched it with my own eyes in 10.5, and 5 minutes was the longest we waited. Honestly, 5 minutes is about 4 minutes too long.

We have also noticed that introducing load balancers into the system does some funny things to services; we haven't dug too deeply into this issue yet. All we know at this point is that our system works 99% of the time without a load balancer, but the minute we introduce a load balancer, that percentage drops drastically!

Dave

MichaelVolz · 04-28-2017

Does this issue only occur with geocode services?

I think I run into this issue on a fairly regular basis, which I alleviate by bouncing the 2 servers in the cluster consecutively.
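Bouncing the two cluster machines consecutively is essentially a rolling restart. A minimal sketch, assuming hypothetical `restart(machine)` and `is_healthy(machine)` callbacks (for example, restarting the ArcGIS Server Windows service and then requesting a known service URL on that machine directly rather than through the load balancer); the key point is waiting for the first machine to come back before touching the second, so the cluster is never fully down.

```python
import time
from typing import Callable, List, Sequence


def rolling_restart(
    machines: Sequence[str],
    restart: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    timeout_s: float = 600.0,
    poll_s: float = 15.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Restart machines one at a time so the cluster never loses both.

    Waits for each machine to report healthy before moving to the next;
    raises TimeoutError (leaving later machines untouched) if one doesn't.
    """
    done: List[str] = []
    for machine in machines:
        restart(machine)
        deadline = clock() + timeout_s
        while not is_healthy(machine):
            if clock() >= deadline:
                raise TimeoutError(f"{machine} not healthy after {timeout_s:.0f} s")
            sleep(poll_s)
        done.append(machine)
    return done
```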

Do you have the snippet of code for killing the "zombie" processes?

DaveTenney · 04-28-2017

This happens mainly with the geocode services, but we do have one dynamic map service that it happens to as well.

I can probably get a snippet of the code.
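Until Dave's snippet turns up, here is a Python illustration of the forceful-kill step (not his PowerShell, which appears to go through WMI). `terminate_pids` is a hypothetical name. On Windows, `os.kill` with any signal other than the two console-control events calls `TerminateProcess`, so the target gets no chance to clean up; that is the same risk Dave raises about forcing processes down. On POSIX it sends SIGTERM.

```python
import os
import signal
from typing import Dict, Iterable


def terminate_pids(pids: Iterable[int]) -> Dict[int, str]:
    """Forcefully end each PID and report what happened to each one.

    A PID that has already exited, or that we lack rights to kill, is
    reported in the result rather than raised.
    """
    results: Dict[int, str] = {}
    for pid in pids:
        try:
            os.kill(pid, signal.SIGTERM)
            results[pid] = "signalled"
        except ProcessLookupError:
            results[pid] = "already gone"
        except PermissionError:
            results[pid] = "access denied"
        except OSError as exc:  # e.g. Windows reports an invalid PID this way
            results[pid] = f"error: {exc}"
    return results
```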

Also, after more testing today:

Step 1: ran the update process with the load balancer OFF; we had 5 successful runs with no issues.

Step 2: ran the update process with the load balancer ON; the process failed.

Step 3: ran the update process with the load balancer OFF; the process was successful.

We repeated the on vs. off cycle, and the pattern has remained the same.

JonathanQuinn · 04-28-2017

Do you have any settings within the load balancer to keep sessions alive? Perhaps it's retaining a connection to the service, preventing the process from being fully destroyed.