I'll add that I too have these issues and concerns. Our 10.0 ArcGIS Server was running with about 500 services. Then, after our transition to 10.2.2, it began to get unstable at around 200 services.
I found that setting each service's recycle time to a random value, so that the services do not all restart at the same time (00:00), helps. Also found that setting the service start-up timeout to a higher value (currently 15 minutes is working) seems to ensure the services start-up. Otherwise some will time out and fail to start; my guess is that the config-store then gets confused, and managing those particular services sometimes fails.
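For anyone who wants to script the two changes described above, here is a minimal sketch that works on a service definition already loaded as a Python dict. The property names recycleStartTime and maxStartupTime are the ones I recall from the service JSON in the Admin API, and I am assuming maxStartupTime is expressed in seconds (so 900 for 15 minutes); check both against your own service JSON before relying on this. The function name stagger_service is made up for illustration.

```python
import copy
import random


def stagger_service(service, startup_seconds=900, rng=random):
    """Return a copy of a service definition with a random recycle time
    and a longer start-up timeout.

    Assumptions (verify against your own service JSON):
      - "recycleStartTime" is an "HH:MM" string
      - "maxStartupTime" is a number of seconds
    """
    updated = copy.deepcopy(service)
    hour = rng.randrange(24)    # any hour of the day
    minute = rng.randrange(60)  # any minute of the hour
    updated["recycleStartTime"] = f"{hour:02d}:{minute:02d}"
    updated["maxStartupTime"] = startup_seconds
    return updated


# Hypothetical service definition, trimmed to the relevant properties.
example = {
    "serviceName": "Parcels",
    "type": "MapServer",
    "recycleStartTime": "00:00",
    "maxStartupTime": 300,
}
print(stagger_service(example))
```

The original dict is left untouched, so you can compare before and after prior to pushing anything back to the server.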
"Also found that setting the service start-up timeout to a higher value (currently 15 minutes is working) seems to ensure the services start-up. "
Hello Andrew. Can you tell me where this setting is? Are you referring to "The maximum time a client will wait to get a service"?
Priscilla,
See the maxStartupTime property of a service. You can set this value through the ArcGIS REST API.
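As a rough illustration of what that looks like, the sketch below builds (but does not send) the POST request for a service's edit operation in the Administrator API. As far as I know, the edit operation expects the complete service JSON in the "service" parameter, so you would first fetch the service's current JSON, change maxStartupTime, and post the whole thing back. The host name, service name, and token here are placeholders, and build_edit_request is a hypothetical helper, not part of any Esri library.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_edit_request(admin_url, folder, name, service_type, service, token):
    """Build a POST request for <admin_url>/services/[folder/]name.type/edit.

    `service` should be the full service definition (as a dict) with
    maxStartupTime already changed to the value you want.
    """
    path = f"{name}.{service_type}"
    if folder:
        path = f"{folder}/{path}"
    url = f"{admin_url.rstrip('/')}/services/{path}/edit"
    body = urlencode({
        "service": json.dumps(service),
        "token": token,   # an admin token obtained beforehand
        "f": "json",
    })
    return Request(url, data=body.encode("utf-8"), method="POST")


# Placeholder values for illustration only.
req = build_edit_request(
    "http://gis.example.com:6080/arcgis/admin",
    "",                      # empty string for the root folder
    "Parcels",
    "MapServer",
    {"serviceName": "Parcels", "type": "MapServer", "maxStartupTime": 900},
    "YOUR_TOKEN",
)
print(req.full_url)
```

Sending it is then a matter of passing req to urllib.request.urlopen and checking the JSON response for a success status.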
Thank you Erin. So this is not something that you can set through the Manager, correct? Is there any way to change the default?
Correct, it doesn't appear that maxStartupTime is exposed through either the Desktop or Manager UIs. It's likely one of those properties that the vast majority of users never need to change, so it's omitted for brevity.
And, to my knowledge, the default values for the various service "timeout" properties are not configurable, or at least the configuration is not officially documented by Esri.
I seem to have become more stable over time - probably because I reboot the servers once a week now over the weekend.
I have one lingering issue that's driving me nuts. Intermittently, my REST responsiveness drops dramatically. It looks like it happens about every 10 minutes, and only for a couple of seconds. But any request made during that time hangs badly. I have a script that pings my web adapter, port 6080 on the server, and a web adapter on the server. Generally speaking, they all experience the slowdown at the same time, so I don't think it's a web adapter specific issue, but something at the AGS root level. Check out the perfectly acceptable response times, then Boom!
"Information","ajp-bio-8014-exec-2","02/19/15","13:38:15",,"119 ms"
"Information","ajp-bio-8014-exec-2","02/19/15","13:38:30",,"116 ms"
"Information","ajp-bio-8014-exec-3","02/19/15","13:38:45",,"119 ms"
"Information","ajp-bio-8014-exec-2","02/19/15","13:39:13",,"13462 ms"
"Information","ajp-bio-8014-exec-3","02/19/15","13:39:17",,"2984 ms"
"Information","ajp-bio-8014-exec-3","02/19/15","13:39:30",,"184 ms"
"Information","ajp-bio-8014-exec-2","02/19/15","13:39:45",,"102 ms"
"Information","ajp-bio-8014-exec-2","02/19/15","13:40:00",,"95 ms"
I believe I have ruled out network issues. I've done lots and lots of ping and connectivity tests, and everything seems to check out there. It's just like the box 'chokes' every ten minutes for a couple of seconds. Nothing in the logs corresponds with these 'choke' times that I can see. I thought maybe disk I/O on the data store, but that seems to be fine as well. I'm at a loss.
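In case it helps anyone chasing the same pattern, here is a small sketch that pulls the slow requests out of log lines in the format pasted above, so the 'choke' timestamps can be lined up against other logs or scheduled tasks. The cutoff of five times the median response time is an arbitrary assumption, and find_spikes is just an illustrative name.

```python
import csv
import io
import statistics

LOG = '''\
"Information","ajp-bio-8014-exec-2","02/19/15","13:38:15",,"119 ms"
"Information","ajp-bio-8014-exec-2","02/19/15","13:38:30",,"116 ms"
"Information","ajp-bio-8014-exec-3","02/19/15","13:38:45",,"119 ms"
"Information","ajp-bio-8014-exec-2","02/19/15","13:39:13",,"13462 ms"
"Information","ajp-bio-8014-exec-3","02/19/15","13:39:17",,"2984 ms"
"Information","ajp-bio-8014-exec-3","02/19/15","13:39:30",,"184 ms"
"Information","ajp-bio-8014-exec-2","02/19/15","13:39:45",,"102 ms"
"Information","ajp-bio-8014-exec-2","02/19/15","13:40:00",,"95 ms"
'''


def find_spikes(log_text, factor=5):
    """Return (time, milliseconds) pairs slower than factor x the median."""
    rows = csv.reader(io.StringIO(log_text))
    # Column 3 is the time of day, column 5 is the duration, e.g. "119 ms".
    timings = [(row[3], int(row[5].split()[0])) for row in rows if row]
    baseline = statistics.median(ms for _, ms in timings)
    return [(t, ms) for t, ms in timings if ms > factor * baseline]


for t, ms in find_spikes(LOG):
    print(t, ms)
# prints:
# 13:39:13 13462
# 13:39:17 2984
```

Run over a full day of log lines, the list of spike times should show quickly whether the interval really is a steady ten minutes or drifts.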