Web Server was found to be stopped when it should have been started - messages

Discussion created by lewisdw91 on Jun 13, 2013
Latest reply on Jun 25, 2019 by klassizistisch
I have three ArcGIS 10.1 SP1 servers in an ArcGIS site.  On one of the three servers, I am constantly receiving the following message in the ArcGIS Server log:
"The Web Server was found to be stopped when it should have been started. Restarting it."

That message is recorded once per minute, 24x7. 

A little background:
All servers are Windows Server 2008 (not R2), 64-bit Enterprise, 8 GB RAM, physical machines (not virtual)

The Web Adaptor runs on IIS 7 on a fourth Server 2008 machine

Single ArcGIS site, two clusters: default and imageserver.  Two of the servers are in the default cluster; the third is the only server in the imageserver cluster (it has the imageserver extension on it)

Using SSL with signed certs

A week ago on a Monday morning, I get to work and the Web Adaptor does not respond; I cannot log in to ArcGIS Server Manager or the REST admin on any of the servers.  So I restart the arcgisserver service on all of the ArcGIS servers.  I can now log in to the REST admin on each server, but the Web Adaptor still does not respond.  I restart IIS and can finally log in to Server Manager.  Server Manager shows one server started and the other two down (one from the default cluster, one from the imageserver cluster).  I double-check in services.msc: the arcgisserver service is started.  I look in the REST admin, and the two servers that show as down in Server Manager show as stopped/started there.  I try to start the servers in Server Manager and it gives me an error.  So I try it in the REST admin and it works.  I refresh Server Manager and it again shows those two as stopped.

I give up and reboot all four servers.  I wait 30 minutes or so and then check: same result, one server is started and the other two are stopped.  I try to start them in the REST admin again and it reports them both as started/started.  I check in Server Manager, and after about one minute those two servers show as stopped again.
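For anyone who wants to compare the two states the REST admin reports without clicking through its pages, here is a minimal sketch. It assumes the machine status resource lives at machines/<name>/status under the admin root and returns JSON with configuredState and realTimeState fields; I am also assuming those are the two values behind "stopped/started". The host name, port and machine name in the usage note are hypothetical, and fetching the URL (which needs an admin token) is left out.

```python
import json


def machine_status_url(admin_root, machine_name):
    """Build the assumed Admin API URL for one machine's status resource."""
    return "{}/machines/{}/status?f=json".format(admin_root.rstrip("/"), machine_name)


def summarize_state(status_json):
    """Return 'configured/real-time' from a status response body.

    The field names are assumptions; missing fields are reported as UNKNOWN.
    """
    data = json.loads(status_json)
    configured = data.get("configuredState", "UNKNOWN")
    real_time = data.get("realTimeState", "UNKNOWN")
    return "{}/{}".format(configured, real_time)


def states_disagree(status_json):
    """True when the configured and real-time states differ."""
    configured, real_time = summarize_state(status_json).split("/")
    return configured != real_time
```

Calling machine_status_url("https://gisserver.example:6443/arcgis/admin", "GIS1") gives the URL to request; passing the response body to states_disagree() flags a machine whose configured state and real-time state do not match.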

At this point I open an incident with ESRI.  The tech wants me to do a repair install of ArcGIS Server on those two servers.  I do it on one and reboot that server: same result.  I send him some of the logs and screen shots and he says he will get back to me.
I remove the two faulty servers from the site and add them back.  I get several errors from Server Manager while it tries to add them back (I did not record those errors, wish I had), but after a refresh (F5), Server Manager shows both of those servers added back, though still stopped.  I try to start them in the REST admin and it shows them started/started.  I check in Server Manager, and they are both started.  I wait 5 minutes: still started.

I check the ArcGIS Server logs at this point, and the two faulty servers are now showing the following message every minute:
"The Web Server was found to be stopped when it should have been started. Restarting it."
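To confirm the once-a-minute pattern rather than eyeballing it, a rough sketch like the one below tallies the message per minute. It assumes each log entry sits on one line and carries a timestamp shaped like 2013-06-13T08:15:02 somewhere on that line; both the layout and the log file location are assumptions, so adjust the pattern to whatever the log files actually contain.

```python
import re
from collections import Counter

# Text of the message as quoted from the log.
MESSAGE = "The Web Server was found to be stopped when it should have been started"

# Assumed timestamp shape (YYYY-MM-DDTHH:MM...); only the minute is kept.
TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2})")


def count_per_minute(lines):
    """Count occurrences of MESSAGE, keyed by the minute found on the same line."""
    counts = Counter()
    for line in lines:
        if MESSAGE not in line:
            continue
        match = TIMESTAMP.search(line)
        key = match.group(1) if match else "unknown"
        counts[key] += 1
    return counts


def count_in_file(path):
    """Convenience wrapper: tally one log file (the path is supplied by the caller)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_per_minute(handle)
```

A steady count of 1 for every minute would match what I am seeing; gaps or bursts would suggest the restart check is not running on a fixed schedule.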

I reboot those two servers again; they are still showing that error in the log EVERY MINUTE.  I send the logs and screen shots to ESRI again and ask for the ticket to be escalated.  They escalate it, and I have to go over everything again with the new person.

We have looked at a few things and have been sending e-mails and logs back and forth, but there is no resolution to date.

All published services seem fine, but we do have to reboot the servers every other night, because ArcGIS Server gets unstable after a while: services stop working and generate various errors in the log.  A reboot clears it up.  I miss ArcGIS Server 10.

Two days ago, one of the two servers STOPPED generating the web server error message, with nothing changed by me.  Still nothing from ESRI.  The server that is still recording the error is the one in the imageserver cluster.  I moved it to the default cluster temporarily to see if that would affect it; it still logs the error every minute.
Does the Apache web server used by ArcGIS Server 10.1 create logs somewhere?  I'm not familiar with Apache. Can we "control" it somehow? Is there any command line interface to it?  Does anyone have any ideas about the error that posts every minute?