ArcGIS Web Adaptor 11.1 App Pool freezes

05-21-2023 10:04 PM
Scott_Tansley
MVP Regular Contributor

Hi.   

I've recently upgraded a client to ArcGIS 11.1, and we're having random problems with the new web adaptors.  I'm looking to see if anyone has observed the same issues.

The client's ArcGIS Enterprise was originally deployed at 10.8.1.  There's an IIS web server in the DMZ.  A single host runs the rest of the base deployment, which is only used for Hosted Feature Services.  Web Adaptors (WAs) exist for portal and hosting.  There is a third machine with a general-purpose ArcGIS Server, federated and primarily serving Map Image Layers.  Its Web Adaptor is called server.

There have never been any repeated outage issues.  The environment was upgraded to 10.9.1 last year.  Once again no issues.

They were upgraded to 11.1 two weeks ago.  Immediately, we found that the machine was running out of memory.  We noted the advice given in the new dependencies and increased the RAM from 4 to 8GB.  It sits at around 5-6GB with no issues, and we have not seen any spike above 6GB to date.

After adding RAM it all seemed to settle for a few days.  But now, every couple of days, the IIS application pool for the 'server' web adaptor will just stall.  IIS logs show 200/304 responses for everything up to the freeze/stall and 500 for everything after it.  There is nothing untoward in the requests.
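For anyone wanting to pin down the exact moment of the stall from the IIS logs, here is a minimal sketch, assuming the logs are in the W3C extended format with a `#Fields:` header that includes `date`, `time` and `sc-status`. The function name and the sample log lines are made up for illustration; they are not taken from the affected server.

```python
from typing import Iterable, Optional, Tuple


def find_stall_start(lines: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (date, time) of the first 5xx response that follows the last
    2xx/3xx response, or None if the log does not end in a run of 5xx errors."""
    fields = []
    stall_start = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # The header names the columns used by the data lines that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#") or not fields:
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip truncated or malformed lines
        row = dict(zip(fields, values))
        status = row.get("sc-status", "")
        if status.startswith("5"):
            if stall_start is None:
                stall_start = (row.get("date", ""), row.get("time", ""))
        elif status[:1] in ("2", "3"):
            # A successful response means any earlier 5xx was not the final stall.
            stall_start = None
        # 4xx responses are ignored: they neither start nor end a stall.
    return stall_start


# Illustrative sample only (hypothetical requests and timestamps).
sample = [
    "#Fields: date time cs-method cs-uri-stem sc-status",
    "2023-05-20 09:00:01 GET /server/rest/services 200",
    "2023-05-20 09:00:02 GET /server/rest/services 304",
    "2023-05-20 09:00:05 GET /server/rest/services 500",
    "2023-05-20 09:00:06 GET /server/rest/info 500",
]
print(find_stall_start(sample))
```

Comparing that timestamp with the last good 200 logged by ArcGIS Server should show whether the two sides agree on when requests stopped flowing.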

ArcGIS Server is still available on 6443 and can be accessed.  It just isn't receiving requests from IIS.  With Info logging turned on, it shows the last good 200 request from the WA.  Then nothing, no errors, no issues.  It's just as if it's sat there waiting for a request and not receiving it.

There have been no firewall or environmental changes recently; the only changes are the upgrade to 11.1 and the addition of memory.

On the web server there is nothing in event viewer, system/admin/security or IIS application logs.

I'm blind.  It's just as if the App Pool WA says I've had enough.

The only way to bring the application back online is to restart IIS.  You can stop the AppPool, but it will not start again unless IIS is restarted.

I'm currently blind.  We've external ping monitoring in place so we know when the healthCheck API fails, but there's nothing else we can do but monitor and restart at this point.
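A monitoring check along these lines can also tell a Web Adaptor stall apart from an ArcGIS Server outage by probing both paths. This is only a sketch: the two URLs are hypothetical, and it assumes the health check returns JSON of the form `{"success": true}` when healthy. Note that a self-signed certificate on port 6443 would make the direct probe fail with the default `urllib` settings.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Tuple

# Hypothetical URLs; substitute the real Web Adaptor and machine names.
WA_URL = "https://gis.example.com/server/rest/info/healthcheck?f=json"
DIRECT_URL = "https://agsserver.example.com:6443/arcgis/rest/info/healthcheck?f=json"


def fetch(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Return (HTTP status, body). Status 0 means no HTTP response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, ""
    except OSError:  # covers URLError, timeouts and connection failures
        return 0, ""


def is_healthy(status: int, body: str) -> bool:
    """Healthy means HTTP 200 plus a JSON body with success == true (assumed format)."""
    if status != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("success") is True


def classify(wa_url: str, direct_url: str,
             fetcher: Callable[[str], Tuple[int, str]] = fetch) -> str:
    """Compare the Web Adaptor path with the direct ArcGIS Server path."""
    wa_ok = is_healthy(*fetcher(wa_url))
    direct_ok = is_healthy(*fetcher(direct_url))
    if wa_ok and direct_ok:
        return "ok"
    if direct_ok:
        return "web-adaptor-down"  # the pattern described in this post
    if wa_ok:
        return "direct-path-down"
    return "server-down"


if __name__ == "__main__":
    print(classify(WA_URL, DIRECT_URL))
```

A result of `web-adaptor-down` would match the symptoms above: ArcGIS Server answers on 6443 while requests through IIS fail.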

Scott Tansley
https://www.linkedin.com/in/scotttansley/
128 Replies
IanIce1
New Contributor III

Hi,

This was a new observation, and I'm not necessarily making any connection between the web adaptors and the CPU spike. I'm not trying to jump all around these forums to report issues. Also, the functionality in AGO was just an observation. As mentioned in my post, the spike lasts anywhere from 10 to 30 minutes after closing out the map viewer with the offending data. Restarting the ArcGIS Server service also works.

Edgar_W_Iparraguirre
New Contributor III

My apologies, I was not clear enough: in my opinion, your findings would probably be better placed in another branch of this forum.

Anyhow, what you describe about the SOC spike is known to me. One of our customers had a couple of complex services which, from time to time, would exhaust resources and sometimes bring the AGS Service itself and the rmid down, resulting in active but unresponsive SOCs (one could walk through the metadata in AGS Manager/Admin/REST ... but nothing else). As the AGS Service and rmid went down, all SOC processes became orphaned (kind of zombie). Before starting the service again, it was necessary to kill all the orphaned processes ... and for sure it is faster and easier to restart the machine.

IanIce1
New Contributor III

No worries. I actually thought we were facing the initial web adaptor issues again when this happened, until our Network Engineer reported the CPU spike. And of course, nothing's going to work well when the CPU is maxed out. When I reported this to my Esri rep (initially brought in due to the AppPool freezes), I was not directed to create another ticket.

Interesting note about the spikes. I've never seen an ArcSOC or Java app spike the CPU like that before, so it's likely a random issue with that particular dataset.  Also, I've never had to restart the ArcGIS Server service so many times just to get things to work again. lol

lah
New Contributor III

This is affecting us too at 11.1, with portal and server WAs in the DMZ. The initial 11.1 Reliability Patch did not do anything for us. I submitted a case request today, and support pointed me to three other WA bugs in the works.
I'm going to attempt the LukeSavage solution and hope a patch comes soon!

BUG-000159944 

BUG-000159933 

BUG-000160009 

AndrewBowne
Occasional Contributor III

We are still having issues here when under load - even after implementing the @LukeSavage solution.  I intend to call support today and get myself on the bug-train!

JonEmch
Esri Regular Contributor

Update as of July 27th 2023

Feedback indicates that the ArcGIS Web Adaptor (IIS) 11.1 Reliability Patch, released on Thursday, June 29, has helped ArcGIS Enterprise users overcome an instability in the ArcGIS Web Adaptor (IIS) introduced in version 11.1. As indicated in the patch page notes, all ArcGIS Enterprise 11.1 users who utilize the ArcGIS Web Adaptor (IIS) should apply this patch.

After the release of the ArcGIS Web Adaptor (IIS) 11.1 Reliability Patch, Esri received more reports through the community forum and Technical Support cases that the ArcGIS Web Adaptor (IIS) continued to exhibit an instability even after the patch was applied. The ArcGIS Web Adaptor (IIS) 11.1 stops responding to requests, particularly when the system is under load. The system will only recover after a change is made to IIS to either stop a hung process or restart IIS. The BUG ID of this defect is BUG-000159944.

To resolve BUG-000159944, as well as address additional user feedback, Esri will release an additional software patch for the ArcGIS Web Adaptor (IIS). The patch is not yet finalized, but we expect to release it in the next month. This high-priority patch will address the defects listed below.

  • BUG-000159944 - ArcGIS Web Adaptor (IIS) 11.1 becomes unresponsive when under load, resulting in the inability to access ArcGIS Enterprise via the web adaptor URL.
  • BUG-000159933 - ArcGIS Web Adaptor (IIS) 11.1 becomes unresponsive after the client browser requests tiles from a cached map service.
  • BUG-000160009 - ArcGIS Web Adaptor (IIS) 11.1 fails to transmit multiple set-cookie response headers resulting in some cookies not getting created and affecting apps like ArcGIS StoryMaps.

 

Please contact Esri Technical Support for help determining if you are experiencing this problem with the ArcGIS Web Adaptor (IIS) 11.1.

For those who have followed this community post about the ArcGIS Web Adaptor (IIS), several folks have asked about the user-posted workaround to change the IIS settings to increase the Queue Length to 10000 and increase the Maximum Worker Processes to 5. We have found that this change can delay the effects of BUG-000159944, but under enough system load the problem can still occur. The full resolution for BUG-000159944 will be provided in the upcoming patch for the ArcGIS Web Adaptor (IIS) 11.1. Thank you to those who shared their detailed feedback, such as this workaround and other testing results, as we have used that information during the investigation of this defect.
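For readers who want to script the workaround mentioned above rather than click through IIS Manager, the following is a rough sketch that only builds and prints the `appcmd` command lines; it does not run them. The app pool name is hypothetical, and the `/queueLength` and `/processModel.maxProcesses` attribute names reflect my understanding of the IIS application pool schema, so verify them against your IIS version before use.

```python
from typing import List

# Default install location of appcmd on Windows Server.
APPCMD = r"C:\Windows\System32\inetsrv\appcmd.exe"


def build_apppool_commands(pool_name: str,
                           queue_length: int = 10000,
                           max_processes: int = 5) -> List[List[str]]:
    """Build appcmd argument lists that set Queue Length and Maximum Worker
    Processes on one application pool. Nothing is executed here."""
    return [
        [APPCMD, "set", "apppool", pool_name, f"/queueLength:{queue_length}"],
        [APPCMD, "set", "apppool", pool_name,
         f"/processModel.maxProcesses:{max_processes}"],
    ]


# "ExampleWebAdaptorPool" is a placeholder; use the pool name shown in IIS Manager.
commands = build_apppool_commands("ExampleWebAdaptorPool")
for cmd in commands:
    print(" ".join(cmd))
```

The printed lines can be reviewed and then run from an elevated prompt on the web server. As noted above, this only delays the effects of BUG-000159944 and is not a fix.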

Keep on keeping on!
Scott_Tansley
MVP Regular Contributor

Thanks for the update and transparency Jon.  Much appreciated. 

Scott Tansley
https://www.linkedin.com/in/scotttansley/
AndrewBowne
Occasional Contributor III

@JonEmch - I was wondering if you could give us a status update on the patch you mention above?

ThomasBuchmann
New Contributor

Esri just announced the release of the ArcGIS Web Adaptor (IIS) 11.1 Reliability Update 2 Patch, which solves the issue:

ArcGIS Web Adaptor (IIS) 11.1 Reliability Update 2 Patch (esri.com)

JonEmch
Esri Regular Contributor