Web apps causing ArcGIS Enterprise to crash

02-21-2019 05:47 PM
HamishMills
Occasional Contributor

Hello all! 

Early last year we had an issue where, whenever web apps were running on our portal, the entire ArcGIS Enterprise stack would intermittently become unresponsive for up to 15 minutes, sometimes longer, and occasionally go down completely. As soon as the web apps were removed, the crashes and hanging sessions stopped. The issue was never resolved, and now, roughly a year later and having just upgraded to 10.6.1, it is still occurring. Naturally we cannot continue like this; we need to be able to use web apps on our ArcGIS Portal!

More detail on the issue:

- ArcGIS Server seems to hang first, and everything else eventually locks up. The site hangs for 5-15 minutes and then either recovers or crashes completely.

- We have often noticed that if we catch it early and restart the ArcGIS Server component, Portal becomes responsive again almost immediately (if left too long, both can become entirely unresponsive).

- We could not find anything conclusive in the logs, and extensive testing only pointed to the apps as the trigger for whatever was happening.

- Some evidence points to editable feature services in the web apps, but this is yet to be proven (i.e. it may be related to ArcObjects accessing the enterprise geodatabase?).

Environment details:

- ArcGIS Enterprise 10.6.1
- PostgreSQL 9.5.7
- Microsoft Windows Server 2016

Machine specs for the ArcGIS Server, Portal and database (SQL) machines were attached as images and are not recoverable here.

Any ideas or similar experiences anyone has had are most welcome! Desperate to get this one resolved! 


Accepted Solutions
HamishMills
Occasional Contributor

Hello to anyone who may be interested. I thought it important to post an update.

This issue has now been resolved for us, apparently by one or more of the following changes:

1) A definition query applied to one of the layers was removed from the ArcGIS Pro map before publishing, and a simpler version of the query was re-applied later as a web map layer filter instead.
2) Ensured that only one instance of each layer was present in the map at a time, i.e. no two layers referenced the same feature class in our SDE geodatabase.
3) Simplified all labelling and symbology. Most notably, removed label callout boxes on one item and ensured no symbol/label warnings were shown when publishing the layers.
4) Stopped using line offsets in line symbology (related to number 3 above).

With these 4 changes combined, we are no longer seeing any issues with our web apps.

Thanks! 

Hamish. 
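Change 2 above (no two layers pointing at the same feature class) is easy to check before publishing. Below is a minimal Python sketch, assuming the layer names and data source paths have already been collected as (name, source) pairs. The function name `find_shared_sources` is hypothetical, and the commented arcpy.mp lines are only one possible way to gather the pairs in ArcGIS Pro. This is a pre-publishing sanity check, not a confirmed cause of the crashes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_shared_sources(layers: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return {data source: [layer names]} for every source used by 2+ layers.

    `layers` is an iterable of (layer name, data source path) pairs.
    `find_shared_sources` is a hypothetical helper, not an Esri API.
    """
    by_source: Dict[str, List[str]] = defaultdict(list)
    for name, source in layers:
        # Normalise slashes and case so trivially different spellings of
        # the same feature class path count as one source (a heuristic).
        key = source.replace("\\", "/").lower()
        by_source[key].append(name)
    return {src: names for src, names in by_source.items() if len(names) > 1}


# In ArcGIS Pro the pairs might be collected with arcpy.mp (not run here), e.g.:
#   aprx = arcpy.mp.ArcGISProject("CURRENT")
#   m = aprx.listMaps()[0]
#   pairs = [(lyr.name, lyr.dataSource) for lyr in m.listLayers()
#            if lyr.supports("DATASOURCE")]
#   print(find_shared_sources(pairs))
```

An empty result means every layer in the map points at a distinct feature class.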

    

ThomasJones1
Esri Contributor

Hi Hamish,

It sounds like you've done a lot of initial troubleshooting. In general, with an issue like this I would recommend trying to isolate each variable. For example, I would create a web app from a template, monitor system performance and slowly add services. If you are using registered services, I would also look at system resource usage on the database machine.
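Because the hangs are intermittent, it can help to record exactly when ArcGIS Server stops answering while services are added one at a time. Below is a minimal Python sketch of such a probe using only the standard library. The URL in the usage note is a hypothetical host and Web Adaptor name, and `probe`/`watch` plus the `slow` threshold are illustrative choices, not Esri tooling.

```python
import http.client
import time
import urllib.request
from typing import Optional


def probe(url: str, timeout: float = 10.0) -> Optional[float]:
    """Return seconds taken to fetch `url`, or None on any error.

    `timeout` is the socket timeout passed to urlopen.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, refused connections and socket timeouts
        # are all OSError subclasses.
        return None
    return time.monotonic() - start


def watch(url: str, interval: float = 30.0, slow: float = 5.0, rounds: int = 120) -> None:
    """Poll `url` `rounds` times, printing a timestamped status line each time."""
    for _ in range(rounds):
        elapsed = probe(url)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        if elapsed is None:
            print(f"{stamp} DOWN {url}")
        elif elapsed > slow:
            print(f"{stamp} SLOW {elapsed:.1f}s {url}")
        else:
            print(f"{stamp} ok   {elapsed:.2f}s {url}")
        time.sleep(interval)
```

For example, `watch("https://gis.example.com/arcgis/rest/info?f=json")` (hypothetical host) would log a line every 30 seconds, so the first SLOW or DOWN entry can be lined up against the Windows event logs and against whichever service was added last.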

When the Portal and ArcGIS Server machines have performance issues, are the CPU and RAM maxed out? I would also recommend using the following commands to count the number of ArcSOC.exe processes running on your ArcGIS Server machine.

tasklist | find ".exe" /c 
tasklist | find "ArcSOC.exe" /c

The first command counts the lines of tasklist output that contain ".exe", which is roughly the total number of processes. The second command counts the ArcSOC.exe processes. Finally, if you haven't already, I would also recommend checking the Windows Event Viewer, in particular the Application and System logs.
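The same counts can be collected with their memory use, which helps answer the "is RAM maxed out?" question during a hang. Below is a minimal Python sketch, assuming it runs on the Windows ArcGIS Server machine and parses `tasklist /fo csv /nh` output (CSV format, no header row). The helper names are hypothetical, and the memory figure is simply the sum of tasklist's Mem Usage column.

```python
import csv
import io
import subprocess
from typing import Tuple


def run_tasklist() -> str:
    """Return `tasklist /fo csv /nh` output (Windows only)."""
    result = subprocess.run(
        ["tasklist", "/fo", "csv", "/nh"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def summarize_tasklist(csv_text: str, image: str = "ArcSOC.exe") -> Tuple[int, int, int]:
    """Return (all processes, processes named `image`, their memory in KB).

    Expected columns: Image Name, PID, Session Name, Session#, Mem Usage.
    """
    total = matching = mem_kb = 0
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        total += 1
        if row[0].lower() == image.lower():
            matching += 1
            # "85,312 K" or "85.312 K" depending on locale: keep digits only.
            digits = "".join(ch for ch in row[4] if ch.isdigit())
            mem_kb += int(digits or 0)
    return total, matching, mem_kb
```

Calling `summarize_tasklist(run_tasklist())` at intervals during a hang would show whether the ArcSOC.exe count or memory keeps climbing. Unlike `find`, the name match here is case-insensitive, and the total includes entries such as "System" that have no ".exe" suffix.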

Tuning services in your ArcGIS Server site using best practices—Documentation | ArcGIS Enterprise 

Thanks,

Thomas.

HamishMills
Occasional Contributor

Hi Thomas,

Thanks so much for taking the time to reply. 

We have since stood up a test environment specifically to identify this issue, but have not yet been able to replicate it there. I appreciate the tips you've outlined and will try to apply them where possible. At this point server load doesn't seem to be related, and none of the logs from the Esri or Windows components show anything useful that we can find.

Thanks again. Will keep this post updated with anything we discover. 

MichaelVolz
Esteemed Contributor

For #1, would you prefer to use the definition query or the filter? Do you feel one solution is cleaner and easier to maintain in the long run than the other? Since you made 4 different changes, I'm just wondering whether the definition query was a major part of the problem.

HamishMills
Occasional Contributor

I/we don't really have a preference at this point on which method is better, query or filter. The web map filter definitely feels 'cleaner' now, but probably more because it works, and/or because my gut feeling is that a filter in the web map is better suited to providing the format the web app needs. I agree that it's likely the query was the main issue, but as I have not had time to troubleshoot the changes individually I cannot confirm this (remembering that the original issue was very intermittent and hard even to replicate, especially in short spaces of time, which is why I was unable to test each item individually).

Thanks! 

Hamish. 
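For anyone weighing the two options, the difference is where the SQL lives. A definition query set in ArcGIS Pro is, as far as I know, carried into the published service, whereas a filter set in the web map is stored (per the ArcGIS web map specification, as I understand it) in the web map JSON as `layerDefinition.definitionExpression` on the operational layer. Below is a minimal Python sketch, assuming the web map JSON has already been loaded as a dict; `set_layer_filter` is a hypothetical helper, the example where clause is made up, and saving the JSON back to Portal is not shown.

```python
from typing import Any, Dict, Iterator, List


def _walk(layers: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield each layer, descending into group layers' "layers" arrays."""
    for layer in layers:
        yield layer
        yield from _walk(layer.get("layers", []))


def set_layer_filter(webmap: Dict[str, Any], title: str, where: str) -> int:
    """Set layerDefinition.definitionExpression on every operational layer
    titled `title`; return how many layers were changed."""
    changed = 0
    for layer in _walk(webmap.get("operationalLayers", [])):
        if layer.get("title") == title:
            layer.setdefault("layerDefinition", {})["definitionExpression"] = where
            changed += 1
    return changed
```

Keeping the filter in the web map means the service itself stays unfiltered, so the same service can back several maps with different filters. Whether that is what made the difference here is, as Hamish notes, unconfirmed.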
