We upgraded from ArcGIS Enterprise 10.9.1 to 11.2. Things seemed OK at first, but after about a week we noticed major issues: one of our more heavily used federated ArcGIS Servers kept failing hard with no apparent explanation, and it got progressively worse until it would freeze altogether.
- The SDE database connections would fail validation.
- The GP Publishing tool would hit max instances and get stuck.
- .lock files would persist in the config-store on the Linux volume and prevent users from deleting, updating, or overwriting a service.
- The System DynamicMappingHost service continues to fail hard.
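For anyone chasing the same lingering locks, here is a minimal Python sketch that lists lock files that have sat untouched for a while. The `.lock` suffix, the one-hour threshold, and the function name are assumptions for illustration, not something taken from Esri documentation. It only reports files and deletes nothing.

```python
import os
import time
from pathlib import Path


def find_stale_locks(root, max_age_seconds=3600, suffix=".lock", now=None):
    """Return sorted paths of lock files under root older than max_age_seconds.

    The suffix and the age threshold are assumptions; adjust both to match
    what actually appears in your config-store.
    """
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = Path(dirpath) / name
            try:
                age = now - path.stat().st_mtime
            except FileNotFoundError:
                continue  # lock was released between listing and stat
            if age > max_age_seconds:
                stale.append(str(path))
    return sorted(stale)
```

Pointing it at the config-store mount (the path is whatever your site uses) and running it a few minutes apart should show whether the same locks keep reappearing or never clear.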
We have gotten past the database validation failures and some of the locking issues (not all) by doing a lot of server cleanup: switching the pool from shared to dedicated, stopping unused services, and limiting the data returned per service. We have stopped roughly 200 services and limited the rest, but we still seem to experience service corruption and failures.
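Stopping a couple hundred services by hand in Manager is tedious, so here is a rough sketch of scripting it against the stop operation of the ArcGIS Server Administrator REST API. The host name, folder, and service names in the usage note are placeholders, and the sketch assumes you already hold a valid admin token.

```python
import json
import urllib.parse
import urllib.request


def stop_url(admin_root, folder, name, service_type):
    """Build the Administrator API URL for a service's stop operation."""
    service = f"{name}.{service_type}"
    if folder:  # services in the root folder have no folder segment
        service = f"{folder}/{service}"
    return f"{admin_root.rstrip('/')}/services/{urllib.parse.quote(service)}/stop"


def stop_service(admin_root, folder, name, service_type, token, timeout=60):
    """POST to the stop operation and return the parsed JSON response."""
    data = urllib.parse.urlencode({"f": "json", "token": token}).encode("utf-8")
    url = stop_url(admin_root, folder, name, service_type)
    request = urllib.request.Request(url, data=data)  # data present, so this is a POST
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

A call would look like `stop_service("https://gis.example.com/server/admin", "Utilities", "Water", "MapServer", token)`, looped over whatever list of unused services you have identified.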
This seems like a big performance hit compared to 10.9.1.
I wonder if this is due to the new management of services via Portal. I did notice a conflict in the wording between Portal and Server for the identify relates setting: in Portal it says "Enable identify relates: true" and in Server it says "Disable identify relates: true".
I have a long-standing open case, 03596765. Description: "Our team recently upgraded to 11.2 from 10.9.1. After the upgrade we are having a number of issues with layers that reside in PostgreSQL/SDE.
For example, in the logs we have a number of errors on System/DynamicMappingHost.MapServer: code=7563 failed to process request, no layer or table initialed, or code=8001 failed to process request.
In ArcGIS Server Manager, when we validate multiple data sources at once, a few fail with the error "java.util.ConcurrentModificationException", but when we validate one at a time each validates fine.
These layers are also very slow to render in Portal and sometimes lose symbology upon rendering."
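To see which error codes dominate, a small sketch like the one below can tally them from exported log text. It assumes each log line carries the code in a `code=NNNN` form (optionally quoted), as in the errors quoted in the case description. The function name is made up for illustration, and the actual log files may need a different pattern.

```python
import re
from collections import Counter

# Matches code=7563, code='7563' or code="7563" (assumed formats).
CODE_PATTERN = re.compile(r"code\s*=\s*['\"]?(\d+)")


def tally_codes(lines):
    """Count how often each numeric error code appears in the given lines."""
    counts = Counter()
    for line in lines:
        for code in CODE_PATTERN.findall(line):
            counts[code] += 1
    return counts
```

Calling `tally_codes(open(path))` on an exported log and then `.most_common()` on the result gives a quick ranking, which may help show whether 7563 and 8001 cluster around particular times or services.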
If anyone else can assist or provide insight it would be much appreciated.
I had a lot of similar issues (not really documented, but a lot of built-in utilities were failing and our entire Enterprise environment became unstable) when I moved to 11.2 in our production environment. We reverted to 11.1, which we've found to be way more stable. 11.3 (a long-term release) was just released yesterday, and I'm hoping it is a bit more stable than 11.2 (a short-term interim release). But before moving it to prod I plan on doing more extensive testing in a sandbox dev environment.
If I were in your shoes I would revert if you have server snapshots and just move to 11.1. The move from 10.9.1 to 11.1 was pretty smooth as I recall.
I really wish we could, but we are about a month and a half past the upgrade. I do wish this instability were documented more on the Esri forums. Unfortunately, we were too hopeful that there was a solution, and we spent way too much time working with tech support only to still be in a similar state.
In that case I would leap straight to 11.3 then. It's a long-term release and should be more stable than 11.2. Good luck!
So we made some headway. We noticed that when we switched everything to dedicated, some services stopped working for no apparent reason. Some were published with ArcGIS Pro 3.0, some with 3.1. They all had different capabilities enabled. We also noticed that these broken services worked temporarily when they shared a pool with another service that used the same Postgres schema. They still would not allow you to overwrite or delete them, and they would shut down the DynamicMappingHost after some time. When you made one of them the only service in the shared pool using that Postgres user schema, or made it dedicated, it would give a 500 error.
I am not sure when or why, but it looks like the services are corrupt (I am assuming something broke during the upgrade) and only appeared to be working because they were hijacking the connection from a working service. By switching everything to dedicated we were able to identify the problematic services and fix them by replacing the .msd in the directories. We are still in the midst of fixing services and should have a better idea of whether this resolves the issues once that is complete.
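The triage described above could be sketched roughly like this: request each service once it is on a dedicated pool and sort it into working or broken. The function names and URLs are placeholders, and the sketch assumes a failing service either returns an HTTP error or an HTTP 200 whose JSON body carries an "error" object with a code, which is how ArcGIS REST responses commonly report failures.

```python
import json
import urllib.error
import urllib.request


def status_from_body(status, body):
    """Prefer the code inside a JSON "error" object over the HTTP status."""
    try:
        payload = json.loads(body)
    except ValueError:
        return status  # not JSON, fall back to the HTTP status
    if isinstance(payload, dict) and isinstance(payload.get("error"), dict):
        return payload["error"].get("code", status)
    return status


def probe_service(url, timeout=30):
    """Return a status code for one service URL, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return status_from_body(response.status, response.read())
    except urllib.error.HTTPError as err:
        return err.code
    except OSError:  # covers URLError, timeouts, refused connections
        return None


def triage(service_urls, probe=probe_service):
    """Split URLs into working ones and (url, code) pairs for broken ones."""
    working, broken = [], []
    for url in service_urls:
        code = probe(url)
        if code == 200:
            working.append(url)
        else:
            broken.append((url, code))
    return working, broken
```

Passing service endpoints with `?f=json` appended keeps the responses machine-readable. The `broken` list is then the set of candidates for the .msd replacement, though a 200 here only means the service answered, not that it renders correctly.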