Greetings Enterprise Community,
Our organization's ArcGIS Enterprise system has a recurring problem with creeping service response times. After about two weeks of use, the web services start experiencing intermittent but increasingly frequent freeze-ups, as does ArcGIS Server Manager. Web maps and services will be fine one minute, completely frozen the next, and then fine again. These 'frozen' intervals keep lengthening over the next 3-4 days until the system becomes unusable and a server reboot is our only option.
After the reboot, services immediately run quickly and consistently again... for another two weeks or so. This problem has persisted through several Enterprise version upgrades.
We are currently running Enterprise 10.9. We have migrated our geodatabase (SQL Server 2016, 64-bit) onto its own machine so that it does not compete with ArcGIS Server for system resources. I've also switched all viable services to shared instances (currently running about 40 ArcSOC processes). Portal for ArcGIS is also on its own machine. Our Server machine exceeds all recommended system requirements (CPU @ 3.4 GHz, 4 processors; 50 GB RAM; 400 GB disk space). We've had several ongoing Esri helpdesk tickets, but are still unable to ascertain a cause.
The Server Manager statistics graphs consistently show this creeping response time (the discrepancy between 'Max' and 'Avg' response times reflects the intermittent nature of the freezing). I've also noticed that the OpenJDK Platform Binary process starts at about 470 MB immediately after a reboot and balloons to about 3.5 GB within two weeks. Is this expected behavior? Has anyone else encountered this and been able to determine a cause?
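To put a number on that memory growth, here is a minimal sketch that fits a straight line through periodic memory readings and reports the slope in MB per day. Only the two endpoint figures come from the observations above (with 3.5 GB rounded to 3500 MB, which is an assumption); the function name `growth_rate_mb_per_day` is purely illustrative, and more frequent readings would give a more trustworthy estimate.

```python
def growth_rate_mb_per_day(samples):
    """Least-squares slope (MB/day) through (day, memory_mb) readings."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_day = sum(day for day, _ in samples) / n
    mean_mb = sum(mb for _, mb in samples) / n
    # Spread of the sampling times; zero means all readings share one timestamp.
    sxx = sum((day - mean_day) ** 2 for day, _ in samples)
    if sxx == 0:
        raise ValueError("readings must span more than one point in time")
    sxy = sum((day - mean_day) * (mb - mean_mb) for day, mb in samples)
    return sxy / sxx


# Readings as (days since reboot, process memory in MB).
# 3500 MB is an assumed rounding of the "about 3.5 GB" figure.
readings = [(0, 470), (14, 3500)]
rate = growth_rate_mb_per_day(readings)
print(f"Approximate growth: {rate:.0f} MB/day")
```

A roughly constant slope across several reboot cycles would suggest steady accumulation rather than a load-driven spike, which may be useful detail for a support ticket.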
We are beginning to consider a complete Enterprise reinstall from scratch, at great expense, but are obviously hoping to avoid this, if possible. Any insight would be greatly appreciated! Thank you.