We are facing service interruptions in ArcGIS Enterprise 10.6.1 (Windows Server 2016, with the geodatabase on an AWS RDS PostgreSQL 9.6 instance). The issue occurs randomly every day at different times: suddenly we cannot access Portal Home, ArcGIS Server Manager, or the services (on neither port 7443 nor 6443), and requests time out. If we restart the ArcGIS Server Windows service, everything starts working again.
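Since the failures happen at random times and the logs show nothing, one way to pin down exactly when they start is a small external probe that requests the Server and Portal REST endpoints on a fixed interval and records timestamped successes and timeouts. This is a minimal sketch, not an Esri tool: the host name gis.example.com is hypothetical, the paths assume the default "arcgis" web context, and the 60-second interval and 10-second timeout are arbitrary choices. Certificate verification is off by default on the assumption that the 6443/7443 endpoints use self-signed certificates; that is only appropriate for internal diagnostics.

```python
import datetime
import http.client
import ssl
import time
import urllib.request

# Hypothetical host name; paths assume the default "arcgis" web context.
ENDPOINTS = [
    "https://gis.example.com:6443/arcgis/rest/info?f=json",     # ArcGIS Server
    "https://gis.example.com:7443/arcgis/sharing/rest?f=json",  # Portal for ArcGIS
]


def make_context(verify=True):
    """Return an SSL context; verify=False disables certificate checks (self-signed certs)."""
    ctx = ssl.create_default_context()
    if not verify:
        ctx.check_hostname = False  # must be disabled before setting CERT_NONE
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def probe(url, timeout=10.0, context=None):
    """Request url once and return (ok, detail) without raising."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            elapsed = time.monotonic() - start
            return True, f"HTTP {resp.status} in {elapsed:.2f}s"
    except (OSError, http.client.HTTPException) as exc:
        # URLError, HTTPError, connection resets and timeouts are all OSError subclasses.
        elapsed = time.monotonic() - start
        return False, f"{type(exc).__name__}: {exc} after {elapsed:.2f}s"


def main(interval=60, verify=False):
    """Probe every endpoint forever, printing one timestamped line per request."""
    ctx = make_context(verify)
    while True:
        stamp = datetime.datetime.now().isoformat(timespec="seconds")
        for url in ENDPOINTS:
            ok, detail = probe(url, context=ctx)
            print(f"{stamp} {'OK  ' if ok else 'FAIL'} {url} {detail}", flush=True)
        time.sleep(interval)
```

Calling main() from a machine outside the ArcGIS Enterprise server prints one line per endpoint per interval; a run of FAIL lines with timeouts on both ports would give the exact outage window to compare against the ArcGIS Server logs and the RDS monitoring for the same period.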
We don't see any related errors in the Server logs or the Event Viewer, and hardware resources look normal at the time of failure. ArcGIS Enterprise is deployed on an r5.xlarge instance (4 vCPU and 32 GB RAM); RAM usage is around 40%, with occasional CPU spikes. We moved to a larger instance with 8 vCPU and 32 GB RAM, but the issue still happens.
Analyzing the ArcGIS Server statistics, we saw that some services are heavily used; in particular, one feature service receives more than 30,000 requests on some days.
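For scale, a back-of-the-envelope conversion of that daily count into an average request rate. The 8-hour working day is an assumption for illustration, not a figure from the statistics. Even on the busiest days the average works out to roughly one request per second, which suggests that short peaks (for example, several Collector devices syncing at once) are more relevant to look at than the daily total.

```python
def average_rate(requests, hours):
    """Average requests per second if `requests` were spread evenly over `hours`."""
    return requests / (hours * 3600)


daily = 30_000  # busiest days reported in the ArcGIS Server statistics
print(f"{average_rate(daily, 24):.2f} req/s averaged over 24 h")        # 0.35
print(f"{average_rate(daily, 8):.2f} req/s over an assumed 8 h workday")  # 1.04
```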
This service is used for editing (online and offline) by 17 users through Collector for ArcGIS, so we increased its minimum and maximum number of instances, and did the same for the System/SyncTools service, but we still have downtime. The problem appears to happen when feature services like this one are heavily used.
We would really appreciate any help on this matter.