We are going through a major enterprise meltdown right now. My boss wants me to create a policy to check the logs every day so we can keep on top of the behavior.
Which log files should I be checking regularly?
Two places to start would be https://yourportal.com/portal/portaladmin/logs/queryFilter and https://yourserver.com/server/manager/log.html
The same logs are also stored on disk in the log folders of the respective servers.
Not specifically log-related, but the one suggestion I can make is to actively monitor available drive space on all machines participating in your ArcGIS Enterprise Deployment. I've seen several production systems go down because the default ArcGIS Enterprise backup settings fill the system drive with backups.
The first thing I do now after deploying Enterprise is disable the default backups and configure the webgisdr utility instead.
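If it helps, here is a rough sketch of the kind of drive space check I mean, using only the Python standard library. The function names, the 15% threshold, and the path in the example are placeholders I made up for illustration, not anything that ships with ArcGIS Enterprise, so adjust them for your own machines.

```python
import shutil


def percent_free(total_bytes, free_bytes):
    """Return free space as a percentage of total capacity."""
    return 100.0 * free_bytes / total_bytes


def low_space_paths(paths, min_free_percent=15.0, usage_fn=shutil.disk_usage):
    """Return (path, percent_free) for every path below the threshold.

    usage_fn defaults to shutil.disk_usage and can be swapped out for testing.
    The 15% default is an arbitrary assumption, not a vendor recommendation.
    """
    low = []
    for path in paths:
        usage = usage_fn(path)
        pct = percent_free(usage.total, usage.free)
        if pct < min_free_percent:
            low.append((path, round(pct, 1)))
    return low


if __name__ == "__main__":
    # Placeholder path: list the drives that hold your Enterprise
    # install, logs, and backups here.
    for path, pct in low_space_paths(["/"]):
        print(f"LOW DISK SPACE: {path} has only {pct}% free")
```

You could run something like this from a scheduled task on each machine and have it send an email or write to your monitoring system instead of printing.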