I have created a standby instance of our production server, both to test upgrades and to provide HA if the primary instance goes down. Environment: Ubuntu Linux 22.04, ArcGIS Enterprise 11.3.
The instance was created from a snapshot image of the primary instance while all of its services were shut down. When I check the portal logs, I see the following two messages alternating about every five minutes:
<Msg time="2025-02-04T16:08:16,136" type="WARNING" code="217065" source="Portal" process="8843" thread="1" methodName="" machine="ARCGIS-TEST" user="" elapsed="0.0" requestID="">Error checking status of web server. java.util.concurrent.TimeoutException</Msg>
<Msg time="2025-02-04T16:14:16,657" type="WARNING" code="217064" source="Portal" process="8843" thread="1" methodName="" machine="ARCGIS-TEST" user="" elapsed="0.0" requestID="">The web server was found to be stopped. Re-starting it.</Msg>
Are these errors just side effects of another issue, and should I be looking elsewhere? Tomcat is serving properly: I can navigate to /server/manager and get the login form, but when I try to sign in I get the error "Unable to authenticate. Your portal might be using a self-signed certificate." When I navigate to /portal/home I get "Could not access any ArcGIS Enterprise portal machines. Please contact your system administrator."
Where else can I check to troubleshoot the issues with portal?
UPDATE: I changed the /etc/hosts entry for the standby server, and now the portal is working. All I did was add the machine's short hostname to the line that originally had only the machine's IP address followed by the FQDN, so the line now reads:
IP_Address FQDN hostname
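For anyone who wants to check the same thing programmatically, here is a minimal Python sketch that parses hosts-file text and reports which expected names have no entry. The function names parse_hosts and missing_names are hypothetical, and the IP address and example.com FQDN in the sample text are placeholders, not values from my server.

```python
from typing import Dict, Iterable, List

# Placeholder hosts-file contents; the IP and the FQDN are made up for illustration.
SAMPLE_BEFORE = "127.0.0.1 localhost\n10.0.0.5 arcgis-test.example.com\n"
SAMPLE_AFTER = "127.0.0.1 localhost\n10.0.0.5 arcgis-test.example.com arcgis-test\n"


def parse_hosts(text: str) -> Dict[str, str]:
    """Map every name found in hosts-file text to the IP address on its line."""
    mapping: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip comments and surrounding whitespace
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            # Keep the first entry seen for a name; later duplicates are ignored.
            mapping.setdefault(name.lower(), ip)
    return mapping


def missing_names(text: str, expected: Iterable[str]) -> List[str]:
    """Return the expected names that have no entry in the hosts-file text."""
    mapping = parse_hosts(text)
    return [name for name in expected if name.lower() not in mapping]
```

With the sample text above, missing_names(SAMPLE_BEFORE, ["arcgis-test", "arcgis-test.example.com"]) reports the short name as missing, while SAMPLE_AFTER reports nothing. On the server itself the text could be read from /etc/hosts, and the names taken from socket.gethostname() and socket.getfqdn(), though what those two calls return depends on how the machine is configured.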
Worth mentioning: I used the operationalHealth.py Python script to check the portal before making the change to the /etc/hosts file. The first time I ran it, it took a fairly long time and responded with:
2025-02-04 17:09:19,623 [CRITICAL] Unable to reach https://arcgis-test:7443/arcgis/sharing/rest, (<urlopen error [Errno 104] Connection reset by peer>). Check the Portal for ArcGIS hostname and try again.
After fixing the hostname entry, I ran operationalHealth.py again and everything checked out, except that my SSL certificate was not trusted.
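The two outcomes (a connection failure before the fix, an untrusted certificate after it) can be told apart with a small standard-library check. This is only a sketch and is not what operationalHealth.py does internally. The function name check_tls and its return strings are made up for illustration, and the default port 7443 is taken from the URL in the error message.

```python
import socket
import ssl


def check_tls(host: str, port: int = 7443, timeout: float = 10.0) -> str:
    """Return 'trusted', 'untrusted', or 'unreachable' for a TLS endpoint.

    'trusted' means the handshake succeeded and the certificate validated
    against the system trust store for this hostname.
    """
    context = ssl.create_default_context()  # verifies the chain and the hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "trusted"
    except ssl.SSLCertVerificationError:
        # Self-signed, expired, or wrong-hostname certificates end up here.
        return "untrusted"
    except OSError:
        # DNS failures, refused or reset connections, timeouts, and other
        # TLS handshake errors are all lumped together in this sketch.
        return "unreachable"
```

Running check_tls("arcgis-test") on the standby would be one way to confirm whether the remaining problem is only certificate trust. A self-signed certificate should come back as "untrusted" unless it has been added to the system trust store.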
I would still like help making sure that our standby instance is configured as correctly as possible, and not just "good enough." The community's feedback is greatly appreciated.