We have set up an HA Portal configuration and cannot get the standby Portal to pass the health check. We keep getting an error that the /sharing/rest endpoint cannot be reached. In /portaladmin/machines, the machine is listed and shows as standby.
If we check the status, we get:
The Portal log on the standby machine repeatedly shows that it cannot reach the endpoint:
<Msg time="2023-07-12T13:44:00,470" type="WARNING" code="209041" source="Portal Admin" process="9564" thread="1" methodName="" machine="WSVESRIPORTQ200.domain.LOC" user="" elapsed="" requestID="">Unable to reach the ArcGIS Portal Directory https://wsvesriportq200.domain.loc:7443/arcgis/sharing/rest. Restart the Portal for ArcGIS service and try again. If the problem persists, contact Esri technical support (U.S.) or your distributor (customers outside the U.S.).</Msg>
<Msg time="2023-07-12T13:44:00,470" type="WARNING" code="218037" source="Portal Admin" process="9564" thread="1" methodName="" machine="WSVESRIPORTQ200.domain.LOC" user="" elapsed="" requestID="">Health Check failed, the portal is not ready.</Msg>
<Msg time="2023-07-12T13:44:11,814" type="INFO" code="216005" source="Portal" process="9576" thread="1" methodName="" machine="WSVESRIPORTQ200.domain.LOC" user="" elapsed="" requestID="">HA: Node wsvesriportq200.domain.loc is configured to be standby.</Msg>
<Msg time="2023-07-12T13:44:11,829" type="INFO" code="216001" source="Portal" process="9576" thread="1" methodName="" machine="WSVESRIPORTQ200.domain.LOC" user="" elapsed="" requestID="">HA: Getting the primary connection information from the recovery.conf file</Msg>
<Msg time="2023-07-12T13:44:11,829" type="INFO" code="216050" source="Portal" process="9576" thread="1" methodName="" machine="WSVESRIPORTQ200.domain.LOC" user="" elapsed="" requestID="">HA: The primary server wsvesriportq100.domain.loc is identical to the master wsvesriportq100.domain.loc.</Msg>
<Msg time="2023-07-12T13:44:11,829" type="INFO" code="216006" source="Portal" process="9576" thread="1" methodName="" machine="WSVESRIPORTQ200.domain.LOC" user="" elapsed="" requestID="">HA: Monitoring the master node.</Msg>
<Msg time="2023-07-12T13:45:00,455" type="WARNING" code="209041" source="Portal Admin" process="9564" thread="1" methodName="" machine="WSVESRIPORTQ200.domain.LOC" user="" elapsed="" requestID="">Unable to reach the ArcGIS Portal Directory https://wsvesriportq200.domain.loc:7443/arcgis/sharing/rest. Restart the Portal for ArcGIS service and try again. If the problem persists, contact Esri technical support (U.S.) or your distributor (customers outside the U.S.).</Msg>
<Msg time="2023-07-12T13:45:00,455" type="WARNING" code="218037" source="Portal Admin" process="9564" thread="1" methodName="" machine="WSVESRIPORTQ200.domain.LOC" user="" elapsed="" requestID="">Health Check failed, the portal is not ready.</Msg>
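To confirm that the standby is stuck in a loop (warnings 209041 and 218037 repeating about once a minute while the HA monitor keeps reporting standby), it can help to tally the log entries. This is a minimal sketch that assumes one `<Msg ...>...</Msg>` element per line, as in the excerpt above; the function names `parse_portal_log` and `summarize` are hypothetical, not part of any Esri tooling.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def parse_portal_log(lines):
    """Parse Portal log lines of the form <Msg attr="...">text</Msg> into dicts.

    Assumption: each message sits on a single line, as in the posted excerpt.
    Lines that do not start with <Msg are skipped.
    """
    entries = []
    for line in lines:
        line = line.strip()
        if not line.startswith("<Msg"):
            continue
        elem = ET.fromstring(line)       # each line is a standalone XML element
        entry = dict(elem.attrib)        # time, type, code, source, machine, ...
        entry["text"] = elem.text or ""  # the human-readable message
        entries.append(entry)
    return entries


def summarize(entries):
    """Count messages per (type, code) pair to spot a repeating failure loop."""
    return Counter((e["type"], e["code"]) for e in entries)
```

For example, `summarize(parse_portal_log(open(path, encoding="utf-8")))` on a hypothetical log file `path` would show how often 209041 and 218037 recur relative to the 216xxx HA messages.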
We have restarted, rebooted, and even re-installed, and the problem persists. We have turned off the domain firewall to see if it might be a firewall issue. The machine is part of a Windows cluster, but we don't think that should cause a problem.
Any thoughts, @JonathanQuinn?
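One way to narrow down "cannot reach the endpoint" is to request the same Portal Directory URL from the standby machine itself. The host, port 7443, and `/arcgis/sharing/rest` path below come from the log; `?f=json` is the usual ArcGIS REST format parameter. The helpers `build_sharing_url` and `is_reachable` are hypothetical names for this sketch, not an Esri tool, and the check only reports whether an HTTPS GET succeeds, not what the health check itself tests.

```python
import http.client
import ssl
import urllib.error
import urllib.request


def build_sharing_url(host: str, port: int = 7443, context: str = "arcgis") -> str:
    """Build the Portal Directory URL that the health check reports as unreachable."""
    return f"https://{host}:{port}/{context}/sharing/rest?f=json"


def is_reachable(url: str, timeout: float = 5.0, verify_tls: bool = True) -> bool:
    """Return True if an HTTPS GET to `url` completes with a 2xx status."""
    ctx = ssl.create_default_context()
    if not verify_tls:
        # Diagnostic only: skipping verification separates a certificate problem
        # (self-signed or name mismatch) from a DNS, port, or firewall problem.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError) as exc:
        # URLError covers DNS failures and refused connections; SSL errors and
        # timeouts are OSError subclasses. Print the cause to aid diagnosis.
        print(f"{url} -> {exc!r}")
        return False
```

Running `is_reachable(build_sharing_url("wsvesriportq200.domain.loc"))` and then the same call with `verify_tls=False` on the standby would indicate whether the failure is about the certificate or about reaching the port at all.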
Thanks -Joe
Is Portal still happy if you shut down the primary machine?
@BillFox Absolutely not!
We had a similar partner problem with the 11.1 HA build we are working on now.
We tried to test the failover right after installing just the two Portal machines and two Web Adaptors.
From what we could tell from the logs, it wanted the remaining pieces of the HA enterprise deployment installed and configured first, which we are working on now.
To get going again, we had to restore those Windows servers from backup.
@BillFox Interesting, we were actually testing in that same scenario, with just the Portal setup. We are completing the rest of the installation now and will try turning the standby on to see what happens.