Portal for ArcGIS 10.5.1 HA Failing

Question asked by pradeepnegi on May 8, 2019
Latest reply on May 30, 2019 by pradeepnegi

We set up Portal for ArcGIS 10.5.1 in a high-availability (active-passive) environment. It ran fine until a failover happened, and it recovered by switching from the primary to the standby site. The site came back within the stipulated time of 5-6 minutes, but after recovery the performance of the portal degraded. The portal log shows the following error messages:

 

  • "HA: error in HA plugin. The semaphore timeout period has expired".
  • "Cannot read from directory path <> Please check that the location is valid and that the Portal service account has permissions to the location." (Please see the attached file.)

     For the second error message above, we confirmed the location and the permissions.
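
One way to double-check the location and permissions is a small script run while logged in as the Portal service account on each machine. This is only a sketch: `check_directory` is a hypothetical helper name, and the path you pass in is whatever directory the log message reports. It tests plain listing and file creation, which may not cover every kind of access Portal itself needs.

```python
import os
import tempfile


def check_directory(path):
    """Return a list of problems found when reading from and writing to `path`."""
    # A path that is missing or unreachable (e.g. a share that is offline)
    # fails this first check.
    if not os.path.isdir(path):
        return [f"not a directory or not reachable: {path}"]

    problems = []

    # Read access: listing the directory must succeed.
    try:
        os.listdir(path)
    except OSError as exc:
        problems.append(f"cannot list: {exc}")

    # Write access: create a temporary file, which is removed again on close.
    try:
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        problems.append(f"cannot write: {exc}")

    return problems
```

An empty list from `check_directory(...)` on both the primary and the standby machine suggests the account can at least list and write to the location. Timing the call can also hint at whether a slow share is involved, which may be relevant to the semaphore timeout message.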

 

Some weird behavior:

 1) The primary site shows the status "Health Check successful, the portal is ready".

    The secondary portal site shows "Error Site is not ready yet Code: 500".

 

2) The primary site shows the status "Error Site is not ready yet Code: 500".

In this scenario, a switchover should ideally happen and the secondary site should become the primary, but that is not happening. The db directory on the standby machine contains a "recovery.done" file (the latest one), and there is also a "recovery.done" file (an older one) on the primary site. (Please see the attachments.)

 

3) Sometimes both the primary and the secondary are active and show "Health Check successful, the portal is ready".
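
The three states above could be logged over time with a small polling script, so they can be lined up against the portal log. This is a sketch under assumptions: it assumes the health check is reachable at `/arcgis/portaladmin/healthCheck?f=json` on port 7443 and returns JSON with a `status` field equal to `success` when the portal is ready (please verify this against the documentation for 10.5.1). The function names and the state labels are hypothetical.

```python
import json
import ssl
import urllib.request


def health_ok(host, port=7443, timeout=10):
    """Return True if the health check on `host` reports success."""
    # Assumed endpoint and response shape; confirm for your Portal version.
    url = f"https://{host}:{port}/arcgis/portaladmin/healthCheck?f=json"

    # Assumption: internal machines use self-signed certificates, so
    # certificate verification is switched off for this diagnostic only.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE

    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # Covers connection errors, HTTP errors such as 500, and non-JSON replies.
        return False
    return isinstance(body, dict) and body.get("status") == "success"


def classify(primary_ok, standby_ok):
    """Map the two health results to a neutral label for logging."""
    if primary_ok and standby_ok:
        return "both_ready"      # behavior 3 above
    if primary_ok:
        return "primary_only"    # behavior 1 above
    if standby_ok:
        return "standby_only"
    return "neither_ready"
```

For example, `classify(health_ok("primary.example.com"), health_ok("standby.example.com"))` called once a minute and written to a file with a timestamp would show how often the site flips between the states (the host names here are placeholders).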

 

A message from the standby portal log (hope this makes sense): "HA: The primary server <Machine Name> is identical to the master <Machine Name>"

 

Performance has degraded badly; even the service listing in the ArcGIS REST Services Directory is very slow. Enough RAM is available, and CPU usage on the server is low.

 

We restarted Portal in the following order:

  • Stopping order: first stop the standby Portal, then stop the primary Portal.
  • Starting order: first start the primary Portal, then start the standby Portal.

 

Attached are screen captures of the following:

  • Portal Log
  • Primary and secondary portal folder
  • Primary and secondary db folder

 

I would highly appreciate any suggestions, as this production environment is down. @Jonathan Quinn

Thank You
