Hi Ahmed,
It doesn't appear that the standby Portal is completely promoting itself to primary. If it did, it would attempt to connect to its own database, not the previous primary's database. In the db folder of the standby, do you see a recovery.conf or a recovery.done file?
Hi Jonathan,
First of all, thank you for the reply.
Yes, I can see a recovery.conf.bak file and a recovery.done file in that folder. What does that mean?
Let me update you with my latest observations regarding Portal HA (10.5). Can you test a scenario where both portal machines are shut down, and then power on only the machine that had the "standby" role just before shutdown? In my case, I am getting the error:
"The portal has been initialized and configured but is not accessible. The internal portal database does not appear to be running or accepting connections. Restart the portal machine or machines and if the problem persists, contact Esri technical support (U.S.) or your distributor (customers outside the U.S.)."
The standby machine never becomes primary, or even starts, until I power on the primary server!
It sounds like there are two problems. If you have a primary and a standby, both running and healthy, the standby should have a recovery.conf file and possibly a recovery.done file, not just a recovery.done file. The presence of a recovery.done file tells the standby Portal that it has successfully promoted itself to primary, which is why you don't see the failover in "normal" circumstances (where you simply stop the primary while the standby is running).
Your second problem is a timing issue, and a current limitation that we're looking to improve upon in later releases. If you stop the standby, the primary removes it from the HA configuration so that it knows the standby is down. Once the standby comes up again, the primary adds it back. If you stop the standby, then stop the primary, and finally start the standby, it won't promote itself to primary because it won't have the latest snapshot of the data from the primary database (since it was down). If it were to promote itself, you would lose any data created on the primary while the standby was down, since that data was never replicated from the primary to the standby. In your case, where you stop both machines, it sounds like a similar scenario. Depending on the timing of when you stop both, the standby may have stopped first, and then the primary. When you start the standby, it won't promote itself, as that would potentially cause data loss.
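To make the timing argument concrete, here is a toy model in Python. It is only an illustration of the reasoning above, not Portal's actual logic, and the function name and inputs are hypothetical. It tracks whether the standby still holds the primary's latest data for a given stop order.

```python
from typing import Sequence


def standby_can_promote(stop_order: Sequence[str]) -> bool:
    """Toy model: may the standby safely promote itself after these stops?

    stop_order lists the machines in the order they were stopped,
    e.g. ["standby", "primary"]. Hypothetical helper, not Portal code.
    """
    primary_up = True
    in_sync = True  # standby holds the primary's latest data

    for machine in stop_order:
        if machine == "standby":
            if primary_up:
                # The primary keeps running without the standby, so data
                # written from now on is never replicated to it.
                in_sync = False
        elif machine == "primary":
            primary_up = False
        else:
            raise ValueError(f"unknown machine: {machine!r}")

    # Promotion only makes sense if the primary is gone and no data is missing.
    return (not primary_up) and in_sync


print(standby_can_promote(["primary"]))             # normal failover case
print(standby_can_promote(["standby", "primary"]))  # standby stopped first
```

In this model the first call reports that promotion is safe, while the second does not, because the standby went down while the primary was still running.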
**Edit: it's fine if one or both machines have a recovery.done file within the db directory. That's an indication that the machine had, at some point, promoted itself to primary. The presence of the file won't affect future failovers or failbacks.
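If it helps when comparing the two machines, a small Python sketch like the one below reports which of the recovery files mentioned in this thread exist in a given db folder. The function name is made up for illustration, and the folder path is something you would need to supply for your own install.

```python
from pathlib import Path
from typing import Dict

# File names taken from this thread.
RECOVERY_FILES = ("recovery.conf", "recovery.conf.bak", "recovery.done")


def recovery_files_present(db_dir: str) -> Dict[str, bool]:
    """Return, for each recovery file name, whether it exists in db_dir."""
    db = Path(db_dir)
    return {name: (db / name).is_file() for name in RECOVERY_FILES}


if __name__ == "__main__":
    import sys

    # Usage: python check_recovery.py <path to the Portal db folder>
    if len(sys.argv) == 2:
        for name, present in recovery_files_present(sys.argv[1]).items():
            print(f"{name}: {'present' if present else 'missing'}")
```

Running it on both machines and comparing the output shows which one has a recovery.conf and which has only a recovery.done.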
Hi John,
I have an ArcGIS Enterprise Base HA deployment on AWS. According to our policy, both EC2 instances (Primary & Standby) shut down every night. I am therefore facing the same issue of a simultaneous shutdown of the primary and standby, which messes up the HA configuration. Fortunately, I can control when each EC2 instance shuts down and starts up again (it doesn't have to be simultaneous). What are your recommendations regarding the shutdown and startup order for the Primary & Standby machines?
Thanks,
Girish
I would stop the standby, then the primary. When you need to start them, start the primary first, then the standby.
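As a rough sketch of how that order could be scripted, here is a Python outline. The `stop`, `start`, and `wait_until_ready` callables are hypothetical placeholders for whatever your platform provides to stop or start a machine and wait for it. Waiting for the primary before starting the standby is my assumption, not a documented requirement.

```python
from typing import Callable, List, Optional


def stop_pair(stop: Callable[[str], None], primary: str, standby: str) -> List[str]:
    """Stop the standby first, then the primary. Returns the order used."""
    order = [standby, primary]
    for machine in order:
        stop(machine)  # assumed to block until the machine is fully stopped
    return order


def start_pair(
    start: Callable[[str], None],
    primary: str,
    standby: str,
    wait_until_ready: Optional[Callable[[str], None]] = None,
) -> List[str]:
    """Start the primary first, then the standby. Returns the order used."""
    start(primary)
    if wait_until_ready is not None:
        # Assumption: let the primary come up fully before the standby
        # starts, so the standby has a running primary to rejoin.
        wait_until_ready(primary)
    start(standby)
    return [primary, standby]


if __name__ == "__main__":
    # Dry run with print statements standing in for real stop/start calls.
    stop_pair(lambda m: print("stopping", m), "primary-host", "standby-host")
    start_pair(lambda m: print("starting", m), "primary-host", "standby-host")
```

Passing the stop and start actions in as functions keeps the ordering logic separate from the cloud-specific calls, so the same order can be reused from a scheduler or a manual script.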
Thanks!!! Jonathan Quinn, I will give it a try.
Dear Jonathan,
I would like to ask a question that would help me understand Portal HA better.
We followed the documentation for 10.5.1 to configure Portal HA: Configure a highly available portal—Portal for ArcGIS (Windows) Installation Guide (10.5) | ArcGIS E...
One machine is shown as Primary and the other as Standby. In case the primary portal goes down, it takes about 3 minutes for the standby to become active, and until then the portal is not accessible.
Why does it say Active/Standby and not Active/Active? Is it because of a software license restriction, or is this how it is designed? When we checked the Esri documentation, it says Portal supports Active/Active.
Can you please provide more details on this? Looking forward to your reply.
Regards,
Sandeep B
Portal HA is active/active in that both web servers take requests. However, it's primary/standby at the database tier. Portal's internal database doesn't support multi-master, so it needs to be primary/standby. The standby will automatically be promoted to primary when it detects that the primary is down. From 10.3.1 to 10.6, that takes minutes, unfortunately. At 10.6.1, we've improved the failover time to under a minute, typically under 30 seconds.
Dear Jonathan,
Thank you for your reply. Can you please provide some more information on the issue below?
The federation is valid, but I am not able to start the hosted service in ArcGIS Server. Because of this, I am not able to publish any hosted layers. Can you please advise on this?
Regards,
Sandeep B