I have a highly available (HA) Enterprise 11.1 deployment that was upgraded from 10.9.
I am trying to set up a failover scenario with the relational data store that manages hosted feature services. I have followed this post: https://support.esri.com/en-us/knowledge-base/error-failed-to-change-role-for-data-store-machine-sta...
The describedatastore command shows that both the primary and standby machines have a store mode of READWRITE. This differs from my 10.9 environment, where the primary is READWRITE and the standby is READONLY.
I have removed the standby, reinstalled the data store, and reconfigured it, but it still shows as READWRITE. Is this expected? @ChristopherPawlyszyn @JonathanQuinn, any suggestions? I am wondering whether the upgrade to 11.1 requires additional reconfiguration of the relational data store.
I then tested the failover scenario by stopping the arcgis service on the primary data store machine. As expected, the standby took over as primary. The issue is that when the arcgis service is restarted on the former primary (now demoted to standby), replication no longer works. Validating the data store shows:
"datastore.isActiveHA": "false",
If I then try to manually designate the standby as primary via the server administrative REST endpoint, I get this error:
Server machine 'https://VM-DATASTR-PRI.xxxxxxxxxxxxxx.INTERNAL.CLOUDAPP.NET:2443/arcgis/datastoreadmin/machines/VM-D...' returned an error. 'Machine 'https://VM-DATASTR-SBY.xxxxxxxxxxxxxx.INTERNAL.CLOUDAPP.NET:2443/arcgis/datastoreadmin/' returned an error. 'Failed to change role for data store machine 'VM-DATASTR-SBY.xxxxxxxxxxxxxx.INTERNAL.CLOUDAPP.NET'.
Caused by: Validation checks on data store machine 'VM-DATASTR-SBY.xxxxxxxxxxxxxx.INTERNAL.CLOUDAPP.NET' failed.''
The only way to sort this is to reinstall / re-configure the data store.
Any suggestion would be very much appreciated.
Many thanks
Thomas
Hi Thomas,
Seems like you are seeing a few issues here. First, it seems like you are running into the issue described in BUG-000156579. We have seen occurrences where this issue also presents itself in 11.1 environments. So that would explain why you are seeing both values as 'READWRITE'. Looks like the fix was implemented in 11.2. Once this gets out of sync it is difficult to get it back to normal due to the above BUG.
To confirm that the STANDBY data store is actually in 'READONLY' mode, navigate to the ArcGIS Server admin endpoint and validate the data store there. In the output you should see a value listed as "db.isInRecovery". For the PRIMARY data store this should be set to "false". For the STANDBY data store this should be listed as "true". I would verify this just to confirm that the data store is in fact acting as STANDBY.
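If it helps, here is a rough sketch of that check in Python. It assumes you have copied the key/value pairs from each machine's validate output into a plain dict with string values such as "true" and "false". The function names and the flat dict shape are my own assumptions for illustration, not part of any Esri API.

```python
def role_from_validation(props: dict) -> str:
    """Classify one data store machine from its validate output.

    Assumption: `props` is a flat mapping of the properties shown by the
    validate operation, e.g. {"db.isInRecovery": "true"}.
    """
    raw = props.get("db.isInRecovery")
    if raw is None:
        return "UNKNOWN"
    # A database that is in recovery is replaying changes, i.e. acting as standby.
    in_recovery = str(raw).strip().lower() == "true"
    return "STANDBY" if in_recovery else "PRIMARY"


def check_pair(primary_props: dict, standby_props: dict) -> str:
    """Compare the expected primary and standby machines and summarize the result."""
    roles = (role_from_validation(primary_props), role_from_validation(standby_props))
    if roles == ("PRIMARY", "STANDBY"):
        return "OK: primary is read/write and standby is in recovery"
    if roles == ("PRIMARY", "PRIMARY"):
        return "PROBLEM: both machines report db.isInRecovery=false"
    return f"CHECK: unexpected roles {roles}"


if __name__ == "__main__":
    # Hypothetical values typed in by hand from the validate output.
    print(check_pair({"db.isInRecovery": "false"}, {"db.isInRecovery": "true"}))
    print(check_pair({"db.isInRecovery": "false"}, {"db.isInRecovery": "false"}))
```

If the second case is what you get, the standby is not actually replicating from the primary, regardless of what the store mode says.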
In regard to the issues you're seeing with failover, that may be a bit different. If not already created, I would open a case with Esri Technical Support as it will take a bit more digging to determine why failover is not working correctly here. If possible, I would try setting the ArcGIS Server logs to DEBUG and viewing the logs for ArcGIS Server and ArcGIS Data Store when reproducing the issue. The logs may have more information that could be helpful for troubleshooting. Hope that helps!
Brian
Hi @Brian_M
Thanks for looking into the case. The Esri support team told me that this is indeed the bug you mentioned and that it is addressed in 11.2. I haven't upgraded to 11.2 yet, as we have multiple deployments in production and have to align them all before we upgrade.
Having looked at the validate status, it seems both the primary and standby machines have "db.isInRecovery" set to false (see attached).
Thomas
@Brian_M when will Esri provide an 11.1 (long-term release) patch for that bug?