Hello, I have a high-availability system where Portal, Data Store, and ArcGIS Server each run on two machines, for six nodes in total, plus a file server EC2 instance that acts as a shared directory for all of them. One of the ArcGIS Server nodes went down during testing, and when I try to spin it back up and join the existing site, I keep getting this error:
Error: Failed to join the site. Machine '<redacted>' cannot access 'AGSDataStore_ds_4kwjycbq' data store(s) registered with the site. Please ensure that the ArcGIS Server account has read and write access to the data store(s).
Use -h for help.
From the node I am trying to join to my site, I tested access to the primary IP of the first data store machine, and it works fine. However, I cannot reach the standby data store node, so I am wondering if that is why the join-site utility is failing. What confuses me is that my currently joined node behaves the same way: it can reach the primary data store endpoint but not the standby one.
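For anyone who wants to repeat this reachability test from a script, here is a minimal sketch of a plain TCP check in Python. The function names are my own, and the host names and port numbers in the usage note are placeholders rather than values from my environment; substitute the ports your data store machines actually listen on. A successful TCP connection only shows that the port is open, not that the ArcGIS Server account has read and write access.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_endpoints(endpoints: dict[str, tuple[str, int]]) -> dict[str, bool]:
    """Check several named (host, port) endpoints and report each result."""
    return {name: can_reach(host, port) for name, (host, port) in endpoints.items()}
```

Calling `check_endpoints({"primary": ("primary-ds.example.internal", 9876), "standby": ("standby-ds.example.internal", 9876)})` on both the joined node and the new node would show whether the two machines really see the same thing. Both host names and the port here are hypothetical.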
Is there something I am supposed to do when adding an additional ArcGIS server machine to my federated ArcGIS portal that allows this join-site utility to work?
EDIT: I wanted to clarify that I was able to validate the relational datastore from within portal web UI, as well as validate each individual datastore machine from the portal administration endpoints.
Hi @JVig, when you initially joined the additional ArcGIS Server machine, was the standby ArcGIS Data Store configured before or after that? If it was after, it sounds like ArcGIS Server may be validating connections to both the primary and standby data stores. To test further, could you temporarily remove the standby data store and try joining the additional ArcGIS Server machine?
Hmm, let's see, the entire site was put together first. So, both the primary and standby data stores were registered, and the site was then federated with the portal instance(s). Then testing happened: I broke one of the connections to one of the two ArcGIS Server instances, and a failed health check caused that node to drop out of the target pool. The autoscaling group fired off a new EC2 instance, which attempted to rejoin the site and hit the error above.
I can attempt to deregister the standby node and give it another try though!
Hey @JakeSkinner, thanks for the advice. That worked out just fine: once I deregistered the standby data store, I was able to add the secondary node without issue. I was then able to reregister the standby node, and everything was great.
To add a layer of complexity, this is supposed to be self-healing infrastructure-as-code, so this sort of dancing between nodes will require some more advanced scripting, but thank you for the solution.
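A rough sketch of how that sequence might be wrapped for automation is below. It only captures the ordering that worked here (deregister the standby, join the site, reregister the standby). The three callables are hypothetical hooks that you would implement yourself against whatever tooling your environment uses; none of them is a real ArcGIS API call, and the function name is my own.

```python
from typing import Callable


def rejoin_with_standby_workaround(
    remove_standby: Callable[[], None],
    join_site: Callable[[], None],
    restore_standby: Callable[[], None],
) -> None:
    """Run the deregister, join, reregister sequence in order.

    All three arguments are hypothetical hooks supplied by the caller.
    """
    # Step 1: take the standby data store out of the picture.
    remove_standby()
    try:
        # Step 2: join the replacement ArcGIS Server node to the site.
        join_site()
    finally:
        # Step 3: always try to bring the standby back, even if the join
        # failed, so the site is not left without a standby data store.
        restore_standby()
```

The `try`/`finally` is a design choice, not something the thread confirms is required: it means a failed join still propagates its exception to the caller (so the autoscaling hook can retry or alert), while the standby is reregistered either way.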