Hello,
I'm currently testing the webgisdr utility and replicating a primary portal environment to a standby portal environment for disaster recovery. Each environment uses the same URL and references separate virtual IP addresses. My portal setup has a federated ArcGIS Server instance and a separate ArcGIS Server as the host server. I'm coming across two issues in the standby portal instance:
There's a bug that's resolved at 10.6 regarding restoring Data Store when the backup location is set to a UNC path:
BUG-000109900 ArcGIS Data Store backups fail to restore if the backup location is set to an NFS or UNC share
Can you set the backup location to a local drive for all data store types and try again?
After reconfiguring the data store in the primary environment and standby environment, the data store import was successful! Thank you for the bug information, Jonathan. Once I get the environments configured with the mapped drives, my setup will hopefully be good to go!
After mapping the drive and reconfiguring the data stores, the web dr tool is working. Thank you for all your help, Jonathan!
Jonathan Quinn, it looks like we are having the same error message:
If this is something fixed at 10.6, is there a workaround I can use at 10.5.1? Maybe a datastore command-line tool?
==========================================
Starting the webgisdr utility.
==========================================
The configuration and base backup time in the current Web GIS
-------------------------------------------------------------
Portal: https://portal.maps.website.com/portal
|
|-- Hosting Server: https://portal.maps.website.com/server
| |
| |-- Relational Data Store: https://mainserver01.machinename.website.com:2443/arcgis
Unzipping the backup file:
\\domain\shared\ArcGIS\ContentStores\Production\WebGISDR\September-14-2018-12-32-48-PM-EDT-FULL.webgissite
The backup file has been unzipped in 00hr:03min:59sec.
The backup file was created at September 14, 2018 12:32:48 PM EDT.
The configuration and base backup time in the incoming Web GIS
--------------------------------------------------------------
Portal: https://portal.maps.website.com/portal at 9/14/18 12:25 PM
|
|-- Hosting Server: https://portal.maps.website.com/server at 9/14/18 12:25 PM
| |
| |-- Relational Data Store: https://altserver01.machinename.website.com:2443/arcgis
Starting the restore process with the webgisdr utility.
Starting the restore of ArcGIS Data Store:
Admin Url: https://APRDVGISPORT01.machinename.website.com:2443/arcgis/datastoreadmin.
Failed to restore the ArcGIS Data Store.
Admin Url: https://altserver01.machinename.website.com:2443/arcgis/datastoreadmin.
{"jobId":"734bbac5-78de-489e-a8ef-7ddf26b423c2","errorMessage":"Failed to import data to your replicated site.. Extended error message: Failed to import data to your replicated site.. Extended error message: D:\\arcgisdatastore\\data\\backupedContents20180914\\backup_Content","description":"Deploy data store snapshot September-14-2018-12-25-29-PM-EDT-35-FULL from \\\\domain\\shared\\ArcGIS\\ContentStores\\Production\\WebGISDR\\Scratch\\WebGISSite1536954664700\\dataStore\\f4941eaf-6d7f-4abe-8b6a-72b4b482f4bc","lastModified":"2018-09-14 16:05","status":"failed"}
Starting the restore of ArcGIS Server:
Admin Url: https://portal.maps.website.com/server/admin.
The following ArcGIS Server has been restored successfully:
Admin Url: https://portal.maps.website.com/server/admin.
The restore of ArcGIS Server has completed in 00hr:09min:58sec.
Unregistering the standby portal machine ...
The standby portal machine APRDVGISPORT02.machinename.website.com has been unregistered successfully in 00hr:03min:13sec.
Starting the restore of Portal for ArcGIS:
Admin Url: https://portal.maps.website.com/portal.
The following Portal for ArcGIS has been restored successfully:
Admin Url: https://portal.maps.website.com/portal.
The restore of Portal for ArcGIS has completed in 00hr:38min:15sec.
The Portal for ArcGIS has been restarted successfully in 00hr:02min:19sec.
Joining a portal machine ...
Failed to join Site. Unable to configure local machine in standby mode for high availability. com.esri.arcgis.portal.admin.core.PortalException: The configuration store is not connected. Please invoke the connect() method and try again.
The restore of Web GIS components has completed in 01hr:07min:11sec.
Stopping the webgisdr utility.
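For anyone triaging output like the log above: the data store failure is reported as a JSON object embedded in the console text. Below is a minimal Python sketch for pulling those objects out of a saved log. It assumes any console line wrapping inside the JSON has already been rejoined. `extract_job_failures` is a hypothetical helper name, not part of webgisdr, and the sample text is abbreviated from the log above.

```python
import json


def extract_job_failures(log_text):
    """Return the JSON objects found in log_text whose "status" is "failed"."""
    decoder = json.JSONDecoder()
    failures = []
    pos = log_text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at pos and
            # returns the value plus the index where it ended.
            obj, end = decoder.raw_decode(log_text, pos)
        except json.JSONDecodeError:
            # Not valid JSON at this brace; move on to the next one.
            pos = log_text.find("{", pos + 1)
            continue
        if isinstance(obj, dict) and obj.get("status") == "failed":
            failures.append(obj)
        pos = log_text.find("{", end)
    return failures


# Abbreviated sample based on the log in this thread (assumption: rejoined lines).
sample = (
    "Failed to restore the ArcGIS Data Store.\n"
    '{"jobId":"734bbac5-78de-489e-a8ef-7ddf26b423c2",'
    '"errorMessage":"Failed to import data to your replicated site..",'
    '"status":"failed"}Starting the restore of ArcGIS Server:\n'
)

for job in extract_job_failures(sample):
    print(job["jobId"], job["errorMessage"])
```

The `description` field of the full object is the useful part here, since it shows the snapshot being deployed from a UNC path.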
So the backup location for the ArcGIS Data Store is set to a UNC path or the K:\ drive, which is just a mounted UNC drive? I would use the configurebackuplocation tool to update the path to be somewhere on the local machine.
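As a quick way to check a candidate backup location before reconfiguring, here is a small Python sketch that tells a UNC share apart from a drive-letter path. `is_unc_path` is a hypothetical helper, not an Esri tool. Note the limitation: it only looks at the path string, so it cannot tell that a drive letter such as K:\ is really a mapped network share; that has to be checked on the machine itself (for example with `net use` on Windows).

```python
from pathlib import PureWindowsPath


def is_unc_path(location):
    """Return True if location is written as a UNC share (server plus share name)."""
    # For a UNC path, PureWindowsPath reports the server and share as the "drive",
    # which therefore starts with two backslashes. A local path reports e.g. "D:".
    drive = PureWindowsPath(location).drive
    return drive.startswith("\\\\")


# Example paths taken from this thread; the local folder names are illustrative.
print(is_unc_path(r"\\domain\shared\ArcGIS\ContentStores\Production\WebGISDR"))
print(is_unc_path(r"D:\arcgisdatastore\backup"))
```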
Regarding the "Failed to join Site. Unable to configure local machine in standby mode for high availability. com.esri.arcgis.portal.admin.core.PortalException: The configuration store is not connected. Please invoke the connect() method and try again." error: do you have a load balancer health check pointing directly at 7443? How often does it check? The issue is that the health check calls code that can cause joinSite or createSite to fail. It's a timing problem and likely won't happen each time you restore.
Because our datastores are high availability, will that synchronization get disrupted if the datastore backup directories are migrated to local?
As for the `Failed to Join Site`, you are correct. I ran it again, and the problem did not occur.
No. Postgres, which the Data Store is built on, manages the replication of data from the primary to the standby. You only need to run the configurebackuplocation tool on the primary, but I would run the describedatastore.bat file on each machine to make sure that the backup location is updated on both.
That did the trick. Thank you.
Jonathan Quinn do you have a bug listing or more details on this: "do you have a load balancers health check pointing directly at 7443? How often does it check? The issue is that the health check calls on code that causes joinSite or createSite to fail. It's a timing problem and likely won't happen each time you restore."
What are the symptoms of when a Portal joinSite fails because of this? Will it provide the successful join message (within about 5 minutes) but then spin and never return?
Thanks for any details you can provide.
There will be an explicit error, "config-store is not connected, invoke the connect() method...". In your case, you may be running into BUG-000121969: "If both portal machines restart at the same time, the web server can become deadlocked." This can occur during a restore, when joining. It's fixed at 10.8.