Standby Portal Webgisdr issues

02-09-2018 07:11 AM
SamJehle1
New Contributor III

Hello,

I'm currently testing the webgisdr utility and replicating a primary portal environment to a standby portal environment for disaster recovery. Each environment uses the same URL and references separate virtual IP addresses. My portal setup has a federated ArcGIS Server instance and a separate ArcGIS Server as the host server. I'm coming across two issues in the standby portal instance:

  1. The tool fails on the ArcGIS Data Store saying it is unable to find the data store in the standby environment since the names are different. Does the standby data store environment need to have the same database name?
  2. Besides the data store, everything seems to restore just fine. But after further investigation I noticed that both ArcGIS Server instances have had their directories changed to the same locations as in the primary environment. The site configuration file locations for the standby sites are still the same. Is this normal behavior? I'm worried this will cause issues down the road when I import the full site to the standby environment. I don't want to overwrite services that are in the primary environment.
21 Replies
JonathanQuinn
Esri Notable Contributor

There's a bug that's resolved at 10.6 regarding restoring Data Store when the backup location is set to a UNC path:

BUG-000109900 ArcGIS Data Store backups fail to restore if the backup location is set to an NFS or UNC share

Can you set the backup location to a local drive for all data store types and try again?
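
As a quick way to spot which configured backup paths are affected, here is a minimal Python sketch that flags UNC-style locations. The function name and the sample paths are illustrative assumptions, not part of any Esri tool. Note that a mapped drive letter (for example a share mounted as a drive) cannot be detected from the path string alone, so this only catches paths written in `\\server\share` form.

```python
from pathlib import PureWindowsPath


def is_unc_path(path: str) -> bool:
    """Return True when the path's drive component is a UNC server/share root."""
    # For a UNC path, PureWindowsPath reports the drive as \\server\share.
    return PureWindowsPath(path).drive.startswith("\\\\")


# Hypothetical backup locations, used only for illustration.
print(is_unc_path(r"\\domain\shared\ArcGIS\backups"))
print(is_unc_path(r"D:\arcgisdatastore\backup"))
```

`PureWindowsPath` is used so the check behaves the same on any operating system, which helps if the paths are being reviewed from a non-Windows admin machine.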

SamJehle1
New Contributor III

After reconfiguring the data store in the primary and standby environments, the data store import was successful! Thank you for the bug information, Jonathan. Once I get the environments configured with the mapped drives, my setup will hopefully be good to go!

SamJehle1
New Contributor III

After mapping the drive and reconfiguring the data stores, the webgisdr tool is working. Thank you for all your help, Jonathan!

MichaelSchoelen
Occasional Contributor III

Jonathan Quinn‌, it looks like we are having the same error message:

  • Running ArcGIS Enterprise 10.5.1
  • High Availability
  • Drives are perpetually mounted (it is an NTFS file share, but it is accessed via the "H:\" drive)

If this is something fixed at 10.6, is there a workaround I can use at 10.5.1? Maybe a datastore command-line tool?

==========================================
Starting the webgisdr utility.
==========================================

The configuration and base backup time in the current Web GIS
-------------------------------------------------------------
Portal: https://portal.maps.website.com/portal
|
|-- Hosting Server: https://portal.maps.website.com/server
| |
| |-- Relational Data Store: https://mainserver01.machinename.website.com:2443/arcgis

Unzipping the backup file:
\\domain\shared\ArcGIS\ContentStores\Production\WebGISDR\September-14-2018-12-32-48-PM-EDT-FULL.webgissite

The backup file has been unzipped in 00hr:03min:59sec.

The backup file was created at September 14, 2018 12:32:48 PM EDT.

The configuration and base backup time in the incoming Web GIS
--------------------------------------------------------------
Portal: https://portal.maps.website.com/portal at 9/14/18 12:25 PM
|
|-- Hosting Server: https://portal.maps.website.com/server at 9/14/18 12:25 PM
| |
| |-- Relational Data Store: https://altserver01.machinename.website.com:2443/arcgis


Starting the restore process with the webgisdr utility.

Starting the restore of ArcGIS Data Store:
Admin Url: https://APRDVGISPORT01.machinename.website.com:2443/arcgis/datastoreadmin.

Failed to restore the ArcGIS Data Store.
Admin Url: https://altserver01.machinename.website.com:2443/arcgis/datastoreadmin.
{"jobId":"734bbac5-78de-489e-a8ef-7ddf26b423c2","errorMessage":"Failed to import data to your replicated site.. Extended error message: Failed to import data to your replicated site.. Extended error message: D:\\arcgisdatastore\\data\\backupedContents20180914\\backup_Content","description":"Deploy data store snapshot September-14-2018-12-25-29-PM-EDT-35-FULL from \\\\domain\\shared\\ArcGIS\\ContentStores\\Production\\WebGISDR\\Scratch\\WebGISSite1536954664700\\dataStore\\f4941eaf-6d7f-4abe-8b6a-72b4b482f4bc","lastModified":"2018-09-14 16:05","status":"failed"}

Starting the restore of ArcGIS Server:
Admin Url: https://portal.maps.website.com/server/admin.

The following ArcGIS Server has been restored successfully:
Admin Url: https://portal.maps.website.com/server/admin.

The restore of ArcGIS Server has completed in 00hr:09min:58sec.

Unregistering the standby portal machine ...
The standby portal machine APRDVGISPORT02.machinename.website.com has been unregistered successfully in 00hr:03min:13sec.

Starting the restore of Portal for ArcGIS:
Admin Url: https://portal.maps.website.com/portal.

The following Portal for ArcGIS has been restored successfully:
Admin Url: https://portal.maps.website.com/portal.

The restore of Portal for ArcGIS has completed in 00hr:38min:15sec.

The Portal for ArcGIS has been restarted successfully in 00hr:02min:19sec.

Joining a portal machine ...
Failed to join Site. Unable to configure local machine in standby mode for high availability. com.esri.arcgis.portal.admin.core.PortalException: The configuration store is not connected. Please invoke the connect() method and try again.
The restore of Web GIS components has completed in 01hr:07min:11sec.

Stopping the webgisdr utility.
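
Note that the run above still ends with a "has completed" message even though two steps failed, so the failures are easy to miss in a long console log. A minimal Python sketch for pulling them out of captured output follows. It assumes failure lines begin with "Failed to", as they do in this log; the function name is hypothetical and the sample text is a shortened excerpt of the output above.

```python
def find_failures(log_text: str) -> list[str]:
    """Return the lines of captured webgisdr output that begin with 'Failed to'."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if line.strip().startswith("Failed to")
    ]


# Shortened excerpt of the console output above, for illustration only.
sample = """\
Starting the restore of ArcGIS Data Store:
Failed to restore the ArcGIS Data Store.
The following ArcGIS Server has been restored successfully:
Joining a portal machine ...
Failed to join Site. Unable to configure local machine in standby mode for high availability.
The restore of Web GIS components has completed in 01hr:07min:11sec.
"""

for failure in find_failures(sample):
    print(failure)
```

Matching only at the start of a line keeps the "Failed to import data" text embedded in the JSON error body from being reported as a separate failure.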

JonathanQuinn
Esri Notable Contributor

So the backup location for the ArcGIS Data Store is set to a UNC path or the K:\ drive, which is just a mounted UNC drive? I would use the configurebackuplocation tool to update the path to be somewhere on the local machine.

Regarding the "Failed to join Site. Unable to configure local machine in standby mode for high availability. com.esri.arcgis.portal.admin.core.PortalException: The configuration store is not connected. Please invoke the connect() method and try again." error: do you have a load balancer's health check pointing directly at 7443? How often does it check? The issue is that the health check calls code that causes joinSite or createSite to fail. It's a timing problem and likely won't happen each time you restore.
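
Because this is a timing problem, simply running the restore again may be enough. Here is a minimal Python sketch of that idea, not an Esri-provided feature: `run_restore` is a hypothetical callable that you would write to launch the restore and return its console output, and the marker string is taken from the error message above. A True result only means the connect() error did not appear; other failures still need to be checked separately.

```python
import time
from typing import Callable

# Text from the joinSite error quoted above.
TRANSIENT_MARKER = "The configuration store is not connected"


def run_with_retry(
    run_restore: Callable[[], str],
    attempts: int = 3,
    wait_seconds: float = 0.0,
) -> tuple[bool, int]:
    """Re-run the restore while its output contains the transient marker.

    Returns (marker_absent, attempts_used).
    """
    for attempt in range(1, attempts + 1):
        output = run_restore()
        if TRANSIENT_MARKER not in output:
            return True, attempt
        if attempt < attempts:
            time.sleep(wait_seconds)  # optional pause before the next run
    return False, attempts
```

Given how long a full restore takes, a small `attempts` value is probably sensible, and pausing or slowing the health check during the restore may avoid the retries altogether.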

MichaelSchoelen
Occasional Contributor III

Because our datastores are high availability, will that synchronization get disrupted if the datastore backup directories are migrated to local? 

As for the `Failed to Join Site`, you are correct. I ran it again, and the problem did not occur.

JonathanQuinn
Esri Notable Contributor

No. Postgres (what the Data Store is built on) manages the replication of data from the primary to the standby. You only need to run the configurebackuplocation tool on the primary, but I would run the describedatastore.bat file on each machine to make sure that the backup location is updated on both.

MichaelSchoelen
Occasional Contributor III

That did the trick. Thank you.

JTessier
Occasional Contributor II

Jonathan Quinn‌ do you have a bug listing or more details on this: "do you have a load balancers health check pointing directly at 7443? How often does it check? The issue is that the health check calls on code that causes joinSite or createSite to fail. It's a timing problem and likely won't happen each time you restore."

What are the symptoms when a Portal joinSite fails because of this? Will it report a successful join (within about 5 minutes) but then spin and never return?

Thanks for any details you can provide.

JonathanQuinn
Esri Notable Contributor

There will be an explicit error, "config-store is not connected, invoke the connect() method...". In your case, you may be running into BUG-000121969: If both portal machines restart at the same time, the web server can become deadlocked. This can occur during a restore, when joining. It's fixed at 10.8.