We have two data centers, each with highly available, single-machine deployments.
Currently, the main site is active. You can get to it at https://maps.portal.com/portal. If we flipped to the alternate data center, it would still be served at the same maps.portal.com/portal URL.
We want to replicate the deployment between the two sites. So, two questions: which URL should the DR tool use on the active site, and which on the standby site?
For the active site, you can use the "public" URL (maps.portal.com/portal) or the internal machine name (mainmachine01.domain.com:7443/arcgis). Both resolve to the same machines.
On the DR site, you must use the machine name, as requests to maps.portal.com/portal will resolve to the active environment.
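In DR-tool terms, that means the properties file you feed the tool on each site should point PORTAL_ADMIN_URL (the property discussed later in this thread) at a URL that resolves locally. A minimal Python sketch of writing one properties file per site; apart from PORTAL_ADMIN_URL, the property names, credentials, paths, and the standbymachine01 hostname are assumptions to verify against your own webgisdr template:

    # Sketch: write a webgisdr properties file per site. Only PORTAL_ADMIN_URL
    # is confirmed by this thread; the other keys follow the usual template
    # shape and should be checked. Hostnames and credentials are hypothetical.
    SITES = {
        # Active site: the public URL and the machine URL both resolve locally.
        "active.properties": "https://maps.portal.com/portal",
        # Standby site: must be the machine URL, because the public URL
        # resolves to the active environment.
        "standby.properties": "https://standbymachine01.domain.com:7443/arcgis",
    }

    for filename, admin_url in SITES.items():
        lines = [
            f"PORTAL_ADMIN_URL = {admin_url}",
            "PORTAL_USERNAME = admin",                      # hypothetical
            "PORTAL_PASSWORD = changeme",                   # hypothetical
            "SHARED_LOCATION = \\\\fileserver\\webgisdr",   # hypothetical UNC path
        ]
        with open(filename, "w") as f:
            f.write("\n".join(lines) + "\n")

The export would then run on the active site and the import on the standby, along the lines of webgisdr --export --file active.properties and webgisdr --import --file standby.properties; that flag pattern matches the tool's documented usage, but verify it against your version.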
So I'm trying the following:
Export site from primarymachine:7443 on the main site: Success
Import site on primarymachine:7443 on the standby site: Failed with "Cannot find the portal properties from the server https://maps.portal.com/server"
Interesting that the tool is looking at the load balancer name, as that would route traffic to the main data center.
When looking at the logs, I'm also seeing that the alternate site is attempting to reach out to the primary site's machine, and failing. That doesn't seem right: this tool should be able to run in the event that the primary site went down.
Thoughts?
How did you federate Portal and Server? On Site A, did you do something like this:
Services URL: https://maps.portal.com/server
Admin URL: https://maps.portal.com/server
And on Site B, you did the same thing? If so, you're right: requests are being routed to the primary data center/deployment because Site B resolves https://maps.portal.com/server to the primary environment. What you need is for your standby environment to resolve maps.portal.com/server to the standby machines. This can be done through DNS or by using a Web Adaptor to "fake" the URL.
For example, I have a traffic manager resolving traffic to my primary environment. I set up a standby environment and use a Web Adaptor as the front end. That machine's hostname is defined as maps.portal.com through the etc\hosts file, and the entry is added to all machines in the deployment. You'll also need to have defined the web context URL in the second environment. When you take backups and restore them using the DR tool, all traffic will be routed to the standby machines based on the etc\hosts entries. In the event you need to fail over, remove the entries from the etc\hosts file so maps.portal.com resolves to your traffic manager.
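To make that hosts-file flip repeatable across the standby machines, it can be scripted. A sketch, assuming a Windows deployment and a hypothetical IP for the standby Web Adaptor machine; run as administrator on each machine in the standby deployment:

    import sys

    # Windows hosts file location; on Linux the equivalent is /etc/hosts.
    HOSTS_PATH = r"C:\Windows\System32\drivers\etc\hosts"
    ENTRY = "10.1.2.3    maps.portal.com"   # hypothetical standby Web Adaptor IP
    MARKER = "# standby-override"

    def enable_override():
        """Route maps.portal.com to the standby Web Adaptor on this machine."""
        with open(HOSTS_PATH, "a") as f:
            f.write(f"\n{ENTRY}  {MARKER}\n")

    def disable_override():
        """Failover: let maps.portal.com resolve to the traffic manager again."""
        with open(HOSTS_PATH) as f:
            kept = [line for line in f if MARKER not in line]
        with open(HOSTS_PATH, "w") as f:
            f.writelines(kept)

    if __name__ == "__main__":
        enable_override() if sys.argv[1] == "on" else disable_override()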
So my portals are already using Web Adaptors with the WebContextURL set. Is it possible to just change the etc\hosts file to point to that one?
I'm guessing the alternative would be to wait until we fail over to the standby site, then restore the backup? Granted, there would be about 20 minutes of downtime while the portal restored.
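Whether the hosts-file change alone is enough depends on what the federation settings currently say, and those can be read back through the Portal Sharing API rather than guessed. A minimal sketch; the portals/self/servers resource and its url/adminUrl fields are standard Sharing API, while the credentials are placeholders:

    import requests

    PORTAL = "https://maps.portal.com/portal"

    # generateToken is a standard Sharing API endpoint; credentials are placeholders.
    token = requests.post(
        f"{PORTAL}/sharing/rest/generateToken",
        data={"username": "admin", "password": "changeme",
              "referer": PORTAL, "f": "json"},
        verify=False,  # internal deployments often use self-signed certificates
    ).json()["token"]

    # portals/self/servers lists each federated Server with its services URL
    # ("url") and admin URL ("adminUrl").
    servers = requests.get(
        f"{PORTAL}/sharing/rest/portals/self/servers",
        params={"f": "json", "token": token},
        verify=False,
    ).json()["servers"]

    for s in servers:
        flag = ""
        if s["url"] == s["adminUrl"]:
            flag = "  <-- identical; admin requests follow the load balancer"
        print(f'{s["name"]}: services={s["url"]}  admin={s["adminUrl"]}{flag}')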
So, are your services URL and admin URL the same? This is how the DR tool behaves during an import:
1) Uses the PORTAL_ADMIN_URL to connect to the deployment you want to restore.
2) Verifies, through the federation settings, that the "public" or front-facing URLs for the Portal and all federated Servers in the target deployment match the public URLs in the backup
3) Uses the admin URL in the federation settings to connect to the federated Servers to check the registered Data Stores, as well as the internal machine names for the Server machines (see the resolution sketch after this list)
4) Finds the internal machine name for the Portal machines and runs a restore against the primary Portal machine after unregistering the standby
5) Runs a restore through the internal machine name for Server
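To make step 3 concrete: the hostname in the federated Server's admin URL is resolved by whichever machine runs the import. A runnable sketch of that check from a standby machine, with hypothetical standby IPs:

    import socket

    ADMIN_URL_HOST = "maps.portal.com"       # host portion of the admin URL
    STANDBY_IPS = {"10.1.2.3", "10.1.2.4"}   # hypothetical standby machine IPs

    resolved = socket.gethostbyname(ADMIN_URL_HOST)
    if resolved in STANDBY_IPS:
        print(f"{ADMIN_URL_HOST} -> {resolved}: resolves to the standby, "
              "so the import talks to the environment being restored")
    else:
        print(f"{ADMIN_URL_HOST} -> {resolved}: resolves to the PRIMARY, "
              "so the import reads the wrong deployment (or fails if it's down)")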
At step 3, if your services and admin URLs are the same, the import resolves them to the primary environment. You have three approaches:
1) Set up the web adaptor as described above
2) Use separate internal load balancers in each environment for the admin URLs
3) Restore the backup once there's a failure
They're listed in order of my recommendation, and I really wouldn't recommend option 3, as you'd want the standby in a "warm" state. If you only restore once there's a failure, you're down for the amount of time the restore requires, which you've mentioned.
The first is ideal because you can connect to the deployment in your standby environment through the public URL to make sure all expected content and applications are accessible, mimicking how your users will connect to it. If it works with the Web Adaptor in place, it'll work when you need to fail over (after updating the hosts files).
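A quick way to run that check from a standby machine, sketched against the standard, anonymously accessible Sharing API endpoints (the IP set and the access:public query are illustrative):

    import socket
    import requests

    PUBLIC_HOST = "maps.portal.com"
    STANDBY_IPS = {"10.1.2.3"}   # hypothetical standby Web Adaptor IP

    ip = socket.gethostbyname(PUBLIC_HOST)
    print(f"{PUBLIC_HOST} -> {ip}",
          "(standby)" if ip in STANDBY_IPS else "(NOT the standby!)")

    # sharing/rest/info and sharing/rest/search are standard Sharing API
    # endpoints that answer without a token.
    base = f"https://{PUBLIC_HOST}/portal/sharing/rest"
    info = requests.get(f"{base}/info", params={"f": "json"}, verify=False).json()
    print("portal is answering:", "authInfo" in info)

    search = requests.get(f"{base}/search",
                          params={"q": "access:public", "f": "json"},
                          verify=False).json()
    print("public items visible through the public URL:", search["total"])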
Great! Option 1 resolved the issue, and I am no longer getting that error message. The tool is running all the way through.