Configure a Replicated ArcGIS Enterprise Deployment in AWS using the WebGIS DR Tool

03-24-2020 04:23 AM
by Anonymous User

Many organizations have installed and configured their deployments of ArcGIS Enterprise in the cloud and are required to incorporate disaster recovery plans that minimize downtime in the event of a failure or catastrophe. Organizations can choose from several options when planning for a disaster recovery scenario. In this blog post, I will walk through the steps for replicating a primary Enterprise deployment in AWS to a warm, geographically redundant standby site using the WebGIS DR utility.

Learn more on disaster recovery and replication in ArcGIS Enterprise

My colleagues have covered much of this workflow extensively for on-premises deployments in another blog post, Migrate to a new machine in ArcGIS Enterprise using the WebGIS DR tool. The general workflow for our scenario is very similar to the on-premises one, with some added steps that use AWS services*: Application Load Balancer (ALB), Route 53, and S3 buckets. It also avoids etc\hosts file entries on the EC2 instances. Please read the blog post above before proceeding to understand the overall workflow and prerequisites, as it contains information not covered in this post.

* I will assume readers will have some prior experience with AWS and its services mentioned above.

Scenario

Let’s say we want a replicated and geographically redundant multi-machine deployment with the following components set up in AWS in the US East and US West regions:

  • Portal for ArcGIS 
  • ArcGIS Hosting Server 
  • ArcGIS Data Store 
  • ArcGIS Geocode Server

Architectural design of base enterprise deployment with additional server

Route 53 Hosted Zones

Route 53 is a scalable Domain Name System (DNS) service that uses a combination of public and private hosted zones, which are containers for records about how you want to route traffic for a specific domain. Public hosted zones are used to route traffic on the internet, while private hosted zones are used to route traffic within an Amazon VPC. The following workflow uses both types of hosted zones with overlapping namespaces. Therefore, in our scenario, when we are logged into an EC2 instance in a VPC that is associated with a private hosted zone:

  • The Resolver will evaluate whether the name of the private hosted zone matches the domain name in the request. If there is no match, the Resolver will forward the request to a public DNS resolver.
  • If there is a match in the domain name of the request, the hosted zone is searched for a record that matches the domain name. If there is no record in the matching private hosted zone, the Resolver does not forward the request to a public DNS resolver, but will return a non-existent domain (NXDOMAIN) to the client.

That last bullet is very important! If you have other applications in the same VPC and you set up a private hosted zone, be sure to add records for those applications so that they remain fully operational.
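
The lookup order described in the two bullets above can be sketched in a few lines of Python. This is only an illustrative model of the behavior, not Route 53 itself; the function name, the `example.com` domain, and the record values are all hypothetical.

```python
from typing import Dict, Optional


def resolve(name: str,
            private_zone: str,
            private_records: Dict[str, str],
            public_records: Dict[str, str]) -> Optional[str]:
    """Model the lookup order for a VPC associated with one private hosted zone.

    Returns the record value, or None to represent NXDOMAIN.
    Record dictionaries are keyed by lowercase names without a trailing dot.
    """
    name = name.rstrip(".").lower()
    zone = private_zone.rstrip(".").lower()
    in_private_zone = name == zone or name.endswith("." + zone)
    if not in_private_zone:
        # No matching private hosted zone: the query goes to public DNS.
        return public_records.get(name)
    # Matching private hosted zone: answer only from it.
    # A missing record is NXDOMAIN, even if public DNS has the name.
    return private_records.get(name)


# Hypothetical records for illustration only.
private = {"gis.example.com": "internal-east-alb.example.net"}
public = {
    "gis.example.com": "public-east-alb.example.net",
    "app.example.com": "203.0.113.10",
    "other.org": "198.51.100.7",
}

print(resolve("gis.example.com", "example.com", private, public))
print(resolve("app.example.com", "example.com", private, public))
print(resolve("other.org", "example.com", private, public))
```

In this model, `app.example.com` resolves publicly but returns nothing inside the VPC, because the private hosted zone matches the domain and has no record for it. That is the situation the warning above is meant to prevent.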

Before Deployment

Before proceeding with deploying ArcGIS Enterprise, we need to complete the following tasks using the AWS services mentioned above in order to maintain a consistent DNS across the primary and standby sites:

 

  1. Create an ALB in both regions that will manage traffic for the ArcGIS Enterprise deployments. A few things to note: 
    1. Be sure to attach the appropriate certificate for the public domain registered in Route 53.
    2. A target group must be created when configuring a new load balancer. Create a target group for Portal at this time. You do not need to register any targets with this target group yet.
  2. In Route 53 under your Public Hosted Zone:
    1. Create two Record Sets with the same name, one for each of the load balancers, using the CNAME type. One thing to note here: 
      1. Having multiple Record Sets that use the Weighted Routing Policy is not a requirement; it is more a matter of preference. It is possible to have just one Record Set and replace the ALB value when the switch is needed.
    2. Set the time to live (TTL) to the recommended 60 seconds, which will "minimize the amount of time it takes for traffic to stop being routed to your failed endpoint."
    3. Set the Routing Policy to Weighted:
      1. Assign a weight of 100 to the primary site.
      2. Assign a weight of 0 to the standby site.
         Public record set in Route 53 with Weighted Routing Policy
  3. Create a Private Hosted Zone for each region using the same Domain Name as the Public Hosted Zone in Route 53.
    1. Attach the private hosted zone to the VPC assigned to your ALB during time of creation.
       List of hosted zones in Route 53
  4. In Route 53 under each of the Private Hosted Zones:
    1. Create an identical Record Set to match the one created in the Public Hosted Zone.
    2. TTL can remain as the default value of 300 seconds.
    3. The Routing Policy can remain the default value of Simple.
       Private record set in Route 53 with Simple Routing Policy

The last two steps of this pre-deployment process are vital to the success of the WebGIS DR utility. Having two Private Hosted Zones with identical Domain Names, and with Record Sets that match the Record Sets in the Public Hosted Zone, makes it possible to configure identical ArcGIS Enterprise deployments. Additionally, because the deployments are in separate regions and VPCs, both remain operational, so backups and restores can run consistently on the appropriate systems without issue.
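
For readers who prefer to script the weighted Record Sets in the Public Hosted Zone rather than use the console, here is a minimal sketch. It builds the change batch in plain Python and only hands it to boto3's `change_resource_record_sets` in a separate function. The record name, ALB DNS names, set identifiers, and hosted zone ID are hypothetical placeholders, and I am assuming boto3 is installed and credentials are configured.

```python
from typing import Any, Dict


def weighted_cname_change(record_name: str, alb_dns_name: str,
                          set_identifier: str, weight: int,
                          ttl: int = 60) -> Dict[str, Any]:
    """Build one UPSERT change for a weighted CNAME record."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": record_name,
            "Type": "CNAME",
            "SetIdentifier": set_identifier,  # distinguishes records sharing a name
            "Weight": weight,
            "TTL": ttl,
            "ResourceRecords": [{"Value": alb_dns_name}],
        },
    }


def build_change_batch(record_name: str, primary_alb: str,
                       standby_alb: str) -> Dict[str, Any]:
    """Primary site gets weight 100, standby site gets weight 0."""
    return {
        "Comment": "Weighted records for primary and standby sites",
        "Changes": [
            weighted_cname_change(record_name, primary_alb, "primary", 100),
            weighted_cname_change(record_name, standby_alb, "standby", 0),
        ],
    }


def apply_change_batch(hosted_zone_id: str, change_batch: Dict[str, Any]) -> Any:
    """Send the batch to Route 53 (requires boto3 and AWS credentials)."""
    import boto3  # third-party; imported here so the builders work without it

    client = boto3.client("route53")
    return client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id, ChangeBatch=change_batch
    )


# Hypothetical names for illustration only.
batch = build_change_batch(
    "gis.example.com",
    "primary-alb.us-east-1.elb.example.net",
    "standby-alb.us-west-2.elb.example.net",
)
```

Calling `apply_change_batch("<your hosted zone ID>", batch)` would submit the change. The Record Sets in the Private Hosted Zones use the Simple routing policy, so they would not carry the `SetIdentifier` and `Weight` keys.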

ArcGIS Enterprise Deployment

It is finally time to deploy the components of our architecture to EC2 instances (repeat for each region):

  1. To set up the base deployment (Portal for ArcGIS, ArcGIS Hosting Server, and ArcGIS Data Store), follow steps 1-20 from our documentation on deploying Portal for ArcGIS on AWS, which utilizes Esri’s Amazon Machine Images (AMIs).
    1. Make sure to assign the EC2 instances to the same VPC and Security Group as the ALB.
  2. Repeat steps 11-19 in the linked documentation from the previous step to set up the additional ArcGIS Geocode Server.
  3. Create the two remaining target groups for the ArcGIS Hosting Server and Geocode Server (the Portal target group was created during the ALB creation in the pre-deployment steps).
  4. Register each instance with its appropriate target group.
  5. Configure the appropriate forwarding rules in the ALB to match the target groups. If everything is set up correctly, we should be able to access the portal through the DNS alias (Name property) created in the Record Sets.
     ArcGIS Portal homepage with arrow pointing to URL
  6. Follow steps 21-23 in the linked documentation above using the DNS alias from the Record Sets. For example, my portal’s system properties would have the following entries:
     Portal system properties
    1. And federating my hosting server (or any others) would look like so (the admin URL may vary depending on the level of administrative access set on the Web Adaptor):
       ArcGIS Server federation in Portal organizational settings
  7. Repeat step 22 for the ArcGIS Geocode Server.
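
The forwarding rules from step 5 can also be expressed in code. The sketch below assumes path-based routing, with one Web Adaptor context per component (`portal`, `server`, `geocode`). Those context names and the ARNs are hypothetical, so substitute whatever your deployment actually uses. Each dictionary it builds matches the keyword arguments of boto3's `elbv2` `create_rule` call.

```python
from typing import Any, Dict, List


def build_forwarding_rules(listener_arn: str,
                           target_groups: Dict[str, str]) -> List[Dict[str, Any]]:
    """Build one path-based forwarding rule per Web Adaptor context.

    target_groups maps a context name (e.g. "portal") to a target group ARN.
    """
    rules = []
    # Sort so priorities are assigned deterministically.
    for priority, (context, arn) in enumerate(sorted(target_groups.items()), start=1):
        rules.append({
            "ListenerArn": listener_arn,
            "Priority": priority,
            "Conditions": [{
                "Field": "path-pattern",
                "Values": [f"/{context}", f"/{context}/*"],
            }],
            "Actions": [{"Type": "forward", "TargetGroupArn": arn}],
        })
    return rules


def apply_rules(rules: List[Dict[str, Any]]) -> None:
    """Create the rules on the ALB listener (requires boto3 and AWS credentials)."""
    import boto3  # third-party; imported here so the builder works without it

    elbv2 = boto3.client("elbv2")
    for rule in rules:
        elbv2.create_rule(**rule)


# Hypothetical ARNs and context names for illustration only.
rules = build_forwarding_rules(
    "arn:aws:elasticloadbalancing:example:listener",
    {
        "portal": "arn:aws:elasticloadbalancing:example:targetgroup/portal",
        "server": "arn:aws:elasticloadbalancing:example:targetgroup/server",
        "geocode": "arn:aws:elasticloadbalancing:example:targetgroup/geocode",
    },
)
```

Because the two regions are meant to be identical, the same context names would be used against each region's listener, with that region's target group ARNs.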

We now have two identical and fully functional ArcGIS Enterprise deployments, one in each region. The deployment whose Record Set has a weight of 100 in the Public Hosted Zone is accessible over the internet and acts as the primary site, while the other deployment (with a weight of 0) is accessible only from within its own VPC and acts as the standby site.

Active and standby ArcGIS Enterprise architectural designs

Replication

Now that we have our primary and standby sites, we have to set up two replicating S3 buckets in the appropriate regions to have a fully replicated and geographically redundant deployment prepared for a disaster recovery scenario.

In Amazon S3, create two buckets with easily recognizable naming conventions, like the following:

Amazon S3 buckets

In the second step of the creation process for each bucket under Configure Options, check the box under Versioning. This is a requirement to enable Cross-Region Replication. Once the buckets have been created, select the bucket that will be utilized by the primary site and navigate to Replication under the Management tab and select Get Started. We can leave the defaults selected for all settings throughout this process. We just need to ensure we select our designated standby site bucket for the destination.

Configuring the replication rule in Amazon S3 bucket

Amazon S3 provides a selectable option (seen in the screenshot above) called S3 Replication Time Control, which is designed to replicate 99.99% of new objects within 15 minutes. This may be a valuable option for organizations with large-scale, content-heavy ArcGIS Enterprise deployments that need minimal downtime. In my tests, replication was fast without that option selected, though my backups were only around 5-6 GB, which amounted to ~300 hosted services and a Geocode service (and its data) copied to ArcGIS Server.
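
The same replication rule can be sketched in code for anyone automating the bucket setup. The console creates an IAM role for you, but the API expects one to be supplied, so the role ARN below is a hypothetical placeholder, as are the bucket names and rule ID. The `time_control` flag adds the S3 Replication Time Control settings discussed above. The boto3 calls used are `put_bucket_versioning` and `put_bucket_replication`.

```python
from typing import Any, Dict


def build_replication_config(role_arn: str, destination_bucket: str,
                             time_control: bool = False) -> Dict[str, Any]:
    """Build a replication configuration that copies every object to the standby bucket."""
    destination: Dict[str, Any] = {"Bucket": f"arn:aws:s3:::{destination_bucket}"}
    if time_control:
        # Optional S3 Replication Time Control (15-minute target) with metrics.
        destination["ReplicationTime"] = {"Status": "Enabled", "Time": {"Minutes": 15}}
        destination["Metrics"] = {"Status": "Enabled", "EventThreshold": {"Minutes": 15}}
    return {
        "Role": role_arn,
        "Rules": [{
            "ID": "webgisdr-backups",  # hypothetical rule name
            "Priority": 1,
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # empty prefix: apply to all objects
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": destination,
        }],
    }


def enable_replication(source_bucket: str, config: Dict[str, Any]) -> None:
    """Enable versioning on the source bucket, then attach the replication rule.

    Requires boto3 and AWS credentials. The destination bucket must also have
    versioning enabled; this sketch does not do that for you.
    """
    import boto3  # third-party; imported here so the builder works without it

    s3 = boto3.client("s3")
    s3.put_bucket_versioning(
        Bucket=source_bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    s3.put_bucket_replication(Bucket=source_bucket, ReplicationConfiguration=config)


# Hypothetical names for illustration only.
config = build_replication_config(
    "arn:aws:iam::123456789012:role/example-replication-role",
    "example-webgisdr-standby",
)
```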

Disaster Recovery

We are now able to create backups of the primary site and restore the standby site from said backups.

  1. Create a backup with the WebGIS DR tool of your primary site on the instance where Portal is installed. Make sure to point to the appropriate S3 bucket (should be the one in the same region as the primary site) in the properties file. The backup will be replicated to the S3 bucket in the other region.
  2. Restore the replicated backup with the DR tool on the standby instance where Portal is installed. Be mindful of the appropriate S3 bucket in the properties file.
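
As a rough sketch of how the two steps above might be scripted, the helper below assembles the `webgisdr` command line for an export (backup) or import (restore) and runs it with `subprocess`. The properties file names are hypothetical, and I am assuming the tool is on the PATH; otherwise pass its full path as `tool`.

```python
import subprocess
from typing import List


def webgisdr_command(mode: str, properties_file: str,
                     tool: str = "webgisdr") -> List[str]:
    """Build the webgisdr command line for a backup ("export") or restore ("import")."""
    if mode not in ("export", "import"):
        raise ValueError(f"mode must be 'export' or 'import', got {mode!r}")
    return [tool, f"--{mode}", "--file", properties_file]


def run_webgisdr(mode: str, properties_file: str, tool: str = "webgisdr") -> int:
    """Run the tool and raise CalledProcessError if it exits with a non-zero code."""
    completed = subprocess.run(webgisdr_command(mode, properties_file, tool), check=True)
    return completed.returncode


# On the primary Portal machine (properties file points at the primary-region bucket):
print(webgisdr_command("export", "primary.properties"))
# On the standby Portal machine (properties file points at the standby-region bucket):
print(webgisdr_command("import", "standby.properties"))
```

Keeping a separate properties file per site makes it harder to point a restore at the wrong bucket by accident.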

This is the final architecture and workflow:

Full workflow/architecture outlined with S3 bucket replication

In the event of a disaster, we can now “failover” to our warm standby deployment simply by swapping the weights of the Record Sets in our Public Hosted Zone, with downtime dependent on the TTL of the Route 53 record set. Additionally, depending on the length of downtime for the original hot site, you may need to modify how your backups and restores are handled in the two environments, so that any new items created in the new hot site are preserved when your original data center is restored.

* It should be heavily emphasized that this workflow with the WebGIS DR tool is not considered a highly available ArcGIS Enterprise deployment. The “failover” may be quick in this scenario, but the standby deployment will only have content available from the last WebGIS DR backup/restore. Please see our documentation on configuring a highly available ArcGIS Enterprise.

ArcGIS Enterprise failover workflow/architecture

Since both deployments are identical, there should be no issues with services integrated into other business systems. Most importantly, this entire workflow (running WebGIS DR backups and restores, toggling DNS, and detecting which site is acting as the primary at any given moment) can be completely scripted and automated, ensuring that mission-critical systems remain operational.
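
To illustrate the scripting idea, here is a minimal sketch of the detection and DNS-toggling pieces. It assumes exactly two weighted Record Sets shaped like the entries returned by boto3's `list_resource_record_sets`. The record name, set identifiers, and ALB names are hypothetical.

```python
from typing import Any, Dict, List


def current_primary(record_sets: List[Dict[str, Any]]) -> str:
    """Return the SetIdentifier of the weighted record currently receiving traffic."""
    return max(record_sets, key=lambda record: record["Weight"])["SetIdentifier"]


def failover_changes(record_sets: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a change batch that swaps the weights: primary to 0, standby to 100."""
    primary = current_primary(record_sets)
    changes = []
    for record in record_sets:
        updated = dict(record)  # shallow copy so the input is left untouched
        updated["Weight"] = 0 if record["SetIdentifier"] == primary else 100
        changes.append({"Action": "UPSERT", "ResourceRecordSet": updated})
    return {"Comment": "WebGIS DR failover", "Changes": changes}


# Hypothetical records for illustration only.
records = [
    {"Name": "gis.example.com.", "Type": "CNAME", "SetIdentifier": "us-east",
     "Weight": 100, "TTL": 60, "ResourceRecords": [{"Value": "east-alb.example.net"}]},
    {"Name": "gis.example.com.", "Type": "CNAME", "SetIdentifier": "us-west",
     "Weight": 0, "TTL": 60, "ResourceRecords": [{"Value": "west-alb.example.net"}]},
]

print(current_primary(records))
batch = failover_changes(records)
```

The resulting `batch` could then be passed as the `ChangeBatch` argument of boto3's `change_resource_record_sets` for the Public Hosted Zone. Running the same function again after the original site is restored swaps the weights back.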

Learn more about preventing data loss and downtime with ArcGIS Enterprise.

6 Comments
JacobBoyle412
Esri Contributor

Excellent write-up!

DavidHoy
Esri Contributor

Thank you Taylor,

I will be taking the major elements of this idea as part of our migration of a (large) site from one AWS account to another. Particularly grateful for the tip about the need to set up the private zone in the source VPC as well to allow this site to continue running after the switchover of public addressing.

JasonHansel1
New Contributor

Taylor,

How would this solution apply to a 10.7.1 ArcGIS Server Federated site pointing to S3, i.e. Cache Directory? 10.8.1 seems to have resolved a BUG related to Export Site that throws a 500 Error when running webgisdr. Aside from upgrading to 10.8.1, what would be the proper workflow for changing the S3 directories in ArcGIS Server?

by Anonymous User

Hey Jason, that bug you are referencing should be included in a 10.7.1 patch to be released relatively soon.

JasonHansel1
New Contributor

Taylor,

Awesome, thank you for the good news. How soon will it be released? Would it be possible to get the patch early?

Thanks!

by Anonymous User

Hey Jason, I must have been misinformed. It looks like we released the patch only for 10.6.1 deployments: ArcGIS Server 10.6.1 High Availability and Disaster Recovery Quality Patch. I was just informed that the patch will not be coming for 10.7.1 (and the recommended route is to upgrade). Apologies for the unfortunate news.