Testing Autorecovery of AWS EC2 File Server

11-20-2019 08:24 PM
Esri Contributor

AWS deployment of HA ArcGIS Enterprise - use of shared File Server.

Has anyone deployed the AWS CloudFormation template for HA ArcGIS Enterprise and done some testing of the ability of this to withstand EC2 failures?

The particular problem we are trying to work through relates to the shared file server, which is deployed in Autorecovery mode, with CloudWatch configured to recover the EC2 instance in the event of a "System Failure". (Not an Instance Status Failure, as would happen if you "terminate" or "shut down" the instance.)
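For anyone not familiar with the setup, here is a minimal sketch of what such a recover alarm looks like when created with boto3. It is not copied from the Esri template: the alarm name, period, evaluation periods and threshold are assumptions for illustration, and the function names are hypothetical. The metric name `StatusCheckFailed_System` and the `arn:aws:automate:<region>:ec2:recover` action are the standard AWS ones.

```python
def build_recover_alarm(instance_id: str, region: str) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm.

    The alarm watches the *system* status check only, so it does not
    react to instance status check failures or to a manual stop.
    """
    return {
        "AlarmName": f"autorecover-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,              # assumed: evaluate every 60 seconds
        "EvaluationPeriods": 2,    # assumed: two failing periods in a row
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def create_recover_alarm(instance_id: str, region: str) -> None:
    """Create the alarm in AWS (needs boto3 and valid credentials)."""
    import boto3  # imported here so the builder above works without boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**build_recover_alarm(instance_id, region))
```

Comparing the output of `build_recover_alarm` against the alarm the template actually created (for example via the CloudWatch console) is a cheap way to confirm what the stack is really watching.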

According to a few posts around the traps, you cannot simulate a System Failure and trigger Autorecover.

Does anyone have any ideas on this front? One suggestion is to manually simulate an Autorecover with a simple shutdown/restart sequence.
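A minimal sketch of that manual shutdown/restart sequence using boto3 follows. The function name `stop_start_cycle` is hypothetical, and this is only an approximation of an Autorecover: a stop/start of an EBS-backed instance normally lands it on different underlying hardware, but a public IP without an Elastic IP will change and any instance store data is lost.

```python
def stop_start_cycle(ec2, instance_id: str) -> list:
    """Stop and then start one EC2 instance, waiting at each step.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns the list of steps completed, which is handy for logging.
    """
    steps = []

    ec2.stop_instances(InstanceIds=[instance_id])
    steps.append("stop requested")

    # Block until the instance reports the "stopped" state.
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    steps.append("stopped")

    ec2.start_instances(InstanceIds=[instance_id])
    steps.append("start requested")

    # "running" only means the instance state; the file shares may
    # still need time to come back before ArcGIS can use them.
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    steps.append("running")

    return steps
```

Usage would be something like `stop_start_cycle(boto3.client("ec2"), "<your instance id>")`, while watching how the Portal and Server machines behave during the file server outage.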

3 Replies
Occasional Contributor III

Hey David,

It's been a little while since I have played with Autorecovery, but I'm fairly sure the system failure event relates to the underlying host: if the AWS machine that is hosting your EC2 instance goes down, your EC2 instance is migrated to another AWS machine. See: Status Checks for Your Instances - Amazon Elastic Compute Cloud

If you are concerned about the EC2 instance failing, you could consider putting it into an Auto Scaling group and automating the build of the file server, or using snapshots for a more hands-on process in case of disaster recovery.



Esri Contributor

thanks Ben,

this architecture is the "standard" one built by the Esri-supplied CloudFormation template.

I am hoping to be able to demonstrate that it meets the client's High Availability needs.

Occasional Contributor III

Unfortunately, the file server is the single point of failure in the highly available solution. You can shift most of the configuration from the file server to an S3 bucket and a DynamoDB database, and then back up the ArcGIS Server directories.

Depending on whether your client has service level agreements in place for their services, you could fully automate the deployment: check whether there is an issue and, if there is, deploy a completely new stack and redeploy all the data. This is a DevOps mindset and works reasonably well for ArcGIS Server, but Portal often takes upwards of 40 minutes to deploy, even with an automated solution.

Happy to answer any other questions on it, as combining DevOps, cloud and Esri is an area of interest of mine.