We've deployed a solution that seems to show promise in testing. Our time to add a new machine to a site has gone from ~20min to ~4min. Our process:
- Set the max number of instances for the auto-scaling group (e.g., 6)
- Create an AWS warm pool and set the max count equal to the value entered in step #1
- Use the AWS CLI to set the "instance reuse" policy for the warm pool so that scaled-in instances are returned to the pool rather than terminated. Setting this policy is not supported in the console. (On Windows, use AWS CloudShell; the command's quoting does not work in a CMD window.)
- Example:
- `aws autoscaling put-warm-pool --auto-scaling-group-name agsSuper-ServerStack-AutoScalingGroup-1K7QS9RIGCV3K --pool-state Stopped --instance-reuse-policy '{"ReuseOnScaleIn": true}'`
- Use AGS REST Admin to change the "remove from site" setting so that machines are not removed from the site on scale-in
- Confirm that all instances have joined the site
- Decrease the desired number of instances in the auto-scaling group to the number we want during normal traffic (e.g., 2)
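The CLI portion of the steps above can be sketched as a single CloudShell script. This is a sketch under the assumptions from the example (the ASG name `agsSuper-ServerStack-AutoScalingGroup-1K7QS9RIGCV3K`, a max of 6, and a normal-traffic count of 2); the AGS REST Admin step still happens between the warm-pool creation and the scale-down.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed ASG name from the example above
ASG_NAME="agsSuper-ServerStack-AutoScalingGroup-1K7QS9RIGCV3K"

# Set the maximum size of the auto-scaling group
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name "$ASG_NAME" \
  --max-size 6

# Create the warm pool with a matching max count, keep pooled instances
# Stopped, and return scaled-in instances to the pool instead of
# terminating them
aws autoscaling put-warm-pool \
  --auto-scaling-group-name "$ASG_NAME" \
  --max-group-prepared-capacity 6 \
  --pool-state Stopped \
  --instance-reuse-policy '{"ReuseOnScaleIn": true}'

# (Change the "remove from site" setting in AGS REST Admin, then confirm
# all instances have joined the site before continuing.)

# Drop desired capacity back to the normal-traffic count; the extra
# instances stop and return to the warm pool
aws autoscaling set-desired-capacity \
  --auto-scaling-group-name "$ASG_NAME" \
  --desired-capacity 2
```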
The unused instances are returned to the warm pool but still appear in ArcGIS Manager. When the group needs to scale out again, the machines simply change state (e.g., stopped to running) and are part of the AGS site as soon as AGS gets a heartbeat. This avoids the ~20 min spin-up needed to create a new instance and let Chef and any "user data" scripts run; because everything is pre-configured, the machines are ready to go and just need to change state. It also avoids potential IP address issues in Enterprise when combining auto-scaling with warm pools (more info).

The only potential issue we've identified with our approach is IP address conflicts if the combined number of active instances and warm-pool instances is less than the maximum you've set for the auto-scaling group. In that case, a scaling event can launch brand-new instances (i.e., not drawn from the warm pool), and the IP addresses AWS assigns to them can conflict with what Enterprise thinks is in use. It's a bit deep and convoluted to explain, but the takeaway is that we mitigate that concern by 1) configuring ArcGIS to not release the IP addresses in the site and 2) setting the warm pool max count equal to the max count of the auto-scaling group.
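One way to sanity-check the mitigation is to compare the warm pool's max prepared capacity against the group's max size. A sketch, again assuming the ASG name from the example; note that if `MaxGroupPreparedCapacity` was never set, AWS defaults it to the group's max size and the field comes back empty (`None`):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed ASG name from the example above
ASG_NAME="agsSuper-ServerStack-AutoScalingGroup-1K7QS9RIGCV3K"

# Maximum size of the auto-scaling group
MAX_SIZE=$(aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names "$ASG_NAME" \
  --query 'AutoScalingGroups[0].MaxSize' --output text)

# Max prepared capacity of the warm pool ("None" if unset, which
# defaults to the group's max size)
POOL_MAX=$(aws autoscaling describe-warm-pool \
  --auto-scaling-group-name "$ASG_NAME" \
  --query 'WarmPoolConfiguration.MaxGroupPreparedCapacity' --output text)

if [ "$POOL_MAX" = "$MAX_SIZE" ] || [ "$POOL_MAX" = "None" ]; then
  echo "OK: scale-out events will draw from the warm pool"
else
  echo "WARNING: ASG can launch brand-new instances; possible IP conflicts in Enterprise"
fi
```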