Curious if anyone has examples / best practices for Python automation for cloud-deployed ArcGIS Enterprise instances (AWS - deployed on Linux servers).
Here are my thoughts:
1. AWS Lambda - Lambda's memory limits and the inability to load the arcpy libraries into it (even the ArcGIS API for Python is difficult to get into Lambda) seem to make it a non-starter except in limited circumstances.
2. Using one of the ArcGIS Server instances would work, since Python 3 is installed with it, but you'd need to create a new conda env, etc. And since Pro cannot be installed on a Linux server, you'd be limited to workflows that don't use Pro, unless you create those .aprx files on a different machine and then copy them to the Linux server.
Which leads me to the most likely solution:
3. A Windows machine on AWS WorkSpaces with ArcGIS Pro installed. Ensure access to your enterprise geodatabases, etc. Create your scripts and .aprx files there, publish to Enterprise from that machine, and use Windows Task Scheduler for automation.
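For what it's worth, the publish step from that Windows box is scriptable too. A minimal sketch with arcpy - the paths, map name, and server URL here are just placeholders, not a real deployment:

```python
# Sketch: publish a map from an .aprx to a federated server
# (run on the Windows machine where Pro/arcpy is licensed)
import arcpy

aprx = arcpy.mp.ArcGISProject(r"C:\automation\maps\parcels.aprx")  # hypothetical project
m = aprx.listMaps("Parcels")[0]

# Draft, stage, and upload a map image layer to the federated Enterprise server
draft = m.getWebLayerSharingDraft("FEDERATED_SERVER", "MAP_IMAGE", "Parcels")
draft.federatedServerUrl = "https://gis.example.com/server"  # hypothetical URL
draft.exportToSDDraft(r"C:\automation\out\Parcels.sddraft")
arcpy.server.StageService(r"C:\automation\out\Parcels.sddraft", r"C:\automation\out\Parcels.sd")
arcpy.server.UploadServiceDefinition(r"C:\automation\out\Parcels.sd", "https://gis.example.com/server")
```

Task Scheduler can then run this through Pro's propy.bat so it uses the conda env that ships with Pro.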
I'm sure I'm missing a lot of options - curious what other folks have done for automation / task servers in a Linux cloud deployment of Enterprise.
I suppose it depends on what exactly your scripts are doing. We have a lot of automation going on over here, but we mostly work with layers already published, meaning there is a service endpoint we can hit from anywhere, so long as we're authenticated. With that, we can do a number of things, including refreshing web layers with new data, creating PDFs from layouts in a Pro project, etc.
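The PDF side, for example, is just a few lines of arcpy.mp against a project file. Something like this sketch (project path and layout name are placeholders):

```python
# Sketch: export a layout from a Pro project to PDF on a schedule
import arcpy

aprx = arcpy.mp.ArcGISProject(r"C:\projects\weekly_report.aprx")  # hypothetical project
layout = aprx.listLayouts("Weekly Report")[0]
layout.exportToPDF(r"C:\reports\weekly_report.pdf")
```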
Thank you! A lot of our scripts update our enterprise geodatabases, mostly ETL pipelines, but also some automated map generation, etc. Curious how and where you're automating the execution of these scripts?
We used to have our automated processes run on a single machine, but this became a bit of a liability, especially when transitioning to fully remote work during the pandemic.
While our scripts are all still scheduled to run on a particular machine, we actually put many of our administrative processes into a central git repository so that if needed, any of the scripts can be run from any other machine with the right Python env set up.
It also allows us to more easily develop and test new ideas / bugfixes using alternate git branches, then merge into the main branch when everything is fully tested.
Since setting this up, it has happened on several occasions that the on-premises machine has been unavailable at the scheduled time. In such cases, I'm able to manually run the up-to-date version of the script from my home machine.
This model does introduce its own security concerns, but we feel we've addressed those adequately.
Since our SDE layers are all published to our Portal, there's no need for direct SDE access for any of our scheduled scripts. Rather, we can simply use the arcgis and pandas Python packages to do all of our ETL procedures.
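To give a concrete feel for that pattern, here's a minimal sketch; the portal URL, credentials, layer endpoint, and field names are all placeholders:

```python
# Minimal sketch of a service-endpoint ETL run (URLs, credentials, fields are placeholders)
import pandas as pd
from arcgis.gis import GIS
from arcgis.features import FeatureLayer

gis = GIS("https://maps.example.com/portal", "svc_automation", "********")

# Hit the published REST endpoint directly; no SDE connection file needed
layer = FeatureLayer(
    "https://maps.example.com/server/rest/services/Assets/FeatureServer/0", gis=gis
)

# Pull the current features into a dataframe and transform with ordinary pandas
df = layer.query(where="1=1", as_df=True)
cutoff = pd.Timestamp("2023-01-01")
stale = df[df["last_checked"] < cutoff]  # hypothetical date field

# Push attribute updates back through the same endpoint
updates = [
    {"attributes": {"OBJECTID": int(oid), "status": "needs_review"}}
    for oid in stale["OBJECTID"]
]
if updates:
    result = layer.edit_features(updates=updates)
    print(result["updateResults"][:3])
```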
Got it - thanks for your input! I guess I'm looking for folks doing automation using cloud resources - whether they deployed an additional AWS server just for automation, or ran Python scripts from an ArcGIS Server machine using its Python install. In all cases, it seems like Lambda isn't a very good option.
Agreed, Lambda's better for other things. But having a lightweight AWS machine just to run your Python would be simple enough. Why would it need to be on the same machine as the Server installation?
No specific reason - maybe to save on costs? To avoid having a machine that only runs automation scripts a couple of times a day and is otherwise up and doing very little? Really I'm just trying to understand / get a list of the different ways folks have approached automation in a cloud deployment of Enterprise (and if Lambda is part of their solution, I would like to understand that as well)!
Check out this article: https://docs.aws.amazon.com/solutions/latest/instance-scheduler/welcome.html
We use the Instance Scheduler (which runs on Lambda and uses DynamoDB for its settings) for our EC2 scheduling. There are, for instance, a couple of EC2 instances that only need to be on for very specific intervals, and not every day of the week. Using Lambda to turn those instances on and off on a schedule has saved us quite a bit.
So if you had a lightweight instance for your scripts, you could use the Instance Scheduler to turn it on each day or week, leave it on for a few hours, then turn it off again.
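If you wanted to roll a simpler version yourself, the core of it is just a small Lambda flipping the instance on and off. A rough sketch, with a placeholder instance ID and region, wired to two EventBridge schedule rules:

```python
# Sketch: minimal start/stop Lambda (instance ID and region are placeholders)
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
INSTANCE_ID = "i-0123456789abcdef0"  # the lightweight automation box

def lambda_handler(event, context):
    # Each EventBridge rule passes {"action": "start"} or {"action": "stop"}
    action = event.get("action", "stop")
    if action == "start":
        ec2.start_instances(InstanceIds=[INSTANCE_ID])
    else:
        ec2.stop_instances(InstanceIds=[INSTANCE_ID])
    return {"instance": INSTANCE_ID, "action": action}
```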
Great question! Been wondering the same thing actually.
We are running Portal in AWS on Linux machines as well. So far I've had success publishing geoprocessing services to our (Linux) Server and having them function properly. I even published one web tool that takes Excel spreadsheets as input, aggregates them all together, and returns the result to the user - which at first I wasn't sure would work with the server machine being Linux-based, but so far it's been great. Outside of that, a lot of the web tools just hit the REST endpoints of the various services we have published in support of the project.
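The guts of that spreadsheet tool look roughly like this - the parameter layout here is illustrative, not the exact tool:

```python
# Rough shape of the spreadsheet-aggregation script tool (parameter layout is illustrative)
import os
import arcpy
import pandas as pd

# A multivalue file parameter arrives as a semicolon-delimited string of server-side paths
input_files = arcpy.GetParameterAsText(0).split(";")
out_path = os.path.join(arcpy.env.scratchFolder, "combined.xlsx")

# Concatenate every uploaded spreadsheet into one table
frames = [pd.read_excel(f) for f in input_files]
pd.concat(frames, ignore_index=True).to_excel(out_path, index=False)

# A derived output parameter is how the web tool hands the file back to the user
arcpy.SetParameterAsText(1, out_path)
```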
Haven't explored the staged .aprx file on the Linux machine yet to see if it can still be accessed or not. Post back here if you find out any more about that!