Automate Your ETL Processes On A Schedule - Two Ways

05-31-2019 06:40 AM
Esri Regular Contributor
8 2 5,774

This post is about automating repetitive ETL processes right from your desktop.  No code, no server.

Note:  This post originally discussed only one way to schedule ETL processing, but with the ArcGIS Pro 2.5 release, due out soon, job scheduling is coming to Desktop geoprocessing right from any tool's Run button!  I'll leave the 'legacy' approach details in the post but do read through to the 'new' approach once you're able to deploy Pro 2.5.

The legacy approach:

We're seeing many people using Data Interoperability to periodically synchronize datasets between systems of record.  Typically the source data refresh 'trigger' is driven by a schedule and not some random event, and the frequency of updates is based on multiples of a working day.  If you're on this kind of treadmill this post is for you!

You may have heard of this sort of automation in the context of Windows Task Scheduler with a Python script as the task and the script calling a geoprocessing tool or model.

We're going down the task scheduling path too, but without needing Python.

In the modern era there is a lot of emphasis on service oriented architecture and the ArcGIS stack has comprehensive publication and synchronization capabilities amongst apps, but you're reading this because you're working outside the stack, at least at one end of your synchronization workflow.  You have used Data Interoperability's Workbench app to wrangle services, databases, files and so on to achieve your own private batch 'service'.  You don't have to be the server and click 'Run' too.  Your friend is this guy:

C:\Program Files\ArcGIS\Data Interoperability for ArcGIS Pro\fme.exe

That's right, a big fat executable.  This is the one that does all the work when an ETL tool runs.  You may never have noticed, but when you run an ETL tool while being edited in Workbench, the very first line that appears in the log window is:   Command-line to run this workspace:  followed by the path to our new friend above and the path to the open workbench .fmw file, and any arguments the workspace needs.  Its all there, so lets plug it together.

Lets dispense with some legalities first.  With ArcGIS Pro, Enterprise and OnLine you're living in a world of named user licensing.  Your ETL tool may embed these credentials.  Provided the scheduled task you build automates the ETL tool on the machine you would use to run it interactively there should not be any licensing issues.  If someone else needs to run it they should replace the named user credentials first.

For my example I'm going to recycle an ETL tool example from an earlier post.  I use it to maintain a hosted feature service using data harvested from a Geoserver instance via an extended WFS API.  It has an official refresh rate of once a week, each Saturday local time; I run the ETL tool when I remember to on Monday mornings (hey its only a demo).  Let's automate that.  Mondays are getting problematic for me, I may forget.

The example ETL tool reports the command line I should use to run the workspace is:

"C:\Program Files\ArcGIS\Data Interoperability for ArcGIS Pro\fme.exe" C:\Work\Synchronize\Synchronize.fmw --API_Key "im_not_telling_you_my_api_key" --LDS_Unique_ID "address_id"

Because ETL tools store their parameter values it isn't necessary to supply those arguments if they don't change, so this works too:

"C:\Program Files\ArcGIS\Data Interoperability for ArcGIS Pro\fme.exe" C:\Work\Synchronize\Synchronize.fmw

Now we create the scheduled task.  Open Task Scheduler and fill in the dialogs for a Basic Task:

Adjust the settings how you need:

Tip:  If you configure the task exactly as above a command window like below will pop up, if you don't want this use the setting 'Run whether user is logged on or not'.

While I remember, if you're interested in more ways to batch ETL check out this post.

Now do your bit and come in late Mondays!

Note:  We have had reports from the field that Windows Task Scheduler can be impeded from working by some system security settings.  If you find this and cannot work around them with your IT department, log a support call with Esri and ask the analyst to consult Analyst Knowledge Article 000022373 which has a reference to an alternative scheduling technology.

The new approach:

Please read the Pro 2.5 help topic 'Schedule geoprocessing tools' for details, I'll only show the user interface experience here.  Starting with the same 'Synchronize' ETL tool as in the legacy approach outlined above, I create a scheduled tool from the Run button, here is a screen grab:

Select 'Schedule' and you'll get a configuration dialog:

I set up weekly recurrence like in the first example; to refine the 'Begin On' value the pull-down supplies a handy date-time picker:

I'm done!  How easy is that!  Apart from the obvious ease of setting up the automation you should note that the ETL tool is just a tool, there are no special considerations around handling an ETL tool versus a core geoprocessing tool (or model).  Caveat, if you are using concurrent licensing and scheduling a Python script tool that calls any extensions (Data Interoperability for example) then your code will need a CheckOutExtension() call.
A fine point, don't forget to use appropriate power management (disk shutdown, sleep, hibernate) settings for your scheduling PC, talk this through with your IT folks if you have any doubts, for example it is possible for network administrators to enforce rules for hibernation that override the visible power settings.

Now go ahead and automate stuff!