
This post is about automating repetitive ETL processes right from your desktop.  No code, no server.

 

Note:  This post originally discussed only one way to schedule ETL processing, but with the ArcGIS Pro 2.5 release, due out soon, job scheduling is coming to Desktop geoprocessing right from any tool's Run button!  I'll leave the 'legacy' approach details in the post but do read through to the 'new' approach once you're able to deploy Pro 2.5.

 

The legacy approach:

We're seeing many people using Data Interoperability to periodically synchronize datasets between systems of record.  Typically the source data refresh 'trigger' is driven by a schedule and not some random event, and the frequency of updates is based on multiples of a working day.  If you're on this kind of treadmill this post is for you!

 

You may have heard of this sort of automation in the context of Windows Task Scheduler with a Python script as the task and the script calling a geoprocessing tool or model.
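That script is usually tiny.  A minimal sketch, assuming a hypothetical toolbox path and alias ('mytools') containing the tool:

import arcpy

# Load the toolbox containing the model or ETL tool (path and alias are placeholders)
arcpy.ImportToolbox(r"C:\Work\Synchronize\Synchronize.tbx", "mytools")

# Run the tool; imported tools are callable as <ToolName>_<alias>
arcpy.Synchronize_mytools()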

We're going down the task scheduling path too, but without needing Python.

 

In the modern era there is a lot of emphasis on service-oriented architecture, and the ArcGIS stack has comprehensive publication and synchronization capabilities amongst its apps, but you're reading this because you're working outside the stack, at least at one end of your synchronization workflow.  You have used Data Interoperability's Workbench app to wrangle services, databases, files and so on into your own private batch 'service'.  You shouldn't have to play the server and click 'Run' yourself too.  Your friend is this guy:

 

C:\Program Files\ArcGIS\Data Interoperability for ArcGIS Pro\fme.exe

 

That's right, a big fat executable.  This is the one that does all the work when an ETL tool runs.  You may never have noticed, but when you run an ETL tool while editing it in Workbench, the very first line that appears in the log window is Command-line to run this workspace:, followed by the path to our new friend above, the path to the open workbench .fmw file, and any arguments the workspace needs.  It's all there, so let's plug it together.

 

Let's dispense with some legalities first.  With ArcGIS Pro, Enterprise and Online you're living in a world of named user licensing.  Your ETL tool may embed these credentials.  Provided the scheduled task you build automates the ETL tool on the machine you would use to run it interactively, there should not be any licensing issues.  If someone else needs to run it they should replace the named user credentials first.

 

For my example I'm going to recycle an ETL tool from an earlier post.  I use it to maintain a hosted feature service using data harvested from a GeoServer instance via an extended WFS API.  The source has an official refresh rate of once a week, each Saturday local time; I run the ETL tool when I remember to on Monday mornings (hey, it's only a demo).  Mondays are getting problematic for me, and I may forget.  Let's automate that.

 

The example ETL tool reports the command line I should use to run the workspace is:

 

"C:\Program Files\ArcGIS\Data Interoperability for ArcGIS Pro\fme.exe" C:\Work\Synchronize\Synchronize.fmw --API_Key "im_not_telling_you_my_api_key" --LDS_Unique_ID "address_id"

 

Because ETL tools store their parameter values, it isn't necessary to supply arguments that don't change, so this works too:

 

"C:\Program Files\ArcGIS\Data Interoperability for ArcGIS Pro\fme.exe" C:\Work\Synchronize\Synchronize.fmw

 

Now we create the scheduled task.  Open Task Scheduler and fill in the dialogs for a Basic Task:

 

 

Adjust the settings how you need:

 

 

Tip:  If you configure the task exactly as above, a command window like the one below will pop up; if you don't want this, use the setting 'Run whether user is logged on or not'.

 

 

While I remember, if you're interested in more ways to batch ETL, check out this post.

 

Now do your bit and come in late Mondays!

 

Note:  We have had reports from the field that Windows Task Scheduler can be blocked by some system security settings.  If you run into this and cannot work around it with your IT department, log a support call with Esri and ask the analyst to consult Analyst Knowledge Article 000022373, which references an alternative scheduling technology.

 

The new approach:

Please read the Pro 2.5 help topic 'Schedule geoprocessing tools' for details; I'll only show the user interface experience here.  Starting with the same 'Synchronize' ETL tool as in the legacy approach outlined above, I create a scheduled tool from the Run button.  Here is a screen grab:

 

 

Select 'Schedule' and you'll get a configuration dialog:

 

 

I set up weekly recurrence as in the first example; to refine the 'Begin On' value the pull-down supplies a handy date-time picker:

 

 

I'm done!  How easy is that!  Apart from the obvious ease of setting up the automation, note that the ETL tool is just a tool: there are no special considerations around scheduling an ETL tool versus a core geoprocessing tool (or model).  One caveat: if you are using concurrent licensing and scheduling a Python script tool that calls any extension (Data Interoperability, for example), your code will need a CheckOutExtension() call.
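For that caveat, a minimal sketch of the license handling in such a script tool ('DataInteroperability' is the extension keyword):

import arcpy

# Check out a concurrent Data Interoperability license before using it
if arcpy.CheckExtension("DataInteroperability") == "Available":
    arcpy.CheckOutExtension("DataInteroperability")
else:
    raise RuntimeError("No Data Interoperability license available")

# ... call the ETL tool here ...

# Return the license to the pool when done
arcpy.CheckInExtension("DataInteroperability")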
A fine point: don't forget to use appropriate power management settings (disk shutdown, sleep, hibernate) on your scheduling PC.  Talk this through with your IT folks if you have any doubts; for example, network administrators can enforce hibernation rules that override the visible power settings.

 

Now go ahead and automate stuff!

Spatially constrained ETL:

Let me get you through one paragraph of background before we get to the fun stuff.  In an earlier video I included an example of capturing a spatial constraint from the active ArcGIS Pro map or scene and sending it into an ETL workspace.  The sample happened to be working with a WFS service; these have a bounding box parameter that can constrain the features retrieved.  WFS services also support more complex spatial operators, which can be used with arbitrary geometry operands supplied as GML fragments; however, unless you know how to put all the required XML together for WFS requests, you'll be like me: terrified of attempting it.  ArcGIS Pro 2.3 itself only supports a bounding box constraint on WFS services.

 

Spatial constraints are a lot easier with feature services.  This blog will show you how easy.

 

Core geoprocessing has supported feature services as input parameters for several releases now, so why bother using Spatial ETL against feature services at all?  Well, perhaps your feature service is heading out the door as some other format, or you are using transformations that require Data Interoperability, or your feature service is very large and you don't want to use selections to subset it.  I just helped one customer who needed to dynamically handle a spatial constraint mid-ETL with a FeatureReader transformer (more on that below).  There are many use cases.

 

Data Interoperability is all about code-free approaches, but I'll take a wee diversion into feature service REST API query parameters so you understand what goes on.  Below is a screen shot of the HTML view of a feature service Query endpoint.  Note there is an Input Geometry parameter (supplied as JSON) and you can set how it is used; in my case it is a Polygon for which I want only features satisfying the constraint Intersects.

 

 

So, the trick with applying spatial constraints to feature services is just supplying the geometry!
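To make that concrete, here is a sketch of the same query issued with Python's requests package; the service URL and polygon are hypothetical placeholders, but the parameter names are the Query endpoint's own:

import json
import requests

# Hypothetical feature service layer Query endpoint
url = "https://services.arcgis.com/xxxx/arcgis/rest/services/Parcels/FeatureServer/0/query"

# An Esri JSON polygon to use as the Input Geometry
polygon = {"rings": [[[174.76, -36.85], [174.76, -36.83], [174.78, -36.83],
                      [174.78, -36.85], [174.76, -36.85]]],
           "spatialReference": {"wkid": 4326}}

params = {
    "geometry": json.dumps(polygon),           # Input Geometry, supplied as JSON
    "geometryType": "esriGeometryPolygon",     # how to interpret the geometry
    "spatialRel": "esriSpatialRelIntersects",  # only features that intersect it
    "outFields": "*",
    "f": "json",
}

features = requests.get(url, params=params).json()["features"]
print(len(features), "features intersect the polygon")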

 

In the blog download (Pro 2.3+) you'll find the sample tool used, but the approach is very simple; just apply it yourself in your own models.  Click to enlarge this graphic to see the map I used, the feature set in the map and table of contents, and the model run as a tool.  The feature set is driving the analysis geometry automatically.

 

 

The tool being used is the model named SpatiallyConstrainedGP, which has an input parameter of type Feature Set.  At run time you supply a value by choosing a layer or feature class, or by creating a feature manually by editing in the map.

 

 

SpatiallyConstrainedGP wraps the ETL tool SpatiallyConstrainedETL like this; there is a Calculate Value model tool between the input feature set and the ETL tool:

 

 

All that happens in Calculate Value is that the input feature set is turned into a JSON string with a Python snippet:
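If you want to reproduce it, a Calculate Value expression along these lines would do the job; the model variable name 'Input Features' is a placeholder for whatever yours is called.  The expression is getJSON(r"%Input Features%"), with a code block of:

import arcpy

# Turn the model's feature set variable into an Esri JSON string
def getJSON(feature_set):
    return arcpy.FeatureSet(feature_set).JSON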

 

 

The JSON is then supplied to the published ETL tool parameter Input Geometry (remember the Query endpoint!) and...

 

 

...the ETL tool does its stuff, considering only features intersecting my feature set...

 

 

...which is to make a spreadsheet summarizing parcel area totals for each value of an attribute:

 

 

So that's it: just grab JSON from the map when you need to supply a feature service reader with an Input Geometry parameter.  If you are using a FeatureReader transformer to read a feature service, the workflow is a little different: you'll need to convert the JSON into an actual FME feature with a GeometryReplacer (the geometry encoding is Esri JSON) and supply it as the initiator Spatial Filter constraint of the FeatureReader, like this:

 

 

Now you can apply map-driven spatial constraints to your ETL!

Reprojecting LAS point clouds:

The Data Interoperability extension sees point cloud data, such as ASPRS LAS and Esri LAS dataset, as its own feature type, just like many other formats.  Here is some on a coastline - surf's up!  (Look above the headland.)

 

Some high density LiDAR on a coastline

 

Formats are designed to deliver specific capabilities, but all geospatial formats have something in common - a coordinate system - and your GIS needs to be able to manage it.  LAS data is a bit of an outlier here: we expect ArcGIS users to collect their data in the coordinate system they intend to use, and stick with it.  But when the 'ground moves' (literally, through plate drift or earthquakes, or when a new datum or realization is published) you need to reproject, and ArcGIS's comprehensive core projection tools don't yet support the format.

 

A situation we hear about is people who have LAS data in ellipsoidal heights (say WGS84) and want to generate DEMs in orthometric heights.  Orthometric heights are gravity-defined and approximate height above mean sea level, so they are important if you need to model coastal or estuarine flooding, for example.  You can always create a DEM and reproject its vertical coordinate system with the geoid grids delivered by the ArcGIS Coordinate Systems Data install, or your own local ones, but that leaves the LAS data behind.

 

Your LiDAR vendor would be pleased to reprocess your LAS data, but you can do it yourself with the ArcGIS Data Interoperability extension.  The secret is in this transformer - CsmapReprojector:

 

The CsmapReprojector transformer

 

In the blog download there is a sample specific to accommodating a new vertical datum for New Zealand, but read between the lines in the document delivered with the download, leverage the vertical grids delivered in the Coordinate Systems Data install (or geoid grids you obtain locally), and reproject your LAS data how you need.

 

Then when a point says it's floating you can trust it (bad IT pun).

 

Floating point that has nothing to do with a computer data type!

 

Note:  The blog download and the Geoprocessing gallery sample here are equivalent.