
Earthquakes definitely fall into the 'hard to see' category, but they are also tricky to get right in your GIS.

 

You can easily find earthquake data: government agencies offer feeds and historic databases from which you can extract it.  This is great for 2D maps, but the Z (vertical) coordinates are often given as positive depth values in kilometers, 'going the wrong way' for the normal positive-up coordinate system.  Another wrinkle is that the default Z domain for geodatabases has a minimum of -100,000, and the lithosphere extends below this depth in meters, so you can lose features on the way in.
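The conversion itself is just a sign flip and a unit change before loading. A minimal sketch in Python (the function name is mine, not from any feed's API):

```python
def quake_z_meters(depth_km):
    """Convert a 'positive down' hypocenter depth in kilometers to a
    'positive up' Z coordinate in meters for use in a 3D scene."""
    return -1000.0 * float(depth_km)
```

A 33 km deep quake becomes Z = -33,000 m, which fits the default geodatabase Z domain; it is depths beyond 100 km that fall off the bottom of the default -100,000 minimum.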

 

I'm not going to do a big post on coordinate systems; I'm just going to throw a couple of things over the fence for you to look at.  First, watch the movie file in the blog downloads.  A few years ago I was involved in adjusting GIS data after an earthquake moved the ground (a lot; over 6m in some places).  Watch a year's worth of quakes go by, then fly to where a lot of deformation occurred after a severe one; you'll fly past labels of movement values and to a homestead that shifted.  The apparent sudden jump of the property is real: what you'll see is high resolution orthophotography before and after the adjustment work (it didn't have to be re-flown, just adjusted).

 

 

The movie was exported from an ArcGIS Pro 3D Scene, but this was only possible with correct 3D points for the quakes.  That data was made from a GeoJSON download processed with the Spatial ETL tool Quakes2016.fmw, which is the second download file.

 

It's a really simple workspace....

 

 

...until you go to the Tool Parameters > Scripting > Startup Script setting and see a bit of fancy footwork: a custom Feature Dataset is made in the output geodatabase with a Z domain that goes to the center of the earth.  The takeaways are that you might not have known about startup scripts, and that you can use one to operate on workspace parameters.
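A startup script is just Python that runs before the workspace does. A hedged sketch of the geodatabase part (the dataset name, WKID and exact bounds are my assumptions, not the contents of Quakes2016.fmw; requires arcpy at run time):

```python
# Sketch of a startup-script helper that makes a feature dataset whose
# Z domain reaches the center of the earth instead of the -100,000 default.
EARTH_RADIUS_M = 6_371_000  # approximate mean earth radius; an assumption

def z_domain():
    """Z bounds deep enough for any hypocenter: (zmin, zmax) in meters."""
    return (-EARTH_RADIUS_M, 100_000)

def create_quake_dataset(gdb, name="Quakes", wkid=4326):
    """Create the feature dataset with an extended Z domain (needs arcpy)."""
    import arcpy
    sr = arcpy.SpatialReference(wkid)
    zmin, zmax = z_domain()
    sr.setZDomain(zmin, zmax)   # widen the Z domain before creating the dataset
    arcpy.management.CreateFeatureDataset(gdb, name, sr)
```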

 

 

 

Please comment on the post with your experiences and ideas.

Dataset management in ArcGIS has plenty of supporting tools and workflows, but when you don't have control for any reason you may be the person who has to figure out what data changed, and where.

 

This blog is about a tool published in the ArcGIS Online sample galleries for bulk change detection between pairs of feature classes.

 

My first example datasets are two parcel feature classes, where one has been revised with survey and subdivision work, but without any edit tracking fields - the data is not managed in ArcGIS.  The maps are named for their content, Original has the old data, Revised has the new data.

 

 

The two datasets have about 650,000 features each over a huge area, so visual comparison is impossible, especially as I need to compare attributes too.  The Feature Compare geoprocessing tool is an option if my data has a unique key field to sort on (it does), but its output is a table, and I want features.

 

The Pro Change Detector tool delivers flexible change detection between two feature classes with your choice of attribute and geometry comparison, and outputs feature classes of Adds, Deletes, Updates and NoChanges.  (Updates are only detectable if the data has a unique key field separate from ObjectID; without a key field, updates are output as spatially overlapping deletes and adds.)
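The logic behind those four outputs is easy to picture. A toy sketch of key-based change detection in plain Python (an illustration of the idea, not the tool's implementation):

```python
def detect_changes(original, revised):
    """original/revised: {unique_key: attribute_tuple}.
    Returns (adds, deletes, updates, no_changes) as sets of keys."""
    adds = set(revised) - set(original)
    deletes = set(original) - set(revised)
    common = set(original) & set(revised)
    updates = {k for k in common if original[k] != revised[k]}
    return adds, deletes, updates, common - updates
```

Without the unique key there is nothing to pair rows on, which is why updates then degrade to overlapping deletes and adds.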

 

The tool requires the ArcGIS Data Interoperability extension, but you don't have to learn to drive the Workbench application delivered with it; this sample is just a normal Python script tool.

 

For my parcel data I chose all the attributes to be considered as well as geometry:

 

 

Then, 7½ minutes later, after comparing ~650,000 features per input, I had my change sets:

 

 

You can compare any geometry type, but if you are going to do change detection on multiple pairs of feature classes, be sure to change the output object names, as the tool will overwrite its outputs.  Alternatively, keep your data in separate project databases (see below).

 

For a second example I decided to 'go big' and compare two street address datasets each with about 2 million features and a lot of attributes:

 

 

Now it's 22 minutes to find a couple of thousand changes in 2 million features:

 

 

...and in the map it is easy to find a locality where subdivision has resulted in new addresses being created - see the extra address points in the Revised map:

 

 

To use the tool your data must be in a single File Geodatabase.  Here is how my Catalog pane looks; note that to preserve my change sets I used two separate databases in the Project.

 

 

The tool was created with ArcGIS Pro 2.5 beta 2 software (sharp-eyed people will see the new style geoprocessing Details view above) but works in Pro 2.4.  You will need ArcGIS Data Interoperability installed and licensed, and you'll need permission to copy a file into your Pro software install; please see the README file in the download.

 

Now go detect some changes and comment in this blog how you get on!

Many organizations publish OGC WFS services as one option for data supply, either to the general public or to a restricted audience.  Often, however, these services are intended for large scale mapping, such as within a single municipality, and bulk download at national scale is not supported: either a maximum feature collection size per request is set on the server, or response paging is not supported, so an out-of-the-box client is not going to deliver an entire dataset.  Sometimes, even where these restrictions are absent, assembling and delivering a request for a large feature collection is beyond the capability of the server or network settings (by design), or the client app doesn't support paging.  (Full disclosure: WFS 2.0.0 response paging is coming to core ArcGIS Pro in a future release; the Data Interoperability extension already supports WFS 2.0.0 paging if the server provides next/previous URLs.)

 

This blog is about using ArcGIS Data Interoperability to work around these limitations to achieve repeatable bulk download of WFS data at any scale.  You will need solid Data Interoperability (or FME) skills to implement this workflow, or be willing to learn from the content of the blog download.

 

At this point I need to show you a map or you'll go do something else, so I bring you today's subject matter - Norway!

 

 

It's necessary to use a real world example, and the people at GeoNorge have excellent public WFS services that let me show the issues, so Norway is it.  Browsing their site I settled on a road network service.  Here is how to get there yourself, while optionally learning a little Norwegian.  Go to GeoNorge (drop the '/en' if your Norwegian is up to it), click on Go to the map catalogue, then in the selector pane on the left choose Type = Service, Topic = Transportation, Distribution form = WFS Service, then of the available services click on ELF Road Transport Network.  Scroll down and you'll see the Get Capabilities Url:  https://wfs.geonorge.no/skwms1/wfs.inspire-tn-ro?request=GetCapabilities&service=WFS.

If you don't know OGC standards, be thankful; that's our job!  The URL above is a typical pattern: the XML document returned advertises what the WFS service can do.  You know I'm going to make you click on the above URL and inspect the response, don't you?  But before the excitement of XML, we'll go off road here and begin to understand the problem a little better.
Here is a map of 50 food businesses within 500m walking distance of the Royal Palace in Oslo.  I detect a pattern of having to walk north or south of the palace for lunch, which is interesting; maybe it's a function of having to cross a major road bisecting the area.  But my main point is that downtown Oslo has a lot of roads you can walk alongside, whereas up in the Arctic Circle, not so many (no map, but trust me).  We're going to need a way to read the WFS road transport service in chunks, such that we don't request more than the service response limit in cities and don't make unnecessary requests in areas with few roads.  We're going to design a tiled WFS reading strategy.
OK now click on the GetCapabilities URL and look for these things:
We cannot request pages:
We can only get 10000 features at a time:
We can retrieve tn-ro:RoadLink feature types in a wide variety of coordinate systems over a huge area:
We can request features within a Bounding Box (BBOX):
Now for an exercise.  Open the Workbench app from the Analysis ribbon (Data Interoperability will need to be installed and licensed) and add a WFS Reader using these parameters (GetCapabilities URL, WFS Version 2.0.0, RoadLink feature type, no MaxFeatures).  Connect a logger to the reader, there is no need to write anything.
Run the workspace, you will see this URL is generated and you'll get a download containing 10000 features.
Now add the URL to your browser, then edit the URL to add a parameter 'resultType=hits'.  This is a special request to count the number of features available in the service.  Run the edited URL in your browser and you'll get a response like this:
See the numberMatched property - 1,976,423 RoadLink features are available.
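You can script the same check.  A sketch that builds GetFeature URLs (with an optional BBOX or resultType=hits) and reads numberMatched from a response; parsing is shown against a saved response rather than a live request:

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE = "https://wfs.geonorge.no/skwms1/wfs.inspire-tn-ro"

def get_feature_url(type_name="tn-ro:RoadLink", bbox=None, hits=False):
    """Build a WFS 2.0.0 GetFeature URL, optionally constrained by a
    BBOX tuple or turned into a count-only resultType=hits request."""
    params = {"service": "WFS", "version": "2.0.0",
              "request": "GetFeature", "typeNames": type_name}
    if bbox:
        params["BBOX"] = ",".join(str(v) for v in bbox)
    if hits:
        params["resultType"] = "hits"
    return BASE + "?" + urlencode(params)

def number_matched(response_xml):
    """Pull the numberMatched attribute from a GetFeature hits response."""
    return int(ET.fromstring(response_xml).attrib["numberMatched"])
```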
Norway has a land area of ~385,000 square kilometers, so on average ~5 road link features per square kilometer, and on average ~2,000 square kilometers will contain ~10,000 road links, the WFS service limit: roughly a 45km square.  It is going to take a much larger area in the country's north to contain 10,000 features.  Using the scientific method of picking a convenient number out of thin air that is the right order of magnitude, my starting point for a WFS-reading tiling scheme was a 100km square fishnet, made with the Create Fishnet geoprocessing tool (cells that do not intersect land are deleted, and I went with the ETRS 1989 UTM Zone 33N projection, which is EPSG:25833 in the service properties):
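The back-of-the-envelope arithmetic, spelled out:

```python
import math

total_features = 1_976_423   # numberMatched from the hits request
land_area_km2 = 385_000      # Norway's land area, roughly
service_limit = 10_000       # max features per WFS response

density = total_features / land_area_km2   # ~5.1 road links per square km
area_for_limit = service_limit / density   # ~1,950 square km averages 10,000 links
side_km = math.sqrt(area_for_limit)        # ~44 km, i.e. roughly a 45km square
```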
Notice I added some fields (XMin, YMin, XMax, YMax, RoadCount) to the fishnet and set the initial values for the coordinate bounds fields (using Python snippets included in the blog download).  These bounds are going to be used as Bounding Box parameter inputs in WFS requests.  Now I need a workflow to refine the fishnet so cells are subdivided progressively until fewer than 10,000 road link features fall in each.  First I need to figure out the methodology of reading the WFS service in an extent....
If you open Workbench and drag in BasicGetFeatureWithBBOX.fmw from the blog download you'll see a WFS reader with the properties I needed to inspect a GetFeature URL.  The workspace looks like this:
Under the reader you can see how I replicated the GetFeature URL in an HTTPCaller but parameterized the BBOX values.  I used a fishnet cell extent containing the city of Trondheim.  The download format is GML; I used the Quick Import geoprocessing tool (available with Data Interoperability) to translate the GML into a file geodatabase.  Here are 10,000 road links around Trondheim:
Now I have the building blocks of a tiled WFS reader.  And here it is!  ReadWFSFeatures.fmw:
The Spatial ETL tool reads RoadLink features in fishnet cells selected by a WHERE clause, here is the first pass reading features in all cells:
I can see not all 100km cells intersect roads - these are the ones you can see selected in the fishnet layer - so they can be deleted.  Now the work of refining the fishnet begins.
The iterative workflow is this (be very careful!):
  • Run ReadWFSFeatures.fmw with a WHERE clause selecting the smallest cell size (initially Shape_Length = 400000, then 200000 when those cells are made, then 100000 when those are made in a subsequent step below...)
  • Add the output RoadLink feature class to your map
  • Run RoadCount.py in the Python window to populate RoadCount in NO_Fishnet
  • Select NO_Fishnet features with RoadCount >= 9000 (undershooting 10,000 to allow for road construction)
  • If there are no NO_Fishnet features selected then BREAK - you are finished making the fishnet
  • Run MinimumBoundingFishnet to create a separate fishnet with cells half the width/height of the previous minimum; it is important the selection on NO_Fishnet is still active
  • Run Delete Features on the selected NO_Fishnet cells
  • Run Append to add the generated smaller fishnet cells to NO_Fishnet, using the field map option.
  • Run SetExtentAttributes.py in the Python window to recalculate the boundary coordinates
  • Delete the RoadLink feature class
  • Go back to the first step
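The refinement loop above is a quadtree-style subdivision.  A pure-Python sketch of the logic (the actual workflow uses the geoprocessing tools listed, not this code; road_count stands in for RoadCount.py):

```python
def subdivide(cell):
    """Split an (xmin, ymin, xmax, ymax) cell into four half-size cells."""
    xmin, ymin, xmax, ymax = cell
    xm, ym = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0
    return [(xmin, ymin, xm, ym), (xm, ymin, xmax, ym),
            (xmin, ym, xm, ymax), (xm, ym, xmax, ymax)]

def refine(cells, road_count, limit=9000):
    """Subdivide any cell whose road count reaches the limit until
    every remaining cell is under it."""
    work, done = list(cells), []
    while work:
        cell = work.pop()
        if road_count(cell) >= limit:
            work.extend(subdivide(cell))   # half the width/height, 4 children
        else:
            done.append(cell)              # this cell is finished
    return done
```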
The first subdivision of fishnet cells into 50km square features with MinimumBoundingFishnet looks like this:
After looping through the fishnet refinement process until no cells contain more than 9,000 roads, you can run ReadWFSFeatures.fmw with a WHERE clause that selects all fishnet cells and create the complete RoadLink feature class.  Finally run RoadCount.py to populate NO_Fishnet with how many road segments intersect each cell.  See if there are any cells with RoadCount = 0 and if you think roads will never be built there then delete the cells, but you'll have to be Norwegian to make that judgement.
Downloading all features took exactly one hour, and exactly 1,976,423 features arrived, just as advertised by the WFS service.  Here is how the data looks, with the labels being the final road count:
The fishnet can be repurposed to access other WFS features from the GeoNorge agency, and the methodology applied to any WFS service that cannot supply a complete dataset with core approaches.
This post was created using ArcGIS Pro 2.5 beta 2 software, but the .fmw files should work in Pro 2.4.  If the MinimumBoundingFishnet tool doesn't work for you, download a fresh copy from here.

The National Emergency Number Association promulgates GIS standards for datasets that support public safety operations in the USA.  A principal example is Civic Location Data Exchange Format (CLDXF).  Digging in further we can find a well defined data model for address points. The problem we're tackling in this blog is how to directly use data maintained in this schema to create ArcGIS geocoding locators without anyone having to construct complex ETL processes and copy data around repetitively.

 

The workflow requires your NENA data be maintained in an Enterprise Geodatabase, and there is a disclaimer: the full granularity of subaddress elements in the NENA schema is not supported.  At the time of writing (the Pro 2.4.1 release) only one pair of subaddress type & identifier values is supported, but the sample demonstrates how three pairs of type & identifier values can be handled, as from the Pro 2.5 release locators will support that many subaddress fields.  My test data (the counties of Kings, Queens, Nassau and Suffolk in New York, thanks to NYS GIS Clearing House) has units (apartments etc.), levels (floors, basements etc.) and building units (rooms, annexes etc.).  Building name is usable too, and seat-in-the-room and additional location data is retained and may be output by a locator, but is not used for searching.

 

Before we go further, why doesn't Esri just design the Create Locator tool to accept all the NENA fields?  The short answer is that the tool's parameters have to be internationally applicable, so adding them all would overload it.

 

I said 'no ETL required'.  Hopefully that is true for you, and for my test data it would be if I had access to the database, but what I often see in the wild is things like empty strings and blank values in character fields.  So I like to enforce proper null values and fix invalid date values with a bit of processing in the Data Interoperability extension.  In the screen captures below (click on images to enlarge) I'm making sure empty data is null as I import my test data to my EGDB.

 

 

 

 

The only other things I did with my ETL were to rename fields to lower case (which PostgreSQL, my EGDB platform, likes) and to make a couple of fields wider (pretype, posttype) in case my concatenations overflow them.  Make sure domains don't bite you too; you'll be adding new values to the pretype and posttype fields.  Having said that, I see in the data view of my layer that the character fields have arbitrary widths of 255 characters, so I'm not sure if the input field definitions are honored, or that views have any concept of domains; this may be platform dependent.  Anyway, that gets me to what should be your starting point: I have NENA-schema address points in my EGDB and I want to make a locator.

 

The secret sauce here is creating a view in my DBMS that performs all the manipulations necessary to rename, cast, substring and concatenate data into a schema directly usable in ArcGIS Pro as a feature layer input to the Create Locator geoprocessing tool, using the Point Address data role.

 

I seldom descend into SQL to this depth so to develop my view I built it up in pgAdmin (you'll need whatever SQL authoring tool comes with your DBMS), going field by field and inspecting the result in Pro as I went.  Tip:  you can recreate your view in pgAdmin and leave it in Pro's table of contents and just reset the layer source each time you want to view it - it will refresh in the map.

 

 

The blog download has the pgAdmin SQL source - esri_view.sql - and you can inspect the comments to understand the logic.  Basically, the fields specific to NENA that cannot be mapped to Point Address role inputs have their values passed into other fields, and fields combining type & identifier values are parsed into separate fields for each.  The SQL will need to be ported to your environment, but it's pretty standard stuff.
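Much of the view boils down to NULL-safe renaming and concatenation.  The same idea expressed in Python terms rather than SQL (an illustration with made-up field values, not the actual esri_view.sql):

```python
def full_street_name(pre_dir=None, pre_type=None, name=None,
                     post_type=None, post_dir=None):
    """NULL-safe concatenation, like SQL's concat_ws(' ', ...):
    missing parts simply drop out instead of producing 'None'
    values or doubled spaces."""
    parts = (pre_dir, pre_type, name, post_type, post_dir)
    return " ".join(p for p in parts if p)
```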

 

If you are a SQL wizard and can go straight to a SELECT statement then you could use the Create Database View tool and input the view definition.  The edited source (no comments in it) is the file test_view.sql in the download.  No prizes for user interface design but it works:

 

 

Having created the view, add it to your map and specify the ObjectID field as the unique identifier:

 

 

 

Let it index and you have your (dynamic) view of NENA data in your map as a feature layer:

 

 

You can see why I had to widen the type fields; check out '1375 Sunrise Hwy Westbound Service Road, Islip, NY, 11706':

 

 

Anyway, run Create Locator (hard to make an exciting graphic but hopefully useful):

 

arcpy.geocoding.CreateLocator("USA", "nena.sde.esri_view PointAddress", @"""PointAddress.ADDRESS_JOIN_ID 'nena.sde.esri_view'.address_id"";""PointAddress.HOUSE_NUMBER 'nena.sde.esri_view'.house_number"";""PointAddress.BUILDING_NAME 'nena.sde.esri_view'.building_name"";""PointAddress.STREET_NAME_JOIN_ID 'nena.sde.esri_view'.street_id"";""PointAddress.STREET_PREFIX_DIR 'nena.sde.esri_view'.prefix_direction"";""PointAddress.STREET_PREFIX_TYPE 'nena.sde.esri_view'.prefix_type"";""PointAddress.STREET_NAME 'nena.sde.esri_view'.street_name"";""PointAddress.STREET_SUFFIX_TYPE 'nena.sde.esri_view'.suffix_type"";""PointAddress.STREET_SUFFIX_DIR 'nena.sde.esri_view'.suffix_direction"";""PointAddress.SUB_ADDRESS_UNIT 'nena.sde.esri_view'.unit"";""PointAddress.SUB_ADDRESS_UNIT_TYPE 'nena.sde.esri_view'.unit_type"";""PointAddress.NEIGHBORHOOD 'nena.sde.esri_view'.neighborhood"";""PointAddress.CITY 'nena.sde.esri_view'.city"";""PointAddress.METRO_AREA 'nena.sde.esri_view'.metro_area"";""PointAddress.SUBREGION 'nena.sde.esri_view'.county"";""PointAddress.REGION 'nena.sde.esri_view'.state"";""PointAddress.POSTAL 'nena.sde.esri_view'.zipcode"";""PointAddress.COUNTRY 'nena.sde.esri_view'.country""", r"C:\Work\Product_Management\Address_Management\Nena", "ENG", None, None, None)

 

Then geocode!

 

Units work:

 

285 Asharoken Ave, #1, Huntington, NY, 11768

 

 

Fancy house numbers work:

 

5 1/2 Locust Ave, Brookhaven, NY, 11790

 

 

Building names work:

 

Building 22A, John F Kennedy Airport, New York, NY, 11430

 

 

So there you have it, maintain your data in NENA compliance and use it to geocode.

 

But wait, there's more!  In response to the blog commentary around handling aliasing the download has been updated to add the SQL source esri_views.sql that creates an alternate city name table, used as below in Create Locator - see the Alternate Name Tables section:

 

 

Ignore the warning chip in the dialog capture, that just appears after locator creation to indicate you'll overwrite the output if you re-run the tool.

 

The wisdom of harvesting alternate city names from as many fields as I did can be debated, but hopefully you get the idea: the various NENA fields for zone values can be presented in a view suitable for use as alternate name roles.  In production, it would be more efficient to create an alternate city name table from centerline data and join to it on street_id.

 

Here is the view used as the alternate city name table:

 

 

The address with address_id = 'KIN0000001' is '463 Maspeth Ave, New York, NY, 11211'.  Using the city alias 'Brooklyn' works, with score = 100:

 

 

Additionally, I took a question off-line about maintaining all parts of addresses defined in the FGDC standard such as prefix and suffix address number parts, street name separator elements, pre-modifiers and post-modifiers.  If you want to output these elements when geocoding then define them as custom output fields for your locators.  This functionality is available in the tool as the last parameter, but you'll also need to supply source fields in the field map for each output.

I output seat and additional_location in my locator, which would let me work on candidates if that's what I needed.

 

GeoNet Ideas contains many customer requests for ODBC connectivity from Pro to databases that are not supported ArcGIS workspaces.  This blog is about implementing read-only import of ODBC data sources to Geodatabase.

 

Thumbnail:  We'll use a scripted approach, creating a Python script tool in a Pro toolbox.  You could make this a standalone script and a scheduled task.  The coding isn't scary.  You'll need permission to create an ODBC data source on your computer, and if you need to publish this to ArcGIS Enterprise the data source will need to be set up there too.  If multiple users need to use the tool on your machine the ODBC data source will need to be a system one.  Off we go...

 

The ArcGIS Pro Python environment ships with a module named osgeo, from the OSGeo organization.  This supplies the GDAL libraries that support conversion of hundreds of geospatial formats; one of the supported sources is ODBC, which isn't a 'format' of course, but handles moving tabular data around.

 

For my example source I chose MariaDB, a binary equivalent to MySQL.  After installing MariaDB and the appropriate 64-bit ODBC driver, I imported some CSV data and created a user-level ODBC data source in Windows.  Here is how the admin tool looks (click on images to enlarge them):

 

 

MariaDB ships with a handy administration utility - HeidiSQL - here is how the Data view of my target data looks in HeidiSQL (the names and addresses are made up):

 

 

So that's my target data; now how to get at it?  To understand what the osgeo module needs to connect to my ODBC source, I researched the relevant vector driver.  With a little more surfing of examples and the osgeo.ogr submodule API, the parts were apparent.  Next step - code it!  Here is the result in my project:
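Here is the shape of that code, hedged: the DSN and table names are placeholders, and the OGR-to-ArcGIS type map is a small subset you would extend for your own data.

```python
# OGR field-type codes (osgeo.ogr.OFT*) mapped to ArcGIS field types.
# A subset and an assumption -- adjust for your data.
OGR_TO_ARCGIS = {0: "LONG",    # OFTInteger
                 2: "DOUBLE",  # OFTReal
                 4: "TEXT",    # OFTString
                 9: "DATE",    # OFTDate
                 11: "DATE"}   # OFTDateTime

def arcgis_field_type(ogr_code):
    """Best-effort mapping; unknown types fall back to TEXT."""
    return OGR_TO_ARCGIS.get(ogr_code, "TEXT")

def read_odbc_rows(dsn, table):
    """Yield rows as dicts from an ODBC source via GDAL/OGR (needs osgeo)."""
    from osgeo import ogr
    ds = ogr.Open("ODBC:" + dsn)   # e.g. 'ODBC:MariaDB_Test' -- placeholder DSN
    layer = ds.GetLayerByName(table)
    defn = layer.GetLayerDefn()
    names = [defn.GetFieldDefn(i).GetName() for i in range(defn.GetFieldCount())]
    for feature in layer:
        yield {n: feature.GetField(n) for n in names}
```

From there an arcpy insert cursor can write the rows to a geodatabase table whose fields were created with the mapped types.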

 

 

The blog download has the tool and source, plus the CSV data I used.  Disclaimer: this is a very simple example without any defensive code to handle variability in the input data.  The idea is to give you confidence you can script a repeatable workflow.

 

How did I do?   I run the tool:

 

 

...and the output table is created in my project home geodatabase.

 

 

Success!  I imported 6000 rows in about 8 seconds.  So that is the pattern I wanted to show.  The approach will handle more data types than just the string and integer values I used, and it is quite likely the part of my code where I map OGR field types to ArcGIS field types has issues.  Please do comment in this blog space on your challenges and successes.

 

Now for the optional extra - Access databases!

 

I have 64-bit Office on my machine and also Microsoft Access Runtime 2013 installed; I'm not entirely sure if both are needed or just one, but my ODBC data source options include .mdb and .accdb.  Otherwise the pattern for reading Access databases is the same as above.  I configured an ODBC MS Access Database connection in the 64-bit ODBC administrator to connect to an .accdb database on disk.  I possibly should have added a new one and given it a descriptive name, but you get the idea.  From there it is just like any other ODBC source, except it has a dependency on 64-bit Office and/or the runtime driver.

 

Everyone likes SQLite databases: they are a single file, perform and scale well, support enough SQL to be useful, and have a DB API-compliant Python module and API access in other languages.  SQLite databases power a lot of mobile and desktop apps, ArcGIS Pro included.

 

SQLite as a container has an incarnation - OGC GeoPackage - that supports the encoding of vector and raster features for direct use in ArcGIS Pro.  You can read about the standard here.

 

The GIS format most often compared with GeoPackage is the Esri-defined shapefile.  Shapefile is the most shared GIS format on the planet and its encoding of vector features is published.  Note however the publication date - 1998.  At the time the shapefile was designed, the components available had limitations that can frustrate today's advanced workflows.  These include file size limit, attribute field count and name width limits, dates not supporting time, complexity in handling character encodings and lack of null value support for most field types.  Shapefile has been spectacularly successful for handling simple vector features, but it can be limiting.

 

I think of GeoPackage as the new shapefile without the old limitations, and I encourage you to as well; it is a great format for, well, geo-packaging!  However, don't go as far as thinking it is a full blown GIS workspace; it doesn't have geodatabase behaviors like support for editing.  What it does, it does well: let you move data around in a directly accessible and performant database.

 

GeoPackage is extensible, and there are approved OGC extensions for gridded tiles of elevation data and table relationships, and non-approved community extensions such as map styling of features, and storing vector tiles.  ArcGIS Pro does not yet implement support for any GeoPackage extensions (excepting aspatial table functionality adopted in the v1.2 release).

 

What can you do reliably with a GeoPackage in ArcGIS?

 

  • Share vector and raster data
    • As a direct-read layer source in Pro
    • As a static item in your portal or ArcGIS Online
  • Use vector and raster data in read-only workflows
    • View, symbolize, query... in Pro

 

What can you do with limitations with a GeoPackage in ArcGIS?

 

  • Perform Geoprocessing in Pro
  • Manage table and feature class schemas in Pro

 

What can you not do with a GeoPackage in ArcGIS?

 

 

More on the limitations:  basically, if an analysis just reads GeoPackage data and creates an output in a geodatabase, it should work; for example, Summary Statistics, Feature To Point and Polygon To Line work fine.  Geoprocessing tools that make use of geodatabase behavior may fail with GeoPackage data, for example Feature Class To Feature Class, Copy Features and Table To Table.  You can add fields and calculate values with geoprocessing tools or arcpy, but you may find it slower than native geodatabase operations.  Geometry storage in a GeoPackage is not compressed like a geodatabase's, so files can get big.  The recommendation is to do your geoprocessing before creating your GeoPackage, then copy your data into it.

 

If Feature Class to Feature Class and Table to Table might fail how do you get your data into a GeoPackage?  Firstly, get your data into the state you need it for sharing.  Then move your data into the GeoPackage like this:

 

  • Create a GeoPackage with the Create SQLite Workspace tool (using the GeoPackage spatial type)
  • Use the Copy tool (Data Management, General toolset) to add vector data
  • Use the Add Raster to GeoPackage tool (Conversion, To GeoPackage toolset) to add raster mosaics
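Scripted, those three steps look roughly like this (a sketch requiring arcpy; the folder, names and the gpkg_path helper are illustrative):

```python
import os

def gpkg_path(folder, name):
    """Full path for the new GeoPackage file."""
    out = os.path.join(folder, name)
    return out if out.endswith(".gpkg") else out + ".gpkg"

def build_geopackage(folder, name, feature_classes, rasters=()):
    """Create a GeoPackage and copy vector and raster data into it."""
    import arcpy
    gpkg = arcpy.management.CreateSQLiteDatabase(
        gpkg_path(folder, name), "GEOPACKAGE")[0]
    for fc in feature_classes:
        # Copy, not Feature Class To Feature Class, which may fail here
        arcpy.management.Copy(fc, os.path.join(gpkg, os.path.basename(fc)))
    for ras in rasters:
        arcpy.conversion.AddRasterToGeoPackage(ras, gpkg, os.path.basename(ras))
    return gpkg
```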

 

You can also use the copy/paste context menu behavior of the Catalog pane tree to move data into a GeoPackage.

 

Your GeoPackage is now ready for use!

 

Note on sharing:  You can upload a .gpkg file to your portal or ArcGIS Online, the file type will be recognized.  You can send a link after sharing the item and others can then download it from the content gallery.

 

Advanced topic:  Because it is based on SQLite, GeoPackage comes with a database engine and good SQL language support.  There are 3rd party tools for working with SQLite which you may find useful, but to include a spatial component in your work the ArcGIS Data Interoperability or Safe Software FME products support scripting SELECT, CREATE, DROP, DUPLICATE, TRUNCATE and CROSS JOIN statements within Spatial ETL tool transformers like SQLCreator and SQLExecutor.  This approach enables very powerful and performant use of a GeoPackage.

This post is about automating repetitive ETL processes right from your desktop.  No code, no server.

 

We're seeing many people using Data Interoperability to periodically synchronize datasets between systems of record.  Typically the source data refresh 'trigger' is driven by a schedule and not some random event, and the frequency of updates is based on multiples of a working day.  If you're on this kind of treadmill this post is for you!

 

You may have heard of this sort of automation in the context of Windows Task Scheduler with a Python script as the task and the script calling a geoprocessing tool or model.

We're going down the task scheduling path too, but without needing Python.

 

In the modern era there is a lot of emphasis on service oriented architecture and the ArcGIS stack has comprehensive publication and synchronization capabilities amongst apps, but you're reading this because you're working outside the stack, at least at one end of your synchronization workflow.  You have used Data Interoperability's Workbench app to wrangle services, databases, files and so on to achieve your own private batch 'service'.  You don't have to be the server and click 'Run' too.  Your friend is this guy:

 

C:\Program Files\ArcGIS\Data Interoperability for ArcGIS Pro\fme.exe

 

That's right, a big fat executable.  This is the one that does all the work when an ETL tool runs.  You may never have noticed, but when you run an ETL tool while it is being edited in Workbench, the very first line that appears in the log window is 'Command-line to run this workspace:', followed by the path to our new friend above, the path to the open workbench .fmw file, and any arguments the workspace needs.  It's all there, so let's plug it together.

 

Let's dispense with some legalities first.  With ArcGIS Pro, Enterprise and Online you're living in a world of named user licensing, and your ETL tool may embed these credentials.  Provided the scheduled task you build automates the ETL tool on the machine you would use to run it interactively, there should not be any licensing issues.  If someone else needs to run it they should replace the named user credentials first.

 

For my example I'm going to recycle an ETL tool from an earlier post.  I use it to maintain a hosted feature service using data harvested from a Geoserver instance via an extended WFS API.  It has an official refresh rate of once a week, each Saturday local time; I run the ETL tool when I remember to on Monday mornings (hey, it's only a demo).  Mondays are getting problematic for me; I may forget.  Let's automate that.

 

The example ETL tool reports the command line I should use to run the workspace is:

 

"C:\Program Files\ArcGIS\Data Interoperability for ArcGIS Pro\fme.exe" C:\Work\Synchronize\Synchronize.fmw --API_Key "im_not_telling_you_my_api_key" --LDS_Unique_ID "address_id"

 

Because ETL tools store their parameter values, it isn't necessary to supply those arguments if they don't change, so this works too:

 

"C:\Program Files\ArcGIS\Data Interoperability for ArcGIS Pro\fme.exe" C:\Work\Synchronize\Synchronize.fmw

 

Now we create the scheduled task.  Open Task Scheduler and fill in the dialogs for a Basic Task:

 

 

Adjust the settings as you need:

 

 

Tip:  If you configure the task exactly as above, a command window like the one below will pop up; if you don't want this, use the setting 'Run whether user is logged on or not'.

 

 

While I remember, if you're interested in more ways to batch ETL, check out this post.

 

Now do your bit and come in late Mondays!

Let me get you through one paragraph of background before we get to the fun stuff.  In an earlier video I included an example of capturing a spatial constraint from the active ArcGIS Pro map or scene and sending it into an ETL workspace.  The sample happened to be working with a WFS service; these have a bounding box parameter that can constrain the features retrieved.  WFS services also support more complex spatial operators, which can be used with arbitrary geometry operands supplied as GML fragments.  However, unless you know how to put all the required XML together for WFS requests, you'll be like me: terrified of attempting it.  ArcGIS Pro 2.3 itself only supports a bounding box constraint on WFS services.

 

Spatial constraints are a lot easier with feature services.  This post will show you just how easy.

 

Core geoprocessing has supported feature services as input parameters for several releases now, so why bother using Spatial ETL against feature services at all?  Well, perhaps your feature service is heading out the door as some other format, or you need transformations only Data Interoperability provides, or your feature service is very large and you don't want to use selections to subset it.  I just helped one customer who needed to dynamically handle a spatial constraint mid-ETL with a FeatureReader transformer (more on that below).  There are many use cases.

 

Data Interoperability is all about code-free approaches, but I'll take a wee diversion into feature service REST API query parameters so you understand what goes on.  Below is a screenshot of the HTML view of a feature service Query endpoint.  Note there is an Input Geometry parameter (supplied as JSON) and you can set how it is used; in my case it is a Polygon, and I want only the features satisfying the constraint Intersects.

 

 

So, the trick with applying spatial constraints to feature services is just supplying the geometry!
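In script form the same request is just a handful of query parameters.  Here is a sketch using only the Python standard library; the service URL is a placeholder and the ring coordinates are invented for illustration:

```python
import json
from urllib.parse import urlencode

# Esri JSON polygon for the Input Geometry parameter
# (illustrative WGS84 coordinates; first and last vertex must match)
polygon = {
    "rings": [[[174.7, -41.3], [174.8, -41.3], [174.8, -41.2],
               [174.7, -41.2], [174.7, -41.3]]],
    "spatialReference": {"wkid": 4326},
}

# The parameter names below are the feature service Query
# operation's own: geometry, geometryType, spatialRel, etc.
params = {
    "where": "1=1",
    "geometry": json.dumps(polygon),
    "geometryType": "esriGeometryPolygon",
    "spatialRel": "esriSpatialRelIntersects",
    "outFields": "*",
    "f": "json",
}

# Placeholder layer URL; append the encoded parameters to /query
query_url = ("https://example.com/arcgis/rest/services/Parcels"
             "/FeatureServer/0/query?" + urlencode(params))
```

This is essentially what a feature service reader sends on your behalf when you supply a geometry; the published Input Geometry parameter used later in this post is simply the JSON string above.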

 

In the blog download (Pro 2.3+) you'll find the sample tool used, but the approach is very simple; just apply it yourself in your own models.  Click to enlarge this graphic to see the map I used, the feature set in the map and table of contents, and the model run as a tool.  The feature set is driving the analysis geometry automatically.

 

 

The tool being used is the model named SpatiallyConstrainedGP, which has an input parameter of type Feature Set.  At run time you supply a value by choosing a layer or feature class, or by creating a feature manually by editing in the map.

 

 

SpatiallyConstrainedGP wraps the ETL tool SpatiallyConstrainedETL like this; there is a model tool, Calculate Value, between the input feature set and the ETL tool:

 

 

All Calculate Value does is turn the input feature set into a JSON string with a Python snippet:

 

 

The JSON is then supplied to the published ETL tool parameter Input Geometry (remember the Query endpoint!) and...

 

 

...the ETL tool does its stuff, considering only features intersecting my feature set...

 

 

..which is to make a spreadsheet summarizing some parcel area totals per case of an attribute:

 

 

So that's it: just grab JSON from the map when you need to supply a feature service reader with an Input Geometry parameter.  If you are using a FeatureReader transformer to read a feature service, the workflow is a little different: you'll need to convert the JSON into an actual FME feature with a GeometryReplacer (the geometry encoding is Esri JSON) and supply it as the initiator Spatial Filter constraint of the FeatureReader, like this:

 

 

Now you can apply map-driven spatial constraints to your ETL!

Data Interoperability extension sees point cloud data, such as ASPRS LAS and Esri LAS Dataset, as its own feature type, just like many other formats.  Here is some on a coastline - surf's up!   (Look above the headland.)

 

Some high density LiDAR on a coastline

 

Formats are designed to deliver specific capabilities, but all geospatial formats have something in common - a coordinate system - and your GIS needs to be able to manage it.  LAS data is a bit of an outlier here, as we expect ArcGIS users to collect their data in the coordinate system they intend to use and stick with it; but when the ground moves (literally, as with plate drift or earthquakes) or a new datum or realization is published, you'll find ArcGIS's comprehensive core projection tools don't yet support the format.

 

A situation we hear about is people having LAS data in ellipsoidal heights (say WGS84) who want to generate DEMs in orthometric heights.  Orthometric heights are gravity-defined and approximate height above mean sea level, so they are important if you need to model coastal or estuarine flooding, for example.  You can always create a DEM and reproject its vertical coordinate system with the geoid grids delivered by the ArcGIS Coordinate Systems Data install, or your own local ones, but that leaves the LAS data behind.
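The arithmetic behind that conversion is simple once you have a geoid model: the orthometric height H is the ellipsoidal height h minus the geoid undulation N at that point.  A minimal sketch:

```python
def orthometric_height(h_ellipsoidal, n_geoid):
    """H = h - N: approximate height above mean sea level from an
    ellipsoidal height and the local geoid undulation, both in the
    same linear unit (e.g. meters)."""
    return h_ellipsoidal - n_geoid

# A LiDAR return at 45.0 m ellipsoidal height where the geoid sits
# 20.5 m above the ellipsoid is about 24.5 m above mean sea level.
print(orthometric_height(45.0, 20.5))  # 24.5
```

The hard part is sampling N from a geoid grid for every point, and that is what the tooling described next does for you.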

 

Your LiDAR vendor would be pleased to reprocess your LAS data but you can do it yourself with ArcGIS Data Interoperability extension.  The secret is in this transformer - CsmapReprojector:

 

CsmapReprojector transformer

 

In the blog download there is a sample specific to accommodating a new vertical datum for New Zealand, but read between the lines of the document delivered in the download: you can leverage the vertical grids delivered in the Coordinate Systems Data install, or geoid grids you obtain locally, and reproject your LAS data however you need.

 

Then when a point says it's floating you can trust it (bad IT pun).

 

Floating point that has nothing to do with a computer data type!

 

Note:  The blog download and the Geoprocessing gallery sample here are equivalent.

Agencies around the world publish their data on the web using a great variety of technologies, and while standards exist to make them accessible within ArcGIS, nothing performs within ArcGIS like our own services.  Sometimes it just makes sense to regularly synchronize data from its system of record to ArcGIS Online or Portal.  This blog is about how to do that efficiently.

 

To see if you should read further, download the blog attachment NZ_Street_Address.lyrx and add it to a new map in Pro, then, using the Locate pane and the ArcGIS World Geocoding Service, zoom to Wellington, NZL (or your favorite other New Zealand locality).  Zoom in to 1:5,000 or larger scale, pan around, turn on the label classes Standard Number, Suffix Number and Range Number, and inspect the address point house numbers.  Identify features.  Select features.  You are accessing a feature layer in an ArcGIS Online standard feature data store.  Here are links to the item and service.  If you have a reasonable internet connection you will have a good map exploration experience.  The layer you are looking at has over 2 million features.  You can download the data.  You can use it in geoprocessing.  The data is maintained weekly, and the synchronization process, averaging thousands of updates each week, takes under 2 minutes.  The approach uses no coding.  If you want to do this for data accessible to you, then read on (click on images to enlarge them).

 

 

Firstly, what data sources are candidates for this treatment?  Anything accessible to ArcGIS Data Interoperability extension, which means all its supported formats and feeds in many storage repositories.  My specific example happens to use data available by WFS service, but that is not critical to the discussion; the approach is generic.

 

Let's dig a little deeper.  To look at the layer a little more closely, with ArcGIS Online as your active portal, Add Data from All Portal with the search tags 'LDS' and 'ETL'.

 

 

You'll see the same point features (with default symbology), but your table of contents will also contain a standalone table, Timestamps, with one row:

 

 

 

The value in UpdatedUTC is refreshed at each synchronization, so it will differ from the graphic, but it is the key to the whole process.  The table lives within the feature service as a layer, and writing the UTC time of synchronization is the final step of the process that also writes feature updates.

 

So what are all the steps?  To follow along you'll need ArcGIS Pro 2.3+ with Data Interoperability extension installed and enabled, and to have downloaded the toolbox and ETL tool source .fmw files in the blog download Synchronize.zip.  Add the toolbox to your project and you'll see these ETL tools in it:

 

 

Right click each ETL tool and repair the source path to its .fmw file.

 

My target data is available as a bulk download, which I took as a file geodatabase.  I copied the address point feature class into my project home geodatabase.  In any event, get your target data into your project home geodatabase, using ETL processes if necessary.

 

Next I made the Timestamp table using MakeTimestampTable, which looks like this:

See the note below - it's not a great idea to use the table name 'Timestamps', but we'll let it go for now

Repair the destination file geodatabase path to be the same as your features of interest.  If you run MakeTimestampTable in edit mode you can pick your own initial timestamp value with a useful date picker.  I used UTC time but didn't have to get it exact; if you do, and you live in Greenwich, UK, then look at your watch and ignore any current daylight saving adjustment.  Otherwise, make the table with any value and then fix it up with a little Python:
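For that little bit of Python, here is a stdlib-only sketch of grabbing a clean UTC value; writing it into the table is then an ordinary field calculation:

```python
from datetime import datetime, timezone

# Current UTC time as a naive timestamp, immune to local time zone
# and daylight saving rules
now_utc = datetime.now(timezone.utc).replace(tzinfo=None)
print(now_utc.isoformat(sep=" "))
```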

 

 

Then calculate UpdatedUTC to equal DownloadedUTC and you'll have it:

 

 

It's at this point in writing a blog that you find out it's a really bad idea to use the table name 'Timestamps', as it is too close to a reserved word in many database technologies, including file geodatabase.  As it doesn't affect my goal here I'll leave it, but if you go into production use another name!

 

Now stand up a feature service.  Add your target data and the timestamp table to a map, then select both objects in the table of contents:

 

 

Then right click and choose Share as Web Layer:

 

 

Configure the service to be a feature layer in the folder you want and let it load.

 

Included in Synchronize.tbx is an ETL tool, LoadData, that creates the feature service too, if you want to go that way.

 

 

Now for the synchronization stuff in the ETL tool Synchronize:

 

 

The design of your version will depend on your target data, but in broad strokes:

 

  • The current UTC time at the beginning of processing is captured
  • The timestamps layer (table) is read from the Esri web layer
  • Your target data is read from its system of record
  • Inserts, Updates and Deletes are derived between the target source and Esri web layer
  • Inserts, Updates and Deletes are validated by unique identifier comparison with the Esri layer
  • Deletes are committed
  • Updates are written
  • Inserts are written
  • The timestamps layer (table) is updated with the UTC time captured when processing began
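The derivation and validation steps in that list amount to set comparisons on the layer's unique identifier.  A plain-Python sketch, with dictionaries standing in for the two feature sources:

```python
def derive_changes(source, target):
    """Derive changesets between a system of record and a web layer.

    source, target: {unique_id: attributes}.  Returns the ids to
    insert into, update in, and delete from the target.
    """
    inserts = [uid for uid in source if uid not in target]
    deletes = [uid for uid in target if uid not in source]
    updates = [uid for uid in source
               if uid in target and source[uid] != target[uid]]
    return inserts, updates, deletes

# Example: id 3 is new, id 2 changed, id 9 was retired
src = {1: "10 High St", 2: "12a High St", 3: "14 High St"}
web = {1: "10 High St", 2: "12 High St", 9: "99 Old Rd"}
print(derive_changes(src, web))  # ([3], [2], [9])
```

If the source offers a changeset API, as mine does, you get these lists for free; otherwise this is what brute-force change detection with the UpdateDetector amounts to.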

 

For my target data the curator provides a changeset API that lets me build from/to times into a WFS call, which returns exact insert, update and delete sets.  If your data has timestamps for created, edited and retired values you can do this yourself.  If you have nothing to go on, you can derive changesets by reading all data from both sources and doing brute-force change detection with the UpdateDetector transformer, although this of course may take time.

 

In the Synchronize ETL tool there are some less obvious features.  The sequence of feature writing is determined by writer order in the Navigator pane, top down.  Writing the timestamp update is therefore enforced to be last, so if anything fails it will not be falsely updated.  ArcGIS Online and Portal feature writers in Delete and Update mode require the ObjectID value in the Esri service to be sent with the feature, so the values are picked up mid-stream with a FeatureReader and joined on a layer unique identifier.  Similarly, the Inserts stream looks for existing unique identifiers before writing; only features known not to exist pass.

 

In the opening paragraph I said the approach uses no coding.  There is one math function used (floor) to calculate a batch number in modulo-20 chunks when obtaining target service ObjectIDs.  That's as close to writing code as you need to get, although you are free to use Python if you like.
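For the curious, that single floor calculation looks like this (a sketch; the chunk size of 20 matches the modulo-20 batches just mentioned):

```python
import math

def batch_number(feature_index, chunk_size=20):
    # floor(index / chunk_size) assigns each feature to a batch of
    # up to chunk_size features: indexes 0-19 -> 0, 20-39 -> 1, ...
    return math.floor(feature_index / chunk_size)
```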

 

While I mention coding: in a production environment you would want to run synchronization as a scheduled task.  This begins as a Python script.  I stub one out here that assumes things like ETL web connections are available to the process owner, which is easily done by sharing the connection file in a well-known directory.
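Along those lines, a minimal stub might look like the following.  This is a sketch: the fme.exe path is the default install location, and the workspace path and parameter names are illustrative, not part of any Esri API.

```python
import subprocess

FME_EXE = r"C:\Program Files\ArcGIS\Data Interoperability for ArcGIS Pro\fme.exe"

def build_command(workspace, **parameters):
    """Assemble the fme.exe command line that Workbench reports in
    its log, with --NAME value pairs for published parameters."""
    cmd = [FME_EXE, workspace]
    for name, value in parameters.items():
        cmd += ["--" + name, str(value)]
    return cmd

def run_workspace(workspace, **parameters):
    # Returns the FME engine's exit code; 0 means success
    return subprocess.call(build_command(workspace, **parameters))
```

Point the scheduled task at a script like this, or skip Python entirely and give Task Scheduler the fme.exe command line directly.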

 

 

Another approach I'll blog about separately is calling the FME engine executable directly in a scheduled task.

 

Do explore the ETL tools supplied and send questions to this forum.

 

I hope this gives you confidence to build your own synchronizations.  Enjoy!

We're going on a journey to the bottom of the sea, but the real message here is the ability of ArcGIS Data Interoperability to reach out to the web (or anywhere), and get feature and media data into a geodatabase feature class with attachments, without having to code.  Well, just a tiny bit of code, but you don't have to sweat the details like a coder.

 

A colleague came to me asking if ArcGIS Data Interoperability could bring together CSV position and time data from a submersible's missions with related media content, and get it all into a geodatabase.  In production all the data sources will be on the web.  No problem.  Data Interoperability isn't just about formats and transformations; it is also about integrations, and building them without coding.

 

 

Python comes into the picture as a final step that avoids a lot of tricky ETL work.  The combination of an ETL tool and a little ArcPy is a huge productivity multiplier for all you interoperators out there.  Explore the post download to see how the CSV and media sources are brought together - very simply.  Below is the whole 'program':

 

 

ArcGIS Data Interoperability has a great 'selling point', namely that you can avoid coded workflows and use the visual programming paradigm of Workbench to get data from A to B, and into the shape you want.  I often show colleagues how to efficiently solve integration problems with Data Interoperability and it's always pleasing to see them 'get it': challenges don't have to be tackled with code.

 

Low-level coding is what we're avoiding.  ArcGIS geoprocessing tools are accessible as Python functions; using geoprocessing tools this way is just a command-line invocation of what you have access to in the Geoprocessing pane, the Analysis ribbon tools gallery and so on.  If this is news to you, take a look: run the Get Count tool from the system toolbox, then use the result in the Python window.

 

Here is the tool experience:

 

 

Now, in the Catalog pane History view, right-click the result and send it to the Python window:

 

 

You'll see the Python expression equivalent of the tool:

 

 

Note I haven't written any code...

 

Where am I going with this?  ArcGIS Data Interoperability concepts differ a little from core geoprocessing in that input and output parameters tend to be parent workspaces, not the feature types or tables within them.  You frequently write to geodatabases, for example, in which case the output parameter of the ETL tool is the geodatabase, not the feature classes and tables themselves, although these are configured in the ETL tool.

 

What if you need to do something before, during, or after your ETL process for which there is a powerful ArcGIS geoprocessing tool available but which would be really hard to do in Workbench?

 

You use high-level ArcGIS Python functions to do this work in Workbench.

 

I'll give a simple, powerful example momentarily, but first some Workbench Python tips.

 

Workbench allows you to configure your Python environment; to avoid any clash with Pro's package management just go with the default and use these settings:

 

In Tools > FME Options > Translation, check that you prefer Pro's Python environment:

 

 

In your workspace, check that the Python Compatibility setting will use the preference.

 

 

Now you know the ArcGIS Python environment can be used.

 

For my use case I'll provide a real example (attached below) where I need to create and load geodatabase attachments.  We cannot do this entirely in Workbench (except by the way I'll show you) because it cannot create the attachments relationship class.  You could do that manually ahead of loading data, but then you still have to turn images into blob data, manage linking identifiers and other things that make your head hurt, so let's use ArcPy.  Manual steps would also preclude creating new geodatabases with the ETL tool, which I want to support.

 

The example Workbench writes a feature class destined to have attachments, and a table that can be used to load them.  You can research the processing, but the key element to inspect is the shutdown script run after completion, see Tool Parameters>Scripting>Shutdown Python Script.

 

Here is the embedded script:

 

 

Now this isn't a lot of code: a few imports, a read of the dictionary available in every Workbench session to get output path values used at run time, then just two lines calling geoprocessing tools as functions to load the attachments.

 

This is a great way to integrate ArcGIS's powerful Python environment into ETL tools.  Sharp-eyed people will notice a scripted parameter in the workspace too; it doesn't use ArcPy, so I can't claim it avoids low-level coding, but it was just a way to flexibly create a folder for downloading images at run time.  There are a number of ways to use Python in Workbench, but rather than detract from my message I'll suggest you start with the simple and powerful one: use ArcGIS geoprocessing where it saves work.  Enjoy!

Last week was the 2019 Esri Partner Conference followed by the Developer Summit, events at which we enjoy being challenged by friends from around the world who are using ArcGIS in their work, often alongside apps and formats that ArcGIS does not make or manage.

 

One partner from Europe asked how to use GML (Geography Markup Language) files in ArcGIS Pro.  This format is really a category of formats; the underlying XML - as a markup language is intended to be - can be extended, usually to push a data schema down into the protocol.  He had in mind, however, what we know as Simple Feature GML, which makes the task - well - simpler, but that isn't critical to this discussion.

 

In ArcMap, Data Interoperability extension may be used either to directly read GML files (of the simple feature profile, recognized from a .gml filename extension, even without licensing Data Interoperability extension) or to make an interoperability connection to any supported GML profile, such as the complex INSPIRE themes.  This workflow is not implemented in Pro, partly because WFS services (which are usually GML 'in motion') are the most common use case for GML and are natively supported in Pro, and partly because interoperability connections are being re-imagined for a future release of Pro.

 

In ArcGIS Pro, Data Interoperability extension can also be used to convert GML files just like in ArcMap - with the Quick Import geoprocessing tool or with a Spatial ETL tool - but the partner thought asking everyone to license an extension would be a hurdle.

 

I decided to blog about an implementation pattern that does what was asked for - convert GML to geodatabase features within ArcGIS Pro - but that can also be used to convert any of the hundreds of formats and connections accessible to ArcGIS Data Interoperability.  The GML data originator has access to ArcGIS Enterprise with Data Interoperability extension so the pattern leverages that, but end users with GML files only need ArcGIS Pro and authenticated access to a geoprocessing service.  You can use this pattern to stand up any format conversion you wish at any scale - but I hasten to add it must be a free or cost-recovery-only service if you make it public.

 

Enough talk; how do we do this?  We're going to use ArcMap and Pro in a double act.  Why both?  At the time of writing, ArcGIS Pro cannot publish web tools containing Spatial ETL tools, so we'll use ArcMap for that step.

 

In the blog download below you'll find a 10.6.1 version toolbox which contains these tools (click any images to enlarge them):

 

 

GML2FGDB is a Spatial ETL tool that converts one or more files of any schema of simple feature GML to a file geodatabase named GML2FGDB.gdb (with overwrite!).  It looks like this if edited:

If you run it as a tool you can see it has a parameter exposed for GML geometry axis order that defaults to 1,2.  If your data is in Y,X order you can set 2,1.  3D data is supported by the 1,2,3 and 2,1,3 values.

 

 

GML2FGDBModel is a model tool that incorporates the script tool ZipGDB to compress the file geodatabase to a zip file.  The compression step is necessary because geoprocessing services do not support workspaces (i.e. geodatabases) as output parameters.
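ZipGDB's job can be sketched with the standard library, since a file geodatabase is just a folder of files.  The function below is my sketch, not the script tool's actual source:

```python
import os
import zipfile

def zip_gdb(gdb_path, zip_path):
    """Compress a file geodatabase folder into a zip, keeping the
    .gdb folder itself as the top-level entry in the archive."""
    parent = os.path.dirname(os.path.abspath(gdb_path))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(gdb_path):
            for name in files:
                full = os.path.join(root, name)
                # Arcname relative to the gdb's parent preserves
                # the SomeName.gdb/ prefix inside the archive
                zf.write(full, os.path.relpath(full, parent))
```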

 

 

GML2FGDBModel is shared as a geoprocessing service (make it synchronous), which I called GML2FGDBService:

 

 

Lastly, the model tool GML2FGDBService wraps the service and adds the script tool UnZIPGDB for the round trip: a local GML file or files goes up to the web service, which does the translation without requiring Data Interoperability locally, and finally the scratch zip file containing a scratch geodatabase is unzipped into a user-selected destination directory.

 

 

Now GML2FGDBService can be freely used in Pro:

 

 

GML2FGDBService will always output a file geodatabase named scratch.gdb to your output directory, so be careful not to overwrite prior output!

 

 

Now anyone with access to the model tool and service can convert suitable GML files (or any other data if you refactor the Spatial ETL component) to local geodatabase using ArcGIS Pro.  Enjoy!

Hello Interoperators!

 

If you watched the video ArcGIS Data Interoperability In Action you might have some questions on how some of the demos were built.

 

I tend to create Spatial ETL tools with external sources - namely .fmw files - as I can easily drag them into Workbench after opening it from the Analysis ribbon or share them with FME users.  They are attached below to give you the pattern, but they will not all work for you without repair and replacement of credentials.  Contact me if you need help with any of them.

 

To head off one question - how did I create the New York county name pulldown parameter and attribute value mapping in the Simple Powerful ETL demo? - see the CountyParameter.fmw workspace.  It scrapes the NY website and writes a CSV file that can be used to import parameter values for the demo tool.  You can see the pattern and recycle it elsewhere.

Formats!

At Pro 2.3 (and classic Desktop 10.7) we have gone from 308 supported formats in Pro 2.2 to 372.  But since one of the formats is GDAL Generic Raster, which embodies more than 100 actual raster formats, the real format count has exploded.

The background story to this has been both a closer relationship with the underlying FME product and a decision to support raster and imagery workflows.

Transformers!

Again we have made sure to adopt as many FME transformers as possible (in fact, almost all of them).  The main area of difference is transformers related to FME Server interaction.  If anyone has need of those, please drop us a line.

Latest FME Engine!

Pro 2.3 (and 10.7) ships with the FME 2018.1.1.0 engine, and as soon as we can manage it we'll ship the FME 2019 engine.

 

How does this help me?

Shipping all this new functionality means you can enjoy equivalence with your FME workspaces, share .fmw files with other users knowing things will not break, and build raster workflows into your ETL practice.