
Metadata - Data Interoperability's Hidden Talent

BruceHarold
Esri Regular Contributor

In the Extract, Transform, and Load (ETL) world you can make the mistake of focusing on data movement at the expense of data usability.

As data moves and evolves, one crucial aspect that often gets overlooked is metadata. Metadata, the information about your data, plays a vital role in ensuring its context, accuracy, and integrity. It also supports the FAIR data principles (Findable, Accessible, Interoperable, and Reusable).

@BruceHarold and @JordanDuft team up in this post to show how to create and maintain metadata as data moves through ArcGIS.  As the use cases involve ETL, ArcGIS Data Interoperability will be used.

We'll show three use cases for metadata management:

  1. Populating feature class and table metadata in ArcGIS Pro
  2. Populating metadata for a hosted feature layer in ArcGIS Online
  3. Updating metadata for multiple portal items in ArcGIS Online in bulk

Not shown, but discussed near the end of this post:

  • Deep copying data and metadata between environments using metadata.xml

Let's start with an example of Case #1 - feature class & table metadata written at ETL time.

To set us up, Bruce imported data from Vancouver's open data site into an ArcGIS Pro project geodatabase using a spatial ETL tool (ImportBuildingPermits, in the blog download). The dots on the map are today's permit locations in the feature class BuildingPermits.

Not only was the building permit data written to a feature class, it also had metadata generated by the ETL process. In the screenshot below, we are viewing the metadata for the BuildingPermits feature class in ArcGIS Pro. In the metadata we can see the item information (title, tags, summary, description, credits, use limitations).

Building permits metadata

If you're familiar with writers in spatial ETL tools you'll be aware they don't have much in the way of metadata support.  However, as Data Interoperability is an ArcGIS Pro extension it knows about everything ArcGIS Pro does - including ArcPy's metadata module, and that is where we go for this support!

To back up a little bit, the Workbench app delivered by Data Interoperability has a feature called a shutdown script (there is also a startup script) which lets you run Python code with any installed interpreter on completion of any tool run.  For the subject matter data, we leverage this feature to do two things the built-in writers don't know about:

  • Creating relationship classes between output feature types
  • Creating metadata for each output (yay!)

The only trick to using this feature is knowing that the shutdown script has a built-in dictionary object (fme.macroValues) with keys for all tool parameters, including input and output data paths.  The rest, as they say, is easy!  We built the script starting from snippets from the Copy Python Command option in ArcGIS Pro's History pane.  Here is the shutdown script:

 

#  Create relationship classes between BuildingPermits and child tables PropertyUse and SpecificUseCategory
import arcpy
from datetime import datetime
import fme
import pytz
arcpy.env.overwriteOutput = True
gdb = fme.macroValues['OutputGDB']
arcpy.env.workspace = gdb
origin = "BuildingPermits"
for destination in ["PropertyUse","SpecificUseCategory"]:
    arcpy.management.CreateRelationshipClass(
            origin_table=origin,
            destination_table=destination,
            out_relationship_class=origin + '_' + destination,
            relationship_type="SIMPLE",
            forward_label=destination,
            backward_label=origin,
            message_direction="NONE",
            cardinality="ONE_TO_MANY",
            attributed="NONE",
            origin_primary_key="PermitNumber",
            origin_foreign_key="PermitNumber")
# Create layer level metadata
current_time_utc = datetime.now(pytz.utc)
pst = pytz.timezone('Canada/Pacific')
current_time_pst = current_time_utc.astimezone(pst)
current_time_pst_formatted = current_time_pst.strftime('%Y-%m-%d %H:%M:%S')
descriptions = {"BuildingPermits":"Construction projects and any change of land use or occupancy on private property requiring a building permit",
                "PropertyUse":"General use of property; multiple uses will be accessible in a 1:M lookup",
                "SpecificUseCategory":"Category of property use; multiple categories will be accessible in a 1:M lookup"}
for obj in ["BuildingPermits","PropertyUse","SpecificUseCategory"]:
    new_md = arcpy.metadata.Metadata()
    # Insert spaces into the CamelCase name, e.g. BuildingPermits -> Building Permits
    name = obj.replace("gP","g P").replace("yU","y U").replace("cU","c U").replace("eC","e C")
    new_md.title = f"{name} of Vancouver, Canada 2024."
    new_md.tags = f"Demo,Esri,City of Vancouver,Canada,applications,{obj},{name}"
    new_md.summary = f"Layer includes information of all {name} issued to date by the City of Vancouver in 2024"
    new_md.description = descriptions[obj]
    new_md.credits = f"City of Vancouver, {current_time_pst_formatted} Pacific."
    new_md.accessConstraints = "https://opendata.vancouver.ca/pages/licence/"
    tgt_item_md = arcpy.metadata.Metadata(obj)
    tgt_item_md.copy(new_md)
    tgt_item_md.save()

 
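The chained replace calls in the shutdown script only handle the three known layer names. A more general way to split CamelCase names into display names (a sketch, not part of the blog download) is a regex that inserts a space at each lowercase-to-uppercase boundary:

```python
import re

def split_camel_case(name):
    """Insert a space before each interior capital letter,
    e.g. 'SpecificUseCategory' -> 'Specific Use Category'."""
    return re.sub(r'(?<=[a-z])(?=[A-Z])', ' ', name)
```

This removes the need to hand-maintain a replace chain when new output feature types are added to the tool.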

Taking the time to embed metadata automation at ETL time ensures, at minimum, that you don't forget it after the fact - and it is a best practice.  Why?  To:

  • Ensure that essential information is documented for your data.
  • Help consumers of your data understand the context of your data and how it can be used in future mapping and analysis.
  • Enhance search and discovery of your data, making it easier for people to find.

You'll have noticed in the map and metadata screen grab above that Bruce used the unique value renderer to display the data in categories (type of permitted work) and configured pop-ups.  He then published the layer and related tables to ArcGIS Online - the target information product.

 At publication time the metadata for each layer flows to the sublayers of the hosted feature layer, like this:

MetadataFlows2.png

...and here it is in the BuildingPermits layer:

Sublayer metadata flows through

Because metadata flows from the data to the published sublayers of the hosted feature layer, Bruce is saved the trouble of recreating it.  So that's how to get metadata into your information product at ETL time, and why it's important to do so.

However, there is another level of metadata we need - for the hosted feature layer itself, so we'll move on to Case #2 - populating metadata for a hosted feature layer using ETL tools.

Below is the details page of the hosted feature layer in ArcGIS Online:

Item home page showing metadata

Note, the metadata for the hosted feature layer looks amazing!! However, Bruce must confess something… he doesn’t enjoy creating metadata. Therefore, he didn't write a line of the summary, description, tags or terms of use metadata on the item details page. So where did he get metadata? How did he populate the item information on the item details page?  Using magic? (…maybe, depending on how you define magic!)

This metadata was harvested straight from the source - Vancouver's portal.

Below is the landing page, with datasets sorted on popularity (which we may be skewing 😉).  We highlight the dataset we are harvesting, and if you look near the bottom we also highlight a link to an Excel spreadsheet containing metadata for all datasets.

Open Data Landing Page

Opening the spreadsheet we can see the row for the subject matter dataset - Issued building permits.

Dataset metadata spreadsheet

It is just a matter of getting the cell values for metadata elements from the Excel spreadsheet to the hosted feature layer - no problem - that's the sort of thing this technology was built for.  This happens in the RefreshBuildingPermits ETL tool (in the blog download):

RefreshBuildingPermits ETL tool

Without making this a tutorial on ETL tool design, the steps we're using for our information product are to:

  • Make an initial file geodatabase incarnation of the data.
    • ...using the ETL tool ImportBuildingPermits that has the shutdown script shown above.
  • In ArcGIS Pro create the layer with symbology and pop-up behavior we want.
  • Publish the feature class and related tables as a hosted feature layer to ArcGIS Online.
  • Maintain the hosted feature layer using the RefreshBuildingPermits ETL tool.

...which means we put the hosted feature layer's metadata update functions in this second ETL tool.  If you look bottom right you'll see a bookmark named Update Metadata.  The FeatureWriter transformer handling the actual permit data has a Summary port; if any data is written, a connector (hidden for aesthetic reasons) triggers a FeatureReader that reads the Excel file on the open data site and sends the right cell contents to an ArcGISOnlineConnector (which also works with ArcGIS Enterprise) with the following settings:

Set metadata details with ArcGISOnlineConnector

So it's as easy as filling in the form!  If there are no data changes there are no metadata changes, otherwise there may be.  How easy is that?  You now know how to harvest and write metadata using ETL tools!

Note:  The ArcGISOnlineConnector is downloadable manually or will automatically download if you type the transformer name into your Workbench canvas and are connected to the internet.
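If you prefer to script the same update outside Workbench, the ArcGIS API for Python can set the same item details via Item.update().  A minimal sketch (the spreadsheet column names and item ID below are hypothetical, not from the blog download):

```python
# Map a harvested metadata row (e.g. from the open data spreadsheet)
# to the item property keys Item.update() understands.
def row_to_item_properties(row):
    return {
        "title": row["title"],
        "snippet": row["summary"],        # 'Summary' on the item details page
        "description": row["description"],
        "tags": ",".join(row["keywords"]),
        "licenseInfo": row["license"],    # 'Terms of Use' on the item details page
    }

# Applying it requires the arcgis package and a signed-in connection, e.g.:
# from arcgis.gis import GIS
# gis = GIS("https://www.arcgis.com", "username", "password")
# item = gis.content.get("<item id>")
# item.update(item_properties=row_to_item_properties(row))
```

The ETL tool in the blog is still the better fit here, since it already owns the trigger logic (update metadata only when data changed); the script is just the same mapping expressed in plain Python.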

Lastly, the third leg of the stool.  Case #3, making bulk metadata changes to any number of portal items.

Let's say that we manage an ArcGIS Online or ArcGIS Enterprise organization with hundreds of items and we need to change our terms of use for all hosted feature layers in a content folder.

This is what we have now:

BruceHarold_0-1722976906203.png

This is what we want:

BruceHarold_1-1722977259161.png

This is how to effect the bulk change; it is crazy simple with another ETL tool:

BruceHarold_0-1722977555167.png

It is just an ArcGISOnlineConnector used to list the candidate portal items, a Tester to filter for hosted feature layers, then another ArcGISOnlineConnector to update the target element:

BruceHarold_1-1722977720729.png

If no value is supplied for an element it is left unchanged; to zero one out you would supply a value of NULL.
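The same list-filter-update pattern can be sketched in Python against the portal's item JSON (this is an illustration of the pattern, not the blog's ETL tool; the filter mirrors the Tester step, and truly hosted layers also carry the "Hosted Service" type keyword):

```python
# Keep only feature service items, mirroring the Tester transformer
# that sits between the two ArcGISOnlineConnectors.
def hosted_feature_layers(items):
    return [i for i in items if i.get("type") == "Feature Service"]

# Applying the change needs the arcgis package and credentials, e.g.:
# from arcgis.gis import GIS
# gis = GIS("https://www.arcgis.com", "username", "password")
# for itm in gis.content.search("owner:me", item_type="Feature Layer"):
#     itm.update(item_properties={"licenseInfo": "<your new terms of use>"})
```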

Now, we mentioned above that another ETL pattern of interest is deep copying data and metadata between ArcGIS environments, for example between ArcGIS Enterprise and ArcGIS Online.  There are core approaches for this use case in which metadata travels with the data, as you would expect, but there are also use cases where you wish to edit the data and/or metadata during the ETL process.  In that situation you might work with the metadata.xml file for each dataset.  This can be accomplished with an option in the Get Details and Set Details actions of the ArcGISOnlineConnector to get and set the value of metadata.xml.

Move metadata between portal environments
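Once you have metadata.xml in hand, edits in transit are plain XML manipulation.  A sketch with the standard library, assuming the dataIdInfo/idCitation/resTitle path used by the ArcGIS metadata format for the title element (verify the path against your own metadata.xml):

```python
import xml.etree.ElementTree as ET

def retitle(metadata_xml, new_title):
    """Replace the title element in an ArcGIS metadata.xml string."""
    root = ET.fromstring(metadata_xml)
    title = root.find("./dataIdInfo/idCitation/resTitle")
    if title is not None:
        title.text = new_title
    return ET.tostring(root, encoding="unicode")

# Hypothetical minimal document for illustration:
sample = ("<metadata><dataIdInfo><idCitation>"
          "<resTitle>Old title</resTitle>"
          "</idCitation></dataIdInfo></metadata>")
```

The edited string can then be handed to the Set Details action (or to arcpy.metadata on the receiving side) as the new metadata.xml value.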

So that's it, managing metadata end to end using ArcGIS Data Interoperability!