I'm coining a new term here - Total Integration - meaning the continuous, automated integration of any data and any ArcGIS service. The concept combines data movement with refreshing the web services the data powers: no manual ETL, no disconnected processes after the ETL, no downtime, just continuous Enterprise Service Bus maintenance the way it should be done, in one process.
It has become trite to say you're "excited" to share details on functionality, but I really do feel using a Notebook as a lightweight scheduler will bring web tool automation to the masses.
My example scenario happens to be about maintaining a portal locator (i.e. geocode service), a very common foundational service, but read between the lines and picture your full suite of services under continuous management no matter how the data powering them changes. Of course, you don't have to be maintaining a service, your web tool may be doing any sort of geoprocessing.
There are a few parts to total integration; the trick is making them work together. The logical steps are:
- Extract the source data and detect what has changed since the last run
- Apply the changes to the data powering your services
- Refresh the affected services in the same geoprocessing job
- Publish the whole process as a web tool
- Schedule the web tool from a hosted Notebook
In my case I use ArcGIS Data Interoperability in the web tool, but that's just an artifact of my data source. The lightbulb moments for me were realising I could combine ETL with core geoprocessing to include service refresh in my web tool, and also use a hosted ArcGIS Online Notebook as a lightweight scheduler for complete automation.
While I'm usually a no-code guy, there is a little Python involved, but you can relax; I'm sharing the code patterns in the blog download. My blog scenario is at the high end in its usage of ArcPy, but only because ArcGIS currently lacks a geoprocessing tool that rebuilds a portal locator (note to self: get one coming). Much of the time you might build your web tool using Data Interoperability called within ModelBuilder, so without ArcPy code.
So with some ArcPy and using the ArcGIS Python API in a Notebook we can put it all together. Let's get started!
My source dataset to integrate is the City and County of San Francisco's Enterprise Addressing System Addresses with Units. The data changes daily, and I want a portal locator refreshed each day before working hours. Here is my Data Interoperability workspace, authored in ArcGIS Pro, which does the job interactively:
The ETL tool reads a CSV file at a URL provided by Socrata, writes the features to a scratch geodatabase, performs change detection between the latest data and its previous state (maintained in a portal feature service), then writes the changes both to the feature service and to an Edits feature class in the scratch geodatabase. It also emails me a handy summary of new or changed addresses. The tool is in the blog download and requires Pro 3.0+ and Data Interoperability.
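The workspace itself is no-code, but if you think in Python, the change-detection idea is roughly the sketch below; the Socrata CSV endpoint form and the snapshot handling are assumptions for illustration, not taken from the workspace:

# Illustrative sketch only: the real tool is a no-code Data Interoperability workspace
import pandas as pd

latest = pd.read_csv("https://data.sfgov.org/resource/ramy-di5m.csv")  # latest published addresses (assumed endpoint form)
previous = pd.read_csv("previous_snapshot.csv")                        # prior state, however it is persisted

# Rows in the latest pull with no match in the previous snapshot are the edits to apply
merged = latest.merge(previous, how="left", indicator=True)
edits = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print("{} new or changed addresses to write to the feature service".format(len(edits)))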
But how do changes get pushed to the portal locator by the ETL tool, you ask? By using a shutdown script that calls ArcPy in the geoprocessing scratch environment! Data Interoperability ETL tools are geoprocessing tools, and the scratch geodatabase in this case is found by this scripted parameter:
# Scripted (Python) parameter, evaluated when the ETL tool runs: build a unique scratch geodatabase name
import arcpy
scratchGDB = arcpy.CreateScratchName("xx",".gdb","Workspace",arcpy.env.scratchFolder)
arcpy.AddMessage('Using scratch GDB {}'.format(scratchGDB))
return scratchGDB
The scratch folder is found similarly:
# Scripted (Python) parameter: return the geoprocessing scratch folder path
import arcpy
scratchFolder = str(arcpy.env.scratchFolder)
arcpy.AddMessage('Using scratch folder {}'.format(scratchFolder))
return scratchFolder
When shared as a web tool the log shows these messages:
Using scratch GDB C:\Users\arcgis\AppData\Local\Temp\scratch\xx0.gdb
Using scratch folder C:\Users\arcgis\AppData\Local\Temp\scratch
Now for the powerful bit: using ArcPy in a shutdown script to recreate the locator service. The locator creation code originated as a copy from a geoprocessing history item after a Create Locator tool run; the rest came from reading the help. In a shutdown script, fme.macroValues exposes the workspace's parameter values, which is how the scratch locations and portal credentials reach the code below.
# If there are edits then recreate the portal locator
import arcpy
import datetime
import fme
import os
import pytz

scratchGDB = fme.macroValues['ScratchGDB']
scratchFolder = fme.macroValues['ScratchFolder']
portalURL = fme.macroValues['PortalURL']
serverURL = fme.macroValues['ServerURL']
portalUser = fme.macroValues['PortalUser']
portalPassword = fme.macroValues['PortalPassword']

arcpy.env.workspace = scratchGDB
arcpy.env.overwriteOutput = True

if arcpy.Exists('Edits') and arcpy.Exists('AddressesWithUnits'):
    # Create or recreate the locator if there are data changes
    arcpy.geocoding.CreateLocator("USA",
        r"{}\AddressesWithUnits PointAddress".format(scratchGDB),
        "'PointAddress.HOUSE_NUMBER AddressesWithUnits.Full_Address_Number';"+
        "'PointAddress.STREET_NAME AddressesWithUnits.Street_Name';"+
        "'PointAddress.STREET_SUFFIX_TYPE AddressesWithUnits.Street_Type';"+
        "'PointAddress.SUB_ADDRESS_UNIT AddressesWithUnits.Unit_Number';"+
        "'PointAddress.NEIGHBORHOOD AddressesWithUnits.Neighborhood';"+
        "'PointAddress.CITY AddressesWithUnits.City';"+
        "'PointAddress.SUBREGION AddressesWithUnits.County';"+
        "'PointAddress.REGION AddressesWithUnits.Region';"+
        "'PointAddress.POSTAL AddressesWithUnits.ZIP_Code';"+
        "'PointAddress.COUNTRY AddressesWithUnits.Country';",
        r"{}\SanFrancisco".format(scratchFolder),
        "ENG",None,None,None,"GLOBAL_HIGH")

    # Create the SD draft and SD
    locator_path = os.path.join(scratchFolder,'SanFrancisco')
    sddraft_file = os.path.join(scratchFolder,'SanFrancisco.sddraft')
    sd_file = os.path.join(scratchFolder,'SanFrancisco.sd')
    service_name = 'SanFrancisco'
    pst = pytz.timezone('US/Pacific')
    sfNow = datetime.datetime.now(pst)
    sfNowStr = sfNow.strftime('%Y:%m:%d %H:%M:%S')
    summary = 'Point Address With Units locator for the City of San Francisco updated at {} PST'.format(sfNowStr)
    summary += '\nBuilt from the Socrata data source: https://data.sfgov.org/Geographic-Locations-and-Boundaries/Addresses-with-Units-Enterprise-Addressing-System/ramy-di5m'
    tags = 'San Francisco,EAS'
    analyze_messages = arcpy.CreateGeocodeSDDraft(locator_path,
                                                  sddraft_file,
                                                  service_name,
                                                  copy_data_to_server = True,
                                                  summary = summary,
                                                  tags = tags,
                                                  max_result_size=50,
                                                  max_batch_size=1000,
                                                  suggested_batch_size=150,
                                                  overwrite_existing_service=True)

    # Upload the locator
    if analyze_messages['errors'] == {}:
        arcpy.SignInToPortal(portalURL,portalUser,portalPassword)
        arcpy.server.StageService(sddraft_file,sd_file)
        arcpy.server.UploadServiceDefinition(sd_file,serverURL)

# Clean up the scratch GDB
try:
    arcpy.management.Delete(scratchGDB)
except:
    pass
So that's the ETL tool that I published as a web tool. Before moving on to how we orchestrate the web tool, I have a couple of tips on authoring and publication.
At this point we have a web tool that does what we want; now we need to orchestrate it on a schedule. For this we use a hosted Notebook in ArcGIS Online. This is the first notebook I ever authored (my no-code background shows), so if I can do it you can too. The web tool runs on my portal but my notebook runs in Online. That is no problem for the notebook: the web tool isn't executing in Online; under the covers the notebook is just authenticating to the portal and sending a submitJob REST request. Online notebooks know how to connect to portal GIS instances. Here is the single-cell notebook (Standard runtime):
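In essence the cell connects to the portal GIS, imports the web toolbox, and submits the job. A minimal sketch of that pattern looks like the following; the portal URL, credentials, toolbox URL and tool function name are placeholders, not my actual values:

# Minimal sketch of the notebook cell pattern; URLs, credentials and tool names are placeholders
from arcgis.gis import GIS
from arcgis.geoprocessing import import_toolbox

# Authenticate from the ArcGIS Online notebook to the portal that hosts the web tool
gis = GIS("https://myportal.example.com/portal", "someUser", "somePassword")

# Import the web tool's GPServer endpoint as a Python module
tbx = import_toolbox("https://myportal.example.com/server/rest/services/SanFranciscoETL/GPServer", gis=gis)

# Call the generated tool function; under the covers this is a submitJob request to the portal
result = tbx.san_francisco_etl()
print(result)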
How hard is that? Crazy simple. The hardest part was getting a certificate from the IT folks that the portal would accept when hearing from ArcGIS Online (my colleague Renato S. gets the credit for that). All I did was read the help and replace help examples with my paths and names. To find the function name to use after importing the web toolbox I just put the code into IDLE so intellisense could give me the methods.
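If you don't have IDLE handy, you can also just list the imported module's public names to find the tool function (continuing with the placeholder tbx module from the sketch above):

# List the tool functions generated on the imported toolbox module
print([name for name in dir(tbx) if not name.startswith('_')])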
Then while editing the notebook I went to the Tasks editor and set up my notebook to run weekdays at 6am. Note that the task editor time picker uses local time values from your browser client.
Now I can let the system(s) run and each day my organization has an up-to-date locator maintained in a portal!