BLOG
A colleague recently asked me how to move a couple of billion records to GeoEvent's spatiotemporal big data store (STBDS) at a customer site, using ArcGIS Data Interoperability. These are archived vehicle positions for a utility company, stored in an Oracle geodatabase, with a couple of million new events arriving daily. The archived data is in 70+ tables. Because a feature service in the STBDS has the same REST API as an ordinary hosted feature service, the plan going in was to leverage the pattern described in my earlier blog, namely to ETL the archive data into portal shapefile items and then use the target layer's Append REST endpoint to asynchronously load the data. An additional wrinkle was to parallelize the ETL as multiple concurrent jobs. This approach would reduce the risk of network outages while streaming small transactions (1,000 features is the default batch size) to the portal. It turned out the customer environment wasn't at a release that supported the workflow and we did go with streaming, but it got me thinking about how to ETL big data to the new wave of cloud warehouses, which you can use natively in ArcGIS Pro 2.9. My basic message is that you can do these lift-and-shift jobs by moving big datasets as files, in multiple concurrent jobs, thus maximizing throughput and minimizing transport risk. Let's see how.

Cloud warehouses like Snowflake, BigQuery and Redshift can be queried and read in ArcGIS Pro 2.9, but not written to using out-of-the-box tools. ArcGIS Data Interoperability can however write to these warehouses, including spatial data, but the default mode is streaming, which might not scale how you need. I'm going to show you a pattern you can use across all three warehouses:

- ETL whole datasets, including spatial data, as Apache Parquet or another format like CSV
- Encode geometry in a character-based standard format
- Automate the ETL in multiple concurrent processes
- Don't write any code beyond any SQL or macro commands required by the target environment

I'll also throw a bone to any coders lurking in my no-code blog space, see below 😉, i.e. some Python tips on creating Parquet files. (Note: Parquet files are a supported item type in ArcGIS Online from September 22nd 2021. Share some!)

Billions of features are in scope with this pattern but I'm using a more modest dataset for demonstration purposes, only 2.3 million point features.

2.3 Million Point Features

My data is in 12 feature classes; you can have any number. The pattern I will show works with data split into separate parts that can be processed concurrently. If your data is monolithic then either split it yourself spatially (oriented fishnet anyone?) or by adding a field signifying a batch identifier populated by row position - you can drop the field during processing.

I mentioned Snowflake, BigQuery and Redshift warehouses. In all cases you can stage Parquet files where the target environment can see them and then load from the Parquet files. For spatial data, the Parquet files will need geometry encoded in a format understood by the target environment (Snowflake and BigQuery support GeoJSON and WKT, Redshift supports WKT). I will only provide a worked example with GeoJSON going to Snowflake.

My demo data is point geometry and the field I use to store the GeoJSON has a width of 100; if you are using polyline or polygon data you should investigate how wide your most point-rich features are when encoding the field.
For example, I selected a very point-rich polygon in a layer and as GeoJSON it is 2,071,156 characters:

with arcpy.da.SearchCursor('NZ Property Titles','shape@') as cursor:
    for row in cursor:
        print(len(str(row[0].__geo_interface__)))

2071156

Note that Data Interoperability can control the decimal precision used by GeoJSON; for geographic data a value of 7 is reasonable, and the same polygon then uses 1,274,064 characters. For example, the first coordinate goes from (172.90677540000001,-41.12752416699993) to (172.9067754,-41.1275242). Remember, every byte counts!

Note: For BigQuery, Data Interoperability has a GoogleBigQueryConnector hub transformer that can load CSV to tables. This may be simpler than sending Parquet and using the bq command environment to load data; I have not investigated that scenario.

Let's dig into my particular workflow. The secret sauce is to create two ETL tools: the first marshals the jobs and calls a WorkspaceRunner transformer that calls the second tool, which does the work. It is very simple. Here is LoadManager.fmw; it takes a list of arguments, in my case feature class names in a geodatabase:

LoadManager

WorkspaceRunner starts up to 7 FME processes that run a target tool until the job queue is consumed. Processing is likely to be CPU bound while worker processes extract, encode and upload the dataset files. I allowed each process to run two jobs, which ate my incoming datasets in 6 processes.

WorkspaceRunner

Here is LoadWorkerParquet.fmw.

LoadWorkerParquet

It too is a simple tool: it reads a geodatabase, writes a local Parquet file, then sends the Parquet file to Snowflake where the data is copied into a table. I'll let you inspect the SQLExecutor yourselves, but as an aid to understanding, here is what a statement looks like after variable substitution (to Snowflake):

create or replace file format Canterburyparquet_format
type = 'parquet';
create or replace temporary stage stageCanterbury
file_format = Canterburyparquet_format;
put file://C:\Work\Parquet\TitlesCanterbury.parquet @stageCanterbury;
copy into "INTEROPERABILITY"."PUBLIC"."Titles"
from (select
$1:id::number,
$1:title_no::varchar,
$1:status::varchar,
$1:type::varchar,
$1:land_district::varchar,
$1:issue_date::timestamp,
$1:guarantee_status::varchar,
$1:estate_description::varchar,
$1:number_owners::varchar,
$1:spatial_extents_shared::varchar,
to_geography($1:geom::varchar)
from @stageCanterbury);

Once the Parquet file gets to Snowflake the ingest is blazing fast. By the way, I learned to do this by reading the help; I'm no Snowflake DBA.

What performance should you expect? At writing I'm trapped at home like the rest of us, but on my home WiFi I get 2.3 million features loaded to Snowflake in 6 minutes, so with a decent computer and wired network I think conservatively 25 million point features per hour. Of course for a production environment and a really big job you could use multiple computers; certainly the target cloud warehouses will scale to take the throughput. In the blog download you'll notice I include a second worker tool, LoadWorker.fmw; this was for me to compare performance with the usual way of writing to Snowflake, 100K features per transaction, and it was way slower.

Now back in core Pro 2.9 my data is loaded to Snowflake and I can throw queries at it and enjoy the scaled compute experience.

Snowflake in Catalog Pane

I mentioned a Python option for creating Parquet files; it is in the blog download but here it is too:

# Pro 2.9+ example creation of a parquet file from a feature class
# Geometry is encoded as GeoJSON in a field 'geom'
import arcpy
import pyarrow.parquet as pq
arcpy.env.overwriteOutput = True
# Source feature class
Canterbury = r"C:\Work\Parquet\Parquet.gdb\Canterbury"
# Create in-memory feature class in WGS84
with arcpy.EnvManager(outputCoordinateSystem='GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]]',
                      geographicTransformations="NZGD_2000_To_WGS_1984_1"):
    arcpy.conversion.ExportFeatures(
        in_features=Canterbury,
        out_features=r"memory\Canterbury")
# Add the geom field (non-point geometry will require a wider field)
Canterbury = arcpy.management.AddField("memory\Canterbury","geom","TEXT",None,None,100,'',"NULLABLE","NON_REQUIRED",'').getOutput(0)
# Derive GeoJSON
with arcpy.da.UpdateCursor(Canterbury,['shape@','geom']) as cursor:
    for row in cursor:
        row[1] = str(row[0].__geo_interface__)
        cursor.updateRow(row)
# Drop geometry by creating a Table
esriTable = arcpy.conversion.TableToTable(Canterbury,"memory","CanterburyTable").getOutput(0)
# Make arrow table
arrowTable = arcpy.da.TableToArrowTable(esriTable)
# Write parquet
pq.write_table(arrowTable,r'C:\Work\Parquet\TitlesCanterbury.parquet',
               version='1.0',
               compression='SNAPPY')

Have fun moving that data around at scale!
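Not part of the original workflow, but if you want a quick sanity check before staging the file, here is a minimal sketch using the same pyarrow import and output path as the script above: it reads the Parquet file back and confirms the row count and the presence of the geom column.

# Quick sanity check of the Parquet file before uploading it to the warehouse
import pyarrow.parquet as pq

table = pq.read_table(r'C:\Work\Parquet\TitlesCanterbury.parquet')
print(table.num_rows)    # expect the feature count of the source feature class
print(table.schema)      # confirm the geom column is present as a string column
assert 'geom' in table.schema.names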
09-17-2021 11:28 AM

BLOG
"You'll need to write a connector". That's what people used to say when a need arose to build an integration between ArcGIS and another system. Nowadays however, the information technology landscape has matured and how apps communicate with each other has centered on a handful of patterns that everyone (who wants to stay relevant) uses. ArcGIS Data Interoperability walks and talks in this space. A clear winner is called REST, and since I'm not a computer scientist I will not go into what you can read for yourself, all I care about is that pretty much everything on the web can send and receive data in a way I can easily use (mostly JSON, with some holdouts still using XML and a few stretching usability with protocol buffer payloads but I'll keep an eye on them for you, I think that will pass 😉.) with ArcGIS Data Interoperability. I need to add a graphic before the search engines get bored with me and I don't get indexed. Here is a workspace (StubOutAPIJSONReading.fmw in the post download) I used to initially explore an API (details to follow): Exploring an API My sample API is published by the good folks at Clarity Movement. Clarity was founded in 2014 to tackle the global air pollution crisis and now provides cost-effective, scalable, and reliable air quality monitoring to customers in more than 60 countries around the world. Clarity's solution enables governments and communities to collect higher-resolution air quality data by supplementing existing regulatory monitors with dense networks of continuously calibrated air quality sensors. I like Clarity's Measurements endpoint as a good example of the most common pattern you will encounter, namely handling JSON data returned from an HTTP call. Before you panic about things like transport protocols and JSON parsng, relax! Data Interoperability handles it all for you, which is just as well because otherwise you would have to read stuff like this: {
"_id": "6137a995cd42fd51fcda7083",
"device": "609935dcc9348052e0c5d917",
"deviceCode": "AY989QV6",
"time": "2021-09-07T17:00:00.000Z",
"location": {
"coordinates": [
-120.90439519586954,
36.01958888490562
],
"type": "Point"
},
"recId": "averaged:AY989QV6:hour:2021-09-07T18:00:00",
"characteristics": {
"relHumid": {
"value": 30.428397178649904,
"weight": 4,
"raw": 30.428397178649904
},
"temperature": {
"value": 30.882064819335939,
"weight": 4,
"raw": 30.882064819335939
},
"pm2_5ConcNum": {
"value": 24.072547912597658,
"weight": 4,
"raw": 24.072547912597658
},
"pm2_5ConcMass": {
"value": 13.646130166709775,
"weight": 4,
"raw": 21.126213550567628,
"calibratedValue": 13.646130166709775,
"epaNowCast": 15.097605519225573
},
"pm1ConcNum": {
"value": 22.670581817626954,
"weight": 4,
"raw": 22.670581817626954
},
"pm1ConcMass": {
"value": 12.186892986297608,
"weight": 4,
"raw": 12.186892986297608
},
"pm10ConcNum": {
"value": 24.33531427383423,
"weight": 4,
"raw": 24.33531427383423
},
"pm10ConcMass": {
"value": 28.42475652694702,
"weight": 4,
"raw": 28.42475652694702
},
"no2Conc": {
"value": -6.413380280137062,
"weight": 4,
"raw": -6.413380280137062
},
"pm2_5ConcMass_24HourRollingMean": {
"value": 15.003665288163619,
"weight": 88,
"raw": 27.317325342785229,
"calibratedValue": 15.003665288163619
},
"pm2_5ConcNum_24HourRollingMean": {
"value": 26.923120065168903,
"weight": 88,
"raw": 26.923120065168903
},
"pm10ConcMass_24HourRollingMean": {
"value": 37.67677915096283,
"weight": 88,
"raw": 37.67677915096283
},
"pm10ConcNum_24HourRollingMean": {
"value": 27.273549459197306,
"weight": 88,
"raw": 27.273549459197306
}
},
"average": "hour"
}

The job is to turn this data into a hosted feature service, and automate service maintenance. A boon of having a GIS background is you likely know some Python, and JSON payloads like the above look just like Python dictionaries and lists. You only need to learn a couple of tricks before you can tackle pretty much any REST API.

Trick #1: Using HTTPCaller

HTTPCaller

HTTPCaller lets you simply fill in a form to make a web call and receive a response (JSON in this case). First you must read your API doc and determine the required parameters, plus any optional ones you want, and whether to use a GET or POST method. GET is usually for shorter URLs; POST supports longer URLs and also uploads supplied in a body. Another aspect is authentication. Many APIs require an API key (like you see here) or a token. Typically keys do not expire and tokens do, plus tokens require a generation step (which may be done separately via HTTP or via OAuth2 in a web connection you set up for your account). My example requires a key. My call requests a JSON array of 10 air quality measurements and returns it in an attribute named _response_body.

Trick #2: Unpacking JSON

I'm headed toward ingesting arbitrary numbers of air quality measurements, but first I must figure out how to extract data from the JSON. I requested 10 air quality measurements, so the returned JSON will be an array. Each measurement in the array will have the same schema used by the fields of my features - the above JSON is one measurement feature. I need to figure out extracting field values. The simplest way to do this is to write one array feature's JSON to a file and use the data-aware capability of JSONExtractor to build queries without coding them. If I temporarily set the JSONExtractor to read from a JSON file then I get a handy-dandy picker to build my JSON queries. Then I can copy-paste the JSONExtractor into a production tool and set it to read from the incoming feature's JSON attribute; it will extract and expose all the available field values. How easy is that! Here is my JSONExtractor while I'm reading from a file and building my queries:

JSONExtractor

Now it's all downhill to creating and maintaining my feature service. Clarity2GDB.fmw uses the JSON exploration done above to write out a feature class in my Pro project's home geodatabase - there is some renaming of attributes done, and of course they get their desired types set when writing.

Clarity2GDB

Then after creating my feature service from the geodatabase feature class I can recycle the work to maintain the feature service with RefreshClarityService.fmw, which only differs at the writer step. You'll notice only changed features are written to the target service, and the workspace has a parameter that sets the history interval for my features; features in the service older than that get aged out.

RefreshClarityService

This final tool can be scheduled or run on demand. That's it! I have conquered the complexities of web integration and have a feature service I can use to power my maps, apps and dashboards.

Feature Service

The blog download has the ETL tools I used, less a functional API key; please contact Clarity if you want to test and implement an integration.
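For the coders lurking here too: the JSONExtractor queries above have a direct Python equivalent. A minimal sketch, using only keys visible in the sample measurement (the payload below is abbreviated to the keys queried; in the ETL this JSON arrives in the _response_body attribute):

import json

# One measurement as returned by the Measurements endpoint, trimmed to the keys used below
payload = '''
{
  "time": "2021-09-07T17:00:00.000Z",
  "location": {"coordinates": [-120.90439519586954, 36.01958888490562], "type": "Point"},
  "characteristics": {
    "temperature": {"value": 30.88, "weight": 4, "raw": 30.88},
    "pm2_5ConcMass": {"value": 13.65, "weight": 4, "raw": 21.13}
  }
}
'''

measurement = json.loads(payload)

# Equivalent of the JSONExtractor queries
lon, lat = measurement["location"]["coordinates"]
pm25 = measurement["characteristics"]["pm2_5ConcMass"]["value"]
temperature = measurement["characteristics"]["temperature"]["value"]
print(measurement["time"], lon, lat, pm25, temperature)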
09-08-2021 10:07 AM

POST
Hello everyone, here is the recording of our session at the 2021 Esri User Conference; please comment here with any questions.
08-30-2021 06:59 AM

BLOG
This blog describes Pro 2.9 (unreleased at writing) functionality - ask your Esri representative about ArcGIS Knowledge!

Some things in information technology seem to be perennial examples of delivering data, not information, and if you're going to be data-driven it's information that you want. My case in point here is visualizing and analyzing relationships amongst georelational datasets, a key space ArcGIS inhabits and something I have always struggled with beyond the basics. You can apply joins and relates to map layers but these seem to run out of steam pretty fast in terms of power and usability, like how to visualize cardinality and how to do performant queries. Plus, if your datasets are from different sources it gets even harder. I have taken detours into coded approaches but learned they don't scale.

ArcGIS Data Interoperability and ArcGIS Knowledge to the rescue! How come that pairing? Well, Data Interoperability at Pro 2.8+ includes the Tech Preview version of the Esri Knowledge graph database reader/writer and solves the 'all-source' problem for building and maintaining Knowledge graph databases. Not only is the reader/writer flexible, it is also fast. At writing, Knowledge is still under construction and I'm using alpha Pro 2.9 software, but the topic is such a fit for being data-driven I couldn't resist. Here are some graphics from my ETL work to populate a graph; the workspaces are in the post download. The graph I'm building is property data: the nodes are centroids of cadastral titles plus other entities for owners and encumbrances (usually leases and mortgages), with relationships like 'owns' and 'encumbers'.

Loading Entities (Nodes)

Loading Relationships

Source Property Nodes

There are over 10 million entities and over 10 million relationships in the graph. I simplified my data model a little to ignore some legal details (there is a thing called an estate which allows more complex relationships between titles and owners) to give me a graph where property title points (the blue dots) have one or more ownership shares over them and ownership shares have zero or more encumbrances. Title points are obviously spatial; owners and encumbrances are tabular. Here are some feature counts:

Data Model

You'll see the data load was in two parts, first entities then relationships. This is because entity relationships are made using automatically generated GlobalID fields, so entities have to be created first; you'll get the idea from the workspaces. Entity GlobalIDs become relationship origin and destination GlobalIDs. Graphs live in an Enterprise Portal; I'm using Enterprise 10.9.1/Pro 2.9 as my portal and client.

There are innumerable queries you might make of your graph; this is facilitated interactively using either a thing called a Link Chart or the Cypher query language. First a simple link chart. My data isn't really the type where connections will be freshly discovered interactively via link chart exploration - all relationships are already known - but you can select and add entities to a link chart to investigate your data. This is my first foray into Knowledge so I'm keeping it simple.
Here is who owns some titles somewhere:

Basic Link Chart

I didn't use the interactive tools to build the chart entities, I used a Cypher query:

match (ee:Encumbrancee {name:'Her Majesty The Queen'})-[oe:owns_encumbrance]-(e:Encumbrance)-[he:has_encumbrance]-(t:Title {land_district:'Otago'})
return ee,oe,e,he,t limit 5

This found 5 titles in a specific land district encumbered by a single encumbrancee. I'll leave link charts behind at this point, but they come with tools to populate them and are a great way to explore connections. There are bigger patterns to discover! For example, where are titles encumbered a lot?

Unencumbered (green) and Encumbered (red) titles

If I was doing this for real I might join demographic variables to my title points before loading them into my graph, which would let me analyze population segments. There are interesting things to learn without studying demographics. An advantage of graph databases is they are fast to query for aggregate statistics compared to equivalent SQL statements; for example, let's look at the distribution of encumbrancee (financial institution or lessor) market share.

Encumbrance Holdings By Institution

This result dropped out of my graph in a few seconds using the query you can see in the control:

match (e:Encumbrance) where e.name is not null
return e.name, count(*) as book order by book desc

The encumbrance holder data has a long tail; let's say we are interested in the big, commercial lenders, who I'll say have 10,000 or more encumbrances and are companies.

Big Lenders

That summary is over the entire dataset. You'll notice three institutions are neck and neck in the market, then market share drops off quickly, and there are 12 who make my cut. Is there anything different about my study area? I made a query to find out:

match (e:Encumbrance) with e.name as lender, count(*) as book where book > 10000 and lender contains 'Limited'
with collect(lender) as biglenders
match (t:Title {land_district:'Otago'})-[:has_encumbrance]-(e:Encumbrance) where e.name in biglenders
return t, e

Here is the map and a chart:

The Lending Landscape

I might be able to make a case for hot spots where some institutions are doing better than others, but it's the chart that is interesting: the top four institutions' holdings are not following the national distribution. We can look into encumbered holdings in a study area:

match (o:Owner)-[:has_owner]-(t:Title {land_district:'Otago'})-[:has_encumbrance]-(e:Encumbrance) where e.name contains 'Limited'
with o.prime_other_names + ' ' + o.prime_surname as owner, o.corporate_name as company, count(*) as holdings
return owner, company, holdings order by holdings desc

Holdings

Or a related query: which non-farm titles are encumbered by the big lenders?

match (e:Encumbrance) with e.name as lender, count(*) as book where book > 10000 and lender contains 'Limited'
with collect(lender) as biglenders
match (t:Title {land_district:'Otago'})-[:has_encumbrance]-(e:Encumbrance) where e.name in biglenders
with t,e
match (o:Owner)-[:has_owner]-(t)-[:has_encumbrance]-(e) where not (o.corporate_name contains 'Farm' or o.corporate_name contains 'Pasture')
return o,t,e

I hasten to add this still caught a lot of farms, as the data model doesn't really support land-use classification queries, but of course if I had land-use areas I could do an overlay of title points beforehand and do it for real.
Non-farm Encumbrances by Bank

So while at writing I'm ahead of the required software release, I hope this gives you an idea of the art of the possible with ArcGIS Knowledge: making queries on complex relationships in big data simple and fast. This was my first foray into Knowledge and I learned a lot, including the basics of the openCypher graph query language you see above. As I say in the spoiler alert, reach out to your Esri representative about release plans (Pro 2.9) and, if you're really keen, the Early Adopter Community at Pro 2.8.
08-24-2021 02:27 PM

POST
Yes, provided you have the Data Interoperability extension licensed on the server, and on your desktop (authoring) machine - which is hopefully Pro. The canonical way is to embed your fmw source in your ETL tool, run it and publish the history item. The workflows are a little different depending on whether there is a portal present, but it's the same as any web tool in either case. You must take care with parameters: they have to be supported by geoprocessing, or cast to ones that are, using Python snippets in a Model you wrap your ETL tool in. To be clear, you don't have to use a Model, only when an input or output needs to be cast. Email bharold@esri.com if you need more tips.
08-23-2021 01:29 PM

BLOG
If you are sharing hosted feature services, optionally with child services like Map Tile, WFS and OGC services, and your source data changes regularly, you will want to automate refreshing the service data without breaking item identifiers or metadata elements, so your customers' maps and apps keep working. This blog shows how - using ArcGIS Data Interoperability. First the obligatory map of our study area and data, street addresses in Norway:

Addresses in Oslo

Just to be clear, I'll restate the scenario:

- You are sharing a hosted feature service which may be large
- You may be sharing services published from the hosted feature service
- The source data might not be managed in ArcGIS
- The source data changes regularly and you want to apply the changes to the service(s)
- You don't want Portal or Online item identifiers to change when you refresh the data
- You want to automate this maintenance with minimal downtime
- You don't want to write any code

It is well understood that Data Interoperability can detect dataset changes and apply them to a publication copy of the data. This works well and allows for zero downtime if you write changes incrementally to a feature service. However, when you are dealing with millions of features this can be time consuming, both to read the original and revised datasets and to write the change transactions. It also risks encountering network issues during very long transactions. We need an option that just replaces service data efficiently and quickly. This can be done by maintaining a file geodatabase copy of the source data on your Portal or Online and replacing service data using a truncate and append workflow. Here is the Data Interoperability tool that shows the pattern.

Workbench

The blue bookmark is whatever ETL you need to get your data into final shape; in my case I'm downloading some data, doing some de-duplication and making a few tweaks to fields. The pale green bookmark is where I write the data to a zipped file geodatabase and overwrite a file geodatabase item in Online. The tan bookmark is where I truncate my target feature layer. The brighter green bookmark is the final step, where I call the append function that reads from the file geodatabase item and writes into the target feature layer. All very simple, isn't it! For roughly 2.7M street address features and from my home network the whole job, including waiting for the asynchronous append operation to complete, takes about an hour.

But wait, there's more! I published Map Tile, WFS and OGC services from my target feature service; here is everything in my Online project folder:

Services

The caches for each child service take a few extra minutes to refresh but the process is automatic (Vector Tile services need a manual cache rebuild from the item settings page). If I was doing this for a production environment I would move processing to an Enterprise server like this earlier blog describes and schedule the task to run overnight at an appropriate interval. There you have it: automated, efficient bulk refresh of hosted feature services and their derived products. The tool I'm describing is in the blog download, have fun!

Note for Enterprise users: Currently ArcGIS Enterprise does not support file geodatabase (filegdb) as an append format; you must use shapefile (or Excel or CSV if working with tables). To append shapefiles it is likely you will need to specify a fieldMappings dictionary in the layerMappings parameter to map the field names in your shapefile to the target feature service.
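The ETL tool above drives the overwrite, truncate and append steps with no code. Purely as an illustration of the same pattern for anyone who scripts, here is a rough sketch with the ArcGIS API for Python - the credentials, item IDs, path and table name are placeholders, and per the note above an Enterprise target would use 'shapefile' rather than 'filegdb'.

from arcgis.gis import GIS

gis = GIS("https://www.arcgis.com", "username", "password")      # placeholder credentials

# 1. Overwrite the file geodatabase item with a freshly written zipped file geodatabase
fgdb_item = gis.content.get("<file geodatabase item id>")        # placeholder item id
fgdb_item.update(data=r"C:\Temp\Addresses_gdb.zip")               # placeholder path

# 2. Truncate the target hosted feature layer
target_item = gis.content.get("<hosted feature layer item id>")  # placeholder item id
layer = target_item.layers[0]
layer.manager.truncate()

# 3. Append from the file geodatabase item back into the emptied layer
layer.append(item_id=fgdb_item.id,
             upload_format="filegdb",
             source_table_name="Addresses")                      # placeholder table name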
08-05-2021 01:41 PM

BLOG
At the 2021 Esri User Conference a very popular 'track' was ArcGIS Field Maps; if you missed the sessions you can catch up here. Of course there are rich automation capabilities in products like Workflow Manager and solutions from Esri partners, but I'm going to show you how to create your own no-code automation that connects external systems with Field Maps using ArcGIS Data Interoperability.

The scenario I have in mind is that you have an external app that emits data events and you need to send field jobs for each (or selected) event(s). My sample takes 311 service calls coming from a Socrata site that updates hourly, but another source I hear of regularly is Salesforce. There are a large number of web apps that are in scope for Data Interoperability. First the obligatory screen grabs of live action on my phone showing job receipt and field map response:

An incoming job in Microsoft Teams Mobile:

Field Maps Mobile opened from the map link:

Now I won't repeat the details here, but using the pattern I outlined in this preceding post you would set up the automation of job orchestration from your Pro or (preferably) Enterprise environment to run on a schedule. There are two prerequisites needed for the automation:

- A Field Map with an editable target hosted feature service.
- A Spatial ETL Tool that processes the incoming events and sends jobs.

I recycled the Socrata feed from the post I refer to above for my field map data. I configured the web map containing this layer to refresh on a 1-minute interval, as we could be working in near real time in the field. The ETL tool is in the blog download but here is a screen shot:

Workbench

The tool reads the 311 feed (you might be reading anything) and synchronizes the state of the data with a hosted feature service. Then inserted (i.e. new) features each trigger a post to a Microsoft Teams channel webhook (you will have your own business logic on which data events trigger which field map response). I include an email option too, which is disabled in the image above. You might use any number of messaging options available in Data Interoperability. Everything hinges on constructing a map link according to the rules in this topic at the field map links entry.

Full disclosure: this is my first excursion into ArcGIS Field Maps (this may be apparent to veterans). I'll let you read the how-to topic on field map construction and configuration. However, I think I have shown a simple, no-code way to automate field job orchestration which may help you when working with external systems that can be leveraged using ArcGIS Data Interoperability.
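The Teams post itself is just an HTTP call made by the ETL tool. If you are curious what that amounts to, here is a minimal Python sketch that sends a map link to a Teams incoming webhook; the webhook URL, web map item ID and center coordinates are placeholders, and the exact link parameters should be taken from the map links topic referenced above.

import requests

# Placeholders - substitute your own Teams incoming webhook URL and web map item id
webhook_url = "https://example.webhook.office.com/webhookb2/..."   # hypothetical
webmap_item_id = "<web map item id>"

# A Field Maps link built per the map links documentation (parameters shown are illustrative)
map_link = f"https://fieldmaps.arcgis.app?itemID={webmap_item_id}&center=32.7767,-96.7970"

# Teams incoming webhooks accept a simple text payload
payload = {"text": f"New 311 service request - open in Field Maps: {map_link}"}
response = requests.post(webhook_url, json=payload)
response.raise_for_status()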
07-30-2021 01:27 PM

BLOG
Distributed collaboration is a feature of Enterprise and Online that enables the sharing of edits to feature services and file items. You may be like me and took in a recent session at Esri's User Conference, which got me thinking about expanding the net of environments which can share data updates the same way - namely automatically, on a schedule, between Portals and Online plus many more environments. Data like these 311 service requests in Dallas, Texas, which update hourly:

The same business drivers discussed in the UC session that indicate using ArcGIS' distributed collaboration may also apply for data which isn't in a Portal or Online, particularly if data isn't shared to or from the ArcGIS system. You can still implement automated collaboration using ArcGIS Data Interoperability, and you get a no-code experience just like distributed collaboration. I'm going to show a pattern using the Data Interoperability extension with processes authored in Pro and using Enterprise as the compute and scheduling resource. If you have a Pro machine with high uptime you could use just the Pro machine.

What is in scope for automated collaboration? Hundreds of formats and systems. What data velocity is reasonable to automate? Like core distributed collaboration, this approach isn't suited for strictly real-time or very large edit-feature counts. Event-driven real-time integrations are better handled with webhooks (see my earlier post), and big data editing should be centralized outright in one organizational group. My example reaches into a Socrata instance once an hour and synchronizes a few thousand features. I would be comfortable synchronizing a million features once a day.

The goal here is sharing data without downtime and giving your audience a persistent connection to use in maps and apps. This translates into portal or Online items that retain their item identifier, with the underlying data being efficiently replaced. Here is a file geodatabase item plus a hosted feature service item that are both simultaneously refreshed each hour.

How? Here is the processing workspace:

The blog download has this workspace and another that I used to create data for the target items. I'll walk you through this one (RefreshDallas311). The stuff on the left, culminating in the green bookmark, reads the 311 feed from the web and writes a zipped file geodatabase which is then used to overwrite the file geodatabase item. A star of the show here is the ArcGISOnlineConnector (which also works with Portals). The other star of the show is on the right, the ChangeDetector, which calculates which features are new, changed, unchanged or deleted and sends them to the feature service writer with an appropriate fme_db_operation format attribute to set the transaction type per record (ObjectIDs are joined for updates and deletes; if the feature service is edit tracked or already in a distributed collaboration you would join GlobalID). Processing is simple!

Now let's cover how I got the process onto my server and scheduled. I'm not publishing a web tool; I scheduled an executable on the server, namely C:\Program Files\ESRI\Data Interoperability\Data Interoperability AO11\fme.exe with RefreshDallas311.fmw as the argument to the exe. The process will require any credentials used to be copied to the server and any local paths to be correct. In this case the workspace also requires a reader (Socrata) and transformer (ArcGISOnlineConnector) be installed from FME Hub.

Firstly, as the arcgis service account owner (i.e. a user that exists) on the server, I created a folder somewhere for this stuff to live, C:\Users\arcgis\Desktop\DistributedUnlimited. I created a shortcut to "C:\Program Files\ESRI\Data Interoperability\Data Interoperability AO11\fmeworkbench.exe" on the desktop so I could start Workbench conveniently. The credentials I needed to copy are web connections for my accounts at ArcGIS Online, Socrata and GMail. To copy credentials you open the Workbench app from the Analysis ribbon in Pro, open the Tools>FME Options control, go to the Web Connections tab and access the grid of available connections; mine are:

Right click on each connection that needs to be migrated and export it to an XML file. I copied them to the folder created on the server. Then I copied RefreshDallas311.fmw into the server folder. I opened Workbench (no workspace) and imported the XML file web connections, manually checking they work. I then opened RefreshDallas311.fmw; the Socrata and ArcGISOnlineConnector packages auto-installed. I checked the FeatureWriter local path to the zipped file geodatabase is valid, namely in the server folder. I ran the workspace, and I had to change my Python compatibility to Python 3.7+. At the top of the log file was:

Command-line to run this workspace: "C:\Program Files\ESRI\Data Interoperability\Data Interoperability AO11\fme.exe" C:\Users\arcgis\Desktop\DistributedUnlimited\RefreshDallas311.fmw

Now, for the arcgis user, I created a scheduled task using the command and argument, running hourly. I'm done - I have automated a distributed, unlimited collaboration! The blog download has my ETL tool (Pro 2.8). My server is Enterprise 10.9. Don't forget you need ArcGIS Data Interoperability installed and licensed at both ends! Thanks to Dallas, TX.
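Creating the hourly task is point-and-click in Task Scheduler, but if you prefer to script that last step, here is a rough sketch using the command line reported in the log above; the task name is made up, and you would run this while logged in as the arcgis user.

import subprocess

# Command line reported at the top of the workspace log (paths from the post)
fme_exe = r"C:\Program Files\ESRI\Data Interoperability\Data Interoperability AO11\fme.exe"
workspace = r"C:\Users\arcgis\Desktop\DistributedUnlimited\RefreshDallas311.fmw"

# Run once interactively to confirm the command works before scheduling it
subprocess.run([fme_exe, workspace], check=True)

# Register an hourly Windows scheduled task (task name is hypothetical)
subprocess.run([
    "schtasks", "/Create",
    "/TN", "RefreshDallas311",
    "/TR", f'"{fme_exe}" {workspace}',
    "/SC", "HOURLY"
], check=True)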
07-21-2021 12:40 PM

IDEA
Kelly, the Data Interoperability extension can detect geometry duplicates; it is, however, an additional license. If you try an evaluation copy I can help show you how.
07-14-2021 12:20 PM

BLOG
Hi, usually this sort of issue is fixed by re-authenticating the web connection. You can do this by going to the Web Connections pane in FME Options and right-clicking on the connection; a menu option will be available.
07-08-2021 02:54 PM

BLOG
ArcGIS Online hosted feature services are foundational. If changes in feature service data require a response, you don't have to step outside the ArcGIS system to automate it. This blog is about using feature service webhooks to trigger geoprocessing service jobs that operate on the change set to do anything you want. It is conceptually simple; here is the Model in the blog download that defines my geoprocessing service:

The geoprocessing service (also known as a 'web tool') receives the webhook JSON payload, extracts the changesUrl object, then gets and operates on the change data using a Spatial ETL tool built with ArcGIS Data Interoperability. You could use a Python script tool too, but then you'll have to write code, and Data Interoperability is no-code technology, so start from here if you want to get going quickly 😉.

I used an 'old friend' data source for my feature service, public transport vehicle positions in Auckland, New Zealand, available on a 30-second interval:

The data is quite 'hot' - no shortage of trigger events! - which was handy for writing this blog. Your data might not be so busy. Anyway, I initially defined a webhook on the feature service that sent the payload to webhook.site so I could see what a payload looks like. Here is one:

[
{
"name": "WebhookSite",
"layerId": 0,
"orgId": "FQD0rKU8X5sAQfh8",
"serviceName": "VehiclePosition",
"lastUpdatedTime": 1625684942414,
"changesUrl": "https%3a%2f%2fservices.arcgis.com%2fFQD0rKU8X5sAQfh8%2fArcGIS%2frest%2fservices%2fVehiclePosition%2fFeatureServer%2fextractChanges%3fserverGens%3d%5b2322232%2c2322500%5d%26async%3dtrue%26returnDeletes%3dfalse%26returnAttachments%3dfalse",
"events": [
"FeaturesCreated",
"FeaturesUpdated"
]
}
]

You'll see in my Model I used a Calculate Value model tool to extract and decode the percent-encoded changesUrl object. In theory this could easily be done in the Spatial ETL tool and I wouldn't need the Model at all - the Spatial ETL tool could be published standalone - but I ran into issues so fell back to a Python snippet (i.e. Calculate Value). I'm breaking my no-code paradigm story, aren't I, but it's the only code in this show and you don't have to write it for yourselves now!

I'll get to the saga of payload processing, but first a word on the form of webhook you'll need for your feature service. Here is mine:

Firstly, notice the HookURL goes via HTTP. ArcGIS Online feature services are hosted on arcgis.com while my geoprocessing service is on amazonaws.com. I'm not an IT administrator and didn't know how to set up a certificate that would support HTTPS trust between the domains, so I enabled HTTP on my server. Payloads contain no credentials and all downstream processing uses HTTPS to access change data; if you have questions (or advice!) then please comment in the blog. Secondly, notice the Content Type is application/x-www-form-urlencoded. If you use application/json your geoprocessing service will not receive the payload.

Now about processing, here is the ProcessPayload tool:

Webhook change data is accessed (by default) in a secure fashion that is asynchronous. The changesUrl is used to start an extraction job and returns a statusUrl. The statusUrl is used to request a resultsUrl, which can be used (when it has a value) to return a response body that contains the change data. This requires looping, which is done in the ArcGISOnlineWebhookDataGetter custom transformer (available on FME Hub). This transformer will download as a linked transformer by default; make sure you embed it into your workspace (a right click option). I edited my copy to check for job completion on 10 second intervals and give up after 10 retries; my data usually arrived on the second retry so that worked for me.

The change data comes out of the ArcGISOnlineWebhookDataGetter as a big ugly JSON object which is hard to unpack, so I avoid the whole issue by reading the change data directly out of the source feature service using queries on the ObjectIDs I get from the change data response. Sneaky, lazy, but effective, and best of all you can recycle this approach for any feature service! When the change data is read you can perform whatever integration you want to do with it. In my case I just write OGC GeoPackages of the Adds and Updates features and email them to myself, so not a real integration, but you get the idea.

For the record, the 'rules' for publishing Spatial ETL tools as geoprocessing services are: the FME workspace must be entirely embedded; you must export any web connection or database connection credentials used by the tools to the arcgis service owner's account on each processing server (get an XML export file by right clicking on the connection in the FME Options control, then, as the arcgis user on the server, start Workbench from fmeworkbench.exe (AO11 version for portal machines) and import the XML file); and most importantly, before you start make sure Data Interoperability is installed and licensed! For my purposes I used a server that was not federated to a portal, but you can go either way. All this was created in Pro 2.8.1 and Enterprise 10.9. The blog download has my web tool and Model, let me know how you get on out there. Have fun!
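As an aside, for anyone who wants to see that Calculate Value step outside the Model, here is a standalone sketch of the same operation against the sample payload above: parse the JSON array, take changesUrl from the first element, and percent-decode it.

import json
from urllib.parse import unquote

# The webhook payload shown above, reduced to one element with the fields used here
payload = '''[{
  "name": "WebhookSite",
  "layerId": 0,
  "serviceName": "VehiclePosition",
  "changesUrl": "https%3a%2f%2fservices.arcgis.com%2fFQD0rKU8X5sAQfh8%2fArcGIS%2frest%2fservices%2fVehiclePosition%2fFeatureServer%2fextractChanges%3fserverGens%3d%5b2322232%2c2322500%5d%26async%3dtrue%26returnDeletes%3dfalse%26returnAttachments%3dfalse",
  "events": ["FeaturesCreated", "FeaturesUpdated"]
}]'''

# Take changesUrl from the first (only) element and percent-decode it
changes_url = unquote(json.loads(payload)[0]["changesUrl"])
print(changes_url)
# https://services.arcgis.com/FQD0rKU8X5sAQfh8/ArcGIS/rest/services/VehiclePosition/FeatureServer/extractChanges?serverGens=[2322232,2322500]&async=true&returnDeletes=false&returnAttachments=false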
07-08-2021 12:45 PM

POST
All, FYI creating a feature service from a Snowflake connection is under consideration.
07-06-2021 08:55 AM

POST
Workbench starting up is a good test that licensing is working.
06-10-2021 07:10 AM