BLOG
Hi Keiran
I haven't tried this and don't have a graph handy to try, but what you could try is to maintain a field of type GUID in your graph that is separate from the system-managed GlobalID. I'll alert the graph team to your enquiry, and we'll get an answer for you.
Posted 12-06-2024 05:50 AM

POST
Hello Andrew, you have run into a known issue; we will patch it as soon as possible. There is a workaround. If you run your tool from the Workbench app, near the top of the translation log you will find a message containing the console command that runs fme.exe with your tool path as an argument, plus any other parameters (but see below*). If you use Windows Task Scheduler to configure a task with that command, it will work.
*If any input parameters do not change at run time (which is typical for scheduled tasks), do not publish the parameter and it can be omitted from the command arguments.
Another option is to publish your tool as a web tool and schedule that; this requires Enterprise 11.4. See this example: https://community.esri.com/t5/arcgis-data-interoperability-blog/etl-pattern-scheduling-web-tools/ba-p/1546641
Our apologies for the issue. It will either be patched in Pro 3.4 or fixed in Pro 3.5 - we are working on that.
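If it helps to test the command before wiring it into Task Scheduler, here is a minimal sketch that wraps the kind of console command the translation log reports in a short Python script; the fme.exe install path, workspace path, and parameter name are hypothetical placeholders, so copy the real command from your own log.

import subprocess

# Hypothetical paths and parameter name; use the exact command from your translation log.
fme_exe = r"C:\Program Files\ArcGIS\Data Interoperability for ArcGIS Pro\fme.exe"
workspace = r"C:\ETL\MyTool.fmw"

# Unpublished parameters keep their authored values, so only values that change need passing.
cmd = [fme_exe, workspace, "--SOURCE_URL", "https://example.com/feed.xml"]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.returncode)
print(result.stdout[-2000:])  # tail of the console output / translation log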
Posted 11-19-2024 11:45 AM

BLOG
Often ETL is not one-off; recurrence is needed to incorporate data change over time. Until the ArcGIS Pro 3.4 and ArcGIS Enterprise 11.4 releases, the supported automation patterns for this included ArcGIS Data Pipelines and scheduled Notebooks or tools using the REST API task scheduler, but a new no-code option, web tool scheduling, is delivered in ArcGIS Pro 3.4 and ArcGIS Enterprise 11.4!
You can use any type of geoprocessing tool to create your scheduled web tool - core system tool, ModelBuilder tool, Python script tool or Spatial ETL tool. For my blog subject matter I'm using an ArcGIS Data Interoperability Spatial ETL tool, because it can consume my source data, an RSS feed, specifically a Common Alerting Protocol (CAP) feed, as published by many agencies worldwide, including FEMA in the USA. My CAP data is weather alerts in New Zealand, refreshed twice daily. The feed will have no entries if the weather is good 😉. CAP is XML-based, easily handled by ArcGIS Data Interoperability. I want to mirror the CAP feed status to a hosted feature layer in ArcGIS Online.
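As a side note for anyone curious about the raw CAP data outside of the ETL tool, here is a small standard-library Python sketch that pulls the event, severity and alert polygon out of a CAP 1.2 document; the file name is a placeholder and the element names follow the published CAP 1.2 schema rather than anything specific to my workspace.

import xml.etree.ElementTree as ET

CAP_NS = {"cap": "urn:oasis:names:tc:emergency:cap:1.2"}

# Placeholder file name; any CAP 1.2 alert document will do.
tree = ET.parse("sample_cap_alert.xml")

for info in tree.getroot().findall("cap:info", CAP_NS):
    event = info.findtext("cap:event", default="", namespaces=CAP_NS)
    severity = info.findtext("cap:severity", default="", namespaces=CAP_NS)
    for area in info.findall("cap:area", CAP_NS):
        # CAP polygons are space-separated "lat,lon" pairs forming a closed ring
        polygon = area.findtext("cap:polygon", default="", namespaces=CAP_NS)
        print(event, severity, polygon[:60], "...")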
Below is a sample alert status map in ArcGIS Pro, for October 9th 2024. The blog download has a couple of CAP alert XML documents, if you want to see some raw data.
CAP Weather Alerts
As the labeling suggests, the yellow feature is a high wind watch, the blue lines (zoomed in, orange polygons) are snow warnings for roads through some mountain passes. If we zoom in to the northernmost feature we can inspect it. It is Lewis Pass, which has two geothermal spring resorts along the route, so if you do get delayed by snow you can wait it out in comfort!
Snow alert through Lewis Pass
A few days later, there is a heavy rain watch:
Fiordland rain watch
For the area, rain only comes in one type - heavy - so it's no surprise the prediction of an alert upgrade from watch to warning (orange) became true at the next update in 12 hours, plus some new alerts arrived:
West Coast rain
And the next day - yet more weather!
Yet more weather!
Regular updates like this are a classic case for a scheduled web tool; in fact, that's how the data was refreshed for me overnight. What does that ETL look like?
My data flow maintains a hosted feature layer in ArcGIS Online from the current CAP status. My ETL tool is quite simple; here it is (also in the blog download; it requires ArcGIS Data Interoperability for Pro 3.4, and ArcGIS Enterprise 11.4 if shared as a web tool).
CAP alert ETL tool
First a token is generated (using an EsriOnlineTokenGetter; for a local portal you would use an EsriPortalTokenGetter), then the upper stream reads the RSS feed and writes an upsert transaction to the target feature layer - any new alerts become new features and any data changes to existing features are applied. Upsert support requires that the layer have a uniquely indexed, non-null field, as discussed in an earlier blog. The lower stream tests for expired alerts and deletes them. Note that the ETL tool has no parameters, because the input RSS feed and output feature layer do not change and so can be set as not published at authoring time.
I'm showing a new, recommended ETL pattern here for maintaining hosted feature layers, namely generating a portal token within the ETL tool rather than sharing web connections to the hosting server, which is an awkward step we can avoid. The target feature service is read and written using the Esri ArcGIS Server Feature Service format and a supplied token, with the option to verify SSL certificates unchecked. If your security requirements demand certificate verification, you'll need to supply a trusted certificate.
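For readers curious what the token getter does conceptually, the ArcGIS Online generateToken REST endpoint is the underlying call; here is a minimal sketch, with placeholder credentials that you would never hard-code in a production tool.

import requests

# Placeholders only; keep real credentials out of source code.
params = {
    "username": "my_user",
    "password": "my_password",
    "referer": "https://www.arcgis.com",
    "expiration": 60,  # minutes
    "f": "json",
}
resp = requests.post("https://www.arcgis.com/sharing/rest/generateToken", data=params)
token = resp.json()["token"]
print(token[:12], "...")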
Having run my ETL tool locally in ArcGIS Pro, I can share the history result as a web tool and schedule it. There is some mental arithmetic to do when scheduling. The CAP feed is updated by 9AM and 9PM "local time", which at the time of writing is NZDT, or UTC+13. My Pro machine is currently on PDT, which is UTC-7, so to start my schedule at the next available 12-hourly slot, 9AM NZDT, I calculate this to be 8PM UTC, or 1PM PDT.
Web tool scheduling
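If you prefer to check the time zone arithmetic with code rather than in your head, a few lines of Python using the standard zoneinfo module confirm the conversion; the date below is simply one where both NZDT and PDT are in effect.

from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; on Windows also install the tzdata package

# 9 AM NZDT (UTC+13) on a date when Los Angeles is on PDT (UTC-7)
nz_run = datetime(2024, 10, 10, 9, 0, tzinfo=ZoneInfo("Pacific/Auckland"))
print(nz_run.astimezone(timezone.utc))                     # 2024-10-09 20:00:00+00:00
print(nz_run.astimezone(ZoneInfo("America/Los_Angeles")))  # 2024-10-09 13:00:00-07:00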
In real life, meteorological offices allow for more frequent bulletins than my schedule above, but you get the idea. At the link for the scheduled web tool you can pause, delete or edit the schedule.
The blog download has the tool source fmw file, some sample CAP XML files (which I used when authoring the ETL tool to get the XML parsing right), and a lyrx file that shows how I used data-driven rendering of hex codes in the data. Not shown is creating the initial file geodatabase feature class used to instantiate the target feature service, but you can easily do this from the supplied workspace. Don't forget the alert identifier constraints needed for upsert support!
So there you have it, configuring a scheduled web tool that will run indefinitely and maintain a hosted feature layer!
Now there is nothing stopping you from automating your ETL!
Posted 11-06-2024 11:43 AM

POST
Hi, yes we have seen this intermittently but have not been able to isolate the cause.
If you are using a non-embedded workspace (i.e. your tool source is a .fmw file) then starting Workbench from the Windows Start menu should not error. If you see a .recover file with the same base name as the .fmw file then accept any message to use it - this will recover your latest edits.
Posted 11-06-2024 06:19 AM

POST
Andreas, I can supply an alternative workflow and have a blog on the topic, which I'm publishing November 7th when ArcGIS Pro 3.4 releases, because it uses a new feature. The subject matter is weather alerts in the CAP protocol, which I can see is available in your weather feed. Empty entries are OK. I'll let the Data Pipelines team respond on what you are experiencing.
Posted 11-01-2024 06:04 AM

BLOG
Tens of thousands of hosted feature layers are made daily in ArcGIS Online.
Below is one I made - street address points in the city of Los Angeles, sourced from the city's open data site.
I didn't make the layer with ArcGIS Data Pipelines; you can tell by the tuned widths of the text fields in the Fields view.
Los Angeles address points and schema
But I can automatically maintain the layer (it has daily changes) with a simple pipeline. Go ahead and click the link; the pipeline is shared with everyone. Here's what you'll see:
The Pipeline
Crazy simple, isn't it! The secret sauce is schema mapping. Because the output schema is defined by the existing layer, once the input schema is edited to match the field types, the Map Fields tool lets me connect the incoming field names to the output field names with no further data preparation required.
Here is how Map Fields looks in my case (sorry, not enough real estate to show the whole dialog):
Field Map
Having connected everything (ObjectID is not mapped) and set the output write method to Replace, I can set up a schedule for the pipeline to run after hours and I'm done!
So, all you owners of hosted feature layers, if ArcGIS Data Pipelines can reach your source data, you have a lightweight option to maintain your data, regardless of how you created it.
Posted 09-27-2024 12:01 PM

BLOG
Hello everyone. It is worth noting that upsert writes are asynchronous, which for small jobs (such as those created by change detection) adds some overhead, so my example scenario isn't ideal. In production you might consider upserts for cases with larger update and insert transactions, and use synchronous insert, update and delete write modes for smaller jobs.
Posted 09-27-2024 06:39 AM

BLOG
Hi everyone, here is a real run: a day's worth of changes in a 1M-record city address dataset was found and written in 2 minutes 41.6 seconds (45 updates, 124 inserts, 34 deletes), less than half the time it would take just to read the target feature service into the tool. Note I edited the tool to write updates, inserts and deletes, not upserts and deletes, as the asynchronous nature of upsert writes is less suited to small jobs. Enjoy!
Changeset Write
Posted 09-26-2024 07:10 AM

BLOG
If you're synchronizing an external data source into a hosted feature layer in ArcGIS Online or ArcGIS Enterprise, then you want writing each periodic changeset to be as efficient as possible. Once your datasets get into the hundreds of thousands or millions of features, calculating the optimal insert, update and delete transactions becomes very attractive, minimizing downtime and transaction failure risk. The problem is that deriving edit transactions requires reading the target hosted layer, and this is an expensive operation. This blog shows an approach that avoids querying your target feature layer in favor of a file geodatabase export of the data, automatically downloaded locally and then read performantly to find the changeset, without incurring extra item storage. Here is the approach in action:
BulkUpsert2
Avoiding the feature layer query
In an earlier post I showed an example of deriving upsert and delete transactions by reading a target feature layer, which is a serialized approach. My target layer has over 1M features - reading it takes several minutes. The above does better, against the same service. The post download has a toolbox with the Pro 3.5 ArcGIS Data Interoperability ETL tool in it - see BulkUpsert2.
The tool reads a CSV file at a URL (also avoiding serialization, which is the default for the server), but also accesses portal content to trigger a file geodatabase export, waits in a loop until the export is finished, then downloads and reads the target feature layer content rather than the service, and finally deletes the export item (the downloaded file geodatabase is also automatically deleted). This is core functionality.
There are two mint green custom transformers in the tool. An EsriOnlineTokenGetter retrieves a token needed for a later HTTP call (if you are working with an Enterprise portal you would use an EsriPortalTokenGetter).
Pro Tip! The blog download now contains the tool BulkUpsert2WebConnection, which avoids retrieving a token by using an OAuth2-based web connection. The web connection is supported by the Esri ArcGIS Connector package, which supplies the Esri ArcGIS Feature Service format used in the workspace. The web connection is then used by an HTTPCaller transformer, eliminating the need for a token. If your organization enforces multi-factor authentication then retrieving a token is blocked and the custom transformers will not work. It is now recommended practice to install the package, create a new web connection and use it for feature service read/write and ArcGIS REST API HTTP calls.
The other custom transformer is a looping transformer you can inspect in its tab, labeled FeatureServiceExportLooper. It checks the export job status every 2 seconds until the job is complete. Note that how long an export job takes depends on how big your service is and how busy the host portal is - I have seen ArcGIS Online queue jobs as well as run them immediately. Here is a run taking a little over a minute (just for the export) at a busy time in ArcGIS Online:
FeatureServiceExportLooper
The net result, however, is a significant gain. Here is a screen capture from the earlier blog and the serialized approach - note the session duration.
Reading the target feature layer takes longer
So that's how to avoid reading hosted feature layers with a serialized approach. Now you have an alternative to time-consuming changeset construction!
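To make the export-and-wait step concrete outside of the ETL tool, here is a rough sketch of the ArcGIS Sharing REST calls it corresponds to: request a File Geodatabase export of the layer's item, poll the status endpoint until the job completes, download the result, and delete the temporary export item. The portal URL, user name, item ID and token are placeholders; treat this as an outline of the pattern, not the custom transformer's exact implementation.

import time
import requests

# Placeholders - substitute your own values (or use a web connection instead of a token).
PORTAL = "https://www.arcgis.com/sharing/rest"
USER = "my_user"
ITEM_ID = "abc123def456"  # item ID of the hosted feature layer
TOKEN = "..."

# 1. Trigger a file geodatabase export of the item.
export = requests.post(
    f"{PORTAL}/content/users/{USER}/export",
    data={"itemId": ITEM_ID, "title": "BulkUpsert export",
          "exportFormat": "File Geodatabase", "token": TOKEN, "f": "json"},
).json()
export_item, job_id = export["exportItemId"], export["jobId"]

# 2. Poll every 2 seconds until the export job finishes.
while True:
    status = requests.get(
        f"{PORTAL}/content/users/{USER}/items/{export_item}/status",
        params={"jobId": job_id, "jobType": "export", "token": TOKEN, "f": "json"},
    ).json()
    if status.get("status") in ("completed", "failed"):
        break
    time.sleep(2)

# 3. Download the zipped file geodatabase, then delete the temporary export item.
if status.get("status") == "completed":
    data = requests.get(f"{PORTAL}/content/items/{export_item}/data", params={"token": TOKEN})
    with open("export_gdb.zip", "wb") as f:
        f.write(data.content)

requests.post(f"{PORTAL}/content/users/{USER}/items/{export_item}/delete",
              data={"token": TOKEN, "f": "json"})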
Posted 09-24-2024 11:53 AM

BLOG
@StuartSmith1 The Python script that writes item detail metadata should work with any version of Pro that has the arcpy.metadata module, and does not need Data Interoperability. The workflows using the ArcGISOnlineConnector use Data Interoperability at the Pro 3.3 release but may work at an earlier release; this has not been tested.
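As an illustration of the kind of call involved (not the script from the article itself), a minimal arcpy.metadata usage looks like this; the dataset path and text values are hypothetical.

from arcpy import metadata as md

# Hypothetical dataset path; any item supported by arcpy.metadata will do.
item_md = md.Metadata(r"C:\Data\Demo.gdb\Parcels")
item_md.title = "Parcels"
item_md.summary = "Maintained nightly by a scheduled ETL process"
item_md.tags = "parcels, ETL, demo"
item_md.save()  # writes the metadata back to the item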
Posted 09-24-2024 06:10 AM

BLOG
A comment on LinkedIn prompts me to make clear what keeps the article's pagination approach simple. You are usually expanding arrays returned from each API call, but do not do this inside your looping custom transformer, because then your loop continuation test might require sampling or counting features, both of which require blocking transformers, which in turn require an externally linked custom transformer - much harder to manage. I have made that mistake, and it put me off looping until I figured it out! If you inspect the loop continuation test in the UpdateFS workspace you'll see I test that the response is JSON and does not match the regular expression \[\s*\]. The server returns "[ ]" (with a space) for an empty array, hence the regex allowing any number of spaces, including none.
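In plain Python the same continuation test is only a couple of lines; this is just an illustration of the regex, not the workspace's actual Test transformer settings.

import re

EMPTY_ARRAY = re.compile(r"\[\s*\]")  # matches "[]", "[ ]", "[   ]", ...

def has_more_pages(response_body: str) -> bool:
    # Keep looping while the body is not an empty JSON array.
    return EMPTY_ARRAY.fullmatch(response_body.strip()) is None

print(has_more_pages("[ ]"))                  # False - stop the loop
print(has_more_pages('[{"id": "3739134"}]'))  # True  - request the next page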
Posted 09-16-2024 10:20 AM

BLOG
The Problem
I'll try out a new acronym here - B2G - Business To GIS - meaning the delivery of externally managed data into ArcGIS. It is very common that external ("business") systems offer a REST API you can call to get common format data like JSON, but also very common that the server delivers a dataset in multiple chunks (i.e. "pages"), and that the JSON isn't in a well-defined dialect like GeoJSON - you have to parse it into columns, potentially from struct or array objects.
This post is about conquering the two challenges of pagination and parsing, without coding, using ArcGIS Data Interoperability.
My subject matter data is Federal Communications Commission customer complaints data about unwanted calls, filtered for 2024. If you want to see how it looks as incoming JSON click here for 1000 random records, but here's the end result as a feature service:
FCC Customer Complaints - unwanted calls
Some research reveals there are several common pagination approaches:
1. Offset and limit parameters defining the starting row position and row count, reading ordered data. This approach may include another parameter for which property sets the sort order.
2. Page number and (optionally) page size parameters.
3. Query-based pagination, where a start row is defined by a logical query, implying any next query.
4. Time-based query, a special case of query-based pagination.
5. Cursor-based pagination, where the API provides page names, including any next page, in the result.
6. Combinations of the above.
The first two approaches are the simplest (and the only ones I have used): the server does the page-building calculations and the client need only keep track of a simple counter, sending requests until the data is exhausted.
Here is an example of an offset and limit based API and here is an example of a page based API.
For an example of #6, combining #3 & #4, you can read up on querying hosted feature layers. ArcGIS takes care of building layer queries for you.
How about parsing made simple too? We have to turn records like this into typed columns and geometry!
{
"issue_type" : "Phone",
"caller_id_number" : "830-210-2001",
"state" : "IL",
"method" : "Wired",
"advertiser_business_phone_number" : "None",
"issue_time" : "8:41 am",
"issue" : "Unwanted Calls",
"zip" : "60629",
"type_of_call_or_messge" : "Live Voice",
"issue_date" : "2024-01-04T00:00:00.000",
"id" : "3739134",
"location_1" : {
"latitude" : "41.781382",
"human_address" : "{\"address\": \"\", \"city\": \"IL\", \"state\": \"\", \"zip\": \"60629-5219\"}",
"needs_recoding" : false,
"longitude" : "-87.732853"
}
}
No problem! Let's get started.
Pagination
We're calling a REST API here, and it offers a paginated response. In ArcGIS Data Interoperability that means calling the HTTPCaller transformer in a loop, in this case with offset, limit and order parameters, incrementing the offset value on each call until all data is received and the response is an empty array. You might not have known you can loop in a visual programming environment, but you can, and it's easy. The mint green custom transformer "UnwantedCallsLooper" near the top left of my workspace does the job. It is a looping custom transformer.
Parent ETL workspace
The custom transformer is embedded (included) in the Main canvas and lives in its own eponymously named tab. You create custom transformers by selecting one or more ordinary transformers in the Main canvas; a right-click context menu then offers a custom transformer creation choice. In my case I selected just a single HTTPCaller transformer.
After creating the custom transformer there were more editing steps to set up looping:
Add a loop return connection using a context menu choice in the canvas
Add a test for completion ahead of the loop return
Increment the offset parameter
Below is how my custom transformer looks. At run time, each feature arriving from the Main canvas has an offset attribute used by the HTTPCaller transformer. The limit and order parameters do not change, so they are "hard coded" in the HTTPCaller. The lower stream is the loop-until condition - if the response is not an empty array, the data isn't exhausted, so increment the offset and loop!
Looping custom transformer
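Outside of Data Interoperability, the same offset-and-limit loop can be expressed in a few lines of Python, which may help make the looping transformer's logic concrete; the URL and parameter names below are placeholders modeled on the pattern described above, not the exact FCC API parameters.

import requests

# Placeholder endpoint and parameter names, following the offset/limit pattern above.
URL = "https://example.gov/api/complaints.json"
LIMIT = 1000

def fetch_all():
    offset = 0
    while True:
        page = requests.get(URL, params={"limit": LIMIT, "offset": offset, "order": "id"}).json()
        if not page:        # empty array means the data is exhausted - exit the loop
            break
        yield from page     # hand each record downstream, like the looper does
        offset += LIMIT     # increment the offset and loop

records = list(fetch_all())
print(len(records))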
Parsing
Each API response is an array of JSON structs and is contained in an attribute named "_response_body". The array is exploded into separate features with a JSONFragmenter. The JSON Query parameter simply means we're fragmenting the top-level array, and the other settings ensure delivery of fragments that look like the code block above.
JSONFragmenter
Now we parse each fragment feature with a JSONExtractor.
JSONExtractor
The Extract Queries grid defines the output attribute names and where the data comes from in the fragment. The JSON queries follow a simple syntax you can key in, but there is a trick to getting a picker that lets you navigate the JSON structure and automates query generation. Ahead of the JSONExtractor, temporarily sample a single feature and write it to a file, then temporarily set your JSONExtractor input source to this file and you'll get a handy picker for your queries! When you have populated your extract queries, remove the Sampler and reset the input source to be the document in _response_body.
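If it helps to see the parsing outside of the transformer dialogs, here is a small Python sketch that turns one fragment like the code block above into typed columns plus a latitude/longitude pair; the field names come from the sample record and everything else is illustrative.

import json
from datetime import datetime

fragment = """{
  "issue_type": "Phone", "state": "IL", "zip": "60629",
  "issue_date": "2024-01-04T00:00:00.000", "id": "3739134",
  "location_1": {"latitude": "41.781382", "longitude": "-87.732853"}
}"""

record = json.loads(fragment)
row = {
    "id": int(record["id"]),
    "state": record["state"],
    "zip": record["zip"],
    "issue_date": datetime.fromisoformat(record["issue_date"]),
    # geometry comes from the nested location_1 struct
    "latitude": float(record["location_1"]["latitude"]),
    "longitude": float(record["location_1"]["longitude"]),
}
print(row)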
So that is pagination and parsing demystified! In the blog download there are two workspace source fmw files. CreateFC creates a file geodatabase output and is where I figured out the data processing, like fixing data errors that defeated creation of a datetime field. I wanted an initial feature class anyway, to symbolize and share as my target hosted feature service. UpdateFS borrows the data processing steps from CreateFC but contains the looping custom transformer and some logic to detect and apply data changes whenever the ETL tool is run, which is what you'll likely need in production.
Please comment in this post with any questions and observations. Page away!
Posted 09-16-2024 08:02 AM

BLOG
Here is a run with some sample feature counts.
Upsert with feature counts
Posted 09-09-2024 01:47 PM

BLOG
Thanks for the question. I edited the article to hopefully make it clearer where upsert capability is supported in ArcGIS. My worked example uses ArcGIS Data Interoperability, which can work against hundreds of data sources, but note that core geoprocessing also supports upsert in the Append geoprocessing tool, and ArcGIS Data Pipelines has an upsert capability as well.
Posted 09-09-2024 06:01 AM

BLOG
Today's featured transaction is upsert, ably assisted by delete. If you are maintaining a geodatabase or hosted feature layer (ArcGIS Online or ArcGIS Enterprise), unless you're doing something drastic like a truncate, you can package all your ETL edits up into these two methods using ArcGIS Data Interoperability (but also see the core options below). Inserts and updates travel together as upserts, while deletes speak for themselves.
For example use case data, I'm using the City of Los Angeles Open Data Address Points. If you're keen to take a look at the raw material, this link will trigger a CSV file download (161MB+, 1M+ rows).
Los Angeles address points
The map shows the result of ETL into a hosted feature layer, which is my target information product. It could also be any type of geodatabase. The raw data is typical of many ETL sources, with these properties:
1. The data is not immediately accessible to ArcGIS
2. The data schema is a little cryptic
3. The data changes frequently, but only a small fraction of a big dataset
4. Edits may be inserts, updates or deletes
5. No metadata fields exist to track edits
6. The data has a persistent primary key field
#1 & #2 above are basic ETL challenges which are easily solved. #3 suggests that upserts and deletes are candidate methods, while #4 is what makes upserts possible. Using a match key is the secret sauce for this post, but like any good cooking show the recipe comes after a look at the result!
Upsert & Delete ETL Tool
This single ETL tool supports two stages in the lifecycle of my target information product:
Creation of the feature service
Applying updates to the feature service on demand
To create the feature service, the greyed-out writer labeled Creation would be enabled and the rest of the workspace would not exist. After service creation I disabled this writer and added the rest of the workspace. The workspace still reads the source CSV file at a public URL, but also reads the target feature service (which of course must already have been created), calculates the upsert and delete transactions, and applies them.
It's a simple and powerful ETL tool, but there is critical enabling work needed outside the ETL tool. The REST API Append method that supports upsert operations requires the match key field (House_Number_ID in my case) to:
Have a unique index
Not allow null values
Meeting these conditions requires two simple processing steps outside the ETL tool before upsert processing will work. In my ETL tool there is no way to set these properties, and there are two relevant factors:
The Alter Field geoprocessing tool does not support changing the allow-nulls property for a feature layer field.
The Add Attribute Index geoprocessing tool does allow creating a unique index on a feature layer field.
To work around the allow-nulls problem I exported the initial feature layer to my project default geodatabase using the Export Features tool and, in the field map control, unset the allow-nulls property for the House_Number_ID field.
Export Features to Geodatabase
With the output feature class in Pro, I then created a map layer, taking the opportunity to apply symbology other than the pink lemonade default, then overwrote my target feature layer. With the overwritten target layer, I then added a unique index for my match key field:
Add Unique Index
Now the target layer is in shape for configuring upsert (and delete) processing! The ChangeDetector transformer is what generates the upsert and delete change sets. Here are the settings:
Change Detector Parameters
Using match attributes:
Detection Fields
Change detection comes at the expense of reading the target features in addition to the source features, but it does deliver an optimally small edit payload. For my subject matter data the whole job takes 5 minutes. To reiterate, while I'm showing a hosted feature layer as my target information product, upsert write capability is also available for geodatabase features via ArcGIS Data Interoperability.
For completeness, I'll give a shout out to two other places upsert capability is offered in ArcGIS, namely the Append geoprocessing tool and the Data Pipelines Add and Update output option. However, if you need to delete features as well as upsert them, then in core geoprocessing you'll need to use Delete Features or Delete Rows, or in Data Pipelines replace the output feature layer. This pipeline does that job for the feature layer my ETL tool operates on.
Data Pipeline Replacing Los Angeles Addresses
I'll take the opportunity here to call out that using Data Pipelines to maintain hosted feature layers in ArcGIS Online that were created by separate ETL processes is perfectly valid. In my pipeline you'll see the Map Fields tool that greatly assists connecting the schema coming from a CSV file with how I defined the schema using Data Interoperability.
So there it is, upsert is there for you, ready to go! The blog download has my Spatial ETL tool plus the associated toolbox with the models.
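If you prefer to script the unique index step instead of running the geoprocessing tool interactively, a minimal arcpy sketch follows; the layer reference mirrors the example above but is a placeholder, and the allow-nulls change still needs the Export Features field map step described in the post.

import arcpy

# Placeholder: a layer or URL referencing the target hosted feature layer.
target_layer = r"https://services.arcgis.com/<org>/arcgis/rest/services/LA_Addresses/FeatureServer/0"

# Add a unique attribute index on the match key field used for upserts.
arcpy.management.AddIndex(
    in_table=target_layer,
    fields=["House_Number_ID"],
    index_name="HouseNumIdx",
    unique="UNIQUE",
    ascending="ASCENDING",
)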
Posted 09-06-2024 12:37 PM