BLOG
At the time of writing it's the week after the 2025 Esri International User Conference, and having staffed the ArcGIS Data Interoperability and ETL Patterns topics in the Esri showcase all week, I can report that most interest in inbound data flow was around maintaining hosted feature services from external data. This post shows a technique I have blogged about before, but with an important update: how to do bulk change detection most efficiently. Read on.

Here's my subject matter data: street addresses for the City of Los Angeles, which kindly makes the data available at its open data site. The address data is maintained daily.

Beverly Hills isn't in Los Angeles

Data integration strategies include a variety of approaches, and there are generally multiple "right answers"; a good overview can be had by watching a presentation by my Esri colleagues earlier this year. For my subject matter data, all the options discussed in the presentation (ArcGIS ModelBuilder, Python scripts, Python notebooks, ArcGIS Data Pipelines and ArcGIS Data Interoperability) are valid approaches. But this is the Data Interoperability community space, so with focus on that option I have a processing tip for you.

You'll hear in the presentation linked above that maintaining a hosted feature service is very common, and minimizing downtime while doing so is important. One way is to maintain two feature services, edit one at a time, and do a source swap between them. If you are using ArcGIS Data Interoperability, however, you have another option: the ChangeDetector transformer, which can very quickly derive a transaction that includes only added, updated or deleted records - almost always a very efficient transaction. The trick, though, is getting the existing and new bulk data to the transformer so it can calculate the delta transaction. This is how I recommend doing that:

Using a web connection to drive change

The ETL workspace FMW source is in the post download. There is no escaping retrieving the new incoming data by download, but you can avoid streaming in the current state of the target feature service (which is comparatively slow) by asking the portal to generate a file geodatabase export and downloading that. The export job call requires a token, which is difficult to generate if your org is like mine and enforces multi-factor authentication (MFA). The trick is to use web service authentication in the HTTPCaller transformers that initiate the export job and, in the custom looping transformer that waits for export job completion, check job status. Web service authentication supplies a token behind the scenes, so you don't have to supply it as an HTTP URL parameter. That is the central trick behind this approach.

You'll see from the workspace image above that the daily change transaction for Los Angeles' addresses is tiny, arriving at the end of the few minutes spent reading the data. The log file (not shown) tells me 67 features took part in the transaction, which took 0.5s.
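As an aside, the portal export the workspace requests over HTTP can also be illustrated with the ArcGIS API for Python. This is only a minimal sketch of the export-and-download idea, not the workspace logic; the item ID, output name and download folder are placeholders, and your authentication scheme will differ:

```python
from arcgis.gis import GIS

# Sign in to the portal; a stored profile avoids handling credentials here (placeholder name)
gis = GIS("https://www.arcgis.com", profile="my_org_profile")

# The hosted feature layer item holding the current state of the target data (placeholder ID)
item = gis.content.get("0123456789abcdef0123456789abcdef")

# Ask the portal to build a file geodatabase export server-side, waiting for the job to finish
export_item = item.export("addresses_snapshot", "File Geodatabase", wait=True)

# Download the zipped file geodatabase locally, then remove the temporary export item
fgdb_zip = export_item.download(save_path=r"C:\temp")  # placeholder folder
export_item.delete()
print(f"Current feature service state downloaded to {fgdb_zip}")
```

The downloaded geodatabase then plays the same role as the export the workspace retrieves: the existing-data side of the ChangeDetector comparison.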
07-25-2025 08:48 AM

BLOG
Hi Sam, I saw Kieran's presentation at the Peak conference - best of show!
06-27-2025 05:53 AM

IDEA
Hear hear - I completely agree. I did most of my geoprocessing in ArcCatalog too.
06-25-2025 05:31 AM

BLOG
Problem Definition

Knowledge Graph entities (also known as nodes) are linked by relationships; entity ESRI__ID values are foreign keys in the relationship columns ESRI__OriginID and ESRI__DestID. The ID values are derived from the GlobalID data type and are system generated. Because ESRI__ID values need to exist on entities before you can use them in relationships, many people think you need to create or maintain a graph in two stages, first entities and then separately relationships, and they end up with two ETL tools to manage, or more if processing is done per combination of entity and relationship. This is unnecessary. This blog shows how you can maintain a graph using one ETL tool that writes both entities and relationships in one run.

Graph Scenario

Firstly, see the obligatory - and very busy - map! Today's graph subject matter is worldwide airport and flight data for 24 hours either side of ETL tool run time, so about half the flights are in the past and half are scheduled for the near future. Note the flight paths do not model actual aircraft routes; the graph is only intended to model connectivity. More on use case scenarios is below.

World airports and flights

The ETL Tool

The data is made available by FlightAware from their AeroAPI endpoints, accessed by this ETL tool - available in the blog download. You will need your own API key.

Single pass ETL tool for graph maintenance

Even if you don't have AeroAPI access, download and unzip the blog attachment and install the content as follows (requires ArcGIS Data Interoperability for Pro 3.5+):

Create.fmw - put this workspace source file in a Pro project home folder. Optionally create an ETL tool using the fmw as the source.
LoopingAirportsGetter.fmx - a custom transformer used by Create.fmw. Put this in your user profile folder C:\Users\<yourusername>\Documents\FME\Transformers.

There is a lot of useful material we'll cover in the tools, but to get immediately to the blog's main goal - how to write entities and relationships in one workspace - you'll see in Create.fmw that you first write entities into the graph with a FeatureWriter transformer, then use the FeatureWriter's Summary output port to trigger reading the entities back into the workspace with a FeatureReader transformer - they then have the ESRI__ID values you need. The workspace is not laid out compactly to show this sequencing, but look for the transformer named FeatureWriter that writes airports and you'll see a direct connection from its Summary port to the transformer named FeatureReader reading the airports just written. Summary ports output a single non-spatial feature with a few identifying and statistical properties after the write transaction is committed. Writing relationships can be done with ordinary Esri Knowledge Graph writers as they have no downstream dependency.

If that's all you came for today then there is no need to read further, but if you enjoy deep dives into ETL then you'll likely learn something in the rest of the post, so read on!

I'm making a graph with data sourced from an API, and one that follows standard modern practice - REST calls return paginated JSON responses, and the whole API has an OpenAPI specification. You can inspect the API at this URL and notice the link to the OpenAPI specification. Since the OpenAPI specification is available, it can be imported into an OpenAPICaller transformer, which turns HTTP call construction into a form-filling exercise. Here is the first OpenAPICaller in the workspace.

Notice I'm asking for 100 pages (1500 records) of airports data, and the header includes a tool parameter for the API key and a request to receive a JSON response. The airports schema isn't very wide, so 1500 records doesn't overload the HTTP GET response and cause errors, but I am only getting an initial record set, not all airports data.

OpenAPICaller for Airports

The API supports pagination. If a request doesn't return the last records available on the server, a JSON object named next (a URL) is available in the response; sending it as a request returns the next set of pages. This lends itself to a loop to get all data, which is what the LoopingAirportsGetter custom transformer does.

LoopingAirportsGetter
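If you'd like to see the same follow-the-next-URL pattern outside FME, here is a minimal Python sketch of the loop. The endpoint path, result array name and API key header are illustrative assumptions based on the description above, not a verified AeroAPI reference:

```python
import requests

API_KEY = "your-aeroapi-key"  # placeholder; the workspace passes the key as a header parameter
START_URL = "https://aeroapi.flightaware.com/aeroapi/airports"  # illustrative endpoint

def get_all_pages(url, params=None):
    """Keep requesting while the response advertises a 'next' URL for the following set of pages."""
    headers = {"x-apikey": API_KEY, "Accept": "application/json"}  # header name is an assumption
    records = []
    while url:
        response = requests.get(url, headers=headers, params=params, timeout=60)
        response.raise_for_status()
        payload = response.json()
        records.extend(payload.get("airports", []))  # result array name is an assumption
        url = payload.get("next")  # ready-built URL for the next pages, absent when done
        params = None              # later calls use the next URL as-is
    return records

airports = get_all_pages(START_URL, params={"max_pages": 100})
print(f"{len(airports)} airports retrieved")
```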
As the next URL is built for us, we can use a simple HTTPCaller in the custom transformer, not another OpenAPICaller. Now we have all the airports data and can write out the entity type.

If you inspect the tool, you'll see that after the airport entities are written into the graph they are read back out (with ESRI__ID values) and another OpenAPICaller gets flights for each airport. This time 50 pages of data are retrieved per call (the schema is wider), but we can have 25 calls in flight at any time because we're not paging through one large cursor on the back end, just each airport's flights.

OpenAPICaller for Flights

There are a handful of airports worldwide with more than 50 pages of flights (750 records) over 48 hours, and these are retrieved with an HTTPCaller if next is not null in the initial request's response. Note the start and end query parameters. These are UTC timestamps in ISO format, generated at tool start time by scripted parameters - so some code crept into my ETL tool! This could be done with transformers.

The Graph in ArcGIS

I'll let you surf the tool to inspect the entity and relationship construction logic, but basically the entity types are Airports (points) and Flights (2-point lines), and the relationships are: airports have departures on flights (HasDeparture), flights may have connections to other flights (HasConnection), and flights have arrivals at airports (HasArrival). The business logic used for connections is that flights are connected if an inbound flight touches down between 1 and 4 hours before the outbound flight takes off. In real life there might be other factors, like agreeing code shares, but this is just a demo!

Here is the graph data model view: airports and flights have relationships to each other, and flights have connections with other flights. The Document entity is not used.

FlightAware Graph Data Model

Now let's make an analytic query! Let's say I'm a law enforcement officer and I want to ask airlines and airports to check passenger manifests and recent video footage for a suspected jewel thief who I think left Los Angeles to travel to Berlin, or is about to do so. What airlines, flights and airports make most sense to enquire with? Of course I break out my openCypher skills and use my daily-updated graph! I'll let you step through the code, but what it does is find the shortest flying-time path between Los Angeles and Berlin-Brandenburg, to a maximum of 4 flight legs.

match path = (origin:Airports)-[:HasDeparture|:HasConnection*0..3]->(:Flights)-[:HasArrival]->(destination:Airports)
where origin.name = 'Los Angeles Intl' AND destination.name = 'Berlin-Brandenburg'
with path, nodes(path) as flights
unwind flights as flight
with path, sum(case when flight:Flights then flight.filed_ete else 0 end) as totalDuration
return path, (totalDuration/3600) as totalDuration
order by totalDuration
limit 1

A path is returned from my query...

Los Angeles to Berlin

The route is Los Angeles to Berlin via John F Kennedy Intl and London Gatwick. Here is the path added to a map in ArcGIS Pro:

Los Angeles to Berlin Route

Now I can contact my law enforcement colleagues worldwide with a focused request!

Discussion

This demonstrates both the single-tool approach and a lot more besides around using API data and Knowledge Graphs. The ETL workspace in the download is in a ready-to-run state, assuming you have an API key and have already built the graph before - in that case it is updated. The tool can be run using ArcGIS Pro's regular tool scheduling feature, say every day. You will not be in this state to begin with, but you'll see some Creator transformers that can be used to run the tool manually in parts, say to create the Airports entities. Work this way by temporarily disabling Creators or other transformers in streams you don't want. I created the empty relationships manually, as ArcGIS Knowledge supports specifying the origin and destination for relationships (so it can display the data model). Comment on this post if you have questions or observations. Have fun with your graph ETL!
06-24-2025 10:15 AM

IDEA
I asked around; it looks like Data Interoperability can't be imported into ArcPy. This is what is available: https://doc.arcgis.com/en/arcgis-online/reference/use-arcpy-in-your-notebook.htm You could call a web tool, though.
06-24-2025 05:40 AM

IDEA
As far as I know, provided the ArcGIS Server machine hosting the notebook has Data Interoperability installed and licensed, it should be available to the advanced runtime. Best to check with arcpy.CheckExtension('DataInteroperability') before you invest a bunch of time in it - please let this thread know!
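For anyone following along, a minimal version of that check in a notebook cell might look like this (the extension code string follows the comment above; checking it out afterwards is my addition):

```python
import arcpy

# Report whether the Data Interoperability extension is available to this Python runtime
status = arcpy.CheckExtension("DataInteroperability")
print(f"Data Interoperability extension: {status}")

# If available, check it out before running any Data Interoperability tools
if status == "Available":
    arcpy.CheckOutExtension("DataInteroperability")
```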
06-23-2025 05:30 AM

IDEA
Hi Oiligriv, depending on what you want to do, this might already be offered. If you want to run a web tool on the same server, you should be able to import the toolbox and run it. If you want to schedule a tool in a local toolbox, this should also work. If you can share more details, we'll be able to respond.
06-21-2025 07:40 AM

BLOG
Hi, the notebook imports ArcPy and therefore requires an advanced Notebook.
06-16-2025 06:17 AM

BLOG
@Youssef-Harby Hi, thanks for reading my post and for the comment! The notebook attached to this post reads remote GeoParquet and writes to a project home file geodatabase feature class, so it isn't an in-memory view of the data. I have another blog that does create in-memory data, including from GeoParquet (it requires the ArcGIS Data Interoperability extension). Core ArcGIS Pro caches GeoParquet behind the scenes, so it isn't an in-memory experience either. There are many Arrow fans at Esri; you can see how to use the format via Python like here and here. We're still pretty early in the GeoParquet journey in ArcGIS. For example, how nested data types are going to be used to push complex data models down into the base table (for sharing purposes) is a topic that interests me, and so is how people might use GeoParquet to share constantly evolving big data managed in a branch versioned enterprise geodatabase - so the more feedback from customers the better!
05-19-2025 06:52 AM

BLOG
With the release of ArcGIS Pro 3.5, the stars align a little more when it comes to the use of GeoParquet. You can now work with local GeoParquet files for your mapping and analysis needs, and it is also much easier to ingest big GeoParquet data from an S3-API-compliant object store! This post is about how simple it is to bring remote GeoParquet data into your project. The enabling technology is DuckDB, now included in the default Python environment in ArcGIS Pro 3.5 - no more package management just for this spectacularly useful client technology.

Here is an example: the entire Overture Maps Foundation divisions dataset, accessed from their AWS S3 object store and written to my project home geodatabase.

Overture Divisions

Automation is key to GIS happiness, so to access this data I created a simple notebook which you can find in the post download. You'll need ArcGIS Pro 3.5 to run it, or an earlier release with your Python environment extended with DuckDB 1.1+. It takes me about 6 minutes to download the 1M+ features to my project home geodatabase, but a big chunk of that is taken up by a couple of best-practice steps, namely sorting the features on area (descending) and repairing any geometry issues. The sort step is so small features display on top of large features; the geometry repair is commonly needed for vertex-rich data that "tiles the plane" like these divisions do. The lift and shift itself is fast.

I'll let you inspect the notebook for yourselves, but note the option to apply an attribute or spatial filter on the features you download, for example a bounding box in lat/long or the name of a country. Instead of manually downloading a set of very large parquet files from S3, you now have a simple tool to go get what you want, any time you like!
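If you just want a feel for the remote-read step before opening the notebook, here is a rough DuckDB sketch. The S3 release path and column names are indicative of the Overture divisions layout rather than copied from the notebook, and the sort, repair and geodatabase-write steps described above are omitted:

```python
import duckdb

con = duckdb.connect()
# httpfs enables reading straight from S3; spatial adds geometry functions
con.execute("INSTALL httpfs; LOAD httpfs;")
con.execute("INSTALL spatial; LOAD spatial;")
con.execute("SET s3_region='us-west-2';")

# Indicative Overture S3 prefix; check the current release name before running
parquet_path = "s3://overturemaps-us-west-2/release/2025-03-19.0/theme=divisions/type=division_area/*.parquet"

# Example spatial filter: a rough lat/long bounding box around New Zealand
divisions = con.sql(f"""
    SELECT id, names['primary'] AS name, ST_GeomFromWKB(geometry) AS geom
    FROM read_parquet('{parquet_path}', hive_partitioning=1)
    WHERE bbox.xmin > 165 AND bbox.xmax < 180
      AND bbox.ymin > -48 AND bbox.ymax < -34
""").arrow()

print(divisions.num_rows, "divisions retrieved")
```

From there, the notebook's remaining work is writing the result into the project home geodatabase.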
05-14-2025 01:24 PM

POST
Thanks for the clarification, Thomas; we'll look into making this behavior easier to use.
05-14-2025 06:04 AM

POST
Thomas, while we're looking at this, to get going you can explicitly set the interpreter: https://community.safe.com/general-10/how-do-i-direct-fme-to-use-the-active-cloned-python-environment-in-arcgis-pro-instead-of-the-default-and-unmodifiable-environment-24257
05-13-2025 12:53 PM

POST
Hi Thomas, I also see the default environment path logged, but in my case the cloned environment is available; let us look into this. Back ASAP.
05-13-2025 05:47 AM

POST
Hi Matt, yes this can be done. Create a layer file, then use the Esri ArcGIS Layer reader, set to point at the LYR or LYRX file. Definition queries are honored; it's a really great workflow.
05-09-2025 08:06 AM