|
BLOG
|
Continuous Integration / Continuous Delivery (CI/CD) is a software development methodology. I'm borrowing the term and applying it to data, not software: specifically, data that endlessly changes over time and that you need to integrate into your systems of record, well, continuously! Always one to jump in the deep end, my example takes a feed of public transport vehicle positions (accessed via a REST API) that refreshes every 30 seconds and pushes the data through to a hosted feature service in an ArcGIS Enterprise portal and also into a spatially enabled table in Snowflake. I will use a web tool for the processing, as high availability is obviously advisable.

Here are bus, train and ferry positions a few moments ago (early Saturday local time) in Auckland, NZ.

Figure: Bus, train & ferry positions in Auckland

I'm probably working at the extreme end of continuous bulk data integration frequency. I expect the vast majority of integrations are performed at intervals of hours or days, but at least you'll know what can be achieved.

Bear with me while I have some fun with my integration scenario 😉. Let's say I work at Fako (a fictional name), who have solved the first-mile/last-mile problem of e-commerce. Fako has identified that commuter networks are very efficient for bringing people and retail goods together, buying or selling, when the riders are the buyers and sellers. In partnership with transport operators we remove a few rows of seats in each vehicle, both sides of the aisle, and replace them with grids of smart storage lockers to which internet shopping can be 'delivered' by our stevedores at our warehouses co-located with transport terminals. Buyers order from any web site for delivery on a day and route; sellers sell on our website and we transfer items to routes anywhere on the network. We're freight forwarders. Fako's mobile app lets customers use their phone to unlock the locker their item is in at any time during their trip.
Some lockers are refrigerated; we have our own meal kit line. A very popular feature of our mobile app lets riders bid in auctions for abandoned items. Fako pays transport operators the equivalent of a rider fare per item, greatly boosting their effective ridership. Fako doesn't have to buy a fleet of delivery vehicles and the transport operators get increased revenue. Business is booming!

Fako's back end systems run on Snowflake. To make everything work, Fako needs to maintain the network status continuously as spatially enabled Snowflake objects. Let's see how!

First, the boring way, for which I happen to have disqualified myself by choosing a frequency Windows scheduled tasks don't support. This would be to copy a Spatial ETL workspace source FMW file onto my Data Interoperability server and configure a scheduled task based on the command line documented in a log file from a manual run as the arcgis user:

Command-line to run this workspace:
"C:\Program Files\ESRI\Data Interoperability\Data Interoperability AO11\fme.exe" C:\Users\arcgis\Desktop\ContinuousIntegration\VehiclePositions2Snowflake.fmw

You should carefully consider this option for your situation; it is robust and simple.

Now for the non-boring way. Like I said, I pushed myself in this direction by working with data that updates in bulk at high frequency. I make a web tool that performs the integration, then calls itself after waiting for the source data to update. Whoa, a web tool that calls itself, with no webhook and no scheduling? It's crazy simple (possibly also just crazy).

I made two Spatial ETL tools, one real and one a dummy that does nothing but has the same name and parameters (none in this case). I shared a history item of the dummy version of VehiclePositions2Snowflake as a web tool and recorded the submitJob URL.
It is important that the web tool be asynchronous, so that when it gets called it doesn't block the workspace waiting for a response:

https://dev99999.esri.com/server/rest/services/VehiclePositions2Snowflake/GPServer/VehiclePositions2Snowflake/submitJob

Then edit the real ETL tool, in the final HTTP step, to call the submitJob URL. Run the tool and, from its history item, overwrite the dummy web tool. I'll let you surf the tool yourself, but basically the upper stream fetches the vehicle data and synchronizes it to the portal and Snowflake, and the lower stream waits for this and for 30 seconds to elapse, then makes the HTTP call.

Figure: The Self Integrating ETL Tool

Then just run the web tool once manually and you're off to the races; it will repeat endlessly. I'm sitting here refreshing my Pro map (no cache on the feature service layer) and seeing the transport fleet move around. In Snowflake my data is also refreshing:

Figure: Snowflake console

Back to some boring details. Don't forget that when publishing Spatial ETL tools as web tools, Data Interoperability must be installed and licensed on each tool hosting server. When using web connection or database credentials like I am here, go to the Tools > FME Options dialog and export the required credentials (right click for the menu) to XML files, put these on your server and import them into the Workbench environment as the arcgis service owner. If manually running a workspace on the server you might have to change the Python environment too. Lastly, while the blog download has an FMW file, the tool you publish to your server should have an embedded source.

Now that was fun!
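For readers who think in code rather than workspaces, the self-calling pattern can be sketched in Python. This is only an illustration of the idea, not the contents of the ETL tool: the function names, the `integrate` callable and the injectable `send`, `sleep` and `clock` arguments are hypothetical, and the wait is interpreted here as 30 seconds measured from the start of the run. The only thing taken from the service itself is that submitJob is asynchronous and can return JSON when `f=json` is sent; a secured service would also need a token, which is assumed away by default.

```python
import json
import time
import urllib.parse
import urllib.request

# The submitJob URL recorded when the dummy web tool was shared.
SUBMIT_JOB_URL = (
    "https://dev99999.esri.com/server/rest/services/"
    "VehiclePositions2Snowflake/GPServer/VehiclePositions2Snowflake/submitJob"
)


def build_submit_request(url, token=None):
    """Build a POST request asking the geoprocessing service for a JSON reply."""
    params = {"f": "json"}
    if token:  # assumption: only needed if the service is secured
        params["token"] = token
    data = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


def run_once_then_resubmit(integrate, url, wait_seconds=30,
                           send=urllib.request.urlopen,
                           sleep=time.sleep, clock=time.monotonic):
    """Run one integration pass, wait out the refresh interval, then resubmit.

    `integrate` is a hypothetical callable standing in for the upper stream
    (fetch vehicle positions, synchronize the portal and Snowflake).
    """
    started = clock()
    integrate()
    # Wait until both the integration is done and wait_seconds have elapsed.
    remaining = wait_seconds - (clock() - started)
    if remaining > 0:
        sleep(remaining)
    # submitJob is asynchronous, so this returns as soon as the job is queued.
    with send(build_submit_request(url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `run_once_then_resubmit(my_integration, SUBMIT_JOB_URL)` once would play the role of the single manual run; each submitted job then queues the next one. Because nothing outside the tool schedules it, stopping the loop means cancelling the running job or stopping the service.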
04-16-2021
01:27 PM
|
4
|
0
|
3828
|
|
IDEA
|
Thanks for this idea, Phil. On re-reading it, I'm not sure I can read all attributes and geometry of deleted features given a GlobalID. This is possible with versioned enterprise geodatabases but, as far as I know, not with feature services. I'll check.
03-09-2021
12:19 PM
|
0
|
0
|
4485
|
|
BLOG
|
A colleague brought me this problem: a utility customer with large, versioned enterprise geodatabases wished to maintain what amounts to replicas synchronized daily (overnight). Geodatabase replication was not feasible (I took his word for this, something to do with geometric networks, and in any event it would definitely be the case if the target was a feature service). The target was to be in Web Mercator and not the source low-distortion coordinate system.

Normally I relish every opportunity to pull a ChangeDetector transformer out of my hat, as it is a very flexible, fast way to derive INSERT, UPDATE & DELETE change sets that are then efficient to write. The problem in this case though was data scale: reading the data into my ETL workspace would take hours. (Please don't take this as a general statement about Data Interoperability ETL workspaces; it's just that enterprise geodatabases are busy things and reading very large batches of data can take time, and this database had over 10 million features.)

Time to visit a little known feature of Data Interoperability's enterprise geodatabase reader (GEODATABASE_SDE short name) - reading version differences. Here is how to use it.

You're going to need a version that is a direct child of Default and that you never edit. It doesn't matter what other versions you have, but only edits posted to Default will propagate to the mirror. I call my child version 'Deltas'. Your initial target system (geodatabase, database or feature service) must be a copy of Default. Your daily workflow is to get your edits into Default, then compare version differences with Deltas. If you think about it, after a daily edit post Deltas is a view of the database one day before Default. That means it's 'older', or an 'ancestor' of sorts (it made my head spin too at first that a child is a logical ancestor, but bear with me).
When adding the geodatabase reader to your ETL workspace, set the Read Version Differences property and use a connection to Deltas as the transactional common ancestor. I know, it's weird, but it works. When the reader executes, the features returned will be in the context of the edits needed to make Deltas look like Default, and will have a format attribute fme_db_operation set to INSERT, UPDATE or DELETE. Now all you have to do is apply the differences.

I'll walk you through the sample workspace pictured below. The reader uses the option to merge all selected feature types using a * wildcard. This lets you use a single reader for the entire set of versioned feature classes - did I mention how powerful this option is? There is always a format attribute fme_feature_type available to let you see which source feature class anything came from. The AttributeExposer lets me access fme_db_operation and an attribute FACILITYID, a unique key within each feature type that lets me support update operations. If you don't have such a key field you will need to handle updates as delete/add pairs, which I haven't modeled here.

Inserts and updates can go directly to the target database or feature service, but deletes are a bit tricky. The FeatureReader reads one Deltas feature at a time with a SQL where clause that selects by ObjectID, and you must also set the accumulation mode to Merge Initiator and Result to get the attributes from the deleted features onto the feature (they come through as null otherwise - they were deleted!). Then the data goes to the writer, which has FACILITYID set as the match column.

After your synchronization completes, the simplest way to be ready for the next day's run is to drop and recreate the Deltas version. You could automate this with a shutdown script like in this article.

In the customer system about 2000 edits across dozens of feature classes were 'posted' to the target in less than 10 minutes. A sample ETL workspace is in the blog download.
Comment here for any clarifications.
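If it helps to see the apply step as logic rather than as a workspace, here is a minimal Python sketch. It is not the FME API and not the sample workspace: plain dicts stand in for features, the `apply_changes` function and the in-memory `target` are hypothetical, and only the attribute names fme_db_operation, fme_feature_type and FACILITYID come from the description above.

```python
def apply_changes(target, changes, key="FACILITYID"):
    """Apply INSERT/UPDATE/DELETE change features to an in-memory target.

    `target` maps (feature type, key value) to a dict of attributes, standing
    in for the mirror database or feature service. `changes` is an iterable of
    dicts, each carrying fme_db_operation, fme_feature_type and the key field.
    """
    control = {"fme_db_operation", "fme_feature_type", key}
    counts = {"INSERT": 0, "UPDATE": 0, "DELETE": 0}
    for feature in changes:
        operation = feature["fme_db_operation"]
        if operation not in counts:
            raise ValueError(f"Unexpected operation: {operation!r}")
        # The key is unique only within a feature type, so match on both.
        row_key = (feature["fme_feature_type"], feature[key])
        if operation == "DELETE":
            # Only possible because the key was recovered for the deleted row.
            target.pop(row_key, None)
        else:
            # INSERT and UPDATE both write the full set of attributes.
            target[row_key] = {
                name: value for name, value in feature.items()
                if name not in control
            }
        counts[operation] += 1
    return counts
```

The DELETE branch shows why the extra FeatureReader step matters: without the key value read back from Deltas, there would be nothing to match the deleted row on in the target.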
03-03-2021
09:25 AM
|
2
|
0
|
2012
|
|
POST
|
Well, you can, but it would mean extracting data from Redshift to do it, not a direct read.
03-01-2021
08:11 AM
|
1
|
0
|
7631
|
|
POST
|
The Data Interoperability extension supports Redshift now and will add Redshift Spatial (both read and write) at Pro 2.8.
03-01-2021
07:43 AM
|
1
|
3
|
7638
|
|
POST
|
This tool wraps the Data Interoperability function in a Python script tool: https://pm.maps.arcgis.com/home/item.html?id=834e3ba8034e4e7f83d9fc4fcfb5713c
02-23-2021
08:28 AM
|
0
|
0
|
2470
|
|
BLOG
|
Good to hear, but by the way, your screen shot of 10.8.1 for Desktop above relates to the ArcMap install, but I could see the Pro install was OK as the Analysis ribbon commands are active.
02-10-2021
04:57 AM
|
0
|
0
|
5698
|
|
BLOG
|
I replaced the .pth file with one that expects Python 3.7 in Pro 2.7. Please download the zipfile again and your process should work.
02-08-2021
11:55 AM
|
0
|
0
|
5710
|
|
BLOG
|
Hi, please check that you have the Data Interoperability extension installed and licensed; the import error looks like a missing install or license issue. As your real goal is to detect changes in a feature service, it may be more efficient to use a Spatial ETL tool directly in a model. If you are able to share some sample data I can show you how that would work.
02-08-2021
05:09 AM
|
0
|
0
|
5724
|
|
POST
|
You should be OK then, but in case you get into any arguments with geodesists, make sure you record that the heights are AGL. If the CSV data is available electronically somehow (say FTP, HTTP, email, cloud storage, etc.), then you can automate download and processing end to end with Data Interop.
01-28-2021
09:05 AM
|
1
|
0
|
4944
|
|
POST
|
I was thinking of things like visibility analysis; constructing your 3D geometry is OK with VertexCreator, you just need 'ground' to visualize it. Z from the AGL values is OK. There are specialist solutions for radio propagation, which I don't know anything about, but Esri has a specialist if you need one.
01-28-2021
08:42 AM
|
0
|
2
|
4951
|
|
POST
|
If you have AGL heights then you'll need a ground surface for any analysis.
01-28-2021
08:20 AM
|
0
|
4
|
4959
|
|
POST
|
Adam, WGS84 elevations are defined in meters above/below the ellipsoid, so convert feet to meters. The exception is if your Z values are from the center of the earth, in which case you'll additionally need to do a vertical datum transformation with CsMapReprojector. A wider issue is using WGS84 at all: if you're doing anything involving visibility, density or area, a local projected coordinate system would be better.
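The unit conversion itself is simple arithmetic, sketched here in Python. It assumes the source heights are in international feet (exactly 0.3048 m); US survey feet would need a slightly different factor. The function name is hypothetical, and no vertical datum transformation is performed.

```python
# Assumption: heights are in international feet, defined as exactly 0.3048 m.
FEET_TO_METERS = 0.3048


def feet_to_meters(z_feet):
    """Convert a height in feet to meters (units only, no datum change)."""
    return z_feet * FEET_TO_METERS
```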
01-28-2021
08:08 AM
|
0
|
6
|
4967
|
|
POST
|
Adam, you can chain two NullAttributeMapper transformers to do this: in one change 'Yes' to a new value 1 and in the other 'No' to 0. Somewhat unhelpfully, BulkAttributeMapper is an alias name for the transformer.
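For comparison, the same value mapping expressed as plain Python might look like the sketch below. This is not how the transformer is implemented; the names are hypothetical, and leaving any other value unchanged is an assumption about the behavior you want.

```python
# Hypothetical lookup table: the two replacements described above.
YES_NO_TO_INT = {"Yes": 1, "No": 0}


def map_yes_no(value):
    """Return 1 for 'Yes', 0 for 'No', and any other value unchanged."""
    return YES_NO_TO_INT.get(value, value)
```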
01-25-2021
05:46 AM
|
1
|
2
|
3366
|