
ETL Pattern: Avoid Serializing Bulk I/O

by Esri Community Manager

If you're synchronizing an external data source into a hosted feature layer in ArcGIS Online or ArcGIS Enterprise, you want each periodic changeset write to be as efficient as possible.  Once your datasets reach the hundreds of thousands or millions of features, calculating the optimal insert, update, and delete transactions becomes very attractive, minimizing downtime and the risk of transaction failure.
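As a concrete illustration of what "calculating the optimal transactions" means, here is a minimal sketch of changeset derivation. It assumes both datasets share a unique key field (the `asset_id` name below is a placeholder, not anything from the post):

```python
def derive_changeset(source_rows, target_rows, key="asset_id"):
    """Compare a source extract against the current target rows and
    return the minimal (inserts, updates, deletes) transaction sets."""
    src = {r[key]: r for r in source_rows}
    tgt = {r[key]: r for r in target_rows}
    inserts = [src[k] for k in src.keys() - tgt.keys()]   # only in source
    deletes = [tgt[k] for k in tgt.keys() - src.keys()]   # only in target
    updates = [src[k] for k in src.keys() & tgt.keys()    # in both, changed
               if src[k] != tgt[k]]
    return inserts, updates, deletes
```

Writing only these three sets, rather than truncating and reloading, is what keeps downtime and failure risk low on large layers.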

The problem is that deriving edit transactions requires reading the target hosted layer, and this is an expensive operation.  This blog shows an approach that avoids querying your target feature layer: instead, a file geodatabase export of the data is triggered, automatically downloaded locally, then read performantly to find the changeset, without incurring any extra item storage.
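The export-download-delete cycle can be sketched with the ArcGIS API for Python. This is a hedged outline, not the post's actual tool (which is built in ArcGIS Data Interoperability): the `gis` object, the item ID, and the export title are placeholders, and reading the downloaded file geodatabase is omitted.

```python
import tempfile

def build_export_request(title):
    """Arguments for Item.export(); 'File Geodatabase' is the format
    used here so the layer's data can be read locally.  wait=True asks
    the API to block until the export job finishes."""
    return {"title": title, "export_format": "File Geodatabase", "wait": True}

def export_and_download(gis, item_id, title="changeset_snapshot"):
    """Export a hosted layer's item to a file geodatabase, download it
    to a temp folder, then delete the export item so no extra item
    storage remains in the portal."""
    item = gis.content.get(item_id)          # the target hosted layer item
    exported = item.export(**build_export_request(title))
    path = exported.download(save_path=tempfile.mkdtemp())
    exported.delete()                        # remove the temporary export item
    return path                              # local zip containing the .gdb
```

Note that `wait=True` hides the polling; the post's ETL tool instead polls the job status itself in a loop, which gives it control over the check interval.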

Here is the approach in action:

Avoiding feature layer query

In an earlier post I showed an example of deriving upsert and delete transactions by reading a target feature layer, which is a serialized approach.  My target layer has over 1M features, and reading it takes several minutes.  The approach above does better against the same service.

The post download has a toolbox containing the Pro 3.3 ArcGIS Data Interoperability ETL tool - see BulkUpsert2.  The tool reads a CSV file at a URL (also avoiding serialization, which is the default for the server), but it also accesses portal content to trigger a file geodatabase export, waits in a loop until the export is finished, then downloads and reads the exported target feature layer content (not the service), and finally deletes the export item (the downloaded file geodatabase is also automatically deleted).  This is the core functionality.

There are two mint green custom transformers in the tool.  An EsriOnlineTokenGetter retrieves a token needed for a later HTTP call (if you are working with an Enterprise portal you would use an EsriPortalTokenGetter instead).  The other custom transformer is a looping transformer you can inspect in its tab, labeled FeatureServiceExportLooper.  It checks the export job status every 2 seconds until the job is complete.  Note that how long an export job takes depends on how big your service is and also how busy the host portal is - I have seen ArcGIS Online queue jobs as well as run them immediately.  Here is a run taking a little over a minute (just for the export) at a busy time in ArcGIS Online:

FeatureServiceExportLooper

The net result, however, is a significant gain.  Here is a screen capture from the earlier blog and the serialized approach - note the session duration.

Reading the target feature layer takes longer

So that's how to avoid reading hosted feature layers with a serialized approach.  Now you have an alternative to time-consuming changeset construction!
