Using Data Interoperability to sync data to ArcGIS Online

05-28-2020 09:39 AM
JSchroeder
Esri Contributor

What do you do when you've found something so beautiful that you have to have a copy of it? Turn to ArcGIS Pro and the Data Interoperability extension, of course! The Data Interoperability extension (from here on, shortened to Data Interop) is a very powerful tool for ArcGIS Pro and can be used to simplify your data ETL (Extract, Transform, Load) processes across the ArcGIS platform. The true power of Data Interop is the ability to work within a ModelBuilder-like environment, meaning that you do not need to write any code to work with over 400 different data formats. Chances are that if you are working with an obscure data type, it is supported; see the list of 400+ supported formats.

I’m not going to go into detail about obscure data formats. Instead, I'm going to highlight a very common workflow that many of the organizations I work with can take advantage of: syncing data from an Enterprise geodatabase (GDB, aka ArcSDE) to hosted feature layers (ArcGIS Online or ArcGIS Enterprise). This is especially useful with Survey123, because Survey123 does not support writing to traditional versioned datasets and generally works best with hosted feature layers.

WHY DATA INTEROPERABILITY

A common application for this type of workflow is hydrant inspections. Many organizations use Survey123 for hydrant inspections because the solution is simple and supports related records, which keep an archive of inspection data. If your hydrants are in a versioned feature class, Survey123 cannot work with them. We could manually export the hydrants from our GDB, but of course that copy becomes outdated over time. This workflow keeps your versioned (or unversioned) GDB data in sync with your hosted feature layer(s) in ArcGIS Online. Note that this workflow works just as well with a file GDB, although let’s have a more serious discussion if you are using personal GDBs or shapefiles as the source for your authoritative data.

Another common application for Data Interoperability is keeping ArcGIS Online Open Data in sync with your authoritative data. If your Open Data originates from non-GIS datasets, this is even more of a no-brainer, and ArcGIS Online supports standalone tables without any geography, which makes such data easy to share with the public.

ALTERNATIVE TO DATA INTEROPERABILITY

I recommend that your first option for syncing data to ArcGIS Online be Distributed Collaboration, which is supported at ArcGIS Enterprise 10.5.1 and later. The benefits are that scheduling is set up automatically in your production Enterprise environment, and you can sync on demand. However, there are some requirements that may rule it out for some organizations. First, you need access to an ArcGIS Enterprise Base Deployment (10.5.1 or later). You also need to enable the sync capability on your Enterprise feature service. If this workflow works for you and your data, then stop reading and finally start that 3D building model of your city that you always dreamed of.

If this workflow does not work for you, then I suggest firing up Data Interop with ArcGIS Pro. Attached to this post are a sample workspace (.fmw file) embedded within an ArcGIS Pro Package, plus a couple of introductory videos explaining the process. Data Interop workspaces can be imported into a toolbox as a Spatial ETL tool and, new at ArcGIS Pro 2.5, scheduled to run at the frequency and recurrence of your choice (hourly, daily, weekly, every other week, etc.).

HOW

The workspace is quite simple, and I will discuss its important aspects. There are two readers, three transformers, and one writer. The first reader connects to ArcGIS Online to read the hosted feature layer that I originally published from the file GDB. The second reader connects to the file GDB, which is our authoritative dataset (in practice it will more commonly reside in an Enterprise GDB, aka ArcSDE). The transformers are an AttributeManager, a DateTimeConverter, and a ChangeDetector. The ChangeDetector is straightforward: we instruct it to detect changes in attributes, geometry, or both.
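The comparison the ChangeDetector performs can be pictured with a small Python sketch. It only illustrates the idea and is not the transformer's actual implementation. It assumes each reader's output has been collected into a dictionary keyed by GlobalID, holding an attribute dictionary and a WKT geometry string; `detect_changes` and that record layout are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: GlobalID -> (attribute dict, geometry as WKT text)
Feature = Tuple[Dict[str, object], str]


def detect_changes(
    original: Dict[str, Feature],  # e.g. features read from ArcGIS Online
    revised: Dict[str, Feature],   # e.g. features read from the file GDB
    compare_geometry: bool = True,
) -> Tuple[List[str], List[str], List[str]]:
    """Return (inserted, updated, deleted) GlobalIDs, one list per output port."""
    inserted = sorted(k for k in revised if k not in original)
    deleted = sorted(k for k in original if k not in revised)
    updated = []
    for key in sorted(original.keys() & revised.keys()):
        old_attrs, old_geom = original[key]
        new_attrs, new_geom = revised[key]
        attrs_changed = old_attrs != new_attrs
        geom_changed = compare_geometry and old_geom != new_geom
        if attrs_changed or geom_changed:
            updated.append(key)
    # Features that match on both attributes and geometry go to no port at all.
    return inserted, updated, deleted


online = {"{A}": ({"STATUS": "OK"}, "POINT (1 1)"),
          "{B}": ({"STATUS": "OK"}, "POINT (2 2)")}
gdb = {"{A}": ({"STATUS": "OUT"}, "POINT (1 1)"),
       "{C}": ({"STATUS": "OK"}, "POINT (3 3)")}
print(detect_changes(online, gdb))  # (['{C}'], ['{A}'], ['{B}'])
```

Passing `compare_geometry=False` corresponds to asking for attribute-only change detection.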

[Figure: Data Interoperability Sync Workspace (2 Readers, 3 Transformers, 1 Writer)]

The DateTimeConverter converts the file GDB dates into a compatible format. The dates in the file GDB are read without a time component (“20080903”, i.e. September 3, 2008), while the ArcGIS Online reader returns them with one (“20080903000000”, i.e. September 3, 2008 @ 12 AM). The ChangeDetector treats these values as different, so we use the DateTimeConverter to output the %Y%m%d%H%M%S format (shown as 20080805000000 vs 20080805). Your dates and times may not need formatting, but if they do, the DateTimeConverter is very useful. You can also add or subtract hours with the DateTimeCalculator if your times are in different zones (ex: UTC vs Central Time Zone).
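To see why the two date strings never match until they share one format, here is a plain-Python sketch of the same normalization using `datetime.strptime` and `strftime` with the format strings from the text. It is an illustration only, not how the DateTimeConverter works internally. The function names are hypothetical, and the fixed -5 hour offset is an assumed UTC-to-Central (daylight time) shift that ignores daylight-saving rules.

```python
from datetime import datetime, timedelta

FULL = "%Y%m%d%H%M%S"  # target format, e.g. 20080903000000


def to_fme_datetime(value: str) -> str:
    """Normalize a date-only value (%Y%m%d) to %Y%m%d%H%M%S.

    Values that already carry a time are parsed and re-emitted unchanged,
    so dates from both readers end up in one comparable format.
    """
    fmt = "%Y%m%d" if len(value) == 8 else FULL
    return datetime.strptime(value, fmt).strftime(FULL)


def shift_hours(value: str, hours: int) -> str:
    """Add or subtract whole hours, the kind of offset a DateTimeCalculator applies."""
    return (datetime.strptime(value, FULL) + timedelta(hours=hours)).strftime(FULL)


print(to_fme_datetime("20080903"))        # 20080903000000
print(to_fme_datetime("20080903000000"))  # 20080903000000
print(shift_hours("20080903000000", -5))  # 20080902190000
```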

The AttributeManager transformer changes the ArcGIS Online GlobalID to UPPER case, because it is read in as lower case (“{@UpperCase(@Value(GlobalID))}”). Finally, an ArcGIS Online writer takes the output from the ChangeDetector's Updated, Inserted, and Deleted ports and uses it to add, update, or delete the corresponding features in ArcGIS Online. This writer needs the GlobalID from the file GDB in order to process updates and deletes. The ChangeDetector sets an attribute called "fme_db_operation" on each feature, which tells the ArcGIS Online writer which operation to perform.
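The following Python sketch illustrates two points from the paragraph above: why a lower-case GlobalID breaks the match, and how a per-feature fme_db_operation value lets one writer apply inserts, updates, and deletes together. The helper names and the use of plain dictionaries as features are hypothetical; in the workspace, the ChangeDetector sets the attribute itself.

```python
from typing import List


def normalize_global_id(value: str) -> str:
    # Same effect as {@UpperCase(@Value(GlobalID))} in the AttributeManager.
    return value.upper()


def tag_operations(inserted: List[dict], updated: List[dict],
                   deleted: List[dict]) -> List[dict]:
    """Copy each feature and record which port it left on as fme_db_operation."""
    tagged = []
    for operation, features in (("INSERT", inserted), ("UPDATE", updated),
                                ("DELETE", deleted)):
        for feature in features:
            tagged.append({**feature, "fme_db_operation": operation})
    return tagged


online_id = "{3f2504e0-4f89-11d3-9a0c-0305e82c3301}"  # as read from ArcGIS Online
gdb_id = "{3F2504E0-4F89-11D3-9A0C-0305E82C3301}"     # as read from the file GDB
print(online_id == gdb_id)                       # False: a delete plus an insert
print(normalize_global_id(online_id) == gdb_id)  # True: matched as one feature

ops = tag_operations([{"GlobalID": "{C}"}], [{"GlobalID": "{A}"}],
                     [{"GlobalID": "{B}"}])
print([f["fme_db_operation"] for f in ops])  # ['INSERT', 'UPDATE', 'DELETE']
```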

This workflow becomes more beneficial as the size of the dataset grows. However, if you are writing a smaller dataset and don’t want to spend the processing power determining which features have been updated, inserted, or deleted, there is a much simpler option: overwrite all features. Whether that is practical depends on the size and number of features in your dataset and the time it takes to rewrite them to ArcGIS Online. If you have a large dataset containing many thousands of features, or polylines and polygons with many vertices, you might not want to spend the time it takes to delete and recreate every feature. Writing only the changes makes the process much quicker and lighter on Internet bandwidth, and the less data we send over the Internet, the more reliable our ETL process will ultimately be. If overwriting all features works for your workflow, you can remove the ArcGIS Online reader (there is nothing to compare, since all data will be overwritten) and the three transformers, and change the writer settings to “INSERT” and “Truncate First: Yes”. These settings delete all features in the hosted feature layer and then append all features from your source dataset, effectively recreating the dataset. This is considered a heavy online process because it copies all data, not just the changed data.
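A quick way to see the trade-off is to count the operations each strategy sends. The Python sketch below models a writer set to INSERT with Truncate First: Yes against a change-only write. The operation tuples, function names, and the 10,000-hydrant / 12-edit figures are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Optional, Tuple

Op = Tuple[str, Optional[str]]  # (operation, GlobalID)


def truncate_and_load(source_ids: Iterable[str]) -> List[Op]:
    """Writer set to INSERT with Truncate First: Yes: clear, then append everything."""
    return [("TRUNCATE", None)] + [("INSERT", gid) for gid in source_ids]


def change_only(inserted: Iterable[str], updated: Iterable[str],
                deleted: Iterable[str]) -> List[Op]:
    """Only the features the change detection flagged are sent."""
    return ([("INSERT", g) for g in inserted]
            + [("UPDATE", g) for g in updated]
            + [("DELETE", g) for g in deleted])


# Hypothetical layer: 10,000 hydrants, 2 new and 10 edited since the last run.
source_ids = [f"{{H{i:05d}}}" for i in range(10_000)]
full = truncate_and_load(source_ids)
delta = change_only(source_ids[:2], source_ids[2:12], [])
print(len(full), len(delta))  # 10001 12
```

The gap grows with feature count and vertex count, which is why the change-detection workspace pays off on larger datasets.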

Those of you who remember the second paragraph may recall that I discussed syncing data from an Enterprise GDB, not a file GDB. What gives? Well, switching the input from a file GDB to an Enterprise GDB is not much of a hurdle, and the underlying process is the same. I chose the file GDB so that I could share the data with you. To convert the Data Interoperability workspace to an Enterprise GDB, all you have to do is add the “Esri Geodatabase (ArcSDE Geodb)” reader and configure it with your database connection file (the .sde file you use in ArcMap or Pro to connect to and edit your data).

Below you will find an ArcGIS Pro Project Package (.ppkx) created for ArcGIS Pro 2.7. When you open the Data Interop workspace (GDBToArcGISOnline), the file GDB location will still be correct because it uses a relative path from a User Parameter. However, you will have to replace my ArcGIS Online connections (for both the reader and the writer) with connections to an account you can publish to. You will also have to publish the Hydrants feature layer that ships with the Pro Package and loads with the map. Keep the schema the same, and add a new ArcGIS Online reader that points to the Hydrants dataset you just published.

That’s it, happy ETL'ing! Other basics if you have not used the Data Interoperability Extension before:

  • Data Interoperability licenses do not come with ArcGIS Pro and need to be purchased separately. Your organization may or may not already be licensed for this extension.
  • Data Interoperability is a separate install, so it needs to be installed on top of ArcGIS Pro. You can assign the licenses from ArcGIS Online or ArcGIS Enterprise. Download the extension installation package from My Esri.
  • You can incorporate a Spatial ETL tool with ModelBuilder. You can have geoprocessing tools pass data over to a Data Interoperability workspace, and vice versa in a ModelBuilder model.
  • There is a free Training Seminar that highlights the many different ways you can use this extension.

[Video: Data Interoperability Licensing and ArcGIS Pro Location Basics]

[Video: Diving into the Workspace settings and automating the tool]

## Updated the ArcGIS Pro Project Package to Pro 2.7 and changed the Spatial ETL tool to use a relative path, so you do not have to re-point it.

19 Comments
DavidRunneals2
Occasional Contributor

After switching from FME (ArcGIS Data Interop) to distributed collaborations to push data to ArcGIS Online, I would recommend going with FME/Data Interop every time and skipping collaborations. In fact, we are starting the process of reverting from distributed collaborations back to FME/Data Interop, because collaborations have a major issue that is missing from the documentation. I found it out the hard way after going back and forth with support/development for 3 months: they don't support data that is truncated and reloaded (which is pretty much every data warehouse out there). Our FME workspace writes this data to our SDE data warehouse once a week, but the last time the service on AGOL was updated through the collaboration was in FEBRUARY!

Natalie_Campos
New Contributor III

@JSchroeder

I tried to download the package and use the SpatialETL tool provided. But the tool path was set to your machine.  I don't want to post it.  If you could help me get access to the workbench that would be great.

JosephCarl2
New Contributor III

@JSchroeder   - Great post and very informative, thank you.  You mention at the end of the second video the only caveat being you have to be logged into the server, if we change that in task scheduler to 'Run whether the user is logged on or not' can we get around that caveat?

JSchroeder
Esri Contributor

@JosephCarl2 - Yes, you can use Task Scheduler to run when not logged in, although this is complicated by the way ArcGIS Pro is licensed and accessed on a Windows PC. It has been some time since I attempted this, and I recall having to use a service account with access to ArcGIS Pro licensing. The user account you normally log into the PC with does not have permission to run Pro when it is not logged into the Windows domain, so a Windows service account with the proper permissions is used in place of a normal user account on the machine where Pro is installed. Here is some information that discusses these issues, including making sure that "Sign me in automatically" is checked in ArcGIS Pro: Authorize Pro outside the application

I hope this information works for your task.

BruceHarold
Esri Regular Contributor

@JosephCarl2   Another option is to use Task Scheduler to execute fme.exe with the fmw file (and other parameters) as arguments.  The command syntax is exposed at the top of every workspace log file when run interactively.  If parameters don't change then don't publish them and they can be omitted.
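As a sketch of Bruce's suggestion, the Python below assembles the "fme.exe workspace.fmw --PARAMETER value" style command line that a scheduled task would run. The fme.exe location, workspace path, and parameter name are hypothetical placeholders; copy the real command from the top of your own workspace log. The first list element is what Task Scheduler's "Program/script" field would hold, and the rest are its arguments.

```python
import subprocess
from typing import Dict, List


def build_fme_command(fme_exe: str, workspace: str,
                      params: Dict[str, str]) -> List[str]:
    """Return [fme.exe, workspace.fmw, --NAME, value, ...] as an argument list."""
    cmd = [fme_exe, workspace]
    for name, value in params.items():
        cmd += [f"--{name}", value]
    return cmd


# Hypothetical paths and published parameter name: replace with your own.
cmd = build_fme_command(
    r"C:\Program Files\ArcGIS\Data Interoperability for ArcGIS Pro\fme.exe",
    r"C:\ETL\GDBToArcGISOnline.fmw",
    {"SourceDataset_FILEGDB": r"C:\ETL\Hydrants.gdb"},
)
print(subprocess.list2cmdline(cmd))  # quoted, Windows-style command line
# subprocess.run(cmd, check=True)    # uncomment to actually run the workspace
```

If a parameter never changes, leave it out of `params` and the workspace default is used, as Bruce notes.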

AJ_devaccount
Occasional Contributor

Hi @JSchroeder, just to clarify, would Data Interop (or FME, we haven't explored which one we would prefer) allow bi-directional sync between AGOL and Enterprise Geodatabase? So whenever data in the EGDB is updated, the AGOL copy would be updated. And whenever the AGOL copy is updated, the EGDB data will also be updated. Ideally we would like to have this sync automated and scheduled. The AGOL feature layers would be used in Field Maps and Survey123 btw. 

BruceHarold
Esri Regular Contributor

@AJ_devaccount Yes you can do this, scheduling from a Pro client is simplest but it can be executed on an Enterprise machine that has Data Interoperability installed if you need high availability.  However, are you sure the core distributed collaboration does not meet your needs?  Let's work through that before committing to the ETL approach.

AJ_devaccount
Occasional Contributor

Thanks @BruceHarold, we might end up going with a distributed collaboration. But as I understand it, bi-directional sync is only available from 10.9? Also, our Enterprise deployment is on-premises, while some of our AGOL layers will be public.

A few qs about DC:

1) Does that mean we can have a scheduled sync so that anytime the Enterprise geodatabase is updated, Portal and AGOL copies will be updated as well?

2) With a scheduled sync, anytime the AGOL version is updated, Portal copy is also updated. Is it possible to have this Portal copy automatically update the EGDB data as well? Doesn't have to be the original EGDB data, a new version is also fine, as long as there's EGDB data that automatically updates when the AGOL copy is edited.

3) Distributed collaboration needs a Portal feature layer to be created, to sync EGDB and AGOL right? Whereas Data Interop can directly sync EGDB and AGOL?

BruceHarold
Esri Regular Contributor

I would have to check with the Enterprise team but it does ring a bell that bi-directional collaboration is a recent feature.

You can effectively 'sync' any combination of EGDB, Portal and AGOL datasets in both directions but you would have to figure out the business logic as to which system owns the desired feature state.  For example field work might need to be ingested to the EGDB from Portal or AGOL feature services but separately some features added to the EGDB might need to be pushed to the feature services.  This logic might be driven by created_date and edited_date, or key field comparisons for new features.
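Here is one way that edited_date-driven logic could look, as a minimal Python sketch. It assumes a "last edit wins" rule, a shared key field present in both systems, and edited_date values that are comparable datetimes in the same time zone. Deletes are deliberately not handled, since they need a rule of their own. The function and the record layout are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def resolve(egdb: Dict[str, dict], online: Dict[str, dict]) -> Tuple[List[str], List[str]]:
    """Return (push_to_online, pull_to_egdb) keys under a last-edit-wins rule."""
    push, pull = [], []
    for key in sorted(egdb.keys() & online.keys()):
        egdb_edit, online_edit = egdb[key]["edited_date"], online[key]["edited_date"]
        if egdb_edit > online_edit:
            push.append(key)
        elif online_edit > egdb_edit:
            pull.append(key)
        # Equal timestamps: both copies already agree, so nothing is sent.
    push += sorted(egdb.keys() - online.keys())  # new in the EGDB
    pull += sorted(online.keys() - egdb.keys())  # new in the feature service
    return push, pull


egdb = {"1": {"edited_date": datetime(2021, 5, 1)},
        "2": {"edited_date": datetime(2021, 5, 3)},
        "3": {"edited_date": datetime(2021, 5, 1)}}
online = {"1": {"edited_date": datetime(2021, 5, 2)},
          "2": {"edited_date": datetime(2021, 5, 1)},
          "4": {"edited_date": datetime(2021, 5, 2)}}
print(resolve(egdb, online))  # (['2', '3'], ['1', '4'])
```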

jill_es
Esri Contributor

Hi @AJ_devaccount - you're correct, shared editing in distributed collaboration was introduced in ArcGIS Enterprise 10.9.

It is important to note that this can take place with hosted content, as well as referenced (whether branch versioned or non-versioned with archiving enabled).  Based on what you described, it sounds like you're using referenced services. With that in mind, and to answer your questions:

1) You can indeed have a scheduled sync so that anytime the source Enterprise geodatabase is updated, the feature service will be updated in Enterprise and ArcGIS Online.

2) As soon as the feature service is updated in Online, this will be reflected in the Enterprise portal. This will then be reflected in the enterprise geodatabase - just like when an edit is made to a feature service (without distributed collaboration involved).

3) Correct, distributed collaboration needs a feature layer accessible in the Enterprise portal to sync content between Online and Enterprise (and enterprise geodatabase).

Not sure if you've seen this but this blog may be of help for more information on distributed collaboration with shared editing: https://www.esri.com/arcgis-blog/products/arcgis-enterprise/sharing-collaboration/distributed-collab....

AJ_devaccount
Occasional Contributor

Thanks very much @jill_es! Sorry for all the questions, I'm pretty unfamiliar with this, and trying to get all the details right before deciding if we should implement.

1) Our Enterprise Portal is on-premises behind a firewall. While most of our AGOL content is only shared internally, some of it is public. Only the internally shared AGOL content will be updated by staff using Field Maps and Survey123. Would a referenced service still allow the AGOL copy to be accessed and updated in these apps, given that Portal is behind our firewall?

2) I've read that DC using Portal referenced services won't work for AGOL Survey123. It'll have to be shared via copies between Portal and AGOL. Would the Portal and AGOL copies be updated if the EGDB data is updated? Also, can the Portal copy be synced with the referenced service, so any changes in AGOL will be reflected back to the EGDB data?

jill_es
Esri Contributor

Hi @AJ_devaccount - going to try and answer your questions by general topic here.

Regarding distributed collaboration, there won't be an issue with ArcGIS Enterprise being behind a firewall. Just know that when setting up the distributed collaboration, ArcGIS Online will have to be the host and ArcGIS Enterprise will be the guest. There's a bit more on this in our About distributed collaboration product documentation. (To get into the details a bit: Enterprise makes requests to Online and Online only responds; Online never makes requests of Enterprise.) If you set up distributed collaboration by copies, you could have a referenced service in Enterprise and a hosted service in Online. Any edits made in Online would then make their way to Enterprise, thus updating the source enterprise geodatabase.

ArcGIS Field Maps can be used with ArcGIS Enterprise services behind a firewall but you will have to use a VPN to connect directly to these services or work offline.  There's a great blog about working offline with Field Maps that goes into more details of how to do this exactly.

Regarding Survey123, you are correct - you can only use hosted feature layers (either in ArcGIS Enterprise or ArcGIS Online).  You can't use referenced feature services.  There's a bit more about this in the Use Survey123 with existing feature layers product documentation.

AnnaritaMacri
New Contributor II

@JSchroeder , can these same steps and transformers parameters be used for the opposite where AGOL hosts the edits and the SDE receives the changes? 

BruceHarold
Esri Regular Contributor
JTessier
Occasional Contributor II

Would be interested in a meetup with @JSchroeder to discuss how best to bridge the gap (syncing options) between referenced layers in eGDB  and hosted layers in ArcGIS Enterprise that many apps like Survey 123 work with. 

JTessier
Occasional Contributor II

And interested in a roadmap for more seamless options in the future for ArcGIS Enterprise @jill_es .

AnnaritaMacri
New Contributor II

@BruceHarold @jill_es @JSchroeder - Thanks for the support.  My group is successfully syncing deltas from AGO to SDE 🙂 

Is it more complex if we want to sync bidirectional AGO <> SDE?  Is there any documentation on a bidirectional workbench?

Thanks! 

BruceHarold
Esri Regular Contributor

Hello Annarita, since you mention a "workbench" I'll not wait for Jill to talk about replication and collaboration scenarios. You can do bi-directional synchronization between ArcGIS Online and an enterprise geodatabase with Data Interoperability, but you will need to decide which system is parent and which is child when the same feature is edited in both systems. Maybe one system is where features are created and deleted but features can be edited in either; you make the rules. If you want to prefer one system's edits, then run a workspace that commits its edits to the child first, then a second workspace to fetch edits from the child. If you do not have a shared key field, then you can only generate inserts and deletes, not updates. I'm thinking here of using the ChangeDetector transformer to generate the deltas.
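The point about a missing shared key can be illustrated with a short Python sketch: with nothing to match on but a feature's full content, an edited feature can only appear as one delete plus one insert. Hashing a sorted JSON dump is an assumed stand-in for "compare everything", not how the ChangeDetector works, and the function names are hypothetical. Exact duplicate features collapse to one fingerprint in this sketch.

```python
import hashlib
import json
from typing import List, Tuple


def fingerprint(feature: dict) -> str:
    """Hash all attributes and geometry; any edit produces a new fingerprint."""
    return hashlib.sha256(json.dumps(feature, sort_keys=True).encode()).hexdigest()


def diff_without_key(parent: List[dict], child: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Return (inserts, deletes) to make the child match the parent."""
    parent_fp = {fingerprint(f): f for f in parent}
    child_fp = {fingerprint(f): f for f in child}
    inserts = [f for fp, f in parent_fp.items() if fp not in child_fp]
    deletes = [f for fp, f in child_fp.items() if fp not in parent_fp]
    return inserts, deletes


parent = [{"NAME": "H1", "STATUS": "OUT", "WKT": "POINT (1 1)"}]
child = [{"NAME": "H1", "STATUS": "OK", "WKT": "POINT (1 1)"}]
inserts, deletes = diff_without_key(parent, child)
print(len(inserts), len(deletes))  # 1 1: one edit became a delete plus an insert
```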

jill_es
Esri Contributor

@JTessier, I would appreciate you elaborating on what seamless options you're looking for in ArcGIS Enterprise.  Please feel free to reach out to me directly or reference/log an Idea.

In addition to the distributed collaboration options mentioned earlier in this thread, @BruceHarold outlines the Data Interoperability approach well.

About the Author
Jason is a Product Engineer on the Utility Solutions Pipeline Team located in Alpharetta, GA. Prior to joining Esri, Jason worked in GIS for 7 years at Southern Company Gas.