Select to view content in your preferred language

Upsert for selective feature service updating using GeoAnalytics Engine

117
0
3 weeks ago
SBattersby
Esri Contributor
2 0 117

Keeping spatial data up to date is a common challenge when working at scale, especially when analytics results need to be continuously published back to operational systems. With ArcGIS GeoAnalytics Engine, you can write analysis outputs directly to feature services.  When updating feature services, you may not want to re-write everything with any update.  Using the upsert option, you can update existing features and add new ones in a single, efficient operation. In this post, we’ll look at how upsert simplifies incremental updates to feature services, helping you keep authoritative layers current while preserving stable feature IDs and supporting downstream apps and dashboards

In this blog post, we’ll walk through process for doing this in order to show how it works.  To demonstrate, we’ll create a new feature service, read the data from that feature service, perform an update to a few rows, then use upsert to alter just those rows in our saved feature service with the updated data.

Create a feature service

To demonstrate how upsert works, we’ll start by creating a feature service.  To keep things simple, we’ll just read a data layer from Esri’s Living Atlas of the World and write that to our account in ArcGIS Online. This will be our “new” feature service that we’ll update.

First we import the GeoAnalytics Engine library and the Pyspark SQL functions

# imports
import geoanalytics
import pyspark.sql.functions as F

Next, we’ll read in a dataset of US major cities to use as our example data. In this case, we’re just going to use a few attributes from the original Living Atlas feature service.  For this example, we’re going to remove the OBJECTID field that is automatically generated in a Feature Service and replace it with a field called “My_ID.”  This is just used to show the creation of a unique indexed OBJECTID when we write our new feature service (otherwise we end up with a second OBJECTID field when we write it in the next step of our demonstration).

# read in a dataset to use as an example for upsert
# US major cities dataset from ArcGIS Living Atlas of the World
myFS="https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/USA_Major_Cities_/FeatureServer/0"
myFSDataFrame = spark.read.format('feature-service').load(myFS)\
    .select("OBJECTID", "NAME", "STATE_ABBR", "PLACE_FIPS")\
    .withColumn("My_ID", F.col("OBJECTID"))\
    .drop("OBJECTID") # drop the original OBJECTID field since we will be using My_ID as the upsert matching field instead

myFSDataFrame.show(5)

SBattersby_0-1777592138533.png

Using the resulting DataFrame as our sample dataset, we’ll write that to our ArcGIS Online account so that we have a new feature service that we can edit. 

To do that, we’ll connect to our ArcGIS Online account:

# register a gis where we will write the data - using AGOL credentials
geoanalytics.register_gis("myGIS", "https://arcgis.com", username="my_username", password="my_password")

And then we write the DataFrame to our ArcGIS Online account.  Note that when we do this initial write, we’re setting an upsertMatchingField.  This specifies the layer field that will be used when matching features with upsert. We can use any field that has a unique index – and in this case, it’s “My_ID,” a unique identifier. 

When we create our new layer and specify the upsertMatchingField, a unique index is created on our specified field (My_ID).

# write to my gis
# specifying that "My_ID" field will be the upsert field (creates unique index)
myFSDataFrame.write.format('feature-service')\
    .option("gis", "myGIS")\
    .option("serviceName", "us_cities_upsert_example_with_unique_index")\
    .option("layerName", "layer")\
    .option("upsertMatchingField", "My_ID")\
    .save()

Update some of our data

We can pull the data back down from this layer to confirm the contents are as expected, and then to identify a few rows that we will update using upsert. Note that we’re specifying the option to connect to our ArcGIS Online account. It is important to ensure that you have registered the ArcGIS Online account where your data lives so that you can access the data for reading and writing!

# read our data from the new feature service - we will use this DataFrame to demonstrate the upsert functionality
myNewFS = "https://your_service_URL"
myNewFSDataFrame = spark.read.format('feature-service')\
    .option("gis", "myGIS")\
    .load(myNewFS)

myNewFSDataFrame.show(5)

SBattersby_1-1777592219976.png

Our new feature service has all of our original fields, plus a new OBJECTID field that was automatically created when we write the feature service. 

Now let’s see what happens when we want to update a subset of our data.  To keep things simple, we’ll just update the records with the first two IDs – in other words, we’re going to replace “Alabaster” with “New City 1” and “Albertville” with “New City 2”:

# edit two rows to update
updated_row = myFSDataFrame.filter("My_ID in (1, 2)")\
    .withColumn("NAME", F.when(F.col("My_ID") == 1, "New City Name 1").otherwise("New City Name 2"))

updated_row.show()

SBattersby_2-1777592251104.png

If we want to update our feature service on ArcGIS Online with these new records – but only update these two rows of data – we can use upsert to do that.  Let’s write this DataFrame with two updated rows back into our feature service.  When we write this data back, we set the mode to “append” and we’ll specify our upsertMatchingField as “My_ID.”  This will match the unique IDs for those records and swap out the attributes for just those rows.

# write the updated rows back to the feature service using upsert
service_url = "https://your_service_URL"
updated_row.write.mode("append").format('feature-service')\
    .option("gis", "myGIS")\
    .option("serviceURL", service_url)\
    .option("upsertMatchingField", "My_ID")\
    .option("layerName", "layer")\
    .save()

Now, we can check that the changes went through by reading our feature service back in and looking at the data:

myfs_upsert_url  = "https://your_service_URL"
myFSDataFrame_after_upsert = spark.read.format('feature-service')\
    .option("gis", "myGIS")\
    .load(myfs_upsert_url)

myFSDataFrame_after_upsert.show(5)

 

SBattersby_4-1777592335717.png

And only two rows needed to be updated!  This will typically be more efficient than overwriting the whole table. 

Conclusion

I hope that this has been a helpful demonstration for how you can selectively update feature services efficiently using GeoAnalytics Engine! 

 

Contributors