
Storing output data to ArcGIS feature services



Introduction

The latest release of ArcGIS GeoAnalytics Engine (v1.1) includes new, standardized capabilities for easily writing output data to feature services hosted either in ArcGIS Enterprise or ArcGIS Online.  This makes it easier to visualize finalized results from analysis and to share information products with stakeholders.  In this tech article, we’ll go in depth on how to use these new functions, with examples for three output scenarios: (1) Online layers, (2) Enterprise layers backed by the relational data store, and (3) Enterprise layers backed by the spatiotemporal data store.  Let’s get started!

Preparing data

First, we’ll prepare some data to use for these examples.  The following assumes that you are already familiar with setting up GeoAnalytics on your Spark environment, and authorizing the library in a notebook.
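
As a quick refresher, a minimal setup sketch is shown below.  The import paths are the standard ones for GeoAnalytics Engine and Spark, but the username and password values are placeholders, and your environment may authorize the library differently (for example, with a license file).

# Imports used throughout the examples in this article.
import geoanalytics
from geoanalytics.sql import functions as ST   # spatial (ST) functions
from pyspark.sql import functions as F         # standard Spark SQL functions

# Authorize the GeoAnalytics Engine library for this Spark session.
# The credentials here are placeholders.
geoanalytics.auth(username="User", password="p@ssw0rd")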

Below, we’re loading some open data from Los Angeles on crime incidents from 2010 to 2018.  This set of data has almost 1.7 million records.

 

df = spark.read.format("csv").load("s3://gax-demo/bdfs/LA_Crime_2010_2018/", header=True)
df.count()

 

[Screenshot: df.count() result showing almost 1.7 million records]

Since this data starts in a non-spatial format, we’ll set the geometry.  This code extracts the latitude and longitude from a column named "Location " (note the trailing space in the column name, which comes from the source data), converts them into a point on the fly, and sets that point column as the geometry.

 

df = df.withColumn("x", F.regexp_extract("Location ", "\\((.*), (.*)\\)", 2).cast('float')).withColumn("y", F.regexp_extract("Location ", "\\((.*), (.*)\\)", 1).cast('float'))
df = df.withColumn("geometry", ST.srid(ST.point("x", "y"), 4326)).st.set_geometry_field("geometry")

 


Next, let’s say we’re interested in summarizing this data so we can visualize high-level patterns in ArcGIS.  We’ll run a density analysis, but first we’ll transform the data into a projected coordinate system, both for performing the analysis and because a spatial reference is required when saving output data to a feature layer.

 

dfwm = df.withColumn("geometry", ST.transform("geometry", 3857))
from geoanalytics.tools import CalculateDensity
density = CalculateDensity() \
            .setWeightType(weight_type="Uniform") \
            .setBins(bin_size=0.5, bin_size_unit="Miles", bin_type="Hexagon") \
            .setNeighborhood(distance=1, distance_unit="Miles") \
            .setAreaUnit(area_unit="SquareMiles") \
            .run(dfwm)
density.count()

 

[Screenshot: density.count() result showing 6,314 rows]


This summary result has 6,314 rows, far smaller than the input data.  We can take a quick look at it with the built-in plotting method.

 

axes = density.st.plot(figsize=(15,15), cmap_values="density", cmap="plasma")
axes.set(xlim=(-1.323e7, -1.313e7),ylim=(3.98e6, 4.08e6))

 

[Plot: crime density across Los Angeles, half-mile hexagon bins]

Writing data to Online-hosted feature layers

Now let’s say you want to store this output data to a feature layer in your ArcGIS Online organization.  You can set up the connection and authenticate using register_gis.  The URL parameter is optional in this case, since ArcGIS Online is the default GIS for this function.

 

geoanalytics.register_gis("myOnlineOrg", "https://arcgis.com", username="User", password="p@ssw0rd")

 

Note:  Use this approach only for testing and development; see security recommendations below in the Best practices section.

With your connection active and authenticated, you’re ready to write data to a layer.  In the example below, we’re writing the data to a new service named crime_density_LosAngeles.  A layer name is not specified separately, so the layer name will be the same as the service name.  Note that the service name needs to be unique within your Online organization.

 

service_name = "crime_density_LosAngeles"
density.write.format("feature-service") \
     .option("gis", "myOnlineOrg") \
     .option("serviceName", service_name) \
     .option("tags", "crimes, density analysis") \
     .option("description", "Density analysis on LA crime data 2010-2018, half-mile bins") \
     .save()

 

After executing the above, wait for the completion message.  Then, if you browse to your content in the ArcGIS Online home app, you will see the new layer:

[Screenshot: the new crime_density_LosAngeles feature layer item in ArcGIS Online content]

You can now add this to a web map to design symbology, mash it up with other data, and configure lightweight apps like StoryMaps for sharing your analysis with stakeholders.
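
If you’d like to confirm the write programmatically, one option is to read the new layer back into Spark.  Below is a minimal sketch; the URL is a placeholder that you would replace with the feature layer URL from your new item’s page (ending in /FeatureServer/0), and it assumes the layer is accessible to your session.

# Read the newly published layer back into a Spark DataFrame to spot-check it.
# The URL below is a placeholder -- substitute your own feature layer URL.
layer_url = "https://services.arcgis.com/<orgId>/arcgis/rest/services/crime_density_LosAngeles/FeatureServer/0"
check_df = spark.read.format("feature-service").load(layer_url)
check_df.count()  # should match the 6,314 rows written above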

Writing data to Enterprise-hosted feature layers

If you want to store your output data in ArcGIS Enterprise, the workflow is very similar.  You’ll register your GIS in the same way, but in this case make sure to include the URL parameter, which needs to point to your portal.

 

geoanalytics.register_gis("myPortal", "https://example.com/portal", username="User", password="p@ssw0rd")

 

When connecting to Enterprise, you can use credentials from the built-in ArcGIS identity store, or from your Lightweight Directory Access Protocol (LDAP) server.  Please see the help documentation for more information.

After connecting to Enterprise, you can use the same syntax to store the data to a layer.  In this example, we’re giving the layer a different name from the service.

 

service_name = "crime_density_LosAngeles"
density.write.format("feature-service") \
    .option("gis", "myPortal") \
    .option("serviceName", service_name) \
    .option("tags", "crimes, density analysis") \
    .option("description", "Density analysis on LA crime data 2010-2018, half-mile bins") \
    .option("layerName", "density_analysis") \
    .save()

 

If you’re working incrementally and you modify your analysis result and want to capture that change, you can use the overwrite mode to replace a layer in an existing destination service.  For this workflow, you’ll specify the URL of the existing service (rather than the service name) and then identify the layer you are overwriting:

 

service_URL = "https://base2.ga.geocloud.com/server/rest/services/Hosted/crime_density_LosAngeles/FeatureServer"
density.write.format("feature-service") \
    .option("gis", "myPortal") \
    .option("serviceUrl", service_URL) \
    .option("layerName", "density_analysis") \
    .mode("overwrite") \
    .save()

 

Note that you don’t need to re-specify tags and description when overwriting a layer.  The service’s previous tags and description will be retained.

Spatiotemporal layers

Next, let’s say you want to store some of the raw input data in ArcGIS as well, for example to mash it up with the analysis results in a web map or configurable app.  For larger datasets, if your organization is leveraging the Enterprise spatiotemporal big data store, you can specify it as the destination for the data behind the feature layer by setting the datasourceType option.  In the example below, dfwm_subfields is the projected input DataFrame with only the fields we want to publish.
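
Since dfwm_subfields isn’t defined earlier in this article, here is a minimal sketch of how it might be created.  The column names are hypothetical; substitute the fields from your own data.

# Hypothetical field selection -- swap in the actual column names from your DataFrame.
dfwm_subfields = dfwm.select("geometry", "Date Occurred", "Crime Code Description")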

 

service_name = "crimes_LosAngeles"
dfwm_subfields.write.format("feature-service") \
    .option("gis", "myPortal") \
    .option("serviceName", service_name) \
    .option("tags", "crimes") \
    .option("description", "Raw data on crimes in Los Angeles between 2010 and 2018") \
    .option("datasourceType", "spatiotemporal") \
    .save()

 


[Screenshot: the crimes_LosAngeles feature layer item backed by the spatiotemporal big data store]

The spatiotemporal big data store can be distributed across multiple machines for both high availability and scalable visualization, because data loads can be balanced across the cluster of data nodes.  This enables queries on large data to return more quickly.  The spatiotemporal big data store is recommended when the data you wish to store in ArcGIS is in the millions of records or more.

Best practices

  • Output data size: When writing output data to ArcGIS, consider the type of feature data store that you have available to you. 
    • If using the standard feature data store in Online, or the relational data store in Enterprise, you may want to keep the output data size to less than a few million features in most circumstances. 
    • In all cases, allow at least several minutes for the writing process to finish.
  • Output data throughput: When writing output data to a feature service, you can also use the maxParallelism parameter to tune write throughput.  The default value is 12.  Lower values reduce write throughput and decrease load on the destination data store, whereas higher values may achieve higher throughput (see the sketch after this list).
    • When writing output data to Online, consider your output data rates in conjunction with the type of feature data store you have in your subscription.  More intensive output data rates may benefit from a higher level feature data store.
    • When writing output data to Enterprise, you'll want to monitor the impact on your data store and ensure that other user ingest or consumption workflows are not being impacted. 
  • Overwriting vs truncating: The overwrite and truncate modes are both options for replacing a layer in an existing feature service.  However, in some cases overwrite may increment the ID of the output layer (e.g. from 0 to 1).  If maintaining the ID of the output layer is important, consider using the truncate mode instead (also shown in the sketch after this list).
  • Security: Most of the examples above demonstrate registering the GIS with a username and password embedded in the notebook code.  It’s recommended to use this approach only for active testing and development.
    • When saving a notebook for later use, you can include the username in the code but let the password be supplied by the interactive dialog box. 
    • In a production notebook that will need to be shared with others, leave both the username and password blank and have each notebook user utilize their own ArcGIS login.
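
To make the throughput and truncate tips above concrete, here is a minimal sketch.  It reuses the Enterprise service URL from the overwrite example; passing maxParallelism as a writer option and truncate as a writer mode is assumed to mirror the other options shown in this article.

# maxParallelism is set below the default of 12 to reduce load on the destination data store.
# The truncate mode replaces the layer's data while keeping the existing layer ID.
service_URL = "https://base2.ga.geocloud.com/server/rest/services/Hosted/crime_density_LosAngeles/FeatureServer"
density.write.format("feature-service") \
    .option("gis", "myPortal") \
    .option("serviceUrl", service_URL) \
    .option("layerName", "density_analysis") \
    .option("maxParallelism", 4) \
    .mode("truncate") \
    .save()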

 

Wrap up

We hope this overview has been helpful as you integrate big data analysis insights with ArcGIS.  For more information, check out the main help topic on writing to feature services, as well as the tutorial, which also covers how you can use the ArcGIS API for Python to create output feature layers. And let us know in the Ideas board in the GeoAnalytics Engine Community Site if you have suggestions for new product features.
