
Introducing Data Pipelines in the ArcGIS API for Python 2.3.0 Release

by DuncanMackey, Esri Contributor
07-03-2024 12:37 PM

We’re excited to announce a new experimental Data Pipelines module included in the 2.3.0 release of the ArcGIS API for Python. This has been a frequently requested feature, and you can now use the API to integrate Data Pipelines into your own systems or orchestration frameworks. You can run data pipelines and retrieve the results in ArcGIS Notebooks, on platforms such as Azure Functions and AWS Lambda, and anywhere else you can install the ArcGIS API for Python. This enables more advanced orchestration workflows, such as using an Azure Function to run a data pipeline whenever new files are added to an Azure Blob Storage container.
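To make that last scenario concrete, here is a minimal sketch of an Azure Function (v2 Python programming model) that kicks off a data pipeline when a new blob arrives. The container name, connection setting, credentials, and pipeline title are placeholders, not part of the API:

import azure.functions as func
from arcgis.gis import GIS
from arcgis import datapipelines

app = func.FunctionApp()

# Hypothetical trigger: fires whenever a new file lands in the "incoming" container.
# "AzureWebJobsStorage" is the app setting holding the storage connection string.
@app.blob_trigger(arg_name="blob", path="incoming/{name}",
                  connection="AzureWebJobsStorage")
def run_pipeline_on_upload(blob: func.InputStream):
    # In practice, pull credentials from app settings or a managed identity
    gis = GIS("https://www.arcgis.com", "your_username", "your_password")
    item = gis.content.search("My Data Pipeline for ArcGIS API for Python")[0]
    run = datapipelines.run_data_pipeline(item)
    run.result()  # block until the pipeline finishes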

Note that this is an experimental feature, and function calls and behavior may change in future ArcGIS API for Python releases. If you have feedback on the API, please post in our community forum!

To give a quick tour of this new capability, I will walk through how to run an existing data pipeline from ArcGIS Notebooks and view the results.

You will need either ArcGIS Notebooks or an environment with version 2.3.0 of the ArcGIS API for Python installed. If you are just getting started with Data Pipelines, visit the documentation. If you're interested in learning more about the ArcGIS API for Python, you can do so here.
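If you're not sure which version your environment has, a quick check from any cell:

import arcgis
print(arcgis.__version__)  # should print 2.3.0 or later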

Let’s get to it!

Step 1: Find or create a data pipeline to run. 

Find or create the data pipeline you would like to run using Python. Note the title of the data pipeline item. If you do not have an existing data pipeline, follow the steps here to create your first data pipeline.

In this example, I will be running a data pipeline I titled "My Data Pipeline for ArcGIS API for Python".
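As a preview of the search we'll run later, you can make the lookup stricter with an exact-title query if several items in your organization have similar names. A small sketch, assuming you're connected to your organization (the title is just my example item):

from arcgis.gis import GIS

gis = GIS("home")  # works inside ArcGIS Notebooks; use your org URL elsewhere

# Quoting the title in the query favors exact matches over loose keyword hits
items = gis.content.search(query='title:"My Data Pipeline for ArcGIS API for Python"')
print(items)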

Step 2: Create a new ArcGIS Notebook.

Next, let's create the notebook that we will use to run our data pipeline. This step can be skipped if you are using your own Python environment.

  1. Navigate to ArcGIS Notebooks. If this is not visible in the top bar of ArcGIS Online, your account may not have the role required to access ArcGIS Notebooks.
  2. Click New notebook > Standard to create a new standard notebook.

Now we can get started running our data pipeline from our new notebook.
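If you are working in your own Python environment instead, the setup looks roughly like this; the org URL and username are placeholders:

# Install the API first if needed: pip install "arcgis>=2.3.0"
from arcgis.gis import GIS

# Connect to your organization; you'll be prompted for a password
gis = GIS("https://your-org.maps.arcgis.com", "your_username")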

Step 3: Import the new datapipelines module and run the pipeline.

In a new cell, add the following code and run all cells in the notebook:

# import the datapipelines module and the GIS class
from arcgis import datapipelines
from arcgis.gis import GIS

# connect to your organization ("home" works inside ArcGIS Notebooks)
gis = GIS("home")

# search for the data pipeline item by title
item = gis.content.search(query="My Data Pipeline for ArcGIS API for Python")[0]

# run the data pipeline
run = datapipelines.run_data_pipeline(item)

Once this cell is run, the run variable will contain a PipelineRun object with properties and methods that let you inspect the status and results of the run.
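For example, you can peek at the run right away without blocking:

# Inspect the run without waiting for it to finish
print(run.status)      # the run's current state
print(run.properties)  # general metadata, including the start time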

Step 4: Get the results of the run.

In a new cell, add the following code and run the cell:

run.result()

You will see the status of the run logged in the cell until it completes. Once complete, the result is printed, showing the outcome of the run.

Step 5: Explore the PipelineRun object.

Here are the properties and methods that you can access on the PipelineRun object:

Properties:

run.status  # contains the current status of the run
run.properties  # contains general properties of the run including start time

Methods:

run.result()  # waits for the run to finish and returns the results
run.cancel()  # cancels the run
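
Putting these together, here is a sketch of a guarded run that cancels the pipeline if it hasn't finished within a timeout. The terminal status strings are an assumption for illustration; check the run.status values you see in your own session:

import time

TIMEOUT_SECONDS = 30 * 60  # give the pipeline half an hour at most
start = time.time()

# Poll until the run reaches a terminal state or the timeout elapses
while run.status not in ("Succeeded", "Failed", "Canceled"):
    if time.time() - start > TIMEOUT_SECONDS:
        run.cancel()  # stop the run rather than wait indefinitely
        break
    time.sleep(15)

print("Run finished with status:", run.status)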

To recap, we searched for our data pipeline item, ran the data pipeline, and viewed the results of the run. This could fit into any number of broader data preparation workflows, whether in ArcGIS Online or on another platform with access to the ArcGIS API for Python, and it allows for more flexibility in how data pipelines are run.

Thanks for following along, and feel free to leave any questions in the comments below!