Data Pipelines Blog

DuncanMackey · 09-17-2024

Learn how to use the ArcGIS API for Python to schedule custom tasks not possible in the Data Pipelines app.

DuncanMackey · 07-03-2024

We’re excited to announce a new experimental Data Pipelines module included in the 2.3.0 release of ArcGIS API for Python. This has been a frequently requested feature, and you can now use the API to integrate Data Pipelines into your own systems or orchestration frameworks. You can run data pipelines and get the results in ArcGIS Notebooks, on other platforms such as Azure Functions and AWS Lambda, and anywhere else you can install the ArcGIS API for Python. This allows for more advanced orchestration workflows, such as using Azure Functions to run a data pipeline when new files are added to an Azure Blob storage container.

Note that this is an experimental feature, and function calls and behavior may change in future ArcGIS API for Python releases. If you have feedback on the API, please post in our community forum!

To give a quick tour of this new capability, I will walk through how to call an existing data pipeline from ArcGIS Notebooks and view the results.

You will need either ArcGIS Notebooks or access to the ArcGIS API for Python version 2.3.0 in your environment. If you are just getting started with Data Pipelines, visit the documentation. If you're interested in learning more about the ArcGIS API for Python, you can do so here.

Let’s get to it!

Step 1: Find or create a data pipeline to run.

Find or create the data pipeline you would like to run using Python. Note the title of the data pipeline item. If you do not have an existing data pipeline, follow the steps here to create your first data pipeline.

In this example I will be running a data pipeline I titled "My Data Pipeline for ArcGIS API for Python".

Step 2: Create a new ArcGIS Notebook.

Next, let's create the notebook that we will use to run our data pipeline. This step can be skipped if you are using your own Python environment.

Navigate to ArcGIS Notebooks. If this is not visible in the top bar of ArcGIS Online, your account may not have the role required to access ArcGIS Notebooks.
Click New notebook > Standard to create a new standard notebook.

Now we can get started running our data pipeline from our new notebook.

Step 3: Import and call the new datapipelines module.

In a new cell, add the following code and run all cells in the notebook:

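# import the datapipelines module
from arcgis import datapipelines
# `gis` is created by the notebook's default first cell (gis = GIS("home"));
# in your own Python environment, create the GIS connection before this point
# search for the data pipeline item and take the first result
item = gis.content.search(query="My Data Pipeline for ArcGIS API for Python")[0]
# run the data pipeline
run = datapipelines.run_data_pipeline(item)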

Once this cell is run, the run variable will contain a PipelineRun object that has a number of properties and methods allowing you to inspect the status and results of the run.
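The search in Step 3 takes the first hit, and a keyword search can return items whose titles only partially match. Here is a minimal sketch of a stricter lookup, assuming only that gis.content.search returns items with a title attribute. The helper name find_item_by_title is hypothetical and not part of the ArcGIS API for Python:

```python
def find_item_by_title(gis, title, max_items=50):
    """Return the single item whose title equals `title` exactly.

    `gis` is a connected arcgis.gis.GIS object, such as the one an
    ArcGIS Notebook defines in its first cell. This helper is an
    illustration only; it is not provided by the arcgis package.
    """
    # The title: prefix narrows the keyword search to the title field.
    hits = gis.content.search(query=f'title:"{title}"', max_items=max_items)
    # Keep only exact matches, since the search itself can match partially.
    exact = [item for item in hits if item.title == title]
    if not exact:
        raise LookupError(f"No item titled {title!r} was found")
    if len(exact) > 1:
        raise LookupError(
            f"{len(exact)} items are titled {title!r}; look the item up by ID instead"
        )
    return exact[0]
```

In the Step 3 cell, item = find_item_by_title(gis, "My Data Pipeline for ArcGIS API for Python") could then stand in for the [0] lookup, so a missing or ambiguous title fails with a clear message instead of an IndexError or a run of the wrong item.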

Step 4: Get the results of the run.

In a new cell, add the following code and run the cell:

run.result()

You will see the status of the run logged in the cell until it completes. Once complete, the result will be printed showing the outcome of the run.

Step 5: Explore the PipelineRun object.

Here are the properties and methods that you can access on the PipelineRun object:

Properties:

run.status  # contains the current status of the run
run.properties  # contains general properties of the run including start time

Methods:

run.result()  # waits for the run to finish and returns the results
run.cancel()  # cancels the run
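Because run.result() waits until the run finishes, a host with an execution time limit may want an upper bound on that wait. The following is a sketch, not an API feature: it uses only the result(), cancel(), and status members listed above, and the wrapper name result_or_cancel and its timeout behavior are assumptions of this example. How quickly a cancelled run actually stops is not covered here.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def result_or_cancel(run, timeout_seconds):
    """Return run.result(), or cancel the run if it exceeds the timeout."""
    executor = ThreadPoolExecutor(max_workers=1)
    # run.result() blocks, so call it on a worker thread.
    future = executor.submit(run.result)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        # Ask the service to stop the run, then report the timeout.
        run.cancel()
        raise TimeoutError(
            f"Run cancelled after {timeout_seconds} s; last status: {run.status}"
        ) from None
    finally:
        # Do not wait on the worker thread; it ends when run.result() returns.
        executor.shutdown(wait=False)
```

Calling result_or_cancel(run, 600) in place of run.result() would give up after ten minutes and cancel the run rather than wait indefinitely.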

To recap, we searched for our data pipeline item, ran the data pipeline, and showed the results of the run. This could fit into any number of broader data prep workflows, whether on ArcGIS Online or another platform that has access to the ArcGIS API for Python, and allows for more flexibility in how data pipelines are run.
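As one illustration of such a broader workflow, an event-driven host (for example, a function triggered when a file lands in storage) could decide whether to start a run based on the file name. This is a hedged sketch: the function name, the suffixes parameter, and the idea of passing the run function in as an argument are all assumptions of this example, chosen so the same code works wherever the ArcGIS API for Python is installed.

```python
def run_pipeline_for_new_file(file_name, item, run_data_pipeline, suffixes=(".csv",)):
    """Start a run for watched file types; return None for everything else.

    `run_data_pipeline` is the callable that starts the run, for example
    arcgis.datapipelines.run_data_pipeline. The other names are hypothetical.
    """
    if not file_name.lower().endswith(tuple(suffixes)):
        return None  # ignore files the pipeline does not read
    return run_data_pipeline(item)  # hand back the run object for later inspection
```

A trigger handler would call run_pipeline_for_new_file(name, item, datapipelines.run_data_pipeline) and then inspect the returned run object as shown in Steps 4 and 5.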

Thanks for following along, and feel free to leave any questions in the comments below!

Sarah_Hanson · 05-20-2024

This blog provides answers to questions asked during the May 2024 webinar on ArcGIS Data Pipelines.

BethanyScott · 02-29-2024

This blog outlines the breaking changes introduced in the February 2024 update of ArcGIS Data Pipelines.

BethanyScott · 02-29-2024

ArcGIS Data Pipelines is no longer in beta and is now available for general use.

BethanyScott · 10-26-2023

This blog outlines the breaking changes introduced in the October 2023 update of Data Pipelines (beta).

DuncanMackey · 06-15-2023

Learn more about Data Pipelines and what it can do by following along with this workflow. We'll explore transforming, previewing, spatially joining two datasets, and more!

BethanyScott · 06-15-2023

Read the introductory Data Pipelines blog and watch a quick video to get started.

BruceHarold · 06-07-2023

Security, security, security...

Latest Activity

Scheduling custom Data Pipelines tasks with the ArcGIS API for Python

Introducing Data Pipelines in the ArcGIS API for Python 2.3.0 Release

Introduction to ArcGIS Data Pipelines Webinar: Q&A

Breaking changes in the February 2024 update of ArcGIS Data Pipelines

ArcGIS Data Pipelines is Now Available for General Use (good-bye, beta!)

Breaking changes in the October 2023 update of Data Pipelines (beta)

Workflow: Get started in Data Pipelines

ArcGIS Blog: Introducing Data Pipelines in ArcGIS Online (beta release)

Why does my public URL error in Data Pipelines when it works in my browser?
