
Scheduling custom Data Pipelines tasks with the ArcGIS API for Python

09-17-2024 08:43 AM
DuncanMackey
Esri Contributor

Using ArcGIS Data Pipelines, you can create and manage your scheduled tasks directly in the app. However, the user interface does not support every possible schedule. For example, you cannot create a schedule that runs only Monday through Friday during work hours. In this post I will show you how to use the ArcGIS API for Python to create a data pipeline task with a custom schedule.

To complete this workflow, you will need either ArcGIS Notebooks or access to the ArcGIS API for Python version 2.3.0 in your local Python environment. If you are just getting started with Data Pipelines and want to learn more, visit the documentation. If you're interested in learning more about the ArcGIS API for Python, you can do so here.

Step 1: Find or create a data pipeline to schedule.

First, find the id of the data pipeline you want to create a custom schedule for. The easiest way to find the id of the data pipeline is from the URL of the item page in your content.

For example, here is a data pipeline URL. The item id is the value that follows id= at the end:

...arcgis.com/home/item.html?id=f3d67501369b4c81b6a1d7b8ddbe4f2d
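If you would rather not copy the id by hand, it can be pulled out of the URL with the Python standard library. This is only a minimal sketch: item_id_from_url is a hypothetical helper name, and the www.arcgis.com host in the example URL is an assumption, since your organization's URL will differ.

```python
from urllib.parse import parse_qs, urlparse


def item_id_from_url(url):
    """Return the value of the 'id' query parameter from an item page URL."""
    query = parse_qs(urlparse(url).query)
    ids = query.get("id")
    if not ids:
        raise ValueError("URL has no 'id' query parameter")
    return ids[0]


# Hypothetical example URL; the host is an assumption, the id is the one from this post.
example_url = "https://www.arcgis.com/home/item.html?id=f3d67501369b4c81b6a1d7b8ddbe4f2d"
item_id = item_id_from_url(example_url)
print(item_id)
```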

You can also find the item by searching for it using the ArcGIS API for Python.

If you do not have an existing data pipeline, follow the steps here to learn how to create one.

Step 2: Define your custom schedule.

Next, define the custom schedule you want your task to run on. When creating a custom task with the ArcGIS API for Python, any valid cron expression can be used to define a schedule. Click here to learn more about cron expressions. For this example, I will be defining a schedule that runs my data pipeline hourly only on weekdays between the hours of 8:00 am and 5:00 pm Pacific Time.

Here is my final expression:

0 15-23 ? * 1-5

Let’s break this down piece by piece:

0 (Minute): Run at the start of each hour.
15-23 (Hour): Run each hour between 15:00 and 23:00 UTC.
? (Day of month): Day of month is undefined, as I instead define day of week.
* (Month): Run during every month.
1-5 (Day of week): Run from Monday to Friday.
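Because the hour field is interpreted in UTC, it can help to check which local hours the expression actually covers. The sketch below splits the expression into its five fields and converts the UTC hours using fixed offsets for Pacific Daylight Time (UTC-7) and Pacific Standard Time (UTC-8). The helper names are hypothetical, and the fixed offsets are a simplification that ignores the exact daylight saving transition dates.

```python
from datetime import datetime, timedelta, timezone

CRON = "0 15-23 ? * 1-5"
FIELD_NAMES = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expression):
    """Map each of the five cron fields to its name."""
    parts = expression.split()
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(parts)}")
    return dict(zip(FIELD_NAMES, parts))


def expand_range(field):
    """Expand a simple 'a-b' range (or a single integer) into a list of integers."""
    if "-" in field:
        start, end = (int(part) for part in field.split("-"))
        return list(range(start, end + 1))
    return [int(field)]


def utc_hours_to_local(hours, utc_offset_hours):
    """Convert UTC hours to local hours for a fixed UTC offset."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    day = datetime(2024, 9, 17, tzinfo=timezone.utc)  # arbitrary reference date
    return [day.replace(hour=h).astimezone(local_tz).hour for h in hours]


fields = split_cron(CRON)
utc_hours = expand_range(fields["hour"])
pdt_hours = utc_hours_to_local(utc_hours, -7)  # Pacific Daylight Time
pst_hours = utc_hours_to_local(utc_hours, -8)  # Pacific Standard Time

print("Fields:", fields)
print("Run hours in PDT:", pdt_hours)
print("Run hours in PST:", pst_hours)
```

With these offsets, the hourly runs fall at 8:00 am through 4:00 pm during daylight time and shift one hour earlier during standard time, so you may want to adjust the hour range when the clocks change.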

 

Step 3: Create the task using Python.

Lastly, we need to create the task itself. Below is a quick example of how to use the ArcGIS API for Python to create the task using a data pipeline id and the custom cron schedule from above. I have also set the timeoutInMinutes parameter to the shortest supported duration (15 minutes), as I know my data pipeline does not take more than 5 minutes to run.

Note: Tasks that are created using Python may include scheduling options that are not available in the Data Pipelines app. As a result, such a task may display incorrectly when opened in the app, and editing it in the app is not supported.

 

from arcgis import GIS
# Log in to ArcGIS Online. This is not necessary if
# you are writing the script in ArcGIS Notebooks.
gis = GIS("https://www.arcgis.com", "username", "password")
# Define the custom schedule
schedule = "0 15-23 ? * 1-5"
# Create the task
task = gis.users.me.tasks.create(
    cron=schedule,
    item="f3d67501369b4c81b6a1d7b8ddbe4f2d",
    title="My Custom Task",
    parameters={"timeoutInMinutes": 15},
    task_type="RunDataPipeline",
)
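If you plan to create several tasks, one option is to assemble and check the arguments before calling tasks.create. The following is a rough sketch rather than part of the API: build_task_kwargs and create_task are hypothetical helper names, the 15 minute minimum comes from the timeout note above, and the 32-character hexadecimal id check is an assumption based on the example id in this post.

```python
import re

MIN_TIMEOUT_MINUTES = 15  # shortest supported duration mentioned in this post


def build_task_kwargs(item_id, cron, title, timeout_minutes=MIN_TIMEOUT_MINUTES):
    """Validate the inputs and return keyword arguments for tasks.create."""
    # Assumption: item ids are 32 lowercase hexadecimal characters, like the example id.
    if not re.fullmatch(r"[0-9a-f]{32}", item_id):
        raise ValueError(f"unexpected item id format: {item_id!r}")
    if len(cron.split()) != 5:
        raise ValueError("cron expression should have five fields")
    if timeout_minutes < MIN_TIMEOUT_MINUTES:
        raise ValueError(f"timeout must be at least {MIN_TIMEOUT_MINUTES} minutes")
    return {
        "cron": cron,
        "item": item_id,
        "title": title,
        "parameters": {"timeoutInMinutes": timeout_minutes},
        "task_type": "RunDataPipeline",
    }


def create_task(gis, **task_kwargs):
    """Create the task with an already authenticated GIS object (same call as above)."""
    return gis.users.me.tasks.create(**task_kwargs)


task_kwargs = build_task_kwargs(
    item_id="f3d67501369b4c81b6a1d7b8ddbe4f2d",
    cron="0 15-23 ? * 1-5",
    title="My Custom Task",
)
print(task_kwargs)
```

With a connected gis object, create_task(gis, **task_kwargs) would then make the same call as the script above.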

 

Once you have created your task, you can find and explore its run history in the Data Pipelines scheduled task page. See Work with existing tasks to learn how to view past runs, pause, and resume tasks.

For more information on creating a task using ArcGIS API for Python, visit the documentation.

With that, you've seen an example of how to create a custom task using the ArcGIS API for Python, enabling you to create more complex schedules than the app supports. You can use this to better tailor your schedules to your organization's requirements.

Thanks for following along, and feel free to leave any questions or ideas for future posts in the replies!