Is there a graceful way to exit a scheduled notebook run early (i.e. without raising an exception that looks like a failure)? If we use raise SystemExit() or sys.exit(), it raises an exception which is detected as a failed notebook run. This will cause the scheduled task to be disabled. We've also tried a few other methods (like https://stackoverflow.com/a/56953105) with the same result.
What we want to do is check a feature layer for new records, then skip the rest of the notebook (the next dozen cells or so) if there are no new records. There's no point in re-processing the data if we've already ingested/transformed/exported it, as that would just waste credits and API calls.
We could wrap every single cell in an if statement that skips it if there's no new data, but that seems really clunky.
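To make that concrete, here's roughly what the clunky version would look like (the layer URL and where clause are made up for illustration):

from arcgis.features import FeatureLayer

# First cell: check once for new records
layer = FeatureLayer("https://services.arcgis.com/<org>/arcgis/rest/services/<layer>/FeatureServer/0")
has_new_data = layer.query(where="processed = 0", return_count_only=True) > 0

# ...and then every one of the next dozen cells needs this wrapper:
if has_new_data:
    pass  # that cell's ingest/transform/export logic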
Here's how I solved it:
Instead of trying to break in the middle of the notebook, I put the check for updated data in a separate notebook and scheduled that notebook to run every hour. If there is updated data, this "controller" notebook uses arcgis.notebook.execute_notebook() to run the notebook that processes the data.
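In case it helps anyone, here's a stripped-down sketch of the controller notebook (the item IDs and where clause are placeholders, and you should check the execute_notebook() signature against the API reference for your version of the arcgis package):

import arcgis
from arcgis.gis import GIS

gis = GIS("home")  # running inside an ArcGIS Online notebook

# Placeholders -- swap in your own item IDs
layer_item = gis.content.get("<feature-layer-item-id>")
notebook_item = gis.content.get("<processing-notebook-item-id>")

# Count records that haven't been processed yet (field name is made up)
new_count = layer_item.layers[0].query(where="processed = 0", return_count_only=True)

if new_count:
    # Only spin up the heavier processing notebook when there's work to do
    arcgis.notebook.execute_notebook(notebook_item)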
One thing I really like about this approach is that I can use a standard Python kernel to do the checking, then spin up a more expensive advanced kernel only if it's necessary. It also means I have fewer notebooks to schedule because I can use one notebook to call multiple other notebooks.
In version 2.0 I'm going to use some parameter passing to simplify maintenance, break up my target notebooks to be a little more granular, and add some keyword/tag searching so the controller notebook can find target feature layers automatically instead of hard-coding item IDs. There are also some cool things that seem possible with job management, but the Python API docs for working with jobs on AGOL are inscrutable and there are no examples to start from.
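For the parameter passing, the plan (untested; this assumes the parameters argument of execute_notebook() injects values into the target notebook the way the API reference describes) looks something like:

import arcgis

# Hypothetical call: tell the processing notebook which layer to work on
arcgis.notebook.execute_notebook(
    notebook_item,
    parameters={"layer_item_id": "<feature-layer-item-id>"},
)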
I like your accepted solution. I have worked with dynamic data feeds before where we broke the problem down into discovery, ingest, and processing as independent components that message each other when they have something new to pass along. It sounds like your problem is complex enough to merit a solution like that, especially if some of those components can run with fewer credits.
For simpler needs, I would just break the code down into functions. The idea is that almost everything you need to do lives in a function def. At the end of your script, a few lines of code call those defs, but only if new features are found.
My pseudo-code might look like this:
def get_new_feature_count():
    # Get the count of new features here,
    # e.g. a feature layer query with return_count_only=True
    count = 0  # placeholder for the real query
    return count

def do_stuff():
    # Data processing you perform on new features
    action_status = "done"  # placeholder for the real result
    return action_status

new_count = get_new_feature_count()
if new_count:
    result = do_stuff()
It would be cool to know if there's an equivalent to sys.exit() or raise that will exit a Notebook gracefully (without the side effects you mentioned).