Notebook Task stuck on 'Executing'

07-12-2021 07:33 AM
AaronKoelker
Occasional Contributor II

I have a notebook script that runs fine manually and usually runs fine as a scheduled Task. But a few times now, the Task has started at its scheduled time and then stayed stuck on 'Executing' indefinitely. This causes every subsequent scheduled execution to be 'Skipped' (see attached screenshot). It never seems to resolve on its own: I've left it for a few days to see what would happen, and it stayed stuck the whole time.

The only fix I've found is to delete the Task from the Notebook entirely. (I also clear and restart the kernel, but I'm not sure whether that's necessary.)

I'm not sure how to troubleshoot this further. Because the Task is still 'Executing', I never get an error message, so I have no idea where or on what it's getting hung up. Again, it runs fine manually, and I've seen it run successfully up to a few dozen times before hitting this 'Executing' snag, so the problem seems intermittent. I've started tracking the days and times it happens but don't see a clear pattern yet, other than that it might have something to do with weekends.

Any ideas or tips on how to troubleshoot this, given that I can't reproduce the error when running it myself?

It might also be a worthwhile enhancement to either let the user see where the script is while a Task is running (so you could at least see where it's hung up) or provide an option to interrupt and restart the Task without having to delete and recreate it from scratch.

-Aaron
Tim_McGinnes
Occasional Contributor III

You could try to narrow down where the problem occurs by writing/appending information to a log file while the script is running.

There is a built-in logging module in Python, or you could just write out to a text file.
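A minimal sketch of the logging-module approach, assuming the notebook can write to `/arcgis/home` (the file path and the logger name `scheduled_task` are illustrative choices, not anything the Task requires). `logging.FileHandler` flushes after every record, so if a run hangs, the last line in the file shows the last step that completed.

```python
import logging


def setup_logging(path="/arcgis/home/task_log.txt"):
    """Return a logger that appends timestamped lines to `path`.

    The default path is an assumption; use any location the Task can write to.
    """
    logger = logging.getLogger("scheduled_task")
    logger.setLevel(logging.INFO)
    # Re-running the cell would otherwise stack duplicate handlers.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    handler = logging.FileHandler(path, mode="a", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


# Example: bracket each step, so a hang shows up as a "start" with no "done".
# log = setup_logging()
# log.info("start: query feature layer")
# ...
# log.info("done: query feature layer")
```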

AaronKoelker
Occasional Contributor II

Thanks Tim, that's a good idea. I will give it a try.

-Aaron
Josh-Cullen
Esri Contributor

Hi Aaron!

This is an issue we have seen with other ArcGIS Notebook users as well, so you aren't alone. It does, however, seem specific to the notebook being run. Were you able to determine the cause of the problem?

Right around the time you asked this question, we were actively rolling out fixes and enhancements for this. A task may still get stuck, but currently it shouldn't stay stuck for longer than 24 hours. In the future, we will add the ability to close/delete active notebooks running remotely via the Manage Containers panel on the Notebook page.

Thanks.
Josh

AaronKoelker
Occasional Contributor II

Hey Josh. I haven't figured out the issue in the script yet, but I've implemented some logging as Tim McGinnes suggested above and am now waiting for the 'Executing' error to happen again. Since it sounds like you've changed things so a task shouldn't stay stuck for more than 24 hours, though, I may need to check on it more often, or have the script archive its logs.

Whenever I do figure it out, I'll be sure to provide an update here.

Thanks!

-Aaron
AaronKoelker
Occasional Contributor II

@Josh-Cullen, a bit of an update on this: it seems the Executing issue can still last beyond 24 hours. I've included a screenshot below, though I'm not sure how legible it will be. The task got stuck executing on August 31, and I paused it on Sept 2 after noticing it was skipping. On September 8, I reactivated the same task and the Executing problem persisted, skipping every run after that. I'll recreate the task to fix it for now, but I think the troublesome part is that we have no way of being notified when this issue occurs. I have notification emails set up within the script to alert me when it runs into an error, and I've set up logging as Tim suggested in this thread, but unfortunately once the task is stuck 'Executing', the script doesn't run at all, so neither of those ever fires.

Some sort of built-in notification system would be really handy, if feasible. Or a setting to cancel a Notebook run if it takes longer than X amount of time.
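Until something like that exists, one workaround is a heartbeat: the scheduled notebook records a timestamp as its very last step, and a separate check (run independently of the Task that might be stuck) sends an alert when that timestamp gets too old. A minimal sketch, assuming both sides can read the same file; the path, the 26-hour limit (a daily schedule plus some slack) and the function names are all hypothetical.

```python
import json
import os
from datetime import datetime, timedelta, timezone

HEARTBEAT_PATH = "/arcgis/home/last_success.json"  # hypothetical location
MAX_AGE = timedelta(hours=26)  # assumed: daily schedule plus slack


def write_heartbeat(path=HEARTBEAT_PATH):
    """Call as the notebook's final step, after everything else succeeded."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"last_success": datetime.now(timezone.utc).isoformat()}, f)


def is_overdue(max_age=MAX_AGE, path=HEARTBEAT_PATH, now=None):
    """True if the last successful run is older than `max_age`, or never happened."""
    if not os.path.exists(path):
        return True
    with open(path, encoding="utf-8") as f:
        last = datetime.fromisoformat(json.load(f)["last_success"])
    now = now or datetime.now(timezone.utc)
    return now - last > max_age


# In the separate checker (send_alert_email is hypothetical; reuse the
# script's existing email code):
# if is_overdue():
#     send_alert_email("Scheduled notebook has not completed recently")
```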

Thanks for your time.

Screenshot: Task was paused on Sept 2 and reactivated on Sept 8

-Aaron
RobertAkroyd1
New Contributor II

@Josh-Cullen 

I've seen the same thing: a task got stuck on 'Executing', and all subsequent runs were skipped for a long time. Case 02899276 describes this. It would be OK if you could at least see the log/output of the stuck run, but it's greyed out.

Today I added logging that writes progress info to an AGOL table, so I can see how far the script gets before it gets stuck. As it happened, it got stuck not long after, so I only have one detailed record so far, but it looks like it may have stopped on a line of Python that did a requests.get call to a URL.

response = requests.get(theURL, data=data, headers=headers)

Apparently harmless, until you realise that the default timeout is None, so it can sit there forever if the other end doesn't respond.

I've changed that code to now be:

response = requests.get(theURL, data=data, headers=headers, timeout=theTimeoutSecs)

theTimeoutSecs is set to 60 for now.
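One way to build on that: requests also accepts a (connect, read) timeout tuple, and a couple of retries with logging make a transient failure visible instead of fatal. Note that the read timeout limits how long requests waits for the server to send data, not the total download time. A minimal sketch; the helper name `get_with_retries`, the 10 s connect value and the retry/backoff numbers are illustrative assumptions.

```python
import logging
import time

import requests

log = logging.getLogger("scheduled_task")


def get_with_retries(url, *, timeout=(10, 60), attempts=3, backoff_secs=5, **kwargs):
    """GET `url` without ever waiting indefinitely.

    `timeout` is (connect, read) in seconds. Timeouts and connection errors
    are retried up to `attempts` times with a growing pause; HTTP error
    statuses are raised immediately by raise_for_status().
    """
    for attempt in range(1, attempts + 1):
        try:
            response = requests.get(url, timeout=timeout, **kwargs)
            response.raise_for_status()
            return response
        except (requests.exceptions.Timeout, requests.exceptions.ConnectionError) as exc:
            log.warning("GET %s failed (attempt %d/%d): %s", url, attempt, attempts, exc)
            if attempt == attempts:
                raise
            time.sleep(backoff_secs * attempt)


# Usage with the names from the post:
# response = get_with_retries(theURL, data=data, headers=headers)
```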

I'll now wait and see if it ever gets stuck again.
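For the progress logging to an AGOL table, a custom `logging.Handler` can turn each log record into a new row. A minimal sketch, assuming a hosted table (for example an `arcgis.features.Table`) with text fields `level`, `step`, `message` and a date field `logged_at`; those field names and the 255-character limit are hypothetical, and the handler relies only on the table's `edit_features(adds=...)` method. Each record becomes a network call, so keep messages coarse (one or two per step); a failed write goes to `handleError` so a logging problem doesn't stop the script.

```python
import logging


class TableLogHandler(logging.Handler):
    """Write each log record as a new row in a hosted table (sketch)."""

    def __init__(self, table, level=logging.INFO):
        super().__init__(level)
        self.table = table  # assumed to expose edit_features(adds=[...])

    def emit(self, record):
        try:
            row = {"attributes": {
                "level": record.levelname,
                "step": record.funcName,  # function that made the log call
                "message": self.format(record)[:255],  # assumed field length
                # Date fields are sent as epoch milliseconds.
                "logged_at": int(record.created * 1000),
            }}
            self.table.edit_features(adds=[row])
        except Exception:
            self.handleError(record)


# Wiring it up (item id elided; field names are the assumptions above):
# from arcgis.gis import GIS
# gis = GIS("home")
# table = gis.content.get("<item id of the hosted table>").tables[0]
# log = logging.getLogger("scheduled_task")
# log.setLevel(logging.INFO)
# log.addHandler(TableLogHandler(table))
```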

We've also suggested that it would be a great enhancement if scheduled notebook tasks could have a configurable timeout, as Windows Scheduled Tasks can, so that a run would be automatically killed if it ran for longer than X minutes/hours. That would give subsequent runs a chance to run.
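Until the platform offers such a timeout, a script can at least put its own upper bound on a step that might block. A minimal sketch using `signal.setitimer` and `SIGALRM`, assuming the notebook kernel runs on Linux and the code runs in the main thread (both are requirements of this signal-based approach). It can interrupt a hung call inside the script, such as a request made without a timeout, but it cannot help when the scheduler itself never starts or finishes the run.

```python
import signal
from contextlib import contextmanager


@contextmanager
def deadline(seconds):
    """Raise TimeoutError if the body of the `with` block runs longer than `seconds`.

    Unix-only and main-thread-only, since it relies on SIGALRM.
    """
    def _on_alarm(signum, frame):
        raise TimeoutError(f"step exceeded {seconds} s")

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Cancel the timer and restore whatever handler was installed before.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous if previous is not None else signal.SIG_DFL)


# Usage (run_update is hypothetical):
# with deadline(15 * 60):
#     run_update()
```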

JasonJordan00
Occasional Contributor

It's good to know other people are struggling with this issue as well. I had this logged as BUG-000140140 earlier this year, but recently received a message from support saying it was unreproducible.

So far I haven't noticed a pattern to when it begins skipping. It'll go for several weeks and not miss a beat, then suddenly for another week it skips more than it runs.

Most of the script is simple calculations within a Pandas dataframe, plus a URL request and pushes to a feature layer. If I had to guess, it's hanging on one of these connections. I'm going to implement the timeout as well as the logging method and see if I can contribute some findings here.

RobertAkroyd1
New Contributor II

It's now logged as BUG-000143789.

I even reproduced it with the simplest possible Notebook, containing only this code (so it's definitely not related to my code):

from arcgis.gis import GIS
gis = GIS("home")
print(gis)


I've also submitted enhancement request ENH-000143598, "Allow the ability for a task in Scheduled Notebook to be automatically ended/killed if it runs for longer than X minutes/hours", similar to what you can configure in Windows Task Scheduler.

Josh-Cullen
Esri Contributor

Hi Robert & Jason,

Thank you for reproducing it with that simple notebook! It confirms that the issue is not code-related. I'm still waiting for that bug and enhancement to come through to my side from support, but I will get started on the investigation.

Thank you both for your help.