Hi there, I'm fairly new to Notebooks and need some help debugging one that fails sporadically.
I've created a Notebook (in ArcGIS Online) that has four main steps:
I've scheduled this Notebook to run every 15 minutes (which is the highest frequency possible).
It's been running for a month (I initially set it off on June 14th), but sometimes fails to complete successfully. Twice in the last week it's failed to run on multiple successive runs and has been disabled automatically by ArcGIS Online. I've added try/except statements around all bits of code that are doing something, e.g. in the function to export an item to a temporary FGDB as shown below:
def backup_item_as_fgdb(itemid):
    item = gis.content.get(itemid)
    try:
        result = item.export(
            item.title + "_backup_" + now_dt(),
            "File Geodatabase",
            tags="VEL occupancy backup",
            snippet="Backup of {} taken on {}".format(item.title, dt.datetime.now().strftime("%c")),
        )
        print("VEL occupancy data successfully exported to FGDB.")
    except:
        print("An error occurred exporting the data to an FGDB: " + item.title)
    try:
        # move exported file to relevant folder (in my ArcGIS Online account)
        result.move(agol_backup_folder)
        print("Temporary FGDB backup file successfully moved to folder: " + agol_backup_folder)
    except:
        print("An error occurred moving the FGDB backup to the specified folder: " + agol_backup_folder)
    return result.id
But when it fails, I don't get any useful messages out of the task details view for the failed task, the errors block just says:
"errors": [ "", "[ERROR] - Terminating execute notebook job 32a37804e23d4deb834897fe28694e6a as scheduled notebook execution timeout 15 minutes exceeded." ],
Rather than printing the exceptions do I need to do something else with them? Do I need to return them?
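On the question of what to do with the exceptions: a bare `except:` followed by a fixed `print` discards the exception itself, so the output never says what actually went wrong. A minimal sketch of one alternative, using only the Python standard library, is below. `run_step` and `flaky_export` are hypothetical names made up for illustration; they are not part of the ArcGIS API or of the Notebook described here.

```python
import logging
import traceback

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("backup")


def run_step(name, func, *args, **kwargs):
    """Run one step; log the full traceback if it raises, then re-raise."""
    log.info("START %s", name)
    try:
        result = func(*args, **kwargs)
    except Exception:
        # format_exc() includes the exception type, message and call stack.
        log.error("FAILED %s\n%s", name, traceback.format_exc())
        raise
    log.info("DONE %s", name)
    return result


def flaky_export():
    # Hypothetical stand-in for a call such as item.export(...).
    raise RuntimeError("export failed")
```

Re-raising after logging stops later steps from running against a result that was never created. This only helps for steps that fail with an error; a step that simply stalls until the job is terminated raises nothing for `except` to catch.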
I'd like to know at which of the four main steps it's failing so I can investigate further. Is anyone able to give any advice on how I can debug this better? Something somewhere is taking more than 15 minutes to run (which it shouldn't), but it only happens sporadically. I really want to find out where the problem is so that I can be confident this will run successfully every time, as it's our main backup/archive process for this project.
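One way to find the slow step could be to write a timestamped line to a file before and after each step, so that a terminated run leaves a START line with no matching DONE. The sketch below assumes the Notebook can write to a file that persists between runs; `LOG_PATH`, `log_line` and `timed_step` are hypothetical names, and the path would need adjusting for your environment. The thread does not establish whether printed output survives a terminated run, which is why this writes to a file rather than printing.

```python
import datetime as dt
import time
from contextlib import contextmanager

LOG_PATH = "backup_steps.log"  # assumption: any writable, persistent path


def log_line(message, path=LOG_PATH):
    """Append a timestamped line; closing the file writes it out straight away."""
    stamp = dt.datetime.now().isoformat(timespec="seconds")
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{stamp} {message}\n")


@contextmanager
def timed_step(name, path=LOG_PATH):
    """Log when a step starts, and either its duration or the error it raised."""
    log_line(f"START {name}", path)
    started = time.monotonic()
    try:
        yield
    except Exception as exc:
        log_line(f"ERROR {name} after {time.monotonic() - started:.1f}s: {exc!r}", path)
        raise
    else:
        log_line(f"DONE {name} after {time.monotonic() - started:.1f}s", path)


# Usage (step names are illustrative):
# with timed_step("1 export to FGDB"):
#     backup_id = backup_item_as_fgdb(itemid)
```

Comparing the durations of successful runs with the last START line of a failed run should show both which step stalls and how long it normally takes.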
Any help would be greatly appreciated. Full task details of the failed task included below in case it's any help.
Thanks in advance - Ian
{
  "result": {
    "jobId": "32a37804e23d4deb834897fe28694e6a",
    "type": "executeNotebook",
    "status": "FAILED",
    "username": "**my username was here**",
    "startTime": 1689190241118,
    "endTime": 1689191226757,
    "messages": [
      "Input Notebook Path: /arcgis/home/.tasks/32a37804e23d4deb834897fe28694e6a/c13203e4fe554e8094d233ed5ed84db8.ipynb",
      "Output Notebook Path: /arcgis/home/.tasks/32a37804e23d4deb834897fe28694e6a/output.ipynb",
      "Start processing time: 2023-07-12 19:30:42.046459"
    ],
    "errors": [
      "",
      "[ERROR] - Terminating execute notebook job 32a37804e23d4deb834897fe28694e6a as scheduled notebook execution timeout 15 minutes exceeded."
    ],
    "inputs": {
      "itemId": "c13203e4fe554e8094d233ed5ed84db8",
      "updatePortalItem": true,
      "saveInjectedParameters": false,
      "notebookParameters": "{}",
      "runId": "826766592cd04b60888579ca881d75ee",
      "taskId": "39f24117f0e5434bbb0c0a925829b54f"
    },
    "results": {},
    "customAttributes": {
      "isCancelled": false
    },
    "jobError": null,
    "jobType": null,
    "serverId": null,
    "notebookId": null,
    "itemId": null,
    "openNotebookProgress": null,
    "notebookUrl": null
  }
}
Hi, did you get anywhere with this? I'm facing a similar item export issue and it's driving me round the bend! There's seemingly no pattern to when it happens and I'm finding it impossible to debug. My notebook usually takes 2-3 minutes, but at times it hits the 60-minute timeout and fails as a result.
Hi @JoeBullard
Sorry to hear you're having the same/similar problem. I know what you mean about it driving you round the bend, it's so frustrating.
I did make a bit of progress: I narrowed the issue down to the first step in my four-step backup routine, which is the step that exports the layer to a temporary FGDB. I have absolutely no idea why this fails sometimes, and like you I can't spot any pattern.
To try and mitigate the issue I added "try" and "except" statements around the line of code that exports the dataset to FGDB. If an exception is caught during the export, I just get it to try and export to FGDB again.
This has resolved most of the issues I was experiencing. It's rare that both the first and second attempts to export the dataset as FGDB fail, but it does sometimes happen which then causes the Notebook to be terminated.
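For illustration, the retry approach described above could be sketched as follows. This is not the code from the Notebook in question; `retry` and its parameters are hypothetical, and the idea of pausing between attempts is an added assumption rather than something the original workaround is known to do.

```python
import time


def retry(func, attempts=3, delay_seconds=30, sleep=time.sleep):
    """Call func() up to `attempts` times, pausing between failed tries."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            last_error = exc
            print(f"Attempt {attempt} of {attempts} failed: {exc!r}")
            if attempt < attempts:
                sleep(delay_seconds)
    # Every attempt failed: surface the last error instead of hiding it.
    raise last_error


# Usage (illustrative): backup_id = retry(lambda: backup_item_as_fgdb(itemid), attempts=2)
```

The total of all attempts plus pauses still has to fit inside the scheduled run's time limit, so the number of attempts and the delay need to be chosen with that in mind.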
I never managed to get to the bottom of why the export to FGDB sometimes fails, and I didn't log a support case with Esri as it's such an intermittent problem and impossible to replicate.
Hope this helps.
Thanks for this. Yes, I have resorted to try/except blocks, but it still seems to happen on occasion. I think I'm going to raise a ticket.
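One possible reason try/except does not fully help: a call that stalls never raises anything, so there is nothing to catch until the platform terminates the run. A rough sketch of a time limit around a single call is below, using only the standard library. `call_with_timeout` is a hypothetical helper, and whether ending a run early this way changes how ArcGIS Online counts the run is an open assumption that would need testing.

```python
import threading


def call_with_timeout(func, timeout_seconds):
    """Run func() in a daemon thread; raise TimeoutError if it overruns.

    The worker thread is not killed on timeout (Python cannot do that safely);
    it is abandoned so the caller can record the overrun and move on.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = func()
        except Exception as exc:
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_seconds)  # wait at most timeout_seconds
    if thread.is_alive():
        raise TimeoutError(f"call exceeded {timeout_seconds} seconds")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


# Usage (illustrative): call_with_timeout(lambda: backup_item_as_fgdb(itemid), 300)
```

Because the abandoned thread keeps running in the background, any export it eventually completes may still appear in the account and need cleaning up.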
Hi @JoeBullard, if you get anywhere raising a support ticket with Esri I'd be very grateful if you could update this thread, as I'd really like to get to the bottom of the issue.
Did any of you solve this issue? I am getting the same.
Hi @GeorgeAtkins, I'm afraid to say I never managed to get to the bottom of this problem. I didn't raise a support ticket with Esri UK as it's sporadic so impossible to know when it will happen again. @JoeBullard - did you ever raise a support ticket with Esri?
It is still a problem for us though. I just don't know what to do about it, which is really frustrating. In our particular scenario, the Notebook is scheduled to run every 15 minutes, and so far today it has failed on 4 occasions. Some days it doesn't fail at all (for example, on the 12th May there were no failures). In our workflow, missing the occasional backup doesn't really matter. What causes problems is when four successive backups fail, as that terminates the batch process running the Notebook and it has to be manually restarted. Sod's Law dictates this usually happens on the first day of an extended period of leave...
I've never spotted a pattern in terms of days/times of the failures. I have a hunch that it's related to the 'regional data hosting' setting of our AGOL organisation (you can view this in AGOL by going to Organization > Status). The data for our AGOL organisation is stored in Europe, not the US of A.
It would be interesting to know if your AGOL organisations are set up to store data in Europe too @GeorgeAtkins and @JoeBullard. I think it's only a problem for AGOL organisations that are hosted in Europe as we've had other similar issues with our European hosted organisation e.g. https://community.esri.com/t5/arcgis-online-questions/services-on-https-services-eu1-arcgis-com-not/.... We have access to another AGOL organisation that is hosted in the USA that doesn't have these problems.
If anyone ever gets to the bottom of this I would love to know what the problem is and how to work around it.
Hope this helps - Ian.