
Flow error/fail to run notification

2 weeks ago
ModernElectric
Frequent Contributor

Good day.

I have a series of Power Automate flows for our Survey123 platform. Many of the Survey123 reports that are submitted are time sensitive and are required in order to proceed with other task(s) within the company. 

There are times that a flow fails to run and I am not aware of it until management complains that they haven't seen a specific PDF report.

I am trying to come up with a solution where I am notified (by email) if a flow fails, and/or a nightly email log/report showing the successful completion of all flows and any failed/error flows.

Any guidance would be appreciated.

abureaux
MVP Frequent Contributor

Scopes! You want Scopes.

They are a little unassuming, but super important for a healthy flow.

How Does Power Automate Interpret Failures?

First, we need to understand how a fail works in Power Automate.

  • Let's say we have a flow with four components labelled A, B, C, and D
  • In this flow, D is our final step and an email to the client
  • We want to be notified if something breaks, so we drop in step E and set its "Run After" to "Fail"
  • Then we run our flow and step B fails

In this scenario, step E will not run. Step B failed, so everything after it (C and D) was actually "Skipped", not "Failed", and E only runs if D fails. That is where Scopes come in: if anything inside a Scope fails, the entire Scope fails.
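To make the skip behaviour concrete, here is a small Python model of the A-E scenario. It is a simplification, not how Power Automate is implemented: each step runs only if its predecessor ended in one of its allowed "Run After" statuses (otherwise it is Skipped), and a scope is treated as Failed if anything inside it failed, as described above. The step names are just the letters from the example.

```python
SUCCEEDED, FAILED, SKIPPED = "Succeeded", "Failed", "Skipped"
OK = {SUCCEEDED}  # the default "Run After": predecessor is successful

def run_steps(steps):
    """Simplified model of Power Automate's "Run After" behaviour.

    steps: ordered list of (name, predecessor, run_after, outcome) where
      predecessor - name of the step this one runs after (None for the first)
      run_after   - set of predecessor statuses that allow this step to run
      outcome     - status the step ends with if it actually runs
    A step whose predecessor is not in an allowed status is Skipped.
    """
    status = {}
    for name, pred, run_after, outcome in steps:
        if pred is not None and status[pred] not in run_after:
            status[name] = SKIPPED
        else:
            status[name] = outcome
    return status

def scope_status(inner):
    """Per the post: if anything inside a Scope fails, the whole Scope fails."""
    return FAILED if FAILED in inner.values() else SUCCEEDED

# Flat flow: B fails, so C and D are Skipped and E (run after D "Failed") never fires.
flat = run_steps([
    ("A", None, OK, SUCCEEDED),
    ("B", "A", OK, FAILED),
    ("C", "B", OK, SUCCEEDED),
    ("D", "C", OK, SUCCEEDED),
    ("E", "D", {FAILED}, SUCCEEDED),
])
print(flat["E"])  # Skipped

# Same A-D wrapped in a "Try" scope; a "Catch" runs after Try has Failed.
try_inner = run_steps([
    ("A", None, OK, SUCCEEDED),
    ("B", "A", OK, FAILED),
    ("C", "B", OK, SUCCEEDED),
    ("D", "C", OK, SUCCEEDED),
])
outer = run_steps([
    ("Try", None, OK, scope_status(try_inner)),
    ("Catch", "Try", {FAILED}, SUCCEEDED),
])
print(outer["Catch"])  # Succeeded
```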

How to Set Up Error Handling in Power Automate

As I've already mentioned, scopes. You want to put your entire flow into one scope, and your error handling into a second scope that runs after the first one fails. The only exceptions are: 1) Initialize Variable actions, which cannot be added to a scope (super annoying), and 2) the flow trigger (this one makes more sense).

[Screenshot: the workflow in one scope and the error handling in a second scope]

Nomenclature

This type of setup is normally referred to as "Try" and "Catch", where Try is your workflow, and Catch is your error handling. 

Terminate

When you do this type of setup, you need to guide Power Automate a little. When your error handling executes successfully (as it should), your flow will actually register as a success, which makes reviewing your run history in Power Automate annoying. To solve this, always make the final step of your Catch a "Terminate" (set to either Failed or Cancelled depending on your needs; I go with Failed because I use Cancelled for other things).
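The Try/Catch/Terminate pattern maps closely onto an ordinary try/except block. The following Python sketch is a simplified model (the step functions are hypothetical placeholders) showing why the final Terminate matters: without it, a Catch that completes cleanly leaves the run reported as Succeeded.

```python
from typing import Callable, List, Optional

def run_flow(
    try_steps: List[Callable[[], None]],
    catch_steps: List[Callable[[Exception], None]],
    terminate_status: Optional[str] = "Failed",
) -> str:
    """Simplified model of a Try scope, a Catch scope and a final Terminate.

    Returns the status the run history would show. Passing
    terminate_status=None models a Catch that has no Terminate action.
    """
    try:
        for step in try_steps:      # the "Try" scope: your actual workflow
            step()
    except Exception as exc:
        for step in catch_steps:    # the "Catch" scope: runs only if Try failed
            step(exc)
        # Without Terminate, a Catch that finishes cleanly leaves the run "Succeeded".
        return terminate_status if terminate_status is not None else "Succeeded"
    return "Succeeded"

# Hypothetical steps for illustration only.
alerts = []

def fetch_record() -> None:
    pass

def build_pdf() -> None:
    raise RuntimeError("feature layer unavailable")

def notify(exc: Exception) -> None:
    alerts.append(str(exc))

print(run_flow([fetch_record, build_pdf], [notify], terminate_status=None))  # Succeeded
print(run_flow([fetch_record, build_pdf], [notify]))                         # Failed
```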

[Screenshot: a Catch scope ending in a Terminate action]

What Should Go Into a Catch?

It really depends on your needs. What you see above is a relatively "normal" Catch that I use. In fact, I probably have a good couple hundred flows with this general set-up.

  • Compose - This calculates and formats the difference in time between the start of my flow (held in a flowStart variable) and this point in the flow. This gives me a "flow run duration", which helps me triage my Inbox at a glance. If you are curious, this is the formula I am using:
    dateDifference(variables('flowStart'), formatDateTime(utcNow(), 'yyyy-MM-dd HH:mm'))
  • Delay - This ensures that I don't create a backlog. When there is a failure early on in a flow, there is a chance that the item resets so quickly that it ends up being the next thing to be processed. If there are a bunch of things waiting to be processed, this can create a backlog. I don't always use a delay, but this flow in particular is triggered at least 600 times per day, so delays are a real concern.
  • Send email - This sends me an email letting me know that there is an issue. This is a key component of the Catch, and I go into it more below.
  • Update Item - All my flows run in two parts: Part 1 grabs items from my Esri Portal and dumps them into SharePoint (aka my "Router"), and Part 2 sequentially grabs items from my Router for processing. This step resets the item in the Router so it can be re-processed. Basically, I try to be as hands-off as possible in my workflows. I have better things to do with my time.
  • Condition - I can end up with 4-5 of these. These just delete items from databases if my flow got that far. Basically, when something goes wrong, I get rid of or reset the additions it made, or else I'd end up with duplicate data when the flow re-runs.
  • Terminate - Another key component of the Catch. You need this.
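If you want to sanity-check what the Compose step produces, here is a rough Python equivalent of the dateDifference formula above. It assumes flowStart was itself stored in the 'yyyy-MM-dd HH:mm' format in UTC, and it formats the result in the days.hh:mm:ss shape of dateDifference's documented example output ("1268.00:00:00"). Treat it as an approximation, not a byte-for-byte match of Power Automate.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"  # Python spelling of 'yyyy-MM-dd HH:mm'

def flow_run_duration(flow_start: str, now: datetime) -> str:
    """Approximates dateDifference(variables('flowStart'),
    formatDateTime(utcNow(), 'yyyy-MM-dd HH:mm')).

    Assumes flow_start uses the same minute-precision format and that
    now >= flow_start. Returns 'hh:mm:ss', prefixed with 'd.' past one day.
    """
    start = datetime.strptime(flow_start, FMT)
    # Round-trip through the format string to drop seconds, like formatDateTime does here.
    end = datetime.strptime(now.strftime(FMT), FMT)
    delta = end - start
    hours, rem = divmod(delta.seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    hms = f"{hours:02}:{minutes:02}:{seconds:02}"
    return f"{delta.days}.{hms}" if delta.days else hms

print(flow_run_duration("2025-11-17 14:05", datetime(2025, 11, 17, 14, 12, 45)))  # 00:07:00
print(flow_run_duration("2025-11-16 13:00", datetime(2025, 11, 17, 14, 30)))      # 1.01:30:00
```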
Error Email Contents

When something goes wrong, you will want to go to that flow to assess the problem and potentially make a correction. Add this to the email (as an expression):

concat('https://emea.flow.microsoft.com/manage/environments/', workflow()['tags']['environmentName'], '/flows/', workflow()['name'], '/runs/', workflow()['run']['name'])

 Here is what it looks like in the flow:

[Screenshot: the Send an email step containing the run-link expression]

You can pretty it up and add more content if you want. But this is the absolute minimum you need.
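Outside Power Automate, the same link can be rebuilt from the three values the concat() expression reads. This Python sketch uses a plain dict as a hypothetical stand-in for workflow(), with placeholder IDs. The emea host is taken from the expression above and is kept as a parameter in case your environment uses a different one.

```python
def flow_run_url(workflow: dict, host: str = "https://emea.flow.microsoft.com") -> str:
    """Rebuilds the run link the same way as the concat() expression in the post."""
    return "".join([
        host,
        "/manage/environments/", workflow["tags"]["environmentName"],
        "/flows/", workflow["name"],
        "/runs/", workflow["run"]["name"],
    ])

# Hypothetical stand-in for the object returned by workflow(); the IDs are placeholders.
wf = {
    "tags": {"environmentName": "Default-0000"},
    "name": "flow-1111",
    "run": {"name": "run-2222"},
}
print(flow_run_url(wf))
# https://emea.flow.microsoft.com/manage/environments/Default-0000/flows/flow-1111/runs/run-2222
```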

Conditions

In case you want to reset other databases, here is how I use conditions:

Step 1: Initialize Variable

Create a Boolean variable and set it to false (technically not required to set it to false, but it helps keep things simple for future you).

[Screenshot: Initialize variable step (Boolean, false)]

Step 2: Set Variable

Set the variable to true immediately after the action you are going to reset. In this example, I set it after my SharePoint Create Item.

[Screenshot: Set variable step placed after SharePoint Create Item]

Step 3: Condition within the Catch

The condition ensures you don't try to reset something that doesn't need to be reset. As simple as that.

[Screenshot: Condition inside the Catch that checks the variable]
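Here is the same three-step idea as a Python sketch, with a dict standing in for the SharePoint list and hypothetical step functions. The Boolean flag records whether the Create Item actually happened, so the Catch only deletes what the failed run added.

```python
from typing import Callable, Dict

def run_with_rollback(
    sp_list: Dict[int, dict],
    record: dict,
    before_create: Callable[[], None],
    after_create: Callable[[], None],
) -> str:
    item_created = False  # Step 1: Initialize Variable (Boolean, false)
    item_id = None
    try:
        before_create()                           # earlier steps that might fail
        item_id = max(sp_list, default=0) + 1     # stand-in for SharePoint "Create item"
        sp_list[item_id] = record
        item_created = True                       # Step 2: Set Variable right after the create
        after_create()                            # later steps that might fail
        return "Succeeded"
    except Exception:
        if item_created:                          # Step 3: Condition inside the Catch
            del sp_list[item_id]                  # undo the addition so a re-run won't duplicate it
        return "Failed"                           # i.e. the Catch ends in Terminate (Failed)

def ok() -> None:
    pass

def boom() -> None:
    raise RuntimeError("hypothetical downstream failure")

sp: Dict[int, dict] = {}
print(run_with_rollback(sp, {"title": "Report 1"}, ok, boom), sp)    # Failed {}
print(run_with_rollback(sp, {"title": "Report 2"}, boom, ok), sp)    # Failed {}
print(run_with_rollback(sp, {"title": "Report 3"}, ok, ok), sp)      # Succeeded {1: {'title': 'Report 3'}}
```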


ModernElectric
Frequent Contributor

@abureaux It is very obvious that you are way more advanced than me, your knowledge exceeds mine 10-fold, and your flows and processes are most likely far more complex than my simple Survey123 report creation and email flows.

With that said, I went very basic and elementary in my flow(s) and just created a parallel branch with the "Run After" settings of successful or failed. Yeah, no duh ah?

[Screenshot: parallel branch configured with "Run After" settings]

One of the key factors that leads to a failed flow in Power Automate for me is an ESRI outage affecting AGOL and Hosted Feature Layers (something that those above me who rely on these reports do not quite understand). If an unknown factor interrupts AGOL and the feature layers that power the Survey123 process, I now have a way of being notified instantly.

Perhaps someone else who shares my limited knowledge of Power Automate and coding could also find this helpful.

Again, @abureaux, I always appreciate your expertise.

abureaux
MVP Frequent Contributor

Depending on how you currently do error handling and recovery, you may wish to do something similar to me. 

Basically...

  1. Ditch the "When a survey response is submitted" step and replace it with a "Schedule".
  2. In your survey, add a question with a field dedicated to process automation.
    1. For me, I use S123 templates with a bunch of standard questions already added so I don't need to duplicate work.
    2. One of those questions is a calculate question called "prime_process" with the calculation if(1=1,'FALSE',''), which ensures that no matter when the user opens the survey (e.g., if they resubmit), this question always results in FALSE. (Minor note: this is a text field, not a binary field. I just happen to be using "FALSE" and "TRUE".)
  3. In your flow, use the ArcGIS "Get data from feature layer" connector, set Output format to "JSON", and set the Where clause to something like prime_process='FALSE'.
    [Screenshot: "Get data from feature layer" settings]
  4. Then you need two simple things (in this order):
    1. ArcGIS' "Update a record in a feature layer" or its "batch" equivalent (depends on your workflow, but batch is probably best in most cases) to update all items with, for example, prime_process="TRUE". This marks the submissions as "processed" so you don't end up reprocessing them.
    2. ArcGIS' "Update a record in a feature layer (batch)" (I can't think of any reason not to use batch here) within your error handling step. This one resets all failed jobs to, for example, prime_process="FALSE", so they are reprocessed on the next cycle.
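Here is one scheduled cycle of that pattern as a Python sketch. The feature layer is replaced by an in-memory list of attribute dicts, and handle stands in for whatever the flow does per submission; both are hypothetical, not the ArcGIS connector's API. One assumption beyond the post: on failure, only the submissions that had not finished yet are reset to 'FALSE', so completed ones are not reprocessed.

```python
from typing import Callable, Dict, List

def process_cycle(layer: List[Dict], handle: Callable[[Dict], None]) -> str:
    """One scheduled run of the prime_process pattern, simplified and in-memory."""
    # Step 3: the equivalent of the Where clause prime_process='FALSE'
    pending = [rec for rec in layer if rec["prime_process"] == "FALSE"]
    # Step 4.1: mark the batch as processed up front so the next run skips it
    for rec in pending:
        rec["prime_process"] = "TRUE"
    done = 0
    try:
        for rec in pending:
            handle(rec)
            done += 1
    except Exception:
        # Step 4.2 (error handling): reset whatever did not finish so it is retried next cycle
        for rec in pending[done:]:
            rec["prime_process"] = "FALSE"
        return "Failed"
    return "Succeeded"

layer = [
    {"objectid": 1, "prime_process": "FALSE"},
    {"objectid": 2, "prime_process": "FALSE"},
    {"objectid": 3, "prime_process": "TRUE"},   # already handled in an earlier cycle
]
sent: List[int] = []

def flaky(rec: Dict) -> None:
    if rec["objectid"] == 2:
        raise ConnectionError("hypothetical AGOL outage")
    sent.append(rec["objectid"])

print(process_cycle(layer, flaky))                                 # Failed
print([r["prime_process"] for r in layer])                         # ['TRUE', 'FALSE', 'TRUE']
print(process_cycle(layer, lambda r: sent.append(r["objectid"])))  # Succeeded
print(sent)                                                        # [1, 2]
```

Marking the batch before processing is what stops overlapping scheduled runs from picking up the same submissions, and the reset in the error handler is what turns the next scheduled run into an automatic retry.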

I'm sure I made this sound harder than it really is. But this is (generally speaking) the best process automation solution I've found for this work. It ensures 1) nothing is missed, and 2) if there is an error, an automatic re-try occurs.

AkshayHarshe
Esri Regular Contributor

@abureaux Thanks for your post, this is quite amazing! I learned a few cool things here. It is a great candidate for a blog if you haven't already considered writing one!

Thanks,
Akshay Harshe