I have a scheduled flow that runs once each day. I've had it in place for many months, and it's been running well with almost no issues. However, it has recently started having very frequent failures that seem to be somewhat random and only affect a few items in a loop. I have not changed anything in the flow, and the SharePoint list it's based on is relatively static. In other words, its performance has changed, but nothing I've done makes it obvious why.
The flow connects to a SharePoint list that doesn't change much and only has between about 15 and 20 records. Each list item includes a person record.
The flow does a 'Get all SharePoint items' action and then passes that to an apply to each. The apply to each has concurrency turned on for up to 50 items.
Within the apply to each loop, it uses an Update SharePoint Item action to reset a status in one of the list columns. Next, it uses an if condition to check another status flag to see if the user needs to be contacted (all do by default). If so, it generates an approval request to that user and concurrently updates the SharePoint list item to show the approval has been sent. The approval has a timeout of 8 hours.
Finally, depending on the result of the approval request, it updates a column in the same SharePoint list item with the result. The whole flow then terminates with success. It also has a parallel branch that times out after eight hours and then terminates.
Normally, this runs every day without fault. However, it's had an increasing rate of failures in the last two weeks. When it fails, most of the flow still works as expected for most users. However, some users never get the approval generated. I have a run after failure on the create approval action that sends me an email on failure, so I can see the users for whom it failed, but I can't see why. The number of affected users can be as high as about half of the list, but is usually just a small proportion, about two or three. I can't see a pattern to who it fails for, though it is more frequent for some users than others.
My best guess is that there is some sort of throttling happening, but I have no idea how to check for this or control it.
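For reference, this is roughly what throttling looks like from the caller's side. A throttled HTTP service typically answers with status 429 (Too Many Requests), sometimes 503, and may include a Retry-After header telling the caller how many seconds to wait. The Python sketch below is only an illustration of that retry pattern, not Power Automate code and not a diagnosis of this flow. The names `send` and `call_with_backoff` are hypothetical, and it assumes Retry-After is given in whole seconds.

```python
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape for this sketch: (HTTP status, headers).
Response = Tuple[int, Mapping[str, str]]


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, int]:
    """Call send() until it stops reporting throttling.

    Returns (last status, number of attempts made).
    """
    status = 0
    for attempt in range(1, max_attempts + 1):
        status, headers = send()
        # 429 and 503 are the usual "slow down" signals.
        if status not in (429, 503):
            return status, attempt
        if attempt == max_attempts:
            break
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            # The service said how long to wait (assumed to be seconds).
            delay = float(retry_after)
        else:
            # Otherwise back off exponentially: 1s, 2s, 4s, ...
            delay = base_delay * 2 ** (attempt - 1)
        sleep(delay)
    return status, max_attempts
```

If throttling is the cause, the raw outputs of a failed action would be expected to show a status like 429. Lowering the degree of parallelism on the apply to each, or adjusting the retry policy in the action's settings, would be the flow-level counterparts of this pattern.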
I've checked this again today because there were more failures (today and yesterday).
For yesterday's failure, on a run that had already completed, I can't look at it in any detail because the flow timed out, which means the run history shows it never went down the branch that has the important actions (the action I need to look at is shown with the red arrow; I can't open the action, even though it partially completed). I may have the layout wrong, because the flow actually follows both paths: it first looks up the items, etc. But when the timeout occurs, the history only captures the right path of the flow. Perhaps I need to join the parallel paths back to one terminate action?
I also can't look at the box in detail while the flow is running, see below. I'm not sure why this is, but I suspect it's because the action includes an approval, and not all approvals have been completed, so it treats this action as still running.
Thank you for the suggestion. I was aware of this and had checked for errors. Unfortunately, it does not give me sufficient information as to why any of the items failed. It only tells me which ones failed, and there is no clear pattern to those.
Unfortunately, I don't have the full error state right now because my previous runs terminated after the timeout period, which seems to make the full apply to each details unavailable. But I have checked here, and it doesn't have enough information to diagnose the problem.
Hi! Thanks for the detailed explanation. You should be able to see more details about the errors by opening the failed runs in the Run History.
Best regards,
Community Support Team - Mari