Answered

AI consistency

Posted by SM-10061723-0

Hello,
I am working on a flow that iterates through files in my SharePoint (no more than 5 pages per file) and has an AI analyze the contents. The AI is given a list of criteria (about 45 different criteria), like "does this document involve this specific project" or "does this document involve this organization". It then fills out a JSON with either a 1 (true) or a 0 (false) depending on whether the file matches each criterion. The prompt also asks the AI to provide reasoning and evidence to support its claim. The flow then populates a spreadsheet and continues to the next file.
I have completed the Power Automate flow. It is able to open files, run a custom prompt, and then populate the spreadsheet with no problem. The issue is with the AI. I am running into the following two issues:

The AI is not consistent enough. Even when using premium GPT-5 reasoning, it is not consistent when assigning a 1 or a 0 to each criterion. Each time I run the flow I get different outputs. I have tried changing the wording of the prompt, but nothing seems to change the consistency. Could there be too many criteria, and that is what is causing the confusion? Or is there something else I can do to help with the consistency?
The second issue is that I occasionally get a Response Content Filtered error. I believe this is because Microsoft's filters are blocking the AI's output after it has analyzed the document? It is not consistent about which documents get flagged: sometimes the flow runs with no error, and sometimes one of the files randomly gets flagged. Is there any way around this?

Any help would be greatly appreciated. Thanks!

Categories:

AI Builder

Verified answer

chiaraalina 2,348 Super User 2026 Season 1

Hi @SM-10061723-0

Yes, 45 criteria could well be too many for one prompt. Breaking the classification into smaller batches gives the model fewer decisions to track per call, which tends to improve accuracy. Evaluating 45 decisions plus reasoning in a single response makes it harder for an LLM to stay consistent.

Try chunking:

First option to try:
Pass 1: Run 10-15 high-priority criteria
Pass 2: Run 10-15 organizational criteria
Pass 3: Run remaining criteria

Orchestrate this with Scope actions in Power Automate or use parallel branches if the criteria are independent.
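To illustrate the batching logic outside of Power Automate, here is a minimal Python sketch. The names `chunk`, `classify_in_batches`, and `run_prompt` are made up for this example; `run_prompt` stands in for whatever invokes the AI Builder prompt for one batch and returns a criterion-to-0/1 mapping.

```python
from typing import Callable, Dict, List


def chunk(items: List[str], size: int) -> List[List[str]]:
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def classify_in_batches(
    criteria: List[str],
    run_prompt: Callable[[List[str]], Dict[str, int]],
    batch_size: int = 15,
) -> Dict[str, int]:
    """Call the prompt once per batch and merge the per-batch results.

    `run_prompt` is a hypothetical stand-in for the AI call; it receives
    one batch of criteria and returns {criterion: 0 or 1}.
    """
    merged: Dict[str, int] = {}
    for batch in chunk(criteria, batch_size):
        merged.update(run_prompt(batch))
    return merged
```

With 45 criteria and a batch size of 15, this makes three calls, which matches the three passes above. Grouping related criteria into the same batch (rather than slicing the list blindly) is probably the better choice in practice.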

Second option you could try:

Run the same prompt 3 times and take the majority vote for each criterion

Use an Apply to each to invoke the AI action 3 times
Parse the three JSON responses
For each criterion, assign 1 if at least 2 out of 3 responses agree
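The voting step could look roughly like this in Python. This is only a sketch: `majority_vote` is a made-up name, and it assumes each run returns a flat JSON object with the same criterion keys and 0/1 values.

```python
import json
from typing import Dict, List


def majority_vote(responses: List[str]) -> Dict[str, int]:
    """Return 1 for a criterion when more than half of the runs say 1.

    Assumes every response is a JSON object with the same keys; a key
    missing from any run raises KeyError instead of being guessed.
    """
    parsed = [json.loads(r) for r in responses]
    result: Dict[str, int] = {}
    for key in parsed[0]:
        votes = sum(int(p[key]) for p in parsed)
        # Strict majority: with 3 runs this means at least 2 of 3.
        result[key] = 1 if votes * 2 > len(parsed) else 0
    return result


runs = [
    '{"project_x": 1, "org_y": 0}',
    '{"project_x": 1, "org_y": 1}',
    '{"project_x": 0, "org_y": 0}',
]
print(majority_vote(runs))  # {'project_x': 1, 'org_y': 0}
```

Using an odd number of runs avoids ties. Keep in mind this triples the number of AI calls per file.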

Third option to try:

Instead of asking the AI to both analyze and score in one step:
Pass 1: Extract relevant text snippets for each criterion
Pass 2: Score each criterion based on the extracted text

This separates retrieval from reasoning and could improve reliability.
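As a rough Python sketch of the two-pass idea: `extract` and `score` below are hypothetical callables standing in for two separate AI prompts, and treating "no snippets found" as 0 without a second call is an assumption of this example, not something the platform does for you.

```python
from typing import Callable, Dict, List


def two_pass_classify(
    document: str,
    criteria: List[str],
    extract: Callable[[str, str], List[str]],
    score: Callable[[str, List[str]], int],
) -> Dict[str, dict]:
    """Pass 1 pulls evidence snippets; pass 2 scores from the snippets only.

    `extract(document, criterion)` and `score(criterion, snippets)` are
    hypothetical stand-ins for the two prompts.
    """
    results: Dict[str, dict] = {}
    for criterion in criteria:
        snippets = extract(document, criterion)
        # Assumption: no extracted evidence means the criterion is not met,
        # so the scoring call is skipped and 0 is recorded.
        value = score(criterion, snippets) if snippets else 0
        results[criterion] = {"value": value, "evidence": snippets}
    return results
```

A side benefit is that the evidence column in the spreadsheet comes straight from pass 1, so each 1 can be traced back to quoted text.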

Also please verify:

Temperature is set to 0

Content moderation level is set to low

And if you haven't already: Use JSON mode in your Prompt. Define a strict JSON schema by using the Customize JSON.
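If you also want to double-check the output after the AI step, the check amounts to something like the Python sketch below. `validate_response` is a made-up helper, and it assumes the response is a flat JSON object mapping each criterion name to 0 or 1 (reasoning and evidence fields would need their own checks).

```python
import json
from typing import Dict, List


def validate_response(raw: str, expected_keys: List[str]) -> Dict[str, int]:
    """Parse the AI output and reject anything that deviates from the schema."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("response is not a JSON object")
    missing = set(expected_keys) - set(data)
    extra = set(data) - set(expected_keys)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for key, value in data.items():
        # Only the integers 0 and 1 are accepted; true/false, "1", or 1.0
        # are rejected so that drift in the output format is caught early.
        if type(value) is not int or value not in (0, 1):
            raise ValueError(f"{key!r} must be 0 or 1, got {value!r}")
    return data
```

A response that fails validation can be retried or logged instead of being written to the spreadsheet.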

Even with GPT-5 reasoning models, variability is inherent to LLM behavior.

Hope it helps!

Vish WR 3,648

For consistency: set temperature to 0 and use a strict JSON schema; that fixes most of the drift. Splitting the work into chunks (10-15 criteria per pass) instead of all 45 at once also helps a lot.

For the content filter error: lower the moderation level to Low in the prompt settings, and wrap the AI step in a Scope with a "has failed" branch so one flagged file doesn't kill the whole run.

David_MA 14,840 Super User 2026 Season 1

Since you did not include the actual instructions being used in the AI Builder prompt, there is one key factor that cannot be evaluated, which is how your instructions have been written. As with any set of instructions, the criteria need to be clearly defined and should not be:

too broad

overlapping or conflicting

subjective

missing definitions

based on implied knowledge

asking for interpretation instead of direct detection

For example, you mentioned this as one of the criteria: “does this document involve this specific project.” I assume the actual prompt is something more like: “does this document involve Project X42?”

If that is the case, have you defined how the AI should determine whether a document involves Project X42? For instance, instead of only naming the project, you could provide context such as:

“Project X42 is a next-generation soccer performance system developed in Norway and led by Erling Haaland. It uses wearable sensors in jerseys and the player's shoes to track player speed, stamina, shot power, and positioning during matches and training.”

Without that kind of definition, the AI has to interpret what qualifies as “involvement,” which can lead to inconsistent results.

By defining what Project X42 is instead of just naming it, you remove ambiguity and give the AI something concrete to match against. That reduces interpretation and helps the model make more consistent decisions. It will not fully eliminate variation (especially with 45 criteria), but unclear or undefined criteria will almost always increase inconsistency regardless of how the flow is structured.
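One way to keep 45 definitions manageable is to store them in one place and generate the criteria section of the prompt from them. Below is a minimal Python sketch of that idea; `build_criteria_block` and the wording of the decision rule are illustrative assumptions, not AI Builder features.

```python
from typing import Dict


def build_criteria_block(definitions: Dict[str, str]) -> str:
    """Render each criterion with its definition and an explicit 0/1 rule."""
    lines = []
    for number, (name, definition) in enumerate(definitions.items(), start=1):
        lines.append(f"{number}. {name}: {definition}")
        # Illustrative rule: ask for direct detection, not interpretation.
        lines.append(
            f"   Answer 1 only if the document explicitly refers to {name}; "
            "otherwise answer 0."
        )
    return "\n".join(lines)


criteria = {
    "Project X42": (
        "A next-generation soccer performance system developed in Norway "
        "that uses wearable sensors to track players."
    ),
}
print(build_criteria_block(criteria))
```

Keeping the definitions in a list or table also makes it easier to spot criteria that overlap or conflict.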
