Copilot Studio

Inconsistent results

Posted on by JD-13041615-0

Hello,

I have a very simple agent that is meant to release sales orders for the current date.

These are my instructions:

Find No_ where Status = 0 and Shipment Date is equal to current date
Set DocumentNo = No_
use Release Sales Orders with DocumentNo

The first time I ran the agent it ran perfectly; all 96 orders for the current date were released.

I reset my database so that all the orders were open. The next time I ran the agent, it could not find any records.

The third time I ran it, it processed only 10 orders correctly. I told the agent that I had 96 open orders and it had processed only 10; it then proceeded to process 30 more. I kept commenting in the same conversation that there were more orders to process, and it eventually got them all released. Sometimes it would state that it could not find any orders.

I reset my database again and have tried multiple different instructions, but I cannot get the agent to be consistent.

NOTE: I did see that there is a limit of 10 lines per query on unpublished agents, but I find that this is not accurate either, because if I delete the agent, recreate it, and run it for the first time, it works correctly. It is always the second and later runs that seem to go astray.

Is publishing the agent the only way to truly test it?

Categories:

General topics


Suggested answer

Vahid Ghafarpour 817 on at


I believe that for business-critical processing such as releasing many orders, Copilot Studio is not ideal as the orchestration layer.

I suggest moving the loop out of the agent:

Agent > query > loop > release

Suggested answer

Valantis 5,267 on at


Hi @JD-13041615-0,

The inconsistency has a specific cause. After the first run processes 96 orders, all those results stay in the conversation context. On the next run the model is working with a much larger context full of previous data, which confuses it when you give the same instruction again. That's why deleting and recreating the agent works -- you get a clean context every time.

The 10-record limit in the test panel is a real limitation: Microsoft docs confirm that the test panel limits query results for unpublished agents. Published agents don't have the same limit.

Vahid's suggestion is the right fix. The agent should not be looping through records across multiple turns. The correct pattern:

1. Agent calls a single Power Automate flow as a tool
2. The flow queries all 96 orders and loops through them
3. The flow returns a summary back to the agent
4. Agent reports the result in one message

This removes the context problem entirely because the agent makes one tool call and gets one result back.
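As a rough illustration of the four steps above, here is a minimal Python sketch of what the flow does on the agent's behalf: filter, loop, and hand back one summary. It is not Power Automate code. `SalesOrder`, `release_orders_for_date`, the `release_one` callback, and the status encoding (0 = open, 1 = released) are all hypothetical names and assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SalesOrder:
    no: str              # document number (No_ in the original instructions)
    status: int          # assumed encoding: 0 = open, 1 = released
    shipment_date: date


def release_orders_for_date(orders, target_date, release_one):
    """Release every open order shipping on target_date; return one summary.

    release_one is a hypothetical callback standing in for the
    "Release Sales Orders" action; it receives a document number.
    """
    released, failed = [], []
    for order in orders:
        # Same filter as the original instructions: Status = 0 and
        # Shipment Date equal to the target date.
        if order.status != 0 or order.shipment_date != target_date:
            continue
        try:
            release_one(order.no)
            order.status = 1
            released.append(order.no)
        except Exception as exc:
            # Record the failure and keep going so one bad order
            # does not stop the rest of the batch.
            failed.append((order.no, str(exc)))
    return {
        "matched": len(released) + len(failed),
        "released": released,
        "failed": failed,
    }
```

The agent only ever sees the returned summary (for example, how many orders matched and which ones failed), so the loop itself never depends on conversation history.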

To answer your direct question: publishing will behave more consistently than the test panel, but the architecture problem will still cause inconsistency in production if the agent is looping across multiple turns.

Best regards,

Valantis

Suggested answer

Sayali Microsoft Employee on at


Hello @JD-13041615-0,

The behavior you are seeing is expected and is not due to a simple "publish vs test" difference; it comes mainly from how Copilot Studio agents handle data retrieval and reasoning.

In Copilot Studio, agents are non-deterministic and optimized for summarization, not bulk processing. They often return partial results (commonly ~10 items) even when more records exist, because the model limits how much data it processes at once and may only consider the first subset of results. This is why you see inconsistent behavior (sometimes 10, sometimes more after follow-ups, and sometimes none): each run generates a new plan and may pick a different subset or stop early. Additionally, testing in the chat panel is conversational and stateful; the agent may reuse prior context or partially completed actions unless you reset the conversation, which further impacts consistency.
Reference: Test your agent - Microsoft Copilot Studio | Microsoft Learn

So no, publishing is not the only way to properly test the agent. Publishing might improve stability slightly, but it won't fix the core issue. The real solution is to avoid relying on the agent to process large datasets directly. Instead, use a structured action (such as a Power Automate flow or a connector query) that explicitly retrieves all records (with pagination if needed) and processes them deterministically, then let the agent call that action.

👉 In short: your inconsistency is by design (AI + data limits), not a bug, and publishing alone won't solve it. You need to move the data processing into an action/flow for reliable results.
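To make "retrieves all records (with pagination if needed)" concrete, here is a small Python sketch of the idea. It assumes a hypothetical `fetch_page(offset, limit)` function that returns a list of records; real connectors often page differently (for example with continuation links or tokens), so treat the offset/limit shape as an assumption rather than any connector's actual API.

```python
from typing import Callable, Iterator, List


def fetch_all(
    fetch_page: Callable[[int, int], List[dict]],
    page_size: int = 10,
) -> Iterator[dict]:
    """Yield every record by requesting pages until one comes back empty."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            # An empty page means there is nothing left to read.
            return
        yield from page
        # Advance by the number of records actually received,
        # so a short final page is handled correctly.
        offset += len(page)
```

Even with a page size of 10, a loop like this keeps reading until the source is exhausted, which is the difference between a deterministic action and an agent that stops after the first subset of results.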
