
How do I auto‑evaluate connected agent routing in Copilot Studio using the Evaluation feature?


Posted by akash.talole

I'm building a multi‑agent setup in Copilot Studio, where one agent routes or delegates tasks to other connected agents. I want to use Copilot Studio’s Evaluation feature to automatically verify that:

- the routing logic is triggered correctly,
- the right connected agent is selected, and
- the final combined response of the agent chain meets expectations.

However, the documentation mostly covers single‑agent evaluation. I can’t find a clear example of how to structure evaluation datasets or test cases when multiple connected agents are involved.

For those who've done this:

- Do you evaluate each agent independently, or the entire routing chain end‑to‑end?
- How do you assert that routing was correct (e.g., verify that the connected/child agent was actually invoked)?
- Is there a recommended pattern for evaluation datasets in multi‑agent flows?
- Are there any limitations or gotchas in the Evaluation tool when using connected agents?

I’d appreciate any examples, tips, or best practices you’ve discovered!

Categories: General topics


Verified answer

Sunil Kumar Pashikanti (Moderator)


Hi @akash.talole,

You can evaluate multi‑agent setups in Copilot Studio, but the trick is to test three layers separately:

1. Test each agent independently. Use the built‑in Evaluation methods (General quality, Compare meaning, Similarity, Exact match) to verify that each connected/child agent returns correct content.
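If you also want a quick local pre-check of child-agent answers before running them through Evaluation, the sketch below mimics the two deterministic methods. It is a hypothetical helper, not Copilot Studio code: `exact_match` compares normalized strings, and `similarity` swaps in Python's `difflib.SequenceMatcher` ratio as a crude stand-in for the Similarity method (the real method is scored by Copilot Studio itself). Compare meaning and General quality are AI-graded in the product and are not reproduced here. The 0.8 threshold and the sample prompt are assumptions for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class ChildAgentCase:
    """Hypothetical child-agent test row: a prompt and the answer expected back."""
    prompt: str
    expected_response: str


def normalize(text: str) -> str:
    # Lower-case and collapse whitespace so formatting noise doesn't fail a match.
    return " ".join(text.lower().split())


def exact_match(actual: str, expected: str) -> bool:
    return normalize(actual) == normalize(expected)


def similarity(actual: str, expected: str) -> float:
    # Character-level ratio in [0, 1]; a local stand-in, NOT Copilot Studio's Similarity scorer.
    return SequenceMatcher(None, normalize(actual), normalize(expected)).ratio()


def grade(case: ChildAgentCase, actual: str, threshold: float = 0.8) -> dict:
    # threshold is an assumed pass mark; tune it to your own data.
    score = similarity(actual, case.expected_response)
    return {
        "prompt": case.prompt,
        "exact_match": exact_match(actual, case.expected_response),
        "similarity": round(score, 3),
        "passed": score >= threshold,
    }


if __name__ == "__main__":
    case = ChildAgentCase("What is the PTO carry-over limit?",
                          "You can carry over up to 5 days of PTO.")
    print(grade(case, "You can carry over up to 5 days of  PTO."))
```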

2. Test routing at the parent agent. To confirm the right connected agent was invoked, use:

- Plan Validation (from the Power CAT Copilot Studio Kit) → checks whether the parent’s dynamic plan includes the expected child/connected agent (treated as a “tool”). This is Microsoft’s recommended way to detect a handoff.
- (Optional) Download the analytics transcripts, which show separate parent and child sessions, as proof of routing.
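The idea behind a plan-based routing check can be sketched as below. This is a hypothetical illustration, not the Power CAT Kit's implementation: it assumes you have already extracted the parent's plan as a list of tool/agent names (for example from Kit test results or downloaded transcripts), and `RoutingCase`, `plan_coverage`, `unexpected_agents`, and the fractional `threshold` are names invented here. Besides confirming that the expected agent was invoked, it also lists connected agents that were called but not expected, which helps spot misrouting.

```python
from dataclasses import dataclass


@dataclass
class RoutingCase:
    """Hypothetical routing test row: a prompt for the parent agent plus the
    connected/child agents (treated as tools) expected in its dynamic plan."""
    prompt: str
    expected_agents: list[str]
    threshold: float = 1.0  # assumed: fraction of expected agents that must appear


def plan_coverage(plan_steps: list[str], expected_agents: list[str]) -> float:
    """Fraction of expected agents that appear anywhere in the plan."""
    if not expected_agents:
        return 1.0
    invoked = set(plan_steps)
    hits = sum(1 for agent in expected_agents if agent in invoked)
    return hits / len(expected_agents)


def unexpected_agents(plan_steps: list[str], expected_agents: list[str],
                      known_agents: set[str]) -> list[str]:
    """Connected agents that were invoked but not expected (likely misrouting).
    Other tools in the plan are ignored by filtering on known_agents."""
    return [s for s in plan_steps if s in known_agents and s not in expected_agents]


def validate_plan(case: RoutingCase, plan_steps: list[str]) -> bool:
    return plan_coverage(plan_steps, case.expected_agents) >= case.threshold
```

Usage: `validate_plan(RoutingCase("Reset my VPN password", ["ITSupportAgent"]), ["ITSupportAgent", "SendMessage"])` passes, while a plan containing only `"HRAgent"` fails and is reported by `unexpected_agents`. The agent names are placeholders.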

3. Test end‑to‑end (multi‑turn) flows. Use Multi‑turn tests (Power CAT Kit) to validate:

- Parent → correct child agent
- Follow‑up clarifications
- Final combined response quality

Multi‑turn tests keep one conversation context across steps.
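The key property of a multi-turn test, one shared conversation across all steps, can be sketched like this. It is a hypothetical runner, not the Kit's: `Conversation` stands for whatever client you use to talk to the parent agent (an assumption), each `Step` carries its own check (a routing check on step 1, response checks afterwards), and stopping at the first failed step is a design choice made here because later turns depend on earlier context. `FakeConversation` is a toy that only exists to exercise the runner.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class Conversation(Protocol):
    """Hypothetical handle to ONE conversation with the parent agent.
    Reusing the same object across steps is what keeps the context shared."""
    def send(self, message: str) -> str: ...


@dataclass
class Step:
    user_message: str
    check: Callable[[str], bool]  # routing check on step 1, response checks later
    label: str = ""


def run_multi_turn(conversation: Conversation, steps: list[Step]) -> list[tuple[str, bool]]:
    results = []
    for i, step in enumerate(steps, start=1):
        reply = conversation.send(step.user_message)
        ok = step.check(reply)
        results.append((step.label or f"step {i}", ok))
        if not ok:
            break  # later turns depend on earlier context, so stop at the first failure
    return results


class FakeConversation:
    """Toy stand-in that remembers earlier turns; used only to exercise the runner."""
    def __init__(self) -> None:
        self.history: list[str] = []

    def send(self, message: str) -> str:
        self.history.append(message)
        if len(self.history) == 1:
            return "Routing to ITSupportAgent. Which device?"
        return f"Reset steps for {message} (turn {len(self.history)})"
```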

Test set structure (simple):

A. Routing tests (parent):
   prompt | expected_agent | PlanValidation
B. Child agent tests:
   prompt | expected_response | CompareMeaning / GeneralQuality
C. End‑to‑end tests:
   Multi‑turn → Step 1: PlanValidation; Step 2+: response checks.
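One way to keep the A/B/C test sets in version control is to generate them as CSV files, as in the sketch below. The column names simply mirror the layout above and are assumptions; the actual import templates of Copilot Studio Evaluation or the Power CAT Kit may use different columns, so map them before importing. `follows_pattern_c` is a hypothetical lint that checks an end-to-end case starts with PlanValidation and uses response checks afterwards. All agent names and prompts are placeholders.

```python
import csv
import io

# Hypothetical column layouts mirroring the A/B/C structure; real import
# templates may name columns differently.
ROUTING_COLUMNS = ["prompt", "expected_agent", "method"]
CHILD_COLUMNS = ["prompt", "expected_response", "method"]


def to_csv(columns: list[str], rows: list[dict]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# A. Routing tests (parent)
routing_rows = [
    {"prompt": "Reset my VPN password", "expected_agent": "ITSupportAgent", "method": "PlanValidation"},
    {"prompt": "How many vacation days do I have left?", "expected_agent": "HRAgent", "method": "PlanValidation"},
]
# B. Child agent tests
child_rows = [
    {"prompt": "How many vacation days do I have left?",
     "expected_response": "You have 12 vacation days remaining.", "method": "CompareMeaning"},
]
# C. End-to-end case: step 1 checks routing, later steps check the response.
e2e_case = [
    {"step": 1, "prompt": "I need help with my laptop", "method": "PlanValidation", "expected": "ITSupportAgent"},
    {"step": 2, "prompt": "It won't connect to Wi-Fi", "method": "GeneralQuality", "expected": ""},
]


def follows_pattern_c(steps: list[dict]) -> bool:
    # Step 1 must be PlanValidation; every later step must be a response check.
    return (bool(steps) and steps[0]["method"] == "PlanValidation"
            and all(s["method"] != "PlanValidation" for s in steps[1:]))


if __name__ == "__main__":
    print(to_csv(ROUTING_COLUMNS, routing_rows))
```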

References:

https://learn.microsoft.com/en-us/microsoft-copilot-studio/analytics-agent-evaluation-create

https://dev.to/ippu_ito_870812/agent-evaluation-in-copilot-studio-test-methods-thresholds-and-regression-checks-10do

https://learn.microsoft.com/en-us/microsoft-copilot-studio/guidance/kit-configure-tests

https://learn.microsoft.com/en-us/microsoft-copilot-studio/guidance/multi-agent-patterns

https://learn.microsoft.com/en-us/microsoft-copilot-studio/analytics-transcripts-studio


akash.talole


Thank you @Sunil Kumar Pashikanti for sharing such a detailed answer. It's really helpful; I'll try these solutions.
