Copilot Studio
Answered

How do I auto‑evaluate connected agent routing in Copilot Studio using the Evaluation feature?


I'm building a multi‑agent setup in Copilot Studio, where one agent routes or delegates tasks to other connected agents. I want to use Copilot Studio’s Evaluation feature to automatically verify whether:

  • the routing logic is triggered correctly,

  • the right connected agent is selected,

  • and the final combined response of the agent chain meets expectations.

However, the documentation mostly covers single‑agent evaluation. I can’t find a clear example of how to structure evaluation datasets or test cases when multiple connected agents are involved.

For those who've done this:

  • Do you evaluate each agent independently, or evaluate the entire routing chain end‑to‑end?

  • How do you assert that routing was correct (e.g., verifying that the connected/child agent was actually invoked)?

  • Is there a recommended pattern for evaluation datasets for multi‑agent flows?

  • Are there any limitations or gotchas in the Evaluation tool when using connected agents?

I'd appreciate any examples, tips, or best practices you’ve discovered!
  • Verified answer
    Sunil Kumar Pashikanti (Moderator)
    You can evaluate multi‑agent setups in Copilot Studio, but the trick is to test three layers separately:

    1. Test each agent independently
         Use the built‑in Evaluation methods (General quality, Compare meaning, Similarity, Exact match) to verify that each connected/child agent returns the correct content.
    2. Test routing at the parent agent. To confirm the right connected agent was invoked, use:
         Plan Validation (from the Power CAT Copilot Studio Kit) → checks whether the parent’s dynamic plan includes the expected child/connected agent (treated as a “tool”).
         This is Microsoft’s recommended way to detect handoff.
         (Optional) Download analytics transcripts to see separate parent/child sessions as proof of routing.
    3. Test end‑to‑end (multi‑turn) flows. Use Multi‑turn tests (Power CAT Kit) to validate:
         Parent → correct child agent
         Follow‑up clarifications
         Final combined response quality
    Multi‑turn tests keep one conversation context across steps.
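
    To make the routing check in step 2 concrete, here is a minimal Python sketch of what a plan-validation style assertion does: given the agents the parent actually invoked (for example, read by hand from a downloaded transcript) and the agents you expected, it passes only if every expected agent appears. The event shape {"type": "AgentCall", "name": ...} and the function names invoked_agents / validate_plan are hypothetical illustrations, not the Power CAT Kit's implementation and not a documented Copilot Studio transcript schema.

```python
from typing import Iterable

# Hypothetical event shape, e.g. hand-extracted from an analytics transcript:
#   {"type": "AgentCall", "name": "HR Agent"}
# This is NOT a documented Copilot Studio schema; adapt it to your data.
Event = dict


def invoked_agents(events: Iterable[Event]) -> list[str]:
    """Collect the names of connected/child agents the parent invoked."""
    return [e["name"] for e in events if e.get("type") == "AgentCall"]


def validate_plan(invoked: Iterable[str], expected: Iterable[str]) -> tuple[bool, list[str]]:
    """Pass only if every expected agent was invoked.

    Order and extra invocations are ignored; names match case-insensitively.
    Returns (passed, missing_agents).
    """
    seen = {name.casefold() for name in invoked}
    missing = [name for name in expected if name.casefold() not in seen]
    return (not missing, missing)


if __name__ == "__main__":
    events = [
        {"type": "Topic", "name": "Greeting"},
        {"type": "AgentCall", "name": "HR Agent"},
    ]
    agents = invoked_agents(events)
    print(validate_plan(agents, ["HR Agent"]))  # (True, [])
    print(validate_plan(agents, ["IT Agent"]))  # (False, ['IT Agent'])
```

    Ignoring order mirrors the idea of checking that the plan includes the expected agent; if a scenario requires a specific call sequence, comparing ordered lists would be the stricter alternative.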

    Test Set Structure (simple)
    A. Routing tests (parent):
         prompt | expected_agent | PlanValidation
    B. Child agent tests:
         prompt | expected_response | CompareMeaning / GeneralQuality
    C. End‑to‑end tests:
         Multi‑turn → Step 1: PlanValidation, Step 2+: Response checks.
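
    The three test sets above can be kept as plain tables. This standard-library Python sketch models them as dataclasses and writes them to CSV, flattening each multi-turn test to one row per step (step 1 checks routing, later steps check the response). The column names follow the shorthand above but are assumptions, not the Power CAT Copilot Studio Kit's import template; the helpers multi_turn / to_csv and the agent name "HR Agent" are hypothetical.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields


@dataclass
class RoutingTest:          # A. routing tests at the parent agent
    prompt: str
    expected_agent: str
    method: str = "PlanValidation"


@dataclass
class ChildAgentTest:       # B. tests run directly against one child agent
    prompt: str
    expected_response: str
    method: str = "CompareMeaning"  # or "GeneralQuality"


@dataclass
class MultiTurnStep:        # C. one step of an end-to-end conversation
    test_name: str
    step: int
    prompt: str
    expected: str           # expected agent (step 1) or expected response (2+)
    method: str


def multi_turn(test_name, first_prompt, expected_agent, follow_ups):
    """Step 1 validates routing; each (prompt, expected_response) follow-up
    becomes a response check in the same conversation."""
    steps = [MultiTurnStep(test_name, 1, first_prompt, expected_agent, "PlanValidation")]
    for number, (prompt, expected) in enumerate(follow_ups, start=2):
        steps.append(MultiTurnStep(test_name, number, prompt, expected, "CompareMeaning"))
    return steps


def to_csv(rows):
    """Serialize a non-empty list of same-type dataclass rows, header included."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([f.name for f in fields(rows[0])])
    writer.writerows(astuple(row) for row in rows)
    return buf.getvalue()


if __name__ == "__main__":
    e2e = multi_turn(
        "leave-request",
        "I want to book leave next week",
        "HR Agent",
        [("Make it Monday to Wednesday", "Confirms leave for Monday to Wednesday")],
    )
    print(to_csv(e2e), end="")
```

    Keeping routing rows (A) and content rows (B) in separate sets means a wrong handoff and a weak answer surface as different failures, which is the point of testing the layers separately.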
  • akash.talole
    Thank you @Sunil Kumar Pashikanti for sharing the detailed answer. It is really helpful. I will try these solutions.
