Power Platform Community · Copilot Studio forum · Answered

How do I auto‑evaluate connected agent routing in Copilot Studio using the Evaluation feature?


I'm building a multi‑agent setup in Copilot Studio, where one agent routes or delegates tasks to other connected agents. I want to use Copilot Studio’s Evaluation feature to automatically verify whether:

  • the routing logic is triggered correctly,

  • the right connected agent is selected,

  • and the final combined response of the agent chain meets expectations.

However, the documentation mostly covers single‑agent evaluation. I can’t find a clear example of how to structure evaluation datasets or test cases when multiple connected agents are involved.

For those who've done this:

  • Do you evaluate each agent independently, or evaluate the entire routing chain end‑to‑end?

  • How do you verify that routing was correct (e.g., that the connected/child agent was actually invoked)?

  • Is there a recommended pattern for evaluation datasets for multi‑agent flows?

  • Are there any limitations or gotchas in the Evaluation tool when using connected agents?

I'd appreciate any examples, tips, or best practices you’ve discovered!
  • Verified answer
    Sunil Kumar Pashikanti (Moderator)
    You can evaluate multi‑agent setups in Copilot Studio, but the trick is to test three layers separately:

    1. Test each agent independently.
         • Use the built‑in Evaluation methods (General quality, Compare meaning, Similarity, Exact match) to verify that each connected/child agent returns correct content.
    2. Test routing at the parent agent. To confirm the right connected agent was invoked, use:
         • Plan Validation (from the Power CAT Copilot Studio Kit) → checks whether the parent’s dynamic plan includes the expected child/connected agent (treated as a “tool”). This is Microsoft’s recommended way to detect handoff.
         • (Optional) Download analytics transcripts to see separate parent/child sessions as proof of routing.
    3. Test end‑to‑end (multi‑turn) flows. Use Multi‑turn tests (Power CAT Kit), which keep one conversation context across steps, to validate:
         • Parent → correct child agent
         • Follow‑up clarifications
         • Final combined response quality
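    To make layers 2 and 3 concrete, here is a minimal Python sketch that models the two checks offline. It is not the Power CAT Kit's implementation or API: `Turn`, `plan_validation`, `run_multi_turn`, `fake_parent_agent` and the child agent name `BillingAgent` are all hypothetical, and it assumes you can obtain, for each turn, the list of tools/connected agents the parent planned (for example from a test run or a downloaded transcript).

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical record of one parent-agent turn. The field names are
# illustrative, not a Copilot Studio or Power CAT Kit schema.
@dataclass
class Turn:
    user_message: str
    planned_tools: list[str]  # tools / connected agents in the parent's plan
    response: str


def plan_validation(turn: Turn, expected_agents: list[str]) -> bool:
    """Layer 2: pass if every expected connected agent appears in the plan."""
    return set(expected_agents).issubset(turn.planned_tools)


@dataclass
class Step:
    user_message: str
    check: Callable[[Turn], bool]


def run_multi_turn(steps: list[Step],
                   agent: Callable[[list[Turn], str], Turn]) -> list[bool]:
    """Layer 3: replay all steps in ONE conversation; each call sees prior turns."""
    history: list[Turn] = []
    results: list[bool] = []
    for step in steps:
        turn = agent(history, step.user_message)
        history.append(turn)
        results.append(step.check(turn))
    return results


def fake_parent_agent(history: list[Turn], message: str) -> Turn:
    """Stand-in for a real parent agent; 'BillingAgent' is a made-up child agent."""
    if "invoice" in message.lower():
        return Turn(message, ["BillingAgent"], "Which invoice number?")
    if history and "BillingAgent" in history[-1].planned_tools:
        # Follow-up clarification answered by the same child agent
        return Turn(message, ["BillingAgent"], f"Invoice {message} is paid.")
    return Turn(message, [], "Sorry, I can't help with that.")


steps = [
    Step("Where is my invoice?", lambda t: plan_validation(t, ["BillingAgent"])),
    Step("INV-42", lambda t: "paid" in t.response),
]
print(run_multi_turn(steps, fake_parent_agent))  # [True, True]
```

    Step 2 only passes because the history from step 1 is carried into the same conversation; run on its own, "INV-42" is not routed and its check fails. That is why routing (step 1) and follow-up quality (step 2+) belong in one multi-turn test rather than two independent ones.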

    Test Set Structure (simple)
    A. Routing tests (parent):
         prompt | expected_agent | PlanValidation
    B. Child agent tests:
         prompt | expected_response | CompareMeaning / GeneralQuality
    C. End‑to‑end tests:
         Multi‑turn → Step 1: PlanValidation, Step 2+: Response checks.
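    The three test tables above can be kept as plain pipe‑delimited text and sanity‑checked before you set them up in the Kit or in Copilot Studio's Evaluation feature. The sketch below is a hypothetical helper, not part of Copilot Studio or the Power CAT Kit: the `load_*` functions, the extra `conversation | step` columns for end‑to‑end tests, and the agent names are assumptions. It only validates structure (routing rows use PlanValidation; in each conversation step 1 uses PlanValidation and later steps use a response method); scoring the responses is still done by the evaluation methods themselves.

```python
import csv
import io
from collections import defaultdict

# Method names as listed in the answer (PlanValidation comes from the Power CAT Kit).
# The pipe-delimited layouts are a hypothetical planning format, not an import schema.
RESPONSE_METHODS = {"CompareMeaning", "GeneralQuality", "Similarity", "ExactMatch"}


def parse_rows(text: str) -> list[list[str]]:
    """Split pipe-delimited lines into stripped cells, skipping blank lines."""
    reader = csv.reader(io.StringIO(text.strip()), delimiter="|")
    return [[cell.strip() for cell in row] for row in reader if row]


def load_routing(text: str) -> list[tuple[str, str]]:
    """A. prompt | expected_agent | PlanValidation"""
    cases = []
    for prompt, expected_agent, method in parse_rows(text):
        if method != "PlanValidation":
            raise ValueError(f"routing test must use PlanValidation: {prompt!r}")
        cases.append((prompt, expected_agent))
    return cases


def load_child(text: str) -> list[tuple[str, str, str]]:
    """B. prompt | expected_response | CompareMeaning / GeneralQuality / ..."""
    cases = []
    for prompt, expected_response, method in parse_rows(text):
        if method not in RESPONSE_METHODS:
            raise ValueError(f"unknown response method: {method!r}")
        cases.append((prompt, expected_response, method))
    return cases


def load_e2e(text: str) -> dict[str, list[tuple[str, str, str]]]:
    """C. conversation | step | prompt | expected | method, grouped and ordered."""
    convs: dict[str, list[tuple[int, str, str, str]]] = defaultdict(list)
    for conv, step, prompt, expected, method in parse_rows(text):
        convs[conv].append((int(step), prompt, expected, method))
    result = {}
    for conv, steps in convs.items():
        steps.sort()  # order by step number
        for position, (_, _, _, method) in enumerate(steps, start=1):
            # Step 1 checks routing; later steps check the response.
            ok = method == "PlanValidation" if position == 1 else method in RESPONSE_METHODS
            if not ok:
                raise ValueError(f"{conv} step {position}: unexpected method {method!r}")
        result[conv] = [(p, e, m) for _, p, e, m in steps]
    return result


routing = load_routing("""
Where is my invoice? | BillingAgent | PlanValidation
Reset my password | ITAgent | PlanValidation
""")
e2e = load_e2e("""
c1 | 2 | INV-42 | Invoice INV-42 is paid. | CompareMeaning
c1 | 1 | Where is my invoice? | BillingAgent | PlanValidation
""")
print(routing[0])    # ('Where is my invoice?', 'BillingAgent')
print(e2e["c1"][0])  # ('Where is my invoice?', 'BillingAgent', 'PlanValidation')
```

    Keeping the step number as an explicit column means rows can be listed in any order; the loader sorts them so the routing check (step 1) always comes first in each conversation.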
  • akash.talole
    Thank you @Sunil Kumar Pashikanti for sharing a detailed answer. It is really helpful. I will try these solutions.