Power Platform Community / Forums / Copilot Studio / Error in Evaluations w...
Copilot Studio
Answered

Error in Evaluations with "Compare Meaning" Test Type.

Posted by 24

Hi everyone,

I’m encountering an issue while running Evaluations in Microsoft Copilot Studio.

When I run an evaluation using the “Compare meaning” test type, about half of the test cases throw an error with the following message in the Evaluation pane:

“Something went wrong while evaluating this test case.”

For this particular question, the evaluation details panel does not display the usual fields such as:

  • Agent Response
  • Knowledge sources cited
  • Topics

These sections are completely missing for the errored test case, while other test cases in the same evaluation run normally (they either fail due to low semantic similarity or pass based on the threshold).

What I’ve checked so far:

  • The evaluation is configured with the Compare meaning test type.
  • Other test cases in the same evaluation run successfully.
  • The issue seems isolated to specific questions.

I’ve attached two screenshots:

  • The evaluation pane showing the missing Agent Response, Knowledge sources cited, and Topics sections.

Has anyone encountered this issue before, or know what might cause a test case to fail evaluation in this way? Any suggestions on what to check or troubleshoot would be greatly appreciated.

Thanks!

 
Screenshot 2026-03-10 094219.png
  • Verified answer
    Sayali (Microsoft Employee)
    Hello,
    The “Something went wrong while evaluating this test case” error in Copilot Studio occurs when the evaluation engine cannot generate an agent response before running the “Compare meaning” evaluation. The evaluation process first runs the agent with the test question, captures the response, and then performs semantic similarity scoring. If the agent fails to produce a response, the system cannot generate agent response details, knowledge citations, or topic data, so the test case enters an error state instead of pass/fail.

    This typically happens due to content safety filtering, runtime limits during evaluation (lower tokens, shorter timeouts), tool or connector failures, or complex orchestration paths involving multiple topics or flows. These issues may only appear during evaluation because it runs under stricter constraints than normal chat. As a result, the UI hides response-related sections since no response object exists.

    In practice, the issue can often be resolved by testing the question manually in Test chat, simplifying the expected response, temporarily disabling tools or flows, or narrowing broad knowledge queries. Microsoft has acknowledged that evaluation currently does not expose detailed runtime errors, so all such failures appear as the same generic error message.

     
     
     

     
  • SR-24020711-0
    @Sayali Thank you for providing the facts and potential resolutions so promptly. Appreciate your help!
  • Suggested answer
    Sayali (Microsoft Employee)
    Hello,
    If the response was helpful, could you please share your valuable feedback?
    Your feedback is important to us. Please rate us:

    🤩 Excellent 🙂 Good 😐 Average 🙁 Needs Improvement 😠 Poor
  • AS-02041910-0
    I am currently facing the same issue, but I can see the agent response being generated. As you can see, I have multiple scores to test: for a particular question, General Quality shows a Pass/Fail and Text Similarity shows a score value, but the Compare meaning metric enters an error state. I could not find any resolution or explanation for this behaviour.

    One thing to point out: I see this error state for random questions and random metrics on every run.

    What might cause a test case to error during evaluation in this way? Any suggestions on what to check or troubleshoot would be greatly appreciated. Thanks!
    download - 1.png
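The run-then-score flow described in the verified answer (the agent is run first, and "Compare meaning" scoring only happens if a response object exists) can be sketched roughly as below. This is a hypothetical illustration, not the actual Copilot Studio implementation: the function names, the word-overlap similarity stand-in, and the 0.8 threshold are all assumptions made for the example.

```python
def semantic_similarity(a: str, b: str) -> float:
    # Stand-in for a real semantic scorer: simple word-overlap (Jaccard).
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def evaluate_test_case(run_agent, question: str, expected: str,
                       threshold: float = 0.8) -> dict:
    """Run the agent first; 'Compare meaning' scoring happens only if a response exists."""
    try:
        # The agent call may fail for the reasons listed above: content safety
        # filtering, tighter runtime limits, or tool/connector errors.
        response = run_agent(question)
    except Exception:
        response = None
    if not response:
        # With no response object there is nothing to score and nothing to show
        # (no Agent Response, citations, or Topics), so the case errors out.
        return {"status": "error",
                "message": "Something went wrong while evaluating this test case."}
    score = semantic_similarity(response, expected)
    return {"status": "pass" if score >= threshold else "fail", "score": score}

# A healthy agent passes; an agent that times out reproduces the error state.
ok = evaluate_test_case(lambda q: "the capital of France is Paris",
                        "What is the capital of France?",
                        "the capital of France is Paris")
bad = evaluate_test_case(lambda q: (_ for _ in ()).throw(TimeoutError()),
                         "broad knowledge query", "anything")
```

This also matches the observed UI behaviour: pass/fail cases carry a score, while the error case carries only the generic message, since no response was ever captured.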

