How Evaluation Helped Me Improve My Catering Service Assistant


I recently used the new Evaluation feature in Copilot Studio for my Catering Service Assistant, and it was a real eye-opener.

My initial test run scored 60%, which seemed acceptable at first, but the detailed results told a different story.


  • The bot performed well on basic support questions (order tracking, cancellations).

  • It failed on important business scenarios, such as:

    • Allergen information

    • Vegan/vegetarian options

    • Same-day catering

    • Corporate packages

This showed me that my bot handled general queries but lacked depth in business-specific knowledge.

What I Did Next


  • Added missing knowledge (menus, dietary options, packages)

  • Improved vague responses to be more specific

  • Covered edge cases with clearer answers

  • Re-ran evaluations to validate the improvements

Key Takeaway

The Evaluation feature helped me move from manual, biased testing to a more structured and data-driven approach.

Instead of guessing what works, I now have clear insights into:


  • What’s working

  • What’s missing

  • What to improve next

I highly recommend using Evaluation early; it highlights gaps you might otherwise miss.

Would love to hear how others are using it!

Attachments: Evaluation Setup.png, Evaluation Result.png
Reply from Vahid Ghafarpour:

One additional step you might find useful is to turn your discovered gaps (allergens, dietary options, corporate packages, etc.) into a reusable test set that you run after every change. That way you can track score trends over time and quickly see whether new updates accidentally break existing scenarios.

You could also tag scenarios by business impact (for example, “high-risk” for allergen questions) and prioritize improving those first. This makes your evaluation results even more actionable for stakeholders who care about customer safety and revenue-critical journeys.
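The reusable, risk-tagged test set suggested in the reply can be sketched in a few lines of Python. This is a hypothetical sketch, not a Copilot Studio API: it assumes you record each evaluation run yourself as a mapping from a test-case ID to pass/fail, however you obtain those results. The `TestCase` type, the `"high"`/`"normal"` risk tags, and the helper names are all illustrative. The sketch computes a run's pass rate, lists cases that passed previously but fail now (regressions), and orders current failures so high-risk scenarios such as allergen questions come first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestCase:
    """One reusable evaluation scenario (hypothetical structure)."""
    case_id: str
    question: str
    category: str  # e.g. "Allergen information"
    risk: str      # hypothetical tag: "high" or "normal"


def pass_rate(results: dict[str, bool]) -> float:
    """Percentage of cases that passed in one run; 0.0 for an empty run."""
    if not results:
        return 0.0
    return 100.0 * sum(results.values()) / len(results)


def find_regressions(previous: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """IDs that passed in the previous run but fail (or are missing) in the current one."""
    return sorted(
        case_id
        for case_id, passed in previous.items()
        if passed and not current.get(case_id, False)
    )


def prioritized_failures(cases: list[TestCase], current: dict[str, bool]) -> list[TestCase]:
    """Currently failing cases: high-risk first, then grouped by category."""
    failing = [c for c in cases if not current.get(c.case_id, False)]
    return sorted(failing, key=lambda c: (c.risk != "high", c.category, c.case_id))


if __name__ == "__main__":
    # Illustrative placeholder data, not real evaluation results.
    cases = [
        TestCase("allergen-01", "Does the lasagna tray contain nuts?", "Allergen information", "high"),
        TestCase("vegan-01", "Do you have vegan canape options?", "Vegan/vegetarian options", "normal"),
        TestCase("track-01", "Where is my order?", "Order tracking", "normal"),
    ]
    previous_run = {"allergen-01": False, "vegan-01": False, "track-01": True}
    current_run = {"allergen-01": True, "vegan-01": False, "track-01": False}

    print(f"Pass rate: {pass_rate(current_run):.0f}%")
    print("Regressions:", find_regressions(previous_run, current_run))
    for case in prioritized_failures(cases, current_run):
        print(f"[{case.risk}] {case.category}: {case.question}")
```

Putting the risk tag into the sort key, rather than filtering on it, keeps every failure visible while still surfacing customer-safety scenarios at the top of the list.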