How Evaluation Helped Me Improve My Catering Service Assistant


Posted by Sajeda_Sultana

I recently used the new Evaluation feature in Copilot Studio for my Catering Service Assistant, and it was a real eye-opener.

My initial test run scored 60%, which seemed okay at first, but the detailed results told a different story.

The bot performed well on basic support questions (order tracking, cancellations) but failed on important scenarios like:
- Allergen information
- Vegan/vegetarian options
- Same-day catering
- Corporate packages

This helped me realize that my bot was handling general queries but lacked depth in business-specific knowledge.
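
To illustrate why an aggregate score can hide this kind of gap, here is a minimal Python sketch that breaks pass/fail results down by category. The rows below are made up for illustration (they are not the actual evaluation output), and the `(category, passed)` record format is an assumption, not the export format of Copilot Studio.

```python
from collections import defaultdict

# Hypothetical pass/fail results as (category, passed); not real evaluation output.
results = [
    ("order tracking", True), ("order tracking", True), ("order tracking", True),
    ("cancellations", True), ("cancellations", True), ("cancellations", True),
    ("allergen information", False),
    ("vegan/vegetarian options", False),
    ("same-day catering", False),
    ("corporate packages", False),
]


def pass_rate_by_category(rows):
    """Return {category: fraction of test cases that passed}."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for category, ok in rows:
        totals[category] += 1
        passed[category] += int(ok)
    return {category: passed[category] / totals[category] for category in totals}


overall = sum(ok for _, ok in results) / len(results)
by_category = pass_rate_by_category(results)

print(f"overall: {overall:.0%}")
# List the weakest categories first so the gaps stand out.
for category, rate in sorted(by_category.items(), key=lambda item: item[1]):
    print(f"{category}: {rate:.0%}")
```

In this invented data set the overall score looks acceptable, while the per-category view shows that every business-specific category fails outright.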

What I Did Next

- Added missing knowledge (menus, dietary options, packages)
- Improved vague responses to be more specific
- Covered edge cases with clearer answers
- Re-ran evaluations to validate improvements

Key Takeaway

The Evaluation feature helped me move from manual, biased testing to a more structured and data-driven approach.

Instead of guessing what works, I now have clear insights into:

- What’s working
- What’s missing
- What to improve next

I highly recommend using Evaluation early. It really highlights gaps you might otherwise miss.

Would love to hear how others are using it!

Attachments: Evaluation Setup.png, Evaluation Result.png


Reply by Vahid Ghafarpour

One additional step you might find useful is to turn your discovered gaps (allergens, dietary options, corporate packages, etc.) into a reusable test set that you run after every change, so you can track score trends over time and quickly see if new updates accidentally break existing scenarios.
You could also tag scenarios by business impact (for example, “high-risk” for allergen questions) and prioritize improving those first, which makes your evaluation results even more actionable for stakeholders who care about customer safety and revenue-critical journeys.
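
As a rough illustration of both suggestions, the Python sketch below keeps a reusable test set with impact tags, compares two runs to spot regressions, and orders the remaining failures by business impact. Everything here is hypothetical: the `Scenario` class, the tag names other than "high-risk", the tag assignments, and the pass/fail outcomes are invented for the example, and it does not call any Copilot Studio API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    impact: str  # hypothetical tag: "high-risk", "revenue-critical", or "routine"


# Hypothetical reusable test set built from the gaps found in the first run.
TEST_SET = [
    Scenario("allergen information", "high-risk"),
    Scenario("vegan/vegetarian options", "routine"),
    Scenario("same-day catering", "revenue-critical"),
    Scenario("corporate packages", "revenue-critical"),
    Scenario("order tracking", "routine"),
]

# Lower number means "fix first" (an assumed ordering).
IMPACT_ORDER = {"high-risk": 0, "revenue-critical": 1, "routine": 2}


def score(run):
    """Fraction of scenarios in the test set that passed in this run."""
    return sum(run[s.name] for s in TEST_SET) / len(TEST_SET)


def regressions(previous, current):
    """Scenarios that passed in the previous run but fail in the current one."""
    return [s.name for s in TEST_SET if previous[s.name] and not current[s.name]]


def prioritized_failures(run):
    """Failing scenarios, most business-critical first."""
    failing = [s for s in TEST_SET if not run[s.name]]
    return [s.name for s in sorted(failing, key=lambda s: IMPACT_ORDER[s.impact])]


# Made-up outcomes for two runs of the same test set (scenario name -> passed).
run_1 = {
    "allergen information": False,
    "vegan/vegetarian options": False,
    "same-day catering": False,
    "corporate packages": False,
    "order tracking": True,
}
run_2 = {
    "allergen information": True,
    "vegan/vegetarian options": True,
    "same-day catering": False,
    "corporate packages": True,
    "order tracking": False,
}

print("score trend:", [score(run_1), score(run_2)])
print("regressions:", regressions(run_1, run_2))
print("fix next:", prioritized_failures(run_2))
```

Keeping the scenario list fixed between runs is what makes the scores comparable over time; adding new scenarios is fine, but the trend then needs to be read per scenario rather than from the aggregate alone.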
