Answered

Accuracy of Responses from Agents

Posted by David_MA

14,614 Super User 2026 Season 1

While this is not exactly a Copilot Studio question, it is related. I have been experimenting with Google's Notebook LM, which in some ways resembles what Copilot Studio is used for. You give Google Notebook LM data sources, and you can then use it like a chatbot to ask questions about the content. I recently found out that OneNote has a feature called Copilot Notebooks. It works inside OneNote: you create a new notebook, give it data sources, and then ask Copilot questions against the data.

I am currently writing a book that has 37 chapters and over 90,000 words. I created a Google Notebook LM notebook and a OneNote Copilot notebook and gave each the same 37 files of the book's content (one chapter per file). I then asked both tools the exact same questions. Copilot got the first three questions wrong, while Google Notebook LM answered every question correctly and in detail.

I then asked Copilot why Notebook LM is so much better at providing correct answers than Copilot. So you do not think I am making up the response, here is a screenshot of what Copilot said about Copilot and Notebook LM:

What has your experience been with getting accurate responses from Copilot Studio against the data sources you provide? The Copilot in Copilot Studio may differ from the Copilot in OneNote, but if it is as bad at answering questions as what I have seen so far, I don't see how I could ever use Copilot Studio. Because I am A/B testing OneNote Copilot Notebooks and Google Notebook LM with a book I wrote myself, the errors are easy to spot. I will not always be that familiar with the content when I develop solutions with Copilot Studio, so wrong answers could slip through unnoticed.

Categories:

Autonomous agents

Building Copilot Studio chatbots in Microsoft Teams

All responses (2)

Answers (1)

David_MA 14,614 Super User 2026 Season 1

I should have included an example of the difference in accuracy between the two tools. So, here is one. I asked both tools this question:

What is the name of the campsite assigned to Troop 22 when they arrive at Camp Abenaki?

Copilot's response, which is wrong: The campsite assigned to Troop 22 when they arrive at Camp Abenaki is Pine Island. This location becomes their home base for the week, nestled along the shores of Lake Winnipesaukee, where much of their camp experience unfolds.

Google Notebook LM's response, which is correct and detailed: When Troop 22 arrives at Camp Abenaki, they are assigned to the campsite named Takwobok. This site is located on the northern ridge of the camp, which Ranger Bill Knowles describes as a quiet spot with excellent tree cover. In the spirit of our book club's appreciation for historical and linguistic detail, it is fascinating to note that Takwobok is an Abenaki word meaning "night". Mr. Telford eventually uses this meaning to teach the scouts that their honor is found in the work they do between the "dawn and the dusk" while they are staying in the "night" land. After a ten-minute hike from the camp gateway, the boys settle into this "temporary community" to begin the task of unloading their gear and transforming the site into their home for the week. This setting becomes the heart of their camp experience, where they deal with everything from the "silent hum of dread" regarding the war to the simple joy of sharing peanuts with a local red squirrel.

The actual source material used by both:

You can see why when testing a new solution built with Copilot Studio, you'll need to be very familiar with the source materials you provide and do extensive testing to determine the level of inaccuracy you can tolerate.

Verified answer

Valantis 5,267

Hi @David_MA,

So based on what Microsoft recommends in their docs, here are some things you can try in Copilot Studio to cut down on those wrong answers:

Turn off "Use general knowledge" – This makes sure your copilot only pulls answers from the sources you've configured, instead of making stuff up from its general training.

Use a Generative answers node with specific sources – Microsoft actually says this approach gives the best results because you're being explicit about where answers should come from.

If your content is in SharePoint, enable Tenant graph grounding with semantic search – Microsoft claims this works significantly better, but heads up: you'll need the right tenant setup and licensing for it to work.

Create an Evaluation test set – Build a set of your A/B test questions and use it to measure accuracy, relevancy, and overall quality before you push anything live. It's a bit of upfront work, but it helps catch issues early.
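To make the test-set idea concrete, here is a rough sketch of what a small accuracy check could look like outside of Copilot Studio. It is not the Copilot Studio evaluation feature and calls no Microsoft or Google API. `EvalCase`, `score_agent`, and the two stub functions are hypothetical names made up for illustration; the stubs simply replay the two answers quoted earlier in this thread. In practice you would swap a stub for whatever function sends the question to your agent and returns its reply.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One question plus the phrases a correct answer must contain."""
    question: str
    expected_phrases: list[str]


def score_agent(ask: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases whose answer contains every expected phrase."""
    if not cases:
        return 0.0
    passed = 0
    for case in cases:
        answer = ask(case.question).lower()
        # Case-insensitive substring match: crude, but enough to catch a wrong name.
        if all(phrase.lower() in answer for phrase in case.expected_phrases):
            passed += 1
    return passed / len(cases)


# Test set built from the example question in this thread.
CASES = [
    EvalCase(
        question=(
            "What is the name of the campsite assigned to Troop 22 "
            "when they arrive at Camp Abenaki?"
        ),
        expected_phrases=["Takwobok"],
    ),
]


# Hypothetical stand-ins: each replays the opening of an answer quoted above.
def copilot_stub(question: str) -> str:
    return (
        "The campsite assigned to Troop 22 when they arrive at "
        "Camp Abenaki is Pine Island."
    )


def notebook_lm_stub(question: str) -> str:
    return (
        "When Troop 22 arrives at Camp Abenaki, they are assigned to "
        "the campsite named Takwobok."
    )


if __name__ == "__main__":
    print("Copilot stub score:", score_agent(copilot_stub, CASES))
    print("Notebook LM stub score:", score_agent(notebook_lm_stub, CASES))
```

Phrase matching like this only catches factual misses such as a wrong name. It says nothing about relevancy or overall quality, so treat it as a quick first pass alongside the built-in evaluations, not a replacement for them.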

Helpful resources:

Generative answers FAQ

Boosting conversational responses

Knowledge sources in Copilot Studio

Creating agent evaluations

Hope this helps tighten things up!
