Copilot Studio not retrieving answers correctly from knowledge base


Posted by RS-24031318-0

Dear team,

We are building a Q&A agent for internal use on our project.

As part of it, we are creating a knowledge base from a number of files, a few of which are PDF files.

When we ask the bot a series of questions (both in Copilot Studio and in the agent deployed to Teams), the agent does not respond accurately, even though the answer is present in the middle or at the end of a file. It often gives "AI generated" responses that are too generic for a person to act on.

Please help us resolve this issue. You can contact us for further details; we may need a debugging or issue-resolution call with you.

Categories:

Building Copilot Studio chatbots in Microsoft Teams


David_MA, Super User 2026 Season 1

This is a rather complicated question to answer, since we do not know how your agent is configured or the quality of the knowledge sources you are using. Both play a very important role in getting the responses you are looking for. In my experience, the following helps greatly:

Do not use PDF files if possible. PDF files carry layout and formatting data that is not part of the visible text, and they are not structured efficiently for the agent to extract data from them. If you must use PDF files, be sure that OCR has been performed on them and that they are not image-based PDF files.

Turn off general knowledge in the settings. Otherwise, the agent can answer from the underlying model's general knowledge instead of your knowledge sources, which tends to produce generic responses.

If the knowledge is uploaded directly to the agent, use folders to group knowledge by categories and be sure to provide a good description to the agent so it knows when to use the files in the folder to answer a question.

Plain text, Markdown, and Word documents are better knowledge sources than PDF files. Add some metadata at the beginning of each knowledge source to help the agent quickly identify what the source contains. For example:

Title: Safety Training Procedures

Category: HR / Training

Purpose: Provides step-by-step instructions for completing mandatory safety training.

Updated: Feb 2026
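If you have many plain-text or Markdown files, adding this header by hand gets tedious. Below is a minimal Python sketch that builds the header and prepends it to a document's text. The function names `build_header` and `add_header` are made up for illustration, and the four field names simply mirror the example above; Copilot Studio does not require this exact layout as far as I know.

```python
def build_header(title: str, category: str, purpose: str, updated: str) -> str:
    """Return a metadata block in the Title/Category/Purpose/Updated layout."""
    fields = [
        ("Title", title),
        ("Category", category),
        ("Purpose", purpose),
        ("Updated", updated),
    ]
    # One "Name: value" line per field, then a blank line before the body.
    return "\n".join(f"{name}: {value}" for name, value in fields) + "\n\n"


def add_header(text: str, header: str) -> str:
    """Prepend the header unless the text already starts with a Title line."""
    if text.lstrip().startswith("Title:"):
        return text  # assume a header is already present; leave it alone
    return header + text
```

To apply it to a file, read the text with `Path.read_text`, pass it through `add_header`, and write the result back with `Path.write_text`.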

Note: in these forums you will not get a support call from Microsoft to help you resolve or debug your issues. For the most part, questions are answered by end users just like you, not by Microsoft. If you want a response from Microsoft, submit a ticket through your M365 admin portal.

Suggested answer

RichAI


Hi @RS-24031318-0,

You may be running into limitations with how large PDF files are processed inside a Copilot Studio knowledge base. When Copilot Studio ingests large PDFs as a single file, the retrieval quality often drops because the model cannot effectively surface answers that sit deep within long documents. This leads to generic, AI‑generated responses even when the correct answer exists in the file. To improve accuracy, you will need to chunk your PDFs into smaller, meaningful sections before adding them to the knowledge base. This gives the retrieval layer more granular units of information to match against user questions. For more advanced and reliable document retrieval, you may also consider using Azure AI Search, which supports hybrid search, semantic ranking, and vector search and integrates well with Copilot Studio for enterprise‑ready performance.

Please take a look at this detailed guide, where I talk about the chunking strategy, implementation, and a full GitHub sample which you can use:

Azure AI Search + Copilot Studio: How to Build Enterprise‑Ready Agents with Document Citations

Let me know if this approach helps or if you’d like support implementing it.

Hope this helps!
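As a rough illustration of the chunking idea, here is a minimal Python sketch that splits already-extracted text into smaller pieces at paragraph boundaries. It is not the implementation from the linked guide and not how Copilot Studio chunks files internally; the function name `chunk_paragraphs` and the `max_chars` limit of 2000 are assumptions chosen for the example.

```python
def chunk_paragraphs(text: str, max_chars: int = 2000) -> list[str]:
    """Group paragraphs into chunks of at most max_chars characters.

    Paragraphs are never split, so a single paragraph longer than
    max_chars becomes its own oversized chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate      # still fits, or chunk is empty so far
        else:
            chunks.append(current)   # close the chunk and start a new one
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be saved as its own file and added to the knowledge base. Splitting at headings or chapters, where the document has them, usually gives more meaningful sections than a plain character limit.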

David_MA, Super User 2026 Season 1

What @RichAI noted about chunking is very true. In test agents I built to learn Copilot Studio, I used the content of public domain books and a book I am writing as knowledge sources. I first tried one large PDF file of the whole book. This gave very poor results, like the ones you are describing.

Then I created one PDF per chapter. The results were better, but not what I was looking for.

After learning about some limitations of using PDF as a source, I exported the book to Word documents (one *.docx per chapter). This provided much better results, but there were still some issues.

I added metadata at the start of each chapter file. This improved the results even more, to the point where I wanted to see if one agent could answer questions about multiple books.

I then created folders, one for each book, and moved the files for each book into the relevant folder. I also included instructions to the agent to tell it what book the knowledge sources in the folder could answer questions about.

Once all this was done, the agent gave results that I felt were sufficiently accurate, and there wasn't much more I could do.
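The folder step can be prepared on disk before uploading. The following is a minimal Python sketch, assuming each file is Markdown and starts with a metadata header containing a `Category:` line (used here to stand in for the book name). The function names, the `*.md` pattern, and the `Uncategorized` fallback are assumptions for illustration; uploading the folders and writing their descriptions still happens in Copilot Studio.

```python
import shutil
from collections import defaultdict
from pathlib import Path


def read_category(text: str, default: str = "Uncategorized") -> str:
    """Return the Category value from the header block at the top of text."""
    for line in text.splitlines():
        if not line.strip():
            break  # the header ends at the first blank line
        if line.startswith("Category:"):
            return line.split(":", 1)[1].strip() or default
    return default


def group_by_category(source_dir: Path, target_dir: Path) -> dict[str, list[str]]:
    """Copy each *.md file into target_dir/<category>/ and report the layout."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for path in sorted(source_dir.glob("*.md")):
        category = read_category(path.read_text(encoding="utf-8"))
        # "/" cannot appear in a folder name, so swap it for "-".
        folder = target_dir / category.replace("/", "-")
        folder.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, folder / path.name)
        grouped[folder.name].append(path.name)
    return dict(grouped)
```

Files are copied rather than moved, so the originals stay in place if the grouping needs to be redone.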
