Copilot Studio
Suggested Answer

Copilot Studio not retrieving answers correctly from knowledge base

Dear team,
 
We are creating a Q&A agent for internal use on our project.
 
As part of this, we are creating a knowledge base consisting of a number of files, a few of which are .pdf files.
 
When we asked the bot a series of questions (both in Copilot Studio and in the deployed agent in Teams), the Copilot agent did not respond accurately, even though the answer is present somewhere in the middle or at the end of the file. It often gives "AI generated" responses that are too generic for a person to act on.
 
Please help us resolve this issue. Contact us if you need further details; we may need a debugging/issue resolution call with you.
  • David_MA
    14,624 Super User 2026 Season 1
    This is a rather complicated question to answer since we do not know the configuration of your agent or the quality of the knowledge sources you are using. However, both quality and configuration play very important roles in getting the responses you are looking for. In my experience, the following helps greatly:
    • Do not use PDF files if possible. PDF files contain hidden formatting code and are not structured in a way that lets the agent extract data from them efficiently. If you must use PDF files, be sure that OCR has been performed on them and that they are not image-based PDF files.
    • Turn off general knowledge in the settings. Otherwise, the agent will use any files in Microsoft 365 that the user has access to for answering questions. 
    • If the knowledge is uploaded directly to the agent, use folders to group knowledge by categories and be sure to provide a good description to the agent so it knows when to use the files in the folder to answer a question.
    • Plain text, Markdown and Word documents are better knowledge sources than PDF files. Add some metadata at the beginning of each knowledge source to help the agent quickly identify what the source contains. For example:
      • Title: Safety Training Procedures
      • Category: HR / Training
      • Purpose: Provides step-by-step instructions for completing mandatory safety training.
      • Updated: Feb 2026
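The metadata example above could be applied to plain-text or Markdown sources with a small script. This is only a minimal sketch: the function name `with_metadata_header` and the exact header layout are assumptions for illustration, not a format that Copilot Studio requires.

```python
def with_metadata_header(body: str, title: str, category: str,
                         purpose: str, updated: str) -> str:
    """Return the document body prefixed with a short plain-text metadata header.

    The four fields mirror the example in the post; the layout is an assumption.
    """
    header = "\n".join([
        f"Title: {title}",
        f"Category: {category}",
        f"Purpose: {purpose}",
        f"Updated: {updated}",
    ])
    # A blank line separates the header from the original content.
    return f"{header}\n\n{body.lstrip()}"


if __name__ == "__main__":
    doc = with_metadata_header(
        "Step 1: Sign in to the training portal.\n",
        title="Safety Training Procedures",
        category="HR / Training",
        purpose="Provides step-by-step instructions for completing mandatory safety training.",
        updated="Feb 2026",
    )
    print(doc)
```

Whether the header measurably improves retrieval depends on the agent and the content, so it is worth testing with a few known questions before and after.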
    Note: in these forums you will not get a support call from Microsoft to help you resolve your issues or to do debugging. For the most part, questions are answered by end users just like you and not by Microsoft. If you want a response from Microsoft, you should submit a ticket to Microsoft through your M365 admin portal.
  • Suggested answer
    RichAI
    26
    Hi @RS-24031318-0,
     
    You may be running into limitations with how large PDF files are processed inside a Copilot Studio knowledge base. When Copilot Studio ingests large PDFs as a single file, the retrieval quality often drops because the model cannot effectively surface answers that sit deep within long documents. This leads to generic, AI‑generated responses even when the correct answer exists in the file. To improve accuracy, you will need to chunk your PDFs into smaller, meaningful sections before adding them to the knowledge base. This gives the retrieval layer more granular units of information to match against user questions. For more advanced and reliable document retrieval, you may also consider using Azure AI Search, which supports hybrid search, semantic ranking, and vector search and integrates well with Copilot Studio for enterprise‑ready performance.
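As a rough illustration of the chunking idea described above, here is a minimal sketch that groups paragraphs into size-limited sections. It assumes the text has already been extracted from the PDF into a plain string; the function name `chunk_paragraphs` and the `max_chars` default are hypothetical choices, not values taken from Copilot Studio or from the guide mentioned in this reply.

```python
import re


def chunk_paragraphs(text: str, max_chars: int = 2000) -> list[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk,
    so no paragraph is ever cut in the middle.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for para in paragraphs:
        # +2 accounts for the blank line that joins paragraphs inside a chunk.
        added = len(para) + (2 if current else 0)
        if current and current_len + added > max_chars:
            # The current chunk is full: close it and start a new one.
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
            added = len(para)
        current.append(para)
        current_len += added
    if current:
        chunks.append("\n\n".join(current))
    return chunks


if __name__ == "__main__":
    sample = "Intro paragraph.\n\nDetails paragraph.\n\nClosing paragraph."
    for i, chunk in enumerate(chunk_paragraphs(sample, max_chars=40), start=1):
        print(f"--- chunk {i} ---\n{chunk}")
```

Splitting on paragraph boundaries rather than at a fixed character offset keeps each chunk readable on its own; splitting on headings would be a reasonable alternative when the document has them.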
     
    Please take a look at this detailed guide, where I talk about the chunking strategy, implementation, and a full GitHub sample which you can use:
     
     
    Let me know if this approach helps or if you’d like support implementing it.
    Hope this helps!
  • David_MA
    14,624 Super User 2026 Season 1
    What @RichAI noted about chunking is very true. In test agents I built to learn Copilot Studio, I used the content from public domain books and a book I am writing as knowledge sources. I first attempted to use one large PDF file of the book. This produced very poor results, like the ones you are describing.
    • Then I created one PDF per chapter. Better results, but not what I was looking for.
    • After learning about some limitations of using PDF as a source, I exported the book to Word documents (one *.docx per chapter). This provided much better results, but there were still some issues.
    • I added metadata at the start of each chapter file. This improved the results even more, to the point where I wanted to see if one agent could answer questions about multiple books.
    • I then created folders, one for each book, and moved the files for each book into the relevant folder. I also included instructions to the agent to tell it which book the knowledge sources in each folder could answer questions about.
    • Once all this was done, the agent was at a stage where I felt it provided sufficiently accurate results and there wasn't much more I could do.
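The per-chapter, per-book layout described in the steps above could be prepared with a script along these lines. This is a sketch under stated assumptions: it works on plain text rather than *.docx (to avoid third-party libraries), it assumes each chapter starts on a line such as "Chapter 1: Title", and the names `split_chapters` and `write_book_folder` are hypothetical. Folder descriptions and agent instructions would still be set in Copilot Studio itself.

```python
import re
from pathlib import Path

# Assumption: chapters start on a line like "Chapter 1" or "Chapter 12: Title".
CHAPTER_HEADING = re.compile(r"^Chapter\s+\d+.*$", re.MULTILINE)


def split_chapters(book_text: str) -> list[str]:
    """Split text at each chapter heading; text before the first heading is dropped."""
    starts = [m.start() for m in CHAPTER_HEADING.finditer(book_text)]
    ends = starts[1:] + [len(book_text)]
    return [book_text[s:e].strip() for s, e in zip(starts, ends)]


def write_book_folder(root: Path, book_name: str, book_text: str) -> list[Path]:
    """Write one file per chapter into root/book_name and return the file paths."""
    folder = root / book_name
    folder.mkdir(parents=True, exist_ok=True)
    paths: list[Path] = []
    for number, chapter in enumerate(split_chapters(book_text), start=1):
        path = folder / f"chapter-{number:02d}.txt"
        path.write_text(chapter + "\n", encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    import tempfile

    sample = "Preface\n\nChapter 1: Start\nHello.\n\nChapter 2: End\nBye.\n"
    with tempfile.TemporaryDirectory() as tmp:
        for p in write_book_folder(Path(tmp), "book-a", sample):
            print(p.name)
```

One folder per book keeps the grouping visible when the files are uploaded, which matches the folder-plus-description approach in the list above.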
