Copilot Studio public website knowledge source returning "No information was found"

Hi all,
 
I'm building a Copilot Studio agent for our company's support site that retrieves product documentation PDFs from their public DAM. I've set up a public website knowledge source pointing to: https://www.solidigm.com/content/dam/solidigm/en/site/products/documents/
 
When users ask for a document through a Search and Summarize node, the agent consistently returns "No information was found."
 
The files I'm trying to retrieve are publicly accessible PDFs sitting directly under that path. The knowledge source status shows as "Ready."
 
Has anyone successfully used a public website knowledge source to retrieve PDFs from a DAM-style path like this? Any advice on configuration, crawl behavior, or troubleshooting would be appreciated.
 
  • Vish WR
     
Are those PDFs linked from pages on the website, or do they exist only in the content folder?
  • Suggested answer
    Sunil Kumar Pashikanti, Moderator
     
If you have added a public website URL (like /content/dam/.../documents/) as a Knowledge Source, and it shows "Ready" but returns "No information was found," you are likely hitting a Crawl Discovery Gap.
     
    The Root Cause: Crawlers are Link-Followers, not File-Explorers
    Copilot Studio’s public website crawler is designed to mimic a human browsing a site. It follows HTML links (<a> tags) to find content.
    • HTML Pages: Easily discoverable via navigation.
    • DAM/Binary Folders: These are "Asset Stores." They usually lack an HTML interface.
    • The Result: The crawler hits your folder URL, sees a blank response (because directory browsing is disabled on the server), and assumes there is nothing to index. It cannot "guess" the filenames of your PDFs.
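    The link-following behavior described above can be sketched in a few lines of Python. This is an illustration of the general mechanism only, not Copilot Studio's actual crawler:

    ```python
    # Illustrative sketch: a link-following crawler can only discover URLs
    # that appear as <a href> links in the HTML it receives. It cannot
    # guess filenames sitting in a "closed" DAM folder.
    from html.parser import HTMLParser

    class LinkExtractor(HTMLParser):
        """Collects every href found in <a> tags."""
        def __init__(self) -> None:
            super().__init__()
            self.links: list[str] = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def discover(html: str) -> list[str]:
        """Return the URLs a link-following crawler would find in this HTML."""
        parser = LinkExtractor()
        parser.feed(html)
        return parser.links

    # A DAM folder URL with directory browsing disabled returns a page
    # with no links, so the crawler finds nothing to index:
    print(discover("<html><body></body></html>"))  # []

    # An index page that links the PDFs exposes them (filename hypothetical):
    print(discover('<a href="/docs/example-spec.pdf">Spec</a>'))
    # ['/docs/example-spec.pdf']
    ```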
    How to Fix It (Proven Options)
    Option 1: The "Index Page" (Fastest Low-Code Fix)
    Create a simple HTML landing page (e.g., yoursite.com/support/docs) that contains direct links to every PDF you want indexed.
    Why it works: When the crawler hits this page, it sees the links, follows them, and begins indexing the PDF content.
    Tip: Ensure the links are standard <a href="..."> tags and not hidden behind JavaScript buttons.
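    A minimal index page along these lines might look as follows (the filenames are hypothetical placeholders, not actual documents on the site):

    ```html
    <!-- Minimal crawlable index page: plain <a href> links, no JavaScript -->
    <!DOCTYPE html>
    <html lang="en">
    <head><title>Product Documentation</title></head>
    <body>
      <h1>Product Documentation</h1>
      <ul>
        <li><a href="/content/dam/solidigm/en/site/products/documents/example-datasheet.pdf">Example datasheet (PDF)</a></li>
        <li><a href="/content/dam/solidigm/en/site/products/documents/example-guide.pdf">Example user guide (PDF)</a></li>
      </ul>
    </body>
    </html>
    ```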
     
    Option 2: Upload Files Directly
    If your document set is under 500 files and individual files are smaller than 20MB:
    Go to: Knowledge > Add Knowledge > Files.
    Why it works: This bypasses the crawler entirely. Copilot Studio will immediately chunk and index the full text of the PDFs.
     
    Option 3: SharePoint Integration
    If your PDFs are internal or sensitive, move them to a SharePoint Document Library.
    Why it works: Copilot Studio uses the Microsoft Graph API for SharePoint, which performs a direct "file crawl" rather than a "web crawl." It is significantly more reliable for deep directory structures.
     
    Option 4: The XML Sitemap (Advanced)
    If you cannot create a public HTML page, add the direct URLs of every PDF to your site’s sitemap.xml.
    Why it works: The Copilot crawler checks the sitemap to find "deep links" it might have missed during the standard crawl.
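    A sitemap with one entry per PDF might look like this (the document URLs shown are illustrative; substitute the real ones):

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- One <url> entry per PDF the crawler should discover -->
      <url>
        <loc>https://www.solidigm.com/content/dam/solidigm/en/site/products/documents/example-datasheet.pdf</loc>
      </url>
      <url>
        <loc>https://www.solidigm.com/content/dam/solidigm/en/site/products/documents/example-guide.pdf</loc>
      </url>
    </urlset>
    ```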
     
    What will NOT work:
    Waiting longer: If it hasn't indexed in 24 hours, it never will because it can't find the path.
    Changing the Prompt: This is a data-source issue, not a language-model issue.
    Adding more sub-folders: the crawler cannot guess paths, so deeper nesting only makes content harder to discover.
     
    Bottom Line: A web crawler needs a map (HTML links). If you point it at a "closed" folder, it will report as "Ready" (because the URL works) but index zero documents.
     
    ✅ If this answer helped resolve your issue, please mark it as Accepted so it can help others with the same problem.
    👍 Feel free to Like the post if you found it useful.

    Sunil Kumar Pashikanti, Moderator
    Blog:
     https://sunilpashikanti.com/posts/
  • CT-20042235-0
     
    Yes, the links to the PDFs can be found on our Document Management System page. 

    https://www.solidigm.com/products/document-management-system.html
  • CT-20042235-0
     
    Thank you for the detailed response.
     
    I've since confirmed that the PDFs are linked from two places on our site:
    1. The Document Management System page at: https://www.solidigm.com/products/document-management-system.html
    2. Individual product pages across the site:
      1. Example 1: https://www.solidigm.com/products/data-center/d7/ps1010.html
      2. Example 2: https://www.solidigm.com/products/data-center/d7/p5810.html
    I have my agent's knowledge source pointed at www.solidigm.com, but it is still struggling to find these documents.
    To try to improve retrieval, I attempted to narrow the knowledge source specifically to the DMS page in a topic dedicated to document retrieval. However, I ran into a couple of issues:
    1. The knowledge source URL field doesn't appear to accept a .html file extension, so I'm unable to point it directly at https://www.solidigm.com/products/document-management-system.html. Our web team is working on setting up a redirect from the extensionless URL to the .html version. Would a redirect work for the crawler, or does it need to hit the final destination URL directly?
    2. Looking at the page source for the DMS page, the document table appears to be powered by the DataTables library. Could this cause an issue where the crawler sees an empty table because the data is loaded dynamically via JavaScript after page load, rather than being server-rendered in the HTML?
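    To illustrate the second concern: a crawler that does not execute JavaScript receives only the initial HTML, so an Ajax-sourced DataTable would appear to it as an empty shell. The markup below is illustrative only, not our actual page source:

    ```html
    <!-- What a non-JS crawler sees when rows are populated via Ajax: -->
    <table id="document-table">
      <thead><tr><th>Document</th><th>Link</th></tr></thead>
      <tbody><!-- empty: rows are injected by DataTables after page load --></tbody>
    </table>

    <!-- A server-rendered equivalent exposes the links in the raw HTML: -->
    <table>
      <tbody>
        <tr>
          <td>Example datasheet</td>
          <td><a href="/content/dam/solidigm/en/site/products/documents/example-datasheet.pdf">PDF</a></td>
        </tr>
      </tbody>
    </table>
    ```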
     
    When testing the agent with this configuration, the Search and Summarize node returns "No information was found that could help answer this", suggesting the knowledge source is not returning any content despite the knowledge source status showing as "Ready."
     
    For reference, I've attached a simplified version of the topic YAML showing the Search and Summarize node pointed at the DMS knowledge source.

    Any guidance would be appreciated.

