Crawlers used for Copilot Studio and Custom Bing Searches

Posted by LG-29050346-0

From what I can find online, Bingbot is used to crawl websites for Bing search, so it seems safe to assume that the results and content I get from a custom Bing search come from Bingbot.

However, I want to know whether a Copilot Studio agent uses something different.

I have an issue where the custom Bing search returns the desired results, including ranking the domains I have set up correctly by relevance. But when I put the same search string or keywords into my agent, it cannot return results from the top domain referenced by the Bing search. No matter what I do, I can't get the agent to use a certain domain, and that includes adding that site as direct knowledge for the agent.

Categories:

Bot administration

Suggested answer

Beyond The Platforms 225

This is a very common point of confusion, and your observation is correct: Copilot Studio does NOT behave the same way as a Custom Bing Search, even if both rely on Bing under the hood.

Key difference: Crawling vs Retrieval behavior

Custom Bing Search:

- Uses Bing indexing (via Bingbot)

- Returns ranked search results based on relevance and domain weighting rules

- You can influence ranking by prioritizing specific domains

Copilot Studio agent:

- Does NOT execute a classic “search result ranking query”

- Uses a retrieval + summarization approach (RAG)

- Pulls content from:

  - Indexed knowledge sources (if added)

  - Bing (if web search is enabled)

- Then the LLM decides what content to use when generating the answer

This leads to the behavior you are seeing:

- Your domain is correctly ranked in Bing

- But the agent does not always select or include it in the answer

- Even if the same query is used

Why your domain is not being returned

There are several important reasons:

1) LLM-controlled selection

Even when Bing returns your domain, the agent does not blindly use top-ranked results. The model selects sources based on:

- Perceived relevance to the question

- Content quality and extractability

- How well the content answers the intent

So “top result in Bing” ≠ “used by Copilot”

2) Knowledge vs Web search priority

If you added the site as Knowledge:

- It must be correctly indexed and chunked

- Content must be accessible and parseable

- The agent may still prefer other sources if the content is unclear or not directly answerable

3) Content structure issues

If your domain:

- Is heavily dynamic

- Requires authentication

- Uses complex rendering (JS-heavy pages)

then the crawler/indexer may not extract usable content, even if Bing ranks it well.
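One rough way to check this yourself is to look at how much readable text is present in the raw HTML before any JavaScript runs. The sketch below is only an approximation of what a non-rendering fetcher might see; it is not how Copilot Studio or Bing actually extract content, and the sample pages and the `visible_text` helper are made up for illustration.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text that appears outside <script> and <style> elements."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the text a fetcher that does not execute JavaScript could read."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical pages: one server-rendered, one that builds its content in the browser.
static_page = "<html><body><h1>Pricing</h1><p>Plan A costs 10 EUR.</p></body></html>"
js_page = '<html><body><div id="root"></div><script>render("Plan A costs 10 EUR.")</script></body></html>'

print(repr(visible_text(static_page)))  # 'Pricing Plan A costs 10 EUR.'
print(repr(visible_text(js_page)))      # ''
```

To try it on a real page, fetch the raw HTML (for example with `urllib.request.urlopen`) and pass the decoded string to `visible_text`. If the result is nearly empty while the page looks full in a browser, the content is probably rendered client-side.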

4) Generative answer behavior

Copilot Studio generates answers rather than returning a ranked list of links. If the model cannot confidently use your content to build an answer, it may ignore it entirely.

How to fix / improve the behavior

1) Force stronger grounding in your source

In your system instructions or prompts, add:

“Prefer and prioritize information from [your domain] when answering. Only use other sources if the information is not available there.”

2) Validate Knowledge ingestion

If using the site as a knowledge source:

- Ensure pages are publicly accessible

- Test with very specific queries

- Check if the content is actually retrieved (debug with narrower prompts)

3) Make content more “LLM-friendly”

Ensure the pages:

- Contain clear, structured text

- Have direct answers (not only navigation or UI elements)

- Avoid heavy reliance on client-side rendering

4) Reduce ambiguity in prompts

The more generic the question, the more the model will diversify sources.

Use specific prompts like:

“According to [your domain], explain …”

5) Combine approaches

For maximum control:

- Keep your domain as Knowledge (primary source)

- Use prompts to explicitly reference it

- Avoid relying only on Bing ranking

Summary

- Copilot Studio does not use Bing ranking in a deterministic way

- It uses retrieval + LLM reasoning, not search result ordering

- Being top-ranked in Bing does not guarantee usage by the agent

- You must guide the model and ensure your content is properly indexed and usable

This is expected behavior and by design, not a bug.

Hope this helps!
Paolo

Suggested answer

AP-26031104-0 Microsoft Employee

Hi,

On the crawler question specifically:
Copilot Studio uses Microsoft's own content fetcher (not Bingbot) when indexing knowledge sources you add directly. For web search via Bing, it uses Bing's index — but as noted, ranking ≠ usage by the agent.
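If a different fetcher really is involved, one thing worth ruling out is a robots.txt that allows Bingbot but blocks everything else. The sketch below uses Python's standard `urllib.robotparser` on a sample robots.txt. The user agent name `ExampleFetcher` is a placeholder, since this thread does not state which user agent the knowledge-source fetcher sends, and the sample rules and URL are invented for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed_agents(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Report, per user agent, whether the given robots.txt rules allow fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Sample rules: Bingbot may crawl everything, every other agent is blocked.
sample_robots = """\
User-agent: bingbot
Allow: /

User-agent: *
Disallow: /
"""

# "ExampleFetcher" is a hypothetical name, not a documented Microsoft user agent.
result = allowed_agents(
    sample_robots,
    "https://example.com/docs/page",
    ["bingbot", "ExampleFetcher"],
)
print(result)  # {'bingbot': True, 'ExampleFetcher': False}
```

With rules like these, a site could rank well in Bing and still be unreadable to any other crawler, so it is worth pasting in your own robots.txt and checking the agents you care about.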

Additional things to check:

- Generative answers source settings: In Copilot Studio, go to your agent > Settings > Generative AI and confirm that the knowledge source (your domain) is listed and shows as indexed successfully. A green status doesn't always mean content was extracted correctly.

- Test the knowledge source directly: In the agent's Test panel, ask a very specific question that has a clear answer only on your domain. If it still doesn't return it, the content may not be chunked/extractable properly.

- Check for content access issues: If your domain uses redirects, login walls, or heavy JavaScript rendering, the indexer may have crawled it but extracted no usable text.

- Blocked domain list: Rarely, certain domains may be filtered due to content policies. If the domain is new or low-authority, it may be deprioritized.

If none of the above resolves it, I'd recommend raising a Microsoft Support ticket.

Romain The Low-Code... 2,876 Super User 2026 Season 1

hello there :)

I would be careful not to assume that Copilot Studio behaves exactly like the raw Bing Custom Search test experience.

Bingbot is used to crawl and index websites for Bing, so yes, the content returned by Bing Custom Search is based on Bing-indexed content. However, a Copilot Studio agent does not simply take the Bing Custom Search ranking and expose it directly in the conversation.

One very important point, in my opinion: at runtime, Copilot Studio adds its own orchestration layer on top of the search results.

The agent may rewrite the user query, use the conversation context, select or filter knowledge sources, apply grounding checks, provenance checks, semantic similarity checks, and then generate the final answer. Because of that, the final sources used by the agent may not match the raw ranking you see in Bing Custom Search.
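To make the idea concrete, here is a toy sketch of how a similarity filter applied after search can drop the top-ranked result. It is not Copilot Studio's actual pipeline: the bag-of-words scoring, the `threshold` value, the function names, and the sample domains are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (a toy stand-in for embedding similarity)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def select_sources(query: str, ranked_results: list[dict], threshold: float) -> list[dict]:
    """Keep results whose extracted text is similar enough to the query, in search order."""
    return [r for r in ranked_results if cosine_similarity(query, r["text"]) >= threshold]


# Hypothetical search output, already ordered by the search engine's ranking.
ranked_results = [
    {"domain": "top-ranked.example", "text": ""},  # ranked first, but no extractable text
    {"domain": "other.example", "text": "how to reset your password in the portal"},
]

selected = select_sources("how to reset password", ranked_results, threshold=0.3)
print([r["domain"] for r in selected])  # ['other.example']
```

The first result keeps its search rank but never reaches the answer, because nothing usable was extracted from it. That matches the symptom described in the question.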

Another important distinction is how the website is configured in Copilot Studio.

If you add a public website as a knowledge source, Copilot Studio uses web grounding based on Bing Search. That is not necessarily the same thing as using your Bing Custom Search configuration.

If you want the agent to use your Bing Custom Search instance specifically, you need to configure it in a generative answers node using the Bing Custom Search configuration ID. Simply adding the site as direct knowledge may not force the agent to use that domain in the way your Custom Bing Search test does.

I would also check whether general web search is enabled on the agent. If it is enabled, the agent may search across public Bing-indexed websites and mix those results with your configured knowledge sources, which can make the behavior different from your custom search setup.
