Blocked step - openAIndirectAttack

(2) Share

Report

Posted on by Bjon

HI all,
I am not sure why the copilot studio suddenly starts to filter a lot of content due to Responsible AI restrictions.

A very simple questions like this will still lead to Responsible AI restrictions

I have already set the moderation to low but I am still getting the restriction error

And if I am persistence enough to ask a few more times, it will start responding.

How can the restriction be overwritten?

Categories:

Bot administration

I have the same question (0)

All responses (4)

Answers (0)

Sort by

Suggested answer

11manish 3,333 on at

Like
a
(0)

Report
Copy link

Link copied!

If the behavior started suddenly without any changes to your agent, the most likely cause is a recent update to Microsoft's Responsible AI filtering or the

underlying model used by Copilot Studio.

Start by reviewing conversation transcripts, testing the same prompts in a new agent, and checking for recent changes to knowledge sources or instructions.

If the issue is widespread across multiple agents and environments, consider opening a Microsoft support ticket with examples of previously accepted prompts

that are now being blocked, as this will help determine whether the behavior is due to a recent platform change or a configuration-specific issue.

Was this reply helpful? Yes No
Haque 3,653 on at

Like
a
(1)

Report
Copy link

Link copied!

Hi @Bjon,

Was it responding smartly earlier and then suddenly started behaving like this? What is the model you have selected GPT5 or GPT-4?

If it is GPT-5, can you please downgrade to GPT-4 and give a shot?

Was this reply helpful? Yes No
Suggested answer

Haque 3,653 on at

Like
a
(0)

Report
Copy link

Link copied!

Hi @Bjon,

This indicates that the system detected an indirect prompt attack coming from external or grounded content (for example, documents, knowledge sources, or other data the agent is using), not necessarily from the visible user prompt itself.

I think this thread will help you to address the issue.

I am sure some clues I tried to give. If these clues help to resolve the issue brought you by here, please don't forget to check the box Does this answer your question? At the same time, I am pretty sure you have liked the response!

Was this reply helpful? Yes No
Suggested answer

Valantis 6,735 on at

Like
a
(1)

Report
Copy link

Link copied!

Hi @Bjon,

OpenAIndirectAttack specifically means Azure AI Content Safety's Prompt Shields detected a potential indirect prompt injection, not in the user's message, but in the content your agent retrieved from knowledge sources (SharePoint docs, websites, or other grounded data). This is different from a jailbreak filter on the user input.

The reason it's intermittent (works after a few tries) is that each query retrieves slightly different chunks from the knowledge source, and only specific chunks trigger the filter.

Fix: review your knowledge source content for any text that could look like instructions to the AI model. Common triggers:
- Phrases like "ignore previous instructions", "you are now", "as an AI you must"
- Legal disclaimers or footers that contain instruction-like language
- Templated documents with placeholder text like "[insert content here]"
- Documents that contain examples of prompts or AI conversations

To identify which content triggers it: in the conversation transcript (Copilot Studio Analytics > Conversations), find the blocked sessions and check what knowledge chunks were cited. The retrieved chunks that trigger the filter are usually visible in the debug output.

You can't fully disable the OpenAIndirectAttack filter since it's a platform-level Azure AI Content Safety control, but cleaning up the knowledge source content to remove instruction-like text is the fix.

Best regards,

Valantis

✅ If this helped solve your issue, please Accept as Solution so others can find it quickly.

❤️ If it didn’t fully solve it but was still useful, please click “Yes” on “Was this reply helpful?” or leave a Like :).

🏷️ For follow-ups @Valantis.

📝 https://valantisond365.com/

💼 LinkedIn

▶️ YouTube

Was this reply helpful? Yes No