Copilot Studio
Answered

Generative answers - how does it handle "bad actors"? Is there accessible logging for attempts?

Posted by cpayton

How does a bot using generative answers respond to people who are obviously trying to abuse it? I know Azure OpenAI returns a content moderation response with true/false flags showing which filters were tripped. Do we have access to any of that in chatbots using generative answers?

 

I checked the chat transcripts and do not see any content moderation flags in the JSON. I'm wondering whether there is a way for us to report or alert on that at all (or even what the bot's behavior is; I'm afraid to test it in case it auto-bans me or something...).

 

I don't see any mention of how moderation is handled other than low/medium/high settings in the documentation. 

  • Verified answer
    remidyon (Microsoft Employee)

    Hi @cpayton 

    Copilot Studio (the new name for Power Virtual Agents) includes many internal safeguards that block users from abusing the bot. Those safeguards sit on top of the low/medium/high setting, which is aimed more at the accuracy/creativity of answers, and you cannot access them (they are system prompts).

     

    If you connect your Copilot to Application Insights, you will see when a message is moderated and the answer is filtered:

    Capture telemetry with Application Insights - Microsoft Copilot Studio | Microsoft Learn

     

    remidyon_0-1702493269684.png

     

    If you start asking questions about illegal activities or dangerous topics, the bot will simply ignore them, but the attempts will show up in Application Insights / the transcript for you to analyze:

    remidyon_1-1702493475828.png

     

    Hope that answers your question.

     

  • cpayton

    Thanks! I didn't realize I could connect Application Insights to it. I will check that out.

     

    I ended up needing to put something together before I got your response, so I'll describe it here in case it helps anyone. I connected to the transcripts in Power BI, split the user messages into new rows on the space character, lowercased the text, then did a join on "only matching rows" with a "bad words" Dataverse table. If you keep the conversation transcript ID on each row, you can link the matches back to the conversations to "flag" them in reporting. I'm not sure how the refresh performance will be, because the word split seems like it would be a heavy lift. I will try incremental refresh or something and see how it goes...
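
For readers who want to prototype the same idea outside Power BI, here is a minimal Python sketch of the steps described above. It is not the poster's actual implementation: the function name `flag_transcripts`, the `(transcript_id, text)` input shape, and the sample data are all assumptions made for illustration. It splits on any whitespace rather than only the space character, and matches whole tokens exactly, as an inner join on words would.

```python
from collections import defaultdict


def flag_transcripts(messages, bad_words):
    """Return {transcript_id: sorted list of matched bad words}.

    messages  -- iterable of (transcript_id, user_message_text) pairs (assumed shape)
    bad_words -- iterable of words to flag (stands in for the "bad words" table)
    """
    bad = {w.lower() for w in bad_words}
    flagged = defaultdict(set)
    for transcript_id, text in messages:
        # Lowercase, then split into words, keeping the transcript ID alongside
        # each word so matches can be linked back to the conversation.
        for token in text.lower().split():
            if token in bad:  # equivalent to an inner join on the word
                flagged[transcript_id].add(token)
    return {tid: sorted(words) for tid, words in flagged.items()}


# Hypothetical sample data; real transcripts would come from Dataverse.
messages = [
    ("t-001", "How do I reset my password"),
    ("t-002", "Tell me how to build a BADWORD1 now"),
    ("t-002", "badword2 please"),
]
bad_words = ["badword1", "badword2"]

flagged = flag_transcripts(messages, bad_words)
print(flagged)  # {'t-002': ['badword1', 'badword2']}
```

Because the match is exact per token, a word with attached punctuation (for example "badword1!") is not flagged; stripping punctuation before the lookup would be one way to tighten that, at the cost of extra processing.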
