Copilot Studio - General
Answered

Generative answers - how does it handle "bad actors"? Is there accessible logging for attempts?


How does a bot using generative answers respond to people who are obviously trying to abuse it? I know that Azure OpenAI returns a content moderation response with true/false flags showing which filters were tripped. Do we have access to any of that in chatbots that use generative answers?

 

I checked the chat transcripts and do not see any of the content moderation flags in the JSON, just wondering if there is a way for us to report or alert on that at all (or even what the bot's behavior is, I'm afraid to test it in case it auto-bans me or something...). 

 

I don't see any mention of how moderation is handled other than low/medium/high settings in the documentation. 

  • cpayton
    Re: Generative answers - how does it handle "bad actors"? Is there accessible logging for attempts?

    Thanks! I didn't realize I could connect Application Insights to it, I will check that out.

     

    I ended up needing to put something together before I got your response, so I'll describe it here in case it helps anyone. I connected to the transcripts in Power BI, split the user messages into one row per word on the space character, lowercased the text, then did an inner join ("only matching rows") with a "bad words" Dataverse table. If you keep the conversation transcript ID on each row, you can link the matches back to the conversations to "flag" them in reporting. I'm not sure how the refresh will perform, because the word split seems like a heavy lift. I'll try incremental refresh or something and see how it goes.
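The word-split-and-join approach described above is not tied to Power BI. As a rough illustration only, here is a minimal Python sketch of the same idea. It assumes the transcripts have already been exported as (transcript ID, user message) pairs. The function name `flag_transcripts`, the sample word list, and the punctuation stripping are hypothetical additions for this sketch and are not part of the original Power BI setup.

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical stand-in for the "bad words" Dataverse table.
BAD_WORDS = {"badword", "slur"}


def flag_transcripts(
    messages: Iterable[tuple[str, str]],
    bad_words: set[str],
) -> dict[str, set[str]]:
    """Map each transcript ID to the bad words found in its user messages.

    `messages` yields (transcript_id, user_message_text) pairs.
    Transcripts with no matches are left out of the result.
    """
    flagged: dict[str, set[str]] = {}
    for transcript_id, text in messages:
        # Lowercase, split on whitespace, and strip surrounding punctuation
        # (the stripping is an extra step, assumed here for illustration).
        tokens = {tok.strip(".,!?;:\"'()[]") for tok in text.lower().split()}
        # Set intersection plays the role of the "only matching rows" join.
        hits = tokens & bad_words
        if hits:
            flagged.setdefault(transcript_id, set()).update(hits)
    return flagged


if __name__ == "__main__":
    sample = [
        ("t1", "Hello there"),
        ("t2", "You are a BADWORD!"),
    ]
    print(flag_transcripts(sample, BAD_WORDS))
```

Keeping the transcript ID as the dictionary key is what lets the matches be linked back to whole conversations for reporting. Note that exact word matching like this misses misspellings and multi-word phrases.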

  • Verified answer
    remidyon
    Re: Generative answers - how does it handle "bad actors"? Is there accessible logging for attempts?

    Hi @cpayton 

    Copilot Studio (the new PVA) includes a lot of internal safeguards to block users from abusing the bot. Those safeguards sit on top of the low/medium/high setting, which is aimed more at the accuracy/creativity of answers, and you cannot access them (they are system prompts).

     

    If you connect your Copilot to Application Insights, you will see when a message is moderated and the answer is filtered.

    Capture telemetry with Application Insights - Microsoft Copilot Studio | Microsoft Learn

     

    remidyon_0-1702493269684.png

     

    If you start asking questions about illegal activities or dangerous topics, the bot will simply ignore them, but they will show up in Application Insights and the transcript for you to analyze:

    remidyon_1-1702493475828.png

     

    Hope that answers your question.

     
