Copilot Studio - General
Answered

Generative answers - how does it handle "bad actors"? Is there accessible logging for attempts?

Posted on by 46

How does the bot respond, in generative answers, to people who are obviously trying to abuse it? I know Azure OpenAI returns a content moderation response with true/false flags showing which filters were tripped. Do we have access to any of that in chatbots using generative answers?

 

I checked the chat transcripts and do not see any of the content moderation flags in the JSON, just wondering if there is a way for us to report or alert on that at all (or even what the bot's behavior is, I'm afraid to test it in case it auto-bans me or something...). 

 

I don't see any mention of how moderation is handled other than low/medium/high settings in the documentation. 

  • cpayton Profile Picture
    cpayton 46 on at
    Re: Generative answers - how does it handle "bad actors"? Is there accessible logging for attempts?

    Thanks! I didn't realize I could connect Application Insights to it, I will check that out.

     

    I ended up needing to put something together before I got your response, so I'll describe it here in case it helps anyone. I connected to the transcripts in Power BI, split the user messages into new rows on the space character, lowercased the text, then did a join on "only matching rows" against a "bad words" Dataverse table. If you keep the conversation transcript ID on each row, you can link the matches back to the conversations to "flag" them in reporting. I'm not sure how refresh performance will hold up, because the word split seems like a heavy lift. I will try incremental refresh or something similar and see how it goes...
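
For anyone who wants to prototype the same idea outside Power BI, here is a minimal Python sketch of the split, lowercase, and match steps described above. It is not the actual Power BI query: the `BAD_WORDS` set stands in for the Dataverse table, the `(transcript_id, text)` input shape and the function name `flag_transcripts` are assumptions made for illustration, and it also strips surrounding punctuation, which the original space-only split does not do.

```python
from collections import defaultdict

# Hypothetical stand-in for the "bad words" Dataverse table.
BAD_WORDS = {"badword", "worseword"}


def flag_transcripts(messages, bad_words=BAD_WORDS):
    """Return {transcript_id: sorted matched words} for flagged conversations.

    `messages` is an iterable of (transcript_id, user_message_text) pairs;
    this shape is an assumption, not the real transcript schema.
    """
    lowered = {w.lower() for w in bad_words}
    hits = defaultdict(set)
    for transcript_id, text in messages:
        # Lowercase, then split on whitespace (one token per "row").
        for token in text.lower().split():
            # Strip surrounding punctuation so "badword!" still matches.
            word = token.strip(".,!?;:\"'()[]{}")
            # Equivalent of the inner join: keep only matching tokens.
            if word in lowered:
                hits[transcript_id].add(word)
    return {tid: sorted(words) for tid, words in hits.items()}


sample = [
    ("t-001", "How do I reset my password?"),
    ("t-002", "You are a BADWORD!"),
    ("t-002", "worseword again"),
]
flagged = flag_transcripts(sample)
print(flagged)
```

Keeping the transcript ID as the dictionary key is what lets you link a hit back to its conversation, the same role it plays in the Power BI join. Exact word matching like this will miss misspellings and phrases, so treat it as a rough reporting filter rather than moderation.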

  • Verified answer
    remidyon Profile Picture
    remidyon on at
    Re: Generative answers - how does it handle "bad actors"? Is there accessible logging for attempts?

    Hi @cpayton 

    Copilot Studio (the new PVA) includes a lot of internal safeguards to block users from abusing the bot. Those safeguards sit on top of the low/medium/high setting, which is aimed more at the accuracy/creativity of answers, and you cannot access them (they are system prompts).

     

    If you connect your copilot to Application Insights, you will see when a message is moderated and the answer is filtered:

    Capture telemetry with Application Insights - Microsoft Copilot Studio | Microsoft Learn

     

    remidyon_0-1702493269684.png

     

    If you start asking questions about illegal activities or dangerous topics, the bot will simply ignore them, but they will show up in Application Insights and the transcript for you to analyze:

    remidyon_1-1702493475828.png

     

    Hope that answers your question.

     
