Answered

Generative answers - how does it handle "bad actors"? Is there accessible logging for attempts?

Posted on 5 Dec 2023 20:54:42 by cpayton

How does the bot respond, for generative answers, to people who are obviously trying to abuse it? I know Azure OpenAI provides a content moderation response with true/false flags showing which filters got tripped. Do we have access to any of that in the chatbots using generative answers?

I checked the chat transcripts and do not see any of the content moderation flags in the JSON, just wondering if there is a way for us to report or alert on that at all (or even what the bot's behavior is, I'm afraid to test it in case it auto-bans me or something...).

I don't see any mention of how moderation is handled other than low/medium/high settings in the documentation.

All responses (2)

Answers (1)

cpayton 54 on 13 Dec 2023 at 22:37:59

Re: Generative answers - how does it handle "bad actors"? Is there accessible logging for attempts?

Thanks! I didn't realize I could connect Application Insights to it. I will check that out.

I ended up needing to put something together before I got your response, so I'll describe it here in case it helps anyone. What I did was connect to the transcripts in Power BI, split the user messages into new rows on the space character, lowercase the text, and then do an "only matching rows" (inner) join against a "bad words" Dataverse table. If you keep the conversation transcript ID in there, you can link the matches back to the conversations to "flag" them in reporting. I'm not sure how the refresh performance will be, because the word split seems like it would be a heavy lift. I'll try incremental refresh or something and see how it goes...
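For anyone who wants to prototype the same idea outside Power BI, here is a rough Python sketch of the split / lowercase / join steps. It is not the actual Power Query setup: the `(transcript_id, text)` input shape and the `flag_transcripts` name are made up for illustration, the word list is a harmless placeholder, and the punctuation trimming is an extra step that was not part of the steps described above.

```python
import string
from collections import defaultdict


def flag_transcripts(messages, bad_words):
    """Return {transcript_id: sorted list of matched words}.

    messages:  iterable of (transcript_id, text) pairs (hypothetical shape).
    bad_words: iterable of words, standing in for the "bad words" table.
    """
    blocklist = {w.lower() for w in bad_words}
    hits = defaultdict(set)
    for transcript_id, text in messages:
        # Lowercase, then split the message into words on whitespace.
        for token in text.lower().split():
            # Extra step (assumption, not in the original approach):
            # trim punctuation so "dummy!" still matches "dummy".
            word = token.strip(string.punctuation)
            # Set membership plays the role of the "only matching rows" join.
            if word in blocklist:
                hits[transcript_id].add(word)
    return {tid: sorted(words) for tid, words in hits.items()}


# Placeholder data; a real run would read user messages from the transcripts.
messages = [
    ("t1", "Hello there"),
    ("t2", "You are a Dummy!"),
    ("t2", "total dummy and fool"),
]
print(flag_transcripts(messages, ["dummy", "fool"]))
# {'t2': ['dummy', 'fool']}
```

Keeping the transcript ID as the key is what lets the matches be linked back to whole conversations for reporting. Holding the word list in a set makes each lookup cheap, so the cost is mostly the word split itself.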

Verified answer

remidyon on 13 Dec 2023 at 18:52:43

Re: Generative answers - how does it handle "bad actors"? Is there accessible logging for attempts?

Hi @cpayton
Copilot Studio (the new PVA) includes a lot of internal safeguards to stop users from abusing the bot. Those safeguards sit on top of the low/medium/high setting, which is aimed more at the accuracy/creativity of answers, and you cannot access them (they are system prompts).

If you connect your Copilot to Application Insights, you will see when a message gets moderated and the answer is filtered:
Capture telemetry with Application Insights - Microsoft Copilot Studio | Microsoft Learn

If you start asking questions about illegal activities or dangerous topics, the bot will simply ignore them, but they will show up in Application Insights and the transcript for you to analyze.

Hope that answers your question.
