Copilot Studio

Test answers accuracy


Hi all,

Has anyone tested the accuracy of a RAG Copilot? We're working with a few dozen documents, and manual testing is too labor-intensive. Is there a systematic way to ensure that the answers provided by Copilot match our expected answers? We've created a set of Q&A pairs for each document (considered the correct answers) and want to evaluate Copilot's performance. Any insights or methods would be greatly appreciated. Thanks!

  • Artur Stepniak (Moderator) replied:
    Hello,
     
    I assume that you're using Azure AI Studio. As you probably know, a lot is changing day by day, and I see that Microsoft has already prepared a tool for testing:
     
     
     
    Have you tried it? Nevertheless, you could still test the deployment by using any programming language and assessing the output. Just as an example in Python:
     
    import unittest

    # Requires the OpenAI Python SDK v1+ (pip install openai); the older
    # openai.ChatCompletion.create interface has been removed.
    from openai import OpenAI

    # The client can also read the key from the OPENAI_API_KEY environment variable.
    client = OpenAI(api_key="your_api_key_here")

    def get_llm_response(prompt, model="gpt-4"):
        """
        Get the LLM response for a given prompt.

        :param prompt: The prompt to send to the LLM
        :param model: The model to use (default is "gpt-4")
        :return: The LLM response as a string
        """
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": prompt},
            ],
        )
        return response.choices[0].message.content.strip()

    class TestLLMOutput(unittest.TestCase):
        """Test cases for validating LLM output."""

        def test_capital_of_france(self):
            """Test if the LLM correctly identifies the capital of France."""
            prompt = "What is the capital of France?"
            expected_response = "The capital of France is Paris."
            response = get_llm_response(prompt)
            self.assertEqual(response, expected_response, f"LLM response: {response}")

        def test_addition(self):
            """Test if the LLM correctly performs addition."""
            prompt = "What is 2 + 2?"
            expected_response = "2 + 2 equals 4."
            response = get_llm_response(prompt)
            self.assertEqual(response, expected_response, f"LLM response: {response}")

        def test_greeting(self):
            """Test if the LLM returns a polite greeting."""
            prompt = "Say hello to the user."
            expected_response = "Hello, how can I assist you today?"
            response = get_llm_response(prompt)
            self.assertEqual(response, expected_response, f"LLM response: {response}")

    if __name__ == "__main__":
        unittest.main()
     
    You'd need to make sure that the response verification is not based on exact equality (assertEqual): due to the nature of LLMs, the answer can be slightly different each time, so a strict comparison would fail. :-)
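    For example, you could score similarity against the gold answer and assert a threshold instead of exact equality. A minimal sketch using only the standard library (the 0.7 threshold and the helper names are just illustrative; in practice you might prefer embedding-based semantic similarity or an LLM-as-judge approach):

```python
import difflib

def similarity(a: str, b: str) -> float:
    """Crude lexical similarity in [0, 1] using difflib's SequenceMatcher."""
    return difflib.SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def check_answer(response: str, expected: str, threshold: float = 0.7) -> bool:
    """Pass if the response is close enough to the expected (gold) answer."""
    return similarity(response, expected) >= threshold

# A paraphrased answer still passes; a completely different one does not.
print(check_answer("The capital of France is Paris, of course.",
                   "The capital of France is Paris."))  # True
print(check_answer("I don't know.",
                   "The capital of France is Paris."))  # False
```

    With something like this in place, the assertEqual calls in the tests above become assertTrue(check_answer(response, expected_response)).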
     
    In case of any other questions, let me know. If the answer is correct, mark it as a solution, so that others can benefit from it.
     
    Best regards,
     
    Artur Stepniak

