Hello,
I assume that you're using Azure AI Studio. As you probably already know, a lot is changing there day by day, and I see that Microsoft has already prepared a tool for testing.
Have you tried it? Either way, you can still test the deployment from any programming language by sending prompts and assessing the output. Just as an example in Python:
import unittest

import openai

# Note: this example uses the legacy interface of the openai package
# (versions before 1.0), where openai.ChatCompletion is available.

# Set up your OpenAI API key
openai.api_key = "your_api_key_here"


def get_llm_response(prompt, model="gpt-4"):
    """
    Get the LLM response for a given prompt.

    :param prompt: The prompt to send to the LLM
    :param model: The model to use (default is "gpt-4")
    :return: The LLM response as a string
    """
    response = openai.ChatCompletion.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message["content"].strip()


class TestLLMOutput(unittest.TestCase):
    """Test cases for validating LLM output."""

    def test_capital_of_france(self):
        """Test if the LLM correctly identifies the capital of France."""
        prompt = "What is the capital of France?"
        expected_response = "The capital of France is Paris."
        response = get_llm_response(prompt)
        self.assertEqual(response, expected_response, f"LLM response: {response}")

    def test_addition(self):
        """Test if the LLM correctly performs addition."""
        prompt = "What is 2 + 2?"
        expected_response = "2 + 2 equals 4."
        response = get_llm_response(prompt)
        self.assertEqual(response, expected_response, f"LLM response: {response}")

    def test_greeting(self):
        """Test if the LLM returns a polite greeting."""
        prompt = "Say hello to the user."
        expected_response = "Hello, how can I assist you today?"
        response = get_llm_response(prompt)
        self.assertEqual(response, expected_response, f"LLM response: {response}")


if __name__ == "__main__":
    unittest.main()
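In a real test suite, make sure that the response verification is not based on exact equality (assertEqual as above). Due to the nature of LLMs, the wording of the answer can differ slightly on each call, so an exact-match test would fail intermittently even when the answer is correct. :-)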
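One possible way to loosen the check is to look only for the key facts in the answer instead of comparing whole strings. The sketch below is just an illustration: the helper names normalize and contains_all are made up for this example (they are not part of unittest or the openai package), and it runs on hard-coded sample strings so it does not need an API key.

```python
import re


def normalize(text):
    """Lowercase the text and collapse punctuation/whitespace into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def contains_all(response, required_terms):
    """Return True if every required term appears as a whole word or phrase.

    Both the response and the terms are normalized first, so differences in
    case, punctuation, and spacing do not matter.
    """
    padded = f" {normalize(response)} "
    return all(f" {normalize(term)} " in padded for term in required_terms)


# Hard-coded sample answers standing in for real LLM output (assumption:
# a real model may phrase the same fact in any of these ways).
samples = [
    "The capital of France is Paris.",
    "Paris is the capital city of France!",
    "It's Paris.",
]

for sample in samples:
    print(sample, "->", contains_all(sample, ["Paris"]))
```

In the test class, this would mean replacing self.assertEqual(response, expected_response, ...) with something like self.assertTrue(contains_all(response, ["Paris"]), f"LLM response: {response}"). Matching on whole words keeps "14" from passing a check for "4", but it is still only a keyword check, so it fits factual prompts better than free-form ones such as the greeting.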
In case of any other questions, let me know. If the answer is correct, mark it as a solution, so that others can benefit from it.
Best regards,
Artur Stepniak