Answered

Image Input and Analysis - How does it work?

Posted by cmora-ss

Hello,
I am working in Microsoft Copilot Studio to build a custom copilot that serves as a guide for an application, and I want to enable image analysis. I have searched the documentation for this functionality, but it doesn't explain how it works. For the copilot to analyze images, do the knowledge sources need to contain similar images so it can associate them with the information stored in a document? Can the images in a document help with the analysis, or does it work differently?

I would appreciate it if someone could explain exactly how this works, as I didn't get much out of the documentation. I am not sure whether I need to train my copilot again with documents containing images so that it can better analyze the ones uploaded by users, or whether the AI alone can interpret an image and find the information it should return based on the image's content.

Categories:

General topics

Verified answer

Michael E. Gernaey 53,479 Super User 2025 Season 2

Hi @CM-24021732-0

You don't need to do that. Just make sure generative AI is turned on, and image analysis works out of the box (OOB).

You can also turn on general knowledge, but essentially, no, you don't need to retrain anything; you can just use the images as they are.

If this helps, I'd appreciate it if you marked it as resolved and maybe gave it a like :-)

Cheers

Suggested answer

ronaldwalcott 3,847 Super User 2025 Season 2

Essentially, when it comes to using AI models, you first test the available models to determine whether they give the expected results. If they don't work as you expect, you either have to train your own model or find a model that works with your test cases.
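
As a rough illustration of that test-first approach, here is a minimal Python sketch. The model functions, file names, expected labels, and the 0.9 pass threshold are all hypothetical stand-ins; in practice each "model" would be a call to the vision service you are trying out, and the test cases would be real images from your application paired with the answers you expect.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A test case pairs an image (here just a hypothetical file name) with the
# answer you expect the model to give for it.
TestCase = Tuple[str, str]
Model = Callable[[str], str]


def evaluate(model: Model, cases: List[TestCase]) -> float:
    """Fraction of test cases where the model's answer matches the expected one."""
    if not cases:
        raise ValueError("need at least one test case")
    correct = sum(1 for image, expected in cases if model(image) == expected)
    return correct / len(cases)


def choose_model(models: Dict[str, Model], cases: List[TestCase],
                 threshold: float = 0.9) -> Optional[str]:
    """Return the best-scoring available model, or None if none reaches the
    threshold (the signal to train your own model or look for another one)."""
    scores = {name: evaluate(m, cases) for name, m in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Hypothetical stand-ins: real ones would send the image to a vision service.
def off_the_shelf(image: str) -> str:
    return "login screen" if image.startswith("login") else "unknown"


def custom_trained(image: str) -> str:
    return image.split("_")[0] + " screen"


cases = [("login_01.png", "login screen"),
         ("settings_01.png", "settings screen"),
         ("report_01.png", "report screen")]

print(choose_model({"off_the_shelf": off_the_shelf}, cases))  # None
print(choose_model({"off_the_shelf": off_the_shelf,
                    "custom_trained": custom_trained}, cases))  # custom_trained
```

The point of keeping the test set fixed is that every candidate, whether an out-of-the-box model or one you train later, is judged on the same images, so the comparison is fair.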

Verified answer

SherriS 19

Hi there,

You didn't mention how you're adding image recognition to your copilot. The answer depends on which model, if any, you're adding to Copilot Studio to handle the computer vision task. There are several ways to do this, and the answer to your question differs depending on which one you choose.

Uploading Images to Copilot Studio (for Example as a Knowledge Source)
If you're using Copilot Studio's built-in image recognition, for example by uploading images as a knowledge source, you're right: I can't find anything in the documentation about what kind of model is used for the image recognition task, either. I suspect it's a Vision Transformer (ViT), or a similar architecture, attached to the front end of the Transformer model they use for the LLM. You can look that architecture up, but basically it's a way to convert images into vectors that the model can use to generate predictions, just as it uses text vectors. These models are trained on a giant corpus of images, just as LLMs are trained on a giant corpus of text.

If you have a general use case (i.e., the kinds of images a model that had seen all the images on the internet would recognize), the model will be good at recognizing your images out of the box. There isn't a low-code/no-code way that I'm aware of to retrain or fine-tune these models. You can do it with full code, but it would be very, very expensive: you're talking about similar amounts of data and compute time as you would need to retrain or fine-tune an LLM.
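
To make the "images into vectors" idea concrete, here is a minimal pure-Python sketch of the patch-embedding step at the front of a ViT. The image is cut into square patches, each patch is flattened and multiplied by a projection matrix, and a position embedding is added, producing one vector ("token") per patch that a Transformer can process much like word vectors. This assumes a tiny grayscale image and random weights; in a real ViT the projection and position embeddings are learned during training and there is usually an extra classification token. Nothing here reflects what Copilot Studio actually uses, which, as noted above, isn't documented.

```python
import random
from typing import List

Matrix = List[List[float]]


def split_into_patches(image: Matrix, patch: int) -> List[List[float]]:
    """Cut an H x W grayscale image into non-overlapping patch x patch tiles,
    returning each tile flattened row by row, in raster order."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


def embed_patches(patches: List[List[float]], projection: Matrix,
                  positions: Matrix) -> Matrix:
    """Linearly project each flattened patch to a d-dimensional vector and
    add the position embedding for its slot, giving one token per patch."""
    d = len(projection[0])
    tokens = []
    for i, p in enumerate(patches):
        vec = [sum(p[k] * projection[k][j] for k in range(len(p))) for j in range(d)]
        tokens.append([v + positions[i][j] for j, v in enumerate(vec)])
    return tokens


# Demo: a 4x4 image split into 2x2 patches gives 4 tokens of dimension 3.
rng = random.Random(0)  # random weights stand in for learned ones
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
patch, d = 2, 3
patches = split_into_patches(image, patch)
projection = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(patch * patch)]
positions = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(len(patches))]
tokens = embed_patches(patches, projection, positions)
print(len(tokens), len(tokens[0]))  # 4 3
```

This is also why the knowledge sources don't need to contain similar images: the model turns whatever image it receives into vectors using weights it already learned during pretraining.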

Using Power Automate to Add a Custom Gen AI Prompt

The same applies if you're using Power Automate to incorporate a gen AI prompt. (You can find these in AI Hub by clicking prebuilt prompts, or you can create your own prompt and pull it into Power Automate.) If you use one of these and pull it into your copilot through Power Automate, everything above applies. My guess is that it's also a ViT.

Using Power Automate to Include an AI Builder Model

You can also incorporate an AI Builder model into your Copilot Studio copilot. You would need to create one of the computer vision AI Builder models in AI Hub and then pull it into Copilot Studio using one of the AI Builder actions in Power Automate. If you have very specific images that the gen AI models aren't recognizing well (the vision equivalent of needing RAG for text), you might benefit from adding the computer vision task this way, because you can fine-tune a custom model to recognize your specific images. These are the good old convolutional neural networks (CNNs), probably with some additional architecture that's proprietary to Microsoft.
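
For a sense of what a CNN computes, here is a minimal pure-Python sketch of a single convolutional layer's core operation: a small kernel slides over the image, the weighted sum at each position forms a feature map, and a ReLU zeroes out negative responses. The hand-picked vertical-edge kernel is only for illustration; in a trained CNN the kernel weights are learned, and training on your own labeled images is what adjusts them to your use case. This is the generic technique, not AI Builder's actual implementation.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2D convolution, stride 1, no padding. Like CNN layers in common
    frameworks, this is technically cross-correlation (the kernel is not flipped)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(feature_map: Matrix) -> Matrix:
    """Zero out negative responses, as CNNs typically do after each convolution."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Demo: a 4x4 image that is dark on the left and bright on the right.
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
vertical_edge = [[-1.0, 1.0],
                 [-1.0, 1.0]]  # hand-picked here; a trained CNN learns these weights
feature_map = relu(conv2d(image, vertical_edge))
print(feature_map)  # [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

The feature map lights up only in the column where dark meets bright. A real CNN stacks many such learned kernels so that later layers respond to more specific patterns in your images.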

I hope this is helpful! If it is, I'd really appreciate it if you would mark this answer as correct and give it a like!
