Hi there,
You didn't mention how you're adding image recognition to your model. The answer depends on which model, if any, you're adding to Copilot Studio to handle the computer vision task. There are several ways to do this, and the answer to your question is different for each one.
Uploading Images to Copilot Studio (for example as a knowledge source)
If you're using Copilot Studio's built-in image recognition (for example, by uploading images as a knowledge source), you're right: I can't find anything in the documentation that says what kind of model is used for the image recognition task, either. My suspicion is that it's a Vision Transformer (ViT), or a similar architecture, attached to the front end of the Transformer model they use for the LLM. You can look that architecture up, but basically it converts an image into vectors that the model can use the same way it uses text vectors to generate predictions. These vision models are trained on a giant corpus of images, just as LLMs are trained on a giant corpus of text. If you have a general use case (i.e., your images show the kinds of things a model that had seen all the images on the internet would recognize), the model should be good at recognizing your images out of the box. There isn't a low-code / no-code way that I'm aware of to retrain or fine-tune these models. You could do it in a full-code way, but it would be very, very expensive: you're talking about similar amounts of data and compute time as you would need to retrain or fine-tune an LLM.
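To make the ViT idea a bit more concrete, here is a minimal pure-Python sketch of its first step, assuming the vision front end works like a standard ViT (which, again, isn't documented for Copilot Studio): the image is cut into fixed-size patches, each patch is flattened, and a linear projection turns each patch into a vector, or "token", that a Transformer can consume. The function names and the random weights are hypothetical and for illustration only. A real ViT learns the projection during pre-training and also adds position embeddings before the Transformer layers, which this sketch leaves out.

```python
import random


def image_to_patches(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened, non-overlapping patches."""
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            flat = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    flat.extend(image[row][col])  # append every channel of this pixel
            patches.append(flat)
    return patches


def embed_patches(patches, weights):
    """Linearly project each flattened patch: weights is embed_dim rows of patch_dim values."""
    return [
        [sum(p * w for p, w in zip(patch, weight_row)) for weight_row in weights]
        for patch in patches
    ]


random.seed(0)
# Toy 4x4 RGB image with pixel values in [0, 1]
image = [[[random.random() for _ in range(3)] for _ in range(4)] for _ in range(4)]
patch_size = 2
patch_dim = patch_size * patch_size * 3  # 12 values per patch
embed_dim = 8
# Hypothetical projection weights; a real model learns these during pre-training
weights = [[random.gauss(0, 0.02) for _ in range(patch_dim)] for _ in range(embed_dim)]
tokens = embed_patches(image_to_patches(image, patch_size), weights)
print(len(tokens), len(tokens[0]))  # 4 8
```

The resulting list of vectors plays the same role for the image that token embeddings play for text, which is why the same Transformer machinery can work on both.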
Using Power Automate to Add a Custom Gen AI Prompt
The same thing applies if you're using Power Automate to incorporate a Gen AI prompt. (You can find these in the AI Hub by clicking on prebuilt prompts, or you can create your own generic prompt and pull it into Power Automate.) If you use one of these and pull it into your model through Power Automate, everything above still applies, and my guess is that the underlying vision model is probably also a ViT.
Using Power Automate to Include an AI Builder Model
You can also incorporate an AI Builder model into your Copilot Studio model through Power Automate. You would need to create one of the computer vision AI Builder models in the AI Hub and then pull it into Copilot Studio using one of the AI Builder actions in Power Automate. If you have very specific images that the Gen AI models aren't recognizing well (the vision equivalent of needing RAG for text), then you might benefit from adding the computer vision task this way, because you get access to fine-tuning a custom model to recognize your specific images. These are the good old Convolutional Neural Networks (CNNs), probably with some additional architecture that's proprietary to Microsoft.
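For comparison, here is a minimal pure-Python sketch of the operation at the heart of a CNN: sliding a small kernel over the image, taking a weighted sum at each position, and passing the result through a ReLU. This is a generic illustration, not AI Builder's implementation (which isn't documented); the function names and the hand-picked edge-detecting kernel are hypothetical. In a trained CNN, the kernel values are learned from labeled examples rather than picked by hand.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation (no padding, stride 1), the core op of a CNN layer."""
    k_h = len(kernel)
    k_w = len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            # Weighted sum of the image window under the kernel
            for di in range(k_h):
                for dj in range(k_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


def relu(feature_map):
    """Zero out negative responses, as a CNN activation layer does."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Toy grayscale image: dark on the left, bright on the right
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked vertical-edge kernel (a trained CNN would learn its kernels)
kernel = [[-1, 1], [-1, 1]]
result = relu(conv2d(image, kernel))
print(result)  # [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```

The strong response in the middle column marks where the dark-to-bright edge sits. Stacking many such learned kernels is what lets a custom CNN pick up on the specific visual features in your own images.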
I hope this is helpful! If it is, I'd really appreciate it if you would mark this answer as correct and give it a like!