Copilot should currently be able to read the text in PDF documents without any issues. We just need to make sure that the PDF file is properly formatted.
Copilot currently cannot process images embedded in a PDF. However, Microsoft has announced that this feature is coming soon, so Copilot should soon be able to process embedded images inside PDFs as well.
Power Apps and Power Automate have limitations when it comes to extracting text from PDF files, especially when the PDF contains embedded images or non-selectable text.
Native Power Automate PDF actions do not directly support extracting text from PDF files that contain images or scanned documents.
If your PDFs contain embedded images or are scanned documents (that is, the text is part of the image), extracting text will not work unless Optical Character Recognition (OCR) is used.
Power Automate doesn’t have a built-in OCR feature, but you can integrate it with services like AI Builder or third-party OCR tools.
AI Builder (a part of the Microsoft Power Platform) can be used to extract text from PDFs, including handling image-based PDFs via OCR.
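As a rough illustration of the point above, here is a minimal Python sketch of how you might decide which pages need OCR before sending anything to an OCR service. It assumes you already have the text layer of each page as a string from whatever PDF extractor you use; the names `classify_pages` and `pages_needing_ocr` and the `min_chars` threshold are hypothetical choices for this example, not part of Power Automate or AI Builder.

```python
from dataclasses import dataclass


@dataclass
class PageDecision:
    page_number: int   # 1-based page index
    char_count: int    # non-whitespace characters found in the text layer
    needs_ocr: bool    # True when the page looks image-only


def classify_pages(page_texts, min_chars=20):
    """Flag pages whose extracted text layer is too small to be real text.

    page_texts: one string per page from your PDF text extractor
    (hypothetical input; an entry may be "" or None for scanned pages).
    min_chars: assumed threshold, tune it for your own documents.
    """
    decisions = []
    for number, text in enumerate(page_texts, start=1):
        count = sum(1 for ch in (text or "") if not ch.isspace())
        decisions.append(PageDecision(number, count, count < min_chars))
    return decisions


def pages_needing_ocr(page_texts, min_chars=20):
    """Return the 1-based page numbers that should be routed to OCR."""
    return [d.page_number
            for d in classify_pages(page_texts, min_chars)
            if d.needs_ocr]


if __name__ == "__main__":
    sample = [
        "Invoice 1042\nTotal due: 318.50 EUR, payable within 30 days.",
        "",          # scanned page: no text layer at all
        "  \n 3 ",   # page with only a stray page number
    ]
    print(pages_needing_ocr(sample))
```

Pages that come back from this check are the ones you would pass to AI Builder or a third-party OCR tool; pages with a usable text layer can skip OCR entirely.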
SharePoint and Power Automate may encounter issues with large PDF files or PDFs with highly complex formatting. Processing times may increase, or extraction may fail for large documents.
We have seen that the GPT model behind Copilot Studio was able to read images and company logos and perform OCR on high-quality images, but the OCR quality appears to have degraded since last week.
I recommend uploading documents with high-quality images and testing the current version.