Is there a way to extract the text layer of a PDF?

At the moment I use the AI function "Recognize text in an image or a PDF document" to extract the PDF text content. The function works regardless of whether a text layer is present, and it is quite easy to use 'filter array' to find the text (and the text position). I guess the function uses OCR no matter what? I ask because the OCR result sometimes differs from the text layer, especially when the PDF quality is low.

The ideal flow would be to extract the text layer if one is available, and fall back to OCR only when needed.
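
To make the idea concrete, here is a rough sketch of that fallback logic in Python. It is not a Power Automate action and not how the AI function works internally; `read_text_layer` and `run_ocr` are hypothetical placeholders for whatever text-layer extractor and OCR step end up being used, and the `min_chars` threshold of 20 is an arbitrary assumption for deciding whether a text layer is "really there".

```python
from typing import Callable, Optional, Tuple


def extract_pdf_text(
    pdf_bytes: bytes,
    read_text_layer: Callable[[bytes], Optional[str]],  # hypothetical extractor
    run_ocr: Callable[[bytes], str],                     # hypothetical OCR step
    min_chars: int = 20,                                 # assumed threshold
) -> Tuple[str, str]:
    """Return (text, source), where source is 'text_layer' or 'ocr'."""
    try:
        # Scanned PDFs often have no text layer, so None or "" is expected here.
        layer = read_text_layer(pdf_bytes) or ""
    except Exception:
        # Treat any extractor failure as "no text layer" and fall through to OCR.
        layer = ""

    # Use the text layer only if it contains a meaningful amount of text.
    if len(layer.strip()) >= min_chars:
        return layer, "text_layer"

    # Otherwise fall back to OCR.
    return run_ocr(pdf_bytes), "ocr"
```

Returning the source alongside the text makes it possible to see, per document, which path was taken, which helps when comparing the text layer against the OCR result on low-quality files.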

Categories:

Building flows