Power Automate: Extract text from PDF in a cloud flow

Posted by lexxx

Hello everyone,

I have been building a custom model in AI Builder to use in a Power Automate cloud flow.

The PDF files I will be extracting data from are generated PDFs, not scans.

Power Automate Desktop has an "Extract text from PDF" action.

The functionality it offers is more than enough for my application.

My question is: is this feature available in Power Automate cloud flows, or will it be?

Currently I have to use the AI Builder solution for automated cloud flows, with all the extra manual tagging and model training that it requires. Third-party connectors are not, and will not be, allowed by our DLP policy, so sadly that is a no-go.

Thank you in advance.

Lex

Categories:

AI Builder

Reply by antoinec

Hi Lex, that's a great question. Have you tried using "Text Recognition" in AI Builder? It provides a prebuilt model that extracts text from any type of PDF document. You can learn more about the feature here: Recognize text with AI Builder - Learn | Microsoft Docs. Do let us know if this works for you, or if I misunderstood your question.

1 person found this reply helpful.

Reply by lexxx

No, I have not tried it yet. Based on the descriptions on the AI Builder Explore page, extracting information from documents seemed the most appropriate option.

Reply by antoinec

It will extract all the text in the document, but it won't extract structure such as tables or checkbox states. Would plain text still be valuable to you, or would you get more value from a trained model? Put another way, I'm curious what you do with the data once you've extracted it.

Reply by lexxx

When I tried this in Power Automate Desktop, the extracted text, including tables and checkboxes (to be precise, the X marking a checked box), was fairly consistent. It would be fairly trivial for me to write a script, or even a flow, that searches for keywords and parses the text.

The extracted data would be saved to a Dataverse table.
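
A minimal Python sketch of the keyword parsing described above, for illustration only. It assumes, hypothetically, that the extracted text contains lines such as `Invoice number: 12345` and checkbox lines where an `X` inside brackets marks a checked box (`[X] Approved`, `[ ] Rejected`). These labels and this layout are invented for the example, not taken from the actual PDFs, so the patterns would need adjusting to the real documents.

```python
import re

# Hypothetical "Label: value" line; adjust to the real document layout.
FIELD_RE = re.compile(r"^\s*(?P<label>[^:\n]+?)\s*:\s*(?P<value>.*?)\s*$")
# Hypothetical checkbox line: "[X] Option" (checked) or "[ ] Option" (unchecked).
CHECKBOX_RE = re.compile(r"^\s*\[(?P<mark>[ xX]?)\]\s*(?P<option>.+?)\s*$")


def parse_extracted_text(text, keywords):
    """Parse plain text extracted from a PDF into (fields, checkboxes).

    fields: maps each requested keyword (as spelled by the caller) to the
            value that follows it on the same line.
    checkboxes: maps each checkbox option to True if it is marked with X.
    """
    wanted = {k.lower(): k for k in keywords}  # case-insensitive keyword lookup
    fields = {}
    checkboxes = {}
    for line in text.splitlines():
        box = CHECKBOX_RE.match(line)
        if box:
            checkboxes[box.group("option")] = box.group("mark").lower() == "x"
            continue
        field = FIELD_RE.match(line)
        if field and field.group("label").lower() in wanted:
            fields[wanted[field.group("label").lower()]] = field.group("value")
    return fields, checkboxes
```

The resulting dictionaries could then be mapped onto the columns of the Dataverse table; that step is not shown here.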

Thanks for your time.

Lex

Reply by antoinec

Understood. It'd be worth logging that as an idea on https://aka.ms/aibuilder-ideas to help us prioritize as an improvement for future changes. Right now, unfortunately, the only way to get table and checkbox data is with a custom trained model :'(

Reply by antoinec

Thinking about it a bit more: if you are able to provision Azure resources, you could also try the Layout model in Form Recognizer. It's the foundational service we use for document processing in AI Builder, and Layout should provide the data you're looking for.

Layouts - Form Recognizer - Azure Applied AI Services | Microsoft Docs
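
For illustration, a minimal Python sketch of reading tables and selection marks (checkboxes) out of a Layout result that has already been retrieved and parsed from JSON (for example with `json.loads`). It assumes the v2.1 Layout response shape: `analyzeResult.pageResults[].tables[]` with `rows`, `columns` and `cells` (each cell carrying `rowIndex`, `columnIndex` and `text`), and `analyzeResult.readResults[].selectionMarks[]` with a `state` of `selected` or `unselected`. Other API versions use different field names, so check the Layout documentation before relying on this. Submitting the document to the service is not shown.

```python
def tables_from_layout(result):
    """Turn each table in a v2.1-style Layout result into a list of rows.

    Assumes result["analyzeResult"]["pageResults"][i]["tables"][j] has
    "rows", "columns" and "cells". Cells that span several rows or columns
    are only written at their top-left position.
    """
    tables = []
    for page in result["analyzeResult"].get("pageResults", []):
        for table in page.get("tables", []):
            # Start with an empty grid of the declared size.
            grid = [["" for _ in range(table["columns"])] for _ in range(table["rows"])]
            for cell in table["cells"]:
                grid[cell["rowIndex"]][cell["columnIndex"]] = cell["text"]
            tables.append(grid)
    return tables


def selection_marks_from_layout(result):
    """Return (page number, is_selected) for every selection mark found."""
    marks = []
    for page in result["analyzeResult"].get("readResults", []):
        for mark in page.get("selectionMarks", []):
            marks.append((page["page"], mark["state"] == "selected"))
    return marks
```

Reading tables into plain row lists keeps the parsing independent of how the values are later written to Dataverse.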

Reply by IWantToKnow

How about using an Adobe connector? You can get the content as JSON and parse it to extract the data you need.
