Power Automate
Answered

Data extraction from Scanned PDF document

Posted by dc23 (62)

I have scanned invoices that were converted to PDF format. I want to extract data from the invoices and store it in an Excel sheet.

I have tried AI Builder, but not all of the data that needs to be extracted is being analysed.

 

Please provide some suggestions.

 

Thanks!

  • abm (32,985, Most Valuable Professional)

    Hi @dc23 

     

    Did you look into this?

     

    https://powerusers.microsoft.com/t5/Power-Automate-Community-Blog/Extract-data-from-documents-with-Microsoft-Flow/ba-p/370422

     

    @Jay-Encodian could help you with this.

     

    Thanks

  • dc23 (62)

    Yes, I have gone through the solution provided. It worked when I had to pick up a single element from the invoice. However, when the invoice has multiple rows for Quantity, Item, etc., I'm not able to get the desired result.

     

    Thanks!

  • abm (32,985, Most Valuable Professional)

    Hi @dc23 

     

    Thanks for your quick reply. That might be a product limitation. @Jay-Encodian could clarify more on this.

     

    Thanks

  • Jay-Encodian (2,920)

    Thanks @abm 

    Hey @dc23 

    The 'Extract Text from Regions' action is designed to extract text from pre-defined regions. If those regions are dynamic, it is more difficult. You have a few options to consider:

    1. Persist with the AI Builder approach
    2. Use the Encodian action but add regions where data might exist and then handle null values in your Flow
    3. Use the Encodian 'Get PDF Text Layer' action combined with our 'Search Text - Regex' (Preview) action

    I'd recommend you go with option #1. It is the right tool for this scenario, especially where you have multiple differing invoice layouts. It's absolutely possible with the Encodian action, but that is a more basic tool (by design) and you will have to do more work up front to get it to work.
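
    To give a feel for the up-front work in option #3, here is a rough Python sketch of the idea: take the text layer of the PDF, then run a regular expression over it to pull out every line-item row. It is not the Encodian actions themselves. The sample text, the three-column layout (item, quantity, price) and the function names are all assumptions made up for illustration, and a real invoice would need its own pattern.

```python
import re

# Hypothetical text layer, standing in for the output of a text-extraction step.
SAMPLE_TEXT = """\
Invoice No: INV-1001
Item            Qty   Unit Price
Widget A        2     10.00
Widget B        10    3.50
Total: 55.00
"""

# Assumed row layout: description, 2+ spaces, integer quantity, decimal price.
LINE_ITEM = re.compile(
    r"^(?P<item>\S.*?)[ \t]{2,}(?P<qty>\d+)[ \t]+(?P<price>\d+\.\d{2})[ \t]*$",
    re.MULTILINE,
)


def extract_line_items(text):
    """Return one dict per row that matches the assumed layout."""
    return [
        {
            "item": m.group("item"),
            "qty": int(m.group("qty")),
            "price": float(m.group("price")),
        }
        for m in LINE_ITEM.finditer(text)
    ]


def extract_invoice_number(text):
    """Return the invoice number, or None when the label is missing."""
    m = re.search(r"Invoice No:\s*(\S+)", text)
    return m.group(1) if m else None


if __name__ == "__main__":
    print(extract_invoice_number(SAMPLE_TEXT))
    for row in extract_line_items(SAMPLE_TEXT):
        print(row)
```

    Because the pattern matches every row rather than a fixed region, the number of line items can vary from invoice to invoice. Returning None for a missing invoice number mirrors the "handle null values" step from option #2.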

    If you want to try the Encodian approach let me know and I'll guide you through.

    HTH

    Jay

  • dc23 (62)

    Hi Jay,

     

    Thanks for the response. Before moving on to Encodian, I did try the AI Builder approach. The issue I encountered was that the data was fetched for some of the invoices, but for others with the same format the data was not analysed.

     

    Also, I'm not sure whether AI Builder allows us to select the data of our choice, rather than just suggesting the fields we can select from.

    Do you have any suggestions for this?

     

    Thanks!

     

  • Verified answer
    Jay-Encodian (2,920)

    Hi @dc23 

    In my experience, you need to provide enough sample documents that are very similar in layout but contain different data, so that the fields are recognized for selection. Ref: https://docs.microsoft.com/en-us/ai-builder/form-processing-sample-data

    I believe you need the AI model to detect the fields; you can't just select them.

    HTH

    Jay

     

  • CFernandes (8,482, Most Valuable Professional)

    You can use the Muhimbi PDF Converter Power Automate action to extract data from a scanned PDF document.

     

    Muhimbi PDF Converter supports a number of OCR (Optical Character Recognition) facilities, including the ability to make image-based PDFs (scans, faxes) fully searchable and indexable. In addition, it supports a way to extract this text so that information such as invoice numbers, purchase order numbers or other identifiable information can be extracted.

     

    You can find details.

     

     

    I hope this helps.

  • takolota1 (4,980, Moderator)

    If anyone wants to extract data from a PDF or image without training a model for select documents, try this new GPT data extraction method: https://powerusers.microsoft.com/t5/Power-Automate-Cookbook/Extract-Data-From-PDFs-and-Images-With-GPT/td-p/2201345

     

    It doesn’t require specifying certain document areas, wordings, styles, etc. It just OCRs the file, converts it to a plain-text replica (.txt), and passes that to a GPT prompt where you can ask GPT to do whatever you want with the document data.
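
    As a rough illustration of the last step (not the linked flow itself), the Python sketch below builds a prompt from OCR text and parses a JSON reply. It deliberately makes no API call: the prompt wording, the field names and both function names are assumptions, and the reply string would come from whichever GPT connector or service you use.

```python
import json


def build_prompt(ocr_text, fields):
    """Build an extraction prompt; the wording is an assumption, not a fixed recipe."""
    field_list = ", ".join(fields)
    return (
        "Extract the following fields from the document text below and "
        f"reply with a single JSON object using exactly these keys: {field_list}. "
        "Use null for any field that is not present.\n\n"
        f"Document text:\n{ocr_text}"
    )


def parse_reply(reply, fields):
    """Pull the JSON object out of the model reply; missing fields become None."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        return {name: None for name in fields}
    try:
        data = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return {name: None for name in fields}
    # Keep only the requested keys so unexpected extras are dropped.
    return {name: data.get(name) for name in fields}


if __name__ == "__main__":
    wanted = ["invoice_number", "total", "vendor"]
    print(build_prompt("Invoice No: INV-1001\nTotal: 55.00", wanted))
    # Hypothetical model reply, hard-coded for the example.
    print(parse_reply('Here you go: {"invoice_number": "INV-1001", "total": 55.0}', wanted))
```

    Asking for a fixed set of JSON keys makes the reply easy to map to Excel columns, and treating an unparseable reply as all-null keeps the flow from failing on one bad document.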
