Extract data from PDF with multiple pages to Excel

Hi,

 

I have a flow that extracts fields from a PDF into an Excel table. The only issue is that it extracts only the first page of the PDF. The files have varying numbers of pages, but every page has exactly the same format. Should it be extracting all pages by default? I am not sure whether this is an issue with the actions or with the model that was trained. I can't seem to upload screenshots, so here is an Imgur link below showing the flow and some of the actions. I trained the model on 25 individual pages. I tried training it on a document with multiple pages, but wasn't sure how to select multiple fields for the same label; it would not let me click fields on different pages for the same label. I don't know whether I need to retrain it somehow or whether an action needs to be added or changed. Thanks for your help!

 

https://imgur.com/a/iFlChMG

 

  • takolota1 (Moderator)

    This person had a similar issue with a field that repeats several times across the document. If the field can repeat only a few times, the advice there is to create a separate extraction field for each potential instance.

    https://powerusers.microsoft.com/t5/AI-Builder/repeated-field-on-the-same-page/m-p/1392532#M1058

    If you could have many instances, like the person in that thread, then you may want to use a different method: extracting the data with OCR and GPT prompts, which can return arrays of results for repeated fields: https://powerusers.microsoft.com/t5/Power-Automate-Cookbook/Extract-Data-From-PDFs-and-Images-With-GPT/td-p/2201345

  • CU27101555-0

    Thanks for the reply. Just to be clear, these PDFs are text, not images. Would I still want to use OCR for that? Isn't OCR meant for extracting text from images?

  • takolota1 (Moderator)

    @dw_22801 It would work on any PDF, text or scanned image.

  • CU27101555-0

    I've created the flow from the link you sent. The creator has posted updated versions of the flow since he made the video, so the flow is a little different from what the video shows. I was able to get it to work, but with the same issue again: it only outputs one page's worth of data to the Excel file. I've gone through the inputs and outputs of the run, and all three pages appear to be captured and passed along all the way up to the input of the "Create text with GPT" action. The output of that action, however, does not include all pages. Do you have any ideas?

    Thanks

     

    [Attached screenshots: Flow 1.JPG, Flow 2.JPG, Flow 3.JPG, Flow 4.JPG]

     

  • takolota1 (Moderator)

    @dw_22801 
    Have you adjusted the example JSON in the prompt to indicate that there are repeating fields and that they should be returned in an array? And have you noted in the prompt to GPT which fields may repeat?

  • CU27101555-0

    Here is the prompt I have. If I try adding anything to it and test it out, I get the message "Sorry, I can't respond to that request".

     

    From the OCR captured invoice document text provided between [Start of text] & [End of text] markers, extract data for each of the example JSON fields between [Start example JSON] & [End example JSON].
    Be aware, OCR text may contain errors like wrong characters or missing formatting.
    In the JSON output, replace [data] with each field's extracted data text using only string datatypes. Use a string value of "N/A" when the field's data text is not found in the document text provided.
    Convert all numbers to a standard 2-place decimal notation.
    Represented each data value as a string type.


    [Start example JSON]
    [
    {
    "Name:": [data],
    "Ticket:": [data],
    "Gross Standard Volume:": [data],
    "Net Standard Volume:": [data],
    "Meter Density:": [data],
    "Meter Temperature:": [data],
    "S&W%:": [data]
    }
    ]
    [End example JSON]


    [Start of text]
    Outputs txt output
    [End of text]


    Return only the final JSON object. Do not return any other output descriptions or explanations, only the JSON object.

  • takolota1 (Moderator)

    @dw_22801 

    If all those fields repeat, then try stating that there may be multiple instances of each set of values and that each instance should be added as another JSON object in the JSON array.

    And in the example try…

     

    [Start example JSON]
    [
    {
    "Name:": [data],
    "Ticket:": [data],
    "Gross Standard Volume:": [data],
    "Net Standard Volume:": [data],
    "Meter Density:": [data],
    "Meter Temperature:": [data],
    "S&W%:": [data]
    },
    {
    "Name:": [data],
    "Ticket:": [data],
    "Gross Standard Volume:": [data],
    "Net Standard Volume:": [data],
    "Meter Density:": [data],
    "Meter Temperature:": [data],
    "S&W%:": [data]
    }
    ]

    [End example JSON]

     

    That way you clearly indicate you want a JSON array response with multiple objects in the JSON array.
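To make the suggestion above concrete, here is a minimal Python sketch of how such a prompt could be assembled outside the flow. The field names come from the prompt posted earlier in this thread; the function name `build_prompt`, the `example_objects` parameter, and the exact wording of the "multiple instances" sentence are hypothetical, and nothing here has been tested against the "Create text with GPT" action.

```python
# Field names copied from the prompt posted earlier in this thread.
FIELDS = [
    "Name:",
    "Ticket:",
    "Gross Standard Volume:",
    "Net Standard Volume:",
    "Meter Density:",
    "Meter Temperature:",
    "S&W%:",
]


def build_prompt(ocr_text, fields=FIELDS, example_objects=2):
    """Assemble a prompt whose example JSON is an array of several objects.

    Hypothetical helper: the instruction wording is an assumption, not a
    known-good prompt.
    """
    # One example object with a [data] placeholder per field.
    one_object = "{\n" + ",\n".join(f'"{name}": [data]' for name in fields) + "\n}"
    # Repeat the object so the example clearly shows an array of records.
    example = "[\n" + ",\n".join([one_object] * example_objects) + "\n]"
    return "\n".join([
        "From the OCR captured document text provided between [Start of text] "
        "& [End of text] markers, extract data for each of the example JSON "
        "fields between [Start example JSON] & [End example JSON].",
        "The document may contain multiple instances of this set of fields, "
        "for example one per page. Add each instance as another JSON object "
        "in the JSON array.",
        "",
        "[Start example JSON]",
        example,
        "[End example JSON]",
        "",
        "[Start of text]",
        ocr_text,
        "[End of text]",
        "",
        "Return only the final JSON array.",
    ])


if __name__ == "__main__":
    print(build_prompt("...OCR text of all pages goes here..."))
```

In the flow itself the same text would simply be typed into the prompt; the sketch only shows which parts change when moving from a single-object example to an array example.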

  • CU27101555-0

    Almost there. The flow finished but didn't export to Excel. All the data showed up in the output of the Parse JSON action, but it didn't seem to fill in the table.

    [Attached screenshot: dw_22801_0-1714069805232.png]

     

  • CU27101555-0

    Something seems strange with the Excel file, though. Sometimes when I open it and then try to close it, it asks whether I want to merge with server changes. I don't know if something may not be syncing.

  • takolota1 (Moderator)

    @dw_22801 

    Did GPT add that outermost key of “data”: [ ]?

     

    I wonder if that additional property messed things up.
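If the response does sometimes arrive wrapped in an outer "data" property, one way to guard against it is to normalize the response to a plain list of records before writing rows. The Python sketch below illustrates the idea only; the function name `extract_records` and the `wrapper_key` parameter are hypothetical, and the thread does not confirm that the wrapper was the actual cause of the empty table.

```python
import json


def extract_records(gpt_output, wrapper_key="data"):
    """Return a list of record dicts from a GPT JSON response.

    Accepts either a bare JSON array, an object that wraps the array under
    `wrapper_key` (assumed here to be "data"), or a single bare object.
    """
    parsed = json.loads(gpt_output)

    if isinstance(parsed, dict):
        wrapped = parsed.get(wrapper_key)
        if isinstance(wrapped, list):
            # Shape: {"data": [ {...}, {...} ]} -> unwrap the array.
            parsed = wrapped
        else:
            # Shape: a single record object -> treat it as a one-item array.
            parsed = [parsed]

    if not isinstance(parsed, list):
        raise ValueError("Expected a JSON array or object in the GPT output")

    # Keep only the items that are objects, i.e. usable as table rows.
    return [item for item in parsed if isinstance(item, dict)]


if __name__ == "__main__":
    bare = '[{"Ticket:": "1001"}, {"Ticket:": "1002"}]'
    wrapped = '{"data": [{"Ticket:": "1001"}, {"Ticket:": "1002"}]}'
    # Both shapes yield the same list of two records.
    print(extract_records(bare) == extract_records(wrapped))
```

Inside the flow, the equivalent step would be to check which shape the Parse JSON output actually has and point the loop that adds Excel rows at the array itself rather than at the wrapping object.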
