
Multi-page table issue in document processing

Posted by Charanjit

I need to extract information in tabular format from order confirmation PDFs received from suppliers. Each PDF lists multiple items, and each item has a code, description, quantity and delivery date. So the table will have four columns (Code, Description, Quantity, Delivery Date), with each row representing one item.
The problem arises when some details for an item are present at the bottom of one page and the remaining details are on the next page. 

For example, suppose this is the PDF:
-----some text--------------------------------------------

-----some text---------------------------------------------

code: 101       

description: this is first item

quantity: 56     

delivery date: 22.08.2023

 

code: 102       

description: this is second item

quantity: 65     

delivery date: 23.08.2023

 

code: 103      

description: this is third item

 

-------page 1 ends here---------

 

 

-------page 2 begins here--------

 

quantity: 72    

delivery date: 24.08.2023

 

code: 104       

description: this is fourth item

quantity: 80  

delivery date: 23.08.2023

 

code: 105       

description: this is fifth item

quantity: 60     

delivery date: 21.08.2023

 

---------some text here--------------------------------

------------------------------page 2 ends----------------------

------------------------------pdf ends----------------------------

 

 

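The document cannot be tagged correctly and the accuracy of the model is pretty bad. For the above document, the tagged tables look like this:

| Code | Description         | Quantity | Delivery Date |
|------|---------------------|----------|---------------|
| 101  | this is first item  | 56       | 22.08.2023    |
| 102  | this is second item | 65       | 23.08.2023    |
| 103  | this is third item  |          |               |

| Code | Description         | Quantity | Delivery Date |
|------|---------------------|----------|---------------|
|      |                     | 72       | 24.08.2023    |
| 104  | this is fourth item | 60       | 21.08.2023    |
| 105  | this is fifth item  | 80       | 23.08.2023    |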

Please let me know if there is a solution for this. I have already tried the following:

- not tagging these rows in multipage documents

- tagging only one-page documents for training and then using multi-page ones during testing

- tagging the tables as shown above

Every time, the accuracy is bad. I also increased the number of documents used for training, but to no avail.

@antoinec @plarrue @Antrod @JoeF-MSFT @CedrickB 

  • plarrue (Moderator)

    Hi @Charanjit,

     

    Thanks for reaching out.

    "The problem arises when some details for an item are present at the bottom of one page and the remaining details are on the next page." Did you select "This table continues on next page" and continue tagging the table on the following page?

     

    Thanks,

    Regards

  • Charanjit

    Hi @plarrue,
    Yes. After tagging the first page, I got this table:

     

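    | Code | Description         | Quantity | Delivery Date |
    |------|---------------------|----------|---------------|
    | 101  | this is first item  | 56       | 22.08.2023    |
    | 102  | this is second item | 65       | 23.08.2023    |
    | 103  | this is third item  |          |               |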

     

    I then selected "This table continues on next page", tagged the content on the second page and got the table below:

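    | Code | Description         | Quantity | Delivery Date |
    |------|---------------------|----------|---------------|
    |      |                     | 72       | 24.08.2023    |
    | 104  | this is fourth item | 60       | 21.08.2023    |
    | 105  | this is fifth item  | 80       | 23.08.2023    |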


    The problem is with the third item: its code and description are on the first page, while its quantity and delivery date are on the second page.

  • plarrue (Moderator)

    Ah, sorry, I didn't notice row 103. Rows that break from one page to another are not supported.

    It's a bit challenging. Perhaps you could consider consolidating the extracted data in a post-processing step?

     

    Regards,
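
The consolidation step suggested above could look roughly like the following Python sketch. It is not an AI Builder feature: it assumes the extracted tables have already been exported as one list of rows per page, and the names `COLUMNS`, `complementary` and `merge_split_rows` are hypothetical. It only joins the last row of one page with the first row of the next page when the two fragments fill disjoint sets of columns, so it can help only if the model extracts the fragments themselves correctly.

```python
from typing import Dict, List, Optional

# Column names taken from the tables in this thread.
COLUMNS = ["Code", "Description", "Quantity", "Delivery Date"]
Row = Dict[str, Optional[str]]


def is_blank(value: Optional[str]) -> bool:
    """Treat None, empty strings and whitespace-only strings as missing."""
    return value is None or not value.strip()


def complementary(a: Row, b: Row) -> bool:
    """True if both rows have some data and no column is filled in both."""
    a_filled = {c for c in COLUMNS if not is_blank(a.get(c))}
    b_filled = {c for c in COLUMNS if not is_blank(b.get(c))}
    return bool(a_filled) and bool(b_filled) and not (a_filled & b_filled)


def merge_split_rows(pages: List[List[Row]]) -> List[Row]:
    """Concatenate per-page tables, joining a row split across a page break."""
    merged: List[Row] = []
    for page in pages:
        for index, row in enumerate(page):
            # Only the first row of a page can continue the previous page.
            if index == 0 and merged and complementary(merged[-1], row):
                for column in COLUMNS:
                    if not is_blank(row.get(column)):
                        merged[-1][column] = row[column]
            else:
                merged.append(dict(row))  # copy so the input is not mutated
    return merged


# Example input mirroring the two tagged tables (rows 102 to 104 only).
pages = [
    [
        {"Code": "102", "Description": "this is second item",
         "Quantity": "65", "Delivery Date": "23.08.2023"},
        {"Code": "103", "Description": "this is third item",
         "Quantity": "", "Delivery Date": ""},
    ],
    [
        {"Code": "", "Description": "",
         "Quantity": "72", "Delivery Date": "24.08.2023"},
        {"Code": "104", "Description": "this is fourth item",
         "Quantity": "60", "Delivery Date": "21.08.2023"},
    ],
]
result = merge_split_rows(pages)
```

With this input, `result` has three rows, and the row for code 103 carries quantity 72 and delivery date 24.08.2023. Rows that are merely incomplete for other reasons are left untouched unless they sit on a page boundary and complement each other.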

  • Charanjit

    I would have considered a post-processing step if the model had accurately predicted all the other rows that do not break. But because of the break, the accuracy drops drastically and the model starts making mistakes even for rows that have all their details on the same page.

    We were thinking of rolling out the Power Platform across our organisation in all countries, but unfortunately, if this is the case, it doesn't make sense to do so.

    We are better off parsing the whole documents with Python and writing our own logic to extract the relevant information. Thanks.

  • plarrue (Moderator)

    Maybe AI Builder "Create text with GPT" can help with the post-processing? 🙂
    Give the whole table content to GPT and have it break it down into separate rows.

  • RamKan2021

    I am also working on the same requirement. Has anyone found a solution for this?

  • Charanjit

    Unfortunately, nothing worked with AI Builder. I tried some post-processing, but it was a huge effort for mediocre results, and the amount of effort made no sense.
    Furthermore, the model accuracy was drastically lower on new PDFs in which some details for a row were split over two pages.
    I am not using AI Builder anymore. I simply wrote some Python code that captures the incoming emails, downloads the PDF attachments to a folder, reads and extracts the relevant information from the PDFs and writes that information to an Excel file. The only drawback is that I had to write one Python script per document type/layout.
    AI Builder is good only for one-page documents, or for multi-page documents whose format is very clean.
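
As an illustration only (not the script described above), the extraction step for the example layout in the original post might be sketched as follows. It assumes the text of each page has already been pulled out of the PDF by some library, which is not shown; `parse_items` and `to_csv` are hypothetical names, and CSV output from the standard library stands in for the Excel file. Because fields are read in order across page boundaries, an item split over two pages ends up as one row.

```python
import csv
import io
import re
from typing import Dict, List

FIELDS = ["code", "description", "quantity", "delivery date"]

# Matches lines such as "quantity: 72" and ignores everything else.
LINE_RE = re.compile(
    r"^\s*(code|description|quantity|delivery date)\s*:\s*(.*?)\s*$",
    re.IGNORECASE,
)


def parse_items(page_texts: List[str]) -> List[Dict[str, str]]:
    """Read 'key: value' lines from all pages; each 'code' starts a new item."""
    items: List[Dict[str, str]] = []
    current = None
    for text in page_texts:
        for line in text.splitlines():
            match = LINE_RE.match(line)
            if not match:
                continue  # separators, headers and other free text
            key, value = match.group(1).lower(), match.group(2)
            if key == "code":
                current = dict.fromkeys(FIELDS, "")
                items.append(current)
            if current is not None:
                current[key] = value
    return items


def to_csv(items: List[Dict[str, str]]) -> str:
    """Serialise the items as CSV text (stand-in for writing an Excel file)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


# Page texts shaped like the example in the original post.
page1 = "code: 103      \n\ndescription: this is third item\n\n-------page 1 ends here---------\n"
page2 = (
    "-------page 2 begins here--------\n\n"
    "quantity: 72    \n\ndelivery date: 24.08.2023\n\n"
    "code: 104       \n\ndescription: this is fourth item\n\n"
    "quantity: 80  \n\ndelivery date: 23.08.2023\n"
)
items = parse_items([page1, page2])
csv_text = to_csv(items)
```

Here `items` contains two rows, with item 103 complete even though its fields straddle the page break. A rule-based parser like this is tied to one layout, which matches the one-script-per-layout limitation mentioned above.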

  • CedrickB (Moderator)

    Text spanning across pages is unfortunately not yet supported.
    Post-processing could be a way, but it would be difficult to get reliable behavior in all cases.

    The only option is to handle this with a manual edit after extraction, which makes the process semi-automated.
