Power Apps - AI Builder
Unanswered

Multi-page table issue in document processing

Posted on 22 Aug 2023 08:44:06 by 8

I need to extract information in tabular format from order confirmation PDFs received from suppliers. Each PDF has multiple items, and each item has a code, description, quantity and delivery date. So the table has four columns (Code, Description, Quantity, Delivery Date), with each row representing an item.
The problem arises when some details for an item are at the bottom of one page and the remaining details are on the next page.

For example, suppose this is the PDF:
-----some text--------------------------------------------

-----some text---------------------------------------------

code: 101       

description: this is first item

quantity: 56     

delivery date: 22.08.2023

 

code: 102       

description: this is second item

quantity: 65     

delivery date: 23.08.2023

 

code: 103      

description: this is third item

 

-------page 1 ends here---------

 

 

-------page 2 begins here--------

 

quantity: 72    

delivery date: 24.08.2023

 

code: 104       

description: this is fourth item

quantity: 80  

delivery date: 23.08.2023

 

code: 105       

description: this is fifth item

quantity: 60     

delivery date: 21.08.2023

 

---------some text here--------------------------------

------------------------------page 2 ends----------------------

------------------------------pdf ends----------------------------

 

 

The document cannot be tagged correctly, and the accuracy of the model is poor. For the above document, the tagged tables look like this:

Code | Description         | Quantity | Delivery Date
101  | this is first item  | 56       | 22.08.2023
102  | this is second item | 65       | 23.08.2023
103  | this is third item  |          |

Code | Description         | Quantity | Delivery Date
     |                     | 72       | 24.08.2023
104  | this is fourth item | 60       | 21.08.2023
105  | this is fifth item  | 80       | 23.08.2023

Let me know if there is any solution for this. I have already tried the following:

- not tagging these rows in multi-page documents

- tagging only one-page documents for training and then using multi-page ones during testing

- tagging the tables as shown above

Every time, the accuracy is bad. I also increased the number of training documents, but it did not help.

@antoinec @plarrue @Antrod @JoeF-MSFT @CedrickB 

  • CedrickB
    Moderator on 20 Mar 2024 at 12:30:05
    Re: Multi-page table issue in document processing

    Text spanning across pages is unfortunately not yet supported.
    Post-processing could be a way, but it would be difficult to get reliable behavior in all cases.

    The only option is to handle this with manual editing after extraction, which makes the process semi-automated.
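
To keep the manual step small, one option is to flag only the rows that look incomplete instead of reviewing every table. The following is a minimal sketch, not an AI Builder feature: it assumes the extracted rows are already available as dictionaries keyed by the four column names from the question, that quantities are whole numbers, and that dates use the DD.MM.YYYY format from the sample. The function names are hypothetical.

```python
from datetime import datetime
from typing import Dict, List

# Column names taken from the question; adjust to the real model output.
COLUMNS = ["Code", "Description", "Quantity", "Delivery Date"]


def review_reasons(row: Dict[str, str]) -> List[str]:
    """Return the reasons why a single extracted row needs a manual check."""
    reasons = []
    for col in COLUMNS:
        if not (row.get(col) or "").strip():
            reasons.append(f"missing {col}")
    qty = (row.get("Quantity") or "").strip()
    if qty and not qty.isdigit():
        # Assumption: quantities are whole numbers, as in the sample PDF.
        reasons.append("Quantity is not a whole number")
    date = (row.get("Delivery Date") or "").strip()
    if date:
        try:
            # Assumption: dates are written as DD.MM.YYYY, as in the sample PDF.
            datetime.strptime(date, "%d.%m.%Y")
        except ValueError:
            reasons.append("Delivery Date is not DD.MM.YYYY")
    return reasons


def rows_needing_review(rows: List[Dict[str, str]]) -> Dict[int, List[str]]:
    """Map the index of each suspicious row to its list of reasons."""
    flagged = {}
    for index, row in enumerate(rows):
        reasons = review_reasons(row)
        if reasons:
            flagged[index] = reasons
    return flagged
```

With the first-page table from the question, only the row for item 103 would be flagged, because its quantity and delivery date are empty.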

  • Charanjit
    8 on 19 Mar 2024 at 09:46:38
    Re: Multi-page table issue in document processing

    Unfortunately, nothing worked with AI Builder. I tried some post-processing, but it was a huge effort for not-so-good results, and the amount of effort I had to put in made no sense.
    Furthermore, the model accuracy was drastically lower on new PDFs where some details for a row were split over two pages.
    I am not using AI Builder anymore. I simply wrote some Python code that captures the incoming emails, downloads the PDF attachments to a folder, reads and extracts the relevant information from the PDFs, and writes that information to an Excel file. The only drawback was that I had to write one Python script per document type/layout.
    AI Builder is good only for one-page documents, or for multi-page documents where the format is very clean.
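
The script itself was not shared in this thread, so the following is only a hypothetical sketch of the parsing idea for the sample layout in the question. It assumes the text of each page has already been extracted into a string by a PDF library of your choice (not shown), and that every item starts with a "code:" line. Because all pages are read as one continuous stream of "label: value" lines, an item that breaks across pages is handled like any other.

```python
import re
from typing import Dict, List

# Matches the labelled lines from the sample layout, e.g. "quantity: 56".
LINE_RE = re.compile(
    r"^\s*(code|description|quantity|delivery date)\s*:\s*(.*?)\s*$",
    re.IGNORECASE,
)


def parse_items(page_texts: List[str]) -> List[Dict[str, str]]:
    """Parse 'label: value' lines from all pages as one continuous stream."""
    items: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for text in page_texts:
        for line in text.splitlines():
            match = LINE_RE.match(line)
            if not match:
                continue  # skip free text, blank lines and page headers/footers
            label, value = match.group(1).lower(), match.group(2)
            if label == "code" and current:
                items.append(current)  # a new code line starts a new item
                current = {}
            current[label] = value
    if current:
        items.append(current)
    return items
```

As noted above, this kind of parser is tied to one layout; a supplier with a different format would need its own pattern.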

  • RamKan2021
    102 on 18 Mar 2024 at 18:28:01
    Re: Multi-page table issue in document processing

    I am also working on the same requirement. Has any solution been found for this?

  • plarrue
    Moderator on 22 Aug 2023 at 13:31:41
    Re: Multi-page table issue in document processing

    Maybe AI Builder's "Create text with GPT" can help with the post-processing? 🙂
    Give the whole table content to GPT and have it break it down into separate rows.

  • Charanjit
    8 on 22 Aug 2023 at 12:45:46
    Re: Multi-page table issue in document processing

    I could have considered a post-processing step if the model had accurately predicted all the other rows that do not break. But because of the break, the accuracy drops drastically and the model starts to make mistakes even on rows that have all their details on the same page.

    We were thinking of extending the Power Platform across our organisation in all countries, but if this is the case, it unfortunately doesn't make sense to do so.

    We are better off parsing the whole document with Python and writing our own logic to extract the relevant information. Thanks.

  • plarrue
    Moderator on 22 Aug 2023 at 12:04:40
    Re: Multi-page table issue in document processing

    Ah, sorry, I didn't notice the row for item 103. Rows that break from one page to another are not supported.

    It's a bit challenging. Perhaps you could consider consolidating this extracted data during a post-processing step?
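
A minimal sketch of such a consolidation step, assuming the per-page tables have already been read out of the model result as lists of dictionaries keyed by the four column names (the function names and data shape are hypothetical, not an AI Builder API). It stitches the last row of one page to the first row of the next only when the two halves are exactly complementary, that is, every column is filled in one half and empty in the other.

```python
from typing import Dict, List

COLUMNS = ["Code", "Description", "Quantity", "Delivery Date"]
Row = Dict[str, str]


def is_blank(value) -> bool:
    """Treat None, empty strings and whitespace-only strings as empty cells."""
    return value is None or str(value).strip() == ""


def merge_page_tables(pages: List[List[Row]]) -> List[Row]:
    """Concatenate per-page tables, stitching a row split by a page break."""
    merged: List[Row] = []
    for page_rows in pages:
        rows = [dict(r) for r in page_rows]  # copy so the inputs are not mutated
        if merged and rows:
            last, first = merged[-1], rows[0]
            # Stitch only when each column is filled in exactly one half.
            complementary = all(
                is_blank(last.get(c)) != is_blank(first.get(c)) for c in COLUMNS
            )
            if complementary:
                for c in COLUMNS:
                    if is_blank(last.get(c)):
                        last[c] = first[c]
                rows = rows[1:]  # the first row was absorbed into the last one
        merged.extend(rows)
    return merged
```

This covers the case described in this thread, where the code and description are on one page and the quantity and delivery date are on the next. It would not help if a single cell, such as a long description, were itself split across the page break, or if the model mis-tagged other rows.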

     

    Regards,

  • Charanjit
    8 on 22 Aug 2023 at 09:54:55
    Re: Multi-page table issue in document processing

    Hi @plarrue,
    Yes. After tagging the first page, I got this table:

    Code | Description         | Quantity | Delivery Date
    101  | this is first item  | 56       | 22.08.2023
    102  | this is second item | 65       | 23.08.2023
    103  | this is third item  |          |

    I then selected "This table continues on next page", tagged the content on the second page, and got the table below:

    Code | Description         | Quantity | Delivery Date
         |                     | 72       | 24.08.2023
    104  | this is fourth item | 60       | 21.08.2023
    105  | this is fifth item  | 80       | 23.08.2023

    The problem is with the third item: its code and description are on the first page, while its quantity and delivery date are on the second page.

  • plarrue
    Moderator on 22 Aug 2023 at 09:47:01
    Re: Multi-page table issue in document processing

    Hi @Charanjit ,

     

    Thanks for reaching out.

    "The problem arises when some details for an item are present at the bottom of one page and the remaining details are on the next page." Did you select "This table continues on next page" and continue tagging the table on the following page?

     

    Thanks,

    Regards
