Power Apps - AI Builder
Unanswered

Multi-page table issue in document processing

Posted on by 8

I need to extract information in tabular format from order confirmation PDFs received from suppliers. Each PDF lists multiple items, and each item has a code, description, quantity and delivery date. So the table has four columns (Code, Description, Quantity, Delivery Date), with each row representing an item.
The problem arises when some details for an item are at the bottom of one page and the remaining details are on the next page.

For example, suppose this is the PDF:
-----some text--------------------------------------------

-----some text---------------------------------------------

code: 101       

description: this is first item

quantity: 56     

delivery date: 22.08.2023

 

code: 102       

description: this is second item

quantity: 65     

delivery date: 23.08.2023

 

code: 103      

description: this is third item

 

-------page 1 ends here---------

 

 

-------page 2 begins here--------

 

quantity: 72    

delivery date: 24.08.2023

 

code: 104       

description: this is fourth item

quantity: 80  

delivery date: 23.08.2023

 

code: 105       

description: this is fifth item

quantity: 60     

delivery date: 21.08.2023

 

---------some text here--------------------------------

------------------------------page 2 ends----------------------

------------------------------pdf ends----------------------------

 

 

The document cannot be tagged correctly, and the accuracy of the model is pretty bad. For the above document, the tagged tables look like this:

Code | Description         | Quantity | Delivery Date
101  | this is first item  | 56       | 22.08.2023
102  | this is second item | 65       | 23.08.2023
103  | this is third item  |          |

Code | Description         | Quantity | Delivery Date
     |                     | 72       | 24.08.2023
104  | this is fourth item | 60       | 21.08.2023
105  | this is fifth item  | 80       | 23.08.2023

Let me know if there is any solution for this. I have already tried the following:

- not tagging these rows in multi-page documents

- tagging only one-page documents for training and then using multi-page ones during testing

- tagging the tables as shown above

Every time, the accuracy is bad. I also increased the number of documents used for training, but to no avail.

@antoinec @plarrue @Antrod @JoeF-MSFT @CedrickB 

  • CedrickB
    Re: Multi-page table issue in document processing

    Text spanning across pages is unfortunately not yet supported.
    Post-processing could be a way, but it would be difficult to get reliable behavior in all cases.

    The only option is to handle this with manual edits after extraction, which makes the process semi-automated.

  • Charanjit
    Re: Multi-page table issue in document processing

    Unfortunately, nothing worked with AI Builder. I tried some post-processing, but it was a huge effort for mediocre results, and the amount of effort made no sense.
    Furthermore, the model accuracy was drastically lower on new PDFs where some details for a row were split over two pages.
    I am not using AI Builder anymore. I simply wrote some Python code that captures the incoming emails, downloads the PDF attachments to a folder, reads and extracts the relevant information from the PDFs, and writes it to an Excel file. The only drawback is that I had to write one Python script per document type/layout.
    AI Builder is good only for one-page documents, or for multi-page documents where the format is very clean.
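
The script mentioned in the reply above was not posted, so the following is only a minimal sketch of the general idea, not the author's code. It assumes the text of each page has already been extracted by some PDF library and that every field sits on its own `key: value` line, as in the example PDF in the question. The names `parse_items`, `PAGE_1` and `PAGE_2` are hypothetical. The point it illustrates is that the parser keeps its current item across page boundaries, so an item split over two pages is still assembled into one row.

```python
import re

# Matches lines such as "code: 101" or "delivery date: 22.08.2023".
LINE_RE = re.compile(
    r"^\s*(code|description|quantity|delivery date)\s*:\s*(.*?)\s*$",
    re.IGNORECASE,
)


def parse_items(pages):
    """Turn a list of page texts (in order) into a list of item dicts."""
    items = []
    current = None  # item being filled; deliberately NOT reset per page
    for page_text in pages:
        for line in page_text.splitlines():
            match = LINE_RE.match(line)
            if not match:
                continue  # headers, footers, blank lines
            key, value = match.group(1).lower(), match.group(2)
            if key == "code":
                # Assumption: a "code" line always starts a new item.
                current = {"code": value}
                items.append(current)
            elif current is not None:
                current[key] = value
    return items


# Sample input modelled on the example in the question.
PAGE_1 = """-----some text-----
code: 102
description: this is second item
quantity: 65
delivery date: 23.08.2023

code: 103
description: this is third item
"""

PAGE_2 = """quantity: 72
delivery date: 24.08.2023

code: 104
description: this is fourth item
quantity: 80
delivery date: 23.08.2023
-----some text here-----
"""

items = parse_items([PAGE_1, PAGE_2])
for item in items:
    print(item)
```

As the reply notes, this kind of logic is tied to one layout: a different supplier format would need its own pattern and its own rule for where an item starts.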

  • RamKan2021
    Re: Multi-page table issue in document processing

    I am also working on the same requirement. Has any solution been found for this?

  • plarrue
    Re: Multi-page table issue in document processing

    Maybe AI Builder's Create text with GPT can help with the post-processing? 🙂
    Give the whole table content to GPT and have it break it down into separate rows.

  • Charanjit
    Re: Multi-page table issue in document processing

    I could have considered a post-processing step if the model had accurately predicted all the other rows that do not break. But because of the break, the accuracy drops drastically, and the model starts making mistakes even on rows that have all their details on the same page.

    We were thinking of extending the Power Platform across our organisation in all countries, but unfortunately, if this is the case, it doesn't make sense to do so.

    We are better off parsing the whole documents with Python and writing logic to extract the relevant information. Thanks.

  • plarrue
    Re: Multi-page table issue in document processing

    Ah, sorry, I didn't notice the row for item 103. Rows that break from one page to another are not supported.

    It's a bit challenging. Perhaps you could consider consolidating the extracted data during a post-processing step?

    Regards,

  • Charanjit
    Re: Multi-page table issue in document processing

    Hi @plarrue,
    Yes. After tagging the first page, I got this table:

    Code | Description         | Quantity | Delivery Date
    101  | this is first item  | 56       | 22.08.2023
    102  | this is second item | 65       | 23.08.2023
    103  | this is third item  |          |

    I then selected "This table continues on next page", tagged the content on the second page, and got the table below:

    Code | Description         | Quantity | Delivery Date
         |                     | 72       | 24.08.2023
    104  | this is fourth item | 60       | 21.08.2023
    105  | this is fifth item  | 80       | 23.08.2023

    The problem is the third item: its code and description are on the first page, while its quantity and delivery date are on the second page.
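
The post-processing idea raised elsewhere in this thread could look roughly like the sketch below. It is an illustration under stated assumptions, not a supported AI Builder feature: it assumes each page's table has already been exported as a list of row dictionaries with empty strings for missing cells, which is not AI Builder's actual output format. The names `merge_page_tables` and `is_continuation` are hypothetical. The rule is that the first row of a page is folded into the last row of the previous page when it has no code and only supplies cells the previous row is missing.

```python
COLUMNS = ("Code", "Description", "Quantity", "Delivery Date")


def is_continuation(previous, row):
    """Return True if `row` looks like the spilled-over rest of `previous`."""
    if row.get("Code"):
        return False  # a row with its own code is a new item
    # The fragment may only supply cells that the previous row lacks.
    return all(not (previous.get(col) and row.get(col)) for col in COLUMNS)


def merge_page_tables(page_tables):
    """Concatenate per-page tables, stitching a row split by a page break."""
    merged = []
    for table in page_tables:
        for index, row in enumerate(table):
            # Only the first row of a page can continue the previous page.
            if index == 0 and merged and is_continuation(merged[-1], row):
                for col in COLUMNS:
                    if not merged[-1].get(col):
                        merged[-1][col] = row.get(col, "")
            else:
                merged.append({col: row.get(col, "") for col in COLUMNS})
    return merged


# Sample input modelled on the two tagged tables above (shortened).
page_1 = [
    {"Code": "102", "Description": "this is second item",
     "Quantity": "65", "Delivery Date": "23.08.2023"},
    {"Code": "103", "Description": "this is third item",
     "Quantity": "", "Delivery Date": ""},
]
page_2 = [
    {"Code": "", "Description": "",
     "Quantity": "72", "Delivery Date": "24.08.2023"},
    {"Code": "104", "Description": "this is fourth item",
     "Quantity": "80", "Delivery Date": "23.08.2023"},
]

merged = merge_page_tables([page_1, page_2])
for row in merged:
    print(row)
```

This only helps when the extraction itself is otherwise correct. As reported in this thread, the split row also degraded predictions for complete rows, and a merge step like this cannot repair that.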

  • plarrue
    Re: Multi-page table issue in document processing

    Hi @Charanjit,

    Thanks for reaching out.

    "The problem arises when some details for an item are present at the bottom of one page and the remaining details are on the next page." Did you select "This table continues on next page" and continue tagging the table on the following page?

    Thanks,

    Regards
