Unanswered

Multi-page table issue in document processing

Posted on 22 Aug 2023 08:44:06 by Charanjit

I need to extract tabular information from order confirmation PDFs received from suppliers. Each PDF lists multiple items, and each item has a code, description, quantity and delivery date. So the table has four columns (Code, Description, Quantity, Delivery Date), with each row representing an item.
The problem arises when some details for an item are at the bottom of one page and the remaining details are on the next page.

For example, suppose this is the PDF:
-----some text--------------------------------------------

-----some text---------------------------------------------

code: 101

description: this is first item

quantity: 56

delivery date: 22.08.2023

code: 102

description: this is second item

quantity: 65

delivery date: 23.08.2023

code: 103

description: this is third item

-------page 1 ends here---------

-------page 2 begins here--------

quantity: 72

delivery date: 24.08.2023

code: 104

description: this is fourth item

quantity: 80

delivery date: 23.08.2023

code: 105

description: this is fifth item

quantity: 60

delivery date: 21.08.2023

---------some text here--------------------------------

------------------------------page 2 ends----------------------

------------------------------pdf ends----------------------------

The document cannot be tagged correctly and the accuracy of the model is poor. For the above document, the tagged tables look like this:

Code	Description	Quantity	Delivery Date
101	this is first item	56	22.08.2023
102	this is second item	65	23.08.2023
103	this is third item

Code	Description	Quantity	Delivery Date
		72	24.08.2023
104	this is fourth item	80	23.08.2023
105	this is fifth item	60	21.08.2023

Please let me know if there is a solution for this. I have already tried the following:

- not tagging these rows in multi-page documents

- tagging only one-page documents for training, then using multi-page ones during testing

- tagging the tables as shown above

Every time, the accuracy is bad. I also increased the number of training documents, but to no avail.

@antoinec @plarrue @Antrod @JoeF-MSFT @CedrickB

Categories:

AI Builder

All responses (8)

Answers (0)

CedrickB Moderator on 20 Mar 2024 at 12:30:05

Re: Multi-page table issue in document processing

Text spanning across pages is unfortunately not yet supported.
Post-processing could be a way, but it would be difficult to get reliable behavior in all cases.

The only option is to handle this with a manual edit after extraction, which makes the process semi-automated.
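
For the manual-edit step, a small check can at least point the reviewer at the rows that need attention. The following is a minimal Python sketch, assuming the extracted tables have already been exported as one list of dicts per page, keyed by column name. That data layout and the function name `find_incomplete_rows` are hypothetical and not part of AI Builder.

```python
COLUMNS = ["Code", "Description", "Quantity", "Delivery Date"]


def find_incomplete_rows(pages):
    """Return (page_number, row_index, missing_columns) for every row with empty cells.

    `pages` is a list of tables, one per page; each table is a list of dicts.
    """
    flagged = []
    for page_number, table in enumerate(pages, start=1):
        for row_index, row in enumerate(table):
            # A cell counts as missing if it is absent, None, or whitespace only.
            missing = [c for c in COLUMNS if not (row.get(c) or "").strip()]
            if missing:
                flagged.append((page_number, row_index, missing))
    return flagged


# Sample data mirroring the split row for item 103 in the question.
pages = [
    [
        {"Code": "102", "Description": "this is second item",
         "Quantity": "65", "Delivery Date": "23.08.2023"},
        {"Code": "103", "Description": "this is third item",
         "Quantity": "", "Delivery Date": ""},
    ],
    [
        {"Code": "", "Description": "",
         "Quantity": "72", "Delivery Date": "24.08.2023"},
        {"Code": "104", "Description": "this is fourth item",
         "Quantity": "80", "Delivery Date": "23.08.2023"},
    ],
]

for page_number, row_index, missing in find_incomplete_rows(pages):
    print(f"page {page_number}, row {row_index}: missing {', '.join(missing)}")
```

This only flags candidates; deciding whether two flagged rows belong together is still left to the person doing the edit.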

Charanjit 8 on 19 Mar 2024 at 09:46:38

Re: Multi-page table issue in document processing

Unfortunately, nothing worked with AI Builder. I tried some post-processing, but it was a huge effort for mediocre results; the amount of effort I had to put in made no sense.
Furthermore, model accuracy was drastically lower on new PDFs where some details for a row were split over two pages.
I am not using AI Builder anymore. I simply wrote some Python code that captures the incoming emails, downloads the PDF attachments to a folder, reads and extracts the relevant information from the PDFs and writes it to an Excel file. The only drawback is that I had to write one Python script per document type/layout.
AI Builder is good only for one-page documents, or for multi-page documents where the format is very clean.
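
A minimal sketch of what the extraction step of such a script could look like, using only the Python standard library. It assumes the PDF text has already been extracted page by page with a PDF library of your choice (not shown), and that every field appears as a `label: value` line as in the example in the question. The names `parse_items` and `to_csv` are hypothetical, and CSV stands in for Excel here to avoid third-party dependencies.

```python
import csv
import io
import re

FIELDS = ["code", "description", "quantity", "delivery date"]
LINE_RE = re.compile(
    r"^\s*(code|description|quantity|delivery date)\s*:\s*(.*?)\s*$",
    re.IGNORECASE,
)


def parse_items(page_texts):
    """Parse 'label: value' lines from all pages into one list of item dicts.

    Pages are joined before parsing, so an item whose fields straddle a page
    break still ends up in a single row. A new item starts at each 'code:' line.
    """
    items = []
    current = None
    for line in "\n".join(page_texts).splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip headers, footers and other free text
        label, value = match.group(1).lower(), match.group(2)
        if label == "code":
            current = dict.fromkeys(FIELDS, "")
            items.append(current)
        if current is not None:
            current[label] = value
    return items


def to_csv(items):
    """Serialise the parsed items as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


page_1 = "-----some text-----\ncode: 103\ndescription: this is third item\n"
page_2 = (
    "quantity: 72\ndelivery date: 24.08.2023\n"
    "code: 104\ndescription: this is fourth item\n"
    "quantity: 80\ndelivery date: 23.08.2023\n"
)

items = parse_items([page_1, page_2])
print(to_csv(items))
```

Because the parser works on the text stream rather than on per-page tables, the page break between the description and quantity of item 103 has no effect. The regular expression is tied to this one layout, which is why a separate script per layout is needed.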

RamKan2021 102 on 18 Mar 2024 at 18:28:01

Re: Multi-page table issue in document processing

I am also working on the same requirement. Has any solution been found for this?

plarrue Moderator on 22 Aug 2023 at 13:31:41

Re: Multi-page table issue in document processing

Maybe AI Builder's Create text with GPT can help with the post-processing? 🙂
Give the whole table content to GPT and have it break it down into separate rows.

Charanjit 8 on 22 Aug 2023 at 12:45:46

Re: Multi-page table issue in document processing

I could have considered a post-processing step if the model had accurately predicted all the other rows that do not break. But because of the break, the accuracy drops drastically and the model starts to make mistakes even for rows that have all their details on the same page.
We were thinking of rolling out the Power Platform across our organisation in all countries, but unfortunately, if this is the case, it doesn't make sense to do so.
We are better off parsing the whole documents with Python and writing our own logic to extract the relevant information. Thanks.

plarrue Moderator on 22 Aug 2023 at 12:04:40

Re: Multi-page table issue in document processing

Ah sorry, I didn't notice the line for item 103. Rows that break from one page to another are not supported.

It's a bit challenging. Perhaps you could consider consolidating this extracted data during a post-processing step?
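
Such a consolidation step could be sketched as follows in Python. This assumes the per-page tables are available as lists of dicts keyed by column name, and it uses a simple heuristic: if the last row of one page is incomplete and the first row of the next page fills only cells that are missing there, the two are treated as one row. The function name `merge_split_rows` and the data layout are hypothetical, and the heuristic will not cover every layout.

```python
COLUMNS = ["Code", "Description", "Quantity", "Delivery Date"]


def is_blank(value):
    """True for None, empty or whitespace-only cell values."""
    return not (value or "").strip()


def merge_split_rows(pages):
    """Concatenate per-page tables, joining a row that was split by a page break."""
    merged = []
    for table in pages:
        rows = [dict(row) for row in table]  # copy so the input is not modified
        if merged and rows:
            tail, head = merged[-1], rows[0]
            tail_missing = {c for c in COLUMNS if is_blank(tail.get(c))}
            head_filled = {c for c in COLUMNS if not is_blank(head.get(c))}
            # Merge only when the head row supplies nothing but missing cells.
            if tail_missing and head_filled and head_filled <= tail_missing:
                for column in head_filled:
                    tail[column] = head[column]
                rows = rows[1:]
        merged.extend(rows)
    return merged


pages = [
    [
        {"Code": "102", "Description": "this is second item",
         "Quantity": "65", "Delivery Date": "23.08.2023"},
        {"Code": "103", "Description": "this is third item",
         "Quantity": "", "Delivery Date": ""},
    ],
    [
        {"Code": "", "Description": "",
         "Quantity": "72", "Delivery Date": "24.08.2023"},
        {"Code": "104", "Description": "this is fourth item",
         "Quantity": "80", "Delivery Date": "23.08.2023"},
    ],
]

for row in merge_split_rows(pages):
    print(row)
```

The merge depends on the extraction itself being accurate on both pages; if the model misreads the surrounding rows, this step cannot repair them.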

Regards,

Charanjit 8 on 22 Aug 2023 at 09:54:55

Re: Multi-page table issue in document processing

Hi @plarrue ,
Yes. After tagging the first page, I got this table:

Code Description Quantity Delivery Date
101 this is first item 56 22.08.2023
102 this is second item 65 23.08.2023
103 this is third item

I selected "This table continues on next page", tagged the content on the second page and got the table below:

Code Description Quantity Delivery Date
72 24.08.2023
104 this is fourth item 80 23.08.2023
105 this is fifth item 60 21.08.2023

The problem is with the third item: its code and description are on the first page, while its quantity and delivery date are on the second page.

plarrue Moderator on 22 Aug 2023 at 09:47:01

Re: Multi-page table issue in document processing

Hi @Charanjit ,

Thanks for reaching out.

"The problem arises when some details for an item are present at the bottom of one page and the remaining details are on the next page." - Did you select "This table continues on next page" and continue tagging the table on the following page?

Thanks,

Regards
