Power Apps - AI Builder

How to handle incorrect values when tagging documents? (With examples) - Document processing AI Builder


Hi everyone!

I'm training a document processing model with about 120 PDF documents and 40 fields. The PDFs are divided into 3 collections (3 different layouts): the first with 70 documents, the second with 34, and the third with 16.

 

The problem occurs mainly in the first collection:

Some of the documents are native PDFs, like the example below, and tagging them is not a problem because the detected word matches the text in the document exactly.

[Image: native PDF example]

Others are scanned or photocopied documents whose quality makes it hard to detect the values with 100% accuracy. In the example below, for instance, the detected value was "22:18" instead of "22.18".

[Image: scanned PDF example]

 

In other cases, the detected values look like "22 18", "22.18." or "*22.18". On poor-quality scans, this happens for about 4 of the 40 fields.
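One alternative, separate from how the documents are tagged for training, is to clean up the extracted text after extraction. The sketch below is only an illustration and assumes the field always holds a number with two decimal places (like "22.18") whose only corruption is stray punctuation or a wrong separator. `normalize_decimal` is a hypothetical helper, not part of AI Builder.

```python
import re
from typing import Optional

# Hypothetical helper (not an AI Builder API): recover a value like "22.18"
# from OCR output such as "22:18", "22 18", "22.18." or "*22.18".
# Assumption: the field always has the shape <digits><separator><two digits>.
_DECIMAL_PATTERN = re.compile(r"(\d+)\s*[.:,\s]\s*(\d{2})")


def normalize_decimal(raw: str) -> Optional[str]:
    """Return the value as '<digits>.<two digits>', or None if nothing plausible is found."""
    match = _DECIMAL_PATTERN.search(raw)
    if match is None:
        return None
    whole, fraction = match.groups()
    return f"{whole}.{fraction}"


for sample in ["22:18", "22 18", "22.18.", "*22.18"]:
    print(sample, "->", normalize_decimal(sample))
```

Each of the four misreads quoted above comes back as "22.18". The pattern encodes an assumption about this particular field, so it would need to be adapted, or left out, for fields with other formats.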

 

So my question is: what do you recommend in these cases?

- Tag the word, even knowing that it is not exactly the correct value.

- Choose the "Not available in document" option, even though the word is present in the document and only the detected value is incorrect.

- Remove that PDF from the training collection (not my favorite option, because in my real-world case the split between native and scanned PDFs is almost 50/50, so I want the training to include this kind of document).

 

Please base your answer on what is best for training the model; I want its performance to be good and reliable. Also, feel free to suggest any alternative I may not have considered. Thanks in advance for your help! 😊

  • dcortes187
    Re: How to handle incorrect values when tagging documents? (With examples) - Document processing AI Builder

    Hi @plarrue, thank you very much for your reply! I have not finished training the model, but I am going to use the option you recommend.

     

    As I understand your answer, once the model is trained and in use, values like "22 18", "22.18." and "*22.18" (which come from poor-quality documents) will have a significantly lower confidence score than a correct value like "22.18", so I will be able to tell these cases apart by their confidence score. Am I correct?

  • Verified answer
    plarrue
    Re: How to handle incorrect values when tagging documents? (With examples) - Document processing AI Builder

    Hi @dcortes187 

    We would recommend the first option: tag the word, even knowing that it is not exactly the correct value.

    This teaches the model about documents of different quality, which is beneficial.

    When documents are processed, we expose a confidence score for each field, which you can use to flag documents that need manual review.
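    A minimal sketch of that review step, assuming each extracted field has already been mapped to a value plus a confidence score between 0 and 1. The `ExtractedField` type, the `needs_review` function, the field names and the 0.8 threshold are all hypothetical and chosen for illustration; they are not AI Builder APIs, and a real threshold would be tuned on documents you have already checked by hand.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical shape of one extracted field; map the real model output
# (value and confidence) into this structure before calling needs_review.
@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # assumed to be in the range 0.0-1.0


def needs_review(fields: List[ExtractedField], threshold: float = 0.8) -> List[str]:
    """Return the names of fields whose confidence is below the threshold.

    An empty list means the document can be accepted automatically;
    otherwise it should go to a person together with the listed fields.
    """
    return [f.name for f in fields if f.confidence < threshold]


# Illustrative values only, not real model output.
doc = [
    ExtractedField("Total", "22.18", 0.95),
    ExtractedField("Tax", "*22.18", 0.41),
]
print(needs_review(doc))  # ['Tax']
```

    Whether misreads like "*22.18" actually receive lower scores than clean values is worth checking on your own test documents before relying on a fixed threshold.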

     

    Improve the performance of your document processing model - AI Builder | Microsoft Learn

     

    Hope it helps.
