Forms Processing missing characters

Posted on by Patrick2424

Hi All

I'm training my Forms Processing model using 14 samples, all with the same format. What I'm noticing is that the OCR is not fully extracting the data. When I run a Quick Test, the fields are located correctly, but the extracted text is wrong.

For example, a date that appears in the PDF as 2020.05.29 is extracted as 20 .05 29. In another example, the PDF has the value 2019.11.29, but it is extracted as 2019.1 29. In a third example, the value 15452 is extracted as 154 2. Is there anything I can do to improve the accuracy? The PDFs are generated by a third-party system, so I don't have much control over the format.

Categories:

AI Builder

All responses (3)

JoeF-MSFT on at

Hi @Patrick2424,

Thanks for raising this issue. One thing you can try is to add more samples to the training set in your Form Processing model and see if the quality improves.

Having said this, there are a couple of upcoming improvements that will help on this:

This summer we will be upgrading the OCR engine we currently use in AI Builder Form Processing to the latest and greatest coming from Azure AI.

Towards the end of the year, Form Processing will let you specify the type of a field (for instance date, number, or currency). If, for example, you specify a field as a date, you will get back a normalized value without the spaces.

If adding more samples to the training set doesn't help, feel free to reach out by private message with some samples. I can check whether the improvements coming this summer fix the extraction for your documents and share the results back with you.
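In the meantime, a downstream check on the extracted text can at least catch values like the ones in the question before they are used. The following is only a rough sketch in Python, not part of AI Builder: the function names `normalize_date` and `is_clean_integer` are hypothetical, and it assumes dates always use the `YYYY.MM.DD` layout shown in the question. Note that removing spaces only repairs values where the OCR inserted extra whitespace; where characters were actually dropped, the value can only be flagged for manual review, not recovered.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed layout of a correctly extracted date, e.g. 2020.05.29
DATE_PATTERN = re.compile(r"[0-9]{4}\.[0-9]{2}\.[0-9]{2}")


def normalize_date(raw: str) -> Optional[str]:
    """Return the date as YYYY.MM.DD, or None if it needs manual review."""
    # Drop any whitespace the OCR may have inserted.
    compact = re.sub(r"\s+", "", raw)
    if not DATE_PATTERN.fullmatch(compact):
        return None  # characters are missing or misplaced
    try:
        # Reject well-formed but impossible dates such as 2020.13.01.
        datetime.strptime(compact, "%Y.%m.%d")
    except ValueError:
        return None
    return compact


def is_clean_integer(raw: str) -> bool:
    """True only if the value is digits with no internal gaps."""
    # An internal space (as in "154 2") may hide a dropped digit,
    # so it is treated as suspicious rather than stripped.
    return re.fullmatch(r"[0-9]+", raw.strip()) is not None


if __name__ == "__main__":
    for value in ["2020.05.29", "20 .05 29", "2019.1 29"]:
        print(repr(value), "->", normalize_date(value))
    for value in ["15452", "154 2"]:
        print(repr(value), "->", is_clean_integer(value))
```

Values that come back as `None` or `False` could be routed to a person for review instead of being stored as extracted.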

55552 674 on at

Has the second option been implemented yet? So far, I am not seeing any option to define the type of a field. They are all text by default.

JoeF-MSFT on at

Hi @Runner55552 - unfortunately not yet, but it is still on our roadmap for later in the year.
