Hi All
I'm training my Forms Processing model using 14 samples all with the same format. What I'm noticing is that the OCR is not fully extracting the data. When I run a Quick Test, the areas get selected correctly but the actual data is incorrect.
For example, in the PDF a date value is 2020.05.29 but it gets extracted as 20 .05 29. In another example the PDF has the value 2019.11.29 but it gets extracted as 2019.1 29. In a third example we have the value 15452 but it gets extracted as 154 2. Is there anything that I can do to improve the accuracy? The PDFs are generated by a third-party system, so I don't have much control over the format.
Hi @Runner55552 - unfortunately not yet, but it is still on our roadmap for later in the year.
Was the second option implemented yet? So far, I am not seeing any option to define the type of field. They are all text by default.
Hi @Patrick2424,
Thanks for raising this issue. One thing you can try is to add more samples to the training set in your Form Processing model and see if the quality improves.
Having said this, there are a couple of upcoming improvements that will help with this:
If adding more samples to the training set doesn't help, feel free to send me some samples by private message. I can then check whether the improvements coming this summer give better results on them and share the outcome with you.
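In the meantime, one possible workaround is to check the extracted values downstream and flag the ones that don't match the expected shape, so they can be reviewed by hand. Below is a minimal Python sketch of that idea, not part of the product itself. It assumes dates always look like YYYY.MM.DD and that numeric fields contain digits only, as in the examples in this thread. The field names `InvoiceDate` and `Total` and the helper `flag_suspect_fields` are hypothetical.

```python
import re
from datetime import datetime

# Assumed formats, based on the examples in this thread.
DATE_PATTERN = re.compile(r"[0-9]{4}\.[0-9]{2}\.[0-9]{2}")
NUMBER_PATTERN = re.compile(r"[0-9]+")


def is_valid_date(value: str) -> bool:
    """True if value is exactly YYYY.MM.DD and a real calendar date."""
    if not DATE_PATTERN.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y.%m.%d")
    except ValueError:
        return False
    return True


def is_valid_number(value: str) -> bool:
    """True if value consists of digits only (no gaps or stray characters)."""
    return NUMBER_PATTERN.fullmatch(value) is not None


def flag_suspect_fields(fields: dict, validators: dict) -> list:
    """Return the names of fields whose extracted value fails its validator."""
    return [
        name
        for name, value in fields.items()
        if name in validators and not validators[name](value.strip())
    ]


# Hypothetical field names; the values are taken from the examples above.
extracted = {"InvoiceDate": "2019.1 29", "Total": "154 2"}
validators = {"InvoiceDate": is_valid_date, "Total": is_valid_number}
suspects = flag_suspect_fields(extracted, validators)
```

This only detects bad extractions. It does not try to repair them, because a value like 154 2 gives no reliable hint about which character was dropped.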