Hi!
We're using a custom AI model to process PDF documents that our clients send to a mailbox. Each PDF document has 7 pages with a lot of fields that need to be filled in. One of the fields is a file number with the following format: XX:000000
An example would be TD:123456. Some clients fill in the form a bit differently, so sometimes the value is written as TD: 123456 (with a space after the colon) or TD 123456 (with a space and no colon).
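To make the three variants concrete, here is a minimal Python sketch of how they could be matched and normalized outside the model. The regex and the function name normalize_file_number are only illustrative assumptions on our side, not part of the AI model or of any product API.

```python
import re

# Hypothetical pattern: two capital letters, an optional colon,
# optional spaces around it, then exactly six digits.
FILE_NUMBER = re.compile(r"\b([A-Z]{2})\s*:?\s*(\d{6})\b")

def normalize_file_number(text):
    """Return the file number as XX:000000, or None if no match is found."""
    match = FILE_NUMBER.search(text)
    if match is None:
        return None
    return f"{match.group(1)}:{match.group(2)}"

# The three spellings we see on the forms.
for raw in ["TD:123456", "TD: 123456", "TD 123456"]:
    print(raw, "->", normalize_file_number(raw))
```

A check like this only helps when the full text reaches it. It cannot recover the digits when the model returns just TD: for the field.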
When we train our model with these values and test it after successful training, the model seems unable to identify the full value when it is filled in as TD: 123456. The only part that gets extracted is TD:
The other possible values are being detected correctly. It looks like it's struggling with the colon (:).
The accuracy score for the field is 99%, and the quick test also returns a confidence score of 99%, even though the value is not extracted correctly.
What can we do with the AI model to get this value extracted correctly?
Thanks in advance.