Hello team,
We are using AI Builder to extract some data points from invoices. We are not using the pretrained model. Instead, we created around 50 collections (mostly one per vendor) and trained the model on more than 200 invoices. Fields like invoice number and location come out with good accuracy, but some fields are concerning: amount, vendor name and PO number, especially the PO number. As we all know, the PO number is not printed on the invoice itself, so in all 200+ training invoices we tagged the PO number that was added manually on top of the invoice (see attached). During testing, about 90% of the documents we provided also had the PO number clearly stamped. Unfortunately, the model still does not extract it correctly: the accuracy rate is only 60%!
By accuracy I do not mean the confidence score. If someone edits the PO number the model extracted, we count that document as inaccurate. So in 4 out of every 10 documents, someone has to correct the extracted PO number, which is bad. Is there a better way to train the model?
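A minimal Python sketch of the edit-based accuracy described above, assuming you can export, for each test document, the PO number the model extracted and the value after any manual correction. The function name and the input shape are hypothetical, not part of AI Builder.

```python
def edit_free_accuracy(pairs):
    """Share of documents whose extracted PO number was kept unchanged.

    pairs: list of (extracted, final) tuples, where `final` is the value
    after any manual correction (hypothetical export format).
    Any difference at all counts as an edit, matching the definition above.
    """
    if not pairs:
        raise ValueError("no documents to evaluate")
    unchanged = sum(1 for extracted, final in pairs if extracted == final)
    return unchanged / len(pairs)


# 10 documents, 4 of which needed a manual correction
sample = [("4500123456", "4500123456")] * 6 + [("4500123456", "4500123457")] * 4
print(edit_free_accuracy(sample))  # 0.6
```

Measured this way, the figure is independent of the confidence score, so a confidently wrong extraction still counts against the model.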
As I said above, 90% of the time we provide documents in which the PO number is stamped. The remaining 10% have the PO number partially printed, partially handwritten, etc. I included these kinds of samples in training as well. Could such documents confuse the model and prevent it from extracting the right PO number?
Experts, please advise.
Below are the 4 different types of PO number I have observed in our invoices. I have one idea: I could retrain the model, since in most cases the number is surrounded by the text "PO". But if the AI model already uses the nearby text for extraction, this effort may be in vain, and it involves retraining on 200+ documents.
I am thinking of using the pretrained model for the PO number, vendor name, invoice date and invoice number fields, and custom fields for everything else. For this, I am considering training the existing pretrained model on the custom fields alone and checking the results. If the accuracy is not at the expected level, I will try the solution you suggested on its own, using the unstructured document type.
Hi @seetharaman,
Did you provide the model with at least 5 examples of each PO number format?
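One way to check this before retraining is to tag each training document with the PO format it shows and count the tags. A minimal Python sketch, assuming you assign the format tags yourself (the tag names and the function are hypothetical); the threshold of 5 is the figure from the question above.

```python
from collections import Counter

MIN_SAMPLES_PER_FORMAT = 5  # threshold taken from the question above


def under_represented_formats(labels, expected=(), minimum=MIN_SAMPLES_PER_FORMAT):
    """Return {format: count} for PO formats with fewer than `minimum` examples.

    labels:   one format tag per training document (hypothetical tags).
    expected: formats you know occur in production; listing them lets
              formats with zero training examples show up too.
    """
    counts = Counter(labels)
    for fmt in expected:
        counts.setdefault(fmt, 0)  # make missing formats visible with count 0
    return {fmt: n for fmt, n in counts.items() if n < minimum}


labels = ["stamped"] * 180 + ["printed"] * 12 + ["handwritten"] * 3
print(under_represented_formats(
    labels, expected=["stamped", "printed", "handwritten", "partial"]
))  # {'handwritten': 3, 'partial': 0}
```

Passing `expected` matters because a format that never appears in training would otherwise be silently absent from the counts.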
Another thing you can try is to test with the unstructured document processing model, which could provide better results.