Hello team,
We are using AI Builder to extract some data points from invoices. We are not using the prebuilt (pre-trained) model; instead we created around 50 collections (essentially one per vendor) and trained the model on 200+ invoices. Fields like invoice number and location are extracted with good accuracy, but some fields are concerning: amount, vendor name and PO number, especially the PO number. The PO number is not printed on the invoice itself, so in all 200+ training invoices we tagged the PO number that was added manually on top of the invoice (see attached). During testing, about 90% of the documents we submitted had the PO number stamped clearly, but the model still doesn't extract it correctly: the accuracy rate is only 60%!
By accuracy I don't mean the confidence score: if someone has to edit the extracted PO number, we count that document as inaccurate. So in 4 out of every 10 documents, someone has to correct the PO number the model extracted, which is bad. Is there a better way to train the model?
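To make the metric concrete: a rough sketch of the edit-based accuracy described above, assuming each reviewed document is logged with the PO number the model extracted and the value the reviewer finally saved. The `ReviewedDocument` record and `edit_based_accuracy` function are hypothetical names for illustration, not part of AI Builder.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReviewedDocument:
    # Hypothetical record: the PO number the model extracted and the value
    # the reviewer finally saved (equal if nobody edited it).
    extracted_po: str
    final_po: str


def edit_based_accuracy(docs: List[ReviewedDocument]) -> float:
    """Fraction of documents whose extracted PO number needed no edit."""
    if not docs:
        raise ValueError("no documents to score")
    # Ignore surrounding whitespace so a trailing space is not counted as an edit.
    unedited = sum(1 for d in docs if d.extracted_po.strip() == d.final_po.strip())
    return unedited / len(docs)


# 10 reviewed documents, 4 of which had the PO number corrected by a reviewer.
sample = [ReviewedDocument("4500012345", "4500012345") for _ in range(6)] + [
    ReviewedDocument("450001234", "4500012345") for _ in range(4)
]
print(edit_based_accuracy(sample))  # 0.6
```

Unlike the confidence score, this number only moves when a person actually has to correct the field.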
As I said above, 90% of the time we submit documents in which the PO number is stamped. The remaining 10% have the PO number partially printed, partially handwritten, etc. I included samples like these in training as well. Could these documents be confusing the model and preventing it from extracting the correct PO number?
Experts, please advise.
Below are the 4 different PO formats I have observed on our invoices. One idea I have is to retrain the model, since in most cases the number is surrounded by the text "PO". But if the AI model already uses the nearby text to extract this field, the effort may be in vain, as it involves retraining on 200+ documents.
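For comparison, the "number next to the PO label" heuristic can also be expressed as a simple post-processing check rather than a retraining effort. This is only a rough sketch, assuming the page's OCR text is already available as a plain string and that PO numbers are 4+ digits; the label variants in the pattern are guesses, and `find_po_number` is a hypothetical helper, not an AI Builder feature. Handwritten or partially printed numbers will only be found if the OCR text contains them.

```python
import re
from typing import Optional

# Assumed label variants: "PO", "P.O.", "PO No.", "PO Number", "PO #",
# optionally followed by ":" or "-". Assumes the PO number itself is 4+ digits.
PO_PATTERN = re.compile(
    r"\bP\.?\s?O\.?\s*(?:No\.?|Number|#)?\s*[:\-]?\s*(\d{4,})",
    re.IGNORECASE,
)


def find_po_number(text: str) -> Optional[str]:
    """Return the first number that directly follows a PO label, or None."""
    match = PO_PATTERN.search(text)
    return match.group(1) if match else None


for line in ["PO: 4500012345", "P.O. No. 4500067890", "PO # 123456", "Invoice 98765"]:
    print(line, "->", find_po_number(line))
```

A check like this could, for example, flag documents where the model's PO value disagrees with the labelled number for manual review.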
