
Hello all, I was hoping to get some troubleshooting tips for my model. When I test the trained model with one of the documents from its training set, it highlights 100% of the field values and appears to be perfect. The model's overall score is 81%, and the training set contains 16 documents.
However, when that same document is processed through the workflow, on some runs it skips random fields, and the skipped field is always a "Category" field: it is simply left out of the AI action's output. The document is confidential, but "Name", "Count" and "Category" each occur 10 times, in order from 1 through 10, running onto a second page: Name1, Count1, Category1, then Name2, Count2, Category2, and so on. I'm not sure there is anything I can do. One important note: in a few of the reports (PDFs) Category is blank, so in those cases the AI model is correct. I have other columns in the report that do not appear 100% of the time and are marked "Not Available in document", and those work as expected.
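To see which of the ten Category instances a given run dropped, the AI action's output can be compared against the full list of expected field names. The Python sketch below is illustrative only: it assumes the output has been exported as a flat dictionary keyed by hypothetical names like `Category3`, which may not match how the fields are actually named in the model.

```python
# Hypothetical helper: the field names (Name1..Name10, Count1..Count10,
# Category1..Category10) mirror the layout described above; the real keys
# in the AI action's output may be named differently.
REPEATS = 10
BASE_FIELDS = ("Name", "Count", "Category")


def expected_fields(repeats: int = REPEATS) -> list[str]:
    """Return field names in document order: Name1, Count1, Category1, Name2, ..."""
    return [f"{base}{i}" for i in range(1, repeats + 1) for base in BASE_FIELDS]


def missing_fields(output: dict[str, str], repeats: int = REPEATS) -> list[str]:
    """List the expected fields that are absent from one run's extracted output."""
    return [name for name in expected_fields(repeats) if name not in output]


if __name__ == "__main__":
    # Simulated run in which the AI action dropped Category3 and Category7.
    run = {name: "x" for name in expected_fields()}
    del run["Category3"], run["Category7"]
    print(missing_fields(run))
```

A name reported here only means the field was absent. Since Category is legitimately blank in some PDFs, each flagged entry still needs a look at the source document.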
Since my post got no responses, I thought I would share what I have learned. The AI model is still somewhat inconsistent, even after running 19 samples through it, and the samples are the actual documents I use for the production data. With that said, the workflow action "Extract information from documents" has a ways to go before it's reliable. I am still using it, but I always double-check the output, and I do find mistakes. I have 34 number fields, 2 date fields and 14 text fields. Some of the forms contain blanks (where that particular field was N/A), so below are the scenarios I have had to overcome and how I dealt with them.
1. Blanks: the field does not appear in the form (some columns are omitted from some forms).
2. Blanks: the field exists in the form but is skipped or missing from the output.
3. Extra characters: for a number field, the output contains the number plus some text to its right, like "12,345 DNS". This has happened twice so far, even though this column scores 99 and was accurate in every test after training.
4. Missed target: this number field was accurate for 23 runs, then grabbed a text label instead of the number on a chart.
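When double-checking output, it can help to triage each number field against the scenarios above before opening the document. A hypothetical Python helper follows; the labels are this sketch's own, and it assumes number fields hold whole numbers with optional thousands separators, as in the "12,345" example.

```python
import re

# Hypothetical triage helper for one extracted number field.
# Assumption: number fields hold whole numbers, optionally with thousands
# separators (e.g. "12,345"); decimals would be flagged as extra characters.
CLEAN_NUMBER = re.compile(r"\d{1,3}(?:,\d{3})*|\d+")


def classify(output: dict[str, str], field: str) -> str:
    """Label a field's value with the problem scenario it most resembles."""
    if field not in output:
        # The output alone cannot tell an omitted column from a skipped one.
        return "missing (scenario 1 or 2)"
    value = output[field].strip()
    if not value:
        return "blank"
    if CLEAN_NUMBER.fullmatch(value):
        return "ok"
    if any(ch.isdigit() for ch in value):
        # A number with trailing text, such as "12,345 DNS".
        return "extra characters (scenario 3)"
    # No digits at all suggests a text label was grabbed instead of a number.
    return "no digits, possible wrong target (scenario 4)"


if __name__ == "__main__":
    run = {"Total": "12,345 DNS", "Chart": "Requests", "Count1": "42"}
    for name in ("Total", "Chart", "Count1", "Count2"):
        print(name, "->", classify(run, name))
```

Two limits apply: a missing field cannot be attributed to scenario 1 or 2 from the output alone, and a wrong target that happens to be another number (rather than a text label) will pass as "ok".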
Not wanting to give up on AI, this is what I am using. It's not perfect, and I still have to check the result against the document, but it's far better than entering the data by hand.
The steps below show the process for just the first field. This returns 0 if the field is blank or missing; otherwise it removes any text and keeps just the integer.
Select
  From: chunk(variables('varDnsRequest'),1)
  Map: if(contains('0123456789',item()),item(),'')
composeDnsRequests
  Inputs: join(body('Select_2'),'')
ConvertDnsRequests
  Inputs:
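For readers who want to test the logic outside Power Automate, here is the same digit-filtering idea as a small Python sketch. It mirrors the Select and join steps above; the 0 fallback reflects the behavior described in the post, since the ConvertDnsRequests expression itself is not shown, and the function names are hypothetical.

```python
from typing import Optional


def digits_only(text: str) -> str:
    """Mirror of the Select + join steps: keep only the characters 0-9."""
    # Select: chunk(text, 1) splits the string into single characters, and the
    # Map keeps a character only if '0123456789' contains it.
    kept = [ch if ch in "0123456789" else "" for ch in text]
    # composeDnsRequests: join(..., '') glues the kept characters back together.
    return "".join(kept)


def to_int(value: Optional[str]) -> int:
    """Convert an extracted field to an int, using 0 for missing or blank input."""
    # Assumed behavior of ConvertDnsRequests (its expression is not shown).
    cleaned = digits_only(value or "")
    return int(cleaned) if cleaned else 0


if __name__ == "__main__":
    for raw in ("12,345 DNS", "", None, "Requests"):
        print(repr(raw), "->", to_int(raw))
```

Because every non-digit is dropped, decimal points and minus signs disappear too ("12.5" becomes 125), so this only suits whole-number fields. A text label grabbed by mistake (scenario 4) also comes out as 0, which looks the same as a genuinely blank field.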