I'm trying to build an invoice processing model in AI Builder that works across multiple invoice types from multiple vendors. At first glance, it *seems* like the solution is to create multiple collections, one for each vendor/invoice layout, then tag all my training files and train the model.
However, I'm running into 2 major issues. First, for some collections, no matter how many times I correct the auto-detected fields to point at the appropriate data, the model never seems to retain the corrections. For reference, I have 10 collections, ranging from 5 to 45 documents apiece, for a total of 230 tagged documents.
Second, when I go to train the model, the number of detected fields doesn't match the number of fields I've actually tagged. For example, every single tagged document contains a correctly identified "InvoiceID" field, but the training step only shows 21.
Every document within a collection contains the same fields, but the set of fields differs between collections. The model seems to be averaging its detection scores over the total number of documents rather than per collection, and it seems to be averaging its internal detection rules across all of the collections as well.
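To make the suspicion concrete, here is a toy Python calculation. The collection names and counts are made up purely for illustration (they are not my real data), and I don't know whether AI Builder actually scores fields this way. It only shows how a field that is tagged on every document in one collection can look weak when it is scored against all documents at once.

```python
# Hypothetical counts, for illustration only:
# collection name -> (documents in collection, documents where the field is tagged)
collections = {
    "vendor_a": (20, 20),  # field tagged on every document in this layout
    "vendor_b": (30, 0),   # this layout does not contain the field at all
}

def per_collection_rates(data):
    """Share of documents carrying the field, computed separately per collection."""
    return {name: tagged / docs for name, (docs, tagged) in data.items()}

def pooled_rate(data):
    """Share of documents carrying the field, computed over all documents at once."""
    total_docs = sum(docs for docs, _ in data.values())
    total_tagged = sum(tagged for _, tagged in data.values())
    return total_tagged / total_docs

rates = per_collection_rates(collections)
pooled = pooled_rate(collections)
print(rates)   # {'vendor_a': 1.0, 'vendor_b': 0.0}
print(pooled)  # 0.4
```

Scored per collection, the field is perfect for `vendor_a`; pooled across both collections, the same tags come out at 40%, which is the kind of mismatch I think I'm seeing.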
If that's the case, what's the point of using collections at all?