Hello everybody,
I am currently working on a data extraction model for purchase orders. The number of ordered items per order ranges from one to more than a hundred, so the documents vary in length from one to more than 30 pages. The individual items are listed in a similar layout on every order, although not in a table. This leaves me with two questions:
First, is the unstructured document format the right choice for this AI model?
Second, should I create one collection per item count, grouping as many purchase orders with the same number of items as possible (one collection for purchase orders with 1 item, one for purchase orders with 2 items, and so on)? Note that purchase orders with more than 10 items are very rare: there is often only one for a given item count, and the gaps between counts grow as the number of items goes up (e.g., three with 20 items, two with 23, one with 30, one with 38, and so on). Or would it be smarter to use one big collection containing all available purchase orders (about 300-400)?
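To make the second question more concrete, here is a rough Python sketch of how thin the per-item-count collections would get for the large orders. It only uses the example figures given above. The minimum of 5 samples per collection is a placeholder I made up for illustration, not a documented limit, and the function names are my own.

```python
from collections import Counter

# Item counts of the rare large orders from the example above:
# three with 20 items, two with 23, one with 30, one with 38.
item_counts = [20, 20, 20, 23, 23, 30, 38]

# Assumed threshold for illustration only; the real minimum depends on the tool.
MIN_SAMPLES_PER_COLLECTION = 5


def collections_by_item_count(counts):
    """Group purchase orders into one collection per item count."""
    return Counter(counts)


def undersized(collections, minimum):
    """Return the item counts whose collection holds fewer than `minimum` samples."""
    return sorted(n for n, size in collections.items() if size < minimum)


per_count = collections_by_item_count(item_counts)
too_small = undersized(per_count, MIN_SAMPLES_PER_COLLECTION)

print("Samples per collection:", dict(per_count))
print("Collections below the assumed minimum:", too_small)
```

With that assumed threshold, every one of these per-count collections would be too small on its own, which is the main reason I am unsure about splitting by item count.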
Thanks in advance for the help!