
Hi There,
I am trying to automate a flow that extracts the same information from each page of a multi-page document. The document is a package of engineering drawings. Every page has exactly the same layout, with the key pieces of information (drawing number, document revision, job number, etc.) in the same location, but the number of pages differs from document to document. Ideally I would like to add one row to a table for each page of the document. I am starting with Excel, but am building up my knowledge of SQL and other databases depending on what my company funds. The table will look something like this:
| DATA SOURCE --> | FROM DOCUMENT COLLECTION | FROM FILE NAME OR SIMILAR | FROM PROCESSING AI | FROM PROCESSING AI | FROM PROCESSING AI | FROM PROCESSING AI | FROM PROCESSING AI |
| COLUMN TITLE --> | Document Type | DWG Package | WO Number | CRN | Vendor DWG Number | DWG Revision | Customer Approval |
| ROW 1 | Structural Drawing | Package 1 | xxxxxx | DWG-STR-xxxxxx | D-VEN-CUST22-xxx | 0 | [initials] |
| ROW 2 | Structural Drawing | Package 1 | xxxxxx | DWG-STR-xxxxxx | D-VEN-CUST22-xxx | 1 | [initials] |
| ROW 3 | Piping Isometric | Package 2 | xxxxxx | DWG-PIP-xxxxxx | D-VEN-CUST22-xxx | 1 | [initials] |
| ROW 4 | Structural Drawing | Package 3 | xxxxxx | DWG-STR-xxxxxx | D-VEN-CUST22-xxx | 0 | [initials] |
| ROW 5 | Piping Isometric | Package 4 | xxxxxx | DWG-PIP-xxxxxx | D-VEN-CUST22-xxx | 0 | [initials] |
The red coloured headings are the fields I can reliably extract from the first page of each package with the document processing AI; these are the most important pieces of information for me to extract. The green 'Customer Approval' column would also be useful. Currently I extract this information as a table along with the yellow information below, once again only from the first page at the moment. The yellow information is not necessary, but as it's already in a table it seems easy to gather it that way. Below is an example from a drawing; black items do not need to be captured and have been redacted.
Basically, what I want the AI to do is repeat the same field processing on each page of a PDF, adding a line to the table for each page.
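To make the desired result concrete, here is a rough sketch in Python of the "one row per page" shape. It does not call AI Builder or Power Automate; `page_results` is a hypothetical stand-in for whatever the document processing model returns for each page, and the function names `build_rows` and `rows_to_csv` are made up for illustration. Only the column names come from the table above.

```python
import csv
import io

# Column order taken from the table in the question.
COLUMNS = [
    "Document Type", "DWG Package", "WO Number", "CRN",
    "Vendor DWG Number", "DWG Revision", "Customer Approval",
]


def build_rows(document_type, package_name, page_results):
    """Return one table row per page.

    page_results is a hypothetical list of dicts, one per page, holding the
    fields the document processing model extracted from that page.
    """
    rows = []
    for fields in page_results:
        # These two columns come from the collection and file name, not the AI.
        row = {"Document Type": document_type, "DWG Package": package_name}
        # Missing fields become empty cells rather than raising an error.
        for column in COLUMNS[2:]:
            row[column] = fields.get(column, "")
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text that Excel can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder values standing in for real extraction output.
example_pages = [
    {"WO Number": "xxxxxx", "CRN": "DWG-STR-xxxxxx",
     "Vendor DWG Number": "D-VEN-CUST22-xxx", "DWG Revision": "0",
     "Customer Approval": "[initials]"},
    {"WO Number": "xxxxxx", "CRN": "DWG-STR-xxxxxx",
     "Vendor DWG Number": "D-VEN-CUST22-xxx", "DWG Revision": "1"},
]
example_rows = build_rows("Structural Drawing", "Package 1", example_pages)
```

The point of the sketch is only the loop: the package-level columns are repeated on every row, while the extracted fields change page by page, so a package with any number of pages produces that many rows.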
Hi @StruggleTownAUS - thanks for sharing your use case. It looks like automating it can save you a lot of time. 🙂
When training the document processing model in AI Builder, did you tag all tables on the document as shown here? https://learn.microsoft.com/en-us/ai-builder/create-form-processing-model#multipage-tables
Another thing you can try is selecting 'Unstructured documents' on the first step of the training process: https://learn.microsoft.com/en-us/ai-builder/create-form-processing-model#select-the-type-of-document This uses a newer AI technology behind the scenes that performs better with multipage tables. Despite its name, it also works great on structured documents. 🙂