Document Classification

Posted by CC-09010818-0

I’m currently building a solution that renames all PDF documents according to a standardized naming convention, using AI Builder in Power Automate Cloud. For example, an Invoice should become Inv, a Proforma Invoice should become PI, and so on.

Please note:

Each folder contains 20 document types, and I have hundreds of different folders.
Each document type may have up to 10 variations.
Each document type can have various naming formats when inserted into the folder.
Some documents may be partially scanned PDFs, while others are digital PDFs.

Example Table of Naming Convention

Document Type	Standard Name
Invoice	Inv
Proforma Invoice	PI
Delivery Note	DN
Purchase Order	PO
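
As a rough illustration, the table above can be expressed as a simple lookup. This is only a minimal Python sketch of the idea. In Power Automate the table would more likely live in a SharePoint list or a Dataverse table, and the function name `standard_name` is an assumption made for this example.

```python
from typing import Optional

# Mapping taken from the table above: document type -> standard short name.
STANDARD_NAMES = {
    "invoice": "Inv",
    "proforma invoice": "PI",
    "delivery note": "DN",
    "purchase order": "PO",
}


def standard_name(doc_type: str) -> Optional[str]:
    """Return the short name for a document type, or None if it is not in the table."""
    # Normalize case and repeated whitespace so "Proforma  Invoice" still matches.
    key = " ".join(doc_type.lower().split())
    return STANDARD_NAMES.get(key)


print(standard_name("Proforma  Invoice"))  # PI
print(standard_name("Packing List"))       # None
```

Returning None for an unknown type (rather than guessing) leaves room to send such files to a manual check.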

Given these conditions, I would like to understand if this approach (Document classification using AI Builder) is truly feasible based on your past experience or similar use cases.

Categories:

AI Builder

Suggested answer

SwatiSTW 809 Super User 2026 Season 1

Using AI Builder's Document Classification in Power Automate Cloud is a feasible solution for standardizing PDF file names based on content, but it requires proper setup.
To achieve accurate results, you’ll need a vast and well-labeled training dataset (50–100 samples per document type), covering all layout variations.
Scanned PDFs should be preprocessed using OCR, while digital PDFs can have text extracted directly.
Because filenames are inconsistent, the model must rely on document content, not naming.
You’ll need a mapping table to convert document types to standard short codes (e.g., Invoice → Inv).
Build logic to handle low-confidence predictions (e.g., manual review if confidence < 80%).
Be mindful of AI Builder licensing and credit costs, especially at scale.
A hybrid model (AI + rule-based logic) can improve reliability in complex cases.
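
The low-confidence handling mentioned above could look roughly like the following. This is a minimal Python sketch of the decision only, not AI Builder's API. The function name `route` and the two outcome labels are assumptions, and in a real flow this would be a Condition action on the confidence score the model returns.

```python
REVIEW_THRESHOLD = 0.80  # the 80% cut-off used as the example above


def route(predicted_type: str, confidence: float) -> str:
    """Decide what to do with one prediction: rename automatically or ask a person."""
    if confidence < REVIEW_THRESHOLD:
        return "manual_review"
    return "auto_rename"


print(route("Invoice", 0.93))        # auto_rename
print(route("Delivery Note", 0.61))  # manual_review
```

The threshold is worth tuning on a reviewed sample: a higher value sends more files to manual review but lets fewer wrong renames through.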

CC-09010818-0 3

Thank you very much SwatiSTW.

1. Regarding your suggestion that I’ll need a vast and well-labeled training dataset (50–100 samples per document type), covering all layout variations:

May I know where I should select this option in AI Builder?

2. Scanned PDFs should be preprocessed using OCR, while digital PDFs can have text extracted directly.

I’m currently using Document Processing > Extract custom information from documents (Custom Model), which actually allows us to place the cursor on keywords (for both digital PDFs and scanned PDFs).

3. Because filenames are inconsistent, the model must rely on document content, not naming.

Yes, I fully agree. That’s why I have the idea below—what is your suggestion?

My approach is to train a custom AI Builder model to extract keywords (such as ‘Invoice’, ‘Customer’, ‘Supplier’) from the document content, and then use Power Automate conditions to determine the document type. For example: if the extracted text contains 'Invoice' AND 'Customer' AND 'Supplier', classify the document as Invoice. The file is then renamed based on a mapping table.
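
The keyword rule described above could be sketched as follows. This is a minimal Python illustration of the logic, not a Power Automate expression. The Invoice keywords come from the post, while the Proforma Invoice rule and the function name `classify_by_keywords` are hypothetical. One thing the sketch makes visible: a Proforma Invoice also contains the word 'Invoice', so the more specific rule has to be checked first.

```python
# Each rule: (document type, keywords that must ALL appear in the extracted text).
# Rules are checked in order, so more specific types come first.
RULES = [
    ("Proforma Invoice", ["proforma", "customer", "supplier"]),  # hypothetical rule
    ("Invoice", ["invoice", "customer", "supplier"]),            # rule from the post
]


def classify_by_keywords(text):
    """Return the first document type whose keywords are all present, else None."""
    lowered = text.lower()
    for doc_type, keywords in RULES:
        if all(keyword in lowered for keyword in keywords):
            return doc_type
    return None


print(classify_by_keywords("INVOICE No. 17\nCustomer: A\nSupplier: B"))    # Invoice
print(classify_by_keywords("Proforma Invoice\nCustomer: A\nSupplier: B"))  # Proforma Invoice
print(classify_by_keywords("Delivery Note 55"))                            # None
```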

4. You’ll need a mapping table to convert document types to standard short codes (e.g., Invoice → Inv).

Yes, a mapping table is definitely needed.

5. Build logic to handle low-confidence predictions (e.g., manual review if confidence < 80%).

Since the number of folders is large, in the future I will need to check for false positives as well as false negatives.
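
Checking for false positives and false negatives could be done on a manually reviewed sample, roughly as below. This is a minimal Python sketch in which the function name `error_counts` and the sample pairs are made up for illustration.

```python
from collections import Counter


def error_counts(pairs):
    """Count errors per type from (true_type, predicted_type) pairs.

    A misclassified file is a false positive for the predicted type
    and a false negative for the true type.
    """
    false_pos, false_neg = Counter(), Counter()
    for true_type, predicted_type in pairs:
        if true_type != predicted_type:
            false_pos[predicted_type] += 1
            false_neg[true_type] += 1
    return false_pos, false_neg


# Illustrative sample only, not real results.
reviewed = [
    ("Invoice", "Invoice"),
    ("Proforma Invoice", "Invoice"),
    ("Delivery Note", "Delivery Note"),
    ("Invoice", "Purchase Order"),
]
false_pos, false_neg = error_counts(reviewed)
print(dict(false_pos))  # {'Invoice': 1, 'Purchase Order': 1}
print(dict(false_neg))  # {'Proforma Invoice': 1, 'Invoice': 1}
```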

6. Be mindful of AI Builder licensing and credit costs, especially at scale.
A hybrid model (AI + rule-based logic) can improve reliability in complex cases.

Yes, I agree—especially when deploying using a custom model. I do have some document types with fixed formats (though not many). I think you’re suggesting using a text extraction engine like Adobe for simple text extraction, right? But the first step—identifying the document type—is still critical. Any suggestions are welcome.
