Power Platform Community, Power Automate forum

Document Classification


I’m currently building a solution in Power Automate (cloud flows) that uses AI Builder to rename all PDF documents according to a standardized naming convention. For example, an Invoice should become Inv, a Proforma Invoice should become PI, and so on.

Please note:
  • Each folder contains 20 document types, and I have hundreds of different folders.
  • Each document type may have up to 10 variations.
  • Each document type can arrive in the folder under various file-name formats.
  • Some documents may be partially scanned PDFs, while others are digital PDFs.
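Because some files are only partially scanned, one option is to check each page's text layer before deciding whether OCR is needed. The Python sketch below is illustrative only: it assumes the per-page text has already been pulled out by some PDF text extractor, and `route_pages`, `is_partially_scanned`, and the `MIN_CHARS` cut-off are hypothetical names and values to tune on real documents.

```python
# Minimal sketch: decide page by page whether a PDF page likely needs OCR.
# Assumption: each page's text layer is already available as a string;
# scanned (image-only) pages usually yield little or no text.
from dataclasses import dataclass

MIN_CHARS = 30  # hypothetical threshold: fewer visible characters -> send to OCR


@dataclass
class PageRoute:
    page_number: int  # 1-based page index
    needs_ocr: bool


def route_pages(page_texts: list[str], min_chars: int = MIN_CHARS) -> list[PageRoute]:
    """Flag pages whose text layer is (nearly) empty as OCR candidates."""
    routes = []
    for i, text in enumerate(page_texts, start=1):
        # Drop all whitespace so a page of blank lines still counts as empty.
        visible = "".join(text.split())
        routes.append(PageRoute(page_number=i, needs_ocr=len(visible) < min_chars))
    return routes


def is_partially_scanned(page_texts: list[str], min_chars: int = MIN_CHARS) -> bool:
    """True when some, but not all, pages need OCR."""
    flags = [r.needs_ocr for r in route_pages(page_texts, min_chars)]
    return any(flags) and not all(flags)


if __name__ == "__main__":
    pages = ["INVOICE No. 1234  Customer: ACME Ltd  Supplier: Foo GmbH", "   \n  ", ""]
    for route in route_pages(pages):
        print(route.page_number, "OCR" if route.needs_ocr else "text layer")
    print("partially scanned:", is_partially_scanned(pages))
```

A page with an almost empty text layer is most likely an image, so it is flagged for OCR; a file mixing flagged and unflagged pages is treated as partially scanned.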

 

Example naming convention:

Document type       Standard name
Invoice             Inv
Proforma Invoice    PI
Delivery Note       DN
Purchase Order      PO
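The table above is essentially a lookup from document type to short code. A minimal Python sketch, assuming the classifier's output uses exactly the type names in the table; `standard_name` and the `<code>_<original name>.pdf` pattern are hypothetical, since the thread does not define the full file-name format.

```python
# Minimal sketch of the mapping table: document type -> short code -> new file name.
# The "<code>_<original stem>.pdf" pattern is a hypothetical choice for illustration.
from pathlib import PurePath

NAMING_CONVENTION = {
    "Invoice": "Inv",
    "Proforma Invoice": "PI",
    "Delivery Note": "DN",
    "Purchase Order": "PO",
}


def standard_name(doc_type: str, original_file: str) -> str:
    """Build the standardized file name, or raise if the type has no mapping."""
    code = NAMING_CONVENTION.get(doc_type)
    if code is None:
        # Unknown types should go to review rather than be renamed blindly.
        raise KeyError(f"No short code for document type: {doc_type!r}")
    stem = PurePath(original_file).stem  # original name without its extension
    return f"{code}_{stem}.pdf"
```

Raising on an unmapped type, rather than guessing, keeps unknown documents out of the rename step. In a flow, the same lookup could live in a JSON object in a Compose action or in a SharePoint list; that storage choice is an assumption, not something the thread specifies.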

 

Given these conditions, I would like to know whether this approach (document classification with AI Builder) is truly feasible, based on your experience or similar use cases.

  • Suggested answer from SwatiSTW (Super User, 2026 Season 1):
    Using AI Builder's Document Classification in Power Automate cloud flows is a feasible way to standardize PDF file names based on content, but it requires careful setup.
    To achieve accurate results, you’ll need a large, well-labeled training dataset (50–100 samples per document type) that covers all layout variations.
    Scanned PDFs should be preprocessed using OCR, while digital PDFs can have text extracted directly.
    Because filenames are inconsistent, the model must rely on document content, not naming.
    You’ll need a mapping table to convert document types to standard short codes (e.g., Invoice → Inv).
    Build logic to handle low-confidence predictions (e.g., manual review if confidence < 80%).
    Be mindful of AI Builder licensing and credit costs, especially at scale.
    A hybrid model (AI + rule-based logic) can improve reliability in complex cases.
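The low-confidence rule above can be sketched as a small routing function. This assumes the classifier returns a predicted type with a confidence score between 0 and 1; `Prediction`, `route`, and `REVIEW_THRESHOLD` are hypothetical names, and 0.80 is simply the example value from the answer, not a recommended default.

```python
# Minimal sketch of confidence-based routing.
# Assumption: the classifier yields a document type plus a confidence in [0, 1].
from typing import NamedTuple

REVIEW_THRESHOLD = 0.80  # example value from the answer; tune on real results


class Prediction(NamedTuple):
    doc_type: str
    confidence: float  # assumed to be in [0, 1]


def route(prediction: Prediction, known_types: set[str],
          threshold: float = REVIEW_THRESHOLD) -> str:
    """Return 'rename' for confident, known predictions; otherwise 'review'."""
    if prediction.doc_type not in known_types:
        return "review"  # predicted type has no short code in the mapping table
    if prediction.confidence < threshold:
        return "review"  # low confidence: a person checks it before renaming
    return "rename"
```

Predictions for types missing from the mapping table are also sent to review, so the rename step only ever sees types it has a short code for.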
  • Reply from CC-09010818-0:
    Thank you very much, SwatiSTW.

    1. On your suggestion that I’ll need a large, well-labeled training dataset (50–100 samples per document type) covering all layout variations:
    Where in AI Builder do I select this option?

    2. Scanned PDFs should be preprocessed using OCR, while digital PDFs can have text extracted directly.
    I’m currently using Document processing > Extract custom information from documents (a custom model), which lets me select and tag keywords directly on the document, for both digital and scanned PDFs.

    3. Because filenames are inconsistent, the model must rely on document content, not naming.
    Yes, I fully agree. That’s why I came up with the approach below. What would you suggest?

    My approach is to train a custom AI Builder model to extract keywords (such as ‘Invoice’, ‘Customer’, and ‘Supplier’) from the document content, then use Power Automate conditions such as "if the extracted text contains 'Invoice' AND 'Customer' AND 'Supplier', classify it as Invoice" to determine the document type, and finally rename the file using the mapping table.

    4. You’ll need a mapping table to convert document types to standard short codes (e.g., Invoice → Inv).
    Yes, a mapping table is definitely needed.

    5. Build logic to handle low-confidence predictions (e.g., manual review if confidence < 80%).
    Because there are so many folders, I will also need a way to check for both false positives and false negatives going forward.

    6. Be mindful of AI Builder licensing and credit costs, especially at scale.
    A hybrid model (AI + rule-based logic) can improve reliability in complex cases.
    Yes, I agree, especially when deploying a custom model. I do have some document types with fixed formats (though not many). Are you suggesting a text extraction engine such as Adobe for the simple text extraction? Either way, the first step, identifying the document type, is still the critical one. Any suggestions are welcome.
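The keyword-rule idea in point 3, the hybrid fallback in point 6, and the false positive / false negative check in point 5 can be sketched together in Python. Everything here is an assumption for illustration: the document text is taken to be available as a plain string, and `RULES`, `classify_by_rules`, and `error_counts` are hypothetical. Rules are checked most specific first, because a Proforma Invoice also contains the word "Invoice".

```python
# Minimal sketch of keyword rules plus an error tally on a hand-labeled sample.
# Assumption: the document text (or extracted keywords) is available as a string.
from collections import Counter
from typing import Optional

# (document type, keywords that must ALL appear), most specific type first.
# Hypothetical rules for illustration only.
RULES: list[tuple[str, tuple[str, ...]]] = [
    ("Proforma Invoice", ("proforma invoice",)),
    ("Invoice", ("invoice", "customer", "supplier")),
    ("Delivery Note", ("delivery note",)),
    ("Purchase Order", ("purchase order",)),
]


def classify_by_rules(text: str) -> Optional[str]:
    """Return the first matching type, or None to hand off to the AI model or review."""
    lowered = text.lower()  # case-insensitive matching
    for doc_type, keywords in RULES:
        if all(k in lowered for k in keywords):
            return doc_type
    return None


def error_counts(pairs: list[tuple[str, Optional[str]]]) -> dict[str, Counter]:
    """Per-type tp/fp/fn counts from (true type, predicted type) pairs."""
    counts: dict[str, Counter] = {}
    for truth, predicted in pairs:
        if predicted == truth:
            counts.setdefault(truth, Counter())["tp"] += 1
            continue
        counts.setdefault(truth, Counter())["fn"] += 1  # true type was missed
        if predicted is not None:
            counts.setdefault(predicted, Counter())["fp"] += 1  # wrong type assigned
    return counts
```

Returning `None` when no rule fires is where the AI model (or manual review) would take over. Running `error_counts` over a hand-labeled sample from a few folders shows where rules collide: invoices, for instance, often quote a purchase order number, so an invoice that lacks one of the Invoice keywords can fall through to the Purchase Order rule.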
     
     
