Issue with PDF Invoice Classification Logic in Power Automate Flow

Posted on by IB-22040114-0

Hi all,

I’m building a Power Automate flow to process email attachments and store only valid invoice PDFs into SharePoint.

Current design:

- Trigger: New email with attachments
- Loop: Apply to each attachment
- PDF check
- Classification using email body text (Html-to-text)
  - Positive: invoice, tax invoice, invoice no, inv
  - Negative: quotation, proforma, credit note, delivery order, etc.
- SharePoint duplicate check (Get items)
- Store valid invoices; route others to the “Others” folder

Issue:

Classification is inconsistent in real scenarios:
- Proforma/quotation emails sometimes pass the invoice filters due to email text noise (forwarded/replied emails).
- There is no validation at the PDF/document level (only the email body is used).
- Multiple attachments and mixed email content reduce accuracy.

Question:

What is the best practice for reliable invoice vs non-invoice PDF classification in Power Automate?
Should this be:
- email text-based filtering,
- document-level OCR / AI Builder extraction,
- or a hybrid approach?
Also, how do you usually separate classification, validation, and SharePoint storage for better reliability and maintainability?

Categories:

Building flows

Suggested answer

11manish 2,829 on at

For reliable invoice processing in Power Automate, avoid relying solely on email body text because forwarded emails, replies, signatures, and mixed attachments can cause inaccurate classification.

A hybrid approach is recommended:

1. Validate the attachment (PDF, file size, etc.).
2. Extract and analyze the PDF content using one of:
   - AI Builder Invoice Processing
   - AI Builder Document Processing
   - Azure AI Document Intelligence
3. Classify the document based on invoice-specific fields such as:
   - Invoice Number
   - Supplier Name
   - Invoice Date
   - Total Amount
4. Validate the extracted data and confidence score.
5. Check for duplicates using business keys (e.g., Supplier + Invoice Number) rather than filenames.
6. Store the document in the appropriate SharePoint folder (Processed, Duplicate, Validation Failed, Others, etc.).

Recommended Flow:

Email Received -> Attachment Validation -> OCR / AI Builder Extraction -> Invoice Classification -> Business Validation -> Duplicate Check -> SharePoint Storage

Best Practice: Use email content only as a secondary indicator. For production solutions, AI Builder Invoice Processing combined with SharePoint metadata validation and duplicate detection provides the highest accuracy and maintainability.
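
To make the field-based classification, confidence check, and business-key duplicate check above concrete, here is a minimal Python sketch of the decision logic. It is not the AI Builder API: the `ExtractedField` shape, the field names, the 0.7 threshold, and the rule that a missing invoice number means "not an invoice" are all assumptions for illustration. In a flow, the same logic would be built from Condition and Compose actions.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class ExtractedField:
    """Hypothetical shape for one extracted field; real extraction outputs differ."""
    value: Optional[str]
    confidence: float  # 0.0 to 1.0


REQUIRED_FIELDS = ("invoice_number", "supplier_name", "invoice_date", "total_amount")
MIN_CONFIDENCE = 0.7  # assumed threshold; tune it against real documents


def weak_fields(fields: Dict[str, ExtractedField]) -> List[str]:
    """Return required fields that are absent, empty, or below the threshold."""
    weak = []
    for name in REQUIRED_FIELDS:
        field = fields.get(name)
        if field is None or not field.value or field.confidence < MIN_CONFIDENCE:
            weak.append(name)
    return weak


def business_key(supplier: str, invoice_number: str) -> str:
    """Build a duplicate-check key from supplier and invoice number.

    Case, spaces, and punctuation are stripped so that 'INV-001' and
    'inv 001' from the same supplier map to the same key.
    """
    def normalize(text: str) -> str:
        return re.sub(r"[^a-z0-9]", "", text.lower())

    return f"{normalize(supplier)}|{normalize(invoice_number)}"


def route(fields: Dict[str, ExtractedField], existing_keys: Set[str]) -> str:
    """Pick the target SharePoint folder for one attachment."""
    weak = weak_fields(fields)
    if "invoice_number" in weak:
        return "Others"  # assumption: no reliable invoice number means not an invoice
    if weak:
        return "Validation Failed"
    key = business_key(fields["supplier_name"].value, fields["invoice_number"].value)
    return "Duplicate" if key in existing_keys else "Processed"
```

Keeping the key normalization in one place means the duplicate check in SharePoint (Get items filtered on a key column) compares like with like, regardless of how a supplier formats its invoice numbers.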

Suggested answer

Valantis 6,286 on at

Hi @IB-22040114-0,

Email body classification alone will always be unreliable for the exact reasons you're hitting: forwarded threads, signatures, and quoted text create too much noise.

The recommended approach is hybrid, with document-level classification as the primary signal:

1. AI Builder Document Classification: train a custom model on your actual invoice PDFs vs non-invoice PDFs. This classifies based on document content, not the email. It's the most reliable approach and handles mixed attachments correctly since each PDF is classified independently. Add this after your PDF check, before the SharePoint step.

2. Keep email body as a pre-filter only: use it to exclude obvious non-invoices early (e.g. if the subject contains "quotation") but don't rely on it for positive classification.

3. Structure the flow in three separate scopes: a Classification scope (AI Builder call), a Validation scope (check that the required fields exist: invoice number, date, amount), and a Storage scope (SharePoint write). Separating these makes error handling and debugging much cleaner.

For multiple attachments: process each attachment independently inside Apply to each, not as a batch. Each PDF gets its own classification result.

AI Builder document classification requires AI Builder credits, but the accuracy improvement over text matching is significant for production use.

Ref: https://learn.microsoft.com/en-us/ai-builder/document-classification-overview
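
The three-scope structure and the per-attachment processing described above can be sketched in Python as follows. This is only an illustration of the control flow, not Power Automate or AI Builder code: `classify`, `validate`, and `store` are hypothetical callables standing in for the three scopes, and the folder names are taken from this thread.

```python
from typing import Callable, Dict, List


def process_attachments(
    attachments: List[Dict],
    classify: Callable[[bytes], str],
    validate: Callable[[bytes], bool],
    store: Callable[[str, str, bytes], None],
) -> List[Dict]:
    """Run classification, validation, and storage for each attachment.

    Each stage is a separate callable (mirroring a separate scope), and a
    failure in one attachment is recorded without stopping the others.
    """
    results = []
    for att in attachments:
        name, content = att["name"], att["content"]
        stage = "classification"
        try:
            label = classify(content)
            folder = "Others"
            if label == "invoice":
                stage = "validation"
                folder = "Processed" if validate(content) else "Validation Failed"
            stage = "storage"
            store(folder, name, content)
            results.append({"name": name, "folder": folder, "error": None})
        except Exception as exc:
            # Record which stage failed, similar to a scope's "has failed" branch.
            results.append({"name": name, "folder": None, "error": f"{stage}: {exc}"})
    return results
```

Because the failing stage is recorded per attachment, a bad PDF shows up as one failed entry in the results rather than as a failed run for the whole email.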

Best regards,

Valantis

✅ If this helped solve your issue, please Accept as Solution so others can find it quickly.

❤️ If it didn’t fully solve it but was still useful, please click “Yes” on “Was this reply helpful?” or leave a Like :).

🏷️ For follow-ups @Valantis.

📝 https://valantisond365.com/

💼 LinkedIn

▶️ YouTube

IB-22040114-0 41 on at

Hi 11manish, thank you for the advice. Here is the flow that I built based on your suggestion. For AI Builder, I wanted to use a model from AI Hub. Is that possible, and does it take much time to set up?

Suggested answer

Valantis 6,286 on at

Hi @IB-22040114-0,

The AI Builder Invoice Processing model is a prebuilt model, meaning it requires no training time at all. It's ready to use immediately from Power Automate without any model setup.

In your flow, add the Process and save invoices or Extract information from invoices action from the AI Builder connector. It handles standard invoice fields (invoice number, date, vendor, amount, line items) out of the box.

If you want to use it from AI Hub in Power Apps / make.powerapps.com, you can test it there first to see the extracted fields, but for production use in your flow just add it directly as a Power Automate action.

The only thing you need is sufficient AI Builder credits in your environment. The prebuilt invoice model consumes credits per page processed.

Best regards,

Valantis

IB-22040114-0 41 on at

Hi Valantis, I already asked my supervisor to add AI Builder credits and she approved it, but after adding the action in Power Automate, I am seeing this expression in the Process Invoice action. Any suggestions on how to fix it?

Suggested answer

Valantis 6,286 on at

Hi @IB-22040114-0,

That JSON object ({"consumptionSource":"PowerAutomate"...}) is the AI Builder billing metadata appearing in the wrong field. It means the AI Builder action input isn't correctly mapped.

The most likely cause: the AI Document file input field is receiving the wrong dynamic content. The Process Invoice action expects the file content (base64-encoded binary), not a metadata object.

Fix:
1. In the Process Invoice action, click the AI Document (file) input field.
2. Make sure you're passing the attachment content from your Apply to each loop, specifically the attachment's contentBytes field from the trigger, not the attachment metadata.
3. The expression should be something like triggerOutputs()?['body/attachments'][0]['contentBytes'], or the dynamic content item()?['contentBytes'] from inside Apply to each.

If you're getting the attachment from a Get attachment content action, pass the Body output of that action to the AI Builder file input.
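
To see the difference between attachment content and a metadata object, you can test a captured value outside the flow. The Python sketch below is only an offline sanity check, not part of Power Automate; the function name `looks_like_pdf` is made up for illustration. It decodes a base64 string and checks for the `%PDF-` header that PDF files start with.

```python
import base64
import binascii


def looks_like_pdf(content_bytes_b64: str) -> bool:
    """Return True if a base64 string decodes to data starting with the PDF header."""
    try:
        raw = base64.b64decode(content_bytes_b64, validate=True)
    except (binascii.Error, ValueError):
        # Not valid base64 at all, for example a JSON metadata object.
        return False
    return raw.startswith(b"%PDF-")
```

If a value copied from the run history fails this check, the wrong dynamic content is most likely mapped into the file input.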

Best regards,

Valantis

IB-22040114-0 41 on at

Hi Valantis, the AI Builder credits are not available after all, so for the current flow design I need to do the classification manually. Do you have any suggestions for implementing this without AI?

Suggested answer

Valantis 6,286 on at

Hi @IB-22040114-0,

Without AI Builder, you'll need to rely on rule-based classification. Here's the most reliable approach without AI credits:

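1. Extract PDF text content: get the PDF content (for example with the Get file content action), then pass it to a PDF text-extraction action such as the one in the Encodian connector (if available in your environment). This gives you document-level text rather than just the email body. If no such connector is available, you can only fall back to the HTML-to-text output you already have for the email body, which does not read the document itself.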

2. Apply keyword rules on the extracted text with scoring. Instead of a simple contains check, score each keyword hit:
- Contains 'Invoice No' or 'Invoice #' or 'INV-': +2 points
- Contains 'Total Amount' or 'Amount Due': +1 point
- Contains 'Quotation' or 'Proforma' or 'Quote': -3 points
- Score > 1: classify as invoice

3. Use both attachment name AND content. If the filename contains 'INV' or 'invoice', weight that as an additional signal.

4. For subject line pre-filtering: add a condition before the loop. If the email subject contains 'quotation' or 'proforma', route directly to Others without processing attachments.

This won't be as accurate as AI Builder but is significantly better than email body only because you're reading the actual document content.
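
The scoring rules from step 2 can be sketched in Python to check how they behave on sample text before building the Compose actions. This is a minimal sketch of the rules as listed above; the names `RULES`, `score_text`, and `is_invoice` are made up for illustration, and each keyword group counts at most once.

```python
# Each rule: (keywords, points). A group scores once if any keyword is present.
RULES = [
    (("invoice no", "invoice #", "inv-"), 2),
    (("total amount", "amount due"), 1),
    (("quotation", "proforma", "quote"), -3),
]
THRESHOLD = 1  # classify as invoice only when the score is greater than this


def score_text(text: str) -> int:
    """Sum the points of every rule group that has a keyword in the text."""
    lowered = text.lower()
    score = 0
    for keywords, points in RULES:
        if any(keyword in lowered for keyword in keywords):
            score += points
    return score


def is_invoice(text: str) -> bool:
    """Apply the 'Score > 1' decision from the rules above."""
    return score_text(text) > THRESHOLD
```

For example, text containing "Proforma Invoice No" and "Total Amount" scores 2 + 1 - 3 = 0 and is not classified as an invoice, which is the case the original email-body filter let through.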

Best regards,

Valantis

IB-22040114-0 41 on at

Hi Valantis, thanks for the advice. I have followed your suggestion and built the scoring on the email content.
Here is the flow design that I currently have:

Trigger:
When a new email arrives (V3)

↓
Compose – Subject
Compose – SenderEmail
Compose – MessageID
Compose – HTML Body
Html to Text (Body Clean)
Compose – Cleaned Body
Compose – Sender Domain

↓
Condition – Pre Filter Check
(IF email is relevant)

├── ❌ NO
│ → Compose: "No Invoice"
│ → END
│
└── ✅ YES (Continue Branch)

↓
Condition – Has Attachment?

├── ❌ NO
│ → Compose: "No Attachment"
│ → END
│
└── ✅ YES

↓
Apply to each (Attachments)

↓
Compose – Check Name
(toLower(item()?['name']))

↓
Condition – Business Document Check
(PDF filter only)

├── ❌ FALSE
│ → Compose: "Skipped (Not PDF)"
│
└── ✅ TRUE (PDF ONLY)

↓
────────────────
📊 SCORING ENGINE
────────────────

Compose – PositiveScore
Compose – NegativeScore
Compose – FinalScore

↓
Condition – Score Decision

├── ❌ LOW SCORE
│
│ ↓
│ Create File → /Non Invoice/
│
│ ↓
│ Create Item (SharePoint)
│ Status = Non Invoice
│ ConfidenceScore = FinalScore
│
│
└── ✅ HIGH SCORE

↓
───────────────────────
✅ BUSINESS VALIDATION
───────────────────────

Compose – UniqueKey

↓
Get Items (SharePoint)
(Check duplicate)

↓
Condition – Duplicate?

├── ✅ YES (Duplicate)
│
│ ↓
│ Create File → /Duplicate/
│
│ ↓
│ Create Item
│ Status = Duplicate
│ ConfidenceScore = FinalScore
│
│
└── ❌ NO

↓
Condition – Validation Check

├── ❌ FAILED
│
│ ↓
│ Create File → /Validation Failed/
│
│ ↓
│ Create Item
│ Status = Validation Failed
│ ConfidenceScore = FinalScore
│
│
└── ✅ PASSED

↓
Create File → /Processed/

↓
Create Item
Status = Processed ✅
DocumentType = Invoice
ConfidenceScore = FinalScore

I know I need to use the email subject and body to identify the attachment, but I am having a hard time classifying the PDF files themselves, which typically have names such as NVP0201_422124009.pdf, BWS1.pdf, BLOW WOLL - 260050.pdf, and PK26008979.pdf. The problem starts at the scoring engine, and when the flow runs, the file even gets skipped. Can you suggest a solution?

Suggested answer

Valantis 6,286 on at

Hi @IB-22040114-0,

The flow structure looks good. The issue is that filenames like NVP0201_422124009.pdf and BWS1.pdf contain no obvious invoice keywords, so the filename scoring gives 0 points and the files get skipped or classified as non-invoice.

For the scoring engine, shift the weight away from filename and toward the email subject and body since you don't have PDF text extraction:

For the PositiveScore compose, combine checks:
- Email subject contains 'invoice', 'inv', 'tax invoice': +3
- Email body contains 'invoice no', 'amount due', 'total amount': +2
- Sender domain is a known supplier: +1
- Filename starts with known prefixes (NVP, BWS, etc. from your suppliers): +1

For the NegativeScore:

- Subject or body contains 'quotation', 'proforma', 'quote', 'credit note': -3

For the filename specifically, you can build a known prefix list for your suppliers over time. Store them in a SharePoint list and check if the filename starts with any of those prefixes using a Get items call against your prefix list.

The key insight: with these filenames, the email context (subject, sender, body) is more reliable than the filename itself. Your pre-filter condition before the Apply to each loop is actually the most important classification step, so make sure it's doing the heavy lifting.
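
As a rough Python sketch of the email-context scoring above (not flow code), the weights below are the ones listed in this reply. The function names, the whole-word matching, and the sample supplier lists are assumptions for illustration. Whole-word matching is used because a plain contains check for 'inv' would also hit words such as "invite".

```python
import re
from typing import Iterable, Set


def has_word(text: str, phrase: str) -> bool:
    """Match a phrase on word boundaries so 'inv' does not hit 'invite'."""
    return re.search(rf"\b{re.escape(phrase)}\b", text, re.IGNORECASE) is not None


def email_score(
    subject: str,
    body: str,
    sender_domain: str,
    filename: str,
    supplier_domains: Set[str],
    filename_prefixes: Iterable[str],
) -> int:
    """Combine the positive and negative signals into one score."""
    score = 0
    if any(has_word(subject, p) for p in ("invoice", "inv", "tax invoice")):
        score += 3
    if any(has_word(body, p) for p in ("invoice no", "amount due", "total amount")):
        score += 2
    if sender_domain.lower() in supplier_domains:
        score += 1
    # Prefix list is assumed to come from a maintained list of supplier prefixes.
    prefixes = tuple(p.upper() for p in filename_prefixes)
    if filename.upper().startswith(prefixes):
        score += 1
    negatives = ("quotation", "proforma", "quote", "credit note")
    if any(has_word(f"{subject} {body}", p) for p in negatives):
        score -= 3
    return score
```

With these weights, a forwarded "Proforma Invoice" email from a known supplier with a known filename prefix still scores 3 + 1 + 1 - 3 = 2, so the decision threshold needs to be tuned against real emails rather than taken from this sketch.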

Best regards,

Valantis
