Power Automate
Invoice PDF data extraction before Power Automate, AI Builder vs external tools?

Hi everyone,
We’re setting up a flow where we receive a high volume of invoice PDFs (different suppliers, some scanned, some multi-page) and need to push structured data into Power Automate / Power Apps.
The fields we need are typical invoice fields:
• invoice number
• vendor name
• invoice date
• tax / VAT
• totals
• line items
We’re currently evaluating two approaches:
• Using AI Builder directly inside Power Automate
• Preprocessing invoices with an external invoice data extractor (for example, DigiParser or similar tools) and then sending structured JSON into Power Automate
Has anyone here tried either approach at scale?
Specifically curious about:
• reliability with different invoice layouts
• handling multi-page invoices
• cost and performance trade-offs
• whether external preprocessing simplifies flows long-term
Would appreciate hearing real-world experiences or recommendations.
Thanks!
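
For the second approach, the structured JSON handed to Power Automate might look roughly like the sketch below. The names `Invoice`, `LineItem`, and `to_payload` and all sample values are illustrative assumptions, not the output schema of DigiParser or AI Builder; the fields simply mirror the list in the post.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class LineItem:
    # Hypothetical line-item shape; real extractors may name these differently.
    description: str
    quantity: float
    unit_price: float
    amount: float


@dataclass
class Invoice:
    # Fields taken from the list in the post; the names are assumptions.
    invoice_number: str
    vendor_name: str
    invoice_date: str  # ISO 8601 date string
    tax: float
    total: float
    line_items: List[LineItem] = field(default_factory=list)


def to_payload(invoice: Invoice) -> str:
    """Serialize an invoice to the JSON text a flow would receive."""
    return json.dumps(asdict(invoice), indent=2)


# Made-up sample data, for illustration only.
example = Invoice(
    invoice_number="INV-001",
    vendor_name="Example Supplier Ltd",
    invoice_date="2024-01-31",
    tax=20.0,
    total=120.0,
    line_items=[LineItem("Widget", 2, 50.0, 100.0)],
)
payload = to_payload(example)
print(payload)
```

A flow could accept a body like this through an HTTP request trigger and read it with a Parse JSON action, so every supplier arrives in the same shape regardless of the original layout.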
  • takolota1 (Moderator)
    I’d suggest using the AI Builder / Copilot LLM prompt actions with the document input, so you don’t have to train a machine learning model for each of the different formats.
    My only concern is that when I previously tried document uploads on prompts to GPT-4, it did not recognize all characters correctly. If that is still the case, you can OCR the files first and feed the prompt a text replica of the file, as in this example template: https://community.powerplatform.com/galleries/gallery-posts/?postid=31e67eea-3f73-47b4-95b7-fe4a7b646389
  • CU19050456-1

    Hi @takolota1,

    I am currently using a similar approach to address the invoice processing requirements. However, I am facing challenges with extraction accuracy, which is currently around 55% for the overall process in production.

    The primary challenge is that we receive multiple types of documents and a large variety of invoice formats (approximately 100+ different formats/vendors). Due to these layout variations, OCR extraction quality and field mapping consistency are significantly impacted.

    I would appreciate your suggestions or recommendations regarding:

    • Improving extraction accuracy in “Get text from document”

    • Best practices for handling multiple document formats

    • Any available documentation, benchmarks, or references related to AI Builder OCR accuracy

    • Recommended approaches for scalable invoice-to-JSON extraction

    Any guidance or production learnings would be very helpful.

    Thanks in advance.
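
One way to make invoice-to-JSON extraction easier to run across many formats, whichever extractor is used, is to apply cheap consistency checks to each result and send failures to manual review. The sketch below is an assumption about how that could look, not an AI Builder feature: `find_issues`, `REQUIRED_FIELDS`, and the field names are hypothetical, and it assumes line-item amounts are net of tax.

```python
from typing import Any, Dict, List

# Hypothetical field names; adjust to whatever the extractor actually emits.
REQUIRED_FIELDS = ("invoice_number", "vendor_name", "invoice_date", "total")


def find_issues(invoice: Dict[str, Any], tolerance: float = 0.01) -> List[str]:
    """Return a list of problems found in one extracted invoice."""
    issues = []

    # Flag required fields that are absent or empty.
    for name in REQUIRED_FIELDS:
        if invoice.get(name) in (None, ""):
            issues.append(f"missing field: {name}")

    # Check that line items plus tax add up to the stated total.
    items = invoice.get("line_items") or []
    total = invoice.get("total")
    if items and isinstance(total, (int, float)):
        expected = sum(item["amount"] for item in items) + (invoice.get("tax") or 0)
        if abs(expected - total) > tolerance:
            issues.append(
                f"total mismatch: expected {expected:.2f}, got {total:.2f}"
            )
    return issues


# Made-up extraction results, for illustration only.
good = {
    "invoice_number": "INV-001",
    "vendor_name": "Example Supplier Ltd",
    "invoice_date": "2024-01-31",
    "tax": 20.0,
    "total": 120.0,
    "line_items": [{"amount": 100.0}],
}
bad = dict(good, vendor_name="", total=210.0)

print(find_issues(good))
print(find_issues(bad))
```

Invoices that return an empty list could continue through the flow automatically, while the rest go to a review queue. Tracking which checks fail per vendor also gives a rough view of where accuracy is lost.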
