Invoice PDF data extraction before Power Automate: AI Builder vs. external tools?


Posted by CU25120923-0

Hi everyone,
We’re setting up a flow where we receive a high volume of invoice PDFs (different suppliers, some scanned, some multi-page) and need to push structured data into Power Automate / Power Apps.
The fields we need are typical invoice fields:
- invoice number
- vendor name
- invoice date
- tax / VAT
- totals
- line items
We’re currently evaluating two approaches:
1. Using AI Builder directly inside Power Automate
2. Preprocessing invoices with an external invoice data extractor (for example, tools like DigiParser or similar) and then sending structured JSON into Power Automate
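To make the second approach concrete, here is a rough sketch of what "sending structured JSON into Power Automate" could look like. It assumes the flow starts with a "When an HTTP request is received" trigger. The key names, the `Invoice` and `LineItem` classes, and the `flow_url` parameter are all hypothetical, chosen only to mirror the field list above, and are not the output format of any particular extractor.

```python
import json
import urllib.request
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class LineItem:
    description: str
    quantity: float
    unit_price: float
    amount: float


@dataclass
class Invoice:
    # Hypothetical key names mirroring the fields listed in the question.
    invoice_number: str
    vendor_name: str
    invoice_date: str  # ISO 8601 date string, e.g. "2025-01-31"
    tax: float
    total: float
    line_items: List[LineItem] = field(default_factory=list)


def to_payload(invoice: Invoice) -> bytes:
    """Serialize the invoice (including nested line items) to UTF-8 JSON."""
    return json.dumps(asdict(invoice)).encode("utf-8")


def post_to_flow(flow_url: str, invoice: Invoice) -> int:
    """POST the invoice to the flow's HTTP trigger and return the HTTP status.

    flow_url is a placeholder for the URL that the trigger generates.
    """
    request = urllib.request.Request(
        flow_url,
        data=to_payload(invoice),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


example = Invoice(
    invoice_number="INV-001",
    vendor_name="Example Supplier Ltd",
    invoice_date="2025-01-31",
    tax=20.0,
    total=120.0,
    line_items=[LineItem("Widget", 2, 50.0, 100.0)],
)
```

With a fixed payload shape like this, the flow only needs one "Parse JSON" step regardless of supplier, which is the main way external preprocessing might simplify the flow.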
Has anyone here tried either approach at scale?
Specifically curious about:
- reliability with different invoice layouts
- handling multi-page invoices
- cost and performance trade-offs
- whether external preprocessing simplifies flows long-term
Would appreciate hearing real-world experiences or recommendations.
Thanks!

Categories:

AI Builder

takolota1 (Moderator)


I’d suggest using the AI Builder / Copilot LLM prompt actions with the document input, so you don’t have to train a machine learning model for each of the different formats.

My only concern is that when I previously tried document uploads on prompts to GPT-4, it did not recognize all characters correctly. If that is still the case, you can OCR the files and feed the prompt a text replica of the file, as in this example template: https://community.powerplatform.com/galleries/gallery-posts/?postid=31e67eea-3f73-47b4-95b7-fe4a7b646389
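As a rough illustration of the OCR-then-prompt idea (not the linked template itself), the sketch below builds a prompt from already-extracted OCR text and parses the model's reply defensively. The field names, the prompt wording, and the helpers `build_prompt` and `parse_reply` are assumptions made for illustration; the OCR step and the model call are left out because they depend on the tools you use.

```python
import json

# Hypothetical output keys; adjust to whatever your flow expects.
FIELDS = [
    "invoice_number",
    "vendor_name",
    "invoice_date",
    "tax",
    "total",
    "line_items",
]


def build_prompt(ocr_text: str) -> str:
    """Wrap OCR text in instructions asking for one JSON object."""
    field_list = ", ".join(FIELDS)
    return (
        "Extract the following fields from the invoice text and reply with "
        f"a single JSON object using exactly these keys: {field_list}. "
        "Use null for any field that is not present.\n\n"
        "Invoice text:\n"
        f"{ocr_text}"
    )


def parse_reply(reply: str) -> dict:
    """Parse the model's reply, tolerating extra text around the JSON object."""
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(reply[start:end + 1])
    # Keep only the expected keys; missing ones become None.
    return {key: data.get(key) for key in FIELDS}
```

Filling missing keys with `None` rather than failing keeps the downstream flow schema stable, and makes it easy to route incomplete invoices to manual review.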

CU19050456-1

Hi @takolota1,

I am currently using a similar approach to address the invoice processing requirements. However, I am facing challenges with extraction accuracy, which is currently around 55% for the overall process in production.

The primary challenge is that we receive multiple types of documents and a large variety of invoice formats (approximately 100+ different formats/vendors). Due to these layout variations, OCR extraction quality and field mapping consistency are significantly impacted.
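One way to narrow down where an overall figure like this comes from is to break accuracy down by vendor and by field against a small hand-labeled sample. The sketch below is only an illustration: the sample structure and the helpers `field_accuracy` and `document_accuracy` are hypothetical, and exact-match comparison is an assumed (and strict) definition of "correct".

```python
from collections import defaultdict


def field_accuracy(samples):
    """Return {(vendor, field): fraction of exact matches}.

    samples is a list of (vendor, expected, extracted) tuples, where
    expected and extracted are dicts of field name to value.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for vendor, expected, extracted in samples:
        for name, value in expected.items():
            key = (vendor, name)
            total[key] += 1
            if extracted.get(name) == value:
                correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


def document_accuracy(samples):
    """Fraction of documents where every expected field matched exactly."""
    if not samples:
        return 0.0
    fully_correct = sum(
        1
        for _, expected, extracted in samples
        if all(extracted.get(name) == value for name, value in expected.items())
    )
    return fully_correct / len(samples)


samples = [
    ("VendorA", {"invoice_number": "A-1", "total": 10.0},
                {"invoice_number": "A-1", "total": 10.0}),
    ("VendorA", {"invoice_number": "A-2", "total": 20.0},
                {"invoice_number": "A-2", "total": 2.0}),
    ("VendorB", {"invoice_number": "B-1", "total": 5.0},
                {"invoice_number": "B1", "total": 5.0}),
]
by_field = field_accuracy(samples)
overall = document_accuracy(samples)
```

A breakdown like this shows whether the errors are concentrated in a few vendors or a few fields, which in turn suggests whether to fix OCR quality, prompt wording, or field mapping first.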

I would appreciate your suggestions or recommendations regarding:

- Improving extraction accuracy in “Get text from document”
- Best practices for handling multiple document formats
- Any available documentation, benchmarks, or references related to AI Builder OCR accuracy
- Recommended approaches for scalable invoice-to-JSON extraction

Any guidance or production learnings would be very helpful.

Thanks in advance.
