Need Suggestions for Scalable Invoice Processing Architecture in Power Platform


Hi Everyone,

I’m currently working on an invoice automation and reconciliation solution using Power Automate + AI Builder/OCR, where invoices are processed and converted into structured JSON for downstream SAP reconciliation.

We are facing major challenges because invoices come from more than 1,000 vendors, and every vendor has a different invoice layout, table structure, font style, alignment, tax section, and page format.

Our current production accuracy is around 60%, which is creating significant manual validation effort.

Some common issues we are facing:

  • Incorrect extraction of invoice number, GSTIN, dates, and totals
  • Line item tables breaking across pages
  • Multi-line descriptions shifting columns
  • Different tax structures (CGST/SGST/IGST/TDS)
  • Low-quality scans and image-based PDFs
  • OCR reading incorrect numeric values
  • Header/footer duplication on multi-page invoices

One critical issue:
In some invoices, the OCR/model reads quantities incorrectly. For example, it confuses digits such as 5 and 6, and we see several similar numeric recognition errors.

This causes reconciliation failures and incorrect invoice posting.

Currently, we are mainly using:

  • AI Builder “Get text from document”
  • Power Automate parsing
  • JSON mapping logic

However, since this is primarily OCR-based extraction and not vendor-trained document understanding, accuracy varies heavily depending on invoice quality and layout.

I would appreciate guidance or suggestions on:

  • Best approach for handling 1000+ vendor invoice formats
  • Improving OCR accuracy in production
  • Preprocessing techniques before AI Builder
  • Better approaches for table extraction
  • Whether Azure Document Intelligence/Form Recognizer performs better than standard AI Builder OCR
  • Hybrid approaches using OCR + LLM + rule engine
  • Confidence-score-based validation strategies
  • Recommended architecture for scalable invoice-to-JSON extraction

Additionally, has anyone worked with ABBYY for enterprise invoice OCR scenarios?

  • Is ABBYY suitable for improving OCR reading accuracy in large-scale invoice processing?
  • How does ABBYY compare with Azure Document Intelligence/Form Recognizer for numeric field accuracy, table extraction, and low-quality scanned invoices?
  • Has anyone achieved better production accuracy using ABBYY in hybrid architectures with Power Platform or SAP integrations?

Also, if anyone has articles, documentation, benchmarks, or production learnings related to AI Builder extraction accuracy and large-scale invoice processing, that would be extremely helpful.

Would really appreciate recommendations and insights from teams handling enterprise-scale invoice processing systems.

Thanks in advance.

  • Suggested answer
    Vish WR
     

    Hi,

    60% accuracy with 1000+ vendor formats is expected when using AI Builder general OCR. It was never designed for this scale. The problem is the tool choice, not your implementation.
     
    Switch to Azure Document Intelligence
    The prebuilt invoice model handles multi-page invoices, broken line item tables, and mixed tax structures like CGST/SGST/IGST much better. No training needed to get started. For vendors with unique layouts, train a custom model with just 5 labelled samples and accuracy typically jumps to 85-90% for those vendors.
     
    Numeric Confusion like 5 vs 6
    This is almost always a scan quality issue. Preprocess your PDFs before extraction: increase the DPI, deskew, and improve contrast. Set a confidence threshold, and route anything below it to manual review rather than straight to SAP.
     
    On ABBYY
    Strong for numeric accuracy on poor-quality scans, but it comes with heavy licensing cost and integration overhead. Azure Document Intelligence gives comparable results if you are already on the Microsoft stack.
     
    Suggested Flow
    Invoice arrives → Preprocess image → Azure Document Intelligence
    → Confidence check → High: JSON to SAP / Low: Manual review
    → Power Automate orchestrates throughout
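
The confidence check in the flow above could look roughly like the sketch below. It assumes each extracted field arrives with a confidence score between 0 and 1. The `Field` class, the `route` function, the route labels, and the 0.90 threshold are all illustrative assumptions, not part of any Microsoft API, and the threshold would need tuning against your own data.

```python
from dataclasses import dataclass


@dataclass
class Field:
    """One extracted value plus the extractor's confidence (hypothetical shape)."""
    name: str
    value: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


# Assumed starting point; tune per field and per vendor in practice.
REVIEW_THRESHOLD = 0.90


def route(fields, threshold=REVIEW_THRESHOLD):
    """Return (route, low_confidence_field_names).

    Any field below the threshold sends the whole invoice to manual review,
    so a doubtful quantity never reaches SAP unchecked.
    """
    if not fields:
        # Nothing was extracted at all: never auto-post an empty result.
        return "manual_review", []
    low = [f.name for f in fields if f.confidence < threshold]
    if low:
        return "manual_review", low
    return "post_to_sap", []


if __name__ == "__main__":
    sample = [Field("InvoiceId", "INV-1", 0.99), Field("Quantity", "5", 0.62)]
    print(route(sample))
```

Returning the names of the low-confidence fields lets a review screen highlight exactly what the reviewer needs to check.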
     
    https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence
    https://learn.microsoft.com/en-us/ai-builder/form-processing-model-overview
     
    Vishnu WR
     
     
  • CU19050456-1

    Hi @Vish WR,

    I am trying to use Azure AI Document Intelligence in Power Automate.

    I created the Azure resource successfully and have the endpoint and API key. However, in Power Automate I am only seeing the “HTTP Premium Connector” option instead of a direct Azure AI Document Intelligence/Form Recognizer connector.

    I already have a Power Automate Premium license, but I want to understand:

    1. Is the HTTP connector the recommended way to integrate Azure AI Document Intelligence?
    2. Why is the dedicated connector not visible in my environment?
    3. Are there any licensing or regional limitations for the connector?
    4. What is the best practice for processing invoices/documents using Document Intelligence inside Power Automate?

    Currently I am planning to:

    • Send document using HTTP POST
    • Use Analyze API
    • Parse JSON response

    Any suggestions, documentation references, or recommended architecture would be very helpful.

    Thanks!
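
The three planned steps (HTTP POST, Analyze API, parse the JSON response) can be sketched outside Power Automate to show the shape of the request and response. This is a minimal sketch under assumptions: the URL path, the `2024-11-30` API version, and the `content` and `confidence` keys reflect the Document Intelligence REST API as I understand it and should be checked against the current documentation. The function names are hypothetical. The same URL, headers, and body would go into an HTTP action in a flow.

```python
import json
import urllib.request

API_VERSION = "2024-11-30"     # assumed version; confirm for your resource
MODEL_ID = "prebuilt-invoice"  # the prebuilt invoice model


def build_analyze_request(endpoint, api_key, document_url):
    """Build (but do not send) the POST that starts an analysis."""
    url = (
        f"{endpoint.rstrip('/')}/documentintelligence/documentModels/"
        f"{MODEL_ID}:analyze?api-version={API_VERSION}"
    )
    body = json.dumps({"urlSource": document_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "application/json",
    }
    # The service is expected to answer 202 with an Operation-Location header.
    # Poll that URL with GET (same key header) until "status" is "succeeded".
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def extract_fields(result):
    """Flatten the first analyzed document into {name: (content, confidence)}."""
    documents = result.get("analyzeResult", {}).get("documents", [])
    if not documents:
        return {}
    fields = documents[0].get("fields", {})
    return {
        name: (field.get("content"), field.get("confidence"))
        for name, field in fields.items()
    }


if __name__ == "__main__":
    # Hand-written sample of a completed result, for illustration only.
    sample = {
        "status": "succeeded",
        "analyzeResult": {
            "documents": [
                {"fields": {"InvoiceId": {"content": "INV-1", "confidence": 0.97}}}
            ]
        },
    }
    print(extract_fields(sample))
```

Keeping the request construction separate from the response parsing makes it easier to test the parsing against saved responses before wiring it into a flow.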

  • Nikki

    We ran into a very similar challenge and ultimately solved it by implementing a dual-extraction architecture instead of relying on a single method. We built two parallel approaches—AI Builder (OCR-based extraction) and an AI Prompt/LLM-based extraction that we introduced later for better handling of variability and context. I then daisy-chained the two together so that the AI Prompt runs first, and if it fails, the flow automatically falls back to AI Builder OCR. This was a huge game changer. Much stronger coverage across a wide range of invoice formats without needing to maintain vendor-specific templates. I initially deployed this solution in November 2024, and after introducing the AI Prompt enhancement in August 2025, we saw a significant improvement in performance.
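
The daisy-chaining described above, where one extractor runs first and the next takes over if it fails, is a plain fallback chain. The sketch below shows that pattern only. The extractor functions are stand-ins, and `extract_with_fallback` is a hypothetical name, not the poster's actual flow.

```python
from typing import Callable, Optional

# An extractor takes raw document bytes and returns a dict, or None on failure.
Extractor = Callable[[bytes], Optional[dict]]


def extract_with_fallback(
    document: bytes, extractors: list[tuple[str, Extractor]]
) -> tuple[Optional[str], Optional[dict]]:
    """Try each extractor in order; return (name, result) for the first success.

    An exception or an empty result both count as failure, so the next
    extractor in the chain gets a chance.
    """
    for name, extractor in extractors:
        try:
            result = extractor(document)
        except Exception:
            result = None
        if result:
            return name, result
    return None, None


if __name__ == "__main__":
    def ai_prompt_stub(doc: bytes) -> Optional[dict]:
        raise TimeoutError("simulated failure of the primary extractor")

    def ocr_stub(doc: bytes) -> Optional[dict]:
        return {"invoice_number": "INV-1"}

    chain = [("ai_prompt", ai_prompt_stub), ("ocr", ocr_stub)]
    print(extract_with_fallback(b"%PDF-", chain))
```

Returning the name of the extractor that succeeded makes it possible to report later how often the fallback path is actually used.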

    This solution is running at scale in production, processing over 14,000 emails/invoices per month across a large and constantly changing vendor base. At this point, the number of vendors or invoice formats doesn’t materially impact the flow, as the architecture is designed to handle variability without requiring vendor-specific customization. The entire process is triggered by a new email received in our AP mailbox with an attachment, and from there, invoices are extracted using the AI Prompt with OCR as a fallback and standardized into structured JSON.

    The biggest value comes after extraction—we rely heavily on a robust validation layer rather than trusting OCR output alone. This includes required field checks (invoice number, totals, dates), logical validations (such as totals matching line items), and business rules (like tax structure expectations and known formatting patterns). Based on these results, invoices are automatically routed into three paths: high-confidence invoices are auto-approved and posted, clearly invalid invoices are auto-rejected, and anything in between is routed to a Canvas App where users can review, correct, and approve or reject as needed before final upload to the system.
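
A validation layer with three-way routing like the one described above might be sketched as follows. Everything specific here is an assumption made for illustration: the field names, the rule that a missing required field counts as "clearly invalid", the 0.01 rounding tolerance, and the 0.90 confidence threshold. Real business rules would replace them.

```python
from decimal import Decimal, InvalidOperation

REQUIRED_FIELDS = ("invoice_number", "invoice_date", "total")  # assumed names
TOLERANCE = Decimal("0.01")  # assumed rounding tolerance


def to_decimal(value):
    """Parse a number that may arrive as text; return None if unparseable."""
    try:
        return Decimal(str(value))
    except (InvalidOperation, ValueError):
        return None


def validate(invoice):
    """Return a list of problems; an empty list means all checks passed."""
    errors = [f"missing {k}" for k in REQUIRED_FIELDS if not invoice.get(k)]
    if errors:
        return errors
    amounts = [to_decimal(i.get("amount")) for i in invoice.get("line_items", [])]
    total = to_decimal(invoice["total"])
    tax = to_decimal(invoice.get("tax", "0"))
    if total is None or tax is None or None in amounts:
        return ["unparseable amount"]
    # Logical check: line items plus tax should add up to the stated total.
    if abs(sum(amounts, Decimal("0")) + tax - total) > TOLERANCE:
        errors.append("total does not match line items plus tax")
    return errors


def decide(invoice, min_confidence, threshold=0.90):
    """Route to 'auto_approve', 'auto_reject', or 'review'."""
    errors = validate(invoice)
    if any(e.startswith("missing") for e in errors):
        return "auto_reject"  # assumption: missing required data is clearly invalid
    if errors or min_confidence < threshold:
        return "review"
    return "auto_approve"


if __name__ == "__main__":
    invoice = {
        "invoice_number": "INV-1", "invoice_date": "2025-01-05",
        "total": "118.00", "tax": "18.00",
        "line_items": [{"amount": "60.00"}, {"amount": "40.00"}],
    }
    print(decide(invoice, min_confidence=0.97))
```

An arithmetic check of this kind also catches many single-digit misreads, since a wrong amount usually breaks the sum even when the extractor reports high confidence.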

    We also store all processed invoice data in a Dataverse table that is connected to a Power BI report, which allows us to monitor performance, track trends, and easily reference historical invoices. From a quality standpoint, every invoice that has been questioned has ultimately proven to have been processed correctly by the system—it’s typically user error rather than an issue with the automation.

    Since implementing this architecture, we’ve improved from roughly a 60% auto-processing rate to about 85%, with the remaining ~15% routed to the review queue in the Canvas App. What made the biggest difference wasn’t just improving OCR accuracy—it was combining multiple extraction methods, layering strong validation logic, and implementing intelligent routing. That combination is what allowed us to scale effectively while maintaining high accuracy and significantly reducing manual effort.

    Also worth noting is that many of our invoices include handwritten text used for special coding. The AI Prompt is able to accurately interpret this handwritten content and evaluate it against predefined business rules. The extracted values are dynamically passed through the flow and validated as part of the overall decisioning process.
