Power Automate
Unanswered

Help creating an automated workflow - extracting data from .pdfs


Hi all,
This is a very specific question, but I hope someone may be able to help with finding a solution. I feel like Power Automate might be able to do this but, as I'm a novice at this type of automation, I'd be happy to get feedback from the members here.


I work for a maintenance contractor, and my job involves receiving PDF "work orders" from 6-7 clients. Each client formats its PDF differently, but they all (more or less) contain the same information. It's just the layout that differs.
I'm seeking an automated solution to extract key pieces of data from the PDFs, then create a new PDF with solely the extracted data included... Essentially this would get rid of all the 'fat' and output a basic, trim version of the original document.
The types of fields we'd be extracting are relatively short/basic. Things like "Name", "Reference Number", "Address", "Job Description," etc. 
 
In an ideal world, my workflow would go something like this:
1. I email the original PDF "work order" to an email address/inbox.
2. Data is automatically extracted from the PDF.
3. A new PDF is compiled as a standardised document containing only the extracted information.
4. The new PDF is emailed back to me.
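
To make steps 2 and 3 concrete, here is a minimal sketch in Python of the extract-then-recompile idea, assuming the PDF text has already been pulled out as plain text. The client names (`client_a`, `client_b`) and every label pattern (`WO Ref:`, `Site Address:`, and so on) are hypothetical placeholders, not taken from any real work order; each real client layout would need its own patterns. Receiving the email, producing the PDF, and sending the reply are not shown.

```python
import re

# Hypothetical label patterns: one set per client layout (assumed labels).
CLIENT_PATTERNS = {
    "client_a": {
        "Name": r"Customer Name:\s*(.+)",
        "Reference Number": r"WO Ref:\s*(\S+)",
        "Address": r"Site Address:\s*(.+)",
        "Job Description": r"Description of Works:\s*(.+)",
    },
    "client_b": {
        "Name": r"Contact\s*-\s*(.+)",
        "Reference Number": r"Order No\.?\s*(\S+)",
        "Address": r"Location\s*-\s*(.+)",
        "Job Description": r"Job\s*-\s*(.+)",
    },
}

# Order in which fields appear in the standardised output.
FIELD_ORDER = ["Name", "Reference Number", "Address", "Job Description"]


def extract_fields(text, client):
    """Return a dict of field -> value (None when a label is not found)."""
    fields = {}
    for field, pattern in CLIENT_PATTERNS[client].items():
        match = re.search(pattern, text)
        # "." does not match newlines, so each capture stops at end of line.
        fields[field] = match.group(1).strip() if match else None
    return fields


def render_summary(fields):
    """Build the trimmed, standardised document as plain text."""
    lines = ["WORK ORDER SUMMARY"]
    for field in FIELD_ORDER:
        value = fields.get(field) or "NOT FOUND"
        lines.append(f"{field}: {value}")
    return "\n".join(lines)
```

Marking missing fields as `NOT FOUND` rather than leaving them blank is a deliberate choice here: at around 200 documents a month, a visible gap is easier to catch than a silently empty line.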

Does anyone have any suggestions as to how this could be carried out? Is Power Automate (or anything else) suitable for this type of task?
 
FYI. We'd be doing this around 200 times per month, on a job by job basis, so it's not a huge amount of data to process... It's just boring to do manually 🙂 
 
Thanks in advance for your help.
 
R.

  • takolota1 (Moderator)

    AI Builder has some document processing solutions. However, I don’t like it for handling many different formats.

     

    For many different formats, I use a flow that extracts the text from each document, then passes the text to a GPT prompt:

    https://powerusers.microsoft.com/t5/Power-Automate-Cookbook/Extract-Data-From-PDFs-and-Images-With-GPT/td-p/2201345
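
As a rough illustration of the "extract the text, then pass it to a GPT prompt" approach, the Python sketch below builds a prompt that asks for the fields as JSON and parses the reply. It is not the flow from the linked post: the prompt wording, the field list, and the helper names `build_prompt` and `parse_reply` are assumptions for illustration, and the call to the model itself is deliberately left out.

```python
import json

# Fields taken from the original question; adjust to suit.
FIELDS = ["Name", "Reference Number", "Address", "Job Description"]


def build_prompt(document_text):
    """Compose a prompt asking for the fields as one JSON object."""
    field_list = ", ".join(f'"{f}"' for f in FIELDS)
    return (
        "Extract the following fields from the work order text below. "
        f"Reply with a single JSON object with exactly these keys: {field_list}. "
        "Use null for any field that is not present.\n\n"
        "--- WORK ORDER TEXT ---\n"
        f"{document_text}"
    )


def parse_reply(reply_text):
    """Pull the JSON object out of a model reply and keep only known fields."""
    # Replies may wrap the JSON in extra prose, so take the outermost braces.
    start = reply_text.find("{")
    end = reply_text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(reply_text[start:end + 1])
    # Missing keys become None so downstream steps see a fixed shape.
    return {field: data.get(field) for field in FIELDS}
```

Because the same prompt is used for every client, a new layout needs no new pattern or training, which is the main appeal when formats vary a lot.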

  • RakeshS

    Thanks for your reply, Takolota.

    That could certainly be an option. All the text in the original PDFs is typed, so it should be fairly easy for OCR to read. The highest-risk fields would be ones such as "Reference Number", where errors may not be immediately obvious. I'll look into this further.
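
One possible guard for fields like the reference number, sketched in Python: check that each extracted value appears verbatim in the text it was extracted from, and flag anything that does not for manual review. The function names are hypothetical, and this is only a sanity check under the assumption that the extracted text itself is accurate.

```python
import re


def normalise(text):
    """Collapse runs of whitespace and ignore case."""
    return re.sub(r"\s+", " ", text).strip().lower()


def appears_in_source(value, source_text):
    """True if the extracted value occurs verbatim in the source text."""
    if not value:
        return False
    return normalise(value) in normalise(source_text)


def fields_needing_review(fields, source_text):
    """List the field names whose values are missing or not in the source."""
    return [
        name
        for name, value in fields.items()
        if not appears_in_source(value, source_text)
    ]
```

This catches values that were altered or invented after the text was read (for example by a GPT step), but it cannot catch a misread character in the OCR text itself, since the error would be present on both sides of the comparison.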

     

    How would the AI Builder workflow go?

    As there are only 6 different PDF formats, would it be too tedious to train 6 separate models, each on one client's PDFs only?

     

    Thanks again,

     

    R.

  • takolota1 (Moderator)

    @RakeshS 

    Oh, if you only have 6 possible formats, then AI Builder document processing shouldn’t be too bad. I believe there are some options to train a single model on a few different formats.

  • ARB_wcc (Super User 2024 Season 1)

    You can easily train a single custom model with 6 "document collections" inside it.

     

    Alternatively, @takolota's suggestion can be quite effective: run all the documents through OCR and build a GPT prompt to review and transform the data the way you want.

     

    As for the PDF creation part, I'm not sure how that would work. You will probably need to rely on third-party connectors or services within your flow.
