Help creating an automated workflow - extracting data from .pdfs


Hi all,
This is a very specific question, but I hope someone may be able to help with finding a solution. I feel like Power Automate might be able to do this but, as I'm a novice at this type of automation, I'd be happy to get feedback from the members here.


I work for a maintenance contractor, and my job involves receiving PDF "work orders" from 6-7 clients. Each client's PDF has a different format, but they all (more or less) contain the same information. It's just the layout that differs.
I'm seeking an automated solution to extract key pieces of data from the PDFs, then create a new PDF with solely the extracted data included... Essentially this would get rid of all the 'fat' and output a basic, trim version of the original document.
The types of fields we'd be extracting are relatively short/basic. Things like "Name", "Reference Number", "Address", "Job Description," etc. 
 
In an ideal world, my workflow would go something like this:
1. I email the original PDF 'work order' to an email address/inbox
2. Data is automatically extracted from PDF.
3. A new PDF is compiled with only the extracted information, into a standardised document
4. The new PDF is emailed back to me.
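Steps 2 and 3 above can be sketched in a few lines of Python, assuming the PDF's text layer has already been extracted to a plain string. Everything named here (`FIELD_ALIASES`, `extract_fields`, `render_summary`, and the label spellings) is hypothetical and only for illustration; real work orders will need their own aliases, and the final PDF rendering and the email steps are left out.

```python
import re

# Hypothetical label aliases: each client may name the same field differently.
FIELD_ALIASES = {
    "Name": ["Name", "Contact", "Customer"],
    "Reference Number": ["Reference Number", "Ref No", "Work Order No"],
    "Address": ["Address", "Site Address"],
    "Job Description": ["Job Description", "Description of Works"],
}


def extract_fields(text):
    """Return {field: value or None} from 'Label: value' style lines."""
    fields = {}
    for field, aliases in FIELD_ALIASES.items():
        labels = "|".join(re.escape(alias) for alias in aliases)
        # A line that starts with one of the labels, then ':' or '-', then the value.
        pattern = rf"^\s*(?:{labels})\s*[:\-]\s*(.+?)\s*$"
        match = re.search(pattern, text, flags=re.IGNORECASE | re.MULTILINE)
        fields[field] = match.group(1) if match else None
    return fields


def render_summary(fields):
    """Build the trimmed, standardised document as plain text."""
    lines = ["WORK ORDER SUMMARY"]
    for field in FIELD_ALIASES:
        lines.append(f"{field}: {fields.get(field) or 'NOT FOUND'}")
    return "\n".join(lines)


sample = (
    "ACME Facilities\n"
    "Work Order No: WO-1234\n"
    "Contact: Jane Doe\n"
    "Site Address: 12 High St\n"
    "Description of Works: Replace tap\n"
)
print(render_summary(extract_fields(sample)))
```

Marking missing fields as "NOT FOUND" rather than leaving them blank makes a failed extraction obvious in the output document.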

Does anyone have any suggestions as to how this could be carried out? Is Power Automate (or anything else) suitable for this type of task?
 
FYI. We'd be doing this around 200 times per month, on a job by job basis, so it's not a huge amount of data to process... It's just boring to do manually 🙂 
 
Thanks in advance for your help.
 
R.

Thanks in advance for any help you can provide.
  • takolota1 (Moderator)

    AI Builder will have some document processing solutions. However, I don’t like it for trying to handle many different formats.

     

    For many different formats, I use a flow that extracts the text from each document, then passes the text to a GPT prompt

    https://powerusers.microsoft.com/t5/Power-Automate-Cookbook/Extract-Data-From-PDFs-and-Images-With-GPT/td-p/2201345
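As a rough illustration of the "extract text, then prompt" idea (not the linked flow itself), the prompt can ask for a fixed set of keys in JSON so the reply is easy to check. The names `FIELDS`, `build_prompt`, and `parse_reply` are hypothetical, and the call to the model is deliberately left out, since it depends on which service or connector is used.

```python
import json

# Fields named in the original question; adjust to suit.
FIELDS = ["Name", "Reference Number", "Address", "Job Description"]


def build_prompt(document_text, fields=FIELDS):
    """Compose an extraction prompt that asks for a JSON object with fixed keys."""
    field_list = "\n".join(f"- {field}" for field in fields)
    return (
        "Extract the following fields from the work order below.\n"
        "Reply with a single JSON object using exactly these keys; "
        "use null for any field you cannot find.\n"
        f"{field_list}\n\n"
        "Work order text:\n"
        f"{document_text}"
    )


def parse_reply(reply_text, fields=FIELDS):
    """Parse the model's reply and fail loudly if an expected key is missing."""
    data = json.loads(reply_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [field for field in fields if field not in data]
    if missing:
        raise ValueError(f"reply is missing keys: {missing}")
    return {field: data[field] for field in fields}
```

Rejecting a reply with missing keys, instead of silently filling gaps, keeps bad extractions from reaching the output document.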

  • RakeshS

    Thanks for your reply, takolota.

    That could certainly be an option. All the text in the original PDFs is typed, so it should be pretty easy to read with OCR. Obviously, the highest-risk fields would be ones such as "reference number", where errors may not be immediately obvious. I'll look into this further.

     

    How would the AI Builder workflow go?

    As there are only 6 different pdf formats, would it be too tedious to train 6 different models with their specific pdfs only?

     

    Thanks again,

     

    R.
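On the reference-number risk raised above, one cheap safeguard is to confirm that every extracted value appears verbatim in the text it was taken from, and to flag anything that does not for a manual look. This is only a sketch with hypothetical names (`normalise`, `unverified_fields`). It catches values that an extraction step altered or invented; it cannot catch a misread that is already present in the OCR text itself.

```python
import re


def normalise(text):
    """Collapse whitespace and ignore case so layout differences don't matter."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def unverified_fields(fields, source_text):
    """Return the names of fields whose value is empty or absent from the source text."""
    haystack = normalise(source_text)
    return [
        name
        for name, value in fields.items()
        if not value or normalise(str(value)) not in haystack
    ]


source = "Ref No: WO-1234\nContact: Jane   Doe"
extracted = {"Reference Number": "WO-1284", "Name": "Jane Doe"}
print(unverified_fields(extracted, source))
```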

  • takolota1 (Moderator)

    @RakeshS 

    Oh if you only have 6 potential different formats then the AI Builder document processing shouldn’t be too bad. I believe there are some options to train a single model on a few different formats.

  • ARB_wcc (Super User 2024 Season 1)

    You can easily train a single custom model with 6 "document collections" inside it.

     

    Alternatively, @takolota's suggestion can be quite effective: run all the docs through OCR and build a GPT prompt to review and transform the data in the way you want.

     

    As for the PDF creation part, I'm not sure how that will work. You will probably need to rely on third-party connectors or services within your flow.
