Power Automate - AI Builder
Answered

Extracting data from semi-structured pdf?

Posted on by 6
Hi all, 
 
I wanted to see if anyone has run into this problem before. I have a set of PDFs which are semi-structured (attached), and I'm hoping to extract data from the 'pesticide production information' section. The issue is that each PDF can have a varying number of pages and a varying number of sections per page (a new section starts when 42. is the first cell in the table). Is there any way to build an AI model that will extract the pesticide production information no matter the PDF format? Or will I have to train the model with a bunch of examples of varying types? Is it possible to use Power Automate to do this? Thanks in advance!
  • Verified answer
    takolota1
    4,944 Moderator
    I recommend using OCR & GPT in this case, like the set-up in this template:
    You can request GPT to output an array with a JSON object for each of the dynamic number of sections found in your PDF, just like the Product Lines in the invoice example.
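
To make the "array with a JSON object per section" idea concrete, here is a minimal sketch of the logic in Python. It is not the template's implementation and it calls no OCR or GPT service. The function names and the field names in `FIELDS` are hypothetical placeholders, and the real keys depend on what the 'pesticide production information' table actually contains. In a flow, the equivalent steps would likely be a prompt step followed by a Parse JSON action.

```python
import json
import re

# Hypothetical field names (assumption): replace with the real column labels
# from the 'pesticide production information' table.
FIELDS = ["product_name", "registration_number", "amount_produced"]


def build_prompt(ocr_text, fields=FIELDS):
    """Build a prompt asking for one JSON object per section, as a JSON array."""
    return (
        "The text below was extracted by OCR from a PDF. "
        "A new section starts wherever a table row begins with '42.'. "
        "Return ONLY a JSON array with one object per section, "
        "each with these keys: " + ", ".join(fields) + ". "
        "Use null for any value you cannot find.\n\n" + ocr_text
    )


def count_sections(ocr_text):
    """Count lines starting with '42.' as a sanity check on the model output."""
    return len(re.findall(r"^[ \t]*42\.", ocr_text, flags=re.MULTILINE))


def parse_sections(reply, fields=FIELDS):
    """Parse the model reply into a list of dicts, ignoring any text around the array."""
    match = re.search(r"\[.*\]", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON array found in reply")
    rows = json.loads(match.group(0))
    # Keep only the expected keys; missing keys become None.
    return [{key: row.get(key) for key in fields} for row in rows]
```

Comparing `count_sections(ocr_text)` with `len(parse_sections(reply))` gives a cheap check that no section was dropped or invented, which helps when the number of sections changes from one PDF to the next.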
