Answered

Unable to extract data from pdf

Posted on by AT88

Hi all, I have a PDF file with multiple pages. Inside a For each loop, I use the Extract text from PDF action, which saves the page text in the ExtractedPdfText variable. I would like to extract the invoice name from each page. However, the position (index) of the invoice name within the extracted text differs from page to page, so the output comes out wrong, even though all the pages appear to have the same structure.

Using the Extract PDF file pages to new PDF file action, I need to save each page as a file named after its invoice name. But the files are saved with the wrong names because the index changes from page to page. Can you please guide me on how to extract the invoice name correctly?

Thanks in advance!

Categories:

Power Automate Desktop

Verified answer

UshaJyothiKasibhotla 225 Moderator on at

Use a regex and put that pattern in the Parse text action.
Please send me some sample data so that I can give a full explanation.

Hope this helps,
Usha

Th11 on at

I am also running into a similar issue.

AT88 34 on at

Hi Usha,
Thank you so much for your response. For example, in the PDF file the file name appears like this: File Name: 20231211-1
The PDF file has multiple pages. PAD has to extract this value from every page and, using the Extract PDF file pages to new PDF file action, save each new file with the appropriate file name. But for some reason the variable index picks up the date value (1/5/2024) instead, which gives the wrong output. Please guide me.

Verified answer

Agnius Bartninkas Most Valuable Professional on at

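Use the Parse text action on the %ExtractedPDFText% value, with the "Is regular expression" toggle enabled, and write a regex pattern to find the value. You could use (?<=File Name:\s)[\d-]+ as the regex: the lookbehind anchors the match to the "File Name:" label instead of a fixed position, so it no longer matters where the value sits on the page. The %Match% variable will then contain the file name.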
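To check the pattern outside Power Automate Desktop, here is a minimal Python sketch of the same regex. This is not PAD code: the function name first_file_name and the sample page text are made up for illustration, and the text assumes a date appears before the label, as in the problem described above.

```python
import re

# Pattern from the answer: a lookbehind for "File Name:" followed by one
# whitespace character, then a run of digits and hyphens.
FILE_NAME_PATTERN = re.compile(r"(?<=File Name:\s)[\d-]+")


def first_file_name(page_text):
    """Return the first file name found in the page text, or None."""
    match = FILE_NAME_PATTERN.search(page_text)
    return match.group(0) if match else None


# Hypothetical page text: the date comes before the label, which is the
# kind of layout where a fixed index could pick up the date instead.
sample_page = "Invoice Date: 1/5/2024\nFile Name: 20231211-1\nTotal: 100.00"
print(first_file_name(sample_page))  # prints 20231211-1
```

Because the match is tied to the label, the date is ignored no matter where it sits in the text. Note that the pattern expects exactly one whitespace character after the colon.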

If a single file can contain more than one file name, disable the "First occurrence only" toggle, and you'll get all the matches in a list stored in %Matches%.
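The "all matches" behaviour can be sketched in Python in a similar way. Again, this is only an illustration and not PAD code: all_file_names, output_names, the ".pdf" naming and the sample pages are assumptions, not something from the original flow.

```python
import re

# Same pattern as in the answer above.
FILE_NAME_PATTERN = re.compile(r"(?<=File Name:\s)[\d-]+")


def all_file_names(text):
    """Return every file name in the text, in order of appearance."""
    return FILE_NAME_PATTERN.findall(text)


def output_names(page_texts):
    """Map 1-based page numbers to output PDF names.

    Pages without a match are skipped rather than given a wrong name.
    """
    names = {}
    for page_number, text in enumerate(page_texts, start=1):
        found = all_file_names(text)
        if found:
            names[page_number] = found[0] + ".pdf"
    return names


# Hypothetical text of two pages, with the label in different positions.
pages = [
    "Invoice Date: 1/5/2024\nFile Name: 20231211-1",
    "File Name: 20231211-2\nInvoice Date: 1/6/2024",
]
print(all_file_names("\n".join(pages)))  # ['20231211-1', '20231211-2']
print(output_names(pages))  # {1: '20231211-1.pdf', 2: '20231211-2.pdf'}
```

In the flow itself, the per-page version corresponds to running Parse text inside the loop and passing the match to the Extract PDF file pages to new PDF file action as the file name.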

