Answered

Extract data from pdf in power automate desktop Problem 3-Need Help

Posted by bdmafuz

Hello @Agnius, you solved my previous posts, "Extract data from pdf in power automate desktop Problem-Need Help" and "Extract data from pdf in power automate desktop Problem 2-Need Help". Now I have another issue. My flow (Pic-1) extracts data from a single page (Pic-2), but my PDF files have multiple pages. When I select a multi-page PDF file, the flow writes only the first page's data. Can you tell me how to extract the data from every page?

Thanks & regards

bdmafuz

Categories:

Power Automate Desktop


Agnius Bartninkas (Most Valuable Professional)

In your Extract text from PDF action, set the "Page(s) to extract" value to "All" and it will extract all the text from all pages.
-------------------------------------------------------------------------
If I have answered your question, please mark it as the preferred solution. If you like my response, please give it a Thumbs Up.
I also provide paid consultancy and development services using Power Automate. If you're interested, DM me and we can discuss it.

bdmafuz

@Agnius I checked that before (Pic-3), but it did not work. Do I need to set up a loop?

Agnius Bartninkas (Most Valuable Professional)

What exactly did not work? What is it that you expect and what actually happens? Please provide more context.

bdmafuz

@Agnius First, see my flow (Pic-1 & 2). When I run it with "All" selected, it only writes data from the 1st page, but my PDF has 20 pages. If I select "Range" from 1 to 20, it still writes data only from the 1st page. If I select "Single" and specify a page number, it writes data from the page I want. I want to write data from all pages.

Thanks
bdmafuz

Agnius Bartninkas (Most Valuable Professional)

Okay, I see. I thought you simply had a single case in a file that has multiple pages. But if you have multiple pages, each of which contains a separate case that you need to process, you do need another loop inside the existing one. You would need to loop through the pages and extract the text one page at a time.
I suggest doing a while loop with the condition being 1 = 1 (infinite loop) and then have a page index that you increment on each iteration with an Increase variable action.
Then use the page index in the Extract text from PDF action, so you keep reading one page at a time.
Finally, add a rule that makes your flow go to a label that is outside of the while loop on error on the Extract text from PDF action. This will make your flow stop extracting pages when you reach the final page. This is needed because there is no native way to get the number of pages in a PDF document, so you don't automatically know how many pages you need to extract.
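
The control flow described above can be sketched in Python, purely as an illustration; it is not Power Automate Desktop syntax. Here `extract_page` is a hypothetical stand-in for the Extract text from PDF action, and a PDF is modelled as a plain list of page strings. Both are assumptions made for the example.

```python
def extract_page(pages, page_index):
    """Hypothetical stand-in for 'Extract text from PDF' on a single page.

    Pages are numbered from 1; asking for a page that does not exist
    raises an error, which is what ends the loop below.
    """
    if page_index < 1 or page_index > len(pages):
        raise IndexError(f"page {page_index} does not exist")
    return pages[page_index - 1]


def read_all_pages(pages):
    """Read one page at a time until extraction fails."""
    results = []
    page_index = 1
    while True:  # same idea as a while loop with the condition 1 = 1
        try:
            text = extract_page(pages, page_index)
        except IndexError:
            break  # plays the role of "go to a label outside the loop"
        results.append(text)  # per-page processing would go here
        page_index += 1  # the Increase variable step
    return results


print(read_all_pages(["case A", "case B", "case C"]))
```

The loop never needs to know the page count in advance: the failed read past the last page is the stop signal.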

bdmafuz

Hello @Agnius, I think I found my problem: it is caused by my "Split Text" variable (Pic-1, attached in my previous reply). I need the "Employee ID" from my PDF (Pic-4), but when I run "Extract text from PDF", the text comes out like this (Pic-5 & 6). How can I extract the data for "Employee ID" and "eTin"?

bdmafuz

@Agnius Can you give me a screenshot, please?

Agnius Bartninkas (Most Valuable Professional)

Here you go:
You will need to add your actions for retrieving specific values in the place where I added the comment.

The Extract text from PDF action has the following On error settings:
This will make it stop extracting pages from the current file, when the extraction fails (assuming there are no more pages in the file) and continue to the next file.

Here's a snippet:
LOOP FOREACH CurrentItem IN Files
    SET PageIndex TO 1
    LOOP WHILE (1) = (1)
        Pdf.ExtractTextFromPDF.ExtractTextFromPage PDFFile: CurrentItem PageNumber: PageIndex DetectLayout: False ExtractedText=> ExtractedPDFText
        ON ERROR
            GOTO 'Exit nested loop'
        END
        # Your existing actions go here
        Variables.IncreaseVariable Value: PageIndex IncrementValue: 1
    END
    LABEL 'Exit nested loop'
END

bdmafuz

Hello @Agnius, thanks for your help. With my flow (Pic-1), it can write data to Excel. My folder has only one PDF file, with 20 pages. After page 20, the flow does not save and close; it continues with pages 21, 22, 23, and so on, and writes the last data again and again (Pic-2). The loop does not stop. When I checked, I found that on the "On error" tab of my "Extract text from PDF" action, when I select the label and save, it automatically turns red and the label is deselected (Pic-3).

Agnius Bartninkas (Most Valuable Professional)

Okay, that seems like a bug with the Go To functionality.
Try using an On block error instead like this:
The On block error is set up to go to end of block:
Here's the snippet:

LOOP FOREACH CurrentItem IN Files
    SET PageIndex TO 1
    BLOCK
    ON BLOCK ERROR
    END
        LOOP WHILE (1) = (1)
            Pdf.ExtractTextFromPDF.ExtractTextFromPage PDFFile: CurrentItem PageNumber: PageIndex DetectLayout: False ExtractedText=> ExtractedPDFText
            # Your existing actions go here
            Variables.IncreaseVariable Value: PageIndex IncrementValue: 1
        END
    END
END
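
As a rough illustration of how the block-level handler behaves, here is the same structure in Python rather than Power Automate Desktop syntax. The `try`/`except` wraps the whole inner loop, so the first failed extraction leaves the loop and the outer loop moves on to the next file. `extract_page` and the dictionary of files are hypothetical stand-ins, not part of the flow.

```python
def extract_page(pages, page_index):
    """Hypothetical stand-in for extracting a single page (1-based)."""
    if page_index < 1 or page_index > len(pages):
        raise IndexError(f"page {page_index} does not exist")
    return pages[page_index - 1]


def read_all_files(files):
    """files maps a file name to its list of page texts (an assumption)."""
    extracted = {}
    for name, pages in files.items():
        extracted[name] = []
        page_index = 1
        try:  # plays the role of the block with an On block error handler
            while True:
                text = extract_page(pages, page_index)
                extracted[name].append(text)  # existing actions go here
                page_index += 1
        except IndexError:
            pass  # the error ends the block; continue with the next file
    return extracted


files = {"a.pdf": ["a1", "a2"], "b.pdf": ["b1"]}
print(read_all_files(files))
```

Because the handler sits outside the loop, no label or jump is needed to leave it.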