
Search specific text on a PDF and get the page number it exists


Hi, I'm a beginner user of Power Automate.

I have a PDF of around 10 pages (e.g. pages 1 to 10). I would like to get the number of the page where a specific text (e.g. SS-4) appears, and extract that one page as its own PDF. The problem is that the page is different in each PDF, so I'm trying to use AI Builder, Power Automate, and Encodian to solve this.

First, I use AI Builder's "Recognize text in an image or a PDF document" action to get all the text. Next, I add a "Compose" action with an expression that searches for the specific text (e.g. "A") and returns the page number. However, I'm not sure what kind of expression I should use.

The text "A" is unique and should appear on only 1 page out of 10.

I'd appreciate any help. Thanks!

Note: because I use a Mac, I can't access Power Automate Desktop.

  • antoinec

    Hi Tomo,

    Here is one way you could do it. After extracting the PDF contents with the "Recognize text in an image or a PDF document" action, iterate over "results" and then over "lines" to find a "text" item that matches your criteria. If it matches, run the relevant action inside that condition block.

    This is what it looks like in the Flow editor:

    antoinec_1-1645017404206.png

    And here's an example run with a match:

    antoinec_0-1645017319389.png

    Is this what you were looking for? (There might be ways to do this without the double "Apply to each" blocks. If anyone has the info, I'm keen to learn :))

    Antoine
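
For readers who want to see the logic outside the Flow editor, here is a minimal Python sketch of the same double loop. It is not Power Automate code. The "results", "lines", and "text" names come from the reply above; the "page" key and the sample data are assumptions made for illustration, so check them against the actual output of your own run.

```python
from typing import Any, Dict, List


def find_pages_with_text(results: List[Dict[str, Any]], needle: str) -> List[int]:
    """Return the page number of every line whose text contains `needle`."""
    pages: List[int] = []
    for page_result in results:                      # outer "Apply to each" over results
        for line in page_result.get("lines", []):    # inner "Apply to each" over lines
            if needle in line.get("text", ""):       # the condition block
                pages.append(page_result["page"])    # "page" key is an assumption
    return pages


# Hypothetical sample shaped like the recognized-text output described above.
sample_results = [
    {"page": 1, "lines": [{"text": "Cover sheet"}]},
    {"page": 2, "lines": [{"text": "Form SS-4"}, {"text": "Application"}]},
]

print(find_pages_with_text(sample_results, "SS-4"))  # prints [2]
```

Because the text in the original question is unique, the returned list should hold a single page number, which can then be passed to whatever action extracts that page.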

  • joshfleisher

    Hello!

    To add to this (and I know this was some time ago): I'm interested in pulling every whole page that contains a certain word and creating a separate file from those pages. As in the example above, I may have 10 pages, but 2 of them contain the word that tells me I need them. I need those 2 pages extracted and saved as their own file. Is there a way to extract the whole page(s)?

    Thank you in advance!

    Josh Fleisher

  • JRosenboom

    Hello!
    Just on this: when multiple keywords appear on the same or different pages in a document, is it possible to extract which page each keyword appears on, duplicates included? For example:
    Keyword appeared on: p. 1, p. 1, p. 2, p. 3, p. 4, p. 4

  • theresia

    Hi, can you make a desktop flow version?

  • Lakshya2025

    Hello,

    When I use "Split PDF by text", it gives me the split PDF, not the single pages that contain the text. Is there any way to get just the single pages with the specific text, so that I can run an AI model on those pages?

  • BoredFish (suggested answer)
    @theresia
    PA Desktop has no native function for iterating through the pages of a PDF file and extracting the individual pages where specific keywords are found. However, it has enough functionality to build your own method.

    To do this, we first identify the total number of pages in the PDF, then run a fixed-count loop that searches each page for the specific text, and finally extract each page the text was found on. That sounds simple until we realize there's also no function that tells us how many pages a PDF has. However, you can obtain the page count by asking it to extract more pages than could possibly exist, which forces the flow to raise an error. We can then capture the error, extract its text, and use it to identify which page number the function reached before the error occurred. Here's what that might look like:
    **REGION Obtain Total Quantity of Pages in PDF
    SET MyPDF_FilePath TO $'''YourFilePathGoesHere'''
    Pdf.ExtractPages PDFFile: MyPDF_FilePath PageSelection: $'''1-1000''' ExtractedPDFPath: MyPDF_FilePath IfFileExists: Pdf.IfFileExists.AddSequentialSuffix ExtractedPDFFile=> ExtractedPDF
        ON ERROR
    
        END
    ERROR => LastError
    SET LastErrTrimmed TO LastError.Message.Trimmed
    Text.SplitText.SplitWithDelimiter Text: LastErrTrimmed CustomDelimiter: $''',''' IsRegEx: False Result=> LastErrSplit
    SET MyPDF_PageCount TO LastErrSplit[1]
    Text.ToNumber Text: MyPDF_PageCount Number=> MyPDF_PageCount
    Variables.DecreaseVariable Value: MyPDF_PageCount DecrementValue: 2
    **ENDREGION
    **REGION Extract Pages
    LOOP LoopIndex FROM 1 TO MyPDF_PageCount STEP 1
        Pdf.ExtractTextFromPDF.ExtractTextFromPage PDFFile: MyPDF_FilePath PageNumber: LoopIndex DetectLayout: False ExtractedText=> ExtractedPDFText
        Text.ParseText.ParseForFirstOccurrence Text: ExtractedPDFText TextToFind: $'''TextToSearchForGoesHere''' StartingPosition: 0 IgnoreCase: False OccurrencePosition=> Position
        IF Position >= 0 THEN
            Pdf.ExtractPages PDFFile: MyPDF_FilePath PageSelection: LoopIndex ExtractedPDFPath: MyPDF_FilePath IfFileExists: Pdf.IfFileExists.AddSequentialSuffix ExtractedPDFFile=> ExtractedPDF
        END
    END
    **ENDREGION
    
     
    It should be fairly easy to adapt that code to fit most use cases. Good luck!
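
To make the two steps easier to follow, here is a rough Python sketch of the same idea. It is not Power Automate Desktop code and it swaps in a slightly different technique: instead of reading the page count out of the error text, it probes one page at a time until a request fails. `make_fake_pdf` is a hypothetical stand-in for a per-page text extractor, and all names and the sample data are assumptions made for illustration.

```python
from typing import Callable, List


def make_fake_pdf(page_texts: List[str]) -> Callable[[int], str]:
    """Hypothetical stand-in for a per-page text extractor (pages are 1-based)."""
    def extract_text_from_page(page_number: int) -> str:
        if page_number < 1 or page_number > len(page_texts):
            raise ValueError("page out of range")
        return page_texts[page_number - 1]
    return extract_text_from_page


def count_pages_by_probing(extract: Callable[[int], str], limit: int = 1000) -> int:
    """Return the last page that extracts without error, trying pages 1..limit."""
    count = 0
    for page_number in range(1, limit + 1):
        try:
            extract(page_number)
        except ValueError:
            break  # first failing page: the previous one was the last real page
        count = page_number
    return count


def pages_containing(extract: Callable[[int], str], needle: str, page_count: int) -> List[int]:
    """Return the 1-based numbers of all pages whose text contains `needle`."""
    hits: List[int] = []
    for page_number in range(1, page_count + 1):
        position = extract(page_number).find(needle)  # -1 means "not found"
        if position >= 0:  # position 0 (text at the very start) is still a match
            hits.append(page_number)
    return hits


extract = make_fake_pdf(["SS-4 form", "nothing here", "see SS-4"])
page_count = count_pages_by_probing(extract)
print(page_count, pages_containing(extract, "SS-4", page_count))  # prints 3 [1, 3]
```

Page 1 in the sample starts with the search text, so its position is 0. A check that only accepts positions greater than 0 would skip that page, which is why the sketch compares against -1 instead.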
