Answered

Power Automate doesn't detect pattern when trying to scrape website text over several paragraphs, web helper always only lets me select a single source


Hi all, I've been trying to scrape this online textbook from OpenStax: https://openstax.org/books/introduction-philosophy/pages/1-introduction

I can select the pager element, and I've been trying to extract the text from the div and from the paragraphs individually.
But whenever I move to a new page, I have to select a new element.

If I go into the advanced settings and change it, it always changes back to single element whenever I select the div text on a new page.

I know we can download PDF and EPUB versions of this, but I would like to have the raw text if possible.

 

What am I doing wrong?

  • OkanMTL
    703 Super User 2024 Season 1

    Hello,

     

    Your selectors are static, not dynamic. In short: the selectors you declare on page 1 won't work on page 2 or page 10, and the same goes the other way around.

     

    You have to look for the selector that is the same on every page, which is div[Id="main-content"], and only div[Id="main-content"].

    So your UI element within your scrape activity should have this selector.

    [Screenshot: OkanAT_0-1679312279547.png]

     

    Good Luck 🙂

     

    PS: With this selector I was able to scrape multiple pages on the site you want to scrape.

  • Verified answer
    OkanMTL
    703 Super User 2024 Season 1

    Reading your topic back, you may have already found that you can use div[Id="main-content"] to get all the content on a page.

     

    To get the paragraphs on a page, you can use the section or paragraph elements on the page. If you inspect the page, you can see that sections are identified by sect-xxxx and paragraphs by para-xxxxx.

     

    It's going to be a little bit advanced, but you can loop through the xxxx values in the sections and paragraphs.
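
To make the idea of matching on the sect-/para- prefix concrete, here is a minimal sketch in plain Python rather than Power Automate. It assumes sect-xxxx and para-xxxxx are id attributes; the sample HTML, the ids in it, and the names ParaCollector and extract_paragraphs are made up for illustration and are not taken from the OpenStax page.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ParaCollector(HTMLParser):
    """Collect the text of every element whose id starts with a prefix."""

    def __init__(self, prefix="para-"):
        super().__init__()
        self.prefix = prefix
        self.paragraphs = {}   # id -> collected text
        self._current = None   # id of the element being collected, if any
        self._depth = 0        # nesting depth inside that element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._current is not None:
            self._depth += 1   # nested tag inside the matched element
            return
        elem_id = dict(attrs).get("id") or ""
        if elem_id.startswith(self.prefix):
            self._current = elem_id
            self._depth = 1
            self.paragraphs[elem_id] = ""

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:   # the matched element itself was closed
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.paragraphs[self._current] += data


# Made-up sample that mimics the structure described above.
SAMPLE = """
<div id="main-content">
  <section id="sect-00001">
    <p id="para-00001">First <em>paragraph</em>.</p>
    <p id="para-00002">Second paragraph.</p>
  </section>
  <p class="footer">Not a numbered paragraph.</p>
</div>
"""


def extract_paragraphs(html, prefix="para-"):
    parser = ParaCollector(prefix)
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so each entry is a single clean line.
    return {key: " ".join(text.split()) for key, text in parser.paragraphs.items()}


if __name__ == "__main__":
    for elem_id, text in extract_paragraphs(SAMPLE).items():
        print(elem_id, "|", text)
```

Calling extract_paragraphs(SAMPLE, prefix="sect-") collects whole sections instead of single paragraphs, which is the same choice between coarse and fine extraction described above.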

     

     

     

     

  • oliver9
    10

    Thanks a lot, I'll try with this, I didn't know about the main content tag. I'm optimistic since it worked for you 🙂

  • oliver9
    10

    Hmm, OK, I am not able to do it.
    When I try to extract the value, it always just says "single value". Is that the issue?
    It only extracts one time, clicks on next, and then stops.

  • oliver9
    10

    How do I get to the page in your screenshot where I can select the divs? I only have the advanced settings via the web helper:

    [Screenshot: oliver9_0-1679390512128.png]

     

  • OkanMTL
    703 Super User 2024 Season 1

    Hello,

     

    If it extracts one time and then stops, perhaps you don't have any activity after the one that extracts the first page.

     

    You can use a Loop in which you set the counter to the number of pages the book has and increment it by 1.

    Within the loop you can:

    1. Run the extract activity for the page.

    2. Run the click-on-next activity.

    3. Write the extracted value to an Excel/CSV file.

     

    In the end you will have a file with all the extracted values.
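
The loop described above can be sketched like this in plain Python. It is only an illustration of the control flow, not a Power Automate flow: PAGES is a hypothetical in-memory stand-in for the browser, and scrape_all is a made-up name.

```python
import csv
import io

# Hypothetical stand-in for the browser: each entry is the text that
# extracting div[Id="main-content"] would return on that page.
PAGES = ["Text of page 1", "Text of page 2", "Text of page 3"]


def scrape_all(pages, out):
    """Loop over the pages and write one CSV row per page to `out`."""
    writer = csv.writer(out)
    writer.writerow(["page", "text"])
    current = 0  # index of the page the "browser" is currently showing
    # The counter runs from 1 to the number of pages, incremented by 1.
    for counter in range(1, len(pages) + 1):
        extracted = pages[current]              # 1. extract the page
        if current < len(pages) - 1:            # 2. click next (skipped on the last page)
            current += 1
        writer.writerow([counter, extracted])   # 3. write the extracted value
    return out


buffer = scrape_all(PAGES, io.StringIO())
rows = list(csv.reader(io.StringIO(buffer.getvalue())))

if __name__ == "__main__":
    for row in rows:
        print(row)
```

The extracted value is kept in a variable before moving on, so clicking next before writing does not lose the text of the page that was just read.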

  • ollibolli
    2

    Thank you! Are there tutorials? I don't know how to set up a loop, and I feel bad that I keep coming back to ask you.
    How do you get to the window in your screenshot above where I can select the different UI elements?
