web
You’re offline. This is a read only version of the page.
close
Skip to main content

Announcements

News and Announcements icon
Community site session details

Community site session details

Session Id :
Power Platform Community / Forums / Power Automate / Extract data from webp...
Power Automate
Unanswered

Extract data from webpage - Include data from hyperlinks

(0) ShareShare
ReportReport
Posted on by 73

All, 

I am trying to extract the data from this website: https://www.osha.gov/ords/imis/establishment.search?establishment=USA&state=all&officetype=all&office=all&sitezip=100000&startmonth=01&startday=01&startyear=2023&endmonth=12&endday=31&endyear=2023&p_case=all&p_violations_exist=yes&p_start=&p_finish=0&p_sort=12&p_desc=DESC&p_direction=Next&p_show=20

I have been able to create a flow that exports the front pages for the results. However I want to add the data from clicking on each Hyperlink for "Activity"


PowerAppsUser99_0-1707411447832.png

For example I want to add the data for each link Hyperlink like this https://www.osha.gov/ords/imis/establishment.inspection_detail?id=1718337.015


Is this possible in Power Automate Desktop?

PowerAppsUser99_1-1707411542164.png

 



I have the same question (0)
  • Deenuji_Loganathan_ Profile Picture
    6,255 Moderator on at

    @PowerAppsUser99 
    Please find the sample flow code below for your use case:

    Steps:

    1. Begin by launching the browser with the default URL.
    2. Extract the data from the webpage where it contains all the details in a data table.
    3. Iterate through the extracted data table and launch another browser with the detailed webpage URL (Static URL), then add the Activity ID from the variable.
    4. Extract the necessary data as required.

    Deenuji_0-1707412831353.png

    Thanks,

    Deenu

  • WillSG Profile Picture
    352 Moderator on at

    Hi @PowerAppsUser99  I hope you are doing well.

     

    Yes, you can accomplish this in the same table extraction, see the steps here below:

     

    My process recommendations steps:

     

    1. Right click in the activity number and select “Extract the Element Value”, this will extract the number.
    2. Then, right click again in the Activity number, but instead, select the last option at the bottom “Select Parent Ui” this will refresh and go a layer above.
    3. Now, in the new list display go to the “Extract Element Value” and you will see in the Third option from the list the URL link that you are looking for.

    By doing this you will extract the direct URL of each specific activity number.

     

    Please let me know if it works, happy automation!,


    If I have addressed your inquiry successfully, kindly consider marking my response as the preferred solution. If you found my assistance helpful, a 'Thumbs Up' would be greatly appreciated.

     

    Additionally, I offer specialized consultancy and development services leveraging PAD. If you're interested in exploring these services further, feel free to DM me, and we can initiate a discussion.

     

    Kind regards,

     

    Will SG

    Managing Director & Automation Lead

    RAMS CR (Recruitment & Automation)

    LinkedIn Profile

     

    WillSG_Screenshot 2024-02-08 111114.pngWillSG_Screenshot 2024-02-08 111300-2.png

  • PowerAppsUser99 Profile Picture
    73 on at

    Hey thanks for the reply. I tried replicating this flow and ran into an error on step 4.


    Do you know the cause of this? 

    PowerAppsUser99_0-1707418635627.png

     

  • WillSG Profile Picture
    352 Moderator on at

    Hi @PowerAppsUser99  I hope you are doing well.

     

    If you are using Deenuji approach, I believe that you need to use the variable %CurrentItem[1]%

     

    The List/table of items starts with 0, not 1, so in that table the Activity number is column 2, but when into the variable table the position number assigned is 1.

     

    Change it and try it again, it should work.

     

    Hope this help,


    If I have addressed your inquiry successfully, kindly consider marking my response as the preferred solution. If you found my assistance helpful, a 'Thumbs Up' would be greatly appreciated.

     

    Additionally, I offer specialized consultancy and development services leveraging PAD. If you're interested in exploring these services further, feel free to DM me, and we can initiate a discussion.

     

    Kind regards,

     

    Will SG

    Managing Director & Automation Lead

    RAMS CR (Recruitment & Automation)

    LinkedIn Profile

  • Deenuji_Loganathan_ Profile Picture
    6,255 Moderator on at

    Hi @PowerAppsUser99,

     

    It appears we're approaching table extraction differently,  that's why its showing the above error I guess.

    I've attached my code for your reference.

    I assume you're familiar with the process of copying and recreating the PAD flow on your machine.

    If not, please refer to the steps outlined below:

    1. Download the attached code file.
    2. Unzip the downloaded file.
    3. Open your PAD flow screen.
    4. Create a new flow or open the existing one where you want to integrate the code.
    5. Copy the code from the unzipped text file.
    6. Paste the copied code directly into your PAD flow screen.
    7. Save your changes.
    8. Test the flow to ensure it's working as expected.

    These steps should help you integrate the provided code into your PAD flow seamlessly. Let me know if you encounter any issues along the way.

    Thanks,

    Deenu

    -----------------------------------------------------------------------------------------------------------------------

    If I have answered your question, please mark it as the preferred solution. If you like my response, please give it a Thumbs Up. Thank you.

  • Deenuji_Loganathan_ Profile Picture
    6,255 Moderator on at

    @WillSG 
    I appreciate your perspective. Please find the datatable below, which I've captured using my approach. It contains a blank column, which is why I referenced %CurrentItem[2]%.

    Deenuji_0-1707504366459.png

     

    If there were no blank columns during our extraction, then your approach would indeed be accurate.

     

    Thanks,

    Deenu

  • cwischnewski Profile Picture
    162 on at

    When you right click the link and choose Inspect you can see the HTML behind this page.  What you are after is the <a> element but you actually want the href attribute value.  So you then build the UI selector to that <a> element.  On the "Get details of element on web page" you need to select that selector and then you need to change the Attribute name from the default of "Own text" to href.  It is a drop down box, but you can type in it if it isn't there.  (HRef is there, but the same works even if the website is using custom attributes that you want to capture.)

  • Deenuji_Loganathan_ Profile Picture
    6,255 Moderator on at

    @PowerAppsUser99 

    It appears we're approaching table extraction differently, which explains why it's successful for me but causing errors for you. I've attached my code for your reference. Hope this that would be helpful for your use case.


    PAD Code(Copy and paste in your desktop flow):


    WebAutomation.LaunchChrome.LaunchChrome Url: $'''https://www.osha.gov/ords/imis/establishment.search?establishment=USA&state=all&officetype=all&office=all&sitezip=100000&startmonth=01&startday=01&startyear=2023&endmonth=12&endday=31&endyear=2023&p_case=all&p_violations_exist=yes&p_start=&p_finish=0&p_sort=12&p_desc=DESC&p_direction=Next&p_show=20''' WindowState: WebAutomation.BrowserWindowState.Maximized ClearCache: False ClearCookies: False WaitForPageToLoadTimeout: 60 Timeout: 60 BrowserInstance=> Browser
    WebAutomation.ExtractData.ExtractHtmlTable BrowserInstance: Browser Control: $'''html > body > div:eq(2) > div > header > div:eq(4) > div > div:eq(4) > table''' ExtractionParameters: {[$''' ''', $'''#''', $'''Activity''', $'''Date Opened''', $'''RID''', $'''ST''', $'''Type''', $'''Scope''', $'''SIC''', $'''NAICS''', $'''Violations''', $'''Establishment Name'''], [$'''''', $'''''', $'''''', $'''''', $'''''', $'''''', $'''''', $'''''', $'''''', $'''''', $'''''', $''''''] } PostProcessData: False TimeoutInSeconds: 60 ExtractedData=> DataFromWebPage
    LOOP FOREACH CurrentItem IN DataFromWebPage
    WebAutomation.LaunchChrome.LaunchChrome Url: $'''https://www.osha.gov/ords/imis/establishment.inspection_detail?id=%CurrentItem[2]%''' WindowState: WebAutomation.BrowserWindowState.Normal ClearCache: False ClearCookies: False WaitForPageToLoadTimeout: 60 Timeout: 60 BrowserInstance=> Browser2
    WebAutomation.ExtractData.ExtractSingleValue BrowserInstance: Browser2 ExtractionParameters: {[$'''html > body > div:eq(2) > div > header > div:eq(4) > div > div:eq(2) > div:eq(0) > p:eq(0)''', $'''Own Text''', $''''''] } TimeoutInSeconds: 60 ExtractedData=> Site_Address
    END

  • WillSG Profile Picture
    352 Moderator on at

    Hi @PowerAppsUser99   I hope you are doing well.

     

    Yes, totally agree with @Deenuji , there are multiple ways to accomplish this task, hence we are using different approaches.

     

    Here below you will find the snipped code of another, this one will Launch the URL and create an spreadsheet right away with the main data, once created, we are reading the spreadsheet data and merging every number with the unique URL (after reviewing multiple URLs, I found out they are the same, hence we can concatenate the number with the URL.

     

    Here is the code snipped:

     

    SET URL TO $'''https://www.osha.gov/ords/imis/establishment.inspection_detail?id='''

    WebAutomation.LaunchChrome.LaunchChrome Url: $'''https://www.osha.gov/ords/imis/establishment.search?establishment=USA&state=all&officetype=all&office=all&sitezip=100000&startmonth=01&startday=01&startyear=2023&endmonth=12&endday=31&endyear=2023&p_case=all&p_violations_exist=yes&p_start=&p_finish=0&p_sort=12&p_desc=DESC&p_direction=Next&p_show=20''' WindowState: WebAutomation.BrowserWindowState.Maximized ClearCache: False ClearCookies: False WaitForPageToLoadTimeout: 60 Timeout: 60 BrowserInstance=> Browser

    WebAutomation.ExtractData.ExtractHtmlTableInExcel BrowserInstance: Browser Control: $'''html > body > div:eq(2) > div > header > div:eq(4) > div > div:eq(4) > table''' ExtractionParameters: {[$''' ''', $'''#''', $'''Activity''', $'''Date Opened''', $'''RID''', $'''ST''', $'''Type''', $'''Scope''', $'''SIC''', $'''NAICS''', $'''Violations''', $'''Establishment Name'''], [$'''''', $'''''', $'''''', $'''''', $'''''', $'''''', $'''''', $'''''', $'''''', $'''''', $'''''', $''''''] } PostProcessData: False TimeoutInSeconds: 60 ExcelInstance=> ExcelInstance

    Excel.Attach DocumentName: $'''Book1''' Instance=> ExcelInstance

    Excel.ReadFromExcel.ReadAllCells Instance: ExcelInstance ReadAsText: False FirstLineIsHeader: False RangeValue=> ExcelData

    LOOP FOREACH CurrentItem IN ExcelData

        Excel.GetFirstFreeRowOnColumn Instance: ExcelInstance Column: $'''A''' FirstFreeRowOnColumn=> FirstFreeRowOnColumn

        Text.Trim Text: CurrentItem[2] TrimOption: Text.TrimOption.Both TrimmedText=> TrimmedText

        Excel.WriteToExcel.WriteCell Instance: ExcelInstance Value: $'''%URL%%TrimmedText%''' Column: $'''A''' Row: FirstFreeRowOnColumn

    END

    Excel.ResizeColumnsOrRows.AutofitRangeOfColumns Instance: ExcelInstance StartColumn: $'''A''' EndColumn: $'''D'''

     

     

    PS: the Number contains blank spaces, hence I’m using the trim action to clean the number prior to merge it.

     

    Please let me know if it works, happy automation!,


    If I have addressed your inquiry successfully, kindly consider marking my response as the preferred solution. If you found my assistance helpful, a 'Thumbs Up' would be greatly appreciated.

     

    Additionally, I offer specialized consultancy and development services leveraging PAD. If you're interested in exploring these services further, feel free to DM me, and we can initiate a discussion.

     

    Kind regards,

     

    Will SG

    Managing Director & Automation Lead

    RAMS CR (Recruitment & Automation)

    LinkedIn Profile

Under review

Thank you for your reply! To ensure a great experience for everyone, your content is awaiting approval by our Community Managers. Please check back later.

Helpful resources

Quick Links

Introducing the 2026 Season 1 community Super Users

Congratulations to our 2026 Super Users!

Kudos to our 2025 Community Spotlight Honorees

Congratulations to our 2025 community superstars!

Congratulations to the April Top 10 Community Leaders!

These are the community rock stars!

Leaderboard > Power Automate

#1
Vish WR Profile Picture

Vish WR 784

#2
Valantis Profile Picture

Valantis 581

#3
Haque Profile Picture

Haque 545

Last 30 days Overall leaderboard