Using Regex to extract values from a web site

Posted by JanTheofel

Hello,

I am trying to extract data from a website with Power Automate Desktop. Because the CSS selector is not unique, I want to use regular expressions. I am used to writing them (I coded a lot of Perl in the past), but I am not sure how to apply them here.

As a test I just searched for the word "Unternehmenssitz", which is the headline I am looking for (the text I want follows in the next tag). When I search for (Unternehmenssitz), everything is fine. But as soon as I add anything to it, the regex returns nothing, even when I add just the < of the closing tag.
I tried both

(Unternehmenssitz)<

and the escaped version

(Unternehmenssitz)\<
and with a \s* in between:

(Unternehmenssitz)\s*<
(Unternehmenssitz)\s*\<
All four return nothing.

The final regex should be something like this, and should return "Halbergmoos, Germany" in the example below:

Unternehmenssitz</td><dd[^>]+>([^<]+)<\/dd>

Here is a screenshot of the web page code (from the developer tool):

Thanks for your help!
Jan

Categories:

Power Automate Desktop

JanTheofel (15)

After thinking about it, I guess the attribute setting might be the problem. It is set to "Own text", which probably only gives access to the text of the web page but not to its code. But I can't find a reference for the possible values, and leaving it empty does not solve the issue.
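
The suspicion above can be checked outside of Power Automate Desktop with a small Python sketch. Everything in it is an assumption for illustration: the HTML fragment is made up (the real markup is only visible in the screenshot), the label is assumed to sit in a dt tag, and stripping tags with a regex is only a rough stand-in for whatever "Own text" actually returns.

```python
import re

# Hypothetical markup; the real page is only visible in the screenshot.
html = (
    '<dl><dt class="label">Unternehmenssitz</dt>'
    '<dd class="value">Halbergmoos, Germany</dd></dl>'
)

# Rough stand-in for a text-only extraction: tags removed, text kept.
own_text = re.sub(r"<[^>]+>", " ", html).strip()

pattern_plain = re.compile(r"(Unternehmenssitz)")
pattern_with_tag = re.compile(r"(Unternehmenssitz)\s*<")

# The bare word is found in the text-only version ...
print(pattern_plain.search(own_text) is not None)
# ... but there is no "<" left for the second pattern to match.
print(pattern_with_tag.search(own_text) is not None)
# Against the raw markup the same pattern does match.
print(pattern_with_tag.search(html) is not None)

# The full pattern needs the markup as input as well.
full = re.search(r"Unternehmenssitz</dt><dd[^>]*>([^<]+)</dd>", html)
location = full.group(1) if full else None
print(location)
```

If this is what happens, no amount of escaping the < will help; the regex has to be run against the page source rather than the visible text.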

Pavel_NaNoi (1,074)

I'm not 100% sure what you're doing, but I did a bit of testing and I think this might work:
I'm using a UI "Extract data from window" action and pointing it at the pane; any part of the text will do. (If you want it to open this window automatically before extracting the text, just send an F12 hotkey.) Then I simply make it generic like so:
This will extract all the text in there. From there you can use this regex inside a "Parse text" action on the variable that stores all the text, to find the text you need:
Unternehmenssitz(.)*(\s|\n)?(.)*
This will extract everything after the word for one line; you can keep on parsing from there to filter the text further, down to the words you want.
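
To see what that pattern picks up, here is a minimal Python sketch. The sample text is hypothetical, and Python's re module is only a stand-in: the "Parse text" action uses its own regex engine, which may differ in details. The second pattern is a tighter variant added for illustration, not part of the reply above.

```python
import re

# Hypothetical text, as a text-only extraction of the pane might return it.
page_text = "Rechtsform\nGmbH\nUnternehmenssitz\nHalbergmoos, Germany\nBranche\nSoftware"

# Pattern from the reply: the whole match covers the label plus the next line.
loose = re.search(r"Unternehmenssitz(.)*(\s|\n)?(.)*", page_text)
whole_match = loose.group(0) if loose else None
print(repr(whole_match))

# Tighter variant (an assumption, for illustration): skip the whitespace after
# the label and capture only the rest of the following line.
tight = re.search(r"Unternehmenssitz\s*(.+)", page_text)
value = tight.group(1).strip() if tight else None
print(value)
```

With the loose pattern the label itself is still part of the match, so a second parsing step is needed; the capture group in the tighter variant returns the value directly.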

I think this is roughly what you wanted,
hope it helps!

tkuehara (667)

Hi,

You could try editing your CSS selector. If you are looking for a fixed value (in your case, "Unternehmenssitz"), then you could set up your CSS selector as follows:

dt:contains(Unternehmenssitz) + dd
The selector above can be read like this: get a dt element that contains the string "Unternehmenssitz", then retrieve its adjacent sibling element dd (through the "+" combinator). This way you get the tag immediately after the tag with the "Unternehmenssitz" text.
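
Note that :contains is a jQuery-style extension rather than standard CSS. For anyone who wants to test the same logic outside of Power Automate Desktop, here is a rough Python sketch using only the standard library. The class name DtDdFinder and the HTML fragment are made up for illustration; it mimics "dt containing the label, followed directly by a dd" and is not how Power Automate Desktop evaluates selectors.

```python
from html.parser import HTMLParser


class DtDdFinder(HTMLParser):
    """Finds the text of the first <dd> that directly follows a <dt> containing `label`."""

    def __init__(self, label):
        super().__init__()
        self.label = label
        self.current = None   # "dt" or "dd" while collecting text, else None
        self.buffer = []
        self.armed = False    # True right after a matching <dt> was closed
        self.result = None

    def handle_starttag(self, tag, attrs):
        if self.result is not None:
            return
        if tag == "dt":
            self.current, self.buffer, self.armed = "dt", [], False
        elif tag == "dd" and self.armed:
            self.current, self.buffer = "dd", []
        elif self.current is None:
            # Any other element between dt and dd breaks adjacency.
            self.armed = False

    def handle_data(self, data):
        if self.current is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self.current:
            return
        text = "".join(self.buffer).strip()
        if tag == "dt":
            self.armed = self.label in text
        else:
            self.result = text
            self.armed = False
        self.current = None


# Hypothetical fragment; the real markup is only visible in the screenshot.
html = (
    "<dl>"
    "<dt>Rechtsform</dt><dd>GmbH</dd>"
    '<dt>Unternehmenssitz</dt><dd class="value">Halbergmoos, Germany</dd>'
    "</dl>"
)

finder = DtDdFinder("Unternehmenssitz")
finder.feed(html)
print(finder.result)
```

Because the match is tied to the label text and not to a class or position, it should keep working if other dt/dd pairs are added or reordered.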