Hello,
I'm new to PAD and regex. I'm extracting text from a PDF and then parsing it to get text of the form:
Article X Section Y
I used the regex (?i)article\s?\n?\s\d\d?\s?\n?((?i)section\s?\n?\d\d?)? to allow for the text being split across lines in the PDF, and for X and Y having more than one digit.
The problem is that the extracted text seems to be split by a line break, so it looks like this:
Article
X Section Y
I tried using a regex replace of \n with %""% on the extracted text to delete the new line, but it doesn't seem to work.
Can anyone help?
Thanks in advance.
You can use Replace text with regex enabled to insert a whitespace at specific positions between characters.
The pattern (?<=[A-Za-z])(?=\d) matches the empty position that comes right after a letter and right before a digit. Replace it with %" "% and a whitespace is inserted there.
You can then reverse it in another Replace text action, using (?<=\d)(?=[A-Za-z]) as the regex pattern, to also insert a whitespace after a digit that is followed by a letter.
If you combine both, you'll get your desired result.
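For anyone who wants to test the two lookaround patterns outside PAD, here is a minimal Python sketch of the same idea. The function name is hypothetical and Python's re module stands in for PAD's Replace text action, so treat it as an illustration rather than the exact PAD behavior. It uses [A-Za-z] because the range [A-z] also covers a few punctuation characters that sit between Z and a.

```python
import re

def space_letters_and_digits(text: str) -> str:
    """Insert a space wherever a letter touches a digit, in either order."""
    # Empty position preceded by a letter and followed by a digit.
    text = re.sub(r"(?<=[A-Za-z])(?=\d)", " ", text)
    # Empty position preceded by a digit and followed by a letter.
    return re.sub(r"(?<=\d)(?=[A-Za-z])", " ", text)

print(space_letters_and_digits("Article9section1"))  # Article 9 section 1
```

Text that already has the spaces passes through unchanged, because neither lookaround pair matches across an existing space.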
-------------------------------------------------------------------------
If I have answered your question, please mark it as the preferred solution. If you like my response, please give it a Thumbs Up.
I also provide paid consultancy and development services using Power Automate. If you're interested, DM me and we can discuss it.
How about inserting a space between two characters? Is it possible?
For example I want to make the text "ArticleXsectionY" into "Article X section Y"
A carriage return is a special character that is similar to newline. See here for more info: https://en.m.wikipedia.org/wiki/Carriage_return
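\r matches a carriage return and \n matches a newline. The question marks make each of them optional, so the pattern works if either or both are present.
If you want to replace the line break with a whitespace, use %" "% as the value to replace it with (a whitespace between quotes) or \s with escape sequences enabled. In that case use \r?\n as the pattern so the newline is required: \r?\n? also matches the empty string between every pair of characters, which is harmless when you replace with nothing but would put a whitespace at each of those positions.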
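To see what these patterns do, here is a small Python sketch. The function names are hypothetical and Python's re module is only a stand-in for PAD's regex engine, so this is an approximation of the behavior, not a PAD flow.

```python
import re

def remove_line_breaks(text: str) -> str:
    # Both parts are optional, so this also matches empty strings;
    # replacing those with "" changes nothing, so only \r and \n disappear.
    return re.sub(r"\r?\n?", "", text)

def join_broken_lines(text: str) -> str:
    # The newline is required here, so only real line breaks
    # (\r\n or a bare \n) are replaced with a space.
    return re.sub(r"\r?\n", " ", text)

print(remove_line_breaks("Article\r\n9 Section 1"))  # Article9 Section 1
print(join_broken_lines("Article\r\n9 Section 1"))   # Article 9 Section 1
```

The first output is the "Article9" result mentioned in this thread: deleting the line break removes the only separator between the word and the number, while replacing it with a space keeps them apart.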
It worked! Thank you!
If you don't mind, can you tell me what a carriage return is and what \r?\n? does?
Also, is there a way to insert a space in the output text? Right now I get "Article9 section 1".
You may have a carriage return there. Try replacing \r?\n? instead of just the \n.
Thank you very much for the replies.
I tried simplifying the flow (I removed the second parse) and used Replace text on \n.
But when I tried displaying it as a message in PAD, the new line is still there.
I tried exporting the list as a .txt file and the new line's also there.
Am I still doing something wrong? What information should I provide further?
Thank you in advance.
Why exactly are you using Parse text inside the loop if you're already iterating through a list of matches that were retrieved via Parse text earlier?
The way I see it, you should simply do this:
And then at the end make sure you're using the %ArticleList% variable and not %Matches% when you want to use the text after replacing newlines.
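As a rough illustration of that flow (find all matches once, clean each one inside the loop, and collect the cleaned values in a separate list), here is a Python sketch. It is not a PAD flow: the pattern, the function name and the sample text are assumptions made for the example, and only the ArticleList name mirrors the variable mentioned above.

```python
import re

# Hypothetical pattern: \s* also covers \r and \n, so a line break
# between "Article" and the number does not stop the match.
ARTICLE_PATTERN = re.compile(
    r"article\s*\d{1,2}(?:\s*section\s*\d{1,2})?", re.IGNORECASE
)

def build_article_list(extracted_text: str) -> list:
    article_list = []
    # One pass to find every match (the single Parse text step).
    for match in ARTICLE_PATTERN.findall(extracted_text):
        # Inside the loop, only clean the current match: collapse any
        # run of whitespace, including line breaks, into one space.
        article_list.append(re.sub(r"\s+", " ", match))
    # Use this list afterwards, not the raw matches.
    return article_list

sample = "See Article\r\n9 Section 1 and article 12\nsection 3."
print(build_article_list(sample))
```

The raw matches still contain the line breaks, which is why displaying or exporting them instead of the cleaned list shows the new line again.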
Yes, I made a loop to put all the texts in a list.
Okay. And then you somehow put it into a list?
Sorry for the trouble, this is what I did to remove the new line.
and this is what I did to get the text.