Hello everyone,
I am seeking help with an issue related to HTML-to-plain-text conversion using the Html to text (Preview) action in Power Automate. When my input HTML is converted to plain text, the output contains unexpected line breaks, represented as "\n".
Here's a sample of my Input and Output:
Input:
<p>CompanyXYZ managed the construction of Building123. At a significant height, the structure was one of the tallest residential buildings in the region at the time of completion. Located in one of the city's most desired neighborhoods, the large square foot footprint boasts over a substantial amount of contiguous feet of frontage on Main Street, alongside some of the most prestigious retail space in the city.</p>
Output:
CompanyXYZ managed the construction of Building123. At a significant height, the\nstructure was one of the tallest residential buildings in the region at time\nof completion. Located in one of the city's most desired neighborhoods, the\nlarge square foot footprint boasts over a substantial amount of contiguous feet of\nfrontage on Main Street, alongside some of the most prestigious retail\nspace in the city.\n
So here, the newline characters that appear between "the" and "structure", "time" and "of", and "retail" and "space" are unwanted.
I am aware that I could use a replace function to replace the "\n" with a space, but that approach also replaces the actual line breaks that I need to keep. I am looking for a method that can distinguish between these "unusual" newline characters and the intentional ones in the original HTML.
Any help or guidance on how to achieve this within Power Automate would be greatly appreciated.
Thanks in advance!
Hi @hasannaqvi
To get the exact content (including line breaks) inside the paragraph tag '<p>', you can use the xpath() function. It takes valid XML (or well-formed HTML) together with an XPath expression, parses the tags, and returns the content of the matching tag.
I tried extracting the content using the sample text you shared. I stored the original text in a "Compose" action.
Next, add another "Compose" action and, in its expression box, enter a formula that extracts the content of the <p> tag.
Expression used:
xpath(xml(outputs('Compose')),'string(/p)')
Note: In the example above, <p> is the root tag. In a real case, the XPath expression needs to traverse from the root tag down to the <p> tag to reach its content.
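As a rough sketch of that note, suppose the HTML stored in the "Compose" action were wrapped as <html><body><p>...</p></body></html>. That nesting is only an assumption for illustration and is not taken from the original post. The path would then be spelled out from the root:

```
xpath(xml(outputs('Compose')), 'string(/html/body/p)')
```

If the exact nesting is not known, 'string(//p)' should return the text of the first <p> in the document, because XPath's string() uses the first node of a node set. Either form assumes the input is well-formed XML: xml() is likely to fail on HTML that contains unclosed tags such as <br> or entities such as &nbsp;, so such input would need to be cleaned up first.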
This way, you retrieve the content of the XML node or HTML tag together with any line breaks it actually contains.
If this helps & solves your problem, please remember to give a 👍 and accept my solution as it will help others in the future.
Thanks