How do I convert HTML to an XML object?

I have HTML emails from which I want to extract information. The emails are well-formed HTML. In other environments, I would normally handle this by using an XML library’s HTML parse mode to get an XML DOM that I can query.

It looks like the xpath() and xml() functions exist. Those are pretty powerful and would let me access the data through an XML DOM.

However, I cannot figure out how to parse HTML as XML. When I pass my HTML to xml(), I get the error “The provided value cannot be converted to XML: 'The 'meta' start tag on line 2 position 161 does not match the end tag of 'head'. Line 214, position 11.'. Please see https://aka.ms/logicexpressions#xml for usage details.'” The error makes sense, because xml() expects well-formed XML and HTML allows unclosed tags such as <meta>. However, I cannot figure out how to get an XML object from HTML the way I can with other libraries.

Is there any equivalent to an HTML DOM library with XPath support in Power Automate? Or something that can process HTML into XML, similar to xmllint --html - or DOMDocument::loadHTML()?
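To illustrate the failure described in the question outside Power Automate, here is a minimal Python sketch. It assumes only the standard library; the helper name `is_well_formed_xml` and the sample documents are made up for illustration. A strict XML parser rejects an unclosed <meta>, just as xml() does, and accepts the same markup once the tag is self-closed.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# An unclosed <meta> is legal HTML but not well-formed XML.
html_style = "<html><head><meta charset='utf-8'></head><body><p>Hi</p></body></html>"

# The same document with the void element self-closed.
xhtml_style = "<html><head><meta charset='utf-8'/></head><body><p>Hi</p></body></html>"

print(is_well_formed_xml(html_style))
print(is_well_formed_xml(xhtml_style))
```

The first call fails on the mismatched <meta>/<head> pair, which is the same class of error that xml() reports.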

  • tom_riha (Most Valuable Professional)

    Hello @binki-dcx,

    Power Automate has nothing built in to pre-process HTML. The only way to handle it is as a string, using some combination of split(...), replace(...), concat(...), etc., until you get clean HTML that can be converted into XML.

    But even if you remove the header and keep only the body, it will still have problems with tags that have no closing tag, e.g. <img ...> or <br>.

    In the end, you might be better off with a solution like the one shown here: https://www.youtube.com/watch?v=7tZ6bRtco3Y. Get rid of all the HTML tags and parse the plain text.

  • binki-dcx

    The problem is that I want to do this “properly”. I need data from the DOM, such as attributes, to correctly identify and extract the information I want. The text content or text rendering of the HTML loses all semantics. An HTML parser that outputs XML does exactly what I need. Just this component seems to be missing from the entire Microsoft ecosystem (even .NET’s XmlDocument supports serializing in HTML format but not deserializing, which is probably why Power Automate has no html() function).

    A more proper, but less performant, solution would be for me to write a custom connector that simply passes the data to `xmllint --html -`.
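As a rough idea of what the backend of such a connector would have to do, here is a minimal Python sketch that turns reasonably clean HTML into well-formed XML using only the standard library. This is an illustration, not libxml2's algorithm: the names `HtmlToXml` and `html_to_xml` and the sample markup are hypothetical, and the code only handles void elements and unclosed tags. It does not implement the HTML error-recovery rules (for example, implied </p> tags), which a real service would get by delegating to an actual HTML parser such as the one behind xmllint.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML void elements never take an end tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HtmlToXml(HTMLParser):
    """Build an ElementTree from reasonably clean HTML (illustrative only)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._builder = ET.TreeBuilder()
        self._open = []  # names of the elements that are currently open

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. "disabled") arrive with value None.
        clean = {k: (v if v is not None else "") for k, v in attrs}
        self._builder.start(tag, clean)
        if tag in VOID_TAGS:
            self._builder.end(tag)  # close void elements immediately
        else:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or tag not in self._open:
            return  # ignore stray end tags
        # Close everything left open inside this element, then the element.
        while self._open:
            name = self._open.pop()
            self._builder.end(name)
            if name == tag:
                break

    def handle_data(self, data):
        if self._open:  # drop text outside the root element
            self._builder.data(data)

    def to_element(self):
        self.close()
        while self._open:  # close anything the document left open
            self._builder.end(self._open.pop())
        return self._builder.close()


def html_to_xml(html: str) -> str:
    """Return the HTML re-serialized as well-formed XML text."""
    parser = HtmlToXml()
    parser.feed(html)
    return ET.tostring(parser.to_element(), encoding="unicode")


sample = ('<html><head><meta charset="utf-8"></head><body>'
          '<p>Order<br><a href="https://example.com/orders/42">details</a></p>'
          '<img src="logo.png"></body></html>')
root = ET.fromstring(html_to_xml(sample))
print([a.get("href") for a in root.findall(".//a[@href]")])
```

The output of `html_to_xml` is text that a strict XML function such as xml() should be able to accept, after which attribute-based XPath queries become possible.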

    Right now, I am using an improper string-based solution. The HTML I get is clean and self-consistent enough that I can split on the double-quote character, find all whitespace-free strings starting with https://, and filter the resulting URIs by a substring. But that only works because I am lucky with the contents of the HTML documents I am working with.
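The string-based workaround described above can be sketched as follows. This is a Python illustration of the logic, not the actual flow: the function name `extract_links`, the sample markup, and the "/orders/" filter are assumptions. In Power Automate the same steps would be built from split(...) plus filtering actions.

```python
def extract_links(html: str, must_contain: str) -> list[str]:
    """Return double-quoted https:// values that contain `must_contain`."""
    links = []
    for piece in html.split('"'):
        if not piece.startswith("https://"):
            continue  # not a quoted https URL
        if any(ch.isspace() for ch in piece):
            continue  # quoted text that merely starts with https://
        if must_contain in piece:
            links.append(piece)
    return links


sample_html = ('<a href="https://example.com/orders/42">Order</a> '
               '<img src="https://cdn.example.com/logo.png"> '
               '<p title="https://example.com/orders/ see above">note</p>')
print(extract_links(sample_html, "/orders/"))
```

The sketch also shows why the approach is fragile: it ignores single-quoted or unquoted attributes, does not decode entities such as &amp;, and cannot tell which element or attribute a URL came from.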

    This shouldn’t be the case. Microsoft should add an HTML parser to .NET akin to libxml2’s HTML parser.

  • tom_riha (Most Valuable Professional)

    You can submit it as an idea to the ideas forum (Power Automate · Community), but I'm afraid that's all that can be done at this moment.

  • binki-dcx

    I have submitted it as an idea here.
