Hello Power Automate people, I'm at the end of my rope with the xpath() expression in Power Automate flows.
The problem I'm facing is:
I need to build a flow that extracts data from Word files, specifically CV profiles saved as .docx. I discovered that a .docx can be treated as a .zip archive, so you can go through the nodes of the document.xml file, which holds all the written content of the file. The first issue was that the files live in SharePoint, and I couldn't use the "Extract archive to folder" action there since it is a OneDrive-only feature (very disappointing IMO). So the process is: take the file from SharePoint, extract it to OneDrive, then go through the extracted files and look for data inside document.xml.
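For anyone who wants to inspect the same structure outside the flow, here is a minimal Python sketch of the idea above: a .docx is an ordinary ZIP archive and the main text lives in word/document.xml. The helper names `read_document_xml` and `make_sample_docx` are made up for illustration, and the sample archive is a tiny stand-in, not a complete Word file.

```python
import io
import zipfile

# WordprocessingML namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_document_xml(docx_bytes: bytes) -> str:
    """Return the text of word/document.xml from a .docx given as raw bytes."""
    # A .docx is a ZIP archive, so it can be opened directly without renaming.
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        return archive.read("word/document.xml").decode("utf-8")


def make_sample_docx() -> bytes:
    """Build a tiny stand-in archive (hypothetical content for this sketch)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "word/document.xml",
            f'<w:document xmlns:w="{W_NS}"><w:body/></w:document>',
        )
    return buffer.getvalue()


# Usage: print the raw XML that the XPath queries would run against.
print(read_document_xml(make_sample_docx()))
```

With a real file, the bytes would come from wherever the .docx is stored instead of `make_sample_docx()`.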
This part worked, but my problem is with the xpath command itself: I'm unsure about the syntax and I'm having a hard time adapting it to my needs.
So far I've combined two tutorials for extracting data from Word documents; since I can't use any premium connectors, I'm limited to this approach. The tutorials I've used are:
https://www.expiscornovus.com/2022/03/20/retrieve-docprops-from-a-docx-file/ - saved my life since I couldn't extract the document.xml via SharePoint itself.
https://www.tachytelic.net/2021/05/power-automate-extract-text-from-word-docx-file/ - also great since it helped me get the point of XPath (at least what it does)
Below is what I've done so far, what works, and what doesn't. I'm really looking for a resolution here, if possible:
The flow breakpoint:
The flow was built 1:1 from https://www.expiscornovus.com/2022/03/20/retrieve-docprops-from-a-docx-file/; the only change is that I retrieve the data from document.xml using the xpath command provided by https://www.tachytelic.net/2021/05/power-automate-extract-text-from-word-docx-file/
The content of the outputs is just the whole content of the document.xml file.
When I run this command, or many others like it, I get this error:
The only command that proved to work was:
It retrieves all the data in the file, but the disadvantage is that the output is unsorted and very messy.
I also tried XPather.com; the commands work there, but when used in the flow they return the Invalid Template error.
The above returns exactly the data I want, using /w:document/w:body/w:p[*]/w:r/w:t[contains(text(),"Profile type")]/../..
In Power Automate, the same expression returns an error.
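One possible cause, offered as an assumption rather than a confirmed diagnosis: the expression relies on the w: prefix, and the xpath() expression in Power Automate takes only the XML and the query, with no argument for binding a prefix to a namespace URI. A query tool may bind prefixes from the document for you, which would explain why the same query behaves differently. A commonly used prefix-free form in XPath 1.0 is to match on local-name(), for example //*[local-name()="p"]. The Python sketch below illustrates the same dependency using only the standard library: the prefixed query resolves only because an explicit prefix map is passed in. `SAMPLE_XML` and `paragraphs_containing` are hypothetical names for this illustration.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NAMESPACES = {"w": W_NS}  # the explicit prefix-to-URI binding

# Hypothetical miniature of a document.xml body.
SAMPLE_XML = f"""<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p><w:r><w:t>Profile type: Developer</w:t></w:r></w:p>
    <w:p><w:r><w:t>Unrelated line</w:t></w:r></w:p>
  </w:body>
</w:document>"""


def paragraphs_containing(xml_text: str, needle: str) -> list:
    """Return the text of each paragraph that has a w:t node containing needle."""
    root = ET.fromstring(xml_text)  # root is the w:document element
    matches = []
    # "w:body/w:p" resolves only because NAMESPACES maps "w" to a URI.
    for paragraph in root.findall("./w:body/w:p", NAMESPACES):
        runs = [t.text or "" for t in paragraph.findall("./w:r/w:t", NAMESPACES)]
        if any(needle in run for run in runs):
            matches.append("".join(runs))
    return matches


print(paragraphs_containing(SAMPLE_XML, "Profile type"))
```

This mirrors the intent of the query above (find the w:t that contains the label, then step back up to its paragraph), but it is not the Power Automate expression itself.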
I'm looking for a solution, or at least to understand what's going on here, and if possible to get it working. I also thought of extracting the whole text with the spacing preserved so that it looks good, and it does; the command used was:
This again works on the XPather site but not in Power Automate.
Is there a difference between XPath in Power Automate and standard XPath?
Or if you have another solution to make this work, I'm open to it (maybe convert to JSON, or extract all the text and then sort it).
Also, if necessary, I can provide the full document.xml file in case that helps find a solution.
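To make the "whole text with spacing" goal concrete, here is a rough Python sketch of one way to get readable output from document.xml: join the w:t nodes inside each w:p and put one paragraph per line, so text that Word split across several runs is stitched back together. It assumes a simple document; `SAMPLE_XML` and `document_text` are hypothetical names, and content such as text boxes may need extra handling.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical sample: the first paragraph is split across two runs,
# which is how Word often stores a single visible line.
SAMPLE_XML = f"""<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p><w:r><w:t>Profile </w:t></w:r><w:r><w:t>type: Developer</w:t></w:r></w:p>
    <w:p><w:r><w:t>Skills: XPath</w:t></w:r></w:p>
  </w:body>
</w:document>"""


def document_text(xml_text: str) -> str:
    """Return the document text with one line per w:p paragraph."""
    root = ET.fromstring(xml_text)
    lines = []
    # ElementTree addresses namespaced tags as "{namespace-uri}localname".
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        pieces = [node.text or "" for node in paragraph.iter(f"{{{W_NS}}}t")]
        lines.append("".join(pieces))
    return "\n".join(lines)


print(document_text(SAMPLE_XML))
```

Grouping by paragraph first is what keeps the output from looking jumbled: a flat list of every w:t node loses the line boundaries.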
Thank you so much!
You could also use the OneDrive action to convert a given Word document to a PDF, then use parts of this template to either extract exact data values from the document or get the entire file contents as a text replica: