Power Automate - Building Flows
Unanswered

Search and extract data from Word document through .zip and using document.xml

Posted on 29 Jun 2023 08:36:58 by 16

Hello there Power Automate people, I'm at the end of my rope with the xpath() expression function in Power Automate flows.
The problem I'm facing is:
I have to create a flow that extracts data from Word files, specifically CV profiles saved as .docx. I've discovered that you can treat a .docx as a .zip archive and then go through the nodes of the document.xml file, which holds the whole written content of the file. The first issue was that the files are in SharePoint, and I couldn't use the "Extract archive to folder" action there since it is a OneDrive-only feature (very disappointing IMO). So the process is: I take the file from SharePoint, extract it to OneDrive, and then go through the extracted files and look for data inside document.xml.
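
For checking the idea outside of a flow, the unzip step can be sketched in a few lines of Python. This is only an illustration, not part of the flow: it assumes the main document part sits at word/document.xml (the usual location), and make_sample_docx is a hypothetical helper that builds a stand-in archive for testing rather than a valid Word file.

```python
import io
import zipfile


def read_document_xml(docx_bytes: bytes) -> str:
    """Return word/document.xml from a .docx (which is a ZIP archive) as text."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        # Assumption: the main document part is stored at this path.
        return archive.read("word/document.xml").decode("utf-8")


def make_sample_docx(body_xml: str) -> bytes:
    """Hypothetical helper: build an archive holding only word/document.xml."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body_xml)
    return buffer.getvalue()
```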


This part worked fine, but my problem is with the XPath expression itself: I'm aware of the syntax, but I'm having a hard time adapting it to my needs.

So far I've combined two tutorials for extracting data from a Word document. Since I can't use any premium connectors, I'm limited to this approach. The tutorials I've used are:

https://www.expiscornovus.com/2022/03/20/retrieve-docprops-from-a-docx-file/ - saved my life since I couldn't extract the document.xml via SharePoint itself.
https://www.tachytelic.net/2021/05/power-automate-extract-text-from-word-docx-file/ - also great since it helped me get the point of XPath (at least what it does)

Below is what I've done so far, what works, and what doesn't. I'm really looking for a resolution here, if possible:

The flow breakpoint:

ZeroPower_0-1688026133822.png

The flow was built 1:1 from https://www.expiscornovus.com/2022/03/20/retrieve-docprops-from-a-docx-file/ and the only change is that I retrieve the data from document.xml using the xpath expression provided by https://www.tachytelic.net/2021/05/power-automate-extract-text-from-word-docx-file/

The content of the outputs is just the whole content of the document.xml file.

When I run any of the following expressions, or many others like them, I get the error shown above:

 

  • xpath(outputs('Get_file_content_using_path_-_OdfB')?[], '/w:document/w:body/w:p[*]/w:r/w:t/../../../..')
  • xpath(xml(outputs('Get_file_content_using_path_-_OdfB')?['body']), '//w:document/w:body/w:p[*]/w:r/w:t[contains(text(),"Profile type")]/../../text()')
  • xpath(xml(outputs('Get_file_content_using_path_-_OdfB')?['body']), '//*/w:r/w:t[contains(text(),"Profile type")]/../../descendant::t[position()>1]/text()')
  • xpath(xml(outputs('Get_file_content_using_path_-_OdfB')?['body']), '//*[contains(text(), 'Profile type:')]/text(), 'Profile type:'')
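
A likely reason the prefixed expressions fail: in XPath 1.0 a prefix such as w: only has meaning if the host binds it to a namespace URI, and the xpath() expression function takes just the XML and the path, with no argument for prefix bindings. The same behaviour can be reproduced outside Power Automate. The Python sketch below (standard library only, with a made-up one-paragraph sample) shows a prefixed lookup being rejected when no prefix map is supplied, and a lookup by full namespace name succeeding.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace that the w: prefix stands for in document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Made-up sample data, not taken from the original file.
SAMPLE = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>Profile type: Senior</w:t></w:r></w:p>"
    "</w:body></w:document>"
)

root = ET.fromstring(SAMPLE)


def prefixed_lookup_fails() -> bool:
    """True if a w:-prefixed path is rejected when no prefix map is given."""
    try:
        root.findall(".//w:t")
    except SyntaxError:
        return True
    return False


# Lookup by full namespace name, which does not depend on any prefix.
texts = [t.text for t in root.iter(f"{{{W_NS}}}t")]
```

In a flow, the analogous move would be to write each step as *[local-name()='p'] or *[name()='w:p'] instead of w:p, with single quotes doubled inside the expression string. The fourth attempt above also passes a third argument and leaves its inner single quotes undoubled, which would probably be rejected regardless of namespaces.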

The only expression that proved to work was:

 

  • xpath(xml(outputs('Get_file_content_using_path_-_OdfB')?['body']), '//*[name()=''w:t'']/text()')

It retrieves all the text in the file, but the disadvantage is that the output is unsorted and very messy.
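
One way to make the extracted text less jumbled is to group the w:t runs by their parent paragraph (w:p) before searching, since Word often splits a single phrase across several runs. A minimal Python sketch of that idea follows, for experimenting outside the flow. The sample XML, the names in it, and both helper functions are invented for illustration; nested paragraphs (for example in text boxes) are not handled.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"  # ElementTree's "{namespace}" tag prefix


def paragraph_texts(document_xml: str) -> List[str]:
    """Join the w:t runs of each w:p so every paragraph becomes one string."""
    root = ET.fromstring(document_xml)
    paragraphs = []
    for p in root.iter(W + "p"):
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        if text.strip():  # skip empty paragraphs
            paragraphs.append(text)
    return paragraphs


def value_after_label(paragraphs: List[str], label: str) -> Optional[str]:
    """Return the text following `label` in the first paragraph containing it."""
    for text in paragraphs:
        if label in text:
            return text.split(label, 1)[1].strip(" :\t")
    return None


# Hypothetical sample: the label is deliberately split across two runs.
SAMPLE = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>Jane Doe</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Profile </w:t></w:r><w:r><w:t>type: </w:t></w:r>"
    "<w:r><w:t>Senior</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
```

With this sample, paragraph_texts(SAMPLE) gives one string per paragraph, and value_after_label(paragraph_texts(SAMPLE), "Profile type") picks out the value even though the label is split across runs.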

 

I also used XPather.com. The expressions work there, but when I use them in the flow they return the Invalid Template error.

ZeroPower_1-1688026631820.png

The above returns exactly the data I want by using /w:document/w:body/w:p[*]/w:r/w:t[contains(text(),"Profile type")]/../..

In Power Automate the same expression returns an error.

I'm looking for a solution, or at least to understand what's going on here, and if possible to make it work. I've also thought of extracting the whole text with the whitespace normalized so that it looks clean. The expression I used was:

  • xpath(xml(outputs('Get_file_content_using_path_-_OdfB')?['body']), '//*[name()=''w:t'']/text()/normalize-space()')

 

This again works on the XPather site but not in Power Automate.
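
On the difference between the two tools: the form text()/normalize-space(), with a function call as a path step, is XPath 2.0 syntax. If Power Automate's xpath() follows XPath 1.0, as it appears to, normalize-space() can only wrap an expression, and then it returns a single string rather than one per node. Cleaning each string after extraction avoids that limit. The Python sketch below mimics what normalize-space() does to one string; it is illustrative only and the function name is hypothetical.

```python
import re


def normalize_space(text: str) -> str:
    """Mimic XPath normalize-space(): collapse whitespace runs, then trim."""
    # XPath treats space, tab, carriage return and line feed as whitespace.
    collapsed = re.sub(r"[ \t\r\n]+", " ", text)
    return collapsed.strip(" ")
```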

Is there a difference between XPath in Power Automate and standard XPath?
Or if you have another solution to make this work, I'm open to it (maybe convert to JSON, or extract all the text and then sort it).

Also, if necessary, I can provide the full document.xml file, so maybe a solution can be found.

 

Thank you so much!

  • takolota1 Profile Picture
    4,859 Super User 2025 Season 1 on 02 Sep 2023 at 19:12:29
    Re: Search and extract data from Word document through .zip and using document.xml

    You could also use the OneDrive action to convert a given Word document to a PDF, then use parts of this template to either extract exact data values from the document or just get the entire file contents as a text replica:

    https://powerusers.microsoft.com/t5/Power-Automate-Cookbook/Extract-Data-From-PDFs-and-Images-With-GPT/td-p/2201345

  • ZeroPower Profile Picture
    16 on 05 Jul 2023 at 07:51:01
    Re: Search and extract data from Word document through .zip and using document.xml

    Up

     

  • ZeroPower Profile Picture
    16 on 03 Jul 2023 at 06:32:37
    Re: Search and extract data from Word document through .zip and using document.xml

    Up
