parse and extract the data from pdf file

Hi all,

I am trying to extract data from a multi-page PDF file. The PDF files are stored in SharePoint, and I want to extract and parse the data using Power Automate Desktop (PAD). For example, my PDF file has data like this:

DOS        CustomerNo
6/28/24    67555544
6/5/24     88999999
5/5/24     666875554
3/5/24     787987987

I would like to extract this data from the PDF and add "CN0" as a prefix to each Customer No. The output should be:

DOS        Customer No
6/28/24    CN067555544
6/5/24     CN088999999
5/5/24     CN0666875554
3/5/24     CN0787987987

However, some Customer Nos already have the prefix "CN0" and some do not. Please help me with this issue. Thanks in advance!

  • MichaelAnnis (Moderator)

    Create a new list %Customers%

    Extract text from PDF

    Split Text by New Line (this creates a list variable with one item per row)
    For each %CurrentItem% in %SplitText%

        Get Subtext from %CurrentItem.Length - 8% to end of text

        Add to %Customers%:  CN0%SubText%

    End (for each)

     

    Good luck!
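
The steps outlined in this reply can also be sketched in Python for readers who want to check the logic outside PAD. This is only an illustration: the function name `prefix_last_eight` and the sample text are hypothetical, and the check that skips non-numeric endings (such as the header row) is an added assumption, not part of the original steps.

```python
def prefix_last_eight(extracted_text: str, prefix: str = "CN0") -> list[str]:
    """Split text into lines, keep the last 8 characters of each, add the prefix."""
    customers = []
    for line in extracted_text.splitlines():
        line = line.strip()
        if len(line) < 8:
            continue  # blank or too-short line, nothing to extract
        subtext = line[-8:]  # same idea as "Get Subtext from Length - 8 to end"
        if not subtext.isdigit():
            continue  # assumption: skip rows such as the column header
        customers.append(prefix + subtext)
    return customers


sample = "DOS          CustomerNo\n6/28/24     67555544\n6/5/24       88999999\n"
print(prefix_last_eight(sample))
```

Note that taking a fixed 8 characters would cut off the leading digit of a 9-digit number such as 666875554, so this approach only suits rows where the customer number is exactly 8 digits long.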

     

  • Th11

     

    Hi @MichaelAnnis, thanks for your response. Since I am new to PAD, I don't fully follow your solution.

    From this sample invoice PDF (Multiple page) file, I need to extract the data from the City, DOS, and Customer No columns. The correct format for the Customer No is "CN08765432". After extracting the data from the PDF, I need to parse the Customer No column data to ensure that all entries have the prefix ‘CN0’. However, some customer numbers are already in the correct format, while others may have ‘0’ as a prefix. Please guide me on how to resolve this issue. Thanks in advance!

    Th11_0-1719853340303.png

     

  • Deenuji_Loganathan_ (Super User 2025 Season 2)

    @Th11 

     

    Please share a screenshot of the workflow you have created so far in Power Automate Desktop. Based on that, I can provide guidance accordingly.

     


    Thanks,
    Deenuji Loganathan 👩‍💻
    Automation Evangelist 🤖
    Follow me on LinkedIn 👥

    -------------------------------------------------------------------------------------------------------------
    If I've helped solve your query, kindly mark my response as the solution ✔ and give it a thumbs up!👍 Your feedback supports future seekers 🚀

  • Th11

    Hi @Deenuji ,

    Thanks for your response. I am trying to use the Parse text action, but I don't think it will help extract the data from the columns.

    Th11_0-1719859499800.png

     

  • Deenuji_Loganathan_ (Super User 2025 Season 2)

    @Th11 
    OK, got it. Could you please share a screenshot of how the data looks? You mentioned earlier that some fields already contain "CN", and I would like to take a look at them.

     



  • Th11

    Th11_2-1719864320234.png

    @Deenuji, the one I circled is in the correct format. For some customer numbers I need to add the prefix "CN0", and for others I need to add just "CN" because the leading "0" is already there. Thank you!

  • MichaelAnnis (Moderator)

    My original idea was to extract just the rightmost 8 digits, but let's try another approach since you need everything.

     

    Does this table appear in the PDF?  Have you tried the "Extract tables from PDF" action?

  • Deenuji_Loganathan_ (Super User 2025 Season 2)

    @Th11 

     

    If the table exists in a PDF, and you're attempting to extract it as a data table to loop through and check if the customer number contains "CN0," is that correct?

     

    In the above shared desktop flow screenshot where the PDF was extracted as text only, rather than a table, how would you iterate through the table from the text? Please clarify and confirm my understanding.

     



  • Th11

     

    @MichaelAnnis, thank you for your response. Earlier, I didn't know that I could use the "Extract tables from PDF" action to extract the data table from a PDF.

    @Deenuji  sorry, the desktop flow I shared earlier was incorrect. The tables are present in the PDF file, and it contains multiple pages. Can you help me loop through the pages and check if the customer number contains 'CN0'? If not, we need to add the prefix 'CN' if the number starts with '0'. Otherwise, we need to add the prefix 'CN0' if the number does not start with '0'.
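
The rule described here can be written down as a small Python sketch to make the three cases explicit. The function name `normalize_customer_no` is hypothetical, and reading "contains 'CN0'" as "starts with 'CN0'" is an assumption; in PAD the same branching would be built with If / Else if actions inside the loop over the table rows.

```python
def normalize_customer_no(raw: str) -> str:
    """Return the customer number with a 'CN0' prefix, adding only what is missing."""
    value = raw.strip()
    if value.upper().startswith("CN0"):
        return value            # already in the correct format
    if value.startswith("0"):
        return "CN" + value     # the leading 0 is present, only 'CN' is missing
    return "CN0" + value        # neither part of the prefix is present


for example in ["CN08765432", "08765432", "8765432"]:
    print(example, "->", normalize_customer_no(example))
```

All three sample inputs end up as CN08765432, which matches the correct format given earlier in the thread.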

  • Th11

    Th11_0-1719949488989.png
    Th11_1-1719949536816.png
    Th11_2-1719949685780.png

    My PDF file has 17 pages, and I extracted all the table values from the file. However, the data I want to extract starts from index 4. I can't look up values by column name because the column names are not present on all the pages. For example, the "Starting Period" column name is only present on the first page. I can see all the table values in the display message, but at the end, it throws an error saying "index is out of range".

    I need to extract "CE03110965" and write it to Excel. If "CE0" is not present, then I need to add the prefix "CE0".
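
One possible way to avoid the "index is out of range" error is to never index past the end of a list or row. The Python sketch below illustrates that idea under several assumptions: the extracted tables are modelled as a plain list of tables (each a list of rows), "index 4" is read as the position of the first relevant table in that list, and the ID is assumed to sit in column 0. The names `collect_ids`, `first_table`, and `id_column` are hypothetical, and writing the result to Excel is left out.

```python
def collect_ids(tables, first_table=4, id_column=0, prefix="CE0"):
    """Gather IDs from every table starting at first_table, adding the prefix if missing."""
    ids = []
    # Slicing past the end of a list yields an empty list instead of raising an error.
    for table in tables[first_table:]:
        for row in table:
            if id_column >= len(row):
                continue  # row is shorter than expected, skip it instead of failing
            value = str(row[id_column]).strip()
            if not value:
                continue  # empty cell
            if not value.upper().startswith(prefix):
                value = prefix + value
            ids.append(value)
    return ids


# Hypothetical data: four leading tables to skip, then two tables with IDs.
pages = [[["header"]]] * 4 + [
    [["CE03110965", "x"], ["3110966", "y"], []],
    [["  "], ["ce03110967"]],
]
print(collect_ids(pages))
```

In PAD, the equivalent guard would be to loop with For each over the extracted tables and rows rather than using a counter that can run one past the last index, and to check the column count before reading a cell.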
