Parse and extract data from a PDF file

Posted by Th11

Hi all,

I am trying to extract data from a multi-page PDF file. The PDF files are stored in SharePoint, and I want to extract and parse the data using Power Automate Desktop (PAD). For example, my PDF file contains data like this:

DOS CustomerNo
6/28/24 67555544
6/5/24 88999999
5/5/24 666875554
3/5/24 787987987

I would like to extract this data from the PDF and add "CN0" as a prefix to each Customer No. The output should be:

DOS Customer No
6/28/24 CN067555544
6/5/24 CN088999999
5/5/24 CN0666875554
3/5/24 CN0787987987

However, some customer numbers already have the prefix "CN0" and some do not, so the prefix should only be added where it is missing. Please help me with this issue. Thanks in advance!
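
A minimal Python sketch of the string logic being asked for, assuming the DOS and Customer No values have already been extracted from the PDF as text. The function name add_prefix_if_missing is hypothetical. In PAD the same check would typically be an If action with a "starts with" condition wrapped around the step that builds the new value.

```python
def add_prefix_if_missing(customer_no: str, prefix: str = "CN0") -> str:
    """Return customer_no with the prefix added, unless it already has it."""
    customer_no = customer_no.strip()
    if customer_no.startswith(prefix):
        return customer_no
    return prefix + customer_no


# The DOS / Customer No rows from the question
rows = [
    ("6/28/24", "67555544"),
    ("6/5/24", "88999999"),
    ("5/5/24", "666875554"),
    ("3/5/24", "787987987"),
]

for dos, customer_no in rows:
    print(dos, add_prefix_if_missing(customer_no))
```

This prints the four rows in the expected output format. A value that already starts with "CN0", such as "CN067555544", is returned unchanged.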

Categories:

Power Automate Desktop

MichaelAnnis 5,727 Moderator

Create a new list %Customers%

Extract text from PDF

Split text by new line (this creates a list with one item per row)
For each %SplitText%

Get Subtext %CurrentItem.Length - 8% to end of text

Add to %Customers%: CN0%SubText%

End (for each)

Good luck!
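
For reference, the steps above translate roughly into the following Python sketch. It is a plain-text stand-in for the PAD actions, not PAD itself. customers_from_text is a hypothetical name, and the check that skips non-numeric lines is an extra guard that the original steps do not include. It keeps the header row out of the list.

```python
def customers_from_text(pdf_text: str) -> list[str]:
    """Mirror the suggested steps: split by line, keep the last 8 chars, add CN0."""
    customers: list[str] = []  # corresponds to %Customers%
    for current_item in pdf_text.splitlines():  # Split text + For each
        current_item = current_item.rstrip()
        # Get subtext from position (length - 8) to the end of the text
        sub_text = current_item[max(len(current_item) - 8, 0):]
        # Extra guard (not in the original steps): skip headers and blank lines
        if len(sub_text) != 8 or not sub_text.isdigit():
            continue
        customers.append("CN0" + sub_text)
    return customers


sample = """DOS CustomerNo
6/28/24 67555544
6/5/24 88999999
5/5/24 666875554
3/5/24 787987987"""

print(customers_from_text(sample))
```

On the question's sample this returns CN067555544, CN088999999, CN066875554 and CN087987987. The two 9-digit numbers lose their first digit, so taking a fixed 8 characters only works when every customer number is exactly 8 digits long. A value that already reads "CN067555544" comes out unchanged, because the old prefix is cut off and then added back.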

Th11

Hi @MichaelAnnis, thanks for your response. Since I am new to PAD, I don't fully understand your solution.
From this sample multi-page invoice PDF, I need to extract the data in the City, DOS, and Customer No columns. The correct format for the Customer No is "CN08765432". After extracting the data from the PDF, I need to parse the Customer No column so that every entry has the prefix "CN0". However, some customer numbers are already in the correct format, while others only have "0" as a prefix. Please guide me on how to resolve this issue. Thanks in advance!

Deenuji_Loganathan_ 6,250 Super User 2025 Season 2

@Th11

Please share a screenshot of the workflow you have created so far in Power Automate Desktop. Based on that, I can provide guidance accordingly.

Thanks,
Deenuji Loganathan 👩‍💻
Automation Evangelist 🤖
Follow me on LinkedIn 👥

-------------------------------------------------------------------------------------------------------------
If I've helped solve your query, kindly mark my response as the solution ✔ and give it a thumbs up!👍 Your feedback supports future seekers 🚀

Th11

Hi @Deenuji,
Thanks for your response. I am trying to use the Parse text action, but I don't think it will help me extract the data from the columns.
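
If the PDF text comes out with one row per line, a regular expression can pick out the date/number pairs; PAD's Parse text action can be set to treat its pattern as a regular expression. The Python sketch below shows one possible pattern. The pattern and the name parse_rows are assumptions based on the sample rows. It only works while each row stays on one line, which is why extracting a table is usually more robust for column data.

```python
import re

# Assumed layout: a date such as 6/28/24, whitespace, then the customer number,
# which may or may not already start with "CN".
ROW_PATTERN = re.compile(r"(\d{1,2}/\d{1,2}/\d{2,4})\s+((?:CN)?\d+)")


def parse_rows(pdf_text: str) -> list[tuple[str, str]]:
    """Return (DOS, Customer No) pairs found in the extracted text."""
    return ROW_PATTERN.findall(pdf_text)


sample = """DOS CustomerNo
6/28/24 67555544
6/5/24 88999999
5/5/24 666875554
3/5/24 787987987"""

for dos, customer_no in parse_rows(sample):
    print(dos, customer_no)
```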

Deenuji_Loganathan_ 6,250 Super User 2025 Season 2

@Th11
OK, got it. Could you please share a screenshot of what it looks like? Previously you mentioned that some fields already contain "CN", right? I want to take a look at it.

Thanks,
Deenuji Loganathan 👩‍💻

Th11

@Deenuji, the one I circled is in the correct format. For some customer numbers I need to add the prefix "CN0", and for others I need to add just "CN" because they already start with "0". Thank you!

MichaelAnnis 5,727 Moderator

My original idea was to extract just the rightmost 8 digits, but let's try another approach since you need everything.

Does this data appear as a table in the PDF? Have you tried the Extract tables from PDF action?

Deenuji_Loganathan_ 6,250 Super User 2025 Season 2

@Th11

So the table exists in the PDF, and you're attempting to extract it as a data table, loop through it, and check whether each customer number contains "CN0". Is that correct?

In the desktop flow screenshot you shared above, the PDF was extracted as text only rather than as a table, so how would you iterate through the table from that text? Please clarify and confirm my understanding.

Thanks,
Deenuji Loganathan 👩‍💻

Th11

@MichaelAnnis, thank you for your response. Earlier, I didn't know that I could use the "Extract tables from PDF" action to extract a data table from a PDF.
@Deenuji, sorry, the desktop flow I shared earlier was incorrect. The tables are present in the PDF file, which has multiple pages. Can you help me loop through the pages and check whether each customer number contains "CN0"? If it doesn't, we need to add the prefix "CN" when the number starts with "0", or the prefix "CN0" when it does not start with "0".
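
The prefix rule described here can be sketched in Python as follows. The sketch makes two assumptions: each page's table arrives as a list of rows, each row a list of cell strings, roughly what looping over the tables extracted from the PDF gives you; and Customer No is the third column (City, DOS, Customer No). Both the layout and the names normalize_customer_no, normalize_tables and CUSTOMER_COL are hypothetical. Header rows are left alone because their cell doesn't look like a customer number.

```python
import re

CUSTOMER_COL = 2  # hypothetical: City, DOS, Customer No -> index 2
LOOKS_LIKE_ID = re.compile(r"(?:CN)?\d+")


def normalize_customer_no(value: str) -> str:
    value = value.strip()
    if "CN0" in value:          # already in the correct format
        return value
    if value.startswith("0"):   # only "CN" is missing
        return "CN" + value
    return "CN0" + value        # the full prefix is missing


def normalize_tables(tables: list[list[list[str]]]) -> list[list[list[str]]]:
    """Return copies of the per-page tables with the Customer No column fixed."""
    result = []
    for table in tables:                 # one table per page (assumption)
        new_table = []
        for row in table:
            row = list(row)              # copy so the input is not modified
            if len(row) > CUSTOMER_COL and LOOKS_LIKE_ID.fullmatch(row[CUSTOMER_COL].strip()):
                row[CUSTOMER_COL] = normalize_customer_no(row[CUSTOMER_COL])
            new_table.append(row)
        result.append(new_table)
    return result


# Hypothetical two-page input; city names are placeholders
pages = [
    [["City", "DOS", "Customer No"], ["City A", "6/28/24", "67555544"]],
    [["City B", "6/5/24", "08765432"], ["City C", "5/5/24", "CN08765432"]],
]

for page in normalize_tables(pages):
    for row in page:
        print(row)
```

All three customer numbers end up in the "CN0..." form, and the header row passes through untouched. Checking the cell with a pattern rather than by column name also helps when headers only appear on the first page.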

Th11

My PDF file has 17 pages, and I extracted all the table values from the file. However, the data I want to extract starts from index 4. I can't identify the data by column name, because the column names are not present on every page. For example, the "Starting Period" column name is only present on the first page. I can see all the table values in the Display message action, but at the end it throws an error saying "index is out of range".
I need to extract values such as "CE03110965" and write them to Excel. If "CE0" is not present, then I need to add the prefix "CE0".
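
An "index is out of range" error typically means the flow reads a fixed index from a row or table that has fewer items than expected, for example a last page with a shorter table. Below is a Python sketch of a bounds-checked loop. It assumes "index 4" means the fifth column of each row and that each page's table is a list of rows; if it is a row index instead, the same length check applies. collect_values and VALUE_COL are hypothetical names. Instead of relying on column names, which only appear on the first page, the sketch keeps cells that look like an ID with or without the "CE0" prefix. Writing the result to Excel is left to the flow.

```python
import re

VALUE_COL = 4   # assumed: the wanted value is at index 4 of each row
PREFIX = "CE0"
ID_PATTERN = re.compile(r"(?:CE0)?\d+")  # e.g. "CE03110965" or "3110965"


def collect_values(tables: list[list[list[str]]]) -> list[str]:
    """Collect index-4 values from every page's table, adding "CE0" where missing."""
    values: list[str] = []
    for table in tables:                # one extracted table per page (assumption)
        for row in table:
            # Bounds check: short rows are skipped instead of raising
            # an "index out of range" error.
            if len(row) <= VALUE_COL:
                continue
            cell = row[VALUE_COL].strip()
            # Header text such as "Starting Period" and empty cells are skipped,
            # so column names are not needed on every page.
            if not ID_PATTERN.fullmatch(cell):
                continue
            values.append(cell if cell.startswith(PREFIX) else PREFIX + cell)
    return values


# Hypothetical pages: a header row, a full row, a row missing the prefix, a short row
pages = [
    [["", "", "", "", "Starting Period"], ["", "", "", "", "CE03110965"]],
    [["", "", "", "", "3110965"], ["Total"]],
]

print(collect_values(pages))
```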
