Power Automate
Answered

Not all tables are extracted correctly in PAD: how can I solve it?

Posted on by 14

I have a multi-page PDF and I want to extract all the tables from it. Only the table marked in green (in the attached image) is extracted correctly; the table marked in red is not. It is missing the header, the first couple of rows, and the first column. I have attached my flow along with the PDF. Please suggest how I can extract the red-marked table exactly as it is. A missing column header is not a problem. I need a solution that will also extract correctly from any future PDF in this format.

  • mmonline

    Well, the Extract tables from PDF action is fairly limited.

    If the document is always formatted like the one you uploaded, you can extract it to text and then pass that to something like JavaScript, Python, or VBScript. I am looking at it now.

    =======

    [Image: mmonline_0-1689026736120.png]

    There are definitely pattern-like elements. I am guessing this would take a bit of work to nail down and get working properly.

    The tables returned are inconsistent, even between the first non-standard table and the next.

    [Image: mmonline_1-1689026946847.png]

    The above does not include the Index No. column.

    The next one:

    [Image: mmonline_2-1689027005826.png]

    Closer, but pretty ugly.

    I have not evaluated any of the other tables.

    Sorry I could not provide a better answer.

  • Verified answer
    Agnius Bartninkas
    Most Valuable Professional

    A table like that will not be extracted nicely via the Extract tables from PDF action. You will need to use the Extract text from PDF action and then parse the text to create a table variable.

     

    Here's a sample flow that could handle the table you marked as red in your PDF file:

    Agnius_0-1689046362332.png

     

    What it does is as follows:

    1. Creates a new table and deletes the empty row (as it is impossible to create a table without the empty row)
    2. Extracts the text from your PDF file
    3. Parses the text with regex to retrieve the relevant lines
    4. Splits the retrieved text into a list with an item for each line
    5. Loops through the lines and splits each one by space to get each item separately
    6. If the line contains 5 items, inserts it directly into the table created in step 1; otherwise, inserts it with blank values for the Unit and Pack columns.
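
For anyone who would rather prototype the parsing outside PAD first, steps 3 to 6 can be sketched in Python. This is only a rough sketch under assumptions: SAMPLE_TEXT is made-up text standing in for the extracted PDF text, and parse_table is a hypothetical helper name. Python's re module does not support the variable-length lookbehind used in the PAD pattern, so the sketch captures the rows in a named group instead.

```python
import re

# Made-up stand-in for the text returned by "Extract text from PDF";
# the real document's layout and values will differ.
SAMPLE_TEXT = (
    "MULTIPACK\r\n"
    "QUANTITY\r\n"
    "Index No. Unit Pack Breakup Pack Quantity\r\n"
    "1001 EA 6x2 12 24\r\n"
    "1002 4x3 36\r\n"
    "2 60\r\n"
    "Totals PREPACK\r\n"
)

# Header lines, then the rows (captured lazily), then a lookahead for the
# totals line followed by a line containing PREPACK.
TABLE_PATTERN = re.compile(
    r"MULTIPACK\r\nQUANTITY\r\nIndex No\..+\r\n"
    r"(?P<body>(?:.+\r\n)+?.+)"
    r"(?=\r\n\d+\s\d+\r\n.+PREPACK)"
)


def parse_table(text):
    """Return the table rows as lists of five strings (hypothetical helper)."""
    match = TABLE_PATTERN.search(text)
    if match is None:
        return []
    rows = []
    for line in match.group("body").splitlines():
        items = line.split(" ")
        if len(items) == 5:
            rows.append(items)
        else:
            # Like the flow, this assumes the short lines have exactly three
            # items and leaves Unit and Pack blank.
            rows.append([items[0], "", items[1], "", items[2]])
    return rows


rows = parse_table(SAMPLE_TEXT)
```

With the sample text above, rows holds two entries, the second one padded with blank Unit and Pack values.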

    Here's a snippet you can copy and paste directly into PAD to have the actions created automatically for you:

    Variables.CreateNewDatatable InputTable: { ^['Index No.', 'Unit', 'Pack Breakup', 'Pack', 'Quantity'], [$'''''', $'''''', $'''''', $'''''', $''''''] } DataTable=> DataTable
    Variables.DeleteRowFromDataTable DataTable: DataTable RowIndex: 0
    Pdf.ExtractTextFromPDF.ExtractText PDFFile: $'''C:\\RPA\\pad_pdf_file.pdf''' DetectLayout: False ExtractedText=> ExtractedPDFText
    Text.ParseText.RegexParseForFirstOccurrence Text: ExtractedPDFText TextToFind: $'''(?<=MULTIPACK\\r\\nQUANTITY\\r\\nIndex No..+\\r\\n)(.+\\r\\n)+?.+(?=\\r\\n\\d+\\s\\d+\\r\\n.+PREPACK)''' StartingPosition: 0 IgnoreCase: False OccurrencePosition=> Position Match=> Match
    Text.SplitText.Split Text: Match StandardDelimiter: Text.StandardDelimiter.NewLine DelimiterTimes: 1 Result=> TextList
    LOOP FOREACH TextLine IN TextList
        Text.SplitText.Split Text: TextLine StandardDelimiter: Text.StandardDelimiter.Space DelimiterTimes: 1 Result=> TextLineList
        IF TextLineList.Count = 5 THEN
            Variables.AddRowToDataTable.AppendRowToDataTable DataTable: DataTable RowToAdd: TextLineList
        ELSE
            Variables.AddRowToDataTable.AppendRowToDataTable DataTable: DataTable RowToAdd: [TextLineList[0], '', TextLineList[1], '', TextLineList[2]]
        END
    END

     

    Note you will need to change the file path in the Extract text from PDF action for this to work.

     

    An important note: this flow will not work as is for a document where the table is split over two pages. That's because it parses the text using the headers before the table and the totals after it. Since the headers are repeated on each page and the totals appear only on the last page, where the table ends, a table split over two pages would have the repeated headers included as text lines, so you would need to handle that.

     

    In order to handle it, you could add some extra conditions into the loop like this:

    Agnius_1-1689046873583.png

     

    This will skip the part of the loop that splits the line and inserts it into the table, if the line contains some of the text that should be in the headers.
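
The same skip logic can be sketched in Python for illustration. The names HEADER_MARKERS and rows_from_lines are hypothetical, and the sample lines are made up; the check is case-sensitive, matching the flow's Contains condition with Ignore case switched off.

```python
# Text fragments that only appear in the repeated page headers.
HEADER_MARKERS = ("MULTIPACK", "QUANTITY", "Index No")


def rows_from_lines(lines):
    """Build table rows, skipping any line that looks like a header."""
    rows = []
    for line in lines:
        # Case-sensitive check, equivalent to NEXT LOOP in the flow.
        if any(marker in line for marker in HEADER_MARKERS):
            continue
        items = line.split(" ")
        if len(items) == 5:
            rows.append(items)
        else:
            # Assumes exactly three items, as the flow does.
            rows.append([items[0], "", items[1], "", items[2]])
    return rows


# Made-up lines from a table that continues onto a second page.
sample_lines = [
    "1001 EA 6x2 12 24",
    "MULTIPACK",
    "QUANTITY",
    "Index No. Unit Pack Breakup Pack Quantity",
    "1002 4x3 36",
]
rows = rows_from_lines(sample_lines)
```

Here the three header lines in the middle are dropped and only the two data lines end up in rows.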

     

    Here's a snippet with the complete flow:

    Variables.CreateNewDatatable InputTable: { ^['Index No.', 'Unit', 'Pack Breakup', 'Pack', 'Quantity'], [$'''''', $'''''', $'''''', $'''''', $''''''] } DataTable=> DataTable
    Variables.DeleteRowFromDataTable DataTable: DataTable RowIndex: 0
    Pdf.ExtractTextFromPDF.ExtractText PDFFile: $'''C:\\RPA\\pad_pdf_file.pdf''' DetectLayout: False ExtractedText=> ExtractedPDFText
    Text.ParseText.RegexParseForFirstOccurrence Text: ExtractedPDFText TextToFind: $'''(?<=MULTIPACK\\r\\nQUANTITY\\r\\nIndex No..+\\r\\n)(.+\\r\\n)+?.+(?=\\r\\n\\d+\\s\\d+\\r\\n.+PREPACK)''' StartingPosition: 0 IgnoreCase: False OccurrencePosition=> Position Match=> Match
    Text.SplitText.Split Text: Match StandardDelimiter: Text.StandardDelimiter.NewLine DelimiterTimes: 1 Result=> TextList
    LOOP FOREACH TextLine IN TextList
        IF (Contains(TextLine, 'MULTIPACK', False) OR Contains(TextLine, 'QUANTITY', False) OR Contains(TextLine, 'Index No', False)) = True THEN
            NEXT LOOP
        END
        Text.SplitText.Split Text: TextLine StandardDelimiter: Text.StandardDelimiter.Space DelimiterTimes: 1 Result=> TextLineList
        IF TextLineList.Count = 5 THEN
            Variables.AddRowToDataTable.AppendRowToDataTable DataTable: DataTable RowToAdd: TextLineList
        ELSE
            Variables.AddRowToDataTable.AppendRowToDataTable DataTable: DataTable RowToAdd: [TextLineList[0], '', TextLineList[1], '', TextLineList[2]]
        END
    END

    -------------------------------------------------------------------------
    If I have answered your question, please mark it as the preferred solution.
    If you like my response, please give it a Thumbs Up.

    If you are interested in Power Automate, you might want to follow me on LinkedIn at https://www.linkedin.com/in/agnius-bartninkas/

     

     

  • Verified answer
    Agnius Bartninkas
    Most Valuable Professional

    P.S. You actually can create a data table with headers but no rows if you paste the code as 

    Variables.CreateNewDatatable InputTable: { ^['Index No.', 'Unit', 'Pack Breakup', 'Pack', 'Quantity'] } DataTable=> DataTable

    In which case you would not need the Delete row from data table action.

    But this is not something you can actually do via the UI of PAD at all. The only way to do it is to create the action first, then copy it to a text editor, modify it to remove the empty row and then paste it back to PAD. So, the more natural solution is the one I posted above.

  • LPad

    @mmonline Thanks for trying it out. I have tested the text version, but it's quite a lot of work.

  • Agnius Bartninkas
    Most Valuable Professional

    Did you notice that I provided the entire script in my reply above?


  • LPad

    Thanks @Agnius. I don't want to stop at "PREPACK", though; I want all the tables between one order and the next. This flow stops before PREPACK, even though the PREPACK table has the same structure. I found a solution for that, and it now stops before the next order. It's a great solution indeed. Thanks again for helping me towards it. 👍✌

  • Agnius Bartninkas Profile Picture
    Most Valuable Professional on at

    Glad I could help. You should be able to take it from here and handle other tables in the file by adjusting the starting and ending keywords around the pattern.
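
As a rough illustration of the start and end keyword idea, here is a Python sketch that swaps the regex for a plain line scan. The helper name lines_between, the keywords, and the sample text are all assumptions for the example, not something taken from the actual PDF.

```python
def lines_between(text, start_keyword, end_keyword):
    """Return the lines after the first line containing start_keyword,
    up to (not including) the next line containing end_keyword.

    If end_keyword never appears, everything after the start line is returned.
    """
    collected = []
    inside = False
    for line in text.splitlines():
        if not inside:
            if start_keyword in line:
                inside = True
            continue
        if end_keyword in line:
            break
        collected.append(line)
    return collected


# Made-up extracted text; swap the keywords to target another table.
sample = (
    "MULTIPACK\r\n"
    "QUANTITY\r\n"
    "Index No. Unit Pack Breakup Pack Quantity\r\n"
    "1001 EA 6x2 12 24\r\n"
    "1002 4x3 36\r\n"
    "2 60\r\n"
    "Totals PREPACK\r\n"
)
body = lines_between(sample, "Index No", "PREPACK")
```

Unlike the regex in the flow, this scan keeps the totals line ("2 60" in the sample), so it would need to be dropped separately if it is not wanted.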
