Since nobody has replied yet, I will just let you know, what you want to do can probably be done, but it most likely is not going to be easy or beginner level. I could be wrong and it may be easier than I think.
I say this because yesterday I was looking into a request I had, which involves a spreadsheet and a pdf file. The requestor wants me to develop a flow to take a spreadsheet with part numbers and their descriptions to check if any of the part numbers were included in a quote to a customer, as the part numbers in the spreadsheet are outdated parts and should no longer be used in quotes. The pdf has a consistent format since it comes from our quoting system and as you can see can be multiple pages long:
The goal is to create a spreadsheet of the part numbers and descriptions found in the quote and also highlight the rows in the pdf where these part numbers are found so that the salesperson can be notified the items should not be quoted and they should quote with the newer part numbers.
I attempted this first by just trying to do this with Copilot in Microsoft edge by attaching the two files. After some optimizing the prompt for Copilot, I was able to get it to do what is needed consistently. Below is the prompt I have provided to the requester so they can do more testing.
I then asked Copilot how I could do the same thing through Power Automate so the process could be automated. It offered a number of suggestions that seem reasonable, but they're not simple steps. This is where I am at and why I say what you want to do is probably possible. Therefore, I suggest you try attaching the pdf in Copilot and prompting it to do what you want. When you get that working, ask how it can then be automated through Power Automate. Note, be sure that the pdf is not image based and the text is searchable. I have found that Copilot will not do OCR on image-based PDF files like Google Gemini will.
Good luck!
This is the prompt I used:
You will receive two files:
1) Excel: contains a column named “Part no.” (may also include “Description”).
2) PDF: contains part numbers and descriptions.
Your tasks — please proceed without reconfirmation:
A) Compare the Excel “Part no.” values against all part numbers found in the PDF (treat as strings; exact, case-sensitive match; trim whitespace).
- If the Excel column header varies slightly (e.g., “Part no”, “Part number”, “Part #”), map it to “Part no.”.
- Do NOT perform fuzzy matching or guess missing characters.
B) Produce an **Excel workbook** named `matched_parts_from_pdf.xlsx` with these sheets:
1) **Matches**
Columns (in this order):
- Part no.
- Description → Take the description text from the PDF line where the part number is found (not from Excel).
- PDF Page → Page number where the first match was found.
- PDF Line (optional) → The extracted line text containing the match (useful for audits).
Rules:
- If a part number appears multiple times in the PDF, keep **one row** in this sheet using the first non-blank description encountered; still fill “PDF Page” with the page of that first occurrence.
- Deduplicate by “Part no.”.
2) **Non-Matches (Excel)**
Columns:
- Part no.
Rules:
- List every Excel “Part no.” that did **not** appear in the PDF (one row per part).
3) **Summary**
Include:
- Total Excel part numbers processed.
- Count of matches.
- Count of non-matches.
- Pages that contain matches (comma-separated).
- Timestamp of run.
- Up to 3 sample rows from **Matches**.
Formatting preferences:
- Freeze top row on each sheet.
- Auto-fit column widths.
- Apply a simple header style (bold) and filters.
C) Create a highlighted version of the original PDF showing the actual part-number text highlighted in place:
- If word-level text coordinates are available, generate `[original_pdf_basename]_highlighted_exact.pdf` using true text highlights directly over each matched part number (highlight all occurrences across pages).
- If coordinates are not available, fall back to a non-destructive **page-level overlay** (e.g., a margin ribbon + small list of matched part numbers for that page) named `[original_pdf_basename]_highlighted.pdf`.
D) Return:
- Direct download links to `matched_parts_from_pdf.xlsx` and the highlighted PDF (exact or fallback).
- A short results summary including:
• Total matches,
• Page numbers that contain matches,
• 3 sample rows (Part no., Description, Page) from the **Matches** sheet.
Edge cases & quality rules:
- Always return the Excel workbook, even if **no matches** are found:
- Create the three sheets with headers; **Matches** may be empty but must exist.
- State “No matches found” in the Summary sheet.
- Do not invent descriptions; only use text that exists in the PDF.
- Keep comparisons exact (string match), but normalize Excel strings by trimming whitespace.
- If helpful, include an auxiliary “Index” PDF listing each matched line with its page number for quick visual lookup.
Inputs I will provide each time (replace with current filenames):
- Excel file: <your_excel_filename.xlsx>
- PDF file: <your_pdf_filename.pdf>
Now perform all steps above with the files I attach in this chat.