Skip to main content

Notifications

Query Large PDFs With GPT RAG

takolota1 Profile Picture Posted by takolota1 4,617

RAG-GPT-PDF-Thumbnail.png

Query Large PDFs With GPT RAG

Need to efficiently prompt / query on very large PDFs with dozens, hundreds, maybe even thousands of pages, but can't possibly fit and/or pay for all that in a prompt?

 

This template builds off a previous Extract Data From PDFs and Images With GPT template. But this template uses Retrieval Augmented Generation (RAG) to essentially do a google search for the file pages with the most relevant text related to a user's query before using those pages to answer the query with a GPT prompt.
It returns the most relevant file page texts to the GPT prompt up to the chosen character limit so the GPT model can use only those most relevant pages to answer the user's query. For example, if there is a question like "According to the file, what happens if goods are damaged?" about a 600 page PDF and the MaxFileTextCharacters parameter is set to 80000 & each page is roughly 4000 characters / 1000 tokens, then the flow will select only the 20 pages' texts most relevant to the question of damaged goods & submit those pages' texts along with the question to the GPT prompt to generate an answer.

(Will not be very useful for more aggregate queries that need the entire document like "Summarize the entire PDF".)

 

To read more on semantic search & embeddings used in Retrieval Augmented Generation, see these resources

https://medium.com/@pankaj_pandey/exploring-semantic-search-using-embeddings-and-vector-databases-with-some-popular-use-cases-2543a79d3ba6

 

https://platform.openai.com/docs/guides/embeddings

 

https://www.youtube.com/watch?v=orLGv2LgWDE 

 

 

Flow Run Example

Overall the flow takes in a query to ask GPT, the MaxFileTextCharacters controlling the amount of file context given to the GPT prompt, & the file relevant to the question.
It then OCRs each page of the file, converts the OCR text & coordinates to a text replica of each page, uses text embeddings values of the query text & each page text to give each page text a score of how relevant it is to the query text, filters to only the most relevant pages that can fit in the MaxFileTextCharacters limit, resorts to a chronological page order to make a combined replica text of the most relevant pages in page order, & then feeds those pages' texts & the query to a GPT prompt to get a response.

takolota_0-1710184872654.png

 

Setting of the MaxFileTextCharacters parameter & setting of the Query.

MaxTextAndQuery.png

 

Getting the query text embeddings.

QueryTextEmbed.png

 

Selecting the file from OneDrive & OCR it for a list of all the document text values & their coordinates.

OCR-Doc0.pngOCR-Doc.png

 

Run actions to process each page of text values & coordinates into replications of each document page with proper horizontal & vertical spacing between text values.

takolota_0-1710190284756.png

 

Take the text replica for each page & get the text embeddings.

PageTextEmbed.png

 

So this replicated page text...

 

{
 "input": " xxxxxxxx\n\n xxxxxxxxxx\n\n\n\n\n\n Indefinite Delivery / Indefinite Quantity Subcontract\n\n\n Between\n\n\n xxxxxxxxxxx, Inc.\n\n\n And\n\n xxxxxxx xxxxxxx\n\n\n Hereinafter referred to as the \"Subcontractor\"\n\n For\n\n xxxxx Global Health Supply Chain Program- Procurement and\n\n Supply Management (GHSC-PSM) project\n\n Contract No .: xxx-xxx-xx-xxxxxx Task Order No .: xxx-xxx-xx xxxxxx\n\n\n\n Subcontract number: xxxx\n\n Start Date: 10/01/2019\n\n End Date: 11/23/2020\n - -\n Subcontract Ceiling Price: $212,523,728.56\n\n\n\n ISSUED BY:\n\n xxxxxxxxx, Inc.\n\n xxxx x Street, xx, Washington, DC, xxxxx, United States of America\n\n\n ISSUED TO:\n\n\n A to Z xxxxxxxxxxxx\n\n xxxxxxx xxxxx xxxx, xxxxxx\n\n P.O. Box\n\n xxxxxxxxxx\n Limited\n\n\n Subcontractor Tax ID Number: N/A\n\n Subcontractor DUNS Number: xx-xxx-xxxxxx\n xxxxxxxx\n\n\n O.Box xxx - xxxxx - xxxxxx\n\n\n\n\n Page 1 of 49\n\n\n\n\n\n\n\n\n\n\n\n"
}

 

 

Becomes this set of vectors / text embeddings.

 

[0.0050294143,0.011339636,0.0052986285,0.025061412,-0.017213404,-0.0768811,-0.01173122,0.035634194,-0.030413067,0.044412214,0.01928554,-0.038212124,0.021080302,-0.004319667,-0.024033502,0.008268145,-0.046990145,0.027916715,0.025012463,-0.0068690456,0.040920585,0.03573209,0.041246906,0.025469312,0.014839423,0.004132033,-0.021602415,0.010287252,-0.0076522147,0.016462868,-0.03651526,0.031489924,0.016789189,-0.027459867,0.0599777,-0.008884074,0.021259777,0.041246906,-0.0028777386,-0.020705033,
...
...
...
-0.0130365025,-0.007668531,0.030184643,0.018224997,-0.0099772485,0.0034202463,-0.005804425,-0.0076603726]

 

 

With the query text embeddings & the page text embeddings we can now run cosine similarity / dot product calculations across them to get a score of how relevant the page text is to the query text.

CosSimilarityDotProduct.png

 

Run a Filter array action to filter out all the least relevant page texts that don't fit in the MaxFileTextCharacters limit.

FilterHighestScores.png

 

Re-sort all the most relevant pages to be ordered by page number so we can combine all the most relevant page texts into the relevant file text we will feed the prompt.

CombinedText.png

 

Example combined page texts (some pages in the middle removed for brevity). This started with page 19 & ended on page 29. And it also skipped pages between, like page 28, if their cosine similarity relevance scores were lower.

 

 xxxxxxxx
 arrangements, and the Services to be provided, along with the information specified in
 AIDAR 752.7004, EMERGENCY LOCATOR INFORMATION.

 (2) The Supplier shall ensure that its personnel, while in a Cooperating Country, abide by all

 applicable laws of the Cooperating Country and political subdivisions thereof.

 (3) Other than work performed under the Subcontract for which personnel are assigned by
 the Supplier, the Supplier's personnel shall not engage, directly or indirectly, either in

 their own name or in the name or through the agency of another person, in any business,
 profession or occupation in the Cooperating Country, nor shall they make loans or
 investments to or in any business, profession or occupation in the Cooperating Country,

 without xxxxxxxx' approval. This provision does not apply to personnel who are citizens
 or legal residents of the Cooperating Country.

 (4) The Supplier shall obtain (a) worker's compensation (Defense Base Act) insurance
 pursuant to FAR 52.228-3 and AIDAR 752.228-3, and (b) medical evacuation insurance

 for personnel travelling to a Cooperating Country in connection with this Subcontract.

 (5) Personnel travelling on the Supplier's behalf for performance of Related Services shall
 possess appropriate language skills, if any, stated in the Subcontract, and shall be

 physically fit in accordance with AIDAR 752.7033.

 (6) In performing Related Services, the Supplier shall comply with USAID guidance, if any,
 relating to branding/marking of activities.

 (7) FAR 52.246-4 INSPECTION OF SERVICES - FIXED PRICE (AUG 1996) shall apply
 to Related Services.


 (8) All logistics support, visas, legal compliance matters and taxes in connection with its
 personnel overseas shall be the sole responsibility of the Supplier, as will all liability for
 the acts and omissions of the Supplier's personnel performing the Related Services.

 (9) Compensation for satisfactory performance of Related Services shall be paid upon

 completion thereof in compliance with the terms and conditions of the Subcontract and
 solely in the form of the firm, fixed, all-inclusive prices.

 (10) Notwithstanding any other provisions of this Subcontract, no additional
 compensation or reimbursement will be provided to the Supplier for complying with these

 requirements concerning provision of Related Services

 ARTICLE 8. PACKING, EXPORT MARKING, PREPARATION FOR
 SHIPMENT AND PACKAGING

 A. All Goods supplied under this Subcontract shall be packed and marked for export as required

 by the Subcontract/Orders and by all applicable transportation regulations, carrier tariffs, US
 FDA/SRA regulations (if any), and sound commercial practice. Without limiting the generality
 of the foregoing, all Goods shall be properly prepared for export according to the best

 international packing standards suitable to prevent theft, loss, or damage and to withstand
 exposure to the elements, including extreme temperature and water, and rough handling
 during air, sea or land shipment.

 B. The Supplier shall be solely responsible for complying with all applicable laws and sound
 international practices, which includes having all relevant licenses in places at the Supplier's

 factory for the Goods and for shipping/loading in accordance with the applicable INCOTERM, for
 the packaging and labeling of the Goods (including, if applicable, hazardous materials safeguards).
 xxxxxxxx
 C. Packaging shall be prepared in accordance with the Subcontract and to ensure that:

 (1) All tertiary, secondary, and primary (whenapplicable) packaging for Goods are properly

 Page 18 of 49


 xxxxxxxx
 labelled per Section D below and clearly identifies any special handling instructions and/or
 temperature requirements

 (2) Preference is for EUR2 pallets (100x120). EUR1 (80x120) pallets are also acceptable.

 Other pallet types may be acceptable, in consultation with xxxx-xxx. In cases wherein
 the destination requires a specific pallet type/size, it will be specified in the relevant
 Purchase Order, and the Supplier will provide goods utilizing the specified pallet type/size.


 (3) Pallet height not to exceed 1.25 m (incl. pallet) for shipments using air freight. Pallet height

 may not exceed 2.1 m (incl. pallet) for shipments using sea freight.


 (4) Partial cartons, including those with batch-end products require an extra label clearly
 marking the cartons as "Partial" or equivalent and the quantity of units included within.


 (5) Like product and batches should be kept contiguous when loaded into containers and
 should not be separated. Corrugated separator sheets should be used between batches when
 multiple batches are packed on the same pallet.


 D. xxxxxxxx may be implementing xxx labeling requirements on tertiary packaging

 (pallet/logistics unit and carton/trade item) and/or secondary packaging and/or on the LLIN care
 label during the period of performance of this Subcontract. The Supplier may be required to
 comply with GS1 General Specifications for identification and marking details under the

 Subcontract. The Supplier may refer to the xxx barcode specifications for detailed requirements
 (xxxxxxxxxxxxx). xxxxxxxx will provide
 the Supplier with reasonable notice of the implementation requirement applicable to the

 Subcontract.
 -- - -
 E. Transaction and Production Data


 For orders with Incoterms other than DAP or DDP, all transaction and production data must be

 provided to xxxx-xxx through the xxxxx Logistics System
 (xxxx), including but not limited to the SSCC, GTIN, batch/lot number, and expiration date.
 For orders with incoterms DAP or DDP, all transaction and production data must be provided to

 xxxx-xxx via the Procurement Specialist. Data presented on transaction documents -
 including but not limited to the packing list, commercial invoice, and advanced ship notice -

 must align with the identifiers used on the shipping label (i.e. once the Subcontractor has
 transitioned to using the GTIN as the primary identifier, this must be used on packing lists as
 well).


 (1) Within 30 days of a request, the Supplier will make serial number data for goods procured
 under this subcontract in the format requested by xxxxxxxx.


 (2) A complete itemized packing list shall be carried in a secure, durable clearly-marked

 "packing list" envelope affixed to the outside of each pallet, shipping container or box
 that represents a separate unit of the shipment used to deliver the Goods. Each packing
 list must show the specified xxxxxxxx Subcontract/Order number (unless otherwise

 required by xxxxxx in writing, a complete narrative description of the Goods, all
 applicable part numbers, and the corresponding line item number.


 (3) Damage resulting from improper packing, export marking and preparation for shipment
 shall be the liability of the Supplier and deducted from amounts due.


 xxxxxxxxxxxxxxxxx Page 19 of 49
...
...
...
...
...

 xxxxxxxxx
 10 business days of a request by xxxxxxx, xxxx-QA, and or xxxx-xxx QA. Finished
 product must be retained for at least one year past the expiration date or according to supplier's

 retention procedure, whichever is longer.

 ARTICLE 13. TITLE AND RISK OF LOSS OR DAMAGE

 A. Supplier shall ensure that the title to Goods delivered and supplied hereunder shall pass
 directly to xxxxx upon acceptance pursuant to Article Quality Assurance, Testing,

 Inspection and Acceptance above.

 B. Notwithstanding completion of delivery, Supplier shall bear all risk of loss or damage to the

 Goods prior to acceptance, except to the extent that any loss or damage is due to xxxxxxx'
 fault, or occurs after delivery and not due to fault on the Supplier's part.


 ARTICLE 14. PAYMENT AND PAYMENT TERMS

 A. xxxxxxx will pay the total Order price as a lump sum, or in installments for agreed upon
 shipments, after the Supplier's delivery of the corresponding Goods and/or Related Services

 and xxxxxxx' designated agent's acceptance thereof, or as otherwise provided in the Order,
 according to the delivery schedule agreed by the Parties. xxxxxxx will pay the Supplier's
 invoice within forty-five (45) net days of receipt of a complete invoice and receipt of the

 corresponding evidence of delivery per the INCOTERM. The Supplier's submission must be
 in compliance with the Article labeled "Invoice Requirements" below.


 In the specific event that the Supplier agrees to hold xxxxxxxx' Orders after quality assurance
 processes have been completed but import waivers are still pending, then xxxxxxx will pay

 the total price within forty-five (45) net days after xxxxxxxx has issued a Certificate of
 ,Compliance (CoC) indicating that the Goods have passed the requisite QA testing and informed

 , the subcontractor of the pending importation waiver. Invoices for any orders placed with
 INCOTERMs other than FCA or ExWorks will also require proof of delivery and acceptance.
 xxxxxx will pay the total Subcontract price as a lump sum, or in installments for agreed upon

 shipments corresponding to complete Subcontract Order documentation. Should products
 require additional Quality Assurance testing prior to shipment, in the event of a batch rejection,
 Supplier will refund the pro rata amount of the invoice price applicable to the rejected products

 to xxxxxx, within ten (10) business days of notification of rejection in addition to remedies
 for non-conforming goods herein including as set forth in Article 13. Quality Assurance Testing,

 Inspection and Acceptance.

 B. Payments for approved invoices will be made by check or via Electronic Funds Transfer

 (EFT) for US bank/financial institution accounts or Wire Transfer for non-US bank accounts.
 Payment will be sent to the Supplier's designated recipient account name, account number,
 and bank or financial institution as identified in the Subcontract and in the payment account

 forms required herein to establish a payment account with xxxxxx xxxxxxxx.
 Incomplete or incorrect payment account forms to establish a new account or update an

 existing account will delay payment. All costs and risks arising out of, relating to, or resulting
 from EFT or Wire Transfer shall be borne by the Supplier. The following account forms are
 required to establish or update a payment account;

 (1) All US based Suppliers are required to complete the xxxxxxxx Electronic Funds Transfer

 Form and W9 Tax form to set up a payment account with xxxxxxxxx.

 (2) The Supplier with international banks are required to complete the xxxxxxxx
 International Wire Transfer form, including the Domestic (US) Intermediary Bank

 xxxxxxxx section. Selecting a US intermediaty/ bank facilitates an efficient transfer of funds and is Page 27 of 49


 xxxxxxxxx
 INCOTERM:

 INCOTERMS
 Documents Information Attributes
 EXW/FCA CIP/CPT/DDP

 Air Freight Dimensions or Volume and Gross
 Weight, Airport Departure, Airport
 Shipping/Delivery
 Doc: Airway Bill Destination, Shipper's Name,
 (AWB) X (Provided by Consignee, Carrier Charges
 X (Prepared for
 xxxxxxx) Supplier) Dimensions or Volume and Gross
 Ocean
 Shipping/Delivery Weight, Seaport Departure,
 Seaport Destination, Shipper's
 Doc: (BOL) Name, Consignee, Carrier Charges


 Delivery to
 Freight
 Forwarders X (Supplier
 Volume and Gross/Net Weight,
 Certificate of collects from Consignee, Shipper's Name,
 Receipt Designated X (Provided by
 Freight Supplier) Invoice #, PO #, Description of
 (Note: for Goods, Packaging Details,
 International Forwarder and destination
 Provides)
 Trucking only the
 Freight Forwarders
 Certificate of

 Receipt is
 provided)

 Volume and Gross/Net Weight,
 End Recipient Consignee, Shipper's Name, Invoice
 Goods Receipt X (Provided by #, PO #, Description of Goods,
 IN/A Supplier) Packaging Details, destination
 Notice or Proof of
 Delivery Receipt



 D. Invoices determined to be proper will be paid by xxxxxxx in accordance with the Article
 labeled "Payment and Payment Term" above and the terms of the Subcontract and the

 Order. Invoices determined not to be proper due to the existence of deficiencies will be
 rejected and the Supplier promptly notified, generally within ten (10) business days of
 submission, with deficiencies noted for correction. In the event that an invoice is submitted,

 which is partially proper, xxxxxxx may, in its sole discretion, either reject the entire
 invoice for correction or make payment of the proper portion and return the portion deemed

 not to be proper."

 ARTICLE 16. COOPERATING COUNTRY FEES, TAXES, AND DUTIES

 A. This Subcontract is entered into by xxxxxxx on behalf of the xxxx-xxx Project, in Cooperating Country(ies).
 As such, the Subcontract is free and exempt from any taxes, VAT, tariffs, duties, or

 other levies imposed by the laws in effect in the Cooperating Country(ies). The Supplier
 shall not pay any host country taxes, VAT, tariffs, duties, levies, etc. from which this

 xxxxx program is exempt. In the event that any exempt charges are paid by the Supplier,
 they will not be reimbursed to the Supplier by xxxxxxx unless approved in advance in
 writing by xxxxxxxx. The Supplier shall immediately notify xxxxxx if any such taxes
 alls Limited * quezue
 are assessed against the Supplier of if/subcontractors/suppliers at any tier.
 Page 29 of 49

 

 

See how the prompt is structured & the run of the GPT action. In this run I used GPT4 Turbo

prompt.pngGPTCall.png

 

GPT Response:

 

"According to the file, if goods are damaged as a result of improper packing, export marking, and preparation for shipment, the liability falls on the Supplier, and the cost associated with such damage shall be deducted from amounts due to the Supplier."

 

 

 

 

 

 

 

 

 

Import & Set-Up

To import the flow, download the GPTRAGQueryLargePDFs_1_0_0_x.zip file at the bottom of this post. Go to https://make.powerapps.com/, go to Solutions on the left-side menu, select Import solution, select Browse, & select the file you just downloaded. Select Next, select Next again. Then provide and/or create the connections for the solution / flow. Select Import & wait for the import to load. Select GPT RAG Query Large PDFs in the list of Solutions. Select the 3 vertical dots to the side of the GPT RAG Query Large PDFs flow title & in the pop-up menu select edit.

RAG-GPT0.png

 

Once in the flow, you can delete the Delete after import action.

RAG-GPT1.png

 

Then go to the Azure Function DotProduct Code action so we can set up a call to an Azure function to perform cosine similarity / dot product calculations to get our text relevance to query scores. Go to https://portal.azure.com/#view/HubsExtension/BrowseResource/resourceType/Microsoft.Web%2Fsites/kind/functionapp. Select Create and input Resource, Name, & Node.js as the stack, Select Review + create & then select Create. Then select go to Resource.

RAG-GPT2.pngRAG-GPT3.pngRAG-GPT4.pngRAG-GPT5.pngRAG-GPT6.png

 

Select Create Function & select HTTP trigger. Then select Create.

RAG-GPT7.pngRAG-GPT8.png

If you get an error, then you may have to refresh the resource page & Select the HTTPtrigger1 link.

RAG-GPT9(May Need To).png

 

Go to Code + Test, remove the placeholder code in the editor, copy the code from the flow Azure Function DotProduct Code action & paste that code into the Azure editor.

RAG-GPT10.pngRAG-GPT11.pngRAG-GPT12.png

 

On the Azure Function Code + Test editor page, select Get function URL & copy the URL. Go to the flow, inside the "Convert to txt and select most relevant pages" scope & inside the "Apply to each Convert to txt" loop the "HTTP Page text to Query score" action will need that URL pasted to the URI input.

RAG-GPT13.pngRAG-GPT14.png

After setting up the Azure Function call for the dot product calculations, we can then check in our Static Variables action & Query action. Adjust MaxFileTextCharacters & the Query to your needs. The larger the MaxFileTextCharacters, the more pages' texts will be sent to the prompt. So the higher the number of characters, the more file context the prompt will use, but the larger the prompt & token count will be.
So the higher the number, the higher the accuracy, but the lower the number, the lower the cost per query.

RAG-GPT15.png

 

Next we will set up our call to a custom text embeddings model to get the text embeddings / vectors for the text we are using in our Query.
Go to https://portal.azure.com/#view/Microsoft_Azure_ProjectOxford/CognitiveServicesHub/~/OpenAI. Select Create. Input Resource, Name, & Price tier. Select Next 3 times. Select Create & then select Go to Resource.

Embed1.pngRAG-GPT17.pngRAG-GPT18.pngRAG-GPT19.pngRAG-GPT20.pngRAG-GPT21.pngRAG-GPT22.png

 

Go to OpenAI Studio. Select Models on the left-side menu. Then select text-embedding-3-large & select Deploy. Then input Name for your deployment, in Advanced options set the rate to 200k+ Tokens & select Create.

RAG-GPT23.pngRAG-GPT24.pngRAG-GPT25.png

 

Go back to the Azure AI Services resource page, & copy the name of the resource you just created. Go to the "HTTP Query text reference" flow action & paste that resource name over where the URI input says YOUR_RESOURCE_NAME.

RAG-GPT26.pngEmbed1Resource.png

Then go to the OpenAI Studio page again. Go to Deployments on the left-side menu. Copy the name of the deployment you just made to the clipboard & paste that deployment name to the same URI input, but this time paste over where it says YOUR_DEPLOYMENT_NAME.

RAG-GPT28.pngEmbed1Deploy.png

Go back to the Azure AI services resources page, select the resource you just created, then select Keys and Endpoint on the left-side menu & copy KEY 1 to the clipboard. Go back to the "HTTP Query text reference" flow action, remove all the text in the api-key value input & paste in the API key.

RAG-GPT30.pngRAG-GPT31.pngRAG-GPT32.png

 

Go to the "Get file metadata" flow action and select a large PDF for your use-case / for your test.

RAG-GPT33.png

 

Next we will set up the final call to the custom Azure GPT4 Turbo model we create.
Go to https://portal.azure.com/#view/Microsoft_Azure_ProjectOxford/CognitiveServicesHub/~/OpenAI. Select Create. The input Resource, Name, & Price Tier. Select Next 3 times. Select Create & then select Go to Resource.

GPTAction.pngRAG-GPT35.pngRAG-GPT36.pngRAG-GPT37.pngRAG-GPT38.pngRAG-GPT39.pngRAG-GPT40.png

Go to OpenAI Studio. Select Models on the left-side menu. Select what GPT instance you want to use (at the time of writing this GPT4 Turbo 0125 Preview was the latest model). Select Deploy. Then input the Name for your deployment, to to the advanced options & set the rate to 30k+ Tokens. Then select Create.

RAG-GPT41.pngRAG-GPT42.pngRAG-GPT43.png

 

Go back to the Azure AI services resources page & copy the name of the resource you just created. Go to the flow "HTTP LLM Prompt" action & paste the resource name over where the URI input says YOUR_RESOURCE_NAME.

RAG-GPT44.pngRAG-GPT45.png

 

Then go to the OpenAI Studio page again. Go to Deployments on the left-side menu. Copy the name of the deployment you just made to the clipboard & paste that deployment name to the same URI input, but this time paste over where it says YOUR_DEPLOYMENT_NAME.

RAG-GPT46.pngRAG-GPT47.png

 

Go back to the Azure AI services resources page, select the resource you just created, then select Keys and Endpoint on the left-side menu & copy KEY 1 to the clipboard. Go back to the "HTTP Query text reference" flow action, remove all the text in the api-key value input & paste in the API key.

RAG-GPT48.pngRAG-GPT49.pngRAG-GPT50.png

 

Thanks for any feedback,

Please subscribe to my YouTube channel (https://youtube.com/@tylerkolota?si=uEGKko1U8D29CJ86).

And reach out on LinkedIn (https://www.linkedin.com/in/kolota/) if you want to hire me to consult or build more custom Microsoft solutions for you.



Solution Zip Download Link: https://drive.google.com/file/d/1e5YXp2vJkeInAJSnZQjm9xt5J8fJnG0S/view?usp=sharing

Legacy Power Automate Import: https://drive.google.com/file/d/1hxyo7BoKlXTKETSTeypxu9vTYPtCCoyO/view?usp=sharing

Categories:

AI Builder

Comments

  • DJ-20090719-0 Profile Picture DJ-20090719-0
    Posted at
    Query Large PDFs With GPT RAG
    Hi,
     
    I am very interested into knowing more about this flow. Unfortunately, all the images you pasted in this article are not visible (links to these images seem broken).
     
     
    Is there a way to get it fixed ?
     
    Thanks a lot,
     
    Regards,
     
    Damien
  • takolota1 Profile Picture takolota1 4,617
    Posted at
    Query Large PDFs With GPT RAG

    Bug Fix

     

     

    I found the Convert to txt loop inside the Convert to txt scope would error if it was passed a page that the Recognize text action found no text on (so the lines parameter was an empty array [ ]).
    I changed the "Filter array RemoveUnselectedPageBlanks" action logic to...

     

    @And(greater(length(string(item())), 0),not(equals(empty(item()?['lines']), true)))

     

     to remove any blank pages & avoid this error.

  • takolota1 Profile Picture takolota1 4,617
    Posted at
    Query Large PDFs With GPT RAG

    @mm00 Ah good catch. Must have copied something from the LLM prompt when resetting the templates. I adjusted the downloads & pictures to fix that.

    Thanks,

  • mm00 Profile Picture mm00
    Posted at
    Query Large PDFs With GPT RAG

    ah, thanks for that. re-running now - seems to be happier.

     

    the screenshot you provided above for the configuration has "completions":

     

    mm00_1-1711124333286.png

     

     

  • takolota1 Profile Picture takolota1 4,617
    Posted at
    Query Large PDFs With GPT RAG

    @mm00 What did you use for the URI?

    It should be something like
    https://xxxxxxx.openai.azure.com/openai/deployments/xxxxxxxxx/embeddings?api-version=2023-05-15

    There should not be a /completion part of the uri

  • mm00 Profile Picture mm00
    Posted at
    Query Large PDFs With GPT RAG

    hiya! impressive flow, thanks for the detailed exposition.

     

    i imported into my project and made the changes you described, but get the following error (in the node "HTTP Query Text Embeddings") when the flow runs:


    "The completion operation does not work with the specified model, text-embedding-3-large. Please choose different model and try again."

     

    I put a company report pdf into one drive, and my query is "what companies are mentioned in the document?"

  • Tjan Profile Picture Tjan 1,064
    Posted at
    Query Large PDFs With GPT RAG

    @takolota  You amaze me once again, awesome! If I can find some time in the future, I'll make sure to test this out.

  • ARB_wcc Profile Picture ARB_wcc 283
    Posted at
    Query Large PDFs With GPT RAG

    Impressive work, thanks for sharing!

  • takolota1 Profile Picture takolota1 4,617
    Posted at
    Query Large PDFs With GPT RAG

    Placeholder