Power Platform Community Forum Thread Details

Hi Team,

I am creating an agent that keeps users up to date on projects, based on the documents available in a SharePoint document library. The documents are scanned PDFs, so I created a separate flow that extracts the information from the scanned documents and stores it as .txt files in the SharePoint document library. This document library is the knowledge source for the agent.

Of the 8 files, 4 were extracted and uploaded an hour ago, and the other 4 were uploaded 30 minutes later. The agent can only answer queries based on the files uploaded an hour ago. The newly uploaded SharePoint documents are not being indexed and made searchable by the agent in real time.

How can I solve this? Could anyone help with this issue?

Thanks!

Categories: General topics

Scanned PDF: so it's not really a PDF, it's an image. Most of the time an "image PDF" cannot be read by Copilot Studio by default. It has some basic OCR capability, but it is not instant (wait a day; sometimes there are surprises). A scanned PDF is a totally wrong way of publishing content: most search indexes (AI or not) just hate it and need advanced indexing to read it.

Another possibility: you didn't use the big SharePoint button in Knowledge (which is SharePoint keyword search, with a fast index) but the small one, which is in reality Dataverse SharePoint and takes up to 8 hours to index (and to OCR images!).

Last possibility: sometimes, when citizen developers are messing with documents like scanned PDFs, IT cuts their Copilot Studio SharePoint connector :( so indexing stops.
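
To rule out the indexing-delay explanation before digging further, a rough sketch like the one below may help. It only sorts files by upload time against an assumed worst-case delay; the 8-hour figure is the one quoted in this thread, not an official number, and the function name and file names are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumption: worst-case indexing delay quoted in this thread, not an official figure.
ASSUMED_MAX_INDEX_DELAY = timedelta(hours=8)


def split_by_index_window(uploads, now, max_delay=ASSUMED_MAX_INDEX_DELAY):
    """Split files into those past the assumed delay and those possibly still pending.

    uploads maps a file name to the datetime it was uploaded.
    Returns (should_be_indexed, may_still_be_pending) as two lists of names.
    """
    should_be_indexed, may_still_be_pending = [], []
    for name, uploaded_at in uploads.items():
        if now - uploaded_at >= max_delay:
            should_be_indexed.append(name)
        else:
            may_still_be_pending.append(name)
    return should_be_indexed, may_still_be_pending


if __name__ == "__main__":
    now = datetime.now()
    # Hypothetical file names and upload times.
    uploads = {
        "project_a.txt": now - timedelta(hours=10),
        "project_b.txt": now - timedelta(minutes=30),
    }
    indexed, pending = split_by_index_window(uploads, now)
    print("Past the assumed delay:", indexed)
    print("Possibly still pending:", pending)
```

If a file is well past the assumed delay and the agent still cannot answer from it, the delay is probably not the cause and the other possibilities above are worth checking.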