Identify text or numbers in a PDF document

Posted by concjames

Hi,

I would like to identify text or numbers in a PDF document. For example, after uploading a PDF, I want to know where a given string (e.g. 20000424) appears in the document and how many times.

Could you suggest which AI model I should choose?

Categories:

AI Builder

All responses (7)

plarrue (Microsoft Employee)

Hi @concjames ,

Thanks for reaching out.

Have you checked whether the prebuilt Entity extraction model could help with your project?

Entity extraction prebuilt AI model - AI Builder | Microsoft Docs

Hope it helps

concjames

Thanks so much. I actually have two questions.

1. Let's say the document reads: "The Review shall follow procedure Doc ID 91209684, and its related templates, Doc ID 91205339 and 91205340." How can we train the model to recognize the Doc ID numbers?

2. I actually want to import a PDF document of a few pages and search for these Doc ID numbers automatically. However, the first time I tried, it said the loading is too much. I don't know why.

Antrod (Microsoft Employee)

Hi @concjames,

1. You can train a custom AI Builder Entity extraction model for that. The one catch is that it does not accept files as training input, only plain-text sentences that you first need to add to a Dataverse table. You can learn more in this documentation: Create an entity extraction custom AI model - AI Builder | Microsoft Docs
So you will need to extract the text from your PDF and add it to a Dataverse table to create the training set. It can take some time, but you only need to do it once.

As for running the model, the same applies: you can only pass plain text to the model, not a file. If you want to run the model from a Power Automate cloud flow, there are a few ways to extract text from a PDF:
Use the AI Builder text recognition model to first extract the text from the PDF in your flow (Text recognition prebuilt AI model - AI Builder | Microsoft Docs), then use the output of that action as the input for your Entity extraction model action.
Use the "Extract text from PDF" action in Power Automate Desktop. With that you can, for example, write the text to .txt files that your cloud flow can then use.
Use an external premium connector to extract the PDF content from a file.
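
Once the PDF text is available as a plain string (through any of the options above), counting and locating a known kind of identifier does not strictly require a trained model. The following is only a minimal sketch in Python using a regular expression rather than AI Builder. It assumes the IDs are standalone 8-digit numbers, as in the example sentence from this thread; the function name `find_doc_ids` and the pattern are illustrative assumptions.

```python
import re
from collections import defaultdict

# Assumption: Doc IDs are standalone 8-digit numbers, as in the thread's example.
DOC_ID_PATTERN = re.compile(r"\b\d{8}\b")

def find_doc_ids(text):
    """Map each 8-digit number found in `text` to its character offsets."""
    positions = defaultdict(list)
    for match in DOC_ID_PATTERN.finditer(text):
        positions[match.group()].append(match.start())
    return dict(positions)

sample = (
    "The Review shall follow procedure Doc ID 91209684, and its related "
    "templates, Doc ID 91205339 and 91205340."
)
for doc_id, offsets in find_doc_ids(sample).items():
    print(doc_id, "occurrences:", len(offsets), "at offsets:", offsets)
```

A fixed pattern like this only works when the ID format is known in advance; the custom Entity extraction model is the better fit when the identifiers vary in format.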

FYI, we are also working on an AI Builder model capable of extracting unstructured data directly from files; it should be released next year.

2. Can you elaborate on which test you performed? Where did you try to import your file?

Thanks

concjames

Thanks for the input. My test is quite simple: after I import the document into the cloud version of Power Automate, the flow would identify Doc ID 91209684 (for example) and tell me how many times it appears and where.

concjames

Also, I got this error when I tried to create a custom model. Do you have any advice?

Antrod (Microsoft Employee)

Hi @concjames,

You need to add the training sentences to the selected table/column before creating the model.

1. In your PDF documents, identify key sentences that contain the entities you want to extract. (Try to pick sentences with different formats so the model trains effectively.)
2. Copy each identified sentence into the Dataverse table that will serve as your model's training set. Add each sentence as a new row, always in the same column.
3. Create your Entity extraction model and point it at the Dataverse table and column where the sentences are stored.
4. Start tagging the entities you need to extract in each sentence.
Thanks.
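
The first two steps above can be partly scripted. The following is a rough sketch in Python, assuming the PDF text has already been extracted to a string and that the table will be filled from a CSV file. The helper names, the column name `sentence`, the keyword filter, and the naive sentence splitter are all hypothetical; the import into Dataverse and the tagging in AI Builder are not shown.

```python
import csv
import io
import re

def candidate_sentences(text, keyword="Doc ID"):
    """Return the sentences of `text` that mention `keyword`, without duplicates."""
    # Naive split after '.', '!' or '?' followed by whitespace; a first pass only.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    seen, picked = set(), []
    for sentence in sentences:
        if keyword in sentence and sentence not in seen:
            seen.add(sentence)
            picked.append(sentence)
    return picked

def to_single_column_csv(sentences, column="sentence"):
    """Serialize sentences as CSV text: a header row, then one sentence per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column])
    writer.writerows([s] for s in sentences)
    return buffer.getvalue()

extracted = (
    "This section describes the review. "
    "The Review shall follow procedure Doc ID 91209684, and its related "
    "templates, Doc ID 91205339 and 91205340. "
    "Reviews are held monthly."
)
print(to_single_column_csv(candidate_sentences(extracted)))
```

Keeping every sentence in a single column matches the requirement that all training sentences live in the same column of the table. The selection should still be reviewed by hand so that the set covers the different ways the IDs are written.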

concjames

Thank you so much. Let me see how to create the Dataverse table. It seems it will take some time.