Identify Text/ No. from a pdf document

Posted on by 115

Hi,

 

I would like to identify text or numbers in a PDF document. For example, after uploading a PDF, I want to know where a given string (e.g. 20000424) appears in the document and how many times it occurs.

 

Could you suggest what AI model I should choose?

 

  • plarrue Profile Picture
    Moderator on at

    Hi @concjames ,

     

    Thanks for reaching out.

    Have you checked whether the prebuilt Entity extraction model could help with your project?

    Entity extraction prebuilt AI model - AI Builder | Microsoft Docs

     

    Hope it helps

     

  • concjames Profile Picture
    115 on at

    Thanks so much. I actually have two questions. 

     

    1. Let's say the document reads: "The Review shall follow procedure Doc ID 91209684, and its related templates, Doc ID 91205339 and 91205340." How can we train the model to recognize the Doc ID numbers?

     

    2. I actually want to import a PDF document with a few pages and search for these Doc ID numbers automatically. However, the first time I tried, I got a message saying the load was too much. I don't know why.

  • Antrod Profile Picture
    Moderator on at

    Hi @concjames,

     

    1. You can train a custom AI Builder Entity Extraction model for that. The only tricky part is that it doesn't accept files as training input, only plain-text sentences, which you first need to add to a Dataverse table. You can learn more in this documentation: Create an entity extraction custom AI model - AI Builder | Microsoft Docs

    So you will need to extract the text from your PDF and add it to a Dataverse table to create the training set. It can take some time, but you only need to do it once.

     

    As for running the model, the same applies: you can only pass plain text to the model, not a file. If you want to run the model from a Power Automate cloud flow, there are a couple of ways to extract text from a PDF:

    • Use the AI Builder text recognition model to first extract text from the PDF in your flow (Text recognition prebuilt AI model - AI Builder | Microsoft Docs). Use the output of this action as the input for your Entity extraction model action.
    • Use the "Extract text from pdf" action in Power Automate Desktop. With that you can, for example, write the text to .txt files that your cloud flow can then use.
    • Use an external premium connector to extract PDF content from a file.

     

    FYI, we are also working on an AI Builder model capable of extracting unstructured data directly from files. It should be released next year.

     

    2. Can you elaborate on which test you performed? Where did you try to import your file?

     

    Thanks

  • concjames Profile Picture
    115 on at

    Thanks for the input. My test is quite simple: after I import the document in a Power Automate cloud flow, I would like the flow to identify a Doc ID (for example, 91209684) and tell me how many times it appears and where.
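
    For the "how many and where" part on its own, a plain regular expression may already be enough once the text has been extracted from the PDF. The sketch below is not AI Builder and not a trained model. It assumes a Doc ID is always a standalone 8-digit number (based only on the examples in this thread) and that the extracted text is available as one string per page. The names `DOC_ID_PATTERN` and `find_doc_ids` are made up for illustration.

```python
import re
from collections import Counter

# Assumption: a Doc ID is a standalone run of exactly 8 digits,
# as in the examples quoted in this thread (e.g. 91209684).
DOC_ID_PATTERN = re.compile(r"\b\d{8}\b")


def find_doc_ids(pages):
    """Return (doc_id, page_number, character_offset) for every match.

    `pages` is a list of strings, one per page of already-extracted text.
    Page numbers are 1-based; offsets are 0-based within the page text.
    """
    hits = []
    for page_number, text in enumerate(pages, start=1):
        for match in DOC_ID_PATTERN.finditer(text):
            hits.append((match.group(), page_number, match.start()))
    return hits


# Sample text: page 1 is the sentence from the question, page 2 is invented.
pages = [
    "The Review shall follow procedure Doc ID 91209684, and its related "
    "templates, Doc ID 91205339 and 91205340.",
    "See Doc ID 91209684 for details.",
]

hits = find_doc_ids(pages)
counts = Counter(doc_id for doc_id, _, _ in hits)

for doc_id, page_number, offset in hits:
    print(f"{doc_id} found on page {page_number} at character {offset}")
for doc_id, count in counts.items():
    print(f"{doc_id} appears {count} time(s)")
```

    If the IDs do not follow a fixed pattern, or they have to be told apart from other 8-digit numbers by context, a trained entity extraction model is likely the better fit.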

  • concjames Profile Picture
    115 on at

    I also got this error when I tried to create a custom model. Do you have any advice?

     

    concjames_0-1636301869505.png

     

  • Antrod Profile Picture
    Moderator on at

    Hi @concjames,

     

    You need to add the training sentences in the selected table/column prior to creating the model.

     

    • From your PDF documents, identify key sentences that contain the entities you want to extract. (Try to pick sentences with different formats so the model trains efficiently.)
    • Copy each identified sentence into the Dataverse table that will serve as the training set for your model. Each sentence should be added as a new row in the table, always in the same column.
    • Create your Entity extraction model and point it at the Dataverse table and column in which the sentences are stored.
    • Start tagging the entities you need to extract in each sentence.
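
    The first two steps could be partly scripted. The sketch below is only one possible approach, assuming the PDF text has already been extracted to a string and that the entities of interest are 8-digit Doc IDs (as in the examples in this thread). It splits the text into sentences, keeps those containing a candidate ID, and writes them as a single-column CSV, one sentence per row. The function names and the column name `TrainingSentence` are hypothetical, and how the rows get into Dataverse (import or copy and paste) is left open.

```python
import csv
import io
import re

# Assumption: the entities to tag are standalone 8-digit Doc IDs.
DOC_ID_PATTERN = re.compile(r"\b\d{8}\b")


def candidate_sentences(text):
    """Split text into sentences and keep those that contain a Doc ID.

    The split is naive: it breaks after '.', '!' or '?' followed by
    whitespace, which is good enough for a first pass over plain prose.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if DOC_ID_PATTERN.search(s)]


def to_single_column_csv(sentences, column_name="TrainingSentence"):
    """Render the sentences as CSV text: a header row, then one row each."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([column_name])  # hypothetical column name
    for sentence in sentences:
        writer.writerow([sentence])
    return buffer.getvalue()


# Sample text: the middle sentence is from this thread, the others are invented.
text = (
    "This section covers reviews. "
    "The Review shall follow procedure Doc ID 91209684, and its related "
    "templates, Doc ID 91205339 and 91205340. "
    "Contact the owner for help."
)

sentences = candidate_sentences(text)
csv_text = to_single_column_csv(sentences)
print(csv_text)
```

    The selected sentences still need to be reviewed by hand so the training set covers varied phrasings, and the tagging itself happens in AI Builder.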

    Thanks.

  • concjames Profile Picture
    115 on at

    Thank you so much. Let me see how to create the Dataverse table. It seems it will take some time.
