Failed to Extract Text with OCR with Tesseract Engine

Posted on by 47

I am currently trying to extract a small bit of text from a scanned PDF file. I am using the "Extract Text with OCR" action and get the error below every time. I have tried reading either all of the text from the file or a subregion, with the same result. I have confirmed that the Tesseract connector is on my local machine. I've also tried this with "Create Tesseract OCR engine" as the prior action (even though I believe that is no longer needed), with the same result.

 

2021-06-16_16h34_34.png

 

Parameter is not valid.: Robin.Core.ActionException: Failed to extract text with OCR ---> System.ArgumentException: Parameter is not valid.
 at System.Drawing.Bitmap..ctor(String filename)
 at Robin.Modules.OCR.Utilities.Utilities.GetImageForOCR(OCRSource source, SourceScanMode sourceScanMode, Nullable`1 scanRegionX1, Nullable`1 scanRegionY1, Nullable`1 scanRegionX2, Nullable`1 scanRegionY2, IEnumerable`1 imagesToFind, Int32 tolerance, Boolean waitForImage, Boolean timeoutSet, Nullable`1 timeout, Nullable`1 searchRegionImageX1, Nullable`1 searchRegionImageY1, Nullable`1 searchRegionImageX2, Nullable`1 searchRegionImageY2, Action suspendSecureScreen, Action restoreSecureScreen, String imageFilepath, IImageFinder imageFinder)
 at Robin.Modules.OCR.Actions.ExtractTextWithOCRBase.Execute(ActionContext context)
 --- End of inner exception stack trace ---
 at Robin.Modules.OCR.Actions.ExtractTextWithOCRBase.Execute(ActionContext context)
 at Robin.Runtime.Engine.ActionRunner.RunAction(String action, Dictionary`2 inputArguments, Dictionary`2 outputArguments, IActionStatement statement)

I would greatly appreciate some help with this! 

  • Pavel_NaNoi

    I'm just making sure here, but is the file a PDF or an actual image? I'm fairly certain that action cannot extract text from an actual PDF file, only from images or a foreground window. If it is an image, this might honestly be a case of an unusual image extension; make sure it's a .jpeg or .png.
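
A quick way to answer the "PDF or actual image?" question is to look at the first bytes of the file rather than its extension. The stack trace above fails inside `System.Drawing.Bitmap..ctor(String filename)`, which is consistent with the file not being readable as an image, though that is an inference and not something confirmed in this thread. The sketch below is a minimal Python check; `sniff_format` and `sniff_file` are hypothetical helper names, and the signature table only covers a few common formats.

```python
# Leading "magic" bytes of a few common file formats.
SIGNATURES = {
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"BM": "bmp",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"II*\x00": "tiff",
    b"MM\x00*": "tiff",
}


def sniff_format(header: bytes) -> str:
    """Guess the format from the first bytes of a file, ignoring its extension."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 16 bytes of the file at 'path' and classify them."""
    with open(path, "rb") as fh:
        return sniff_format(fh.read(16))
```

If `sniff_file` reports `pdf` for a file named `scan.png`, the file was only renamed, not converted, and it would still need to be exported or rendered to a real image before OCR.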

  • afmc2238

    I had played around with this and got it to partially work when I changed the file to a .png.  However, it still doesn't work when I use the selector tool to grab only a certain area of the image.  It only works if I grab all text from the image, and the results are very inaccurate. 

    Most likely we will just need to incorporate a better OCR tool to get it to work as we need for our use case.

    Thanks for the suggestion!!

  • Pavel_NaNoi

    Oh wait, I forgot to ask: isn't there a PDF action in Power Automate Desktop that extracts all the text instantly?

    Pavel_NaNoi_0-1624453629178.png

    You could probably just parse the text that you want out of the variable that action produces, using regex.

     

    Also, yeah, the OCR can be a bit of a pain when it comes to this. I recommend the free trial of AI Builder on the Power Automate platform if you haven't tried it yet. It works with PDFs and images, you can select exactly what you want to extract, and it's fairly simple to understand as well. God, that sounds like an advertisement when I read it out loud ^^| but yeah, give that a spin if you're out of options.
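
To illustrate the "parse it with regex" suggestion, here is a minimal Python sketch. The sample text, the field labels, and the `find_field` helper are all hypothetical; they stand in for whatever the text-extraction action actually returns for a given document.

```python
import re

# Hypothetical sample standing in for the extracted text variable.
extracted_text = """ACME Ltd.
Invoice No: INV-20417
Date: 16/06/2021
Total: 1,250.00
"""


def find_field(label, text):
    """Return the value that follows 'label:' on its own line, or None."""
    # re.escape keeps any special characters in the label literal.
    pattern = rf"^{re.escape(label)}\s*:\s*(.+?)\s*$"
    match = re.search(pattern, text, flags=re.MULTILINE)
    return match.group(1) if match else None


invoice_no = find_field("Invoice No", extracted_text)
total = find_field("Total", extracted_text)
```

The same pattern idea should carry over to a regex-capable text action in a desktop flow, but the exact action and option names there are not covered in this thread.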

  • afmc2238

    Well the problem is that this is a scanned document rather than a readable PDF so that's why I needed to use OCR. 

     

    I started a free trial of AI Builder last week but didn't see how to use this with desktop Power Automate. I see that you could use Microsoft Computer Vision....but would love to play around with AI Builder in PAD if possible. Do you know how to make that work?

  • Pavel_NaNoi

    It depends on whether you have Windows 10 Pro, Windows Server 2016, or Windows Server 2019. If you do, it should be easy to feed AI Builder items into PAD through Power Automate, and I can help guide you through it a bit; otherwise it won't work.

     

    Also, if you got it to run and it's just being inaccurate, in the Tesseract OCR engine change the image width and height multipliers to 2 instead of 1:

    Pavel_NaNoi_1-1624455113120.png

    This should help it a lot. From there it's more a matter of finding the correct x and y positions of the text (use "If text on screen (OCR)" to find the position of a specific text value more accurately).
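
When hunting for the correct x and y positions, one thing worth ruling out is a region whose corners are swapped or fall outside the image. The sketch below is a hypothetical Python helper, not part of Power Automate Desktop: it orders the two corners (the names follow the `scanRegionX1`/`Y1`/`X2`/`Y2` style seen in the stack trace) and clamps them to an assumed image size.

```python
def normalize_region(x1, y1, x2, y2, width, height):
    """Order the corners and clamp them to the image.

    Returns (left, top, right, bottom), or None if nothing of the
    region lies inside an image of the given width and height.
    """
    left, right = sorted((x1, x2))
    top, bottom = sorted((y1, y2))
    # Clamp every edge to the image bounds.
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    if right <= left or bottom <= top:
        return None
    return left, top, right, bottom
```

For example, `normalize_region(300, 200, 100, 50, 800, 600)` returns the same rectangle with its corners in top-left, bottom-right order, while a region that lies entirely off the image returns `None`.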

     

     

     

  • afmc2238

    Great -Thank you!

  • henryhvb5

    I have the same problem, but it appeared after updating from version 2.13xx to 2.14.173.21294. My account is a free account. The OCR engine variable value shows as blank, without any error message. Before the update, this engine could extract values. Now I have started a new flow that uses the same PDF image and the same extraction method, but it is unable to extract any text. What should I do?

  • Pavel_NaNoi

    It's because the Tesseract engine initialization action was deprecated in that update. The OCR engine initialization action didn't have much use beyond being an extra action, so the engine is now chosen inside any "Extract text with OCR" action: under OCR engine type, select "Tesseract engine" instead of "OCR engine variable", and it will work just like before. If that's not it, you can also keep increasing the width and height multipliers like I mentioned in my previous post, as that can also be the reason; OCR is just very janky.

     

    Also, there's an action for extracting text from a PDF directly, called "Extract text from PDF". Try that if you get stuck and just parse the result.

  • afmc2238

    Unfortunately I was never able to get this to work consistently. Luckily the option to use an API call instead became available, and that works every time. 

  • henryhvb5

    Thank you for your reply. My case can't use "Extract text from PDF", since the PDF is an invoice that the user signs and then scans back in as an image.

     

    In this case, based on my understanding of your advice, I should get another OCR engine to install in Windows and use the OCR engine variable in my flow, am I right? (By the way, this version can select the Tesseract engine in the pull-down menu.)

     

    If the Tesseract engine is not working, where should I get those OCR engines (both paid and free ones)?
