I built a flow to count the number of pages of a PDF using the expression below that I found in one of the forums:
sub(length(split(replace(base64ToString(outputs('Get_file_content')?['body']['$content']), 'Type/Pages', ''), 'Type/Page')),1)
It was working well until a while ago. Now the flow no longer counts the pages, nor does it read the PDF to place the DocuSign tabs.
For context, I receive several documents from a supplier. I need to record each document's page count in an Excel file and send the documents for signature with the appropriate fields. The flow, which I built in December and which had been working very well, stopped working in January. The only difference I noticed is that the PDF version changed from 1.4 to 1.7.
Can anyone tell me why this is happening and how to resolve it?
I already looked for ways to convert these PDFs to 1.4 to see if that would help, but I couldn't find one.
Hi @ThauanyMoraes ,
I needed the same function and saw the post with this expression, which did not work for me from the start. I saw your post and tried to dig into the expression a little.
I found that in my PDF documents there is a space between "Type" and "/Page".
I adjusted the expression and it works for me now:
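A sketch of what such an adjustment might look like, assuming the only change needed is a space between "Type" and "/Page" in both search strings (this is a reconstruction from the description above, not necessarily the exact expression that was posted; the action name 'Get_file_content' is taken from the original question):

```text
sub(length(split(replace(base64ToString(outputs('Get_file_content')?['body']['$content']), 'Type /Pages', ''), 'Type /Page')),1)
```

As in the original, the inner replace removes the "Pages" tree entries first so that only individual "Page" objects are counted by the split. This still depends on the page objects being stored as plain text in the file, so it may not work for every PDF.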
Regards,
Sijing
Hello @DTV
I don't see an "Extract Information from forms" action in AI Builder. But just look at the action's output and search it for the information you need.
I have a flow that uses the "Extract Information from forms" action of AI Builder to get the content of PDF documents. Can I get the total page count from that action itself? Alternatively, if I add the "Recognize text in an image or a PDF document" step as you mentioned here, will that increase the AI Builder units consumed by my flow?
I did read somewhere to use "Prediction output - page count" or something like that but I don't see such a dynamic content available from the "Extract Information ..." mentioned above.
Well to be fair, I visited this thread last before your response and I hadn't seen your approach to it. Once I figured it out, I came back to respond with my solution. I never tested for whether it would work without having to pass the "result" attribute through a Parse JSON action, but my understanding at that time was that it would work with a Parse JSON action.
Isn't this exactly the same as I already suggested with an additional "Parse JSON" that is not needed?
Hi!
I have figured out how to do this the smarter way. This solution works for all kinds of PDFs (scanned included) unlike other solutions listed here which work only for some but not all PDFs.
See flow below:
Formula: length(body('Parse_JSON'))
To get the schema for the Parse JSON action, build the flow up to the AI model action "Recognize text in an image or a PDF document". Run it, then go to the run history and copy the output of the "Results" property. Come back to editing the flow and paste it into the "Generate from sample" section of the Parse JSON action.
Voila! This works for all PDF types and returns the count of total pages in a PDF.
Hope this helps!
Hello @ThauanyMoraes,
In case you haven't found a workaround already, you could also use "Recognize text in an image or a PDF document" and check the length of the result.
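A minimal sketch of that idea as a single expression, without a Parse JSON step. The action name and the property path below are assumptions (they depend on how the action is named in your flow and on the output shape you see in the run history), so check them against an actual run before relying on this:

```text
length(outputs('Recognize_text_in_an_image_or_a_PDF_document')?['body/responsev2/predictionOutput/results'])
```

The assumption here is that the results array holds one entry per page, so its length is the page count.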
Hi @ThauanyMoraes
If the native solution is providing inconsistent results for you, you can try Encodian's Get PDF Document Information action which will return the number of pages as well as other metadata (e.g. file size, author, creator, title, subject, keywords, page width and height, orientation, created/modified dates, PDF format and more).
Hope that helps.