Merge And Split PDFs

Posted 31 Jul 2024 by takolota1


Merge PDFs, split PDFs by page, & split PDFs by text found on pages without any 3rd party connectors.

There are 3rd party connectors like Adobe & Encodian that can merge or split PDFs, but some organizations prefer the data security, privacy, & lower cost of a Microsoft solution. This template provides such a solution using HTTP calls to an Azure Function that uses the Python PyMuPDF library to merge or split PDFs. For higher-volume workloads, performing say 20,000 PDF actions per month with a service like Encodian would cost more than $80 per month, whereas an Azure Function would cost less than 40 cents for 20,000 actions, plus $15 per month if you do not already have a premium Power Automate license.



Flow Preview






Import & Set-Up

Find & download the Solution import package at the bottom of this main post. Go to the Power Apps home page (https://make.powerapps.com/). Select Solutions on the left-side menu, select Import solution, then Browse your files & select the MergeAndSplitPDFs_1_0_0_xx.zip file you just downloaded. Then select Next, follow the menu prompts to apply or create the required connections for the solution flows, & finish importing the solution.


Once the solution is done importing, select the solution name in the list at the center of the screen. Once inside the solution, click the 3 vertical dots next to the flow name & select Edit.



Now that the flow is imported & open, we need to set up the Azure Function used for the Page Split, Text Split, & Merge HTTP calls.


If you have already worked with and deployed Azure Functions before, then you can skip the extra installations.
If you haven't deployed Azure Functions, you can go to the Microsoft Store & make sure you have VS Code & Python installed.


Once VS Code is installed, open it. Go to the 4 blocks on the left side menu to open the list of extensions. Search for Azure in the extensions & select to install Azure Functions. Azure Account & Azure Resources will automatically be installed too.


Once all the extensions are installed, go to the Azure A on the left side menu & select to sign in to Azure.


Next set up a project folder on your machine for Azure Functions & a sub-folder for this Merge And Split project.


Back in VS Code, select the button to create a new Azure Function. Follow the Function set-up instructions, selecting the Merge And Split project folder you just created, Python as the language, Model V2 as the programming model, & where in VS Code to open the new Azure Function project.


Once all the project files are loaded in VS Code, select the function_app.py file. Remove all the code in the file. Go back to the tab with the flow, open the "Azure Function Python Script" action, copy its contents & paste them into the function_app.py file in VS Code. Press Ctrl+S to save the file.



Next go to the requirements.txt file. In the flow, open the "Azure Function Requirements.txt" action & copy its contents. Paste the contents into the requirements.txt file in VS Code. Press Ctrl+S to save the file.



Go back to the Azure A on the left-side menu. Select the Deploy function button. Select Create New in the list of functions. Follow the menus/prompts to create a new function. (If Create New doesn't appear, you may have to log in to Azure, navigate to Azure Functions & go through the process to create a new function so the new function will appear in the list of function options to deploy to.)


Go to Azure & log in. Go to Function App. Find & select the newly deployed function. Select the 1st function under Name. Select Get function URL & in the pop-up menu copy the Function key URL.



Paste the function URL into the URI input of each of the HTTP actions.


Each HTTP action has a Content parameter for the Body that takes the content block from any of the Get file content actions. The content block looks like...
{
"$content-type": "application/pdf",
"$content": "AbC123..."
}
No need to isolate the base64 in $content; the Content parameter takes the whole block.
And each HTTP action also returns a similar content block or an array of content blocks as the Body of its outputs (MERGE calls should return a single content block & SPLIT calls should return an array with a content block for each split of PDF pages). These content blocks can feed directly into Create file, Attachment, and other actions.
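
If you want to inspect or build one of these content blocks outside the flow, here is a minimal Python sketch of the shape described above. The helper names content_block_to_bytes & bytes_to_content_block are hypothetical (they are not taken from the template's script); the only thing assumed is that $content holds standard base64 of the file bytes.

```python
import base64


def content_block_to_bytes(block: dict) -> bytes:
    """Decode the base64 payload of a file content block into raw bytes."""
    return base64.b64decode(block["$content"])


def bytes_to_content_block(data: bytes, content_type: str = "application/pdf") -> dict:
    """Wrap raw bytes in the same {"$content-type", "$content"} shape."""
    return {
        "$content-type": content_type,
        "$content": base64.b64encode(data).decode("ascii"),
    }


# Round trip with placeholder bytes (not a real PDF).
sample = bytes_to_content_block(b"%PDF-1.7 example bytes")
restored = content_block_to_bytes(sample)
print(sample["$content-type"], len(restored))
```

The round trip is lossless, which is why the blocks returned by the HTTP actions can be passed straight into Create file or Attachment actions.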

The Page split requires the Operation parameter to be set to SPLIT, the Method parameter to be set to PAGE, and the Pages parameter to be an array of page numbers to split on. Splits start at the start of the page listed & end 1 page before the next page # listed.
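
To make the page split rule concrete, here is a small Python sketch of how a Pages array maps to page ranges. This is only an illustration of the rule as described, not the code from the Azure Function. It assumes 1-based page numbers, assumes the last split runs through the final page of the PDF, & only covers pages from the first listed page onward. The function name page_split_ranges is hypothetical.

```python
def page_split_ranges(split_pages, total_pages):
    """Return (start, end) inclusive page ranges for each split.

    Each split starts at a listed page and ends one page before the
    next listed page; the last split is assumed to run to total_pages.
    """
    starts = sorted(set(split_pages))
    ranges = []
    for i, start in enumerate(starts):
        end = starts[i + 1] - 1 if i + 1 < len(starts) else total_pages
        ranges.append((start, end))
    return ranges


# An 8-page PDF split on pages 1, 4 & 6.
print(page_split_ranges([1, 4, 6], 8))  # [(1, 3), (4, 5), (6, 8)]
```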


The Text split requires the Operation parameter to be set to SPLIT, the Method parameter to be set to TEXT, and the Split Text parameter to be a text string to search for on each page to determine where to split. Splits start at the start of the page where the chosen text is found & end 1 page before the next page where the text is found.
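
The same idea for the text split, sketched in plain Python on a list of page texts rather than a real PDF. The function name text_split_ranges is hypothetical, the match here is a simple case-sensitive substring check (the actual function may match differently), & the last split is assumed to run through the final page.

```python
def text_split_ranges(page_texts, split_text):
    """Return (start, end) inclusive 1-based page ranges for each split.

    A split starts on every page whose text contains split_text and ends
    one page before the next page that contains it.
    """
    starts = [
        number
        for number, text in enumerate(page_texts, start=1)
        if split_text in text
    ]
    ranges = []
    for i, start in enumerate(starts):
        end = starts[i + 1] - 1 if i + 1 < len(starts) else len(page_texts)
        ranges.append((start, end))
    return ranges


# Five pages where "Invoice" appears on pages 1 & 3.
pages = ["Invoice 001", "line items", "Invoice 002", "line items", "totals"]
print(text_split_ranges(pages, "Invoice"))  # [(1, 2), (3, 5)]
```

This kind of split is handy when each document in a combined PDF starts with a recognizable header, such as an invoice number label.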


The Merge requires the Operation parameter to be set to MERGE. Files are merged in the order they are present in the Content input array. Note you can manually create a Content array by inserting a comma-separated list of content blocks between array square brackets [ ], or if you already have an array of file content blocks then you can just put the dynamic content for that array in the Content parameter input.
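
For anyone building the merge input outside of the flow designer, here is a rough Python sketch of assembling a Content array in merge order. The JSON key names "Operation" & "Content" simply mirror the parameter labels above & are assumptions; check the HTTP action's Body in the imported flow for the exact names the function expects. The helpers make_block & build_merge_body are hypothetical.

```python
import base64
import json


def make_block(pdf_bytes: bytes) -> dict:
    """Wrap raw PDF bytes in a file content block."""
    return {
        "$content-type": "application/pdf",
        "$content": base64.b64encode(pdf_bytes).decode("ascii"),
    }


def build_merge_body(blocks) -> dict:
    """Build a merge request body; key names are assumed, not confirmed."""
    return {"Operation": "MERGE", "Content": list(blocks)}


# Placeholder bytes stand in for real files; list order is the merge order.
body = build_merge_body([make_block(b"first file"), make_block(b"second file")])
payload = json.dumps(body)
print(len(body["Content"]), body["Operation"])
```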


The flow is set up so you can test things out with some OneDrive files, but once you are comfortable with how things work, then you can copy & paste the actions into any of your flows where you need to merge or split PDF files.



Thanks for any feedback,

Please subscribe to my YouTube channel (https://youtube.com/@tylerkolota?si=uEGKko1U8D29CJ86).

And reach out on LinkedIn (https://www.linkedin.com/in/kolota/) if you want to hire me to consult or build more custom Microsoft solutions for you.




