• Fruits
     • Apple
     • Orange
• Car
     • BMW
Hi all, the above was extracted from a PDF using the Extract text from PDF action with Optimize for structured data enabled.
I would like to extract each main bullet point and concatenate it with its sub-bullet points. For example, the desired result is as follows:
Fruits; Apple; Orange
Car; BMW
OR
•Fruits •Apple •Orange
•Car •BMW
Either will do. How do I achieve that?
You are free to choose a scripting language based on convenience and execution time. For instance, if you're comfortable with PowerShell or Python, either can achieve the same result as my .NET script.
Keep in mind that Power Automate Desktop (PAD) actions tend to run slower on long loops, whereas scripting languages often perform better in such scenarios.
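To illustrate the Python option, here is a minimal sketch of the same grouping logic. It assumes the extracted text keeps main bullets at the start of the line and sub-bullets indented, as in the sample; the function name `group_bullets` and its parameters are hypothetical, not part of PAD.

```python
def group_bullets(text, bullet="•", sep="; "):
    """Join each main bullet with the indented sub-bullets that follow it."""
    groups = []
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        # Drop the indent and the bullet character, keep only the label
        label = raw.strip().lstrip(bullet).strip()
        if raw.startswith(bullet) or not groups:
            # Assumption: a line that starts with the bullet (no indent) is a main point
            groups.append([label])
        else:
            # Indented line: attach it to the most recent main point
            groups[-1].append(label)
    return [sep.join(group) for group in groups]


extracted = "• Fruits\n     • Apple\n     • Orange\n• Car\n     • BMW"
for line in group_bullets(extracted):
    print(line)
```

In PAD this could be called from a Run Python script action, but how the text gets passed in and out depends on your flow, so treat that part as something to adapt.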
Thanks,
Deenuji Loganathan 👩💻
Automation Evangelist 🤖
Follow me on LinkedIn 👥
-------------------------------------------------------------------------------------------------------------
If I've helped solve your query, kindly mark my response as the solution ✔ and give it a thumbs up!👍 Your feedback supports future seekers 🚀
I think I will just stick with the first method, as it is still working for me after I made some changes to it.
Also, do you think using a script (e.g. PowerShell or Python) would be a better option?
Method 2:
Please be aware that if the input changes, this may no longer work as expected. Additionally, a regular expression used directly in PAD won't suffice for your specific use case: the built-in regex parsing has limitations, as it returns only the first match rather than all matches.
So below I am suggesting a .NET script as an alternative to the above.
• Fruits
     • Apple
     • Orange
• Car
     • BMW
Using a .NET script
Code (refer to the previous suggestion for how to copy/paste the code below into PAD):
SET ExtractedPDFText TO $'''• Fruits
     • Apple
     • Orange
• Car
     • BMW'''
Variables.CreateNewList List=> Outputlist
Scripting.RunDotNetScript Imports: $'''System.Text.RegularExpressions''' Language: System.DotNetActionLanguageType.CSharp Script: $'''string pattern = @\"(?<=^•\\s)[^\\n]+(?:\\n\\s{5}•[^\\n]+)*\";
Regex regex = new Regex(pattern, RegexOptions.Multiline);
matches = new List<string>();
foreach (Match match in regex.Matches(input))
{
    // First line of the match is the main point, the rest are sub-points
    string[] lines = match.Value.Split(\'\\n\');
    string mainPoint = lines[0].Trim();
    string[] subPoints = new string[lines.Length - 1];
    for (int i = 1; i < lines.Length; i++)
    {
        // Trim each sub-point so the indent does not leak into the output
        subPoints[i - 1] = lines[i].Trim();
    }
    string subPointsConcatenated = string.Join(\" \", subPoints);
    matches.Add(\"• \" + mainPoint + \" \" + subPointsConcatenated);
}''' @'name:input': ExtractedPDFText @'type:input': $'''String''' @'direction:input': $'''In''' @'name:matches': $'''''' @'type:matches': $'''List''' @'direction:matches': $'''Out''' @matches=> Outputlist
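For anyone who prefers Python over .NET, the same regex idea can be sketched as below. This is only an illustration under the same assumption as the script above (sub-bullets are indented by exactly five whitespace characters); it uses a capture group instead of the lookbehind, and the names `PATTERN` and `join_bullet_groups` are hypothetical. Note that `finditer` walks over every match, not just the first one.

```python
import re

# One match per main bullet: the main line plus any following lines
# that are indented by five whitespace characters and start with a bullet.
PATTERN = re.compile(r"^•\s([^\n]+(?:\n\s{5}•[^\n]+)*)", re.MULTILINE)


def join_bullet_groups(text):
    results = []
    for match in PATTERN.finditer(text):
        # Split the matched block into lines and drop indentation
        lines = [line.strip() for line in match.group(1).split("\n")]
        # lines[0] is the main point (bullet already consumed by the pattern);
        # the sub-points still carry their own bullet character
        results.append("• " + " ".join(lines))
    return results


sample = "• Fruits\n     • Apple\n     • Orange\n• Car\n     • BMW"
for line in join_bullet_groups(sample):
    print(line)
```

As with the .NET version, a change in the indentation of the extracted text would require adjusting the pattern.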
Please follow the approach below:
Code:
SET ExtractedPDFText TO $'''• Fruits
     • Apple
     • Orange
• Car
     • BMW'''
Text.SplitText.Split Text: ExtractedPDFText StandardDelimiter: Text.StandardDelimiter.NewLine DelimiterTimes: 1 Result=> TextList
Variables.CreateNewList List=> OutputList
SET mainPoint TO $'''%''%'''
SET Counter TO 0
SET TextListCount TO TextList.Count
LOOP FOREACH CurrentItem IN TextList
    IF StartsWith(CurrentItem, $'''•''', True) THEN
        IF IsNotEmpty(mainPoint) THEN
            Variables.AddItemToList Item: mainPoint List: OutputList
        END
        SET mainPoint TO CurrentItem.Trimmed
    ELSE
        SET mainPoint TO mainPoint + CurrentItem.Trimmed
    END
    SET Counter TO Counter + 1
    IF Counter < TextListCount THEN
        IF StartsWith(TextList[Counter], $'''•''', True) THEN
            Variables.AddItemToList Item: mainPoint List: OutputList
            SET mainPoint TO $'''%''%'''
        END
    ELSE
        Variables.AddItemToList Item: mainPoint List: OutputList
    END
END
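How to copy/paste the above code into Power Automate Desktop: copy the code as plain text and paste it directly into the flow designer workspace, where PAD converts it into actions.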