Power Automate - Power Automate Desktop
Answered

Regex

Posted by nicklimcs (75)

 

 

• Fruits
 • Apple
 • Orange
• Car
 • BMW

 

 

 

Hi all, the above was extracted from a PDF using the "Extract text from PDF" action with "Optimize for structured data" enabled.

nicklimcs_0-1714704968066.png

 

I would like to extract each main bullet point and concatenate it with its sub-bullet points. For example, the desired result is as follows:

 

 

Fruits; Apple; Orange
Car; BMW

 

 

 OR

 

 

•Fruits •Apple •Orange
•Car •BMW

 

 

Either format will do. How do I achieve that?

  • Deenuji_Loganathan_ 6,087
    Re: Regex

    @nicklimcs 

    We can choose a scripting language based on convenience and execution time. For instance, if you're comfortable with PowerShell or Python, you can achieve the same result as I did in .NET.

     

    Both Python and PowerShell can implement the same functionality. Power Automate Desktop (PAD) tends to run more slowly on long loops, whereas scripting languages often perform better in such scenarios.
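
As a rough illustration of the scripting route mentioned above, here is a minimal Python sketch (not from the thread). The function name `group_bullets`, its parameters, and the sample string are hypothetical; it assumes main bullets start at the beginning of a line and sub-bullets are indented by at least one whitespace character.

```python
def group_bullets(text, bullet="•", sep="; "):
    """Join each main bullet with the indented sub-bullets that follow it."""
    groups = []
    for line in text.splitlines():
        # Drop indentation, the bullet character, and surrounding spaces.
        item = line.strip().lstrip(bullet).strip()
        if not item:
            continue  # skip blank lines and lines holding only a bullet
        # Assumption: a line that starts with the bullet (no indent) is a main point.
        if line.startswith(bullet) or not groups:
            groups.append([item])
        else:
            groups[-1].append(item)
    return [sep.join(group) for group in groups]


extracted = "• Fruits\n • Apple\n • Orange\n• Car\n • BMW"
for row in group_bullets(extracted):
    print(row)
# Fruits; Apple; Orange
# Car; BMW
```

If the extracted text uses a different bullet character or no indentation for sub-bullets, the main-point test would need to change accordingly.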

     


    Thanks,
    Deenuji Loganathan 👩‍💻
    Automation Evangelist 🤖
    Follow me on LinkedIn 👥

    -------------------------------------------------------------------------------------------------------------
    If I've helped solve your query, kindly mark my response as the solution ✔ and give it a thumbs up!👍 Your feedback supports future seekers 🚀

  • nicklimcs 75
    Re: Regex

    @Deenuji 

     

    I think I will just stick to the first method as it is still working for me after I made some changes to it.

     

    Also, do you think using a script (e.g. PowerShell or Python) would be a better option?

  • Deenuji_Loganathan_ 6,087
    Re: Regex

    Method 2:

    Please be aware that if the input changes, this may no longer work as expected. Additionally, a regular expression alone won't suffice for your use case: the built-in regex parsing has a limitation in that it returns only the first match rather than all of them.

    So below I suggest a .NET script as an alternative to the above.

    • Fruits
     • Apple
     • Orange
    • Car
     • BMW

     

    Using .NET scripts

    Deenuji_0-1714715023586.png

    Deenuji_1-1714715059600.png

     

     

    Code (refer to the previous suggestion for how to copy and paste the code below into PAD):

    SET ExtractedPDFText TO $'''• Fruits
     • Apple
     • Orange
    • Car
     • BMW'''
    Variables.CreateNewList List=> Outputlist
    Scripting.RunDotNetScript Imports: $'''System.Text.RegularExpressions''' Language: System.DotNetActionLanguageType.CSharp Script: $'''string pattern = @\"(?<=^•\\s)[^\\n]+(?:\\n\\s{5}•[^\\n]+)*\";
    Regex regex = new Regex(pattern, RegexOptions.Multiline);
    
     matches = new List<string>();
    foreach (Match match in regex.Matches(input))
    {
     string[] lines = match.Value.Split(\'\\n\');
     string mainPoint = lines[0].Trim();
     string[] subPoints = new string[lines.Length - 1];
     Array.Copy(lines, 1, subPoints, 0, lines.Length - 1);
     string subPointsConcatenated = string.Join(\" \", subPoints);
     matches.Add(\"• \" + mainPoint + \" \" + subPointsConcatenated);
    }''' @'name:input': ExtractedPDFText @'type:input': $'''String''' @'direction:input': $'''In''' @'name:matches': $'''''' @'type:matches': $'''List''' @'direction:matches': $'''Out''' @matches=> Outputlist
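
To make the pattern in the .NET script easier to follow, here is a hedged Python rendering of the same idea: match each main bullet together with the indented sub-bullet lines that follow it, and iterate over all matches. This is a sketch, not the author's code. `match_groups` and `PATTERN` are hypothetical names, and the indentation is loosened to "one or more spaces or tabs" (an assumption), whereas the original pattern requires exactly five whitespace characters (`\s{5}`).

```python
import re

# Assumption: sub-bullets are indented by one or more spaces or tabs.
# Group 1 is the main point; group 2 holds any sub-bullet lines that follow.
PATTERN = re.compile(r"^•[ \t]*([^\n]+)((?:\n[ \t]+•[^\n]+)*)", re.MULTILINE)


def match_groups(text):
    """Return one joined string per main bullet, using every regex match."""
    results = []
    for m in PATTERN.finditer(text):
        main_point = m.group(1).strip()
        # Split the sub-bullet block into lines and trim the indentation.
        sub_points = [s.strip() for s in m.group(2).splitlines() if s.strip()]
        results.append(" ".join(["• " + main_point] + sub_points))
    return results


extracted = "• Fruits\n • Apple\n • Orange\n• Car\n • BMW"
print(match_groups(extracted))
# ['• Fruits • Apple • Orange', '• Car • BMW']
```

Unlike the .NET script, this sketch trims each sub-bullet before joining, so the result has single spaces between items.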

     



  • Verified answer
    Deenuji_Loganathan_ 6,087
    Re: Regex

    @nicklimcs 

     

    Please follow the approach below:

    Deenuji_0-1714712789470.png

     

    Output:

    Deenuji_1-1714712837248.png

    Code:

    SET ExtractedPDFText TO $'''• Fruits
     • Apple
     • Orange
    • Car
     • BMW'''
    Text.SplitText.Split Text: ExtractedPDFText StandardDelimiter: Text.StandardDelimiter.NewLine DelimiterTimes: 1 Result=> TextList
    Variables.CreateNewList List=> OutputList
    SET mainPoint TO $'''%''%'''
    SET Counter TO 0
    SET TextListCount TO TextList.Count
    LOOP FOREACH CurrentItem IN TextList
        IF StartsWith(CurrentItem, $'''•''', True) THEN
            IF IsNotEmpty(mainPoint) THEN
                Variables.AddItemToList Item: mainPoint List: OutputList
            END
            SET mainPoint TO CurrentItem.Trimmed
        ELSE
            SET mainPoint TO mainPoint + CurrentItem.Trimmed
        END
        SET Counter TO Counter + 1
        IF Counter < TextListCount THEN
            IF StartsWith(TextList[Counter], $'''•''', True) THEN
                Variables.AddItemToList Item: mainPoint List: OutputList
                SET mainPoint TO $'''%''%'''
            END
        ELSE
            Variables.AddItemToList Item: mainPoint List: OutputList
        END
    END
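
For readers who want to trace the flow above outside PAD, here is a line-by-line Python mirror of its control flow. It is a sketch under stated assumptions, not a PAD feature: `pad_flow` is a hypothetical name, `+` is assumed to concatenate text, and the sub-bullet lines are assumed to carry leading whitespace so that only main bullets pass the `StartsWith` test.

```python
def pad_flow(lines, bullet="•"):
    """Mirror of the PAD loop: build one string per main bullet."""
    output = []
    main_point = ""
    for counter, current in enumerate(lines, start=1):
        if current.startswith(bullet):
            # A new main bullet: flush any pending group first.
            if main_point:
                output.append(main_point)
            main_point = current.strip()
        else:
            # A sub-bullet: append it to the current group (no separator).
            main_point = main_point + current.strip()
        # counter now indexes the next line, like Counter in the flow.
        if counter < len(lines):
            if lines[counter].startswith(bullet):
                # Next line opens a new group, so emit this one now.
                output.append(main_point)
                main_point = ""
        else:
            # Last line: emit whatever is pending.
            output.append(main_point)
    return output


text_list = ["• Fruits", " • Apple", " • Orange", "• Car", " • BMW"]
print(pad_flow(text_list))
# ['• Fruits• Apple• Orange', '• Car• BMW']
```

Under these assumptions the items are joined with no separator, so if a space or "; " is wanted between points, the concatenation in the ELSE branch is the place to add it.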

     

     

    How to copy and paste the above code into Power Automate Desktop:

    Deenuji_2-1714712903604.gif

     


