Skip to main content

Notifications

Community site session details

Community site session details

Session Id : l3Doxb02s/HRdLUhI0QnMk
Power Automate - Building Flows
Answered

Filter duplicates from CSV file

Like (0) ShareShare
ReportReport
Posted on 13 Jul 2022 06:19:56 by 75

Hi all,


So I'm building a Flow that's basically creating a user list and does things with it afterwards.
I want to programmatically filter out duplicates in a CSV file based on specific columns.

 

My data looks like this:

firstnamelastnameemail
KeithJacksonkeith.jackson@domain1.tld
DaleWestdale.west@domain1.tld
JoanEvansjoan.evans@domain1.tld
DaleWestdale.west@domain2.tld
LauraCaballerolaura.caballero@domain2.tld
KeithJacksonkeith.jackson@domain3.tld

Carrie

Klinecarrie.kline@domain3.tld
(random names generated via fakenamegenerator.com)

 

My data table is being generated by the following:

  • a "create CSV file" for each domain1.tld, domain2.tld and domain3.tld (parallel action)
    Noteworthy: domain1.tld is created with headers, domain2 and domain3 are created without headers
  • a "create file on SharePoint" which then merges the output of the three files in the order "domain1 -> domain2 -> domain3"

_n_MarianLein_1-1657692979253.png

 

What I now want to filter out are the entries for Keith and Dale which are duplicates. It won't matter which email address to keep, and which one to throw away. But I want to remove duplicates from column 1 and 2 only, as column 3 does not contain any duplicates.

 

Any idea what to do?

  • _n_MarianLein Profile Picture
    75 on 18 Jul 2022 at 09:27:37
    Re: Filter duplicates from CSV file

    Awesome, this worked!

  • Verified answer
    v-bofeng-msft Profile Picture
    on 15 Jul 2022 at 06:35:32
    Re: Filter duplicates from CSV file

    Hi @_n_MarianLein ,

     

    I've made a test for your reference:

    1\My flow

    vbofengmsft_0-1657866849937.png

    vbofengmsft_1-1657866863960.png

     

    vbofengmsft_2-1657866876385.png

     

    1)

    union(body('Select'),body('Select'))

     

    2)

    @and(
    equals(item()['firstname'],items('Apply_to_each')['firstname']),
    equals(item()['lastname'],items('Apply_to_each')['lastname'])
    )

     

    3)

    first(body('Filter_array'))

     

    The Result:

    vbofengmsft_3-1657866924415.jpeg

     

    Best Regards,

    Bof

     

  • takolota1 Profile Picture
    4,859 Super User 2025 Season 1 on 14 Jul 2022 at 09:00:55
    Re: Filter duplicates from CSV file

    @_n_MarianLein 

     

    FYI, you may want to use a Compose instead of a variable, I think a Compose can hold many more items.

     

    But you should be able to insert the JSON array in the From field of the Select action.

     

    Then I don’t know where your data is actually coming from. But instead of the Apply to each delete action, you will need to do something like in this video to remove the duplicate CSV lines.

    https://youtu.be/2NO1Px-Cy9w

     

    In your case, you may need to add an extra Filter array & some actions in the loop in to get the CSV line with all the columns/fields from your original data, where the first & last name match the first & last name item from the list of duplicates.

  • _n_MarianLein Profile Picture
    75 on 14 Jul 2022 at 05:41:03
    Re: Filter duplicates from CSV file

    Hi @takolota,
    I've added an "initialize variable" step, which creates an array that contains the data from the three parallel steps before. At this point it looks like this:

     

    [
     {
     "firstname": "Keith",
     "lastname": "Jackson",
     "email": "Keith.Jackson@domain1.tld"
     },
     {
     "firstname": "Dale",
     "lastname": "Boudouhi",
     "email": "Dale.Boudouhi@domain1.tld"
     },
     {
     "firstname": "Joan",
     "lastname": "Evans",
     "email": "Joan.Evans@domain1.tld"
     },
     {
     "firstname": "Dale",
     "lastname": "Butakow",
     "email": "Dale.Butakow@domain2.tld"
     },
     {
     "firstname": "Laura",
     "lastname": "Caballero",
     "email": "Laura.Caballero@domain2.tld"
     },
     {
     "firstname": "Keith",
     "lastname": "Jackson",
     "email": "Keith.Jackson@domain3.tld"
     },
     {
     "firstname": "Carrie",
     "lastname": "Kline",
     "email": "Carrie.Kline@domain3.tld"
     },

     

    ...and so on...

    _n_MarianLein_1-1657777232092.png

    The "union" statement is the following:

     

    union(
    	body('Select_domain1.tld'),
    	body('Select_domain2.tld'),
    	body('Select_domain3.tld')
    )

     

     

  • takolota1 Profile Picture
    4,859 Super User 2025 Season 1 on 13 Jul 2022 at 13:35:51
    Re: Filter duplicates from CSV file

    At what point in your flow are you trying to insert this?

     

    What does the data look like there / how is it formatted?

     

    The data needs to be in a JSON array for this. I’m guessing that’s how your data is structured somewhere before the CSV steps?

  • _n_MarianLein Profile Picture
    75 on 13 Jul 2022 at 13:31:56
    Re: Filter duplicates from CSV file

    @takolota Thanks! How can I rebuild this to take an array as the starting point? Because I can't seem to get this to work. I would have expected to just use the existing array as input, but then the latter steps just go wrong as well.

  • takolota1 Profile Picture
    4,859 Super User 2025 Season 1 on 13 Jul 2022 at 12:48:12
    Re: Filter duplicates from CSV file

    @_n_MarianLein @MarconettiMarco 

     

    Union won’t remove duplicates based on specific columns.

     

    For that see my post & template on finding & removing duplicates:

    https://powerusers.microsoft.com/t5/Power-Automate-Cookbook/Find-and-Remove-Duplicates/m-p/2191403#M1611

     

    Also if you need to convert the CSVs to JSON, you can try my CSV template:

    https://powerusers.microsoft.com/t5/Power-Automate-Cookbook/CSV-to-Dataset/td-p/1508191

  • _n_MarianLein Profile Picture
    75 on 13 Jul 2022 at 09:14:12
    Re: Filter duplicates from CSV file

    @MarconettiMarco This still leaves me with the mentioned duplicates... Any idea how to tell "union" to only look into the first two columns?

  • MarconettiMarco Profile Picture
    3,812 Super User 2024 Season 1 on 13 Jul 2022 at 06:59:45
    Re: Filter duplicates from CSV file

    Hello @_n_MarianLein 

    it has to be an array of elements, otherwise it won't work.

    Please see this article:

     

    https://www.flowjoe.io/2020/12/17/remove-duplicates-from-a-power-automate-array-with-one-action/

     

    BR,

    Marco

  • _n_MarianLein Profile Picture
    75 on 13 Jul 2022 at 06:48:47
    Re: Filter duplicates from CSV file

    OK, after changing to

    union(
    body(
    'Create_CSV_domain1.tld_with_header'),
    body(
    'Create_CSV_domain2.tld_without_header'),
    body(
    'Create_CSV_domain3.tld_without_header')
    )


    I am getting a new error message that suggests your approach, @MarconettiMarco, won't work:

    InvalidTemplate. Unable to process template language expressions in action 'Init_UserList' inputs at line '0' and column '0': 'The template language function 'Union' expects either a comma separated list of arrays or a comma separated list of objects as its parameters. The function was invoked with parameters of type 'String'. Please see https://aka.ms/logicexpressions#union for usage details.'.
    Any ideas?

Under review

Thank you for your reply! To ensure a great experience for everyone, your content is awaiting approval by our Community Managers. Please check back later.

Helpful resources

Quick Links

Understanding Microsoft Agents - Introductory Session

Confused about how agents work across the Microsoft ecosystem? Register today!

Warren Belz – Community Spotlight

We are honored to recognize Warren Belz as our May 2025 Community…

Congratulations to the April Top 10 Community Stars!

Thanks for all your good work in the Community!

Leaderboard

#1
WarrenBelz Profile Picture

WarrenBelz 146,745 Most Valuable Professional

#2
RandyHayes Profile Picture

RandyHayes 76,287 Super User 2024 Season 1

#3
Pstork1 Profile Picture

Pstork1 66,091 Most Valuable Professional

Leaderboard
Loading started
Loading complete