web
You’re offline. This is a read only version of the page.
close
Skip to main content

Notifications

Announcements

Community site session details

Community site session details

Session Id :
Power Platform Community / Forums / Power Automate / Filter duplicates from...
Power Automate
Unanswered

Filter duplicates from CSV file

(0) ShareShare
ReportReport
Posted on by 75

Hi all,


So I'm building a Flow that's basically creating a user list and does things with it afterwards.
I want to programmatically filter out duplicates in a CSV file based on specific columns.

 

My data looks like this:

firstnamelastnameemail
KeithJacksonkeith.jackson@domain1.tld
DaleWestdale.west@domain1.tld
JoanEvansjoan.evans@domain1.tld
DaleWestdale.west@domain2.tld
LauraCaballerolaura.caballero@domain2.tld
KeithJacksonkeith.jackson@domain3.tld

Carrie

Klinecarrie.kline@domain3.tld
(random names generated via fakenamegenerator.com)

 

My data table is being generated by the following:

  • a "create CSV file" for each domain1.tld, domain2.tld and domain3.tld (parallel action)
    Noteworthy: domain1.tld is created with headers, domain2 and domain3 are created without headers
  • a "create file on SharePoint" which then merges the output of the three files in the order "domain1 -> domain2 -> domain3"

_n_MarianLein_1-1657692979253.png

 

What I now want to filter out are the entries for Keith and Dale which are duplicates. It won't matter which email address to keep, and which one to throw away. But I want to remove duplicates from column 1 and 2 only, as column 3 does not contain any duplicates.

 

Any idea what to do?

Categories:
I have the same question (0)
  • MarconettiMarco Profile Picture
    3,812 Super User 2024 Season 1 on at

    Hello @_n_MarianLein ,

    I think you can initialize an Array variable including your list, then in a "Compose" action use this expression:

    union(variables('<your_variable>'),variables('<your_variable>'))

     

    Please see if it works.

     

    If I have answered your question, please mark your post as Solved.
    If you like my response, please give it a Thumbs Up.

    BR,

    Marco

  • _n_MarianLein Profile Picture
    75 on at

    When trying this, I get the error message

    Flow save failed with code 'InvalidTemplate' and message 'The template validation failed: 'The template action 'Init_UserList' at line '1' and column '10332' is not valid: "The template function 'body' is not expected at this location.".'.'

     

    My code is (formatted for better readability):

    union( 
    variables(
    body('Create_CSV_domain1.tld_with_header')
    ),
    variables(
    body('Create_CSV_domain2.tld_without_header')
    ),
    variables(
    body('Create_CSV_domain3.tld_without_header')
    )
    )

    Any idea what went wrong?

  • _n_MarianLein Profile Picture
    75 on at

    OK, after changing to

    union(
    body(
    'Create_CSV_domain1.tld_with_header'),
    body(
    'Create_CSV_domain2.tld_without_header'),
    body(
    'Create_CSV_domain3.tld_without_header')
    )


    I am getting a new error message that suggests your approach, @MarconettiMarco, won't work:

    InvalidTemplate. Unable to process template language expressions in action 'Init_UserList' inputs at line '0' and column '0': 'The template language function 'Union' expects either a comma separated list of arrays or a comma separated list of objects as its parameters. The function was invoked with parameters of type 'String'. Please see https://aka.ms/logicexpressions#union for usage details.'.
    Any ideas?
  • MarconettiMarco Profile Picture
    3,812 Super User 2024 Season 1 on at

    Hello @_n_MarianLein 

    it has to be an array of elements, otherwise it won't work.

    Please see this article:

     

    https://www.flowjoe.io/2020/12/17/remove-duplicates-from-a-power-automate-array-with-one-action/

     

    BR,

    Marco

  • _n_MarianLein Profile Picture
    75 on at

    @MarconettiMarco This still leaves me with the mentioned duplicates... Any idea how to tell "union" to only look into the first two columns?

  • takolota1 Profile Picture
    4,974 Moderator on at

    @_n_MarianLein @MarconettiMarco 

     

    Union won’t remove duplicates based on specific columns.

     

    For that see my post & template on finding & removing duplicates:

    https://powerusers.microsoft.com/t5/Power-Automate-Cookbook/Find-and-Remove-Duplicates/m-p/2191403#M1611

     

    Also if you need to convert the CSVs to JSON, you can try my CSV template:

    https://powerusers.microsoft.com/t5/Power-Automate-Cookbook/CSV-to-Dataset/td-p/1508191

  • _n_MarianLein Profile Picture
    75 on at

    @takolota Thanks! How can I rebuild this to take an array as the starting point? Because I can't seem to get this to work. I would have expected to just use the existing array as input, but then the latter steps just go wrong as well.

  • takolota1 Profile Picture
    4,974 Moderator on at

    At what point in your flow are you trying to insert this?

     

    What does the data look like there / how is it formatted?

     

    The data needs to be in a JSON array for this. I’m guessing that’s how your data is structured somewhere before the CSV steps?

  • _n_MarianLein Profile Picture
    75 on at

    Hi @takolota,
    I've added an "initialize variable" step, which creates an array that contains the data from the three parallel steps before. At this point it looks like this:

     

    [
     {
     "firstname": "Keith",
     "lastname": "Jackson",
     "email": "Keith.Jackson@domain1.tld"
     },
     {
     "firstname": "Dale",
     "lastname": "Boudouhi",
     "email": "Dale.Boudouhi@domain1.tld"
     },
     {
     "firstname": "Joan",
     "lastname": "Evans",
     "email": "Joan.Evans@domain1.tld"
     },
     {
     "firstname": "Dale",
     "lastname": "Butakow",
     "email": "Dale.Butakow@domain2.tld"
     },
     {
     "firstname": "Laura",
     "lastname": "Caballero",
     "email": "Laura.Caballero@domain2.tld"
     },
     {
     "firstname": "Keith",
     "lastname": "Jackson",
     "email": "Keith.Jackson@domain3.tld"
     },
     {
     "firstname": "Carrie",
     "lastname": "Kline",
     "email": "Carrie.Kline@domain3.tld"
     },

     

    ...and so on...

    _n_MarianLein_1-1657777232092.png

    The "union" statement is the following:

     

    union(
    	body('Select_domain1.tld'),
    	body('Select_domain2.tld'),
    	body('Select_domain3.tld')
    )

     

     

  • takolota1 Profile Picture
    4,974 Moderator on at

    @_n_MarianLein 

     

    FYI, you may want to use a Compose instead of a variable, I think a Compose can hold many more items.

     

    But you should be able to insert the JSON array in the From field of the Select action.

     

    Then I don’t know where your data is actually coming from. But instead of the Apply to each delete action, you will need to do something like in this video to remove the duplicate CSV lines.

    https://youtu.be/2NO1Px-Cy9w

     

    In your case, you may need to add an extra Filter array & some actions in the loop in to get the CSV line with all the columns/fields from your original data, where the first & last name match the first & last name item from the list of duplicates.

Under review

Thank you for your reply! To ensure a great experience for everyone, your content is awaiting approval by our Community Managers. Please check back later.

Helpful resources

Quick Links

Forum hierarchy changes are complete!

In our never-ending quest to improve we are simplifying the forum hierarchy…

Ajay Kumar Gannamaneni – Community Spotlight

We are honored to recognize Ajay Kumar Gannamaneni as our Community Spotlight for December…

Leaderboard > Power Automate

#1
Michael E. Gernaey Profile Picture

Michael E. Gernaey 523 Super User 2025 Season 2

#2
Tomac Profile Picture

Tomac 406 Moderator

#3
abm abm Profile Picture

abm abm 245 Most Valuable Professional

Last 30 days Overall leaderboard