Hi all,
So I'm building a Flow that's basically creating a user list and does things with it afterwards.
I want to programmatically filter out duplicates in a CSV file based on specific columns.
My data looks like this:
firstname | lastname | |
Keith | Jackson | keith.jackson@domain1.tld |
Dale | West | dale.west@domain1.tld |
Joan | Evans | joan.evans@domain1.tld |
Dale | West | dale.west@domain2.tld |
Laura | Caballero | laura.caballero@domain2.tld |
Keith | Jackson | keith.jackson@domain3.tld |
Carrie | Kline | carrie.kline@domain3.tld |
(random names generated via fakenamegenerator.com)
My data table is being generated by the following:
What I now want to filter out are the entries for Keith and Dale which are duplicates. It won't matter which email address to keep, and which one to throw away. But I want to remove duplicates from column 1 and 2 only, as column 3 does not contain any duplicates.
Any idea what to do?
Awesome, this worked!
Hi @_n_MarianLein ,
I've made a test for your reference:
1\My flow
1)
union(body('Select'),body('Select'))
2)
@and(
equals(item()['firstname'],items('Apply_to_each')['firstname']),
equals(item()['lastname'],items('Apply_to_each')['lastname'])
)
3)
first(body('Filter_array'))
The Result:
Best Regards,
Bof
FYI, you may want to use a Compose instead of a variable, I think a Compose can hold many more items.
But you should be able to insert the JSON array in the From field of the Select action.
Then I don’t know where your data is actually coming from. But instead of the Apply to each delete action, you will need to do something like in this video to remove the duplicate CSV lines.
In your case, you may need to add an extra Filter array & some actions in the loop in to get the CSV line with all the columns/fields from your original data, where the first & last name match the first & last name item from the list of duplicates.
Hi @takolota,
I've added an "initialize variable" step, which creates an array that contains the data from the three parallel steps before. At this point it looks like this:
[
{
"firstname": "Keith",
"lastname": "Jackson",
"email": "Keith.Jackson@domain1.tld"
},
{
"firstname": "Dale",
"lastname": "Boudouhi",
"email": "Dale.Boudouhi@domain1.tld"
},
{
"firstname": "Joan",
"lastname": "Evans",
"email": "Joan.Evans@domain1.tld"
},
{
"firstname": "Dale",
"lastname": "Butakow",
"email": "Dale.Butakow@domain2.tld"
},
{
"firstname": "Laura",
"lastname": "Caballero",
"email": "Laura.Caballero@domain2.tld"
},
{
"firstname": "Keith",
"lastname": "Jackson",
"email": "Keith.Jackson@domain3.tld"
},
{
"firstname": "Carrie",
"lastname": "Kline",
"email": "Carrie.Kline@domain3.tld"
},
...and so on...
The "union" statement is the following:
union(
body('Select_domain1.tld'),
body('Select_domain2.tld'),
body('Select_domain3.tld')
)
At what point in your flow are you trying to insert this?
What does the data look like there / how is it formatted?
The data needs to be in a JSON array for this. I’m guessing that’s how your data is structured somewhere before the CSV steps?
@takolota Thanks! How can I rebuild this to take an array as the starting point? Because I can't seem to get this to work. I would have expected to just use the existing array as input, but then the latter steps just go wrong as well.
@_n_MarianLein @MarconettiMarco
Union won’t remove duplicates based on specific columns.
For that see my post & template on finding & removing duplicates:
Also if you need to convert the CSVs to JSON, you can try my CSV template:
https://powerusers.microsoft.com/t5/Power-Automate-Cookbook/CSV-to-Dataset/td-p/1508191
@MarconettiMarco This still leaves me with the mentioned duplicates... Any idea how to tell "union" to only look into the first two columns?
Hello @_n_MarianLein
it has to be an array of elements, otherwise it won't work.
Please see this article:
https://www.flowjoe.io/2020/12/17/remove-duplicates-from-a-power-automate-array-with-one-action/
BR,
Marco
OK, after changing to
union(
body('Create_CSV_domain1.tld_with_header'),
body('Create_CSV_domain2.tld_without_header'),
body('Create_CSV_domain3.tld_without_header')
)
I am getting a new error message that suggests your approach, @MarconettiMarco, won't work:
InvalidTemplate. Unable to process template language expressions in action 'Init_UserList' inputs at line '0' and column '0': 'The template language function 'Union' expects either a comma separated list of arrays or a comma separated list of objects as its parameters. The function was invoked with parameters of type 'String'. Please see https://aka.ms/logicexpressions#union for usage details.'.
WarrenBelz
146,745
Most Valuable Professional
RandyHayes
76,287
Super User 2024 Season 1
Pstork1
66,091
Most Valuable Professional