Find and Remove Duplicates From Any Datasource
Need to identify and/or remove records in a datasource based on duplicate values in one or more columns?
This template uses a SharePoint list to demonstrate, but this method ultimately pulls in the data from any source as a standard JSON array and processes it to identify all the duplicates based on the selected columns. From there the duplicates can be easily deleted.
So it will work for SharePoint, Excel, SQL, Dataverse, API GET/DELETE calls to external datasets, and any other dataset used in Power Automate.
Example...
Use any get data action to pull the data into the flow. Use built in action functions like Order By or later use a Power Automate Sort( ) expression and Power Automate Reverse expression to format your data in ascending or descending order by your chosen column.
Sorted by date, Ascending will keep the oldest & delete the newest, while Descending will keep the newest & delete the oldest.
Sorted by number, Ascending will keep the smallest & delete the largest, while Descending will keep the largest & delete the smallest.
Move to the Select DuplicateCheckFields action and fill the column name & dynamic content value for each data source column you want to include in the check for duplicates. Only records where all listed columns match will count as duplicates.
The middle actions will process the records, identify the duplicate records, & return a JSON array of only duplicate records.
Put whatever action you will use to delete the records inside the loop and use items('Apply_to_each')?['InsertColumnNameHere'] to reference fields within the apply to each loop. For example, this SharePoint Delete item action just required the ID primary key field for each item to delete, so I used items('Apply_to_each')?['ID'] in the expression editor on the Id input.
Starting Demo SharePoint List
Final Result on The Demo SharePoint List
Rows with IDs 14, 16, & 19 were removed as both the Title & Column 2 fields on those rows matched other rows in the dataset. Notice rows with IDs 15 & 20 were not removed. Although they contain duplicate Title values, the Column 2 values were still unique for those rows.
Thanks for any feedback, & please subscribe to my YouTube channel (https://youtube.com/@tylerkolota?si=uEGKko1U8D29CJ86)
V1.1 added a Reverse( ) expression to the processing actions to make the records kept by ascending or descending sort more intuitive for the user-inputs.
For anyone having trouble with the legacy import method, here is a post on an alternate import method: Re: Find and Remove Duplicates - Page 3 - Power Platform Community (microsoft.com)
Or https://powerusers.microsoft.com/t5/Power-Automate-Cookbook/Find-and-Remove-Duplicates/m-p/2368263/highlight/true#M2090
Solution Zip Download Link: https://drive.google.com/file/d/140nGv9rKxAJz2z-vfQCUZFm4tjTIdB-M/view?usp=sharing