Further update...
I was using LoadData to populate collections from the device cache. Users could then hit a sync button, which uploaded any new data to the data source (Azure SQL DB) and then ran a ClearCollect to replace all data in the collections. In other words: local data is loaded into the PowerApp on start; on button press, any new, modified or deleted records are pushed to the server; finally, all records are pulled back from the server into the collection.
The LoadData command was on AppStart.
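For anyone trying to picture the setup, here is a rough sketch of the two formulas involved. The names are placeholders and not from my actual app: `colRecords` is the collection, `"LocalRecords"` is the cache name, `'[dbo].[Records]'` is the Azure SQL table, and `btnSync` is the sync button. The upload logic is left out.

```powerfx
// App.OnStart
// Load the locally cached records into the collection.
// The third argument (true) suppresses the error if no cache exists yet.
LoadData(colRecords, "LocalRecords", true);

// btnSync.OnSelect
// 1. Upload new/modified/deleted records (omitted here).
// 2. Replace the whole collection with a fresh copy from the server.
ClearCollect(colRecords, '[dbo].[Records]');
```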
If the user hit the sync button very quickly we would get duplicates.
I believe what was happening was that LoadData in AppStart was running asynchronously, which allowed the user to hit the sync button while data was still being pulled in from the local cache. If there was no data to upload, the ClearCollect ran while the LoadData command was still running, so records were added to the collection from both the LoadData and the ClearCollect, doubling up the records.
I moved the code out of AppStart and also put a timer in place to give two seconds between the LoadData command being called and the ClearCollect being called. So far we have been unable to reproduce the duplication error.
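One possible way to wire up this kind of delay is sketched below. It is an illustration only and not necessarily the exact arrangement in my app: it assumes the load is moved to the first screen's OnVisible, and that a Timer control (here called `tmrLoadDelay`) flips a flag that enables the sync button. The names `colRecords`, `"LocalRecords"`, `'[dbo].[Records]'`, `varLoadStarted`, `varSyncAllowed`, `tmrLoadDelay` and `btnSync` are all placeholders.

```powerfx
// Screen1.OnVisible
// Start loading from the local cache, then start the delay timer.
LoadData(colRecords, "LocalRecords", true);
Set(varLoadStarted, true);

// tmrLoadDelay.Duration  (milliseconds)
2000

// tmrLoadDelay.Start
varLoadStarted

// tmrLoadDelay.OnTimerEnd
// Two seconds after LoadData was called, allow syncing.
Set(varSyncAllowed, true);

// btnSync.DisplayMode
// Keep the button disabled until the delay has elapsed.
If(varSyncAllowed, DisplayMode.Edit, DisplayMode.Disabled)

// btnSync.OnSelect
// Upload step omitted; then pull everything back from the server.
ClearCollect(colRecords, '[dbo].[Records]');
```

Keep in mind that a fixed two-second delay only makes the overlap less likely. If the explanation above is right, a cache that takes longer than two seconds to load could still produce duplicates.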
This is only what I can determine by watching what is going on. If I'm correct, this is a bug: I believe LoadData needs to block ClearCollect until the LoadData has completed, rather than allowing records from the two sources to be merged.