This will destroy the duplicates in waves, selecting only a single duplicate per map_id
, on each pass. The loop will automatically finish when no more duplicates exist.
loop do
duplicates = Summary.select("MAX(id) as id, map_id").group(:map_id).having("count(*) > 1")
break if duplicates.length == 0
duplicates.destroy_all
end
If the database looks like this:
| id | map_id |
| 1 | 235 |
| 2 | 299 |
| 3 | 324 |
| 4 | 235 |
| 5 | 235 |
| 6 | 299 |
| 7 | 235 |
| 8 | 324 |
| 9 | 299 |
In the first wave, these records would be returned and destroyed:
| id | map_id |
| 7 | 235 |
| 8 | 324 |
| 9 | 299 |
In the second wave, this record would be returned and destroyed:
| id | map_id |
| 5 | 235 |
| 6 | 299 |
The third wave would return and destroy this record:
| id | map_id |
| 4 | 235 |
The fourth wave would complete the process. Unless there are numerous duplicates for a given map_id
, it's likely that this process will finish in single-digit loop iterations.
Given the approach, only duplicates will ever be returned, and only the newer duplicates will be removed. To remove older duplicates, instead, the query can be changed to this:
duplicates = Summary.select("MIN(id) as id, map_id").group(:map_id).having("count(*) > 1")
In that case, wave 1 would return and destroy:
| id | map_id |
| 1 | 235 |
| 2 | 299 |
| 3 | 324 |
Wave 2 would return and destroy:
| id | map_id |
| 4 | 235 |
| 6 | 299 |
Wave 3 would return and destroy:
| id | map_id |
| 5 | 235 |
Wave 4 would complete the process.