I have a long list of email addresses (8000), sorted alphabetically, that contains duplicates.

Using Python, how can I count how many times each unique email occurs and remove the repeated entries, keeping one instance of each email in the list?

Example list:

[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]

Desired results:

[email protected] (1)
[email protected] (3)
[email protected] (2)

I've searched online but only found methods for removing duplicate numbers, dictionaries, and tuples.

Guage

2 Answers

Since the list is sorted alphabetically, you can use itertools.groupby() to count each run of equal addresses:

 >>> from itertools import groupby
 >>> # l is the list of emails
 >>> [(key, sum(1 for _ in group)) for key, group in groupby(sorted(l))]

[('[email protected]', 1), ('[email protected]', 3), ('[email protected]', 2)]
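
As a minimal, self-contained sketch of the groupby approach, the snippet below builds both the counts and the de-duplicated list in one pass. The addresses and the variable names (emails, counts, unique) are placeholders assumed for illustration, not taken from the question.

```python
from itertools import groupby

# Placeholder addresses (assumed for illustration), already sorted.
emails = [
    "alice@example.com",
    "bob@example.com",
    "bob@example.com",
    "bob@example.com",
    "carol@example.com",
    "carol@example.com",
]

# groupby only merges *adjacent* equal items, so the input must be sorted;
# sorted() is a no-op safeguard when the list is already in order.
counts = [(key, sum(1 for _ in group)) for key, group in groupby(sorted(emails))]

# One instance of each address, in sorted order.
unique = [key for key, _ in counts]

print(counts)
print(unique)
```

Note that groupby on an unsorted list would report the same address several times, once per run, which is why the sort matters here.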

Alternatively, use collections.Counter to count how many times each item occurs:

>>> from collections import Counter
>>> d = Counter(['[email protected]',
...              '[email protected]',
...              '[email protected]',
...              '[email protected]',
...              '[email protected]',
...              '[email protected]'])
>>> d

Output:

Counter({'[email protected]': 3, '[email protected]': 2, '[email protected]': 1})
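
To get from the Counter to what the question asks for (one instance of each address plus its count), something like the following sketch may help. The addresses and the names unique and lines are assumptions for illustration. It relies on Counter being a dict subclass, which keeps first-seen order in Python 3.7 and later.

```python
from collections import Counter

# Placeholder addresses (assumed for illustration).
emails = [
    "alice@example.com",
    "bob@example.com",
    "bob@example.com",
    "bob@example.com",
    "carol@example.com",
    "carol@example.com",
]

counts = Counter(emails)

# The keys of the Counter are the unique addresses; on Python 3.7+
# they appear in the order each address was first seen.
unique = list(counts)

# Format each entry as "address (count)", matching the question's layout.
lines = ["{} ({})".format(email, n) for email, n in counts.items()]

for line in lines:
    print(line)
```

Unlike groupby, this does not require the input to be sorted.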

Counting with Counter is equivalent to the following plain loop:

d = {}
for i in l:  # l is the list of all emails
    if i in d:
        d[i] += 1
    else:
        d[i] = 1

Or, more compactly, use dict.get:

d = {}
for i in l:
    d[i] = d.get(i, 0) + 1

Vishnu Upadhyay

You can use collections.Counter:

>>> from collections import Counter
>>> my_email
['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]']
>>> Counter(my_email)
Counter({'[email protected]': 3, '[email protected]': 2, '[email protected]': 1})
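
If the addresses are read from a text file, each line usually ends with a newline, and an address with a trailing "\n" would be counted separately from the same address without it. A hedged sketch of stripping whitespace before counting follows; count_emails is a hypothetical helper name, the addresses are placeholders, and io.StringIO stands in for a real file object.

```python
import io
from collections import Counter


def count_emails(lines):
    """Count addresses from an iterable of lines.

    Surrounding whitespace (including trailing newlines) is stripped,
    and blank lines are skipped.
    """
    cleaned = (line.strip() for line in lines)
    return Counter(addr for addr in cleaned if addr)


# StringIO stands in for a real file; with an actual file you would
# pass the object returned by open(...) instead.
fake_file = io.StringIO(
    "alice@example.com\n"
    "bob@example.com\n"
    "bob@example.com\n"
    "\n"
)

counts = count_emails(fake_file)
for email, n in sorted(counts.items()):
    print("{} ({})".format(email, n))
```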

If you want the results in sorted order:

>>> sorted(Counter(my_email).items())
[('[email protected]', 1), ('[email protected]', 3), ('[email protected]', 2)]

You can print them like this:

>>> for x in sorted(Counter(my_email).items()):
...     print(x[0], x[1])   # in Python 2: print x[0], x[1]
... 
[email protected] 1
[email protected] 3
[email protected] 2
Hackaholic