0

I have multiple lines of email addresses and I need to do a couple of things:

[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected] 
... etc
  1. Put them in one list: ['[email protected]', '[email protected]', '[email protected]', ...]
  2. Figure out which email address is repeated most often in that list. This is how I started my code, and I hope to complete it from where I left off:

    fname = raw_input("Enter file name: ")
    if len(fname) < 1 : fname = "mbox-short.txt"
    fh = open(fname)
    lines = []
    count = 0 # For next step
    for line in fh:
        line = line.rstrip()
        if not line.startswith("From ") : continue
        x = line.split()
        emails = x[1]
     #print y
    
    maxapperence = 0 
    famous = None
    for mail in emails:
        count = emails.count(mail)
        if count > maxapperence:
            famous = mail
    print famous
    
    apparence = dict()
    for mail in set(emails):
        apparence[mail] = emails.count(mail)
    print apparence
    

    Output:

    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    
Khalida

2 Answers

1

If you've got a file that only contains email addresses:

import collections

filename = ''  # set this to the path of your file
with open(filename) as f:  # the with block closes the file afterwards
    c = collections.Counter(map(str.strip, f))
print(c.most_common(10))  # the ten most common addresses with their counts
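
If the addresses still have to be pulled out of `From ` lines first, as in the question's loop, the same `Counter` works once each address is appended to a list instead of being assigned over the previous one. A minimal sketch, assuming Python 3 and the mbox-style layout implied by the question; the helper name `extract_senders` and the sample lines are made up for illustration:

```python
import collections


def extract_senders(lines):
    """Collect the second field of every line that starts with 'From '."""
    senders = []
    for line in lines:
        line = line.rstrip()
        if not line.startswith("From "):
            continue
        fields = line.split()
        if len(fields) > 1:
            senders.append(fields[1])  # append, do not overwrite
    return senders


# Made-up sample lines; in practice an open file object can be passed instead.
sample = [
    "From alice@example.com Sat Jan  5 09:14:16 2008",
    "Subject: hello",
    "From bob@example.com Sat Jan  5 09:20:00 2008",
    "From alice@example.com Sat Jan  5 10:00:00 2008",
]

emails = extract_senders(sample)
counts = collections.Counter(emails)
print(emails)
print(counts.most_common(1))  # most frequent address and its count
```

`most_common(1)` returns a one-element list holding an `(address, count)` pair, so the address and its count arrive together.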
chelmertz
  • Sorry, maybe I should have said so from the beginning, but I've actually extracted those email addresses from a txt file! – Khalida Jul 09 '15 at 16:48
  • @Khalida If you already have the email addresses in a list, you can replace `map(....)` in my example with your list: `l = ['[email protected]', '[email protected]']; c = collections.Counter(l)` – chelmertz Jul 09 '15 at 16:54
  • No, the emails were not in a list, just in a plain text file. – Khalida Jul 09 '15 at 17:26
  • Then my answer should be good enough I think? Just set `filename` – chelmertz Jul 09 '15 at 18:04
  • I don't need to go back to the file name; I've already extracted the emails and they print as shown above. Now I need to put them in one list and loop through it to find the most frequent one and how many times it occurs. – Khalida Jul 11 '15 at 13:36
  • @Khalida: That code seems invalid, `emails` is a string that gets overwritten for each iteration, and in the second loop you treat it as a list. If it were a list, you could just `c = collections.Counter(emails)`. I'm sorry but I'm not sure I can help you any more if you don't get what I'm saying over and over again. – chelmertz Jul 13 '15 at 07:39
  • Never mind, I solved the problem by myself. Thanks to all. – Khalida Jul 14 '15 at 23:14
0

First example

emails = """[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]""".split("\n")

maxapperence = 0 
famous = None
for mail in set(emails):
    count = emails.count(mail)
    if count > maxapperence:
        famous = mail
        maxapperence = count
print famous, maxapperence

You can also store the number of appearances of every address:

apparence = dict()
for mail in set(emails):
    apparence[mail] = emails.count(mail)
print apparence
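
Calling `emails.count(mail)` inside the loop rescans the whole list once per distinct address. For larger inputs, a single pass that tallies as it goes may be preferable. A sketch, assuming Python 3 and a short made-up `emails` list standing in for the one above:

```python
# Made-up stand-in for the list of addresses built above.
emails = ["a@example.com", "b@example.com", "a@example.com", "c@example.com"]

apparence = {}
for mail in emails:
    # dict.get supplies 0 the first time an address is seen
    apparence[mail] = apparence.get(mail, 0) + 1

# max() with key=apparence.get picks the address with the largest count
famous = max(apparence, key=apparence.get)
print(famous, apparence[famous])
```

If several addresses tie for the highest count, `max` returns only one of them, so ties would need separate handling.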
wilfriedroset