Your e-mail regex is not working at all: `emails = re.findall(r'.[@]', String)` matches any single character followed by `@`.
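A minimal sketch of what that pattern actually returns. The sample sentence and its addresses are placeholders made up for illustration, not the ones from the question:

```python
import re

# Hypothetical sample text; the addresses are placeholders.
sample = "Jessica's email is jess@example.com, and Daniel's email is dan@example.org."

# '.' matches one character and '[@]' matches a literal '@',
# so each match is only the character before the '@' plus the '@' itself.
fragments = re.findall(r'.[@]', sample)
print(fragments)  # ['s@', 'n@']
```

Each match is two characters long, so no complete address can ever be captured this way.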
I would try a different approach: match the sentences and extract (name, e-mail) pairs under the following empirical assumptions (if your text changes too much, the logic will break):
- all names are followed by `'s`, with ` is ` somewhere after it (a non-greedy `.*?` matches everything in between)
- `\w` matches any alphanumeric character (or underscore), and the domain contains only one dot (otherwise the pattern would also match the final dot of the sentence)
code:
import re
String = "'Jessica's email is [email protected], and Daniel's email is [email protected]. Edward's is [email protected], and his grandfather, Oscar's, is [email protected].'"
print(re.findall(r"(\w+)'s.*? is (\w+@\w+\.\w+)", String))
result:
[('Jessica', '[email protected]'), ('Daniel', '[email protected]'), ('Edward', '[email protected]'), ('Oscar', '[email protected]')]
Converting the result to `dict` would even give you a dictionary mapping name => address:
{'Oscar': '[email protected]', 'Jessica': '[email protected]', 'Daniel': '[email protected]', 'Edward': '[email protected]'}
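As a rough sketch of that conversion: `re.findall` returns a list of 2-tuples when the pattern has two groups, and `dict` accepts such a list directly. The sample text below uses placeholder addresses (assumptions for illustration), and `address_by_name` is a name chosen here, not one from the question:

```python
import re

# Hypothetical sample text; the addresses are placeholders.
sample = ("Jessica's email is jess@example.com, and Daniel's email is dan@example.org. "
          "Edward's is ed@example.net, and his grandfather, Oscar's, is oscar@example.com.")

# Two capture groups -> findall returns a list of (name, address) tuples.
pairs = re.findall(r"(\w+)'s.*? is (\w+@\w+\.\w+)", sample)

# dict() turns the list of pairs into a name => address mapping.
address_by_name = dict(pairs)
print(address_by_name)
```

On Python 3.7 and later the dictionary keeps the order in which the pairs were matched. If the same name appears twice, the later address overwrites the earlier one.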
The general case needs more characters in the character classes (I'm not sure this list is exhaustive):
String = "'Jessica's email is [email protected], and Daniel's email is [email protected]. Edward's is [email protected], and his grandfather, Oscar's, is [email protected].'"
print(re.findall(r"(\w+)'s.*? is ([\w\-.]+@[\w\-.]+\.[\w\-]+)", String))
result:
[('Jessica', '[email protected]'), ('Daniel', '[email protected]'), ('Edward', '[email protected]'), ('Oscar', '[email protected]')]
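To illustrate where the two patterns differ, here is a small sketch with made-up names and addresses (all of them assumptions for illustration) that contain dots and hyphens:

```python
import re

# Hypothetical sample text; names and addresses are made up for illustration.
sample = "Anna's email is anna.smith@mail.example.com, and Bob's is bob-j@my-host.org."

simple = r"(\w+)'s.*? is (\w+@\w+\.\w+)"
general = r"(\w+)'s.*? is ([\w\-.]+@[\w\-.]+\.[\w\-]+)"

# The simple pattern cannot cross the '.' or '-' in the local part, so nothing matches.
simple_matches = re.findall(simple, sample)
print(simple_matches)   # []

# The wider character classes accept dots and hyphens; the last class excludes '.',
# so the final dot of the sentence is not swallowed.
general_matches = re.findall(general, sample)
print(general_matches)  # [('Anna', 'anna.smith@mail.example.com'), ('Bob', 'bob-j@my-host.org')]
```

This is still an approximation: it covers common address shapes, not everything a valid e-mail address may contain.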