Python3
I need help creating a regex to extract names and emails from a forwarded email body, which will always look similar to this (real emails replaced by dummy emails):
> Begin forwarded message:
> Date: December 20, 2013 at 11:32:39 AM GMT-3
> Subject: My dummy subject
> From: Charlie Brown <[email protected]>
> To: [email protected], George Washington <[email protected]>, =
[email protected], [email protected], Juan =
<[email protected]>, Alan <[email protected]>, Alec <[email protected]>, =
Alejandro <[email protected]>, Alex <[email protected]>, Andrea =
<[email protected]>, Andrea <[email protected]>, Andres =
<[email protected]>, Andres <[email protected]>
> Hi,
> Please reply ASAP with your RSVP
> Bye
My first step was extracting all emails to a list with a custom function that I pass the whole email body to, like so:
import re

def extract_emails(block_of_text):
    # Match anything shaped like local-part@domain.tld
    t = r'\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b'
    return re.findall(t, block_of_text)
A couple of days ago I asked a question about extracting names using regex to help me build the function to extract all the names. My idea was to join both later on. I accepted an answer that performed what I asked, and came up with this other function:
def extract_names(block_of_text):
    # Capture the words between a ':' or ',' and the opening '<'
    p = r'[:,] ([\w ]+) \<'
    return re.findall(p, block_of_text)
My problem now is matching the extracted names to the extracted emails, mainly because there are sometimes fewer names than emails. So I thought it would be better to build a single regex that extracts both names and emails.
This is my failed attempt to build such a regex:
[:,]([\w \<]+)([\w.-]+@[\w.-]+\.[\w.-]+)
Can anyone help and propose a nice, clean regex that grabs both name and email into a list of tuples (or a dictionary)? Thanks
EDIT: The expected output of the regex in Python would be a list like this:
[('Charlie Brown', '[email protected]'), ('', '[email protected]'), ('George Washington', '[email protected]'), ('', '[email protected]'), ('', '[email protected]'), ('Juan', '[email protected]'), ('Alan', '[email protected]'), ('Alec', '[email protected]'), ('Alejandro', '[email protected]'), ('Alex', '[email protected]'), ('Andrea', '[email protected]'), ('Andrea', '[email protected]'), ('Andres', '[email protected]'), ('Andres', '[email protected]')]
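For reference, here is a rough sketch of the kind of function that would produce output in that shape. It is one possible approach, not a definitive solution: the name `extract_pairs`, the `PAIR` pattern, and the `example.com` addresses in `SAMPLE` are all made up for illustration. It assumes the trailing `=` on the wrapped header lines is a soft line break that can be removed along with the newline, and that names contain only word characters and spaces.

```python
import re

# Hypothetical sample in the same layout as the forwarded message above;
# the addresses are placeholders, not the ones from the real email.
SAMPLE = (
    "> Begin forwarded message:\n"
    "> Date: December 20, 2013 at 11:32:39 AM GMT-3\n"
    "> Subject: My dummy subject\n"
    "> From: Charlie Brown <charlie@example.com>\n"
    "> To: bob@example.com, George Washington <george@example.com>, =\n"
    "carol@example.com, Juan =\n"
    "<juan@example.com>\n"
    "> Hi,\n"
)

# Optional "Name <" prefix (group 1), then the address itself (group 2).
# The name must start with a word character, so it never begins with a space.
PAIR = re.compile(r'(?:(\w[\w ]*?) *<)?([\w.+-]+@[\w.-]+\.\w+)')


def extract_pairs(block_of_text):
    # Assumption: '=' at the end of a line is a soft line break, so joining
    # the lines puts a wrapped name and its address back on one line.
    unfolded = re.sub(r'=\r?\n', '', block_of_text)
    # findall returns '' for the name group when an address has no name.
    return PAIR.findall(unfolded)


pairs = extract_pairs(SAMPLE)
for name, address in pairs:
    print(repr(name), address)
```

Addresses that appear without a name come back paired with an empty string, which keeps every name aligned with its own email. Names containing quotes, periods, or hyphens would not be captured by this pattern and would need a wider character class.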