1

I have two pandas dataframe : df1 and df2. df1 contains multiple emails of the customer and I want to match it with df2 to see how many customer did a test with the company by looking at if any of the emails is present in df1 is in df2.

I tried to do .str.split(";", expand=True) to split the email ids and use pd.merge to join on multiple email ids but it's too lengthy. Posting it here to find a better solution.

df1

myid      emails                                                                      price

1001     [email protected];[email protected]                                        1
1002     [email protected]                                                           2
1003     [email protected];[email protected];[email protected];[email protected]          8           
1004     [email protected];[email protected]                                                  7
1005     [email protected]                                                            9

df2

tr_id      latest_em                                     test

101     [email protected]; [email protected]                  12                            
102     [email protected]                                     13            
103     [email protected]                                16
104     [email protected]                              18                               
105     [email protected];[email protected]                     10                                           

Expected Output :

myid      emails                      price   tr_id   latest_em                      test
1004     [email protected];[email protected]  7      102     [email protected]                   13
1004     [email protected];[email protected]  7      105     [email protected];[email protected]   10
1005     [email protected]            9      103     [email protected]              16
Ash
  • 319
  • 1
  • 11
  • Why are there multiple email addresses in a single field? Can you break each email address out to single rows? – itprorh66 Aug 19 '22 at 12:23
  • Your question needs a minimal reproducible example consisting of sample input, expected output, actual output, and only the relevant code necessary to reproduce the problem. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for best practices related to Pandas questions. – itprorh66 Aug 19 '22 at 12:24

1 Answers1

0

You can split, explode, then merge:

(df1
 .assign(key=df1['emails'].str.split(';\s*')).explode('key')
 .merge(df2.assign(key=df2['latest_em'].str.split(';\s*')).explode('key'),
        on='key'
       )
 .drop(columns='key')
)

output:

   myid                        emails  price  tr_id                      latest_em  test
0  1004  [email protected];[email protected]      7    102                  [email protected]    13
1  1004  [email protected];[email protected]      7    105  [email protected];[email protected]    10
2  1005            [email protected]      9    103             [email protected]    16
mozway
  • 194,879
  • 13
  • 39
  • 75