I have two dataframes that have the following columns : Phone, Email and Name
Dataframe1 has 20k in length, whereas dataframe2 has 1k length. I would like to fill the blanks in the Phone column in dataframe1 with the phone numbers in dataframe2 using the email as a match index between the two dataframes.
What is the best way to do this? I have tried combine_frist() and Merge() but combine_first() returns the value in the same row rather than the value that matches the email address. Merge() resulted in the same thing.
Am I wrong to think I need to set email as an index and then map phones to that index? I feel like this is correct but I simply do not know how to do this. Any help is appreciated! Thank you :)
Example :
In [1]
import pandas as pd
df1 = pd.DataFrame({'Phone': [1, NaN, 3, 4, 5, NaN, 7],
'Name': ['Bob', 'Jon', 'Iris', 'Jacob','Donald','Beatrice','Jane'],
'Email': ['[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]'})
df2 = pd.DataFrame({'Phone': [2, 1, 3, 5],
'Name': ['Jon', 'Bob', 'Donald'],
'Email': ['[email protected]','[email protected]', '[email protected]'})
In [2]: df1
Out [2]:
Phone Name Email
1 Bob [email protected]
NaN Jon [email protected]
3 Iris [email protected]
4 Jac [email protected]
5 Don [email protected]
NaN Bea [email protected]
7 Jane [email protected]
x 20000 len
In [3]: df2
Out [3]:
Phone Name Email
2 Jon [email protected]
1 Bob [email protected]
6 Bea [email protected]
5 Don [email protected]
x 1100 len
What I've tried
In [4]: df3 = pd.merge(df1,df2, on="Email", how="left")
Out [4]:
Phone Name Email
1 Bob [email protected]
1 Jon [email protected]
3 Iris [email protected]
4 Jac [email protected]
5 Don [email protected]
NaN Bea [email protected]
7 Jane [email protected]
In [5]: df3 = df1.combine_first(df2)
Out [5]:
Phone Name Email
1 Bob [email protected]
1 Jon [email protected]
3 Iris [email protected]
4 Jac [email protected]
5 Don [email protected]
NaN Bea [email protected]
7 Jane [email protected]
What I would like it to look like:
In [6]: df3
Out [6]
1 Bob [email protected]
2 Jon [email protected]
3 Iris [email protected]
4 Jac [email protected]
5 Don [email protected]
6 Bea [email protected]
7 Jane [email protected]