1

Dataframe One

    number   email          name
0   1234     [email protected]    Me
1   5678     [email protected]   You
2   9012     [email protected]    Us
3   3456     [email protected]  Them

Dataframe Two

    email         name    open  click
0   [email protected]  You     31    7  
1   [email protected] Them    84    15
2   [email protected]   Me      6     1
3   [email protected]   Us      0     4

I would like to combine the two dfs so I end up with one df only that combines the two:

Desired Output:

    number   email          name   open  click
0   1234     [email protected]    Me     6     1
1   5678     [email protected]   You    31    7
2   9012     [email protected]    Us     0     4
3   3456     [email protected]  Them   84    15

What is confusing me is how to ensure the data in columns 'open' and 'click' from dataframe two matches up correctly when combined with dataframe one as the 'email' and 'name' columns are in a different order in each dataframe.

Ed Jefferies
  • 165
  • 9

1 Answers1

1

If you mean by a different order, that the rows do not match at the indexes of the second dataframe that you gave as an example. Just merge by email.

df3 = pd.merge(df1, df2, on= 'email',  suffixes=('', '_y')).filter(regex='^(?!.*_y)')

Or do a left-side merge on the left of the first dataframe and on the right of the second. And if there are no rows to merge in the second dataframe, then there will be empty rows, and the rows of the first will be all.

df3 = pd.merge(df1, df2, left_on='email', right_on='email', how='left',
               suffixes=('', '_y')).filter(regex='^(?!.*_y)')

Update

If I understand you correctly. You need to merge, but you have different values in the open, click columns. Made two dataframes. So that there are no duplicate columns, I use the prefix '_y' for the second duplicate and filter the duplicate filter(regex='^(?!.*_y)' on it. To save the necessary columns, I renamed them (so that they do not fall into duplicates).

df3 = pd.DataFrame(
    {'number': [1234, 5678, 9012, 3456], 'email': ['[email protected]', '[email protected]', '[email protected]', '[email protected]'],
     'name': ['Me', 'You', 'Us', 'Them'], 'open': [6, 31, 0, 84], 'click': [1, 7, 4, 15]})

df4 = pd.DataFrame(
    {'number': [3456, 5678, 9012, 1234], 'email': ['[email protected]', '[email protected]', '[email protected]', '[email protected]'],
     'name': ['Them', 'You', 'Us', 'Me'], 'open': [1, 2, 3, 4], 'click': [4, 3, 2, 1]})


df3.rename(columns={'open': ' open_df3', 'click': 'click_df3'}, inplace=True)
df4.rename(columns={'open': ' open_df4', 'click': 'click_df4'}, inplace=True)

df5 = pd.merge(df3, df4, left_on='email', right_on='email', how='left',
               suffixes=('', '_y')).filter(regex='^(?!.*_y)')
inquirer
  • 4,286
  • 2
  • 9
  • 16
  • 1
    Thank you! If I wanted to add another df (call it df4) to df3 afterwards - df4 is in same format to df2 but with different values in the 'Open' and 'Click' columns - how would I merge it so that 'Open' and 'Click' were the only columns that merge? So effectively I would be taking df3 + df4 and wanting to end up with columns that are: Number/Email/Name/Open/Click/Open/Click ? – Ed Jefferies Apr 19 '23 at 13:44
  • @EdJefferies see updates. If it's not then show df4 and the expected result. – inquirer Apr 19 '23 at 14:32