3

I have two data sets

1 set it has a column with a list of email address:

DF1

Email
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]

2nd csv Dataframe2

Email
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]

import pandas as pd

SansList = r'C:\\Sans compare\\SansList.csv'
AllUsers = r'C:\\Sans compare\\AllUser.csv'

## print Name column only and turn into data sets from CSV ##
df1 = pd.read_csv(SansList, usecols=[0])

df2 = pd.read_csv(AllUsers, usecols=[2])

**print(df1['Email'].isin(df2)==False)**

I want the results to be,

Dataframe3
[email protected]
[email protected]
[email protected]

Not quite sure how to fix my dataset... :(

MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
davidjbeiler
  • 133
  • 2
  • 15

2 Answers2

1

Option 1
isin

df2[~df2.Email.isin(df1.Email)]

          Email
4  [email protected]
5  [email protected]
6  [email protected]

Option 2
query

df2.query('Email not in @df1.Email')

          Email
4  [email protected]
5  [email protected]
6  [email protected]

Option 3
merge

pd.DataFrame.merge with indicator=True, enables you to see which dataframe the row came from. We can then filter on it.

df2.merge(
    df1, 'outer', indicator=True
).query('_merge == "left_only"').drop('_merge', 1)

           Email
20  [email protected]
21  [email protected]
22  [email protected]
piRSquared
  • 285,575
  • 57
  • 475
  • 624
1

Numpy solution:

In [311]: df2[~np.in1d(df2.Email, df1.Email)]
Out[311]:
          Email
4  [email protected]
5  [email protected]
6  [email protected]
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419