8

I am trying to understand how dataframe.columns.difference() works, but I couldn't find a satisfactory explanation of it. Can anyone explain this method in detail?

ebrahimi
Parth S.

2 Answers

21

dataframe.columns is a pandas Index, and its difference() method returns the set difference: the column labels that are not in the argument you pass. By default the result is sorted. It can be used to create a new dataframe from an existing one while excluding some columns. Let us look at an example:

In [2]: import pandas as pd

In [3]: import numpy as np

In [4]: df = pd.DataFrame(np.random.randn(5, 4), columns=list('ABCD'))

In [5]: df
Out[5]: 
          A         B         C         D
0 -1.023134 -0.130241 -0.675639 -0.985182
1  0.270465 -1.099458 -1.114871  3.203371
2 -0.340572  0.913594 -0.387428  0.867702
3 -0.487784  0.465429 -1.344002  1.216967
4  1.433862 -0.172795 -1.656147  0.061359

In [6]: df_new = df[df.columns.difference(['B', 'D'])]

In [7]: df_new
Out[7]: 
          A         C
0 -1.023134 -0.675639
1  0.270465 -1.114871
2 -0.340572 -0.387428
3 -0.487784 -1.344002
4  1.433862 -1.656147

The method returns a new Index (not a plain list) containing the existing column labels minus the ones given as arguments. You can check this directly:

In [8]: df.columns.difference(['B', 'D'])
Out[8]: Index(['A', 'C'], dtype='object')
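Two details are easy to miss in the example above because the columns there are already in alphabetical order. Here is a minimal sketch, using a made-up one-row frame whose columns are deliberately out of order, that shows the default sorting, the sort=False option, and what happens with a label that is not a column:

```python
import pandas as pd

# Made-up frame for illustration; columns are intentionally not alphabetical.
df = pd.DataFrame([[1, 2, 3, 4]], columns=list("DBAC"))

# Default (sort=None): pandas attempts to sort the resulting labels.
sorted_cols = df.columns.difference(["B"])
print(list(sorted_cols))   # ['A', 'C', 'D']

# sort=False: the remaining labels keep their order from df.columns.
ordered_cols = df.columns.difference(["B"], sort=False)
print(list(ordered_cols))  # ['D', 'A', 'C']

# Labels in the argument that are not columns are simply ignored.
extra_cols = df.columns.difference(["B", "Z"])
print(list(extra_cols))    # ['A', 'C', 'D']
```

If you need the original column order, pass sort=False or use df.drop(columns=['B']) instead; note that drop raises a KeyError for a label that does not exist, whereas difference() does not.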

I suggest you take a look at the official pandas documentation for Index.difference.

bkshi
0

Below is an example of using dataframe.columns.difference() on the 'employee attrition' dataset. Here we want to separate the categorical columns from the numerical columns before doing feature engineering.

# Empty list to store columns with categorical data
categorical = []
# items() replaces iteritems(), which was removed in pandas 2.0
for col, value in attrition.items():
    if value.dtype == 'object':
        categorical.append(col)

# Store the numerical columns (returned as an Index, not a list)
numerical = attrition.columns.difference(categorical)

Notice that columns.difference() returns every column that is not in the passed argument, which in this case means the numerical columns.
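For anyone who wants to run this without the dataset, here is a small sketch on a hypothetical stand-in frame (the column names and values are made up for illustration, not taken from the real attrition data). It also swaps the explicit loop for select_dtypes, which is one alternative way to collect the object-dtype columns:

```python
import pandas as pd

# Hypothetical stand-in for the attrition data; names and values are illustrative only.
attrition = pd.DataFrame(
    {
        "Age": [41, 49, 37],
        "Department": pd.Series(["Sales", "R&D", "R&D"], dtype="object"),
        "MonthlyIncome": [5993, 5130, 2090],
        "OverTime": pd.Series(["Yes", "No", "Yes"], dtype="object"),
    }
)

# Columns stored with object dtype are treated as categorical here.
categorical = attrition.select_dtypes(include="object").columns

# Everything else is what difference() leaves behind.
numerical = attrition.columns.difference(categorical)

print(list(categorical))  # ['Department', 'OverTime']
print(list(numerical))    # ['Age', 'MonthlyIncome']
```

The string columns are created with an explicit object dtype so that the dtype check behaves the same way regardless of how a given pandas version infers string columns.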

tripleee