LabelEncoding your features is a bad practice
You should avoid using LabelEncoder to encode your input features! Don't believe me? Here's what scikit-learn's official documentation for LabelEncoder says:
This transformer should be used to encode target values, i.e. y, and not the input X.
That's why it's called LabelEncoding: the "label" refers to the target y, not the features.
Why you shouldn't use LabelEncoder to encode features
This encoder simply maps a feature's unique values to integers. For example, let's say we want to encode a feature called shirt color, which represents the color of the shirt someone's wearing. This feature has values ['red', 'green', 'blue', ...]. If you encode these into integers, i.e. [1, 2, 3, ...], you might confuse your model, because you have now given these values relationships that don't exist in the real world, e.g. red < green < blue or red + green = blue. This type of feature is called nominal and should preferably be one-hot encoded.
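For instance, pandas can one-hot encode a column in one line. A minimal sketch (the shirt_color data is made up for illustration):
import pandas as pd
df = pd.DataFrame({'shirt_color': ['red', 'green', 'blue', 'green']})
# One binary column per color, with no implied ordering between them
df = pd.get_dummies(df, columns=['shirt_color'])
print(df.columns.tolist())  # ['shirt_color_blue', 'shirt_color_green', 'shirt_color_red']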
There are features, however, whose values you might want to map to integers. These are called ordinal. For example, the feature rating, which has values ['bad', 'good', 'excellent', ...]. By mapping these to integers you actually preserve the relationships these values hold in the real world, e.g. bad < good < excellent. There is a catch, however: to do the above, you need to map each value to a specific integer (e.g. we can't map 'good' -> 1, 'bad' -> 2, 'excellent' -> 3, because that doesn't preserve the real-world relationship of these values). LabelEncoder has no way of knowing which integer belongs to which value; it simply assigns integers to the unique values in sorted (alphabetical) order, so even on ordinal variables it most likely won't generate the correct encoding.
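To see the problem, here's what LabelEncoder actually does to these rating values: the classes get sorted alphabetically, so 'excellent' ends up with a smaller integer than 'good':
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
# The unique values are sorted alphabetically before integers are assigned
print(le.fit_transform(['bad', 'good', 'excellent']))  # [0 2 1]
print(le.classes_)  # ['bad' 'excellent' 'good']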
How to properly encode ordinal features
The proper way of encoding ordinal variables is to choose the mapping manually. This requires more work and isn't as elegant as a one-liner that encodes all values, but it is the only correct way. Let's see how we can do this in pandas.
custom_mapping = {'bad': 1, 'good': 2, 'excellent': 3}
df['rating'] = df['rating'].map(custom_mapping)
Obviously this needs to be done for each ordinal feature.
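If you'd rather stay within scikit-learn, OrdinalEncoder lets you pass the category order explicitly through its categories parameter. A minimal sketch, reusing the rating example (the toy dataframe is made up for illustration):
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
df = pd.DataFrame({'rating': ['good', 'bad', 'excellent']})
# Passing the categories explicitly preserves bad < good < excellent
encoder = OrdinalEncoder(categories=[['bad', 'good', 'excellent']])
df['rating'] = encoder.fit_transform(df[['rating']]).ravel()  # [1., 0., 2.]
Note that OrdinalEncoder starts counting at 0 and outputs floats, but the ordering, which is what matters, is preserved, and you can pass one category list per column to encode several ordinal features at once.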
At this point I think it's clear that I strongly recommend against using LabelEncoder, but if you still want to do it, at least do it correctly.
If you still want to use LabelEncoding
While both answers by @ggordon and @Anan Srivastava will do what you want, they don't have much value in practice. The problem is that by not binding the fitted LabelEncoder to a variable, you are losing the mapping from categories to numbers. If you want to predict on future data, you won't know which number to encode each category with.
Expanding upon @ggordon's answer
from sklearn.preprocessing import LabelEncoder

columns_to_be_encoded = [...] # list of column names you want encoded
# Instantiate one encoder per column, so each mapping is kept around
encoders = {column: LabelEncoder() for column in columns_to_be_encoded}
# Fit and apply each encoder to its column
for column in columns_to_be_encoded:
    df[column] = encoders[column].fit_transform(df[column])
This way you have a dictionary of fitted encoders so that you can reuse the same encoding if you wish.
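For example, when new data arrives at prediction time, you can apply the exact same mapping (new_df is a hypothetical dataframe with the same columns):
# Reuse the fitted encoders on future data
for column in columns_to_be_encoded:
    new_df[column] = encoders[column].transform(new_df[column])
Keep in mind that transform raises an error if it encounters a category that wasn't present during fitting, and inverse_transform lets you map the integers back to the original categories.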