
I have a dataframe with about 50 columns. The columns hold either categorical or continuous data. The continuous data can range between 0.000001 and 1.0, or between 500,000 and 5,000,000. The categorical data is usually a name, for example a store name.

How can I normalize this data so that I can feed it into a dense layer of a Sequential model?

The Y values are either 0 or 1, so it is a binary classification problem. I am currently normalizing all of the continuous data to be 0-1 and one-hot encoding all of the categorical data, so that if I have a column with 5 names in it, I get a matrix with 5 columns filled with 0's and 1's. Then I join all of the continuous and categorical data and feed it into a Dense layer with init='uniform' and activation='relu'.
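The preprocessing described above can be sketched with pandas alone (the column names and values here are hypothetical, just to mirror the two scales mentioned):

```python
import pandas as pd

# Hypothetical mixed-type frame: 'ratio' is on the tiny scale, 'revenue' on the
# large scale, and 'store' is a categorical name column (made-up data).
df = pd.DataFrame({
    "ratio":   [0.000001, 0.5, 1.0],
    "revenue": [500_000.0, 2_000_000.0, 5_000_000.0],
    "store":   ["A", "B", "A"],
})

# Min-max scale each continuous column to [0, 1].
for col in ["ratio", "revenue"]:
    lo, hi = df[col].min(), df[col].max()
    df[col] = (df[col] - lo) / (hi - lo)

# One-hot encode the categorical column: one 0/1 column per distinct name,
# then the result can be fed to a Dense layer as a single numeric matrix.
X = pd.get_dummies(df, columns=["store"])
print(X.columns.tolist())
```

With 5 distinct names the `store` column would expand to 5 indicator columns, which is the matrix of 0's and 1's described above.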

Is this the standard way of doing things?

Ethan
user1367204

1 Answer


Yes, it is — you're doing well!

In most cases, categorical features (columns) should be one-hot encoded. However, continuous features can be a little more complicated.

There are two common ways to preprocess continuous features:

  1. scaling features to the range [0, 1] (as you have done)
  2. removing the mean and scaling to unit variance (so the feature has zero mean and a standard deviation of 1)

In my practice, I choose between these two approaches depending on the dataset.
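Both ways are available off the shelf in scikit-learn; a minimal sketch on a made-up feature at the 500,000–5,000,000 scale:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy continuous feature (one column, made-up values).
x = np.array([[500_000.0], [1_000_000.0], [5_000_000.0]])

# Way 1: scale to the range [0, 1].
x_minmax = MinMaxScaler().fit_transform(x)

# Way 2: remove the mean and scale to unit variance (z-score).
x_std = StandardScaler().fit_transform(x)

print(x_minmax.ravel())           # all values lie in [0, 1]
print(x_std.mean(), x_std.std())  # approximately 0 and 1
```

In a real pipeline you would `fit` the scaler on the training split only and `transform` the test split with it, so no test-set statistics leak into training.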

Icyblade
  • Isn't it better to scale features to [-1, 1] instead of [0, 1]? If biases are initialized randomly with mean 0, then we want the features to have a mean of 0 as well. – Hugh Feb 04 '17 at 10:31
  • In my practice, [0, 1] has always worked better than [-1, 1], but [-1, 1] might be better in some scenarios I haven't encountered. By the way, there are reports that you can increase the scaling range (say, to [-5, 5]) and increase the learning rate in step (e.g. from 1e-3 to 1e-2). – Icyblade Feb 04 '17 at 10:39