4

I'm trying to train a model to distinguish between two kinds of time signals: those with RTS noise and those with only white noise.

I have a simple 1D CNN that works well (92% accuracy) with one training set, but turns into a complete coin flip with another. To the eye, the two sets look very similar. One was created from real signals, the other from simulated signals; the only real difference I can see is the mean magnitude. Is there a reason the model fails so reliably with this second set? Do I need to normalize the data somehow?

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding
from keras.layers import Conv1D, GlobalAveragePooling1D, MaxPooling1D, Flatten, LSTM
import numpy as np

# load the pre-split train/test arrays
x_test = np.load('C:/Users/Ben WORK ONLY/Desktop/GH repos/RTS ML detect beta/x_test.npy')
x_train = np.load('C:/Users/Ben WORK ONLY/Desktop/GH repos/RTS ML detect beta/x_train.npy')
y_test = np.load('C:/Users/Ben WORK ONLY/Desktop/GH repos/RTS ML detect beta/y_test.npy')
y_train = np.load('C:/Users/Ben WORK ONLY/Desktop/GH repos/RTS ML detect beta/y_train.npy')

# add a channel dimension so each signal has shape (1500, 1) for Conv1D
X_train = np.expand_dims(x_train, axis=2)
X_test = np.expand_dims(x_test, axis=2)


model = Sequential()
model.add(Conv1D(32, 12, activation='relu', input_shape=(1500, 1)))
model.add(MaxPooling1D(3))
model.add(Conv1D(64, 12, activation='relu'))
model.add(MaxPooling1D(3))
model.add(Conv1D(128, 12, activation='relu'))
model.add(GlobalAveragePooling1D())
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

model.fit(X_train, y_train, batch_size=16, epochs=5)
score = model.evaluate(X_test, y_test, batch_size=16)


#model.save('C:/Users/Ben WORK ONLY/Desktop/GH repos/RTS ML detect beta/CNNlin_model.h5')

The shapes of the two train/test sets:

[screenshot: simulated data shapes]
[screenshot: real data shapes]
[plot: RTS signal]
Stephen Rauch

2 Answers

2

The answer to your question is: yes, the efficacy of a model depends on scaling. It's very important to scale your variables to the right range and to combine them with the right activation function.

The reason is the following: the power of Neural Networks comes from the fact that they can learn any non-linear regularity in your data. This depends on the use of non-linear activation functions (tanh, ReLU, ELU, you name it). However, most activation functions tend to behave in a non-linear way only around zero. Take the plot of a ReLU, for example: as you move further away from zero (in either direction), the function becomes very "linear" (i.e. its derivative is constant).

All common activation functions tend to behave like this: non-linear (i.e. very powerful) in the neighborhood of zero, and very linear (or flat) further away from zero. That is why data are usually scaled to the [0, 1] or the [-1, 1] range. In that range, activation functions can give their best, and Neural Networks can learn the most complex patterns in your data.

When you work with CNNs on images, for example, most pixel data comes in the [0, 255] range. This is very bad for all activation functions, since between 0 and 255 pretty much any of them will look almost completely linear (or flat). With inputs like that, your CNN wouldn't be able to learn much.
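Applied to your signals, a minimal sketch (using the x_train and x_test arrays from your question; the exact scaling scheme is just one reasonable choice) would standardize with statistics computed on the training set only and reuse them on the test set:

import numpy as np

# compute the statistics on the training set only...
train_mean = x_train.mean()
train_std = x_train.std()

# ...and apply the same transform to both sets before expand_dims
x_train = (x_train - train_mean) / train_std
x_test = (x_test - train_mean) / train_std

After this, the inputs live roughly in a unit-scale range around zero, which is where the ReLUs in your network are most useful.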

Leevo
1

With Neural Networks, it's always good practice to normalize/scale the input. I'll redirect you to this post for more information.
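As one possible sketch (assuming x_train and x_test are 2D arrays of shape (n_samples, 1500) as in your question; normalizing per signal is just one option), you could scale each trace independently so that the difference in mean magnitude between your real and simulated sets no longer matters:

import numpy as np

def scale_per_signal(x):
    # normalize each trace by its own mean and standard deviation
    mean = x.mean(axis=1, keepdims=True)
    std = x.std(axis=1, keepdims=True)
    return (x - mean) / std

x_train = scale_per_signal(x_train)
x_test = scale_per_signal(x_test)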

qmeeus