3

I am trying to understand what is going on, so I built a simpler version of my project. I set X and Y to be identical and I'm trying to predict Y using X. This should be very simple, but my setup isn't working. Here is my code:

import numpy
import keras
import pandas


# I want to evaluate the model when X and Y are the same
# This should be very easy for the model to evaluate
X = numpy.random.randint(2, size=100000)
Y = X

# Setup the model
model = keras.models.Sequential()

model.add(keras.layers.Dense(1, input_dim=1, init='uniform', activation='relu'   ))
model.add(keras.layers.Dense(1,              init='uniform', activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

hist = model.fit(X, Y, nb_epoch=10, validation_split=.3, batch_size=10, verbose=1)

df             = pandas.DataFrame()
df['loss']     = hist.history['loss']
df['acc']      = hist.history['acc']
df['val_loss'] = hist.history['val_loss']
df['val_acc']  = hist.history['val_acc']
df.index       = df.index + 1
#
print(df)

And this is my output:

        loss       acc  val_loss   val_acc
1   0.693162  0.504357  0.693300  0.496233
2   0.693150  0.503100  0.693250  0.496233
3   0.693157  0.502357  0.693132  0.503767
4   0.693171  0.502214  0.693119  0.503767
5   0.693167  0.502043  0.693121  0.503767
6   0.693129  0.504014  0.693133  0.503767
7   0.693167  0.503243  0.693129  0.503767
8   0.693157  0.502357  0.693181  0.496233
9   0.693180  0.502614  0.693141  0.503767
10  0.693170  0.502300  0.693119  0.503767

I expected the accuracy to go to 100%, but that is not the case. What am I doing wrong?

This is the example that I was following.

user1367204
  • I find it useful to use 'ones' as the initializer in every try, and I got 3/10 successes in different trials with the original code. – Lynn Wong Sep 03 '17 at 03:37

3 Answers

3

You should think about how the initial values impact the ReLU units. If, for example, you use init='one' for the activation='relu' layer, you'll get the desired result (in this simple setup).
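
In the question's code, that would mean changing only the first Dense layer's initializer, for example:

model.add(keras.layers.Dense(1, input_dim=1, init='one',     activation='relu'   ))
model.add(keras.layers.Dense(1,              init='uniform', activation='sigmoid'))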

oW_
  • Thanks for your help. What should a person do if his X matrix has, say, 100 columns, some of which are either 0 or 1, and other columns are anywhere between 0 and 1, like 0.0021? This is the result of one-hot encoding some columns, normalizing other columns, and then putting them all together. – user1367204 Feb 01 '17 at 23:14
  • 1
    It is often recommended to set the biases to small positive values to ensure all units are activated initially. If you want to set different weights for different units, you can do so with the "weights" argument to the keras Dense layer (instead of the "init" parameter); see the sketch after these comments. – oW_ Feb 01 '17 at 23:24
  • Do you think it would make sense to make all the initial biases the same by setting smaller weights for larger inputs and larger weights for smaller inputs? – user1367204 Feb 01 '17 at 23:27
  • I don't think I understand what you're asking. But I guess the answer is if you normalize your input appropriately this should not be necessary. If you have another question, you can ask it separately. – oW_ Feb 01 '17 at 23:38
  • Ok, thank you for your responses. http://datascience.stackexchange.com/questions/16692/keras-how-to-normalize-dataframe-with-continuous-and-categorical-data – user1367204 Feb 01 '17 at 23:50
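
A minimal sketch of the weights-based initialization mentioned in the comments above, using the question's Keras version (the all-ones weights and 0.1 biases are only illustrative values, not a recommendation):

# weights takes a list [W, b]; for Dense(1, input_dim=1), W has shape (1, 1) and b has shape (1,)
# the all-ones weights and 0.1 biases here are only an illustration
initial_weights = [numpy.ones((1, 1)), 0.1 * numpy.ones(1)]
model.add(keras.layers.Dense(1, input_dim=1, weights=initial_weights, activation='relu'))
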
1

The problem is the relu unit. It is not a very good choice in such a simple network. There is a good chance that the ReLU starts off "dead": if the weight for the neuron in the first layer is negative (a 50/50 chance), then both 0 and 1 inputs will produce a 0 output and no gradient, so the network cannot learn to separate them.

Changing to tanh instead will completely fix the problem, and the network will learn the relationship trivially. This will also work with a "leaky" ReLU or any other unit that does not have ReLU's hard cutoff at zero.
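
For comparison, the tanh version keeps the question's code unchanged except for the first layer's activation:

model.add(keras.layers.Dense(1, input_dim=1, init='uniform', activation='tanh'   ))
model.add(keras.layers.Dense(1,              init='uniform', activation='sigmoid'))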

A leaky ReLU version of your model would look like this:

# no activation on the first Dense layer; LeakyReLU is added as its own layer
model.add(keras.layers.Dense(1, input_dim=1, init='uniform'))
model.add(keras.layers.advanced_activations.LeakyReLU(alpha=0.01))
model.add(keras.layers.Dense(1, init='uniform', activation='sigmoid'))

In larger/deeper networks with more complex input data, this disadvantage of ReLU units generally has lower impact and can be worked around more easily.

Neil Slater
0

The answer is that the code above works as I thought it should, most of the time. Each run of the program is slightly different because of the random weight initialization, and sometimes this randomness means that the run will not find the link between X and Y. The way around this is to run the program several times: after running it 10 times, I got a successful result in 8 out of 10 runs.
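
A rough sketch of that repetition, reusing X, Y, and the imports from the question above; the 0.99 validation-accuracy threshold for counting a run as "successful" is just an arbitrary choice:

def run_once():
    # rebuild the model from scratch so each trial starts from fresh random weights
    model = keras.models.Sequential()
    model.add(keras.layers.Dense(1, input_dim=1, init='uniform', activation='relu'))
    model.add(keras.layers.Dense(1, init='uniform', activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    hist = model.fit(X, Y, nb_epoch=10, validation_split=.3, batch_size=10, verbose=0)
    # return the validation accuracy of the last epoch
    return hist.history['val_acc'][-1]

successes = sum(1 for _ in range(10) if run_once() > 0.99)
print('successful runs: %d/10' % successes)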

user1367204