
I'm trying to predict movie genres using a neural network. I initially considered using a softmax layer as my output layer, but since a movie can have multiple genre labels, what should my output layer be?

Also, how would I have to format my data to make it work with Keras?

Green Falcon
Lakshay Sharma

3 Answers


how would I have to format my data to make it work with Keras?

Your training labels should be binary vectors, with a 1 for each class that is present and a 0 for each class that is not. For example, assume you have 3 genre classes: comedy, romantic and horror. There are several ways to build such vectors; scikit-learn has a method that makes it very easy, which I show below.

>>> from sklearn.preprocessing import MultiLabelBinarizer
>>> mlb = MultiLabelBinarizer()
>>> mlb.fit_transform([[0, 2], [1]])  # 0 = comedy, 1 = romantic, 2 = horror
array([[1, 0, 1],
       [0, 1, 0]])
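
As a side note, the binarizer also accepts the genre names themselves; the column order it chose is stored in its classes_ attribute. A small sketch with the same three genres:

>>> mlb.fit_transform([['comedy', 'horror'], ['romantic']])
array([[1, 1, 0],
       [0, 0, 1]])
>>> mlb.classes_
array(['comedy', 'horror', 'romantic'], dtype=object)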

I initially considered using a softmax layer as my output layer, but since a movie can have multiple genre labels, what should my output layer be?

Here is a simple Keras example I would suggest:

>>> from keras.models import Sequential
>>> from keras.layers import Dense, Activation

>>> model = Sequential([
...     Dense(32, input_dim=784),
...     Activation('relu'),
...     Dense(10),               # one output unit per genre label
...     Activation('sigmoid'),   # independent probability for each label
... ])
>>> model.compile(optimizer='rmsprop', loss='binary_crossentropy')
>>> model.fit(X_train, y_train)

Refer to this for more info. I used sigmoid because it is better suited to multilabel classification: each output is an independent probability for its label rather than part of a single distribution over all labels.
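
At prediction time, since each sigmoid output is an independent probability, you recover the predicted genre set by thresholding each output separately. A minimal sketch, assuming a 0.5 cutoff (a tunable choice), hypothetical test data X_test, and that mlb is the binarizer fitted above on the full label set:

>>> probs = model.predict(X_test)      # shape (n_samples, 10), one probability per label
>>> preds = (probs > 0.5).astype(int)  # threshold each label independently
>>> mlb.inverse_transform(preds)       # map the binary rows back to genre tuples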

Hima Varsha
  • Categorical crossentropy doesn't make sense here; it should be a sum of binary crossentropies, because the output is not a probability distribution over the labels but an individual probability for every label. – Jan van der Vegt Jan 10 '17 at 09:58
  • One-hot encoding is also not the correct term: that would mean exactly one 1 and the rest zeros, while in this case you have a binary vector where each entry is either a zero or a one. – Jan van der Vegt Jan 10 '17 at 09:59

In the case where labels can occur independently of each other, you can use a sigmoid activation for every class at the output layer, and the sum of the normal binary crossentropies as the loss function. The target is then simply a binary vector: for each label, a 1 if the sample includes that label and a 0 if not.
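
To make that concrete, here is a small hand computation of that loss for one sample with three labels (the numbers are made up for illustration; Keras' binary_crossentropy computes the same quantity, averaged over labels rather than summed):

>>> import numpy as np
>>> y_true = np.array([[1, 0, 1]])        # sample carries labels 0 and 2
>>> y_prob = np.array([[0.9, 0.2, 0.7]])  # sigmoid outputs of the network
>>> bce = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
>>> print(round(float(bce.sum(axis=1).mean()), 4))  # sum over labels, mean over samples
0.6852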

Jan van der Vegt

Your class labels are not mutually exclusive, which means the label of each sample won't be in one-hot-encoding format: each target vector may contain multiple ones. In such cases you shouldn't use softmax as the output layer. Instead, use a sigmoid activation function for each neuron in the last layer. Suppose you have ten labels; for a typical movie, any subset of them may be active, so the last layer should be a dense layer with ten sigmoid activations. You can see here, which may help you.
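
A minimal sketch of such a last layer (the input size and hidden width are assumptions for illustration):

>>> from keras.models import Sequential
>>> from keras.layers import Dense

>>> model = Sequential([
...     Dense(64, activation='relu', input_dim=300),
...     Dense(10, activation='sigmoid'),  # ten labels, each activated independently
... ])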

And as a side answer: the cost function you use should be categorical, because you have different categories. With all respect and thanks to the other answer, I think you should use a categorical cost function; otherwise the code won't work, because your output is actually a matrix of samples and not a vector of samples.

Green Falcon