
I'm working with an imbalanced dataset in Keras and would like to give a higher weight to the samples from my minority class. The fit() function has a convenient sample_weight argument; however, because of the size of my data I have to use fit_generator().

fit_generator() has a class_weight argument, which seems useful for this purpose and is already discussed in another question. However, in that question the labels are not one-hot-encoded/categorical, and I could not find out whether class_weight also works for categorical labels.

Can I use the class_weight argument for one-hot-encoded/categorical labels, and if so, how? Or do I have to resort to a custom weighted loss function?

DGIB

1 Answer


For categorical data, it is best to use the sample_weight argument instead of class_weight. This can be done by simply giving all samples of a particular class the same weight (sketched below). sample_weight works for categorical data because it takes a NumPy array with one weight per sample, whereas class_weight takes a dictionary keyed by integer class labels, which won't work for one-hot-encoded labels.

See: Keras sequential model methods

You can either pass a flat (1D) Numpy array with the same length as the input samples (1:1 mapping between weights and samples), or in the case of temporal data, you can pass a 2D array with shape (samples, sequence_length), to apply a different weight to every timestep of every sample. In this case you should make sure to specify sample_weight_mode="temporal" in compile().
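A minimal sketch of the approach described above, assuming a compiled model and one-hot labels in y_train; the variable names, the two-class setup, and the chosen weights are all hypothetical:

    import numpy as np

    # Hypothetical per-class weights: class 1 is the minority, so it gets more weight
    class_weights = {0: 1.0, 1: 3.0}

    # y_train is one-hot-encoded, shape (n_samples, n_classes);
    # recover each sample's integer class and map it to its weight
    class_indices = np.argmax(y_train, axis=1)
    sample_weight = np.array([class_weights[i] for i in class_indices])

    # One weight per sample, so the flat 1D array maps 1:1 onto the inputs
    model.fit(x_train, y_train, sample_weight=sample_weight, epochs=10)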

The other way, as you mentioned, is to use a custom weighted loss function. A detailed discussion of it can be found here.
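If you go the custom-loss route, one common form is a categorical crossentropy scaled by a fixed per-class weight vector. A minimal sketch, assuming softmax outputs and the same hypothetical two-class weights as above:

    import numpy as np
    from keras import backend as K

    def weighted_categorical_crossentropy(weights):
        # weights: 1D array-like with one weight per class
        weights = K.variable(np.asarray(weights, dtype='float32'))

        def loss(y_true, y_pred):
            # Clip predictions to avoid log(0)
            y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon())
            # Per-class crossentropy terms, each scaled by its class weight
            return -K.sum(y_true * K.log(y_pred) * weights, axis=-1)

        return loss

    model.compile(optimizer='adam',
                  loss=weighted_categorical_crossentropy([1.0, 3.0]))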

  • Using sample_weight would indeed be a nice solution; unfortunately, the fit_generator function does not provide this option. – DGIB Aug 21 '17 at 06:38
  • For large data sizes you can use train_on_batch, which has a sample_weight argument, instead of fit_generator (sketched below). Refer to this link from Keras for further details. – Janki Mehta Aug 21 '17 at 07:31
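A minimal sketch of the train_on_batch suggestion from the comment above, assuming a hypothetical batch_generator that yields (x_batch, y_batch) pairs with one-hot labels, and the same class_weights mapping as before:

    import numpy as np

    for epoch in range(n_epochs):
        for x_batch, y_batch in batch_generator():
            # Derive one weight per sample in the batch from its class
            indices = np.argmax(y_batch, axis=1)
            w_batch = np.array([class_weights[i] for i in indices])
            # train_on_batch accepts per-sample weights directly
            model.train_on_batch(x_batch, y_batch, sample_weight=w_batch)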