I am writing a TensorFlow program that categorizes records from a heavily skewed dataset into two categories. One category is represented at roughly 37x the rate of the other:
Category 0: 800
Category 1: 30000
The label for each record is a one-hot vector of length 2 representing the two categories. A plain cross entropy cost doesn't work well here, since the model can reach very high accuracy (30000/30800 ≈ 97%) simply by rating everything as category 1.
I then made a balanced dataset that has the following data:
Category 0: 800
Category 1: 800
Training on this balanced dataset reaches about 67% accuracy with softmax, but there is a real-world cost to falsely categorizing a category 1 item as category 0, and that cost balloons when the model starts mislabeling many more items as category 0 than before.
I wanted to fix this by making the cost equation penalize false categorization into category 0 more heavily than other mistakes. However, my cost equation does not produce results any different from the original cross entropy equation. I think I programmed it wrong, but I am unsure where the mistake is.
In theory, the skewed equation should add 1 to the normal cost whenever a record's label is [0,1] and the prediction falls on the [1,0] side of the 0.5 cutoff, which represents a category 1 item miscategorized as category 0.
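To make the intent concrete, here is a minimal NumPy sketch of the behavior I am after (toy numbers; I read my own rule as adding 1 for each such record in the batch):

import numpy as np

# Toy batch: one-hot labels and softmax-style predictions.
labels = np.array([[0., 1.],     # a category 1 item
                   [1., 0.]])    # a category 0 item
preds = np.array([[0.9, 0.1],    # category 1 item pushed into category 0 -> penalize
                  [0.8, 0.2]])   # category 0 item predicted as category 0 -> fine

cross_entropy = -np.sum(labels * np.log(np.clip(preds, 1e-10, 1.0)))

# +1 for each record whose label is [0, 1] but whose prediction falls on
# the [1, 0] side of the 0.5 cutoff.
is_category_1 = labels[:, 1] >= 0.5
predicted_as_0 = preds[:, 1] < 0.5
penalty = float(np.sum(is_category_1 & predicted_as_0))

skewed_cost = cross_entropy + penalty  # here: cross entropy + 1.0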
The skewed cost equation is in the branch where FLAGS.cost == 'skewed':
import tensorflow as tf

# data_length and FLAGS are defined elsewhere in the full program.
x = tf.placeholder(tf.float32, shape=[None, data_length])
W = tf.Variable(tf.zeros([data_length, 2]))
b = tf.Variable(tf.zeros([2]))
y = tf.nn.softmax(tf.matmul(x, W) + b)            # predicted class probabilities
y_ = tf.placeholder(tf.float32, shape=[None, 2])  # one-hot labels
if FLAGS.cost == 'cross_entropy':
    # Plain cross entropy, clipped to avoid log(0).
    cost = -tf.reduce_sum(y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0)))
elif FLAGS.cost == 'skewed':
    positive_cutoff = tf.constant([0.5, 0.5])
    desired_tensor = tf.constant([False, True])
    true_tensor = tf.fill(tf.pack([2, tf.shape(y)[0], 2]), True)
    # Cross entropy plus a 0/1 term that is intended to fire when a
    # category 1 item ([0,1] label) is predicted as category 0.
    cost = -tf.reduce_sum(y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0))) + tf.to_float(
        tf.reduce_all(
            tf.logical_and(
                tf.pack([
                    tf.logical_and(tf.greater_equal(y_, positive_cutoff), desired_tensor),
                    tf.logical_and(tf.less_equal(y, positive_cutoff), desired_tensor)
                ]),  # shape [2, batch, 2]
                true_tensor  # shape [2, batch, 2]
            )  # shape [2, batch, 2]
        )  # scalar: True only if the condition holds for every element
    )
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
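For reference, the penalty term can also be evaluated on its own to check whether it ever becomes nonzero (a debugging sketch; batch_xs and batch_ys are hypothetical stand-ins for however a batch of data is actually fed):

# Debugging sketch: run just the boolean penalty term, separately from the
# cross entropy, and print its value for one batch.
penalty_term = tf.to_float(
    tf.reduce_all(
        tf.logical_and(
            tf.pack([
                tf.logical_and(tf.greater_equal(y_, positive_cutoff), desired_tensor),
                tf.logical_and(tf.less_equal(y, positive_cutoff), desired_tensor)
            ]),
            true_tensor)))

sess = tf.Session()
sess.run(tf.initialize_all_variables())
# batch_xs / batch_ys stand in for a real batch of inputs and labels.
print(sess.run(penalty_term, feed_dict={x: batch_xs, y_: batch_ys}))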