How does Keras calculate accuracy from the classwise probabilities? Say, for example, we have 100 samples in the test set that can belong to one of two classes. We also have a list of the classwise probabilities. What threshold does Keras use to assign a sample to either of the two classes?
-
Are you using model.evaluate in Keras? – Hima Varsha Oct 07 '16 at 08:15
-
Yes, I am using model.evaluate. More specifically, model.evaluate_generator. – pseudomonas Oct 07 '16 at 10:10
-
http://datascience.stackexchange.com/questions/13920/accuracy-doesnt-match-in-keras/14500#14500 – Hima Varsha Oct 13 '16 at 11:47
-
Possibly related @SO: How does Keras evaluate the accuracy? – desertnaut Jul 03 '18 at 14:52
1 Answer
For binary classification, the code for accuracy metric is:
K.mean(K.equal(y_true, K.round(y_pred)))
which suggests that 0.5 is the threshold used to distinguish between classes. y_true should be 0 or 1 here, not one-hot encoded (see the comments below).
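As a minimal sketch of that formula (using NumPy in place of the Keras backend, with made-up labels and probabilities), the rounding at 0.5 works like this:

```python
import numpy as np

# Hypothetical binary labels and sigmoid outputs for 4 samples
y_true = np.array([0., 1., 1., 0.])
y_pred = np.array([0.3, 0.8, 0.4, 0.2])

# NumPy equivalent of K.mean(K.equal(y_true, K.round(y_pred)))
binary_acc = np.mean(np.equal(y_true, np.round(y_pred)))
print(binary_acc)  # 0.75: the 0.4 prediction rounds to 0, mismatching its label 1
```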
It's a bit different for categorical classification:
K.mean(K.equal(K.argmax(y_true, axis=-1), K.argmax(y_pred, axis=-1)))
which means "how often the predictions have their maximum in the same spot as the true values".
There is also an option for top-k categorical accuracy, which is similar to the one above but calculates how often the target class falls within the top-k predictions.
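A sketch of the top-k variant (a hypothetical NumPy re-implementation, not the actual backend code; note that ties in the scores are broken by index order here):

```python
import numpy as np

def top_k_categorical_accuracy(y_true, y_pred, k=2):
    """Fraction of samples whose true class is among the k highest-scored
    predictions (NumPy sketch of the Keras metric)."""
    true_classes = np.argmax(y_true, axis=-1)
    # Indices of the k largest predictions per sample
    top_k = np.argsort(y_pred, axis=-1)[:, -k:]
    return np.mean([t in row for t, row in zip(true_classes, top_k)])

y_true = np.array([[1, 0, 0], [0, 1, 0]])
y_pred = np.array([[0.2, 0.5, 0.3],   # true class 0 ranks 3rd -> miss for k=2
                   [0.3, 0.4, 0.3]])  # true class 1 ranks 1st -> hit
print(top_k_categorical_accuracy(y_true, y_pred, k=2))  # 0.5
```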

Ethan

Mikhail Yurasov
-
Thank you for the answer. Does that mean that even for binary classification, the labels need to be one-hot encoded? – pseudomonas Mar 20 '17 at 05:02
-
@Raghuram No, for binary classification you just need 0 or 1 as the class; there is no need to one-hot encode. K.mean(K.equal(y_true, K.round(y_pred))) compares two float values per sample, so each label has to be 0 or 1, not [0,1] or [1,0]. – Divyanshu Kalra Jul 04 '17 at 20:13
-
For a multi-class problem (with more than two classes), is there a difference between using "accuracy" vs "categorical_accuracy"? – Quetzalcoatl Nov 06 '18 at 20:03
-
And just in case: if the classes are mutually exclusive, use sparse_categorical_accuracy instead of categorical_accuracy; this usually improves the outputs. The difference is discussed here. – Noir Dec 10 '19 at 19:51
-
@mikhail - in my case my GT labels are [1 0 0 0 0 1] and the predicted values are generally [0.23 0.34 0.45 0.22 0.10 0.9] ... basically only the last one matches, but the rest are counted as matches because of the threshold, artificially inflating the results. Any suggestions on what other metric can be used here? – Vikram Murthy Apr 16 '20 at 07:04
-
What is K? Because if it's supposed to be keras, I get module 'tensorflow.keras' has no attribute 'round'. – Jack M Dec 05 '20 at 19:29