I am trying to understand the purpose of a 3rd split in the form of a validation dataset. I am not necessarily talking about cross-validation here.
In the first scenario below, it appears that the model is overfitting to the training dataset:
Train dataset {acc: 97%, loss: 0.07}
Test dataset {acc: 90%, loss: 8.02}
However, in this second scenario the results appear much more balanced:
Train dataset {acc: 95%, loss: 1.14}
Test dataset {acc: 93%, loss: 1.83}
Do I need validation data if my train and test accuracy/loss are consistent? Is the purpose of setting aside a validation split of 10% to ensure this kind of balance before evaluating the model on the test set? What does it prove?
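For concreteness, this is roughly the workflow I have in mind (a minimal Keras sketch; the architecture and the x_train/y_train/x_test/y_test arrays are placeholders, not my actual data or model):

```python
import numpy as np
from tensorflow import keras

# Placeholder data -- stand-ins for the real dataset
x_train, y_train = np.random.rand(1000, 20), np.random.randint(0, 2, 1000)
x_test, y_test = np.random.rand(200, 20), np.random.randint(0, 2, 200)

# A simple binary classifier (architecture is illustrative only)
model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Hold out 10% of the training data as a validation split;
# Keras then reports val_loss/val_accuracy after every epoch
model.fit(x_train, y_train, epochs=10, validation_split=0.1, verbose=0)

# Final check against the untouched test set
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f"test accuracy: {test_acc:.2%}, test loss: {test_loss:.2f}")
```

In other words, the validation split is only consulted during `fit()` (per-epoch monitoring and tuning), while `evaluate()` on the test set happens once at the end.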
`model.evaluate()` after every training when tuning? – Kermit Jun 16 '20 at 01:49