The question rests on a false assumption. Many people do exactly what you say they "cannot" do.
In fact, the grid search implementation in the widely deployed sklearn package does just that: unless refit=False, it retrains the final model on the entire dataset.
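For illustration, here is a minimal sketch of that behaviour (toy dataset and an arbitrary parameter grid chosen just for the example):

```python
# With the default refit=True, GridSearchCV refits the best
# hyperparameter combination on all of X, y after cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"min_samples_leaf": [1, 5, 10, 20]},
    cv=5,
    refit=True,  # default: retrain the winning model on the entire dataset
)
search.fit(X, y)

print(search.best_params_)     # hyperparameters chosen by cross-validation
print(search.best_estimator_)  # model already retrained on all of X, y
```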
I think this might not be desirable for some hyperparameters, because they are relative to the volume of training data. For instance, consider the min_samples_leaf pre-pruning strategy for a decision tree: with more data, the same value may not prune the way you intended.
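As a small illustration (toy data and arbitrary numbers, just to show the effect), the same min_samples_leaf prunes much more aggressively on a fold-sized subset than on the full dataset:

```python
# The same min_samples_leaf yields very different trees depending on
# how much data is available, so the value tuned on CV folds is not
# necessarily the right value for the full dataset.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, random_state=0)

for n in (200, 2000):  # CV-fold-sized subset vs. the full dataset
    tree = DecisionTreeClassifier(min_samples_leaf=20, random_state=0)
    tree.fit(X[:n], y[:n])
    print(f"n={n}: {tree.get_n_leaves()} leaves, depth {tree.get_depth()}")
```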
But again, most people do in fact retrain on the entire dataset after cross-validation, so that they end up with the best possible model.
Addendum: @NeilSlater notes below that some people perform hold-out on top of CV. In other words, they make a train-test split and perform model selection on the training set only. According to him, they then re-train on the original training split, but not on the test set, and use the test set for a final estimate of model performance. Personally, I see three flaws in this:

(a) it does not solve the problem I mentioned of some hyperparameters depending on the volume of training data, since you are re-training anyway;

(b) when testing many models, I prefer more sophisticated methods such as nested cross-validation (sketched below) so that no data goes to waste;

(c) hold-out is an awful method to infer how a model will generalize when you have little data.
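For what nested cross-validation can look like in practice, here is a rough sketch with sklearn (toy dataset and arbitrary grid, not a prescription): the inner loop does model selection, the outer loop estimates generalization, and the final model is refit once on all the data.

```python
# Nested CV: GridSearchCV (inner loop) selects hyperparameters,
# cross_val_score (outer loop) estimates generalization performance.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

inner = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"min_samples_leaf": [1, 5, 10, 20]},
    cv=5,
)
outer_scores = cross_val_score(inner, X, y, cv=5)  # outer evaluation loop
print(outer_scores.mean())

# For deployment, run the search once more on all the data and keep
# the refitted best estimator.
final_model = inner.fit(X, y).best_estimator_
```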