Which of the following two sets of steps is the correct one when creating a predictive model?
Option 1:
First eliminate the most obviously bad predictors and preprocess the remaining ones if needed. Then train various models with cross-validation and pick the few best ones. Identify the top predictors each of those models has used, retrain each model with only its own top predictors, and evaluate accuracy again with cross-validation. Finally, pick the best model, train it on the full training set using its key predictors, and use it to predict the test set.
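For concreteness, the steps of Option 1 could be sketched as follows. This is only a rough illustration using scikit-learn: the synthetic data from `make_classification`, the three candidate models, the shortlist size of two, and `TOP_K = 5` are all assumptions made for the example and are not part of the question.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data (assumption): 20 predictors, 5 of them informative.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical candidate models; each exposes feature_importances_.
candidates = {
    "rf": RandomForestClassifier(n_estimators=50, random_state=0),
    "et": ExtraTreesClassifier(n_estimators=50, random_state=0),
    "gb": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

# Cross-validate every candidate on all predictors and keep the best two.
full_cv = {
    name: cross_val_score(model, X_train, y_train, cv=5).mean()
    for name, model in candidates.items()
}
shortlist = sorted(full_cv, key=full_cv.get, reverse=True)[:2]

# For each shortlisted model, find its top predictors and re-run CV on them.
TOP_K = 5  # assumed number of key predictors to keep
top_predictors, reduced_cv = {}, {}
for name in shortlist:
    fitted = clone(candidates[name]).fit(X_train, y_train)
    ranked = np.argsort(fitted.feature_importances_)[::-1]
    top_predictors[name] = ranked[:TOP_K]
    reduced_cv[name] = cross_val_score(
        clone(candidates[name]), X_train[:, top_predictors[name]], y_train, cv=5
    ).mean()

# Pick the best reduced model, refit on the full training set, score the test set.
best = max(reduced_cv, key=reduced_cv.get)
cols = top_predictors[best]
final_model = clone(candidates[best]).fit(X_train[:, cols], y_train)
test_accuracy = final_model.score(X_test[:, cols], y_test)
print(best, sorted(cols.tolist()), round(test_accuracy, 3))
```

In this sketch the top predictors are chosen using the whole training set before the second round of cross-validation, so the second accuracy estimate is computed on data that already influenced the selection and may be optimistic.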
Option 2:
First eliminate the most obviously bad predictors, then preprocess the remaining ones if needed. Then use a feature selection technique such as recursive feature elimination (e.g. RFE with rf) with cross-validation to identify the ideal number of key predictors and which predictors they are. Then train different model types with cross-validation and see which one gives the best accuracy with the top predictors identified earlier. Finally, train the best of those models again with those predictors on the full training set and use it to predict the test set.
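The steps of Option 2 could be sketched in the same way. Again this is only an illustration under assumptions: it uses scikit-learn's `RFECV` with a random forest as a stand-in for "RFE with rf", synthetic data from `make_classification`, and two arbitrarily chosen candidate models.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data (assumption): 20 predictors, 5 of them informative.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Recursive feature elimination with cross-validation, ranking predictors
# with a random forest; RFECV chooses the number of predictors to keep.
selector = RFECV(
    RandomForestClassifier(n_estimators=30, random_state=0),
    step=2,
    cv=5,
    scoring="accuracy",
)
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Hypothetical candidate model types, compared on the selected predictors only.
candidates = {
    "rf": RandomForestClassifier(n_estimators=50, random_state=0),
    "logreg": LogisticRegression(max_iter=1000),
}
cv_scores = {
    name: cross_val_score(model, X_train_sel, y_train, cv=5).mean()
    for name, model in candidates.items()
}

# Refit the best model on the full training set and score the test set.
best_name = max(cv_scores, key=cv_scores.get)
best_model = clone(candidates[best_name]).fit(X_train_sel, y_train)
test_accuracy = best_model.score(X_test_sel, y_test)
print(selector.n_features_, best_name, round(test_accuracy, 3))
```

As in the description above, the feature selection here runs once on the whole training set and the model comparison then reuses the same data, so the cross-validated scores in `cv_scores` are computed on rows that already informed the selection.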