Model selection refers to choosing a model among many alternatives and is essential for practical modelling. There are several ways to perform model selection. The first is subset selection, where we identify a subset of the predictors and fit a model on the reduced set of variables. The second is shrinkage, where we fit a model with all predictors but, in the estimation procedure, force the coefficients toward zero. Variables whose coefficients reach zero are then dropped from the model, so shrinkage can be viewed as an automatic form of subset selection. We do not cover shrinkage in this course. The third is dimension reduction, where we project the predictors onto a lower-dimensional subspace. The principal component analysis we mentioned in the previous module is an example of this. In this video, we focus on subset selection.

There are several methods for subset selection; we briefly talk about a few of them. In best subset selection, we fit all combinations of the predictor variables and choose the best one. To choose the best model, we can use cross-validation or some other criterion. Notice that the number of combinations to consider grows quickly with the number of predictor variables: with p predictors there are 2 to the p possible models. Except when the number of variables is very small, there are many combinations to consider. Even though it is desirable to examine all combinations of the predictor variables in subset selection, we often end up with too many combinations to evaluate. An alternative is the so-called stepwise selection, which can be viewed as a heuristic method for best subset selection. Instead of considering all combinations, we add or remove one variable at a time, which significantly reduces the number of models we need to evaluate. There are two variants of stepwise selection. In forward stepwise selection, we start with a one-variable model and successively add variables, one at a time. Backward stepwise selection is almost the reverse.
We start with the model containing all variables and successively remove them, one at a time. As in best subset selection, we can use cross-validation or some other criterion to choose the best model at each step.

Now consider an example of best subset selection using housing data. In the original dataset there are 9 predictor variables, so considering all possible combinations of variables would mean 2 to the 9th power, or 512, possible models. To simplify our discussion, let's consider only three predictor variables: SQFT, PARKING.TYPE, and BEDS. There are a total of 2 to the 3rd power, which is 8, combinations to consider. This table lists all 8 combinations of the predictor variables; the first one is the null model containing no variables. The first column shows the variables included in the model, and the second column shows the sum of squared errors on the validation data. We use a 60-40 split as before, so 188 rows are in the training data and 126 rows are in the validation data. The sum of squared errors varies considerably across the models. It can be seen that the model with only SQFT and BEDS gives the lowest sum of squared errors. Based on this validation result, we choose it as the best model.
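The table-building procedure described above can be sketched in code. This is a minimal illustration on synthetic data, since the course's housing dataset is not included here: the variable names SQFT, PARKING.TYPE, and BEDS and the 188/126 train-validation split come from the example, but the response is simulated (depending only on SQFT and BEDS), so the SSE values will not match the table.

```python
from itertools import combinations

import numpy as np

# Synthetic stand-in for the housing data: 314 rows = 188 training + 126 validation.
rng = np.random.default_rng(1)
names = ["SQFT", "PARKING.TYPE", "BEDS"]
X = rng.normal(size=(314, 3))
# Simulated response: depends on SQFT and BEDS only (an assumption for the demo).
y = 2.0 * X[:, 0] + 0.8 * X[:, 2] + rng.normal(scale=0.3, size=314)

X_tr, y_tr = X[:188], y[:188]   # 60% training
X_va, y_va = X[188:], y[188:]   # 40% validation

results = {}
for k in range(len(names) + 1):                    # subset sizes 0..3 -> 2^3 = 8 models
    for cols in combinations(range(len(names)), k):
        idx = list(cols)
        # Fit ordinary least squares (with an intercept) on the training rows.
        A_tr = np.column_stack([np.ones(len(X_tr)), X_tr[:, idx]])
        A_va = np.column_stack([np.ones(len(X_va)), X_va[:, idx]])
        beta, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
        # Score the model by its sum of squared errors on the validation rows.
        sse = float(np.sum((y_va - A_va @ beta) ** 2))
        results[tuple(names[c] for c in idx)] = sse

best = min(results, key=results.get)
print(len(results), "models; best subset:", best)
```

Each of the 8 rows of the table corresponds to one entry of `results`, and the model with the lowest validation SSE is chosen, just as in the video.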
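Forward stepwise selection, mentioned earlier as a heuristic alternative, can be sketched the same way. This is a rough sketch, not the course's implementation: it starts from the null model (whose first step yields the one-variable model described in the video), and the helper names are mine. Instead of 2 to the p models, it evaluates at most p + (p-1) + ... + 1 candidates.

```python
import numpy as np

def validation_sse(X_tr, y_tr, X_va, y_va, cols):
    """Fit least squares (with an intercept) on the given columns and
    return the sum of squared errors on the validation set."""
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr[:, cols]])
    A_va = np.column_stack([np.ones(len(X_va)), X_va[:, cols]])
    beta, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    resid = y_va - A_va @ beta
    return float(resid @ resid)

def forward_stepwise(X_tr, y_tr, X_va, y_va):
    """At each step, add the single variable that most reduces the
    validation SSE; stop when no addition improves the model."""
    remaining = list(range(X_tr.shape[1]))
    chosen = []
    best_sse = validation_sse(X_tr, y_tr, X_va, y_va, [])
    while remaining:
        sse, j = min((validation_sse(X_tr, y_tr, X_va, y_va, chosen + [j]), j)
                     for j in remaining)
        if sse >= best_sse:
            break                       # no candidate improves the model
        best_sse = sse
        chosen.append(j)
        remaining.remove(j)
    return chosen, best_sse

# Synthetic demo: y truly depends on columns 0 and 2 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=300)
chosen, sse = forward_stepwise(X[:180], y[:180], X[180:], y[180:])
print("chosen columns:", sorted(chosen))
```

Backward stepwise selection follows the mirror-image logic: start from the full model and, at each step, drop the variable whose removal most reduces (or least increases) the validation SSE.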