Theory
In this section, I will show how you can code a prediction model for the return of a company’s stock price based on the returns of a number of peer companies. I loosely follow the approach in Baker and Gelbach (2020). In that paper, peer firms are identified based on the SIC code in the Compustat data, which indicates the industry of a firm’s primary product.
We are using the tidymodels package for the application and I will follow the introduction to tidymodels. The advantage of the tidymodels approach is that you can follow a very similar workflow for machine learning methods other than the one I show here. The code itself is not that important; the goal is to give you a starting point.
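To give a flavour of that workflow, here is a minimal sketch of the generic tidymodels pattern. The data set `returns_data` and the outcome column `ret` are placeholders for illustration, not the actual data used later in this post.

```r
library(tidymodels)

# The generic tidymodels pattern: a model specification, a preprocessing
# recipe, and a workflow that combines the two. Swapping in a different
# model specification leaves the rest of the workflow unchanged.
model_spec <- linear_reg() %>% set_engine("lm")
rec <- recipe(ret ~ ., data = returns_data)  # `returns_data` is a placeholder
wflow <- workflow() %>%
  add_recipe(rec) %>%
  add_model(model_spec)
fitted <- fit(wflow, data = returns_data)
```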
To understand the machine learning approach that we are using, I need to start off with some theory. The fundamental idea is that we want to avoid overfitting the data that we have, so that the model still gives good predictions when we use it on new data. This means that when we estimate our model, we do not want a model that fits the current data as well as possible. We want to regularize the parameters in the model so that we give up a perfect fit in the current sample in exchange for better out-of-sample predictions. For instance, if we use 200 trading days and have 200 peer firms, we can perfectly predict the returns within the sample of 200 days¹ but there are no guarantees that we will get good predictions from that model out-of-sample (i.e. after the earnings announcement).
For the linear model that predicts stock returns based on peers, we will use the elastic net regularizer to bias the estimates within the sample data, making it more likely that the linear model will give good predictions out-of-sample. One way to think about the linear model with peers as predictors is that we are creating a bespoke market index for each firm as a weighted average of its peers’ returns.
A regular linear model estimates the parameters $\beta_j$ by minimizing the sum of squared errors within the sample:

$$
\hat{\beta} = \arg \min_{\beta} \sum_{t=1}^{T} \left( r_t - \beta_0 - \sum_{j=1}^{J} \beta_j r_{jt} \right)^2
$$

where $r_t$ is the firm’s return on trading day $t$ and $r_{jt}$ is the return of peer $j$ on that day. That is, we want to find estimates that give the best possible fit in the data. The regularizer puts a penalty on bigger absolute values for the $\beta_j$’s:

$$
\hat{\beta} = \arg \min_{\beta} \sum_{t=1}^{T} \left( r_t - \beta_0 - \sum_{j=1}^{J} \beta_j r_{jt} \right)^2 + \lambda \left( \alpha \sum_{j=1}^{J} \lvert \beta_j \rvert + \frac{1 - \alpha}{2} \sum_{j=1}^{J} \beta_j^2 \right)
$$

The size of the penalty is given by the parameters $\lambda$ and $\alpha$: $\lambda$ controls the overall strength of the penalty, while $\alpha$ determines the mix between the lasso part (the absolute values) and the ridge part (the squared values).
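In tidymodels, this estimator is available through the glmnet engine. As a minimal sketch: in parsnip’s `linear_reg()`, the `penalty` argument corresponds to $\lambda$ and `mixture` to $\alpha$, and both are marked for tuning because we do not yet know good values for them.

```r
library(tidymodels)

# Elastic net specification: `penalty` is lambda (overall penalty size) and
# `mixture` is alpha (the lasso/ridge mix); tune() flags both as values to
# be chosen later via resampling.
enet_spec <- linear_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet")
```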
The final step is that we need to choose the right values for $\lambda$ and $\alpha$. We cannot pick them based on how well the model fits the estimation sample, because the penalty can only make the in-sample fit worse.
The key insight is that if we care about a prediction task, we can use some of the data and pretend it is data that we have never seen when we estimate our prediction model. We can then test which model is actually good at predicting on data that it has never seen.
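A minimal sketch of this idea with rsample, one of the tidymodels packages; `firm_days` is a placeholder for one firm’s daily returns together with its peers’ returns over the estimation window.

```r
library(tidymodels)

set.seed(123)
# Hold out part of the estimation window and pretend we have never seen it.
split <- initial_split(firm_days, prop = 0.8)  # 80% to fit, 20% held out
train <- training(split)
test  <- testing(split)

# Cross-validation repeats the same idea within the training data: each fold
# is held out once while the model is fit on the remaining folds, so we can
# compare candidate values of the penalty parameters.
folds <- vfold_cv(train, v = 5)
```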
We do need a measure to evaluate the quality of the predictions. A common choice is the Root Mean Squared Error (RMSE), which is defined as the square root of the mean squared difference between the actual values of the outcome and the predictions for the outcome:

$$
\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}
$$
We will use the RMSE to evaluate the predictions out-of-sample. The RMSE is similar to the first equation, where we choose the $\beta_j$’s to minimize the squared prediction errors; the difference is that we now compute the errors on data that the model has never seen.
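In tidymodels the calculation is handled by the yardstick package. A short sketch, where `predictions` is a placeholder tibble holding the actual out-of-sample returns (`ret`) and the model’s predictions (`.pred`):

```r
library(yardstick)

# RMSE of the out-of-sample predictions via yardstick.
rmse(predictions, truth = ret, estimate = .pred)

# The same number by hand: the square root of the mean squared
# prediction error.
sqrt(mean((predictions$ret - predictions$.pred)^2))
```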
References

Baker, Andrew C., and Jonah B. Gelbach. 2020. “Machine Learning and Predicted Returns for Event Studies in Securities Litigation.” Journal of Law, Finance, and Accounting 5 (2): 231–72.
Footnotes
1. It’s a system of linear equations with 200 unknown parameters (the $\beta_j$’s) and 200 observations (the trading days).↩︎