Content

Author

Stijn Masschelein

Introduction

This section contains a written introduction to R and RStudio. The goal is to get you up and running with the main tools for analysing data and reporting results in this unit. The script I use for this introduction is also available, with some explanations. The code in that script is more advanced, so treat it as a useful resource rather than something you should be able to write at the start of the semester.

Slides

These are the lecture slides for the first half of the semester.

The first set of slides introduces a number of practical issues around the structure, goals, and assignments for the unit. I also demonstrate what research looks like with an example from executive compensation. You can find the data on LMS.[1]

The second set of slides introduces the notion of simulations and simulated data. Simulations are a way to make an abstract theory more concrete and to test our intuition about statistical tests. This is exactly what we are going to do in this lecture, where the theory is a matching theory of firms and CEOs. Finally, we will test whether there is a relation between pay-for-performance and the size of the firm.
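To make the idea of testing a statistical tool on simulated data concrete, here is a minimal sketch. It is written in Python for illustration only (the unit itself uses R), and it does not implement the matching theory from the lecture. It assumes a simple linear relation with a known slope, simulates many datasets, and checks whether an ordinary least squares estimate recovers that slope on average. The function names, the sample size, and the parameter values are all hypothetical.

```python
import random


def simulate(n, slope, noise_sd, rng):
    """Simulate y = slope * x + noise, with x drawn from a standard normal."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [slope * xi + rng.gauss(0, noise_sd) for xi in x]
    return x, y


def ols_slope(x, y):
    """Slope of a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Assumed values, chosen only for illustration.
TRUE_SLOPE = 0.5
rng = random.Random(1)  # fixed seed so the run is reproducible

# Repeat the simulation many times and estimate the slope each time.
estimates = [ols_slope(*simulate(200, TRUE_SLOPE, 1.0, rng)) for _ in range(500)]
mean_estimate = sum(estimates) / len(estimates)

print("true slope:", TRUE_SLOPE)
print("average estimate:", round(mean_estimate, 3))
```

Because the true slope is set by hand, any systematic gap between it and the average estimate points to a problem with the estimator or the test rather than with the data.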

As background reading, I have also made a more detailed explanation of the matching theory of CEO compensation. It reinforces the value of knowing how to simulate some data from a theory.

The third set of slides looks at the issue of when and how to control (and sometimes not control) for additional effects. It’s complicated!

The fourth and fifth sets of slides basically give up on trying to control for everything. The goal is to focus on the research design, i.e. to find a situation where we can be reasonably sure that our research question is answerable. I will focus my attention on event studies, their bigger (but slower) brother difference-in-differences, and instrumental variables.

Freaky Friday or Friday Earnings Announcements Are Weird

For the remainder of the unit, we will change the mode of teaching. The remaining parts are best seen as case studies of some specific topics. The first case study is an attempt at replicating the main results of DellaVigna and Pollet (2009) from scratch. The goal is twofold. First, the study is an event study of the market reaction to the release of news, which is the workhorse study design for many finance studies. Second, the study requires different sources of data, so it is a good exercise to demonstrate how to manage data in a larger project. Start from the introduction to the replication and go from there.

Machine Learning

This is also a two-parter. The first part aims to make you aware of when machine learning tools are useful. The second part is a long R implementation of one of the machine learning techniques that is most closely related to linear regression. It is mainly there to give you the code as a starting point.

Generalised Linear Models

Here I introduce the use of GLMs for discrete outcomes. As before, I first emphasise why the regular linear model works reasonably well for a lot of typical applications and when GLMs are more appropriate. Maybe surprisingly, there is a clear link with the machine learning arguments.

Generated Variables

This section looks at a diverse set of methods that all have one thing in common: they are often a combination of two or more regression steps. I show how that might affect the uncertainty estimates and how the bootstrap can help to get the uncertainty estimates correct.
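The following is a minimal sketch of how the bootstrap can be wrapped around a two-step procedure. It is written in Python for illustration only (the unit itself uses R), and the data-generating process, variable names, and parameter values are hypothetical rather than taken from the section. Step 1 regresses x on z and keeps the residual as a generated variable; step 2 regresses y on that residual. The key point is that each bootstrap sample reruns both steps, so the uncertainty from the first step carries through to the final estimate.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_step(data):
    """Step 1: residual of x on z. Step 2: slope of y on that residual."""
    z, x, y = zip(*data)
    intercept, slope = ols(z, x)
    generated = [xi - intercept - slope * zi for xi, zi in zip(x, z)]
    return ols(generated, y)[1]


# Hypothetical data: y depends on the part of x that z does not explain.
rng = random.Random(42)  # fixed seed so the run is reproducible
data = []
for _ in range(300):
    z = rng.gauss(0, 1)
    x = 0.8 * z + rng.gauss(0, 1)
    y = 0.5 * (x - 0.8 * z) + rng.gauss(0, 1)
    data.append((z, x, y))

estimate = two_step(data)

# Resample whole observations with replacement and rerun BOTH steps.
boot = [two_step(rng.choices(data, k=len(data))) for _ in range(500)]
mean_boot = sum(boot) / len(boot)
boot_se = (sum((b - mean_boot) ** 2 for b in boot) / (len(boot) - 1)) ** 0.5

print("two-step estimate:", round(estimate, 3))
print("bootstrap standard error:", round(boot_se, 3))
```

Resampling only the second step would treat the generated variable as if it were observed without error, which is the situation the bootstrap over both steps is meant to avoid.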

References

DellaVigna, Stefano, and Joshua M. Pollet. 2009. “Investor Inattention and Friday Earnings Announcements.” The Journal of Finance 64 (2): 709–49. https://doi.org/10.1111/j.1540-6261.2009.01447.x.

Footnotes

  1. Or download it yourself with the introduction script.