Control Variables

Stijn Masschelein

Causal Graphs

For more information see Chapter 8 in Huntington-Klein (2021).

An Example of a Causal Graph

brand_capital InfoEnvironment InfoEnvironment CreditRating CreditRating InfoEnvironment->CreditRating FutureCashFlow FutureCashFlow FutureCashFlow->CreditRating BrandCapital BrandCapital BrandCapital->InfoEnvironment BrandCapital->FutureCashFlow

Difference with equilibrium models

Differences

  • All the qualitative information about causal relations is in the graph.
  • The equilibrium model directly gives the relation between the variables of interest.

e.g.: Signaling model

donations Performance Performance Donation Donation Performance->Donation Return Return Donation->Return

Assignment: CSR report

measurement_error Performance Performance CSR_report CSR_report Performance->CSR_report Scandals Scandals Performance->Scandals Return Return CSR_report->Return Observed_Report Observed_Report CSR_report->Observed_Report

Measurement error and control variables

Causal Graph

measurement_error CorporateGovernance CorporateGovernance Performance Performance CorporateGovernance->Performance TechIndustry TechIndustry TechIndustry->Performance

Simulation

set.seed(230383)
N <- 1000
ds <- tibble(CG = runif(N, 0, 10),
             TI = rbinom(N, 1, .25)) %>%
  mutate(Performance =
           rnorm(N, CG * .15 + TI * 10, 5))
lm1 <- lm(Performance ~ CG, data = ds)
lm2 <- lm(Performance ~ CG + TI, data = ds)
gof_omit <- "Adj|IC|Log|Pseudo|RMSE"
stars <- c('*' = .1, '**' = .05, '***' = .01)
msummary(list(lm1, lm2), stars = stars,
         gof_omit = gof_omit, output = "markdown")
(1) (2)
(Intercept) 2.976*** 0.036
(0.408) (0.310)
CG 0.081 0.130**
(0.073) (0.052)
TI 10.433***
(0.344)
Num.Obs. 1000 1000
R2 0.001 0.480

Note: ^^ * p < 0.1, ** p < 0.05, *** p < 0.01

Confounders and control variables

Causal Graph

confounder CorporateGovernance CorporateGovernance Performance Performance CorporateGovernance->Performance TechIndustry TechIndustry TechIndustry->CorporateGovernance TechIndustry->Performance

Simulation

N <- 1000
ds <- tibble(TI = rbinom(N, 1, .25)) %>%
  mutate(CG = rnorm(N, .5 - TI, .2),
         Performance = rnorm(N, TI + 0 * CG, 1))
lm1 <- lm(Performance ~ CG, data = ds)
lm2 <- lm(Performance ~ CG + TI, data = ds)
msummary(list(lm1, lm2), stars = stars,
         gof_omit = gof_omit, output = "markdown")
(1) (2)
(Intercept) 0.473*** 0.002
(0.036) (0.087)
CG -0.945*** -0.090
(0.067) (0.159)
TI 1.018***
(0.172)
Num.Obs. 1000 1000
R2 0.165 0.193

Note: ^^ * p < 0.1, ** p < 0.05, *** p < 0.01

Fixed effects as a special case

Definition

Effects that are the same for every industry, year, firm, or individual can be adjusted for by using fixed effects.

Benefits

We do not need to measure the specific variables and can just use indicators variables for each category (e.g. for each different industry).

See more in chapter 16 of Huntington-Klein (2021)

Fixed effects (for industry)

confounder CorporateGovernance CorporateGovernance Performance Performance CorporateGovernance->Performance Industry Industry Industry->CorporateGovernance Industry->Performance
Nind <- 20
N <- 5000
di <- tibble(
  ind_number = 1:Nind,
  ind_CG = rnorm(Nind, 0, 1),
  ind_performance = rnorm(Nind, 0, 1)
)
ds <- tibble(
    ind_number = sample(1:Nind, N, replace = TRUE)) %>%
  left_join(
    di, by = "ind_number") %>%
  mutate(
    CG = rnorm(N, .5 + ind_CG, .2),
    Performance = rnorm(N, 0 * CG + ind_performance, 1)
  )
glimpse(di, width = 50)
Rows: 20
Columns: 3
$ ind_number      <int> 1, 2, 3, 4, 5, 6, 7, 8, …
$ ind_CG          <dbl> 0.23567083, -0.34180999,…
$ ind_performance <dbl> 1.1335103, 1.2873377, 0.…
glimpse(ds, width = 50)
Rows: 5,000
Columns: 5
$ ind_number      <int> 9, 12, 5, 7, 6, 8, 15, 1…
$ ind_CG          <dbl> 1.91243572, 0.16031769, …
$ ind_performance <dbl> 0.1773941, -0.1250858, 0…
$ CG              <dbl> 2.3076069, 0.6604549, 1.…
$ Performance     <dbl> 0.09219279, -0.37244401,…
lm1 <- lm(Performance ~ CG, data = ds)
lm2 <- lm(Performance ~ CG + factor(ind_number), data = ds)
library(fixest)
fe <- feols(Performance ~ CG | ind_number, data = ds)
msummary(list(lm1, lm2, fe), gof_omit = gof_omit, stars = stars, output = "markdown")
(1) (2) (3)
(Intercept) 0.490*** 1.054***
(0.019) (0.080)
CG -0.076*** 0.030 0.030
(0.018) (0.071) (0.061)
factor(ind_number)2 0.207**
(0.097)
factor(ind_number)3 -0.490***
(0.090)
factor(ind_number)4 -0.372**
(0.161)
factor(ind_number)5 -0.410***
(0.099)
factor(ind_number)6 -1.907***
(0.169)
factor(ind_number)7 -0.083
(0.134)
factor(ind_number)8 -1.191***
(0.178)
factor(ind_number)9 -0.935***
(0.148)
factor(ind_number)10 -1.712***
(0.087)
factor(ind_number)11 -1.162***
(0.122)
factor(ind_number)12 -1.118***
(0.090)
factor(ind_number)13 0.765***
(0.091)
factor(ind_number)14 -0.564***
(0.104)
factor(ind_number)15 0.730***
(0.129)
factor(ind_number)16 0.689***
(0.158)
factor(ind_number)17 -2.056***
(0.098)
factor(ind_number)18 -0.503***
(0.091)
factor(ind_number)19 -1.907***
(0.093)
factor(ind_number)20 0.608***
(0.086)
Num.Obs. 5000 5000 5000
R2 0.003 0.443 0.443
R2 Within 0.000
Std.Errors by: ind_number
FE: ind_number X

Note: ^^ * p < 0.1, ** p < 0.05, *** p < 0.01

Nind <- 20
N <- 5000
correl <- -0.5
di <- tibble(
    ind_number = 1:Nind,
    ind_CG = rnorm(Nind, 0, 1)) %>%
  mutate(
    ind_performance = sqrt(1 - correl^2) * rnorm(Nind, 0, 1) + correl * ind_CG)
ds <- tibble(
    ind_number = sample(1:Nind, N, replace = TRUE)) %>%
  left_join(
    di, by = "ind_number") %>%
  mutate(
    CG = rnorm(N, .5 + ind_CG, .2),
    Performance = rnorm(N, 0 * CG + ind_performance, 1)
  )
glimpse(di, width = 50)
Rows: 20
Columns: 3
$ ind_number      <int> 1, 2, 3, 4, 5, 6, 7, 8, …
$ ind_CG          <dbl> -0.82999044, 0.44908313,…
$ ind_performance <dbl> -1.12284290, -0.57326559…
glimpse(ds, width = 50)
Rows: 5,000
Columns: 5
$ ind_number      <int> 20, 17, 2, 1, 9, 20, 5, …
$ ind_CG          <dbl> -1.1358960, 0.4833522, 0…
$ ind_performance <dbl> 1.0252155, -0.6131181, -…
$ CG              <dbl> -0.79273388, 1.44367931,…
$ Performance     <dbl> 1.75927497, -1.39745179,…
lm1 <- lm(Performance ~ CG, data = ds)
fe <- feols(Performance ~ CG | ind_number, data = ds)
msummary(list(lm1, fe), gof_omit = gof_omit, stars = stars)
 (1)   (2)
(Intercept) −0.033
(0.022)
CG −0.279*** −0.012
(0.015) (0.046)
Num.Obs. 5000 5000
R2 0.067 0.321
R2 Within 0.000
RMSE 1.17 1.00
Std.Errors by: ind_number
FE: ind_number X
* p < 0.1, ** p < 0.05, *** p < 0.01

What do fixed effects do?

Code
fe_plot <-
  ggplot(ds, aes(y = Performance, x = CG)) +
  geom_point()
plot(fe_plot)

Code
fe_colour <-
  ggplot(ds, aes(y = Performance, x = CG,
                colour = factor(ind_number))) +
  geom_point() + theme(legend.position="none") 
plot(fe_colour)

Code
fe_demean <- group_by(ds, ind_number) %>%
  mutate(Performance2 = Performance - mean(Performance),
         CG2 = CG - mean(CG)) %>%
  ggplot(aes(y = Performance2, x = CG2,
             colour = factor(ind_number))) +
  geom_point() + theme(legend.position="none") 
plot(fe_demean)

Speedboat Racing Example (Booth and Yamamura 2017)

  • Mixed-sex and single-sex races determined by lottery (Randomisation)
  • 7 race courses
  • Multiple races in the same month and location
speedboat ave_ability ave_ability ltime ltime ave_ability->ltime mixed_race mixed_race mixed_race->ltime female female female->ave_ability female->ltime course course circumstances circumstances course->circumstances month_location month_location month_location->circumstances circumstances->ltime circumstances->female

Results of Speedboat Races

Code
load(here("data", "booth_yamamura.Rdata"))
table <- as_tibble(table) %>%
  select(p_id, women_dat, time, ltime, mix_ra, course,
         race_id, yrmt_locid)
table_clean <- filter(table, complete.cases(table)) %>%
  select(ltime, women_dat, mix_ra, course, p_id, race_id,
         yrmt_locid)
ltime_reg <- feols(ltime ~ women_dat : mix_ra + mix_ra
                   | course + p_id + yrmt_locid,
                   cluster = "race_id",
                   data = table_clean)
msummary(ltime_reg, gof_omit = gof_omit, stars = stars)
 (1)
mix_ra −0.002***
(0.000)
women_dat × mix_ra 0.007***
(0.001)
Num.Obs. 142346
R2 0.361
R2 Within 0.001
Std.Errors by: race_id
FE: course X
FE: p_id X
FE: yrmt_locid X
* p < 0.1, ** p < 0.05, *** p < 0.01

Colliders and bad controls

Warning

Equilibrium models are very good at incorporating these effects!

Bad Controls, Survival Bias, Selection Bias, Self-Selection Bias

confounder CorporateGovernance CorporateGovernance Performance Performance CorporateGovernance->Performance StockPrice StockPrice Performance->StockPrice
confounder CorporateGovernance CorporateGovernance Performance Performance CorporateGovernance->Performance Survival Survival Performance->Survival
confounder CorporateGovernance CorporateGovernance Performance Performance CorporateGovernance->Performance SP500 SP500 Performance->SP500
confounder CorporateGovernance CorporateGovernance Performance Performance CorporateGovernance->Performance IPO IPO Performance->IPO

Example in the assignment

measurement_error Performance Performance CSR_report CSR_report Performance->CSR_report Scandals Scandals Performance->Scandals Return Return CSR_report->Return Observed_Report Observed_Report CSR_report->Observed_Report

Simulation Bad Control

confounder CorporateGovernance CorporateGovernance Performance Performance CorporateGovernance->Performance MarketReturn MarketReturn CorporateGovernance->MarketReturn Performance->MarketReturn
d <- tibble(corp_gov = rnorm(N, 0, 1)) %>%
  mutate(acc_profit = rnorm(N, corp_gov, sd = 3),
         market_return = rnorm(N, 2 * corp_gov + acc_profit,
                               sd = 3))
lm1 <- lm(acc_profit ~ corp_gov, data = d)
lm2 <- lm(acc_profit ~ corp_gov + market_return, data = d)
msummary(list(lm1, lm2),
         gof_omit = gof_omit, stars = stars)
 (1)   (2)
(Intercept) 0.039 0.012
(0.042) (0.030)
corp_gov 1.000*** −0.489***
(0.042) (0.037)
market_return 0.498***
(0.007)
Num.Obs. 5000 5000
R2 0.101 0.551
RMSE 2.98 2.11
* p < 0.1, ** p < 0.05, *** p < 0.01

Survival Bias

d <- mutate(d, survival = if_else(market_return > 5, 1, 0))
lm1 <- lm(acc_profit ~ corp_gov, data = filter(d, survival == 1))
lm2 <- lm(acc_profit ~ corp_gov * survival, data = d)
msummary(list(lm1, lm2), gof_omit = gof_omit, stars = stars, output = "markdown")
(1) (2)
(Intercept) 3.518*** -0.549***
(0.115) (0.043)
corp_gov -0.137 0.606***
(0.095) (0.045)
survival 4.067***
(0.135)
corp_gov × survival -0.743***
(0.115)
Num.Obs. 853 5000
R2 0.002 0.262

Note: ^^ * p < 0.1, ** p < 0.05, *** p < 0.01

Visualisation of Colliders (and Interactions)

Pitching Template

Pitching Format

  1. Description (Important)
    • Title
    • Research Question
    • Key Paper
    • Motivation
  2. THREE (IDioT) (Important)
    • Idea
    • Data
    • Tools

Pitching Format

  1. Description (Important)
    • Title
    • Research Question
    • Key Paper
    • Motivation
  2. THREE (IDioT) (Important)
    • Idea
    • Data
    • Tools
  1. TWO
    • What’s new?
    • So what?
  2. ONE contribution
  3. Other considerations.

References

Booth, Alison, and Eiji Yamamura. 2017. “Performance in Mixed-Sex and Single-Sex Competitions: What We Can Learn from Speedboat Races in Japan.” The Review of Economics and Statistics 100 (4): 581–93. https://doi.org/10.1162/rest_a_00715.
Huntington-Klein, Nick. 2021. The Effect: An Introduction to Research Design and Causality. First. Boca Raton: Chapman and Hall/CRC. https://doi.org/10.1201/9781003226055.