Descriptive statistics

Author

Stijn Masschelein

Setup

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.1     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.1     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(here)
here() starts at /Users/stijnmasschelein/Dropbox/Teaching/lecturenotes/method_package
library(kableExtra)

Attaching package: 'kableExtra'

The following object is masked from 'package:dplyr':

    group_rows
library(cowplot)

Attaching package: 'cowplot'

The following object is masked from 'package:lubridate':

    stamp
theme_set(theme_cowplot(font_size = 18))
i_am("freaky_friday/descriptive.qmd")
here() starts at /Users/stijnmasschelein/Dropbox/Teaching/lecturenotes/method_package
main <- readRDS(here("data", "freaky_friday", "main.RDS")) %>%
  mutate(group = if_else(weekday == "Fri", "Friday", "Non-Friday"),
         year = year(anndat))
glimpse(main)
Rows: 130,759
Columns: 23
$ ticker       <chr> "A2", "A2", "A2", "A2", "A2", "A2", "AA0G", "AA0H", "AA0H…
$ actual       <dbl> -0.11, -0.11, -0.05, -0.07, -0.10, -0.04, -0.45, 0.01, 0.…
$ pdf          <chr> "D", "D", "D", "D", "D", "D", "P", "D", "D", "D", "D", "D…
$ anndats_act  <date> 2005-01-26, 2005-04-27, 2005-07-27, 2005-10-26, 2006-02-…
$ gvkey        <chr> "001081", "001081", "001081", "001081", "001081", "001081…
$ permno       <dbl> 10560, 10560, 10560, 10560, 10560, 10560, 88784, 10574, 1…
$ cusip        <chr> "00392410", "00392410", "00392410", "00392410", "00392410…
$ rdq          <date> 2005-01-26, 2005-04-27, 2005-07-27, 2005-10-26, 2006-02-…
$ anndat       <date> 2005-01-26, 2005-04-27, 2005-07-27, 2005-10-26, 2006-02-…
$ N            <int> 2, 3, 2, 4, 5, 1, 2, 1, 1, 1, 1, 2, 1, 5, 3, 2, 1, 4, 2, …
$ median       <dbl> -0.08965, -0.05740, -0.03700, -0.10610, -0.07830, -0.0900…
$ mean         <dbl> -0.0896500, -0.0680000, -0.0370000, -0.1005500, -0.076580…
$ mean_days    <dbl> 8.000000, 6.666667, 13.500000, 7.000000, 9.200000, 21.000…
$ car_short    <dbl> 0.036461999, -0.063605082, -0.004176757, -0.014869448, -0…
$ car_long     <dbl> -0.20959886, 0.05661849, -0.18368085, 0.20741889, 0.10569…
$ date_minus5  <date> 2005-01-21, 2005-04-22, 2005-07-22, 2005-10-21, 2006-01-…
$ date         <date> 2005-01-21, 2005-04-22, 2005-07-22, 2005-10-21, 2006-01-…
$ prc          <dbl> 5.89, 4.53, 5.00, 3.26, 4.02, 5.63, 14.50, 11.47, 10.85, …
$ market_value <dbl> 2592630.7, 1993992.8, 2200875.0, 1434970.5, 1769503.5, 24…
$ surprise     <dbl> -0.0034550086, -0.0116114785, -0.0026000000, 0.0110736197…
$ weekday      <ord> Wed, Wed, Wed, Wed, Wed, Thu, Tue, Tue, Tue, Wed, Tue, We…
$ group        <chr> "Non-Friday", "Non-Friday", "Non-Friday", "Non-Friday", "…
$ year         <dbl> 2005, 2005, 2005, 2005, 2006, 2004, 2001, 2002, 2002, 200…

Panel A

The actual calculations are just in the summarise and the first mutate statement. I then calculate the totals and add them to the data with bind_rows. The numerical values get rounded to present them a bit nicer. The double pivot transposes the table so that rows become columns. Finally, I fix the order of the columns and use kable to present the table.

summarise(main, .by = weekday,
          Number = n()) %>%
  mutate(Fraction = Number / sum(Number)) %>%
  bind_rows(summarise(.,
                      across(c(Number, Fraction), sum)) %>%
            mutate(weekday = "All")) %>%
  mutate(Fraction = format(Fraction, digits = 2),
         Number = format(Number)) %>%
  pivot_longer(-weekday) %>%
  pivot_wider(names_from = weekday) %>%
  select(All, Mon, Tue, Wed, Thu, Fri) %>%
  kable(format = "markdown")
All Mon Tue Wed Thu Fri
130759 18006 33409 33529 38532 7283
1.000 0.138 0.256 0.256 0.295 0.056

Overall, my data has more than 10,000 fewer observations but the percentage of announcements per weekday matches the table in the paper pretty closely. That means I did something right.

Panel B: Baseline Sample

panelb <- main %>%
  mutate(month = month(anndat),
         month_quarter = (month - 1) %% 3 + 1) %>%
  select(group, surprise, market_value, year, month_quarter) %>%
  mutate(market_value = market_value / 1e3)

summarise(panelb, .by = group,
          across(c(surprise, market_value, year),
                 list(mean = mean, sd = sd))) %>%
  kable(format = "markdown")
group surprise_mean surprise_sd market_value_mean market_value_sd year_mean year_sd
Non-Friday -0.0006961 0.0254121 3613.660 15717.58 1999.723 3.351327
Friday -0.0028837 0.0336559 3260.164 17823.93 1999.304 3.401054
summarise(panelb,
          across(c(surprise, market_value, year),
                 ~ list(t.test(.x ~ group)))
          ) %>%
  pivot_longer(c(surprise, market_value, year)) %>%
  mutate(
    diff = map_dbl(value, ~ .x$estimate[1] - .x$estimate[2]),
    stderr = map_dbl(value, ~ .x$stderr),
    pvalue = map_dbl(value, ~ .x$p.value)) %>%
  select(-value) %>%
  kable(format = "markdown")
name diff stderr pvalue
surprise -0.0021877 0.0004009 0.0000001
market_value -353.4961883 213.5926395 0.0979635
year -0.4180973 0.0409781 0.0000000

I am not really sure that the statistical test for the month in the quarter makes sense. So I am not going to do these.

summarise(panelb, .by = c(group, month_quarter),
          N = n()) %>%
  mutate(fraction = N/sum(N), .by = group) %>%
  select(-N) %>%
  pivot_wider(names_from = group, values_from = fraction) %>%
  kable(format = "markdown", digits = 3)
month_quarter Non-Friday Friday
1 0.618 0.591
2 0.296 0.303
3 0.086 0.105

Dellavigna and Pollet (2009) also report the same statistic for the a homogeneous sample with at least 10% of the observations on Friday and on non-Fridays. This is an example of asking the question where the variation is coming from in the sample and what the true comparison is that we want to make. In the full sample, we could be comparing firms that report on Friday to firms that do not. In the homogeneous sample, we are more likely to compare years where the same firm reports on Friday and years where they do not. The same question came up when we talked about compliers, always treated and never treated observations in the context of instrumental variables.

homogeneous <- main %>%
  mutate(perc_friday = sum(weekday == "Fri")/n(),
        .by = gvkey) %>%
  filter(perc_friday > .1, perc_friday < .9) %>%
  glimpse()
Rows: 23,329
Columns: 24
$ ticker       <chr> "AA0G", "AA0G", "AA0G", "AA0G", "AA0G", "AA0G", "AA0G", "…
$ actual       <dbl> -0.45, -0.37, -0.63, -0.65, -0.38, -0.36, -0.30, -0.31, -…
$ pdf          <chr> "P", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D…
$ anndats_act  <date> 2001-11-13, 2002-03-15, 2002-05-15, 2002-11-01, 2003-03-…
$ gvkey        <chr> "133724", "133724", "133724", "133724", "133724", "133724…
$ permno       <dbl> 88784, 88784, 88784, 88784, 88784, 88784, 88784, 88784, 8…
$ cusip        <chr> "00724X10", "00724X10", "00724X10", "00724X10", "00724X10…
$ rdq          <date> 2001-11-13, 2002-03-15, 2002-05-15, 2002-11-01, 2003-03-…
$ anndat       <date> 2001-11-13, 2002-03-15, 2002-05-15, 2002-11-01, 2003-03-…
$ N            <int> 2, 2, 1, 1, 3, 2, 2, 4, 5, 7, 5, 1, 1, 1, 1, 4, 3, 3, 2, …
$ median       <dbl> -0.355, -0.545, -0.540, -0.680, -0.630, -0.530, -0.375, -…
$ mean         <dbl> -0.3550000, -0.5450000, -0.5400000, -0.6800000, -0.590000…
$ mean_days    <dbl> 0.000000, 0.000000, 0.000000, 1.000000, 13.333333, 11.500…
$ car_short    <dbl> -0.0134975048, -0.0513011472, 0.0138129292, 0.0357073706,…
$ car_long     <dbl> -0.08669771, 0.24715010, 0.01347160, -0.07084036, -0.0507…
$ date_minus5  <date> 2001-11-08, 2002-03-10, 2002-05-10, 2002-10-27, 2003-03-…
$ date         <date> 2001-11-08, 2002-03-08, 2002-05-10, 2002-10-25, 2003-03-…
$ prc          <dbl> 14.50, 13.10, 12.81, 13.89, 11.50, 13.95, 8.66, 8.89, 9.7…
$ market_value <dbl> 451124.00, 407462.41, 399492.67, 433937.50, 362537.50, 44…
$ surprise     <dbl> -0.0065517241, 0.0133587782, -0.0070257609, 0.0021598272,…
$ weekday      <ord> Tue, Fri, Wed, Fri, Tue, Wed, Tue, Thu, Fri, Thu, Thu, Th…
$ group        <chr> "Non-Friday", "Friday", "Non-Friday", "Friday", "Non-Frid…
$ year         <dbl> 2001, 2002, 2002, 2002, 2003, 2003, 2005, 2005, 2005, 200…
$ perc_friday  <dbl> 0.2222222, 0.2222222, 0.2222222, 0.2222222, 0.2222222, 0.…

I’ll leave the descriptive statistics for the homogeneous sample as an exercise for the reader.

Figure 1

Quantiles

quantiles <- main %>%
  mutate(sign = case_when(surprise > 0 ~ "positive",
                          surprise < 0 ~ "negative",
                          surprise == 0 ~ "zero")) %>%
  mutate(
    quintile = ntile(surprise, 5),
    .by = c(sign, year)) %>%
  mutate(
    quantile = case_when(sign == "positive" ~ 6 + quintile,
                         sign == "negative" ~ quintile,
                         sign == "zero" ~ 6
                         )
  ) %>%
  glimpse()
Rows: 130,759
Columns: 26
$ ticker       <chr> "A2", "A2", "A2", "A2", "A2", "A2", "AA0G", "AA0H", "AA0H…
$ actual       <dbl> -0.11, -0.11, -0.05, -0.07, -0.10, -0.04, -0.45, 0.01, 0.…
$ pdf          <chr> "D", "D", "D", "D", "D", "D", "P", "D", "D", "D", "D", "D…
$ anndats_act  <date> 2005-01-26, 2005-04-27, 2005-07-27, 2005-10-26, 2006-02-…
$ gvkey        <chr> "001081", "001081", "001081", "001081", "001081", "001081…
$ permno       <dbl> 10560, 10560, 10560, 10560, 10560, 10560, 88784, 10574, 1…
$ cusip        <chr> "00392410", "00392410", "00392410", "00392410", "00392410…
$ rdq          <date> 2005-01-26, 2005-04-27, 2005-07-27, 2005-10-26, 2006-02-…
$ anndat       <date> 2005-01-26, 2005-04-27, 2005-07-27, 2005-10-26, 2006-02-…
$ N            <int> 2, 3, 2, 4, 5, 1, 2, 1, 1, 1, 1, 2, 1, 5, 3, 2, 1, 4, 2, …
$ median       <dbl> -0.08965, -0.05740, -0.03700, -0.10610, -0.07830, -0.0900…
$ mean         <dbl> -0.0896500, -0.0680000, -0.0370000, -0.1005500, -0.076580…
$ mean_days    <dbl> 8.000000, 6.666667, 13.500000, 7.000000, 9.200000, 21.000…
$ car_short    <dbl> 0.036461999, -0.063605082, -0.004176757, -0.014869448, -0…
$ car_long     <dbl> -0.20959886, 0.05661849, -0.18368085, 0.20741889, 0.10569…
$ date_minus5  <date> 2005-01-21, 2005-04-22, 2005-07-22, 2005-10-21, 2006-01-…
$ date         <date> 2005-01-21, 2005-04-22, 2005-07-22, 2005-10-21, 2006-01-…
$ prc          <dbl> 5.89, 4.53, 5.00, 3.26, 4.02, 5.63, 14.50, 11.47, 10.85, …
$ market_value <dbl> 2592630.7, 1993992.8, 2200875.0, 1434970.5, 1769503.5, 24…
$ surprise     <dbl> -0.0034550086, -0.0116114785, -0.0026000000, 0.0110736197…
$ weekday      <ord> Wed, Wed, Wed, Wed, Wed, Thu, Tue, Tue, Tue, Wed, Tue, We…
$ group        <chr> "Non-Friday", "Non-Friday", "Non-Friday", "Non-Friday", "…
$ year         <dbl> 2005, 2005, 2005, 2005, 2006, 2004, 2001, 2002, 2002, 200…
$ sign         <chr> "negative", "negative", "negative", "positive", "negative…
$ quintile     <int> 2, 1, 3, 5, 2, 5, 2, 4, 4, 4, 1, 3, 3, 2, 1, 2, 4, 3, 1, …
$ quantile     <dbl> 2, 1, 3, 11, 2, 11, 2, 10, 10, 4, 6, 9, 9, 2, 6, 8, 4, 9,…

This is a quick version of Figure 1a. It can be further cleaned up with better axis labels. It shows the main results from Dellavigna and Pollet (2009) that the market reaction is subdued on Fridays.

ggplot(quantiles,
       aes(y = car_short, x = quantile, group = group, colour = group)) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = .2) +
  stat_summary(fun = mean, geom = "line") +
  scale_color_grey()

This is how I would program Figure 1a and b together. It’s a good example of how using pivot_longer can make your life easier. In this case, if we need to plot multiple similar variables.

quantiles %>%
  pivot_longer(c(car_short, car_long), values_to = "car", names_to = "window") %>%
  mutate(fig_name = if_else(window == "car_short", "Figure 1a", "Figure 1b")) %>%
  ggplot(aes(y = car, x = quantile, group = group, colour = group)) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = .2) +
  stat_summary(fun = mean, geom = "line") +
  scale_color_grey() +
  facet_wrap(~ fig_name, nrow = 2)

References

Dellavigna, Stefano, and Joshua M. Pollet. 2009. “Investor Inattention and Friday Earnings Announcements.” The Journal of Finance 64 (2): 709–49. https://doi.org/10.1111/j.1540-6261.2009.01447.x.