Descriptive statistics

Author

Stijn Masschelein

Setup

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(here)
here() starts at /Users/stijnmasschelein/Library/CloudStorage/Dropbox/Teaching/lecturenotes/method_package
library(kableExtra)

Attaching package: 'kableExtra'

The following object is masked from 'package:dplyr':

    group_rows
library(cowplot)

Attaching package: 'cowplot'

The following object is masked from 'package:lubridate':

    stamp
theme_set(theme_cowplot(font_size = 18))
i_am("freaky_friday/descriptive.qmd")
here() starts at /Users/stijnmasschelein/Library/CloudStorage/Dropbox/Teaching/lecturenotes/method_package
main <- readRDS(here("data", "freaky_friday", "main.RDS")) %>%
  mutate(group = if_else(weekday == "Fri", "Friday", "Non-Friday"),
         year = year(anndat))
glimpse(main)
Rows: 130,358
Columns: 23
$ ticker       <chr> "A2", "A2", "A2", "A2", "A2", "A2", "A2", "AA0A", "AA0A",…
$ actual       <dbl> -0.0400, -0.1100, -0.1100, -0.0500, -0.0700, -0.1000, -0.…
$ pdf          <chr> "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D…
$ anndats_act  <date> 2004-10-21, 2005-01-26, 2005-04-27, 2005-07-27, 2005-10-…
$ gvkey        <chr> "001081", "001081", "001081", "001081", "001081", "001081…
$ permno       <dbl> 10560, 10560, 10560, 10560, 10560, 10560, 10560, 10656, 1…
$ cusip        <chr> "00392410", "00392410", "00392410", "00392410", "00392410…
$ rdq          <date> 2004-10-21, 2005-01-26, 2005-04-27, 2005-07-27, 2005-10-…
$ anndat       <date> 2004-10-21, 2005-01-26, 2005-04-27, 2005-07-27, 2005-10-…
$ N            <int> 1, 2, 3, 2, 4, 5, 4, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 3, 2, …
$ median       <dbl> -0.09000, -0.08965, -0.05740, -0.03700, -0.10610, -0.0783…
$ mean         <dbl> -0.0900000, -0.0896500, -0.0680000, -0.0370000, -0.100550…
$ mean_days    <dbl> 21.000000, 8.000000, 6.666667, 13.500000, 7.000000, 9.200…
$ car_short    <dbl> 0.097290562, 0.036462040, -0.063605028, -0.004177245, -0.…
$ car_long     <dbl> 0.01028628, -0.20959780, 0.05661647, -0.18368201, 0.20741…
$ date_minus5  <date> 2004-10-16, 2005-01-21, 2005-04-22, 2005-07-22, 2005-10-…
$ date         <date> 2004-10-15, 2005-01-21, 2005-04-22, 2005-07-22, 2005-10-…
$ prc          <dbl> 5.63, 5.89, 4.53, 5.00, 3.26, 4.02, 4.28, 16.75, 15.66, 1…
$ market_value <dbl> 2478185.2, 2592630.8, 1993992.8, 2200875.0, 1434970.5, 17…
$ surprise     <dbl> 0.0088809947, -0.0034550085, -0.0116114790, -0.0026000000…
$ weekday      <ord> Thu, Wed, Wed, Wed, Wed, Wed, Wed, Thu, Thu, Wed, Thu, Th…
$ group        <chr> "Non-Friday", "Non-Friday", "Non-Friday", "Non-Friday", "…
$ year         <dbl> 2004, 2005, 2005, 2005, 2005, 2006, 2006, 2003, 2003, 200…

Panel A

The actual calculations are just in the summarise and the first mutate statement. I then calculate the totals and add them to the data with bind_rows. The numerical values get rounded to present them a bit nicer. The double pivot transposes the table so that rows become columns. Finally, I fix the order of the columns and use kable to present the table.

summarise(main, .by = weekday,
          Number = n()) %>%
  mutate(Fraction = Number / sum(Number)) %>%
  bind_rows(summarise(.,
                      across(c(Number, Fraction), sum)) %>%
            mutate(weekday = "All")) %>%
  mutate(Fraction = format(Fraction, digits = 2),
         Number = format(Number)) %>%
  pivot_longer(-weekday) %>%
  pivot_wider(names_from = weekday) %>%
  select(All, Mon, Tue, Wed, Thu, Fri) %>%
  kable(format = "markdown")
All Mon Tue Wed Thu Fri
130358 17956 33306 33427 38409 7260
1.000 0.138 0.255 0.256 0.295 0.056

Overall, my data has more than 10,000 fewer observations but the percentage of announcements per weekday matches the table in the paper pretty closely. That means I did something right.

Panel B: Baseline Sample

panelb <- main %>%
  mutate(month = month(anndat),
         month_quarter = (month - 1) %% 3 + 1) %>%
  select(group, surprise, market_value, year, month_quarter) %>%
  mutate(market_value = market_value / 1e3)

summarise(panelb, .by = group,
          across(c(surprise, market_value, year),
                 list(mean = mean, sd = sd))) %>%
  kable(format = "markdown")
group surprise_mean surprise_sd market_value_mean market_value_sd year_mean year_sd
Non-Friday -0.0006975 0.0265652 3620.593 15740.89 1999.716 3.350786
Friday -0.0028568 0.0335686 3268.788 17851.47 1999.299 3.400189
summarise(panelb,
          across(c(surprise, market_value, year),
                 ~ list(t.test(.x ~ group)))
          ) %>%
  pivot_longer(c(surprise, market_value, year)) %>%
  mutate(
    diff = map_dbl(value, ~ .x$estimate[1] - .x$estimate[2]),
    stderr = map_dbl(value, ~ .x$stderr),
    pvalue = map_dbl(value, ~ .x$p.value)) %>%
  select(-value) %>%
  kable(format = "markdown")
name diff stderr pvalue
surprise -0.0021594 0.0004012 0.0000001
market_value -351.8054036 214.2602171 0.1006392
year -0.4169706 0.0410326 0.0000000

I am not really sure that the statistical test for the month in the quarter makes sense. So I am not going to do these.

summarise(panelb, .by = c(group, month_quarter),
          N = n()) %>%
  mutate(fraction = N/sum(N), .by = group) %>%
  select(-N) %>%
  pivot_wider(names_from = group, values_from = fraction) %>%
  kable(format = "markdown", digits = 3)
month_quarter Non-Friday Friday
1 0.619 0.591
2 0.295 0.303
3 0.086 0.106

Dellavigna and Pollet () also report the same statistic for the a homogeneous sample with at least 10% of the observations on Friday and on non-Fridays. This is an example of asking the question where the variation is coming from in the sample and what the true comparison is that we want to make. In the full sample, we could be comparing firms that report on Friday to firms that do not. In the homogeneous sample, we are more likely to compare years where the same firm reports on Friday and years where they do not. The same question came up when we talked about compliers, always treated and never treated observations in the context of instrumental variables.

homogeneous <- main %>%
  mutate(perc_friday = sum(weekday == "Fri")/n(),
        .by = gvkey) %>%
  filter(perc_friday > .1, perc_friday < .9) %>%
  glimpse()
Rows: 23,201
Columns: 24
$ ticker       <chr> "AA0G", "AA0G", "AA0G", "AA0G", "AA0G", "AA0G", "AA0G", "…
$ actual       <dbl> -0.45, -0.37, -0.63, -0.65, -0.38, -0.36, -0.36, -0.44, -…
$ pdf          <chr> "P", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D…
$ anndats_act  <date> 2001-11-13, 2002-03-15, 2002-05-15, 2002-11-01, 2003-03-…
$ gvkey        <chr> "133724", "133724", "133724", "133724", "133724", "133724…
$ permno       <dbl> 88784, 88784, 88784, 88784, 88784, 88784, 88784, 88784, 8…
$ cusip        <chr> "00724X10", "00724X10", "00724X10", "00724X10", "00724X10…
$ rdq          <date> 2001-11-13, 2002-03-15, 2002-05-15, 2002-11-01, 2003-03-…
$ anndat       <date> 2001-11-13, 2002-03-15, 2002-05-15, 2002-11-01, 2003-03-…
$ N            <int> 2, 2, 1, 1, 3, 2, 2, 4, 3, 4, 3, 3, 2, 4, 5, 4, 7, 5, 1, …
$ median       <dbl> -0.355, -0.545, -0.540, -0.680, -0.630, -0.530, -0.420, -…
$ mean         <dbl> -0.3550000, -0.5450000, -0.5400000, -0.6800000, -0.590000…
$ mean_days    <dbl> 0.000000, 0.000000, 0.000000, 1.000000, 13.333333, 11.500…
$ car_short    <dbl> -0.0134970976, -0.0513014547, 0.0138127967, 0.0357076318,…
$ car_long     <dbl> -0.08669761, 0.24714875, 0.01347011, -0.07084183, -0.0507…
$ date_minus5  <date> 2001-11-08, 2002-03-10, 2002-05-10, 2002-10-27, 2003-03-…
$ date         <date> 2001-11-08, 2002-03-08, 2002-05-10, 2002-10-25, 2003-03-…
$ prc          <dbl> 14.50, 13.10, 12.81, 13.89, 11.50, 13.95, 13.00, 17.96, 1…
$ market_value <dbl> 451124.00, 407462.40, 399492.66, 433937.49, 362537.50, 44…
$ surprise     <dbl> -0.0065517241, 0.0133587786, -0.0070257611, 0.0021598272,…
$ weekday      <ord> Tue, Fri, Wed, Fri, Tue, Wed, Thu, Wed, Tue, Tue, Wed, Tu…
$ group        <chr> "Non-Friday", "Friday", "Non-Friday", "Friday", "Non-Frid…
$ year         <dbl> 2001, 2002, 2002, 2002, 2003, 2003, 2003, 2003, 2004, 200…
$ perc_friday  <dbl> 0.2222222, 0.2222222, 0.2222222, 0.2222222, 0.2222222, 0.…

I’ll leave the descriptive statistics for the homogeneous sample as an exercise for the reader.

Figure 1

Quantiles

quantiles <- main %>%
  mutate(sign = case_when(surprise > 0 ~ "positive",
                          surprise < 0 ~ "negative",
                          surprise == 0 ~ "zero")) %>%
  mutate(
    quintile = ntile(surprise, 5),
    .by = c(sign, year)) %>%
  mutate(
    quantile = case_when(sign == "positive" ~ 6 + quintile,
                         sign == "negative" ~ quintile,
                         sign == "zero" ~ 6
                         )
  ) %>%
  glimpse()
Rows: 130,358
Columns: 26
$ ticker       <chr> "A2", "A2", "A2", "A2", "A2", "A2", "A2", "AA0A", "AA0A",…
$ actual       <dbl> -0.0400, -0.1100, -0.1100, -0.0500, -0.0700, -0.1000, -0.…
$ pdf          <chr> "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D…
$ anndats_act  <date> 2004-10-21, 2005-01-26, 2005-04-27, 2005-07-27, 2005-10-…
$ gvkey        <chr> "001081", "001081", "001081", "001081", "001081", "001081…
$ permno       <dbl> 10560, 10560, 10560, 10560, 10560, 10560, 10560, 10656, 1…
$ cusip        <chr> "00392410", "00392410", "00392410", "00392410", "00392410…
$ rdq          <date> 2004-10-21, 2005-01-26, 2005-04-27, 2005-07-27, 2005-10-…
$ anndat       <date> 2004-10-21, 2005-01-26, 2005-04-27, 2005-07-27, 2005-10-…
$ N            <int> 1, 2, 3, 2, 4, 5, 4, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 3, 2, …
$ median       <dbl> -0.09000, -0.08965, -0.05740, -0.03700, -0.10610, -0.0783…
$ mean         <dbl> -0.0900000, -0.0896500, -0.0680000, -0.0370000, -0.100550…
$ mean_days    <dbl> 21.000000, 8.000000, 6.666667, 13.500000, 7.000000, 9.200…
$ car_short    <dbl> 0.097290562, 0.036462040, -0.063605028, -0.004177245, -0.…
$ car_long     <dbl> 0.01028628, -0.20959780, 0.05661647, -0.18368201, 0.20741…
$ date_minus5  <date> 2004-10-16, 2005-01-21, 2005-04-22, 2005-07-22, 2005-10-…
$ date         <date> 2004-10-15, 2005-01-21, 2005-04-22, 2005-07-22, 2005-10-…
$ prc          <dbl> 5.63, 5.89, 4.53, 5.00, 3.26, 4.02, 4.28, 16.75, 15.66, 1…
$ market_value <dbl> 2478185.2, 2592630.8, 1993992.8, 2200875.0, 1434970.5, 17…
$ surprise     <dbl> 0.0088809947, -0.0034550085, -0.0116114790, -0.0026000000…
$ weekday      <ord> Thu, Wed, Wed, Wed, Wed, Wed, Wed, Thu, Thu, Wed, Thu, Th…
$ group        <chr> "Non-Friday", "Non-Friday", "Non-Friday", "Non-Friday", "…
$ year         <dbl> 2004, 2005, 2005, 2005, 2005, 2006, 2006, 2003, 2003, 200…
$ sign         <chr> "positive", "negative", "negative", "negative", "positive…
$ quintile     <int> 5, 2, 1, 3, 5, 2, 3, 3, 3, 4, 2, 1, 3, 2, 5, 2, 4, 5, 5, …
$ quantile     <dbl> 11, 2, 1, 3, 11, 2, 3, 9, 9, 4, 2, 6, 9, 2, 11, 2, 10, 11…

This is a quick version of Figure 1a. It can be further cleaned up with better axis labels. It shows the main results from Dellavigna and Pollet () that the market reaction is subdued on Fridays.

ggplot(quantiles,
       aes(y = car_short, x = quantile, group = group, colour = group)) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = .2) +
  stat_summary(fun = mean, geom = "line") +
  scale_color_grey()

This is how I would program Figure 1a and b together. It’s a good example of how using pivot_longer can make your life easier. In this case, if we need to plot multiple similar variables.

quantiles %>%
  pivot_longer(c(car_short, car_long), values_to = "car", names_to = "window") %>%
  mutate(fig_name = if_else(window == "car_short", "Figure 1a", "Figure 1b")) %>%
  ggplot(aes(y = car, x = quantile, group = group, colour = group)) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = .2) +
  stat_summary(fun = mean, geom = "line") +
  scale_color_grey() +
  facet_wrap(~ fig_name, nrow = 2)

References

Dellavigna, Stefano, and Joshua M. Pollet. 2009. “Investor Inattention and Friday Earnings Announcements.” The Journal of Finance 64 (2): 709–49. https://doi.org/10.1111/j.1540-6261.2009.01447.x.