Descriptive statistics

Author

Stijn Masschelein

Setup

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(lubridate)
library(here)

here() starts at /Users/stijnmasschelein/Library/CloudStorage/Dropbox/Teaching/lecturenotes/method_package

library(kableExtra)


Attaching package: 'kableExtra'

The following object is masked from 'package:dplyr':

    group_rows

library(cowplot)


Attaching package: 'cowplot'

The following object is masked from 'package:lubridate':

    stamp

theme_set(theme_cowplot(font_size = 18))
i_am("freaky_friday/descriptive.qmd")

here() starts at /Users/stijnmasschelein/Library/CloudStorage/Dropbox/Teaching/lecturenotes/method_package

main <- readRDS(here("data", "freaky_friday", "main.RDS")) %>%
  mutate(group = if_else(weekday == "Fri", "Friday", "Non-Friday"),
         year = year(anndat))
glimpse(main)

Rows: 130,358
Columns: 23
$ ticker       <chr> "A2", "A2", "A2", "A2", "A2", "A2", "A2", "AA0A", "AA0A",…
$ actual       <dbl> -0.0400, -0.1100, -0.1100, -0.0500, -0.0700, -0.1000, -0.…
$ pdf          <chr> "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D…
$ anndats_act  <date> 2004-10-21, 2005-01-26, 2005-04-27, 2005-07-27, 2005-10-…
$ gvkey        <chr> "001081", "001081", "001081", "001081", "001081", "001081…
$ permno       <dbl> 10560, 10560, 10560, 10560, 10560, 10560, 10560, 10656, 1…
$ cusip        <chr> "00392410", "00392410", "00392410", "00392410", "00392410…
$ rdq          <date> 2004-10-21, 2005-01-26, 2005-04-27, 2005-07-27, 2005-10-…
$ anndat       <date> 2004-10-21, 2005-01-26, 2005-04-27, 2005-07-27, 2005-10-…
$ N            <int> 1, 2, 3, 2, 4, 5, 4, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 3, 2, …
$ median       <dbl> -0.09000, -0.08965, -0.05740, -0.03700, -0.10610, -0.0783…
$ mean         <dbl> -0.0900000, -0.0896500, -0.0680000, -0.0370000, -0.100550…
$ mean_days    <dbl> 21.000000, 8.000000, 6.666667, 13.500000, 7.000000, 9.200…
$ car_short    <dbl> 0.097290562, 0.036462040, -0.063605028, -0.004177245, -0.…
$ car_long     <dbl> 0.01028628, -0.20959780, 0.05661647, -0.18368201, 0.20741…
$ date_minus5  <date> 2004-10-16, 2005-01-21, 2005-04-22, 2005-07-22, 2005-10-…
$ date         <date> 2004-10-15, 2005-01-21, 2005-04-22, 2005-07-22, 2005-10-…
$ prc          <dbl> 5.63, 5.89, 4.53, 5.00, 3.26, 4.02, 4.28, 16.75, 15.66, 1…
$ market_value <dbl> 2478185.2, 2592630.8, 1993992.8, 2200875.0, 1434970.5, 17…
$ surprise     <dbl> 0.0088809947, -0.0034550085, -0.0116114790, -0.0026000000…
$ weekday      <ord> Thu, Wed, Wed, Wed, Wed, Wed, Wed, Thu, Thu, Wed, Thu, Th…
$ group        <chr> "Non-Friday", "Non-Friday", "Non-Friday", "Non-Friday", "…
$ year         <dbl> 2004, 2005, 2005, 2005, 2005, 2006, 2006, 2003, 2003, 200…

Panel A

The actual calculations are just in the summarise and the first mutate statement. I then calculate the totals and add them to the data with bind_rows. The numerical values get rounded to present them a bit nicer. The double pivot transposes the table so that rows become columns. Finally, I fix the order of the columns and use kable to present the table.

summarise(main, .by = weekday,
          Number = n()) %>%
  mutate(Fraction = Number / sum(Number)) %>%
  bind_rows(summarise(.,
                      across(c(Number, Fraction), sum)) %>%
            mutate(weekday = "All")) %>%
  mutate(Fraction = format(Fraction, digits = 2),
         Number = format(Number)) %>%
  pivot_longer(-weekday) %>%
  pivot_wider(names_from = weekday) %>%
  select(All, Mon, Tue, Wed, Thu, Fri) %>%
  kable(format = "markdown")

All	Mon	Tue	Wed	Thu	Fri
130358	17956	33306	33427	38409	7260
1.000	0.138	0.255	0.256	0.295	0.056

Overall, my data has more than 10,000 fewer observations but the percentage of announcements per weekday matches the table in the paper pretty closely. That means I did something right.

Panel B: Baseline Sample

panelb <- main %>%
  mutate(month = month(anndat),
         month_quarter = (month - 1) %% 3 + 1) %>%
  select(group, surprise, market_value, year, month_quarter) %>%
  mutate(market_value = market_value / 1e3)

summarise(panelb, .by = group,
          across(c(surprise, market_value, year),
                 list(mean = mean, sd = sd))) %>%
  kable(format = "markdown")

group	surprise_mean	surprise_sd	market_value_mean	market_value_sd	year_mean	year_sd
Non-Friday	-0.0006975	0.0265652	3620.593	15740.89	1999.716	3.350786
Friday	-0.0028568	0.0335686	3268.788	17851.47	1999.299	3.400189

summarise(panelb,
          across(c(surprise, market_value, year),
                 ~ list(t.test(.x ~ group)))
          ) %>%
  pivot_longer(c(surprise, market_value, year)) %>%
  mutate(
    diff = map_dbl(value, ~ .x$estimate[1] - .x$estimate[2]),
    stderr = map_dbl(value, ~ .x$stderr),
    pvalue = map_dbl(value, ~ .x$p.value)) %>%
  select(-value) %>%
  kable(format = "markdown")

name	diff	stderr	pvalue
surprise	-0.0021594	0.0004012	0.0000001
market_value	-351.8054036	214.2602171	0.1006392
year	-0.4169706	0.0410326	0.0000000

I am not really sure that the statistical test for the month in the quarter makes sense. So I am not going to do these.

summarise(panelb, .by = c(group, month_quarter),
          N = n()) %>%
  mutate(fraction = N/sum(N), .by = group) %>%
  select(-N) %>%
  pivot_wider(names_from = group, values_from = fraction) %>%
  kable(format = "markdown", digits = 3)

month_quarter	Non-Friday	Friday
1	0.619	0.591
2	0.295	0.303
3	0.086	0.106

Dellavigna and Pollet (2009) also report the same statistic for the a homogeneous sample with at least 10% of the observations on Friday and on non-Fridays. This is an example of asking the question where the variation is coming from in the sample and what the true comparison is that we want to make. In the full sample, we could be comparing firms that report on Friday to firms that do not. In the homogeneous sample, we are more likely to compare years where the same firm reports on Friday and years where they do not. The same question came up when we talked about compliers, always treated and never treated observations in the context of instrumental variables.

homogeneous <- main %>%
  mutate(perc_friday = sum(weekday == "Fri")/n(),
        .by = gvkey) %>%
  filter(perc_friday > .1, perc_friday < .9) %>%
  glimpse()

Rows: 23,201
Columns: 24
$ ticker       <chr> "AA0G", "AA0G", "AA0G", "AA0G", "AA0G", "AA0G", "AA0G", "…
$ actual       <dbl> -0.45, -0.37, -0.63, -0.65, -0.38, -0.36, -0.36, -0.44, -…
$ pdf          <chr> "P", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D…
$ anndats_act  <date> 2001-11-13, 2002-03-15, 2002-05-15, 2002-11-01, 2003-03-…
$ gvkey        <chr> "133724", "133724", "133724", "133724", "133724", "133724…
$ permno       <dbl> 88784, 88784, 88784, 88784, 88784, 88784, 88784, 88784, 8…
$ cusip        <chr> "00724X10", "00724X10", "00724X10", "00724X10", "00724X10…
$ rdq          <date> 2001-11-13, 2002-03-15, 2002-05-15, 2002-11-01, 2003-03-…
$ anndat       <date> 2001-11-13, 2002-03-15, 2002-05-15, 2002-11-01, 2003-03-…
$ N            <int> 2, 2, 1, 1, 3, 2, 2, 4, 3, 4, 3, 3, 2, 4, 5, 4, 7, 5, 1, …
$ median       <dbl> -0.355, -0.545, -0.540, -0.680, -0.630, -0.530, -0.420, -…
$ mean         <dbl> -0.3550000, -0.5450000, -0.5400000, -0.6800000, -0.590000…
$ mean_days    <dbl> 0.000000, 0.000000, 0.000000, 1.000000, 13.333333, 11.500…
$ car_short    <dbl> -0.0134970976, -0.0513014547, 0.0138127967, 0.0357076318,…
$ car_long     <dbl> -0.08669761, 0.24714875, 0.01347011, -0.07084183, -0.0507…
$ date_minus5  <date> 2001-11-08, 2002-03-10, 2002-05-10, 2002-10-27, 2003-03-…
$ date         <date> 2001-11-08, 2002-03-08, 2002-05-10, 2002-10-25, 2003-03-…
$ prc          <dbl> 14.50, 13.10, 12.81, 13.89, 11.50, 13.95, 13.00, 17.96, 1…
$ market_value <dbl> 451124.00, 407462.40, 399492.66, 433937.49, 362537.50, 44…
$ surprise     <dbl> -0.0065517241, 0.0133587786, -0.0070257611, 0.0021598272,…
$ weekday      <ord> Tue, Fri, Wed, Fri, Tue, Wed, Thu, Wed, Tue, Tue, Wed, Tu…
$ group        <chr> "Non-Friday", "Friday", "Non-Friday", "Friday", "Non-Frid…
$ year         <dbl> 2001, 2002, 2002, 2002, 2003, 2003, 2003, 2003, 2004, 200…
$ perc_friday  <dbl> 0.2222222, 0.2222222, 0.2222222, 0.2222222, 0.2222222, 0.…

I’ll leave the descriptive statistics for the homogeneous sample as an exercise for the reader.

Figure 1

Quantiles

quantiles <- main %>%
  mutate(sign = case_when(surprise > 0 ~ "positive",
                          surprise < 0 ~ "negative",
                          surprise == 0 ~ "zero")) %>%
  mutate(
    quintile = ntile(surprise, 5),
    .by = c(sign, year)) %>%
  mutate(
    quantile = case_when(sign == "positive" ~ 6 + quintile,
                         sign == "negative" ~ quintile,
                         sign == "zero" ~ 6
                         )
  ) %>%
  glimpse()

Rows: 130,358
Columns: 26
$ ticker       <chr> "A2", "A2", "A2", "A2", "A2", "A2", "A2", "AA0A", "AA0A",…
$ actual       <dbl> -0.0400, -0.1100, -0.1100, -0.0500, -0.0700, -0.1000, -0.…
$ pdf          <chr> "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D…
$ anndats_act  <date> 2004-10-21, 2005-01-26, 2005-04-27, 2005-07-27, 2005-10-…
$ gvkey        <chr> "001081", "001081", "001081", "001081", "001081", "001081…
$ permno       <dbl> 10560, 10560, 10560, 10560, 10560, 10560, 10560, 10656, 1…
$ cusip        <chr> "00392410", "00392410", "00392410", "00392410", "00392410…
$ rdq          <date> 2004-10-21, 2005-01-26, 2005-04-27, 2005-07-27, 2005-10-…
$ anndat       <date> 2004-10-21, 2005-01-26, 2005-04-27, 2005-07-27, 2005-10-…
$ N            <int> 1, 2, 3, 2, 4, 5, 4, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 3, 2, …
$ median       <dbl> -0.09000, -0.08965, -0.05740, -0.03700, -0.10610, -0.0783…
$ mean         <dbl> -0.0900000, -0.0896500, -0.0680000, -0.0370000, -0.100550…
$ mean_days    <dbl> 21.000000, 8.000000, 6.666667, 13.500000, 7.000000, 9.200…
$ car_short    <dbl> 0.097290562, 0.036462040, -0.063605028, -0.004177245, -0.…
$ car_long     <dbl> 0.01028628, -0.20959780, 0.05661647, -0.18368201, 0.20741…
$ date_minus5  <date> 2004-10-16, 2005-01-21, 2005-04-22, 2005-07-22, 2005-10-…
$ date         <date> 2004-10-15, 2005-01-21, 2005-04-22, 2005-07-22, 2005-10-…
$ prc          <dbl> 5.63, 5.89, 4.53, 5.00, 3.26, 4.02, 4.28, 16.75, 15.66, 1…
$ market_value <dbl> 2478185.2, 2592630.8, 1993992.8, 2200875.0, 1434970.5, 17…
$ surprise     <dbl> 0.0088809947, -0.0034550085, -0.0116114790, -0.0026000000…
$ weekday      <ord> Thu, Wed, Wed, Wed, Wed, Wed, Wed, Thu, Thu, Wed, Thu, Th…
$ group        <chr> "Non-Friday", "Non-Friday", "Non-Friday", "Non-Friday", "…
$ year         <dbl> 2004, 2005, 2005, 2005, 2005, 2006, 2006, 2003, 2003, 200…
$ sign         <chr> "positive", "negative", "negative", "negative", "positive…
$ quintile     <int> 5, 2, 1, 3, 5, 2, 3, 3, 3, 4, 2, 1, 3, 2, 5, 2, 4, 5, 5, …
$ quantile     <dbl> 11, 2, 1, 3, 11, 2, 3, 9, 9, 4, 2, 6, 9, 2, 11, 2, 10, 11…

This is a quick version of Figure 1a. It can be further cleaned up with better axis labels. It shows the main results from Dellavigna and Pollet (2009) that the market reaction is subdued on Fridays.

ggplot(quantiles,
       aes(y = car_short, x = quantile, group = group, colour = group)) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = .2) +
  stat_summary(fun = mean, geom = "line") +
  scale_color_grey()

This is how I would program Figure 1a and b together. It’s a good example of how using pivot_longer can make your life easier. In this case, if we need to plot multiple similar variables.

quantiles %>%
  pivot_longer(c(car_short, car_long), values_to = "car", names_to = "window") %>%
  mutate(fig_name = if_else(window == "car_short", "Figure 1a", "Figure 1b")) %>%
  ggplot(aes(y = car, x = quantile, group = group, colour = group)) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = .2) +
  stat_summary(fun = mean, geom = "line") +
  scale_color_grey() +
  facet_wrap(~ fig_name, nrow = 2)

References

Dellavigna, Stefano, and Joshua M. Pollet. 2009. “Investor Inattention and Friday Earnings Announcements.” The Journal of Finance 64 (2): 709–49. https://doi.org/10.1111/j.1540-6261.2009.01447.x.