Earnings announcements

Author

Stijn Masschelein

library(tidyverse)
library(here)
i_am("freaky_friday/download_data.qmd")
here() starts at /Users/stijnmasschelein/Library/CloudStorage/Dropbox/Teaching/lecturenotes/method_package
library(RPostgres)
library(dtplyr)
library(dbplyr)
linking_table <- readRDS(here("data", "freaky_friday", "linking_table.RDS"))
wrds <- dbConnect(Postgres(),
                  host='wrds-pgdata.wharton.upenn.edu',
                  port=9737,
                  dbname='wrds',
                  user='stimas',
                  sslmode='require')

Earnings Announcements

I/B/E/S

We start with the earnings announcement data from I/B/E/S with the analyst estimates. According to the method section in Dellavigna and Pollet (2009), we need the data from the start of 1995 to the middle of 2006. We will want the analyst estimates for all the firms with a ticker in the master linking_table. I am going to use parameters that we can calculate or set in R and then pass them on to the database query.

begin_date <- "'1995-01-01'"
end_date <- "'2006-07-01'"
tickers <- unique(linking_table$ticker)

The dates of the estimate and the actual earnings announcement will be critical to construct unexpected component of the earnings and to determine the exact event data, i.e. the date that (the unexpected component of) the earnings are announced. Thankfully, WRDS provides a description of the date variables. anndats is the first day that an analyst set their estimate for the earnings per share and the revdats is the last day that the analyst confirmed their estimate. We will use revdats as the defacto date that the analyst provided the estimate. anndats_act is the earnings announcement date. value is the estimated EPS by the analyst and actual is the actual EPS as announced by the firm. pdf is flag whether the EPS if for the primary share class or on a diluted basis. I included both and that is probably appropriate for this paper. fpi is the forecast period indicator if we set this to “6”, we get the earnings estimates that are done in the quarter before the earnings announcements. All these variables can be verified in the data descriptions on WRDS. As you can see, it’s quite important if you work with data that you have not collected yourself to read the data descriptions.

ibes_query <- tbl(wrds, in_schema("ibes", "det_epsus")) %>%
  select(ticker, cusip, fpi, anndats, revdats, pdf, value, anndats_act, actual, analys) %>%
  filter(fpi == "6", ticker %in% tickers, !is.null(actual)) %>%
  filter(anndats_act >= begin_date, anndats_act <= end_date)
ann_ibes <- collect(ibes_query)
saveRDS(ann_ibes, here("data", "freaky_friday", "ann_ibes.RDS"))
glimpse(ann_ibes)
Rows: 1,046,663
Columns: 10
$ ticker      <chr> "AA", "A2", "A2", "A2", "AA", "AA", "AA", "AA", "AA", "AA"…
$ cusip       <chr> "02224910", "0039241X", "0039241X", "0039241X", "02224910"…
$ fpi         <chr> "6", "6", "6", "6", "6", "6", "6", "6", "6", "6", "6", "6"…
$ anndats     <date> 1994-10-10, 2004-09-30, 2004-10-21, 2004-10-27, 1994-10-1…
$ revdats     <date> 1995-07-06, 2004-09-30, 2004-10-22, 2004-11-03, 1994-10-1…
$ pdf         <chr> "D", "D", "D", "D", "P", "P", "P", "P", "P", "D", "D", "P"…
$ value       <dbl> 0.1875, -0.0900, -0.0403, -0.0410, 0.2663, 0.2963, 0.7500,…
$ anndats_act <date> 1995-01-11, 2004-10-21, 2004-10-21, 2004-10-21, 1995-01-1…
$ actual      <dbl> 0.2812, -0.0400, -0.0400, -0.0400, 0.2812, 0.2812, 0.7575,…
$ analys      <dbl> 297, 478, 5469, 43594, 288, 18082, 18082, 474, 288, 127, 6…

Compustat

Following the paper, we will verify the earnings announcement date in I/B/E/S with the earnings announcement date in Compustat. Given the importance of finding the exact date for an event study, it is not surprising that Dellavigna and Pollet (2009) spent a lot of effort to make sure that they have the date right.

gvkeys <- unique(linking_table$gvkey)

rdq is the earnings announcement date in Compustat. It does not make it easier that different databases use different variable names. That is why it so important to read the documentation of the database.

compu_query <- tbl(wrds, in_schema("comp", "fundq")) %>%
  filter(rdq >= begin_date, rdq <= end_date, gvkey %in% gvkeys) %>%
  select(cusip, rdq, gvkey)
ann_compu <- as_tibble(compu_query) %>%
  mutate(cusip = str_sub(cusip, 1, 8))
saveRDS(ann_compu, here("data", "freaky_friday", "ann_compu.RDS"))
glimpse(ann_compu)
Rows: 327,964
Columns: 3
$ cusip <chr> "00036110", "00036110", "00036110", "00036110", "00036110", "000…
$ rdq   <date> 1995-03-15, 1995-07-06, 1995-09-13, 1995-12-12, 1996-03-14, 199…
$ gvkey <chr> "001004", "001004", "001004", "001004", "001004", "001004", "001…

Combine Announcements

To combine the two datasets, we will link them through a simplified version of the larger linking table. I will also enforce that the first 6 characters of cusip are the same. I don’t think it is strictly necessary to do that but it does gives us more confidence that the links are of higher quality. We need to match the I/B/E/S data and the Compustat data based on the firm and its earnings announcement date. However, if you read the paper (Dellavigna and Pollet 2009), you will notice that the reason why we want to combine the I/B/E/S and Compustat data is because the date in both datasets does not always match. The paper gets around that by matching earnings announcements if the date is not more than 5 days apart in the two data sources. This is why I create anndat_begin and anndat_end to define the interval in which we want to match the data. Finally, we can calculate the actual event date as the minimum of the date in the I/B/E/S data and the Compustat data (Dellavigna and Pollet 2009) 1.

You can also see that I have two lines of code at the start to read in the datasets again. This is not strictly necessary to make this file fully reproducible but it does make debugging the code easier. If I want to make some changes to the code I do not have to download the data again from WRDS. I can just use the one in the data folder.

ann_ibes <- readRDS(here("data", "freaky_friday", "ann_ibes.RDS"))
ann_compu <- readRDS(here("data", "freaky_friday", "ann_compu.RDS"))
simple_link <- linking_table %>%
  select(ticker, gvkey, permno, cusip) %>%
  mutate(cusip = str_sub(cusip, end = 6)) %>%
  distinct()
earn_ann <- ann_ibes %>%
  distinct(ticker, actual, pdf, cusip, anndats_act) %>%
  mutate(cusip = str_sub(cusip, end = 6)) %>%
  left_join(simple_link, by = join_by(ticker)) %>%
  filter(!is.na(gvkey), !(cusip.x != cusip.y)) %>%
  select(-starts_with("cusip")) %>%
  mutate(anndat_begin = anndats_act - 5, anndat_end = anndats_act + 5) %>%
  left_join(ann_compu, by = join_by(gvkey == gvkey,
                                    anndat_begin <= rdq,  anndat_end >= rdq)) %>% 
  filter(!is.na(rdq)) %>%
  mutate(anndat = pmin(anndats_act, rdq)) %>%
  select(-anndat_begin, -anndat_end)
Warning in left_join(., simple_link, by = join_by(ticker)): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 151 of `x` matches multiple rows in `y`.
ℹ Row 11209 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.
saveRDS(earn_ann, here("data", "freaky_friday", "earn_ann.RDS"))
glimpse(earn_ann)
Rows: 154,464
Columns: 9
$ ticker      <chr> "A2", "AA0G", "A2", "AA0A", "AA0G", "AA0G", "AA0R", "AA0R"…
$ actual      <dbl> -0.0400, -0.3000, -0.0800, 0.1067, -0.4100, -0.3100, 0.035…
$ pdf         <chr> "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D"…
$ anndats_act <date> 2004-10-21, 2005-03-01, 2006-04-26, 2003-05-08, 2004-03-0…
$ gvkey       <chr> "001081", "133724", "001081", "001094", "133724", "133724"…
$ permno      <dbl> 10560, 88784, 10560, 10656, 88784, 88784, 10659, 10659, 84…
$ cusip       <chr> "00392410", "00724X10", "00392410", "00444610", "00724X10"…
$ rdq         <date> 2004-10-21, 2005-03-01, 2006-04-26, 2003-05-08, 2004-03-0…
$ anndat      <date> 2004-10-21, 2005-03-01, 2006-04-26, 2003-05-08, 2004-03-0…

earn_ann serves as a linking table to match different earnings announcements (as opposed to firms and their securities in linking_table)

variable data source description
ticker I/B/E/S Identifier
anndats_act I/B/E/S Actual Announcement Date
gvkey Compustat Identifier
permno CRSP Identifier
cusip Identifier of length 6,8 or 9
rdq Compustat Actual Announcement Date
anndat pmin(rdq, anndats_act)

anndat is the validated way of calculating the announcement date. However, we need to keep the other dates around because we will need them to link back to the original databases.

The paper states that they have 154,051 earnings announcements (Dellavigna and Pollet 2009). We have 154464 earnings announcements.

Analyst Expectations

Finally, we have to calculate the median analyst earnings per share estimate for each company in the 30 days before the earnings announcement (Dellavigna and Pollet 2009). I will use the dtplyr package to do the calculations. I like the tidyverse for its expressiveness and readability but for some computations it can be slow 2. The data.table and the general fastverse can help speed up computations but the code syntax is less intuitive for beginner programmers. A nice intermediate step is to use the dtplyr package which translates tidyverse code into data.table code. We only have to make our datasets ready for the data.table code with lazy_dt(). The rest of the code should look like regular old tidyverse code. First, we only keep the last observation for every analyst in the 30 days before the earnings announcement. Second, we calculate the median estimate and some other statistics about the analyst estimates. In the last step, we convert the dataset back to a tidyverse tibble so that we can make use of our regular tidyverse functions.

earn_ann_lazy <- lazy_dt(earn_ann)
ann_ibes_lazy <- lazy_dt(ann_ibes)
analyst <- select(earn_ann_lazy, ticker, anndats_act, anndat) %>%
  left_join(ann_ibes_lazy, by = c("ticker", "anndats_act")) %>%
  filter(!is.na(anndat), anndat - 30 <= revdats, anndats < anndat) %>%
  select(-anndats_act) %>%
  group_by(ticker, anndat, actual, analys, pdf) %>%
  arrange(revdats) %>%
  summarise_all(~ last(.)) %>%
  group_by(ticker, anndat, actual, pdf) %>%
  summarise(N = n(), median = median(value, na.rm = T),
            mean = mean(value, na.rm = T),
            mean_days = mean(pmax(0, anndat - revdats))) %>%
  ungroup() %>%
  as_tibble()
`summarise()` has grouped output by 'ticker', 'anndat', 'actual'. You can
override using the `.groups` argument.
saveRDS(analyst, here("data", "freaky_friday", "analyst.RDS"))
glimpse(analyst)
Rows: 133,076
Columns: 8
$ ticker    <chr> "A2", "A2", "A2", "A2", "A2", "A2", "A2", "AA0A", "AA0A", "A…
$ anndat    <date> 2004-10-21, 2005-01-26, 2005-04-27, 2005-07-27, 2005-10-26,…
$ actual    <dbl> -0.0400, -0.1100, -0.1100, -0.0500, -0.0700, -0.1000, -0.080…
$ pdf       <chr> "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", …
$ N         <int> 1, 2, 3, 2, 4, 5, 4, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 3, 2, 2, …
$ median    <dbl> -0.09000, -0.08965, -0.05740, -0.03700, -0.10610, -0.07830, …
$ mean      <dbl> -0.0900000, -0.0896500, -0.0680000, -0.0370000, -0.1005500, …
$ mean_days <dbl> 21.000000, 8.000000, 6.666667, 13.500000, 7.000000, 9.200000…

References

Dellavigna, Stefano, and Joshua M. Pollet. 2009. “Investor Inattention and Friday Earnings Announcements.” The Journal of Finance 64 (2): 709–49. https://doi.org/10.1111/j.1540-6261.2009.01447.x.

Footnotes

  1. We need to use pmin so that the mutate statement knows that it needs to take the minimum between the two columns for each row and not over the whole dataset.↩︎

  2. Nevertheless, with some smart programming you can get the tidyverse to be very speedy as well. See further when we calculate the abnormal returns.↩︎