class: center, middle, inverse, title-slide # POL90: Statistics ## Regression Discontinuity Design ### Prof. Wasow (with Andrew Mack), Politics, Pomona College ### 2022-05-02 --- <style type="text/css"> .regression10 table { font-size: 10px; } .regression12 table { font-size: 12px; } .regression14 table { font-size: 14px; } </style> # Announcements .large[ - Assignments - Report 3 due Today - Final is on Sakai - Report 2 - Reading: - Dunning *Natural Experiments in the Social Sciences*: - Chapter 1 (on Sakai -> Lessons) ] --- # Schedule <table> <thead> <tr> <th style="text-align:right;"> Week </th> <th style="text-align:left;"> Date </th> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Title </th> <th style="text-align:right;"> Chapter </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 11 </td> <td style="text-align:left;"> Mar 30 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Interaction terms </td> <td style="text-align:right;"> 9 </td> </tr> <tr> <td style="text-align:right;"> 12 </td> <td style="text-align:left;"> Apr 4 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Logistic regression </td> <td style="text-align:right;"> 20 </td> </tr> <tr> <td style="text-align:right;"> 12 </td> <td style="text-align:left;"> Apr 6 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Logistic regression </td> <td style="text-align:right;"> 20 </td> </tr> <tr> <td style="text-align:right;"> 13 </td> <td style="text-align:left;"> Apr 11 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Missing Data </td> <td style="text-align:right;"> Handout </td> </tr> <tr> <td style="text-align:right;"> 13 </td> <td style="text-align:left;"> Apr 13 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Panel Data </td> <td style="text-align:right;"> Handout </td> </tr> <tr> <td style="text-align:right;"> 14 </td> <td 
style="text-align:left;"> Apr 18 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Matching </td> <td style="text-align:right;"> Handout </td> </tr> <tr> <td style="text-align:right;"> 14 </td> <td style="text-align:left;"> Apr 20 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Matching </td> <td style="text-align:right;"> Handout </td> </tr> <tr> <td style="text-align:right;"> 15 </td> <td style="text-align:left;"> Apr 25 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Causal inference: Natural Experiments </td> <td style="text-align:right;"> Handout </td> </tr> <tr> <td style="text-align:right;"> 15 </td> <td style="text-align:left;"> Apr 27 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Causal inference: Natural Experiments </td> <td style="text-align:right;"> Dunning </td> </tr> <tr> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 16 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> May 2 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Mon </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Causal inference: RDD </td> <td style="text-align:right;color: black !important;background-color: yellow !important;"> NA </td> </tr> <tr> <td style="text-align:right;"> 16 </td> <td style="text-align:left;"> May 4 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Review </td> <td style="text-align:right;"> NA </td> </tr> </tbody> </table> --- ## Assignment schedule <table> <thead> <tr> <th style="text-align:right;"> Week </th> <th style="text-align:left;"> Date </th> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Assignment </th> <th style="text-align:right;"> Percent </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;color: 
black !important;background-color: yellow !important;"> 15 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> May 2 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Mon </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Report3 </td> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 10 </td> </tr> <tr> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 16 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> May 13 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Fri </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Final </td> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 36 </td> </tr> </tbody> </table> --- class: center, middle, inverse # Regression discontinuity design --- ## Randomized Controlled Experiments .large[ 1. The response of experimental subjects assigned to receive a treatment is compared to the response of subjects assigned to a control group 2. The assignment of subjects to treatment and control groups is done at random, through a randomizing device such as a coin flip 3. The manipulation of the treatment--also known as the intervention--is under the control of an experimental researcher ] --- ## Natural Experiments .large[ * Share attribute (1) of true experiments --- comparison of outcomes across treatment and control conditions * At least partially share (2), since assignment is random or "as if" random * Assignment not under control of researcher ] --- ## Natural Experiments .large[ * Dunning (2012) + "Standard" natural experiments + **Regression-discontinuity designs** + Instrumental-variables designs * Others? 
+ Synthetic controls + Granger causality ] --- ## What is a regression discontinuity design? .large[ > “Regression discontinuity designs (RDD) were first introduced by Donald L. Thistlethwaite and Donald T. Campbell (1960) as a way of estimating treatment effects in a nonexperimental setting where treatment is determined by whether an observed ‘assignment’ variable (also referred to in the literature as the ‘forcing’ variable or the ‘running’ variable) exceeds a known cutoff point.” ] .small[.footnote[Source: Lee and Lemieux (2010)]] --- ## Regression discontinuity .large[ * Simple version + Continuous `\(X\)` + Discontinuous relation with `\(Y\)` due to cutoff rule * Cutoff typically follows some arbitrary bureaucratic rule + Elections: - winner at 50.1% vs loser at 49.9% + Academics: - Grade cutoff for school admission or scholarship ] --- ## Discuss: What explains this plot? <img src="images/carrell_hoekstra_west_drinking_college_figure3_1_no_caption.png" width="85%" style="display: block; margin: auto;" /> --- ## Alcohol & college performance <img src="images/carrell_hoekstra_west_drinking_college_titlepage.png" width="100%" style="display: block; margin: auto;" /> --- ## Alcohol & college performance They estimate a model of the form: `$$Grade_i = f(Age_i) + \delta(OlderThan21_i) + \epsilon_i$$` --- ## Alcohol & college performance <img src="images/carrell_hoekstra_west_drinking_college_figure3_1.png" width="80%" style="display: block; margin: auto;" /> --- ## Did George Floyd protests influence public opinion? <br> <img src="images/opinion_mobilizing_title_abstract.png" width="100%" style="display: block; margin: auto;" /> --- ## Did George Floyd protests influence public opinion? <br> <img src="images/opinion_mobilizing_fig1_upper.png" width="100%" style="display: block; margin: auto;" /> --- ## Did George Floyd protests influence public opinion? 
<br> <img src="images/opinion_mobilizing_fig2_upper.png" width="100%" style="display: block; margin: auto;" /> --- ## Did George Floyd protests influence public opinion? <br> <img src="images/opinion_mobilizing_fig3.png" width="100%" style="display: block; margin: auto;" /> --- ## Other discontinuities? <img src="images/dunning_table.png" width="100%" style="display: block; margin: auto;" /> --- ## Summary .large[ * Regression discontinuity design provides a powerful way to use arbitrary thresholds to provide leverage for causal inference * Important to note that RDD provides a *local* treatment effect; we only learn the treatment effect for units close to the threshold * RDD visualizations can provide compelling graphical evidence of a causal effect * That said, also important to supplement visual intuition with formal statistical tests * Also good to check assumption of "as-if" randomness by checking for covariate balance ] --- class: center, middle # Ethics and data --- <br><br> <img src="images/edstem_mateo_ethics_data.png" width="604" style="display: block; margin: auto;" /> --- <br><br><br> <img src="images/edstem_camille_ethics_data.png" width="548" style="display: block; margin: auto;" /> --- ## Ethics and data - Single vs multiple / complex effects? - Linearity assumptions? - Average effects vs heterogeneous effects? --- class: middle, center # How to study diet? --- ## Linear no-threshold - "Although compelling evidence on effect of low dosage of radiation was hard to come by, by late 1940s, idea of LNT became more popular due to its mathematical simplicity." --- ## Sometimes a J-curve! 
<img src="images/Radiations_at_low_doses.gif" style="display: block; margin: auto;" /> --- class: middle, center # Average vs Heterogeneous Effects --- ## Average vs Heterogeneous Effects - https://www.sciencedirect.com/science/article/abs/pii/S0002937812020352 - https://www.bmj.com/content/356/bmj.i6583 - https://vitamin-d-covid.shotwell.ca --- class: middle, center # Questions --- ## Regression discontinuity .large[ Treatment assignment takes a simple threshold form: `$$T_i = \begin{cases} 1 & \textrm{ if } X_i \geq x_0 \\ 0 & \textrm{ if } X_i < x_0 \end{cases}$$` This leads to the regression: `$$Y_i = \beta_0 + \beta_1 X_i + \delta T_i + \epsilon_i$$` Here the coefficient `\(\delta\)` is our measure of the treatment effect. ] --- ## Regression discontinuity <img src="images/angrist_pischke_figure_linear.jpg" width="100%" style="display: block; margin: auto;" /> .small[.footnote[Source: Angrist and Pischke (2009)]] --- ## Regression discontinuity .large[ More generally, nonlinear relationship between `\(X\)` and `\(Y\)`: `$$Y_i = \color{red}{\beta_0 + \beta_1 X_i} + \delta T_i + \epsilon_i$$` ] --- ## Regression discontinuity .large[ More generally, nonlinear relationship between `\(X\)` and `\(Y\)`: `$$Y_i = {\color{red}{f(X_i)}} + \delta T_i + \epsilon_i$$` ] -- .large[ Here `\(f\)` is some nonlinear function (e.g. quadratic) estimated from the data. Again, the coefficient `\(\delta\)` is our measure of the treatment effect. ] --- ## Regression discontinuity <img src="images/angrist_pischke_figure_all.jpg" width="70%" style="display: block; margin: auto;" /> .small[.footnote[Source: Angrist and Pischke (2009)]] --- ## Regression discontinuity Important to visualize data! 
<img src="images/angrist_pischke_figure_mistaken.jpg" width="100%" style="display: block; margin: auto;" /> .small[Source: Angrist and Pischke (2009)] --- ## Alcohol & mortality <img src="images/carpenter_dobkin_alcohol_mortality_titlepage.png" width="90%" style="display: block; margin: auto;" /> --- ## Does drinking behavior change at 21? <img src="images/carpenter_dobkin_alcohol_mortality_figure1.png" width="100%" style="display: block; margin: auto;" /> ??? with self-reported survey data, helpful to look at a variety of outcomes --- ## Does drinking behavior change at 21? <img src="images/carpenter_dobkin_alcohol_mortality_figure2.png" width="80%" style="display: block; margin: auto;" /> ??? multiple outcome variables with different scales presented on same graph; left y-axis: proportion of days; right y-axis: drinks per day; both proportion-of-days metrics appear to have a more pronounced discontinuity --- ## Does mortality change at 21? <img src="images/carpenter_dobkin_alcohol_mortality_figure3.png" width="80%" style="display: block; margin: auto;" /> ??? external causes of death - car crashes, homicide, suicide; also includes alcohol-related internal causes such as fatty liver disease; internal causes of death - death from medical conditions without an identifiable external cause --- ## Does mortality change at 21? <img src="images/carpenter_dobkin_alcohol_mortality_figure4.png" width="80%" style="display: block; margin: auto;" /> ??? MVA (motor vehicle accidents) stand out --- ## Discuss: Is this a significant discontinuity? <img src="images/rdd_tstat_2p12.png" width="85%" style="display: block; margin: auto;" /> --- ## Vote: Is this a significant discontinuity? .vertical-center[ .large[ - http://pollev.com/pol346 ] ] --- <br><br><br><br> <img src="images/is-this-a-significant-discontinuity.png" width="100%" style="display: block; margin: auto;" /> ??? 
It is statistically significant. t-stat is 2.12 --- ## What is a significant discontinuity? <img src="images/jackson_tweet_rdd.png" width="100%" style="display: block; margin: auto;" /> .small[.footnote[Source: @KiraboJackson https://twitter.com/KiraboJackson/status/1074062192037847040.]] --- ## What is a significant discontinuity? - The following plot corresponds to a t-stat of 8.5 on `\(\delta\)` : <img src="images/rdd_tstat_8p5.png" width="85%" style="display: block; margin: auto;" /> .small[.footnote[Source: @KiraboJackson https://twitter.com/KiraboJackson/status/1074062192037847040.]] ??? while visualization is important, it's also important --- ## What is a significant discontinuity? <img src="images/rdd_tstat_2p3.png" width="75%" style="display: block; margin: auto;" /> -- - `\(t\)`-stat is 2.3 --- ## Incumbency advantage <img src="images/lee_title.png" width="95%" style="display: block; margin: auto;" /> --- ## Incumbency advantage .large[ * Treatment: whether US House candidate wins election at period `\(t\)` * Outcome: outcome at next election (period `\(t+1\)`) * Identifying assumption: due to random chance elements, very close elections in period `\(t\)` are essentially coin flips ] --- ## Incumbency advantage <img src="images/lee_fig.png" width="90%" style="display: block; margin: auto;" /> --- ## As-if random? <img src="images/caughey_sekhon_fig2_large.png" width="57%" style="display: block; margin: auto;" /> .small[.footnote[Source: Caughey & Sekhon (2011)]] --- ## As-if random? <img src="images/caughey_sekhon_fig2_zoom.png" width="100%" style="display: block; margin: auto;" /> .small[.footnote[Source: Caughey & Sekhon (2011)]] --- ## As-if random? .large[ > In fact, the outcomes of very close elections can be predicted with a high degree of accuracy based on such ex ante indicators as the partisanship of the previous incumbent, the financial resources of the candidates, and Congressional Quarterly’s pre-election race ratings. 
] * The election outcome in period `\(t\)` is highly correlated with pretreatment variables, suggesting that treatment is not *as-if random* around the 50% threshold * However, there are studies of elections in other contexts that don't have this problem (e.g. Titiunik (2009)) .small[.footnote[Source: Caughey & Sekhon (2011)]]
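---

## Regression discontinuity: a simulated sketch

A minimal sketch, not taken from the course materials, of the linear model `\(Y_i = \beta_0 + \beta_1 X_i + \delta T_i + \epsilon_i\)` and of a covariate balance check near the threshold. The data are simulated, and every name and parameter value here (`ols`, `simulate`, `rdd_estimate`, `balance_check`, the cutoff of 0, the true jump of 0.5, the bandwidth of 0.1) is an assumption chosen for illustration.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares: coefficients and classical standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))


def simulate(n=2000, cutoff=0.0, delta=0.5, seed=0):
    """Simulated sharp RDD data (all values are illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, n)        # running variable X_i
    t = (x >= cutoff).astype(float)      # T_i = 1 if X_i >= x_0, else 0
    z = rng.normal(0.0, 1.0, n)          # pretreatment covariate, unrelated to T_i by construction
    y = 1.0 + 0.8 * x + delta * t + rng.normal(0.0, 0.3, n)
    return x, t, z, y


def rdd_estimate(x, t, y):
    """Fit Y = b0 + b1*X + delta*T; return delta-hat, its SE, and its t-stat."""
    X = np.column_stack([np.ones_like(x), x, t])
    beta, se = ols(X, y)
    return beta[2], se[2], beta[2] / se[2]


def balance_check(x, t, z, cutoff=0.0, bandwidth=0.1):
    """Difference in a pretreatment covariate across the cutoff, within a narrow window."""
    near = np.abs(x - cutoff) <= bandwidth
    X = np.column_stack([np.ones(near.sum()), t[near]])
    beta, se = ols(X, z[near])
    return beta[1], beta[1] / se[1]


x, t, z, y = simulate()
delta_hat, delta_se, tstat = rdd_estimate(x, t, y)
gap, gap_t = balance_check(x, t, z)

print(f"estimated jump at cutoff: {delta_hat:.3f} (t-stat {tstat:.1f})")
print(f"covariate gap near cutoff: {gap:.3f} (t-stat {gap_t:.1f})")
```

A large t-stat on the covariate gap would be a warning sign that assignment near the threshold is not as-if random. To allow a nonlinear `\(f(X_i)\)`, one option is to add columns such as `x**2` to the design matrix in `rdd_estimate`.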