POL90: Statistics

class: center, middle, inverse, title-slide

# POL90: Statistics
## Regression Discontinuity Design
### Prof. Wasow (with Andrew Mack)<br> Politics<br>Pomona College
### 2022-05-02

---


# Announcements

.large[

- Assignments
  - Report 3 due today
  - Final is on Sakai
  - Report 2

- Reading:
  - Dunning *Natural Experiments in the Social Sciences*: 
    - Chapter 1 (on Sakai -> Lessons)

]

---
# Schedule

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> Week </th>
   <th style="text-align:left;"> Date </th>
   <th style="text-align:left;"> Day </th>
   <th style="text-align:left;"> Title </th>
   <th style="text-align:right;"> Chapter </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 11 </td>
   <td style="text-align:left;"> Mar 30 </td>
   <td style="text-align:left;"> Wed </td>
   <td style="text-align:left;"> Interaction terms </td>
   <td style="text-align:right;"> 9 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 12 </td>
   <td style="text-align:left;"> Apr 4 </td>
   <td style="text-align:left;"> Mon </td>
   <td style="text-align:left;"> Logistic regression </td>
   <td style="text-align:right;"> 20 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 12 </td>
   <td style="text-align:left;"> Apr 6 </td>
   <td style="text-align:left;"> Wed </td>
   <td style="text-align:left;"> Logistic regression </td>
   <td style="text-align:right;"> 20 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 13 </td>
   <td style="text-align:left;"> Apr 11 </td>
   <td style="text-align:left;"> Mon </td>
   <td style="text-align:left;"> Missing Data </td>
   <td style="text-align:right;"> Handout </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 13 </td>
   <td style="text-align:left;"> Apr 13 </td>
   <td style="text-align:left;"> Wed </td>
   <td style="text-align:left;"> Panel Data </td>
   <td style="text-align:right;"> Handout </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 14 </td>
   <td style="text-align:left;"> Apr 18 </td>
   <td style="text-align:left;"> Mon </td>
   <td style="text-align:left;"> Matching </td>
   <td style="text-align:right;"> Handout </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 14 </td>
   <td style="text-align:left;"> Apr 20 </td>
   <td style="text-align:left;"> Wed </td>
   <td style="text-align:left;"> Matching </td>
   <td style="text-align:right;"> Handout </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 15 </td>
   <td style="text-align:left;"> Apr 25 </td>
   <td style="text-align:left;"> Mon </td>
   <td style="text-align:left;"> Causal inference: Natural Experiments </td>
   <td style="text-align:right;"> Handout </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 15 </td>
   <td style="text-align:left;"> Apr 27 </td>
   <td style="text-align:left;"> Wed </td>
   <td style="text-align:left;"> Causal inference: Natural Experiments </td>
   <td style="text-align:right;"> Dunning </td>
  </tr>
  <tr>
   <td style="text-align:right;color: black !important;background-color: yellow !important;"> 16 </td>
   <td style="text-align:left;color: black !important;background-color: yellow !important;"> May 2 </td>
   <td style="text-align:left;color: black !important;background-color: yellow !important;"> Mon </td>
   <td style="text-align:left;color: black !important;background-color: yellow !important;"> Causal inference: RDD </td>
   <td style="text-align:right;color: black !important;background-color: yellow !important;"> NA </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 16 </td>
   <td style="text-align:left;"> May 4 </td>
   <td style="text-align:left;"> Wed </td>
   <td style="text-align:left;"> Review </td>
   <td style="text-align:right;"> NA </td>
  </tr>
</tbody>
</table>

---
## Assignment schedule

---
class: center, middle, inverse

# Regression discontinuity design

---

## Randomized Controlled Experiments
.large[
1. The response of experimental subjects assigned to receive a treatment is compared to the response of subjects assigned to a control group

2. The assignment of subjects to treatment and control groups is done at random, through a randomizing device such as a coin flip

3. The manipulation of the treatment--also known as the intervention--is under the control of an experimental researcher
]
---

## Natural Experiments
.large[
* Share attribute (1) of true experiments --- comparison of outcomes across treatment and control conditions

* At least partially share (2), since assignment is random or "as if" random

* Assignment not under control of researcher
]
---
## Natural Experiments
.large[
* Dunning (2012)
  + "Standard" natural experiments
  + **Regression-discontinuity designs**
  + Instrumental-variables designs

* Others?
  + Synthetic controls
  + Granger causality
]
---

## What is a regression discontinuity design?
.large[
> “Regression discontinuity designs (RDD) were first introduced by Donald L. Thistlethwaite and Donald T. Campbell (1960) as a way of estimating treatment effects in a nonexperimental setting where treatment is determined by whether an observed ‘assignment’ variable (also referred to in the literature as the ‘forcing’ variable or the ‘running’ variable) exceeds a known cutoff point.”
]

.small[.footnote[Source: Lee and Lemieux (2010)]]

---

## Regression discontinuity
.large[
* Simple version
  + Continuous `$X$`
  + Discontinuous relation with `$Y$` due to cutoff rule

* Cutoff typically follows some arbitrary bureaucratic rule
  + Elections:
    - winner at 50.1% vs loser at 49.9%
  + Academics:
    - Grade cutoff for school admission or scholarship
]

---
## Discuss: What explains this plot?
<img src="images/carrell_hoekstra_west_drinking_college_figure3_1_no_caption.png" width="85%" style="display: block; margin: auto;" />

---

## Alcohol & college performance

<img src="images/carrell_hoekstra_west_drinking_college_titlepage.png" width="100%" style="display: block; margin: auto;" />
---

## Alcohol & college performance
They estimate a model of the form:

`$$Grade_i = f(Age_i) + \delta(OlderThan21_i) + \epsilon_i$$`
---

## Alcohol & college performance
<img src="images/carrell_hoekstra_west_drinking_college_figure3_1.png" width="80%" style="display: block; margin: auto;" />

---
## Did George Floyd protests influence public opinion?
<br>

---

## Other discontinuities?

---

## Summary

.large[ 
* Regression discontinuity design provides a powerful way to use arbitrary thresholds to provide leverage for causal inference

* Important to note that RDD provides a *local* treatment effect; we only learn the treatment effect for units close to the threshold

* RDD visualizations can provide compelling graphical evidence of a causal effect

* That said, also important to supplement visual intuition with formal statistical tests

* Also good to check assumption of "as-if" randomness by checking for covariate balance
]

---
class: center, middle

# Ethics and data

---
<br><br>
<img src="images/edstem_mateo_ethics_data.png" width="604" style="display: block; margin: auto;" />

---
## Ethics and data

- Single vs multiple / complex effects?

- Linearity assumptions?

- Average effects vs heterogeneous effects?

---
class: middle, center

# How to study diet?

---
## Linear no-threshold

- "Although compelling evidence on effect of low dosage of radiation was hard to come by, by late 1940s, idea of LNT became more popular due to its mathematical simplicity."

---
## Sometimes a J-curve!

---
class: middle, center

# Average vs Heterogeneous Effects

---
## Average vs Heterogeneous Effects

- https://www.sciencedirect.com/science/article/abs/pii/S0002937812020352

- https://www.bmj.com/content/356/bmj.i6583

- https://vitamin-d-covid.shotwell.ca

---
class: middle, center

# Questions

---

## Regression discontinuity
.large[
Treatment assignment takes simple threshold form:
`$$T_i = \begin{cases} 1 & \textrm{ if } X_i \geq x_0 \\
0 & \textrm{ if } X_i < x_0
\end{cases}$$`

This leads to regression:
`$$Y_i = \beta_0 + \beta_1 X_i + \delta T_i + \epsilon_i$$`
Here the coefficient `$\delta$` is our measure of the treatment effect
]
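---

## Regression discontinuity: simulated sketch

A minimal Python sketch of the regression above, run on simulated data rather than data from any study cited in these slides. The cutoff `x0 = 0`, the true jump of 2, the noise level, and the helper `ols` are all assumptions made for illustration; `ols` solves the normal equations directly so the example needs only the standard library.

```python
import random

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

rng = random.Random(90)            # fixed seed so the sketch is reproducible
n, x0, true_delta = 500, 0.0, 2.0  # hypothetical sample size, cutoff, jump

# Running variable X on an even grid over [-1, 1]
x = [-1 + 2 * (i + 0.5) / n for i in range(n)]
# Threshold rule: T_i = 1 if X_i >= x0, else 0
t = [1.0 if xi >= x0 else 0.0 for xi in x]
# Outcome: Y_i = beta_0 + beta_1 X_i + delta T_i + eps_i
y = [1.0 + 0.5 * xi + true_delta * ti + rng.gauss(0, 0.5)
     for xi, ti in zip(x, t)]

# Regress Y on a constant, X, and T; the last coefficient is delta
beta0, beta1, delta = ols([[1.0, xi, ti] for xi, ti in zip(x, t)], y)
print(f"estimated delta = {delta:.2f} (true value {true_delta})")
```

Because the simulated `$f$` really is linear here, the estimate of `$\delta$` should land close to the true jump; that is a property of this simulation, not a guarantee for real data.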
---

## Regression discontinuity
<img src="images/angrist_pischke_figure_linear.jpg" width="100%" style="display: block; margin: auto;" />
.small[.footnote[Source: Angrist and Pischke (2009)]]

---

## Regression discontinuity
.large[
More generally, nonlinear relationship between `$X$` and `$Y$`:
`$$Y_i =  \color{red}{\beta_0 + \beta_1 X_i} + \delta T_i + \epsilon_i$$`
]
---

## Regression discontinuity
.large[
More generally, nonlinear relationship between `$X$` and `$Y$`:
`$$Y_i =  {\color{red}{f(X_i)}} + \delta T_i + \epsilon_i$$`
]
--
.large[
Here `$f$` is some nonlinear function (e.g. quadratic) estimated from the data.

Again, the coefficient `$\delta$` is our measure of the treatment effect.
]
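---

## Regression discontinuity: why the form of `$f$` matters

A hedged sketch, again on simulated data, of what can go wrong when a curved `$f$` is forced to be linear. Everything here is an assumption chosen for illustration: the running variable lies on `[0, 3]`, the cutoff is 1, the true `$f$` is `x**2`, and the true jump is 1. The helper `ols` is a small standard-library stand-in for a regression routine.

```python
import random

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

rng = random.Random(90)
n, cutoff, true_delta = 600, 1.0, 1.0   # hypothetical values

x = [3 * (i + 0.5) / n for i in range(n)]           # even grid on [0, 3]
t = [1.0 if xi >= cutoff else 0.0 for xi in x]      # threshold rule
y = [xi ** 2 + true_delta * ti + rng.gauss(0, 0.2)  # curved f plus a jump
     for xi, ti in zip(x, t)]

# Specification 1: f(X) forced to be linear (misspecified for this data)
delta_linear = ols([[1.0, xi, ti] for xi, ti in zip(x, t)], y)[-1]
# Specification 2: f(X) quadratic (matches how the data were simulated)
delta_quad = ols([[1.0, xi, xi ** 2, ti] for xi, ti in zip(x, t)], y)[-1]

print(f"true delta      = {true_delta}")
print(f"linear f(X)     = {delta_linear:.2f}")
print(f"quadratic f(X)  = {delta_quad:.2f}")
```

In this simulation the linear specification pushes part of the curvature into `$\delta$`, so its estimate is pulled well away from the true jump, while the quadratic specification recovers it. With real data the true `$f$` is unknown, which is one reason to plot the data before trusting any single specification.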
---

## Regression discontinuity
<img src="images/angrist_pischke_figure_all.jpg" width="70%" style="display: block; margin: auto;" />

.small[.footnote[Source: Angrist and Pischke (2009)]]

---

## Regression discontinuity
Important to visualize data!

<img src="images/angrist_pischke_figure_mistaken.jpg" width="100%" style="display: block; margin: auto;" />
.small[Source: Angrist and Pischke (2009)]

---
## Alcohol & mortality
<img src="images/carpenter_dobkin_alcohol_mortality_titlepage.png" alt="some text"  width="700" height="420">

---

## Does drinking behavior change at 21?

<img src="images/carpenter_dobkin_alcohol_mortality_figure1.png" width="100%" style="display: block; margin: auto;" />
???
with self-reported survey data, helpful to look at a variety of outcomes

---

## Does drinking behavior change at 21?

<img src="images/carpenter_dobkin_alcohol_mortality_figure2.png" width="80%" style="display: block; margin: auto;" />
???
multiple outcome variables with different scales presented on same graph

left y-axis: proportion of days
right y-axis: drinks per day

both proportion of days metrics appear to have more pronounced discontinuity

---

## Does mortality change at 21?
<img src="images/carpenter_dobkin_alcohol_mortality_figure3.png" width="80%" style="display: block; margin: auto;" />

???
external causes of death - car crashes, homicide, suicide, also includes alcohol-related internal causes such as fatty liver disease

internal causes of death - death from medical conditions without an identifiable external cause

---

## Does mortality change at 21?
<img src="images/carpenter_dobkin_alcohol_mortality_figure4.png" width="80%" style="display: block; margin: auto;" />

???
MVA - motor vehicle accidents stand out
---

## Discuss: Is this a significant discontinuity?
<img src="images/rdd_tstat_2p12.png" width="85%" style="display: block; margin: auto;" />
---

## Vote: Is this a significant discontinuity?

.vertical-center[
.large[
- http://pollev.com/pol346
]
]

---
<br><br><br><br>
<img src="images/is-this-a-significant-discontinuity.png" width="100%" style="display: block; margin: auto;" />

???
It is statistically significant. t-stat is 2.12

---
## What is a significant discontinuity?

.small[.footnote[Source: @KiraboJackson https://twitter.com/KiraboJackson/status/1074062192037847040.]]

---

## What is a significant discontinuity?
- The following plot corresponds to a `$t$`-stat of 8.5 on `$\delta$`:
<img src="images/rdd_tstat_8p5.png" width="85%" style="display: block; margin: auto;" />

.small[.footnote[Source: @KiraboJackson https://twitter.com/KiraboJackson/status/1074062192037847040.]]

???
while visualization is important, it's also important to supplement visual intuition with formal statistical tests
---

## What is a significant discontinuity?
<img src="images/rdd_tstat_2p3.png" width="75%" style="display: block; margin: auto;" />
--

- `$t$`-stat is 2.3
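
---

## What is a significant discontinuity? A sketch

One way to see where a `$t$`-stat on `$\delta$` comes from, sketched on simulated data. The assumptions are loud ones: a linear `$f$`, a cutoff at 0, a small true jump of 0.4, noisy outcomes, and classical (homoskedastic) standard errors. The helper `residualize` is hypothetical; it uses the Frisch-Waugh-Lovell result that `$\delta$` from the full regression equals the slope from regressing residualized `$Y$` on residualized `$T$`.

```python
import math
import random

def residualize(v, x):
    """Residuals from a simple regression of v on x (with intercept)."""
    n = len(v)
    mv, mx = sum(v) / n, sum(x) / n
    slope = (sum((xi - mx) * (vi - mv) for xi, vi in zip(x, v))
             / sum((xi - mx) ** 2 for xi in x))
    return [vi - mv - slope * (xi - mx) for xi, vi in zip(x, v)]

rng = random.Random(90)
n, x0 = 400, 0.0                                   # hypothetical values

x = [-1 + 2 * (i + 0.5) / n for i in range(n)]     # running variable
t = [1.0 if xi >= x0 else 0.0 for xi in x]         # threshold rule
y = [0.5 * xi + 0.4 * ti + rng.gauss(0, 1.0)       # noisy outcome
     for xi, ti in zip(x, t)]

# Partial X out of both Y and T, then regress residual on residual
y_r, t_r = residualize(y, x), residualize(t, x)
ss_t = sum(tr ** 2 for tr in t_r)
delta = sum(tr * yr for tr, yr in zip(t_r, y_r)) / ss_t

# Residual variance uses n - 3 degrees of freedom (intercept, X, T)
rss = sum((yr - delta * tr) ** 2 for yr, tr in zip(y_r, t_r))
se = math.sqrt(rss / (n - 3) / ss_t)
t_stat = delta / se

print(f"delta = {delta:.3f}, se = {se:.3f}, t = {t_stat:.2f}")
print("exceeds 1.96 in absolute value:", abs(t_stat) > 1.96)
```

The standard error is large relative to the jump because `$T$` is highly correlated with `$X$`, leaving little independent variation in treatment. That is part of why a discontinuity can be statistically detectable yet hard to see by eye, or the reverse.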

---

## Incumbency advantage
<img src="images/lee_title.png" width="95%" style="display: block; margin: auto;" />

---

## Incumbency advantage
.large[

* Treatment: whether US House candidate wins election at period `$t$`

* Outcome: electoral result at the next election (period `$t+1$`)

* Identifying assumption: because of chance elements, very close elections in period `$t$` are essentially coin flips

]
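---

## Incumbency advantage: a simulated sketch

To make the setup concrete, here is a sketch that compares bare winners with bare losers on simulated elections. It is not the specification or the data of the paper on the title slide. The margin grid, the bandwidth `h`, the jump of 0.08, and the helper `fit_line` are all hypothetical. It fits a separate line on each side of the threshold within the bandwidth and measures the gap between the two lines at a margin of zero.

```python
import random

def fit_line(x, y):
    """Intercept and slope of a simple least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

rng = random.Random(90)
n, true_jump, h = 1000, 0.08, 0.10   # all values hypothetical

# Running variable: margin of victory at period t (win if margin > 0)
margin = [-0.5 + (i + 0.5) / n for i in range(n)]
# Simulated outcome: vote share at period t+1
share_next = [0.45 + 0.4 * m + (true_jump if m > 0 else 0.0)
              + rng.gauss(0, 0.03) for m in margin]

# Keep only close elections, within bandwidth h of the threshold
losers = [(m, s) for m, s in zip(margin, share_next) if -h <= m < 0]
winners = [(m, s) for m, s in zip(margin, share_next) if 0 < m <= h]

# Fit one line per side; each intercept is the fitted value at margin = 0
a_lose, _ = fit_line(*zip(*losers))
a_win, _ = fit_line(*zip(*winners))
jump = a_win - a_lose

print(f"close losers: {len(losers)}, close winners: {len(winners)}")
print(f"estimated jump at the threshold = {jump:.3f} (true {true_jump})")
```

Narrowing `h` keeps the comparison closer to the coin-flip ideal but leaves fewer elections, so the estimate gets noisier; widening it does the opposite.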
---

## Incumbency advantage
<img src="images/lee_fig.png" width="90%" style="display: block; margin: auto;" />

---

## As-if random?
<img src="images/caughey_sekhon_fig2_large.png" width="57%" style="display: block; margin: auto;" />

.small[.footnote[Source: Caughey & Sekhon (2011)]]
---

## As-if random?
<img src="images/caughey_sekhon_fig2_zoom.png" width="100%" style="display: block; margin: auto;" />

.small[.footnote[Source: Caughey & Sekhon (2011)]]
---

## As-if random?
.large[
> In fact, the outcomes of very close elections can
be predicted with a high degree of accuracy based on such ex ante indicators as the partisanship of the
previous incumbent, the financial resources of the candidates, and Congressional Quarterly’s pre-election
race ratings.
]

* The election outcome in period `$t$` is highly correlated with pretreatment variables, suggesting that treatment is not *as-if random* around the 50% threshold

* However, there are studies of elections in other contexts that don't have this problem (e.g. Titiunik (2009))

.small[.footnote[Source: Caughey & Sekhon (2011)]]
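
---

## As-if random? A balance-check sketch

A simple way to probe as-if randomness is to compare a pretreatment covariate for units just above and just below the threshold. The sketch below does this on simulated close elections with a Welch `$t$`-statistic. The covariate, the sample size, and the rates 0.5, 0.7 and 0.3 are hypothetical and are not figures from Caughey & Sekhon (2011).

```python
import math
import random
from statistics import mean, variance

def welch_t(a, b):
    """Welch t-statistic for the difference in means between two groups."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def draw(rate, n, rng):
    """Simulate n binary covariate values that equal 1 with probability rate."""
    return [1.0 if rng.random() < rate else 0.0 for _ in range(n)]

rng = random.Random(90)
n = 300  # simulated close elections on each side of the threshold

# Pretreatment covariate: 1 if the candidate's party already held the seat.
# Scenario A (as-if random): same rate among bare winners and bare losers.
t_a = welch_t(draw(0.5, n, rng), draw(0.5, n, rng))
# Scenario B (sorting): bare winners come more often from the incumbent party.
t_b = welch_t(draw(0.7, n, rng), draw(0.3, n, rng))

print(f"scenario A (balanced): t = {t_a:.2f}")
print(f"scenario B (sorting):  t = {t_b:.2f}")
```

A large `$t$`-statistic on a covariate fixed before the election, as in scenario B, is a warning sign that winning is not as-if random near the threshold. A small one does not prove randomness, since other covariates may still be imbalanced.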