POL90: Statistics

class: center, middle, inverse, title-slide

# POL90: Statistics
## Confidence Intervals
### Prof Wasow<br/>Assistant Professor, Politics<br/>Pomona College
### 2021-02-15

---

# Announcements

.large[
* Assignments
    + PS03 due <mark>Friday, 2/11</mark>
    + Report 1

]

---
# Schedule

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> Week </th>
   <th style="text-align:left;"> Date </th>
   <th style="text-align:left;"> Day </th>
   <th style="text-align:left;"> Title </th>
   <th style="text-align:right;"> Chapter </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:left;"> Jan 26 </td>
   <td style="text-align:left;"> Wed </td>
   <td style="text-align:left;"> Drawing Statistical Conclusions </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:left;"> Jan 31 </td>
   <td style="text-align:left;"> Mon </td>
   <td style="text-align:left;"> Inference Using t-Distributions </td>
   <td style="text-align:right;"> 2 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:left;"> Feb 2 </td>
   <td style="text-align:left;"> Wed </td>
   <td style="text-align:left;"> Inference Using t-Distributions </td>
   <td style="text-align:right;"> 2 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:left;"> Feb 7 </td>
   <td style="text-align:left;"> Mon </td>
   <td style="text-align:left;"> A Closer Look at Assumptions </td>
   <td style="text-align:right;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:right;color: black !important;background-color: yellow !important;"> 4 </td>
   <td style="text-align:left;color: black !important;background-color: yellow !important;"> Feb 9 </td>
   <td style="text-align:left;color: black !important;background-color: yellow !important;"> Wed </td>
   <td style="text-align:left;color: black !important;background-color: yellow !important;"> A Closer Look at Assumptions </td>
   <td style="text-align:right;color: black !important;background-color: yellow !important;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 5 </td>
   <td style="text-align:left;"> Feb 14 </td>
   <td style="text-align:left;"> Mon </td>
   <td style="text-align:left;"> Alternatives to the t-Tools </td>
   <td style="text-align:right;"> 4 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 5 </td>
   <td style="text-align:left;"> Feb 16 </td>
   <td style="text-align:left;"> Wed </td>
   <td style="text-align:left;"> Alternatives to the t-Tools </td>
   <td style="text-align:right;"> 4 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 6 </td>
   <td style="text-align:left;"> Feb 21 </td>
   <td style="text-align:left;"> Mon </td>
   <td style="text-align:left;"> Comparison Among Several Samples </td>
   <td style="text-align:right;"> 5 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 6 </td>
   <td style="text-align:left;"> Feb 23 </td>
   <td style="text-align:left;"> Wed </td>
   <td style="text-align:left;"> Comparison Among Several Samples </td>
   <td style="text-align:right;"> 5 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 7 </td>
   <td style="text-align:left;"> Feb 28 </td>
   <td style="text-align:left;"> Mon </td>
   <td style="text-align:left;"> Simple Linear Regression </td>
   <td style="text-align:right;"> 7 </td>
  </tr>
</tbody>
</table>

---
## Assignment schedule

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> Week </th>
   <th style="text-align:left;"> Date </th>
   <th style="text-align:left;"> Day </th>
   <th style="text-align:left;"> Assignment </th>
   <th style="text-align:right;"> Percent </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:left;"> Feb 4 </td>
   <td style="text-align:left;"> Fri </td>
   <td style="text-align:left;"> PS02 </td>
   <td style="text-align:right;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:right;color: black !important;background-color: yellow !important;"> 4 </td>
   <td style="text-align:left;color: black !important;background-color: yellow !important;"> Feb 11 </td>
   <td style="text-align:left;color: black !important;background-color: yellow !important;"> Fri </td>
   <td style="text-align:left;color: black !important;background-color: yellow !important;"> PS03 </td>
   <td style="text-align:right;color: black !important;background-color: yellow !important;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 5 </td>
   <td style="text-align:left;"> Feb 18 </td>
   <td style="text-align:left;"> Fri </td>
   <td style="text-align:left;"> PS04 </td>
   <td style="text-align:right;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 6 </td>
   <td style="text-align:left;"> Feb 25 </td>
   <td style="text-align:left;"> Fri </td>
   <td style="text-align:left;"> PS05 </td>
   <td style="text-align:right;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 7 </td>
   <td style="text-align:left;"> Mar 4 </td>
   <td style="text-align:left;"> Fri </td>
   <td style="text-align:left;"> Report1 </td>
   <td style="text-align:right;"> 6 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 8 </td>
   <td style="text-align:left;"> Mar 11 </td>
   <td style="text-align:left;"> Fri </td>
   <td style="text-align:left;"> PS06 </td>
   <td style="text-align:right;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 9 </td>
   <td style="text-align:left;"> Mar 18 </td>
   <td style="text-align:left;"> Fri </td>
   <td style="text-align:left;"> Spring break </td>
   <td style="text-align:right;"> NA </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 10 </td>
   <td style="text-align:left;"> Mar 25 </td>
   <td style="text-align:left;"> Fri </td>
   <td style="text-align:left;"> PS07 </td>
   <td style="text-align:right;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 11 </td>
   <td style="text-align:left;"> Apr 1 </td>
   <td style="text-align:left;"> Fri </td>
   <td style="text-align:left;"> PS08 </td>
   <td style="text-align:right;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 12 </td>
   <td style="text-align:left;"> Apr 8 </td>
   <td style="text-align:left;"> Fri </td>
   <td style="text-align:left;"> Report2 </td>
   <td style="text-align:right;"> 8 </td>
  </tr>
</tbody>
</table>

---

class: center, middle, inverse

# Report 1

---
class: center
## Report 1: Test a theory, elites vs masses

.pull-left[<img src="images/Zaller_James_big_square.jpg" alt="drawing" style="width:200px;"/><img src="images/lenz_gabriel2.jpg" alt="drawing" style="width:200px;"/>]

.pull-right[<img src="images/lee_taeku_big.jpg" alt="drawing" style="width:200px;"/><img src="images/erica-chenoweth-maria-stephan-nsc-briefing.jpg" alt="drawing" style="width:200px;"/><img src="images/Daniel-Gillion3.jpg" alt="drawing" style="width:200px;"/>]

---
class: center
## Report 1:  Engage with texts, not people

.pull-left[<img src="images/zaller_book_cover.jpg" alt="drawing" style="width:150px;"/>

<img src="images/lenz_book_cover.jpg" alt="drawing" style="width:150px;"/>]

.pull-right[<img src="images/lee_book_cover.jpg" alt="drawing" style="width:100px;"/>

<img src="images/gillion_book_cover.jpg" alt="drawing" style="width:100px;"/>]

---
## Pew: US opinion on same-sex marriage, 2001-2017

.footnote[Source: http://www.pewforum.org/fact-sheet/changing-attitudes-on-gay-marriage/]

---

## Report 1: Goals

.large[
* Use data to test theories
    - A contest or horserace of theories
        - "Three-cornered fight" or a court proceeding
]

.large[
* Use data as rhetoric
    - Report a statistical test
    - Summarize data in a table
    - Convey trends and relationships with visualization
]
--

.large[
* Produce replicable research
    - See how "literate programming" tools like R + R Markdown + LaTeX + knitr contribute to replication
    - Practice good programming and statistical style
]

---
## Report Questions & Suggestions

.large[

- Plan to use iPoll
  - Register on iPoll; it makes downloading many polls much easier

- You will need to do some data cleaning; that's part of the assignment
  - See Data Scrubbing handout: http://appliedstats.org/data_scrub_handout.html

- Collaboration may be easier with RStudio.cloud. See link on Canvas & handout: http://appliedstats.org/rstudio_cloud_guide.html

]

---
class: middle, inverse, center
# Sampling Distribution 
# of the Sample Average

---
## Sampling Distribution of the Sample Average

.footnote[Source: Statistical Sleuth 3e, Display 2.3]

---
## Population 
<img src="images/ss_display_2_3_fished1.png" width="60%" style="display: block; margin: auto;" />

.footnote[Source: Statistical Sleuth 3e, Display 2.3]

---
## Population vs Sample 
<img src="images/ss_display_2_3_fished2.png" width="60%" style="display: block; margin: auto;" />

.footnote[Source: Statistical Sleuth 3e, Display 2.3]

---
## Population vs Sample vs Sampling distribution
<img src="images/ss_display_2_3_fished3.png" width="60%" style="display: block; margin: auto;" />

.footnote[Source: Statistical Sleuth 3e, Display 2.3]
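
---
## Simulating a Sampling Distribution

The three panels above can be imitated with a short simulation. This is only a sketch: the population below is hypothetical (Normal, with a mean and spread chosen to resemble the twins data used later), and the seed and number of repetitions are arbitrary.

```r
set.seed(90)  # arbitrary seed so the sketch is reproducible

# Hypothetical population: 10,000 values with mean near 0.2 and sd near 0.24
population <- rnorm(10000, mean = 0.2, sd = 0.24)

n <- 15  # sample size, matching the 15 twin pairs used later

# Draw 5,000 samples of size n and record each sample average
sample_means <- replicate(5000, mean(sample(population, size = n)))

# The sampling distribution is centered on the population mean ...
mean(population)
mean(sample_means)

# ... and its spread is close to sd(population) / sqrt(n)
sd(population) / sqrt(n)
sd(sample_means)
```

A histogram of `sample_means` (for example with `hist(sample_means)`) should look much narrower than a histogram of `population`.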

---
class: center, middle, inverse

# Confidence Intervals

---

.footnote[Source: https://twitter.com/EpiEllie/status/1073385394580979712]

---

.footnote[Source: https://twitter.com/EpiEllie/status/1073385412993929217]

---

.footnote[Source: https://twitter.com/EpiEllie/status/1073385427317465089]

---
## Think 95% Interval not 95% Confidence

.large[
- Issue is partly wording

- Would you rather be 95% confident or 99% confident? 
  
  - Why not 100%?
  
  - Consider interval for heights

]

---
## Think 95% Interval not 95% Confidence

.large[
- With 95% Confidence Interval, 95% modifies interval, not confidence
  
  - Think: Interval *of* 95% 
  
  - Across repeated samples, a 95% Confidence Interval should capture the true value 95% of the time
  
  - The wider the interval, the more likely it is to capture the true value
]
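---
## Checking "95% of the time" by Simulation

The claim that a 95% interval captures the true value 95% of the time can be checked when the truth is known. The sketch below assumes a hypothetical Normal population with mean 0.2 and standard deviation 0.24; `covers()` is an illustrative helper, not part of any package.

```r
set.seed(90)  # arbitrary seed so the sketch is reproducible

true_mean <- 0.2  # hypothetical; known only because we are simulating
n <- 15

# Does the interval from one simulated sample contain the true mean?
covers <- function(level) {
  y  <- rnorm(n, mean = true_mean, sd = 0.24)
  ci <- t.test(y, conf.level = level)$conf.int
  ci[1] < true_mean && true_mean < ci[2]
}

# Share of 2,000 simulated intervals that capture the true mean
coverage_95 <- mean(replicate(2000, covers(0.95)))
coverage_99 <- mean(replicate(2000, covers(0.99)))

coverage_95  # should be close to 0.95
coverage_99  # should be close to 0.99 (wider intervals capture more often)
```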
---

.footnote[Source: https://twitter.com/EpiEllie/status/1073385445835329536]
---
class: center, middle, inverse

# Confidence Intervals
# Analytically

---
## What Are Plausible Values for `$\mu$` (Given the Data)?

`$$t_{df}(0.025)  < \frac{(\bar{Y} - \mu)}{\textrm{SE}(\bar{Y})} < t_{df}(0.975)$$`

- Looks similar to `$t$`-ratio equation...

$$
t\text{-ratio} = \dfrac{\text{Estimate}-\text{Parameter}}{\text{SE(Estimate)}}
$$

$$
t\text{-ratio}\ (\text{if } \mu \text{ is zero}) = \dfrac{0.199-0}{0.0615} = 3.236
$$
--

- ...but with a Confidence Interval, we're solving for `$\mu$` 
  
  - in the `$t$`-ratio equation, we're solving for the `$t$`-ratio (and `$\mu$` is set to zero)

---
## What Are Plausible Values for `$\mu$` (Given the Data)?

`$$t_{df}(0.025)  < \frac{(\bar{Y} - \mu)}{\textrm{SE}(\bar{Y})} < t_{df}(0.975)$$`

`$$t_{14}(0.025)  <  \frac{(0.199 - \mu)}{0.0615} < t_{14}(0.975)$$`

`$$t_{14}(0.025) \cdot 0.0615 < (0.199 - \mu) < t_{14}(0.975) \cdot 0.0615$$`

$$ -2.145 \cdot 0.0615  < (0.199 - \mu) < +2.145 \cdot 0.0615 $$

$$ -0.199 -2.145 \cdot 0.0615 <  - \mu < -0.199 +2.145 \cdot 0.0615 $$

$$ -0.331  < - \mu < -0.067 $$

$$ +0.331  > + \mu > +0.067 $$

$$ 0.067 < \quad \mu < 0.331 $$
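
---
## The Same Steps in General Form

The numeric steps above can be written once in symbols. This is a sketch in the notation of the previous slides; it relies on the symmetry of the `$t$`-distribution, `$t_{df}(0.025) = -t_{df}(0.975)$`.

$$ -t_{df}(0.975) \cdot \textrm{SE}(\bar{Y}) < (\bar{Y} - \mu) < +t_{df}(0.975) \cdot \textrm{SE}(\bar{Y}) $$

$$ \bar{Y} - t_{df}(0.975) \cdot \textrm{SE}(\bar{Y}) < \mu < \bar{Y} + t_{df}(0.975) \cdot \textrm{SE}(\bar{Y}) $$

With `$\bar{Y} = 0.199$`, `$\textrm{SE}(\bar{Y}) = 0.0615$` and `$t_{14}(0.975) = 2.145$`, this gives `$0.067 < \mu < 0.331$`, as before.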

---
## Intuition about the Confidence Interval formula

.large[
- Looking at each term:

$$ +0.199 -2.145 \cdot 0.0615 <  + \mu < +0.199 +2.145 \cdot 0.0615 $$
]

.large[

- Three terms

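- +0.199 = Central tendency (the sample mean, `$\bar{Y}$`)

- `$\pm$` 2.145 = Multiplier: the 0.025 and 0.975 quantiles of the `$t$`-distribution with 14 df, roughly two standard errors either side

- 0.0615 = Scaling term (the standard error, `$\textrm{SE}(\bar{Y})$`)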
]

---
## What are `$t_{14}(0.025)$` and `$t_{14}(0.975)$` in `R`?

- Now we use `qt()`: we give it a probability and it returns the corresponding quantile.

```r
# quantile for p = 0.025 in t-dist with 14 df
qt(p = 0.025, df = 14)
```

```
[1] -2.144787
```

```r
# quantile for p = 0.975 in t-dist with 14 df
qt(p = 0.975, df = 14)
```

```
[1] 2.144787
```
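
---
## From `qt()` Straight to the Interval

Because `qt()` accepts a vector of probabilities, both bounds can be computed in one step. A minimal sketch using the rounded estimate and standard error from the earlier slides; the variable names are illustrative.

```r
estimate <- 0.199   # sample mean of the differences, from the slides
se       <- 0.0615  # standard error of the mean, from the slides
df       <- 14

# Both multipliers at once: qt() is vectorized over p
multipliers <- qt(p = c(0.025, 0.975), df = df)

# Lower and upper bounds of the 95% interval
ci <- estimate + multipliers * se
ci
```

The result should be roughly 0.067 and 0.331, matching the derivation.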

---
## How does `$t_{14}$` Compare to Other Distributions?

```r
# quantile for p = 0.025 in t-dist with 14 df
qt(p = 0.025, df = 14)
```

```
[1] -2.144787
```

```r
qt(p = 0.025, df = 60)
```

```
[1] -2.000298
```

```r
qt(p = 0.025, df = 100)
```

```
[1] -1.983972
```

```r
# quantile for p = 0.025 in Normal distribution
qnorm(p = 0.025, mean = 0, sd = 1)
```

```
[1] -1.959964
```
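
---
## Watching `$t$` Approach the Normal

The comparison above can be tabulated for several degrees of freedom at once. A small sketch; the particular df values are chosen only for illustration.

```r
dfs <- c(2, 5, 14, 30, 60, 100, 1000)

# Upper 0.975 quantile for each df, alongside the Normal value
comparison <- data.frame(
  df         = dfs,
  t_quantile = qt(p = 0.975, df = dfs),
  normal     = qnorm(p = 0.975)
)

comparison
```

The `t_quantile` column shrinks toward the `normal` column as df grows, so intervals based on small samples are wider for the same standard error.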

---

## Student's *t*-Distribution on 14 d.f.

.footnote[Source: Statistical Sleuth 3e, Display 2.5]

---
## Calculating Confidence Interval Bounds

```r
# left 95% CI
+0.199 -2.145 * 0.0615
```

```
[1] 0.0670825
```

```r
# right 95% CI
+0.199 +2.145 * 0.0615
```

```
[1] 0.3309175
```
---
## Loading `twins` data

```r
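library(dplyr)  # provides %>%, mutate(), select(), row_number()

# Load case0202 from the Sleuth3 package and standardize the column names
twins <- Sleuth3::case0202 %>% 
  janitor::clean_names()

# Keep one difference (unaffected minus affected) per twin pair
twins <- twins %>% 
  mutate(
    difference = unaffected - affected,
    pair       = paste("Pair", row_number())
    ) %>% 
  select(pair, difference)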

head(twins)
```

```
    pair difference
1 Pair 1       0.67
2 Pair 2      -0.19
3 Pair 3       0.09
4 Pair 4       0.19
5 Pair 5       0.13
6 Pair 6       0.40
```

---
## Checking Confidence Interval in `t.test`

```r
t.test(twins$difference)
```

```

One Sample t-test

data:  twins$difference
t = 3.2289, df = 14, p-value = 0.006062
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
* 0.0667041 0.3306292
sample estimates:
mean of x 
0.1986667 
```
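
---
## Rebuilding the `t.test` Interval by Hand

The interval reported by `t.test()` can be rebuilt from the mean, the standard error, and `qt()`. To keep this sketch self-contained, the differences are typed in from the output printed on these slides. One value is an assumption: Pair 9 is never printed, so 0.50 is inferred from the reported mean of 0.1986667.

```r
# Differences for the 15 twin pairs, as printed on these slides.
# ASSUMPTION: Pair 9 (0.50) is never printed; it is inferred from the reported mean.
difference <- c(0.67, -0.19, 0.09, 0.19, 0.13, 0.40, 0.04, 0.10,
                0.50, 0.07, 0.23, 0.59, 0.02, 0.03, 0.11)

n        <- length(difference)
estimate <- mean(difference)
se       <- sd(difference) / sqrt(n)  # standard error of the mean

# Estimate plus/minus the t multiplier times the standard error
ci_by_hand <- estimate + qt(c(0.025, 0.975), df = n - 1) * se
ci_by_hand

# Compare with the interval that t.test() reports
ci_t_test <- t.test(difference)$conf.int
ci_t_test
```

With the `twins` data frame loaded, `twins$difference` can replace the hard-coded vector.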

---
class: center, middle, inverse

# Bootstrapping Confidence Intervals

---
## Recall populations & samples
.center[![](images/statistics1e_figun_03_p162.jpg)]

---

## Sampling Distribution of the Sample Average

.footnote[Source: Statistical Sleuth 3e, Display 2.3]

---

```r
twins_sample_as_pop <- rbind(twins, twins, twins, twins, twins, 
                 twins, twins, twins, twins, twins) 
dim(twins_sample_as_pop)
```

```
[1] 150   2
```

```r
twins_sample_as_pop %>% 
  sample_n(15)
```

```
      pair difference
1   Pair 7       0.04
2   Pair 6       0.40
3  Pair 13       0.02
4   Pair 3       0.09
5  Pair 15       0.11
6   Pair 6       0.40
7   Pair 2      -0.19
8  Pair 11       0.23
9  Pair 11       0.23
10  Pair 8       0.10
11 Pair 13       0.02
12  Pair 4       0.19
13 Pair 10       0.07
14 Pair 10       0.07
15  Pair 6       0.40
```

---

```r
twins_sample_as_pop <- rbind(twins, twins, twins, twins, twins, 
                 twins, twins, twins, twins, twins) 
dim(twins_sample_as_pop)
```

```
[1] 150   2
```

```r
twins_sample_as_pop %>% 
  sample_n(15)
```

```
      pair difference
1  Pair 15       0.11
2   Pair 1       0.67
3  Pair 12       0.59
4  Pair 12       0.59
5   Pair 3       0.09
6   Pair 2      -0.19
7  Pair 14       0.03
8   Pair 6       0.40
9   Pair 1       0.67
10 Pair 14       0.03
11  Pair 1       0.67
12  Pair 5       0.13
13  Pair 2      -0.19
14  Pair 8       0.10
15  Pair 3       0.09
```

---

```r
twins_sample_as_pop <- rbind(twins, twins, twins, twins, twins, 
                 twins, twins, twins, twins, twins) 
dim(twins_sample_as_pop)
```

```
[1] 150   2
```

```r
twins_sample_as_pop %>% 
  sample_n(15)
```

```
      pair difference
1  Pair 15       0.11
2  Pair 15       0.11
3   Pair 2      -0.19
4  Pair 10       0.07
5   Pair 6       0.40
6   Pair 2      -0.19
7  Pair 13       0.02
8   Pair 7       0.04
9  Pair 13       0.02
10  Pair 1       0.67
11 Pair 13       0.02
12  Pair 7       0.04
13 Pair 14       0.03
14 Pair 11       0.23
15  Pair 2      -0.19
```

---
class: middle, center
background-color: #000000

.footnote[Source: http://www.lock5stat.com/videos.html]

---
## Bootstrapping the Sampling Distribution

.footnote[Source: http://www.lock5stat.com/StatKey/bootstrap_1_quant/bootstrap_1_quant.html]
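
---
## A Percentile Bootstrap Interval in R

The earlier slides approximate resampling by stacking ten copies of `twins` and drawing 15 rows. The sketch below swaps in the more common approach of sampling the original 15 values with replacement, then takes the middle 95% of the bootstrap means. It uses base R only. `boot_mean()` is an illustrative helper, the seed and the 10,000 replicates are arbitrary, and the Pair 9 value (0.50) is an assumption inferred from the reported mean because it is never printed on the slides.

```r
set.seed(90)  # arbitrary seed so the sketch is reproducible

# Differences for the 15 twin pairs, as printed on these slides.
# ASSUMPTION: Pair 9 (0.50) is never printed; it is inferred from the reported mean.
difference <- c(0.67, -0.19, 0.09, 0.19, 0.13, 0.40, 0.04, 0.10,
                0.50, 0.07, 0.23, 0.59, 0.02, 0.03, 0.11)

# One bootstrap replicate: resample 15 values with replacement, take the mean
boot_mean <- function(x) mean(sample(x, size = length(x), replace = TRUE))

# Bootstrap approximation to the sampling distribution of the mean
boot_means <- replicate(10000, boot_mean(difference))

# Percentile interval: the middle 95% of the bootstrap means
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))
boot_ci
```

The result should be in the same neighborhood as the `$t$`-based interval, though the two methods need not agree exactly with only 15 observations.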

---
## Bootstrapping vs randomization test

.large[
* Both use simulation and randomization

* Bootstrapping
    + mimics random sampling
    + assumes sample represents population
    + draws new 'sample' from original sample 
    + typically draws with replacement
    + *no* randomization of group assignment
]
--
.large[
* Randomization test
    + mimics random assignment
    + assumes null hypothesis for effect of 'treatment'
    + randomizes group assignment
    + *no* replacement
]
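
---
## Bootstrapping vs randomization test in R

The contrast above comes down to how `sample()` is called. This sketch uses a small two-group dataset invented purely for illustration; none of the values come from the course data.

```r
set.seed(90)  # arbitrary seed so the sketch is reproducible

# HYPOTHETICAL two-group data, invented purely for illustration
outcome <- c(5.1, 4.8, 6.2, 5.9, 7.0, 6.4, 7.3, 6.8)
group   <- rep(c("control", "treatment"), each = 4)

# Bootstrap: resample rows WITH replacement; each row keeps its own group label
rows      <- sample(seq_along(outcome), replace = TRUE)
boot_data <- data.frame(outcome = outcome[rows], group = group[rows])

# Randomization: keep every outcome exactly once, shuffle labels WITHOUT replacement
perm_data <- data.frame(outcome = outcome, group = sample(group))

boot_data  # some rows may repeat and others may be missing
perm_data  # same outcomes, same group sizes, labels reassigned
```

Repeating the bootstrap step many times approximates a sampling distribution. Repeating the randomization step many times approximates the distribution of a statistic under the null hypothesis of no treatment effect.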

---
class: center, middle

# Questions?