POL90: Applied Quantitative Analysis

class: center, middle, inverse, title-slide

# POL90: Applied Quantitative Analysis
## Chapter 7: Simple Linear Regression
### Prof Wasow, Assistant Professor, Politics
### 2022-03-02

---

<style>

.regression12 table {
  font-size: 12px;
}

.regression14 table {
  font-size: 14px;
}

</style>

# Announcements

.large[
* Assignments
    + Report 1

]

.large[
* Statistical Sleuth
    + Read Chapter 7
    + Supplement
        - http://appliedstats.org/chapter7.html

]

---
# Schedule

<table>
 <thead>
 <tr>
 <th style="text-align:right;"> Week </th>
 <th style="text-align:left;"> Date </th>
 <th style="text-align:left;"> Day </th>
 <th style="text-align:left;"> Title </th>
 <th style="text-align:right;"> Chapter </th>
 </tr>
 </thead>
<tbody>
 <tr>
 <td style="text-align:right;"> 5 </td>
 <td style="text-align:left;"> Feb 16 </td>
 <td style="text-align:left;"> Wed </td>
 <td style="text-align:left;"> A Closer Look at Assumptions </td>
 <td style="text-align:right;"> 3 </td>
 </tr>
 <tr>
 <td style="text-align:right;"> 6 </td>
 <td style="text-align:left;"> Feb 21 </td>
 <td style="text-align:left;"> Mon </td>
 <td style="text-align:left;"> Alternatives to the t-Tools </td>
 <td style="text-align:right;"> 4 </td>
 </tr>
 <tr>
 <td style="text-align:right;"> 6 </td>
 <td style="text-align:left;"> Feb 23 </td>
 <td style="text-align:left;"> Wed </td>
 <td style="text-align:left;"> Comparison Among Several Samples </td>
 <td style="text-align:right;"> 5 </td>
 </tr>
 <tr>
 <td style="text-align:right;"> 7 </td>
 <td style="text-align:left;"> Feb 28 </td>
 <td style="text-align:left;"> Mon </td>
 <td style="text-align:left;"> Comparison Among Several Samples </td>
 <td style="text-align:right;"> 5 </td>
 </tr>
 <tr>
 <td style="text-align:right;color: black !important;background-color: yellow !important;"> 7 </td>
 <td style="text-align:left;color: black !important;background-color: yellow !important;"> Mar 2 </td>
 <td style="text-align:left;color: black !important;background-color: yellow !important;"> Wed </td>
 <td style="text-align:left;color: black !important;background-color: yellow !important;"> Simple Linear Regression </td>
 <td style="text-align:right;color: black !important;background-color: yellow !important;"> 7 </td>
 </tr>
 <tr>
 <td style="text-align:right;"> 8 </td>
 <td style="text-align:left;"> Mar 7 </td>
 <td style="text-align:left;"> Mon </td>
 <td style="text-align:left;"> Simple Linear Regression </td>
 <td style="text-align:right;"> 7 </td>
 </tr>
 <tr>
 <td style="text-align:right;"> 8 </td>
 <td style="text-align:left;"> Mar 9 </td>
 <td style="text-align:left;"> Wed </td>
 <td style="text-align:left;"> Regression by Calculation </td>
 <td style="text-align:right;"> 7 </td>
 </tr>
 <tr>
 <td style="text-align:right;"> 9 </td>
 <td style="text-align:left;"> Mar 14 </td>
 <td style="text-align:left;"> Mon </td>
 <td style="text-align:left;"> Spring Recess </td>
 <td style="text-align:right;"> - </td>
 </tr>
 <tr>
 <td style="text-align:right;"> 9 </td>
 <td style="text-align:left;"> Mar 16 </td>
 <td style="text-align:left;"> Wed </td>
 <td style="text-align:left;"> Spring Recess </td>
 <td style="text-align:right;"> - </td>
 </tr>
 <tr>
 <td style="text-align:right;"> 10 </td>
 <td style="text-align:left;"> Mar 21 </td>
 <td style="text-align:left;"> Mon </td>
 <td style="text-align:left;"> Null hypothesis, R-squared </td>
 <td style="text-align:right;"> 8 </td>
 </tr>
</tbody>
</table>

---
## Assignment schedule

<table>
 <thead>
 <tr>
 <th style="text-align:right;"> Week </th>
 <th style="text-align:left;"> Date </th>
 <th style="text-align:left;"> Day </th>
 <th style="text-align:left;"> Assignment </th>
 <th style="text-align:right;"> Percent </th>
 </tr>
 </thead>
<tbody>
 <tr>
 <td style="text-align:right;"> 6 </td>
 <td style="text-align:left;"> Feb 25 </td>
 <td style="text-align:left;"> Fri </td>
 <td style="text-align:left;"> PS05 </td>
 <td style="text-align:right;"> 3 </td>
 </tr>
 <tr>
 <td style="text-align:right;color: black !important;background-color: yellow !important;"> 7 </td>
 <td style="text-align:left;color: black !important;background-color: yellow !important;"> Mar 4 </td>
 <td style="text-align:left;color: black !important;background-color: yellow !important;"> Fri </td>
 <td style="text-align:left;color: black !important;background-color: yellow !important;"> Report1 </td>
 <td style="text-align:right;color: black !important;background-color: yellow !important;"> 6 </td>
 </tr>
 <tr>
 <td style="text-align:right;"> 8 </td>
 <td style="text-align:left;"> Mar 11 </td>
 <td style="text-align:left;"> Fri </td>
 <td style="text-align:left;"> PS06 </td>
 <td style="text-align:right;"> 3 </td>
 </tr>
 <tr>
 <td style="text-align:right;"> 9 </td>
 <td style="text-align:left;"> Mar 18 </td>
 <td style="text-align:left;"> Fri </td>
 <td style="text-align:left;"> Spring break </td>
 <td style="text-align:right;"> NA </td>
 </tr>
 <tr>
 <td style="text-align:right;"> 10 </td>
 <td style="text-align:left;"> Mar 25 </td>
 <td style="text-align:left;"> Fri </td>
 <td style="text-align:left;"> PS07 </td>
 <td style="text-align:right;"> 3 </td>
 </tr>
 <tr>
 <td style="text-align:right;"> 11 </td>
 <td style="text-align:left;"> Apr 1 </td>
 <td style="text-align:left;"> Fri </td>
 <td style="text-align:left;"> PS08 </td>
 <td style="text-align:right;"> 3 </td>
 </tr>
 <tr>
 <td style="text-align:right;"> 12 </td>
 <td style="text-align:left;"> Apr 8 </td>
 <td style="text-align:left;"> Fri </td>
 <td style="text-align:left;"> Report2 </td>
 <td style="text-align:right;"> 8 </td>
 </tr>
 <tr>
 <td style="text-align:right;"> 13 </td>
 <td style="text-align:left;"> Apr 15 </td>
 <td style="text-align:left;"> Fri </td>
 <td style="text-align:left;"> PS09 </td>
 <td style="text-align:right;"> 3 </td>
 </tr>
 <tr>
 <td style="text-align:right;"> 14 </td>
 <td style="text-align:left;"> Apr 22 </td>
 <td style="text-align:left;"> Fri </td>
 <td style="text-align:left;"> PS10 </td>
 <td style="text-align:right;"> 3 </td>
 </tr>
 <tr>
 <td style="text-align:right;"> 15 </td>
 <td style="text-align:left;"> Apr 29 </td>
 <td style="text-align:left;"> Fri </td>
 <td style="text-align:left;"> Report3 </td>
 <td style="text-align:right;"> 10 </td>
 </tr>
</tbody>
</table>

---
class: center, middle

# Wrapping up Chapter 5: 
# Multiple comparisons 
# with ANOVA: Spock Trial

---
## Three ratios, Single Mean vs Seven Means
 
<img src="images/f_stat_equation_anova_table16.png" width="1284" style="display: block; margin: auto;" />

---

## Calculating *p*-value

We can use those statistics to calculate the probability of getting an *F*-statistic as extreme or more extreme on an *F*-distribution with 6 numerator degrees of freedom and 39 denominator degrees of freedom:

```r
# Setting lower.tail = FALSE returns the right-tail (upper) probability
pf(6.72, 6, 39, lower.tail = FALSE)
```

```
[1] 0.00006082
```

The *p*-value is extremely small, so we reject the null hypothesis that all the means are equal.
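
As a quick check on what `lower.tail` does, here is a minimal sketch (the variable names are illustrative, not from the course materials). It compares the right-tail probability with one minus the default left-tail probability:

```r
# F-statistic and degrees of freedom from this slide
f_stat <- 6.72
df_num <- 6   # numerator degrees of freedom
df_den <- 39  # denominator degrees of freedom

# Right-tail probability, computed directly
p_upper <- pf(f_stat, df_num, df_den, lower.tail = FALSE)

# Same quantity as the complement of the left tail (pf's default)
p_complement <- 1 - pf(f_stat, df_num, df_den)

# Compare the two up to floating-point tolerance
all.equal(p_upper, p_complement)
```

The two agree here, but `lower.tail = FALSE` is usually the safer habit because it avoids subtracting a number very close to 1 when the *p*-value is tiny.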

---
## Visualizing *F* (6, 39) & *F*-stat = 6.72

```r
visualize::visualize.f(stat = 6.72, df1 = 6, df2 = 39, section = "upper")
```

---

## Check results with `aov`

To verify our manual calculation, we can use the built-in `aov()` function:

```r
# Assumes `spock` is loaded and the pipe (%>%) is available, e.g. via library(dplyr)
aov(Percent ~ Judge, data = spock) %>%
  summary()
```

```
            Df Sum Sq Mean Sq F value   Pr(>F)    
Judge        6   1927     321    6.72 0.000061 ***
Residuals   39   1864      48                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
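
The columns of this table are linked by simple arithmetic. The sketch below rebuilds the mean squares, *F* value, and *p*-value from the rounded sums of squares printed above (variable names are illustrative), so the results should match the `aov()` output only up to rounding:

```r
# Rounded entries copied from the aov() summary above
ss_judge <- 1927  # between-judge sum of squares
df_judge <- 6
ss_resid <- 1864  # residual (within-judge) sum of squares
df_resid <- 39

# Mean square = sum of squares / degrees of freedom
ms_judge <- ss_judge / df_judge
ms_resid <- ss_resid / df_resid

# F value = ratio of the two mean squares
f_stat <- ms_judge / ms_resid

# Right-tail probability on F(6, 39)
p_value <- pf(f_stat, df_judge, df_resid, lower.tail = FALSE)

round(c(ms_judge = ms_judge, ms_resid = ms_resid, f_stat = f_stat), 2)
p_value
```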

---
class: middle, center
# Calculating the *F*-statistic
# With ANOVA Tables

---

## ANOVA Table: Equal means vs two means

* Start with RSS and df for reduced model

.vertical-center[
| Source of Variation          | Sum of Squares | d.f. | Mean Square | *F*-statistic | *p*-value |
|------------------------------|----------------|------|-------------|-------------|---------|
| `$RSS_{reduced} - RSS_{full}$` | ( )            | ( )  | ( )         | ( )         | ( )     |
| (Two Means)                  | ( )            | ( )  | ( )         |             |         |
| (Equal Means)                | 3,791.53       | 45   |             |             |         |
]

---

## ANOVA Table: Equal means vs two means

* Incorporate RSS and df for full model

.vertical-center[
| Source of Variation          | Sum of Squares | d.f. | Mean Square | *F*-statistic | *p*-value |
|------------------------------|----------------|------|-------------|-------------|---------|
| `$RSS_{reduced} - RSS_{full}$` | ( )            | ( )  | ( )         | ( )         | ( )     |
| (Two Means)                  | (2190.90)      | (44) | ( )         |             |         |
| (Equal Means)                | 3,791.53       | 45   |             |             |         |
]

---

## ANOVA Table: Equal means vs two means

* Divide `$RSS_{full}$` by `$df_{full}$` to get the full model's mean square, the pooled variance estimate `$s_p^2$`

.vertical-center[
| Source of Variation          | Sum of Squares | d.f. | Mean Square | *F*-statistic | *p*-value |
|------------------------------|----------------|------|-------------|-------------|---------|
| `$RSS_{reduced} - RSS_{full}$` | ( )            | ( )  | ( )         | ( )         | ( )     |
| (Two Means)                  | (2190.90)      | (44) | (49.79)     |             |         |
| (Equal Means)                | 3,791.53       | 45   |             |             |         |
]
---

## ANOVA Table: Equal means vs two means

* Calculate `$RSS_{reduced} - RSS_{full}$` and `$df_{reduced} - df_{full}$`

.vertical-center[
| Source of Variation          | Sum of Squares | d.f. | Mean Square | *F*-statistic | *p*-value |
|------------------------------|----------------|------|-------------|-------------|---------|
| `$RSS_{reduced} - RSS_{full}$` | (1600.63)      | (1)  | ( )         | ( )         | ( )     |
| (Two Means)                  | (2190.90)      | (44) | (49.79)     |             |         |
| (Equal Means)                | 3,791.53       | 45   |             |             |         |
]

---

## ANOVA Table: Equal means vs two means

* Calculate Mean Square by dividing the extra sum of squares ( `$RSS_{reduced} - RSS_{full}$` ) by ( `$df_{reduced} - df_{full}$` )

.vertical-center[
| Source of Variation          | Sum of Squares | d.f. | Mean Square | *F*-statistic | *p*-value |
|------------------------------|----------------|------|-------------|-------------|---------|
| `$RSS_{reduced} - RSS_{full}$` | (1600.63)      | (1)  | (1600.63)   | ( )         | ( )     |
| (Two Means)                  | (2190.90)      | (44) | (49.79)     |             |         |
| (Equal Means)                | 3,791.53       | 45   |             |             |         |
]

---

## ANOVA Table: Equal means vs two means

* Calculate *F*-statistic by dividing the extra-sum-of-squares Mean Square by the full-model Mean Square ( `$s_p^2$` )

.vertical-center[
| Source of Variation          | Sum of Squares | d.f. | Mean Square | *F*-statistic | *p*-value |
|------------------------------|----------------|------|-------------|-------------|---------|
| `$RSS_{reduced} - RSS_{full}$` | (1600.63)      | (1)  | (1600.63)   | (32.15)         | ( )     |
| (Two Means)                  | (2190.90)      | (44) | (49.79)     |             |         |
| (Equal Means)                | 3,791.53       | 45   |             |             |         |
]
---

## ANOVA Table: Equal means vs two means

* Calculate *p*-value. In `R`: `pf`( *F*-statistic, `$df_{reduced} - df_{full}$` , `$df_{full}$` , `lower.tail = FALSE` )

.vertical-center[
| Source of Variation          | Sum of Squares | d.f. | Mean Square | *F*-statistic | *p*-value    |
|------------------------------|----------------|------|-------------|-------------|------------|
| `$RSS_{reduced} - RSS_{full}$` | (1600.63)      | (1)  | (1600.63)   | (32.15)     | (0.000001) |
| (Two Means)                  | (2190.90)      | (44) | (49.79)     |             |            |
| (Equal Means)                | 3,791.53       | 45   |             |             |            |
]
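
The table can be rebuilt step by step in `R`. This is a minimal sketch using the two residual sums of squares and degrees of freedom shown above; the variable names are illustrative and not from the course materials:

```r
# Reduced model (equal means) and full model (two means), from the table
rss_reduced <- 3791.53
df_reduced  <- 45
rss_full    <- 2190.90
df_full     <- 44

# Extra sum of squares and its degrees of freedom
ess      <- rss_reduced - rss_full
df_extra <- df_reduced - df_full

# Mean squares: sum of squares divided by degrees of freedom
ms_extra <- ess / df_extra
ms_full  <- rss_full / df_full   # pooled variance estimate

# F-statistic: ratio of the two mean squares
f_stat <- ms_extra / ms_full

# Right-tail p-value on F(df_extra, df_full)
p_value <- pf(f_stat, df_extra, df_full, lower.tail = FALSE)

round(c(ess = ess, ms_extra = ms_extra, ms_full = ms_full, f_stat = f_stat), 2)
p_value
```

Note that the numerator degrees of freedom passed to `pf()` is the difference `df_reduced - df_full` (here 1), not `df_reduced` itself.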

---
## Visualizing *F* (1, 44) & *F*-stat = 32.15

```r
visualize::visualize.f(stat = 32.15, df1 = 1, df2 = 44, section = "upper")
```

---
class: middle, center

# Two means vs seven means

---