class: center, middle, inverse, title-slide

# POL90: Statistics
## Chapter 7: Regression by Calculation

### Prof Wasow
Assistant Professor, Politics
Pomona College
### 2022-03-12

---

<style type="text/css">
.regression10 table { font-size: 10px; }
.regression12 table { font-size: 12px; }
.regression14 table { font-size: 14px; }
</style>

# Announcements

.large[
* Assignments
  + PS06
]

--

.large[
* Statistical Sleuth
  + Read Chapter 7
  + Supplement
    - http://appliedstats.org/chapter7.html
]

---

# Schedule

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> Week </th>
   <th style="text-align:left;"> Date </th>
   <th style="text-align:left;"> Day </th>
   <th style="text-align:left;"> Title </th>
   <th style="text-align:right;"> Chapter </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 6 </td>
   <td style="text-align:left;"> Feb 23 </td>
   <td style="text-align:left;"> Wed </td>
   <td style="text-align:left;"> Comparison Among Several Samples </td>
   <td style="text-align:right;"> 5 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 7 </td>
   <td style="text-align:left;"> Feb 28 </td>
   <td style="text-align:left;"> Mon </td>
   <td style="text-align:left;"> Comparison Among Several Samples </td>
   <td style="text-align:right;"> 5 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 7 </td>
   <td style="text-align:left;"> Mar 2 </td>
   <td style="text-align:left;"> Wed </td>
   <td style="text-align:left;"> Simple Linear Regression </td>
   <td style="text-align:right;"> 7 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 8 </td>
   <td style="text-align:left;"> Mar 7 </td>
   <td style="text-align:left;"> Mon </td>
   <td style="text-align:left;"> Simple Linear Regression </td>
   <td style="text-align:right;"> 7 </td>
  </tr>
  <tr>
   <td style="text-align:right;color: black !important;background-color: yellow !important;"> 8 </td>
   <td style="text-align:left;color: black !important;background-color: yellow !important;"> Mar 9 </td>
   <td style="text-align:left;color: black !important;background-color: yellow !important;"> Wed </td>
   <td style="text-align:left;color: black !important;background-color: yellow !important;"> Regression by Calculation </td>
   <td style="text-align:right;color: black !important;background-color: yellow !important;"> 7 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 9 </td>
   <td style="text-align:left;"> Mar 14 </td>
   <td style="text-align:left;"> Mon </td>
   <td style="text-align:left;"> Spring Recess </td>
   <td style="text-align:right;"> - </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 9 </td>
   <td style="text-align:left;"> Mar 16 </td>
   <td style="text-align:left;"> Wed </td>
   <td style="text-align:left;"> Spring Recess </td>
   <td style="text-align:right;"> - </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 10 </td>
   <td style="text-align:left;"> Mar 21 </td>
   <td style="text-align:left;"> Mon </td>
   <td style="text-align:left;"> Null hypothesis, R-squared </td>
   <td style="text-align:right;"> 8 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 10 </td>
   <td style="text-align:left;"> Mar 23 </td>
   <td style="text-align:left;"> Wed </td>
   <td style="text-align:left;"> Multiple regression </td>
   <td style="text-align:right;"> 8 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 11 </td>
   <td style="text-align:left;"> Mar 28 </td>
   <td style="text-align:left;"> Mon </td>
   <td style="text-align:left;"> Interaction terms </td>
   <td style="text-align:right;"> 9 </td>
  </tr>
</tbody>
</table>

---

## Assignment schedule

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> Week </th>
   <th style="text-align:left;"> Date </th>
   <th style="text-align:left;"> Day </th>
   <th style="text-align:left;"> Assignment </th>
   <th style="text-align:right;"> Percent </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 7 </td>
   <td style="text-align:left;"> Mar 4 </td>
   <td style="text-align:left;"> Fri </td>
   <td style="text-align:left;"> Report1 </td>
   <td style="text-align:right;"> 6 </td>
  </tr>
  <tr>
   <td style="text-align:right;color: black !important;background-color: yellow !important;"> 8 </td>
   <td style="text-align:left;color: black !important;background-color: yellow !important;"> Mar 11 </td>
   <td style="text-align:left;color: black !important;background-color: yellow !important;"> Fri </td>
   <td style="text-align:left;color: black !important;background-color: yellow !important;"> PS06 </td>
   <td style="text-align:right;color: black !important;background-color: yellow !important;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 9 </td>
   <td style="text-align:left;"> Mar 18 </td>
   <td style="text-align:left;"> Fri </td>
   <td style="text-align:left;"> Spring break </td>
   <td style="text-align:right;"> NA </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 10 </td>
   <td style="text-align:left;"> Mar 25 </td>
   <td style="text-align:left;"> Fri </td>
   <td style="text-align:left;"> PS07 </td>
   <td style="text-align:right;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 11 </td>
   <td style="text-align:left;"> Apr 1 </td>
   <td style="text-align:left;"> Fri </td>
   <td style="text-align:left;"> PS08 </td>
   <td style="text-align:right;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 12 </td>
   <td style="text-align:left;"> Apr 8 </td>
   <td style="text-align:left;"> Fri </td>
   <td style="text-align:left;"> Report2 </td>
   <td style="text-align:right;"> 8 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 13 </td>
   <td style="text-align:left;"> Apr 15 </td>
   <td style="text-align:left;"> Fri </td>
   <td style="text-align:left;"> PS09 </td>
   <td style="text-align:right;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 14 </td>
   <td style="text-align:left;"> Apr 22 </td>
   <td style="text-align:left;"> Fri </td>
   <td style="text-align:left;"> PS10 </td>
   <td style="text-align:right;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 15 </td>
   <td style="text-align:left;"> Apr 29 </td>
   <td style="text-align:left;"> Fri </td>
   <td style="text-align:left;"> Report3 </td>
   <td style="text-align:right;"> 10 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 16 </td>
   <td style="text-align:left;"> May 6 </td>
   <td style="text-align:left;"> Fri </td>
   <td style="text-align:left;"> NA </td>
   <td style="text-align:right;"> NA </td>
  </tr>
</tbody>
</table>

---
class: middle, center

# Report 1: Did Donald Trump
# Drive Public Opinion
# on "Build the Wall"?

---

## Welch, Rodi, Shaw & Wonghirundacha (2022)

<img src="images/welch_rodi_shaw_wonghirundacha01.png" width="858" style="display: block; margin: auto;" />

---

## Welch, Rodi, Shaw & Wonghirundacha (2022)

<br>

<img src="images/welch_rodi_shaw_wonghirundacha02.png" width="592" style="display: block; margin: auto;" />

---

## Welch, Rodi, Shaw & Wonghirundacha (2022)

<br>

<img src="images/welch_rodi_shaw_wonghirundacha03.png" width="621" style="display: block; margin: auto;" />

---

## Welch, Rodi, Shaw & Wonghirundacha (2022)

<br><br><br><br>

<img src="images/welch_rodi_shaw_wonghirundacha04.png" width="873" style="display: block; margin: auto;" />

---

## Welch, Rodi, Shaw & Wonghirundacha (2022)

<br><br>

<img src="images/welch_rodi_shaw_wonghirundacha05.png" width="860" style="display: block; margin: auto;" />

---
class: middle, center

# Mathematical Approach to
# Linear Models

---

## Revisiting Meat Processing and pH Level

```r
meat <- Sleuth3::case0702
meat
```

```
   Time   pH
1     1 7.02
2     1 6.93
3     2 6.42
4     2 6.51
5     4 6.07
6     4 5.99
7     6 5.59
8     6 5.80
9     8 5.51
10    8 5.36
```

---

```r
meat %>%
  ggplot() +
  aes(x = Time, y = pH) +
  geom_point() +
* geom_smooth(method = "loess", se = FALSE)
```

<img src="week08_02_files/figure-html/unnamed-chunk-10-1.png" width="792" style="display: block; margin: auto;" />

---

```r
meat <- meat %>%
  mutate(log_time = log(Time))

meat %>%
  ggplot() +
  aes(x = log_time, y = pH) +
  geom_point() +
* geom_smooth(method = "lm", se = FALSE)
```

<img src="week08_02_files/figure-html/unnamed-chunk-11-1.png" width="792" style="display: block; margin: auto;" />

---

## Terminology

- Greek letters like `\(\beta\)` are the unobserved truth
- Modified Greek letters like `\(\hat{\beta}\)` are our estimate. They are what we think the truth is based on our data
- English letters like `\(\boldsymbol{X}\)` are actual data from our sample
- Modified English letters like `\(\bar{X}\)` are calculations from our sample. They're what we do with our data
- We can say that our estimate of the truth is that calculation, e.g. `\(\hat{\mu} = \bar{X}\)`

$$ Data \longrightarrow Calculation \longrightarrow Estimate \overset{\text{hopefully}}\longrightarrow Truth $$

.footnote[Source: Nick Huntington-Klein, https://twitter.com/nickchk/status/1272993322395557888]

---

## Review notation for fitted values and residuals

.vertical-center[
.large[
- Hat notation denotes an estimate of the parameter

`\begin{eqnarray*} \hat{\mu}\{Y|X\} & = &\hat{\beta_0} + \hat{\beta_1}X \end{eqnarray*}`

- Estimated mean is the <span style="color:blue">fitted value</span> or <span style="color:blue">predicted value</span>
- Difference between observed response and estimated mean is the <span style="color:blue">residual</span>
]
]

`\begin{eqnarray*} fit_i =\hat{\mu}\{Y_i|X_i\} &= &\hat{\beta_0} + \hat{\beta_1}X_i \\ res_i & = & Y_i - fit_i \end{eqnarray*}`

---

## Visualizing a single `\(Y_i\)` and `\(\hat{Y_i}\)`

.center[
<img src="week08_02_files/figure-html/unnamed-chunk-12-1.png" width="792" style="display: block; margin: auto;" />
]

---

## Visualizing a single `\(Y_i\)` and `\(\hat{Y_i}\)`

.center[
<img src="week08_02_files/figure-html/unnamed-chunk-13-1.png" width="792" style="display: block; margin: auto;" />
]

---

## Visualizing a single `\(Y_i\)` and `\(\hat{Y_i}\)`

.center[
<img src="week08_02_files/figure-html/unnamed-chunk-14-1.png" width="792" style="display: block; margin: auto;" />
]

---

## Visualizing a single `\(Y_i\)` and `\(\hat{Y_i}\)`

.center[
<img src="week08_02_files/figure-html/unnamed-chunk-15-1.png" width="792" style="display: block; margin: auto;" />
]

---

## Visualizing a single `\(Y_i\)` and `\(\hat{Y_i}\)`

.center[
<img src="week08_02_files/figure-html/unnamed-chunk-16-1.png"
width="792" style="display: block; margin: auto;" />
]

---

## Visualizing a single `\(Y_i\)` and `\(\hat{Y_i}\)`

.center[
<img src="week08_02_files/figure-html/unnamed-chunk-17-1.png" width="792" style="display: block; margin: auto;" />
]

---

## Visualizing a single `\(Y_i\)` and `\(\hat{Y_i}\)` (zoomed in)

.center[
<img src="week08_02_files/figure-html/unnamed-chunk-18-1.png" width="792" style="display: block; margin: auto;" />
]

---

## Model assumptions

.vertical-center[
.large[
`\begin{eqnarray*} Y_i & = & \beta_0 + \beta_1X_i + \epsilon_i, \end{eqnarray*}`
- where `\(\epsilon_i \sim N(0, \sigma^2)\)`
]
]

---

## Ideal, normal, simple linear regression

.center[
<img src="images/ss_display_7_5.png" width="60%" style="display: block; margin: auto;" />
]

.footnote[Source: *Statistical Sleuth*, 3e, Display 7.5]

---

## Model assumptions

.large[
- <span style="color:blue">Linearity assumption</span>: the means of the sub-populations of responses fall on a straight-line function of the explanatory variable
- <span style="color:blue">Equal spread assumption</span> (constant variance assumption): the sub-population standard deviations are all equal (to `\(\sigma\)`)
- <span style="color:blue">Normality assumption</span>: there is a normally distributed sub-population of responses for each value of the explanatory variable
- <span style="color:blue">Independence assumption</span>: each response is drawn independently of all other responses from the same sub-population and independently of all responses drawn from other sub-populations
]

---

## Least squares estimators

.vertical-center[
.large[
`\begin{eqnarray*} \hat{\beta}_1 & = & \frac{\sum^{n}_{i=1}(X_i-\bar{X})(Y_i-\bar{Y}) }{\sum^{n}_{i=1} (X_i-\bar{X})^2}, \\ & & \\ \hat{\beta}_0 & = & \bar{Y} - \hat{\beta}_1\bar{X} \end{eqnarray*}`
]
]

---

## Least squares estimators manually in R

```r
x_bar <- mean(meat$log_time)
x_bar
```

```
[1] 1.19
```

```r
y_bar <- mean(meat$pH)
y_bar
```

```
[1] 6.12
```

```r
x_deviation <- meat$log_time -
x_bar
x_deviation
```

```
 [1] -1.1901 -1.1901 -0.4970 -0.4970  0.1962  0.1962  0.6016  0.6016  0.8893
[10]  0.8893
```

```r
y_deviation <- meat$pH - y_bar
y_deviation
```

```
 [1]  0.90  0.81  0.30  0.39 -0.05 -0.13 -0.53 -0.32 -0.61 -0.76
```

---

## Least squares estimators manually in R

```r
beta_hat_1 <- sum((x_deviation) * (y_deviation))/sum((x_deviation)^2)
beta_hat_1
```

```
[1] -0.7257
```

```r
beta_hat_0 <- y_bar - beta_hat_1 * x_bar
beta_hat_0
```

```
[1] 6.984
```

```r
# Compare to base R regression
lm(formula = pH ~ log_time, data = meat) %>%
  broom::tidy() %>%
  kable(digits = 4)
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> term </th>
   <th style="text-align:right;"> estimate </th>
   <th style="text-align:right;"> std.error </th>
   <th style="text-align:right;"> statistic </th>
   <th style="text-align:right;"> p.value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> (Intercept) </td>
   <td style="text-align:right;"> 6.9836 </td>
   <td style="text-align:right;"> 0.0485 </td>
   <td style="text-align:right;"> 143.90 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> log_time </td>
   <td style="text-align:right;"> -0.7257 </td>
   <td style="text-align:right;"> 0.0344 </td>
   <td style="text-align:right;"> -21.08 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
</tbody>
</table>

---

## For standard errors, first estimate σ from residuals

.vertical-center[
.large[
`\begin{eqnarray*} \hat{\sigma} & = & \sqrt{\frac{\textrm{Sum of all squared residuals}}{\textrm{Degrees of freedom}} } \end{eqnarray*}`
- Where d.f. = (Number of observations) - (Number of parameters in the model for the means)
]
]

---

## Recall residuals are `\(Y_i\)` - `\(\hat{Y_i}\)`

.center[
<img src="week08_02_files/figure-html/unnamed-chunk-22-1.png" width="792" style="display: block; margin: auto;" />
]

---

## Sampling distributions for OLS

.center[
<img src="images/ss_display_7_7.png" width="70%" style="display: block; margin: auto;" />
]

.footnote[Source: *Statistical Sleuth*, 3e, Display 7.7]

---

## Estimation of σ from Residuals in R

```r
y_hat <- beta_hat_0 + beta_hat_1 * meat$log_time
y_hat
```

```
 [1] 6.984 6.984 6.481 6.481 5.978 5.978 5.683 5.683 5.475 5.475
```

```r
residuals <- meat$pH - y_hat
residuals
```

```
 [1]  0.03637 -0.05363 -0.06064  0.02936  0.09235  0.01235 -0.09342  0.11658
 [9]  0.03534 -0.11466
```

```r
sum_squared_residuals <- sum(residuals^2)
sum_squared_residuals
```

```
[1] 0.05413
```

---

## Estimation of σ from Residuals in R

```r
n <- nrow(meat)
n
```

```
[1] 10
```

```r
df <- n - 2
df
```

```
[1] 8
```

```r
sigma_hat <- sqrt(sum_squared_residuals / df)
sigma_hat
```

```
[1] 0.08226
```

---

## Standard errors for OLS

.center[
<img src="images/ss_display_7_7_formulas.png" width="100%" style="display: block; margin: auto;" />
]

.footnote[Source: *Statistical Sleuth*, 3e, Section 7.3.5]

---

## Estimation of standard errors manually in R

```r
s_x <- sd(meat$log_time)
s_x
```

```
[1] 0.7965
```

```r
s_x2 <- s_x^2
s_x2
```

```
[1] 0.6344
```

```r
# s_x2 is variance
var(meat$log_time)
```

```
[1] 0.6344
```

```r
se_beta_hat_1 <- sigma_hat * sqrt(1 / (( n - 1) * ( s_x2 )))
se_beta_hat_1
```

```
[1] 0.03443
```

---

## Estimation of standard errors manually in R

```r
s_x <- sd(meat$log_time)
s_x
```

```
[1] 0.7965
```

```r
s_x2 <- s_x^2
s_x2
```

```
[1] 0.6344
```

```r
# s_x2 is variance
var(meat$log_time)
```

```
[1] 0.6344
```

```r
se_beta_hat_0 <- sigma_hat * sqrt(1 / n + (x_bar^2 / (( n - 1) * ( s_x2 ))) )
se_beta_hat_0
```

```
[1] 0.04853
```

---

## Compare manual se calculations to `lm` se

```r
se_beta_hat_0
```

```
[1] 0.04853
```

```r
se_beta_hat_1
```

```
[1] 0.03443
```

```r
# Compare to base R regression
lm(formula = pH ~ log_time, data = meat) %>%
  broom::tidy() %>%
  kable(digits = 4)
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> term </th>
   <th style="text-align:right;"> estimate </th>
   <th style="text-align:right;"> std.error </th>
   <th style="text-align:right;"> statistic </th>
   <th style="text-align:right;"> p.value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> (Intercept) </td>
   <td style="text-align:right;"> 6.9836 </td>
   <td style="text-align:right;"> 0.0485 </td>
   <td style="text-align:right;"> 143.90 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> log_time </td>
   <td style="text-align:right;"> -0.7257 </td>
   <td style="text-align:right;"> 0.0344 </td>
   <td style="text-align:right;"> -21.08 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
</tbody>
</table>

---
class: middle, center

# Geometry of Regression

---

## Geometry of `\(\hat{\beta}_1\)` with `palmerpenguins`

- Data were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network.

```r
library(palmerpenguins)
penguins <- palmerpenguins::penguins
head(penguins)
```

```
# A tibble: 6 × 8
  species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g sex
  <fct>   <fct>           <dbl>         <dbl>            <int>       <int> <fct>
1 Adelie  Torge…           39.1          18.7              181        3750 male
2 Adelie  Torge…           39.5          17.4              186        3800 fema…
3 Adelie  Torge…           40.3          18                195        3250 fema…
4 Adelie  Torge…           NA            NA                 NA          NA <NA>
5 Adelie  Torge…           36.7          19.3              193        3450 fema…
6 Adelie  Torge…           39.3          20.6              190        3650 male
# … with 1 more variable: year <int>
```

.footnote[Source: https://allisonhorst.github.io/palmerpenguins/index.html]

---

## Who are `palmerpenguins`?

<img src="images/penguins.png" width="800" style="display: block; margin: auto;" />

---

## What are these measurements?
<img src="images/culmen_depth.png" width="1728" style="display: block; margin: auto;" />

---

## Subsetting for Adelie

```r
# subset data
penguins_adelie <- penguins %>%
  drop_na() %>%
  filter(species == "Adelie")

dim(penguins_adelie)
```

```
[1] 146   8
```

---

## Visualizing bill length vs bill depth

<img src="week08_02_files/figure-html/unnamed-chunk-34-1.png" width="792" style="display: block; margin: auto;" />

---

## Modeling bill length vs bill depth

```r
# bill length vs bill depth
fit <- lm(formula = bill_length_mm ~ bill_depth_mm, data = penguins_adelie)
summary(fit)
```

```
Call:
lm(formula = bill_length_mm ~ bill_depth_mm, data = penguins_adelie)

Residuals:
   Min     1Q Median     3Q    Max 
-6.543 -1.837  0.016  1.718  6.510 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)     23.367      3.087    7.57  4.1e-12 ***
bill_depth_mm    0.842      0.168    5.02  1.5e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.46 on 144 degrees of freedom
Multiple R-squared:  0.149,	Adjusted R-squared:  0.143 
F-statistic: 25.2 on 1 and 144 DF,  p-value: 1.51e-06
```

---

## Modeling bill length vs bill depth

```r
penguins_adelie$predicted <- predict(fit)     # Save the predicted values
penguins_adelie$residuals <- residuals(fit)   # Save the residual values

penguins_adelie %>%
  select(bill_depth_mm, bill_length_mm, predicted, residuals) %>%
  head()
```

```
# A tibble: 6 × 4
  bill_depth_mm bill_length_mm predicted residuals
          <dbl>          <dbl>     <dbl>     <dbl>
1          18.7           39.1      39.1   -0.0211
2          17.4           39.5      38.0    1.47
3          18             40.3      38.5    1.77
4          19.3           36.7      39.6   -2.93
5          20.6           39.3      40.7   -1.42
6          17.8           38.9      38.4    0.537
```

---

## Visualizing Residuals

<img src="week08_02_files/figure-html/unnamed-chunk-37-1.png" width="792" style="display: block; margin: auto;" />

---

## Visualizing Residuals (ratio of one to one)

<img src="week08_02_files/figure-html/unnamed-chunk-39-1.png" width="792" style="display: block; margin: auto;" />

---

## Recall Least Squares Estimators

.vertical-center[
.large[
`\begin{eqnarray*} \hat{\beta}_1 & = & \frac{\sum^{n}_{i=1}(X_i-\bar{X})(Y_i-\bar{Y}) }{\sum^{n}_{i=1} (X_i-\bar{X})^2}, \\ & & \\ \hat{\beta}_0 & = & \bar{Y} - \hat{\beta}_1\bar{X} \end{eqnarray*}`
]
]

---

## Visualizing `\(\hat{\beta}_1\)` Numerator Step-by-Step

<img src="week08_02_files/figure-html/unnamed-chunk-40-1.png" width="792" style="display: block; margin: auto;" />

---

## Visualizing `\(\hat{\beta}_1\)` Numerator *x*-mean

<img src="week08_02_files/figure-html/unnamed-chunk-41-1.png" width="792" style="display: block; margin: auto;" />

---

## Visualizing `\(\hat{\beta}_1\)` Numerator *y*-mean

<img src="week08_02_files/figure-html/unnamed-chunk-42-1.png" width="792" style="display: block; margin: auto;" />

---

## Visualizing `\(\hat{\beta}_1\)` Numerator *x*-difference

<img src="week08_02_files/figure-html/unnamed-chunk-43-1.png" width="792" style="display: block; margin: auto;" />

---

## Visualizing `\(\hat{\beta}_1\)` Numerator *y*-difference

<img src="week08_02_files/figure-html/unnamed-chunk-44-1.png" width="792" style="display: block; margin: auto;" />

---

## Visualizing `\(\hat{\beta}_1\)` Numerator `\(x\)`-diff `\(\times y\)`-diff

<img src="week08_02_files/figure-html/unnamed-chunk-45-1.png" width="792" style="display: block; margin: auto;" />

---

## Question: What do the Colors Represent?

<img src="week08_02_files/figure-html/unnamed-chunk-46-1.png" width="792" style="display: block; margin: auto;" />

---

## Question: What Explains Positive vs Negative Slope?

<img src="week08_02_files/figure-html/unnamed-chunk-47-1.png" width="792" style="display: block; margin: auto;" />

---

## Visualizing `\(\hat{\beta}_1\)` Denominator (ratio of one to one)

<img src="week08_02_files/figure-html/unnamed-chunk-48-1.png" width="792" style="display: block; margin: auto;" />

---

## Visualizing `\(\hat{\beta}_1\)` Denominator `\(x\)`-mean

<img src="week08_02_files/figure-html/unnamed-chunk-49-1.png" width="792" style="display: block; margin: auto;" />

---

## Visualizing `\(\hat{\beta}_1\)` Denominator `\(x\)`-difference

<img src="week08_02_files/figure-html/unnamed-chunk-50-1.png" width="792" style="display: block; margin: auto;" />

---

## Visualizing `\(\hat{\beta}_1\)` Denominator ( `\(x\)`-diff) `\(^2\)`

<img src="week08_02_files/figure-html/unnamed-chunk-51-1.png" width="792" style="display: block; margin: auto;" />

---

## Again, Recall Least Squares Estimators

.vertical-center[
.large[
`\begin{eqnarray*} \hat{\beta}_1 & = & \frac{\sum^{n}_{i=1}(X_i-\bar{X})(Y_i-\bar{Y}) }{\sum^{n}_{i=1} (X_i-\bar{X})^2}, \\ & & \\ \hat{\beta}_0 & = & \bar{Y} - \hat{\beta}_1\bar{X} \end{eqnarray*}`
]
]

---

## Calculating `\(\hat{\beta}_1\)` Algebraically

```r
x_bar <- mean(penguins_adelie$bill_depth_mm)
x_bar
```

```
[1] 18.35
```

```r
x_diff <- penguins_adelie$bill_depth_mm - x_bar
head(x_diff)
```

```
[1]  0.3527 -0.9473 -0.3473  0.9527  2.2527 -0.5473
```

```r
y_bar <- mean(penguins_adelie$bill_length_mm)
y_bar
```

```
[1] 38.82
```

```r
y_diff <- penguins_adelie$bill_length_mm - y_bar
head(y_diff)
```

```
[1]  0.27603  0.67603  1.47603 -2.12397  0.47603  0.07603
```

---

## Calculating `\(\hat{\beta}_1\)` Algebraically

```r
x_diff_y_diff <- x_diff * y_diff
head(x_diff_y_diff)
```

```
[1]  0.09737 -0.64037 -0.51257 -2.02359  1.07237 -0.04161
```

```r
# numerator
sum(x_diff_y_diff)
```

```
[1] 181.6
```

```r
# denominator
sum(x_diff^2)
```

```
[1] 215.6
```

```r
beta_hat_1 <- sum(x_diff_y_diff) / sum(x_diff^2)
beta_hat_1
```

```
[1] 0.8425
```

---

## Calculating `\(\hat{\beta}_0\)` Algebraically

```r
beta_hat_0 <- y_bar - beta_hat_1 * x_bar
beta_hat_0
```

```
[1] 23.37
```

---

## Calculating Coefficients with `lm()`

```r
lm(formula = bill_length_mm ~ bill_depth_mm, data = penguins_adelie) %>%
  summary()
```

```
Call:
lm(formula = bill_length_mm ~ bill_depth_mm, data = penguins_adelie)

Residuals:
   Min     1Q Median     3Q    Max 
-6.543 -1.837  0.016  1.718  6.510 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)     23.367      3.087    7.57  4.1e-12 ***
bill_depth_mm    0.842      0.168    5.02  1.5e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.46 on 144 degrees of freedom
Multiple R-squared:  0.149,	Adjusted R-squared:  0.143 
F-statistic: 25.2 on 1 and 144 DF,  p-value: 1.51e-06
```

---

## Recall "Rise over Run"

<br><br><br><br><br>

.large[
- Rise: Numerator is `\(\sum_i (x_i - \bar{x}) \times (y_i - \bar{y})\)`
- Run: Denominator is `\(\sum_i (x_i - \bar{x})^2\)`
]

---

## Let's play

.vertical-center[
.large[
* Short: http://bit.ly/346ols
* Long: http://setosa.io/ev/ordinary-least-squares-regression/
]
]

---
class: center, middle

# Questions?
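---

## Appendix: `\(\hat{\beta}_1\)` as Covariance over Variance

Dividing the numerator and denominator of the `\(\hat{\beta}_1\)` formula by `\(n - 1\)` turns "rise over run" into the sample covariance of `\(X\)` and `\(Y\)` over the sample variance of `\(X\)`. A minimal base-R sketch of this check, using the built-in `cars` dataset as stand-in data (an assumption for portability; any two numeric vectors work):

```r
# Built-in data: stopping distance (dist) vs speed for 50 cars
x <- cars$speed
y <- cars$dist

# Least squares formula: sum of cross-deviations over sum of squared x-deviations
beta_hat_1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Same quantity via the covariance/variance identity
beta_hat_1_cov <- cov(x, y) / var(x)

# Both agree with the slope that lm() reports
c(beta_hat_1, beta_hat_1_cov, unname(coef(lm(dist ~ speed, data = cars))["speed"]))
```

The `\(n - 1\)` hidden inside `cov()` and `var()` cancels in the ratio, which is why the shortcut matches the long-hand sums.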