class: center, middle, inverse, title-slide

# POL90: Statistics
## Multiple Regression
### Prof. Wasow, Politics, Pomona College
### 2022-03-23

---

<style type="text/css">
.regression10 table { font-size: 10px; }
.regression12 table { font-size: 12px; }
.regression14 table { font-size: 14px; }
</style>

# Announcements

.large[
* Assignments
  + PS07
  + Report 2
]

--

.large[
* Statistical Sleuth
  + Read Chapter 9
  + Supplement
      - http://appliedstats.org/chapter9.html
]

---

# Schedule

<table> <thead> <tr> <th style="text-align:right;"> Week </th> <th style="text-align:left;"> Date </th> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Title </th> <th style="text-align:right;"> Chapter </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 8 </td> <td style="text-align:left;"> Mar 9 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Regression by Calculation </td> <td style="text-align:right;"> 7 </td> </tr> <tr> <td style="text-align:right;"> 9 </td> <td style="text-align:left;"> Mar 14 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Spring Recess </td> <td style="text-align:right;"> - </td> </tr> <tr> <td style="text-align:right;"> 9 </td> <td style="text-align:left;"> Mar 16 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Spring Recess </td> <td style="text-align:right;"> - </td> </tr> <tr> <td style="text-align:right;"> 10 </td> <td style="text-align:left;"> Mar 21 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Null hypothesis, R-squared </td> <td style="text-align:right;"> 8 </td> </tr> <tr> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 10 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Mar 23 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Wed </td> <td style="text-align:left;color: black
!important;background-color: yellow !important;"> Multiple regression </td> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 8 </td> </tr> <tr> <td style="text-align:right;"> 11 </td> <td style="text-align:left;"> Mar 28 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Interaction terms </td> <td style="text-align:right;"> 9 </td> </tr> <tr> <td style="text-align:right;"> 11 </td> <td style="text-align:left;"> Mar 30 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Interaction terms </td> <td style="text-align:right;"> 9 </td> </tr> <tr> <td style="text-align:right;"> 12 </td> <td style="text-align:left;"> Apr 4 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Logistic regression </td> <td style="text-align:right;"> 20 </td> </tr> <tr> <td style="text-align:right;"> 12 </td> <td style="text-align:left;"> Apr 6 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Logistic regression </td> <td style="text-align:right;"> 20 </td> </tr> <tr> <td style="text-align:right;"> 13 </td> <td style="text-align:left;"> Apr 11 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Missing data </td> <td style="text-align:right;"> Handout </td> </tr> </tbody> </table> --- ## Assignment schedule <table> <thead> <tr> <th style="text-align:right;"> Week </th> <th style="text-align:left;"> Date </th> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Assignment </th> <th style="text-align:right;"> Percent </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 8 </td> <td style="text-align:left;"> Mar 11 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS06 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 9 </td> <td style="text-align:left;"> Mar 18 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> 
Spring break </td> <td style="text-align:right;"> NA </td> </tr> <tr> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 10 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Mar 25 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Fri </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> PS07 </td> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 11 </td> <td style="text-align:left;"> Apr 1 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS08 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 12 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Apr 8 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Fri </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Report2 </td> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 8 </td> </tr> <tr> <td style="text-align:right;"> 13 </td> <td style="text-align:left;"> Apr 15 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS09 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 14 </td> <td style="text-align:left;"> Apr 22 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS10 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 15 </td> <td style="text-align:left;"> Apr 29 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> Report3 </td> <td style="text-align:right;"> 10 </td> </tr> </tbody> </table> --- class: middle, center # 
Report 1: Did Sandy Hook
# Influence Attitudes about
# Gun Control?

---

## Charatan, Flores, Kim & Strevey (2022)

<img src="images/charatan_flores_kim_strevey_1.png" width="862" style="display: block; margin: auto;" />

---

## Charatan, Flores, Kim & Strevey (2022)

<img src="images/charatan_flores_kim_strevey_2.png" width="661" style="display: block; margin: auto;" />

---

## Charatan, Flores, Kim & Strevey (2022)

<img src="images/charatan_flores_kim_strevey_3.png" width="848" style="display: block; margin: auto;" />

---

## Charatan, Flores, Kim & Strevey (2022)

<img src="images/charatan_flores_kim_strevey_4.png" width="848" style="display: block; margin: auto;" />

---

## Charatan, Flores, Kim & Strevey (2022)

<img src="images/charatan_flores_kim_strevey_5.png" width="50%" style="display: block; margin: auto;" />

---

## Charatan, Flores, Kim & Strevey (2022)

<img src="images/charatan_flores_kim_strevey_6.png" width="848" style="display: block; margin: auto;" />

---

class: center, middle

# What is a dummy variable?

---

## Defining dummy variables

.large[
- Goes by several names
  - binary, dummy, dichotomized, logical, indicator
  - can also be categorical with two values
- Typically coded 1 / 0 for TRUE / FALSE
- Example: consider a categorical variable `religion` with multiple categories such as Christian, Hindu, Muslim, Jewish...
- Could also be a series of dummy variables coded 1/0
  - `rel_christian`, `rel_hindu`, `rel_muslim`, `rel_jewish`, ...
- In regression, `R` automatically converts categorical variables to dummies
]

---

class: center, middle

# Regression Overview

---

## Regression Overview: Four Components

<br><br><br>

.large[
1. Intercept
2. Intercept shifts
3. Slopes
4. Slope shifts (covered in precept & next week)
]

---

## Meadowfoam data

.large[
- Meadowfoam plant oils have valuable properties, which makes cultivating meadowfoam an important goal.
- The researchers studied two factors that could affect the amount of meadowfoam flowering:
  - light intensity at six levels of 150, 300, 450, 600, 750, and 900 `\(\mu\)` mol / `\(m^2\)` / sec;
  - the timing of the onset of the light treatment, either at photoperiodic floral induction (PFI), the time at which the photoperiod was increased from 8 to 16 hours per day to induce flowering, or 24 days before PFI.
]

---

## Meadowfoam data: Three variables

<br><br>

.large[
- Flowers:
  - average number of flowers per meadowfoam plant
- Time:
  - time the light intensity regimens started; 1 = Late, 2 = Early
- Intensity:
  - light intensity (in `\(\mu\)` mol / `\(m^2\)` / sec)
]

---

## Meadowfoam data

<img src="images/ss_display_9_1.png" width="75%" style="display: block; margin: auto;" />

---

## Meadowfoam data

```r
# requires dplyr (%>%, mutate) and janitor (clean_names)
meadow <- Sleuth3::case0901 %>%
  clean_names()

# recode time (1 = Late, 2 = Early) and make "Early" the reference level
meadow <- meadow %>%
  mutate(
    time = ifelse(time > 1, "Early", "Late") %>%
      forcats::fct_relevel("Early"))

# check order
levels(meadow$time)
```

```
[1] "Early" "Late"
```

```r
head(meadow, 10)
```

```
   flowers time intensity
1     62.3 Late       150
2     77.4 Late       150
3     55.3 Late       300
4     54.2 Late       300
5     49.6 Late       450
6     61.9 Late       450
7     39.4 Late       600
8     45.7 Late       600
9     31.3 Late       750
10    44.9 Late       750
```

---

## Meadowfoam data

```r
tail(meadow, 10)
```

```
   flowers  time intensity
15    69.1 Early       300
16    78.0 Early       300
17    57.0 Early       450
18    71.1 Early       450
19    62.9 Early       600
20    52.2 Early       600
21    60.3 Early       750
22    45.6 Early       750
23    52.6 Early       900
24    44.4 Early       900
```

---

## Visualizing meadowfoam data

.left-code[
```r
meadow %>%
  ggplot() +
  aes(x = intensity, y = flowers) +
  geom_point()
```
]

.right-plot[
<img src="week09_02_files/figure-html/meadow_code_plot-1.png" width="90%" style="display: block; margin: auto;" />
]

---

class: middle, center

# Simplest Model:
# Intercept

---

## Regression overview: Intercept

.vertical-center[
.large[
* Most basic regression
* One variable: one `\(y\)`, no `\(x\)`
* `lm(y ~ 1, data =
some_data)`
* Easy to conceptualize
* Easy to plot in two dimensions
]
]

---

## Model: Intercept, No Slope

```r
model_mean <- lm(flowers ~ 1, data = meadow)
summary(model_mean)
```

```

Call:
lm(formula = flowers ~ 1, data = meadow)

Residuals:
   Min     1Q Median     3Q    Max
-24.84 -10.71  -1.39   8.31  21.86

*Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)     56.1        2.8      20  4.7e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 13.7 on 23 degrees of freedom
```

---

## Model: Intercept, No Slope

<br><br><br>

`\begin{eqnarray*} \mu\{Y|X\} & = & \beta_0 + \beta_1(X) \\ \mu\{\operatorname{flowers}\} & = & \beta_{0} \\ \mu\{\operatorname{flowers}\} & = & 56.1 \end{eqnarray*}`

---

## Visualizing: Intercept, No Slope

<img src="week09_02_files/figure-html/meadow_code_mean2-1.png" width="90%" style="display: block; margin: auto;" />

---

## Evaluate flowers ~ `\(\beta_0\)`

.regression14[
```r
stargazer(model_mean,
          single.row = TRUE, header = FALSE, type = 'html',
          omit.stat = c("ser", "adj.rsq"))
```

<table style="text-align:center"><tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td><em>Dependent variable:</em></td></tr>
<tr><td></td><td colspan="1" style="border-bottom: 1px solid black"></td></tr>
<tr><td style="text-align:left"></td><td>flowers</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Constant</td><td>56.140<sup>***</sup> (2.803)</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>24</td></tr>
<tr><td style="text-align:left">R<sup>2</sup></td><td>0.000</td></tr>
<tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
</table>
]

```r
# the intercept-only model simply recovers the overall mean
mean(meadow$flowers)
```

```
[1]
56.14
```

---

class: middle, center

# Simple Model:
# Intercept + Dummy Variable

---

## Regression overview: Intercept + Dummy

.vertical-center[
.large[
* Simple regression
* Two variables: one `\(y\)`, one binary `\(x\)`
* `lm(y ~ x_binary, data = some_data)`
* Easy to conceptualize
* Easy to plot in two dimensions
]
]

---

## Evaluate flowers ~ `\(\beta_0\)` + `\(\beta_1\)` `\(\times\)` time

```r
model_time <- lm(flowers ~ time, data = meadow)
summary(model_time)
```

```

Call:
lm(formula = flowers ~ time, data = meadow)

Residuals:
   Min     1Q Median     3Q    Max
-18.76  -9.72  -1.19   9.62  27.34

Coefficients:
*            Estimate Std. Error t value Pr(>|t|)
(Intercept)    62.22       3.62   17.21    3e-14 ***
timeLate      -12.16       5.11   -2.38    0.027 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 12.5 on 22 degrees of freedom
Multiple R-squared:  0.204,  Adjusted R-squared:  0.168
F-statistic: 5.65 on 1 and 22 DF,  p-value: 0.0265
```

---

## Equation: Intercept, Dummy, No Slope

<br>

`\begin{eqnarray*} \mu\{Y|X\} & = & \beta_0 + \beta_1(X) \\ \mu\{\operatorname{flowers} | \operatorname{time}\} & = & \beta_{0} + \beta_{1}(\operatorname{time}_{\operatorname{Late}}) \\ \mu\{\operatorname{flowers} | \operatorname{time}\} & = & 62.22 - 12.16(\operatorname{time}_{\operatorname{Late}}) \end{eqnarray*}`

--

- .large[Example: Time Early]

`\begin{eqnarray*} \mu\{\operatorname{flowers} | \operatorname{time}\} & = & 62.22 - 12.16(0) \\ \mu\{\operatorname{flowers} | \operatorname{time}\} & = & 62.22 \end{eqnarray*}`

--

- .large[Example: Time Late]

`\begin{eqnarray*} \mu\{\operatorname{flowers} | \operatorname{time}\} & = & 62.22 - 12.16(1) \\ \mu\{\operatorname{flowers} | \operatorname{time}\} & = & 50.06 \end{eqnarray*}`

---

## R: Intercept, Dummy, No Slope

<br><br><br>

```r
# group means match the intercept (Early) and intercept + shift (Late)
mean_flowers_early <- mean(meadow$flowers[meadow$time == "Early"])
mean_flowers_early
```

```
[1] 62.22
```

```r
mean_flowers_late <- mean(meadow$flowers[meadow$time == "Late"])
mean_flowers_late
```

```
[1] 50.06
```

---

## Intercept + Shift

<img src="week09_02_files/figure-html/meadow_code_mean2_1-1.png" width="90%" style="display: block; margin: auto;" />

---

## Evaluate flowers ~ `\(\beta_0\)` + `\(\beta_1\)` `\(\times\)` time

.regression14[
```r
stargazer(model_mean, model_time,
          single.row = TRUE, header = FALSE, type = 'html',
          omit.stat = c("ser", "adj.rsq"), digits = 2)
```

<table style="text-align:center"><tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td colspan="2"><em>Dependent variable:</em></td></tr>
<tr><td></td><td colspan="2" style="border-bottom: 1px solid black"></td></tr>
<tr><td style="text-align:left"></td><td colspan="2">flowers</td></tr>
<tr><td style="text-align:left"></td><td>(1)</td><td>(2)</td></tr>
<tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">timeLate</td><td></td><td>-12.16<sup>**</sup> (5.11)</td></tr>
<tr><td style="text-align:left">Constant</td><td>56.14<sup>***</sup> (2.80)</td><td>62.22<sup>***</sup> (3.62)</td></tr>
<tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>24</td><td>24</td></tr>
<tr><td style="text-align:left">R<sup>2</sup></td><td>0.00</td><td>0.20</td></tr>
<tr><td style="text-align:left">F Statistic</td><td></td><td>5.65<sup>**</sup> (df = 1; 22)</td></tr>
<tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td colspan="2" style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
</table>
]

---

class: middle, center

# Simple Model:
# Intercept + Slope

---

## Regression overview: Intercept + Slope

.vertical-center[
.large[
* Simple regression
* Two variables: one `\(y\)`, one *continuous* `\(x\)`
* `lm(y ~ x_continuous, data = some_data)`
* Easy to conceptualize
* Easy to plot in two dimensions
]
]

---

## Model: Intercept + Slope

```r
model_intensity <- lm(flowers ~ intensity, data = meadow)
summary(model_intensity)
```

```

Call:
lm(formula = flowers ~ intensity, data = meadow)

Residuals:
    Min      1Q  Median      3Q     Max
-15.731  -7.805   0.019   6.186  13.269

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 77.38500    4.16119   18.60  6.1e-15 ***
intensity   -0.04047    0.00712   -5.68  1.0e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.94 on 22 degrees of freedom
Multiple R-squared:  0.595,  Adjusted R-squared:  0.576
F-statistic: 32.3 on 1 and 22 DF,  p-value: 1.03e-05
```

---

## Equation: Intercept + Slope

<br>

`\begin{eqnarray*} \mu\{Y|X\} & = & \beta_0 + \beta_1(X) \\ \mu\{\operatorname{flowers} | \operatorname{intensity}\} & = & \beta_{0} + \beta_{1}(\operatorname{intensity}) \\ \mu\{\operatorname{flowers} | \operatorname{intensity}\} & = & 77.39 - 0.04(\operatorname{intensity}) \end{eqnarray*}`

--

- .large[Example: Intensity = 300]

`\begin{eqnarray*} \mu\{\operatorname{flowers} | \operatorname{intensity}\} & = & 77.39 - 0.04(300) \\ \mu\{\operatorname{flowers} | \operatorname{intensity}\} & = & 65.39 \end{eqnarray*}`

--

- .large[Example: Intensity = 900]

`\begin{eqnarray*} \mu\{\operatorname{flowers} | \operatorname{intensity}\} & = & 77.39 - 0.04(900) \\ \mu\{\operatorname{flowers} | \operatorname{intensity}\} & = & 41.39 \end{eqnarray*}`

---

## R: Intercept + Slope

```r
# Base R: predict() uses the unrounded coefficients, so results
# differ slightly from the hand calculations with -0.04
predict(model_intensity, newdata = data.frame(intensity = 300))
```

```
    1
65.24
```

```r
predict(model_intensity, newdata = data.frame(intensity = 900))
```

```
    1
40.96
```

```r
# ggeffects (part of sjPlot family)
ggeffects::ggpredict(model_intensity, terms = "intensity[300, 900]")
```

```
# Predicted values of flowers

intensity | Predicted |         95% CI
--------------------------------------
      300 |     65.24 | [60.48, 70.00]
      900 |     40.96 | [34.62, 47.30]
```

---

## Visualizing Intercept + Slope

<img src="week09_02_files/figure-html/meadow_code_slope2-1.png" width="90%" style="display:
block; margin: auto;" />

---

## Evaluate flowers ~ `\(\beta_0\)` + `\(\beta_1\)` `\(\times\)` intensity

.regression14[
```r
stargazer(model_mean, model_time, model_intensity,
          single.row = TRUE, header = FALSE, type = 'html',
          omit.stat = c("ser", "adj.rsq"), digits = 2)
```

<table style="text-align:center"><tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td colspan="3"><em>Dependent variable:</em></td></tr>
<tr><td></td><td colspan="3" style="border-bottom: 1px solid black"></td></tr>
<tr><td style="text-align:left"></td><td colspan="3">flowers</td></tr>
<tr><td style="text-align:left"></td><td>(1)</td><td>(2)</td><td>(3)</td></tr>
<tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">timeLate</td><td></td><td>-12.16<sup>**</sup> (5.11)</td><td></td></tr>
<tr><td style="text-align:left">intensity</td><td></td><td></td><td>-0.04<sup>***</sup> (0.01)</td></tr>
<tr><td style="text-align:left">Constant</td><td>56.14<sup>***</sup> (2.80)</td><td>62.22<sup>***</sup> (3.62)</td><td>77.39<sup>***</sup> (4.16)</td></tr>
<tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>24</td><td>24</td><td>24</td></tr>
<tr><td style="text-align:left">R<sup>2</sup></td><td>0.00</td><td>0.20</td><td>0.59</td></tr>
<tr><td style="text-align:left">F Statistic (df = 1; 22)</td><td></td><td>5.65<sup>**</sup></td><td>32.28<sup>***</sup></td></tr>
<tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td colspan="3" style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
</table>
]

---

class: middle, center

# Simple Model:
# Intercept + Slope + Dummy

---

## Model: Intercept + Slope + Dummy

```r
model_intensity_time <- lm(flowers ~ intensity + time, data = meadow)
summary(model_intensity_time)
```

```

Call:
lm(formula =
flowers ~ intensity + time, data = meadow)

Residuals:
   Min     1Q Median     3Q    Max
 -9.65  -4.14  -1.56   5.63  12.16

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 83.46417    3.27377   25.49  < 2e-16 ***
intensity   -0.04047    0.00513   -7.89    1e-07 ***
timeLate   -12.15833    2.62956   -4.62  0.00015 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.44 on 21 degrees of freedom
Multiple R-squared:  0.799,  Adjusted R-squared:  0.78
F-statistic: 41.8 on 2 and 21 DF,  p-value: 4.79e-08
```

---

## Equation: Intercept + Slope + Dummy

<br>

`\begin{eqnarray*} \mu\{Y|X_1, X_2\} & = & \beta_0 + \beta_1(X_1) + \beta_2(X_2) \\ \mu\{\operatorname{flowers} | \operatorname{intensity}, \operatorname{time}_{\operatorname{Late}}\} & = & \beta_{0} + \beta_{1}(\operatorname{intensity}) + \beta_{2}(\operatorname{time}_{\operatorname{Late}}) \\ \mu\{\operatorname{flowers} | \operatorname{intensity}, \operatorname{time}_{\operatorname{Late}}\} & = & 83.46 - 0.04(\operatorname{intensity}) - 12.16(\operatorname{time}_{\operatorname{Late}}) \end{eqnarray*}`

--

- .large[Example: Intensity = 300, Time = Early]

`\begin{eqnarray*} \mu\{\operatorname{flowers} | \operatorname{intensity}, \operatorname{time}_{\operatorname{Late}}\} & = & 83.46 - 0.04(300) - 12.16(0) \\ \mu\{\operatorname{flowers} | \operatorname{intensity}, \operatorname{time}_{\operatorname{Late}}\} & = & 71.46 \end{eqnarray*}`

--

- .large[Example: Intensity = 300, Time = Late]

`\begin{eqnarray*} \mu\{\operatorname{flowers} | \operatorname{intensity}, \operatorname{time}_{\operatorname{Late}}\} & = & 83.46 - 0.04(300) - 12.16(1) \\ \mu\{\operatorname{flowers} | \operatorname{intensity}, \operatorname{time}_{\operatorname{Late}}\} & = & 59.30 \end{eqnarray*}`

---

## Visualizing Intercept + Slope + Dummy

<img src="week09_02_files/figure-html/meadow_code_slope9-1.png" width="90%" style="display: block; margin: auto;" />

---

## Evaluate flowers ~ `\(\beta_0\)` + `\(\beta_1\)` `\(\times\)` intensity + `\(\beta_2\)` `\(\times\)` time

.regression14[
```r
stargazer(model_mean, model_time, model_intensity, model_intensity_time,
          single.row = TRUE, header = FALSE, type = 'html',
          omit.stat = c("ser", "adj.rsq"), digits = 2)
```

<table style="text-align:center"><tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td colspan="4"><em>Dependent variable:</em></td></tr>
<tr><td></td><td colspan="4" style="border-bottom: 1px solid black"></td></tr>
<tr><td style="text-align:left"></td><td colspan="4">flowers</td></tr>
<tr><td style="text-align:left"></td><td>(1)</td><td>(2)</td><td>(3)</td><td>(4)</td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">timeLate</td><td></td><td>-12.16<sup>**</sup> (5.11)</td><td></td><td>-12.16<sup>***</sup> (2.63)</td></tr>
<tr><td style="text-align:left">intensity</td><td></td><td></td><td>-0.04<sup>***</sup> (0.01)</td><td>-0.04<sup>***</sup> (0.01)</td></tr>
<tr><td style="text-align:left">Constant</td><td>56.14<sup>***</sup> (2.80)</td><td>62.22<sup>***</sup> (3.62)</td><td>77.39<sup>***</sup> (4.16)</td><td>83.46<sup>***</sup> (3.27)</td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>24</td><td>24</td><td>24</td><td>24</td></tr>
<tr><td style="text-align:left">R<sup>2</sup></td><td>0.00</td><td>0.20</td><td>0.59</td><td>0.80</td></tr>
<tr><td style="text-align:left">F Statistic</td><td></td><td>5.65<sup>**</sup> (df = 1; 22)</td><td>32.28<sup>***</sup> (df = 1; 22)</td><td>41.78<sup>***</sup> (df = 2; 21)</td></tr>
<tr><td colspan="5" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td colspan="4" style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
</table>
]

---

## Single mean model vs two mean model?
- Intercept vs intercept plus dummy

```r
anova(model_mean, model_time)
```

```
Analysis of Variance Table

Model 1: flowers ~ 1
Model 2: flowers ~ time
  Res.Df  RSS Df Sum of Sq    F Pr(>F)
1     23 4338
2     22 3451  1       887 5.65  0.027 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---

## Single mean model vs conditional mean model?

- Intercept vs intercept plus slope

```r
anova(model_mean, model_intensity)
```

```
Analysis of Variance Table

Model 1: flowers ~ 1
Model 2: flowers ~ intensity
  Res.Df  RSS Df Sum of Sq    F Pr(>F)
1     23 4338
2     22 1758  1      2580 32.3  1e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---

## Single slope vs Slope plus dummy

- Intercept plus slope vs intercept plus slope plus dummy

```r
anova(model_intensity, model_intensity_time)
```

```
Analysis of Variance Table

Model 1: flowers ~ intensity
Model 2: flowers ~ intensity + time
  Res.Df  RSS Df Sum of Sq    F  Pr(>F)
1     22 1758
2     21  871  1       887 21.4 0.00015 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---

## Multiple Regression: Quadratic term

.vertical-center[
.large[
* Multiple regression
  + Two variables: one `\(y\)`, with `\(x\)` + `\(x^2\)`
  + `lm(y ~ x + I(x*x), data = some_data)`
  + Can still be plotted in two dimensions
]
]

---

## Multiple Regression: Interaction terms

<br><br>

.large[
+ Three variables: one `\(y\)`, with `\(x_1\)` + `\(x_2\)` + `\(x_1 \cdot x_2\)`
+ Usually `\(x_2\)` is a dummy or simple categorical (e.g., few levels)
+ `lm(y ~ x1 + x2 + x1:x2, data = some_data)`
+ Shifts intercept and slope, can still be plotted in two dimensions
]

---

## Regression overview

.large[
* Multiple regression
* Three or more variables: one `\(y\)`, with `\(x_1\)` + `\(x_2\)` + …
* `lm(y ~ x1 + x2 + …, data = some_data)`
* Allows us to "control" for multiple variables
* Our primary concern is still: `y ~ x1`
* More on how to plot multiple regression next week
]

---

class: center, middle

# Questions?
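
---

## Appendix: How R expands these formulas

A minimal sketch, not part of the original analysis: base R's `terms()` lists the terms a formula expands to, which may be a handy way to check the quadratic and interaction formulas from the earlier slides before fitting anything. The names `y`, `x`, `x1`, and `x2` are placeholders, as in the slides, and the `*_terms` object names are made up for illustration.

```r
# Placeholder variable names; no data is needed to inspect a formula.

# Quadratic model: I() protects x^2 so it is treated as arithmetic,
# not as formula syntax.
quadratic_terms <- attr(terms(y ~ x + I(x^2)), "term.labels")
print(quadratic_terms)

# Interaction model: x1 * x2 is shorthand for x1 + x2 + x1:x2.
interaction_terms <- attr(terms(y ~ x1 * x2), "term.labels")
print(interaction_terms)

# The shorthand and the spelled-out formula expand to the same terms.
spelled_out_terms <- attr(terms(y ~ x1 + x2 + x1:x2), "term.labels")
print(identical(interaction_terms, spelled_out_terms))
```

Using `x1:x2` (or `x1 * x2`) rather than wrapping the product in `I()` lets `R` build the interaction columns itself, which matters when `x2` is a factor such as `time`.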