class: center, middle, inverse, title-slide

# POL90: Statistics
## Multiple Regression: Quadratic Terms
### Prof. Wasow, Politics, Pomona College
### 2022-03-29

---

<style type="text/css">
.regression10 table {
  font-size: 10px;
}
.regression12 table {
  font-size: 12px;
}
.regression14 table {
  font-size: 14px;
}
</style>

# Announcements

.large[

* Assignments
  + PS07
  + Report 2

- Report 2
  - Explanatory variable should be plausibly manipulable

- *Statistical Sleuth*
  - This week: Chapters 9 & 10
  - http://appliedstats.org/chapter9.html
  - http://appliedstats.org/chapter10.html

]

---

# Schedule

<table>
<thead>
<tr> <th style="text-align:right;"> Week </th> <th style="text-align:left;"> Date </th> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Title </th> <th style="text-align:right;"> Chapter </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> 9 </td> <td style="text-align:left;"> Mar 14 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Spring Recess </td> <td style="text-align:right;"> - </td> </tr>
<tr> <td style="text-align:right;"> 9 </td> <td style="text-align:left;"> Mar 16 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Spring Recess </td> <td style="text-align:right;"> - </td> </tr>
<tr> <td style="text-align:right;"> 10 </td> <td style="text-align:left;"> Mar 21 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Null hypothesis, R-squared </td> <td style="text-align:right;"> 8 </td> </tr>
<tr> <td style="text-align:right;"> 10 </td> <td style="text-align:left;"> Mar 23 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Multiple regression </td> <td style="text-align:right;"> 8 </td> </tr>
<tr> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 11 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Mar 28 </td> <td style="text-align:left;color:
black !important;background-color: yellow !important;"> Mon </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Interaction terms </td> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 9 </td> </tr>
<tr> <td style="text-align:right;"> 11 </td> <td style="text-align:left;"> Mar 30 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Interaction terms </td> <td style="text-align:right;"> 9 </td> </tr>
<tr> <td style="text-align:right;"> 12 </td> <td style="text-align:left;"> Apr 4 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Logistic regression </td> <td style="text-align:right;"> 20 </td> </tr>
<tr> <td style="text-align:right;"> 12 </td> <td style="text-align:left;"> Apr 6 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Logistic regression </td> <td style="text-align:right;"> 20 </td> </tr>
<tr> <td style="text-align:right;"> 13 </td> <td style="text-align:left;"> Apr 11 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Missing data </td> <td style="text-align:right;"> Handout </td> </tr>
<tr> <td style="text-align:right;"> 13 </td> <td style="text-align:left;"> Apr 13 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Missing data </td> <td style="text-align:right;"> Handout </td> </tr>
</tbody>
</table>

---

## Assignment schedule

<table>
<thead>
<tr> <th style="text-align:right;"> Week </th> <th style="text-align:left;"> Date </th> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Assignment </th> <th style="text-align:right;"> Percent </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> 9 </td> <td style="text-align:left;"> Mar 18 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> Spring break </td> <td style="text-align:right;"> NA </td> </tr>
<tr> <td style="text-align:right;"> 10 </td>
<td style="text-align:left;"> Mar 25 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS07 </td> <td style="text-align:right;"> 3 </td> </tr>
<tr> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 11 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Apr 1 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Fri </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> PS08 </td> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 3 </td> </tr>
<tr> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 12 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Apr 8 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Fri </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Report2 </td> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 8 </td> </tr>
<tr> <td style="text-align:right;"> 13 </td> <td style="text-align:left;"> Apr 15 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS09 </td> <td style="text-align:right;"> 3 </td> </tr>
<tr> <td style="text-align:right;"> 14 </td> <td style="text-align:left;"> Apr 22 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS10 </td> <td style="text-align:right;"> 3 </td> </tr>
<tr> <td style="text-align:right;"> 15 </td> <td style="text-align:left;"> Apr 29 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> Report3 </td> <td style="text-align:right;"> 10 </td> </tr>
</tbody>
</table>

---
class: middle, center

# Report 1: Did Sandy Hook
# Influence Attitudes about
# Gun Control?
---

## Losielle, Cappella & Gleitz (2022)

<img src="images/Losielle_Cappella_Gleitz_1.png" width="80%" style="display: block; margin: auto;" />

---

## Losielle, Cappella & Gleitz (2022)

<img src="images/Losielle_Cappella_Gleitz_2.png" width="428" style="display: block; margin: auto;" />

---

## Losielle, Cappella & Gleitz (2022)

<img src="images/Losielle_Cappella_Gleitz_3.png" width="866" style="display: block; margin: auto;" />

---

## Losielle, Cappella & Gleitz (2022)

<img src="images/Losielle_Cappella_Gleitz_4.png" width="851" style="display: block; margin: auto;" />

---

## Rosencrans, Unrath, Henriquez & Bhalla (2022)

<img src="images/Rosencrans_Unrath_Henriquez_Bhalla_1.png" width="858" style="display: block; margin: auto;" />

---

## Rosencrans, Unrath, Henriquez & Bhalla (2022)

<img src="images/Rosencrans_Unrath_Henriquez_Bhalla_2.png" width="763" style="display: block; margin: auto;" />

---

## Rosencrans, Unrath, Henriquez & Bhalla (2022)

<img src="images/Rosencrans_Unrath_Henriquez_Bhalla_3.png" width="497" style="display: block; margin: auto;" />

---

## Rosencrans, Unrath, Henriquez & Bhalla (2022)

<img src="images/Rosencrans_Unrath_Henriquez_Bhalla_4.png" width="843" style="display: block; margin: auto;" />

---

## Rosencrans, Unrath, Henriquez & Bhalla (2022)

<img src="images/Rosencrans_Unrath_Henriquez_Bhalla_5.png" width="860" style="display: block; margin: auto;" />

---
class: middle, center

# Simple Regression Review

---

## Regression Review: Intercept

.vertical-center[
.large[

* Most basic regression
* One variable: one `\(y\)`, no `\(x\)` term in regression
* `lm(y ~ 1, data = some_data)`
* Easy to conceptualize
* Easy to plot in two dimensions

]
]

---

## Regression Review: Intercept + Dummy

.vertical-center[
.large[

* Simple regression
* Two variables: one `\(y\)`, one binary `\(x\)`
* `lm(y ~ x_binary, data = some_data)`
* Easy to conceptualize
* Easy to plot in two dimensions

]
]

---

## Regression Review: Intercept + Slope

.vertical-center[
.large[

* Simple regression
* Two variables: one `\(y\)`, one *continuous* `\(x\)`
* `lm(y ~ x_continuous, data = some_data)`
* Easy to conceptualize
* Easy to plot in two dimensions

]
]

---

## Regression Review: Intercept + Slope + Dummy

.vertical-center[
.large[

* Multiple regression
* Three variables: one `\(y\)`, one *continuous* `\(x\)` and one *binary* `\(x\)`
* `lm(y ~ x_continuous + x_binary, data = some_data)`
* Easy to conceptualize
* Easy to plot in two dimensions

]
]

---
class: center, middle

# Regression with
# Quadratic Terms

---

## Regression Overview: Four Components

<br><br><br>

.large[

1. Intercept
2. Intercept shifts
3. Slopes
4. <mark>Slope shifts</mark>

]

---
class: center, middle

# Quadratic Model
# Corn Yields

---

## Corn yields vs rainfall

.large[

- Data on corn yield and rainfall were recorded for each year from 1890 to 1927 in six U.S. corn-producing states (Iowa, Nebraska, Illinois, Indiana, Missouri, and Ohio).

- Although increasing rainfall is associated with higher mean yields for rainfalls up to 12 inches, increasing rainfall at higher levels is associated with no change or perhaps a decrease in mean yield.

- Why might that be?
]

---

## Corn yields vs rainfall

<img src="images/ss_display_9_6.png" width="90%" style="display: block; margin: auto;" />

.footnote[Source: *Statistical Sleuth*, Display 9.6]

---

## Multiple Regression: Quadratic term

.vertical-center[
.large[

* Multiple regression
  + Two variables: one `\(y\)`, one `\(x\)` with `\(x\)` + `\(x^2\)`
  + `lm(y ~ x + I(x*x), data = some_data)`
  + Can still be plotted in two dimensions

]
]

---

## Case: Corn Data

```r
library(dplyr)    # provides the %>% pipe
library(janitor)  # provides clean_names()

# load the corn data and convert column names to lower case
corn <- Sleuth3::ex0915 %>%
  clean_names()

head(corn, 15)
```

```
   year yield rainfall
1  1890  24.5      9.6
2  1891  33.7     12.9
3  1892  27.9      9.9
4  1893  27.5      8.7
5  1894  21.7      6.8
6  1895  31.9     12.5
7  1896  36.8     13.0
8  1897  29.9     10.1
9  1898  30.2     10.1
10 1899  32.0     10.1
11 1900  34.0     10.8
12 1901  19.4      7.8
13 1902  36.0     16.2
14 1903  30.2     14.1
15 1904  32.4     10.6
```

---

## Visualizing Corn Data

```r
library(ggplot2)

# scatterplot of yield against rainfall with a loess smoother
ggplot(data = corn) +
  aes(x = rainfall, y = yield) +
  geom_point() +
  geom_smooth(method = "loess")
```

<img src="week10_01_files/figure-html/unnamed-chunk-15-1.png" width="75%" style="display: block; margin: auto;" />

---

## Regression Table

```r
lm1 <- lm(yield ~ rainfall, data = corn)                  # linear in rainfall
lm2 <- lm(yield ~ rainfall + I(rainfall^2), data = corn)  # adds the quadratic term
```

.regression12[
<table style="text-align:center"><tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td colspan="2"><em>Dependent variable:</em></td></tr>
<tr><td></td><td colspan="2" style="border-bottom: 1px solid black"></td></tr>
<tr><td style="text-align:left"></td><td colspan="2">yield</td></tr>
<tr><td style="text-align:left"></td><td>(1)</td><td>(2)</td></tr>
<tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">rainfall</td><td>0.776<sup>**</sup> (0.294)</td><td>6.004<sup>***</sup> (2.039)</td></tr>
<tr><td style="text-align:left">I(rainfall<sup>2</sup>)</td><td></td><td>-0.229<sup>**</sup> (0.089)</td></tr>
<tr><td style="text-align:left">Constant</td><td>23.550<sup>***</sup> (3.236)</td><td>-5.015 (11.440)</td></tr>
<tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>38</td><td>38</td></tr>
<tr><td style="text-align:left">R<sup>2</sup></td><td>0.162</td><td>0.297</td></tr>
<tr><td style="text-align:left">Adjusted R<sup>2</sup></td><td>0.139</td><td>0.256</td></tr>
<tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td colspan="2" style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
</table>
]

---

## Equation with rainfall squared

`\begin{eqnarray*} \mu \{ \operatorname{yield} | \operatorname{rainfall} \} = \beta_{0} + \beta_{1}(\operatorname{rainfall}) + \beta_{2}(\operatorname{rainfall^2}) \end{eqnarray*}`

--

`\begin{eqnarray*} \mu \{\operatorname{yield} | \operatorname{rainfall}\} = -5.015 + 6.004(\operatorname{rainfall}) -0.229(\operatorname{rainfall^2}) \end{eqnarray*}`

--

.large[- Example: `rainfall = 9.6`]

`\begin{eqnarray*} \mu \{\operatorname{yield} | \operatorname{rainfall}\} = -5.015 + 6.004(\operatorname{9.6}) -0.229(\operatorname{9.6^2}) \end{eqnarray*}`

--

`\begin{eqnarray*} \mu \{\operatorname{yield} | \operatorname{rainfall}\} = -5.015 + 57.638 - 21.105 \end{eqnarray*}`

--

`\begin{eqnarray*} \mu \{\operatorname{yield} | \operatorname{rainfall}\} = 31.518 \end{eqnarray*}`

---

## Visualizing rainfall = 9.6

```r
# plot_model() comes from the sjPlot package
sjPlot::plot_model(lm2, type = "pred", terms = "rainfall", show.data = TRUE) +
  geom_vline(xintercept = 9.6, col = "purple", linetype = "dashed")
```

<img src="week10_01_files/figure-html/unnamed-chunk-19-1.png" width="75%" style="display: block; margin: auto;" />

---

## Equation with rainfall = 12.9

`\begin{eqnarray*} \mu \{ \operatorname{yield} | \operatorname{rainfall} \} = \beta_{0} + \beta_{1}(\operatorname{rainfall}) + \beta_{2}(\operatorname{rainfall^2}) \end{eqnarray*}`

`\begin{eqnarray*} \mu \{\operatorname{yield} | \operatorname{rainfall}\} = -5.015 +
6.004(\operatorname{rainfall}) -0.229(\operatorname{rainfall^2}) \end{eqnarray*}`

--

.large[- Example: `rainfall = 12.9`]

`\begin{eqnarray*} \mu \{\operatorname{yield} | \operatorname{rainfall}\} = -5.015 + 6.004(\operatorname{12.9}) -0.229(\operatorname{12.9^2}) \end{eqnarray*}`

--

`\begin{eqnarray*} \mu \{\operatorname{yield} | \operatorname{rainfall}\} = -5.015 + 77.452 - 38.108 \end{eqnarray*}`

--

`\begin{eqnarray*} \mu \{\operatorname{yield} | \operatorname{rainfall}\} = 34.329 \end{eqnarray*}`

---

## Visualizing rainfall = 12.9

```r
# plot_model() comes from the sjPlot package
sjPlot::plot_model(lm2, type = "pred", terms = "rainfall", show.data = TRUE) +
  geom_vline(xintercept = 12.9, col = "purple", linetype = "dashed")
```

<img src="week10_01_files/figure-html/unnamed-chunk-21-1.png" width="75%" style="display: block; margin: auto;" />

---

## Equation with rainfall = 16.5

`\begin{eqnarray*} \mu \{ \operatorname{yield} | \operatorname{rainfall} \} = \beta_{0} + \beta_{1}(\operatorname{rainfall}) + \beta_{2}(\operatorname{rainfall^2}) \end{eqnarray*}`

`\begin{eqnarray*} \mu \{\operatorname{yield} | \operatorname{rainfall}\} = -5.015 + 6.004(\operatorname{rainfall}) -0.229(\operatorname{rainfall^2}) \end{eqnarray*}`

--

.large[- Example: `rainfall = 16.5`]

`\begin{eqnarray*} \mu \{\operatorname{yield} | \operatorname{rainfall}\} = -5.015 + 6.004(\operatorname{16.5}) -0.229(\operatorname{16.5^2}) \end{eqnarray*}`

--

`\begin{eqnarray*} \mu \{\operatorname{yield} | \operatorname{rainfall}\} = -5.015 + 99.066 - 62.345 \end{eqnarray*}`

--

`\begin{eqnarray*} \mu \{\operatorname{yield} | \operatorname{rainfall}\} = 31.706 \end{eqnarray*}`

---

## Visualizing rainfall = 16.5

```r
# plot_model() comes from the sjPlot package
sjPlot::plot_model(lm2, type = "pred", terms = "rainfall", show.data = TRUE) +
  geom_vline(xintercept = 16.5, col = "purple", linetype = "dashed")
```

<img src="week10_01_files/figure-html/unnamed-chunk-23-1.png" width="75%" style="display: block; margin: auto;" />

---

## What do we mean by “slope shift”?
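
One way to read "slope shift": in the quadratic model the slope of the fitted curve is not a single number but changes with rainfall. The lines below are a minimal sketch using the rounded coefficients from model (2); `slope_at` is a hypothetical helper introduced here for illustration, not part of the original slides.

```r
# Rounded coefficients copied from model (2) in the regression table
b1 <- 6.004   # rainfall
b2 <- -0.229  # rainfall squared

# Slope of the fitted curve at a given rainfall: b1 + 2 * b2 * rainfall
# (slope_at is a hypothetical helper name)
slope_at <- function(rainfall) b1 + 2 * b2 * rainfall

slope_at(c(8, 12, 16))
```

With these rounded coefficients the slope is about 2.34 at 8 inches, about 0.51 at 12 inches, and about -1.32 at 16 inches: each extra inch of rain shifts the slope down by 2 × 0.229.
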
```r
# set up some plausible rainfall values
rainfall_values <- 7:16
rainfall_values
```

```
 [1]  7  8  9 10 11 12 13 14 15 16
```

```r
# linear term: coefficient on rainfall times rainfall
term1 <- 6.004 * rainfall_values
term1
```

```
 [1] 42.03 48.03 54.04 60.04 66.04 72.05 78.05 84.06 90.06 96.06
```

```r
# quadratic term (magnitude): 0.229 times rainfall squared
term2 <- 0.229 * rainfall_values^2
term2
```

```
 [1] 11.22 14.66 18.55 22.90 27.71 32.98 38.70 44.88 51.52 58.62
```

```r
# fitted mean yield: intercept + linear term - quadratic term
-5.015 + term1 - term2
```

```
 [1] 25.79 28.36 30.47 32.12 33.32 34.06 34.34 34.16 33.52 32.42
```

---

## `interactions` package

.panelset[
.panel[.panel-name[Code]

```r
lm3 <- lm(yield ~ rainfall + I(rainfall^2) + year, data = corn)

interactions::interact_plot(
  lm3,                 # pick a model to plot
  pred = "rainfall",   # this variable will be your x-axis
  modx = "year",       # a moderator, i.e. a control we think is important
  plot.points = TRUE   # plot the data points
)
```

]
.panel[.panel-name[Plot]

<img src="week10_01_files/figure-html/unnamed-chunk-25-1.png" width="65%" style="display: block; margin: auto;" />

]
]

---

## When to include quadratic terms?

- As with interaction terms, quadratic terms should not routinely be included.

- Consider them in four situations:
  - When the analyst has good reason to suspect that the response is nonlinear in some explanatory variable (through knowledge of the process or by graphical examination);
  - When the question of interest calls for finding the values that maximize or minimize the mean response;
  - When careful modeling of the regression is called for by the questions of interest (and presumably this is only the case if there are just a few explanatory variables);
  - Or when inclusion is used to produce a rich model for assessing the fit of an inferential model.

.footnote[*Statistical Sleuth* 3e, 10.4.4, p. 295]

---
class: center, middle

# Questions?
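
---

## Backup: where does the fitted curve peak?

One of the four situations for including a quadratic term is finding the value that maximizes the mean response. For the corn model this can be sketched as below, assuming the rounded coefficients from model (2). The names `rainfall_peak` and `yield_peak` are illustrative and do not appear in the original analysis.

```r
# Rounded coefficients copied from model (2) in the regression table
b0 <- -5.015  # constant
b1 <- 6.004   # rainfall
b2 <- -0.229  # rainfall squared

# A downward-opening parabola peaks where the slope b1 + 2 * b2 * x equals zero
rainfall_peak <- -b1 / (2 * b2)

# Fitted mean yield at that rainfall
yield_peak <- b0 + b1 * rainfall_peak + b2 * rainfall_peak^2

round(c(rainfall_peak = rainfall_peak, yield_peak = yield_peak), 2)
```

With these rounded coefficients the peak falls at roughly 13.1 inches of rainfall with a fitted mean yield of roughly 34.3, which sits inside the range of rainfall observed in the data.
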