class: center, middle, inverse, title-slide # POL90: Statistics ## Multiple Regression: Interaction Terms ### Prof. Wasow, Politics, Pomona College ### 2022-03-30 --- <style type="text/css"> .regression10 table { font-size: 10px; } .regression12 table { font-size: 12px; } .regression14 table { font-size: 14px; } </style> # Announcements .large[ * Assignments + PS08 + Report 2 - Report 2 - Doesn't have to be a feeling thermometer - Should be "approximately continuous" - *Statistical Sleuth* - This week: Chapters 9 & 10 - http://appliedstats.org/chapter9.html - http://appliedstats.org/chapter10.html ] --- class: black-bg background-image: url("images/WiDS Poster-2022.png") background-size: contain --- # Schedule <table> <thead> <tr> <th style="text-align:right;"> Week </th> <th style="text-align:left;"> Date </th> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Title </th> <th style="text-align:right;"> Chapter </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 9 </td> <td style="text-align:left;"> Mar 16 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Spring Recess </td> <td style="text-align:right;"> - </td> </tr> <tr> <td style="text-align:right;"> 10 </td> <td style="text-align:left;"> Mar 21 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Null hypothesis, R-squared </td> <td style="text-align:right;"> 8 </td> </tr> <tr> <td style="text-align:right;"> 10 </td> <td style="text-align:left;"> Mar 23 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Multiple regression </td> <td style="text-align:right;"> 8 </td> </tr> <tr> <td style="text-align:right;"> 11 </td> <td style="text-align:left;"> Mar 28 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Interaction terms </td> <td style="text-align:right;"> 9 </td> </tr> <tr> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 11 </td> <td
style="text-align:left;color: black !important;background-color: yellow !important;"> Mar 30 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Wed </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Interaction terms </td> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 9 </td> </tr> <tr> <td style="text-align:right;"> 12 </td> <td style="text-align:left;"> Apr 4 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Logistic regression </td> <td style="text-align:right;"> 20 </td> </tr> <tr> <td style="text-align:right;"> 12 </td> <td style="text-align:left;"> Apr 6 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Logistic regression </td> <td style="text-align:right;"> 20 </td> </tr> <tr> <td style="text-align:right;"> 13 </td> <td style="text-align:left;"> Apr 11 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Missing data </td> <td style="text-align:right;"> Handout </td> </tr> <tr> <td style="text-align:right;"> 13 </td> <td style="text-align:left;"> Apr 13 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Missing data </td> <td style="text-align:right;"> Handout </td> </tr> <tr> <td style="text-align:right;"> 14 </td> <td style="text-align:left;"> Apr 18 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Matching </td> <td style="text-align:right;"> Handout </td> </tr> </tbody> </table> --- ## Assignment schedule <table> <thead> <tr> <th style="text-align:right;"> Week </th> <th style="text-align:left;"> Date </th> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Assignment </th> <th style="text-align:right;"> Percent </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 9 </td> <td style="text-align:left;"> Mar 18 </td> <td style="text-align:left;"> Fri </td> <td 
style="text-align:left;"> Spring break </td> <td style="text-align:right;"> NA </td> </tr> <tr> <td style="text-align:right;"> 10 </td> <td style="text-align:left;"> Mar 25 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS07 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 11 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Apr 1 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Fri </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> PS08 </td> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 3 </td> </tr> <tr> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 12 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Apr 8 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Fri </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Report2 </td> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 8 </td> </tr> <tr> <td style="text-align:right;"> 13 </td> <td style="text-align:left;"> Apr 15 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS09 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 14 </td> <td style="text-align:left;"> Apr 22 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS10 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 15 </td> <td style="text-align:left;"> Apr 29 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> Report3 </td> <td style="text-align:right;"> 10 </td> </tr> </tbody> </table> 
--- class: middle, center # Report 1 --- class: black-bg background-image: url("images/bernie_assumptions_cropped.jpg") background-size: contain --- class: black-bg background-image: url("images/drake_null.jpg") background-size: contain --- class: black-bg background-image: url("images/error_message_r.jpg") background-size: contain --- class: black-bg background-image: url("images/headaches_r.jpg") background-size: contain --- class: black-bg background-image: url("images/boyfriend_x.jpg") background-size: contain --- class: black-bg background-image: url("images/stats_memes_tweet.png") background-size: contain .footnote[https://twitter.com/v_matzek/status/1376959852367372288] --- class: black-bg background-image: url("images/memer_tweet.png") background-size: contain .footnote[https://twitter.com/chendaniely/status/1377013685856636928] --- class: middle, center # Revisiting Corn Data --- ## Case: Corn Data ```r corn <- Sleuth3::ex0915 %>% clean_names() head(corn, 15) ``` ``` year yield rainfall 1 1890 24.5 9.6 2 1891 33.7 12.9 3 1892 27.9 9.9 4 1893 27.5 8.7 5 1894 21.7 6.8 6 1895 31.9 12.5 7 1896 36.8 13.0 8 1897 29.9 10.1 9 1898 30.2 10.1 10 1899 32.0 10.1 11 1900 34.0 10.8 12 1901 19.4 7.8 13 1902 36.0 16.2 14 1903 30.2 14.1 15 1904 32.4 10.6 ``` --- ## Corn Data with Added Quadratic Term ```r corn <- corn %>% mutate(rainfall_sqr = rainfall^2) head(corn, 15) ``` ``` year yield rainfall rainfall_sqr 1 1890 24.5 9.6 92.16 2 1891 33.7 12.9 166.41 3 1892 27.9 9.9 98.01 4 1893 27.5 8.7 75.69 5 1894 21.7 6.8 46.24 6 1895 31.9 12.5 156.25 7 1896 36.8 13.0 169.00 8 1897 29.9 10.1 102.01 9 1898 30.2 10.1 102.01 10 1899 32.0 10.1 102.01 11 1900 34.0 10.8 116.64 12 1901 19.4 7.8 60.84 13 1902 36.0 16.2 262.44 14 1903 30.2 14.1 198.81 15 1904 32.4 10.6 112.36 ``` --- ## Is `I(rainfall^2)` Same as `rainfall_sqr`? 
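A minimal base-R sketch of the formula rule behind this question, using simulated data rather than the corn data (the names `sim`, `x`, `y`, `x_sqr`, `x1`, `x2` are hypothetical). Inside a model formula, `^` and `*` act on terms rather than numbers, so `x + x^2` collapses to just `x`; wrapping the square in `I()` restores ordinary arithmetic and then matches a precomputed squared column.

```r
# Inside a formula, x^2 means "x crossed with itself", which is just x
labels_plain <- attr(terms(y ~ x + x^2), "term.labels")

# I() protects the arithmetic, so the squared term is kept
labels_asis <- attr(terms(y ~ x + I(x^2)), "term.labels")

# x1 * x2 expands to x1 + x2 + x1:x2; repeated terms are dropped
labels_star <- attr(terms(y ~ x1 + x2 + x1 * x2), "term.labels")

labels_plain  # "x"
labels_asis   # "x" "I(x^2)"
labels_star   # "x1" "x2" "x1:x2"

# Hypothetical simulated data: I(x^2) vs a precomputed squared column
set.seed(90)
sim <- data.frame(x = runif(40, min = 5, max = 15))
sim$y <- -5 + 6 * sim$x - 0.2 * sim$x^2 + rnorm(40, sd = 3)
sim$x_sqr <- sim$x^2

fit_asis <- lm(y ~ x + I(x^2), data = sim)
fit_col  <- lm(y ~ x + x_sqr, data = sim)

# Same estimates; only the coefficient names differ
all.equal(unname(coef(fit_asis)), unname(coef(fit_col)))  # TRUE
```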
- `I()` means "as is": it stops the formula from treating `^` as a formula operator (where `^` means crossing terms), so `rainfall^2` is computed as ordinary squaring ```r lm(yield ~ rainfall + I(rainfall^2), data = corn) %>% summary() ``` ``` Call: lm(formula = yield ~ rainfall + I(rainfall^2), data = corn) Residuals: Min 1Q Median 3Q Max -8.464 -2.324 -0.127 3.515 7.160 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -5.0147 11.4416 -0.44 0.6639 rainfall 6.0043 2.0389 2.94 0.0057 ** I(rainfall^2) -0.2294 0.0886 -2.59 0.0140 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 3.76 on 35 degrees of freedom Multiple R-squared: 0.297, Adjusted R-squared: 0.256 F-statistic: 7.38 on 2 and 35 DF, p-value: 0.00211 ``` --- ## Is `I(rainfall^2)` Same as `rainfall_sqr`? ```r lm(yield ~ rainfall + rainfall_sqr, data = corn) %>% summary() ``` ``` Call: lm(formula = yield ~ rainfall + rainfall_sqr, data = corn) Residuals: Min 1Q Median 3Q Max -8.464 -2.324 -0.127 3.515 7.160 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -5.0147 11.4416 -0.44 0.6639 rainfall 6.0043 2.0389 2.94 0.0057 ** rainfall_sqr -0.2294 0.0886 -2.59 0.0140 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 Residual standard error: 3.76 on 35 degrees of freedom Multiple R-squared: 0.297, Adjusted R-squared: 0.256 F-statistic: 7.38 on 2 and 35 DF, p-value: 0.00211 ``` --- class: center, middle # Regression with # Interaction terms --- ## Multiple Regression: Interaction terms .large[ + One outcome `\(y\)` and three terms: `\(x_1\)`, `\(x_2\)`, and their product `\(x_1 \times x_2\)` + Usually `\(x_2\)` is a dummy or a simple categorical variable (i.e., one with few levels) + `lm(y ~ x1 + x2 + x1 * x2, data = some_data)`, which R treats as the same model as `lm(y ~ x1 * x2, data = some_data)` + Shifts both the intercept AND the slope; with a binary or categorical `\(x_2\)`, the model can still be plotted in two dimensions ] --- ## Meadowfoam: Equal lines, Parallel lines, Separate lines <img src="images/ss_display_9_8.png" width="50%" style="display: block; margin: auto;" /> .footnote[*Statistical Sleuth*, Display 9.8] --- ## Meadowfoam: Equal lines, Parallel lines, Separate lines <img src="images/ss_display_9_8_highlight.png" width="50%" style="display: block; margin: auto;" /> .footnote[*Statistical Sleuth*, Display 9.8] --- class: center, middle # Interactions with Albuquerque # Real Estate Data --- ## Example using Albuquerque real estate data ```r # read in Albuquerque data alb_real_estate <- read.table(here("data", "alb.dat.txt"), sep = "\t", header = TRUE) head(alb_real_estate, 3) ``` ``` price sqft age feats ne cust cor tax 1 2050 2650 13 7 1 1 0 1639 2 2080 2600 NA 4 1 1 0 1088 3 2150 2664 6 5 1 1 0 1193 ``` ```r # Add factor version of ne dummy and select four columns alb <- alb_real_estate %>% * mutate(ne_fct = ifelse(ne == 1, "yes", "no") %>% as.factor()) %>% rename(ne_bin = ne) %>% select(price, sqft, ne_bin, ne_fct) head(alb, 3) ``` ``` price sqft ne_bin ne_fct 1 2050 2650 1 yes 2 2080 2600 1 yes 3 2150 2664 1 yes ``` --- ## Visualize Albuquerque real estate data .left-code[ ```r ggplot(data = alb) + aes(x = sqft, y = price) + geom_point() ``` ] .right-plot[ <img src="week10_02_files/figure-html/alb_plot-1.png" width="432" style="display: block; margin: auto;" /> ] --- ## Albuquerque model:
Intercept ```r # model model_mean <- lm(price ~ 1, data = alb) summary(model_mean) ``` ``` Call: lm(formula = price ~ 1, data = alb) Residuals: Min 1Q Median 3Q Max -687 -336 -94 322 946 Coefficients: Estimate Std. Error t value Pr(>|t|) *(Intercept) 1203.5 32.3 37.2 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 429 on 175 degrees of freedom ``` --- ## Albuquerque baseline model: Intercept .left-code[ ```r mean_price <- mean(alb$price) # plot alb %>% ggplot() + aes(x = sqft, y = price) + geom_point() + geom_hline( yintercept = mean_price, linetype = 2 ) ``` ] .right-plot[ <img src="week10_02_files/figure-html/unnamed-chunk-14-1.png" width="432" style="display: block; margin: auto;" /> ] --- ## Visualizing residuals with model = mean <img src="week10_02_files/figure-html/unnamed-chunk-16-1.png" width="576" style="display: block; margin: auto;" /> --- ## Albuquerque model 1: Intercept + Slope ```r # model lm1 <- lm(price ~ sqft, data = alb) summary(lm1) ``` ``` Call: lm(formula = price ~ sqft, data = alb) Residuals: Min 1Q Median 3Q Max -453.2 -108.4 15.4 79.7 815.0 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 134.4380 43.1611 3.11 0.0022 ** *sqft 0.5437 0.0207 26.30 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 Residual standard error: 193 on 174 degrees of freedom Multiple R-squared: 0.799, Adjusted R-squared: 0.798 F-statistic: 692 on 1 and 174 DF, p-value: <2e-16 ``` --- ## Albuquerque model 1: Intercept + Slope .left-code[ ```r # plot *sjPlot::plot_model( model = lm1, type = "pred", terms = "sqft", show.data = TRUE ) ``` ] .right-plot[ <img src="week10_02_files/figure-html/alb_plot_m1-1.png" width="432" style="display: block; margin: auto;" /> ] --- ## Visualizing residuals in “Equal lines” model <img src="week10_02_files/figure-html/unnamed-chunk-19-1.png" width="576" style="display: block; margin: auto;" /> --- ## Albuquerque model 2: Intercept + Slope + Dummy ```r # model lm2 <- lm(price ~ sqft + ne_fct, data = alb) summary(lm2) ``` ``` Call: lm(formula = price ~ sqft + ne_fct, data = alb) Residuals: Min 1Q Median 3Q Max -381.0 -111.1 -13.8 99.7 898.2 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 64.8945 43.2621 1.50 0.14 sqft 0.5373 0.0195 27.49 < 2e-16 *** *ne_fctyes 134.0359 28.2105 4.75 4.2e-06 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 Residual standard error: 182 on 173 degrees of freedom Multiple R-squared: 0.822, Adjusted R-squared: 0.82 F-statistic: 400 on 2 and 173 DF, p-value: <2e-16 ``` --- ## Albuquerque model 2: Parallel lines .left-code[ ```r # plot sjPlot::plot_model( model = lm2, type = "pred", terms = c("sqft", "ne_fct"), show.data = TRUE ) ``` ] .right-plot[ <img src="week10_02_files/figure-html/alb_plot_m2-1.png" width="432" style="display: block; margin: auto;" /> ] --- ## Visualizing residuals with model = Parallel lines <img src="week10_02_files/figure-html/unnamed-chunk-22-1.png" width="576" style="display: block; margin: auto;" /> --- ## Model 3: Interaction or "Separate Lines" ```r alb <- alb %>% mutate( sqft_ne_int = sqft * ne_bin ) head(alb, 3) ``` ``` price sqft ne_bin ne_fct residuals_mean sqft_ne_int 1 2050 2650 1 yes 427.3 2650 2 2080 2600 1 yes 484.1 2600 3 2150 2664 1 yes 519.8 2664 ``` ```r tail(alb, 3) ``` ``` price sqft ne_bin ne_fct residuals_mean sqft_ne_int 174 2134 3431 1 yes 91.17 3431 175 1737 2878 1 yes -8.56 2878 176 1727 2607 1 yes 127.32 2607 ``` --- ## Model 3: Intercept + Slope + Intercept shift + Slope shift ```r # interaction model with new column lm3 <- lm(price ~ sqft + ne_fct + sqft_ne_int, data = alb) summary(lm3) ``` ``` Call: lm(formula = price ~ sqft + ne_fct + sqft_ne_int, data = alb) Residuals: Min 1Q Median 3Q Max -433.2 -94.3 -5.1 65.6 923.6 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 293.1036 58.0879 5.05 1.1e-06 *** sqft 0.4175 0.0286 14.62 < 2e-16 *** ne_fctyes -255.9345 76.4422 -3.35 0.001 *** sqft_ne_int 0.2005 0.0369 5.43 1.9e-07 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 Residual standard error: 168 on 172 degrees of freedom Multiple R-squared: 0.848, Adjusted R-squared: 0.846 F-statistic: 320 on 3 and 172 DF, p-value: <2e-16 ``` --- ## Model 3: Intercept + Slope + Intercept shift + Slope shift ```r # interaction model coded in formula lm3 <- lm(price ~ sqft + ne_fct + sqft * ne_fct, data = alb) summary(lm3) ``` ``` Call: lm(formula = price ~ sqft + ne_fct + sqft * ne_fct, data = alb) Residuals: Min 1Q Median 3Q Max -433.2 -94.3 -5.1 65.6 923.6 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 293.1036 58.0879 5.05 1.1e-06 *** sqft 0.4175 0.0286 14.62 < 2e-16 *** ne_fctyes -255.9345 76.4422 -3.35 0.001 *** sqft:ne_fctyes 0.2005 0.0369 5.43 1.9e-07 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 168 on 172 degrees of freedom Multiple R-squared: 0.848, Adjusted R-squared: 0.846 F-statistic: 320 on 3 and 172 DF, p-value: <2e-16 ``` --- ## Albuquerque model 3: Separate lines .left-code[ ```r # plot sjPlot::plot_model( model = lm3, type = "pred", terms = c("sqft", "ne_fct"), show.data = TRUE ) ``` ] .right-plot[ <img src="week10_02_files/figure-html/alb_plot_m3-1.png" width="432" style="display: block; margin: auto;" /> ] --- ## Visualizing residuals with model = Separate lines <img src="week10_02_files/figure-html/unnamed-chunk-28-1.png" width="576" style="display: block; margin: auto;" /> --- ## Table of regression results for three models .regression12[ ```r stargazer(lm1, lm2, lm3, digits = 2, single.row = TRUE, header = FALSE, type = 'html', omit.stat = c("f", "adj.rsq", "ser")) ``` <table style="text-align:center"><tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td colspan="3"><em>Dependent variable:</em></td></tr> <tr><td></td><td colspan="3" style="border-bottom: 1px solid black"></td></tr> <tr><td style="text-align:left"></td><td colspan="3">price</td></tr> <tr><td 
style="text-align:left"></td><td>(1)</td><td>(2)</td><td>(3)</td></tr> <tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">sqft</td><td>0.54<sup>***</sup> (0.02)</td><td>0.54<sup>***</sup> (0.02)</td><td>0.42<sup>***</sup> (0.03)</td></tr> <tr><td style="text-align:left">ne_fctyes</td><td></td><td>134.00<sup>***</sup> (28.21)</td><td>-255.90<sup>***</sup> (76.44)</td></tr> <tr><td style="text-align:left">sqft:ne_fctyes</td><td></td><td></td><td>0.20<sup>***</sup> (0.04)</td></tr> <tr><td style="text-align:left">Constant</td><td>134.40<sup>***</sup> (43.16)</td><td>64.89 (43.26)</td><td>293.10<sup>***</sup> (58.09)</td></tr> <tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>176</td><td>176</td><td>176</td></tr> <tr><td style="text-align:left">R<sup>2</sup></td><td>0.80</td><td>0.82</td><td>0.85</td></tr> <tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td colspan="3" style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr> </table> ] --- ## Equations for interaction model `\begin{eqnarray*} \mu \{ \textrm{price} | \textrm{sqft}, \textrm{ne_fct} \} & = & \beta_0 + \beta_1 \cdot \textrm{sqft} + \beta_2 \cdot \textrm{ne_fct}_{yes} + \beta_3 \cdot \textrm{sqft} \cdot \textrm{ne_fct}_{yes} \end{eqnarray*}` -- .large[- when `ne_fctyes` = 0] `\begin{eqnarray*} \mu \{ \textrm{price} | \textrm{sqft}, \textrm{ne_fct} \} & = & \beta_0 + \beta_1 \cdot \textrm{sqft} + \beta_2 \cdot 0 + \beta_3 \cdot \textrm{sqft} \cdot 0 \\ \mu \{ \textrm{price} | \textrm{sqft}, \textrm{ne_fct} \} & = & \beta_0 + \beta_1 \cdot \textrm{sqft} \end{eqnarray*}` -- .large[- when `ne_fctyes` = 1] `\begin{eqnarray*} \mu \{ \textrm{price} | \textrm{sqft}, \textrm{ne_fct} \} & = & \beta_0 + \beta_1 \cdot \textrm{sqft} + \beta_2 \cdot \textrm{ne_fct}_{yes} + \beta_3 \cdot 
\textrm{sqft} \cdot \textrm{ne_fct}_{yes} \\ \mu \{ \textrm{price} | \textrm{sqft}, \textrm{ne_fct} \} & = & \beta_0 + \beta_1 \cdot \textrm{sqft} + \beta_2 \cdot 1 + \beta_3 \cdot \textrm{sqft} \cdot 1 \\ \mu \{ \textrm{price} | \textrm{sqft}, \textrm{ne_fct} \} & = & \beta_0 + \beta_1 \cdot \textrm{sqft} + \beta_2 + \beta_3 \cdot \textrm{sqft} \\ \mu \{ \textrm{price} | \textrm{sqft}, \textrm{ne_fct} \} & = & (\beta_0 + \beta_2) + (\beta_1 + \beta_3) \cdot \textrm{sqft} \end{eqnarray*}` --- ## Equations for interaction model `\begin{eqnarray*} \mu \{ \textrm{price} | \textrm{sqft}, \textrm{ne_fct} \} & = & \beta_0 + \beta_1 \cdot \textrm{sqft} + \beta_2 \cdot \textrm{ne_fct}_{yes} + \beta_3 \cdot \textrm{sqft} \cdot \textrm{ne_fct}_{yes} \\ \mu \{ \textrm{price} | \textrm{sqft}, \textrm{ne_fct} \} & = & 293 + 0.42 \cdot \textrm{sqft} - 256 \cdot \textrm{ne_fct}_{yes} \\ & & + 0.20 \cdot \textrm{sqft} \cdot \textrm{ne_fct}_{yes} \end{eqnarray*}` -- .large[- when `ne_fctyes` = 0] `\begin{eqnarray*} \mu \{ \textrm{price} | \textrm{sqft}, \textrm{ne_fct} \} & = & 293 + 0.42 \cdot \textrm{sqft} - 256 \cdot 0 + 0.20 \cdot \textrm{sqft} \cdot 0 \\ \mu \{ \textrm{price} | \textrm{sqft}, \textrm{ne_fct} \} & = & 293 + 0.42 \cdot \textrm{sqft} \\ \mu \{ \textrm{price} | \textrm{sqft}, \textrm{ne_fct} \} & = & \beta_0 + \beta_1 \cdot \textrm{sqft} \end{eqnarray*}` -- .large[- when `ne_fctyes` = 1] `\begin{eqnarray*} \mu \{ \textrm{price} | \textrm{sqft}, \textrm{ne_fct} \} & = & 293 + 0.42 \cdot \textrm{sqft} - 256 \cdot 1 + 0.20 \cdot \textrm{sqft} \cdot 1 \\ \mu \{ \textrm{price} | \textrm{sqft}, \textrm{ne_fct} \} & = & (293 - 256) + (0.42 + 0.20) \cdot \textrm{sqft} \\ \mu \{ \textrm{price} | \textrm{sqft}, \textrm{ne_fct} \} & = & 37 + 0.62 \cdot \textrm{sqft} \\ \mu \{ \textrm{price} | \textrm{sqft}, \textrm{ne_fct} \} & = & (\beta_0 + \beta_2) + (\beta_1 + \beta_3) \cdot \textrm{sqft} \end{eqnarray*}` --- ## Revisit plot & visualize regression results .left-code[
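The separate-lines arithmetic can be reproduced in R. A minimal sketch, assuming the four estimates are copied from the `summary(lm3)` output (in the live deck, `coef(lm3)` should return the same named vector); `line_other`, `line_ne`, and the 2,000-square-foot home are hypothetical illustrations.

```r
# Estimates copied from summary(lm3); coef(lm3) gives the same named vector
b <- c("(Intercept)"    = 293.1036,
       "sqft"           = 0.4175,
       "ne_fctyes"      = -255.9345,
       "sqft:ne_fctyes" = 0.2005)

# ne_fctyes = 0: intercept beta_0, slope beta_1
line_other <- c(intercept = b[["(Intercept)"]],
                slope     = b[["sqft"]])

# ne_fctyes = 1: intercept beta_0 + beta_2, slope beta_1 + beta_3
line_ne <- c(intercept = b[["(Intercept)"]] + b[["ne_fctyes"]],
             slope     = b[["sqft"]] + b[["sqft:ne_fctyes"]])

round(line_ne, 2)  # intercept 37.17, slope 0.62

# Predicted mean price for a hypothetical 2,000 sq ft home in each group
sqft_new <- 2000
c(other = line_other[["intercept"]] + line_other[["slope"]] * sqft_new,
  ne    = line_ne[["intercept"]] + line_ne[["slope"]] * sqft_new)
```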
```r # plot sjPlot::plot_model( model = lm3, type = "pred", terms = c("sqft", "ne_fct"), show.data = TRUE ) ``` ] .right-plot[ <img src="week10_02_files/figure-html/unnamed-chunk-31-1.png" width="432" style="display: block; margin: auto;" /> ] --- ## Testing for an Interaction .vertical-center[ * We can answer the question with a hypothesis test on the coefficient of the interaction term, or with an ANOVA comparing the models with and without it. For example: `\begin{align*} \textrm{H}_0: \mu & = \beta_0 + \beta_1x_1 + \beta_2x_2 \\ \textrm{H}_1: \mu & = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_3 \end{align*}` * Where `\(x_3 = x_1\cdot x_2\)` is the interaction term, so `\(\textrm{H}_0\)` amounts to `\(\beta_3 = 0\)` * The `\(p\)`-value for the interaction term (`sqft:ne_fctyes`) in the Albuquerque data is about 1.9e-07, so we can reject the null hypothesis of no interaction effect * Thus, there is a significant interaction between a home's square footage and its location in Albuquerque on home price (*p* < 0.001) ] --- ## When to include interaction terms? .large[ * General strategy: Based on knowledge of the study, the researcher can decide which variables to consider for interactions, and test for their significance * If the interaction term `\(x_1\cdot x_2\)` is in the regression model, it usually does not make sense to test for the effect of `\(x_1\)` or `\(x_2\)` alone * The only difference between fitting a single model with an interaction between a binary and a quantitative variable and fitting separate models for each level of the binary variable is that the interaction model assumes the same `\(\sigma^2\)` for both groups ] --- ## When to include interaction terms?
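The claim about `\(\sigma^2\)`, and the equivalence of the two ways of testing for an interaction, can be checked with a small simulation. This is a minimal sketch on hypothetical simulated data (`d`, `x`, `g`, `y` and the `fit_*` names are invented for illustration, not the Albuquerque data): the interaction model reproduces each group's separate regression line and differs only in pooling the residual variance, and the squared t statistic on the interaction term equals the F statistic from comparing the models with and without it.

```r
# Simulated (hypothetical) data: numeric x, binary group g, outcome y
set.seed(42)
n <- 60
d <- data.frame(x = runif(n, min = 1000, max = 3000),
                g = rep(0:1, each = n / 2))
d$y <- 300 + 0.4 * d$x - 250 * d$g + 0.2 * d$x * d$g + rnorm(n, sd = 150)

# One model with an intercept shift (g) and a slope shift (x:g)
fit_int <- lm(y ~ x * g, data = d)
b <- coef(fit_int)  # names: (Intercept), x, g, x:g

# Separate simple regressions, one per group
fit_g0 <- lm(y ~ x, data = d, subset = g == 0)
fit_g1 <- lm(y ~ x, data = d, subset = g == 1)

# Group 0 line is (b0, b1); group 1 line is (b0 + b2, b1 + b3)
same_g0 <- all.equal(unname(coef(fit_g0)), unname(b[c("(Intercept)", "x")]))
same_g1 <- all.equal(unname(coef(fit_g1)),
                     unname(c(b["(Intercept)"] + b["g"], b["x"] + b["x:g"])))

# The interaction model pools residuals from both groups into one sigma^2
pooled_var <- (deviance(fit_g0) + deviance(fit_g1)) /
  (df.residual(fit_g0) + df.residual(fit_g1))
same_var <- all.equal(sigma(fit_int)^2, pooled_var)

# Testing the interaction: squared t statistic on x:g equals the nested F
fit_add <- lm(y ~ x + g, data = d)
t_int <- summary(fit_int)$coefficients["x:g", "t value"]
f_int <- anova(fit_add, fit_int)$F[2]
same_test <- all.equal(t_int^2, f_int)

c(same_g0, same_g1, same_var, same_test)  # all TRUE
```

The same relationship appears in the Albuquerque output: `sqft:ne_fctyes` has t = 5.43, and `anova(lm2, lm3)` reports F = 29.5 (5.43 squared is about 29.5).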
.vertical-center[ .large[ * The Albuquerque and meadowfoam examples both involve an interaction between a binary (categorical) variable and a numeric variable * Other types of predictor interactions: * Factor with > 2 levels and a numeric variable * Two factors ] ] --- ## Three cases to include interaction terms .vertical-center[ .large[ * When a question of interest pertains to interaction (as in Albuquerque) * When good reason exists to suspect interaction * When interactions are proposed as a more general model for the purpose of examining the goodness of fit of a model without interaction ] ] --- ## Occam's Razor and Models .large[ - The principle of Occam's Razor is that simple models are to be preferred over complicated ones. - Named after the 14th-century English philosopher, William of Occam, this principle has guided scientific research ever since its formulation. - It has no underlying theoretical or logical basis; rather, it is founded in common sense and successful experience. It is often called the Principle of Parsimony. - In statistical applications, the idea translates into a preference for the simpler of two models that fit the data equally well. One should seek a parsimonious model that is as simple as possible and yet adequately explains all that can be explained. ] --- class: center, middle # Questions? --- ## Can also run ANOVA: Equal vs Parallel lines - One line (single intercept and slope) vs two parallel lines (two intercepts, one shared slope) ```r anova(lm1, lm2) ``` ``` Analysis of Variance Table Model 1: price ~ sqft Model 2: price ~ sqft + ne_fct Res.Df RSS Df Sum of Sq F Pr(>F) 1 174 6463155 2 173 5717129 1 746027 22.6 4.2e-06 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 ``` --- ## Can also run ANOVA: Equal vs Separate lines - One line (single intercept and slope) vs two separate lines (two intercepts, two slopes) ```r anova(lm1, lm3) ``` ``` Analysis of Variance Table Model 1: price ~ sqft Model 2: price ~ sqft + ne_fct + sqft * ne_fct Res.Df RSS Df Sum of Sq F Pr(>F) 1 174 6463155 2 172 4880808 2 1582347 27.9 3.3e-11 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` --- ## Can also run ANOVA: Parallel vs Separate lines - Two parallel lines (two intercepts, one shared slope) vs two separate lines (two intercepts, two slopes) ```r anova(lm2, lm3) ``` ``` Analysis of Variance Table Model 1: price ~ sqft + ne_fct Model 2: price ~ sqft + ne_fct + sqft * ne_fct Res.Df RSS Df Sum of Sq F Pr(>F) 1 173 5717129 2 172 4880808 1 836320 29.5 1.9e-07 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` --- class: center, middle # Example: 1966 California Governor Election # and Watts Uprising --- ## 1966 California Governor election poll .large[ - Motivating example: What was the relationship between attitudes about Watts, by party, and support for Governor Brown’s 1966 reelection? - Subjects in an October 1966 survey were asked whether they agreed or disagreed with the following statement: ‘Governor Brown was lax in the way he handled the Watts riots?’ - Subjects were also asked, ‘In the election for Governor here in California on Nov. 8th, if you had to vote right now, would you vote for Ronald Reagan, the Republican, or Pat Brown, the Democrat?’ <!-- - The marginal effect of moving from ‘Disagree’ to ‘Agree’ on support for Gov. 
Brown estimated in a logistic regression model that controlled for race, gender, age, income, party identification, religion and union membership --> ] --- --- ## Vote for Brown vs Opinion on response to Watts <img src="images/harris_1600_zelig_watts-1.png" width="100%" style="display: block; margin: auto;" /> .footnote[Note: Model controls for race, age, religion, education, income, sex, region of California and home ownership. Source: Plot: Wasow (forthcoming); Data: Louis Harris and Associates, Inc. (1966)] --- ## Vote for Brown vs Watts, by party <img src="images/harris_1600_zelig_watts_party-1.png" width="100%" style="display: block; margin: auto;" /> .footnote[Note: Model controls for race, age, religion, education, income, sex, region of California and home ownership. Source: Plot: Wasow (forthcoming); Data: Louis Harris and Associates, Inc. (1966)]