Week | Date | Day | Title | Chapter |
---|---|---|---|---|
9 | Mar 14 | Mon | Spring Recess | - |
9 | Mar 16 | Wed | Spring Recess | - |
10 | Mar 21 | Mon | Null hypothesis, R-squared | 8 |
10 | Mar 23 | Wed | Multiple regression | 8 |
11 | Mar 28 | Mon | Interaction terms | 9 |
11 | Mar 30 | Wed | Interaction terms | 9 |
12 | Apr 4 | Mon | Logistic regression | 20 |
12 | Apr 6 | Wed | Logistic regression | 20 |
13 | Apr 11 | Mon | Missing data | Handout |
13 | Apr 13 | Wed | Missing data | Handout |
Data were collected on corn yield and rainfall in six U.S. corn-producing states (Iowa, Nebraska, Illinois, Indiana, Missouri, and Ohio), recorded for each year from 1890 to 1927.
Although increasing rainfall is associated with higher mean yields for rainfalls up to 12 inches, increasing rainfall at higher levels is associated with no change or perhaps a decrease in mean yield.
Why might that be?
library(dplyr)    # provides the %>% pipe
library(janitor)  # provides clean_names()
corn <- Sleuth3::ex0915 %>% clean_names()
head(corn, 15)
   year yield rainfall
1  1890  24.5      9.6
2  1891  33.7     12.9
3  1892  27.9      9.9
4  1893  27.5      8.7
5  1894  21.7      6.8
6  1895  31.9     12.5
7  1896  36.8     13.0
8  1897  29.9     10.1
9  1898  30.2     10.1
10 1899  32.0     10.1
11 1900  34.0     10.8
12 1901  19.4      7.8
13 1902  36.0     16.2
14 1903  30.2     14.1
15 1904  32.4     10.6
lm1 <- lm(yield ~ rainfall, data = corn)
lm2 <- lm(yield ~ rainfall + I(rainfall^2), data = corn)
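The `I()` wrapper in the second model matters because `^` has a special meaning inside an R formula. The sketch below illustrates this using only the 15 rows printed above (typed in by hand as the data frame `corn15`, a name introduced here for illustration), so its estimates will not match the full 38-year fit. It also shows `poly(..., raw = TRUE)` as an alternative spelling of the same quadratic.

```r
# First 15 rows of the corn data, copied from the printed output above.
# The full data set has 38 rows, so these fits differ from lm1 and lm2.
corn15 <- data.frame(
  year     = 1890:1904,
  yield    = c(24.5, 33.7, 27.9, 27.5, 21.7, 31.9, 36.8, 29.9,
               30.2, 32.0, 34.0, 19.4, 36.0, 30.2, 32.4),
  rainfall = c(9.6, 12.9, 9.9, 8.7, 6.8, 12.5, 13.0, 10.1,
               10.1, 10.1, 10.8, 7.8, 16.2, 14.1, 10.6)
)

# Without I(), `^` is formula syntax (term crossing), so this is
# the same model as yield ~ rainfall: no squared term is created.
no_wrap <- lm(yield ~ rainfall^2, data = corn15)

# With I(), rainfall^2 is evaluated as arithmetic and enters as its own column.
wrapped <- lm(yield ~ rainfall + I(rainfall^2), data = corn15)

# poly(..., raw = TRUE) requests the same two columns in one term.
via_poly <- lm(yield ~ poly(rainfall, 2, raw = TRUE), data = corn15)

names(coef(no_wrap))   # intercept and rainfall only
names(coef(wrapped))   # intercept, rainfall, and the squared term
all.equal(unname(coef(wrapped)), unname(coef(via_poly)))
```

Forgetting `I()` does not raise an error; the squared term is silently dropped, so it is worth checking the coefficient names after fitting.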
Dependent variable: yield | (1) | (2) |
---|---|---|
rainfall | 0.776** (0.294) | 6.004*** (2.039) |
I(rainfall^2) | | -0.229** (0.089) |
Constant | 23.550*** (3.236) | -5.015 (11.440) |
Observations | 38 | 38 |
R² | 0.162 | 0.297 |
Adjusted R² | 0.139 | 0.256 |
Note: *p<0.1; **p<0.05; ***p<0.01
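The significance stars in the table can be checked by hand from the estimates and standard errors. This is a rough sketch, assuming the usual two-sided t test with degrees of freedom equal to observations minus the number of estimated coefficients. The inputs are the rounded values from the table, so the results are approximate.

```r
# Estimates and standard errors copied from the regression table above
est <- c(rainfall_m1 = 0.776, rainfall_m2 = 6.004, rainfall_sq_m2 = -0.229)
se  <- c(0.294, 2.039, 0.089)

# Residual degrees of freedom: 38 observations minus 2 coefficients
# in model (1), minus 3 coefficients in model (2)
df <- c(38 - 2, 38 - 3, 38 - 3)

t_stat  <- est / se                    # t statistic for H0: coefficient = 0
p_value <- 2 * pt(-abs(t_stat), df)    # two-sided p-value

round(cbind(t_stat, p_value), 3)
```

The p-values should fall in the same bands as the stars: between 0.01 and 0.05 for rainfall in model (1) and for the squared term, and below 0.01 for rainfall in model (2).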
# set up some plausible rainfall values
rainfall_values <- 7:16
rainfall_values
[1] 7 8 9 10 11 12 13 14 15 16
term1 <- 6.004 * rainfall_values
term1
[1] 42.03 48.03 54.04 60.04 66.04 72.05 78.05 84.06 90.06 96.06
term2 <- 0.229 * rainfall_values^2
term2
[1] 11.22 14.66 18.55 22.90 27.71 32.98 38.70 44.88 51.52 58.62
-5.015 + term1 - term2
[1] 25.79 28.36 30.47 32.12 33.32 34.06 34.34 34.16 33.52 32.42
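The fitted values above rise, level off near 13 inches, and then fall. With a quadratic fit, the turning point and the slope at any rainfall level follow directly from the coefficients. The sketch below uses the rounded estimates from model (2); the names `b0`, `b1`, `b2`, and `slope_at` are introduced here for illustration.

```r
# Rounded coefficients from model (2): yield = b0 + b1 * rainfall + b2 * rainfall^2
b0 <- -5.015
b1 <-  6.004
b2 <- -0.229

# The fitted curve peaks where its derivative, b1 + 2 * b2 * rainfall, equals zero
peak_rainfall <- -b1 / (2 * b2)
peak_yield    <- b0 + b1 * peak_rainfall + b2 * peak_rainfall^2

# Estimated change in mean yield per additional inch, at a given rainfall level
slope_at <- function(rainfall) b1 + 2 * b2 * rainfall

peak_rainfall
peak_yield
slope_at(c(8, 14))
```

With these rounded coefficients, the peak is at about 13.1 inches with a fitted yield of about 34.3. The slope is positive at 8 inches (about 2.3 per inch) and slightly negative at 14 inches (about -0.4 per inch), which matches the pattern in the hand-computed values.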
interactions package
lm3 <- lm(yield ~ rainfall + I(rainfall^2) + year, data = corn)
interactions::interact_plot(
  lm3,                 # pick a model to plot
  pred = "rainfall",   # this variable will be your x-axis
  modx = "year",       # a moderator, i.e. a control we think is important
  plot.points = TRUE   # plot the data points
)
As with interaction terms, quadratic terms should not routinely be included.
Consider including them in four situations:
When the analyst has good reason to suspect that the response is nonlinear in some explanatory variable (through knowledge of the process or by graphical examination);
When the question of interest calls for finding the values that maximize or minimize the mean response;
When careful modeling of the regression is called for by the questions of interest (and presumably this is only the case if there are just a few explanatory variables);
Or when inclusion is used to produce a rich model for assessing the fit of an inferential model.
Statistical Sleuth 3e, 10.4.4, p 295