class: center, middle, inverse, title-slide # POL90: Statistics ## Multiple Comparisons with ANOVA ### Prof Wasow
Assistant Professor, Politics
Pomona College ### 2022-03-15 --- # Announcements .large[ * Assignments + PS05 due <mark>Friday</mark> + Report 1 ] -- .large[ * Statistical Sleuth + Read Chapter 5 ] --- # Schedule <table> <thead> <tr> <th style="text-align:right;"> Week </th> <th style="text-align:left;"> Date </th> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Title </th> <th style="text-align:right;"> Chapter </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> Feb 9 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Confidence Intervals </td> <td style="text-align:right;"> 2 </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:left;"> Feb 14 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> A Closer Look at Assumptions </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:left;"> Feb 16 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> A Closer Look at Assumptions </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Feb 21 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Alternatives to the t-Tools </td> <td style="text-align:right;"> 4 </td> </tr> <tr> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 6 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Feb 23 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Wed </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Comparison Among Several Samples </td> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 5 </td> </tr> <tr> <td style="text-align:right;"> 7 </td> <td style="text-align:left;"> Feb 28 
</td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Comparison Among Several Samples </td> <td style="text-align:right;"> 5 </td> </tr> <tr> <td style="text-align:right;"> 7 </td> <td style="text-align:left;"> Mar 2 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Simple Linear Regression </td> <td style="text-align:right;"> 7 </td> </tr> <tr> <td style="text-align:right;"> 8 </td> <td style="text-align:left;"> Mar 7 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Simple Linear Regression </td> <td style="text-align:right;"> 7 </td> </tr> <tr> <td style="text-align:right;"> 8 </td> <td style="text-align:left;"> Mar 9 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Regression by Calculation </td> <td style="text-align:right;"> 7 </td> </tr> <tr> <td style="text-align:right;"> 9 </td> <td style="text-align:left;"> Mar 14 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Spring Recess </td> <td style="text-align:right;"> - </td> </tr> </tbody> </table> --- ## Assignment schedule <table> <thead> <tr> <th style="text-align:right;"> Week </th> <th style="text-align:left;"> Date </th> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Assignment </th> <th style="text-align:right;"> Percent </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:left;"> Feb 18 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS04 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 6 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Feb 25 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Fri </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> PS05 
</td> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 3 </td> </tr> <tr> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 7 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Mar 4 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Fri </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Report1 </td> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 6 </td> </tr> <tr> <td style="text-align:right;"> 8 </td> <td style="text-align:left;"> Mar 11 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS06 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 9 </td> <td style="text-align:left;"> Mar 18 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> Spring break </td> <td style="text-align:right;"> NA </td> </tr> <tr> <td style="text-align:right;"> 10 </td> <td style="text-align:left;"> Mar 25 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS07 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 11 </td> <td style="text-align:left;"> Apr 1 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS08 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 12 </td> <td style="text-align:left;"> Apr 8 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> Report2 </td> <td style="text-align:right;"> 8 </td> </tr> <tr> <td style="text-align:right;"> 13 </td> <td style="text-align:left;"> Apr 15 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS09 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 14 </td> <td style="text-align:left;"> Apr 22 
</td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS10 </td> <td style="text-align:right;"> 3 </td> </tr> </tbody> </table>

---

## R Markdown & knitr tips

* Useful chunk options:
    - `echo = TRUE # echo code in output`
    - `include = FALSE # suppress all output`
    - `results = 'asis' # output for HTML & LaTeX tables`
    - `results = 'hide' # hide output`
    - `fig.height = 3 # set figure height to 3 inches`
    - `fig.width = 4 # set figure width to 4 inches`
    - `fig.align = 'center' # center figure`
    - `out.width = '50%' # scale image to 50% of the available width`
* Example:
    - `{r some_chunk, echo = FALSE}`
* Chatty packages
    - `suppressMessages(library(dplyr)) # suppress messages`

---

## Chunk tips

.large[
- When to create new chunks?
    - Good to separate different parts of analysis
        - SCRUBBING: load and clean data chunks
        - ANALYZING: run analyses chunks (e.g., statistical tests)
        - PRESENTING: display results chunks (e.g., tables and plots)
    - Put tables and plots in separate chunks
        - `results = 'asis' # for tables`
        - `fig.height = 3 # for figures`
]

---

## Why `results = 'asis'` in LaTeX?

.large[
- What you write:
]

```r
t.test(mpg ~ am, data = mtcars) %>%
  broom::tidy() %>%
  select(-method, -alternative) %>%
  gt() %>%
  fmt_number(columns = everything(), decimals = 2)
```

.large[
- What you see in your PDF:
]

<img src="images/example_tex_table.png" width="728" style="display: block; margin: auto;" />

---

## Why `results = 'asis'` in LaTeX?
.large[
- What you write:
]

```r
t.test(mpg ~ am, data = mtcars) %>%
  broom::tidy() %>%
  select(-method, -alternative) %>%
  gt() %>%
  fmt_number(columns = everything(), decimals = 2) %>%
  as_latex()
```

.large[
- What actually creates a LaTeX table in the PDF:
]

```
[[1]]
[1] "\\captionsetup[table]{labelformat=empty,skip=1pt}"
[2] "\\begin{longtable}{rrrrrrrr}"
[3] "\\toprule"
[4] "estimate & estimate1 & estimate2 & statistic & p.value & parameter & conf.low & conf.high \\\\ "
[5] "\\midrule"
[6] "$-7.24$ & $17.15$ & $24.39$ & $-3.77$ & $0.00$ & $18.33$ & $-11.28$ & $-3.21$ \\\\ "
[7] " \\bottomrule"
[8] "\\end{longtable}"
[9] ""
```

---

## Why `results = 'asis'` in HTML?

.large[
- What you write:
]

```r
t.test(mpg ~ am, data = mtcars) %>%
  broom::tidy() %>%
  gt() %>%
  fmt_number(columns = 1:8, decimals = 2)
```

| estimate | estimate1 | estimate2 | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
|---:|---:|---:|---:|---:|---:|---:|---:|:---|:---|
| −7.24 | 17.15 | 24.39 | −3.77 | 0.00 | 18.33 | −11.28 | −3.21 | Welch Two Sample t-test | two.sided |

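
---

## Sketch: what `results = 'asis'` passes through

A rough illustration of the idea on these slides, not how `gt` works internally: a table is ultimately a string of markup, and `results = 'asis'` asks knitr to write that string into the document unchanged. The helper `df_to_html()` below is hypothetical and exists only for this example; it uses base R so that it runs without extra packages.

```r
# Hypothetical helper (not part of gt or knitr): build raw HTML for a data frame.
df_to_html <- function(df) {
  header <- paste0("<th>", names(df), "</th>", collapse = "")
  cells <- apply(as.matrix(df), 1, function(row) {
    paste0("<td>", row, "</td>", collapse = "")
  })
  paste0(
    "<table><tr>", header, "</tr>",
    paste0("<tr>", cells, "</tr>", collapse = ""),
    "</table>"
  )
}

# Same test as on the slides, reduced to two columns for brevity.
tt <- t.test(mpg ~ am, data = mtcars)
res <- data.frame(
  statistic = round(unname(tt$statistic), 2),
  p.value = round(tt$p.value, 4)
)
html <- df_to_html(res)

# In a chunk with results = 'asis', this markup is written into the document
# as-is; with the default setting it is displayed as literal text instead.
cat(html, "\n")
```

Placed in a chunk such as `{r some_table, results = 'asis', echo = FALSE}`, the `cat()` output should render as a table in HTML output rather than as a block of tags.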
--- ## Why `results = 'asis'` in HTML? .large[ - What you write: ] ```r t.test(mpg ~ am, data = mtcars) %>% broom::tidy() %>% gt() %>% fmt_number(columns = 1:8, decimals = 2) ``` .large[ - What actually creates an HTML table: ] ``` [[1]] [1] "<table style=\"font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Helvetica Neue', 'Fira Sans', 'Droid Sans', Arial, sans-serif; display: table; border-collapse: collapse; margin-left: auto; margin-right: auto; color: #333333; font-size: 16px; font-weight: normal; font-style: normal; background-color: #FFFFFF; width: auto; border-top-style: solid; border-top-width: 2px; border-top-color: #A8A8A8; border-right-style: none; border-right-width: 2px; border-right-color: #D3D3D3; border-bottom-style: solid; border-bottom-width: 2px; border-bottom-color: #A8A8A8; border-left-style: none; border-left-width: 2px; border-left-color: #D3D3D3;\">" [2] " " [3] " <thead style=\"border-top-style: solid; border-top-width: 2px; border-top-color: #D3D3D3; border-bottom-style: solid; border-bottom-width: 2px; border-bottom-color: #D3D3D3; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3;\">" [4] " <tr>" [5] " <th style=\"color: #333333; background-color: #FFFFFF; font-size: 100%; font-weight: normal; text-transform: inherit; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: bottom; padding-top: 5px; padding-bottom: 6px; padding-left: 5px; padding-right: 5px; overflow-x: hidden; text-align: right; font-variant-numeric: tabular-nums;\" rowspan=\"1\" colspan=\"1\">estimate</th>" [6] " <th style=\"color: #333333; background-color: #FFFFFF; font-size: 100%; font-weight: normal; text-transform: inherit; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; 
border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: bottom; padding-top: 5px; padding-bottom: 6px; padding-left: 5px; padding-right: 5px; overflow-x: hidden; text-align: right; font-variant-numeric: tabular-nums;\" rowspan=\"1\" colspan=\"1\">estimate1</th>" [7] " <th style=\"color: #333333; background-color: #FFFFFF; font-size: 100%; font-weight: normal; text-transform: inherit; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: bottom; padding-top: 5px; padding-bottom: 6px; padding-left: 5px; padding-right: 5px; overflow-x: hidden; text-align: right; font-variant-numeric: tabular-nums;\" rowspan=\"1\" colspan=\"1\">estimate2</th>" [8] " <th style=\"color: #333333; background-color: #FFFFFF; font-size: 100%; font-weight: normal; text-transform: inherit; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: bottom; padding-top: 5px; padding-bottom: 6px; padding-left: 5px; padding-right: 5px; overflow-x: hidden; text-align: right; font-variant-numeric: tabular-nums;\" rowspan=\"1\" colspan=\"1\">statistic</th>" [9] " <th style=\"color: #333333; background-color: #FFFFFF; font-size: 100%; font-weight: normal; text-transform: inherit; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: bottom; padding-top: 5px; padding-bottom: 6px; padding-left: 5px; padding-right: 5px; overflow-x: hidden; text-align: right; font-variant-numeric: tabular-nums;\" rowspan=\"1\" colspan=\"1\">p.value</th>" [10] " <th style=\"color: #333333; background-color: #FFFFFF; font-size: 100%; font-weight: normal; text-transform: inherit; border-left-style: none; border-left-width: 1px; 
border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: bottom; padding-top: 5px; padding-bottom: 6px; padding-left: 5px; padding-right: 5px; overflow-x: hidden; text-align: right; font-variant-numeric: tabular-nums;\" rowspan=\"1\" colspan=\"1\">parameter</th>" [11] " <th style=\"color: #333333; background-color: #FFFFFF; font-size: 100%; font-weight: normal; text-transform: inherit; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: bottom; padding-top: 5px; padding-bottom: 6px; padding-left: 5px; padding-right: 5px; overflow-x: hidden; text-align: right; font-variant-numeric: tabular-nums;\" rowspan=\"1\" colspan=\"1\">conf.low</th>" [12] " <th style=\"color: #333333; background-color: #FFFFFF; font-size: 100%; font-weight: normal; text-transform: inherit; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: bottom; padding-top: 5px; padding-bottom: 6px; padding-left: 5px; padding-right: 5px; overflow-x: hidden; text-align: right; font-variant-numeric: tabular-nums;\" rowspan=\"1\" colspan=\"1\">conf.high</th>" [13] " <th style=\"color: #333333; background-color: #FFFFFF; font-size: 100%; font-weight: normal; text-transform: inherit; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: bottom; padding-top: 5px; padding-bottom: 6px; padding-left: 5px; padding-right: 5px; overflow-x: hidden; text-align: left;\" rowspan=\"1\" colspan=\"1\">method</th>" [14] " <th style=\"color: #333333; background-color: #FFFFFF; font-size: 100%; font-weight: normal; text-transform: inherit; border-left-style: none; border-left-width: 1px; border-left-color: 
#D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: bottom; padding-top: 5px; padding-bottom: 6px; padding-left: 5px; padding-right: 5px; overflow-x: hidden; text-align: left;\" rowspan=\"1\" colspan=\"1\">alternative</th>" [15] " </tr>" [16] " </thead>" [17] " <tbody style=\"border-top-style: solid; border-top-width: 2px; border-top-color: #D3D3D3; border-bottom-style: solid; border-bottom-width: 2px; border-bottom-color: #D3D3D3;\">" [18] " <tr><td style=\"padding-top: 8px; padding-bottom: 8px; padding-left: 5px; padding-right: 5px; margin: 10px; border-top-style: solid; border-top-width: 1px; border-top-color: #D3D3D3; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: middle; overflow-x: hidden; text-align: right; font-variant-numeric: tabular-nums;\">-7.244939</td>" [19] "<td style=\"padding-top: 8px; padding-bottom: 8px; padding-left: 5px; padding-right: 5px; margin: 10px; border-top-style: solid; border-top-width: 1px; border-top-color: #D3D3D3; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: middle; overflow-x: hidden; text-align: right; font-variant-numeric: tabular-nums;\">17.14737</td>" [20] "<td style=\"padding-top: 8px; padding-bottom: 8px; padding-left: 5px; padding-right: 5px; margin: 10px; border-top-style: solid; border-top-width: 1px; border-top-color: #D3D3D3; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: middle; overflow-x: hidden; text-align: right; font-variant-numeric: tabular-nums;\">24.39231</td>" [21] "<td style=\"padding-top: 8px; padding-bottom: 8px; padding-left: 5px; padding-right: 5px; margin: 10px; 
border-top-style: solid; border-top-width: 1px; border-top-color: #D3D3D3; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: middle; overflow-x: hidden; text-align: right; font-variant-numeric: tabular-nums;\">-3.767123</td>" [22] "<td style=\"padding-top: 8px; padding-bottom: 8px; padding-left: 5px; padding-right: 5px; margin: 10px; border-top-style: solid; border-top-width: 1px; border-top-color: #D3D3D3; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: middle; overflow-x: hidden; text-align: right; font-variant-numeric: tabular-nums;\">0.001373638</td>" [23] "<td style=\"padding-top: 8px; padding-bottom: 8px; padding-left: 5px; padding-right: 5px; margin: 10px; border-top-style: solid; border-top-width: 1px; border-top-color: #D3D3D3; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: middle; overflow-x: hidden; text-align: right; font-variant-numeric: tabular-nums;\">18.33225</td>" [24] "<td style=\"padding-top: 8px; padding-bottom: 8px; padding-left: 5px; padding-right: 5px; margin: 10px; border-top-style: solid; border-top-width: 1px; border-top-color: #D3D3D3; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: middle; overflow-x: hidden; text-align: right; font-variant-numeric: tabular-nums;\">-11.28019</td>" [25] "<td style=\"padding-top: 8px; padding-bottom: 8px; padding-left: 5px; padding-right: 5px; margin: 10px; border-top-style: solid; border-top-width: 1px; border-top-color: #D3D3D3; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; 
border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: middle; overflow-x: hidden; text-align: right; font-variant-numeric: tabular-nums;\">-3.209684</td>" [26] "<td style=\"padding-top: 8px; padding-bottom: 8px; padding-left: 5px; padding-right: 5px; margin: 10px; border-top-style: solid; border-top-width: 1px; border-top-color: #D3D3D3; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: middle; overflow-x: hidden; text-align: left;\">Welch Two Sample t-test</td>" [27] "<td style=\"padding-top: 8px; padding-bottom: 8px; padding-left: 5px; padding-right: 5px; margin: 10px; border-top-style: solid; border-top-width: 1px; border-top-color: #D3D3D3; border-left-style: none; border-left-width: 1px; border-left-color: #D3D3D3; border-right-style: none; border-right-width: 1px; border-right-color: #D3D3D3; vertical-align: middle; overflow-x: hidden; text-align: left;\">two.sided</td></tr>" [28] " </tbody>" [29] " " [30] " " [31] "</table>" ``` --- ## What is `group_by` again? Split-Apply-Combine <img src="images/split_apply_combine.png" width="80%" style="display: block; margin: auto;" /> --- class: center, middle, inverse # Chapter 5: Multiple Comparisons with ANOVA --- ## Caloric restriction and longevity <br> .large[ * Question: - Does reducing caloric intake increase longevity? * Data: - Female mice were randomly assigned to six diet groups and their lifetimes were measured * Why do we care? 
] --- ## Caloric restriction and longevity with primates <img src="images/Science-2002-Roth-811.jpg" width="45%" style="display: block; margin: auto;" /> .footnote[Source: http://science.sciencemag.org/content/297/5582/811] --- ## Caloric restriction and longevity with primates <img src="images/Science-2002-Roth-811_highlights1.jpg" width="55%" style="display: block; margin: auto;" /> --- ## Caloric restriction and longevity with primates <img src="images/Science-2002-Roth-811_highlights2.jpg" width="55%" style="display: block; margin: auto;" /> --- ## Caloric restriction and longevity with primates <img src="images/Science-2002-Roth-811_highlights3.jpg" width="55%" style="display: block; margin: auto;" /> --- ## Caloric restriction and longevity with primates <img src="images/Science-2002-Roth-811_highlights4.jpg" width="55%" style="display: block; margin: auto;" /> --- ## Caloric restriction and longevity with primates <img src="images/Science-2002-Roth-811_highlights5.jpg" width="55%" style="display: block; margin: auto;" /> --- ## Caloric restriction and longevity with primates <img src="images/caloric_restriction_monkeys.jpg" width="90%" style="display: block; margin: auto;" /> --- ## Caloric restriction and longevity with mice <br><br> .large[ * **NP**: As much as they pleased, nonpurified standard diet * **N/N85**: Normal diet before weaning; 85 kcal/wk after weaning * **N/R50**: Normal diet before weaning; 50 kcal/wk after weaning * **R/R50**: Restricted diet before weaning; 50 kcal/wk after weaning * **N/R50 lopro** (`lopro` in the data): Like N/R50, with protein reduced with age * **N/R40**: Normal diet before weaning; 40 kcal/wk after weaning ] --- ## Caloric restriction and longevity with mice <img src="images/ss_display_5_3.jpg" width="65%" style="display: block; margin: auto;" /> .footnote[Source: *Statistical Sleuth*, Display 5.3] --- ## Caloric restriction and longevity: look at data ```r mice <- Sleuth3::case0501 %>% janitor::clean_names() mice %>% head(5) ``` ``` lifetime diet 1 35.5 NP 2 35.4 NP 3 34.9 NP 4 34.8 NP 5 33.8 NP 
``` ```r mice %>% tail(5) ``` ``` lifetime diet 345 33.9 N/R40 346 31.0 N/R40 347 29.4 N/R40 348 19.6 N/R40 349 47.6 N/R40 ``` --- ## Caloric restriction: Visualize .left-code[ ```r ggplot(data = mice) + aes(x = diet, y = lifetime) + geom_boxplot() + ylab("Lifetime (Months)") + xlab("Diet") ``` ] .right-plot[ <img src="week06_02_files/figure-html/mice_boxplot_plot1-1.png" width="100%" style="display: block; margin: auto;" /> ] --- ## Caloric restriction: reorder .left-code[ ```r *library(forcats) mice <- mice %>% mutate( * diet = fct_inorder(diet) ) ggplot(data = mice) + aes(x = diet, y = lifetime) + geom_boxplot() + ylab("Lifetime (Months)") + xlab("Diet") ``` ] .right-plot[ <img src="week06_02_files/figure-html/mice_boxplot_plot2-1.png" width="100%" style="display: block; margin: auto;" /> ] --- class: middle, center # Mice: Two Sample Case --- ## Compare N/R50 vs N/N85 <img src="images/ss_display_5_3_modified.png" width="65%" style="display: block; margin: auto;" /> .footnote[Source: *Statistical Sleuth*, Display 5.3] --- ## Formula for Pooled Sample Standard Deviation <br><br><br> $$ s_p^2=\frac{\left(n_1-1\right) s_1^2+\left(n_2-1\right) s_2^2+\cdots+\left(n_I-1\right) s_I^2}{\left(n_1-1\right)+\left(n_2-1\right)+\cdots+\left(n_I-1\right)} $$ $$ d.f. = N - I $$ --- ## Calculating Pooled Sample Standard Deviation <br><br> <img src="images/ss_display_5_6.png" width="125%" style="display: block; margin: auto;" /> .footnote[Source: *Statistical Sleuth*, Display 5.6] --- ## Standard Error <br><br> $$ \mathrm{SE}\left(\bar{Y}_3-\bar{Y}_2\right)=s_p \sqrt{\frac{1}{n_3}+\frac{1}{n_2}} $$ --- ## Hypothesis Test for Two of Multiple Samples <img src="images/ss_display_5_7.png" width="63%" style="display: block; margin: auto;" /> .footnote[Source: *Statistical Sleuth*, Display 5.7] --- class: middle, center # Mice: Multiple Comparisons --- ## How to test for difference in means? 
Equal means <img src="week06_02_files/figure-html/mice_boxplot_plot3-1.png" width="100%" style="display: block; margin: auto;" /> --- ## A null hypothesis with multiple groups .large[ - Equality of all means with multiple groups ] `$$\begin{array}{llllllll} \text { Group: } & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ \text { Full (separate-means) model: } & \mu_1 & \mu_2 & \mu_3 & \mu_4 & \mu_5 & \mu_6 & \mu_7 \\ \text { Reduced (equal-means) model: } & \mu & \mu & \mu & \mu & \mu & \mu & \mu \end{array}$$` -- <br> .large[ - Equality of all averages with multiple samples ] `$$\begin{array}{llllllll} \text { Group: } & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ \text { Full (separate-means) model: } & \bar{Y}_1 & \bar{Y}_2 & \bar{Y}_3 & \bar{Y}_4 & \bar{Y}_5 & \bar{Y}_6 & \bar{Y}_7 \\ \text { Reduced (equal-means) model: } & \bar{Y} & \bar{Y} & \bar{Y} & \bar{Y} & \bar{Y} & \bar{Y} & \bar{Y} \end{array}$$` --- ## How to test for difference in means? Equal means <img src="week06_02_files/figure-html/mice_boxplot_plot4-1.png" width="100%" style="display: block; margin: auto;" /> --- ## Equal means vs separate means <img src="week06_02_files/figure-html/mice_boxplot_plot5-1.png" width="100%" style="display: block; margin: auto;" /> --- ## Equal means vs separate means <img src="week06_02_files/figure-html/mice_dotplot_plot1-1.png" width="100%" style="display: block; margin: auto;" /> --- ## Equal means vs separate means <img src="week06_02_files/figure-html/mice_dotplot_plot2-1.png" width="100%" style="display: block; margin: auto;" /> --- class: center, middle # Discussion: Which model is better? 
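
---

## Sketch: putting numbers on the two models

One way to ground this discussion is to fit both models and compare how much variability each leaves unexplained. The sketch below is an illustration under stated assumptions rather than the course's own code: it uses the built-in `PlantGrowth` data as a stand-in for the mice data so that it runs without extra packages, and the object names (`fit_equal`, `fit_separate`) are made up for the example.

```r
# Stand-in data (assumption): PlantGrowth ships with base R; the slides use
# Sleuth3::case0501 instead.
dat <- PlantGrowth

# Reduced (equal-means) model: one grand mean for every observation.
fit_equal <- lm(weight ~ 1, data = dat)

# Full (separate-means) model: one mean per group.
fit_separate <- lm(weight ~ group, data = dat)

# Residual sum of squares under each model.
rss_equal <- sum(resid(fit_equal)^2)
rss_separate <- sum(resid(fit_separate)^2)
c(equal = rss_equal, separate = rss_separate)

# The fitted values are just the grand mean and the group means.
all.equal(unname(fitted(fit_equal)), rep(mean(dat$weight), nrow(dat)))
all.equal(unname(fitted(fit_separate)), ave(dat$weight, dat$group))
```

The separate-means model can never have a larger residual sum of squares than the equal-means model, because the grand mean is one of the choices available to it. The question for the discussion is whether the reduction is large enough to justify the extra parameters.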
--- ## Parsimony vs Explanatory Power <br><br> .large[ * Generally, we prefer simpler models to more complicated models * Generally, we prefer more explanatory power to less explanatory power ] --- ## Residuals with Equal Means Model <img src="week06_02_files/figure-html/unnamed-chunk-24-1.png" width="720" style="display: block; margin: auto;" /> --- ## Residuals with Separate Means Model <img src="week06_02_files/figure-html/unnamed-chunk-25-1.png" width="720" style="display: block; margin: auto;" /> --- ## Calculating Single Mean and Residuals ```r mice <- mice %>% mutate( single_mean = mean(lifetime), res_single_mean = lifetime - single_mean, res_single_mean_sq = (res_single_mean) ^ 2 ) head(mice) ``` ``` lifetime diet single_mean res_single_mean res_single_mean_sq 1 35.5 NP 38.79713 -3.297135 10.87110 2 35.4 NP 38.79713 -3.397135 11.54052 3 34.9 NP 38.79713 -3.897135 15.18766 4 34.8 NP 38.79713 -3.997135 15.97709 5 33.8 NP 38.79713 -4.997135 24.97135 6 33.5 NP 38.79713 -5.297135 28.05964 ``` ```r sum(mice$res_single_mean_sq) ``` ``` [1] 28031.36 ``` --- ## Calculating Single Means and Residuals ```r mice %>% filter(diet == "NP") %>% head(20) ``` ``` lifetime diet single_mean res_single_mean res_single_mean_sq 1 35.5 NP 38.79713 -3.297135 10.87110 2 35.4 NP 38.79713 -3.397135 11.54052 3 34.9 NP 38.79713 -3.897135 15.18766 4 34.8 NP 38.79713 -3.997135 15.97709 5 33.8 NP 38.79713 -4.997135 24.97135 6 33.5 NP 38.79713 -5.297135 28.05964 7 32.6 NP 38.79713 -6.197135 38.40448 8 32.4 NP 38.79713 -6.397135 40.92333 9 31.8 NP 38.79713 -6.997135 48.95989 10 31.6 NP 38.79713 -7.197135 51.79875 11 31.5 NP 38.79713 -7.297135 53.24817 12 31.4 NP 38.79713 -7.397135 54.71760 13 31.4 NP 38.79713 -7.397135 54.71760 14 31.3 NP 38.79713 -7.497135 56.20703 15 30.8 NP 38.79713 -7.997135 63.95416 16 30.7 NP 38.79713 -8.097135 65.56359 17 30.5 NP 38.79713 -8.297135 68.84244 18 30.4 NP 38.79713 -8.397135 70.51187 19 30.2 NP 38.79713 -8.597135 73.91072 20 30.2 NP 38.79713 -8.597135 
73.91072 ``` --- ## Calculating Single Means and Residuals ```r mice %>% filter(diet == "lopro") %>% head(20) ``` ``` lifetime diet single_mean res_single_mean res_single_mean_sq 1 49.7 lopro 38.79713 10.902865 118.87247 2 49.3 lopro 38.79713 10.502865 110.31018 3 48.6 lopro 38.79713 9.802865 96.09617 4 48.3 lopro 38.79713 9.502865 90.30445 5 48.0 lopro 38.79713 9.202865 84.69273 6 47.7 lopro 38.79713 8.902865 79.26101 7 47.5 lopro 38.79713 8.702865 75.73986 8 47.2 lopro 38.79713 8.402865 70.60815 9 47.1 lopro 38.79713 8.302865 68.93757 10 47.0 lopro 38.79713 8.202865 67.28700 11 47.0 lopro 38.79713 8.202865 67.28700 12 47.0 lopro 38.79713 8.202865 67.28700 13 46.9 lopro 38.79713 8.102865 65.65643 14 46.9 lopro 38.79713 8.102865 65.65643 15 46.3 lopro 38.79713 7.502865 56.29299 16 45.9 lopro 38.79713 7.102865 50.45070 17 45.9 lopro 38.79713 7.102865 50.45070 18 44.5 lopro 38.79713 5.702865 32.52267 19 44.1 lopro 38.79713 5.302865 28.12038 20 44.0 lopro 38.79713 5.202865 27.06981 ``` --- ## Calculating Separate Mean and Residuals ```r mice <- mice %>% * group_by(diet) %>% mutate( separate_mean = mean(lifetime), res_separate_mean = lifetime - separate_mean, res_separate_mean_sq = (res_separate_mean) ^ 2 ) %>% ungroup() mice %>% head() ``` ``` # A tibble: 6 × 5 lifetime diet separate_mean res_separate_mean res_separate_mean_sq <dbl> <fct> <dbl> <dbl> <dbl> 1 35.5 NP 27.4 8.10 65.6 2 35.4 NP 27.4 8.00 64.0 3 34.9 NP 27.4 7.50 56.2 4 34.8 NP 27.4 7.40 54.7 5 33.8 NP 27.4 6.40 40.9 6 33.5 NP 27.4 6.10 37.2 ``` ```r sum(mice$res_separate_mean_sq) ``` ``` [1] 15297.42 ``` --- ## Calculating Separate Means and Residuals ```r mice %>% filter(diet == "NP") %>% head(20) ``` ``` # A tibble: 20 × 5 lifetime diet separate_mean res_separate_mean res_separate_mean_sq <dbl> <fct> <dbl> <dbl> <dbl> 1 35.5 NP 27.4 8.10 65.6 2 35.4 NP 27.4 8.00 64.0 3 34.9 NP 27.4 7.50 56.2 4 34.8 NP 27.4 7.40 54.7 5 33.8 NP 27.4 6.40 40.9 6 33.5 NP 27.4 6.10 37.2 7 32.6 NP 27.4 5.20 27.0 8 32.4 NP 
27.4 5.00 25.0 9 31.8 NP 27.4 4.40 19.3 10 31.6 NP 27.4 4.20 17.6 11 31.5 NP 27.4 4.10 16.8 12 31.4 NP 27.4 4.00 16.0 13 31.4 NP 27.4 4.00 16.0 14 31.3 NP 27.4 3.90 15.2 15 30.8 NP 27.4 3.40 11.5 16 30.7 NP 27.4 3.30 10.9 17 30.5 NP 27.4 3.10 9.60 18 30.4 NP 27.4 3.00 8.99 19 30.2 NP 27.4 2.80 7.83 20 30.2 NP 27.4 2.80 7.83 ``` --- ## Calculating Separate Means and Residuals ```r mice %>% filter(diet == "lopro") %>% head(20) ``` ``` # A tibble: 20 × 5 lifetime diet separate_mean res_separate_mean res_separate_mean_sq <dbl> <fct> <dbl> <dbl> <dbl> 1 49.7 lopro 39.7 10.0 100. 2 49.3 lopro 39.7 9.61 92.4 3 48.6 lopro 39.7 8.91 79.5 4 48.3 lopro 39.7 8.61 74.2 5 48 lopro 39.7 8.31 69.1 6 47.7 lopro 39.7 8.01 64.2 7 47.5 lopro 39.7 7.81 61.1 8 47.2 lopro 39.7 7.51 56.5 9 47.1 lopro 39.7 7.41 55.0 10 47 lopro 39.7 7.31 53.5 11 47 lopro 39.7 7.31 53.5 12 47 lopro 39.7 7.31 53.5 13 46.9 lopro 39.7 7.21 52.0 14 46.9 lopro 39.7 7.21 52.0 15 46.3 lopro 39.7 6.61 43.7 16 45.9 lopro 39.7 6.21 38.6 17 45.9 lopro 39.7 6.21 38.6 18 44.5 lopro 39.7 4.81 23.2 19 44.1 lopro 39.7 4.41 19.5 20 44 lopro 39.7 4.31 18.6 ``` --- ## Extra Sum of Squares ```r # Residual sum of squares for equal means model sum(mice$res_single_mean_sq) ``` ``` [1] 28031.36 ``` ```r # Residual sum of squares for separate means model sum(mice$res_separate_mean_sq) ``` ``` [1] 15297.42 ``` ```r # Extra sum of squares: RSS_equal - RSS_separate sum(mice$res_single_mean_sq) - sum(mice$res_separate_mean_sq) ``` ``` [1] 12733.94 ``` --- class: middle, center # Extra-Sum-of-Squares *F*-Statistic --- ## Introducing the `\(F\)`-distribution ```r visualize::visualize.f(stat = 3, df1 = 6, df2 = 5, section = "upper") ``` <img src="week06_02_files/figure-html/unnamed-chunk-35-1.png" width="50%" style="display: block; margin: auto;" /> --- ## Let's Play with the `\(F\)`-distribution <img src="images/statkey_fdist.png" width="986" style="display: block; margin: auto;" /> --- ## General Form of the Extra-Sum-of-Squares 
*F*-Statistic

.large[
* A full model is a general model that is found to adequately describe the data
* A reduced model is a special case of the full model obtained by imposing the restriction of the null hypothesis
    + `\(H_0\)`: Reduced Model
    + `\(H_1\)`: Full Model
* A residual sum of squares measures the variability in the observations that remains unexplained by a model
]

---

## General Form of the Extra-Sum-of-Squares *F*-Statistic

.large[
* Fit both the full and reduced models and compute the residual sum of squares under each, `\(RSS_{full}\)` and `\(RSS_{reduced}\)`
* Basic Ideas:
    + If `\(H_0\)` is correct, then the two models should be about equal in their ability to explain the data and the magnitudes of the residuals should be about the same
    + If `\(H_0\)` is incorrect, the magnitudes of the residuals from the reduced model will tend to be larger
* Hence, we would like to reject `\(H_0\)` when `\(RSS_{reduced}\)` is much larger than `\(RSS_{full}\)`
]

---

## General Form of the Extra-Sum-of-Squares *F*-Statistic

<br><br>

.large[
* Extra sum of squares measures the amount of unexplained variability in the reduced model that *is* explained by the full model

`\begin{align*} \textrm{Extra sum of squares} & = RSS_{reduced} - RSS_{full} \end{align*}`
]

---

## General Form of the Extra-Sum-of-Squares *F*-Statistic

<br><br>

.large[
* `\(F\)`-statistic is the extra sum of squares per extra degree of freedom, scaled by the best estimate of variance, `\(\hat{\sigma}^2_{full} = RSS_{full} / df_{full}\)`

`\begin{align*} F\textrm{-statistic} & = \frac{\frac{\textrm{Extra sum of squares}}{\textrm{Extra degrees of freedom}} } {\hat{\sigma}^2_{full}}\\\\ F\textrm{-statistic} & = \frac{\frac{RSS_{reduced} - RSS_{full}}{df_{reduced} - df_{full}} } {\frac{RSS_{full}}{df_{full}}} \end{align*}`
]

---

## `\(F\)`-Test

<br><br>

.large[
* Large `\(F\)`-statistics are associated with large differences in the size of the residuals from the two models
* If all means are equal, the sampling distribution of the `\(F\)`-statistic is an
`\(F\)`-distribution
* Depends on two parameters: *numerator degrees of freedom* and *denominator degrees of freedom*
* Under equal means, `\(F\)` values of 0.5 to 3.0 are typical and 3.0 to 4.0 unlikely; values > 4.0 are strong evidence against equal means
]

---

## Calculating ESS `\(F\)`-Statistic

.large[
* `\(F\)`-statistic is the extra sum of squares per extra degree of freedom, scaled by the best estimate of variance
]

`$$\begin{align*} F\textrm{-statistic} & = \frac{\frac{28{,}031.36 - 15{,}297.42}{348 - 343 } } {\frac{15{,}297.42}{343}} \\ & = \frac{\frac{12{,}733.94}{5} } {\frac{15{,}297.42}{343}} \\ & = \frac{2{,}546.79} {44.60} \\ & = 57.10 \\ \end{align*}$$`

---

## Calculating ESS `\(F\)`-Statistic

.large[
* `\(p\)`-value = `\(P ( F_{(I-1,\,n-I)} > F )\)` < 0.00001, with `\(I - 1 = 5\)` and `\(n - I = 343\)`
]

```r
# left tail
pf(q = 57.1, df1 = 5, df2 = 343)
```

```
[1] 1
```

```r
# right tail (what we want)
1 - pf(q = 57.1, df1 = 5, df2 = 343)
```

```
[1] 0
```

.large[
* What do this `\(F\)`-test and `\(p\)`-value tell us?
]

---

## Calculating ESS `\(F\)`-Statistic

<br><br>

.large[
* Conclusion: Reject `\(H_0\)`. At least one group mean differs from the others
* When we fail to reject `\(H_0\)`, the data provide no evidence of differences in means among the `\(I\)` groups
]

---

## Caloric restriction and longevity

.large[
* Summary of Statistical Findings:
    - There is overwhelming evidence that mean lifetimes in the six groups are different (*p*-value < 0.0001: analysis of variance *F*-test).
* For the comparison (a):
    - There is convincing evidence that lifetime is increased as a result of restricting the diet from 85 kcal/wk to 50 kcal/wk (one-tailed *p*-value < 0.0001: *t*-test)
* The increase is estimated to be 9.6 months (95% confidence interval: 7.3 to 11.9 months)
]

---

class: center, middle

# Questions?
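
---

## Sketch: the ESS *F*-statistic in a few lines

As a rough check on the hand calculation in these slides, the same arithmetic can be scripted. This is a sketch that takes the two residual sums of squares and the degrees of freedom reported earlier as given inputs; the variable names are made up for the example. It uses `lower.tail = FALSE` in `pf()` so that the upper-tail probability is computed directly instead of being subtracted from 1.

```r
# Inputs copied from the slides (equal-means vs separate-means models).
rss_reduced <- 28031.36   # residual sum of squares, equal-means model
rss_full    <- 15297.42   # residual sum of squares, separate-means model
df_reduced  <- 348        # n - 1
df_full     <- 343        # n - I

# Extra sum of squares per extra degree of freedom.
extra_ss <- rss_reduced - rss_full
extra_df <- df_reduced - df_full

# Scale by the full model's estimate of the variance.
f_stat <- (extra_ss / extra_df) / (rss_full / df_full)

# Upper-tail probability from the F distribution with (extra_df, df_full) d.f.
p_value <- pf(f_stat, df1 = extra_df, df2 = df_full, lower.tail = FALSE)

round(f_stat, 2)
p_value
```

Because the numerator degrees of freedom are the *extra* degrees of freedom (5 here), not the reduced model's 348, the reference distribution is `\(F_{(5,\,343)}\)`. With the mice data loaded, `anova(lm(lifetime ~ 1, data = mice), lm(lifetime ~ diet, data = mice))` should report the same comparison.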