class: center, middle, inverse, title-slide # POL90: Statistics ### Prof Wasow
Assistant Professor, Politics
Pomona College ### 2022-01-31 --- ## Announcements .large[ * Assignments + PS02 due Friday including DataCamp <!-- + Report 1 --> <!-- + Available on BlackBoard --> <!-- + Due Tuesday 2/23 --> <!-- + PS04 due Friday 2/26 --> <!-- + Report 1 teams randomly assigned --> <!-- + Be professional, be kind --> <!-- - Gains to trade --> <!-- - Option to not have teammates --> ] <!-- * Slides will go up after class --> <!-- * Dinner in Mathey, Thursday 6-8pm --> --- ## Schedule <table> <thead> <tr> <th style="text-align:right;"> Week </th> <th style="text-align:left;"> Date </th> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Title </th> <th style="text-align:right;"> Chapter </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Jan 17 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> MLK Day </td> <td style="text-align:right;"> - </td> </tr> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Jan 19 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Introduction </td> <td style="text-align:right;"> - </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Jan 24 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Drawing Statistical Conclusions </td> <td style="text-align:right;"> 1 </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Jan 26 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Drawing Statistical Conclusions </td> <td style="text-align:right;"> 1 </td> </tr> <tr> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 3 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Jan 31 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Mon </td> <td style="text-align:left;color: black 
!important;background-color: yellow !important;"> Inference Using t-Distributions </td> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 2 </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> Feb 2 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Inference Using t-Distributions </td> <td style="text-align:right;"> 2 </td> </tr> <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> Feb 7 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> A Closer Look at Assumptions </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> Feb 9 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> A Closer Look at Assumptions </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:left;"> Feb 14 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Alternatives to the t-Tools </td> <td style="text-align:right;"> 4 </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:left;"> Feb 16 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Alternatives to the t-Tools </td> <td style="text-align:right;"> 4 </td> </tr> </tbody> </table> --- ## Assignment schedule <table> <thead> <tr> <th style="text-align:right;"> Week </th> <th style="text-align:left;"> Date </th> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Assignment </th> <th style="text-align:right;"> Percent </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> Jan 21 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> - </td> <td style="text-align:right;"> NA </td> </tr> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Jan 28 </td> <td 
style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS01 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 3 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Feb 4 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Fri </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> PS02 </td> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> Feb 11 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS03 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:left;"> Feb 18 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS04 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Feb 25 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS05 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 7 </td> <td style="text-align:left;"> Mar 4 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> Report1 </td> <td style="text-align:right;"> 6 </td> </tr> <tr> <td style="text-align:right;"> 8 </td> <td style="text-align:left;"> Mar 11 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS06 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 9 </td> <td style="text-align:left;"> Mar 18 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> Spring break </td> <td style="text-align:right;"> NA </td> </tr> <tr> <td style="text-align:right;"> 10 </td> <td style="text-align:left;"> 
Mar 25 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS07 </td> <td style="text-align:right;"> 3 </td> </tr> </tbody> </table>

---

## Overview

.Large[
<!-- - Report 1 -->
- Chapter 1: Wrapping up creativity study
- Chapter 2: Simulation vs theoretical approaches
]

---
class: center, middle, inverse

# Chapter 1: Creativity study

---

## Why Stem-and-leaf diagrams?

<img src="images/IMG_0023_small.jpg" width="65%" style="display: block; margin: auto;" />

---

## Creativity study: Stem-and-leaf diagram

<img src="images/stem_and_leaf_plots.png" width="65%" style="display: block; margin: auto;" />

---

## Question: Pros & cons?

<img src="images/stem_and_leaf_plots.png" width="65%" style="display: block; margin: auto;" />

---

## Creativity Study

.large[
* Conclusion:
    + Either there is no treatment effect and we happened to get an unusual randomization ("we fail to reject the null hypothesis"), or we find evidence in favor of a treatment effect ("we reject the null hypothesis")
    + A difference this large is rare under random assignment alone, so the results suggest that the difference in creativity scores is due to the difference in motivational questions
    + The randomization test lets us compute the *p*-value exactly, and it makes clear how the *p*-value is tied to the chance mechanism
    + In practice we often use an approximation with a <mark>theoretical probability distribution</mark>
]

---
class: center, middle, inverse

# Getting to know the
# `\(t\)`-distribution

---
class: center, middle
background-image: url("images/William_Sealy_Gosset.jpg")
background-position: center
background-size: contain
background-color: #272822

---

## Who was 'Student'?
<img src="images/gosset_bio.png" width="100%" style="display: block; margin: auto;" />

.footnote[Source: https://en.wikipedia.org/wiki/William_Sealy_Gosset]

---

## Student’s *t*-distribution

.Large[
* Parameter: `\(\nu\)` (degrees of freedom)
* Probability Density Function: `\(f(t) = \dfrac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\Gamma(\nu/2)}(1 + t^2 /\nu)^{-(\nu+1)/2}\)`
* Mean: 0 if `\(\nu > 1\)`
* Variance: `\(\dfrac{\nu}{\nu-2}\)` if `\(\nu > 2\)`
]

---

# Theoretical distribution via StatKey

<img src="images/statkey_t_dist.png" width="80%" style="display: block; margin: auto;" />

.footnote[Source: [http://www.lock5stat.com/StatKey/](http://www.lock5stat.com/StatKey/)]

---

## Student's *t*-distribution in `R`

```r
pt(q = 0, df = 45)
```

```
## [1] 0.5
```

```r
visualize::visualize.t(stat = 0, df = 45)
```

<img src="week03_01_files/figure-html/t_0-1.png" width="45%" style="display: block; margin: auto;" />

---

## Student's *t*-distribution in `R`

```r
pt(q = -1, df = 45)
```

```
## [1] 0.1613288
```

```r
visualize::visualize.t(stat = -1, df = 45)
```

<img src="week03_01_files/figure-html/t_-1-1.png" width="45%" style="display: block; margin: auto;" />

---

## Student's *t*-distribution in `R`

```r
pt(q = -2, df = 45)
```

```
## [1] 0.02577884
```

```r
visualize::visualize.t(stat = -2, df = 45)
```

<img src="week03_01_files/figure-html/t_-2-1.png" width="45%" style="display: block; margin: auto;" />

---

## Student's *t*-distribution in `R`

```r
pt(q = 1, df = 45)
```

```
## [1] 0.8386712
```

```r
visualize::visualize.t(stat = 1, df = 45)
```

<img src="week03_01_files/figure-html/t_1-1.png" width="45%" style="display: block; margin: auto;" />

---

## Student's *t*-distribution in `R`

```r
pt(q = 2, df = 45)
```

```
## [1] 0.9742212
```

```r
visualize::visualize.t(stat = 2, df = 45)
```

<img src="week03_01_files/figure-html/t_2-1.png" width="45%" style="display: block; margin: auto;" />

---

## Student's *t*-distribution in `R`

* Visualizing `\(t\)` with different degrees of freedom

.left-code[
```r
t <- 
seq(from = -4, to = +4, by = 0.02)
*t_df2 <- dt(t, df = 2)
t_df10 <- dt(t, df = 10)
t_df1000 <- dt(t, df = 1000)

plot(x = t, y = t_df1000, col = "black", cex = .2)
points(x = t, y = t_df10, col = "red", cex = .2)
points(x = t, y = t_df2, col = "green", cex = .2)
# cex makes points smaller
```
]

.right-plot[
<img src="week03_01_files/figure-html/t-dist-plot-out-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Student's *t*-distribution in `R`

* Visualizing `\(t\)` with `rt()`: random draws from a *t*-distribution

.left-code[
```r
t_dist <- data.frame(
*  draws = rt(1000, df = 14)
)

ggplot(data = t_dist) +
  aes(x = draws) +
  geom_histogram()
```
]

.right-plot[
<img src="week03_01_files/figure-html/rt-dist-plot-out1-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Student's *t*-distribution in `R`

* Visualizing `\(t\)` with `rt()`: random draws from a *t*-distribution

.left-code[
```r
t_dist <- data.frame(
*  draws = rt(10000, df = 14)
)

ggplot(data = t_dist) +
  aes(x = draws) +
  geom_histogram()
```
]

.right-plot[
<img src="week03_01_files/figure-html/rt-dist-plot-out2-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Student's *t*-distribution in `R`

* Visualizing `\(t\)` with `rt()`: random draws from a *t*-distribution

.left-code[
```r
t_dist <- data.frame(
*  draws = rt(100000, df = 14)
)

ggplot(data = t_dist) +
  aes(x = draws) +
  geom_histogram()
```
]

.right-plot[
<img src="week03_01_files/figure-html/rt-dist-plot-out3-1.png" width="100%" style="display: block; margin: auto;" />
]

---
class: center, middle, inverse

# Simulation vs Theoretical Approaches
# with creativity study

---

## Load `creativity` data

```r
creativity <- Sleuth3::case0101 |>
  janitor::clean_names()

head(creativity)
```

```
##   score treatment
## 1   5.0 Extrinsic
## 2   5.4 Extrinsic
## 3   6.1 Extrinsic
## 4  10.9 Extrinsic
## 5  11.8 Extrinsic
## 6  12.0 Extrinsic
```

```r
mean_int <- mean(creativity$score[creativity$treatment == "Intrinsic"])
mean_ext <- mean(creativity$score[creativity$treatment == 
"Extrinsic"]) mean_int - mean_ext ``` ``` ## [1] 4.144203 ``` --- ## Simulate null distribution ```r library(infer) creativity_null_dist_simulated <- creativity |> specify(score ~ treatment) |> hypothesize(null = "independence") |> generate(reps = 1000, type = "permute") |> calculate(stat = "diff in means", order = c("Intrinsic", "Extrinsic")) head(creativity_null_dist_simulated) ``` ``` ## Response: score (numeric) ## Explanatory: treatment (factor) ## Null Hypothesis: independence ## # A tibble: 6 × 2 ## replicate stat ## <int> <dbl> ## 1 1 -0.173 ## 2 2 0.704 ## 3 3 -1.36 ## 4 4 -0.522 ## 5 5 3.40 ## 6 6 -0.905 ``` --- ## Visualize `\(p\)`-value with simulated null distribution ```r infer::visualize(creativity_null_dist_simulated) + shade_p_value(obs_stat = 4.14, direction = "two-sided") ``` <img src="week03_01_files/figure-html/unnamed-chunk-14-1.png" width="50%" style="display: block; margin: auto;" /> --- ## Calculate `\(p\)`-value with theoretical distribution ```r t.test(score ~ treatment, data = creativity, var.eq = TRUE) ``` ``` ## ## Two Sample t-test ## ## data: score by treatment *## t = -2.9259, df = 45, p-value = 0.005366 ## alternative hypothesis: true difference in means between group Extrinsic and group Intrinsic is not equal to 0 ## 95 percent confidence interval: ## -6.996973 -1.291432 ## sample estimates: ## mean in group Extrinsic mean in group Intrinsic ## 15.73913 19.88333 ``` --- ## Visualize `\(p\)`-value with theoretical distribution ```r visualize::visualize.t(stat = c(-2.9, +2.9), df = 45, section = "tails") ``` <img src="week03_01_files/figure-html/unnamed-chunk-16-1.png" width="60%" style="display: block; margin: auto;" /> --- class: center, middle # Questions?