class: center, middle, inverse, title-slide

# POL90: Statistics
## Confidence Intervals
### Prof Wasow
Assistant Professor, Politics
Pomona College ### 2021-02-15 --- # Announcements .large[ * Assignments + PS03 due <mark>Friday, 2/11</mark> + Report 1 ] --- # Schedule <table> <thead> <tr> <th style="text-align:right;"> Week </th> <th style="text-align:left;"> Date </th> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Title </th> <th style="text-align:right;"> Chapter </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> Jan 26 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Drawing Statistical Conclusions </td> <td style="text-align:right;"> 1 </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> Jan 31 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Inference Using t-Distributions </td> <td style="text-align:right;"> 2 </td> </tr> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> Feb 2 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Inference Using t-Distributions </td> <td style="text-align:right;"> 2 </td> </tr> <tr> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> Feb 7 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> A Closer Look at Assumptions </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 4 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Feb 9 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Wed </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> A Closer Look at Assumptions </td> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:left;"> Feb 14 </td> <td style="text-align:left;"> 
Mon </td> <td style="text-align:left;"> Alternatives to the t-Tools </td> <td style="text-align:right;"> 4 </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:left;"> Feb 16 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Alternatives to the t-Tools </td> <td style="text-align:right;"> 4 </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Feb 21 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Comparison Among Several Samples </td> <td style="text-align:right;"> 5 </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Feb 23 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Comparison Among Several Samples </td> <td style="text-align:right;"> 5 </td> </tr> <tr> <td style="text-align:right;"> 7 </td> <td style="text-align:left;"> Feb 28 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Simple Linear Regression </td> <td style="text-align:right;"> 7 </td> </tr> </tbody> </table> --- ## Assignment schedule <table> <thead> <tr> <th style="text-align:right;"> Week </th> <th style="text-align:left;"> Date </th> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Assignment </th> <th style="text-align:right;"> Percent </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 3 </td> <td style="text-align:left;"> Feb 4 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS02 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 4 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Feb 11 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Fri </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> PS03 </td> <td 
style="text-align:right;color: black !important;background-color: yellow !important;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 5 </td> <td style="text-align:left;"> Feb 18 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS04 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 6 </td> <td style="text-align:left;"> Feb 25 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS05 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 7 </td> <td style="text-align:left;"> Mar 4 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> Report1 </td> <td style="text-align:right;"> 6 </td> </tr> <tr> <td style="text-align:right;"> 8 </td> <td style="text-align:left;"> Mar 11 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS06 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 9 </td> <td style="text-align:left;"> Mar 18 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> Spring break </td> <td style="text-align:right;"> NA </td> </tr> <tr> <td style="text-align:right;"> 10 </td> <td style="text-align:left;"> Mar 25 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS07 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 11 </td> <td style="text-align:left;"> Apr 1 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS08 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 12 </td> <td style="text-align:left;"> Apr 8 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> Report2 </td> <td style="text-align:right;"> 8 </td> </tr> </tbody> </table> --- class: center, middle, inverse # Report 1 --- class: center ## Report 1: Test a theory, elites vs masses .pull-left[<img 
src="images/Zaller_James_big_square.jpg" alt="drawing" style="width:200px;"/><img src="images/lenz_gabriel2.jpg" alt="drawing" style="width:200px;"/>]

.pull-right[<img src="images/lee_taeku_big.jpg" alt="drawing" style="width:200px;"/><img src="images/erica-chenoweth-maria-stephan-nsc-briefing.jpg" alt="drawing" style="width:200px;"/><img src="images/Daniel-Gillion3.jpg" alt="drawing" style="width:200px;"/>]

---
class: center

## Report 1: Engage with texts, not people

.pull-left[<img src="images/zaller_book_cover.jpg" alt="drawing" style="width:150px;"/> <img src="images/lenz_book_cover.jpg" alt="drawing" style="width:150px;"/>]

.pull-right[<img src="images/lee_book_cover.jpg" alt="drawing" style="width:100px;"/> <img src="images/stephan_chenoweth_book_cover.jpg" alt="drawing" style="width:100px;"/> <img src="images/gillion_book_cover.jpg" alt="drawing" style="width:100px;"/>]

---

## Pew: US opinion on same-sex marriage, 2001-2017

<img src="images/pew_same_sex_marriage.png" width="75%" style="display: block; margin: auto;" />

.footnote[Source: http://www.pewforum.org/fact-sheet/changing-attitudes-on-gay-marriage/]

---

## Report 1: Goals

.large[
* Use data to test theories
  - A contest or horserace of theories
  - "Three cornered fight" or a court proceeding
]

--

.large[
* Use data as rhetoric
  - Report a statistical test
  - Summarize data in a table
  - Convey trends and relationships with visualization
]

--

.large[
* Produce replicable research
  - See how "literate programming" like R + R Markdown + LaTeX + knitr contributes to replication
  - Practice good programming and statistical style
]

---

## Report Questions & Suggestions

.large[
- Plan to use iPoll
- Register on iPoll; it makes downloading many polls much easier
- You will need to do some data cleaning; that's part of the assignment
- See Data Scrubbing handout: http://appliedstats.org/data_scrub_handout.html
- Collaboration may be easier with RStudio.cloud.
See link on Canvas & handout: http://appliedstats.org/rstudio_cloud_guide.html
]

---
class: middle, inverse, center

# Sampling Distribution
# of the Sample Average

---

## Sampling Distribution of the Sample Average

<img src="images/ss_display_2_3.png" width="60%" style="display: block; margin: auto;" />

.footnote[Source: Statistical Sleuth 3e, Display 2.3]

---

## Population

<img src="images/ss_display_2_3_fished1.png" width="60%" style="display: block; margin: auto;" />

.footnote[Source: Statistical Sleuth 3e, Display 2.3]

---

## Population vs Sample

<img src="images/ss_display_2_3_fished2.png" width="60%" style="display: block; margin: auto;" />

.footnote[Source: Statistical Sleuth 3e, Display 2.3]

---

## Population vs Sample vs Sampling distribution

<img src="images/ss_display_2_3_fished3.png" width="60%" style="display: block; margin: auto;" />

.footnote[Source: Statistical Sleuth 3e, Display 2.3]

---
class: center, middle, inverse

# Confidence Intervals

---

<img src="images/epiellie_confidence_interval_1.png" width="100%" style="display: block; margin: auto;" />

.footnote[Source: https://twitter.com/EpiEllie/status/1073385394580979712]

---

<img src="images/epiellie_confidence_interval_2.jpg" width="100%" style="display: block; margin: auto;" />

.footnote[Source: https://twitter.com/EpiEllie/status/1073385412993929217]

---

<img src="images/epiellie_confidence_interval_3.jpg" width="100%" style="display: block; margin: auto;" />

.footnote[Source: https://twitter.com/EpiEllie/status/1073385427317465089]

---

## Think 95% Interval not 95% Confidence

.large[
- Issue is partly wording
- Would you rather be 95% confident or 99% confident?
- Why not 100%?
- Consider interval for heights
]

---

## Think 95% Interval not 95% Confidence

.large[
- In "95% Confidence Interval," the 95% modifies interval, not confidence
- Think: Interval *of* 95%
- Across repeated samples, 95% Confidence Intervals should capture the true value about 95% of the time
- The wider the interval, the more likely it is to capture the true value
]

---

<img src="images/epiellie_confidence_interval_4.png" width="75%" style="display: block; margin: auto;" />

.footnote[Source: https://twitter.com/EpiEllie/status/1073385445835329536]

---
class: center, middle, inverse

# Confidence Intervals
# Analytically

---

## What Are Plausible Values for `\(\mu\)` (Given the Data)?

<!-- \begin{aligned} -->
`$$t_{df}(0.025) < \frac{(\bar{Y} - \mu)}{\textrm{SE}(\bar{Y})} < t_{df}(0.975)$$`

--

- Looks similar to `\(t\)`-ratio equation...

$$ t\text{-ratio} = \dfrac{\text{Estimate}-\text{Parameter}}{\text{SE(Estimate)}} $$

$$ t\text{-ratio}(\text{if } \mu \text{ is zero}) = \dfrac{0.199-0}{0.0615} = 3.236 $$

--

- ...but with the Confidence Interval, we're solving for `\(\mu\)`
- In the second equation, we're solving for the `\(t\)`-ratio (and `\(\mu\)` is set to zero)

---

## What Are Plausible Values for `\(\mu\)` (Given the Data)?

<!-- \begin{aligned} -->
`$$t_{df}(0.025) < \frac{(\bar{Y} - \mu)}{\textrm{SE}(\bar{Y})} < t_{df}(0.975)$$`

--

`$$t_{14}(0.025) < \frac{(0.199 - \mu)}{0.0615} < t_{14}(0.975)$$`

--

`$$t_{14}(0.025) \cdot 0.0615 < (0.199 - \mu) < t_{14}(0.975) \cdot 0.0615$$`

--

$$ -2.145 \cdot 0.0615 < (0.199 - \mu) < +2.145 \cdot 0.0615 $$

--

$$ -0.199 -2.145 \cdot 0.0615 < - \mu < -0.199 +2.145 \cdot 0.0615 $$

--

$$ -0.331 < - \mu < -0.067 $$

--

$$ +0.331 > + \mu > +0.067 $$

--

$$ 0.067 < \quad \mu < 0.331 $$

---

## Intuition about the Confidence Interval formula

<br><br>

.large[
- Looking at each term:

$$ +0.199 -2.145 \cdot 0.0615 < + \mu < +0.199 +2.145 \cdot 0.0615 $$
]

--

.large[
- Three terms
  - +0.199 = Central tendency (the sample average, `\(\bar{Y}\)`)
  - `\(\pm\)` 2.145 = Roughly two standard errors: the 2.5th and 97.5th percentiles of the `\(t\)`-distribution with 14 df
  - 0.0615 = Scaling term (the standard error, `\(\textrm{SE}(\bar{Y})\)`)
]

---

## What are `\(t_{14}(0.025)\)` and `\(t_{14}(0.975)\)` in `R`?

- Now we use `qt()`. We give it a probability and `qt()` returns the corresponding quantile.

```r
# quantile for p = 0.025 in t-dist with 14 df
qt(p = 0.025, df = 14)
```

```
[1] -2.144787
```

```r
# quantile for p = 0.975 in t-dist with 14 df
qt(p = 0.975, df = 14)
```

```
[1] 2.144787
```

---

## How does `\(t_{14}\)` Compare to Other Distributions?
```r
# quantile for p = 0.025 in t-dist with 14 df
qt(p = 0.025, df = 14)
```

```
[1] -2.144787
```

```r
# same quantile with 60 df
qt(p = 0.025, df = 60)
```

```
[1] -2.000298
```

```r
# same quantile with 100 df
qt(p = 0.025, df = 100)
```

```
[1] -1.983972
```

```r
# quantile for p = 0.025 in Normal distribution
qnorm(p = 0.025, mean = 0, sd = 1)
```

```
[1] -1.959964
```

---

## Student's *t*-Distribution on 14 d.f.

<img src="images/ss_display_2_5_converted.png" width="80%" style="display: block; margin: auto;" />

.footnote[Source: Statistical Sleuth 3e, Display 2.5]

---

## Student's *t*-Distribution on 14 d.f.

<img src="images/ss_display_2_5_percentages.png" width="80%" style="display: block; margin: auto;" />

.footnote[Source: Statistical Sleuth 3e, Display 2.5]

---

## Student's *t*-Distribution on 14 d.f.

<img src="images/ss_display_2_5_percentiles.png" width="80%" style="display: block; margin: auto;" />

.footnote[Source: Statistical Sleuth 3e, Display 2.5]

---

## Student's *t*-Distribution on 14 d.f.

<img src="images/ss_display_2_5_r_commands1.png" width="80%" style="display: block; margin: auto;" />

.footnote[Source: Statistical Sleuth 3e, Display 2.5]

---

## Calculating Confidence Interval Bounds

```r
# lower bound of 95% CI
+0.199 -2.145 * 0.0615
```

```
[1] 0.0670825
```

```r
# upper bound of 95% CI
+0.199 +2.145 * 0.0615
```

```
[1] 0.3309175
```

---

## Loading `twins` data

```r
# dplyr supplies %>%, mutate(), select() and row_number()
library(dplyr)

twins <- Sleuth3::case0202 %>%
  janitor::clean_names()

# one row per twin pair: difference = unaffected - affected
twins <- twins %>%
  mutate(
    difference = unaffected - affected,
    pair = paste("Pair", row_number())
  ) %>%
  select(pair, difference)

head(twins)
```

```
    pair difference
1 Pair 1       0.67
2 Pair 2      -0.19
3 Pair 3       0.09
4 Pair 4       0.19
5 Pair 5       0.13
6 Pair 6       0.40
```

---

## Checking Confidence Interval in `t.test`

```r
t.test(twins$difference)
```

```

	One Sample t-test

data:  twins$difference
t = 3.2289, df = 14, p-value = 0.006062
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
* 0.0667041 0.3306292
sample estimates:
mean of x 
0.1986667 
```

---
class: center, middle, inverse

# Bootstrapping Confidence Intervals

---

## Recall populations & samples

.center[![](images/statistics1e_figun_03_p162.jpg)]

---

## Sampling Distribution of the Sample Average

<img src="images/ss_display_2_3.png" width="57%" style="display: block; margin: auto;" />

.footnote[Source: Statistical Sleuth 3e, Display 2.3]

---

```r
# stack 10 copies of the sample to stand in for a "population"
twins_sample_as_pop <- rbind(twins, twins, twins, twins, twins,
                             twins, twins, twins, twins, twins)
dim(twins_sample_as_pop)
```

```
[1] 150   2
```

```r
# draw a new "sample" of 15 pairs
twins_sample_as_pop %>% sample_n(15)
```

```
      pair difference
1   Pair 7       0.04
2   Pair 6       0.40
3  Pair 13       0.02
4   Pair 3       0.09
5  Pair 15       0.11
6   Pair 6       0.40
7   Pair 2      -0.19
8  Pair 11       0.23
9  Pair 11       0.23
10  Pair 8       0.10
11 Pair 13       0.02
12  Pair 4       0.19
13 Pair 10       0.07
14 Pair 10       0.07
15  Pair 6       0.40
```

---

```r
# stack 10 copies of the sample to stand in for a "population"
twins_sample_as_pop <- rbind(twins, twins, twins, twins, twins,
                             twins, twins, twins, twins, twins)
dim(twins_sample_as_pop)
```

```
[1] 150   2
```

```r
# draw a new "sample" of 15 pairs
twins_sample_as_pop %>% sample_n(15)
```

```
      pair difference
1  Pair 15       0.11
2   Pair 1       0.67
3  Pair 12       0.59
4  Pair 12       0.59
5   Pair 3       0.09
6   Pair 2      -0.19
7  Pair 14       0.03
8   Pair 6       0.40
9   Pair 1       0.67
10 Pair 14       0.03
11  Pair 1       0.67
12  Pair 5       0.13
13  Pair 2      -0.19
14  Pair 8       0.10
15  Pair 3       0.09
```

---

```r
# stack 10 copies of the sample to stand in for a "population"
twins_sample_as_pop <- rbind(twins, twins, twins, twins, twins,
                             twins, twins, twins, twins, twins)
dim(twins_sample_as_pop)
```

```
[1] 150   2
```

```r
# draw a new "sample" of 15 pairs
twins_sample_as_pop %>% sample_n(15)
```

```
      pair difference
1  Pair 15       0.11
2  Pair 15       0.11
3   Pair 2      -0.19
4  Pair 10       0.07
5   Pair 6       0.40
6   Pair 2      -0.19
7  Pair 13       0.02
8   Pair 7       0.04
9  Pair 13       0.02
10  Pair 1       0.67
11 Pair 13       0.02
12  Pair 7       0.04
13 Pair 14       0.03
14 Pair 11       0.23
15  Pair 2      -0.19
```

---
class: middle, center
background-color: #000000

<iframe width="1120" height="630" src="http://www.lock5stat.com/videos/BootstrapIntro.mp4" frameborder="0" allow="accelerometer; encrypted-media; gyroscope; picture-in-picture" ></iframe>

.footnote[Source: http://www.lock5stat.com/videos.html]

---

## Bootstrapping the Sampling Distribution

<img src="images/schizophrenia_statkey_bootstrap_ci.png" width="100%" style="display: block; margin: auto;" />

.footnote[Source: http://www.lock5stat.com/StatKey/bootstrap_1_quant/bootstrap_1_quant.html]

---

## Bootstrapping vs randomization test

.large[
* Both use simulation and randomization
* Bootstrapping
  + mimics random sampling
  + assumes sample represents population
  + draws new 'sample' from original sample
  + typically draws with replacement
  + *no* randomization of group assignment
]

--

.large[
* Randomization test
  + mimics random assignment
  + assumes null hypothesis for effect of 'treatment'
  + randomizes group assignment
  + *no* replacement
]

---
class: center, middle

# Questions?
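
---

## Sketch: a bootstrap 95% interval for the twins data

The bootstrapping idea in these slides (draw new 'samples' from the original sample, with replacement) can be sketched in a few lines of base R. This is only one possible version, using the percentile method, and it rests on assumptions: the names `difference`, `n_boot`, `boot_means`, `boot_ci` and `t_ci` are made up for illustration, the seed and the number of resamples are arbitrary, and the value for Pair 9 (0.50) is never printed on the slides but is inferred from the reported sample mean of 0.1986667.

```r
# Paired differences (unaffected - affected) for the 15 twin pairs.
# ASSUMPTION: Pair 9 (0.50) is not shown on the slides; it is inferred
# from the sample mean reported by t.test() (0.1986667).
difference <- c(0.67, -0.19, 0.09, 0.19, 0.13, 0.40, 0.04, 0.10,
                0.50, 0.07, 0.23, 0.59, 0.02, 0.03, 0.11)

set.seed(90)      # arbitrary seed so the sketch is reproducible
n_boot <- 10000   # number of bootstrap resamples (arbitrary choice)

# Resample 15 differences WITH replacement and keep each resample's mean
boot_means <- replicate(
  n_boot,
  mean(sample(difference, size = length(difference), replace = TRUE))
)

# Percentile method: the middle 95% of the bootstrap means
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))
boot_ci

# t-based interval for comparison: mean +/- t quantile * standard error
n <- length(difference)
t_ci <- mean(difference) + qt(c(0.025, 0.975), df = n - 1) * sd(difference) / sqrt(n)
t_ci
```

The two intervals will usually be close but not identical: the bootstrap version makes no use of the `\(t\)`-distribution and changes slightly with the seed and with `n_boot`.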