class: center, middle, inverse, title-slide

# POL90: Statistics
## Panel Data and Dummy Variables / Fixed Effects
### Prof. Wasow and Galileu Kim Pomona College
### 2022-04-13

---

<style type="text/css">
.regression10 table { font-size: 10px; }
.regression12 table { font-size: 12px; }
.regression14 table { font-size: 14px; }
</style>

# Announcements

.large[
- Assignments
  - PS09
  - Report 3 teams to be assigned
- Reading:
  - Skim: Jennifer Hill & Andrew Gelman (2007), Chapters 11 and 12
  - Optional:
      - Richard Williams, notes on Panel Data: A Brief Overview
      - Oscar Torres-Reyna, Getting Started in Fixed/Random Effects Models using R
]

---

# Schedule

<table> <thead> <tr> <th style="text-align:right;"> Week </th> <th style="text-align:left;"> Date </th> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Title </th> <th style="text-align:right;"> Chapter </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 11 </td> <td style="text-align:left;"> Mar 30 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Interaction terms </td> <td style="text-align:right;"> 9 </td> </tr> <tr> <td style="text-align:right;"> 12 </td> <td style="text-align:left;"> Apr 4 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Logistic regression </td> <td style="text-align:right;"> 20 </td> </tr> <tr> <td style="text-align:right;"> 12 </td> <td style="text-align:left;"> Apr 6 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Logistic regression </td> <td style="text-align:right;"> 20 </td> </tr> <tr> <td style="text-align:right;"> 13 </td> <td style="text-align:left;"> Apr 11 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Missing Data </td> <td style="text-align:right;"> Handout </td> </tr> <tr> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 13 </td> <td style="text-align:left;color: black !important;background-color: 
yellow !important;"> Apr 13 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Wed </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Panel Data </td> <td style="text-align:right;color: black !important;background-color: yellow !important;"> Handout </td> </tr> <tr> <td style="text-align:right;"> 14 </td> <td style="text-align:left;"> Apr 18 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Matching </td> <td style="text-align:right;"> Handout </td> </tr> <tr> <td style="text-align:right;"> 14 </td> <td style="text-align:left;"> Apr 20 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Matching </td> <td style="text-align:right;"> Handout </td> </tr> <tr> <td style="text-align:right;"> 15 </td> <td style="text-align:left;"> Apr 25 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Causal inference: Panel data </td> <td style="text-align:right;"> Handout </td> </tr> <tr> <td style="text-align:right;"> 15 </td> <td style="text-align:left;"> Apr 27 </td> <td style="text-align:left;"> Wed </td> <td style="text-align:left;"> Causal inference: Natural Experiments </td> <td style="text-align:right;"> Dunning </td> </tr> <tr> <td style="text-align:right;"> 16 </td> <td style="text-align:left;"> May 2 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Review </td> <td style="text-align:right;"> NA </td> </tr> </tbody> </table> --- ## Assignment schedule <table> <thead> <tr> <th style="text-align:right;"> Week </th> <th style="text-align:left;"> Date </th> <th style="text-align:left;"> Day </th> <th style="text-align:left;"> Assignment </th> <th style="text-align:right;"> Percent </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 11 </td> <td style="text-align:left;"> Apr 1 </td> <td style="text-align:left;"> Fri </td> <td style="text-align:left;"> PS08 </td> <td 
style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 12 </td> <td style="text-align:left;"> Apr 11 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Report2 </td> <td style="text-align:right;"> 8 </td> </tr> <tr> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 13 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Apr 18 </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> Mon </td> <td style="text-align:left;color: black !important;background-color: yellow !important;"> PS09 </td> <td style="text-align:right;color: black !important;background-color: yellow !important;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 14 </td> <td style="text-align:left;"> Apr 25 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> PS10 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:right;"> 15 </td> <td style="text-align:left;"> May 2 </td> <td style="text-align:left;"> Mon </td> <td style="text-align:left;"> Report3 </td> <td style="text-align:right;"> 10 </td> </tr> </tbody> </table>

---
class: center, middle

# Panel data

---

## Research questions

<br><br>

.large[
- Does increasing taxes on cigarettes reduce their consumption?
- Does the legalization of medical marijuana increase its consumption?
]

--

.large[
- Many questions are about change over time, across multiple units
- How can we account for time trends?
- How can we account for persistent differences across units?
]

---

## Time trends example: Did Giuliani reduce crime in NY?
<br><br>

<img src="images/blog_violent_crime_new_york_city_1986_2014_1.png" width="564" style="display: block; margin: auto;" />

.footnote[Source: https://www.motherjones.com/kevin-drum/2020/06/how-many-cops-does-new-york-city-need/]

---

## Time trends: 1/3 of crime drop precedes Giuliani

<br><br>

<img src="images/blog_violent_crime_new_york_city_1986_2014_2.png" width="564" style="display: block; margin: auto;" />

.footnote[Source: https://www.motherjones.com/kevin-drum/2020/06/how-many-cops-does-new-york-city-need/]

---

## Time trends: Across cities, evidence of a national drop

<br><br>

<img src="images/blog_violent_crime_new_york_city_1986_2014_3.png" width="564" style="display: block; margin: auto;" />

.footnote[Source: https://www.motherjones.com/kevin-drum/2020/06/how-many-cops-does-new-york-city-need/]

---

## Recall Derby Data

```r
derby <- Sleuth3::ex0920 %>%
  clean_names()

head(derby)
```

```
  year        winner starters net_to_winner  time speed track conditions
1 1896     Ben Brush        8          4850 127.8 35.23 Dusty       Fast
2 1897    Typhoon II        6          4850 132.5 33.96 Heavy       Slow
3 1898       Plaudit        4          4850 129.0 34.88  Good       Fast
4 1899        Manuel        5          4850 132.0 34.09  Fast       Fast
5 1900 Lieut. Gibson        7          4850 126.2 35.64  Fast       Fast
6 1901  His Eminence        5          4850 127.8 35.23  Fast       Fast
```

---

## Is there a time trend? A group difference?

<img src="week12_02_files/figure-html/unnamed-chunk-10-1.png" width="720" style="display: block; margin: auto;" />

---

## Yes, both time trends and group differences

<img src="week12_02_files/figure-html/unnamed-chunk-12-1.png" width="720" style="display: block; margin: auto;" />

---

## Recall Solar Data

```r
solar <- Sleuth3::ex0323 %>%
  clean_names()

dim(solar)
```

```
[1] 35  3
```

```r
head(solar)
```

```
  year cancer_rate sunspot_activity
1 1938         0.8              Low
2 1939         1.3             High
3 1940         1.4             High
4 1941         1.2             High
5 1942         1.7              Low
6 1943         1.8              Low
```

---

## Are there time trends? Group differences?
<img src="week12_02_files/figure-html/unnamed-chunk-14-1.png" width="720" style="display: block; margin: auto;" />

---

## Yes, both time trends and group differences

<img src="week12_02_files/figure-html/unnamed-chunk-15-1.png" width="720" style="display: block; margin: auto;" />

---

## Unit differences example: Eyesight & free throws

.large[
- Imagine we are interested in how an intervention improves free throws in basketball
- Assume some aspect of eyesight is a possible confounder but is hard to measure
- For example, one set of sports vision tests consists of "Visual Acuity, Eye Dominance, Speed of Recognition, Fixation Stability, Saccadic speed, Contrast Sensitivity, Eye Hand Coordination, Distance Stereopsis, and Eye Health."
- If we compare individuals to themselves, we control for all aspects of eyesight, even factors not measured
]

.footnote[https://www.aaopt.org/detail/knowledge-base-article/investigation-role-visual-skills-basketball-free-throw-shooting-accuracy]

---

## Motivation

<br>

.large[
- Goal: estimate a treatment effect, accounting for individual differences and time trends
- Data: repeated observations of units `\(i\)` across time `\(t\)`
- Time-invariant differences across groups or individuals: geography, eyesight
- Year-specific trends, common across units: natural disaster, economic recession
]

---

## Three types of data

.large[
- Cross sectional: multiple units, one moment in time
  - Most of what we've worked with this semester
- Time series: one unit, multiple moments in time
  - Data like "Solar Radiation and Skin Cancer" from week 4
- Panel: multiple units, multiple moments in time
  - Individual: subjects or geographical units like countries
  - Entities: companies and schools
  - Moments: minute, month, year
  - Example: Add Health data
  - Also known as longitudinal or time-series cross-sectional (tscs)
]

---
class: center, middle

# Panel Data Example:
# Taxes and cigarettes

---

## Does increasing taxes reduce cigarette consumption?
.large[
- The unit of analysis is state-year
- The outcome of interest is `packpc`, per capita consumption of packs of cigarettes
- The treatment is `taxs`, excise taxes targeted specifically at cigarettes
- We estimate a dummy variable / fixed effects model, taking into account state-fixed effects `\(\alpha_i\)`:

`$$pack_{it} = \beta_0 + \beta_1taxes_{it}+ \alpha_i + \epsilon_{it}$$`
]

---

## Panel data: Load data

```r
# read-in
cigarette <- read_csv("data/cigarette.csv") %>%
  clean_names() %>%
  select(-x)

dim(cigarette)
```

```
[1] 528   9
```

```r
names(cigarette)
```

```
[1] "state"  "year"   "cpi"    "pop"    "packpc" "income" "tax"    "avgprs"
[9] "taxs"
```

```r
head(cigarette)
```

```
# A tibble: 6 × 9
  state  year   cpi      pop packpc    income   tax avgprs  taxs
  <chr> <dbl> <dbl>    <dbl>  <dbl>     <dbl> <dbl>  <dbl> <dbl>
1 AL     1985  1.08  3973000   116.  46014968  32.5  102.   33.3
2 AR     1985  1.08  2327000   129.  26210736  37    101.   37
3 AZ     1985  1.08  3184000   105.  43956936  31    109.   36.2
4 CA     1985  1.08 26444000   100. 447102816  26    108.   32.1
5 CO     1985  1.08  3209000   113.  49466672  31     94.3  31
6 CT     1985  1.08  3201000   109.  60063368  42    128.   51.5
```

---

## Visualizing `packpc` vs `taxs`

<img src="week12_02_files/figure-html/plot_baseline-1.png" width="720" style="display: block; margin: auto;" />

---

## Visualizing `packpc` vs `taxs`

<img src="week12_02_files/figure-html/plot_baseline_text-1.png" width="720" style="display: block; margin: auto;" />

---

## Visualizing `packpc` vs `taxs`

<img src="week12_02_files/figure-html/plot_baseline_text2-1.png" width="720" style="display: block; margin: auto;" />

---

## Panel data: Unit-Time

```r
# Note state repeats, year changes
cigarette %>%
  arrange(state, year) %>%
  slice(1:9) %>%
  kable(format = 'html') %>%
  kable_styling()
```

<table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> state </th> <th style="text-align:right;"> year </th> <th style="text-align:right;"> cpi </th> <th style="text-align:right;"> pop </th> <th style="text-align:right;"> packpc </th> <th style="text-align:right;"> income </th> <th style="text-align:right;"> tax </th> <th style="text-align:right;"> avgprs </th> <th style="text-align:right;"> taxs </th> <th style="text-align:left;"> state_year </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> AL </td> <td style="text-align:right;"> 1985 </td> <td style="text-align:right;"> 1.076 </td> <td style="text-align:right;"> 3973000 </td> <td style="text-align:right;"> 116.5 </td> <td style="text-align:right;"> 46014968 </td> <td style="text-align:right;"> 32.5 </td> <td style="text-align:right;"> 102.2 </td> <td style="text-align:right;"> 33.35 </td> <td style="text-align:left;"> AL85 </td> </tr> <tr> <td style="text-align:left;"> AL </td> <td style="text-align:right;"> 1986 </td> <td style="text-align:right;"> 1.096 </td> <td style="text-align:right;"> 3992000 </td> <td style="text-align:right;"> 117.2 </td> <td style="text-align:right;"> 48703940 </td> <td style="text-align:right;"> 32.5 </td> <td style="text-align:right;"> 108.0 </td> <td style="text-align:right;"> 33.41 
</td> <td style="text-align:left;"> AL86 </td> </tr> <tr> <td style="text-align:left;"> AL </td> <td style="text-align:right;"> 1987 </td> <td style="text-align:right;"> 1.136 </td> <td style="text-align:right;"> 4016000 </td> <td style="text-align:right;"> 115.8 </td> <td style="text-align:right;"> 51846312 </td> <td style="text-align:right;"> 32.5 </td> <td style="text-align:right;"> 113.5 </td> <td style="text-align:right;"> 33.46 </td> <td style="text-align:left;"> AL87 </td> </tr> <tr> <td style="text-align:left;"> AL </td> <td style="text-align:right;"> 1988 </td> <td style="text-align:right;"> 1.183 </td> <td style="text-align:right;"> 4024000 </td> <td style="text-align:right;"> 115.3 </td> <td style="text-align:right;"> 55698852 </td> <td style="text-align:right;"> 32.5 </td> <td style="text-align:right;"> 120.0 </td> <td style="text-align:right;"> 33.53 </td> <td style="text-align:left;"> AL88 </td> </tr> <tr> <td style="text-align:left;"> AL </td> <td style="text-align:right;"> 1989 </td> <td style="text-align:right;"> 1.240 </td> <td style="text-align:right;"> 4030000 </td> <td style="text-align:right;"> 109.2 </td> <td style="text-align:right;"> 60044480 </td> <td style="text-align:right;"> 32.5 </td> <td style="text-align:right;"> 133.3 </td> <td style="text-align:right;"> 33.66 </td> <td style="text-align:left;"> AL89 </td> </tr> <tr> <td style="text-align:left;"> AL </td> <td style="text-align:right;"> 1990 </td> <td style="text-align:right;"> 1.307 </td> <td style="text-align:right;"> 4048508 </td> <td style="text-align:right;"> 111.7 </td> <td style="text-align:right;"> 64094948 </td> <td style="text-align:right;"> 32.5 </td> <td style="text-align:right;"> 143.4 </td> <td style="text-align:right;"> 33.76 </td> <td style="text-align:left;"> AL90 </td> </tr> <tr> <td style="text-align:left;"> AL </td> <td style="text-align:right;"> 1991 </td> <td style="text-align:right;"> 1.362 </td> <td style="text-align:right;"> 4091025 </td> <td 
style="text-align:right;"> 107.0 </td> <td style="text-align:right;"> 67649568 </td> <td style="text-align:right;"> 34.5 </td> <td style="text-align:right;"> 161.7 </td> <td style="text-align:right;"> 35.94 </td> <td style="text-align:left;"> AL91 </td> </tr> <tr> <td style="text-align:left;"> AL </td> <td style="text-align:right;"> 1992 </td> <td style="text-align:right;"> 1.403 </td> <td style="text-align:right;"> 4139269 </td> <td style="text-align:right;"> 106.9 </td> <td style="text-align:right;"> 72281824 </td> <td style="text-align:right;"> 36.5 </td> <td style="text-align:right;"> 176.1 </td> <td style="text-align:right;"> 38.08 </td> <td style="text-align:left;"> AL92 </td> </tr> <tr> <td style="text-align:left;"> AL </td> <td style="text-align:right;"> 1993 </td> <td style="text-align:right;"> 1.445 </td> <td style="text-align:right;"> 4193114 </td> <td style="text-align:right;"> 105.0 </td> <td style="text-align:right;"> 75439176 </td> <td style="text-align:right;"> 38.5 </td> <td style="text-align:right;"> 164.7 </td> <td style="text-align:right;"> 39.97 </td> <td style="text-align:left;"> AL93 </td> </tr> </tbody> </table>

---

## Panel data: Unit variable is `state`

```r
# Note sorting by state we see state repeats over multiple years
cigarette %>%
  arrange(state, year) %>%
  slice(1:9) %>%
  kable(format = 'html') %>%
  kable_styling() %>%
  column_spec(1, background = "yellow")
```

<table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> state </th> <th style="text-align:right;"> year </th> <th style="text-align:right;"> cpi </th> <th style="text-align:right;"> pop </th> <th style="text-align:right;"> packpc </th> <th style="text-align:right;"> income </th> <th style="text-align:right;"> tax </th> <th style="text-align:right;"> avgprs </th> <th style="text-align:right;"> taxs </th> <th style="text-align:left;"> state_year </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: 
yellow !important;"> AL </td> <td style="text-align:right;"> 1985 </td> <td style="text-align:right;"> 1.076 </td> <td style="text-align:right;"> 3973000 </td> <td style="text-align:right;"> 116.5 </td> <td style="text-align:right;"> 46014968 </td> <td style="text-align:right;"> 32.5 </td> <td style="text-align:right;"> 102.2 </td> <td style="text-align:right;"> 33.35 </td> <td style="text-align:left;"> AL85 </td> </tr> <tr> <td style="text-align:left;background-color: yellow !important;"> AL </td> <td style="text-align:right;"> 1986 </td> <td style="text-align:right;"> 1.096 </td> <td style="text-align:right;"> 3992000 </td> <td style="text-align:right;"> 117.2 </td> <td style="text-align:right;"> 48703940 </td> <td style="text-align:right;"> 32.5 </td> <td style="text-align:right;"> 108.0 </td> <td style="text-align:right;"> 33.41 </td> <td style="text-align:left;"> AL86 </td> </tr> <tr> <td style="text-align:left;background-color: yellow !important;"> AL </td> <td style="text-align:right;"> 1987 </td> <td style="text-align:right;"> 1.136 </td> <td style="text-align:right;"> 4016000 </td> <td style="text-align:right;"> 115.8 </td> <td style="text-align:right;"> 51846312 </td> <td style="text-align:right;"> 32.5 </td> <td style="text-align:right;"> 113.5 </td> <td style="text-align:right;"> 33.46 </td> <td style="text-align:left;"> AL87 </td> </tr> <tr> <td style="text-align:left;background-color: yellow !important;"> AL </td> <td style="text-align:right;"> 1988 </td> <td style="text-align:right;"> 1.183 </td> <td style="text-align:right;"> 4024000 </td> <td style="text-align:right;"> 115.3 </td> <td style="text-align:right;"> 55698852 </td> <td style="text-align:right;"> 32.5 </td> <td style="text-align:right;"> 120.0 </td> <td style="text-align:right;"> 33.53 </td> <td style="text-align:left;"> AL88 </td> </tr> <tr> <td style="text-align:left;background-color: yellow !important;"> AL </td> <td style="text-align:right;"> 1989 </td> <td style="text-align:right;"> 
1.240 </td> <td style="text-align:right;"> 4030000 </td> <td style="text-align:right;"> 109.2 </td> <td style="text-align:right;"> 60044480 </td> <td style="text-align:right;"> 32.5 </td> <td style="text-align:right;"> 133.3 </td> <td style="text-align:right;"> 33.66 </td> <td style="text-align:left;"> AL89 </td> </tr> <tr> <td style="text-align:left;background-color: yellow !important;"> AL </td> <td style="text-align:right;"> 1990 </td> <td style="text-align:right;"> 1.307 </td> <td style="text-align:right;"> 4048508 </td> <td style="text-align:right;"> 111.7 </td> <td style="text-align:right;"> 64094948 </td> <td style="text-align:right;"> 32.5 </td> <td style="text-align:right;"> 143.4 </td> <td style="text-align:right;"> 33.76 </td> <td style="text-align:left;"> AL90 </td> </tr> <tr> <td style="text-align:left;background-color: yellow !important;"> AL </td> <td style="text-align:right;"> 1991 </td> <td style="text-align:right;"> 1.362 </td> <td style="text-align:right;"> 4091025 </td> <td style="text-align:right;"> 107.0 </td> <td style="text-align:right;"> 67649568 </td> <td style="text-align:right;"> 34.5 </td> <td style="text-align:right;"> 161.7 </td> <td style="text-align:right;"> 35.94 </td> <td style="text-align:left;"> AL91 </td> </tr> <tr> <td style="text-align:left;background-color: yellow !important;"> AL </td> <td style="text-align:right;"> 1992 </td> <td style="text-align:right;"> 1.403 </td> <td style="text-align:right;"> 4139269 </td> <td style="text-align:right;"> 106.9 </td> <td style="text-align:right;"> 72281824 </td> <td style="text-align:right;"> 36.5 </td> <td style="text-align:right;"> 176.1 </td> <td style="text-align:right;"> 38.08 </td> <td style="text-align:left;"> AL92 </td> </tr> <tr> <td style="text-align:left;background-color: yellow !important;"> AL </td> <td style="text-align:right;"> 1993 </td> <td style="text-align:right;"> 1.445 </td> <td style="text-align:right;"> 4193114 </td> <td style="text-align:right;"> 105.0 </td> <td 
style="text-align:right;"> 75439176 </td> <td style="text-align:right;"> 38.5 </td> <td style="text-align:right;"> 164.7 </td> <td style="text-align:right;"> 39.97 </td> <td style="text-align:left;"> AL93 </td> </tr> </tbody> </table>

---

## Panel data: Time variable is `year`

```r
# Note: sorted by state, each state's rows step through the years
cigarette %>%
  arrange(state, year) %>%
  slice(1:9) %>%
  kable(format = 'html') %>%
  kable_styling() %>%
  column_spec(2, background = "yellow")
```

<table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> state </th> <th style="text-align:right;"> year </th> <th style="text-align:right;"> cpi </th> <th style="text-align:right;"> pop </th> <th style="text-align:right;"> packpc </th> <th style="text-align:right;"> income </th> <th style="text-align:right;"> tax </th> <th style="text-align:right;"> avgprs </th> <th style="text-align:right;"> taxs </th> <th style="text-align:left;"> state_year </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> AL </td> <td style="text-align:right;background-color: yellow !important;"> 1985 </td> <td style="text-align:right;"> 1.076 </td> <td style="text-align:right;"> 3973000 </td> <td style="text-align:right;"> 116.5 </td> <td style="text-align:right;"> 46014968 </td> <td style="text-align:right;"> 32.5 </td> <td style="text-align:right;"> 102.2 </td> <td style="text-align:right;"> 33.35 </td> <td style="text-align:left;"> AL85 </td> </tr> <tr> <td style="text-align:left;"> AL </td> <td style="text-align:right;background-color: yellow !important;"> 1986 </td> <td style="text-align:right;"> 1.096 </td> <td style="text-align:right;"> 3992000 </td> <td style="text-align:right;"> 117.2 </td> <td style="text-align:right;"> 48703940 </td> <td style="text-align:right;"> 32.5 </td> <td style="text-align:right;"> 108.0 </td> <td style="text-align:right;"> 33.41 </td> <td style="text-align:left;"> AL86 </td> </tr> <tr> <td 
style="text-align:left;"> AL </td> <td style="text-align:right;background-color: yellow !important;"> 1987 </td> <td style="text-align:right;"> 1.136 </td> <td style="text-align:right;"> 4016000 </td> <td style="text-align:right;"> 115.8 </td> <td style="text-align:right;"> 51846312 </td> <td style="text-align:right;"> 32.5 </td> <td style="text-align:right;"> 113.5 </td> <td style="text-align:right;"> 33.46 </td> <td style="text-align:left;"> AL87 </td> </tr> <tr> <td style="text-align:left;"> AL </td> <td style="text-align:right;background-color: yellow !important;"> 1988 </td> <td style="text-align:right;"> 1.183 </td> <td style="text-align:right;"> 4024000 </td> <td style="text-align:right;"> 115.3 </td> <td style="text-align:right;"> 55698852 </td> <td style="text-align:right;"> 32.5 </td> <td style="text-align:right;"> 120.0 </td> <td style="text-align:right;"> 33.53 </td> <td style="text-align:left;"> AL88 </td> </tr> <tr> <td style="text-align:left;"> AL </td> <td style="text-align:right;background-color: yellow !important;"> 1989 </td> <td style="text-align:right;"> 1.240 </td> <td style="text-align:right;"> 4030000 </td> <td style="text-align:right;"> 109.2 </td> <td style="text-align:right;"> 60044480 </td> <td style="text-align:right;"> 32.5 </td> <td style="text-align:right;"> 133.3 </td> <td style="text-align:right;"> 33.66 </td> <td style="text-align:left;"> AL89 </td> </tr> <tr> <td style="text-align:left;"> AL </td> <td style="text-align:right;background-color: yellow !important;"> 1990 </td> <td style="text-align:right;"> 1.307 </td> <td style="text-align:right;"> 4048508 </td> <td style="text-align:right;"> 111.7 </td> <td style="text-align:right;"> 64094948 </td> <td style="text-align:right;"> 32.5 </td> <td style="text-align:right;"> 143.4 </td> <td style="text-align:right;"> 33.76 </td> <td style="text-align:left;"> AL90 </td> </tr> <tr> <td style="text-align:left;"> AL </td> <td style="text-align:right;background-color: yellow !important;"> 
1991 </td> <td style="text-align:right;"> 1.362 </td> <td style="text-align:right;"> 4091025 </td> <td style="text-align:right;"> 107.0 </td> <td style="text-align:right;"> 67649568 </td> <td style="text-align:right;"> 34.5 </td> <td style="text-align:right;"> 161.7 </td> <td style="text-align:right;"> 35.94 </td> <td style="text-align:left;"> AL91 </td> </tr> <tr> <td style="text-align:left;"> AL </td> <td style="text-align:right;background-color: yellow !important;"> 1992 </td> <td style="text-align:right;"> 1.403 </td> <td style="text-align:right;"> 4139269 </td> <td style="text-align:right;"> 106.9 </td> <td style="text-align:right;"> 72281824 </td> <td style="text-align:right;"> 36.5 </td> <td style="text-align:right;"> 176.1 </td> <td style="text-align:right;"> 38.08 </td> <td style="text-align:left;"> AL92 </td> </tr> <tr> <td style="text-align:left;"> AL </td> <td style="text-align:right;background-color: yellow !important;"> 1993 </td> <td style="text-align:right;"> 1.445 </td> <td style="text-align:right;"> 4193114 </td> <td style="text-align:right;"> 105.0 </td> <td style="text-align:right;"> 75439176 </td> <td style="text-align:right;"> 38.5 </td> <td style="text-align:right;"> 164.7 </td> <td style="text-align:right;"> 39.97 </td> <td style="text-align:left;"> AL93 </td> </tr> </tbody> </table>

---

## Panel data: Each `year` has all states

```r
# Note, when we sort by year, we see for each year, multiple states
cigarette %>%
* arrange(year, state) %>%
  slice(1:9) %>%
  kable(format = 'html') %>%
  kable_styling() %>%
  column_spec(2, background = "yellow")
```

<table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> state </th> <th style="text-align:right;"> year </th> <th style="text-align:right;"> cpi </th> <th style="text-align:right;"> pop </th> <th style="text-align:right;"> packpc </th> <th style="text-align:right;"> income </th> <th style="text-align:right;"> tax </th> <th 
style="text-align:right;"> avgprs </th> <th style="text-align:right;"> taxs </th> <th style="text-align:left;"> state_year </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> AL </td> <td style="text-align:right;background-color: yellow !important;"> 1985 </td> <td style="text-align:right;"> 1.076 </td> <td style="text-align:right;"> 3973000 </td> <td style="text-align:right;"> 116.5 </td> <td style="text-align:right;"> 46014968 </td> <td style="text-align:right;"> 32.5 </td> <td style="text-align:right;"> 102.18 </td> <td style="text-align:right;"> 33.35 </td> <td style="text-align:left;"> AL85 </td> </tr> <tr> <td style="text-align:left;"> AR </td> <td style="text-align:right;background-color: yellow !important;"> 1985 </td> <td style="text-align:right;"> 1.076 </td> <td style="text-align:right;"> 2327000 </td> <td style="text-align:right;"> 128.5 </td> <td style="text-align:right;"> 26210736 </td> <td style="text-align:right;"> 37.0 </td> <td style="text-align:right;"> 101.47 </td> <td style="text-align:right;"> 37.00 </td> <td style="text-align:left;"> AR85 </td> </tr> <tr> <td style="text-align:left;"> AZ </td> <td style="text-align:right;background-color: yellow !important;"> 1985 </td> <td style="text-align:right;"> 1.076 </td> <td style="text-align:right;"> 3184000 </td> <td style="text-align:right;"> 104.5 </td> <td style="text-align:right;"> 43956936 </td> <td style="text-align:right;"> 31.0 </td> <td style="text-align:right;"> 108.58 </td> <td style="text-align:right;"> 36.17 </td> <td style="text-align:left;"> AZ85 </td> </tr> <tr> <td style="text-align:left;"> CA </td> <td style="text-align:right;background-color: yellow !important;"> 1985 </td> <td style="text-align:right;"> 1.076 </td> <td style="text-align:right;"> 26444000 </td> <td style="text-align:right;"> 100.4 </td> <td style="text-align:right;"> 447102816 </td> <td style="text-align:right;"> 26.0 </td> <td style="text-align:right;"> 107.84 </td> <td style="text-align:right;"> 
32.10 </td> <td style="text-align:left;"> CA85 </td> </tr> <tr> <td style="text-align:left;"> CO </td> <td style="text-align:right;background-color: yellow !important;"> 1985 </td> <td style="text-align:right;"> 1.076 </td> <td style="text-align:right;"> 3209000 </td> <td style="text-align:right;"> 113.0 </td> <td style="text-align:right;"> 49466672 </td> <td style="text-align:right;"> 31.0 </td> <td style="text-align:right;"> 94.27 </td> <td style="text-align:right;"> 31.00 </td> <td style="text-align:left;"> CO85 </td> </tr> <tr> <td style="text-align:left;"> CT </td> <td style="text-align:right;background-color: yellow !important;"> 1985 </td> <td style="text-align:right;"> 1.076 </td> <td style="text-align:right;"> 3201000 </td> <td style="text-align:right;"> 109.3 </td> <td style="text-align:right;"> 60063368 </td> <td style="text-align:right;"> 42.0 </td> <td style="text-align:right;"> 128.03 </td> <td style="text-align:right;"> 51.48 </td> <td style="text-align:left;"> CT85 </td> </tr> <tr> <td style="text-align:left;"> DE </td> <td style="text-align:right;background-color: yellow !important;"> 1985 </td> <td style="text-align:right;"> 1.076 </td> <td style="text-align:right;"> 618000 </td> <td style="text-align:right;"> 143.9 </td> <td style="text-align:right;"> 9927301 </td> <td style="text-align:right;"> 30.0 </td> <td style="text-align:right;"> 102.49 </td> <td style="text-align:right;"> 30.00 </td> <td style="text-align:left;"> DE85 </td> </tr> <tr> <td style="text-align:left;"> FL </td> <td style="text-align:right;background-color: yellow !important;"> 1985 </td> <td style="text-align:right;"> 1.076 </td> <td style="text-align:right;"> 11352000 </td> <td style="text-align:right;"> 122.2 </td> <td style="text-align:right;"> 166919248 </td> <td style="text-align:right;"> 37.0 </td> <td style="text-align:right;"> 115.29 </td> <td style="text-align:right;"> 42.49 </td> <td style="text-align:left;"> FL85 </td> </tr> <tr> <td style="text-align:left;"> GA 
</td> <td style="text-align:right;background-color: yellow !important;"> 1985 </td> <td style="text-align:right;"> 1.076 </td> <td style="text-align:right;"> 5963000 </td> <td style="text-align:right;"> 127.2 </td> <td style="text-align:right;"> 78364336 </td> <td style="text-align:right;"> 28.0 </td> <td style="text-align:right;"> 97.03 </td> <td style="text-align:right;"> 28.84 </td> <td style="text-align:left;"> GA85 </td> </tr> </tbody> </table>

---

## Visualizing `packpc` vs `taxs` in KY, NH and UT

<img src="week12_02_files/figure-html/plot_highlight-1.png" width="720" style="display: block; margin: auto;" />

---

## Estimation: individual-fixed effects (or dummies)

<br><br><br>

.large[
- One way to estimate a fixed effects (dummy variable) regression is through de-meaning
- Note that states differ from one another even in the absence of treatment (taxes)
- These individual differences are assumed to be time-invariant (e.g., geography or culture)
]

---
class: center, middle

# Panel data: De-meaning

---

## What do we mean by "de-meaning"?

```r
data <- 1:11
data
```

```
 [1]  1  2  3  4  5  6  7  8  9 10 11
```

```r
mean(data)
```

```
[1] 6
```

```r
data - mean(data)
```

```
 [1] -5 -4 -3 -2 -1  0  1  2  3  4  5
```

---

## De-meaning `packpc` for KY

```r
data <- cigarette$packpc[cigarette$state == "KY"]
data
```

```
 [1] 186.0 181.8 176.0 172.5 176.1 186.5 169.1 164.5 163.0 164.8 172.6
```

```r
mean(data)
```

```
[1] 173.9
```

```r
data - mean(data)
```

```
 [1]  12.130   7.901   2.045  -1.397   2.190  12.577  -4.765  -9.410 -10.916
[10]  -9.096  -1.257
```

---

## De-meaning `packpc` for UT

```r
data <- cigarette$packpc[cigarette$state == "UT"]
data
```

```
 [1] 68.05 64.76 68.53 55.74 57.44 53.01 52.48 52.54 52.08 51.13 49.27
```

```r
mean(data)
```

```
[1] 56.82
```

```r
data - mean(data)
```

```
 [1] 11.2240  7.9402 11.7117 -1.0826  0.6221 -3.8079 -4.3374 -4.2831 -4.7431
[10] -5.6939 -7.5500
```

---

## What do we mean by mean-shifting or "de-meaning"?
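The KY and UT slides subtract one state's mean at a time. As a minimal sketch of the same step applied to every group at once, the toy example below uses made-up numbers; the `toy` data frame and its column names are illustrative assumptions, not part of the cigarette data, and base R's `ave()` stands in for the grouped calculation.

```r
# Toy panel (made-up numbers): two units observed over three periods.
# Unit B sits 40 points above unit A in every period.
toy <- data.frame(
  unit = rep(c("A", "B"), each = 3),
  t    = rep(1:3, times = 2),
  y    = c(10, 12, 14, 50, 52, 54)
)

# ave() returns, for each row, the mean of y within that row's unit
toy$y_unit_mean <- ave(toy$y, toy$unit)

# De-meaning: subtract each unit's own mean from its observations
toy$y_demean <- toy$y - toy$y_unit_mean

toy
```

After de-meaning, units A and B both have the values -2, 0, 2 even though their levels differ by 40: only the within-unit variation remains.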
<img src="images/ss_display_5_17.png" width="80%" style="display: block; margin: auto;" />

.footnote[*Statistical Sleuth*, Display 5.17]

---

## Revisiting mean-shifting or "de-meaning"

<img src="images/ss_display_5_17_markedup.png" width="80%" style="display: block; margin: auto;" />

.footnote[*Statistical Sleuth*, Display 5.17]

---

## Fixed-effects visualization

<img src="week12_02_files/figure-html/unnamed-chunk-29-1.png" width="720" style="display: block; margin: auto;" />

---

## Fixed-effects visualization: highlighting KY, NH and UT

<img src="week12_02_files/figure-html/unnamed-chunk-30-1.png" width="720" style="display: block; margin: auto;" />

---

## Fixed-effects visualization: mean of KY

<img src="week12_02_files/figure-html/unnamed-chunk-31-1.png" width="720" style="display: block; margin: auto;" />

---

## Fixed-effects visualization: mean of NH

<img src="week12_02_files/figure-html/unnamed-chunk-32-1.png" width="720" style="display: block; margin: auto;" />

---

## Fixed-effects visualization: mean of UT

<img src="week12_02_files/figure-html/unnamed-chunk-33-1.png" width="720" style="display: block; margin: auto;" />

---

## De-meaning by calculation

- To adjust for state-specific differences, we can de-mean the outcome, treatment and covariates by state

`$$y_{it} = \beta_1x_{it} + \alpha_i + \epsilon_{it}$$`

`$$y_{it} - \overline{y_i} = \beta_1(x_{it} - \overline{x_i}) + (\epsilon_{it} - \overline{\epsilon_i}),$$`

where

`$$\overline{x_i} = \frac{1}{T}\sum\limits_{t = 1}^T x_{it} \textrm{ and } \overline{\epsilon_i} = \frac{1}{T}\sum\limits_{t = 1}^T \epsilon_{it}$$`

- `\(\overline{y_i}\)` is defined analogously
- Because `\(\alpha_i\)` does not vary over time, its within-state mean is `\(\alpha_i\)` itself, so de-meaning subtracts out the unobserved state-specific `\(\alpha_i\)`

---

## Calculating the mean-shift or de-meaning in R

```r
# de-mean by state
cigarette <- cigarette %>%
  group_by(state) %>%
  mutate(
    packpc_st_mean = mean(packpc, na.rm = TRUE),
    taxs_st_mean = mean(taxs, na.rm = TRUE),
    packpc_demean = packpc - packpc_st_mean,
    taxs_demean = taxs - taxs_st_mean
  ) %>%
  ungroup()
```

---

## Viewing the new de-meaned columns

```r
cigarette %>%
  filter(state == "AL") %>%
  select(state, year, packpc, packpc_st_mean, packpc_demean) %>%
  head()
```

```
# A tibble: 6 × 5
  state  year packpc packpc_st_mean packpc_demean
  <chr> <dbl>  <dbl>          <dbl>         <dbl>
1 AL     1985   116.           110.         6.57
2 AL     1986   117.           110.         7.24
3 AL     1987   116.           110.         5.92
4 AL     1988   115.           110.         5.34
5 AL     1989   109.           110.        -0.710
6 AL     1990   112.           110.         1.83
```

---

## Viewing the new de-meaned columns

```r
cigarette %>%
  filter(state == "AL") %>%
  select(state, year, taxs, taxs_st_mean, taxs_demean) %>%
  head()
```

```
# A tibble: 6 × 5
  state  year  taxs taxs_st_mean taxs_demean
  <chr> <dbl> <dbl>        <dbl>       <dbl>
1 AL     1985  33.3         36.3       -2.92
2 AL     1986  33.4         36.3       -2.86
3 AL     1987  33.5         36.3       -2.81
4 AL     1988  33.5         36.3       -2.74
5 AL     1989  33.7         36.3       -2.61
6 AL     1990  33.8         36.3       -2.51
```

---

## Fixed-effects visualization: after de-meaning

<img src="week12_02_files/figure-html/plot_after_demeaning-1.png" width="720" style="display: block; margin: auto;" />

---

## Fixed-effects estimation

.large[
- Unit fixed effects / dummy variables soak up unit-specific traits that don't vary over time
- Analogously, time fixed effects / dummy variables soak up year-specific traits that don't vary across units
- Variation in the outcome and the treatment is no longer due to time-invariant or unit-invariant characteristics
- Information is leveraged from deviations of the treatment and covariates from their means
]

---
class: center, middle

# Panel Data: Regression

---

## Regression with state and year fixed effects (dummies)

<img src="week12_02_files/figure-html/unnamed-chunk-37-1.png" width="720" style="display: block; margin: auto;" />

---

## Equivalent estimation: dummy variables

- Equivalently, we can estimate a fixed-effects model with dummy variables for each unit and time period
- With our cigarette taxes example:

`$$pack_{it} = \beta_0 + \beta_1taxes_{it} + \alpha_1 Alabama_{it} + \alpha_2 Arizona_{it} + \text{...} + \\ \delta_1 1985_{it} + \delta_2 1986_{it} + \text{...} + \epsilon_{it}$$`

- Why? Adding a dummy variable effectively estimates a separate mean for each unit
- For Alabama, the unit-specific intercept is `\(\beta_0 + \alpha_1\)`
- For the year 1985, the time-specific intercept is `\(\beta_0 + \delta_1\)`

---

## Visualize state and year-fixed effects (dummies)

```r
# fe regression with dummies
fit_fixed <- lm(
  packpc ~ taxs + as.factor(state) + as.factor(year),
  data = cigarette
)

# plot fe regression
interact_plot(
  fit_fixed,
  pred = taxs,
  modx = state,
  modx.values = c("KY", "NH", "UT"),
  mod2 = year,
  mod2.values = c(1985, 1995),
  data = cigarette,
  colors = "Qual2",
  vary.lty = FALSE
)
```

---

## Notice intercept shifts for state and year

<img src="week12_02_files/figure-html/unnamed-chunk-38-1.png" width="720" style="display: block; margin: auto;" />

---

## Comparison: dummy vs. de-meaned regression

```r
# regression of demeaned data
fit_demean <- cigarette %>%
  lm(packpc_demean ~ taxs_demean, data = .)

# regression with state fixed effects / dummies
fit_fixed_state <- cigarette %>%
  lm(packpc ~ taxs + as.factor(state), data = .)

stargazer(
  fit_demean, fit_fixed_state,
  single.row = TRUE,
  type = "html",
  omit = c("state", "Constant"),
  omit.stat = "all",
  header = FALSE
)
```

---

## What does the similarity tell us?
<table style="text-align:center"><tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td colspan="2"><em>Dependent variable:</em></td></tr> <tr><td></td><td colspan="2" style="border-bottom: 1px solid black"></td></tr> <tr><td style="text-align:left"></td><td>packpc_demean</td><td>packpc</td></tr> <tr><td style="text-align:left"></td><td>(1)</td><td>(2)</td></tr> <tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">taxs_demean</td><td>-0.847<sup>***</sup> (0.023)</td><td></td></tr> <tr><td style="text-align:left">taxs</td><td></td><td>-0.847<sup>***</sup> (0.024)</td></tr> <tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td colspan="2" style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr> </table>

---

## Pros & cons of panel data

.large[
- Cons
  - Expensive and time-consuming to build
  - May not be available for your particular question
  - Non-response or attrition over time
  - Can demand a bit more data work (e.g., merging, reshaping)
<!-- - Need to leverage within-group and across-unit variation -->
]

--

.large[
- Pros
  - A big challenge with observational data is controlling for potential unobserved confounders
  - Can control for unobserved, time-invariant unit traits like culture or institutions
  - Can control for unobserved time trends that are shared across units
  - Well-established methodology for panel data
]

---

## Summary

.large[
- A fixed-effects model is one way of dealing with unobserved heterogeneity
- There is no free lunch: fixed effects rest on strong assumptions about both the treatment effect and the heterogeneity
- How can we know what we do not observe?
- If you have variables that change across units and time, control for them
- Incorporating fixed effects *clusters* data into smaller groups, which risks overfitting
- Trade-off between parsimony and explanatory power
]

---
class: middle, center

# Questions?

---
class: middle, center

# Appendix

---

## Visualizing `packpc` vs `taxs`

<img src="week12_02_files/figure-html/unnamed-chunk-41-1.png" width="720" style="display: block; margin: auto;" />

---

## Sample code: plot highlight

```r
# highlight only KY, NH and UT
cig_plot <- cigarette %>%
  ggplot() +
  geom_point(
    aes(
      x = taxs,
      y = packpc,
      color = state
    )
  ) +
  gghighlight(
    state %in% c("KY", "NH", "UT"),
    use_group_by = FALSE,
    use_direct_label = FALSE
  ) +
  coord_cartesian(
    ylim = c(-50, 250)
  ) +
  scale_color_brewer(
    palette = "Set2"
  ) +
  ggtitle("Cigarettes: per capita consumption vs. excise taxes")

cig_plot
```

---

## Sample code: visualization after de-meaning

```r
cigarette %>%
  ggplot() +
  geom_point(
    aes(
      x = taxs_demean,
      y = packpc_demean,
      color = state
    )
  ) +
  gghighlight(
    state %in% c("KY", "NH", "UT"),
    use_group_by = FALSE,
    use_direct_label = FALSE
  ) +
  coord_cartesian(
    ylim = c(-50, 250)
  ) +
  scale_color_brewer(
    palette = "Set2"
  ) +
  ggtitle("Cigarettes: per capita consumption vs. excise taxes (de-meaned)")
```
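
---

## Sample code: de-meaning vs. dummies on simulated data

The comparison of the de-meaned and dummy-variable regressions can be reproduced without the cigarette data. The sketch below is a minimal illustration on a simulated balanced panel, using base R only; the names `panel`, `unit`, `period`, `x`, `y` and `alpha` are hypothetical and do not come from the course data. It checks two things: the two slopes agree, and the standard errors differ only because the de-meaned regression does not count the estimated unit means against its degrees of freedom.

```r
# Simulated balanced panel: 5 hypothetical units observed over 8 periods.
# All names and numbers here are illustrative assumptions.
set.seed(90)
n_units <- 5
n_periods <- 8
panel <- data.frame(
  unit = rep(LETTERS[1:n_units], each = n_periods),
  period = rep(seq_len(n_periods), times = n_units)
)

# unit-specific, time-invariant intercepts (the alpha_i of the model)
alpha <- c(A = 10, B = 40, C = 70, D = 100, E = 130)
alpha_i <- unname(alpha[as.character(panel$unit)])

# the treatment is correlated with alpha_i, so pooled OLS would be confounded
panel$x <- 0.3 * alpha_i + rnorm(nrow(panel), sd = 3)
panel$y <- alpha_i - 0.8 * panel$x + rnorm(nrow(panel), sd = 2)

# de-mean outcome and treatment within each unit (ave() returns group means)
panel$y_demean <- panel$y - ave(panel$y, panel$unit)
panel$x_demean <- panel$x - ave(panel$x, panel$unit)

# regression on de-meaned data vs. regression with unit dummies
fit_demean <- lm(y_demean ~ x_demean, data = panel)
fit_dummy <- lm(y ~ x + factor(unit), data = panel)

slope_demean <- unname(coef(fit_demean)["x_demean"])
slope_dummy <- unname(coef(fit_dummy)["x"])

# the slopes agree up to floating-point error
all.equal(slope_demean, slope_dummy)

# the standard errors differ only through the residual degrees of freedom
se_demean <- summary(fit_demean)$coefficients["x_demean", "Std. Error"]
se_dummy <- summary(fit_dummy)$coefficients["x", "Std. Error"]
df_demean <- df.residual(fit_demean) # observations minus 2
df_dummy <- df.residual(fit_dummy)   # observations minus (units + 1)
all.equal(se_dummy, se_demean * sqrt(df_demean / df_dummy))
```

Both `all.equal()` calls should return `TRUE`. The small gap between the two standard errors in the cigarette comparison (0.023 vs. 0.024) is consistent with this degrees-of-freedom difference, so the dummy-variable standard error is the one to report.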