A list of helpful tips for creating tables.
This tutorial explains how to make a table with the kable function. This type of table is especially useful for summary statistics of your data, such as the mean, median, and number of observations. It is also useful for displaying selected rows of your data or summarizing the results of a t-test.
If you do not have kableExtra yet, install it with install.packages("kableExtra"). Then we'll load the packages and work through an example. To access the example data, load the Sleuth3 package with library and give the ex0222 data set a meaningful name (here, AFQT). This example contains scores on the Armed Forces Qualifying Test (AFQT), which is used as a test of intelligence. The study was done to help settle controversial, and often plainly wrong, debates about the intelligence of women versus men. In particular, the test gives separate scores for arithmetic reasoning, word knowledge, paragraph comprehension, and mathematical knowledge.
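# set global options
knitr::opts_chunk$set(echo = TRUE)
# install.packages("kableExtra")  # run once if kableExtra is not installed
# tidyverse packages
library(dplyr)
library(broom)
# table packages
library(knitr)
library(xtable)
library(kableExtra)
# data: the Sleuth3 package provides ex0222
library(Sleuth3)
AFQT <- ex0222
glimpse(AFQT)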
Rows: 2,584
Columns: 6
$ Gender <fct> male, female, male, female, female, female, female, m…
$ Arith <int> 19, 23, 30, 30, 13, 8, 10, 4, 12, 3, 30, 10, 10, 28, …
$ Word <int> 27, 34, 35, 35, 30, 15, 17, 17, 33, 11, 33, 16, 16, 3…
$ Parag <int> 14, 11, 14, 13, 11, 6, 6, 6, 13, 5, 15, 3, 11, 14, 5,…
$ Math <int> 14, 20, 25, 21, 12, 4, 7, 6, 11, 6, 24, 7, 6, 18, 7, …
$ AFQT <dbl> 70.3, 60.4, 98.3, 84.7, 44.5, 4.0, 11.8, 8.9, 44.7, 2…
Now that we've loaded and explored the data a little, let's use kable to make a summary statistics table that gives the average arithmetic, word, paragraph, and math scores for each gender.
To do this, first make a data frame for your summary statistics.
# silence noisy messages from summarize command
options(dplyr.summarise.inform = FALSE)
# calculate summary stats
summary_stats <- AFQT %>%
  group_by(Gender) %>%
  summarize(
    mean_Arith = mean(Arith),
    mean_Word = mean(Word),
    mean_Paragraph = mean(Parag),
    mean_Math = mean(Math),
    number_of_subjects = n()
  )
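The same summary can be written with less repetition using dplyr's across(), which applies one or more functions to several columns at once. The sketch below is an alternative, not the tutorial's original code; it also adds medians, which the introduction mentions. The name summary_stats_across is illustrative, and the column names (for example mean_Parag) come from the .names template rather than the hand-written names above.

```r
library(dplyr)

AFQT <- Sleuth3::ex0222

# For each section score, compute the mean and median within each gender.
# .names = "{.fn}_{.col}" produces columns such as mean_Arith and median_Arith.
summary_stats_across <- AFQT %>%
  group_by(Gender) %>%
  summarize(
    across(
      c(Arith, Word, Parag, Math),
      list(mean = mean, median = median),
      .names = "{.fn}_{.col}"
    ),
    number_of_subjects = n()
  )

summary_stats_across
```

The result can be piped into kable exactly like summary_stats.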
And now, let's make the table! To do this, take the summary statistics data frame you just made and pipe it into kable. In kable, format controls how the table is rendered when knitting: use "html" when knitting to HTML and "latex" when knitting to PDF. caption adds a title, booktabs = TRUE gives cleaner horizontal rules in PDF output, and digits sets the number of decimal places. Finally, pipe the result to kable_styling to add nicer formatting, such as striped rows, and to control the table width.
summary_stats %>%
  kable(
    format = "html",
    caption = "Test Scores Summary by Gender",
    booktabs = TRUE,
    digits = 2
  ) %>%
  kable_styling(
    bootstrap_options = "striped",
    full_width = FALSE
  )
Gender | mean_Arith | mean_Word | mean_Paragraph | mean_Math | number_of_subjects |
---|---|---|---|---|---|
female | 17.5 | 26.6 | 11.5 | 13.8 | 1278 |
male | 19.5 | 26.6 | 10.9 | 14.6 | 1306 |
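Editing format by hand each time you switch between HTML and PDF is easy to forget. A minimal sketch, assuming you knit with R Markdown, picks the format automatically with knitr::is_latex_output(); the names out_format, mean_scores, raw_table, and styled_table are illustrative. It also sets latex_options in kable_styling(), because bootstrap_options only applies to HTML tables.

```r
library(dplyr)
library(knitr)
library(kableExtra)

AFQT <- Sleuth3::ex0222

# is_latex_output() is TRUE only while knitting to a LaTeX-based format (e.g. PDF);
# in an interactive session or an HTML knit it is FALSE, so "html" is chosen.
out_format <- if (knitr::is_latex_output()) "latex" else "html"

mean_scores <- AFQT %>%
  group_by(Gender) %>%
  summarize(mean_Arith = mean(Arith), mean_Math = mean(Math))

raw_table <- kable(
  mean_scores,
  format = out_format,
  caption = "Mean Arithmetic and Math Scores by Gender",
  booktabs = TRUE,
  digits = 2
)

# bootstrap_options affects HTML output; latex_options is the LaTeX counterpart
styled_table <- kable_styling(
  raw_table,
  bootstrap_options = "striped",
  latex_options = "striped",
  full_width = FALSE
)
```

Outside of knitting, is_latex_output() returns FALSE, so running this interactively builds an HTML table. In the .Rmd file, end the chunk with styled_table to display it.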
One question of interest to researchers was whether men and women scored significantly differently on each section of the test. So let's run a t-test for each section and make a table for each one using kable.
names(AFQT)
[1] "Gender" "Arith" "Word" "Parag" "Math" "AFQT"
# Welch two-sample t-tests (t.test() defaults to var.equal = FALSE)
t_arith <- t.test(Arith ~ Gender, data = AFQT)
t_word <- t.test(Word ~ Gender, data = AFQT)
t_parag <- t.test(Parag ~ Gender, data = AFQT)
t_math <- t.test(Math ~ Gender, data = AFQT)
# Convert each t.test object to a one-row data frame
t_arith_df <- tidy(t_arith)
t_word_df <- tidy(t_word)
t_parag_df <- tidy(t_parag)
t_math_df <- tidy(t_math)
t_arith_df %>%
  dplyr::select(-method, -alternative) %>% # drop extra cols
  # rename to make names more understandable
  # enclosing column name in tick marks allows for spaces
  rename(
    `Mean Group Female` = estimate1,
    `Mean Group Male` = estimate2,
    `t-statistic` = statistic,
    df = parameter
  ) %>%
  kable(
    format = "html",
    caption = "t-test for Arithmetic vs Gender",
    booktabs = TRUE,
    digits = 2
  ) %>%
  kable_styling(
    bootstrap_options = "striped",
    full_width = FALSE
  )
estimate | Mean Group Female | Mean Group Male | t-statistic | p.value | df | conf.low | conf.high |
---|---|---|---|---|---|---|---|
-2.04 | 17.5 | 19.5 | -7.31 | 0 | 2574 | -2.58 | -1.49 |
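With digits = 2, kable rounds a very small p-value to 0, as in the table above, which can read as if the p-value were exactly zero. One option, sketched here with base R's format.pval() (not part of the original tutorial), is to turn the p-value into text before calling kable, so that anything below a cutoff (eps) is shown as "less than 0.001".

```r
library(broom)

# Welch two-sample t-test of arithmetic scores by gender, tidied into one row
t_arith_df <- tidy(t.test(Arith ~ Gender, data = Sleuth3::ex0222))

# Replace the numeric p-value with a display string; values below eps
# are written as "<" followed by the cutoff instead of being rounded to 0.
t_arith_df$p.value <- format.pval(t_arith_df$p.value, digits = 2, eps = 0.001)

t_arith_df$p.value
```

Because p.value is now a character column, the digits argument of kable() no longer affects it, while the other numeric columns are still rounded.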
t_math_df %>%
  dplyr::select(-method, -alternative) %>% # drop extra cols
  # rename to make names more understandable
  # enclosing column name in tick marks allows for spaces
  rename(
    `Mean Group Female` = estimate1,
    `Mean Group Male` = estimate2,
    `t-statistic` = statistic,
    df = parameter
  ) %>%
  kable(
    format = "html",
    caption = "t-test for Math vs Gender",
    booktabs = TRUE,
    digits = 2
  ) %>%
  kable_styling(
    bootstrap_options = "striped",
    full_width = FALSE
  )
estimate | Mean Group Female | Mean Group Male | t-statistic | p.value | df | conf.low | conf.high |
---|---|---|---|---|---|---|---|
-0.75 | 13.8 | 14.6 | -3.05 | 0 | 2573 | -1.24 | -0.27 |
t_word_df %>%
  dplyr::select(-method, -alternative) %>% # drop extra cols
  # rename to make names more understandable
  # enclosing column name in tick marks allows for spaces
  rename(
    `Mean Group Female` = estimate1,
    `Mean Group Male` = estimate2,
    `t-statistic` = statistic,
    df = parameter
  ) %>%
  kable(
    format = "html",
    caption = "t-test for Word vs Gender",
    booktabs = TRUE,
    digits = 2
  ) %>%
  kable_styling(
    bootstrap_options = "striped",
    full_width = FALSE
  )
estimate | Mean Group Female | Mean Group Male | t-statistic | p.value | df | conf.low | conf.high |
---|---|---|---|---|---|---|---|
0.02 | 26.6 | 26.6 | 0.08 | 0.94 | 2581 | -0.52 | 0.57 |
t_parag_df %>%
  dplyr::select(-method, -alternative) %>% # drop extra cols
  # rename to make names more understandable
  # enclosing column name in tick marks allows for spaces
  rename(
    `Mean Group Female` = estimate1,
    `Mean Group Male` = estimate2,
    `t-statistic` = statistic,
    df = parameter
  ) %>%
  kable(
    format = "html",
    caption = "t-test for Paragraph vs Gender",
    booktabs = TRUE,
    digits = 2
  ) %>%
  kable_styling(
    bootstrap_options = "striped",
    full_width = FALSE
  )
estimate | Mean Group Female | Mean Group Male | t-statistic | p.value | df | conf.low | conf.high |
---|---|---|---|---|---|---|---|
0.57 | 11.5 | 10.9 | 4.6 | 0 | 2562 | 0.33 | 0.81 |
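The four blocks above repeat the same select, rename, and kable steps. A sketch of an alternative (not the tutorial's original code) runs all four tests in a loop with lapply(), stacks the tidied results with dplyr::bind_rows(), and adds a Section column so that everything fits in one table. The names sections, t_all, and combined_table are illustrative.

```r
library(dplyr)
library(broom)
library(knitr)
library(kableExtra)

AFQT <- Sleuth3::ex0222
sections <- c("Arith", "Word", "Parag", "Math")

# Build a formula such as Arith ~ Gender for each section, run a Welch t-test,
# tidy it into one row, and record which section the row belongs to.
t_all <- bind_rows(lapply(sections, function(section) {
  tidy(t.test(reformulate("Gender", response = section), data = AFQT)) %>%
    mutate(Section = section, .before = 1)
}))

combined_table <- t_all %>%
  dplyr::select(-method, -alternative) %>%
  rename(
    `Mean Group Female` = estimate1,
    `Mean Group Male` = estimate2,
    `t-statistic` = statistic,
    df = parameter
  ) %>%
  kable(
    format = "html",
    caption = "t-tests of Section Scores by Gender",
    booktabs = TRUE,
    digits = 2
  ) %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
```

In the .Rmd file, put combined_table on its own line at the end of the chunk to display it. One table with a row per section also makes the sections easier to compare side by side.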
What conclusions can we draw from these tests? Are there confounding factors that would limit these conclusions?
Lastly, let's make a table that shows the first five rows of the data. This is a good skill when you want a quick preview of your data before doing analysis and want to present that exploration neatly. You can do this with slice, which extracts rows by position.
AFQT %>%
  slice(1:5) %>% # keep rows 1 through 5
  kable(
    format = "html", # format = "latex" for PDFs
    caption = "Some AFQT data",
    digits = 2
  ) %>%
  kable_styling(full_width = FALSE)
Gender | Arith | Word | Parag | Math | AFQT |
---|---|---|---|---|---|
male | 19 | 27 | 14 | 14 | 70.3 |
female | 23 | 34 | 11 | 20 | 60.4 |
male | 30 | 35 | 14 | 25 | 98.3 |
female | 30 | 35 | 13 | 21 | 84.7 |
female | 13 | 30 | 11 | 12 | 44.5 |
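slice() is not limited to the first few rows. A short sketch (the object names some_rows, top_five, and top_table are illustrative) picks non-contiguous rows by position and uses dplyr's slice_max() to show the five highest overall AFQT scores.

```r
library(dplyr)
library(knitr)

AFQT <- Sleuth3::ex0222

# slice() accepts any row positions, not just a contiguous range
some_rows <- AFQT %>% slice(c(1, 3, 5))

# slice_max() keeps the rows with the largest values of a column; inside the
# call, AFQT refers to the AFQT score column, not the data frame.
# with_ties = FALSE returns exactly n rows even if scores are tied.
top_five <- AFQT %>% slice_max(AFQT, n = 5, with_ties = FALSE)

top_table <- kable(top_five, format = "html", caption = "Five Highest AFQT Scores", digits = 2)
```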
xtable is another table package that prints an R object as either a LaTeX or an HTML table. In this part of the tutorial, we will run through sample code for ANOVA tables, along with some tips and tricks for using xtable.
xtable can convert many R objects into an object of class xtable. Two common types of tables produced with xtable are ANOVA tables and tables of whole data frames. In POL90, xtable is most commonly used for ANOVA tables.
First, let's read in an example data frame to use in our tables. The following data set lists each year of the Kentucky Derby from 1896 to 2011, along with the winner, the winner's average speed, and the track conditions.
derby <- Sleuth3::ex0920 %>% janitor::clean_names()
To show how to use xtable to print a data frame as a table, we will start with head(derby), which keeps just the first six rows of the data frame.
 | year | winner | starters | net_to_winner | time | speed | track | conditions |
---|---|---|---|---|---|---|---|---|
1 | 1896 | Ben Brush | 8 | 4850 | 127.75 | 35.23 | Dusty | Fast |
2 | 1897 | Typhoon II | 6 | 4850 | 132.50 | 33.96 | Heavy | Slow |
3 | 1898 | Plaudit | 4 | 4850 | 129.00 | 34.88 | Good | Fast |
4 | 1899 | Manuel | 5 | 4850 | 132.00 | 34.09 | Fast | Fast |
5 | 1900 | Lieut. Gibson | 7 | 4850 | 126.25 | 35.64 | Fast | Fast |
6 | 1901 | His Eminence | 5 | 4850 | 127.75 | 35.23 | Fast | Fast |
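The chunk that produces this table is not shown. A sketch consistent with the surrounding description (head(derby) passed to xtable and printed as HTML) would be the following; the chunk header it assumes is {r, results = 'asis'}.

```r
library(dplyr)
library(xtable)

derby <- Sleuth3::ex0920 %>% janitor::clean_names()

# In the .Rmd file this goes in a chunk whose header is {r, results = 'asis'}
head(derby) %>%
  xtable() %>%
  print(type = "html") # change to type = "latex" for PDF output
```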
There are two important things to remember here. First, for the table to render, the chunk header must include results = 'asis' (for example, {r, results = 'asis'}); otherwise knitr shows the raw HTML or LaTeX code instead of a table. Second, notice print(type = "html"). Change this to print(type = "latex") when knitting to PDF.
Now we will discuss the more common type of table made with xtable: an ANOVA table. ANOVA tables can compare nested models fit to the same data, testing whether the terms added in each larger model significantly improve the fit. Suppose we have three linear regression models for our derby data, as shown below.
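The model definitions themselves are not shown in this section. The sketch below is an assumption reconstructed from the rest of the section: the regression table further down has coefficients for year and six track levels, and the residual degrees of freedom of the first two models in the ANOVA table (114 and 108) match speed ~ year and speed ~ year + track on 116 races. The formula for interact is a guess (a year-by-track interaction) and may not reproduce the original table exactly.

```r
library(dplyr)

derby <- Sleuth3::ex0920 %>% janitor::clean_names()

# Hypothetical model definitions (not taken from the original tutorial)
reduced  <- lm(speed ~ year, data = derby)          # year only
full     <- lm(speed ~ year + track, data = derby)  # adds track condition
interact <- lm(speed ~ year * track, data = derby)  # assumed: year slope varies by track

anova(reduced, full, interact)
```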
To compare the fits of these three models, we make an ANOVA table with xtable, as shown below. Here, an ANOVA object is piped into xtable. Once again, don't forget to include results = 'asis' in the chunk header so the table appears when knitting.
anova(reduced, full, interact) %>%
xtable() %>%
print(type = "html") # change to type = "latex" for PDF output
 | Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) |
---|---|---|---|---|---|---|
1 | 114 | 41.84 | | | | |
2 | 108 | 21.37 | 6 | 20.46 | 17.03 | 0.0000 |
3 | 103 | 20.62 | 5 | 0.75 | 0.75 | 0.5893 |
Regression output can also be displayed with xtable, as seen below for a linear regression. The same approach works for glm regressions. However, stargazer is probably the better option here, since it produces more professional-looking regression tables, including stars that mark statistical significance.
 | Estimate | Std. Error | t value | Pr(>\|t\|) |
---|---|---|---|---|
(Intercept) | 6.0947 | 2.6061 | 2.34 | 0.0212 |
year | 0.0154 | 0.0014 | 11.35 | 0.0000 |
trackFast | 0.3247 | 0.4556 | 0.71 | 0.4776 |
trackGood | 0.0208 | 0.4739 | 0.04 | 0.9650 |
trackHeavy | -1.3254 | 0.4809 | -2.76 | 0.0069 |
trackMuddy | -0.7660 | 0.4845 | -1.58 | 0.1168 |
trackSloppy | -0.3726 | 0.5068 | -0.74 | 0.4638 |
trackSlow | -0.3714 | 0.4895 | -0.76 | 0.4496 |
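The code for this regression table is not shown. A sketch that should give a table with these coefficient names fits speed on year and track; the object name speed_model is hypothetical. xtable() has methods for lm and glm objects that build the coefficient table.

```r
library(dplyr)
library(xtable)

derby <- Sleuth3::ex0920 %>% janitor::clean_names()

# Linear model whose coefficients are year plus one dummy per non-baseline track level
speed_model <- lm(speed ~ year + track, data = derby)

# Print in a chunk with results = 'asis'
speed_model %>%
  xtable(caption = "Derby speed regressed on year and track condition") %>%
  print(type = "html") # change to type = "latex" for PDF output
```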
Now that we have discussed what xtable can be used for and the bare-bones code for making a table, we will discuss other options for making our tables look just the way we want.
Titles for xtable objects can be added with the caption argument of xtable().
anova(reduced, full, interact) %>%
  xtable(
    caption = "ANOVA table for Derby linear models"
  ) %>%
  print(type = "html") # change to type = "latex" for PDF output
 | Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) |
---|---|---|---|---|---|---|
1 | 114 | 41.84 | | | | |
2 | 108 | 21.37 | 6 | 20.46 | 17.03 | 0.0000 |
3 | 103 | 20.62 | 5 | 0.75 | 0.75 | 0.5893 |
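By default, print() for xtable objects places the caption below the table. A sketch that moves it above with caption.placement = "top" and adds a LaTeX label follows; it refits two of the models, whose formulas are assumptions here, and the label name is hypothetical.

```r
library(dplyr)
library(xtable)

derby <- Sleuth3::ex0920 %>% janitor::clean_names()
reduced <- lm(speed ~ year, data = derby)          # hypothetical model
full    <- lm(speed ~ year + track, data = derby)  # hypothetical model

anova(reduced, full) %>%
  xtable(
    caption = "ANOVA table for Derby linear models",
    label = "tab:derby-anova" # hypothetical label, usable with \ref{} in LaTeX
  ) %>%
  print(
    type = "latex",
    caption.placement = "top" # the default is "bottom"
  )
```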
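In LaTeX output, tables are floats, so LaTeX often moves a table away from where it appears in your code. To keep it near its position in the text, use table.placement = "h". Note that table.placement is an argument of print() for xtable objects, not of xtable(), and it only affects LaTeX output.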
anova(reduced, full, interact) %>%
  xtable(caption = "ANOVA table for Derby linear models") %>%
  # table.placement belongs to print(), and only affects LaTeX output
  print(type = "html", table.placement = "h") # type = "latex" for PDFs
 | Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) |
---|---|---|---|---|---|---|
1 | 114 | 41.84 | | | | |
2 | 108 | 21.37 | 6 | 20.46 | 17.03 | 0.0000 |
3 | 103 | 20.62 | 5 | 0.75 | 0.75 | 0.5893 |
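If "h" is not enough, use table.placement = "H", which forces the table to stay exactly where it is. "H" comes from the LaTeX float package, so you also need to add \usepackage{float} to the header of your R Markdown file, as seen below: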
title: "Example"
date: "29 April 2019"
output: pdf_document
header-includes:
  - \usepackage{float}
Often, when knitting to PDF with LaTeX, the output includes a line such as "latex table generated in R 3.5.2 by xtable 1.8-3 package". To remove it, insert the following after loading the xtable library:
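A minimal sketch of that setting uses xtable's global option xtable.comment; the per-table alternative is the comment argument of print(). The head(mtcars) table is only there to demonstrate the effect.

```r
library(xtable)

# Turn off the "% latex table generated in R ... by xtable ..." comment for every table
options(xtable.comment = FALSE)

# Per-table alternative: print(xtable(x), type = "latex", comment = FALSE)
tex <- print(xtable(head(mtcars)), type = "latex", print.results = FALSE)
```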
This supplement was put together by Amna Amin, Kavya Chaturvedi and Omar Wasow.