vignettes/paired.Rmd
Another one of the most common tables in medical literature includes summary statistics for a set of variables paired across two time points. Locally at Mayo, the SAS macro `%paired` was written to create summary tables with a single call. With the increasing interest in R, we have developed the function `paired()` to create similar tables within the R environment.
This vignette is intentionally brief: `paired()` builds on `tableby()`, so most of the `tableby()` documentation applies here, too.
The first step when using the `paired()` function is to load the `arsenal` package. We can’t use `mockstudy` here because we need a dataset with paired observations, so we’ll create our own dataset.
```r
library(arsenal)

# Ten observations on six ids; ids 5 and 6 are each seen at only one time point
dat <- data.frame(
  tp = paste0("Time Point ", c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2)),
  id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6),
  Cat = c("A", "A", "A", "B", "B", "B", "B", "A", NA, "B"),
  Fac = factor(c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A")),
  Num = c(1, 2, 3, 4, 4, 3, 3, 4, 0, NA),
  Ord = ordered(c("I", "II", "II", "III", "III", "III", "I", "III", "II", "I")),
  Lgl = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE),
  Dat = as.Date("2018-05-01") + c(1, 1, 2, 2, 3, 4, 5, 6, 3, 4),
  stringsAsFactors = FALSE
)
```
To create a simple table stratified by time point, use a `formula=` statement to specify the variables that you want summarized and the `id=` argument to identify which observations are paired.
```r
p <- paired(tp ~ Cat + Fac + Num + Ord + Lgl + Dat, data = dat, id = id, signed.rank.exact = FALSE)
summary(p)
```
| | Time Point 1 (N=4) | Time Point 2 (N=4) | Difference (N=4) | p value |
|---|---|---|---|---|
| **Cat** | | | | 1.000 |
| A | 2 (50.0%) | 2 (50.0%) | 1 (50.0%) | |
| B | 2 (50.0%) | 2 (50.0%) | 1 (50.0%) | |
| **Fac** | | | | 0.261 |
| A | 2 (50.0%) | 1 (25.0%) | 2 (100.0%) | |
| B | 1 (25.0%) | 2 (50.0%) | 1 (100.0%) | |
| C | 1 (25.0%) | 1 (25.0%) | 1 (100.0%) | |
| **Num** | | | | 0.391 |
| Mean (SD) | 2.750 (1.258) | 3.250 (0.957) | 0.500 (1.000) | |
| Range | 1.000 - 4.000 | 2.000 - 4.000 | -1.000 - 1.000 | |
| **Ord** | | | | 0.174 |
| I | 2 (50.0%) | 0 (0.0%) | 2 (100.0%) | |
| II | 1 (25.0%) | 1 (25.0%) | 1 (100.0%) | |
| III | 1 (25.0%) | 3 (75.0%) | 0 (0.0%) | |
| **Lgl** | | | | 1.000 |
| FALSE | 2 (50.0%) | 1 (25.0%) | 2 (100.0%) | |
| TRUE | 2 (50.0%) | 3 (75.0%) | 1 (50.0%) | |
| **Dat** | | | | 0.182 |
| Median | 2018-05-03 | 2018-05-04 | 0.500 | |
| Range | 2018-05-02 - 2018-05-06 | 2018-05-02 - 2018-05-07 | 0.000 - 1.000 | |
The third column shows the difference between the two time points (time point 2 minus time point 1). For categorical variables, it reports, for each level at time point 1, the number and percent of observations that changed by time point 2.
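As a rough cross-check of how to read the Difference column, the sketch below recomputes the `Num` and `Cat` rows by hand in base R, using only the four complete pairs (ids 1 to 4) from the example data. The names `tp1`, `tp2`, and `both` are illustrative. The final step assumes that the numeric p-value comes from a paired t-test (`"paired.t"` is the default `numeric.test` in `paired.control`); it is not a statement about how `arsenal` computes it internally.

```r
# Complete pairs only (ids 1-4); values copied from the example data
tp1 <- data.frame(id = 1:4, Cat = c("A", "A", "B", "B"), Num = c(1, 3, 4, 3),
                  stringsAsFactors = FALSE)
tp2 <- data.frame(id = 1:4, Cat = c("A", "B", "B", "A"), Num = c(2, 4, 3, 4),
                  stringsAsFactors = FALSE)

# One row per id, with time point 1 and time point 2 values side by side
both <- merge(tp1, tp2, by = "id", suffixes = c(".1", ".2"))

# Numeric variable: time point 2 minus time point 1
num_diff <- both$Num.2 - both$Num.1
mean(num_diff)   # 0.5
sd(num_diff)     # 1
range(num_diff)  # -1  1

# Categorical variable: within each time point 1 level, how many ids changed?
changed     <- both$Cat.1 != both$Cat.2
n_changed   <- tapply(changed, both$Cat.1, sum)         # A: 1, B: 1
pct_changed <- 100 * tapply(changed, both$Cat.1, mean)  # A: 50, B: 50

# Paired t-test on Num (assumed to correspond to the "paired.t" default)
p_num <- t.test(both$Num.2, both$Num.1, paired = TRUE)$p.value
round(p_num, 3)  # 0.391
```

These values agree with the `Num` and `Cat` rows of the table: a mean difference of 0.500 (SD 1.000), one changed observation (50.0%) in each `Cat` level, and a p-value of 0.391.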
Note that by default, observations that are not present at both time points are removed. This is easily changed using the `na.action = na.paired("<arg>")` argument. For example:
```r
p <- paired(tp ~ Cat + Fac + Num + Ord + Lgl + Dat, data = dat, id = id,
            signed.rank.exact = FALSE, na.action = na.paired("fill"))
summary(p)
```
| | Time Point 1 (N=6) | Time Point 2 (N=6) | Difference (N=6) | p value |
|---|---|---|---|---|
| **Cat** | | | | 1.000 |
| N-Miss | 2 | 1 | 2 | |
| A | 2 (50.0%) | 2 (40.0%) | 1 (50.0%) | |
| B | 2 (50.0%) | 3 (60.0%) | 1 (50.0%) | |
| **Fac** | | | | 0.261 |
| N-Miss | 1 | 1 | 2 | |
| A | 2 (40.0%) | 2 (40.0%) | 2 (100.0%) | |
| B | 1 (20.0%) | 2 (40.0%) | 1 (100.0%) | |
| C | 2 (40.0%) | 1 (20.0%) | 1 (100.0%) | |
| **Num** | | | | 0.391 |
| N-Miss | 1 | 2 | 2 | |
| Mean (SD) | 2.200 (1.643) | 3.250 (0.957) | 0.500 (1.000) | |
| Range | 0.000 - 4.000 | 2.000 - 4.000 | -1.000 - 1.000 | |
| **Ord** | | | | 0.174 |
| N-Miss | 1 | 1 | 2 | |
| I | 2 (40.0%) | 1 (20.0%) | 2 (100.0%) | |
| II | 2 (40.0%) | 1 (20.0%) | 1 (100.0%) | |
| III | 1 (20.0%) | 3 (60.0%) | 0 (0.0%) | |
| **Lgl** | | | | 1.000 |
| N-Miss | 1 | 1 | 2 | |
| FALSE | 3 (60.0%) | 2 (40.0%) | 2 (100.0%) | |
| TRUE | 2 (40.0%) | 3 (60.0%) | 1 (50.0%) | |
| **Dat** | | | | 0.182 |
| N-Miss | 1 | 1 | 2 | |
| Median | 2018-05-04 | 2018-05-05 | 0.500 | |
| Range | 2018-05-02 - 2018-05-06 | 2018-05-02 - 2018-05-07 | 0.000 - 1.000 | |
For more details, see the help page for `na.paired()`.
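To see where the two sample sizes come from (N=4 by default, N=6 with `"fill"`), it can help to count how many ids appear at both time points. The base R sketch below does this for the `tp` and `id` columns of the example data. It only illustrates the idea of a complete pair and is not how `na.paired()` itself is implemented; `n_tp`, `complete_ids`, and `incomplete_ids` are illustrative names.

```r
# Time point and id columns copied from the example data
tp <- paste0("Time Point ", c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2))
id <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6)

# For each id, count how many distinct time points were observed
n_tp <- tapply(tp, id, function(x) length(unique(x)))

# ids observed at both time points versus at only one
complete_ids   <- names(n_tp)[n_tp == 2]
incomplete_ids <- names(n_tp)[n_tp < 2]

length(complete_ids)  # 4 ids with both time points
length(n_tp)          # 6 ids in total
incomplete_ids        # "5" "6"
```

Ids 5 and 6 each contribute a single observation, so they are dropped by default and appear as missing values (the N-Miss rows) when `"fill"` is used.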
The tests used to calculate p-values differ by the variable type, but can be specified explicitly in the formula statement or in the control function.
The following tests are accepted:
- `paired.t`: A paired t-test.
- `mcnemar`: McNemar’s test.
- `signed.rank`: the signed-rank test.
- `sign.test`: the sign test.
- `notest`: Don’t perform a test.
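The two ways of choosing a test might look like the sketch below. It assumes that tests are specified in the formula by wrapping a variable in the test name, as is done with `tableby()`, and that control settings such as `numeric.test` can be passed directly to `paired()`, as `signed.rank.exact` is in the earlier call. The dataset `dat_small` is a hypothetical cut-down version of the example data that keeps only the four complete pairs.

```r
library(arsenal)

# Hypothetical reduced dataset: ids 1-4 only, two variables
dat_small <- data.frame(
  tp  = paste0("Time Point ", rep(1:2, times = 4)),
  id  = rep(1:4, each = 2),
  Cat = c("A", "A", "A", "B", "B", "B", "B", "A"),
  Num = c(1, 2, 3, 4, 4, 3, 3, 4),
  stringsAsFactors = FALSE
)

# Option 1: name the test per variable inside the formula
# (assumed to follow the same wrapping syntax as tableby())
p_formula <- paired(tp ~ signed.rank(Num) + notest(Cat),
                    data = dat_small, id = id, signed.rank.exact = FALSE)

# Option 2: change the default test for all numeric variables via a control setting
p_control <- paired(tp ~ Num + Cat, data = dat_small, id = id,
                    numeric.test = "signed.rank", signed.rank.exact = FALSE)

summary(p_formula, text = TRUE)
summary(p_control, text = TRUE)
```

The formula route is convenient when only one or two variables need a non-default test; the control route changes the default for every variable of that type.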
## `paired.control` settings

A quick way to see which arguments a function accepts is to use the `args()` command. Settings involving the number of digits can be set in `paired.control` or in `summary.tableby`.
```r
args(paired.control)
## function (diff = TRUE, numeric.test = "paired.t", cat.test = "mcnemar",
##     ordered.test = "signed.rank", date.test = "paired.t", mcnemar.correct = TRUE,
##     signed.rank.exact = NULL, signed.rank.correct = TRUE, ...)
## NULL
```
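As a sketch of the digits settings, the example below sets them once through a control object and once at summary time. It assumes that `digits` and `digits.p` (argument names taken from `tableby.control()`) are passed through the `...` of `paired.control()` and `summary()`, and that `paired()` accepts a `control=` argument like `tableby()` does. `dat_small` is a hypothetical reduced version of the example data.

```r
library(arsenal)

# Hypothetical reduced dataset: ids 1-4 only, one numeric variable
dat_small <- data.frame(
  tp  = paste0("Time Point ", rep(1:2, times = 4)),
  id  = rep(1:4, each = 2),
  Num = c(1, 2, 3, 4, 4, 3, 3, 4),
  stringsAsFactors = FALSE
)

# Set digits in the control object (assumed to pass through to tableby.control)
ctrl   <- paired.control(digits = 1, digits.p = 2)
p_ctrl <- paired(tp ~ Num, data = dat_small, id = id, control = ctrl)
summary(p_ctrl, text = TRUE)

# Or keep the defaults in the object and override them when printing
p_default <- paired(tp ~ Num, data = dat_small, id = id)
summary(p_default, text = TRUE, digits = 1, digits.p = 2)
```

Setting digits in the control object keeps every later `summary()` call consistent, while overriding at summary time leaves the stored object untouched.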
## `summary.tableby` settings

Since the “paired” object inherits from “tableby”, the `summary.tableby` function is what’s actually used to format and print the table.
```r
args(arsenal:::summary.tableby)
## function (object, ..., labelTranslations = NULL, text = FALSE,
##     title = NULL, pfootnote = FALSE, term.name = "")
## NULL
```
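The arguments listed above can be combined in a single `summary()` call. The sketch below uses `text`, `title`, `pfootnote`, and `labelTranslations` on a hypothetical reduced dataset (`dat_small`); the title and label strings are made up for illustration.

```r
library(arsenal)

# Hypothetical reduced dataset: ids 1-4 only, two variables
dat_small <- data.frame(
  tp  = paste0("Time Point ", rep(1:2, times = 4)),
  id  = rep(1:4, each = 2),
  Cat = c("A", "A", "A", "B", "B", "B", "B", "A"),
  Num = c(1, 2, 3, 4, 4, 3, 3, 4),
  stringsAsFactors = FALSE
)

p_small <- paired(tp ~ Num + Cat, data = dat_small, id = id)

# Plain-text output with a title, test footnotes, and friendlier row labels
s <- summary(
  p_small,
  text = TRUE,
  title = "Paired summary by time point",
  pfootnote = TRUE,
  labelTranslations = list(Num = "Numeric score", Cat = "Category")
)
s
```

Using `text = TRUE` is convenient at the console; leaving it at the default is intended for rendering inside an R Markdown document.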