`vignettes/paired.Rmd`


Another of the most common tables in the medical literature summarizes a set of variables paired across two time points. Locally at Mayo, the SAS macro `%paired` was written to create such summary tables with a single call. With the increasing interest in R, we have developed the function `paired()` to create similar tables within the R environment.

This vignette is light on purpose: `paired()` piggybacks off of `tableby()`, so most of the `tableby()` documentation applies here, too.

The first step when using the `paired()` function is to load the `arsenal` package. We can't use the `mockstudy` dataset here because we need a dataset with paired observations, so we'll create our own.

```r
library(arsenal)

dat <- data.frame(
  tp = paste0("Time Point ", c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2)),
  id = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6),
  Cat = c("A", "A", "A", "B", "B", "B", "B", "A", NA, "B"),
  Fac = factor(c("A", "B", "C", "A", "B", "C", "A", "B", "C", "A")),
  Num = c(1, 2, 3, 4, 4, 3, 3, 4, 0, NA),
  Ord = ordered(c("I", "II", "II", "III", "III", "III", "I", "III", "II", "I")),
  Lgl = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE),
  Dat = as.Date("2018-05-01") + c(1, 1, 2, 2, 3, 4, 5, 6, 3, 4),
  stringsAsFactors = FALSE
)
```
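Because `paired()` matches rows through `id`, it can help to check which subjects were actually observed at both time points. The base-R sketch below is not part of arsenal; it recreates only the columns of `dat` it needs and joins the two time points with `merge()`, whose default `all = FALSE` keeps only IDs present in both. Here that leaves subjects 1 to 4, since subject 5 appears only at time point 1 and subject 6 only at time point 2.

```r
# Recreate the pairing-relevant columns of `dat` from above
dat <- data.frame(
  tp  = paste0("Time Point ", c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2)),
  id  = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6),
  Num = c(1, 2, 3, 4, 4, 3, 3, 4, 0, NA),
  stringsAsFactors = FALSE
)

# How many rows does each subject contribute? Subjects 5 and 6 have only one.
rows_per_id <- table(dat$id)
print(rows_per_id)

# Split by time point, then join on `id`; merge() keeps only IDs found in both
tp1  <- dat[dat$tp == "Time Point 1", c("id", "Num")]
tp2  <- dat[dat$tp == "Time Point 2", c("id", "Num")]
wide <- merge(tp1, tp2, by = "id", suffixes = c(".tp1", ".tp2"))
print(wide)
```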

To create a simple table stratified by time point, use the `formula=` argument to specify the variables you want summarized and the `id=` argument to identify which observations are paired.

```r
p <- paired(tp ~ Cat + Fac + Num + Ord + Lgl + Dat, data = dat, id = id,
            signed.rank.exact = FALSE)
summary(p)
```

| | Time Point 1 (N=4) | Time Point 2 (N=4) | Difference (N=4) | p value |
|:---|:---:|:---:|:---:|---:|
| **Cat** | | | | 1.000 |
| A | 2 (50.0%) | 2 (50.0%) | 1 (50.0%) | |
| B | 2 (50.0%) | 2 (50.0%) | 1 (50.0%) | |
| **Fac** | | | | 0.261 |
| A | 2 (50.0%) | 1 (25.0%) | 2 (100.0%) | |
| B | 1 (25.0%) | 2 (50.0%) | 1 (100.0%) | |
| C | 1 (25.0%) | 1 (25.0%) | 1 (100.0%) | |
| **Num** | | | | 0.391 |
| Mean (SD) | 2.750 (1.258) | 3.250 (0.957) | 0.500 (1.000) | |
| Range | 1.000 - 4.000 | 2.000 - 4.000 | -1.000 - 1.000 | |
| **Ord** | | | | 0.174 |
| I | 2 (50.0%) | 0 (0.0%) | 2 (100.0%) | |
| II | 1 (25.0%) | 1 (25.0%) | 1 (100.0%) | |
| III | 1 (25.0%) | 3 (75.0%) | 0 (0.0%) | |
| **Lgl** | | | | 1.000 |
| FALSE | 2 (50.0%) | 1 (25.0%) | 2 (100.0%) | |
| TRUE | 2 (50.0%) | 3 (75.0%) | 1 (50.0%) | |
| **Dat** | | | | 0.182 |
| Median | 2018-05-03 | 2018-05-04 | 0.500 | |
| Range | 2018-05-02 - 2018-05-06 | 2018-05-02 - 2018-05-07 | 0.000 - 1.000 | |

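The third column shows the difference between time point 1 and time point 2. For numeric and date variables, it summarizes the within-subject differences (time point 2 minus time point 1). For categorical variables, it reports the number and percent of subjects in each time point 1 category whose value changed at time point 2.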
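To make the Difference column concrete, the base-R sketch below recomputes two of its entries from the complete pairs (subjects 1 to 4). The values are copied by hand from `dat`, and the direction (time point 2 minus time point 1) is inferred from the table's positive mean difference for `Num`, not taken from arsenal's source code.

```r
# Complete pairs (subjects 1-4), copied from `dat`
num_tp1 <- c(1, 3, 4, 3)
num_tp2 <- c(2, 4, 3, 4)
cat_tp1 <- c("A", "A", "B", "B")
cat_tp2 <- c("A", "B", "B", "A")

# Numeric: mean (SD) of the within-subject differences
num_diff    <- num_tp2 - num_tp1
num_summary <- c(mean = mean(num_diff), sd = sd(num_diff))
print(num_summary)

# Categorical: within each time point 1 category, count subjects whose value changed
changed <- tapply(cat_tp1 != cat_tp2, cat_tp1, sum)
# Both tapply() and table() order groups by the sorted category labels
pct <- 100 * changed / as.vector(table(cat_tp1))
print(changed)
print(pct)
```

The printed values (mean 0.5 and SD 1 for `Num`; one change out of two, i.e. 50%, for each `Cat` level) line up with the Difference column above.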

Note that by default, subjects who are not observed at both time points are removed. This is easily changed with the `na.action = na.paired("<arg>")` argument. For example:

```r
p <- paired(tp ~ Cat + Fac + Num + Ord + Lgl + Dat, data = dat, id = id,
            signed.rank.exact = FALSE, na.action = na.paired("fill"))
summary(p)
```

| | Time Point 1 (N=6) | Time Point 2 (N=6) | Difference (N=6) | p value |
|:---|:---:|:---:|:---:|---:|
| **Cat** | | | | 1.000 |
| N-Miss | 2 | 1 | 2 | |
| A | 2 (50.0%) | 2 (40.0%) | 1 (50.0%) | |
| B | 2 (50.0%) | 3 (60.0%) | 1 (50.0%) | |
| **Fac** | | | | 0.261 |
| N-Miss | 1 | 1 | 2 | |
| A | 2 (40.0%) | 2 (40.0%) | 2 (100.0%) | |
| B | 1 (20.0%) | 2 (40.0%) | 1 (100.0%) | |
| C | 2 (40.0%) | 1 (20.0%) | 1 (100.0%) | |
| **Num** | | | | 0.391 |
| N-Miss | 1 | 2 | 2 | |
| Mean (SD) | 2.200 (1.643) | 3.250 (0.957) | 0.500 (1.000) | |
| Range | 0.000 - 4.000 | 2.000 - 4.000 | -1.000 - 1.000 | |
| **Ord** | | | | 0.174 |
| N-Miss | 1 | 1 | 2 | |
| I | 2 (40.0%) | 1 (20.0%) | 2 (100.0%) | |
| II | 2 (40.0%) | 1 (20.0%) | 1 (100.0%) | |
| III | 1 (20.0%) | 3 (60.0%) | 0 (0.0%) | |
| **Lgl** | | | | 1.000 |
| N-Miss | 1 | 1 | 2 | |
| FALSE | 3 (60.0%) | 2 (40.0%) | 2 (100.0%) | |
| TRUE | 2 (40.0%) | 3 (60.0%) | 1 (50.0%) | |
| **Dat** | | | | 0.182 |
| N-Miss | 1 | 1 | 2 | |
| Median | 2018-05-04 | 2018-05-05 | 0.500 | |
| Range | 2018-05-02 - 2018-05-06 | 2018-05-02 - 2018-05-07 | 0.000 - 1.000 | |

For more details, see the help page for `na.paired()`.
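A rough base-R analogue of `na.paired("fill")` is an outer join on `id`: every subject is kept, and a time point the subject lacks becomes `NA`. This is only an illustration of the idea, assuming from the N=6 table that "fill" behaves this way; it is not arsenal's implementation. The sketch recovers the `N-Miss` counts and the time point 1 mean for `Num`.

```r
# Recreate the columns of `dat` needed for `Num`
dat <- data.frame(
  tp  = paste0("Time Point ", c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2)),
  id  = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6),
  Num = c(1, 2, 3, 4, 4, 3, 3, 4, 0, NA),
  stringsAsFactors = FALSE
)
tp1 <- dat[dat$tp == "Time Point 1", c("id", "Num")]
tp2 <- dat[dat$tp == "Time Point 2", c("id", "Num")]

# all = TRUE keeps subjects seen at only one time point, padding with NA
filled <- merge(tp1, tp2, by = "id", all = TRUE, suffixes = c(".tp1", ".tp2"))
print(filled)

# Missing values per time point (subject 6's own NA plus the padded rows)
n_miss <- colSums(is.na(filled[, c("Num.tp1", "Num.tp2")]))
print(n_miss)
print(mean(filled$Num.tp1, na.rm = TRUE))
```

This gives six subjects, `N-Miss` values of 1 and 2, and a time point 1 mean of 2.2, matching the `Num` rows of the table above.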

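The tests used to calculate p-values differ by variable type, but they can be specified explicitly in the formula statement or in the control function. By default (see the `paired.control()` arguments below), numeric and date variables use the paired t-test, categorical variables use McNemar's test, and ordered factors use the signed-rank test.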

The following tests are accepted:

- `paired.t`: a paired t-test.
- `mcnemar`: McNemar's test.
- `signed.rank`: the signed-rank test.
- `sign.test`: the sign test.
- `notest`: don't perform a test.
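For intuition about what these tests compute, the base-R calls below apply rough counterparts (`t.test(paired = TRUE)`, `mcnemar.test()`, and `wilcox.test(paired = TRUE, exact = FALSE)`) to the complete pairs from `dat`. This is a sketch, not arsenal's code. On this data the calls reproduce the p-values in the first table, but that agreement is not guaranteed in general. Note that for a 3x3 table such as `Fac`, `mcnemar.test()` computes the generalized (Bowker) test of symmetry.

```r
# Complete pairs (subjects 1-4), copied from `dat`
num1 <- c(1, 3, 4, 3)
num2 <- c(2, 4, 3, 4)
fac1 <- factor(c("A", "C", "B", "A"), levels = c("A", "B", "C"))
fac2 <- factor(c("B", "A", "C", "B"), levels = c("A", "B", "C"))
ord1 <- ordered(c("I", "II", "III", "I"), levels = c("I", "II", "III"))
ord2 <- ordered(c("II", "III", "III", "III"), levels = c("I", "II", "III"))
dat1 <- as.Date("2018-05-01") + c(1, 2, 3, 5)
dat2 <- as.Date("2018-05-01") + c(1, 2, 4, 6)

p_values <- c(
  # paired t-test on the within-subject differences
  Num = t.test(num2, num1, paired = TRUE)$p.value,
  # McNemar-type test of symmetry on the time point 1 x time point 2 table
  Fac = mcnemar.test(table(fac1, fac2))$p.value,
  # signed-rank test on the level codes, normal approximation (zeros dropped)
  Ord = wilcox.test(as.numeric(ord2), as.numeric(ord1),
                    paired = TRUE, exact = FALSE)$p.value,
  # dates compared as numbers of days
  Dat = t.test(as.numeric(dat2), as.numeric(dat1), paired = TRUE)$p.value
)
print(round(p_values, 3))
```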

`paired.control` settings

A quick way to see which arguments a function accepts is to use the `args()` command. Settings involving the number of digits can be set in either `paired.control()` or `summary.tableby()`.

```r
args(paired.control)
```

```
## function (diff = TRUE, numeric.test = "paired.t", cat.test = "mcnemar",
## ordered.test = "signed.rank", date.test = "paired.t", mcnemar.correct = TRUE,
## signed.rank.exact = NULL, signed.rank.correct = TRUE, ...)
## NULL
```
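As a sketch of the two ways to choose a test, the calls below request a sign test for `Num` directly in the formula and, separately, switch the default numeric test through `paired.control()`. This assumes that, as with `tableby()`, the test names listed above can wrap a formula term and that `paired()` accepts a `control=` argument; `p_formula` and `p_control` are illustrative names.

```r
library(arsenal)

# Columns of `dat` needed here, copied from the example above
dat <- data.frame(
  tp  = paste0("Time Point ", c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2)),
  id  = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6),
  Cat = c("A", "A", "A", "B", "B", "B", "B", "A", NA, "B"),
  Num = c(1, 2, 3, 4, 4, 3, 3, 4, 0, NA),
  Ord = ordered(c("I", "II", "II", "III", "III", "III", "I", "III", "II", "I")),
  stringsAsFactors = FALSE
)

# Per-variable choice (assumed syntax): wrap the term in the test name
p_formula <- paired(tp ~ sign.test(Num) + Cat, data = dat, id = id)
summary(p_formula)

# Default for every numeric variable, set through the control function;
# signed.rank.exact = FALSE mirrors the calls earlier in this vignette
p_control <- paired(tp ~ Num + Ord, data = dat, id = id,
                    control = paired.control(numeric.test = "signed.rank",
                                             signed.rank.exact = FALSE))
summary(p_control)
```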

`summary.tableby` settings

Since a "paired" object inherits from "tableby", the `summary.tableby()` function is what actually formats and prints the table.

```r
args(arsenal:::summary.tableby)
```

```
## function (object, ..., labelTranslations = NULL, text = FALSE,
## title = NULL, pfootnote = FALSE, term.name = "")
## NULL
```
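The arguments listed above can be passed straight to `summary()`. A minimal sketch, rebuilding a small `p` so the block runs on its own and assuming that `digits` is forwarded through `...` to the formatting controls shared with `tableby()`:

```r
library(arsenal)

# Columns of `dat` needed here, copied from the example above
dat <- data.frame(
  tp  = paste0("Time Point ", c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2)),
  id  = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 6),
  Cat = c("A", "A", "A", "B", "B", "B", "B", "A", NA, "B"),
  Num = c(1, 2, 3, 4, 4, 3, 3, 4, 0, NA),
  stringsAsFactors = FALSE
)
p <- paired(tp ~ Cat + Num, data = dat, id = id)

# Add a title, footnote the tests used, and round statistics to 2 digits
s <- summary(p, title = "Paired summary", pfootnote = TRUE, digits = 2)
s

# text = TRUE prints plain text instead of markdown
summary(p, text = TRUE)
```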