Chapter 4 Using R

The R environment for statistical computing is a free, open-source platform for managing and visualizing data and performing analyses. Its capabilities can be augmented by downloading software packages from repositories such as the Comprehensive R Archive Network (CRAN) and, more recently, GitHub. RStudio is a powerful development environment that makes it easier to use R, expands its functionality, and allows the integration of other tools, such as Git, into an analyst’s workflow. If you are new to R or would like additional resources on using R in practice, see the appendix.

In addition to R and RStudio, we will also need to install RTools, also known as the R toolchain: the software tools needed to compile packages from source. Some of the packages we use may have to be compiled after being downloaded, and RTools provides all the software needed to do so. CRAN provides instructions on which version of RTools to use, depending on your version of R, and how to install it.

This section is primarily about how to use R and associated software packages. While this does include demonstration of how to implement statistical models and hypothesis tests, a larger discussion of their appropriate use and interpretation will take place in later sections.


4.1 Installing Packages

Once we have RTools installed, we are ready to install the required packages from CRAN and GitHub. The install.packages() function in R can be used to install packages from CRAN, R-Forge, BioC, and other repositories via the command line. In RStudio, users can also use the ‘Packages’ tab to see which packages are installed, their current versions, and whether or not they have been loaded into the workspace for use. If you have installed the devtools package, devtools::install_github() can be used to install packages from GitHub or other version control repositories.

If you would like to know which R packages are available for a given application or research area, the CRAN Task Views, including the CRAN Clinical Trials task view, are worth exploring.


4.1.1 Packages from CRAN

Below are some of the packages that we will use from CRAN, along with a brief description of their purposes:

  • devtools - A suite of tools for R package development
  • cobalt - Creating tables and plots for assessing covariate balance
  • knitr - Tools for literate programming: including code in reproducible reports
  • margins - Calculating marginal or partial effects from regression models
  • mgcv - Fitting generalized additive models
  • sandwich - Robust covariance matrix estimation
  • speff2trial - Semiparametric efficient estimation of treatment effects in a two-group randomized trial
  • survminer - Creating plots of time-to-event data
  • survRM2 - Calculating the restricted mean survival time (RMST) with and without covariate adjustment.
  • table1 - Creating simple tabulations in aggregate and by treatment arm
  • tidyverse - An ecosystem of packages for working with data
required_packages <-
  c("devtools",
    "cobalt",
    "knitr",
    "margins",
    "mgcv",
    "sandwich",
    "speff2trial"
    "survminer",
    "survRM2",
    "table1",
    "tidyverse"
  )

packages_to_install <-
  setdiff(
    x = required_packages,
    y = installed.packages(.Library)[, "Package"]
  )

install.packages(packages_to_install)

4.1.2 Packages from GitHub

We will also use the following packages from GitHub:

  • simul - Inference based on efficient influence functions and the multiplier bootstrap
  • adjrct - Doubly robust, efficient estimators for survival and time-to-event outcomes
devtools::install_github("nt-williams/simul")
devtools::install_github("nt-williams/adjrct")

4.1.3 Loading Packages into Workspace

Once a package has been successfully installed, we use the library() command to load it into the workspace for use:

library(tidyverse) # Data manipulation: dplyr, tidyr
library(table1) # Creation of Summary tables

It is possible for different packages to contain functions with the same name. For example, the table1 package contains the function table1(); other packages, such as the furniture package, also have a function named table1(). If both packages are loaded, it can be unclear which version is used when table1() is called. R warns when such conflicts arise, but as a best practice, it is useful to specify both the package and the function, as follows: table1::table1(). This makes code easier to read and reduces ambiguity.
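For example, both the stats and dplyr packages provide a function named filter(); prefixing the package name makes it explicit which one is meant:

# stats::filter() computes linear filters (e.g. a moving average), while
# dplyr::filter() subsets rows of a data frame: the prefix removes ambiguity
stats::filter(x = 1:10, filter = rep(x = 1/3, times = 3))
dplyr::filter(.data = mtcars, cyl == 6)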


4.2 Loading the Data: MISTIE-III

The data dictionary and more information about the MISTIE III study were presented earlier (Hanley et al. 2019). All of the data needed for the examples are available on the web. To load these, we create a file connection to the URL using the url() function, and then use read.csv() to read in the comma-separated values (CSV) file. To load data from a local file instead, the file.path() function is useful for constructing file paths (an example follows the code below). We start by loading the full simulated MISTIE III dataset; once the full data are loaded, we use dplyr::slice() to take the first 500 rows.

data_url <-
  paste0("https://github.com/jbetz-jhu/CovariateAdjustmentTutorial",
         "/raw/main/Simulated_MISTIE_III_v1.2.csv")

sim_miii_full <- read.csv(file = url(data_url))
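If the data file has been saved locally instead, file.path() can be used to construct the path in a platform-independent way. The path below is purely hypothetical; replace the directory and file name with the actual location of the file:

# Hypothetical example: load the CSV from a local folder instead of the URL
# local_data_path <-
#   file.path("~", "covariate_adjustment", "Simulated_MISTIE_III_v1.2.csv")
# sim_miii_full <- read.csv(file = local_data_path)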

# Read in data: Recast categorical variables as factors
sim_miii_full <-
  sim_miii_full %>% 
  dplyr::tibble() %>% 
  dplyr::mutate(
    # Convert variables from binary indicators to labeled categorical variables
    male =
      factor(
        x = male,
        levels = 0:1,
        labels = c("0. Female", "1. Male")
      ),
    across(
      .cols = 
        all_of(
          x = c("hx_cvd", "hx_hyperlipidemia",
                "on_anticoagulants", "on_antiplatelets")
        ),
      .fns = function(x) factor(x, levels = 0:1, labels = c("0. No", "1. Yes"))
    ),
    # Convert GCS and MRS variables from character data to categorical variables
    across(
      .cols = starts_with("gcs") | starts_with("mrs"),
      .fns = factor
    ),
    ich_location =
      factor(
        x = ich_location,
        levels = c("Deep", "Lobar")
      ),
    arm =
      factor(
        x = arm,
        levels = c("medical", "surgical")
      ),
    tx = 1*(arm == "surgical")
  )


# Take the first 500 rows
sim_miii <-
  sim_miii_full %>% 
  dplyr::slice(1:500)

Other useful functions include:

  • head()/tail() - Looking at the first/last \(n\) rows of a dataset
  • nrow()/ncol() - Counting the rows/columns of a dataset
  • colnames()/rownames() - Getting the column/row names of a dataset
head(sim_miii)
## # A tibble: 6 × 22
##   sim_participant_id   age male      hx_cvd hx_hyperlipidemia on_anticoagulants
##                <int> <int> <fct>     <fct>  <fct>             <fct>            
## 1                  1    52 1. Male   0. No  0. No             0. No            
## 2                  2    74 0. Female 0. No  1. Yes            1. Yes           
## 3                  3    62 1. Male   0. No  1. Yes            0. No            
## 4                  4    81 0. Female 0. No  0. No             0. No            
## 5                  5    73 0. Female 0. No  1. Yes            0. No            
## 6                  6    60 1. Male   1. Yes 1. Yes            0. No            
## # ℹ 16 more variables: on_antiplatelets <fct>, ich_location <fct>,
## #   ich_s_volume <dbl>, ivh_s_volume <int>, gcs_category <fct>, arm <fct>,
## #   ich_eot_volume <dbl>, mrs_30d <fct>, mrs_30d_complete <fct>,
## #   mrs_180d <fct>, mrs_180d_complete <fct>, mrs_365d <fct>,
## #   mrs_365d_complete <fct>, days_on_study <int>, died_on_study <int>, tx <dbl>
nrow(sim_miii)
## [1] 500
ncol(sim_miii)
## [1] 22
colnames(sim_miii)
##  [1] "sim_participant_id" "age"                "male"              
##  [4] "hx_cvd"             "hx_hyperlipidemia"  "on_anticoagulants" 
##  [7] "on_antiplatelets"   "ich_location"       "ich_s_volume"      
## [10] "ivh_s_volume"       "gcs_category"       "arm"               
## [13] "ich_eot_volume"     "mrs_30d"            "mrs_30d_complete"  
## [16] "mrs_180d"           "mrs_180d_complete"  "mrs_365d"          
## [19] "mrs_365d_complete"  "days_on_study"      "died_on_study"     
## [22] "tx"

4.3 Assessing Baseline Balance

While randomization will tend to produce treatment groups that are similar on both measured and unmeasured factors, there will always be some degree of imbalance between groups in some characteristics. It is important to remember that these differences only represent confounding if groups are imbalanced on variables that are associated with the outcome. To assess the degree of imbalance, we can tabulate characteristics by treatment arm, and compute standardized differences to get a scale-free measure of the magnitude of imbalance.

table1(
  ~ age + male +
    on_antiplatelets + ich_location + ich_s_volume + ivh_s_volume +
    gcs_category | arm, 
  data = sim_miii
)
                       medical             surgical            Overall
                       (N=250)             (N=250)             (N=500)
age
  Mean (SD)            60.6 (12.9)         60.1 (12.5)         60.3 (12.7)
  Median [Min, Max]    62.0 [28.0, 85.0]   61.0 [35.0, 88.0]   62.0 [28.0, 88.0]
male
  0. Female            100 (40.0%)         94 (37.6%)          194 (38.8%)
  1. Male              150 (60.0%)         156 (62.4%)         306 (61.2%)
on_antiplatelets
  0. No                179 (71.6%)         164 (65.6%)         343 (68.6%)
  1. Yes               71 (28.4%)          86 (34.4%)          157 (31.4%)
ich_location
  Deep                 155 (62.0%)         159 (63.6%)         314 (62.8%)
  Lobar                95 (38.0%)          91 (36.4%)          186 (37.2%)
ich_s_volume
  Mean (SD)            47.9 (17.3)         47.4 (16.2)         47.7 (16.7)
  Median [Min, Max]    45.9 [15.2, 112]    47.0 [12.9, 106]    46.3 [12.9, 112]
ivh_s_volume
  Mean (SD)            2.46 (3.96)         2.09 (3.40)         2.28 (3.69)
  Median [Min, Max]    0 [0, 32.0]         0 [0, 18.0]         0 [0, 32.0]
gcs_category
  1. Severe (3-8)      71 (28.4%)          63 (25.2%)          134 (26.8%)
  2. Moderate (9-12)   111 (44.4%)         102 (40.8%)         213 (42.6%)
  3. Mild (13-15)      68 (27.2%)          85 (34.0%)          153 (30.6%)

This allows us to compare the means, medians, frequencies, and ranges of variables between groups.
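The cobalt package computes standardized differences to put imbalance for all covariates on a common scale. One common definition for a continuous covariate is the difference in arm means divided by the pooled standard deviation (binary covariates use the analogous formula based on sample proportions; cobalt offers several choices for the denominator):

\[ d = \frac{\bar{x}_{1} - \bar{x}_{0}}{\sqrt{\left(s^{2}_{1} + s^{2}_{0}\right)/2}} \]

where \(\bar{x}_{1}\) and \(\bar{x}_{0}\) are the sample means and \(s^{2}_{1}\) and \(s^{2}_{0}\) the sample variances in the surgical and medical arms, respectively.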

library(cobalt)

cobalt::bal.tab(
  x = 
    # Only tabulate baseline variables
    sim_miii %>% 
    dplyr::select(
      dplyr::all_of(
        x = c("age", "male", "on_antiplatelets",
              "ich_location", "ich_s_volume", "ivh_s_volume", "gcs_category")
      )
    ),
  treat = sim_miii$arm,
  # Compute standardized differences for both binary and continuous variables
  binary = "std",
  continuous = "std"
)
## Balance Measures
##                                    Type Diff.Un
## age                             Contin. -0.0362
## male_1. Male                     Binary  0.0493
## on_antiplatelets_1. Yes          Binary  0.1295
## ich_location_Lobar               Binary -0.0331
## ich_s_volume                    Contin. -0.0292
## ivh_s_volume                    Contin. -0.1019
## gcs_category_1. Severe (3-8)     Binary -0.0723
## gcs_category_2. Moderate (9-12)  Binary -0.0729
## gcs_category_3. Mild (13-15)     Binary  0.1480
## 
## Sample sizes
##     medical surgical
## All     250      250

4.4 Visualizing Data

One of the many strengths of R is the powerful and flexible data visualization tools available in its software ecosystem: see the R Graph Gallery for some examples. The ggplot2 package, along with related packages such as survminer, is useful for assessing baseline balance and visualizing outcome data.

For example, we may want to assess the cumulative distribution of a baseline covariate instead of just checking the summary statistics.

library(ggplot2)

ggplot(
  data = sim_miii,
  aes(
    x = age
  )
) +
  # Empirical CDF of age: overall
  stat_ecdf(
    alpha = 0.6
  ) +
  # Empirical CDF of age: by treatment arm
  stat_ecdf(
    aes(color = arm),
    alpha = 0.6
  ) +
  theme_bw()

Or we may want to create a plot of the Kaplan-Meier estimate of the survival function (see the estimands section for its definition, and the time-to-event section for further discussion).

library(survival)
library(survminer)

# Create a 'survival object' from event times and indicators
miii_surv <- 
  with(sim_miii,
       survival::Surv(
         time = days_on_study,
         event = died_on_study
       )
  )

# Use survfit to calculate survival data
time_to_death_km <-
  survival::survfit(
    formula = miii_surv ~ arm,
    data = sim_miii
  )

# Create the Kaplan-Meier Plot
survminer::ggsurvplot(
  fit = time_to_death_km,
  conf.int = TRUE,
  risk.table = TRUE,
  xlab = "Days", 
  ylab = "Survival probability"
)

From this plot, we can see that most of the mortality occurs soon after randomization, with greater mortality in the medical management arm early in the trial. After the first 90 days, the rate of events decreases in both arms.


4.5 Fitting Regression Models

While covariate adjustment does involve regression modeling, an in-depth discussion of regression modeling is not needed for the purposes of implementing covariate adjustment. For a more in-depth presentation of fitting generalized linear models (GLMs) in R, see Dobson & Barnett (2018).

4.5.1 Generalized Linear Model

Fitting a GLM in R is done using the glm() function. For example, if we wanted to model the probability of being assigned to the surgical arm in MISTIE III as a function of an individual’s age, ICH volume, and IVH volume, the code is as follows:

pr_mis_glm <-
  glm(
    formula =
      tx ~ age + ich_s_volume + ivh_s_volume,
    data = sim_miii,
    family = binomial(link = "logit")
  )

Once the model has been fit, we can use it for creating summary tables, calculating confidence intervals, or generating fitted values for a new dataset:

# Produce GLM Summary Table
summary(pr_mis_glm)
## 
## Call:
## glm(formula = tx ~ age + ich_s_volume + ivh_s_volume, family = binomial(link = "logit"), 
##     data = sim_miii)
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)
## (Intercept)   0.3090527  0.5028292   0.615    0.539
## age          -0.0034148  0.0070781  -0.482    0.629
## ich_s_volume -0.0008168  0.0054271  -0.151    0.880
## ivh_s_volume -0.0282560  0.0249439  -1.133    0.257
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 693.15  on 499  degrees of freedom
## Residual deviance: 691.58  on 496  degrees of freedom
## AIC: 699.58
## 
## Number of Fisher Scoring iterations: 3
# Calculate Confidence Intervals for Coefficients
confint(pr_mis_glm)
##                    2.5 %     97.5 %
## (Intercept)  -0.67578350 1.29885632
## age          -0.01732742 0.01045830
## ich_s_volume -0.01148661 0.00983616
## ivh_s_volume -0.07808960 0.02017825
# Calculate fitted probabilities for new data:
# 1. 65 years old, 30 mL ICH, 0 mL IVH
# 2. 70 years old, 50 mL ICH, 15 mL IVH
predict(
  object = pr_mis_glm,
  newdata = 
    data.frame(
      age = c(65, 70),
      ich_s_volume = c(30, 50),
      ivh_s_volume = c(0, 15)
    ),
  type = "response"
)
##         1         2 
## 0.5156414 0.4025950

4.5.2 Logrank Test and Cox Proportional Hazards Model

For time-to-event outcomes, such as mortality, the logrank test and Cox Proportional Hazards (PH) model are commonly used analytic approaches. Using the survival object miii_surv created earlier with survival::Surv(), we can perform the logrank test (using survival::survdiff()) and fit the Cox PH model (using survival::coxph()).

mortality_logrank <-
  survival::survdiff(
    formula = miii_surv ~ arm,
    data = sim_miii
  )

mortality_logrank
## Call:
## survival::survdiff(formula = miii_surv ~ arm, data = sim_miii)
## 
##                N Observed Expected (O-E)^2/E (O-E)^2/V
## arm=medical  250       64     49.6      4.19      8.06
## arm=surgical 250       40     54.4      3.82      8.06
## 
##  Chisq= 8.1  on 1 degrees of freedom, p= 0.005

For the Cox PH model, the ties = "efron" argument specifies how tied survival times are handled, and robust = TRUE requests robust estimates of the covariance matrix of the regression coefficients:

mortality_cox <-
  survival::coxph(
    formula = miii_surv ~ arm,
    data = sim_miii,
    ties = "efron",
    robust = TRUE
  )

summary(mortality_cox)
## Call:
## survival::coxph(formula = miii_surv ~ arm, data = sim_miii, ties = "efron", 
##     robust = TRUE)
## 
##   n= 500, number of events= 104 
## 
##                coef exp(coef) se(coef) robust se      z Pr(>|z|)   
## armsurgical -0.5670    0.5672   0.2016    0.1995 -2.841  0.00449 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##             exp(coef) exp(-coef) lower .95 upper .95
## armsurgical    0.5672      1.763    0.3836    0.8387
## 
## Concordance= 0.575  (se = 0.024 )
## Likelihood ratio test= 8.16  on 1 df,   p=0.004
## Wald test            = 8.07  on 1 df,   p=0.004
## Score (logrank) test = 8.12  on 1 df,   p=0.004,   Robust = 8.03  p=0.005
## 
##   (Note: the likelihood ratio and score tests assume independence of
##      observations within a cluster, the Wald and robust score tests do not).

This model assumes that the ratio of the hazard rates of events between the treatment and control arms is approximately constant over time. We can assess this assumption using the survival::cox.zph() function:

# Test Proportionality Assumption: Schoenfeld Residuals
cox.zph(fit = mortality_cox)
##        chisq df      p
## arm     9.85  1 0.0017
## GLOBAL  9.85  1 0.0017
# Plot smoothed residuals
plot(cox.zph(fit = mortality_cox))
abline(h = 0, col = "red")

From the plot, we can see that the estimated (log) hazard ratio is initially negative, but increases towards zero as time goes on. It appears that the treatment has a beneficial effect on mortality in the first few weeks after randomization, which then diminishes. This is a violation of the proportional hazards assumption, which is further discussed in the chapter on estimands.


4.6 Variance Estimation

4.6.1 Robust Standard Errors

Most of the standard errors reported in software, such as those returned by vcov(), are model-based estimates of the standard error that assume the model is correctly specified; when the model is misspecified, these estimates can be inaccurate. Robust or “sandwich” standard errors can be used to obtain a consistent estimate of the standard error even under model misspecification. Note that robust estimates of standard errors are different from robust estimates of the regression coefficients themselves.

The sandwich::vcovHC() function can be used to obtain different types of robust standard errors:

library(sandwich)

# Model-based standard errors
vcov(object = pr_mis_glm)
##               (Intercept)           age  ich_s_volume  ivh_s_volume
## (Intercept)   0.252837171 -2.993310e-03 -1.285278e-03 -1.316243e-03
## age          -0.002993310  5.009960e-05 -1.254035e-06  1.346355e-05
## ich_s_volume -0.001285278 -1.254035e-06  2.945316e-05 -1.887093e-05
## ivh_s_volume -0.001316243  1.346355e-05 -1.887093e-05  6.221973e-04
# Robust standard errors
sandwich::vcovHC(x = pr_mis_glm, type = "HC3")
##               (Intercept)           age  ich_s_volume  ivh_s_volume
## (Intercept)   0.263262683 -3.112090e-03 -1.349840e-03 -0.0014409131
## age          -0.003112090  5.179135e-05 -9.534434e-07  0.0000144568
## ich_s_volume -0.001349840 -9.534434e-07  3.044098e-05 -0.0000171047
## ivh_s_volume -0.001440913  1.445680e-05 -1.710470e-05  0.0006168577

These can be passed as an argument to other functions for computing confidence intervals for contrasts and marginal means.
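For example, a robust covariance matrix can be supplied to functions in the lmtest package to obtain coefficient tests and confidence intervals based on the sandwich estimator. This is a minimal sketch, and assumes the lmtest package has been installed (it is not among the packages installed above):

library(lmtest)

# Wald tests of the GLM coefficients using the HC3 robust covariance
# estimate in place of the model-based covariance
lmtest::coeftest(
  x = pr_mis_glm,
  vcov. = sandwich::vcovHC(x = pr_mis_glm, type = "HC3")
)

# Corresponding confidence intervals using the same robust covariance
lmtest::coefci(
  x = pr_mis_glm,
  vcov. = sandwich::vcovHC(x = pr_mis_glm, type = "HC3")
)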


4.6.2 Bootstrap Estimator

The bootstrap procedure uses resampling to obtain an estimate of the variance of the sampling distribution of an estimator. In R, this is done using the boot package, which is distributed with R as a recommended package.

First, we need to write a function that computes the statistic of interest. In this case, we will bootstrap the Mann-Whitney U statistic: see the estimands chapter for more information on this estimand. The first argument to this function must be the data, the second argument should be a vector of indices defining the bootstrap sample, and any other arguments can be supplied thereafter.

# 1. Write a function that produces the test statistic:
wilcox_to_auc <-
  function(data, indices = NULL, formula){
    # Input data must be a data.frame
    if(!all(class(data) == "data.frame")){
      stop("`data` must be a data.frame: use `as.data.frame()` for a tibble.")
    }
    
    # If bootstrap indices not supplied, use entire dataset
    if(is.null(indices)) indices <- 1:nrow(data)
    
    # Extract Outcome/Treatment from Formula
    outcome <- all.vars(update(formula, . ~ 0))
    treatment <- all.vars(update(formula, 0 ~ .))
    stopifnot(length(treatment) == 1)
    
    # Convert outcome to numeric using levels: Assumes levels are ordered
    if(!is.numeric(data[, outcome])){
      data[, outcome] <- as.numeric(data[, outcome])
    }
    
    # Run Wilcoxon Rank Sum on data using the bootstrap indices
    wrst_result <-
      wilcox.test(
        formula = formula,
        data = data[indices,]
      )
    
    # Compute AUC statistic
    return(wrst_result$statistic/prod(table(data[indices, treatment])))
  }

Now, we call this function using boot::boot(), passing along any additional arguments it requires: boot resamples the data, passes the indices to our function, and collects the results. From these results, we can calculate the standard error and other summaries of the bootstrap distribution:

library(boot)

# Perform 10,000 bootstraps of data
n_boot_samples <- 10000

mrs_365d_auc_boot <-
  boot::boot(
    data = as.data.frame(sim_miii),
    statistic = wilcox_to_auc,
    R = n_boot_samples,
    formula = mrs_365d ~ arm
  )

Once the bootstrap procedure is complete, we can view and summarize the results.

# Bootstrap Standard Error
sd(mrs_365d_auc_boot$t[,1])
## [1] 0.02515884
# Bootstrap Variance (SE^2)
var(mrs_365d_auc_boot$t[,1])
## [1] 0.000632967
# Produce a histogram and quantile-quantile plot
plot(mrs_365d_auc_boot)

Confidence intervals can be computed using boot::boot.ci:

mrs_365d_auc_boot_ci <-
  boot::boot.ci(
    boot.out = mrs_365d_auc_boot,
    conf = 0.95,
    type = "bca",
    index = 1
  )

The results can be printed out or extracted for use elsewhere:

# boot.ci result
mrs_365d_auc_boot_ci
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 10000 bootstrap replicates
## 
## CALL : 
## boot::boot.ci(boot.out = mrs_365d_auc_boot, conf = 0.95, type = "bca", 
##     index = 1)
## 
## Intervals : 
## Level       BCa          
## 95%   ( 0.5022,  0.6005 )  
## Calculations and Intervals on Original Scale
# Extract the lower/upper confidence limits
mrs_365d_auc_boot_ci_result <-
  tail(x = mrs_365d_auc_boot_ci$bca[1,], n = 2)

# Print out result
mrs_365d_auc_boot_ci_result
##                     
## 0.5022205 0.6005486

The most straightforward way to calculate a p-value using the bootstrap involves finding the smallest value of \(\alpha\) at which the \(100(1 - \alpha)\%\) confidence interval does not contain the null value, resulting in a rejection of the null hypothesis. While not implemented in boot, this can be done using a binary search algorithm for the values of \(\alpha\) typically used in practice: \(0.05 \ge \alpha \ge 0.0001\).
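Below is a minimal sketch of this approach for the AUC estimand above, using a null value of 0.5. The function name boot_p_value and its arguments are purely illustrative, and reliable intervals at very small values of \(\alpha\) require a large number of bootstrap replicates.

# Bisection search for the smallest alpha at which the 100(1 - alpha)%
# BCa interval excludes the null value
boot_p_value <-
  function(boot_out, null_value = 0.5, index = 1,
           alpha_min = 0.0001, alpha_max = 0.05, tolerance = 0.0001){
    
    excludes_null <- function(alpha){
      ci <-
        boot::boot.ci(
          boot.out = boot_out, conf = 1 - alpha, type = "bca", index = index
        )
      limits <- tail(x = ci$bca[1, ], n = 2)
      (null_value < limits[1]) | (null_value > limits[2])
    }
    
    # Even the widest interval excludes the null: p-value below alpha_min
    if(excludes_null(alpha_min)) return(paste("<", alpha_min))
    # The narrowest interval contains the null: p-value above alpha_max
    if(!excludes_null(alpha_max)) return(paste(">", alpha_max))
    
    # Bisection: `lower` contains the null, `upper` excludes it
    lower <- alpha_min
    upper <- alpha_max
    while((upper - lower) > tolerance){
      midpoint <- (lower + upper)/2
      if(excludes_null(midpoint)) upper <- midpoint else lower <- midpoint
    }
    
    upper
  }

# P-value for the null hypothesis that the AUC (Mann-Whitney estimand) is 0.5
boot_p_value(boot_out = mrs_365d_auc_boot, null_value = 0.5)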


4.7 Addressing Multiplicity

For pivotal trials, it is often required to have an analysis plan that controls the probability of rejecting at least one truly null hypothesis, also known as the familywise type I error rate (FWER). When there is more than one primary comparison of interest, due to multiple pairwise treatment contrasts, endpoints, sequential analyses, or timepoints, a strategy must be used to control the FWER at a pre-specified level.
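As a simple illustration of the general idea (separate from the methods discussed below), the base R function p.adjust() applies standard adjustments, such as the Bonferroni and Holm procedures, to a set of p-values; the p-values here are purely hypothetical:

# Hypothetical unadjusted p-values from three primary comparisons
unadjusted_p <- c(0.012, 0.034, 0.210)

# Bonferroni and Holm adjustments control the FWER: reject a hypothesis when
# its adjusted p-value is below the pre-specified level (e.g. 0.05)
p.adjust(p = unadjusted_p, method = "bonferroni")
p.adjust(p = unadjusted_p, method = "holm")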

A later chapter is devoted to Group Sequential Designs, a methodology for doing pre-planned analyses that allow stopping a randomized trial early for success or futility. When all hypotheses are tested simultaneously, the multcomp package allows for decisions and confidence intervals that appropriately control the FWER. When there is a specific ordering to how hypotheses are tested, the gMCP package implements graphical methods for addressing multiple comparisons. We will focus mainly on controlling the FWER for one pairwise treatment comparison on a single endpoint, with one or more pre-planned analyses.

References

Dobson, Annette J., and Adrian G. Barnett. 2018. An Introduction to Generalized Linear Models, Fourth Edition. Chapman; Hall/CRC. https://doi.org/10.1201/9781315182780.
Hanley, Daniel F, Richard E Thompson, Michael Rosenblum, Gayane Yenokyan, Karen Lane, Nichol McBee, Steven W Mayo, et al. 2019. “Efficacy and Safety of Minimally Invasive Surgery with Thrombolysis in Intracerebral Haemorrhage Evacuation (MISTIE III): A Randomised, Controlled, Open-Label, Blinded Endpoint Phase 3 Trial.” The Lancet 393 (10175): 1021–32. https://doi.org/10.1016/s0140-6736(19)30195-3.