In a stepped wedge group-randomized trial (SWGRT), also called a stepped wedge cluster-randomized trial, groups or clusters begin the study in the control condition, are randomly assigned to sequences, and cross over to the intervention condition at predetermined time points in a sequential, staggered fashion until all groups or clusters receive the intervention.
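The staggered layout described above can be sketched as a small matrix of sequences by time periods. This is a minimal illustration only. It assumes one baseline period in which every sequence is in the control condition, one sequence crossing over at each later period, and (near-)equal numbers of groups per sequence. The function names and the group labels are hypothetical.

```python
import random


def stepped_wedge_design(n_sequences):
    """Return a sequences x periods matrix of 0/1 values (1 = intervention).

    Assumes one all-control baseline period and one sequence crossing
    over at each subsequent period, so there are n_sequences + 1 periods.
    """
    n_periods = n_sequences + 1
    return [
        [1 if period > seq else 0 for period in range(n_periods)]
        for seq in range(n_sequences)
    ]


def assign_groups_to_sequences(groups, n_sequences, seed=None):
    """Randomly assign groups to sequences in (near-)equal numbers."""
    rng = random.Random(seed)
    shuffled = list(groups)
    rng.shuffle(shuffled)
    # Deal the shuffled groups out to the sequences in turn.
    return {group: i % n_sequences for i, group in enumerate(shuffled)}


design = stepped_wedge_design(3)
for row in design:
    print(row)
# [0, 1, 1, 1]
# [0, 0, 1, 1]
# [0, 0, 0, 1]

# Hypothetical group labels; the seed only makes the example repeatable.
assignment = assign_groups_to_sequences(["A", "B", "C", "D", "E", "F"], 3, seed=1)
```

Each row is a sequence and each column a time period. Every row starts at 0, ends at 1, and never switches back, which is the defining one-directional cross-over of the design.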

Special methods are needed for analysis and sample size estimation for these studies, as detailed below and in the SWGRT sample size calculator.

### Features and Uses

#### Staggered, Sequential Cross-Over

A SWGRT is a trial in which groups cross over to the intervention condition at predetermined time points in a sequential, staggered fashion until all groups receive the intervention. The design has been used when limited resources or a large geographical area prevent the use of a conventional parallel GRT. It has also been used when staff training requirements in a clinical care intervention necessitated phased implementation, and to improve power when only a limited number of groups was available.

#### NIH Webinars

- Methods: Mind the Gap Webinar: Robust Inference for Stepped Wedge Designs
- Methods: Mind the Gap Webinar: When is the Stepped Wedge Study a Good Study Design Choice?
- Methods: Mind the Gap Webinar: Does it Decay? Decaying Correlations in the Design and Analysis of Stepped Wedge Trials
- Methods: Mind the Gap Webinar: Overview of Statistical Models for the Design and Analysis of Stepped Wedge Cluster Randomized Trials

#### Nested or Hierarchical Design

In a SWGRT, members are nested within groups or clusters so that each member appears in only one group or cluster. In cross-sectional SWGRTs, different members are observed in each group at each measurement occasion; in closed cohort SWGRTs, the same members are observed repeatedly so that measurements are nested within members; in open cohort SWGRTs, some members are observed in only one time period and others are observed during multiple time periods.

#### Appropriate Use

SWGRTs have become more popular over time, but the design has a greater risk of bias than conventional parallel GRTs. Therefore, choosing a SWGRT over more conventional alternatives requires strong justification. Hemming and Taljaard (2020) provide a non-exhaustive list of broad justifications, indicating that it may be appropriate to use a SWGRT if:

- it provides a randomized trial when the only alternative is a staggered non-randomized trial and stakeholders can be convinced to randomly assign treatment order,
- it increases the likelihood that gatekeepers and stakeholders will enroll groups in the study due to receiving perceived benefits of the intervention while the trial is ongoing,
- staggered, sequential delivery of the intervention is the only logistically feasible design, or
- limited groups or resources are available and a SWGRT can attain the desired statistical power when a parallel GRT cannot.

#### Potential for Confounding

While a SWGRT tends to involve a limited number of groups, the impact of chance imbalances may be minimal because each group is exposed to both the control and intervention conditions. Chance imbalances may still occur. Stratified or constrained randomization on important group-level characteristics has been shown to improve power and maintain type I error rates in parallel GRTs, and we might expect similar effects in a SWGRT. In a SWGRT, restricted randomization would be applied when the groups or clusters are randomized to sequences.

As time progresses, more groups implement the intervention condition. Therefore, time always has the potential to be a confounder in the relationship between the outcome and the intervention condition. To guard against this, time must be accounted for in the SWGRT design and analysis.

#### Within- and Between-Period Correlations

Important factors in determining the sample size for a SWGRT are the intraclass correlation (ICC), the cluster autocorrelation (CAC), and the individual autocorrelation (IAC). These quantities describe the similarity among outcome values due to correlation within groups or clusters at the same time and due to repeated measurements on the same groups or clusters or on the same members.

The ICC measures the similarity among values on the outcome variable for different members of the same group or cluster within a given time period. It is often described as the average correlation among members within the same group or cluster and within the same time period or as the proportion of variance due to group or cluster membership. The CAC is the correlation between the population means from the same group or cluster at two different time periods; it is sometimes called over-time correlation at the group level. The CAC is present in cross-sectional, closed cohort, and open cohort designs. The IAC is the correlation on the outcome variable for the same individual at two different time periods; it is sometimes called over-time correlation at the member level. The IAC is present only in closed and open cohort designs.
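As a rough illustration of how these three quantities combine, the sketch below returns the implied correlation between two outcome measurements taken in the same group. It assumes one common parametrization in which the group-level share of variance (the ICC) carries over time at the CAC and the member-level share (1 − ICC) carries over time at the IAC. Both that parametrization and the optional `decay` flag, which shrinks the over-time correlations with the number of periods between measurements, are assumptions of this sketch rather than definitions taken from the text. The function name and the numeric values are hypothetical.

```python
def pair_correlation(icc, cac, iac, same_member, lag, decay=False):
    """Correlation between two outcome measurements in the same group.

    lag         : number of periods separating the measurements (0 = same period)
    same_member : True if both measurements are on the same member
                  (only possible across periods in cohort designs)
    decay       : if True, over-time correlations shrink as cac**lag and
                  iac**lag (an illustrative discrete-time decay)
    """
    if lag == 0:
        # Same period: a measurement correlates perfectly with itself,
        # and two different members correlate at the ICC.
        return 1.0 if same_member else icc
    power = lag if decay else 1
    # Group-level variation shared across the two periods.
    group_part = icc * cac ** power
    if not same_member:
        return group_part
    # Member-level variation shared across the two periods (assumed form).
    member_part = (1 - icc) * iac ** power
    return group_part + member_part


icc, cac, iac = 0.05, 0.8, 0.6  # hypothetical planning values

print(pair_correlation(icc, cac, iac, same_member=False, lag=0))  # different members, same period
print(pair_correlation(icc, cac, iac, same_member=False, lag=1))  # different members, adjacent periods
print(pair_correlation(icc, cac, iac, same_member=True, lag=1))   # same member, adjacent periods
print(pair_correlation(icc, cac, iac, same_member=False, lag=3, decay=True))
```

With `decay=False` the between-period correlation is the same for every pair of distinct periods. With `decay=True` it becomes smaller as the periods move further apart.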

A characteristic of longitudinal GRTs such as SWGRTs is that the CAC and IAC can be treated as functions of the time periods being compared, with values that decay as the periods grow further apart. Failing to account for such decay in SWGRTs can result in increased type I error rates. Many correlation structures are possible, including block-exchangeable and discrete-time decay structures.

#### Intervention Effect Heterogeneity

It is common in SWGRTs to assume an instantaneous and sustained intervention effect when a group transitions to the intervention condition. However, there may be intervention effect heterogeneity as a function of exposure time. For example, the intervention effect may initially be weak, strengthen quickly in the first few time periods of exposure, and then slowly decay. Without evidence to the contrary, it is prudent to assume intervention effect heterogeneity will be present. Not accounting for intervention effect heterogeneity when it is present can result in severely biased estimates of the intervention effects and standard errors. The SWGRT sample size calculator implements the approach taken by Kenny et al. for cross-sectional designs and Hughes et al. for cohort designs.

In the heterogeneous intervention effect setting, the intervention effect changes with time, requiring greater care in defining the estimand of interest. Kenny et al. describe three estimands of interest. The first is the average intervention effect across all exposure periods, or time average treatment effect (TATE). The second is the intervention effect at a specific time period, or point treatment effect (PTE). The third is the long-term treatment effect (LTE), or the intervention effect at a later time point when the intervention effect is thought to have stabilized. If this later time point corresponds to the final time period in the trial, then the LTE can be estimated with the PTE for the final time period. The choice of estimand depends largely on scientific relevance and investigator interest, but in general tests for the TATE will be more powerful than those for the PTE or LTE. In addition, the LTE assumes that long-term effects are attained by the end of the trial.
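To make the three estimands concrete, the sketch below evaluates them on a known, hypothetical exposure-time effect curve. It defines the target quantities only and is not an estimator for trial data. It treats exposure time as discrete periods, takes the TATE as a simple unweighted average, and takes the LTE as the effect in the final exposure period. These simplifications, the function names, and the numbers are all assumptions of the sketch.

```python
def tate(effect_curve):
    """Time average treatment effect: mean effect over all exposure periods."""
    return sum(effect_curve) / len(effect_curve)


def pte(effect_curve, exposure_period):
    """Point treatment effect at a given exposure period (1-indexed)."""
    return effect_curve[exposure_period - 1]


def lte(effect_curve):
    """Long-term effect, taken here as the effect in the final exposure period."""
    return effect_curve[-1]


# Hypothetical true effects after 1, 2, ..., 5 periods of exposure:
# weak at first, rising quickly, then settling at a slightly lower level.
effect_curve = [0.10, 0.40, 0.50, 0.45, 0.45]

print("TATE:", tate(effect_curve))
print("PTE at exposure period 3:", pte(effect_curve, 3))
print("LTE:", lte(effect_curve))
```

On this curve the three estimands differ, so an analysis that assumes a single immediate and constant effect would not be targeting any one of them cleanly.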

#### Solutions

The recommended solutions to these challenges are to 1) employ stratified or constrained randomization techniques to balance important cluster-level covariates when assigning groups to sequences, 2) account for time in the study design and analysis, 3) assume intervention effect heterogeneity is present in the absence of evidence to the contrary, and 4) estimate the sample size for SWGRTs based on realistic and data-based estimates of within- and between-period correlations and other parameters indicated by the analytic plan. Extra variation and limited degrees of freedom always reduce power, so it is essential to consider these factors while the study is being planned, and particularly as part of the estimation of sample size.
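The need to account for time (solution 2) can be illustrated with a small noise-free example. The sketch assumes three sequences, four periods, a linear secular trend, and a constant intervention effect. It then compares a naive contrast of all intervention cells against all control cells with a contrast made within each period. The within-period contrast is used here only as a simple way to hold time fixed. It is not presented as the recommended analysis, and all names and values are hypothetical.

```python
from statistics import mean


def stepped_wedge_design(n_sequences):
    """Sequences x periods matrix of 0/1 values (1 = intervention)."""
    return [
        [1 if period > seq else 0 for period in range(n_sequences + 1)]
        for seq in range(n_sequences)
    ]


def simulate_means(design, trend, effect):
    """Noise-free cell means: secular trend plus a constant intervention effect."""
    return [
        [trend * period + effect * x for period, x in enumerate(row)]
        for row in design
    ]


def naive_effect(design, outcomes):
    """Mean of all intervention cells minus mean of all control cells."""
    cells = [(x, y) for d_row, o_row in zip(design, outcomes)
             for x, y in zip(d_row, o_row)]
    treated = [y for x, y in cells if x == 1]
    control = [y for x, y in cells if x == 0]
    return mean(treated) - mean(control)


def within_period_effect(design, outcomes):
    """Average of intervention-minus-control contrasts made within each period."""
    diffs = []
    for period in range(len(design[0])):
        treated = [o[period] for d, o in zip(design, outcomes) if d[period] == 1]
        control = [o[period] for d, o in zip(design, outcomes) if d[period] == 0]
        if treated and control:  # skip periods with only one condition
            diffs.append(mean(treated) - mean(control))
    return mean(diffs)


design = stepped_wedge_design(3)
# True intervention effect is zero, but outcomes rise by 1.0 per period.
outcomes = simulate_means(design, trend=1.0, effect=0.0)

print("naive:", naive_effect(design, outcomes))
print("within-period:", within_period_effect(design, outcomes))
```

Because intervention cells are concentrated in later periods, the naive contrast attributes the secular trend to the intervention even though the true effect is zero. The within-period contrast does not, because each comparison holds time fixed.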