Stepped Wedge Group-Randomized Trials (SWGRTs)

In a stepped wedge group-randomized trial (SWGRT), also called a stepped wedge cluster-randomized trial, groups or clusters begin the study in the control condition, are randomly assigned to sequences, and cross over to the intervention condition at predetermined time points in a sequential, staggered fashion until all groups or clusters receive the intervention (Copas et al., 2015; Grantham et al., 2019; Hemming et al., 2015b; Hemming et al., 2020; Hemming and Taljaard, 2016; Hooper et al., 2016; Hughes et al., 2015; Hussey and Hughes, 2007; Kasza et al., 2019a; Kasza et al., 2020; Li et al., 2021; Nickless et al., 2018).

Special methods are needed for analysis and sample size estimation for these studies, as detailed below and in the SWGRT sample size calculator.

Features and Uses

Staggered, Sequential Cross-Over

A SWGRT is a trial in which groups cross over to the intervention condition at predetermined time points in a sequential, staggered fashion until all groups receive the intervention. The design has been used when limited resources or a large geographical area prevent the use of a conventional parallel GRT (Hall et al., 1987).

The design has also been used when staff training requirements in a clinical care intervention necessitated phased implementation (Moulton et al., 2007). It has also been employed to improve power when a limited number of groups was available (Peden et al., 2019).
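The staggered cross-over can be written as a table of treatment indicators, with one row per sequence and one column per time period. The sketch below assumes the common layout with one more period than sequences and an equal number of clusters per sequence; the function names and cluster labels are hypothetical, chosen for illustration only.

```python
import random
from typing import Dict, List, Optional


def stepped_wedge_schedule(n_sequences: int) -> List[List[int]]:
    """Treatment indicators: one row per sequence, one column per period.

    Assumes n_sequences + 1 periods: every sequence starts in control (0),
    sequence s crosses over at the start of period s + 1, and it then stays
    in the intervention condition (1) for the rest of the trial.
    """
    n_periods = n_sequences + 1
    return [[1 if period > seq else 0 for period in range(n_periods)]
            for seq in range(n_sequences)]


def assign_clusters(clusters: List[str], n_sequences: int,
                    seed: Optional[int] = None) -> Dict[str, int]:
    """Randomly assign an equal number of clusters to each sequence."""
    if len(clusters) % n_sequences:
        raise ValueError("this sketch assumes clusters divide evenly among sequences")
    per_sequence = len(clusters) // n_sequences
    shuffled = list(clusters)
    random.Random(seed).shuffle(shuffled)
    return {cluster: i // per_sequence for i, cluster in enumerate(shuffled)}


schedule = stepped_wedge_schedule(4)
for row in schedule:
    print(row)
# [0, 1, 1, 1, 1]
# [0, 0, 1, 1, 1]
# [0, 0, 0, 1, 1]
# [0, 0, 0, 0, 1]

# Number of sequences still in the control condition in each period.
print([sum(1 - row[p] for row in schedule) for p in range(len(schedule[0]))])
# [4, 3, 2, 1, 0]

# Hypothetical cluster labels; two clusters end up in each of the four sequences.
allocation = assign_clusters([f"clinic_{i}" for i in range(8)], n_sequences=4, seed=2022)
```

The last printed list shows how the control condition thins out as the trial progresses, which is why time must be handled carefully in the analysis.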

Nested or Hierarchical Design

In a SWGRT, members are nested within groups or clusters so that each member appears in only one group or cluster. In cross-sectional SWGRTs, different members are observed in each group at each measurement occasion; in closed cohort SWGRTs, members are observed repeatedly so that measurements are nested within members; in open cohort SWGRTs, some members are observed in only one time period and others are observed during multiple time periods (Copas et al., 2015; Hooper et al., 2016; Kasza et al., 2019a; Kasza et al., 2020).

Appropriate Use

SWGRTs have become more popular over time, but the design carries a greater risk of bias than conventional parallel GRTs (Hemming and Taljaard, 2020). Therefore, choosing a SWGRT over more conventional alternatives requires strong justification. Hemming and Taljaard (2020) provide a non-exhaustive list of broad justifications, indicating that a SWGRT may be appropriate if:

  • it provides a randomized trial when the only alternative is a staggered non-randomized trial and stakeholders can be convinced to randomly assign treatment order, 
  • it increases the likelihood that gatekeepers and stakeholders will enroll groups in the study because every group receives the perceived benefits of the intervention while the trial is ongoing,
  • staggered, sequential delivery of the intervention is the only logistically feasible design, or
  • limited groups or resources are available and a SWGRT can attain the desired statistical power when a parallel GRT cannot. 

Potential for Confounding

Although a SWGRT tends to involve a limited number of groups, the impact of chance imbalances may be minimal because each group is exposed to both the control and intervention conditions (Hemming et al., 2020). Chance imbalances may still occur, however. In parallel GRTs, stratified or constrained randomization on important group-level characteristics has been shown to improve power and maintain Type I error rates (Donner and Klar, 2000; Li et al., 2016; Li et al., 2017b; Murray, 1998), and we might expect similar effects in a SWGRT. In a SWGRT, restricted randomization would be applied when the groups or clusters are randomized to sequences.
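Constrained randomization can be sketched as: generate many candidate allocations of clusters to sequences, score each for balance on a cluster-level covariate, keep only the best-balanced fraction, and randomly choose one of those. The balance criterion below (squared deviations of sequence-level covariate means from the overall mean), the 10% cut-off, and the covariate values are illustrative assumptions, not a prescribed method.

```python
import random
from statistics import mean
from typing import Dict, Optional


def balance_score(allocation: Dict[str, int], covariate: Dict[str, float]) -> float:
    """Sum of squared deviations of sequence-level covariate means from the
    overall mean; smaller values mean better balance (illustrative criterion)."""
    overall = mean(covariate.values())
    score = 0.0
    for seq in set(allocation.values()):
        seq_mean = mean(covariate[c] for c, s in allocation.items() if s == seq)
        score += (seq_mean - overall) ** 2
    return score


def constrained_randomization(covariate: Dict[str, float], n_sequences: int,
                              n_candidates: int = 2000, keep: float = 0.1,
                              seed: Optional[int] = None) -> Dict[str, int]:
    """Sample candidate allocations, keep the best-balanced fraction,
    then pick one of the acceptable allocations at random."""
    rng = random.Random(seed)
    clusters = sorted(covariate)
    if len(clusters) % n_sequences:
        raise ValueError("this sketch assumes clusters divide evenly among sequences")
    per_sequence = len(clusters) // n_sequences
    candidates = []
    for _ in range(n_candidates):
        shuffled = clusters[:]
        rng.shuffle(shuffled)
        allocation = {c: i // per_sequence for i, c in enumerate(shuffled)}
        candidates.append((balance_score(allocation, covariate), allocation))
    candidates.sort(key=lambda pair: pair[0])
    acceptable = candidates[:max(1, int(keep * n_candidates))]
    return rng.choice(acceptable)[1]


# Hypothetical cluster-level covariate, e.g. baseline patient volume per clinic.
volume = {f"clinic_{i}": float(v)
          for i, v in enumerate([120, 95, 300, 210, 150, 80, 260, 175])}
chosen = constrained_randomization(volume, n_sequences=4, seed=7)
```

Randomly choosing from the acceptable set, rather than taking the single best allocation, keeps the assignment random while still excluding badly imbalanced allocations.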

As time progresses, more groups implement the intervention condition. Therefore, time always has the potential to be a confounder in the relationship between the outcome and the intervention condition. To guard against this, time must be accounted for in the SWGRT design and analysis (Hemming et al., 2015b; Hemming and Taljaard, 2020).
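A small noise-free example shows why time confounds the comparison. Below, the outcome rises with a linear secular trend and the true intervention effect is zero, yet pooling all intervention cells against all control cells suggests an effect, because intervention cells are concentrated in later periods. Comparing conditions only within the same period removes that bias. This within-period contrast is a deliberately crude illustration, not the mixed-model analysis the text refers to; the baseline and trend values are arbitrary assumptions.

```python
from statistics import mean
from typing import List


def schedule(n_sequences: int) -> List[List[int]]:
    """Standard stepped wedge layout: sequence s crosses over at period s + 1."""
    return [[1 if p > s else 0 for p in range(n_sequences + 1)]
            for s in range(n_sequences)]


def cell_means(x, baseline=10.0, trend=2.0, effect=0.0):
    """Noise-free sequence-by-period means: linear secular trend plus effect."""
    return [[baseline + trend * p + effect * x[s][p] for p in range(len(x[s]))]
            for s in range(len(x))]


def naive_difference(x, y):
    """Pool all intervention cells versus all control cells, ignoring time."""
    cells = [(x[s][p], y[s][p]) for s in range(len(x)) for p in range(len(x[s]))]
    return mean(v for t, v in cells if t) - mean(v for t, v in cells if not t)


def within_period_difference(x, y):
    """Average of intervention-minus-control contrasts made within each period."""
    diffs = []
    for p in range(len(x[0])):
        treated = [y[s][p] for s in range(len(x)) if x[s][p]]
        control = [y[s][p] for s in range(len(x)) if not x[s][p]]
        if treated and control:  # only periods containing both conditions inform the contrast
            diffs.append(mean(treated) - mean(control))
    return mean(diffs)


x = schedule(4)
y = cell_means(x, effect=0.0)
print(naive_difference(x, y))          # 4.0  (spurious "effect" created by the trend)
print(within_period_difference(x, y))  # 0.0  (true effect recovered)
```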

Within- and Between-Period Correlations

Important factors in determining the sample size for a SWGRT are the intraclass correlation (ICC), the cluster autocorrelation (CAC), and the individual autocorrelation (IAC) (Girling and Hemming, 2016; Hooper et al., 2016; Kasza et al., 2020). These quantities describe the similarity among outcome values that arises from correlation within groups or clusters in the same time period and from repeated measurements on the same groups or clusters or on the same members.

The ICC measures the similarity among values on the outcome variable for different members of the same group or cluster within a given time period. It is often described as the average correlation among members within the same group or cluster and within the same time period or as the proportion of variance due to group or cluster membership. The CAC is the correlation between the population means from the same group or cluster at two different time periods; it is sometimes called over-time correlation at the group level. The CAC is present in cross-sectional, closed cohort, and open cohort designs. The IAC is the correlation on the outcome variable for the same individual at two different time periods; it is sometimes called over-time correlation at the member level. The IAC is present only in closed and open cohort designs. 
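One common way to see how the three quantities relate is to assume a linear mixed model with four variance components: a persistent cluster effect, a cluster-by-period effect, a persistent member effect (cohort designs only), and residual error. The sketch below assumes that parameterization (block-exchangeable, no decay) and derives the ICC, CAC, and IAC from it, along with the correlation between two measurements taken in the same cluster. The class name and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VarianceComponents:
    cluster: float          # persistent cluster effect
    cluster_period: float   # cluster-by-period effect
    individual: float       # persistent member effect (zero or irrelevant if cross-sectional)
    residual: float         # measurement-level error

    @property
    def total(self) -> float:
        return self.cluster + self.cluster_period + self.individual + self.residual

    def icc(self) -> float:
        """Share of total variance due to the cluster within a period."""
        return (self.cluster + self.cluster_period) / self.total

    def cac(self) -> float:
        """Correlation between a cluster's population means in two periods."""
        return self.cluster / (self.cluster + self.cluster_period)

    def iac(self) -> float:
        """Correlation between one member's outcomes in two periods."""
        return self.individual / (self.individual + self.residual)

    def correlation(self, same_period: bool, same_member: bool) -> float:
        """Correlation between two measurements taken in the same cluster."""
        if same_member and same_period:
            return 1.0                                   # the same measurement
        if same_member:                                  # one member, two periods
            return (self.cluster + self.individual) / self.total
        if same_period:                                  # two members, one period
            return self.icc()
        return self.cluster / self.total                 # two members, two periods


vc = VarianceComponents(cluster=0.04, cluster_period=0.01, individual=0.475, residual=0.475)
print(round(vc.icc(), 3), round(vc.cac(), 3), round(vc.iac(), 3))  # 0.05 0.8 0.5
```

Under this model, the correlation between different members in different periods equals ICC × CAC, and the correlation for the same member in different periods equals ICC × CAC + (1 − ICC) × IAC.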

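A characteristic of longitudinal GRTs such as SWGRTs is that the CAC and IAC can be treated as functions of the time separating the periods being compared, with values that decay as that separation grows. Failing to account for such decay in SWGRTs can result in inflated Type I error rates (Kasza and Forbes, 2019b; Kasza et al., 2019a; Kasza et al., 2020; Li, 2020). Many correlation structures are possible, such as the block-exchangeable structure, in which the between-period correlation is the same no matter how far apart the periods are, and the discrete-time decay structure, in which that correlation declines as the periods become further apart.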
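The difference between the two structures can be made concrete by tabulating the correlation between outcomes of two different members of the same cluster measured in periods s and t. The sketch assumes the block-exchangeable form ICC × CAC for any pair of different periods and the discrete-time decay form ICC × CAC^|s − t|; the function name and the ICC and CAC values are illustrative assumptions.

```python
from typing import List


def within_cluster_correlation(n_periods: int, icc: float, cac: float,
                               structure: str) -> List[List[float]]:
    """Correlation between two different members of one cluster, measured
    in periods s (row) and t (column), under an assumed structure."""
    matrix = []
    for s in range(n_periods):
        row = []
        for t in range(n_periods):
            lag = abs(s - t)
            if structure == "block-exchangeable":
                row.append(icc if lag == 0 else icc * cac)   # constant across lags
            elif structure == "discrete-time decay":
                row.append(icc * cac ** lag)                 # shrinks with each extra period
            else:
                raise ValueError(f"unknown structure: {structure}")
        matrix.append(row)
    return matrix


for name in ("block-exchangeable", "discrete-time decay"):
    first_row = within_cluster_correlation(4, icc=0.05, cac=0.8, structure=name)[0]
    print(name, [round(v, 4) for v in first_row])
# block-exchangeable [0.05, 0.04, 0.04, 0.04]
# discrete-time decay [0.05, 0.04, 0.032, 0.0256]
```

The two structures agree for adjacent periods but diverge as the lag grows, which is why assuming the wrong one matters more in trials with many periods.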

Solutions

The recommended solutions to these challenges are to 1) employ stratified or constrained randomization techniques to balance important cluster-level covariates when assigning groups to sequences, 2) account for time in the study design and analysis, and 3) estimate the sample size for SWGRTs based on realistic and data-based estimates of within- and between-period correlations and other parameters indicated by the analytic plan. Extra variation and limited degrees of freedom (df) always reduce power, so it is essential to consider these factors while the study is being planned, particularly when estimating the sample size.

SWGRTs should only be used when all efforts to implement a more conventional parallel GRT have been exhausted. Compared to parallel GRTs, SWGRTs are at greater risk of bias. Given these risks, strong justifications must be given for the use of SWGRTs. 

Z-scores and t-scores give similar results when the df available for the test of the intervention effect exceed about 30. As the df decline below 30, it becomes increasingly important to use t-scores rather than z-scores. Unfortunately, the precise df to use for t-scores when calculating power or sample size for SWGRTs is unsettled and remains a subject of ongoing research. One approach is to use the number of groups or clusters minus the number of time periods minus one, but other approaches are possible (Hemming et al., 2020; Thompson et al., 2021).
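To see how much the choice matters, the sketch below compares two-sided critical values from the t distribution with the z value of about 1.96. It uses only the Python standard library, computing the t distribution function by Simpson's rule and inverting it by bisection. The df helper implements the one approach mentioned above (clusters minus periods minus one) and is a hypothetical name, not a recommendation over other approaches.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, n: int = 2000) -> float:
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on [0, x] (density is symmetric)."""
    h = x / n
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(df: int, alpha: float = 0.05) -> float:
    """Two-sided critical value, found by bisection on the CDF."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def hypothetical_swgrt_df(n_clusters: int, n_periods: int) -> int:
    """One possible df choice from the text: clusters - periods - 1."""
    return n_clusters - n_periods - 1


z_crit = NormalDist().inv_cdf(0.975)
df = hypothetical_swgrt_df(n_clusters=20, n_periods=5)   # 14
for d in (5, df, 30, 60):
    print(d, round(t_critical(d), 3), round(z_crit, 3))
```

With 20 clusters and 5 periods this df choice gives 14, where the t critical value is noticeably larger than 1.96; by 60 df the two are nearly identical.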

Standard sample size methods assume that each group or cluster has the same number of observations, but that is almost never true in practice. In GRTs, power decreases as the variation in group or cluster size increases. This is true for SWGRTs as well, but the power decrease is less pronounced (Girling, 2018; Kristunas et al., 2017).
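For a parallel GRT, the effect of unequal cluster sizes is often approximated with a design effect that grows with the coefficient of variation (CV) of cluster size: DE = 1 + ((CV² + 1) × m̄ − 1) × ICC, where m̄ is the mean cluster size. The sketch below applies that approximation; it is not a SWGRT-specific formula, the use of the population standard deviation for the CV is an assumption, and the cluster sizes are made up.

```python
from statistics import mean, pstdev
from typing import Sequence


def design_effect_unequal(cluster_sizes: Sequence[int], icc: float) -> float:
    """Approximate design effect for a parallel GRT with unequal cluster sizes."""
    m_bar = mean(cluster_sizes)
    cv = pstdev(cluster_sizes) / m_bar      # coefficient of variation of cluster size
    return 1 + ((cv ** 2 + 1) * m_bar - 1) * icc


equal = design_effect_unequal([20, 20, 20, 20], icc=0.05)     # CV = 0
unequal = design_effect_unequal([10, 30, 10, 30], icc=0.05)   # same mean, CV = 0.5
print(round(equal, 3), round(unequal, 3))
```

Holding the mean cluster size fixed, the larger CV produces a larger design effect and therefore lower power, which is the pattern the text describes for parallel GRTs.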

If the distribution of group sizes is the same within each sequence, design-effect expressions that assume a block-exchangeable correlation structure are available; they inflate the average cluster size relative to the corresponding equal-cluster design (Girling, 2018; Harrison et al., 2020).

Power in a SWGRT is a function of several factors. These include the treatment effect, the number of time periods, the number of groups, the number of members per group, ICC, CAC, IAC, and the correlation decay structure.  
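Several of these factors appear explicitly in the closed-form variance of the intervention-effect estimator given by Hussey and Hughes (2007) for a cross-sectional design with a common secular trend, an immediate and constant effect, and exchangeable within-cluster correlation (equivalently, CAC = 1, no IAC, and no decay). The sketch below computes power from that variance with a normal approximation. It therefore covers only some of the factors listed above, and the function names and inputs are illustrative assumptions.

```python
import math
from statistics import NormalDist
from typing import List


def replicate_schedule(n_sequences: int, clusters_per_sequence: int) -> List[List[int]]:
    """Treatment indicators, one row per cluster, for a standard stepped wedge layout."""
    rows = []
    for seq in range(n_sequences):
        row = [1 if p > seq else 0 for p in range(n_sequences + 1)]
        rows.extend(list(row) for _ in range(clusters_per_sequence))
    return rows


def hh_variance(x: List[List[int]], sigma2: float, tau2: float) -> float:
    """Hussey-Hughes variance of the effect estimator.

    sigma2: variance of a cluster-period mean (residual variance / members per period)
    tau2:   between-cluster variance
    """
    n_clusters, n_periods = len(x), len(x[0])
    u = sum(sum(row) for row in x)                                      # total treated cells
    w = sum(sum(row[j] for row in x) ** 2 for j in range(n_periods))   # squared period totals
    v = sum(sum(row) ** 2 for row in x)                                 # squared cluster totals
    num = n_clusters * sigma2 * (sigma2 + n_periods * tau2)
    den = ((n_clusters * u - w) * sigma2
           + (u ** 2 + n_clusters * n_periods * u - n_periods * w - n_clusters * v) * tau2)
    if den <= 0:
        raise ValueError("effect is not estimable: treatment is confounded with time")
    return num / den


def hh_power(effect: float, x: List[List[int]], members_per_period: int,
             icc: float, total_var: float = 1.0, alpha: float = 0.05) -> float:
    """Two-sided power using a normal approximation."""
    tau2 = icc * total_var
    sigma2 = (1 - icc) * total_var / members_per_period
    se = math.sqrt(hh_variance(x, sigma2, tau2))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(effect) / se - z)


x = replicate_schedule(n_sequences=4, clusters_per_sequence=3)   # 12 clusters, 5 periods
print(round(hh_power(0.3, x, members_per_period=20, icc=0.05), 3))
```

Changing the number of sequences, clusters per sequence, members per period, ICC, or effect size shows how each factor moves power under this model; the CAC, IAC, and decay structure require the more general methods cited in this section.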

CAC and IAC estimates from a previous study that assumed a block-exchangeable correlation structure can be used to plan a study that will be analyzed assuming discrete-time decay, but the estimates should first be adjusted. Expressions for this adjustment are available (Kasza et al., 2020). Note that this adjustment should be applied only to CAC and IAC estimates from an analysis that incorrectly assumed a block-exchangeable correlation structure when discrete-time decay was present. In addition, the proposed study must have the same number of periods and the same period length as the previous study. If this is not the case, or if you are unsure of the previous study's decay mechanism, number of time periods, or period length, you should not make this adjustment.

The original analytic approach assumed a common secular trend and an immediate and constant intervention effect (Hussey and Hughes, 2007). Further work allowed treatment effects to vary across groups (Hughes et al., 2015). In addition, methods that model the intervention effect as a trend over time have been offered (Hughes et al., 2015; Nickless et al., 2018). Recently, a general model for SWGRTs that accommodates various forms for the intervention effect has been provided (Li et al., 2021).
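The progression of models can be summarized in a generic notation, assumed here for illustration: Y_{ijk} is the outcome for member k of cluster i in period j, X_{ij} is 1 if cluster i is in the intervention condition in period j, \beta_j are period effects capturing the secular trend, and s_{ij} is the number of periods since cluster i crossed over. This is a sketch of the general forms, not a reproduction of any one paper's notation.

```latex
% Common secular trend, immediate and constant intervention effect
Y_{ijk} = \mu + \alpha_i + \beta_j + X_{ij}\,\theta + e_{ijk},
\qquad \alpha_i \sim N(0, \tau^2), \quad e_{ijk} \sim N(0, \sigma_e^2)

% Intervention effect allowed to vary across clusters
Y_{ijk} = \mu + \alpha_i + \beta_j + X_{ij}\,(\theta + \gamma_i) + e_{ijk},
\qquad \gamma_i \sim N(0, \eta^2)

% Intervention effect that changes with exposure time since crossover
Y_{ijk} = \mu + \alpha_i + \beta_j + X_{ij}\,\theta(s_{ij}) + e_{ijk}
```

In every form the period effects \beta_j remain in the model, which is how time is accounted for in the analysis.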

In the parallel GRT, the groups or clusters in the control condition remain in that condition throughout the trial. As such, if external events occur that affect the outcome, their influence will be seen in the control condition and it may be possible to adjust for it. In the SWGRT, the groups or clusters gradually cross over from the control condition to the intervention condition, so that there are fewer and fewer groups or clusters in the control condition as the study progresses. That can make it difficult to observe or adjust for the effect of an external event that may affect the outcome.

CONSORT Statement
Hemming K, Taljaard M, McKenzie JE, Hooper R, Copas A, Thompson JA, et al. Reporting of stepped wedge cluster randomised trials: Extension of the CONSORT 2010 statement with explanation and elaboration. BMJ. 2018;363:k1614. Epub 2018/11/09.
PMID: 30413417.
Key References

SWGRTs

Thompson JA, Hemming K, Forbes AB, Fielding KL, Hayes RJ. Comparison of small-sample standard-error corrections for generalised estimating equations in stepped wedge cluster randomised trials with a binary outcome: A simulation study. Stat Methods Med Res. 2021;30(2):425-439. Epub 2020/09/24.
PMID: 32970526.
Li F, Hughes JP, Hemming K, Taljaard M, Melnick ER, Heagerty PJ. Mixed-effects models for the design and analysis of stepped wedge cluster randomized trials: An overview. Stat Methods Med Res. 2021;30(2):612-639. Epub 2020/07/06.
PMID: 32631142.
Hemming K, Taljaard M. Reflection on modern methods: When is a stepped-wedge cluster randomized trial a good study design choice? Int J Epidemiol. 2020;49(3):1043-1052. Epub 2020/05/09.
PMID: 32386407.
Li F. Design and analysis considerations for cohort stepped wedge cluster randomized trials with a decay correlation structure. Stat Med. 2020;39(4):438-455. Epub 2019/12/04.
PMID: 31797438.
Grantham KL, Kasza J, Heritier S, Hemming K, Forbes AB. Accounting for a decaying correlation structure in cluster randomized trials with continuous recruitment. Stat Med. 2019;38(11):1918-1934. Epub 2019/01/21.
PMID: 30663132.
Nickless A, Voysey M, Geddes J, Yu LM, Fanshawe TR. Mixed effects approach to the analysis of the stepped wedge cluster randomised trial-Investigating the confounding effect of time through simulation. PLoS One. 2018;13(12):e0208876. Epub 2018/12/13.
PMID: 30543671.
Girling AJ. Relative efficiency of unequal cluster sizes in stepped wedge and other trial designs under longitudinal or cross-sectional sampling. Stat Med. 2018;37(30):4652-4664. Epub 2018/09/12.
PMID: 30209812.
Girling AJ, Hemming K. Statistical efficiency and optimal design for stepped cluster studies under linear mixed effects models. Stat Med. 2016;35(13):2149-66. Epub 2016/01/07.
PMID: 26748662.
Copas A, Lewis JJ, Thompson JA, Davey C, Baio G, Hargreaves JR. Designing a stepped wedge trial: Three main designs, carry-over effects and randomisation approaches. Trials. 2015;16(1):352. Epub 2015/08/17.
PMID: 26279154.
Hemming K, Haines TP, Chilton PJ, Girling AJ, Lilford RJ. The stepped wedge cluster randomised trial: Rationale, design, analysis, and reporting. BMJ. 2015b;350:h391. Epub 2015/02/11.
PMID: 25662947.
Hemming K, Lilford RJ, Girling AJ. Stepped-wedge cluster randomised controlled trials: A generic framework including parallel and multiple-level designs. Stat Med. 2015a;34(2):181-96. Epub 2014/10/24.
PMID: 25346484.
Hussey MA, Hughes JP. Design and analysis of stepped wedge cluster randomized trials. Contemp Clin Trials. 2007;28(2):182-91. Epub 2006/07/07.
PMID: 16829207.

 

State of the Practice Reviews for SWGRTs

Murray DM, Taljaard M, Turner EL, George SM. Essential ingredients and innovations in the design and analysis of group-randomized trials. Annu Rev Public Health. 2020;41:1-19. Epub 2019/12/23.
PMID: 31869281.
Hemming K, Carroll K, Thompson JA, Forbes AB, Taljaard M, SW-CRT Review Group. Quality of stepped-wedge trial reporting can be reliably assessed using an updated CONSORT: Crowd-sourcing systematic review. J Clin Epidemiol. 2019;107:77-88. Epub 2018/12/01.
PMID: 30500405.
Eichner FA, Groenwold RHH, Grobbee DE, Rengerink O. Systematic review showed that stepped-wedge cluster randomized trials often did not reach their planned sample size. J Clin Epidemiol. 2019;107:89-100. Epub 2018/11/21.
PMID: 30458261.
Turner EL, Prague M, Gallis JA, Li F, Murray DM. Review of recent methodological developments in group-randomized trials: Part 2-Analysis. Am J Public Health. 2017b;107(7):1078-1086. Epub 2017/05/18.
PMID: 28520480.
Turner EL, Li F, Gallis JA, Prague M, Murray DM. Review of recent methodological developments in group-randomized trials: Part 1-Design. Am J Public Health. 2017a;107(6):907-915. Epub 2017/04/20.
PMID: 28426295.
Barker D, McElduff P, D'Este C, Campbell MJ. Stepped wedge cluster randomised trials: A review of the statistical methodology used and available. BMC Med Res Methodol. 2016;16:69. Epub 2016/06/09.
PMID: 27267471.
Martin JT, Taljaard M, Girling AJ, Hemming K. Systematic review finds major deficiencies in sample size methodology and reporting for stepped-wedge cluster randomised trials. BMJ Open. 2016;6(2):e010166. Epub 2016/02/06.
PMID: 26846897.
Mdege ND, Man MS, Brown CATN, Torgerson DJ. Systematic review of stepped wedge cluster randomized trials shows that design is particularly used to evaluate interventions during routine implementation. J Clin Epidemiol. 2011;64(9):936-48. Epub 2011/03/16.
PMID: 21411284.

 

Sample Size Estimation for SWGRTs

Voldal EC, Hakhu NR, Xia F, Heagerty PJ, Hughes JP. swCRTdesign: An R Package for Stepped Wedge Trial Design and Analysis. Comput Methods Programs Biomed. 2020;196:105514. Epub 2020/06/20.
PMID: 32554025.
Kasza J, Hooper R, Copas A, Forbes AB. Sample size and power calculations for open cohort longitudinal cluster randomized trials. Stat Med. 2020 Epub 2020/03/04.
PMID: 32133688.
Hemming K, Kasza J, Hooper R, Forbes AB, Taljaard M. A tutorial on sample size calculation for multiple-period cluster randomized parallel, cross-over and stepped-wedge trials using the Shiny CRT Calculator. Int J Epidemiol. 2020;49(3):979-995. Epub 2020/02/22.
PMID: 32087011.
Harrison LJ, Chen T, Wang R. Power calculation for cross-sectional stepped wedge cluster randomized trials with variable cluster sizes. Biometrics. 2020;76(3):951-962. Epub 2019/11/04.
PMID: 31625596.
Kasza J, Hemming K, Hooper R, Matthews JNS, Forbes AB. Impact of non-uniform correlation structure on sample size and power in multiple-period cluster randomised trials. Stat Methods Med Res. 2019a;28(3):703-716. Epub 2017/10/13.
PMID: 29027505.
Kristunas CA, Smith KL, Gray LJ. An imbalance in cluster sizes does not lead to notable loss of power in cross-sectional, stepped-wedge cluster randomised trials with a continuous outcome. Trials. 2017;18(1):109. Epub 2017/03/07.
PMID: 28270224.
Hooper R, Teerenstra S, de Hoop E, Eldridge S. Sample size calculation for stepped wedge and other longitudinal cluster randomised trials. Stat Med. 2016;35(26):4718-4728. Epub 2016/06/28.
PMID: 27350420.
Moerbeek M, Teerenstra S. Power analysis of trials with multilevel data. Boca Raton: CRC Press; 2016.
Baio G, Copas A, Ambler G, Hargreaves JR, Beard E, Omar RZ. Sample size calculation for a stepped wedge trial. Trials. 2015;16(1):354. Epub 2015/08/19.
PMID: 26282553.

Last updated on July 6, 2022