Stepped Wedge Group-Randomized Trials


In a stepped wedge group-randomized trial (SWGRT), also called a stepped wedge cluster-randomized trial, groups or clusters begin the study in the control condition, are randomly assigned to sequences, and cross over to the intervention condition at predetermined time points in a sequential, staggered fashion until all groups or clusters receive the intervention (Copas et al., 2015; Grantham et al., 2019; Hemming et al., 2015b; Hemming et al., 2020; Hemming and Taljaard, 2016; Hooper et al., 2016; Hughes et al., 2015; Hughes et al., 2024; Hussey and Hughes, 2007; Maleyeff et al., 2023; Nickless et al., 2018). Special methods are needed for analysis and sample size estimation for these studies, as detailed below and in the SWGRT sample size calculator.

Features and Uses

Staggered, Sequential Cross-Over


A SWGRT is a trial in which groups cross over to the intervention condition at predetermined time points in a sequential, staggered fashion until all groups receive the intervention. The design has been used when limited resources or a large geographical area prevent the use of a conventional parallel GRT (Hall et al., 1987). The design has also been used when staff training requirements in a clinical care intervention necessitated phased implementation (Moulton et al., 2007). It has also been employed to improve power when a limited number of groups was available (Peden et al., 2019).
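The staggered cross-over pattern can be written down as a design matrix with one row per group and one column per time period. The sketch below is illustrative only: the function name `stepped_wedge_design` and the choice of a single all-control baseline period followed by one step per period are assumptions, not features described on this page.

```python
def stepped_wedge_design(n_sequences, groups_per_sequence=1):
    """Build a standard stepped wedge layout (hypothetical helper).

    Returns a list of rows, one per group. Each row has n_sequences + 1
    periods; 0 means control and 1 means intervention.
    """
    n_periods = n_sequences + 1  # assumed: one baseline period, then one step per period
    design = []
    for seq in range(1, n_sequences + 1):
        # Groups in sequence `seq` cross over at the start of period `seq` (0-indexed)
        # and stay in the intervention condition for the rest of the trial.
        row = [1 if period >= seq else 0 for period in range(n_periods)]
        for _ in range(groups_per_sequence):
            design.append(list(row))
    return design


if __name__ == "__main__":
    for row in stepped_wedge_design(n_sequences=4):
        print(" ".join(str(x) for x in row))
```

Each printed row shows a group starting in control (0) and switching to intervention (1) one period later than the row above it, so that every group is in the intervention condition by the final period.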

Nested or Hierarchical Design

In a SWGRT, members are nested within groups or clusters so that each member appears in only one group or cluster. In cross-sectional SWGRTs, different members are observed in each group at each measurement occasion; in closed cohort SWGRTs, members are observed repeatedly so that measurements are nested within members; in open cohort SWGRTs, some members are observed in only one time period and others are observed during multiple time periods (Copas et al., 2015; Hooper et al., 2016; Kasza et al., 2019a; Kasza et al., 2020).

Appropriate Use

SWGRTs have become more popular over time but the design has a greater risk of bias compared to conventional parallel GRTs (Hemming and Taljaard, 2020). Therefore, the use of SWGRT over more conventional alternatives must have strong justification. Hemming and Taljaard (2020) provide a non-exhaustive list of broad justifications, indicating that it may be appropriate to use a SWGRT if:

it provides a randomized trial when the only alternative is a staggered non-randomized trial and stakeholders can be convinced to randomly assign treatment order,
it increases the likelihood that gatekeepers and stakeholders will enroll groups in the study due to receiving perceived benefits of the intervention while the trial is ongoing,
staggered, sequential delivery of the intervention is the only logistically feasible design, or
limited groups or resources are available and a SWGRT can attain the desired statistical power when a parallel GRT cannot.

Potential for Confounding

While a SWGRT tends to involve a limited number of groups, the impact of chance imbalances may be minimal because each group is exposed to both the control and intervention conditions (Hemming et al., 2020). Chance imbalances may still occur, in which case stratified or constrained randomization on important group-level characteristics has been shown to improve power and maintain type I error rates in parallel GRTs (Donner and Klar, 2000; Li et al., 2016; Li et al., 2017b; Murray, 1998) and we might expect similar effects in a SWGRT. In the SWGRT, the method for restricted randomization would be applied when the groups or clusters are randomized to sequences.
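As a rough illustration of restricted randomization at the point where groups are assigned to sequences, the sketch below stratifies on a single group-level characteristic. It shows stratified randomization only, not constrained randomization, and the function name, the input format, and the round-robin dealing scheme are all assumptions made for the example.

```python
import random
from collections import defaultdict


def randomize_to_sequences(groups, n_sequences, seed=None):
    """Stratified assignment of groups to sequences (hypothetical helper).

    groups: dict mapping group id -> stratum label (e.g. "urban", "rural").
    Returns a dict mapping group id -> sequence number (1-based).
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for group_id, stratum in groups.items():
        by_stratum[stratum].append(group_id)

    assignment = {}
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        rng.shuffle(members)  # random order within the stratum
        # Random starting sequence, so leftover groups do not always
        # land in the earliest sequences.
        offset = rng.randrange(n_sequences)
        for i, group_id in enumerate(members):
            assignment[group_id] = (offset + i) % n_sequences + 1
    return assignment


if __name__ == "__main__":
    example = {f"g{i}": ("urban" if i < 4 else "rural") for i in range(8)}
    print(randomize_to_sequences(example, n_sequences=4, seed=2024))
```

Dealing the shuffled groups across sequences within each stratum spreads each stratum as evenly as possible over the cross-over times, which is the balance that stratification is meant to provide.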

As time progresses, more groups implement the intervention condition. Therefore, time always has the potential to be a confounder in the relationship between the outcome and the intervention condition. To guard against this, time must be accounted for in the SWGRT design and analysis (Hemming et al., 2015b; Hemming and Taljaard, 2020).
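One common way to account for time in the analysis is to include a fixed effect for each period, as in the mixed model usually attributed to Hussey and Hughes (2007). The notation below is an assumption introduced for illustration: i indexes groups, j indexes time periods, k indexes members, and X_{ij} equals 1 if group i is in the intervention condition during period j and 0 otherwise.

```latex
Y_{ijk} = \mu + \beta_j + \theta X_{ij} + \alpha_i + e_{ijk},
\qquad \alpha_i \sim N(0, \tau^2),
\qquad e_{ijk} \sim N(0, \sigma_e^2)
```

Here \beta_j captures the secular trend shared by all groups, \theta is the intervention effect, and \alpha_i is a random group effect. This basic form assumes an immediate and constant intervention effect and a correlation that does not decay over time, so it is a starting point rather than a recommended final model.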

Within- and Between-Period Correlations

Important factors in determining the sample size for a SWGRT are the intraclass correlation (ICC), the cluster autocorrelation (CAC), and the individual autocorrelation (IAC) (Girling and Hemming, 2016; Hooper et al., 2016; Kasza et al., 2020). These quantities provide information on the similarity among outcome values due to correlation within groups or clusters at the same time and to repeated measurements on the same groups or clusters or on the same members.

The ICC measures the similarity among values on the outcome variable for different members of the same group or cluster within a given time period. It is often described as the average correlation among members within the same group or cluster and within the same time period or as the proportion of variance due to group or cluster membership. The CAC is the correlation between the population means from the same group or cluster at two different time periods; it is sometimes called over-time correlation at the group level. The CAC is present in cross-sectional, closed cohort, and open cohort designs. The IAC is the correlation on the outcome variable for the same individual at two different time periods; it is sometimes called over-time correlation at the member level. The IAC is present only in closed and open cohort designs.
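The three correlations can be related to variance components in a random-effects model. The sketch below uses one common parameterization, with variance components for group, group-by-period, member, and residual terms. The function and argument names are assumptions for illustration, and the IAC here is the correlation of the member-level components only, which is how it is usually parameterized for sample size work.

```python
def correlations_from_variances(var_group, var_group_period,
                                var_member=0.0, var_residual=1.0):
    """Return (ICC, CAC, IAC) from variance components (hypothetical helper).

    var_group:        variance of the group effect that persists over time
    var_group_period: variance of the group-by-period effect
    var_member:       variance of the member effect (0 for cross-sectional designs)
    var_residual:     residual variance
    """
    group_level = var_group + var_group_period
    member_level = var_member + var_residual
    if group_level <= 0 or member_level <= 0:
        raise ValueError("group-level and member-level variances must be positive")

    icc = group_level / (group_level + member_level)  # within-period ICC
    cac = var_group / group_level                     # share of group variance that persists
    iac = var_member / member_level                   # share of member variance that persists
    return icc, cac, iac


if __name__ == "__main__":
    print(correlations_from_variances(0.04, 0.01, 0.35, 0.60))
```

Setting `var_member` to zero gives an IAC of zero, which matches the statement that the IAC applies only to cohort designs.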

A characteristic of longitudinal GRTs such as SWGRTs is that the CAC and IAC can be considered as functions of compared time periods whose values decay over time. Failing to account for such decay in SWGRTs can result in increased Type I error rates (Kasza and Forbes, 2019b; Kasza et al., 2019a; Kasza et al., 2020; Li, 2020). There are many possible decay structures, such as discrete-time decay and block-exchangeable structures.
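The difference between the two decay structures named above can be seen in the correlation between two different members of the same group measured in periods s and t. The sketch below is a simplified illustration with assumed names. Under the block-exchangeable structure the between-period correlation is constant, while under discrete-time decay it shrinks by a fixed factor for each additional period of separation; in the second case the `cac` argument is interpreted as the per-period decay rate.

```python
def between_member_correlation(n_periods, icc, cac, structure="block-exchangeable"):
    """Correlation matrix (periods x periods) for two different members
    of the same group (hypothetical helper)."""
    matrix = []
    for s in range(n_periods):
        row = []
        for t in range(n_periods):
            lag = abs(s - t)
            if structure == "block-exchangeable":
                # Same value for every pair of distinct periods.
                row.append(icc if lag == 0 else icc * cac)
            elif structure == "discrete-time-decay":
                # Correlation decays geometrically with the number of periods apart.
                row.append(icc * cac ** lag)
            else:
                raise ValueError(f"unknown structure: {structure}")
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    for name in ("block-exchangeable", "discrete-time-decay"):
        print(name)
        for row in between_member_correlation(4, icc=0.05, cac=0.8, structure=name):
            print("  ", [round(x, 4) for x in row])
```

The two structures agree for adjacent periods and diverge as the periods move further apart, which is why estimates obtained under one structure cannot be used unchanged under the other.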

Intervention Effect Heterogeneity

It is common in SWGRTs to assume an instantaneous and sustained intervention effect when a group transitions to the intervention condition. However, there may be intervention effect heterogeneity as a function of exposure-time (Hughes et al., 2024; Kenny et al., 2022; Maleyeff et al., 2023). For example, the intervention effect may initially be weak, strengthen quickly in the first few time periods of exposure, and then slowly decay. Without evidence to the contrary, it is prudent to assume intervention effect heterogeneity will be present. Not accounting for intervention effect heterogeneity when it is present can result in severely biased estimates of the intervention effects and standard errors (Kenny et al., 2022; Maleyeff et al., 2023). The SWGRT sample size calculator implements the approach taken by Kenny et al. (2022) for cross-sectional designs and Hughes et al. (2024) for cohort designs.

In the heterogeneous intervention effect setting, the intervention effect changes with time, requiring greater care in defining the estimand of interest. Kenny et al. describe three estimands of interest. The first is to estimate the average intervention effect across all exposure periods, or time average treatment effect (TATE). The second option is to estimate the intervention effect at a specific time period, or point treatment effect (PTE). Finally, the third option is to estimate the long-term treatment effect (LTE), or the intervention effect at a later time point when the intervention effect is thought to have stabilized. If this later time point corresponds to the final time period in the trial, then the LTE can be estimated with the PTE for the final time period. The choice of estimand depends largely on scientific relevance and investigator interest, but in general tests for the TATE will be more powerful than those for the PTE or LTE. In addition, the LTE assumes that long-term effects are attained by the end of the trial.
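To make the three estimands concrete, the sketch below evaluates each one for a hypothetical curve of true intervention effects indexed by exposure time. The function names and the example values are assumptions for illustration. The code computes the estimands as defined above from known effects; it is not an estimator and does not reproduce the methods of Kenny et al.

```python
def time_average_effect(effects):
    """TATE: average of the effects across all exposure periods."""
    return sum(effects) / len(effects)


def point_effect(effects, exposure_period):
    """PTE: effect at a specific exposure period (1-based)."""
    return effects[exposure_period - 1]


def long_term_effect(effects):
    """LTE, assuming the effect has stabilized by the final exposure period,
    in which case it equals the PTE for that period."""
    return point_effect(effects, len(effects))


if __name__ == "__main__":
    # Hypothetical true effects after 1, 2, 3, and 4 periods of exposure.
    effects = [0.1, 0.3, 0.4, 0.4]
    print("TATE:", time_average_effect(effects))
    print("PTE at exposure period 2:", point_effect(effects, 2))
    print("LTE:", long_term_effect(effects))
```

With an effect that builds up over time, as in this example, the TATE is smaller than the LTE, so the choice of estimand changes the target effect size used in planning as well as the power of the test.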

Solutions

The recommended solutions to these challenges are to 1) employ stratified or constrained randomization techniques to balance important cluster-level covariates when assigning groups to sequences, 2) account for time in the study design and analysis, 3) assume intervention effect heterogeneity is present in the absence of evidence to the contrary, and 4) estimate the sample size for SWGRTs based on realistic and data-based estimates of within- and between-period correlations and other parameters indicated by the analytic plan. Extra variation and limited degrees of freedom (df) always reduce power, so it is essential to consider these factors while the study is being planned, and particularly as part of the estimation of sample size.

FAQs


When do I need to use an SWGRT?

SWGRTs should only be used when all efforts to implement a more conventional parallel GRT have been exhausted. Compared to parallel GRTs, SWGRTs are at greater risk of bias. Given these risks, strong justifications must be given for the use of SWGRTs.

What are some important references on the design and analysis of SWGRTs?

There are no textbooks dedicated to SWGRTs, but some provide overviews of design and analysis (Hayes and Moulton, 2017; Moerbeek and Teerenstra, 2016).

Several papers provide further information (Copas et al., 2015; Hemming et al., 2020; Hemming et al., 2015a; Hemming and Taljaard, 2020; Hooper et al., 2016; Hussey and Hughes, 2007; Kasza et al., 2019b; Kasza et al., 2020; Li et al., 2021).

Most references use z-scores instead of t-scores when calculating power or sample size for SWGRTs, but I would like to use t-scores. How do I calculate the degrees of freedom for such t-scores?

Z-scores and t-scores will give similar results if the df available for the test of the intervention effect are more than about 30. As the df decline below 30, it becomes increasingly important to use t-scores rather than z-scores. Unfortunately, the precise df to use for t-scores when calculating power or sample size for SWGRTs is unsettled, though it is a subject of ongoing research. One approach is to use the number of groups or clusters minus the number of time periods minus one, but other approaches are possible (Hemming et al., 2020; Thompson et al., 2021).

What is the impact of variation in the size of the groups or clusters in a SWGRT?

Standard sources assume that each group or cluster has the same number of observations, but that is almost never true in practice. In GRTs, power decreases as the variation in group or cluster size increases. This is true for SWGRTs as well, but the power decrease is less pronounced (Girling, 2018; Kristunas et al., 2017).

If the distribution of group sizes within each sequence is the same, expressions for design effects assuming a block-exchangeable correlation structure are available that inflate the average cluster size relative to the corresponding equal-cluster design (Girling, 2018; Harrison et al., 2020).

Is it a problem if groups transition to the treatment condition too early or too late?

In SWGRTs, all groups eventually experience both study conditions. However, if groups transition to the treatment condition too late or too early, within-cluster contamination will arise that may produce biased results. Several strategies have been suggested to mitigate the impact of this contamination (Hemming and Taljaard, 2020).

What are the factors that influence power in a SWGRT?

Power in a SWGRT is a function of several factors. These include the treatment effect, the number of time periods, the number of groups, the number of members per group, ICC, CAC, IAC, and the correlation decay structure.
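To show how several of these factors enter a power calculation, the sketch below applies the closed-form variance from Hussey and Hughes (2007) for a cross-sectional design. It is a deliberately simplified illustration: it assumes an immediate and constant intervention effect, equal group sizes, a z-test, and a correlation that does not decay over time (a CAC of 1), so it ignores the IAC and any decay structure. The function and argument names are assumptions, and it is not a substitute for the SWGRT sample size calculator.

```python
from math import sqrt
from statistics import NormalDist


def hussey_hughes_power(design, members_per_group_period, effect,
                        sd_total, icc, alpha=0.05):
    """Approximate power and variance of the effect estimate (hypothetical helper).

    design: list of rows (one per group) of 0/1 indicators by period.
    """
    n_groups = len(design)
    n_periods = len(design[0])
    tau2 = icc * sd_total ** 2  # between-group variance
    # Variance of a group-period mean based on the within-group variance.
    sigma2 = (1 - icc) * sd_total ** 2 / members_per_group_period

    u = sum(sum(row) for row in design)                                    # total intervention cells
    w = sum(sum(row[j] for row in design) ** 2 for j in range(n_periods))  # squared column sums
    v = sum(sum(row) ** 2 for row in design)                               # squared row sums

    numerator = n_groups * sigma2 * (sigma2 + n_periods * tau2)
    denominator = ((n_groups * u - w) * sigma2
                   + (u ** 2 + n_groups * n_periods * u
                      - n_periods * w - n_groups * v) * tau2)
    variance = numerator / denominator

    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    power = NormalDist().cdf(abs(effect) / sqrt(variance) - z_crit)
    return power, variance


if __name__ == "__main__":
    # Three sequences, four periods, two groups per sequence (hypothetical trial).
    design = [[0, 1, 1, 1], [0, 1, 1, 1],
              [0, 0, 1, 1], [0, 0, 1, 1],
              [0, 0, 0, 1], [0, 0, 0, 1]]
    power, variance = hussey_hughes_power(design, members_per_group_period=20,
                                          effect=0.3, sd_total=1.0, icc=0.05)
    print(f"power = {power:.3f}, variance = {variance:.5f}")
```

Changing the number of groups, the number of periods, the members per group and period, the effect size, or the ICC in this sketch shows the direction in which each factor moves power. Accounting for the CAC, the IAC, a decay structure, or a time-varying intervention effect requires the more general methods cited on this page.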

I am planning a study and my estimates of CAC and IAC come from a previous analysis that assumed block-exchangeable structure. I believe discrete-time decay structure will be appropriate for my study. Can I still use these estimates of CAC and IAC?

Yes, but the CAC and IAC estimates from the block-exchangeable study should be adjusted for use with the planned discrete-time decay analysis. Expressions for this adjustment are available (Kasza et al., 2020). Note that this adjustment should only be applied when using CAC and IAC estimates from an analysis that incorrectly assumed a block-exchangeable correlation structure when discrete-time decay was present. In addition, the proposed study must have the same number of periods and period length as the previous study. If this is not the case, or if you are unsure of the previous study's decay mechanism, number of time periods, or period length, then you should not make this adjustment.

How do I estimate sample size or power for a SWGRT?

There are numerous publications on sample size and power for SWGRTs (Baio et al., 2015; Hemming et al., 2020; Kasza et al., 2019b; Kasza et al., 2020). Detailed information is also available in the SWGRT Sample Size Calculator section of this website. That calculator supports sample size estimation for the three main types of SWGRTs: cross-sectional, open cohort, and closed cohort.

The intervention effect in a SWGRT can take a variety of forms. How does that affect the analytic plan and sample size estimation?

The original approach assumed a common secular trend and an immediate and constant intervention effect (Hussey and Hughes, 2007). Further work allowed treatment effects to vary across groups (Hughes et al., 2015). In addition, methods that model the intervention effect as a trend over time have been offered (Hughes et al., 2015; Nickless et al., 2018). A general model for SWGRTs that accommodates various forms for the intervention effect has also been provided (Li et al., 2021). Recently, the impact of ignoring time-varying intervention effects when they are present has been discussed (Kenny et al., 2022; Maleyeff et al., 2023); these studies found that severely biased estimates of intervention effects and standard errors are possible. As a result, it is prudent to assume intervention effect heterogeneity will be present in the absence of evidence to the contrary and to estimate sample size and analyze the data using procedures that reflect that assumption.

Why is the SWGRT more sensitive to external events that might affect the outcome than the parallel GRT?

In the parallel GRT, the groups or clusters in the control condition remain in that condition throughout the trial. As such, if external events occur that affect the outcome, that will be seen in the control condition and it may be possible to adjust for it. In the SWGRT, the groups or clusters gradually cross over from the control condition to the intervention condition, so that there are fewer and fewer groups or clusters in the control condition as the study progresses. That can make it difficult to observe or adjust for the effect of an external event that may affect the outcome.


CONSORT Statement

Hemming K, Taljaard M, McKenzie JE, Hooper R, Copas A, Thompson JA, et al. Reporting of stepped wedge cluster randomised trials: Extension of the CONSORT 2010 statement with explanation and elaboration. BMJ. 2018;363:k1614. Epub 2018/11/09.

PMID: 30413417.

Key References

SWGRTs

Maleyeff L, Li F, Haneuse S, Wang R. Assessing exposure-time treatment effect heterogeneity in stepped-wedge cluster randomized trials. Biometrics. 2023;79(3):2551-64. Epub 2022/12/08.

PMID: 36416302.

Kenny A, Voldal EC, Xia F, Heagerty PJ, Hughes JP. Analysis of stepped wedge cluster randomized trials in the presence of a time-varying treatment effect. Stat Med. 2022;41(22):4311-39. Epub 2022/06/30.

PMID: 35774016.

Thompson JA, Hemming K, Forbes AB, Fielding KL, Hayes RJ. Comparison of small-sample standard-error corrections for generalised estimating equations in stepped wedge cluster randomised trials with a binary outcome: A simulation study. Stat Methods Med Res. 2021;30(2):425-439. Epub 2020/09/24.

PMID: 32970526.

Li F, Hughes JP, Hemming K, Taljaard M, Melnick ER, Heagerty PJ. Mixed-effects models for the design and analysis of stepped wedge cluster randomized trials: An overview. Stat Methods Med Res. 2021;30(2):612-639. Epub 2020/07/06.

PMID: 32631142.

Hemming K, Taljaard M. Reflection on modern methods: When is a stepped-wedge cluster randomized trial a good study design choice? Int J Epidemiol. 2020;49(3):1043-1052. Epub 2020/05/09.

PMID: 32386407.

Li F. Design and analysis considerations for cohort stepped wedge cluster randomized trials with a decay correlation structure. Stat Med. 2020;39(4):438-455. Epub 2019/12/04.

PMID: 31797438.

Grantham KL, Kasza J, Heritier S, Hemming K, Forbes AB. Accounting for a decaying correlation structure in cluster randomized trials with continuous recruitment. Stat Med. 2019;38(11):1918-1934. Epub 2019/01/21.

PMID: 30663132.

Nickless A, Voysey M, Geddes J, Yu LM, Fanshawe TR. Mixed effects approach to the analysis of the stepped wedge cluster randomised trial-Investigating the confounding effect of time through simulation. PLoS One. 2018;13(12):e0208876. Epub 2018/12/13.

PMID: 30543671.

Girling AJ. Relative efficiency of unequal cluster sizes in stepped wedge and other trial designs under longitudinal or cross-sectional sampling. Stat Med. 2018;37(30):4652-4664. Epub 2018/09/12.

PMID: 30209812.

Girling AJ, Hemming K. Statistical efficiency and optimal design for stepped cluster studies under linear mixed effects models. Stat Med. 2016;35(13):2149-66. Epub 2016/01/07.

PMID: 26748662.

Copas A, Lewis JJ, Thompson JA, Davey C, Baio G, Hargreaves JR. Designing a stepped wedge trial: Three main designs, carry-over effects and randomisation approaches. Trials. 2015;16(1):352. Epub 2015/08/17.

PMID: 26279154.

Hemming K, Haines TP, Chilton PJ, Girling AJ, Lilford RJ. The stepped wedge cluster randomised trial: Rationale, design, analysis, and reporting. BMJ. 2015b;350:h391. Epub 2015/02/11.

PMID: 25662947.

Hemming K, Lilford RJ, Girling AJ. Stepped-wedge cluster randomised controlled trials: A generic framework including parallel and multiple-level designs. Stat Med. 2015a;34(2):181-96. Epub 2014/10/24.

PMID: 25346484.

Hussey MA, Hughes JP. Design and analysis of stepped wedge cluster randomized trials. Contemp Clin Trials. 2007;28(2):182-91. Epub 2006/07/07.

PMID: 16829207.

State of the Practice Reviews for SWGRTs

Nevins P, Ryan M, Davis-Plourde K, Ouyang Y, Pereira Macedo JA, Meng C, et al. Adherence to key recommendations for design and analysis of stepped-wedge cluster randomized trials: A review of trials published 2016-2022. Clin Trials. 2023;17407745231208397. Epub 2023/11/21.

PMID: 37990575.

Murray DM, Taljaard M, Turner EL, George SM. Essential ingredients and innovations in the design and analysis of group-randomized trials. Annu Rev Public Health. 2020;41:1-19. Epub 2019/12/23.

PMID: 31869281.

Hemming K, Carroll K, Thompson JA, Forbes AB, Taljaard M, SW-CRT Review Group. Quality of stepped-wedge trial reporting can be reliably assessed using an updated CONSORT: Crowd-sourcing systematic review. J Clin Epidemiol. 2019;107:77-88. Epub 2018/12/01.

PMID: 30500405.

Eichner FA, Groenwold RHH, Grobbee DE, Rengerink O. Systematic review showed that stepped-wedge cluster randomized trials often did not reach their planned sample size. J Clin Epidemiol. 2019;107:89-100. Epub 2018/11/21.

PMID: 30458261.

Turner EL, Prague M, Gallis JA, Li F, Murray DM. Review of recent methodological developments in group-randomized trials: Part 2-Analysis. Am J Public Health. 2017b;107(7):1078-1086. Epub 2017/05/18.

PMID: 28520480.

Turner EL, Li F, Gallis JA, Prague M, Murray DM. Review of recent methodological developments in group-randomized trials: Part 1-Design. Am J Public Health. 2017a;107(6):907-915. Epub 2017/04/20.

PMID: 28426295.

Barker D, McElduff P, D'Este C, Campbell MJ. Stepped wedge cluster randomised trials: A review of the statistical methodology used and available. BMC Med Res Methodol. 2016;16:69. Epub 2016/06/09.

PMID: 27267471.

Martin JT, Taljaard M, Girling AJ, Hemming K. Systematic review finds major deficiencies in sample size methodology and reporting for stepped-wedge cluster randomised trials. BMJ Open. 2016;6(2):e010166. Epub 2016/02/06.

PMID: 26846897.

Mdege ND, Man MS, Brown CATN, Torgerson DJ. Systematic review of stepped wedge cluster randomized trials shows that design is particularly used to evaluate interventions during routine implementation. J Clin Epidemiol. 2011;64(9):936-48. Epub 2011/03/16.

PMID: 21411284.

Sample Size Estimation for SWGRTs

Hughes JP, Lee WY, Troxel AB, Heagerty PJ. Sample Size Calculations for Stepped Wedge Designs with Treatment Effects that May Change with the Duration of Time under Intervention. Prev Sci. 2024;25(Suppl 3):348-55. Epub 2023/09/20.

PMID: 37728810.

Xia F, Hughes JP, Voldal EC, Heagerty PJ. Power and sample size calculation for stepped-wedge designs with discrete outcomes. Trials. 2021;22(1):598. Epub 2021/09/06.

PMID: 34488848.

Voldal EC, Hakhu NR, Xia F, Heagerty PJ, Hughes JP. swCRTdesign: An R Package for Stepped Wedge Trial Design and Analysis. Comput Methods Programs Biomed. 2020;196:105514. Epub 2020/06/20.

PMID: 32554025.

Kasza J, Hooper R, Copas A, Forbes AB. Sample size and power calculations for open cohort longitudinal cluster randomized trials. Stat Med. 2020 Epub 2020/03/04.

PMID: 32133688.

Hemming K, Kasza J, Hooper R, Forbes AB, Taljaard M. A tutorial on sample size calculation for multiple-period cluster randomized parallel, cross-over and stepped-wedge trials using the Shiny CRT Calculator. Int J Epidemiol. 2020;49(3):979-995. Epub 2020/02/22.

PMID: 32087011.

Harrison LJ, Chen T, Wang R. Power calculation for cross-sectional stepped wedge cluster randomized trials with variable cluster sizes. Biometrics. 2020;76(3):951-962. Epub 2019/11/04.

PMID: 31625596.

Kasza J, Hemming K, Hooper R, Matthews JNS, Forbes AB. Impact of non-uniform correlation structure on sample size and power in multiple-period cluster randomised trials. Stat Methods Med Res. 2019a;28(3):703-716. Epub 2017/10/13.

PMID: 29027505.

Kristunas CA, Smith KL, Gray LJ. An imbalance in cluster sizes does not lead to notable loss of power in cross-sectional, stepped-wedge cluster randomised trials with a continuous outcome. Trials. 2017;18(1):109. Epub 2017/03/07.

PMID: 28270224.

Hooper R, Teerenstra S, de Hoop E, Eldridge S. Sample size calculation for stepped wedge and other longitudinal cluster randomised trials. Stat Med. 2016;35(26):4718-4728. Epub 2016/06/28.

PMID: 27350420.

Moerbeek M, Teerenstra S. Power analysis of trials with multilevel data. 2016 Boca Raton: CRC Press.

Baio G, Copas A, Ambler G, Hargreaves JR, Beard E, Omar RZ. Sample size calculation for a stepped wedge trial. Trials. 2015;16(1):354. Epub 2015/08/19.

PMID: 26282553.