Individually Randomized Group-Treatment Trials

In an individually randomized group-treatment (IRGT) trial, also called a partially clustered or partially nested design, individuals are randomized to study conditions but receive at least some of their intervention with other participants or through an interventionist or facilitator shared with other participants (Baldwin et al., 2011; Bauer et al., 2008; Kahan and Morris, 2013; Lee and Thompson, 2005b; Lee and Thompson, 2005a; Pals et al., 2008; Pals et al., 2011; Roberts and Roberts, 2005; Roberts and Walwyn, 2013). Special methods are needed for analysis and sample size estimation for these studies, as detailed below and in the IRGT sample size calculator.

Features and Uses

Groups or Common Interventionist or Facilitator

An IRGT trial is a randomized trial in which participants in one or more study conditions receive at least some of their treatment in groups or through a common interventionist or facilitator. This design is common in surgical trials, where each surgeon operates on multiple patients (Cook et al., 2012; Oltean and Gagnier, 2015). It is common in psychotherapy trials, where a therapist may treat multiple patients, either in groups or as individuals (Baldwin et al., 2005; Carlbring et al., 2006; Sterba, 2017). It is also common in a variety of intervention trials addressing health behaviors such as weight loss, smoking cessation, and physical activity, which may include group activities as well as individual activities (Jeffery et al., 2006).

Nested or Hierarchical Design

IRGTs always have a hierarchical structure in the intervention condition. Participants may receive some of their treatment in groups, or they may receive their intervention individually, but through a common interventionist or facilitator, whether in person or through a video or other virtual connection (Lee and Thompson, 2005b). Design and analysis are especially challenging in an IRGT because the design may not have the same hierarchical structure in all conditions. In that case, the analytic model must accommodate a heterogeneous variance-covariance structure, allowing for intraclass correlation (ICC) in the intervention condition, but not in the control condition. For this reason, it is even more important for investigators to rely on an experienced methodologist in developing design and analytic plans for an IRGT.
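
One way to write a heterogeneous variance-covariance structure of this kind is sketched below. It is an illustration under stated assumptions, not the only valid specification: the symbols are introduced here and do not appear in the cited papers as written. Participant i belongs to group j only in the intervention condition, X is the condition indicator, a group-level random effect u enters only when X = 1, and the residual variance is allowed to differ by condition.

```latex
Y_{ij} = \beta_0 + \beta_1 X_{ij} + X_{ij}\,u_j + e_{ij},
\qquad u_j \sim N(0, \sigma_u^2),
\qquad e_{ij} \sim
\begin{cases}
N(0, \sigma_{e1}^2) & \text{if } X_{ij} = 1 \text{ (intervention)} \\
N(0, \sigma_{e0}^2) & \text{if } X_{ij} = 0 \text{ (control)}
\end{cases}

\rho_{\text{intervention}} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_{e1}^2},
\qquad \rho_{\text{control}} = 0
```

Under this sketch the ICC is positive only in the intervention condition, which matches the case described above where the control condition has no grouping structure.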

Appropriate Uses

IRGTs can be employed in a wide variety of settings and populations to address a wide variety of research questions. They are an appropriate design when:

the intervention involves at least one component that is delivered in a group format,
it is necessary to use a limited number of intervention delivery staff, or interventionists or facilitators, so that each one interacts with multiple participants, or
it is necessary to have participants interact with one another in a virtual environment.

Potential for Confounding

IRGTs randomize individuals to study conditions. If the number of individuals randomized is large, confounding is not likely to be a threat to the internal validity of the design. If that number is small, confounding could be a threat, and a priori matching, a priori stratification, or constrained randomization would be useful strategies to protect against it.

Intraclass Correlation (ICC)

The more challenging feature of IRGTs is that participants in at least the intervention condition will interact post-randomization. Even if their groups or virtual networks are constructed using random assignment, those participants will interact with one another directly in their groups or indirectly through their common interventionist or facilitator. This interaction creates the expectation that some level of ICC will develop. The magnitude of the ICC in an IRGT will depend on the type, duration, and intensity of these interactions. That ICC may be negligible at baseline, but it can develop over the course of the trial. With a limited number of groups or interventionists or facilitators, the degrees of freedom (df) available to estimate the ICC, or the component of variance associated with the group or interventionist or facilitator, will be limited. As in group-randomized trials (GRTs), any analysis that ignores the extra variation (or positive ICC) or the limited df will have an inflated type I error rate (Baldwin et al., 2011; Bauer et al., 2008; Candlish et al., 2018; Kahan and Morris, 2013; Lee and Thompson, 2005b; Lee and Thompson, 2005a; Pals et al., 2008; Pals et al., 2011; Roberts and Roberts, 2005; Roberts and Walwyn, 2013).
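
The inflation can be illustrated with a small Monte Carlo sketch. Everything in it is an assumption chosen for illustration: 5 intervention groups of 20 participants, an unclustered control arm of the same total size, total variance fixed at 1, an ICC of 0.10, no true intervention effect, and a naive two-sample z-test that treats every observation as independent. The function names are hypothetical.

```python
import random
from statistics import NormalDist


def mean_and_variance(values):
    """Return the sample mean and the unbiased sample variance."""
    n = len(values)
    m = sum(values) / n
    v = sum((x - m) ** 2 for x in values) / (n - 1)
    return m, v


def naive_test_rejects(groups, group_size, icc, rng, alpha=0.05):
    """Simulate one trial under the null and apply a test that ignores clustering."""
    sd_group = icc ** 0.5          # between-group SD (total variance fixed at 1)
    sd_resid = (1.0 - icc) ** 0.5  # within-group SD
    treated = []
    for _ in range(groups):
        u = rng.gauss(0.0, sd_group)  # effect shared by one group or facilitator
        treated.extend(u + rng.gauss(0.0, sd_resid) for _ in range(group_size))
    control = [rng.gauss(0.0, 1.0) for _ in range(groups * group_size)]
    m_t, v_t = mean_and_variance(treated)
    m_c, v_c = mean_and_variance(control)
    # Naive standard error: assumes all observations are independent.
    se = (v_t / len(treated) + v_c / len(control)) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs((m_t - m_c) / se) > z_crit


def naive_type1_error(groups, group_size, icc, n_sims=2000, seed=1):
    """Share of null trials in which the naive test rejects."""
    rng = random.Random(seed)
    hits = sum(naive_test_rejects(groups, group_size, icc, rng) for _ in range(n_sims))
    return hits / n_sims


rate_clustered = naive_type1_error(groups=5, group_size=20, icc=0.10)
rate_independent = naive_type1_error(groups=5, group_size=20, icc=0.0)
print(f"ICC 0.10: {rate_clustered:.3f}   ICC 0.00: {rate_independent:.3f}")
```

With an ICC of zero the rejection rate should sit near the nominal 0.05, while with a positive ICC it should land noticeably above it. The sketch does not address the limited df problem, which would require a mixed-model analysis.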

Solutions

The recommended solution to these challenges is like the solution proposed for GRTs. It is important to employ a priori matching, a priori stratification, or constrained randomization to balance potential confounders if the number of assignment units is limited, to reflect the hierarchical or partially hierarchical structure of the design in the analytic plan, and to estimate the sample size for the IRGT based on realistic and data-based estimates of the ICC and the other parameters indicated by the analytic plan. Extra variation and limited df always reduce power, so it is essential to consider these factors while the study is being planned, and particularly as part of the sample size estimation.

The sections below provide additional resources for investigators considering an individually randomized group-treatment trial, or partially clustered design.

FAQs

When do I need to use an IRGT?

Use an IRGT if you can randomize individual participants but need to deliver the intervention in groups or through a common interventionist or facilitator either for pedagogic or practical reasons. If you can randomize individual participants and can deliver the intervention to participants one at a time without going through a common interventionist or facilitator, it is more efficient and easier to use a traditional RCT.

What are some important references on the design and analysis of IRGTs?

There are no textbooks on IRGTs, but there are several papers (Baldwin et al., 2011; Lohr et al., 2014; Moerbeek, 2020; Murray et al., 2020; Pals et al., 2008; Roberts et al., 2016; Roberts and Roberts, 2005). There are also sources for information on power and sample size in IRGTs (Heo et al., 2017; Moerbeek, 2020; Moerbeek and Teerenstra, 2016).

Most references use z-scores in calculating power or sample size for parallel GRTs and IRGTs, but others use t-scores. Which one should be used?

The most accurate result will be available with t-scores. For studies in which the number of units randomized to conditions is 50 or more, z-scores will work well. As the number of randomization units decreases, the df available for the test of the intervention effect also decrease, and the difference between z-scores and t-scores increases.
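
The gap between the two critical values can be tabulated with the sketch below. It uses only the Python standard library, so the t quantile is obtained by numerically integrating the t density and bisecting, which is an implementation choice made here for self-containment rather than a recommended method. The df values in the loop are hypothetical examples.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(p, df):
    """Quantile for p > 0.5, found by bisection on the CDF."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_crit = NormalDist().inv_cdf(0.975)  # two-sided alpha = 0.05
for df in (4, 10, 20, 48, 100):       # hypothetical df for the intervention test
    t_crit = t_critical(0.975, df)
    print(f"df={df:>3}  t={t_crit:.3f}  z={z_crit:.3f}  ratio={t_crit / z_crit:.3f}")
```

The ratio column shows how much larger the t-based critical value is than the z-based one at each df, and it shrinks toward 1 as df grows.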

In longer trials, it is common for participants to change groups over time. Is this a problem?

It is important to distinguish between changing study conditions or study arms and changing groups or clusters. In a parallel GRT or IRGT, it is important to ensure that each participant remains in the study condition to which they were randomized. Those assigned to the intervention condition should not move to the control condition, and vice versa. Sometimes that is unavoidable, but it should be uncommon. If it does happen, standard practice is to analyze as randomized, under the intention-to-treat principle.

The other possibility is that a participant in a GRT or IRGT would change groups or clusters even as they stay in the same study condition or study arm. In a school-based trial, a participant from one intervention school might move to another intervention school. Or in an IRGT, a participant who usually went to the Tuesday night class might sometimes go to the Saturday morning class. Recent studies have shown that failure to account for changing group membership can result in an inflated type I error rate (Andridge et al., 2014). Several authors provide methods for analyzing data to account for such changes (Candlish et al., 2018; Luo et al., 2015; Roberts and Walwyn, 2013; Sterba, 2017).

What is the impact of variation in the size of the groups or clusters that are randomized, or through which participants receive their intervention?

Standard sources assume that each group or cluster has the same number of observations, but that is almost never true in practice. So long as the coefficient of variation (CV) of group size is less than 0.23, such variation can be ignored (Eldridge et al., 2006). But as the variation grows more marked, analysts risk an inflated type I error rate if they ignore it (Johnson et al., 2015).
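
A quick check against that threshold can be sketched as follows. The group sizes are hypothetical, and the CV is computed here as the sample standard deviation divided by the mean, which is an assumption about how the threshold is meant to be applied.

```python
from statistics import mean, stdev

CV_THRESHOLD = 0.23  # cutoff attributed in the text to Eldridge et al., 2006


def group_size_cv(sizes):
    """Coefficient of variation of group size: sample SD divided by the mean."""
    return stdev(sizes) / mean(sizes)


balanced = [8, 10, 12, 9, 11, 10]   # hypothetical, fairly even group sizes
uneven = [4, 6, 10, 15, 20, 5]      # hypothetical, markedly uneven group sizes

for label, sizes in (("balanced", balanced), ("uneven", uneven)):
    cv = group_size_cv(sizes)
    verdict = "can be ignored" if cv < CV_THRESHOLD else "should be accounted for"
    print(f"{label}: CV = {cv:.2f}, variation {verdict}")
```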

In addition, power falls as the variation in group or cluster size increases, so that it needs to be addressed in the sample size calculations. There are a number of publications on this issue for GRTs (Candel and van Breukelen, 2010; Candel and van Breukelen, 2016; Hemming et al., 2020; Lauer et al., 2015; Liu et al., 2021; Moerbeek and Teerenstra, 2016; van Breukelen et al., 2007; Wang et al., 2020; Xu et al., 2019; You et al., 2011). There are also a few publications on this issue for IRGTs (Candel and van Breukelen, 2009; Moerbeek and Teerenstra, 2016).

If I randomize individuals to conditions, but they receive their treatments in small groups led by a trained instructor or facilitators, how many such small groups do I need?

This will depend on several factors, including the expected ICC that reflects the average correlation among observations taken on members of the same small group, and the number of participants in each group. There is no one answer that will be correct for all studies. The best approach is to perform a power analysis, or calculate sample size, using methods appropriate to IRGTs. There are several papers that address this question (Hemming et al., 2020; Moerbeek and Teerenstra, 2016; Pals et al., 2008).
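
As a rough illustration of how these factors combine, the sketch below approximates power for a two-arm IRGT with clustering only in the intervention arm. It rests on several simplifying assumptions that may not hold in a real study: a continuous outcome, equal total variance in both arms, equal group sizes, a control arm as large as the intervention arm, and a z-based test. With few groups a t-based calculation would be more accurate and would give lower power. The function and parameter names are hypothetical, and the sketch does not reproduce the IRGT sample size calculator.

```python
from statistics import NormalDist


def irgt_power(delta, sd, icc, groups, group_size, n_control, alpha=0.05):
    """Approximate power when only the intervention arm is clustered."""
    # Variance of the intervention-arm mean, inflated by the design effect.
    var_trt = sd ** 2 * (1 + (group_size - 1) * icc) / (groups * group_size)
    # Variance of the control-arm mean, with independent observations.
    var_ctl = sd ** 2 / n_control
    se = (var_trt + var_ctl) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(delta) / se - z_crit)


def groups_needed(delta, sd, icc, group_size, target=0.80, max_groups=1000):
    """Smallest number of intervention groups reaching the target power."""
    for g in range(2, max_groups + 1):
        n_control = g * group_size  # assumes 1:1 allocation of individuals
        if irgt_power(delta, sd, icc, g, group_size, n_control) >= target:
            return g
    return None  # target not reached within max_groups


# Hypothetical planning values: effect of 0.4 SD, groups of 10.
for icc in (0.0, 0.01, 0.05, 0.10):
    g = groups_needed(delta=0.4, sd=1.0, icc=icc, group_size=10)
    print(f"ICC={icc:.2f}: {g} groups of 10 in the intervention arm")
```

The required number of groups rises with the ICC, which is why a realistic, data-based ICC estimate matters for planning.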

If I use the same trained instructor or facilitator for all the groups in the intervention condition, won’t that improve fidelity of implementation? What is wrong with that approach?

It is true that this approach will improve the fidelity of implementation. But the problem is that this approach completely confounds the instructor/facilitator with the study condition: everyone who gets the intervention gets exposed to that instructor/facilitator. In that situation, it is impossible to separate the effect of the intervention from the effect of the instructor/facilitator. It is possible that a charismatic instructor/facilitator could generate beneficial effects on the outcome of interest, even if the intervention itself is completely ineffective, and the investigator would not be able to distinguish those two effects. It is better to use an IRGT design in which multiple instructors/facilitators are used in the intervention condition so that variability due to instructors/facilitators can be separated from variability due to the intervention.

Is there any way to avoid having to include the groups in the analysis as a random effect?

In a parallel GRT, the groups are the units of assignment and are nested within study conditions, with different groups in each condition. In an IRGT, the groups are created in the intervention condition to facilitate delivery of the intervention; those groups may be defined by their instructor or facilitator, surgeon, therapist, or other interventionist, or they may be virtual groups. So long as the groups are nested within study conditions, they must be included in the analysis as levels of a random effect; ignoring them, or including them as levels of a fixed effect, will result in an inflated type I error rate. That is true for GRTs (Campbell and Walters, 2014; Donner and Klar, 2000; Eldridge and Kerry, 2012; Hayes and Moulton, 2017; Murray, 1998) and for IRGTs (Baldwin et al., 2011; Bauer et al., 2008; Candlish et al., 2018; Kahan and Morris, 2013; Lee and Thompson, 2005a; Lee and Thompson, 2005b; Pals et al., 2008; Pals et al., 2011; Roberts and Roberts, 2005; Roberts and Walwyn, 2013). This is because nested factors must be modeled as random effects (Zucker, 1990).

This explanation also offers a potential solution: if the investigator can avoid nesting groups within study conditions, the requirement to model those groups as levels of a random effect disappears. The alternative to nesting is crossing, so if it is possible to cross the levels of the grouping factor with study conditions, then the grouping factor becomes a stratification factor and the investigator is free to model the grouping factor as a random effect, as a fixed effect, or to ignore the grouping factor in the analysis.

For example, if schools are randomized to study conditions, the study is a GRT. But if students within schools are randomized to study conditions, the schools will be crossed with study conditions and we have a stratified RCT; the investigator can model the schools as a random effect, as a fixed effect, or ignore it in the analysis. As another example, if the therapists used to deliver the intervention in an IRGT also deliver an alternative intervention in the control condition, the therapists will be crossed with study condition and the investigator can model therapist as a random effect, as a fixed effect, or ignore therapist in the analysis. In either example, the choice between modeling the grouping factor as random, as fixed, or ignoring it will depend on factors like power and generalizability.

Many studies seem to pick an ICC value arbitrarily for use in their power or sample size calculations. What criteria should be used for selecting an ICC for such calculations?

The best estimate for the ICC will reflect the circumstances for the trial being planned. That estimate will be from the same target population, so that it reflects the appropriate groups or clusters (e.g., schools vs. clinics vs. worksites vs. communities); age groups (e.g., youth vs. young adults vs. seniors); ethnic, racial, and gender diversity; and other characteristics of the target population. That estimate will derive from data collected for the same outcome using the same measurement methods to be used for the primary outcome in the trial being planned. For example, if planning a trial to improve servings of fruits and vegetables in inner-city third graders, it would be important to get an ICC estimate for servings of fruits and vegetables, measured in the same way as servings would be measured in the trial being planned, from third-graders in inner-city schools like the schools that would be recruited for the trial being planned.

Can regression adjustment for covariates improve power in a parallel GRT or IRGT?

Regression adjustment for covariates often improves power in a GRT or IRGT by reducing the residual error variance or the ICC (Murray and Blitstein, 2003). At the same time, it is important to remember that regression adjustment for covariates can reduce power in a GRT or IRGT by increasing the ICC (Murray, 1998). As such, it is important to choose covariates carefully. The best covariates will be related to the outcome and unevenly distributed between the study conditions or among the groups or clusters randomized to the study conditions.

Many people say that if you match or stratify a priori, you must use a matched or stratified analysis. Is this true for parallel GRTs and IRGTs?

No. When a priori matching or stratification is used for balance, the matching or stratification factor may be included in the analysis of intervention effects, but that is not required, and it may be inefficient to do so. It is not required because the type 1 error rate is unaffected when the matching or stratification factor is ignored in the analysis of intervention effects (Diehr et al., 1995; Proschan, 1996). Both procedures reduce the df available for the test of the intervention effect, and if the number of df is limited, the unmatched or unstratified analysis may be more powerful than the matched or stratified analysis. In that circumstance, it is to the investigator’s advantage to match or stratify in the design to achieve balance on potential confounders, but to ignore the matching or stratification in the analysis to improve power or reduce sample size (Diehr et al., 1995). The choice of whether to include the matching or stratification factor in the analysis should be made a priori based on sample size calculations comparing the matched or stratified analysis to the unmatched or unstratified analysis.

The choice between a priori matching and a priori stratification for balance should be guided by whether the investigator anticipates doing analyses that do not involve intervention effects. Donner et al. (2007) have warned against ignoring matching in analyses that do not involve intervention effects, e.g., in an analysis to examine the association between a risk factor and an outcome. Ignoring matching in the analysis in this situation can lead to an inflated type I error rate when the correlation between the matching factor and either the outcome or the risk factor is at least modest (>0.2) and the number of members per group is not large (<100). Stratification with strata of size four avoids this problem and improves efficiency almost as much as matching. For this reason, stratification with strata of size four is a prudent strategy for balancing potential confounders across study conditions because it is almost as efficient as matching, and it does not limit the range of analyses that can be applied to the data.
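
A minimal sketch of stratified randomization with strata of size four follows. The unit names and baseline values are hypothetical, and forming strata by sorting on a single baseline covariate is one simple choice made here for illustration, not a procedure taken from the cited sources.

```python
import random


def stratified_assignment(baseline, seed=None, stratum_size=4):
    """Rank units on a baseline covariate, form strata, randomize half of each to each arm.

    baseline maps a unit label to its baseline covariate value.
    """
    if stratum_size % 2 or len(baseline) % stratum_size:
        raise ValueError("stratum size must be even and divide the number of units")
    rng = random.Random(seed)
    ordered = sorted(baseline, key=baseline.get)  # units ranked by covariate
    assignment = {}
    for start in range(0, len(ordered), stratum_size):
        stratum = ordered[start:start + stratum_size]
        arms = ["intervention", "control"] * (stratum_size // 2)
        rng.shuffle(arms)  # random allocation within the stratum
        for unit, arm in zip(stratum, arms):
            assignment[unit] = (start // stratum_size, arm)
    return assignment


# Hypothetical units with a baseline outcome rate.
baseline = {"A": 0.31, "B": 0.12, "C": 0.45, "D": 0.27,
            "E": 0.18, "F": 0.39, "G": 0.22, "H": 0.50}
plan = stratified_assignment(baseline, seed=2024)
for unit, (stratum, arm) in sorted(plan.items(), key=lambda kv: kv[1]):
    print(f"stratum {stratum}: unit {unit} -> {arm}")
```

Each stratum of four contributes two units to each condition, so the conditions are balanced on the stratifying covariate by construction.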

Many studies randomize individuals but deliver their treatment in small groups, or through a common interventionist or facilitator. Most ignore those design features in their power & sample size calculations & in their analysis. Why is that a problem?

These studies are individually randomized group-treatment trials (IRGTs), sometimes called partially clustered designs. While these trials are common, most investigators do not recognize the implications of this design. IRGTs always have a hierarchical structure in the intervention condition. Participants may receive some of their treatment in groups, or they may receive their intervention individually, but through a common interventionist or facilitator, whether in person or through a video or other virtual connection (Lee and Thompson, 2005b). There may or may not be a similar structure in the control condition, depending on the nature of the control condition. Whether it exists in one or both study conditions, the hierarchical structure requires that the positive ICC expected in the data be accounted for in the sample size estimation and in the data analysis. Any analysis that ignores the positive ICC or what may be limited df will have an inflated type I error rate (Baldwin et al., 2011; Bauer et al., 2008; Candlish et al., 2018; Kahan and Morris, 2013; Lee and Thompson, 2005b; Lee and Thompson, 2005a; Pals et al., 2008; Pals et al., 2011; Roberts and Roberts, 2005; Roberts and Walwyn, 2013).

The recommended solution to these challenges is like the solution recommended for GRTs. It is important to employ a priori matching or stratification to balance potential confounders if the number of assignment units is limited, to reflect the hierarchical or partially hierarchical structure of the design in the analytic plan, and to estimate the sample size for the IRGT based on realistic and data-based estimates of the ICC and the other parameters indicated by the analytic plan. Extra variation and limited df always reduce power, so it is essential to consider these factors while the study is being planned, and particularly as part of the sample size estimation.

In public health and medicine, ICCs in group- or cluster-randomized trials are often small, usually ranging from 0.01 to 0.05 (Moerbeek and Teerenstra, 2016). While it is tempting to ignore such small correlations, doing so risks an inflated type I error rate, and the risk is substantial both in parallel GRTs (Campbell and Walters, 2014; Donner and Klar, 2000; Eldridge and Kerry, 2012; Hayes and Moulton, 2017; Murray, 1998) and in IRGTs (Baldwin et al., 2011; Bauer et al., 2008; Candlish et al., 2018; Kahan and Morris, 2013; Lee and Thompson, 2005b; Lee and Thompson, 2005a; Pals et al., 2011; Pals et al., 2008; Roberts and Roberts, 2005; Roberts and Walwyn, 2013). The prudent course is to reflect all nested factors as random effects and to plan the study to have sufficient power given a proper analysis.
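
The reason small ICCs still matter can be seen from the standard design effect, 1 + (m - 1) * ICC, which gives the factor by which the variance of a condition mean is inflated when groups have m members each. The sketch below evaluates it over the ICC range quoted above; the group sizes are hypothetical, and equal group sizes are assumed.

```python
def design_effect(group_size, icc):
    """Variance inflation for a mean based on equal-sized groups."""
    return 1 + (group_size - 1) * icc


for group_size in (10, 50, 200):      # hypothetical members per group
    for icc in (0.01, 0.05):          # ICC range quoted in the text
        deff = design_effect(group_size, icc)
        print(f"m={group_size:>3}  ICC={icc:.2f}  design effect={deff:.2f}")
```

Because the ICC is multiplied by the group size minus one, even an ICC of 0.01 produces substantial variance inflation once groups are large.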

Testing whether the ICC or variance component is negligible, and ignoring it if so, is another tempting strategy that can risk an inflated type I error rate. The standard error for the variance component is not well estimated when the value is close to zero, and if the df are limited, the power of that test will be limited. As such, it is likely that the result will suggest that the ICC or variance component is negligible, even when ignoring it will inflate the type I error rate. The prudent course is to reflect all nested factors as random effects and to plan the study to have sufficient power given a proper analysis.

CONSORT Statement

Boutron I, Altman DG, Moher D, Schulz KF, Ravaud P, Consort NPT Group. CONSORT Statement for Randomized Trials of Nonpharmacologic Treatments: A 2017 update and a CONSORT Extension for Nonpharmacologic Trial Abstracts. Ann Intern Med. 2017;167(1):40-47. Epub 2017/06/20.

PMID: 28630973.

Key References

IRGTs

Wang X, Turner EL, Li F. Designing individually randomized group treatment trials with repeated outcome measurements using generalized estimating equations. Stat Med. 2024;43(2):358-78. Epub 2023/11/27.

PMID: 38009329.

Lange KM, Kasza J, Sullivan TR, Yelland LN. Partially clustered designs for clinical trials: Unifying existing designs using consistent terminology. Clin Trials. 2023;20(2):99-110. Epub 2023/01/10.

PMID: 36628406.

Brown H, Hedeker D, Gibbons RD, Duan N, Almirall D, Gallo C, et al. Accounting for Context in Randomized Trials after Assignment. Prev Sci. 2022;23(8):1321-32. Epub 2022/09/09.

PMID: 36083435.

Roberts C. The implications of noncompliance for randomized trials with partial nesting due to group treatment. Stat Med. 2021;40(2):349-368. Epub 2020/10/28.

PMID: 33118193.

Moerbeek M. Optimal designs for group randomized trials and group administered treatments with outcomes at the subject and group level. Stat Methods Med Res. 2020;29(3):797-810. Epub 2019/05/01.

PMID: 31041883.

Candlish J, Teare MD, Dimairo M, Flight L, Mandefield L, Walters SJ. Appropriate statistical methods for analysing partially nested randomised controlled trials with continuous outcomes: a simulation study. BMC Med Res Methodol. 2018;18(1):105. Epub 2018/10/11.

PMID: 30314463.

Hooper R, Forbes AB, Hemming K, Takeda A, Beresford L. Analysis of cluster randomised trials with an assessment of outcome at baseline. BMJ. 2018;360:k1121. Epub 2018/03/20.

PMID: 29559436.

Roberts C, Roberts SA. Design and analysis of clinical trials with clustering effects due to treatment. Clinical Trials. 2005;2(2):152-162.

PMID: 16279137.

Sterba SK. Partially nested designs in psychotherapy trials: A review of modeling developments. Psychother Res. 2017;27(4):425-436. Epub 2015/12/19.

PMID: 26686878.

Turner EL, Li F, Gallis JA, Prague M, Murray DM. Review of recent methodological developments in group-randomized trials: Part 1-Design. Am J Public Health. 2017a;107(6):907-915. Epub 2017/04/20.

PMID: 28426295.

Turner EL, Prague M, Gallis JA, Li F, Murray DM. Review of recent methodological developments in group-randomized trials: Part 2-Analysis. Am J Public Health. 2017b;107(7):1078-1086. Epub 2017/05/18.

PMID: 28520480.

Lai MH, Kwok OM. Estimating standardized effect sizes for two- and three-level partially nested data. Multivariate Behav Res. 2016;51(6):740-756. Epub 2016/11/01.

PMID: 27802077.

Hedges LV, Citkowicz M. Estimating effect size when there is clustering in one treatment group. Behav Res Methods. 2015;47(4):1295-1308. Epub 2014/11/27.

PMID: 25425393.

Andridge RR, Shoben AB, Muller KE, Murray DM. Analytic methods for individually randomized group treatment trials and group-randomized trials when subjects belong to multiple groups. Statistics in Medicine. 2014;33(13):2178-90. Epub 2014/01/08.

PMID: 24399701.

Kahan BC, Morris TP. Assessing potential sources of clustering in individually randomised trials. BMC Med Res Methodol. 2013;13:58. Epub 2013/04/18.

PMID: 23590245.

Baldwin SA, Bauer DJ, Stice E, Rohde P. Evaluating models for partially clustered designs. Psychological Methods. 2011;16(2):149-65. Epub 2011/04/27.

PMID: 21517179.

Pals SL, Murray DM, Alfano CM, Shadish WR, Hannan PJ, Baker WL. Individually randomized group treatment trials: A critical appraisal of frequently used design and analytic approaches. American Journal of Public Health. 2008;98(8):1418-1424. Epub 2008/06/12. Erratum.

PMID: 18556603.

Hoover DR. Clinical trials of behavioral interventions with heterogeneous teaching subgroup effects. Statistics in Medicine. 2002;21(10):1351-1364.

PMID: 12185889.

State of the Practice Reviews for IRGTs

Oltean H, Gagnier JJ. Use of clustering analysis in randomized controlled trials in orthopaedic surgery. BMC Med Res Methodol. 2015;15:17. Epub 2015/04/19.

PMID: 25887529.

Pals SL, Wiegand RE, Murray DM. Ignoring the group in group-level HIV/AIDS intervention trials: A review of reported design and analytic methods. AIDS. 2011;25(7):989-96. Epub 2011/04/14.

PMID: 21487252.

Lee KJ, Thompson SG. Clustering by health professional in individually randomised trials. BMJ. 2005a;330(7483):142-4. Epub 2005/01/15.

PMID: 15649931.

Sample Size Estimation for IRGTs

Teerenstra S, Kasza J, Leontjevas R, Forbes AB. Sample size for partially nested designs and other nested or crossed designs with a continuous outcome when adjusted for baseline. Stat Med. 2023;42(19):3568-92. Epub 2023/06/22.

PMID: 37348855.

Hemming K, Kasza J, Hooper R, Forbes AB, Taljaard M. A tutorial on sample size calculation for multiple-period cluster randomized parallel, cross-over and stepped-wedge trials using the Shiny CRT Calculator. Int J Epidemiol. 2020;49(3):979-995. Epub 2020/02/22.

PMID: 32087011.

Heo M, Litwin AH, Blackstock O, Kim N, Arnsten JH. Sample size determinations for group-based randomized clinical trials with different levels of data hierarchy between experimental and control arms. Stat Methods Med Res. 2017;26(1):399-413. Epub 2016/07/11.

PMID: 25125453.

Moerbeek M, Teerenstra S. Power analysis of trials with multilevel data. Boca Raton: CRC Press; 2016.

Roberts C, Walwyn R. Design and analysis of non-pharmacological treatment trials with multiple therapists per patient. Statistics in Medicine. 2013;32(1):81-98. Epub 2012/08/02.

PMID: 22865729.

Candel MJ, van Breukelen GJ. Varying cluster sizes in trials with clusters in one treatment arm: Sample size adjustments when testing treatment effects with linear mixed models. Statistics in Medicine. 2009;28(18):2307-24. Epub 2009/05/01.

PMID: 19472169.

Moerbeek M, Wong WK. Sample size formulae for trials comparing group and individual treatments in a multilevel model. Statistics in Medicine. 2008;27(15):2850-64. Epub 2007/10/26.

PMID: 17960589.

Roberts C, Roberts SA. Design and analysis of clinical trials with clustering effects due to treatment. Clinical Trials. 2005;2(2):152-162.

PMID: 16279137.
