In an individually randomized group-treatment (IRGT) trial, also called a partially clustered or partially nested design, individuals are randomized to study conditions but receive at least some of their intervention with other participants or through an interventionist or facilitator shared with other participants.

Special methods are required for analysis and sample size estimation for these studies, and investigators will need to show that their methods are appropriate.

### Features and Uses

#### Groups or Common Interventionist or Facilitator

An IRGT trial is a randomized trial in which participants in one or more study conditions receive at least some of their treatment in groups or through a common interventionist or facilitator. This design is common in surgical trials, where each surgeon operates on multiple patients.

It is also common in psychotherapy trials, where a therapist may treat multiple patients, either in groups or individually. It is common as well in a variety of intervention trials addressing health behaviors such as weight loss, smoking cessation, and physical activity, which may include group activities as well as individual activities.

#### Nested or Hierarchical Design

IRGTs always have a hierarchical structure in the intervention condition. Participants may receive some of their treatment in groups, or they may receive their intervention individually but through a common interventionist or facilitator, whether in person or through a video or other virtual connection. A hierarchical structure complicates both the analysis and the sample size estimation, and these issues are especially challenging in an IRGT because the design may not have the same hierarchical structure in all conditions. In that case, the analytic model must accommodate a heterogeneous variance-covariance structure, allowing for intraclass correlation (ICC) in the intervention condition but not in the control condition. For this reason, it is even more important for investigators to rely on an experienced methodologist in developing design and analytic plans for an IRGT.

#### Appropriate Uses

IRGTs can be employed in a wide variety of settings and populations to address a wide variety of research questions. They are an appropriate design when the investigator wants to evaluate an intervention that:

- involves at least one component that is delivered in a group format,
- requires a limited number of intervention delivery staff (interventionists or facilitators), so that each one interacts with multiple participants, or
- requires participants to interact with one another in a virtual environment.

#### Potential for Confounding

IRGTs randomize individuals to study conditions. If the number of individuals randomized is large, confounding is not likely to be a threat to the internal validity of the design. If the number is small, confounding could be a threat, and *a priori* matching, *a priori* stratification, or constrained randomization would be useful strategies to protect against confounding.
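As a rough illustration of constrained randomization, the sketch below enumerates every way to split a small set of units into two equal arms, scores each split by the absolute difference in arm means on one baseline covariate, keeps the best-balanced fraction of splits, and draws one allocation at random from that constrained set. The function name `constrained_allocation`, the single-covariate balance score, the 10% cutoff, and the example values are all hypothetical choices for illustration; real applications usually balance several covariates with a standardized score, and full enumeration is only feasible for small numbers of units.

```python
import itertools
import random
from statistics import mean


def constrained_allocation(covariate, keep_fraction=0.1, seed=None):
    """Constrained randomization into two equal-sized arms (illustrative sketch).

    covariate: one baseline value per unit to be randomized.
    keep_fraction: share of the best-balanced allocations to choose among.
    Returns (arm_a_indices, arm_b_indices).
    """
    n = len(covariate)
    if n % 2:
        raise ValueError("this sketch assumes an even number of units")
    units = range(n)
    scored = []
    # Enumerate every split; each subset appears once as arm A and once as
    # the complement, so which arm is "intervention" is also randomized.
    for arm_a in itertools.combinations(units, n // 2):
        arm_b = tuple(i for i in units if i not in arm_a)
        imbalance = abs(mean(covariate[i] for i in arm_a) - mean(covariate[i] for i in arm_b))
        scored.append((imbalance, arm_a, arm_b))
    # Restrict to the best-balanced allocations, then pick one at random so the
    # final assignment is still random within the constrained space.
    scored.sort(key=lambda s: s[0])
    n_keep = max(1, int(len(scored) * keep_fraction))
    rng = random.Random(seed)
    _, arm_a, arm_b = rng.choice(scored[:n_keep])
    return list(arm_a), list(arm_b)


if __name__ == "__main__":
    # Hypothetical baseline values (e.g., BMI) for eight units.
    baseline = [31.2, 28.4, 35.0, 29.9, 33.1, 27.5, 30.8, 34.2]
    intervention, control = constrained_allocation(baseline, keep_fraction=0.1, seed=1)
    print("intervention:", intervention, "control:", control)
```

Keeping a fraction of allocations rather than only the single best one preserves enough randomness for the allocation to remain a genuine random assignment.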

#### Intraclass Correlation (ICC)

The more challenging feature of IRGTs is that participants in at least the intervention condition will interact after randomization. Even if their groups or virtual networks are constructed using random assignment, those participants will interact with one another directly in their groups or indirectly through their common interventionist or facilitator. This interaction creates the expectation that some level of ICC will develop. The magnitude of the ICC in an IRGT will depend on the type, duration, and intensity of these interactions. The ICC may be negligible at baseline, but it can develop over the course of the trial. With a limited number of groups or interventionists or facilitators, the degrees of freedom (df) available to estimate the ICC, or the component of variance associated with the group or interventionist or facilitator, will be limited. As in group-randomized trials (GRTs), any analysis that ignores the extra variation (positive ICC) or the limited df will have an inflated type I error rate.

#### Solutions

The recommended solution to these challenges is similar to the solution proposed for GRTs: use *a priori* matching, *a priori* stratification, or constrained randomization to balance potential confounders if the number of assignment units is limited; reflect the hierarchical or partially hierarchical structure of the design in the analytic plan; and estimate the sample size for the IRGT from realistic, data-based estimates of the ICC and the other parameters indicated by the analytic plan. Extra variation and limited df always reduce power, so it is essential to consider these factors while the study is being planned, particularly as part of the sample size estimation.
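To make the sample size point concrete, here is a minimal sketch of an approximate power calculation for the simplest partially clustered case, assuming a continuous outcome with total variance σ², g intervention groups of size m with ICC ρ, and n_c independent control participants. Under those assumptions the variance of the difference in arm means is approximately σ²[1 + (m − 1)ρ]/(g·m) + σ²/n_c. The optional `cv` argument applies the approximate adjustment for unequal group sizes used for cluster trials by Eldridge, Ashby and Kerry (2006), 1 + [(cv² + 1)m − 1]ρ. The function name and example inputs are hypothetical, and the normal (z) approximation ignores the limited df, so it will overstate power when the number of groups is small.

```python
from math import sqrt
from statistics import NormalDist


def irgt_power(delta, sigma, icc, m, g, n_control, alpha=0.05, cv=0.0):
    """Approximate power for a two-arm IRGT with clustering in one arm only.

    delta: difference in means to detect; sigma: total outcome SD.
    icc: intraclass correlation among members of an intervention group.
    m: mean group size; g: number of intervention groups.
    n_control: number of (unclustered) control participants.
    cv: coefficient of variation of group size (0 means equal sizes).
    Uses a normal approximation and ignores the limited df for the ICC.
    """
    # Design effect for the intervention arm (reduces to 1 + (m - 1) * icc when cv = 0).
    design_effect = 1 + ((cv ** 2 + 1) * m - 1) * icc
    var_intervention = sigma ** 2 * design_effect / (g * m)
    var_control = sigma ** 2 / n_control
    se = sqrt(var_intervention + var_control)
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Two-sided power; the second term (wrong-direction rejection) is usually negligible.
    return z.cdf(delta / se - z_crit) + z.cdf(-delta / se - z_crit)


if __name__ == "__main__":
    # Hypothetical design: 10 groups of 10 in the intervention arm, 100 controls.
    for icc in (0.0, 0.01, 0.05):
        print(icc, round(irgt_power(0.4, 1.0, icc, m=10, g=10, n_control=100), 3))
```

Even a modest ICC visibly lowers power for the same number of participants, which is why the ICC estimate belongs in the sample size calculation from the start.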

The sections below provide additional resources for investigators considering an individually randomized group-treatment trial, or partially clustered design.

Use an IRGT if you can randomize individual participants but need to deliver the intervention in groups or through a common interventionist or facilitator, for either pedagogical or practical reasons. If you can randomize individual participants and can deliver the intervention to them one at a time without going through a common interventionist or facilitator, a traditional RCT is easier and more efficient.

There are no textbooks on IRGTs, but there are several papers. There are also sources for information on power and sample size in IRGTs.

For power and sample size calculations, the most accurate results come from t-scores. For studies in which the number of units randomized to conditions is 50 or more, z-scores will work well. As the number of randomization units decreases, the df available for the test of the intervention effect also decrease, and the difference between z-scores and t-scores increases.
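The gap between z- and t-based critical values can be checked directly. The sketch below computes the two-sided 5% critical value of Student's t by integrating its density numerically (Simpson's rule) and solving for the quantile by bisection, using only the Python standard library; where SciPy is available, `scipy.stats.t.ppf` does the same job. The helper names and the df values in the usage loop are illustrative only.

```python
from math import exp, lgamma, log, pi
from statistics import NormalDist


def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: Simpson's rule on [0, x], plus 0.5 by symmetry."""
    # Log of the t density's normalizing constant, computed once.
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)

    def pdf(u):
        return exp(log_c - (df + 1) / 2 * log(1 + u * u / df))

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def t_crit(df, alpha=0.05):
    """Two-sided critical value: solve t_cdf(x, df) = 1 - alpha/2 by bisection."""
    lo, hi = 0.0, 50.0
    target = 1 - alpha / 2
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    z_crit = NormalDist().inv_cdf(0.975)
    for df in (5, 10, 20, 50, 120):
        print(f"df={df:>3}: t={t_crit(df):.3f}  z={z_crit:.3f}")
```

With few df the t critical value is noticeably larger than 1.96, so a z-based calculation understates the required sample size; by 50 or more df the two are close.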

It is important to distinguish between changing study conditions or study arms and changing groups or clusters. In a parallel GRT or IRGT, it is important to ensure that each participant remains in the study condition to which they were randomized. Those assigned to the intervention condition should not move to the control condition, and vice versa. Sometimes that is unavoidable, but it should be uncommon. If it does happen, standard practice is to analyze as randomized, under the intention-to-treat principle.

The other possibility is that a participant in a GRT or IRGT might change groups or clusters while staying in the same study condition or study arm. In a school-based trial, a participant from one intervention school might move to another intervention school. Or in an IRGT, a participant who usually attends the Tuesday night class might sometimes go to the Saturday morning class. Recent studies have shown that failure to account for changing group membership can result in an inflated type I error rate. Several authors provide methods for analyzing data to account for such changes.

Standard sources assume that each group or cluster has the same number of observations, but that is almost never true in practice. So long as the ratio of the largest to the smallest group is no worse than about 2:1, such variation can be ignored. But as the variation grows more marked, analysts who ignore it risk an inflated type I error rate. In addition, power falls as the variation in group or cluster size increases, so it needs to be addressed in the sample size calculations. There are a number of publications on this issue for GRTs, and a few for IRGTs.

The number of groups and participants needed will depend on several factors, including the expected ICC, which reflects the average correlation among observations taken on members of the same small group, and the number of participants in each group. There is no one answer that will be correct for all studies. The best approach is to perform a power analysis, or calculate sample size, using methods appropriate to IRGTs. There are several papers that address this question.

It is true that having a single instructor/facilitator deliver the intervention to every participant in the intervention condition will improve the fidelity of implementation. The problem is that this approach completely confounds the instructor/facilitator with the study condition: everyone who gets the intervention is exposed to that instructor/facilitator. In that situation, it is impossible to separate the effect of the intervention from the effect of the instructor/facilitator. A charismatic instructor/facilitator could generate beneficial effects on the outcome of interest even if the intervention itself is completely ineffective, and the investigator would not be able to distinguish those two effects. It is better to use an IRGT design in which multiple instructors/facilitators are used in the intervention condition so that variability due to instructors/facilitators can be separated from variability due to the intervention.

In a parallel GRT, the groups are the units of assignment and are nested within study conditions, with different groups in each condition. In an IRGT, the groups are created in the intervention condition to facilitate delivery of the intervention; those groups may be defined by their instructor or facilitator, surgeon, therapist, or other interventionist, or they may be virtual groups. So long as the groups are nested within study conditions, they must be included in the analysis as levels of a random effect; ignoring them, or including them as levels of a fixed effect, will result in an inflated type I error rate. That is true for both GRTs and IRGTs, and it follows from the general principle that nested factors must be modeled as random effects.

This explanation also suggests a potential solution: if the investigator can avoid nesting groups within study conditions, the requirement to model those groups as levels of a random effect disappears. The alternative to nesting is crossing, so if it is possible to cross the levels of the grouping factor with study conditions, the grouping factor becomes a stratification factor, and the investigator is free to model it as a random effect or as a fixed effect, or to ignore it in the analysis.

For example, if schools are randomized to study conditions, the study is a GRT. But if students within schools are randomized to study conditions, the schools will be crossed with study conditions and we have a stratified RCT; the investigator can model the schools as a random effect or as a fixed effect, or ignore them in the analysis. As another example, if the therapists used to deliver the intervention in an IRGT also deliver an alternative intervention in the control condition, the therapists will be crossed with study condition and the investigator can model therapist as a random effect or as a fixed effect, or ignore therapist in the analysis. In either example, the choice between modeling the grouping factor as random, modeling it as fixed, or ignoring it will depend on factors like power and generalizability.
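Whether a grouping factor is nested in or crossed with study condition can be read directly off the assignment records. The sketch below assumes one (group, condition) pair per participant; the function name and labels are hypothetical. It also flags a "partial" case, in which some groups span conditions and others do not, for closer review, since that situation is not covered by the simple nested/crossed distinction above.

```python
from collections import defaultdict


def nesting_structure(records):
    """Classify a grouping factor relative to study condition.

    records: iterable of (group_id, condition) pairs, one per participant.
    Returns "nested" if every group appears in exactly one condition,
    "crossed" if every group appears in every condition, else "partial".
    """
    conditions_by_group = defaultdict(set)
    all_conditions = set()
    for group, condition in records:
        conditions_by_group[group].add(condition)
        all_conditions.add(condition)
    if all(len(c) == 1 for c in conditions_by_group.values()):
        return "nested"
    if all(c == all_conditions for c in conditions_by_group.values()):
        return "crossed"
    return "partial"


if __name__ == "__main__":
    # Whole schools randomized: schools nested in conditions (a GRT).
    grt = [("school_A", "intervention"), ("school_A", "intervention"), ("school_B", "control")]
    # Students randomized within schools: schools crossed with conditions.
    stratified = [("school_A", "intervention"), ("school_A", "control"),
                  ("school_B", "intervention"), ("school_B", "control")]
    print(nesting_structure(grt), nesting_structure(stratified))
```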

The best estimate for the ICC will reflect the circumstances for the trial being planned. That estimate will be from the same target population, so that it reflects the appropriate groups or clusters (e.g., schools vs. clinics vs. worksites vs. communities); age groups (e.g., youth vs. young adults vs. seniors); ethnic, racial, and gender diversity; and other characteristics of the target population. That estimate will derive from data collected for the same outcome using the same measurement methods to be used for the primary outcome in the trial being planned. For example, if planning a trial to improve servings of fruits and vegetables in inner-city third graders, it would be important to get an ICC estimate for servings of fruits and vegetables, measured in the same way as servings would be measured in the trial being planned, from third graders in inner-city schools like the schools that would be recruited for the trial being planned.
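Once a suitable data set is in hand, one common way to estimate the ICC is the one-way ANOVA estimator, ICC = (MSB − MSW) / (MSB + (m₀ − 1)·MSW), where MSB and MSW are the between- and within-group mean squares and m₀ is the adjusted average group size for unbalanced groups. The sketch below implements that estimator with the standard library; the function name is hypothetical, and with only a handful of groups the estimate will be imprecise and can even be negative.

```python
def anova_icc(groups):
    """One-way ANOVA estimate of the intraclass correlation.

    groups: list of lists, each holding the outcome values for one group.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between- and within-group sums of squares and mean squares.
    ss_between = sum(n * (mu - grand_mean) ** 2 for n, mu in zip(sizes, group_means))
    ss_within = sum((y - mu) ** 2 for g, mu in zip(groups, group_means) for y in g)
    msb = ss_between / (k - 1)
    msw = ss_within / (n_total - k)
    # Adjusted mean group size; equals the common size when groups are balanced.
    m0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    return (msb - msw) / (msb + (m0 - 1) * msw)


if __name__ == "__main__":
    # Toy data: two groups of two; MSB = 16, MSW = 2, m0 = 2, so ICC = 14/18.
    print(anova_icc([[2, 4], [6, 8]]))
```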

Regression adjustment for covariates often improves power in a GRT or IRGT by reducing the residual error variance or the ICC. At the same time, it is important to remember that regression adjustment for covariates can reduce power in a GRT or IRGT by increasing the ICC. As such, it is important to choose covariates carefully. The best covariates will be related to the outcome and unevenly distributed between the study conditions or among the groups or clusters randomized to the study conditions.

When *a priori* matching or stratification is used for balance, the matching or stratification factor may be included in the analysis of intervention effects, but that is not required, and it may be inefficient to do so. It is not required because the type I error rate is unaffected when the matching or stratification factor is ignored in the analysis of intervention effects. Both the matched and the stratified analysis reduce the df available for the test of the intervention effect, and if the number of df is limited, the unmatched or unstratified analysis may be more powerful than the matched or stratified analysis. In that circumstance, it is to the investigator's advantage to match or stratify in the design to achieve balance on potential confounders, but to ignore the matching or stratification in the analysis to improve power or reduce sample size. The choice of whether to include the matching or stratification factor in the analysis should be made *a priori* based on sample size calculations comparing the matched or stratified analysis to the unmatched or unstratified analysis.

The choice between *a priori* matching and *a priori* stratification for balance should be guided by whether the investigator anticipates doing analyses that do not involve intervention effects. Some authors have warned against ignoring matching in analyses that do not involve intervention effects, e.g., in an analysis to examine the association between a risk factor and an outcome. Ignoring matching in that situation can lead to an inflated type I error rate when the correlation between the matching factor and either the outcome or the risk factor is at least modest (>0.2) and the number of members per group is not large (<100). Stratification with strata of size four avoids this problem and improves efficiency almost as much as matching. For this reason, stratification with strata of size four is a prudent strategy for balancing potential confounders across study conditions: it is almost as efficient as matching, and it does not limit the range of analyses that can be applied to the data.
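A minimal sketch of stratification with strata of size four: sort the units on a baseline covariate, form consecutive strata of four, and randomize two units in each stratum to each condition. The function name, the use of a single sorting covariate, and the coin flip for an odd leftover unit in a final incomplete stratum are illustrative assumptions.

```python
import random


def stratified_allocation(units, covariate, stratum_size=4, seed=None):
    """Randomize units to two arms within strata formed on a covariate.

    units: unit identifiers; covariate: dict mapping unit -> baseline value.
    Returns a dict mapping unit -> "intervention" or "control".
    """
    rng = random.Random(seed)
    ordered = sorted(units, key=lambda u: covariate[u])
    assignment = {}
    for start in range(0, len(ordered), stratum_size):
        stratum = ordered[start:start + stratum_size]
        # Half of each stratum goes to each arm; with an odd leftover unit,
        # a coin flip decides which arm receives it.
        labels = ["intervention", "control"] * (len(stratum) // 2)
        if len(stratum) % 2:
            labels.append(rng.choice(["intervention", "control"]))
        rng.shuffle(labels)
        assignment.update(zip(stratum, labels))
    return assignment


if __name__ == "__main__":
    # Hypothetical baseline values for twelve units.
    baseline = {f"u{i}": v for i, v in enumerate([5, 3, 9, 1, 7, 2, 8, 6, 4, 0, 11, 10])}
    for unit, arm in sorted(stratified_allocation(list(baseline), baseline, seed=3).items()):
        print(unit, baseline[unit], arm)
```

Because every stratum is split evenly, the arms are balanced on the sorting covariate by construction, while the assignment within each stratum remains random.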

Studies that randomize individuals but deliver at least some of the intervention in groups or through a common interventionist or facilitator are individually randomized group-treatment trials (IRGTs), sometimes called partially clustered designs. While these trials are common, most investigators do not recognize the implications of this design. IRGTs always have a hierarchical structure in the intervention condition. Participants may receive some of their treatment in groups, or they may receive their intervention individually but through a common interventionist or facilitator, whether in person or through a video or other virtual connection. There may or may not be a similar structure in the control condition, depending on the nature of the control condition. Whether it exists in one or both study conditions, the hierarchical structure requires that the positive ICC expected in the data be accounted for in the sample size estimation and in the data analysis. Any analysis that ignores the positive ICC or what may be limited df will have an inflated type I error rate.

The recommended solution to these challenges is similar to the solution recommended for GRTs. It is important to employ *a priori* matching or stratification to balance potential confounders if the number of assignment units is limited, to reflect the hierarchical or partially hierarchical structure of the design in the analytic plan, and to estimate the sample size for the IRGT based on realistic and data-based estimates of the ICC and the other parameters indicated by the analytic plan. Extra variation and limited df always reduce power, so it is essential to consider these factors while the study is being planned, and particularly as part of the sample size estimation.

Small ICCs should not be ignored. In public health and medicine, ICCs in group- or cluster-randomized trials are often small, usually ranging from 0.01 to 0.05. While it is tempting to ignore such small correlations, doing so risks an inflated type I error rate, and the risk is substantial both in parallel GRTs and in IRGTs. The prudent course is to reflect all nested factors as random effects and to plan the study to have sufficient power given a proper analysis.

Testing the ICC or variance component and dropping it from the model when the test is not significant is another tempting strategy that can risk an inflated type I error rate. The standard error for the variance component is not well estimated when the value is close to zero, and if the df are limited, the power of that test will be limited. As a result, the test is likely to suggest that the ICC or variance component is negligible even when ignoring it will inflate the type I error rate for the intervention effect. Here too, the prudent course is to reflect all nested factors as random effects and to plan the study to have sufficient power given a proper analysis.
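A back-of-the-envelope calculation shows why even a small ICC matters. Assume the intervention condition has g groups of m participants with ICC ρ, the control condition has n_c independent participants, and the analysis ignores the groups. In units of the total variance σ², the true variance of the difference in means is [1 + (m − 1)ρ]/(g·m) + 1/n_c, while the naive analysis uses 1/(g·m) + 1/n_c, so a nominal 5% z-test rejects more often than intended. The sketch below computes that approximate rejection rate under the null hypothesis. It is a large-sample approximation that treats the naive standard error as known and ignores the separate issue of limited df; the function name and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def naive_type1_rate(icc, m, g, n_control, alpha=0.05):
    """Approximate actual type I error of a z-test that ignores clustering
    in the intervention arm of an IRGT (large-sample approximation).

    icc: ICC among intervention-group members; m: group size;
    g: number of intervention groups; n_control: control participants.
    """
    n_int = g * m
    naive_var = 1 / n_int + 1 / n_control  # variance assumed by the naive test (units of sigma^2)
    true_var = (1 + (m - 1) * icc) / n_int + 1 / n_control
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Under H0 the naive z statistic is approximately N(0, true_var / naive_var),
    # so it exceeds z_crit in absolute value with this probability.
    threshold = z_crit * sqrt(naive_var / true_var)
    return 2 * (1 - z.cdf(threshold))


if __name__ == "__main__":
    # Hypothetical design: 10 groups of 10 in the intervention arm, 100 controls.
    for icc in (0.0, 0.01, 0.02, 0.05):
        print(icc, round(naive_type1_rate(icc, m=10, g=10, n_control=100), 4))
```

The rejection rate equals the nominal 0.05 only when the ICC is exactly zero and climbs steadily as the ICC or the group size grows.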