Use a GRT if you have an intervention that operates at a group level, manipulates the social or physical environment, or simply cannot be delivered to individuals without serious risk of contamination. If you can deliver your intervention to individuals without risk of contamination and can avoid interaction among participants post-randomization, it is more efficient and easier to use a traditional RCT.

# FAQs

Use an IRGT if you can randomize individual participants but need to deliver the intervention in groups or through a common interventionist or facilitator either for pedagogic or practical reasons. If you can randomize individual participants and can deliver the intervention to participants one at a time without going through a common interventionist or facilitator, it is more efficient and easier to use a traditional RCT.

A pragmatic trial is one that helps users choose between options for care. These trials are usually done in the real world, under less well-controlled conditions than more traditional clinical trials. Pragmatic trials can use a traditional RCT design, or they can use a GRT design. Stepped wedge group-randomized trials (SW-GRTs) are also used in pragmatic trials. Of 19 pragmatic trials funded by the Health Care Systems Research Collaboratory at the NIH, two are RCTs, 11 are GRTs, two are IRGTs, and four are SW-GRTs.

There are five published textbooks on the design and analysis of group- or cluster-randomized trials ( ; ; ; ; ). A recent textbook is devoted to power and sample size calculation for multilevel designs, including GRTs, IRGTs, and stepped wedge group-randomized trials ( ). There are no textbooks on IRGTs, but there are a number of papers ( ; ; ; ). There are also sources for information on power and sample size in IRGTs ( ; ).

The most accurate results will come from t-scores. For studies in which the number of units randomized to conditions is 50 or more, z-scores will work well. As the number of randomization units decreases, the degrees of freedom (df) available for the test of the intervention effect also decrease, and the difference between z-scores and t-scores increases.
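The sketch below shows how quickly the gap between z and t closes as the number of randomized units grows. Python's standard library has no t quantile function, so the hypothetical helpers `t_cdf` and `t_quantile` compute one numerically (Simpson's rule plus bisection) as a stand-in for a statistics library. It assumes the simplest two-arm design, in which the intervention effect is tested at the group level with df equal to the number of randomized units minus 2; designs with covariates, matching, or repeated measures will have different df.

```python
import math
from statistics import NormalDist


def t_cdf(x, df, steps=400):
    """P(T <= x) for Student's t with df degrees of freedom, via Simpson's rule."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = total * h / 3  # integral of the density from 0 to |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Quantile of Student's t for 0.5 < p < 1, found by bisection on t_cdf."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_crit = NormalDist().inv_cdf(0.975)
for units in (4, 6, 10, 20, 50, 100):
    df = units - 2  # assumed: two arms, intervention effect tested at the group level
    t_crit = t_quantile(0.975, df)
    print(f"{units:>3} units  df={df:>2}  z={z_crit:.3f}  t={t_crit:.3f}  t/z={t_crit / z_crit:.2f}")
```

Under these assumptions the two-sided 5% t critical value is roughly twice the z value with 4 randomized units, but within about 3 percent of it with 50 units.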

Yes. Sometimes investigators randomize months or weeks within clinics to study conditions. As an example, consider a study in which over the course of a year, six months are spent delivering the intervention condition and six months are spent delivering the control condition, with the order randomized within each clinic. The unit of assignment in this case is the time block within the clinic, rather than the clinic itself. Patients receive the intervention or control condition appropriate to the time block when they come to the clinic. While these groups are not structural groups like whole clinics, they are still groups, and this is still a group- or cluster-randomized trial. The key number in this case for power or sample size calculations is the number of time blocks, not the number of clinics.
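A minimal sketch of the time-block design described above, assuming two six-month blocks per clinic and hypothetical clinic names. It makes concrete that the units of assignment are the (clinic, time block) pairs, so three clinics yield six randomized units.

```python
import random


def assign_time_blocks(clinics, seed=None):
    """Randomize the order of two six-month blocks within each clinic.

    Each (clinic, block) pair is one unit of assignment, so the design has
    2 * len(clinics) randomized units, not len(clinics).
    """
    rng = random.Random(seed)
    schedule = {}
    for clinic in clinics:
        order = ["intervention", "control"]
        rng.shuffle(order)  # random order of conditions within this clinic
        schedule[clinic] = {"months 1-6": order[0], "months 7-12": order[1]}
    return schedule


# Hypothetical clinics, for illustration only
plan = assign_time_blocks(["Clinic A", "Clinic B", "Clinic C"], seed=2024)
units = [(clinic, block) for clinic, blocks in plan.items() for block in blocks]
print(len(units))  # 6: the time blocks, not the 3 clinics, are the units of assignment
```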

It is important to distinguish between changing study conditions or study arms and changing groups or clusters. In a parallel group RCT, GRT, or IRGT, it is important to ensure that each participant remains in the study condition to which they were randomized. Those assigned to the intervention condition should not move to the control condition, and vice versa. Sometimes that is unavoidable, but it should be uncommon. If it does happen, standard practice is to analyze as randomized, under the intention-to-treat principle.

The other possibility is that a participant in a GRT or IRGT might change groups or clusters while staying in the same study condition or study arm. In a school-based trial, a participant from one intervention school might move to another intervention school. Or in an IRGT, a participant who usually attends the Tuesday night class might sometimes attend the Saturday morning class. Recent studies have shown that failure to account for changing group membership can result in an inflated type I error rate ( ). A good method for analyzing data to account for such changes has also been published ( ).

Standard sources assume that each group or cluster has the same number of observations, but that is almost never true in practice. So long as the ratio of the largest to the smallest group is no worse than about 2:1, such variation can be ignored. But as the variation becomes more marked, analysts who ignore it risk an inflated type I error rate ( ). In addition, power falls as the variation in group or cluster size increases, so the variation needs to be addressed in the sample size calculations. There are a number of publications on this issue for GRTs ( ; ; ; ; ). There are also a number of publications on this issue for IRGTs ( ; ).

We have known for some time that the magnitude of the ICC is inversely related to the level of aggregation ( ). The smaller the level of aggregation, the larger the ICC. Spouse pairs and family units are small clusters, so their ICCs are often large. For larger aggregates, like worksites or schools, ICCs are usually smaller. For even larger aggregates, like communities, ICCs are usually smaller still. Fortunately, the ICC is not the only factor that determines sample size in a GRT. The variance inflation factor (VIF) is defined as 1 + (m - 1)ICC, where m is the average number of observations in the groups randomized in the study. In a spouse pair, m = 2, so the formula reduces to 1 + ICC, and the VIF will be less than 2. In a school study, the ICC may be much smaller, e.g., 0.05, but the number of observations may be much larger, e.g., 400, so that VIF = 1 + (400 - 1)(0.05) = 20.95, which will have a much more deleterious effect on the power of the study. It is important to account for the ICC, but also for the average number of observations expected in each group randomized to the study conditions, as well as the number of groups randomized, as that dictates the df available for the test of the intervention effect. These issues are discussed in Part 4 of the Pragmatic and Group-Randomized Trials in Public Health and Medicine Course.

This will depend on several factors, including the expected ICC, which reflects the average correlation among observations taken on members of the same small group, and the number of participants in each group. There is no one answer that will be correct for all studies. The best approach is to perform a power analysis, or calculate sample size, using methods appropriate to IRGTs. There are a number of papers that address this question ( ; ; ).

It is true that this approach will improve the fidelity of implementation. But the problem is that this approach completely confounds the instructor/facilitator with the study condition: everyone who gets the intervention is exposed to that instructor/facilitator. In that situation, it is impossible to separate the effect of the intervention from the effect of the instructor/facilitator. A charismatic instructor/facilitator could generate beneficial effects on the outcome of interest even if the intervention itself were completely ineffective, and the investigator would not be able to distinguish those two effects.

The best estimate for the ICC will reflect the circumstances for the trial being planned. That estimate will be from the same target population, so that it reflects the appropriate groups or clusters (e.g., schools vs. clinics vs. worksites vs. communities); age groups (e.g., youth vs. young adults vs. seniors); ethnic, racial, and gender diversity; and other characteristics of the target population. That estimate will derive from data collected for the same outcome using the same measurement methods to be used for the primary outcome in the trial being planned. For example, if planning a trial to improve servings of fruits and vegetables in inner-city third-graders, it would be important to get an ICC estimate for servings of fruits and vegetables, measured in the same way as servings would be measured in the trial being planned, from third-graders in inner-city schools similar to the schools that would be recruited for the trial being planned.
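When pilot or published data from a similar population are available, one common way to estimate the ICC is the one-way ANOVA estimator, ICC = (MSB - MSW) / (MSB + (m0 - 1)MSW), where MSB and MSW are the between- and within-group mean squares and m0 is an adjusted average group size that allows for unequal group sizes. The sketch below illustrates that estimator only; mixed-model variance components are another option, and the function name and pilot data are hypothetical.

```python
def anova_icc(groups):
    """One-way ANOVA estimator of the ICC from outcome data grouped by cluster.

    groups: list of lists, one inner list of outcome values per group or cluster.
    The estimate can be negative; for planning it is often set to 0 in that case.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(n * (mu - grand_mean) ** 2 for n, mu in zip(sizes, means))
    ss_within = sum((y - mu) ** 2 for g, mu in zip(groups, means) for y in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Adjusted average group size, which equals m when all groups have m members
    m0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (m0 - 1) * ms_within)


# Hypothetical pilot data: daily servings of fruits and vegetables, one list per classroom
pilot = [[3, 4, 2, 5], [1, 2, 2, 1, 3], [4, 5, 5, 6], [2, 3, 3]]
print(f"estimated ICC = {anova_icc(pilot):.3f}")
```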

Regression adjustment for covariates often improves power in a GRT or IRGT by reducing the residual error variance or the ICC ( ). At the same time, it is important to remember that regression adjustment for covariates can reduce power in a GRT or IRGT by increasing the ICC ( ). As such, it is important to choose covariates carefully. The best covariates will be related to the outcome and unevenly distributed between the study conditions or among the groups or clusters randomized to the study conditions.

*A priori* matching can improve power in a GRT, but it can also reduce power, so investigators need to be thoughtful about *a priori* matching in their design and analysis. *A priori* matching reduces the df for the test of the intervention effect by half, and if the correlation between the matching factor and the outcome is not large enough to overcome the loss of df, power will be lower in the matched analysis than in the unmatched analysis.
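The trade-off can be sketched as a break-even correlation. This assumes a simple two-arm GRT with a given number of matched pairs of groups, analyzed at the group level: the unmatched analysis has 2(pairs - 1) df and the matched analysis has pairs - 1 df. It also assumes that matching shrinks the variance of the intervention effect by a factor of (1 - r), using the correlation r between paired groups' outcomes (which the matching factor induces) as a stand-in for the correlation discussed above, and that the required sample size scales with (t_{1-alpha/2, df} + t_{power, df})^2. The t helpers are hypothetical numerical stand-ins for a statistics library.

```python
import math


def t_cdf(x, df, steps=400):
    """P(T <= x) for Student's t with df degrees of freedom, via Simpson's rule."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a, h = abs(x), abs(x) / steps
    total = pdf(0.0) + pdf(a) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Quantile of Student's t for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if t_cdf(mid, df) < p else (lo, mid)
    return (lo + hi) / 2


def break_even_correlation(pairs, alpha=0.05, power=0.80):
    """Correlation above which the matched analysis beats the unmatched one."""
    def multiplier(df):
        # (t_{1 - alpha/2, df} + t_{power, df})^2 scales the required variance
        return (t_quantile(1 - alpha / 2, df) + t_quantile(power, df)) ** 2

    unmatched = multiplier(2 * (pairs - 1))  # full df, no variance reduction
    matched = multiplier(pairs - 1)          # half the df, variance times (1 - r)
    # Matching wins when (1 - r) * matched < unmatched, i.e. r > 1 - unmatched / matched
    return 1 - unmatched / matched


for pairs in (3, 5, 10, 20):
    print(f"{pairs:>2} pairs: matching helps only if r > {break_even_correlation(pairs):.2f}")
```

The fewer the pairs, the stronger the correlation needed before the matched analysis pays for the df it gives up.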

*A priori* matching is often used to balance potential confounders, and it is then up to the investigator to decide whether to reflect that *a priori* matching in the analysis. It is not required, because the type I error rate is unaffected when the matching or stratification factor is ignored in the analysis of intervention effects ( ; ). However, some authors have warned against ignoring matching in analyses that do not involve intervention effects, e.g., in an analysis to examine the association between a risk factor and an outcome. Ignoring matching in the analysis in this situation can lead to an inflated type I error rate when the correlation between the matching factor and either the outcome or the risk factor is at least modest (>0.2) and the number of members per group is not large (<100). Stratification with strata of size four avoids this problem and improves efficiency almost as much as matching. For this reason, stratification with strata of size four is a prudent strategy for balancing potential confounders across study conditions or study arms.
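One way to implement stratification with strata of size four, sketched under stated assumptions: groups are sorted on a single baseline confounder, consecutive sets of four form the strata, and two groups in each stratum are randomized to each condition. The function name, school identifiers, and baseline scores are hypothetical; with several confounders, a composite score or a more elaborate balancing algorithm would be needed.

```python
import random


def stratify_in_fours(units, baseline, seed=None):
    """Randomize groups within strata of four formed on a baseline confounder.

    units: group identifiers (e.g., schools); baseline: dict mapping each unit
    to its baseline value on the potential confounder. Each full stratum of four
    gets two intervention and two control groups. A leftover stratum of fewer
    than four is split as evenly as possible, with a random arm getting any extra.
    """
    rng = random.Random(seed)
    ordered = sorted(units, key=lambda u: baseline[u])  # similar units end up adjacent
    assignment = {}
    for start in range(0, len(ordered), 4):
        stratum = ordered[start:start + 4]
        labels = ["intervention", "control"] * ((len(stratum) + 1) // 2)
        rng.shuffle(labels)
        for unit, label in zip(stratum, labels):  # zip drops any unused label
            assignment[unit] = label
    return assignment


# Hypothetical example: 12 schools with made-up baseline scores on a confounder
schools = [f"school_{i:02d}" for i in range(12)]
baseline_score = {s: (7 * i) % 12 for i, s in enumerate(schools)}
arms = stratify_in_fours(schools, baseline_score, seed=42)
for school in sorted(schools, key=lambda s: baseline_score[s]):
    print(school, baseline_score[school], arms[school])
```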

*A priori* stratification can also improve power in a GRT, but the situation is more complicated, because it depends on how the stratification is reflected in the analysis. As with *a priori* matching, *a priori* stratification can be used to balance potential confounders, and it is then up to the investigator to decide whether and how to reflect that *a priori* stratification in the analysis.

If the primary interest is to balance on potential confounders, the stratification factor could be included in the analysis as a covariate, but without creating interactions with study condition or other factors. To the extent that the stratification factor is related to the outcome, there is likely to be benefit to power, because the gain from the regression adjustment is likely to outweigh any reduction due to lost df.

If the primary interest is differential intervention effects, the stratification factor is included in the analysis as a main effect, but additional interaction terms are required, both for fixed and random effects. The number and nature of the additional fixed and random effects will depend on the design and analytic plan ( ; ). Inclusion of the correct fixed and random effects is essential to a valid analysis, so investigators are strongly encouraged to work with a methodologist familiar with stratified designs to ensure that the analysis is structured correctly. With regard to power, detection of differential intervention effects will always require a larger study than detection of uniform intervention effects.

Some have suggested that 4 groups or clusters per study condition should be considered an absolute minimum ( ). Investigators should be cautious about such rules of thumb, because it is quite possible that 4 groups or clusters per study condition would result in a badly underpowered trial. ICCs in public health and medicine often fall in the range of 0.01–0.05, and if the ICC does fall in that range, 8–12 groups or clusters will often be needed in each study condition. The best advice is to estimate sample size requirements for the trial under consideration, using the best parameter estimates available.

There is no general answer to this question. Instead, investigators should estimate sample size requirements for the trial under consideration, using the best parameter estimates available. At the same time, it is fair to say that increasing the number of groups or clusters per condition will increase power more effectively than increasing the number of members per group or cluster.
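To make the trade-off concrete, the sketch below combines the variance inflation factor, 1 + (m - 1)ICC, described earlier in this FAQ with t critical values to estimate the number of groups per condition. It assumes a two-arm, posttest-only comparison analyzed at the group level with df = 2(g - 1), no covariates or matching, equal group sizes m, and a standardized effect size d. The formula g = 2 * VIF * (t_{1-alpha/2} + t_{power})^2 / (m * d^2) is solved iteratively because the df depend on g. Function names are hypothetical, and this is a planning sketch rather than a substitute for design-specific methods.

```python
import math


def t_cdf(x, df, steps=400):
    """P(T <= x) for Student's t with df degrees of freedom, via Simpson's rule."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a, h = abs(x), abs(x) / steps
    total = pdf(0.0) + pdf(a) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Quantile of Student's t for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if t_cdf(mid, df) < p else (lo, mid)
    return (lo + hi) / 2


def variance_inflation_factor(m, icc):
    """VIF = 1 + (m - 1) * ICC for groups with m observations each."""
    return 1 + (m - 1) * icc


def groups_per_condition(d, m, icc, alpha=0.05, power=0.80):
    """Groups per condition needed to detect standardized effect d (sketch)."""
    vif = variance_inflation_factor(m, icc)
    g = 2
    for _ in range(100):
        df = 2 * (g - 1)  # group-level df for two conditions
        t_sum = t_quantile(1 - alpha / 2, df) + t_quantile(power, df)
        g_new = max(2, math.ceil(2 * vif * t_sum ** 2 / (m * d ** 2)))
        if g_new == g:  # stop once the df no longer change the answer
            break
        g = g_new
    return g


print(f"school example from the text: VIF = {variance_inflation_factor(400, 0.05):.2f}")
for icc in (0.01, 0.02, 0.05):
    for m in (50, 100, 200):
        print(f"ICC={icc:.2f} m={m:>3}: {groups_per_condition(0.2, m, icc)} groups per condition")
```

Under these assumptions, doubling the members per group from 100 to 200 at ICC = 0.02 trims the requirement only from 13 to 11 groups per condition, consistent with the advice that adding groups is the more effective lever.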

No. When *a priori* matching or stratification is used for balance, the matching or stratification factor may be included in the analysis of intervention effects, but that is not required, and it may be inefficient to do so. It is not required because the type I error rate is unaffected when the matching or stratification factor is ignored in the analysis of intervention effects ( ; ). Both procedures reduce the df available for the test of the intervention effect, and if the number of df is limited, the unmatched or unstratified analysis may be more powerful than the matched or stratified analysis. In that circumstance, it is to the investigator’s advantage to match or stratify in the design to achieve balance on potential confounders, but to ignore the matching or stratification in the analysis to improve power or reduce sample size ( ). The choice of whether to include the matching or stratification factor in the analysis should be made *a priori* based on sample size calculations comparing the matched or stratified analysis to the unmatched or unstratified analysis.

The choice between *a priori* matching and *a priori* stratification for balance should be guided by whether the investigator anticipates doing analyses that do not involve intervention effects. Some authors have warned against ignoring matching in analyses that do not involve intervention effects, e.g., in an analysis to examine the association between a risk factor and an outcome. Ignoring matching in the analysis in this situation can lead to an inflated type I error rate when the correlation between the matching factor and either the outcome or the risk factor is at least modest (>0.2) and the number of members per group is not large (<100). Stratification with strata of size four avoids this problem and improves efficiency almost as much as matching. For this reason, stratification with strata of size four is a prudent strategy for balancing potential confounders across study conditions: it is almost as efficient as matching, and it does not limit the range of analyses that can be applied to the data.

These studies are individually randomized group treatment trials (IRGTs), sometimes called partially clustered designs. While these trials are common, most investigators do not recognize the implications of this design. IRGTs always have a hierarchical structure in the intervention condition. Participants may receive some of their treatment in groups, or they may receive their intervention individually but through a common interventionist or facilitator, whether in person or through a video or other virtual connection ( ). There may or may not be a similar structure in the control condition, depending on the nature of the control condition. Whether it exists in one or both study conditions, the hierarchical structure requires that the positive ICC expected in the data be accounted for in the sample size estimation and in the data analysis. Any analysis that ignores the positive ICC, or what may be limited df, will have an inflated type I error rate, often badly inflated ( ; ; ; ; ; ; ; ; ; ).

The recommended solution to these challenges is similar to the solution recommended for GRTs: employ *a priori* matching or stratification to balance potential confounders if the number of assignment units is limited, reflect the hierarchical or partially hierarchical structure of the design in the analytic plan, and estimate the sample size for the IRGT based on realistic, data-based estimates of the ICC and the other parameters indicated by the analytic plan. Extra variation and limited df always reduce power, so it is essential to consider these factors while the study is being planned, and particularly as part of the sample size estimation.
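For the common IRGT in which only the intervention arm is clustered, a simple normal-approximation sketch shows how the ICC enters the sample size. With equal arm sizes n, the variance of the difference in means is sigma^2 * (VIF + 1) / n rather than 2 * sigma^2 / n, where VIF = 1 + (m - 1)ICC applies to the intervention arm only and the control arm is assumed to be unclustered. The function name and example values are hypothetical, and the normal approximation ignores the limited df stressed above, so it will be optimistic when the number of intervention groups is small.

```python
import math
from statistics import NormalDist


def irgt_n_per_arm(d, m, icc, alpha=0.05, power=0.80):
    """Approximate participants per arm for a partially clustered IRGT.

    Assumes intervention participants are treated in groups of size m with
    intraclass correlation icc, control participants are treated individually
    (no clustering), the arms are equal in size, and d is the standardized
    effect size. Uses z-scores, so it ignores the df of the design.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    vif_intervention = 1 + (m - 1) * icc
    # Var(difference in means) / sigma^2 = VIF / n + 1 / n for equal n per arm
    return math.ceil(z ** 2 * (vif_intervention + 1) / d ** 2)


for icc in (0.0, 0.01, 0.05):
    n = irgt_n_per_arm(0.5, 10, icc)
    print(f"ICC={icc:.2f}: {n} per arm, about {math.ceil(n / 10)} intervention groups of 10")
```

With ICC = 0 the function reduces to the usual two-arm formula, 2(z_{1-alpha/2} + z_{power})^2 / d^2, which is a quick check that the partially clustered version is set up correctly.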

No. In public health and medicine, ICCs in group- or cluster-randomized trials are often small, usually ranging from 0.01 to 0.05 ( ). While it is tempting to ignore such small correlations, doing so risks an inflated type I error rate, and the risk is substantial both in GRTs ( ; ; ; ; ) and in IRGTs ( ; ; ; ; ; ; ; ; ; ). The prudent course is to reflect all nested factors as random effects and to plan the study to have sufficient power given a proper analysis.

No. That is another tempting strategy that can risk an inflated type I error rate. The standard error for the variance component is not well estimated when the value is close to zero, and if the df are limited, the power to detect a nonzero variance component will be limited. As a result, the test is likely to suggest that the ICC or variance component is negligible even when ignoring it will inflate the type I error rate. The prudent course is to reflect all nested factors as random effects and to plan the study to have sufficient power given a proper analysis.
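A small simulation can illustrate the risk described in these answers. Under a true null hypothesis, it generates hypothetical GRT data (5 groups per arm, 50 members per group, ICC = 0.02) and compares a naive individual-level z-test that ignores the groups with a two-sample t-test on the group means, which respects the nesting and has 8 df. All settings are assumptions chosen for illustration; in practice a mixed-model analysis with the nested factors as random effects would be the usual choice.

```python
import math
import random

GROUPS_PER_ARM = 5
MEMBERS_PER_GROUP = 50
Z_CRIT = 1.959964      # two-sided 5% critical value, standard normal
T_CRIT_8DF = 2.306004  # two-sided 5% critical value, t with 2 * 5 - 2 = 8 df


def mean(xs):
    return sum(xs) / len(xs)


def sample_var(xs):
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / (len(xs) - 1)


def simulate_type_i_error(icc=0.02, reps=1000, seed=1):
    """Rejection rates under a true null for a naive vs a group-level analysis."""
    rng = random.Random(seed)
    sd_group, sd_member = math.sqrt(icc), math.sqrt(1 - icc)  # total variance = 1
    naive_hits = group_hits = 0
    for _ in range(reps):
        arms = []
        for _arm in range(2):  # no intervention effect: both arms share the same mean
            clusters = []
            for _g in range(GROUPS_PER_ARM):
                u = rng.gauss(0, sd_group)  # shared group effect induces the ICC
                clusters.append([u + rng.gauss(0, sd_member) for _ in range(MEMBERS_PER_GROUP)])
            arms.append(clusters)
        # Naive analysis: pool members and ignore the groups (individual-level z-test)
        a = [y for c in arms[0] for y in c]
        b = [y for c in arms[1] for y in c]
        se = math.sqrt(sample_var(a) / len(a) + sample_var(b) / len(b))
        naive_hits += abs(mean(a) - mean(b)) / se > Z_CRIT
        # Group-level analysis: pooled two-sample t-test on the group means
        ga = [mean(c) for c in arms[0]]
        gb = [mean(c) for c in arms[1]]
        se_g = math.sqrt((sample_var(ga) + sample_var(gb)) / GROUPS_PER_ARM)
        group_hits += abs(mean(ga) - mean(gb)) / se_g > T_CRIT_8DF
    return naive_hits / reps, group_hits / reps


naive_rate, group_rate = simulate_type_i_error()
print(f"naive: {naive_rate:.3f}  group-level: {group_rate:.3f}  (nominal 0.05)")
```

With these settings the naive test rejects the true null noticeably more often than the nominal 5%, even though the ICC is only 0.02, while the group-level test stays close to 5%.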