METHODS: We performed a cross-sectional study involving data from 46 participating practices with 106 physicians, collected using self-administered questionnaires and a chart audit of 100 randomly selected charts per practice. The population was health service organizations (HSOs) located in Southern Ontario. We analyzed performance data for 13 preventive maneuvers determined by chart review and used analysis of variance to determine the intraclass correlation coefficient. An index of "up-to-datedness" was computed for each physician and practice as the number of recommended preventive measures done divided by the number of eligible patients. An index called "inappropriateness" was computed in the same manner for the not-recommended measures. The intraclass correlation coefficients for the 2 key study outcomes (up-to-datedness and inappropriateness) were also calculated and compared.
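The ANOVA-based estimate of the intraclass correlation coefficient described above can be sketched as follows. This is a minimal illustration using the standard one-way ANOVA estimator for balanced clusters; the function name and the example scores are hypothetical and do not reproduce the study's data.

```python
# Sketch: one-way ANOVA estimator of the intraclass correlation
# coefficient (ICC) for a physician-level score nested in practices.
# Data are hypothetical, for illustration only.

def anova_icc(clusters):
    """clusters: list of lists, one inner list of physician scores
    per practice (a balanced design is assumed for simplicity)."""
    k = len(clusters)                 # number of practices (clusters)
    m = len(clusters[0])              # physicians per practice
    n = k * m
    grand = sum(sum(c) for c in clusters) / n
    # Between-practice and within-practice sums of squares
    ssb = m * sum((sum(c) / m - grand) ** 2 for c in clusters)
    ssw = sum((x - sum(c) / m) ** 2 for c in clusters for x in c)
    msb = ssb / (k - 1)               # between-cluster mean square
    msw = ssw / (n - k)               # within-cluster mean square
    # Standard ANOVA estimator of the ICC
    return (msb - msw) / (msb + (m - 1) * msw)

# Example: 3 practices x 4 physicians, up-to-datedness proportions
scores = [[0.50, 0.55, 0.52, 0.53],
          [0.60, 0.62, 0.58, 0.61],
          [0.45, 0.47, 0.44, 0.46]]
print(round(anova_icc(scores), 3))
```

In this toy example the practices differ far more than the physicians within them, so the estimated ICC is close to 1; scores that varied mostly within practices would yield an ICC near 0.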
RESULTS: The mean up-to-datedness score for the practices was 53.5% (95% confidence interval [CI], 51.0%-56.0%), and the mean inappropriateness score was 21.5% (95% CI, 18.1%-24.9%). The intraclass correlation for up-to-datedness was 0.0365, compared with 0.1790 for inappropriateness. The intraclass correlation for individual preventive maneuvers ranged from 0.005 for blood pressure measurement to 0.66 for chest radiographs of smokers; consequently, the required sample size ranged from 20 to 42 physicians per group.
CONCLUSIONS: Randomizing by practice clusters and analyzing at the level of the physician has important implications for sample size requirements. Larger intraclass correlations indicate interdependence among the physicians within a cluster; as a consequence, variability within clusters is reduced and the required sample size is increased. The key finding that potential outcome measures differ widely in their intracluster correlations reinforces the need for researchers to consider the selection of outcome measures carefully and to adjust sample sizes accordingly when the unit of analysis and the unit of randomization are not the same.
In conducting research with community-based primary care practices, it is often not feasible to randomize individual physicians to the treatment conditions, either because of potential contamination between intervention and control subjects in the same practice setting or because the success of the intervention demands that all physicians in the practice adhere to a guideline. As a result, the practice itself is randomized to the conditions.
The randomization of physicians in groups, rather than each individual separately, has important consequences for sample size, interpretation, and analysis.1-3 It is argued that groups of physicians are likely to be heterogeneous,4 giving rise to a component of variation that one must take into account in the analysis and that one can control only by studying many groups of physicians rather than many physicians within each group.4
Randomizing physicians by cluster and then analyzing the data by physician or patient has the potential to bias the results. It has been noted that many studies randomized groups of health professionals (cluster randomization) but analyzed the results by physician, resulting in a possible overestimation of the significance of the observed effects (unit of analysis error).5 Divine and colleagues6 observed that 38 out of 54 studies of physicians’ patient care practices had not appropriately accounted for the clustered nature of the study data. Similarly, Simpson and coworkers7 found that only 4 out of 21 primary prevention trials included sample size calculations or discussions of power that allowed for clustering, while 12 out of 21 took clustering into account in the statistical analysis. When the effect size of the intervention is small to moderate, analyzing results by individual without adjusting for clustering can lead to false conclusions about the effectiveness of the intervention. For example, Donner and Klar8 show that for the data of Murray and colleagues9 the P value would be .03 if the effect of clustering were ignored, while it was greater than .1 after adjusting for the effect of clustering.
Using baseline data from a successful randomized controlled trial of primary care practices in Southern Ontario, Canada,10 we explain the role of the intracluster correlation coefficient (ICC) in determining the required sample size of physicians. The ICC is a measure of variation within and between clusters of physicians; it quantifies the clustering effect, or the lack of independence among the physicians that make up the cluster. The smaller the ICC, the more likely the physicians in the cluster behave independently, and analysis at the level of the physician can proceed without substantial adjustment to sample size. The higher the ICC, the more closely the measure reflects the cluster rather than the individual physician, and the effective sample size shrinks toward the number of clusters rather than the number of individuals. Our objective was to provide information on the cluster effect of measuring the performance of various preventive maneuvers between groups of physicians, to enable other researchers in the area of primary care prevention to avoid these errors.
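The relationship described above between the ICC and the effective sample size can be sketched with the standard design-effect formula, 1 + (m − 1)ρ, where m is the cluster size and ρ the ICC. The numbers below are illustrative and do not reproduce the study's calculations.

```python
# Sketch: how the ICC shrinks the effective sample size under
# cluster randomization. Illustrative numbers only.

def design_effect(m, icc):
    """Variance inflation factor for clusters of size m with a given ICC."""
    return 1 + (m - 1) * icc

def effective_n(n_physicians, m, icc):
    """Effective number of independent physicians."""
    return n_physicians / design_effect(m, icc)

# 40 physicians grouped into practices of 4:
for icc in (0.0, 0.05, 0.2, 1.0):
    print(icc, round(effective_n(40, 4, icc), 1))
# At icc = 0 the physicians act independently (effective n = 40);
# at icc = 1 the effective n equals the number of clusters (10).
```

This makes the text's point concrete: with a small ICC the effective sample size is close to the number of physicians, while with a large ICC it collapses toward the number of practices, so the required sample size must be inflated by the design effect.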