Original research
Susan D. Mathias MPH
Abstract
The Brief Pain Inventory–Short Form (BPI-SF) is widely used for assessing pain in clinical and research studies. The worst pain rating is often the primary outcome of interest; yet, no published data are available on its minimally important difference (MID). Breast cancer patients with bone metastases who were enrolled in a randomized, double-blind, phase III study comparing denosumab with zoledronic acid for preventing skeletal-related events completed the BPI-SF, FACT-B, and EQ-5D at baseline, week 5, and monthly through the end of the study. Anchor- and distribution-based MID estimates were computed. Data from 1,564 patients were available. Spearman correlation coefficients for anchors ranged from 0.33 to 0.65. Mean change scores for worst pain ratings corresponding to a one-category increase in each anchor were 0.26–1.04 for BPI-SF current pain, −2.42 to −1.40 for EQ-5D Index score, 1.71–1.98 for EQ-5D Pain item, −2.22 to −0.51 for FACT-B TOI, −1.61 to −0.16 for FACT-G Physical, and −1.31 to −0.12 for FACT-G total. Distribution-based results were 1 SEM = 1.6, 0.5 effect size = 1.4, and Guyatt's statistic = 1.4. Combining anchor- and distribution-based results yielded a two-point MID estimate. An MID estimate of two points is useful for interpreting how much change in worst pain is considered clinically meaningful.
The MID may be estimated through distribution-based methods and/or anchor-based methods. Distribution-based methods are based on the distribution of the data. Examples of distribution-based methods include effect size measures, the standard error of measurement (SEM), one-half times the standard deviation, and the responsiveness index.[2] and [3] Anchor-based methods are based on the association between the PRO measure and an interpretable external measure, such as a global rating of change or a response to treatment. These methods may result in somewhat different estimates, and no particular estimate is considered the most valid.[2], [3] and [4] Therefore, researchers are encouraged to use more than one method and to present a range of MID estimates.
A frequently used PRO measure for the assessment of pain is the Brief Pain Inventory–Short Form (BPI-SF). The foundation of the BPI-SF is the Wisconsin Brief Pain Questionnaire, which was developed over 25 years ago based on interviews with cancer patients, expert opinion, and then-current psychometric standards.5 Over time, the Wisconsin Brief Pain Questionnaire evolved into the Brief Pain Inventory, which was later reduced to a shorter version, the BPI-SF. Today, the BPI-SF is the standard for clinical and research use. It has been used in over 400 studies, including psychometric evaluations and clinical applications with a wide range of conditions (e.g., cancer pain, fibromyalgia, neuropathic pain, and joint diseases).6
The BPI-SF includes two domains: pain severity and pain interference. The pain severity domain, the focus of this report, includes items specific to pain at “worst,” “least,” “average,” and “now” (current pain), with a numerical response scale ranging from 0 (no pain) to 10 (pain as bad as you can imagine). In clinical trials, the worst pain item has been used alone as a measure of pain severity.6 Its use as a single item is supported by a consensus panel on outcome measures for chronic pain clinical trials.7 In addition, the Food and Drug Administration's (FDA) guidance on PROs states that a single-item PRO measure of pain severity is appropriate for assessing the effect of a treatment on pain.8 Although extensive psychometric evaluation of the BPI-SF has been conducted, no estimates of the MID are available for the BPI-SF worst pain item. Establishing the MID for the BPI-SF worst pain item is important because it will provide a clinically relevant reference to interpret changes in pain scores. Therefore, the objective of this current report was to estimate the MID of the worst pain item of the BPI-SF.
Methods
Study Design
Patients with advanced breast cancer and bone metastases were enrolled in an international, randomized, double-blind, double-dummy, active-controlled phase III study comparing denosumab with zoledronic acid for delaying or preventing skeletal related events. Patients were eligible to participate if they had histologically or cytologically confirmed breast adenocarcinoma; current or prior radiologic, computed tomography, or magnetic resonance imaging evidence of at least one bone metastasis; and an Eastern Cooperative Oncology Group (ECOG) performance status of 0, 1, or 2. Patients with current or prior intravenous bisphosphonate administration were excluded. Patients completed PRO assessments, including the BPI-SF, at baseline, week 5, and every 4 weeks thereafter until the end of the study. Assessments were scheduled to take place prior to any study procedures and prior to study drug administration. Although data collection continued, PRO analyses for efficacy were truncated when approximately 30% of patients dropped out of the study due to death, disease progression, or withdrawn consent.
Outcome Measures and Assessment Intervals
A number of outcome measures were assessed in the study and considered for use as anchors for evaluating the MID of the BPI-SF worst pain item, including one clinician-reported measure (ECOG Performance Status) and several PRO measures: the EuroQoL 5 Dimensions (EQ-5D) Index score, the Functional Assessment of Cancer Therapy-Breast Cancer (FACT-B), and the BPI-SF current pain rating.
The ECOG Performance Status, which assesses how a patient's disease or its treatment is progressing and how the disease affects the daily living abilities of the patient, is a single-item, six-point, clinician-rated assessment of performance ranging from 0 (fully active, no restrictions) to 5 (dead).9 The EQ-5D Index score is a measure of health status, which assesses five dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each dimension has three response options: no problems, some/moderate problems, and extreme problems. Responses are converted to a weighted health state index, with scores ranging from −0.594 (worst health) to 1.0 (full health). The single item on pain from the EQ-5D was also evaluated separately as an anchor. The FACT-B includes the four core FACT-General (FACT-G) dimensions of physical well-being, social/family well-being, emotional well-being, and functional well-being, for which scale scores and a total score can be computed. In addition, the FACT-B includes a breast cancer–specific subscale.10 The FACT-B Trial Outcome Index (TOI) is the sum of the physical well-being score, the functional well-being score, and the breast cancer subscale. The four FACT-G scale scores, the FACT-G total score, the FACT-B TOI, and a single-item overall quality-of-life (QOL) rating from the functional well-being section were all evaluated as potential anchors. The overall QOL item from the functional well-being scale was selected as a broader, more general potential anchor to balance the single pain item selected from the EQ-5D. For all of these FACT outcome measures, a higher score indicates better health-related QOL.
Finally, the current pain rating from the BPI-SF, ranging from 0 (no pain) to 10 (pain as bad as you can imagine), was also considered as an anchor because it was hypothesized to be highly correlated with the worst pain rating and because it would assist in understanding the behavior of other potential anchors.
Several assessment intervals were considered for evaluation of the MID for the BPI-SF worst pain item: baseline to week 5, baseline to week 13, and baseline to week 25. The analysis for each time interval included only those patients with complete baseline and end-of-interval (i.e., week 5, week 13, or week 25) assessments on the BPI-SF worst pain item and the relevant anchor of interest. In addition, a post hoc confirmatory analysis was conducted using a longer interval of time, from baseline to week 49. No imputation of missing data was performed. Analysis was performed on pooled data, regardless of treatment assignment.
Anchor-Based Analysis
The usefulness of an anchor depends on the correlation of the PRO change score and the anchor.11 Therefore, to select the most appropriate anchors and time interval for estimating the MID for the BPI-SF worst pain item, Spearman correlation coefficients were calculated between changes in the BPI-SF worst pain rating and changes in potential anchors across each of the potential time intervals. The time interval with the highest correlations was selected, along with the anchors whose correlations were statistically significant (P < 0.05) and exceeded the a priori specified threshold of 0.30.12
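The anchor screening step can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, the use of the absolute value of the coefficient, and the toy change scores are assumptions, and the significance test (P < 0.05) is omitted for brevity.

```python
from math import sqrt

def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def screen_anchors(worst_pain_change, anchor_changes, threshold=0.30):
    """Keep anchors whose |rho| with worst pain change exceeds the threshold."""
    selected = {}
    for name, changes in anchor_changes.items():
        rho = spearman(worst_pain_change, changes)
        if abs(rho) > threshold:
            selected[name] = rho
    return selected

# Hypothetical change scores for four patients (not study data).
worst_pain_change = [-2, -1, 1, 2]
anchor_changes = {
    "current_pain": [-1, -1, 0, 2],
    "unrelated": [1, -1, -1, 1],
}
selected = screen_anchors(worst_pain_change, anchor_changes)
```

In practice a statistics package would also supply the P value for each coefficient; the hand-rolled ranking here only keeps the sketch free of dependencies.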
A one-category change was defined as a one-point change for the BPI-SF current pain item, a one-point change for the EQ-5D pain item, a three-point change for the FACT-G Physical Well-Being scale,13 a six-point change for the FACT-G total and FACT-B TOI scores,14 and a 0.20 change for the EQ-5D Index score. For the selected interval and anchors, the mean change in BPI-SF worst pain item that corresponds to a one-category increase and decrease in each anchor was calculated. In addition, ordinary least squares regression models were used to regress changes in BPI-SF worst pain ratings on changes of each of the anchors.[15] and [16] The regression models included main effects for change in each anchor and an interaction term expressing the change in anchor-by-baseline anchor.
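The category definitions above lend themselves to a small grouping sketch. The category sizes are taken from the text; the function name, the rule of rounding the anchor change to the nearest whole category, and the example values are assumptions made only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Size of a one-category change for each anchor, as defined in the text.
CATEGORY_SIZE = {
    "BPI-SF current pain": 1,
    "EQ-5D Pain item": 1,
    "FACT-G Physical Well-Being": 3,
    "FACT-G total": 6,
    "FACT-B TOI": 6,
    "EQ-5D Index": 0.20,
}

def mean_change_by_category(anchor_name, anchor_changes, worst_pain_changes):
    """Mean worst pain change, grouped by number of anchor categories changed.

    Assumption: each anchor change is rounded to the nearest whole category.
    """
    size = CATEGORY_SIZE[anchor_name]
    groups = defaultdict(list)
    for anchor_delta, pain_delta in zip(anchor_changes, worst_pain_changes):
        groups[round(anchor_delta / size)].append(pain_delta)
    return {categories: mean(values) for categories, values in groups.items()}

# Hypothetical change scores for six patients (not study data).
fact_g_total_change = [6, 7, -6, 0, 5, -5]
worst_pain_change = [-1, -2, 1, 0, -1.5, 0.5]
by_category = mean_change_by_category(
    "FACT-G total", fact_g_total_change, worst_pain_change
)
```

Here `by_category[1]` and `by_category[-1]` hold the mean worst pain change for a one-category increase and decrease in the anchor, respectively.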
Distribution-Based Analysis
The following distribution-based measures were calculated for the BPI-SF worst pain item: (1) the SEM, (2) effect size (Cohen's d), and (3) Guyatt's statistic. The SEM is a measure of the precision of a test instrument. It is calculated on the basis of sample data using the sample standard deviation and the sample reliability coefficient. While the standard deviation and the reliability of a measure are sample-dependent, their relationship (and hence the SEM) remains relatively constant across samples. Therefore, the SEM is considered to be an attribute of the measure and not a characteristic of the sample per se.17 Threshold values of 1 SEM have been suggested for defining clinically meaningful differences.18 The reliability coefficient was estimated for the BPI-SF worst pain item by calculating the intraclass correlation coefficients (ICCs) using two intervals of time. One used 7 days (days 1–8), a more typical interval for assessing reproducibility, while the other approach used a later interval, from week 105 to week 109. (Note: The 1-month interval was dictated by the schedule of assessments.) For both ICC values, only those patients whose FACT-B overall QOL ratings changed by 10% or less during the respective intervals were included. The 10% criterion was selected after reviewing the full distribution of change scores and their associated sample sizes, to arrive at a reasonable sample size of approximately 100 subjects.
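As a worked illustration, the conventional formula SEM = SD × sqrt(1 − reliability) can be applied to the values reported in the table footnotes of this article. Pairing the baseline standard deviation (2.849) with each ICC is an assumption of this sketch; the text does not state which standard deviation was used.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM from a standard deviation and a reliability coefficient (e.g., an ICC)."""
    return sd * sqrt(1.0 - reliability)

# Values reported in the article's table footnotes.
BASELINE_SD = 2.849           # SD of BPI-SF worst pain at baseline
ICC_DAY_1_TO_8 = 0.685        # 7-day interval
ICC_WEEK_105_TO_109 = 0.800   # later 1-month interval

# Assumption: the baseline SD is the one paired with each ICC.
sem_short_interval = standard_error_of_measurement(BASELINE_SD, ICC_DAY_1_TO_8)
sem_long_interval = standard_error_of_measurement(BASELINE_SD, ICC_WEEK_105_TO_109)
```

Under that assumption the two values round to 1.6 and 1.3 points, which agrees with the 1.3 to 1.6 range given for the minimal detectable change in the Results.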
Cohen's d, alternatively referred to as the “standardized effect size,” is calculated by dividing the difference between the baseline and week-25 scores by the standard deviation at baseline.19 The effect size represents individual change in terms of the number of baseline standard deviations. A value of 0.20 is a small effect, 0.50 is a medium effect, and 0.80 is a large effect. Effect sizes of 0.20, 0.50, and 0.80 were calculated in this study.
Guyatt's statistic, also referred to as the “responsiveness statistic,” is calculated by dividing the difference between the baseline and week-25 scores by the standard deviation of change observed for a group of stable patients.20 The denominator of the responsiveness statistic adjusts for spurious change due to measurement error. Values of 0.20 and 0.50 have been used to represent “small” and “medium” changes, respectively.21 Values representing 0.20 and 0.50 were calculated in this study. Stable patients were defined as those whose ECOG Performance rating did not change during the assessment interval. Different variables were used to define the stable population for the SEM (FACT-B overall QOL rating) and for Guyatt's statistic (ECOG Performance Status) because the two variables were not collected on the same schedule of assessments.
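Both statistics are ratios of a change score to a standard deviation, so the change that corresponds to a given threshold is simply the threshold multiplied by the relevant standard deviation. The sketch below uses the standard deviations reported in the table footnotes of this article; the function and variable names are illustrative assumptions.

```python
def standardized_change(mean_change, sd):
    """Change expressed in SD units (Cohen's d or Guyatt's statistic)."""
    return mean_change / sd

def points_for_threshold(threshold, sd):
    """Raw score change that corresponds to a threshold in SD units."""
    return threshold * sd

# Standard deviations reported in the article's table footnotes.
BASELINE_SD = 2.849        # denominator for Cohen's d
STABLE_CHANGE_SD = 2.833   # SD of change in stable patients, for Guyatt's statistic

effect_size_points = {
    t: points_for_threshold(t, BASELINE_SD) for t in (0.20, 0.50, 0.80)
}
guyatt_points = {
    t: points_for_threshold(t, STABLE_CHANGE_SD) for t in (0.20, 0.50)
}
```

The 0.50 thresholds both round to 1.4 points, matching the values quoted in the Abstract.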
Integrating Anchor-Based and Distribution-Based MID Estimates
The minimal detectable change (MDC) for the worst pain item was established by comparing distribution-based estimates. The MDC represents the smallest change that can be reliably distinguished from random fluctuation and, thus, the lower bound for establishing the MID.11 If the MID were lower than the MDC, then the instrument would not be capable of distinguishing the MID. The SEM was considered the primary distribution-based estimate because it takes into account the reliability of the measure and, thus, estimates the precision of the instrument.11 Other distribution-based measures were also considered in establishing the MDC. Standardized effect size was considered a secondary distribution-based estimate because of its reliance on interperson variability, which is generally higher and less consistent than intraperson variability. Anchor-based estimates of the MID range were then compared. A final MID range was established that is greater than the MDC and integrates estimates from the various anchors.
Results
Patient Population
Demographic and clinical characteristics for patients included in the baseline to week 25 interval are presented in Table 1. Data from 1,564 of 2,049 patients who participated in the study and had valid (i.e., nonmissing) baseline and end-of-interval scores for the BPI-SF and anchors were used in these analyses. Patients were predominantly female with an average age of 57.2 ± 11.2 years. The majority of patients were white (80.9%). Average pain scores at baseline were 2.45 ± 2.51, with a full range of scores (0–10) being used. Clinical results from the study have been presented previously.22
CHARACTERISTIC, n (%) | STUDY SAMPLE (n = 1,564) |
---|---|
Gender | |
Female | 1,550 (99.1) |
Male | 14 (0.9) |
Age, mean years ± SD (range) | 57.2 ± 11.2 (27.1–91.2) |
Race | |
White | 1,265 (80.9) |
Black | 38 (2.4) |
Hispanic | 92 (5.9) |
Japanese | 119 (7.6) |
Asian | 28 (1.8) |
Other | 22 (1.4) |
Demographic characteristics including the breakdown by gender, age, and race for the study sample are shown.
Anchor-Based Analysis
Spearman correlations between changes in the BPI-SF worst pain item and changes in potential anchors are presented in Table 2. For all potential anchors, the highest correlations with the BPI-SF worst pain rating were obtained at the baseline to week 25 interval. All potential anchors correlated significantly (P < 0.001) with the BPI-SF worst pain rating with the exception of the FACT-G Social/Family Well-Being scale. However, correlations were low (<0.30) for several potential anchors: ECOG Performance Status, FACT-B Overall QOL item, FACT-G Emotional Well-Being, and FACT-G Functional Well-Being. Therefore, the week 25 interval and the following anchors were selected for the MID analysis: BPI-SF current pain rating, EQ-5D Index score, EQ-5D Pain item, FACT-B TOI, FACT-G Physical Well-Being, and FACT-G total score. Correlation coefficients between the changes in the selected anchors and changes in the BPI-SF worst pain ratings range from 0.329–0.647.
Bolded correlations represent the highest correlations with anchors where correlation r ≥ 0.300.
Spearman correlation coefficients between changes in BPI-SF worst pain rating and changes in each of the 11 potential anchors that were considered are provided. The data are displayed for three intervals of time including baseline to week 5, baseline to week 13, and baseline to week 25. Using a cut point of r ≥ 0.300, only those correlations that are bolded meet the criteria of acceptability.
Mean changes in the BPI-SF worst pain rating that correspond to a one-category change in anchors from baseline to week 25 are presented in Table 3. BPI-SF current pain ratings >5 and EQ-5D Index scores <0.40 were excluded from their respective analysis due to small sample sizes. A one-category increase in the anchor scores was associated with an absolute value of change in the BPI-SF worst pain item ranging from 0.26–2.42. A one-category decrease in the anchor score was associated with an absolute value of change in the BPI-SF worst pain item ranging from 0.56–3.16. Changes associated with improvement and worsening in anchors were not symmetrical, nor was there a consistent trend across anchors. For example, for the EQ-5D pain item, the magnitude of change in BPI-SF worst pain was greater for a one-category increase in the anchor than for a one-category decrease in the anchor. In contrast, for the EQ-5D Index score, the magnitude of change in BPI-SF worst pain was greater for a one-category decrease in the anchor than for a one-category increase in the anchor.
ANCHOR | ONE CATEGORY (a) INCREASE IN ANCHOR | ONE CATEGORY DECREASE IN ANCHOR |
---|---|---|
BPI-SF Current Pain rating | 0.26–1.04 | −0.89 to −1.66 |
EQ-5D Index score | −2.42 to −1.40 | 0.56–1.63 |
EQ-5D Pain item | 1.71–1.98 | −3.16 to −2.56 |
FACT-B TOI | −2.22 to −0.51 | −0.56 to 0.77 |
FACT-G Physical Well-Being | −1.61 to −0.16 | −0.79 to 0.46 |
FACT-G total | −1.31 to −0.12 | −0.97 to 0.57 |
The ranges of mean changes in BPI-SF worst pain ratings (using the interval from baseline to week 25) are provided for the six anchors that met the correlation criteria in Table 2. Mean changes are displayed for one-category increases and one-category decreases in each anchor.
a One category (increase or decrease) represents 0.20 points for EQ-5D Index score, one point for BPI-SF current pain rating and EQ-5D pain item, three points for FACT-G Physical Well-Being, and six points for FACT-G total and FACT-B TOI.
The regression of changes in the BPI-SF worst pain item on changes in anchors is shown in Table 4. Changes in each anchor are significantly (P < 0.05) associated with changes in BPI-SF worst pain rating. A one-point increase in BPI-SF current pain rating and EQ-5D Pain item is associated with a 0.817 and 1.805 increase in BPI-SF worst pain, respectively, while a one-point increase in EQ-5D Index score, FACT-B TOI, FACT-G Physical Well-Being, and FACT-G total is associated with a 3.548, 0.098, 0.163, and 0.048 decrease in BPI-SF worst pain rating, respectively. Likewise, a two-point increase in BPI-SF current pain rating and EQ-5D Pain item is associated with a 1.634 and 3.610 increase in BPI-SF worst pain, respectively, while a two-point increase in EQ-5D Index score, FACT-B TOI, FACT-G Physical Well-Being, and FACT-G total is associated with a 7.096, 0.196, 0.326, and 0.096 decrease in BPI-SF worst pain rating, respectively. The change in anchor-by-baseline anchor interaction was statistically significant only for BPI current pain and FACT-G Physical Well-Being. The interaction tests whether the anchor–BPI-SF slope differs as a function of baseline anchor score; therefore, a lack of significance suggests that the association between BPI-SF worst pain and other anchors does not differ by baseline anchor rating.
VARIABLE | PREDICTOR | b | β | SIG. |
---|---|---|---|---|
Change in BPI current pain | Main effect | 0.817 | 0.724 | <0.001 |
Change in BPI current pain | Interaction with baseline anchor | −0.024 | −0.107 | 0.001 |
Change in EQ-5D Health State Index | Main effect | −3.548 | −0.349 | <0.001 |
Change in EQ-5D Health State Index | Interaction with baseline anchor | 0.220 | 0.021 | 0.465 |
Change in EQ-5D Pain item | Main effect | 1.805 | 0.352 | <0.001 |
Change in EQ-5D Pain item | Interaction with baseline anchor | 0.207 | 0.080 | 0.261 |
Change in FACT-B TOI | Main effect | −0.098 | −0.406 | <0.001 |
Change in FACT-B TOI | Interaction with baseline anchor | 0.000 | 0.028 | 0.756 |
Change in FACT-G Physical Well-Being | Main effect | −0.163 | −0.321 | <0.001 |
Change in FACT-G Physical Well-Being | Interaction with baseline anchor | −0.004 | −0.133 | 0.024 |
Change in FACT-G total score | Main effect | −0.048 | −0.231 | 0.025 |
Change in FACT-G total score | Interaction with baseline anchor | 0.000 | −0.130 | 0.209 |
b, regression coefficient; β, standardized regression coefficient; Sig., significance level.
Possible ranges: BPI Pain Right Now 0 (least) to 10 (most), EQ-5D Health State Index scores −0.594 (worst) to 1.00 (best), EQ-5D Pain item scores 1 (none) to 3 (severe), FACT-B TOI scores 4 (worst) to 92 (best), FACT-G Physical Well-Being scores 0 (worst) to 28 (best), FACT-G total score 8 (worst) to 108 (best), BPI Worst Pain item 0 (least) to 10 (most).
Changes in all anchors are significantly (P < 0.05) associated with changes in BPI-SF worst pain ratings. A one-point increase in BPI-SF current pain rating and EQ-5D pain item is associated with increases (positive b score) in the BPI-SF worst pain rating, and a one-point increase in EQ-5D Index, FACT-B TOI, FACT-G Physical Well-Being, and FACT-G total scores is associated with decreases (negative b score) in the BPI-SF worst pain ratings. The change in anchor-by-baseline anchor interaction was statistically significant only for the BPI current pain and FACT-G PWB items.
A post hoc confirmatory analysis was done replicating these analyses using data from the baseline to week 49 interval (n = 1,250). Results indicate a slightly stronger correlation between the anchors and the change scores. (Spearman's correlations range from 0.372 for FACT-B TOI to 0.644 for BPI-SF current pain rating.) Mean change scores of BPI-SF worst pain ratings by each of the six anchors and regression coefficients were similar to those for the baseline to week 25 interval. For instance, mean change scores for the EQ-5D Pain item for stable patients ranged from 0.25–0.56, 1.58–2.95 for an improvement of one category, and 1.75–2.80 for a worsening of one category compared with 0.50–0.51, 1.71–1.98, and 2.56–3.16, respectively, for the baseline to week 25 interval.
Distribution-Based Analysis
The distribution-based estimates for the BPI-SF worst pain rating are presented in Table 5. There appears to be consistency with the 1 SEM estimates, the 0.50 effect size, and the 0.50 Guyatt's statistic.
The results from the three distribution-based approaches presented in this table will be combined with those of the anchor-based results to estimate the MID.
a The standard error of measurement is a measure of the precision of a test instrument. It is calculated on the basis of sample data using the sample standard deviation and the sample reliability coefficient. Intraclass correlation coefficients (ICCs) for BPI-SF worst pain rating from day 1 to day 8 and week 105 to week 109 in patients whose FACT-B overall QOL ratings change by <10% are 0.685 (n = 926) and 0.800 (n = 109), respectively.
b Alternatively referred to as Cohen's d, the effect size is calculated by dividing the difference between the pretest and posttest scores by the standard deviation at pretest. The standard deviation of BPI-SF worst pain rating at baseline (n = 1,877) is 2.849.
c Alternatively referred to as the responsiveness statistic, Guyatt's statistic is calculated by dividing the difference between the pretest and posttest scores by the standard deviation of change observed for a group of stable patients. The standard deviation of change in BPI-SF worst pain rating from baseline to week 25 in patients whose ECOG performance rating does not change (n = 1,120) is 2.833.
Integrating Anchor-Based and Distribution-Based MID Estimates
The distribution-based analyses suggest that the MDC for the worst pain rating, defined as the smallest change that can be reliably differentiated from random fluctuation, is between 1.3 and 1.6 points (see Table 5). This represents the lower bound for establishing the MID.
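The lower-bound rule can be expressed as a simple check. This is only a sketch: the MDC range is the one reported in the text, while the function name and the choice to compare a candidate against the upper end of that range are assumptions.

```python
# MDC range for the BPI-SF worst pain rating, in points, as reported in the text.
MDC_LOWER = 1.3
MDC_UPPER = 1.6

def exceeds_mdc(candidate_mid, mdc_upper=MDC_UPPER):
    """True when a candidate MID is larger than the upper MDC estimate.

    Assumption: the conservative (upper) end of the MDC range is the bar to clear.
    """
    return candidate_mid > mdc_upper

# A one-point change falls below the MDC range; a two-point change clears it.
checks = {points: exceeds_mdc(points) for points in (1, 2)}
```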
The results from regression analyses can be used to translate changes between anchors and corresponding changes in BPI-SF worst pain. This strategy can be particularly informative when the MID for an anchor is known. This is the case for the EQ-5D Health State Index, where the MID has been estimated at 0.06 for U.S. Index scores and 0.07 for U.K. Index scores.23 A one-point change in EQ-5D Index translates to a change of −3.548 in BPI-SF worst pain, so a 0.07-point change in EQ-5D Index (the MID for the measure) corresponds to a change of −0.248 in BPI-SF worst pain. In contrast, a one-point change in BPI-SF worst pain (which is smaller than the MID based upon the distribution-based analyses) translates to a change of 0.036 for the EQ-5D Index score (considerably smaller than the MID of 0.07). However, a two-point change in BPI-SF worst pain rating corresponds to a 0.072 change in EQ-5D Index score, which is almost identical to the MID for that measure. This suggests that a two-point change may be a reasonable estimate for the MID of the BPI-SF worst pain rating.
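The arithmetic in this paragraph can be reproduced directly. The slope of −3.548 is the Table 4 main effect and the 0.036 per-point figure is taken from the text as given; it is not the reciprocal of the slope, so this sketch does not attempt to derive it. Variable names are illustrative.

```python
# Regression slope: change in BPI-SF worst pain per one-point change in EQ-5D Index.
SLOPE_WORST_PAIN_PER_EQ5D_POINT = -3.548
# Published MID for the EQ-5D Index (U.K. scoring), as cited in the text.
EQ5D_MID_UK = 0.07
# Change in EQ-5D Index per one-point change in worst pain, as reported in the text.
EQ5D_CHANGE_PER_WORST_PAIN_POINT = 0.036

# Worst pain change that corresponds to one EQ-5D MID.
worst_pain_change_at_eq5d_mid = SLOPE_WORST_PAIN_PER_EQ5D_POINT * EQ5D_MID_UK

# EQ-5D Index change that corresponds to one- and two-point worst pain changes.
eq5d_change_for_one_point = 1 * EQ5D_CHANGE_PER_WORST_PAIN_POINT
eq5d_change_for_two_points = 2 * EQ5D_CHANGE_PER_WORST_PAIN_POINT

# Only the two-point change reaches the EQ-5D MID of 0.07.
two_points_reaches_mid = eq5d_change_for_two_points >= EQ5D_MID_UK
one_point_reaches_mid = eq5d_change_for_one_point >= EQ5D_MID_UK
```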
Discussion
Data from both distribution-based and anchor-based approaches were used to develop estimates of the MID for the BPI-SF worst pain rating. Results from these approaches are similar, providing reasonably strong support for establishing a two-point MID for the BPI-SF worst pain rating. Further, the results suggest that this estimate of MID is, for the most part, independent of baseline BPI-SF worst pain ratings. However, there is some evidence to suggest that the direction of change (improvement or worsening) may be important to consider. A number of reports have suggested that a smaller change may be required to be considered clinically important when a patient is improving compared with worsening.13 Also, when considered as a percentage, a one-point change in any scale has a different value for an increase versus a decrease; e.g., a change from 2 to 3 is an increase of 50%, while a change from 3 to 2 is a decrease of 33%. Nonetheless, these findings provide important information to researchers for interpreting changes in the BPI-SF worst pain ratings.
In addition, although not specific to the BPI worst pain rating, the findings of this study are consistent with other published MID analyses for a similar item. A recent review of three studies concluded that, for a numerical rating scale of pain intensity ranging 0–10 similar in content to the BPI-SF worst pain rating, changes of around two points represent “meaningful,” “much better,” or “much improved” reductions in chronic pain.24
Several factors contribute to the overall strength of the current results. First, as frequently recommended in the literature,11 both anchor-based and distribution-based methods were used to estimate the MID for the worst pain rating. Second, analyses were based on a large sample, totaling over 1,500 patients for the baseline to week 25 assessment interval. A larger sample size will generally provide a broader distribution of responses, which will likely increase the generalizability of the results. Third, multiple anchors were used to evaluate changes in BPI-SF worst pain ratings. Fourth, analyses were performed across several assessment intervals to determine the strongest relationship between BPI-SF ratings and other anchors. Finally, the regression analyses provide important information about whether baseline differences influence the relationship between BPI-SF and other PRO measures.
Nevertheless, these analyses are not without certain limitations. The sample for the current analyses consisted entirely of breast cancer patients. It is unclear to what extent these results will be relevant for other patient populations. Further research is needed to determine whether the MID for the BPI-SF worst pain rating established in this sample has broader applicability. Also, it must be noted that the recall period varied across assessments. The BPI-SF focuses on the past 24 hours, the FACT uses the past week, and the EQ-5D uses the present moment. It is unclear to what extent these differences in recall periods may have influenced the current results. Finally, the baseline to week 25 interval was used to determine the MID for the BPI-SF worst pain rating based on the higher correlations for this interval. Data from baseline to week 49 are consistent with these results, providing some confirmatory evidence to suggest that these MID estimates are stable.
In conclusion, the findings of the present analyses suggest that the MID estimate for the BPI-SF worst pain rating is two points. This value provides guidance to researchers using the BPI-SF worst pain rating on how to interpret baseline differences as well as change scores in the BPI-SF worst pain rating. Additional analyses could be done in other populations to confirm these findings.
References1
1 K.W. Wyrwich, M. Bullinger and N. Aaronson et al., Estimating clinically significant differences in quality of life outcomes, Qual Life Res 14 (2005), pp. 285–295. Full Text via CrossRef | View Record in Scopus | Cited By in Scopus (119)
2 S.D. Mathias, S.K. Gao, M. Rutstein, C.F. Snyder, A.W. Wu and D. Cella, Evaluating clinically meaningful change on the ITP-PAQ: preliminary estimates of minimal important differences, Curr Med Res Opin 25 (2) (2009), pp. 375–383. Full Text via CrossRef | View Record in Scopus | Cited By in Scopus (4)
3 K.J. Yost, M.V. Sorensen, E.A. Hahn, G.A. Glendenning, A. Gnanasakthy and D. Cella, Using multiple anchor- and distribution-based estimates to evaluate clinically meaningful change on the Functional Assessment of Cancer Therapy-Biologic Response Modifiers (FACT-BRM) instrument, Value Health 8 (2) (2005), pp. 117–127. Abstract | | Full Text via CrossRef | View Record in Scopus | Cited By in Scopus (28)
4 R.D. Hays and J.M. Woolley, The concept of clinically meaningful difference in health-related quality-of-life research: How meaningful is it?, Pharmacoeconomics 18 (5) (2000), pp. 419–423. Full Text via CrossRef | View Record in Scopus | Cited By in Scopus (177)
5 R.L. Daut, C.S. Cleeland and R.C. Flanery, Development of the Wisconsin Brief Pain Questionnaire to assess pain in cancer and other diseases, Pain 17 (2) (1983), pp. 197–210. Abstract | | View Record in Scopus | Cited By in Scopus (543)
6 C. Cleeland, Brief Pain Inventory User Guide, University of Texas M. D. Anderson Cancer Center, Houston (2009).
7 R.H. Dworkin, D.C. Turk and J.T. Farrar et al., Core outcome measures for chronic pain clinical trials: IMMPACT recommendations, Pain 113 (1–2) (2005), pp. 9–19. Article | | View Record in Scopus | Cited By in Scopus (380)
8 U.S. Department of Health and Human Services Food and Drug Administration (FDA), Guidance for Industry: Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims, FDA, Silver Spring, MD (2009).
9 M.M. Oken, R.H. Creech and D.C. Tormey et al., Toxicity and response criteria of the Eastern Cooperative Oncology Group, Am J Clin Oncol 5 (6) (1982), pp. 649–655.
10 M.J. Brady, D.F. Cella, F. Mo and A.E. Bonomi et al., Reliability and validity of the Functional Assessment of Cancer Therapy–Breast Cancer Quality of Life instrument, J Clin Oncol 15 (1997), pp. 974–986.
11 R.D. Crosby, R.L. Kolotkin and G.R. Williams, Defining clinically meaningful change in health-related quality of life, J Clin Epidemiol 56 (5) (2003), pp. 395–407.
12 D. Revicki, R.D. Hays, D. Cella and J. Sloan, Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes, J Clin Epidemiol 61 (2) (2008), pp. 102–109.
13 D. Cella, E.A. Hahn and K. Dineen, Meaningful change in cancer-specific quality of life scores: differences between improvement and worsening, Qual Life Res 11 (3) (2002), pp. 207–221.
14 D.T. Eton, D. Cella and K.J. Yost et al., A combination of distribution- and anchor-based approaches determined the minimally important differences (MIDs) for four endpoints in a breast cancer scale, J Clin Epidemiol 57 (2004), pp. 898–910.
15 S. Wiebe, S. Matijevic, M. Eliasziw and P.A. Derry, Clinically important change in quality of life in epilepsy, J Neurol Neurosurg Psychiatry 73 (2002), pp. 116–120.
16 K.L. Miller, J.G. Walt and D.R. Mink et al., Minimal clinically important difference for the ocular surface disease index, Arch Ophthalmol 128 (1) (2010), pp. 94–101.
17 K.W. Wyrwich, W.M. Tierney and F.D. Wolinsky, Further evidence supporting an SEM-based criterion for identifying meaningful intra-individual changes in health-related quality of life, J Clin Epidemiol 52 (9) (1999), pp. 861–873.
18 F.D. Wolinsky, G.J. Wan and W.M. Tierney, Changes in the SF-36 in 12 months in a clinical sample of disadvantaged older adults, Med Care 36 (11) (1998), pp. 1589–1598.
19 J. Cohen, Statistical Power Analysis for the Behavioral Sciences (2nd ed.), Lawrence Erlbaum, Hillsdale, NJ (1988).
20 G.H. Guyatt, C. Bombardier and P.X. Tugwell, Measuring disease-specific quality of life in clinical trials, CMAJ 134 (8) (1986), pp. 889–895.
21 G.R. Norman, P. Stratford and G. Regehr, Methodological problems in the retrospective computation of responsiveness to change: the lesson of Cronbach, J Clin Epidemiol 50 (8) (1997), pp. 869–879.
22 A. Stopeck, J. Body and Y. Fujiwara et al., Denosumab versus zoledronic acid for the treatment of breast cancer patients with bone metastases: results of a randomized phase 3 study, Eur J Cancer Suppl 7 (2009), p. 2.
23 A.S. Pickard, M.P. Neary and D. Cella, Estimation of minimally important differences in EQ-5D utility and VAS scores in cancer, Health Qual Life Outcomes 5 (2007), p. 70.
24 R.H. Dworkin, D.C. Turk and K.W. Wyrwich et al., Interpreting the clinical importance of treatment outcomes in chronic pain clinical trials: IMMPACT recommendations, J Pain 9 (2) (2008), pp. 105–121.
Correspondence to: Susan D. Mathias, Health Outcomes Solutions, PO Box 2343, Winter Park, FL 32790; telephone: (407) 643-9016; fax: (866) 384-0194
Original research
Susan D. Mathias MPH
Abstract
The Brief Pain Inventory–Short Form (BPI-SF) is widely used for assessing pain in clinical and research studies. The worst pain rating is often the primary outcome of interest; yet, no published data are available on its minimally important difference (MID). Breast cancer patients with bone metastases were enrolled in a randomized, double-blind, phase III study comparing denosumab with zoledronic acid for preventing skeletal-related events and completed the BPI-SF, FACT-B, and EQ-5D at baseline, week 5, and monthly through the end of the study. Anchor- and distribution-based MID estimates were computed. Data from 1,564 patients were available. Spearman correlation coefficients for anchors ranged from 0.33 to 0.65. Mean change scores for worst pain ratings corresponding to one-category improvement in each anchor were 0.26–1.04 for BPI-SF current pain, −1.40 to −2.42 for EQ-5D Index score, 1.71–1.98 for EQ-5D Pain item, −2.22 to −0.51 for FACT-B TOI, −1.61 to −0.16 for FACT-G Physical, and −1.31 to −0.12 for FACT-G total. Distribution-based results were 1 SEM = 1.6, 0.5 effect size = 1.4, and Guyatt's statistic = 1.4. Combining anchor- and distribution-based results yielded a two-point MID estimate. An MID estimate of two points is useful for interpreting how much change in worst pain is considered clinically meaningful.
Article Outline
- Methods
- Study Design
- Outcome Measures and Assessment Intervals
- Anchor-Based Analysis
- Distribution-Based Analysis
- Integrating Anchor-Based and Distribution-Based MID Estimates
The MID may be estimated through distribution-based methods and/or anchor-based methods. Distribution-based methods are based on the distribution of the data. Examples of distribution-based methods include effect size measures, the standard error of measurement (SEM), one-half times the standard deviation, and the responsiveness index.[2] and [3] Anchor-based methods are based on the association between the PRO measure and an interpretable external measure, such as a global rating of change or a response to treatment. These methods may result in somewhat different estimates, and no particular estimate is considered the most valid.[2], [3] and [4] Therefore, researchers are encouraged to use more than one method and to present a range of MID estimates.
A frequently used PRO measure for the assessment of pain is the Brief Pain Inventory–Short Form (BPI-SF). The foundation of the BPI-SF is the Wisconsin Brief Pain Questionnaire, which was developed over 25 years ago based on interviews with cancer patients, expert opinion, and then-current psychometric standards.5 Over time, the Wisconsin Brief Pain Questionnaire evolved into the Brief Pain Inventory, which was later reduced to a shorter version, the BPI-SF. Today, the BPI-SF is the standard for clinical and research use. It has been used in over 400 studies, including psychometric evaluations and clinical applications with a wide range of conditions (e.g., cancer pain, fibromyalgia, neuropathic pain, and joint diseases).6
The BPI-SF includes two domains: pain severity and pain interference. The pain severity domain, the focus of this report, includes items specific to pain at “worst,” “least,” “average,” and “now” (current pain), with a numerical response scale ranging from 0 (no pain) to 10 (pain as bad as you can imagine). In clinical trials, the worst pain item has been used alone as a measure of pain severity.6 Its use as a single item is supported by a consensus panel on outcome measures for chronic pain clinical trials.7 In addition, the Food and Drug Administration's (FDA) guidance on PROs states that a single-item PRO measure of pain severity is appropriate for assessing the effect of a treatment on pain.8 Although extensive psychometric evaluation of the BPI-SF has been conducted, no estimates of the MID are available for the BPI-SF worst pain item. Establishing the MID for the BPI-SF worst pain item is important because it will provide a clinically relevant reference to interpret changes in pain scores. Therefore, the objective of this current report was to estimate the MID of the worst pain item of the BPI-SF.
Methods
Study Design
Patients with advanced breast cancer and bone metastases were enrolled in an international, randomized, double-blind, double-dummy, active-controlled phase III study comparing denosumab with zoledronic acid for delaying or preventing skeletal related events. Patients were eligible to participate if they had histologically or cytologically confirmed breast adenocarcinoma; current or prior radiologic, computed tomography, or magnetic resonance imaging evidence of at least one bone metastasis; and an Eastern Cooperative Oncology Group (ECOG) performance status of 0, 1, or 2. Patients with current or prior intravenous bisphosphonate administration were excluded. Patients completed PRO assessments, including the BPI-SF, at baseline, week 5, and every 4 weeks thereafter until the end of the study. Assessments were scheduled to take place prior to any study procedures and prior to study drug administration. Although data collection continued, PRO analyses for efficacy were truncated when approximately 30% of patients dropped out of the study due to death, disease progression, or withdrawn consent.
Outcome Measures and Assessment Intervals
A number of outcome measures were assessed in the study and considered for use as anchors for evaluating the MID of the BPI-SF worst pain item, including one clinician-reported measure (ECOG Performance Status) and several PRO measures: the EuroQoL 5 Dimensions (EQ-5D) Index score, the Functional Assessment of Cancer Therapy-Breast Cancer (FACT-B), and the BPI-SF current pain rating.
The ECOG Performance Status, which assesses how a patient's disease or its treatment is progressing and how the disease affects the daily living abilities of the patient, is a single-item, six-point, clinician-rated assessment of performance ranging from 0 (fully active, no restrictions) to 5 (dead).9 The EQ-5D Index score is a measure of health status that assesses five dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each dimension has three response options: no problems, some/moderate problems, and extreme problems. Responses are converted to a weighted health state index, with scores ranging from −0.594 (worst health) to 1.0 (full health). The single item on pain from the EQ-5D was also evaluated separately as an anchor. The FACT-B includes the four core FACT-General (FACT-G) dimensions of physical well-being, social/family well-being, emotional well-being, and functional well-being, for which scale scores and a total score can be computed. In addition, the FACT-B includes a breast cancer–specific subscale.10 The FACT-B Trial Outcome Index (TOI) is the sum of the physical well-being score, the functional well-being score, and the breast cancer subscale. The four FACT-G scale scores, the FACT-G total score, the FACT-B TOI, and a single-item overall quality-of-life (QOL) rating from the functional well-being section were all evaluated as potential anchors. The single overall QOL item from the functional well-being scale was selected to balance the single pain item selected from the EQ-5D by serving as a potential anchor that is more general in breadth and scope. For all of these FACT outcome measures, a higher score indicates better health-related QOL.
Finally, the current pain rating from the BPI-SF, ranging from 0 (no pain) to 10 (pain as bad as you can imagine), was also considered as an anchor because it was hypothesized to be highly correlated with the worst pain rating and because it would assist in understanding the behavior of other potential anchors.
Several assessment intervals were considered for evaluation of the MID for the BPI-SF worst pain item: baseline to week 5, baseline to week 13, and baseline to week 25. The analysis for each time interval included only those patients with complete baseline and end-of-interval (i.e., week 5, week 13, or week 25) assessments on the BPI-SF worst pain item and the relevant anchor of interest. In addition, a post hoc confirmatory analysis was conducted using a longer interval of time, from baseline to week 49. No imputation of missing data was performed. Analysis was performed on pooled data, regardless of treatment assignment.
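The complete-case rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's analysis code: the record layout (a patient id mapped to visit scores), the visit labels, and the function name `change_scores` are all assumptions made for the example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical record layout: patient id -> {visit label -> score or None}.
Scores = Dict[str, Dict[str, Optional[float]]]


def change_scores(worst: Scores, anchor: Scores,
                  end_visit: str) -> Dict[str, Tuple[float, float]]:
    """Return (worst-pain change, anchor change) for complete cases only."""
    changes = {}
    for pid, w in worst.items():
        a = anchor.get(pid, {})
        values = (w.get("baseline"), w.get(end_visit),
                  a.get("baseline"), a.get(end_visit))
        # No imputation: skip the patient if any of the four scores is missing.
        if any(v is None for v in values):
            continue
        w0, w1, a0, a1 = values
        changes[pid] = (w1 - w0, a1 - a0)
    return changes


worst = {
    "p1": {"baseline": 6, "week25": 3},
    "p2": {"baseline": 4, "week25": None},   # missing end-of-interval score
    "p3": {"baseline": 2, "week25": 5},
}
anchor = {
    "p1": {"baseline": 3, "week25": 1},
    "p2": {"baseline": 2, "week25": 2},
    "p3": {"baseline": 1},                   # anchor not assessed at week 25
}
result = change_scores(worst, anchor, "week25")
```

Only patient `p1` contributes a pair of change scores here, because the other two lack one of the four required assessments.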
Anchor-Based Analysis
The usefulness of an anchor depends on the correlation of the PRO change score and the anchor.11 Therefore, to select the most appropriate anchors and time interval for estimating the MID for the BPI-SF worst pain item, Spearman correlation coefficients were calculated between changes in the BPI-SF worst pain rating and changes in potential anchors across each of the potential time intervals. The time interval with the highest correlations was selected, along with the anchors whose correlations were statistically significant (P < 0.05) and exceeded the a priori threshold of 0.30.12
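The anchor-screening step can be illustrated with a small, dependency-free Python sketch. It computes Spearman's rho as the Pearson correlation of tie-averaged ranks and keeps anchors above the 0.30 cutoff. The function names and toy data are hypothetical, the comparison uses the absolute value of rho (an assumption, since some anchors are scored in the opposite direction to pain), and the significance test (P < 0.05) is not reproduced.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def select_anchors(worst_change, anchor_changes, threshold=0.30):
    """Keep anchors whose |rho| with worst-pain change exceeds the threshold."""
    rhos = {name: spearman(worst_change, values)
            for name, values in anchor_changes.items()}
    return {name: rho for name, rho in rhos.items() if abs(rho) > threshold}


# Toy change scores, invented for illustration only.
worst_change = [-3, -1, 0, 2, 4]
anchor_changes = {
    "current_pain": [-2, -1, 0, 1, 3],
    "unrelated": [1, -1, 0, -1, 1],
}
selected = select_anchors(worst_change, anchor_changes)
```

In the toy data, `current_pain` moves monotonically with worst pain and is retained, while `unrelated` has a rank correlation of zero and is dropped.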
A one-category change was defined as a one-point change for the BPI-SF current pain item, a one-point change for the EQ-5D pain item, a three-point change for the FACT-G Physical Well-Being scale,13 a six-point change for the FACT-G total and FACT-B TOI scores,14 and a 0.20 change for the EQ-5D Index score. For the selected interval and anchors, the mean change in BPI-SF worst pain item that corresponds to a one-category increase and decrease in each anchor was calculated. In addition, ordinary least squares regression models were used to regress changes in BPI-SF worst pain ratings on changes of each of the anchors.[15] and [16] The regression models included main effects for change in each anchor and an interaction term expressing the change in anchor-by-baseline anchor.
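The regression model described above can be sketched as follows. This is an illustrative implementation under stated assumptions: the model is taken to contain an intercept, the anchor change, and the product of anchor change and baseline anchor score (whether the original models also included a baseline main effect is not stated), and the data are synthetic values generated from known coefficients so the fit can be checked. The helper names `solve` and `fit_anchor_model` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_anchor_model(worst_change, anchor_change, anchor_baseline):
    """OLS fit of worst-pain change on anchor change and its baseline interaction."""
    # Design matrix columns: intercept, anchor change, anchor change x baseline.
    rows = [[1.0, d, d * b] for d, b in zip(anchor_change, anchor_baseline)]
    k = 3
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, worst_change)) for i in range(k)]
    intercept, main_effect, interaction = solve(xtx, xty)
    return {"intercept": intercept, "main_effect": main_effect,
            "interaction": interaction}


# Synthetic data generated from known coefficients (not study data).
anchor_change = [-2, -1, 0, 1, 2, 1, -1, 2]
anchor_baseline = [5, 3, 4, 2, 1, 6, 2, 3]
worst_change = [0.1 + 0.8 * d - 0.02 * d * b
                for d, b in zip(anchor_change, anchor_baseline)]
coef = fit_anchor_model(worst_change, anchor_change, anchor_baseline)
```

A nonzero interaction coefficient means the slope relating anchor change to worst-pain change depends on where the patient started on the anchor.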
Distribution-Based Analysis
The following distribution-based measures were calculated for the BPI-SF worst pain item: (1) the SEM, (2) effect size (Cohen's d), and (3) Guyatt's statistic. The SEM is a measure of the precision of a test instrument. It is calculated on the basis of sample data using the sample standard deviation and the sample reliability coefficient. While the standard deviation and the reliability of a measure are sample-dependent, their relationship (and hence the SEM) remains relatively constant across samples. Therefore, the SEM is considered to be an attribute of the measure and not a characteristic of the sample per se.17 Threshold values of 1 SEM have been suggested for defining clinically meaningful differences.18 The reliability coefficient was estimated for the BPI-SF worst pain item by calculating the intraclass correlation coefficients (ICCs) using two intervals of time. One used 7 days (days 1–8), a more typical interval for assessing reproducibility, while the other approach used a later interval, from week 105 to week 109. (Note: The 1-month interval was dictated by the schedule of assessments.) For both ICC values, only those patients whose FACT-B overall QOL ratings changed by 10% or less during the respective intervals were included. The 10% criterion was selected after reviewing the full distribution of change scores and their associated sample sizes, to arrive at a reasonable sample size of approximately 100 subjects.
Cohen's d, alternatively referred to as the “standardized effect size,” is calculated by dividing the difference between the baseline and week-25 scores by the standard deviation at baseline.19 The effect size represents individual change in terms of the number of baseline standard deviations. A value of 0.20 is a small effect, 0.50 is a medium effect, and 0.80 is a large effect. Change scores corresponding to effect sizes of 0.20, 0.50, and 0.80 were calculated in this study.
Guyatt's statistic, also referred to as the “responsiveness statistic,” is calculated by dividing the change from baseline to week 25 by the standard deviation of change observed for a group of stable patients.20 The denominator of the responsiveness statistic adjusts for spurious change due to measurement error. Values of 0.20 and 0.50 have been used to represent “small” and “medium” changes, respectively.21 Change scores corresponding to values of 0.20 and 0.50 were calculated in this study. Stable patients were defined as those whose ECOG Performance rating did not change during the assessment interval. Different variables were used to define the stable population for the SEM (FACT-B overall QOL rating) and for Guyatt's statistic (ECOG Performance rating) because the two variables were not collected on the same schedule of assessments.
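For reference, the three distribution-based measures can be written compactly. The notation is introduced here for illustration and does not appear in the original: $SD_{0}$ is the baseline standard deviation, $r$ the reliability (ICC), $\bar{\Delta}$ the mean change from baseline to week 25, and $SD_{\Delta,\text{stable}}$ the standard deviation of change among stable patients. The SEM expression is the standard formula; that the baseline standard deviation was the one used, and the sign convention for $\bar{\Delta}$, are assumptions.

```latex
\begin{aligned}
\mathrm{SEM} &= SD_{0}\,\sqrt{1 - r} \\
d &= \frac{\bar{\Delta}}{SD_{0}} \\
\text{Guyatt's statistic} &= \frac{\bar{\Delta}}{SD_{\Delta,\text{stable}}}
\end{aligned}
```

Rearranging the last two expressions gives the raw-score change that corresponds to a chosen threshold $k$ (for example, 0.50): $k \times SD_{0}$ for the effect size and $k \times SD_{\Delta,\text{stable}}$ for Guyatt's statistic.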
Integrating Anchor-Based and Distribution-Based MID Estimates
The minimal detectable change (MDC) for the worst pain item was established by comparing distribution-based estimates. The MDC represents the smallest change that can be reliably distinguished from random fluctuation and, thus, the lower bound for establishing the MID.11 If the MID were lower than the MDC, then the instrument would not be capable of distinguishing the MID. The SEM was considered the primary distribution-based estimate because it takes into account the reliability of the measure and, thus, estimates the precision of the instrument.11 Other distribution-based measures were also considered in establishing the MDC. Standardized effect size was considered a secondary distribution-based estimate because of its reliance on interperson variability, which is generally higher and less consistent than intraperson variability. Anchor-based estimates of the MID range were then compared. A final MID range was established that is greater than the MDC and integrates estimates from the various anchors.
Results
Patient Population
Demographic and clinical characteristics for patients included in the baseline to week 25 interval are presented in Table 1. Data from 1,564 of 2,049 patients who participated in the study and had valid (i.e., nonmissing) baseline and end-of-interval scores for the BPI-SF and anchors were used in these analyses. Patients were predominantly female with an average age of 57.2 ± 11.2 years. The majority of patients were white (80.9%). Average pain scores at baseline were 2.45 ± 2.51, with a full range of scores (0–10) being used. Clinical results from the study have been presented previously.22
CHARACTERISTIC, n (%) | STUDY SAMPLE (n = 1,564) |
---|---|
Gender | |
Female | 1,550 (99.1) |
Male | 14 (0.9) |
Age, mean years ± SD (range) | 57.2 ± 11.2 (27.1–91.2) |
Race | |
White | 1,265 (80.9) |
Black | 38 (2.4) |
Hispanic | 92 (5.9) |
Japanese | 119 (7.6) |
Asian | 28 (1.8) |
Other | 22 (1.4) |
Demographic characteristics including the breakdown by gender, age, and race for the study sample are shown.
Anchor-Based Analysis
Spearman correlations between changes in the BPI-SF worst pain item and changes in potential anchors are presented in Table 2. For all potential anchors, the highest correlations with the BPI-SF worst pain rating were obtained at the baseline to week 25 interval. All potential anchors correlated significantly (P < 0.001) with the BPI-SF worst pain rating with the exception of the FACT-G Social/Family Well-Being scale. However, correlations were low (<0.30) for several potential anchors: ECOG Performance Status, FACT-B Overall QOL item, FACT-G Emotional Well-Being, and FACT-G Functional Well-Being. Therefore, the week 25 interval and the following anchors were selected for the MID analysis: BPI-SF current pain rating, EQ-5D Index score, EQ-5D Pain item, FACT-B TOI, FACT-G Physical Well-Being, and FACT-G total score. Correlation coefficients between the changes in the selected anchors and changes in the BPI-SF worst pain ratings range from 0.329–0.647.
Bolded correlations represent the highest correlations with anchors where correlation r ≥ 0.300.
Spearman correlation coefficients between changes in BPI-SF worst pain rating and changes in each of the 11 potential anchors that were considered are provided. The data are displayed for three intervals of time including baseline to week 5, baseline to week 13, and baseline to week 25. Using a cut point of r ≥ 0.300, only those correlations that are bolded meet the criteria of acceptability.
Mean changes in the BPI-SF worst pain rating that correspond to a one-category change in anchors from baseline to week 25 are presented in Table 3. BPI-SF current pain ratings >5 and EQ-5D Index scores <0.40 were excluded from their respective analysis due to small sample sizes. A one-category increase in the anchor scores was associated with an absolute value of change in the BPI-SF worst pain item ranging from 0.26–2.42. A one-category decrease in the anchor score was associated with an absolute value of change in the BPI-SF worst pain item ranging from 0.56–3.16. Changes associated with improvement and worsening in anchors were not symmetrical, nor was there a consistent trend across anchors. For example, for the EQ-5D pain item, the magnitude of change in BPI-SF worst pain was greater for a one-category increase in the anchor than for a one-category decrease in the anchor. In contrast, for the EQ-5D Index score, the magnitude of change in BPI-SF worst pain was greater for a one-category decrease in the anchor than for a one-category increase in the anchor.
ANCHOR | ONE CATEGORY[a] INCREASE IN ANCHOR | ONE CATEGORY[a] DECREASE IN ANCHOR |
---|---|---|
BPI-SF Current Pain rating | 0.26–1.04 | −0.89 to −1.66 |
EQ-5D Index score | −2.42 to −1.40 | 0.56–1.63 |
EQ-5D Pain item | 1.71–1.98 | −3.16 to −2.56 |
FACT-B TOI | −2.22 to −0.51 | −0.56 to 0.77 |
FACT-G Physical Well-Being | −1.61 to −0.16 | −0.79 to 0.46 |
FACT-G total | −1.31 to −0.12 | −0.97 to 0.57 |
The range of mean changes in BPI-SF worst pain ratings (using the interval from baseline to week 25) for the six anchors that met the correlation criteria in Table 2 are provided. Mean changes are displayed for one-category increases and one-category decreases in anchor.
a One category (increase or decrease) represents 0.20 points for EQ-5D Index score, one point for BPI-SF current pain rating and EQ-5D pain item, three points for FACT-G Physical Well-Being, and six points for FACT-G total and FACT-B TOI.
The regression of changes in the BPI-SF worst pain item on changes in each anchor is shown in Table 4. Changes in each anchor are significantly (P < 0.05) associated with changes in BPI-SF worst pain rating. A one-point increase in BPI-SF current pain rating and EQ-5D Pain item is associated with a 0.817 and 1.805 increase in BPI-SF worst pain, respectively, while a one-point increase in EQ-5D Index score, FACT-B TOI, FACT-G Physical Well-Being, and FACT-G total is associated with a 3.548, 0.098, 0.163, and 0.048 decrease in BPI-SF worst pain rating, respectively. Likewise, a two-point increase in BPI-SF current pain rating and EQ-5D Pain item is associated with a 1.634 and 3.610 increase in BPI-SF worst pain, respectively, while a two-point increase in EQ-5D Index score, FACT-B TOI, FACT-G Physical Well-Being, and FACT-G total is associated with a 7.096, 0.196, 0.326, and 0.096 decrease in BPI-SF worst pain rating, respectively. The change in anchor-by-baseline anchor interaction was statistically significant only for BPI current pain and FACT-G Physical Well-Being. The interaction tests whether the anchor–BPI-SF slope differs as a function of baseline anchor score; therefore, a lack of significance suggests that the association between BPI-SF worst pain and the other anchors does not differ by baseline anchor rating.
VARIABLE | PREDICTOR | b | β | SIG. |
---|---|---|---|---|
Change in BPI current pain | Main effect | 0.817 | 0.724 | <0.001 |
Change in BPI current pain | Interaction with baseline anchor | −0.024 | −0.107 | 0.001 |
Change in EQ-5D Health State Index | Main effect | −3.548 | −0.349 | <0.001 |
Change in EQ-5D Health State Index | Interaction with baseline anchor | 0.220 | 0.021 | 0.465 |
Change in EQ-5D Pain item | Main effect | 1.805 | 0.352 | <0.001 |
Change in EQ-5D Pain item | Interaction with baseline anchor | 0.207 | 0.080 | 0.261 |
Change in FACT-B TOI | Main effect | −0.098 | −0.406 | <0.001 |
Change in FACT-B TOI | Interaction with baseline anchor | 0.000 | 0.028 | 0.756 |
Change in FACT-G Physical Well-Being | Main effect | −0.163 | −0.321 | <0.001 |
Change in FACT-G Physical Well-Being | Interaction with baseline anchor | −0.004 | −0.133 | 0.024 |
Change in FACT-G total score | Main effect | −0.048 | −0.231 | 0.025 |
Change in FACT-G total score | Interaction with baseline anchor | 0.000 | −0.130 | 0.209 |
b, regression coefficient; β, standardized regression coefficient; Sig., significance level.
Possible ranges: BPI Pain Right Now 0 (least) to 10 (most), EQ-5D Health State Index scores −0.594 (worst) to 1.00 (best), EQ-5D Pain item scores 1 (none) to 3 (severe), FACT-B TOI scores 4 (worst) to 92 (best), FACT-G Physical Well-Being scores 0 (worst) to 28 (best), FACT-G total score 8 (worst) to 108 (best), BPI Worst Pain item 0 (least) to 10 (most).
Changes in all anchors are significantly (P < 0.05) associated with changes in BPI-SF worst pain ratings. A one-point increase in BPI-SF current pain rating and EQ-5D pain item is associated with increases (positive b score) in the BPI-SF worst pain rating, and a one-point increase in EQ-5D Index, FACT-B TOI, FACT-G Physical Well-Being, and FACT-G total scores is associated with decreases (negative b score) in the BPI-SF worst pain ratings. The change in anchor-by-baseline anchor interaction was statistically significant only for the BPI current pain and FACT-G PWB items.
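As a worked illustration of how the main-effect coefficients in Table 4 scale, the short Python sketch below multiplies each coefficient by an anchor change. The coefficients and one-category sizes are taken from the text; the function name is hypothetical, and the calculation deliberately ignores the interaction term, so it is a simplification rather than the full fitted model.

```python
# Main-effect coefficients (b) reported in Table 4.
MAIN_EFFECT = {
    "BPI-SF current pain": 0.817,
    "EQ-5D Index": -3.548,
    "EQ-5D Pain item": 1.805,
    "FACT-B TOI": -0.098,
    "FACT-G Physical Well-Being": -0.163,
    "FACT-G total": -0.048,
}

# One-category change sizes as defined in the Methods.
ONE_CATEGORY = {
    "BPI-SF current pain": 1.0,
    "EQ-5D Index": 0.20,
    "EQ-5D Pain item": 1.0,
    "FACT-B TOI": 6.0,
    "FACT-G Physical Well-Being": 3.0,
    "FACT-G total": 6.0,
}


def predicted_worst_pain_change(anchor, anchor_change):
    """Main-effect-only prediction: b times the change in the anchor."""
    return MAIN_EFFECT[anchor] * anchor_change


# Predicted worst-pain change for a one-category increase in each anchor.
per_category = {name: round(predicted_worst_pain_change(name, size), 3)
                for name, size in ONE_CATEGORY.items()}

# The two-point example from the text for BPI-SF current pain.
two_point_current_pain = predicted_worst_pain_change("BPI-SF current pain", 2)
```

Under this simplification, a one-category (0.20-point) increase in the EQ-5D Index corresponds to a predicted decrease of about 0.71 points in worst pain, and a two-point increase in current pain to an increase of 1.634 points, matching the value quoted in the text.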
A post hoc confirmatory analysis was done replicating these analyses using data from the baseline to week 49 interval (n = 1,250). Results indicate a slightly stronger correlation between the anchors and the change scores. (Spearman's correlations range from 0.372 for FACT-B TOI to 0.644 for BPI-SF current pain rating.) Mean change scores of BPI-SF worst pain ratings by each of the six anchors and regression coefficients were similar to those for the baseline to week 25 interval. For instance, mean change scores for the EQ-5D Pain item for stable patients ranged from 0.25–0.56, 1.58–2.95 for an improvement of one category, and 1.75–2.80 for a worsening of one category compared with 0.50–0.51, 1.71–1.98, and 2.56–3.16, respectively, for the baseline to week 25 interval.
Distribution-Based Analysis
The distribution-based estimates for the BPI-SF worst pain rating are presented in Table 5. There appears to be consistency with the 1 SEM estimates, the 0.50 effect size, and the 0.50 Guyatt's statistic.
The results from the three distribution-based approaches presented in this table will be combined with those of the anchor-based results to estimate the MID.
a The standard error of measurement is a measure of the precision of a test instrument. It is calculated on the basis of sample data using the sample standard deviation and the sample reliability coefficient. Intraclass correlation coefficients (ICCs) for BPI-SF worst pain rating from day 1 to day 8 and week 105 to week 109 in patients whose FACT-B overall QOL ratings change by <10% are 0.685 (n = 926) and 0.800 (n = 109), respectively.
b Alternatively referred to as Cohen's d, the effect size is calculated by dividing the difference between the pretest and posttest scores by the standard deviation at pretest. The standard deviation of BPI-SF worst pain rating at baseline (n = 1,877) is 2.849.
c Alternatively referred to as the responsiveness statistic, Guyatt's statistic is calculated by dividing the difference between pretest and posttest changes by the standard deviation of change observed for a group of stable patients. The standard deviation of change in BPI-SF worst pain rating from baseline to week 25 in patients whose ECOG performance rating does not change (n = 1,120) is 2.833.
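The distribution-based thresholds can be recomputed from the standard deviations and ICCs given in the footnotes. The Python sketch below does so using the standard SEM formula; that the baseline standard deviation (2.849) is the one entering the SEM is an assumption, although it is consistent with the 1 SEM value of 1.6 reported in the abstract. Function and variable names are illustrative.

```python
import math

SD_BASELINE = 2.849        # SD of worst pain at baseline (n = 1,877)
SD_CHANGE_STABLE = 2.833   # SD of change, baseline to week 25, stable ECOG (n = 1,120)
ICC = {"day 1 to day 8": 0.685, "week 105 to week 109": 0.800}


def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def threshold(multiplier, sd):
    """Raw-score change corresponding to a standardized multiplier."""
    return multiplier * sd


sem_by_interval = {label: round(sem(SD_BASELINE, r), 2)
                   for label, r in ICC.items()}
effect_size_points = {k: round(threshold(k, SD_BASELINE), 2)
                      for k in (0.20, 0.50, 0.80)}
guyatt_points = {k: round(threshold(k, SD_CHANGE_STABLE), 2)
                 for k in (0.20, 0.50)}
```

With these inputs the two SEM estimates come out at about 1.6 and 1.3 points, and both the 0.50 effect size and the 0.50 Guyatt's statistic at about 1.4 points, in line with the distribution-based results quoted in the abstract.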
Integrating Anchor-Based and Distribution-Based MID Estimates
The distribution-based analyses suggest that the MDC for the worst pain rating, defined as the smallest change that can be reliably differentiated from random fluctuation, is between 1.3 and 1.6 points (see Table 5). This represents the lower bound for establishing the MID.
The results from regression analyses can be used to translate changes in anchors into corresponding changes in BPI-SF worst pain. This strategy can be particularly informative when the MID for an anchor is known. This is the case for the EQ-5D Health State Index, where the MID has been estimated at 0.06 for U.S. Index scores and 0.07 for U.K. Index scores.23 A one-point change in EQ-5D Index translates to a change of −3.548 in BPI-SF worst pain, so a 0.07-point change in EQ-5D Index (the MID for the measure) corresponds to a change of −0.248 in BPI-SF worst pain. In contrast, a one-point change in BPI-SF worst pain (which is smaller than the MDC suggested by the distribution-based analyses) translates to a change of 0.036 for the EQ-5D Index score (considerably smaller than the MID of 0.07). However, a two-point change in BPI-SF worst pain rating corresponds to a 0.072 change in EQ-5D Index score, which is almost identical to the MID for that measure. This suggests that a two-point change may be a reasonable estimate for the MID of the BPI-SF worst pain rating.
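The arithmetic behind this comparison can be laid out explicitly. Here $\Delta_{\text{worst}}$ and $\Delta_{\text{EQ}}$ denote changes in BPI-SF worst pain and EQ-5D Index, notation introduced only for this illustration. Treating the reported 0.036 as a per-point slope from a regression in the reverse direction is an inference from the quoted values, not something the text states.

```latex
\begin{aligned}
\Delta_{\text{worst}} &= -3.548 \times 0.07 \approx -0.248
  && \text{(EQ-5D MID expressed in worst-pain points)} \\
\Delta_{\text{EQ}} &= 0.036 \times 1 = 0.036 < 0.07
  && \text{(one-point worst-pain change)} \\
\Delta_{\text{EQ}} &= 0.036 \times 2 = 0.072 \approx 0.07
  && \text{(two-point worst-pain change)}
\end{aligned}
```

The two directions give different answers because a regression slope and its reverse are not reciprocals unless the correlation is perfect; the two-point estimate rests on the second and third lines together with the MDC lower bound.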
Discussion
Data from both distribution-based and anchor-based approaches were used to develop estimates of the MID for the BPI-SF worst pain rating. Results from these approaches are similar, providing reasonably strong support for establishing a two-point MID for the BPI-SF worst pain rating. Further, the results suggest that this estimate of MID is, for the most part, independent of baseline BPI-SF worst pain ratings. However, there is some evidence to suggest that the direction of change (improvement or worsening) may be important to consider. A number of reports have suggested that a smaller change may be required to be considered clinically important when a patient is improving compared with worsening.13 Also, when considered as a percentage, a one-point change in any scale has a different value for an increase versus a decrease; e.g., a change from 2 to 3 is an increase of 50%, while a change from 3 to 2 is a decrease of 33%. Nonetheless, these findings provide important information to researchers for interpreting changes in the BPI-SF worst pain ratings.
In addition, although not specific to the BPI worst pain rating, the findings of this study are consistent with other published MID analyses for a similar item. A recent review of three studies concluded that, for a numerical rating scale of pain intensity ranging 0–10 similar in content to the BPI-SF worst pain rating, changes of around two points represent “meaningful,” “much better,” or “much improved” reductions in chronic pain.24
Several factors contribute to the overall strength of the current results. First, as frequently recommended in the literature,11 both anchor-based and distribution-based methods were used to estimate the MID for the worst pain rating. Second, analyses were based on a large sample, totaling over 1,500 patients for the baseline to week 25 assessment interval. A larger sample size will generally provide a broader distribution of responses, which will likely increase the generalizability of the results. Third, multiple anchors were used to evaluate changes in BPI-SF worst pain ratings. Fourth, analyses were performed across several assessment intervals to determine the strongest relationship between BPI-SF ratings and other anchors. Finally, the regression analyses provide important information about whether baseline differences influence the relationship between BPI-SF and other PRO measures.
Nevertheless, these analyses are not without certain limitations. The sample for the current analyses consisted entirely of breast cancer patients. It is unclear to what extent these results will be relevant for other patient populations. Further research is needed to determine whether the MID for the BPI-SF worst pain rating established in this sample has broader applicability. Also, it must be noted that the recall period varied across assessments. The BPI-SF focuses on the past 24 hours, the FACT uses the past week, and the EQ-5D uses the present moment. It is unclear to what extent these differences in recall periods may have influenced the current results. Finally, the baseline to week 25 interval was used to determine the MID for the BPI-SF worst pain rating based on the higher correlations for this interval. Data from baseline to week 49 are consistent with these results, providing some confirmatory evidence to suggest that these MID estimates are stable.
In conclusion, the findings of the present analyses suggest that the MID estimate for the BPI-SF worst pain rating is two points. This value provides guidance to researchers using the BPI-SF worst pain rating on how to interpret baseline differences as well as change scores in the BPI-SF worst pain rating. Additional analyses could be done in other populations to confirm these findings.
References
1 K.W. Wyrwich, M. Bullinger and N. Aaronson et al., Estimating clinically significant differences in quality of life outcomes, Qual Life Res 14 (2005), pp. 285–295.
2 S.D. Mathias, S.K. Gao, M. Rutstein, C.F. Snyder, A.W. Wu and D. Cella, Evaluating clinically meaningful change on the ITP-PAQ: preliminary estimates of minimal important differences, Curr Med Res Opin 25 (2) (2009), pp. 375–383.
3 K.J. Yost, M.V. Sorensen, E.A. Hahn, G.A. Glendenning, A. Gnanasakthy and D. Cella, Using multiple anchor- and distribution-based estimates to evaluate clinically meaningful change on the Functional Assessment of Cancer Therapy-Biologic Response Modifiers (FACT-BRM) instrument, Value Health 8 (2) (2005), pp. 117–127.
4 R.D. Hays and J.M. Woolley, The concept of clinically meaningful difference in health-related quality-of-life research: How meaningful is it?, Pharmacoeconomics 18 (5) (2000), pp. 419–423.
5 R.L. Daut, C.S. Cleeland and R.C. Flanery, Development of the Wisconsin Brief Pain Questionnaire to assess pain in cancer and other diseases, Pain 17 (2) (1983), pp. 197–210.
6 C. Cleeland, Brief Pain Inventory User Guide, University of Texas M. D. Anderson Cancer Center, Houston (2009).
7 R.H. Dworkin, D.C. Turk and J.T. Farrar et al., Core outcome measures for chronic pain clinical trials: IMMPACT recommendations, Pain 113 (1–2) (2005), pp. 9–19.
8 U.S. Department of Health and Human Services Food and Drug Administration (FDA), Guidance for Industry: Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims, FDA, Silver Spring, MD (2009).
9 M.M. Oken, R.H. Creech and D.C. Tormey et al., Toxicity and response criteria of the Eastern Cooperative Oncology Group, Am J Clin Oncol 5 (6) (1982), pp. 649–655.
10 M.J. Brady, D.F. Cella, F. Mo and A.E. Bonomi et al., Reliability and validity of the Functional Assessment of Cancer Therapy–Breast Cancer Quality of Life instrument, J Clin Oncol 15 (1997), pp. 974–986.
11 R.D. Crosby, R.L. Kolotkin and G.R. Williams, Defining clinically meaningful change in health-related quality of life, J Clin Epidemiol 56 (5) (2003), pp. 395–407.
12 D. Revicki, R.D. Hays, D. Cella and J. Sloan, Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes, J Clin Epidemiol 61 (2) (2008), pp. 102–109.
13 D. Cella, E.A. Hahn and K. Dineen, Meaningful change in cancer-specific quality of life scores: differences between improvement and worsening, Qual Life Res 11 (3) (2002), pp. 207–221.
14 D.T. Eton, D. Cella and K.J. Yost et al., A combination of distribution- and anchor-based approaches determined the minimally important differences (MIDs) for four endpoints in a breast cancer scale, J Clin Epidemiol 57 (2004), pp. 898–910.
15 S. Wiebe, S. Matijevic, M. Eliasziw and P.A. Derry, Clinically important change in quality of life in epilepsy, J Neurol Neurosurg Psychiatry 73 (2002), pp. 116–120.
16 K.L. Miller, J.G. Walt and D.R. Mink et al., Minimal clinically important difference for the ocular surface disease index, Arch Ophthalmol 128 (1) (2010), pp. 94–101.
17 K.W. Wyrwich, W.M. Tierney and F.D. Wolinsky, Further evidence supporting an SEM-based criterion for identifying meaningful intra-individual changes in health-related quality of life, J Clin Epidemiol 52 (9) (1999), pp. 861–873.
18 F.D. Wolinsky, G.J. Wan and W.M. Tierney, Changes in the SF-36 in 12 months in a clinical sample of disadvantaged older adults, Med Care 36 (11) (1998), pp. 1589–1598.
19 J. Cohen, Statistical Power Analysis for the Behavioral Sciences (2nd ed.), Lawrence Erlbaum, Hillsdale, NJ (1988).
20 G.H. Guyatt, C. Bombardier and P.X. Tugwell, Measuring disease-specific quality of life in clinical trials, CMAJ 134 (8) (1986), pp. 889–895.
21 G.R. Norman, P. Stratford and G. Regehr, Methodological problems in the retrospective computation of responsiveness to change: the lesson of Cronbach, J Clin Epidemiol 50 (8) (1997), pp. 869–879.
22 A. Stopeck, J. Body and Y. Fujiwara et al., Denosumab versus zoledronic acid for the treatment of breast cancer patients with bone metastases: results of a randomized phase 3 study, Eur J Cancer Suppl 7 (2009), p. 2.
23 A.S. Pickard, M.P. Neary and D. Cella, Estimation of minimally important differences in EQ-5D utility and VAS scores in cancer, Health Qual Life Outcomes 5 (2007), p. 70.
24 R.H. Dworkin, D.C. Turk and K.W. Wyrwich et al., Interpreting the clinical importance of treatment outcomes in chronic pain clinical trials: IMMPACT recommendations, J Pain 9 (2) (2008), pp. 105–121.
Correspondence to: Susan D. Mathias, Health Outcomes Solutions, PO Box 2343, Winter Park, FL 32790; telephone: (407) 643-9016; fax: (866) 384-0194
The MID may be estimated through distribution-based methods and/or anchor-based methods. Distribution-based methods are based on the distribution of the data. Examples of distribution-based methods include effect size measures, the standard error of measurement (SEM), one-half times the standard deviation, and the responsiveness index.[2] and [3] Anchor-based methods are based on the association between the PRO measure and an interpretable external measure, such as a global rating of change or a response to treatment. These methods may result in somewhat different estimates, and no particular estimate is considered the most valid.[2], [3] and [4] Therefore, researchers are encouraged to use more than one method and to present a range of MID estimates.
A frequently used PRO measure for the assessment of pain is the Brief Pain Inventory–Short Form (BPI-SF). The foundation of the BPI-SF is the Wisconsin Brief Pain Questionnaire, which was developed over 25 years ago based on interviews with cancer patients, expert opinion, and then-current psychometric standards.5 Over time, the Wisconsin Brief Pain Questionnaire evolved into the Brief Pain Inventory, which was later reduced to a shorter version, the BPI-SF. Today, the BPI-SF is the standard for clinical and research use. It has been used in over 400 studies, including psychometric evaluations and clinical applications with a wide range of conditions (e.g., cancer pain, fibromyalgia, neuropathic pain, and joint diseases).6
The BPI-SF includes two domains: pain severity and pain interference. The pain severity domain, the focus of this report, includes items specific to pain at “worst,” “least,” “average,” and “now” (current pain), with a numerical response scale ranging from 0 (no pain) to 10 (pain as bad as you can imagine). In clinical trials, the worst pain item has been used alone as a measure of pain severity.6 Its use as a single item is supported by a consensus panel on outcome measures for chronic pain clinical trials.7 In addition, the Food and Drug Administration's (FDA) guidance on PROs states that a single-item PRO measure of pain severity is appropriate for assessing the effect of a treatment on pain.8 Although extensive psychometric evaluation of the BPI-SF has been conducted, no estimates of the MID are available for the BPI-SF worst pain item. Establishing the MID for the BPI-SF worst pain item is important because it will provide a clinically relevant reference to interpret changes in pain scores. Therefore, the objective of this current report was to estimate the MID of the worst pain item of the BPI-SF.
Methods
Study Design
Patients with advanced breast cancer and bone metastases were enrolled in an international, randomized, double-blind, double-dummy, active-controlled phase III study comparing denosumab with zoledronic acid for delaying or preventing skeletal related events. Patients were eligible to participate if they had histologically or cytologically confirmed breast adenocarcinoma; current or prior radiologic, computed tomography, or magnetic resonance imaging evidence of at least one bone metastasis; and an Eastern Cooperative Oncology Group (ECOG) performance status of 0, 1, or 2. Patients with current or prior intravenous bisphosphonate administration were excluded. Patients completed PRO assessments, including the BPI-SF, at baseline, week 5, and every 4 weeks thereafter until the end of the study. Assessments were scheduled to take place prior to any study procedures and prior to study drug administration. Although data collection continued, PRO analyses for efficacy were truncated when approximately 30% of patients dropped out of the study due to death, disease progression, or withdrawn consent.
Outcome Measures and Assessment Intervals
A number of outcome measures were assessed in the study and considered for use as anchors for evaluating the MID of the BPI-SF worst pain item, including one clinician-reported measure (ECOG Performance Status) and several PRO measures: the EuroQoL 5 Dimensions (EQ-5D) Index score, the Functional Assessment of Cancer Therapy-Breast Cancer (FACT-B), and the BPI-SF current pain rating.
The ECOG Performance Status, which assesses how a patient's disease or its treatment is progressing and how the disease affects the daily living abilities of the patient, is a single-item, six-point, clinician-rated assessment of performance ranging from 0 (fully active, no restrictions) to 5 (dead).9 The EQ-5D Index score is a measure of health status, which assesses five dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each dimension has three response options: no problems, some/moderate problems, and extreme problems. Responses are converted to a weighted health state index, with scores ranging from −0.594 (worst health) to 1.0 (full health). The single item on pain from the EQ-5D was also evaluated separately as an anchor. The FACT-B includes the four core FACT-General (FACT-G) dimensions of physical well-being, social/family well-being, emotional well-being, and functional well-being, for which scale scores and a total score can be computed. In addition, the FACT-B includes a breast cancer–specific subscale.10 The FACT-B Trial Outcome Index (TOI) is the sum of the physical well-being score, the functional well-being score, and the breast cancer subscale. The four FACT-G scale scores, the FACT-G total score, the FACT-B TOI, and a single-item overall quality-of-life (QOL) rating from the functional well-being section were all evaluated as potential anchors. The overall QOL item from the functional well-being scale was selected to balance the single item on pain selected from the EQ-5D by serving as a potential anchor that is more general in breadth and scope. For all of these FACT outcome measures, a higher score indicates better health-related QOL.
Finally, the current pain rating from the BPI-SF, ranging from 0 (no pain) to 10 (pain as bad as you can imagine), was also considered as an anchor because it was hypothesized to be highly correlated with the worst pain rating and because it would assist in understanding the behavior of other potential anchors.
Several assessment intervals were considered for evaluation of the MID for the BPI-SF worst pain item: baseline to week 5, baseline to week 13, and baseline to week 25. The analysis for each time interval included only those patients with complete baseline and end-of-interval (i.e., week 5, week 13, or week 25) assessments on the BPI-SF worst pain item and the relevant anchor of interest. In addition, a post hoc confirmatory analysis was conducted using a longer interval of time, from baseline to week 49. No imputation of missing data was performed. Analysis was performed on pooled data, regardless of treatment assignment.
Anchor-Based Analysis
The usefulness of an anchor depends on the correlation between the PRO change score and the anchor.11 Therefore, to select the most appropriate anchors and time interval for estimating the MID for the BPI-SF worst pain item, Spearman correlation coefficients were calculated between changes in the BPI-SF worst pain rating and changes in potential anchors across each of the potential time intervals. The time interval with the highest correlations was selected for the MID analysis, along with those anchors whose correlations were statistically significant (P < 0.05) and exceeded the a priori specified threshold of 0.30.12
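The anchor-screening step can be illustrated with a minimal sketch. It assumes the 0.30 cut point is applied to the absolute value of the correlation and it omits the significance test; the function names and the toy change scores are hypothetical and are not study data.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def select_anchors(worst_pain_change, anchor_changes, threshold=0.30):
    """Keep anchors whose |rho| with the worst pain change meets the threshold.

    Assumption: the cut point applies to the absolute correlation.
    """
    selected = {}
    for name, change in anchor_changes.items():
        rho = spearman_rho(worst_pain_change, change)
        if abs(rho) >= threshold:
            selected[name] = rho
    return selected


# Hypothetical change scores for five patients (illustration only).
worst_pain_change = [-2, 0, 1, 3, -1]
anchor_changes = {
    "anchor_a": [-1, 0, 1, 2, -2],
    "anchor_b": [3, 1, 5, 2, 4],
}
selected = select_anchors(worst_pain_change, anchor_changes)
```

In practice the same screen would be repeated for each assessment interval, and a P value would be computed alongside each coefficient.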
A one-category change was defined as a one-point change for the BPI-SF current pain item, a one-point change for the EQ-5D pain item, a three-point change for the FACT-G Physical Well-Being scale,13 a six-point change for the FACT-G total and FACT-B TOI scores,14 and a 0.20 change for the EQ-5D Index score. For the selected interval and anchors, the mean change in BPI-SF worst pain item that corresponds to a one-category increase and decrease in each anchor was calculated. In addition, ordinary least squares regression models were used to regress changes in BPI-SF worst pain ratings on changes of each of the anchors.[15] and [16] The regression models included main effects for change in each anchor and an interaction term expressing the change in anchor-by-baseline anchor.
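A minimal sketch of the one-category calculation follows. The category widths come from the text; the grouping rule (counting whole categories moved, truncating toward zero), the function names, and the toy data are assumptions, and the study's exact stratification is not specified here.

```python
import math
from collections import defaultdict
from statistics import mean

# One-category widths as stated in the text.
CATEGORY_WIDTH = {
    "BPI-SF current pain": 1.0,
    "EQ-5D pain item": 1.0,
    "FACT-G Physical Well-Being": 3.0,
    "FACT-G total": 6.0,
    "FACT-B TOI": 6.0,
    "EQ-5D Index": 0.20,
}


def category_shift(anchor_change, width):
    """Whole categories moved, signed by direction (assumed rule).

    The small epsilon guards against floating-point results such as
    0.6 / 0.2 evaluating to 2.9999999999999996.
    """
    steps = math.floor(abs(anchor_change) / width + 1e-9)
    return steps if anchor_change >= 0 else -steps


def mean_change_by_shift(anchor_changes, worst_pain_changes, width):
    """Mean worst pain change for each category shift in the anchor."""
    groups = defaultdict(list)
    for a, w in zip(anchor_changes, worst_pain_changes):
        groups[category_shift(a, width)].append(w)
    return {shift: mean(vals) for shift, vals in sorted(groups.items())}


# Hypothetical data for five patients (illustration only).
example = mean_change_by_shift(
    anchor_changes=[1, 1, -1, 0, 2],
    worst_pain_changes=[1.0, 2.0, -1.0, 0.0, 3.0],
    width=CATEGORY_WIDTH["BPI-SF current pain"],
)
```

The entries for shifts of +1 and −1 correspond to the one-category increase and decrease summarized for each anchor.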
Distribution-Based Analysis
The following distribution-based measures were calculated for the BPI-SF worst pain item: (1) the SEM, (2) effect size (Cohen's d), and (3) Guyatt's statistic. The SEM is a measure of the precision of a test instrument. It is calculated on the basis of sample data using the sample standard deviation and the sample reliability coefficient. While the standard deviation and the reliability of a measure are sample-dependent, their relationship (and hence the SEM) remains relatively constant across samples. Therefore, the SEM is considered to be an attribute of the measure and not a characteristic of the sample per se.17 Threshold values of 1 SEM have been suggested for defining clinically meaningful differences.18 The reliability coefficient was estimated for the BPI-SF worst pain item by calculating the intraclass correlation coefficients (ICCs) using two intervals of time. One used 7 days (days 1–8), a more typical interval for assessing reproducibility, while the other approach used a later interval, from week 105 to week 109. (Note: The 1-month interval was dictated by the schedule of assessments.) For both ICC values, only those patients whose FACT-B overall QOL ratings changed by 10% or less during the respective intervals were included. The 10% criterion was selected after reviewing the full distribution of change scores and their associated sample sizes, to arrive at a reasonable sample size of approximately 100 subjects.
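The SEM calculation can be sketched with the conventional formula, SEM = SD × √(1 − reliability). The sketch assumes that the baseline standard deviation reported in the Table 5 footnotes (2.849) is the one paired with each ICC (0.685 and 0.800); the function and variable names are hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), with reliability in [0, 1]."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


BASELINE_SD = 2.849          # SD of BPI-SF worst pain at baseline (Table 5 footnote)
ICC_DAY_1_TO_8 = 0.685       # ICC, day 1 to day 8 (Table 5 footnote)
ICC_WEEK_105_TO_109 = 0.800  # ICC, week 105 to week 109 (Table 5 footnote)

# Assumption: the baseline SD is used with both reliability estimates.
sem_short_interval = standard_error_of_measurement(BASELINE_SD, ICC_DAY_1_TO_8)
sem_long_interval = standard_error_of_measurement(BASELINE_SD, ICC_WEEK_105_TO_109)
```

Under these assumptions the two values round to 1.6 and 1.3 points, which matches the 1 SEM range reported for the worst pain rating.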
Cohen's d, alternatively referred to as the “standardized effect size,” is calculated by dividing the difference between the baseline and week-25 scores by the standard deviation at baseline.19 The effect size represents individual change in terms of the number of baseline standard deviations. A value of 0.20 is a small effect, 0.50 is a medium effect, and 0.80 is a large effect. Effect sizes of 0.20, 0.50, and 0.80 were calculated in this study.
Guyatt's statistic, also referred to as the “responsiveness statistic,” is calculated by dividing the change from baseline to week 25 by the standard deviation of change observed for a group of stable patients.20 The denominator of the responsiveness statistic adjusts for spurious change due to measurement error. Values of 0.20 and 0.50 have been used to represent “small” and “medium” changes, respectively.21 Values representing 0.20 and 0.50 were calculated in this study. Stable patients were defined as those whose ECOG Performance rating did not change during the assessment interval. Different variables were used to define the stable population for the SEM (FACT-B overall QOL rating) and for Guyatt's statistic (ECOG Performance Status) because the two variables were not collected on the same schedule of assessments.
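Both the effect size and Guyatt's statistic express change in standard deviation units, so a benchmark such as 0.50 can be converted to BPI-SF points by multiplying by the relevant standard deviation. The sketch below uses the standard deviations reported in the Table 5 footnotes (2.849 at baseline; 2.833 for change among stable patients); the function and variable names are hypothetical.

```python
def points_for_benchmark(benchmark, sd):
    """Raw-score change corresponding to a benchmark expressed in SD units."""
    return benchmark * sd


def standardized_change(change, sd):
    """Observed change expressed in SD units (effect size or Guyatt's statistic)."""
    return change / sd


BASELINE_SD = 2.849       # denominator for the effect size (Cohen's d)
STABLE_CHANGE_SD = 2.833  # denominator for Guyatt's statistic

effect_size_points = {
    b: points_for_benchmark(b, BASELINE_SD) for b in (0.20, 0.50, 0.80)
}
guyatt_points = {
    b: points_for_benchmark(b, STABLE_CHANGE_SD) for b in (0.20, 0.50)
}

# A two-point change expressed as an effect size.
two_point_effect_size = standardized_change(2.0, BASELINE_SD)
```

The 0.50 benchmark corresponds to roughly 1.4 points for both statistics, and a two-point change corresponds to an effect size of about 0.70.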
Integrating Anchor-Based and Distribution-Based MID Estimates
The minimal detectable change (MDC) for the worst pain item was established by comparing distribution-based estimates. The MDC represents the smallest change that can be reliably distinguished from random fluctuation and, thus, the lower bound for establishing the MID.11 If the MID were lower than the MDC, then the instrument would not be capable of distinguishing the MID. The SEM was considered the primary distribution-based estimate because it takes into account the reliability of the measure and, thus, estimates the precision of the instrument.11 Other distribution-based measures were also considered in establishing the MDC. Standardized effect size was considered a secondary distribution-based estimate because of its reliance on interperson variability, which is generally higher and less consistent than intraperson variability. Anchor-based estimates of the MID range were then compared. A final MID range was established that is greater than the MDC and integrates estimates from the various anchors.
Results
Patient Population
Demographic and clinical characteristics for patients included in the baseline to week 25 interval are presented in Table 1. Data from 1,564 of 2,049 patients who participated in the study and had valid (i.e., nonmissing) baseline and end-of-interval scores for the BPI-SF and anchors were used in these analyses. Patients were predominantly female with an average age of 57.2 ± 11.2 years. The majority of patients were white (80.9%). Average pain scores at baseline were 2.45 ± 2.51, with a full range of scores (0–10) being used. Clinical results from the study have been presented previously.22
CHARACTERISTIC, n (%) | STUDY SAMPLE (n = 1,564) |
---|---|
Gender | |
Female | 1,550 (99.1) |
Male | 14 (0.9) |
Age, mean years ± SD (range) | 57.2 ± 11.2 (27.1–91.2) |
Race | |
White | 1,265 (80.9) |
Black | 38 (2.4) |
Hispanic | 92 (5.9) |
Japanese | 119 (7.6) |
Asian | 28 (1.8) |
Other | 22 (1.4) |
Demographic characteristics including the breakdown by gender, age, and race for the study sample are shown.
Anchor-Based Analysis
Spearman correlations between changes in the BPI-SF worst pain item and changes in potential anchors are presented in Table 2. For all potential anchors, the highest correlations with the BPI-SF worst pain rating were obtained at the baseline to week 25 interval. All potential anchors correlated significantly (P < 0.001) with the BPI-SF worst pain rating with the exception of the FACT-G Social/Family Well-Being scale. However, correlations were low (<0.30) for several potential anchors: ECOG Performance Status, FACT-B Overall QOL item, FACT-G Emotional Well-Being, and FACT-G Functional Well-Being. Therefore, the week 25 interval and the following anchors were selected for the MID analysis: BPI-SF current pain rating, EQ-5D Index score, EQ-5D Pain item, FACT-B TOI, FACT-G Physical Well-Being, and FACT-G total score. Correlation coefficients between the changes in the selected anchors and changes in the BPI-SF worst pain ratings range from 0.329–0.647.
Bolded correlations represent the highest correlations with anchors where correlation r ≥ 0.300.
Spearman correlation coefficients between changes in BPI-SF worst pain rating and changes in each of the 11 potential anchors that were considered are provided. The data are displayed for three intervals of time including baseline to week 5, baseline to week 13, and baseline to week 25. Using a cut point of r ≥ 0.300, only those correlations that are bolded meet the criteria of acceptability.
Mean changes in the BPI-SF worst pain rating that correspond to a one-category change in anchors from baseline to week 25 are presented in Table 3. BPI-SF current pain ratings >5 and EQ-5D Index scores <0.40 were excluded from their respective analysis due to small sample sizes. A one-category increase in the anchor scores was associated with an absolute value of change in the BPI-SF worst pain item ranging from 0.26–2.42. A one-category decrease in the anchor score was associated with an absolute value of change in the BPI-SF worst pain item ranging from 0.56–3.16. Changes associated with improvement and worsening in anchors were not symmetrical, nor was there a consistent trend across anchors. For example, for the EQ-5D pain item, the magnitude of change in BPI-SF worst pain was greater for a one-category increase in the anchor than for a one-category decrease in the anchor. In contrast, for the EQ-5D Index score, the magnitude of change in BPI-SF worst pain was greater for a one-category decrease in the anchor than for a one-category increase in the anchor.
ANCHOR | ONE CATEGORY (a) INCREASE IN ANCHOR | ONE CATEGORY (a) DECREASE IN ANCHOR |
---|---|---|
BPI-SF Current Pain rating | 0.26–1.04 | −0.89 to −1.66 |
EQ-5D Index score | −2.42 to −1.40 | 0.56–1.63 |
EQ-5D Pain item | 1.71–1.98 | −3.16 to −2.56 |
FACT-B TOI | −2.22 to −0.51 | −0.56 to 0.77 |
FACT-G Physical Well-Being | −1.61 to −0.16 | −0.79 to 0.46 |
FACT-G total | −1.31 to −0.12 | −0.97 to 0.57 |
The range of mean changes in BPI-SF worst pain ratings (using the interval from baseline to week 25) for the six anchors that met the correlation criteria in Table 2 are provided. Mean changes are displayed for one-category increases and one-category decreases in anchor.
a One category (increase or decrease) represents 0.20 points for EQ-5D Index score, one point for BPI-SF current pain rating and EQ-5D pain item, three points for FACT-G Physical Well-Being, and six points for FACT-G total and FACT-B TOI.
The regression of changes in the BPI-SF worst pain item on changes in each anchor is shown in Table 4. Changes in each anchor are significantly (P < 0.05) associated with changes in BPI-SF worst pain rating. A one-point increase in BPI-SF current pain rating and EQ-5D Pain item is associated with a 0.817 and 1.805 increase in BPI-SF worst pain, respectively, while a one-point increase in EQ-5D Index score, FACT-B TOI, FACT-G Physical Well-Being, and FACT-G total is associated with a 3.548, 0.098, 0.163, and 0.048 decrease in BPI-SF worst pain rating, respectively. Likewise, a two-point increase in BPI-SF current pain rating and EQ-5D Pain item is associated with a 1.634 and 3.610 increase in BPI-SF worst pain, respectively, while a two-point increase in EQ-5D Index score, FACT-B TOI, FACT-G Physical Well-Being, and FACT-G total is associated with a 7.096, 0.196, 0.326, and 0.096 decrease in BPI-SF worst pain rating, respectively. The change in anchor-by-baseline anchor interaction was statistically significant only for BPI current pain and FACT-G Physical Well-Being. The interaction tests whether the anchor–BPI-SF slope differs as a function of baseline anchor score; therefore, the lack of significance for the remaining four anchors suggests that their association with BPI-SF worst pain does not differ by baseline anchor rating.
VARIABLE | PREDICTOR | b | β | SIG. |
---|---|---|---|---|
Change in BPI current pain | Main effect | 0.817 | 0.724 | <0.001 |
 | Interaction with baseline anchor | −0.024 | −0.107 | 0.001 |
Change in EQ-5D Health State Index | Main effect | −3.548 | −0.349 | <0.001 |
 | Interaction with baseline anchor | 0.220 | 0.021 | 0.465 |
Change in EQ-5D Pain item | Main effect | 1.805 | 0.352 | <0.001 |
 | Interaction with baseline anchor | 0.207 | 0.080 | 0.261 |
Change in FACT-B TOI | Main effect | −0.098 | −0.406 | <0.001 |
 | Interaction with baseline anchor | 0.000 | 0.028 | 0.756 |
Change in FACT-G Physical Well-Being | Main effect | −0.163 | −0.321 | <0.001 |
 | Interaction with baseline anchor | −0.004 | −0.133 | 0.024 |
Change in FACT-G total score | Main effect | −0.048 | −0.231 | 0.025 |
 | Interaction with baseline anchor | 0.000 | −0.130 | 0.209 |
b, regression coefficient; β, standardized regression coefficient; Sig., significance level.
Possible ranges: BPI Pain Right Now 0 (least) to 10 (most), EQ-5D Health State Index scores −0.594 (worst) to 1.00 (best), EQ-5D Pain item scores 1 (none) to 3 (severe), FACT-B TOI scores 4 (worst) to 92 (best), FACT-G Physical Well-Being scores 0 (worst) to 28 (best), FACT-G total score 8 (worst) to 108 (best), BPI Worst Pain item 0 (least) to 10 (most).
Changes in all anchors are significantly (P < 0.05) associated with changes in BPI-SF worst pain ratings. A one-point increase in BPI-SF current pain rating and EQ-5D pain item is associated with increases (positive b score) in the BPI-SF worst pain rating, and a one-point increase in EQ-5D Index, FACT-B TOI, FACT-G Physical Well-Being, and FACT-G total scores is associated with decreases (negative b score) in the BPI-SF worst pain ratings. The change in anchor-by-baseline anchor interaction was statistically significant only for the BPI current pain and FACT-G PWB items.
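To illustrate how the Table 4 coefficients can be read, the sketch below assumes a model of the form Δworst pain = b₀ + b·Δanchor + b_int·(Δanchor × baseline anchor). The intercept is not reported and is omitted here, and whether the baseline anchor was centered is unknown, so the output is an illustration of the slope arithmetic rather than a reproduction of the fitted models. Function names are hypothetical.

```python
# (main effect b, interaction b) as reported in Table 4.
COEFFICIENTS = {
    "BPI current pain": (0.817, -0.024),
    "EQ-5D Health State Index": (-3.548, 0.220),
    "EQ-5D Pain item": (1.805, 0.207),
    "FACT-B TOI": (-0.098, 0.000),
    "FACT-G Physical Well-Being": (-0.163, -0.004),
    "FACT-G total score": (-0.048, 0.000),
}


def conditional_slope(anchor, baseline_anchor=0.0):
    """Slope of worst pain change per unit anchor change at a given baseline.

    Assumption: the interaction enters as b_int * (anchor change * baseline).
    """
    b_main, b_interaction = COEFFICIENTS[anchor]
    return b_main + b_interaction * baseline_anchor


def predicted_worst_pain_change(anchor, anchor_change, baseline_anchor=0.0):
    """Predicted change in worst pain, ignoring the unreported intercept."""
    return conditional_slope(anchor, baseline_anchor) * anchor_change


# With the baseline term at zero, this is the main-effect-only reading.
two_point_bpi = predicted_worst_pain_change("BPI current pain", 2.0)
two_point_eq5d = predicted_worst_pain_change("EQ-5D Health State Index", 2.0)

# With a baseline current pain rating of 5, the slope is attenuated.
slope_at_baseline_5 = conditional_slope("BPI current pain", 5.0)
```

With the baseline term set to zero, the two-point predictions (1.634 and −7.096) match the main-effect figures quoted for Table 4. For anchors with a significant interaction, the slope shifts with the baseline rating.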
A post hoc confirmatory analysis was done replicating these analyses using data from the baseline to week 49 interval (n = 1,250). Results indicate a slightly stronger correlation between the anchors and the change scores. (Spearman's correlations range from 0.372 for FACT-B TOI to 0.644 for BPI-SF current pain rating.) Mean change scores of BPI-SF worst pain ratings by each of the six anchors and regression coefficients were similar to those for the baseline to week 25 interval. For instance, mean change scores for the EQ-5D Pain item ranged from 0.25–0.56 for stable patients, 1.58–2.95 for an improvement of one category, and 1.75–2.80 for a worsening of one category compared with 0.50–0.51, 1.71–1.98, and 2.56–3.16, respectively, for the baseline to week 25 interval.
Distribution-Based Analysis
The distribution-based estimates for the BPI-SF worst pain rating are presented in Table 5. There appears to be consistency with the 1 SEM estimates, the 0.50 effect size, and the 0.50 Guyatt's statistic.
The results from the three distribution-based approaches presented in this table will be combined with those of the anchor-based results to estimate the MID.
a The standard error of measurement is a measure of the precision of a test instrument. It is calculated on the basis of sample data using the sample standard deviation and the sample reliability coefficient. Intraclass correlation coefficients (ICCs) for BPI-SF worst pain rating from day 1 to day 8 and week 105 to week 109 in patients whose FACT-B overall QOL ratings change by <10% are 0.685 (n = 926) and 0.800 (n = 109), respectively.
b Alternatively referred to as Cohen's d, the effect size is calculated by dividing the difference between the pretest and posttest scores by the standard deviation at pretest. The standard deviation of BPI-SF worst pain rating at baseline (n = 1,877) is 2.849.
c Alternatively referred to as the responsiveness statistic, Guyatt's statistic is calculated by dividing the difference between pretest and posttest changes by the standard deviation of change observed for a group of stable patients. The standard deviation of change in BPI-SF worst pain rating from baseline to week 25 in patients whose ECOG performance rating does not change (n = 1,120) is 2.833.
Integrating Anchor-Based and Distribution-Based MID Estimates
The distribution-based analyses suggest that the MDC for the worst pain rating, defined as the smallest change that can be reliably differentiated from random fluctuation, is between 1.3 and 1.6 points (see Table 5). This represents the lower bound for establishing the MID.
The results from regression analyses can be used to translate changes between anchors and corresponding changes in BPI-SF worst pain. This strategy can be particularly informative when the MID for an anchor is known. This is the case for the EQ-5D Health State Index, where the MID has been estimated at 0.06 for U.S. Index scores and 0.07 for U.K. Index scores.23 A one-point change in EQ-5D Index translates to a change of −3.548 in BPI-SF worst pain, so a 0.07-point change in EQ-5D Index (the MID for the measure) corresponds to a change of −0.248 in BPI-SF worst pain. In contrast, a one-point change in BPI-SF worst pain (which is smaller than the MID based upon the distribution-based analyses) translates to a change of 0.036 for the EQ-5D Index score (considerably smaller than the MID of 0.07). However, a two-point change in BPI-SF worst pain rating corresponds to a 0.072 change in EQ-5D Index score, which is almost identical to the MID for that measure. This suggests that a two-point change may be a reasonable estimate for the MID of the BPI-SF worst pain rating.
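The arithmetic in this translation can be checked with a short sketch. The slope of −3.548 comes from Table 4, and the EQ-5D MIDs of 0.06 and 0.07 come from the cited estimates. The 0.036 EQ-5D change per BPI-SF point is taken as given from the text; it is not the reciprocal of the Table 4 slope, so it presumably comes from a separate model, which is an assumption here. Function and variable names are hypothetical.

```python
B_EQ5D_INDEX = -3.548  # change in worst pain per one-point EQ-5D Index change (Table 4)
EQ5D_MID_US = 0.06     # published MID, U.S. Index scores
EQ5D_MID_UK = 0.07     # published MID, U.K. Index scores

# Magnitude of EQ-5D Index change per one-point worst pain change, as stated in the text.
EQ5D_PER_WORST_PAIN_POINT = 0.036

# Worst pain change that corresponds to one EQ-5D MID (U.K. value).
worst_pain_at_eq5d_mid = B_EQ5D_INDEX * EQ5D_MID_UK


def eq5d_change_for(worst_pain_points):
    """Magnitude of EQ-5D Index change for a given worst pain change."""
    return worst_pain_points * EQ5D_PER_WORST_PAIN_POINT


def smallest_whole_point_change(eq5d_mid, max_points=10):
    """Smallest whole-point worst pain change whose EQ-5D equivalent reaches the MID."""
    for points in range(1, max_points + 1):
        if eq5d_change_for(points) >= eq5d_mid:
            return points
    return None


points_needed_uk = smallest_whole_point_change(EQ5D_MID_UK)
points_needed_us = smallest_whole_point_change(EQ5D_MID_US)
```

Under these figures, one point falls short of either EQ-5D MID, whereas two points reach both, which is consistent with the two-point estimate.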
Discussion
Data from both distribution-based and anchor-based approaches were used to develop estimates of the MID for the BPI-SF worst pain rating. Results from these approaches are similar, providing reasonably strong support for establishing a two-point MID for the BPI-SF worst pain rating. Further, the results suggest that this estimate of MID is, for the most part, independent of baseline BPI-SF worst pain ratings. However, there is some evidence to suggest that the direction of change (improvement or worsening) may be important to consider. A number of reports have suggested that a smaller change may be required to be considered clinically important when a patient is improving compared with worsening.13 Also, when considered as a percentage, a one-point change in any scale has a different value for an increase versus a decrease; e.g., a change from 2 to 3 is an increase of 50%, while a change from 3 to 2 is a decrease of 33%. Nonetheless, these findings provide important information to researchers for interpreting changes in the BPI-SF worst pain ratings.
In addition, although not specific to the BPI worst pain rating, the findings of this study are consistent with other published MID analyses for a similar item. A recent review of three studies concluded that, for a numerical rating scale of pain intensity ranging 0–10 similar in content to the BPI-SF worst pain rating, changes of around two points represent “meaningful,” “much better,” or “much improved” reductions in chronic pain.24
Several factors contribute to the overall strength of the current results. First, as frequently recommended in the literature,11 both anchor-based and distribution-based methods were used to estimate the MID for the worst pain rating. Second, analyses were based on a large sample, totaling over 1,500 patients for the baseline to week 25 assessment interval. A larger sample size will generally provide a broader distribution of responses, which will likely increase the generalizability of the results. Third, multiple anchors were used to evaluate changes in BPI-SF worst pain ratings. Fourth, analyses were performed across several assessment intervals to determine the strongest relationship between BPI-SF ratings and other anchors. Finally, the regression analyses provide important information about whether baseline differences influence the relationship between BPI-SF and other PRO measures.
Nevertheless, these analyses are not without certain limitations. The sample for the current analyses consisted entirely of breast cancer patients. It is unclear to what extent these results will be relevant for other patient populations. Further research is needed to determine whether the MID for the BPI-SF worst pain rating established in this sample has broader applicability. Also, it must be noted that the recall period varied across assessments. The BPI-SF focuses on the past 24 hours, the FACT uses the past week, and the EQ-5D uses the present moment. It is unclear to what extent these differences in recall periods may have influenced the current results. Finally, the baseline to week 25 interval was used to determine the MID for the BPI-SF worst pain rating based on the higher correlations for this interval. Data from baseline to week 49 are consistent with these results, providing some confirmatory evidence to suggest that these MID estimates are stable.
In conclusion, the findings of the present analyses suggest that the MID estimate for the BPI-SF worst pain rating is two points. This value provides guidance to researchers using the BPI-SF worst pain rating on how to interpret baseline differences as well as change scores in the BPI-SF worst pain rating. Additional analyses could be done in other populations to confirm these findings.
References
1 K.W. Wyrwich, M. Bullinger and N. Aaronson et al., Estimating clinically significant differences in quality of life outcomes, Qual Life Res 14 (2005), pp. 285–295.
2 S.D. Mathias, S.K. Gao, M. Rutstein, C.F. Snyder, A.W. Wu and D. Cella, Evaluating clinically meaningful change on the ITP-PAQ: preliminary estimates of minimal important differences, Curr Med Res Opin 25 (2) (2009), pp. 375–383.
3 K.J. Yost, M.V. Sorensen, E.A. Hahn, G.A. Glendenning, A. Gnanasakthy and D. Cella, Using multiple anchor- and distribution-based estimates to evaluate clinically meaningful change on the Functional Assessment of Cancer Therapy-Biologic Response Modifiers (FACT-BRM) instrument, Value Health 8 (2) (2005), pp. 117–127.
4 R.D. Hays and J.M. Woolley, The concept of clinically meaningful difference in health-related quality-of-life research: How meaningful is it?, Pharmacoeconomics 18 (5) (2000), pp. 419–423.
5 R.L. Daut, C.S. Cleeland and R.C. Flanery, Development of the Wisconsin Brief Pain Questionnaire to assess pain in cancer and other diseases, Pain 17 (2) (1983), pp. 197–210.
6 C. Cleeland, Brief Pain Inventory User Guide, University of Texas M. D. Anderson Cancer Center, Houston (2009).
7 R.H. Dworkin, D.C. Turk and J.T. Farrar et al., Core outcome measures for chronic pain clinical trials: IMMPACT recommendations, Pain 113 (1–2) (2005), pp. 9–19.
8 U.S. Department of Health and Human Services Food and Drug Administration (FDA), Guidance for Industry: Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims, FDA, Silver Spring, MD (2009).
9 M.M. Oken, R.H. Creech and D.C. Tormey et al., Toxicity and response criteria of the Eastern Cooperative Oncology Group, Am J Clin Oncol 5 (6) (1982), pp. 649–655.
10 M.J. Brady, D.F. Cella, F. Mo and A.E. Bonomi et al., Reliability and validity of the Functional Assessment of Cancer Therapy–Breast Cancer Quality of Life instrument, J Clin Oncol 15 (1997), pp. 974–986.
11 R.D. Crosby, R.L. Kolotkin and G.R. Williams, Defining clinically meaningful change in health-related quality of life, J Clin Epidemiol 56 (5) (2003), pp. 395–407.
12 D. Revicki, R.D. Hays, D. Cella and J. Sloan, Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes, J Clin Epidemiol 61 (2) (2008), pp. 102–109.
13 D. Cella, E.A. Hahn and K. Dineen, Meaningful change in cancer-specific quality of life scores: differences between improvement and worsening, Qual Life Res 11 (3) (2002), pp. 207–221.
14 D.T. Eton, D. Cella and K.J. Yost et al., A combination of distribution- and anchor-based approaches determined the minimally important differences (MIDs) for four endpoints in a breast cancer scale, J Clin Epidemiol 57 (2004), pp. 898–910.
15 S. Wiebe, S. Matijevic, M. Eliasziw and P.A. Derry, Clinically important change in quality of life in epilepsy, J Neurol Neurosurg Psychiatry 73 (2002), pp. 116–120.
16 K.L. Miller, J.G. Walt and D.R. Mink et al., Minimal clinically important difference for the ocular surface disease index, Arch Ophthalmol 128 (1) (2010), pp. 94–101.
17 K.W. Wyrwich, W.M. Tierney and F.D. Wolinsky, Further evidence supporting an SEM-based criterion for identifying meaningful intra-individual changes in health-related quality of life, J Clin Epidemiol 52 (9) (1999), pp. 861–873.
18 F.D. Wolinsky, G.J. Wan and W.M. Tierney, Changes in the SF-36 in 12 months in a clinical sample of disadvantaged older adults, Med Care 36 (11) (1998), pp. 1589–1598.
19 J. Cohen, Statistical Power Analysis for the Behavioral Sciences (2nd ed.), Lawrence Erlbaum, Hillsdale, NJ (1988).
20 G.H. Guyatt, C. Bombardier and P.X. Tugwell, Measuring disease-specific quality of life in clinical trials, CMAJ 134 (8) (1986), pp. 889–895.
21 G.R. Norman, P. Stratford and G. Regehr, Methodological problems in the retrospective computation of responsiveness to change: the lesson of Cronbach, J Clin Epidemiol 50 (8) (1997), pp. 869–879.
22 A. Stopeck, J. Body and Y. Fujiwara et al., Denosumab versus zoledronic acid for the treatment of breast cancer patients with bone metastases: results of a randomized phase 3 study, Eur J Cancer Suppl 7 (2009), p. 2.
23 A.S. Pickard, M.P. Neary and D. Cella, Estimation of minimally important differences in EQ-5D utility and VAS scores in cancer, Health Qual Life Outcomes 5 (2007), p. 70.
24 R.H. Dworkin, D.C. Turk and K.W. Wyrwich et al., Interpreting the clinical importance of treatment outcomes in chronic pain clinical trials: IMMPACT recommendations, J Pain 9 (2) (2008), pp. 105–121.
Correspondence to: Susan D. Mathias, Health Outcomes Solutions, PO Box 2343, Winter Park, FL 32790; telephone: (407) 643-9016; fax: (866) 384-0194