Imaging
Division of Medical Oncology (E.G., J.C.B.), Division of Biostatistics (M.J.M., K.V.B.), and Department of Radiology (R.S., B.J.E.), Mayo Clinic, Rochester, MN 55905; and Ochsner Clinic Foundation, New Orleans, LA 70121 (R.C.); USA
2 Address correspondence to Evanthia Galanis, Division of Medical Oncology, Mayo Clinic, 200 First Street SW, Rochester, MN 55905 (galanis.evanthia{at}mayo.edu).
| Abstract |
|---|
Key Words: gliomas; Response Evaluation Criteria in Solid Tumors (RECIST); tumor response; WHO criteria
| Introduction |
|---|
Assessment of response to treatment in clinical trials has traditionally been performed by using the WHO criteria (Miller et al., 1981; WHO, 1979), with tumor size estimation based on bidimensional measurements. Several problems have been identified with use of WHO criteria, including variation among research groups in the minimum lesion size and number of lesions to be recorded, variability in the definition of progressive disease, and the need to incorporate newer technology such as CT or MRI three-dimensional (3D)3 measurements into response assessment. The Response Evaluation Criteria in Solid Tumors (RECIST) criteria were the product of collaboration of WHO, the National Cancer Institute of the United States, the European Organization for Research and Treatment of Cancer, and the National Cancer Institute of Canada Clinical Trials Group; they are based on unidimensional tumor measurements and were recently introduced in an attempt to standardize and simplify assessment of response to treatment in cancer clinical trials (Therasse et al., 2000). The initial series of patients on whom the development of RECIST guidelines was based consisted mostly of breast, lung, ovary, melanoma, and sarcoma patients, as well as 31 brain tumor patients from the National Cancer Institute of Canada Clinical Trials Group phase 2 and 3 trials (Therasse et al., 2000). Subsequent comparisons of RECIST and WHO response criteria in different patient populations appear to support their equivalence in common tumor types such as breast (Park et al., 2003; Prasad et al., 2003) and lung (Park et al., 2003).
Nevertheless, for other tumor types with inadequate representation in the cohort of patients on whom development of RECIST guidelines was based, such as mesothelioma (Byrne and Nowak, 2004; Monetti et al., 2004) and pediatric tumors (McHugh and Kao, 2003), concerns have been expressed, and the issue has been raised of possible modification of the RECIST criteria in order to assess response more accurately. In a series of 32 pediatric patients (130 MRI scans), Warren et al. (2001) showed high concordance among 1D, 2D, and 3D methods in detecting partial response, but estimating time to disease progression appeared to be method dependent for childhood brain tumors. In this latter series, only 10 patients had high-grade gliomas. Given the small number of adult patients with brain tumors in the initial RECIST analysis, and the challenges associated with response assessment in primary brain tumors, there is a need for further comparative assessment of RECIST criteria prior to their routine incorporation into glioma trials and the neuro-oncology practice.
Our study compared the unidimensional RECIST criteria with the WHO bidimensional criteria as tools for assessing response in patients with newly diagnosed glioma. In addition, we investigated the value of incorporating computer-calculated area and volume measurements in the follow-up and assessment of response in this patient population.
| Materials and Methods |
|---|
10-mm diameter) (Therasse et al., 2000). All imaging studies were performed with the same technique (5-mm fixed slices with 2.5-mm gap, with precontrast T1-weighted, postcontrast T1-weighted, conventional spin-echo proton-density, and T2-weighted images in the oblique-axial plane). All studies were de-identified to random numbers. The contours of the tumors were drawn on the T2/proton-density and postgadolinium T1 images by using semiautomated methods. The same neuroradiologist (B.J.E.) adjusted thresholds and seed points to define the outer margins of the tumor on the T2 and gadolinium-enhanced images. The same threshold values were used for all slices of a particular sequence. These maps (both T1 and enhancement) were then analyzed to determine the major axis length on a slice (a 1D, or RECIST, measurement), the product of the major and minor axis (a 2D measurement), the greatest area of any single image (area), and the volume. Major axis, minor axis, area, and volume were defined as follows:
Definition of Response
For all measurements, the patients were classified according to response status. Partial response (regression, REGR) was determined by comparison to the baseline scan, and progression (PROG) was determined by comparison to the prior scan with the smallest tumor measurement.
The following cutoffs were employed in order to define REGR or PROG and were based on the WHO (1979) and RECIST (Therasse et al., 2000) definitions of partial response and their correlation with volume (Therasse et al., 2000).
Complete response was defined as complete disappearance of the patient's tumor; however, none of the 67 patients met radiographic criteria for complete response. Regression or complete response also required the patient to be on a stable or decreased dose of corticosteroids, or off corticosteroids. Patients who did not meet the criteria for REGR or PROG were classified as stable (STAB).
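The comparison rule described above (REGR judged against the baseline scan, PROG judged against the smallest prior measurement) can be sketched as a small function. This is an illustrative sketch only, not the study's code: the name `classify_response` and its parameters are hypothetical, and the default cutoffs are the standard published RECIST unidimensional values (at least a 30% decrease for partial response, at least a 20% increase for progression), which are not necessarily the cutoffs applied to each measurement type in this study. The corticosteroid requirement is not modeled.

```python
from typing import Sequence


def classify_response(sizes: Sequence[float],
                      regr_decrease: float = 0.30,
                      prog_increase: float = 0.20) -> str:
    """Classify the most recent scan as REGR, PROG, or STAB.

    sizes[0] is the baseline measurement and sizes[-1] is the scan being
    assessed. The default cutoffs are the published RECIST values and are
    an assumption here, not the study's own cutoffs.
    """
    if len(sizes) < 2:
        raise ValueError("need a baseline and at least one follow-up scan")
    baseline, current = sizes[0], sizes[-1]
    nadir = min(sizes[:-1])  # smallest measurement on any prior scan
    # Progression is judged against the smallest prior measurement.
    if nadir > 0 and current >= nadir * (1 + prog_increase):
        return "PROG"
    # Regression is judged against the baseline scan.
    if current <= baseline * (1 - regr_decrease):
        return "REGR"
    return "STAB"
```

For bidimensional products, the standard WHO cutoffs (at least a 50% decrease, at least a 25% increase) could be passed instead, for example `classify_response(products, 0.50, 0.25)`.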
Statistical Analysis
Categorical patient characteristics were summarized with the observed frequency and percent; age was summarized with the mean ± standard deviation as well as the median (minimum, maximum) age. Assessment of the amount of agreement among all distinct pairs of the eight measurements was summarized with observed frequency and percent, as well as with a weighted kappa statistic (Cohen, 1968) and its 95% confidence interval (CI). The kappa statistic measures the amount of agreement between two measurements; a kappa value of 1 indicates perfect agreement, a value of 0 indicates no agreement beyond that expected by chance, and a value of -1 indicates perfect disagreement. When the 95% CI for the kappa statistic does not contain 0, this indicates a statistically significant agreement between the two measures at the 0.05 level, that is, P < 0.05. The amount of agreement was determined for two variables: best objective response assessed at four months (PROG vs. STAB vs. REGR) and response/nonresponse at four months (REGR vs. STAB or PROG).
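As a rough illustration of how a weighted kappa is computed for two ordered three-level response assessments, the following sketch uses linear disagreement weights. The weighting scheme and the function name `weighted_kappa` are assumptions made for illustration; the weights actually used in the analysis are not stated here, and the confidence interval is not computed.

```python
def weighted_kappa(a, b, categories=("PROG", "STAB", "REGR")):
    """Weighted kappa for two ratings of the same patients.

    Linear disagreement weights |i - j| / (k - 1) are an assumption of
    this sketch; other weighting schemes give different values.
    """
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(a)

    # Observed joint proportions of (method A category, method B category).
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[index[x]][index[y]] += 1.0 / n

    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        return abs(i - j) / (k - 1)

    d_obs = sum(weight(i, j) * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(weight(i, j) * row[i] * col[j]
                for i in range(k) for j in range(k))
    if d_exp == 0:
        raise ValueError("kappa is undefined when all ratings are identical")
    return 1.0 - d_obs / d_exp
```

With these weights, a STAB versus REGR disagreement counts half as much as a PROG versus REGR disagreement, which reflects the ordering of the three categories.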
Outcome variables of interest were survival, progression-free survival, and duration of a response. In this study, all patients were part of clinical trials. No scans were available after a patient went off study because of progression. Survival was measured from time of study enrollment until death or last follow-up. Progression-free survival was measured from time of study enrollment until progression (as determined by the measurement method) or last follow-up. Duration of response was measured from time of an initial objective response of REGR until progression or last follow-up. All time-to-event measures were summarized with curves obtained from Kaplan-Meier estimates (Kaplan and Meier, 1958), and the median time and 95% CI for the median time are reported. Kaplan-Meier estimates for the time-to-event variables among the eight different tumor assessment measures were not directly compared because each measure was applied on the same set of patients.
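The Kaplan-Meier summary described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the software used in the study: the function names `kaplan_meier` and `median_time` are hypothetical, `events` uses 1 for an observed event and 0 for censoring at last follow-up, and the confidence interval for the median is not computed.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : follow-up time for each patient
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def median_time(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up
```

Applying the same functions to progression times defined by each of the eight measurements would give one curve and one median per measurement method.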
The association between each tumor assessment method and patient outcome was evaluated by using Cox proportional hazards models. For this analysis, the response status at four months after starting study treatment was determined for each tumor assessment method. The four-month time point allows for evaluation of two eight-week treatment cycles; most observed antitumor activity will have occurred by this time. All patients in the study were alive at the four-month assessment; thus, a landmark analysis may be used for overall survival, where the response status is examined for predicting future survival (Hess et al., 1999). For each assessment method, comparisons to survival were made between patients classified as responders (REGR) and those classified as nonresponders (STAB or PROG) at the four-month time point, as well as between patients classified as progressors (PROG) and those who did not have progressive disease at four months (REGR or STAB). In addition, time-dependent Cox models were used to examine the relationship between survival and response (progression) status across the entire follow-up period. Specifically, patients changed from no response (or no progression) to response (or progression) at the time of the evaluation for which the tumor measurement satisfied the response (or progression) criterion; if a response (or progression) was not observed for a patient during follow-up, the patient remained in the no-response (no-progression) group throughout the follow-up period. All statistical tests were two-sided, and a P of <0.05 was considered to be statistically significant.
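The time-dependent coding described above, in which a patient moves from the no-response (or no-progression) group to the response (or progression) group at the evaluation where the criterion is first met, can be sketched as a split of each patient's follow-up into intervals. This is an illustrative sketch only: the function name `split_follow_up` and the tuple layout `(start, stop, status, event)` are hypothetical, and fitting the Cox model itself is left to statistical software.

```python
def split_follow_up(follow_up, died, change_time=None):
    """Split one patient's follow-up into (start, stop, status, event) rows.

    follow_up   : time from enrollment to death or last follow-up
    died        : True if the patient died at `follow_up`
    change_time : time at which the response (or progression) criterion
                  was first satisfied, or None if it never was
    status is 0 before the change and 1 from the change onward.
    """
    if change_time is None or change_time >= follow_up:
        # Criterion never met: the patient stays in group 0 throughout.
        return [(0.0, follow_up, 0, int(died))]
    rows = []
    if change_time > 0:
        # Interval before the change contributes no event.
        rows.append((0.0, change_time, 0, 0))
    rows.append((change_time, follow_up, 1, int(died)))
    return rows
```

Coding status this way avoids crediting a patient's pre-response survival time to the responder group, which is the same concern the four-month landmark analysis addresses.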
All analyses were done on the combined set of tumors, as well as on groups of tumors stratified by enhancement status (enhancing vs. nonenhancing) and by tumor grade (high vs. low). Grade 1 and 2 tumors were considered low grade, and grade 3 and 4 tumors were considered high grade.
| Results |
|---|
Assessment of Agreement Among the Different Response Measures
To evaluate agreement among the different measurement methods in assessing the best objective response, we computed the number (and percent) of patients for each pair of methods in which there was agreement between the response assessments (REGR, STAB, or PROG) at four months, and we determined the distribution of response assessment at four months produced by each measurement method (Table 2). From Table 2, it can be seen that, for enhancing tumors, the measures on Gd-enhanced images were more likely to have a response classification of REGR than the corresponding measures on the T2 images. The area and volume measurements were more likely to produce a response of STAB than the 1D and 2D measures within a particular image type (T2 or Gd enhanced). For the nonenhancing tumors, area and volume measures were again more likely to produce a response status of STAB than the 1D and 2D measures; the 1D and 2D measures were more likely to have a response assessment of PROG.
Figure 1 shows the values of the weighted kappa estimates, and the corresponding 95% CIs, for the agreement between pairs of methods by response status (REGR, STAB, and PROG) at four months. In general, there was substantial agreement between the 1D and 2D measurements for both T2-enhanced and Gd-enhanced images. There was also strong agreement between area and volume measurements for both T2-enhanced and Gd-enhanced images. Although there was a statistically significant agreement between some pairs of measurements where one measure was on a T2-enhanced image and the other on a Gd-enhanced image, the agreement was generally weaker. Analyses on all tumors pooled and stratified by tumor grade yielded analogous results.
There were considerably more patients who experienced tumor progression as determined by the various tumor measurement techniques (Table 4) than there were patients who had tumor responses. For the enhancing tumors, the majority of the progressions occurred within four months. This was not the case for the nonenhancing tumors. Note that tumor progression as determined by 1D T2 and 2D T2 was found not to be significantly associated with survival for both enhancing and nonenhancing tumors. There was a statistically significant association between tumor progressions, as determined by 1D Gd, 2D Gd, area Gd, and volume Gd, and survival in the enhancing tumors; this was the case for both the progression status at four months and the time-dependent progression status variable. Individuals who were determined to have a progression on the basis of their tumor measurement had a worse survival. There also was an association between progression status at four months, as determined by area T2 and volume T2 measurements, and survival in nonenhancing tumors; however, this association was weaker and did not quite achieve statistical significance when progression status was treated as a time-dependent variable.
Determinations of Time to Response, Duration of Response, and Time to Progression
Estimates of the median time to response (i.e., to a tumor status assessment of REGR), the median duration of response (i.e., time from radiographic status of REGR to progression), and the median time to progression (i.e., time from study enrollment to progression) were determined for each of the eight measurements (Table 5). Area T2 and volume T2 tended to have a longer median time until a response was declared than all the other methods, regardless of image type and tumor enhancement status. The 1D and 2D median times to response also appeared similar within image type for both enhancing and nonenhancing tumors. In general, 1D and 2D measurements tended to have the shortest median response durations, regardless of image type and whether the tumor enhanced. Although the median duration of response appeared similar among all four measurements for nonenhancing tumors, this was not the case for the enhancing tumors. Finally, for the median time to progression, the volume and area measurements tended to have a longer median time to progression than the 1D and 2D measurements across image types and tumor enhancement status. This was most pronounced for the nonenhancing tumors, with the median time to progression for area T2 and volume T2 (15.8 and 47.3 months, respectively) being considerably longer than for the 1D T2 and 2D T2 measurements (2.5 and 4.1 months, respectively).
| Discussion |
|---|
When assessing response in gliomas, use of either WHO or RECIST response criteria is complicated by a more fundamental question, which is whether conventional oncological criteria of response when translated into CNS tumors represent a useful measure and a true reflection of treatment efficacy. The poor correlation between response as measured in phase 2 studies and survival in adjuvant studies suggests that there may be methodological flaws. Within the brain, a reduction in the size of an enhancing abnormality may represent either loss of tumor cells or other processes such as an alteration in the properties of the blood-brain barrier. Even if decreased size is indicative of tumor cell death, the assessment of radiological response is difficult. Although agreement on response definition provides a common language, it is not always clear whether it is accurately associated with the principal end point: survival. The goal of this study was to address some of these issues by comparing 1D, 2D, area, and volume measurements in patients with newly diagnosed glioma and correlating them with outcome.
In our study, RECIST 1D measurements were comparable to 2D measurements in determining time to response, duration of response, and time to progression (Table 5). Furthermore, there was agreement between RECIST and 2D measurements both in Gd-enhanced and T2 images (Fig. 1; kappa = 0.87 [CI, 0.73-1.00] and kappa = 0.81 [CI, 0.68-0.94], respectively). These data are consistent with the comparative analysis of the two methodologies performed on 30 brain tumor patients, which was also taken into account for the development of RECIST guidelines, and they support replacement of 2D with RECIST response criteria in neuro-oncology clinical trials.
Although there was good agreement between RECIST 1D and 2D measurements, as Table 2 indicates (86% and 90% for enhancing and nonenhancing tumors, respectively), as shown in Tables 3 and 4, neither measurement appears to predict outcome. Specifically, no association was found between response, as assessed by these two methods, and survival. In this respect, neither method appears superior to the other. Nevertheless, the small number of responders could have significantly decreased the likelihood of identifying existing associations.
As it pertains to volumetric measurements, there was good agreement between volume and 1D and 2D measurements in Gd-enhanced images (75% agreement with both 1D and 2D measurements [Table 2]). The agreement was much weaker, however, in the T2 images (Table 2 and Fig. 1). Furthermore, according to this set of data, neither Gd nor T2 volume measurements appeared to predict outcome for either enhancing or nonenhancing tumors.
In our data set, response at four months was not predictive of overall survival for any assessment method (all P > 0.14), while the only significant association between response and survival in time-dependent Cox models pertained to 2D measurements in Gd-enhanced images (P = 0.02). The small number of responders prevents definitive conclusions, however. In contrast, the 1D Gd-measured and the 2D Gd-measured progression at four months was predictive of overall survival (P = 0.0009 and 0.001, respectively). There was no such association for 1D T2 and 2D T2 measurements, however (P > 0.23 for all enhancing and nonenhancing tumors [Tables 3 and 4]).
These results emphasize significant methodological problems associated with assessment of response of nonenhancing tumors such as low-grade gliomas to treatment: Responses based on 2D T2 images do not associate well with patient outcome, and 1D (RECIST) T2 images fare equally poorly. There is an important need to incorporate and prospectively validate imaging methodology that can better predict outcome of nonenhancing tumors in low-grade glioma trials.
The other point that these data emphasize is that when time to progression is used as the primary outcome, results may vary widely, depending on the imaging methodology used. This is illustrated by the shorter time to progression of nonenhancing tumors when assessment is performed by 1D T2 or 2D T2 images as compared to area T2 or volume T2 images. It is particularly pertinent for low-grade tumors, in which T2 measurements represent the mainstay for assessment of treatment efficacy. In contrast, 1D, 2D, area, and volume Gd measurements perform similarly with regard to duration of response and time to progression.
A frequent concern when bidimensional or unidimensional measurements are employed on imaging studies pertains to intraobserver and interobserver variability. This can be quite high (Hopper et al., 1996; Lavin and Flowerdew, 1980; Quoix et al., 1988; Thiesse et al., 1997; Warr et al., 1984), presumably because of the subjectivity involved in defining the exact margins of the lesion and determining the lesion's largest diameter and its largest perpendicular diameter (Fornage, 1993). Such variability can have a significant impact on the assessment of an individual patient's tumor response to a given therapy, as well as the determination of the efficacy of a new antitumor therapy (Lavin and Flowerdew, 1980; Thiesse et al., 1997; Warr et al., 1984). Schwartz and coworkers (2000) have shown that tumor size can be obtained more accurately and consistently by readers using an automated autocontour technique than by those using handheld or electronic calipers. Autocontouring in their series was performed with the radiologist placing a cursor in the center of the lesion and the computer determining the border of the lesion on the basis of density differences. In our set of data, the area-T2 and area-Gd determinations were based on computer determination of the tumor area, which was based on the largest contiguous group of pixels on any slice. It is of note that in our study, both for enhancing and for nonenhancing tumors, a progression status that was defined by computer-calculated tumor area or volume measurements in T2 images at four months performed significantly better in predicting survival than did a progression status that was defined by 1D T2 or 2D T2 measurements (Tables 3 and 4). A possible explanation for this could be the higher sensitivity of the density-based area determination approach in assessing the real extent of the lesion in the absence of enhancement.
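The idea of taking the tumor area from the largest contiguous group of pixels can be sketched as a connected-component search on a thresholded image. This is a hedged illustration rather than the software used in the study: the function name `largest_region_area`, the binary-mask input, the 4-neighbor definition of contiguity, and the `pixel_area_mm2` parameter are all assumptions.

```python
from collections import deque


def largest_region_area(mask, pixel_area_mm2=1.0):
    """Area of the largest 4-connected group of nonzero pixels in one image.

    mask           : list of rows, each a list of 0/1 values (1 = above
                     the tumor threshold); this input format is assumed
    pixel_area_mm2 : area represented by one pixel
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search over one contiguous region.
            size = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            best = max(best, size)
    return best * pixel_area_mm2
```

Taking the maximum of this value across all slices of a study would give a single area figure per scan; isolated above-threshold pixels away from the main lesion do not contribute.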
Our series includes only patients with newly diagnosed glioma. Although our conclusions could also be applicable for patients with recurrent glioma, additional methodological difficulties apply, especially when assessing the response to newer treatment modalities, such as biologics or molecular targeted therapies. The value of RECIST versus 2D measurements versus the added value of other methodology, that is, area-based or volume-based determinations of response, versus use of functional imaging such as thallium-201 single-photon emission computed tomography (Vos et al., 2003) in assessing response to treatment in patients with recurrent/progressive disease will need to be further evaluated. We are currently performing such an analysis.
In summary, our analysis results support the conclusion that RECIST could be used instead of conventional 2D imaging in trials with patients who have newly diagnosed glioma. Overall responses as determined by any tumor measurement method did not correlate with patient survival for either enhancing or nonenhancing tumors, although the small number of responders limits definitive conclusions. In time-dependent Cox models, progression as determined by 1D, 2D, area, and volume measurements in Gd-enhanced images was predictive of survival of patients with enhancing tumors.
| Footnotes |
|---|
3 Abbreviations used are as follows: 1D, 2D, and 3D: one-, two-, and three-dimensional, respectively; CI, confidence interval; PROG, progression; RECIST, Response Evaluation Criteria in Solid Tumors; REGR, partial response; STAB, stable.
Received for publication April 5, 2005. Accepted for publication October 19, 2005.
| References |
|---|
Brada, M., and Sharpe, G. (1996) Chemotherapy of high-grade gliomas: Beginning a new era or the end of the old? Eur. J. Cancer 32A, 2193-2194.
Byrne, M.J., and Nowak, A.K. (2004) Modified RECIST criteria for assessment of response in malignant pleural mesothelioma. Ann. Oncol. 15, 257-260.
Cohen, J. (1968) Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychol. Bull. 70, 213-220.
Fornage, B.D. (1993) Measuring masses on cross-sectional images. Radiology 187, 289 (letter).
Hess, K.R., Wong, E.T., Jaeckle, K.A., Kyritsis, A.P., Levin, V.A., Prados, M.D., and Yung, W.K.A. (1999) Response and progression in recurrent malignant glioma. Neuro-Oncology 1, 282-288.
Hopper, K.D., Kasales, C.J., Van Slyke, M.A., Schwartz, T.A., TenHave, T.R., and Jozefiak, J.A. (1996) Analysis of interobserver and intraobserver variability in CT tumor measurements. AJR Am. J. Roentgenol. 167, 851-854.
James, K., Eisenhauer, E., Christian, M., Terenziani, M., Vena, D., Muldal, A., and Therasse, P. (1999) Measuring response in solid tumors: Unidimensional versus bidimensional measurement. J. Natl. Cancer Inst. 91, 523-528.
Kaplan, E.L., and Meier, P. (1958) Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 53, 457-481.
Lavin, P.T., and Flowerdew, G. (1980) Studies in variation associated with the measurement of solid tumors. Cancer 46, 1286-1290.
Macdonald, D.R., Cascino, T.L., Schold, S.C., Jr., and Cairncross, J.G. (1990) Response criteria for phase II studies of supratentorial malignant glioma. J. Clin. Oncol. 8, 1277-1280.
McHugh, K., and Kao, S. (2003) Response evaluation criteria in solid tumours (RECIST): Problems and need for modifications in paediatric oncology? Br. J. Radiol. 76, 433-436.
Miller, A.B., Hoogstraten, B., Staquet, M., and Winkler, A. (1981) Reporting results of cancer treatment. Cancer 47, 207-214.
Monetti, F., Casanova, S., Grass, A., Cafferata, M.A., Ardizzoni, A., and Neumaier, C.E. (2004) Inadequacy of the new Response Evaluation Criteria in Solid Tumors (RECIST) in patients with malignant pleural mesothelioma: Report of four cases. Lung Cancer 43, 71-74.
Padhani, A.R., and Husband, J.E. (2000) Commentary: Are current tumour response criteria relevant for the 21st century? Br. J. Radiol. 73, 1031-1033.
Park, J.O., Lee, S.I., Song, S.Y., Kim, K., Kim, W.S., Jung, C.W., Park, Y.S., Im, Y.H., Kang, W.K., Lee, M.H., Lee, K.S., and Park, K. (2003) Measuring response in solid tumors: Comparison of RECIST and WHO response criteria. Jpn. J. Clin. Oncol. 33, 533-537.
Prasad, S.R., Saini, S., Sumner, J.E., Hahn, P.F., Sahani, D., and Boland, G.W. (2003) Radiological measurement of breast cancer metastases to lung and liver: Comparison between WHO (bidimensional) and RECIST (unidimensional) guidelines. J. Comput. Assist. Tomogr. 27, 380-384.
Quoix, E., Wolkove, N., Hanley, J., and Kreisman, H. (1988) Problems in radiographic estimation of response to chemotherapy and radiotherapy in small cell lung cancer. Cancer 62, 489-493.
Schwartz, L.H., Ginsberg, M.S., Decorato, D., Rothenberg, L.N., Einstein, S., Kijewski, P., and Panicek, D.M. (2000) Evaluation of tumor measurements in oncology: Use of film-based and electronic techniques. J. Clin. Oncol. 18, 2179-2184.
Therasse, P. (2002) Measuring the clinical response. What does it mean? [Erratum in Eur. J. Cancer [2003] 39, 1489] Eur. J. Cancer 38, 1817-1823.
Therasse, P., Arbuck, S.G., Eisenhauer, E.A., Wanders, J., Kaplan, R.S., Rubinstein, L., Verweij, J., Van Glabbeke, M., van Oosterom, A.T., Christian, M.C., and Gwyther, S.G. (2000) New guidelines to evaluate the response to treatment in solid tumors. J. Natl. Cancer Inst. 92, 205-216.
Thiesse, P., Ollivier, L., Di Stefano-Louineau, D., Negrier, S., Savary, J., Pignard, K., Lasset, C., and Escudier, B. (1997) Response rate accuracy in oncology trials: Reasons for interobserver variability. J. Clin. Oncol. 15, 3507-3514.
Vos, M.J., Hoekstra, O.S., Barkhof, F., Berkhof, J., Heimans, J.J., van Groeningen, C.J., Vandertop, W.P., Slotman, B.J., and Postma, T.J. (2003) Thallium-201 single-photon emission computed tomography as an early predictor of outcome in recurrent glioma. J. Clin. Oncol. 21, 3559-3565.
Warr, D., McKinney, S., and Tannock, I. (1984) Influence of measurement error on assessment of response to anticancer chemotherapy: Proposal for new criteria on tumor response. J. Clin. Oncol. 2, 1040-1046.
Warren, K.E., Patronas, K., Aikin, A.A., Albert, P.S., and Balis, F.M. (2001) Comparison of one-, two-, and three-dimensional measurements of childhood brain tumors. J. Natl. Cancer Inst. 93, 1401-1405.
WHO, World Health Organization (1979) Handbook for Reporting Results of Cancer Treatment (WHO Offset Publication No. 48). Geneva: World Health Organization.