Surveying and Optimizing the Predictors for Ependymoma Specific Survival using SEER Data

Min Rex Cheung

Introduction
The Surveillance, Epidemiology, and End Results (SEER) cancer registry data have been used extensively to build outcome prediction models for ependymoma (McGuire et al., 2009) (Rodriguez et al., 2009) (Amirian et al., 2012) (Bishop et al., 2012). These studies are important for identifying disparities in treatment and for building models to select patients for clinical trials. The cause-specific survival rates for both children and adults with ependymoma are about 80% ((McGuire et al., 2009) (Rodriguez et al., 2009) (Amirian et al., 2012) (Bishop et al., 2012), and this study), so there is room for improvement. This study used receiver operating characteristic (ROC) curve analysis to examine SEER ependymoma outcome data. The aim was to identify and optimize predictive ependymoma models to aid treatment and patient selection. This study also examined why some predictive models may not work as expected.
SEER (http://seer.cancer.gov/) is a public-use cancer registry of the United States of America (US). SEER is funded by the National Cancer Institute and the Centers for Disease Control to cover 28% of all oncology cases in the US. SEER started collecting data in 1973 from 7 state and metropolitan registries. Its main purpose is to decrease the burden of cancer by collecting and distributing data on cancer. SEER data are widely used as a benchmark data source for studying cancer outcomes in the US and in other countries. The extensive geographic coverage of the SEER data makes them ideal for identifying disparities in oncology outcomes and treatment across different geographical and cultural areas. In addition to biological staging factors and treatment factors, the database contains a large number of county-level socio-economic variables. This study aimed to identify barriers to good treatment outcomes that may be discernible from a national database.

Materials and Methods
The SEER registry has a massive amount of data available for analysis; however, manipulating this data pipeline can be challenging. The SEER Clinical Outcome Prediction Expert (SCOPE) (Cheung, 2012) was used to mine SEER data and construct accurate and efficient prediction models. The data were obtained from the SEER 18 database. SEER is a public-use database that can be analyzed without institutional review board approval. SEER*Stat (http://seer.cancer.gov/seerstat/) was used to list the cases. The case-selection filter was Site and Morphology, ICCC site recode ICD-O-3 = 'III(a) Ependymomas and choroid plexus tumor'. This study explored a long list of socio-economic, staging and treatment factors available in the SEER database.
The SCOPE code is posted on Matlab Central (www.mathworks.com). SCOPE has a number of utility programs adapted to handle the large SEER data pipeline. All statistics and programming were performed in Matlab (www.mathworks.com). Each risk factor was fitted with a generalized linear model to predict the outcome (cause of death: brain and other nervous system, as coded in SEER), and the area under the receiver operating characteristic (ROC) curve was computed for each model. SCOPE also implements binary fusion and optimization to streamline risk stratification: similar strata were fused into more efficient models when the ROC performance did not degrade (Cheung, 2012). SCOPE uses Monte Carlo sampling with replacement to estimate the modeling errors, which allows t-tests of the ROC areas. SCOPE provides SEER-adapted programs for user-friendly exploratory studies, univariate recoding and parsing.
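The per-factor modeling step can be sketched in Python. This is a minimal illustration under stated assumptions, not the SCOPE Matlab code: it assumes each factor enters a logistic GLM with one indicator per stratum, in which case the maximum-likelihood fitted probability of each stratum equals its observed cause-specific death rate, so those rates can serve directly as the model scores. The ROC area is then computed as the Mann-Whitney statistic, counting ties as one half. The function names `stratum_risks`, `roc_auc` and `factor_auc` and the toy data are hypothetical. How SCOPE codes each factor is not described here, so this sketch need not reproduce the reported ROC areas.

```python
from collections import defaultdict


def stratum_risks(strata, died):
    """Observed cause-specific death rate for each stratum of one factor.

    Assumption: the factor enters a logistic GLM with one indicator per
    stratum. For that model the maximum-likelihood fitted probability of a
    stratum equals its observed death rate (a rate of 0 or 1 is the limit
    the diverging fit approaches), so no iterative fitting is needed.
    """
    n = defaultdict(int)  # patients per stratum
    d = defaultdict(int)  # cause-specific deaths per stratum
    for s, y in zip(strata, died):
        n[s] += 1
        d[s] += y
    return {s: d[s] / n[s] for s in n}


def roc_auc(scores, died):
    """ROC area as the Mann-Whitney statistic.

    Probability that a patient who died scores higher than one who did not,
    counting ties as 1/2. Patients are grouped by distinct score value.
    """
    pos = defaultdict(int)  # deaths per distinct score
    neg = defaultdict(int)  # non-deaths per distinct score
    for s, y in zip(scores, died):
        if y:
            pos[s] += 1
        else:
            neg[s] += 1
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one death and one non-death")
    wins, neg_below = 0.0, 0
    for v in sorted(set(pos) | set(neg)):
        # Deaths at score v beat every non-death below v and tie with those at v.
        wins += pos[v] * (neg_below + 0.5 * neg[v])
        neg_below += neg[v]
    return wins / (n_pos * n_neg)


def factor_auc(strata, died):
    """Fit the one-factor model and return its in-sample ROC area."""
    risks = stratum_risks(strata, died)
    return roc_auc([risks[s] for s in strata], died)


# Made-up toy data: stratum label per patient, 1 = cause-specific death.
strata = ["A", "A", "B", "B"]
died = [1, 0, 0, 0]
print(stratum_risks(strata, died))  # {'A': 0.5, 'B': 0.0}
print(factor_auc(strata, died))
```

Scoring by stratum rate only matters through its ranking of patients, so any monotone transform of the fitted probabilities (for example the linear predictor) gives the same ROC area.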

Results
There were 3500 patients included in this study (Table 1). The mean (S.D.) follow-up was 79.8 (82.3) months. Of the patients, 46.5% were female. The mean (S.D.) age was 34.4 (22.8) years. Sixty percent of the ependymoma patients listed in the SEER database were adults. Patients younger than 20 years had a 27% risk of cause-specific death compared with 12.5% for older patients (Table 2). Complete staging was rarely performed in common practice. There was no significant difference between females and males in the risk of cause-specific death (Table 2). A third of the patients had supratentorial ependymoma ('Brain NOS' and cerebral lobes); they had a 26.8% risk of cause-specific death. Most of the tumors were not graded. Unknown grade had a 15% risk of cause-specific death, compared with 9% for grades I and II and 36% for grades III and IV. A 5-tiered grade model (ROC area 0.48) was optimized to a 3-tiered model (ROC area 0.53) by SCOPE (Cheung, 2012). This ROC area tied for second with that obtained for surgery (Table 1). Among the socioeconomic factors, African-American patients had a 21.5% risk of death compared with 16.6% for the others (Table 2); however, this difference did not significantly increase the ROC area. County family income, county educational attainment, and rural-urban status were not predictors of poor outcome (Tables 1 and 2).
Patients with localized ependymoma in SEER had about a 17% risk of ependymoma death (Table 2). Age older than 20 years correlated with a higher percentage mortality during the study period from 1973 to 2009 (Table 1 and Table 2). Use of radiotherapy (RT) was not associated with a lower risk of cause-specific mortality in the overall cohort (Table 2), but surgery was. Specifically, RT was associated with a 23% risk of death, compared with 12.3% without RT. Of the patients who had RT, 61% had cerebellar or … (Figure 1). This surprising drop is mostly associated with the very low use of RT in the adult population (Figure 2), which may correlate with their poor outcome.

Discussion
This study aimed to construct models that aid patient and treatment selection for patients with ependymoma. To that end, it examined ROC models (Hanley and McNeil, 1982) for a long list of potential explanatory factors (Table 1). ROC models take into account both the sensitivity and the specificity of the prediction. An ideal model would have an ROC area of 1, and a random model is expected to have an area of 0.5 (Hanley and McNeil, 1982). For example, a clinical ROC model can be used to predict whether a patient receiving the recommended treatment will die from the disease. Grade was the most predictive of patient outcome (Table 1): its ROC area of 0.61 was higher than the 0.53 of surgery. Thus, making grade available in the pathology report may improve patient selection and counseling.
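The ROC area has an equivalent rank form, identified by Hanley and McNeil (1982) with the Wilcoxon (Mann-Whitney) statistic, which makes the reference values of 1 and 0.5 concrete. The notation below is introduced here for illustration:

```latex
\mathrm{AUC} = \frac{1}{n_1 n_0} \sum_{i=1}^{n_1} \sum_{j=1}^{n_0}
  \left[ \mathbf{1}(s_i > s_j) + \tfrac{1}{2}\,\mathbf{1}(s_i = s_j) \right]
```

Here n_1 patients died of ependymoma with model scores s_i, n_0 patients did not with scores s_j, and 1(·) is the indicator function. A model that scores every death above every non-death gives AUC = 1, while a model that gives every patient the same score gives exactly 0.5. As a small worked example, one death scored 0.5 against three non-deaths scored 0.5, 0 and 0 gives AUC = (0.5 + 1 + 1)/3 ≈ 0.83.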
After binary fusion by SCOPE, the 5-tiered grade was reduced to a 3-tiered grade based on ROC area calculations (Table 1). Unknown grade was associated with an intermediate risk of cause-specific death (Table 2). However, there is no a priori reason to place it between grades I and IV, so it was left as a high-risk factor. The solution to the uncertain placement of these cases is to complete the grading. The binary fusion was performed to demonstrate how a complex predictive model can be numerically optimized into a much simpler model that may also be useful.
When there are competing prediction or prognostic models, the most efficient (i.e. the simplest) model is thought to prevail (D'Amico et al., 1998). This has an information-theoretic underpinning. For practical purposes, simpler models require fewer patients in randomized trials because fewer risk strata need to be balanced, and in the clinic simpler models are easier to use. SCOPE streamlined the ROC models by binary fusion (Table 1): pairs of adjacent strata were tested iteratively to see whether they could be combined without sacrificing the higher predictive power that usually belongs to more complex models. This study has shown that SCOPE can build efficient and accurate prediction models.
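The binary fusion procedure can be sketched as a greedy loop over adjacent strata. This is a hypothetical Python illustration, not the SCOPE implementation: it assumes the strata are ordered in advance (for example by grade), scores each candidate model by its in-sample ROC area with stratum death rates as scores, fuses the adjacent pair whose merger keeps the ROC area highest, and stops once the best fusion would lose more than a tolerance `tol`. The names `binary_fusion` and `model_auc` and the `tol` parameter are assumptions. With these rate-based in-sample scores, fusing strata can lower the ROC area but cannot raise it, which is why some tolerance is needed to decide how much loss is acceptable.

```python
from collections import defaultdict


def roc_auc(scores, died):
    """ROC area as the Mann-Whitney statistic (ties count 1/2)."""
    pos, neg = defaultdict(int), defaultdict(int)
    for s, y in zip(scores, died):
        if y:
            pos[s] += 1
        else:
            neg[s] += 1
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one death and one non-death")
    wins, neg_below = 0.0, 0
    for v in sorted(set(pos) | set(neg)):
        wins += pos[v] * (neg_below + 0.5 * neg[v])
        neg_below += neg[v]
    return wins / (n_pos * n_neg)


def model_auc(labels, died, groups):
    """In-sample ROC area when each patient is scored by the death rate of their fused group."""
    group_of = {s: i for i, g in enumerate(groups) for s in g}
    n, d = defaultdict(int), defaultdict(int)
    for s, y in zip(labels, died):
        n[group_of[s]] += 1
        d[group_of[s]] += y
    return roc_auc([d[group_of[s]] / n[group_of[s]] for s in labels], died)


def binary_fusion(labels, died, order, tol=0.0):
    """Greedily fuse adjacent strata while the ROC area drops by at most tol per step.

    labels: stratum of each patient; died: 1 for cause-specific death, else 0;
    order: every stratum, listed in the order that defines adjacency.
    Returns the fused groups and their ROC area.
    """
    groups = [[s] for s in order]  # start with every stratum on its own
    best = model_auc(labels, died, groups)
    while len(groups) > 1:
        # Try fusing each adjacent pair; keep the candidate with the highest ROC area.
        trials = [groups[:i] + [groups[i] + groups[i + 1]] + groups[i + 2:]
                  for i in range(len(groups) - 1)]
        scored = [(model_auc(labels, died, t), t) for t in trials]
        auc, trial = max(scored, key=lambda pair: pair[0])
        if auc < best - tol:
            break  # every remaining fusion loses too much predictive power
        groups, best = trial, auc
    return groups, best
```

Because `best` is reset after each accepted fusion, small losses can accumulate over several steps; comparing every candidate against the unfused model instead is an equally plausible variant.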
For radiotherapy, the ROC area of 0.58 was only modestly above 0.5. For reference, the ROC area we computed for a prostate cancer risk model was 0.75 for predicting biochemical failure (Cheung et al., 2001a, b). Low ROC areas imply that the information content (i.e. the staging accuracy) of the models may be limited. This is consistent with the fact that most patients did not have complete grading or staging (Table 2), and it is an area for potential improvement.
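Whether an ROC area such as 0.58 lies meaningfully above 0.5 can be screened with Monte Carlo resampling of the kind mentioned in the Methods. The Python sketch below is one plausible reading of that resampling and t-testing, not the SCOPE code: it draws resamples of patients with replacement, refits the stratum death rates and recomputes the ROC area in each, takes the standard deviation of those areas as the standard error, and forms the ratio (AUC - 0.5)/SE. The function names, the number of resamples `n_boot` and the fixed `seed` are hypothetical choices. Because each resample is scored on the same patients used to fit it, the areas are optimistic (with rate-based scores they never fall below 0.5), so the ratio is best read as a rough screen rather than a formal test.

```python
import random
import statistics
from collections import defaultdict


def roc_auc(scores, died):
    """ROC area as the Mann-Whitney statistic (ties count 1/2)."""
    pos, neg = defaultdict(int), defaultdict(int)
    for s, y in zip(scores, died):
        if y:
            pos[s] += 1
        else:
            neg[s] += 1
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one death and one non-death")
    wins, neg_below = 0.0, 0
    for v in sorted(set(pos) | set(neg)):
        wins += pos[v] * (neg_below + 0.5 * neg[v])
        neg_below += neg[v]
    return wins / (n_pos * n_neg)


def factor_auc(strata, died):
    """Score each patient by the observed death rate of their stratum, then take the ROC area."""
    n, d = defaultdict(int), defaultdict(int)
    for s, y in zip(strata, died):
        n[s] += 1
        d[s] += y
    return roc_auc([d[s] / n[s] for s in strata], died)


def bootstrap_aucs(strata, died, n_boot=200, seed=0):
    """ROC areas refitted on n_boot resamples of patients drawn with replacement."""
    if not 0 < sum(died) < len(died):
        raise ValueError("need both deaths and non-deaths")
    rng = random.Random(seed)  # fixed seed for reproducibility
    idx = range(len(died))
    aucs = []
    while len(aucs) < n_boot:
        sample = rng.choices(idx, k=len(died))
        ys = [died[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # skip resamples that lack one of the outcomes
            aucs.append(factor_auc([strata[i] for i in sample], ys))
    return aucs


def auc_vs_chance(strata, died, n_boot=200, seed=0):
    """Return (ROC area, resampling standard error, (AUC - 0.5) / SE)."""
    auc = factor_auc(strata, died)
    se = statistics.stdev(bootstrap_aucs(strata, died, n_boot, seed))
    if se == 0:
        raise ValueError("no variability across resamples")
    return auc, se, (auc - 0.5) / se
```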
Ependymoma is an aggressive disease: there was a 17% risk of ependymoma death (Table 2) despite treatment. RT was used in only 30% of patients (Figure 1), even when the indication for RT was clear, as for cerebellar and spinal cord ependymoma (Koshy et al., 2011). Furthermore, more adult patients than pediatric patients did not receive RT (Figure 2) and therefore did not get its benefit. Radiation oncologists should thus be more attentive in recommending RT for these patients. For pediatric populations, proton therapy is expected to improve outcomes primarily by decreasing the rate of secondary cancers (Miralbell et al., 2002; Cohen et al., 2005; DeLaney, 2007; Kuhlthau et al., 2012). The high under-staging rates may have prevented patients from selecting definitive local therapy. The low rates of postoperative radiotherapy may have contributed to the poor outcome of patients with this aggressive disease.