Optimization of Predictors of Ewing Sarcoma Cause-specific Survival: A Population Study

BACKGROUND
This study used receiver operating characteristic (ROC) curve analysis of Surveillance, Epidemiology, and End Results (SEER) Ewing sarcoma (ES) outcome data. The aim was to identify and optimize ES-specific survival prediction models and to identify sources of survival disparities.


MATERIALS AND METHODS
This study analyzed the socio-economic, staging and treatment factors available in the SEER database for ES. A total of 1844 patients diagnosed between 1973 and 2009 were included. For risk modeling, each factor was fitted with a Generalized Linear Model to predict the outcome (bone and joint specific death, yes/no), and the area under the receiver operating characteristic (ROC) curve was computed. Similar strata were combined to construct the most parsimonious models.


RESULTS
The mean (S.D.) follow-up time was 74.48 (89.66) months. Thirty-six percent of the patients were female. The mean (S.D.) age was 18.7 (12) years. SEER staging had the highest ROC area (S.D.), 0.616 (0.032), among the factors tested. We simplified the 4-layered risk levels (local, regional, distant, un-staged) to a simpler non-metastatic (I and II) versus metastatic (III) versus un-staged model. The ROC area (S.D.) of this 3-tiered model was 0.612 (0.008). Several other biologic factors were also predictive of ES-specific survival, but the socio-economic factors tested here were not.


CONCLUSIONS
ROC analysis measured and optimized the performance of ES survival prediction models. Optimized models will provide a more efficient way to stratify patients for clinical trials.


Introduction
Ewing sarcoma (ES) is the second most common bone sarcoma of children and young adults (Herzog, 2005; Schrager et al., 2011). ES-specific survival is about 75% overall (Schrager et al., 2011; this study). ES is under active research to identify biological, pathological, and socio-economic barriers to improving the clinical outcome (Friedman et al., 2010; Sultan et al., 2010; Worch et al., 2010; 2011; Jawad et al., 2011; Mukherjee et al.,

Min Rex Cheung

[…] distributing data on cancer, it strives to decrease the burden of cancer. SEER data are widely used as a benchmark data source for studying ES outcomes in the US and in other countries. The extensive geographic coverage of the SEER data is ideal for identifying disparities in oncology outcomes and treatment across different geographical and cultural areas. In addition to biological staging and treatment factors, this database also contains a large number of county-level socio-economic variables. This study aimed to identify barriers to good treatment outcome that may be discernible from a national database.
The SEER registry has a massive amount of data available for analysis; however, manipulating this data pipeline can be challenging. SEER Clinical Outcome Prediction Expert (SCOPE) was designed and implemented to mine SEER data and construct accurate and efficient prediction models.

Materials and Methods
The data were obtained from the SEER 18 database. SEER is a public-use database that can be analyzed without institutional review board approval. SEER*Stat (http://seer.cancer.gov/seerstat/) was used to list the cases. The filter used was: 'Site and Morphology.AYA site recode'='Site and Morphology.ICCC site recode ICD-O-3'='VIII(c) Ewing tumor and related sarcomas of bone'. This study explored a long list of socio-economic, staging and treatment factors available in the SEER database.
For this purpose, we designed and implemented SEER Clinical Outcome Prediction Expert (SCOPE) (Cheung, 2012). The SCOPE code has been posted on Matlab Central (www.mathworks.com). SCOPE has a number of utility programs adapted to handle the large SEER data pipeline. All statistics and programming were performed in Matlab. Each risk factor was fitted with a Generalized Linear Model to predict the outcome (SEER cause of death: Bones and Joints), and the areas under the receiver operating characteristic curve (ROC) were computed. Similar strata were fused to make more efficient models if the ROC performance did not degrade (Cheung et al., 2001a; 2001b). SCOPE also implements binary fusion and optimization to streamline risk stratification by combining risk strata when possible. SCOPE uses Monte Carlo sampling with replacement to estimate the modeling errors and allows t-testing of the areas under the ROC curves. SCOPE also provides SEER-adapted programs for user-friendly exploratory studies, univariate recoding and parsing.
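The per-factor modeling step can be sketched as follows. This is a minimal Python illustration, not the SCOPE Matlab code, and the function names `stratum_death_rates` and `roc_auc` are hypothetical. It assumes each factor is categorical and that the binomial GLM uses a logit link with one indicator per stratum. Under that assumption the maximum-likelihood fitted probability for each patient equals the observed ES death proportion in that patient's stratum, so the fit reduces to a group mean. The ROC area is then the Mann-Whitney probability that a randomly chosen patient who died of ES has a higher fitted risk than one who did not, with ties counted as one half.

```python
import numpy as np


def stratum_death_rates(strata, died):
    """Fitted risks of a binomial GLM (logit link) with one indicator per stratum.

    With a full set of stratum indicators, the maximum-likelihood fit reproduces
    the observed death proportion in each stratum (strata with all or no deaths
    give the limiting values 1 or 0), so no iterative fitting is needed here.
    """
    strata = np.asarray(strata)
    died = np.asarray(died, dtype=float)
    rates = {s: died[strata == s].mean() for s in np.unique(strata)}
    return np.array([rates[s] for s in strata]), rates


def roc_auc(score, died):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    score = np.asarray(score, dtype=float)
    died = np.asarray(died, dtype=bool)
    pos, neg = score[died], score[~died]  # ES deaths vs. everyone else
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)
```

The pairwise comparison needs memory proportional to (number of deaths) × (number of survivors), which is manageable for a cohort of about 1844 patients; a rank-based formula would scale better for larger extracts.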

Results
There were 1844 patients included in this study (Table 1). The mean (S.D.) follow-up time was 74.48 (89.66) months. Thirty-six percent of the patients were female. The mean (S.D.) age was 18.7 (12) years. SEER staging had the highest ROC area (S.D.), 0.616 (0.032), among the factors tested. Based on ROC areas, other predictors of ES-specific survival included age, sex, radiation treatment and surgery (Table 1). County-level percent college graduates, family income and rural-metropolitan continuum had ROC areas close to the 0.5 expected for random variables (Table 1).
To correlate with the ROC analysis, the percent ES-specific mortality was computed for each stratum (Table 2). There were only 44 black patients, and the ROC area for race was close to 0.5. However, when the risk of ES death was calculated (Table 2), African American patients had a worse outcome. Radiotherapy (RT) was marginally associated with worse outcome. Surgery was associated with improved treatment outcome.
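The contrast between a near-0.5 ROC area and a visibly higher mortality in a small subgroup follows from how the ROC area treats a yes/no factor. For a binary score with ties counted as one half, the area is (1 + TPR - FPR)/2, where TPR is the fraction of ES deaths that fall in the subgroup and FPR is the fraction of survivors that do. When the subgroup is small, both fractions are small and the area stays near 0.5 even if the subgroup's mortality is much higher. In the sketch below only the 44-patient subgroup and the 1844-patient cohort come from the text; the death counts (20 in the subgroup, 600 overall) are hypothetical values chosen for illustration, not SEER results, and `binary_factor_auc` is a hypothetical helper.

```python
def binary_factor_auc(deaths_in_group, group_size, total_deaths, total_patients):
    """ROC area of a yes/no factor (membership in one subgroup) used as the risk score.

    With ties counted as 1/2, a binary score gives AUC = (1 + TPR - FPR) / 2.
    """
    total_survivors = total_patients - total_deaths
    survivors_in_group = group_size - deaths_in_group
    tpr = deaths_in_group / total_deaths        # share of ES deaths inside the subgroup
    fpr = survivors_in_group / total_survivors  # share of survivors inside the subgroup
    return (1 + tpr - fpr) / 2


# Hypothetical counts: 20 of 44 subgroup patients and 600 of 1844 patients overall died of ES.
auc = binary_factor_auc(20, 44, 600, 1844)
mortality_in_group = 20 / 44
mortality_outside = (600 - 20) / (1844 - 44)
print(f"AUC = {auc:.3f}; mortality {mortality_in_group:.0%} vs {mortality_outside:.0%}")
```

With these hypothetical counts the subgroup's mortality is about 45% versus about 32% outside it, yet the ROC area is only about 0.507, because the subgroup contains only a few percent of all deaths.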
SEER Clinical Outcome Prediction Expert (SCOPE) simplified the 4-layered risk levels (local, regional, distant, un-staged) to a simpler non-metastatic (I and II) versus metastatic (III) versus un-staged model. The ROC area (S.D.) of the 3-tiered model was 0.612 (0.008). Several other biologic factors were also predictive of ES-specific survival, but the socio-economic factors tested here were not. SCOPE was used to compute the ROC curves and the areas under them. In this example, the ROC area of the 4-tiered SEER staging model was computed for 5 random samples (Table 1), and the results are shown in the upper panels. In the lower panels, SCOPE simplified the 4-layered risk levels to the 3-tiered model described above. The ROC area of the 4-tiered model was 0.616 based on 5 random samples drawn with replacement from the SEER data. The optimized model had a more efficient 3-layered structure without a meaningful loss of ROC area (Table 1, Figure 1).
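The comparison of the original 4-tiered and the fused 3-tiered staging models over random samples drawn with replacement can be sketched as below. This is a hypothetical Python illustration, not SCOPE: the stage labels, the `FUSE` mapping and the helper names are assumptions, while the default of 5 resamples follows the text. Each resample refits the per-stratum death rates (the GLM fit for a single categorical factor) before computing the ROC area, and using the same seed for both models draws identical resamples, so the two areas are compared on the same patients.

```python
import numpy as np


def roc_auc(score, died):
    """ROC area via the Mann-Whitney statistic (ties count 1/2)."""
    score = np.asarray(score, dtype=float)
    died = np.asarray(died, dtype=bool)
    pos, neg = score[died], score[~died]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


def stratum_score(strata, died):
    """Fitted risk = observed ES death proportion in each patient's stratum."""
    rates = {s: died[strata == s].mean() for s in np.unique(strata)}
    return np.array([rates[s] for s in strata])


# Hypothetical label-to-tier mapping for the fused (3-tiered) staging model.
FUSE = {"local": "non-metastatic", "regional": "non-metastatic",
        "distant": "metastatic", "unstaged": "unstaged"}


def bootstrap_auc(strata, died, n_samples=5, seed=0):
    """Mean and S.D. of the ROC area over resamples drawn with replacement."""
    rng = np.random.default_rng(seed)  # same seed -> same resamples for every model
    strata = np.asarray(strata)
    died = np.asarray(died, dtype=float)
    areas = []
    for _ in range(n_samples):
        idx = rng.integers(0, strata.size, strata.size)  # bootstrap indices
        s, d = strata[idx], died[idx]
        areas.append(roc_auc(stratum_score(s, d), d))  # refit, then score
    return float(np.mean(areas)), float(np.std(areas, ddof=1))
```

Given per-patient arrays `stage` and `died`, calling `bootstrap_auc(stage, died)` and `bootstrap_auc([FUSE[s] for s in stage], died)` returns (mean, S.D.) pairs for the two models; with only 5 resamples the S.D. is itself a rough estimate.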

Discussion
This study aimed to construct models that will aid patient and treatment selection for ES patients. To that end, it examined ROC models (Hanley and McNeil, 1982) for a long list of potential explanatory factors (Table 1). ROC analysis takes into account both the sensitivity and the specificity of a prediction. An ideal model would have an ROC area of 1, and a random model is expected to have an area of 0.5 (Hanley and McNeil, 1982). SEER staging was the most predictive of patient outcome (Table 1). After binary fusion, it reduced to a non-metastatic versus metastatic versus un-staged model (Table 1). When there are competing prediction or prognostic models, the most efficient (i.e. the simplest) model is thought to prevail (D'Amico et al., 1998). This principle has an information-theoretic underpinning (D'Amico et al., 1998). In practical terms, simpler models require fewer patients for randomized trials because fewer risk strata need to be balanced, and in the clinic they are easier to use. SCOPE streamlined the ROC models by binary fusion (Table 1): adjacent strata were tested iteratively to see whether they could be combined without sacrificing the predictive power usually attributed to more complex models. This study has shown that SCOPE can build efficient and accurate prediction models.
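The iterative fusion of adjacent strata described above can be sketched as a greedy loop. This is a hypothetical Python illustration, not SCOPE's implementation. It assumes the strata are listed in a meaningful order (for example by stage), scores each candidate merge by the in-sample ROC area of the per-stratum death rates, and accepts the best merge when the area drops by no more than a tolerance `tol`, an assumed parameter. SCOPE's actual acceptance rule is not specified here and may instead rely on the t-tests of resampled ROC areas mentioned in the Methods.

```python
import numpy as np


def roc_auc(score, died):
    """ROC area via the Mann-Whitney statistic (ties count 1/2)."""
    score = np.asarray(score, dtype=float)
    died = np.asarray(died, dtype=bool)
    pos, neg = score[died], score[~died]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


def stratum_score(labels, died):
    """Fitted risk = observed death proportion in each patient's stratum."""
    rates = {s: died[labels == s].mean() for s in np.unique(labels)}
    return np.array([rates[s] for s in labels])


def fuse_adjacent(order, strata, died, tol=0.005):
    """Greedy binary fusion: merge adjacent strata while the ROC area drops by at most `tol`.

    `order` lists the original stratum names in their assumed order; each group
    in the result is a tuple of original stratum names.
    """
    strata = np.asarray(strata)
    died = np.asarray(died, dtype=float)
    groups = [(name,) for name in order]

    def auc_for(grouping):
        label = {name: i for i, g in enumerate(grouping) for name in g}
        labels = np.array([label[s] for s in strata])
        return roc_auc(stratum_score(labels, died), died)

    current = auc_for(groups)
    while len(groups) > 1:
        # Try every merge of two neighbouring groups and keep the best one.
        candidates = []
        for i in range(len(groups) - 1):
            merged = groups[:i] + [groups[i] + groups[i + 1]] + groups[i + 2:]
            candidates.append((auc_for(merged), merged))
        best_auc, best = max(candidates, key=lambda c: c[0])
        if current - best_auc > tol:
            break  # any further fusion would cost too much ROC area
        groups, current = best, best_auc
    return groups, current
```

For example, if local and regional disease have similar death rates and distant disease a much higher one, the loop fuses local and regional and then stops, because merging everything into one stratum would drop the area toward 0.5. Where un-staged patients belong in `order` is itself an assumption the user must make.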
For surgery and radiotherapy, the ROC areas were modest, 0.57 and 0.54, respectively. Low ROC areas imply that the information content (i.e. the predictive accuracy) of these models may be limited. Curiously, radiotherapy was associated with a higher risk of ES mortality (Table 2). Overall, just over 90% of patients completed their staging (Table 1). Figure 1 shows the percentage of ES-specific mortality by stage. Un-staged patients had a mortality rate between those of the early and metastatic stages. Because staging is accurate and aids treatment selection, eliminating the 9.5% un-staged rate may better match patients with appropriate treatments, which in turn may improve treatment outcome.
In conclusion, this study has identified the staging models that are most prognostic of treatment outcome for Ewing sarcoma patients. The high under-staging rates may have prevented patients from selecting definitive local therapy. The low rates of radiotherapy use after surgery may have contributed to the poor outcome in patients with this aggressive disease.

Figure 1. The Mortality Rate of Ewing Sarcoma Patients by Stage of Original and Optimized Models (axis: SEER stage)