Exploring Factors Related to Metastasis Free Survival in Breast Cancer Patients Using Bayesian Cure Models

Cancer is one of the most important life-threatening diseases worldwide and is considered as the first and second leading cause of death in developed and developing countries, respectively (Jemal et al., 2011). Among all cancer sites, breast cancer is the most frequently diagnosed cancer with 23% out of the total new cancer cases and 14% of the total cancer deaths in females in 2008, globally. It is the most common cancer in western countries and the second most common cause of cancer death in US women (Foster et al., 2011). Similar status is present in western Asia; breast cancer comprises 27.2% of all new cancer cases and 19% of all deaths due to cancer in females and it is the first site and cause of death among all types of cancer in 2008 (American Cancer Society, 2011). Breast cancer has increasing trend of incidence especially in countries with a low rate of incidence (Montazeri et al., 2008). Among Iranian women this trend was increasing from 1965 to 2000, changing the


Introduction
Cancer is one of the most important life-threatening diseases worldwide and is considered as the first and second leading cause of death in developed and developing countries, respectively (Jemal et al., 2011). Among all cancer sites, breast cancer is the most frequently diagnosed cancer with 23% out of the total new cancer cases and 14% of the total cancer deaths in females in 2008, globally. It is the most common cancer in western countries and the second most common cause of cancer death in US women (Foster et al., 2011). Similar status is present in western Asia; breast cancer comprises 27.2% of all new cancer cases and 19% of all deaths due to cancer in females and it is the first site and cause of death among all types of cancer in 2008 (American Cancer Society, 2011).
Breast cancer has increasing trend of incidence especially in countries with a low rate of incidence (Montazeri et al., 2008). Among Iranian women this trend was increasing from 1965 to 2000, changing the

Exploring Factors Related to Metastasis Free Survival in Breast Cancer Patients Using Bayesian Cure Models
Tohid Jafari-Koshki 1,2 , Marjan Mansourian 2,3 , Fariborz Mokarian 4 rank of incidence from the second to the first diagnosed cancer during these years, and it is currently the fifth most common cause of cancer death in women (Lamyian et al., 2007;Mousavi et al., 2007;Babu et al., 2011;Noroozi et al., 2011;Taghavi et al., 2012).
Metastatic also known as secondary or advanced breast cancer occurs when breast cancer cells spread from the primary tumor in the breast through the lymphatic or blood system to other parts of the body. Breast cancer spreads to a variety of organs depending on the type of breast cancer (Senkus et al., 2013). These affected sites include bone, lung, liver, brain, skin, and other organs (Weigelt et al., 2005). Due to poor survival and the low quality of life for the patients with secondary breast metastasis, it is of interest to define relevant prognostic factors to help the surgeons and clinicians in disease management. Several studies have used different statistical methods to determine the prognostic factors of secondary breast metastasis. According to the literature, estrogen receptor (ER), progesterone receptor (PR), lymph node involvement ratio (LNR), cathepsin-D (Cath-D), tumor protein 53 (p53) , human epidermal growth factor receptor type 2 (Her2 also known as ERBB2 or HER-2/neu), tumor size, age at diagnosis, chemotherapy, and axillary nodal status at the time of diagnosis of the primary tumor are the most important prognostic factors with different impact on progression and metastases of breast cancer to various organs (Rochefort et al., 1990;Veronesi et al., 1993;Garcia et al., 1996;Inoue et al., 1998;Balleine et al., 1999;Rochefort and Liaudent-coopman, 1999;Selzner et al., 2000;Evans et al., 2004;Colleoni et al., 2005;Truong et al., 2005;Voogd et al., 2005;Weigelt et al., 2005;Palmieria et al., 2006Palmieria et al., , 2007Altundag et al., 2007;Sezgin et al., 2007;Wadasadawala et al., 2007;Gao et al., 2009;Koizumi et al., 2010;Ruiterkamp et al., 2011;Senkus et al., 2013).
Globally, one-third of patients diagnosed with primary breast cancer will continue to develop metastatic breast cancer (Mayer et al., 2010). 3-10% of patients have distant metastases and, in general, there is no cure for these patients and all treatments are palliative in nature (Ruiterkamp et al., 2011). These rates are higher for developing countries where the first diagnosis is at later stages. Half a million deaths each year are attributable to metastatic breast cancer, and the median survival time from diagnosis of secondary disease is approximately 3 years (Johnston, 2010). Time to metastasis from breast to other sites is short with poor prognosis (Eichler et al., 2008;Oltean et al., 2009;Ruiterkamp et al., 2011). Due to high burden of secondary metastases of breast, new prognostic markers are urgently needed to identify patients who are at the highest risk for developing metastases, which might enable oncologists to begin individualized treatments (Weigelt et al., 2005;Foster et al., 2011).
When the outcome of interest is the time to an event, such as time for primary breast cancer to secondary metastasis, survival analysis plays a crucial role in the modeling of the process and finding potential factors important in duration between occurrences of initiating and terminating events. Survival analysis is one of the most appealing techniques used in modeling time to metastasis in breast cancer.
Due to flexibility in allowing for censoring, Cox proportional hazards model and parametric survival models have received a great attention in the study of time to an event such as metastasis in the literature amongst the other methods such as life tables and log-rank test (Veronesi et al., 1993;Andre et al., 2004;Voogd et al., 2005;Eichler et al., 2008;Largillier et al., 2008). These models suppose that all individuals who enter the study are at risk and capable of experiencing the event until they achieve it or are censored. This assumption does not hold for some diseases such as cancer, however. In such cases, it is more rational to consider a situation in which some patients will be cured by initial therapy and treatments and are not any more at risk of events following primary cancer such as recurrence, metastasis, or death due to cancer. In such cases, cure models that allow for this possibility are popular and widely used. Sposto (2002) used cure models to study data from children's cancer group. Cure models have been widely used in the literature. Santen et al. (2008) used Bayesian cure models to overcome the problem of lack of statistical sensitivity to detect actually significant efficacy of antidepressant drugs. Basu and Tiwari used cure models in conjunction with competing risks models to study breast cancer data (Basu and Tiwari, 2010). Rama et al. (2010) compared performance of various cure models in breast cancer data.
Studies with a long term of follow up and a plateau in their Kaplan-Meier (K-M) curves are suitable for cure rate models, as this long tail indicates that a proportion of population has a long survivorship that may correspond to cured individuals. This situation is common in oncology studies and Othus et al. (2012) discuss the preference of cure models over Cox model for analyzing such data (Yin and Ibrahim, 2005;Othus et al., 2012b).
No one can be cured of death, therefore term "long survivors" is sometimes used instead of "cured patients" as this long survivorship causes a K-M curve with a plateau tail (Othus et al., 2012a).
Since metastatic breast cancer is considered to be incurable, exploring the factors associated with the long term from primary to metastatic breast cancer would be of great value for clinicians and oncology researchers. Most breast cancer studies have focused on the prognostic factors of survival from diagnosis or metastasis time to death (Yamasaki et al., 1992;Selzner et al., 2000;Nieder et al., 2008;Morris et al., 2012;Minisini et al., 2013). Risk factors of metastasis are routinely determined by using log-rank test or stratified analysis and survival analysis is used for modeling the overall survivorship. However, studies that approach the problem in a survival analytic framework are not frequent.
To overcome the drawbacks of Cox and parametric survival models, a plausible situation in present study, and to settle issues related to small sample size relative to complexity of the model, we used non-mixture cure models via Bayesian approach to assess the prognostic factors of metastasis free survival of primary breast cancer patients registered in a 12-year cohort.

Patients
In this study we used a 12-year cohort registry database on 1085 women diagnosed with breast cancer in Isfahan Sayed-o-Shohada research center. History of therapy and clinical conditions of patients was recorded until death or lost to follow up. Demographic variables and tumor characteristics have been recorded by interview and reported pathology results. Time to metastasis was based on the physicians' opinion. From potential risk factors of metastasis, we used information available on variables: age at diagnosis of breast cancer, primary tumor size, lymph node involvement ratio (LNR) defined as ratio of positive to dissected lymph nodes, and binary covariates of being estrogen receptor positive (ER+), being progesterone receptor positive (PR+), being epidermal growth factor receptor-2 positive (Her2+), and being cathepsin-D positive (Cath-D+) patient. 88 subjects were excluded from analysis because of

Methods
Mixture and non-mixture models are two major approaches used in cure rate models. Mixture cure models, so-called standard cure rate models, treat the population of patients as a mixture of cured and non-cured groups. Thus the model is defined in two stages. The first models the probability of belonging to each group and the second models the survival in uncured patients: pop is the survivor function for entire population and S* is the survivor function for uncured group and p is the cured fraction that could be linked to covariates, c. Here, various links are possible for p such as popular link logit and S* can take a variety of survival distribution. Even though this model is intuitively appealing and widely used, violation of desirable structure of proportional hazards and improper posteriors are major drawbacks of these models (Ibrahim et al., 2001). Chen et al. (1999) proposed non-mixture models that approach the problem in a very different way and have a clinical justification. This model, also called parametric cure rate models, assumes a number of metastasiscomponent tumor cells for each cancerous patient. After initial treatments, these cells are completely deleted in cured patients. However, some of these cells are left in the body of the other patients and by passing the time cause the metastasis in these patients of uncured group (Ibrahim et al., 2001). It is obvious that metastasis free survival is affected by both the number of left over cells and their activity. In this setting, metastasis free survival of a patient is determined in terms of the time to detecting the first sign of metastasis produced by metastasis-component tumor cells. The number of cells, N, is assumed to follow a Poisson distribution with mean q. Conditional on N, by assuming a distribution F for latent variable z of time to promotion of a metastasis-component cell, survival function can be written as

pop (t)= e -qF(t) .
Various forms of F can be used to model the time from entering study to detecting metastasis.
It can be shown that following relationship exists between S (1) pop and S (2) pop .

S (2) pop (t)= e -q (1-e -q )S (1) pop (t).
Hence, non-mixture model has a form of mixture ones. There is a major difference between these approaches, however. When covariates c are introduced to models via q, proportionality holds for non-mixture model of S (2) pop only (Ibrahim et al., 2001). Unlike non-mixture models, odds ratio (OR), rather than hazard ratio (HR), is used in mixture models to assess the impact of a covariate on cure fractions. In both cases, impact of a covariate on survival is assessed in HR scale (Othus et al., 2012a). The cured fraction of the population is obtained by e -q , where q is regressed on covariates and hence reflects their overall estimated impact on this fraction.
There is another possibility in non-mixture models where mean of latent Poisson process q is regressed on covariates of each individual via a link such as q=exp(c'b). In this framework, there is an opportunity to assess two different types of covariate impact on survival and being cured (Ibrahim et al., 2001). In this format, a covariate can be assessed in terms of being effective or not both in survival and being cured. However, a statistical issue of identifiability may be questionable in these models as susceptibility responses of metastatic-component cell counts are unobservable themselves (Congdon, 2010).
In this study, we used both standard parametric survival models and non-mixture cure rate model with two widely used and flexible distributions of log-logistic (k, l) and Weibull (k, l) in a Bayesian setting to model the time from diagnosis of breast cancer to onset of metastasis symptoms. In both cases, l is regressed on covariates as l=exp(c'b) wher e positive values of each b indicates larger hazad rates and adverse effect on survival for larger values of corresponding covariate. k is the shape parameter and not considered to be dependent on covariates. Credible Interval (CrI) is a Bayesian equivalent to ordinary confidence interval and was used for hypothesis tests about parameters.
Also we compared the results by alternative frailty models (Congdon, 2010). These models allow for unobservable or unmeasured individual specific characteristics that may influence the survival function and cure fraction. The results were compared using deviance information criterion (DIC), a smaller value of which indicates better fit and performance.
Before running the models, we scaled the metastasis free survival time to be in years. Also we standardized tumor size and age at diagnosis to avoid convergence problems. We let N(0, 100) priors for regression coefficients b and Gamma (1, 0.01) for priors of both k and q. All Bayesian models were implemented in open-access software OpenBUGS 3.2.2 (Lunn et al., 2009). DIC, descriptive statistics, and test results were obtained using free software R available on CRAN at www.r-project.org. Test results were considered to be significant at 5% level.

Results
Out of 245 patients, 62 (25.3%) cases have experienced metastasis during 12 years of follow up and the remaining was censored. The mean age at diagnosis was  Based on chi-square test results, proportions of ER+, Her2+, and Cat-D+ patients were not different in metastasis and metastasis free patients (pvalue>0.05). But there was a mild relationship between PR+ and metastasis (pvalue=0.055). Also, the age mean was not different for two groups.
Kaplan-Meier survival curve levels off within the study period and has plateau at its tail as shown in figure  1 and justifies the relevance of cure models. This figure suggests that the period of follow up is adequate to apply cure models (Yu et al., 2013). Table 1 shows the posterior estimates and related 95% CrI for standard log-logistic and Weibull models.
Since credible intervals for LNR and PR+ do not include zero, it can be inferred that these covariates are significant in both models. The other covariates do not have significant effects on metastasis free survival. DIC is smaller for log-logistic model that indicates it has fit the data better than Weibull. Even though the coefficients of both LNR and PR+ for this model are greater than that of Weibull, no changes are present in the significance of the covariates.
LNR has direct and PR+ has reverse effect on hazard rate. That is lymph node involvement and being PR+ breast cancer have adverse and beneficial effects on long term of metastasis free survival, respectively. Table 2 summarizes the results for cure models. Both models again suggest adverse effect of lymph node involvement on metastasis free survival. Effect of being PR+ is near significant. Posterior probability (PP), a p-value counterpart in Bayesian analysis, was 0.97 for this coefficient. PP values greater than 0.90 is considered to be significant and it can be inferred that PR+ status is positively related to longer metastasis free survival.
Cured fraction in current study implies a proportion of patients who have escaped from metastasis of breast cancer to other their organs and are expected to have a longer life and of higher quality. This fraction is estimated to be exp (-0.72)=0.48 and exp (-0.70)=0.49 for loglogistic and Weibull models, respectively.
Cure model with Weibull distribution has the smallest DIC and fits the data best. All CrI's are narrower for this model than those of log-logistic. However, this model has no additional significant covariate.
We repeated the aforementioned non-mixture models and this time let the parameter be regressed on the covariates. This will enable us to reach a proportional hazard structure and explain the impact of the covariates in terms of hazard ratios. But this model did not converge due to small sample size relative to complexity of the model. Also we tried both models in the frailty framework and let to depend on unobserved individual specific covariates by adding a normally distributed N(0s 2 ) noise term to c b. The result is shown in table 3.
The results are essentially same as previous models. Small estimated value of s 2 suggests that adding the noise term was not successful in improving the model and no significant changes were present.

Discussion
Our results are in accord with previous studies. Balleine     (1999) reported that the absence of PR expression in primary breast cancer is associated with disease progression and metastasis to other sites. Koscielny et al. (1984) found a direct and indirect relationship between lymph node involvement and metastatic dissemination probability of breast cancer via the primary tumor size. Carter et al. reported that lymph node status serves as an indicator of the tumor's ability to spread (Carter et al., 1989). There is a common belief that lymph node involvement is the best marker to recognize metastatic potential of breast cancer as the 5-year survival rate decreases by approximately 40% in the presence of lymph node involvement (van der Wal et al., 2002;Weigelt et al., 2005). The cut off value of 25% for this factor has been reported to be a useful in determining high risk groups of patients (Truong et al., 2005). The results provide a guide for clinicians in the prognosis and assessing progression of the disease. The larger the proportion of dissected lymph nodes is involved, the shorter metastasis free survival is expected. Also a patient of PR+ at the time of diagnosis is expected to have a longer time to experience a metastasis and therefore is expected to have a higher chance of longer and healthier life. Cure models are extensively used in survival analysis and various models and approaches have been applied in this framework. It has been shown that performance these models is as well as Cox and parametric survival models and is better under certain conditions (Sposto, 2002;Yu et al., 2013). These models are appropriate for studies with adequate follow up term and especially in cases that proportional hazards (PH) assumption is not met. Even being sound, these models are not recommended when survival is relatively poor or relatively good (Yu et al., 2013). In these cases, assumed survival distributions such as Weibull fail to capture survival shape appropriately and fit poorly or do not converge.
K-M curve levels off approximately before 75% of the study period is passed. Hence adequacy of follow up term necessary for metastasis is settled. Supremum tests revealed that proportional hazards assumption does not hold for lymph node involvement ratio (p-values=0.03). Also, metastasis free survival of patients in the study was not shifted to any side. These considerations make cure models suitable for the analyzing these data.
Large sample size is a desirable property for cure models and can reduce the problems related to convergence and fit of the model. Despite rather small sample size, current study approaches analysis of metastasis free survival from a more realistic perspective and conducting studies in larger populations as well as exploring prognostic value of covariates in metastasis to each site separately is recommended.