Survival Prognostic Factors of Male Breast Cancer in Southern Iran: a LASSO-Cox Regression Approach.
Hadi Raeisi Shahraki 1 , Alireza Salehi 2 , Najaf Zare 3 *

We used the LASSO-Cox method to determine prognostic factors of male breast cancer survival and showed its superiority over the Cox proportional hazards model in a small-sample setting. The LASSO-Cox method was used to identify the factors most strongly associated with survival time in male breast cancer and to estimate their relative hazards precisely. Our data comprise male breast cancer patients in Fars province, southern Iran, from 1989 to 2008. Cox proportional hazards and LASSO-Cox models were fitted for 20 classified variables. To reduce the impact of missing data, multiple imputation was performed 20 times using the Markov chain Monte Carlo method, and the results were combined with Rubin's rules. In 50 patients, the mean age at diagnosis was 59.6 (SD=12.8) years, with a minimum of 34 and a maximum of 84 years, and the mean survival time was 62 months. Three-, 5- and 10-year survival rates were 92%, 77% and 26%, respectively. The LASSO-Cox method eliminated 8 low-effect variables and decreased the standard errors by a factor of 2.5 to 7. The relative efficiency of the LASSO-Cox method compared with the Cox proportional hazards method was calculated as 22.39. The 19-year follow-up of male breast cancer patients shows that age, a history of alcohol use, nipple discharge, laterality, histological grade and duration of symptoms were the most important variables affecting patient survival. In such situations, estimating the coefficients with the LASSO-Cox method is more efficient than with the Cox proportional hazards method.


Introduction
Male breast cancer (MBC) is a rare disease with an incidence rate of less than 1% of that of female breast cancer (FBC). In a large population-based study in northern European countries and Singapore, the world standardized incidence rates of breast cancer were 66.7 per $10^5$ person-years in women and 0.40 per $10^5$ person-years in men. Women were diagnosed about 8 years earlier than men (Miao et al., 2011). In Iran, the incidence rate of FBC was about 148 per $10^5$ (Zare et al., 2013) and the mean age was about 48 years (Zare et al., 2012, 2013), while in men the mean age was about 60 years (Salehi et al., 2011). Some demographic and clinico-pathologic prognostic factors such as age at diagnosis, tumor grade and lymph node status have been shown to be associated with overall survival (Miao et al., 2011; Salehi et al., 2011; Zare et al., 2012, 2013; Soliman et al., 2014).
Because of the rarity of this disease, the management of MBC patients is generalized from FBC, although evidence-based data to support this female-to-male extrapolation are lacking. The small sample sizes of existing MBC studies (Egypt, Turkey, Iran), a consequence of the scarcity of male breast cancer, combined with the large number of independent variables that can potentially influence patient survival, have been challenging and have resulted in imprecise findings (Gui and Li, 2005).
In studies where the dependence of survival time on the independent variables is of interest, the Cox proportional hazards model is commonly used. However, when the number of independent variables is high and the sample size is low, the analysis of survival data faces a serious challenge. Problems such as multicollinearity, reduced precision of the estimates, lack of a sparse model, and non-interpretability of the coefficients make the Cox proportional hazards method inefficient and invalid for such data (Cox, 1972; Tibshirani, 1997). Even the numerous techniques of variable selection in regression which have also been generalized in Cox method aiming to keep all the variables in the model become inefficient (Gui and Li, 2005).
In 1997, Tibshirani introduced the LASSO (Least Absolute Shrinkage and Selection Operator) method for the Cox proportional hazards model, which selects the most important variables by adding a penalty function to the partial likelihood estimation. By placing a constraint on the absolute values of the regression coefficients, this penalty shrinks many of the coefficients and sets some of them exactly to zero. By omitting redundant variables and introducing a small bias into the model if necessary, the LASSO-Cox method controls multicollinearity and is applicable even when the number of variables exceeds the sample size (Tibshirani, 1996; Tibshirani, 1997).
In this research, we aimed to identify the most likely prognostic factors for survival in male breast cancer using the LASSO-Cox method.

Materials and Methods
In this study, the data were obtained from the cancer registry of the Vice-Chancellor for Health Affairs of Shiraz University of Medical Sciences and from Shiraz hospitals between January 1, 1989 and January 1, 2008. During the study period, 63 histologically proven MBCs were identified. The available potential prognostic factors were as follows: age at diagnosis, residence, history of alcohol use, nipple discharge, nipple ulceration, nipple retraction, skin fixation, skin redness, laterality, location of tumor, tumor size, axillary lymph node involvement, chest wall invasion, duration of symptoms, staging, and grading.
Information about the patients' survival was obtained from the Death Registry of the Vice-Chancellor for Health Affairs of Shiraz University of Medical Sciences and was completed through telephone contacts. After excluding the individuals whose survival time had not been recorded, the number of patients was reduced from 63 to 50. All variables were categorized appropriately, and dummy variables were used to code the data as zero and one.
A multiple Cox proportional hazards model was used to develop a predictive model of overall and disease-free survival, based on demographic and clinical covariates. The model has the form $h(t \mid x) = h_0(t)\exp\{\beta^T X\}$, where $X = (X_1, X_2, \ldots, X_p)$ are the covariates, $h(t \mid x)$ is the hazard at time $t$, $h_0(t)$ is the unspecified baseline hazard function, and $\beta = (\beta_1, \ldots, \beta_p)$ is the vector of regression coefficients (Cox, 1972).
Due to the noted shortcomings of the Cox proportional hazards model and the high correlation between some covariates, variable selection was done by incorporating a LASSO (least absolute shrinkage and selection operator), or L1, penalty on the regression coefficients $\beta_1, \ldots, \beta_p$. The LASSO penalizes the size of the parameter vector $\beta$, so that unimportant variables (variables whose $\beta$ coefficients are close to zero) are removed from the model. This results in a penalized log partial likelihood function of the form $l(\beta) - \lambda \sum_{j=1}^{p} |\beta_j|$, where $l(\beta)$ denotes the standard Cox log partial likelihood. The penalized maximum likelihood estimates $\hat{\beta}$ are those which maximize this penalized likelihood. The parameter $\lambda$ is the shrinkage parameter and determines the extent of variable selection, with larger values corresponding to a larger penalty and a greater number of variables removed. The optimal value for $\lambda$ was determined using 5-fold cross-validation.
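The Cox proportional hazards model is expressed as $h(t \mid x) = h_0(t)\exp\{\beta^T X\}$, where $h_0(t)$ is a baseline hazard function, $X = (X_1, X_2, \ldots, X_p)$ is the vector of independent variables, and $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$ is the vector of regression coefficients. The Cox partial likelihood function is defined as follows:

$$PL(\beta) = \prod_{i \in D} \frac{\exp(\beta^T X_i)}{\sum_{j \in R_i} \exp(\beta^T X_j)},$$

where $D$ is the set of indices of the observed events and $R_i$ is the set of individuals at risk at the event time of individual $i$. The estimated coefficients of Cox's method are the values that maximize the above function (Cox, 1972).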
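As an illustration of the partial likelihood above, the following minimal sketch evaluates its logarithm with NumPy. The function name and argument layout are hypothetical choices made for this example, tied event times receive no special treatment, and the code is not taken from the software used in the study.

```python
import numpy as np

def cox_log_partial_likelihood(beta, X, time, event):
    """Log of the Cox partial likelihood (no special handling of tied times).

    beta  : (p,) regression coefficients
    X     : (n, p) covariate matrix
    time  : (n,) follow-up times
    event : (n,) 1 if death was observed, 0 if censored
    """
    beta = np.asarray(beta, dtype=float)
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)

    eta = X @ beta                        # linear predictors beta^T X_i
    total = 0.0
    for i in np.flatnonzero(event):       # i runs over the observed events
        at_risk = time >= time[i]         # individuals still at risk at time[i]
        total += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return total
```

With all coefficients equal to zero, every individual in a risk set has the same weight, so each event contributes minus the logarithm of the size of its risk set.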
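LASSO, which was first developed for linear regression and then generalized to logistic and Cox regression, estimates the coefficients and selects the variables simultaneously by using a penalty function. The LASSO-Cox regression coefficients are obtained by solving

$$\hat{\beta} = \arg\max_{\beta} \log PL(\beta) \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le t,$$

where $PL(\beta)$ is the Cox partial likelihood function and $t$ is a positive constant (Tibshirani, 1996; Tibshirani, 1997).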
DOI:http://dx.doi.org/10.7314/APJCP.2015.16.15.6773
Viewed from a Bayesian perspective, variable selection by the LASSO method in effect uses the double exponential (Laplace) distribution as the prior distribution of the regression coefficients:

$$f(\beta_j) = \frac{\lambda}{2}\exp(-\lambda|\beta_j|).$$

Compared with the normal distribution, this prior places more probability near zero and in the tails, and instead decreases the probabilities of the middle points (Tibshirani, 1996).
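The link between the Laplace prior and the L1 penalty can be made explicit with a short derivation. It is a sketch under two assumptions that the text does not spell out: the coefficients have independent Laplace priors with a common rate $\lambda$, and the partial likelihood $PL(\beta)$ stands in for a full likelihood.

```latex
% Assumption: independent Laplace priors with a common rate \lambda,
% and the partial likelihood PL(\beta) used in place of a full likelihood.
\begin{align*}
f(\beta_j) &= \frac{\lambda}{2}\exp\left(-\lambda|\beta_j|\right), \qquad j = 1,\dots,p \\
\log\Big[PL(\beta)\prod_{j=1}^{p} f(\beta_j)\Big]
  &= \log PL(\beta) - \lambda\sum_{j=1}^{p}|\beta_j| + p\log\frac{\lambda}{2} \\
\arg\max_{\beta}\;\log\Big[PL(\beta)\prod_{j=1}^{p} f(\beta_j)\Big]
  &= \arg\max_{\beta}\Big\{\log PL(\beta) - \lambda\sum_{j=1}^{p}|\beta_j|\Big\}
\end{align*}
```

The term $p\log(\lambda/2)$ does not depend on $\beta$, so the posterior mode under this prior coincides with the L1-penalized partial likelihood estimate.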
The LASSO-Cox coefficients can be calculated equivalently from the following equation:

$$\hat{\beta} = \arg\max_{\beta} \Big\{ \log PL(\beta) - \lambda \sum_{j=1}^{p} |\beta_j| \Big\},$$

where $PL(\beta)$ is the Cox partial likelihood function and $\lambda$ is a positive constant called the tuning parameter.
Choosing the optimal $\lambda$ is very important because the larger this value is, the more coefficients become zero, and the sparser and more interpretable the model becomes.
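To make the role of the tuning parameter concrete, the sketch below maximizes the penalized log partial likelihood by proximal-gradient ascent with soft-thresholding. This is one standard way to solve an L1-penalized problem and is not necessarily the algorithm used in the study. All function names are hypothetical, the data are simulated, and tied event times are not specially handled.

```python
import numpy as np

def log_pl_and_grad(beta, X, time, event):
    """Cox log partial likelihood and its gradient (ties not specially handled)."""
    eta = X @ beta
    shift = eta.max()                      # guards against overflow in exp
    w = np.exp(eta - shift)
    value, grad = 0.0, np.zeros_like(beta)
    for i in np.flatnonzero(event):        # loop over observed events
        risk = time >= time[i]             # individuals still at risk at time[i]
        denom = w[risk].sum()
        value += eta[i] - shift - np.log(denom)
        grad += X[i] - (w[risk] @ X[risk]) / denom
    return value, grad

def soft_threshold(z, t):
    """Shrink each entry of z toward zero by t; entries within [-t, t] become 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def penalized_log_pl(beta, X, time, event, lam):
    """Objective being maximized: log PL(beta) - lam * sum_j |beta_j|."""
    value, _ = log_pl_and_grad(beta, X, time, event)
    return value - lam * np.sum(np.abs(beta))

def lasso_cox(X, time, event, lam, n_iter=500, tol=1e-8):
    """Proximal-gradient ascent with backtracking, started from beta = 0."""
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        value, grad = log_pl_and_grad(beta, X, time, event)
        step = 1.0
        while True:                        # halve the step until it is safe
            new = soft_threshold(beta + step * grad, step * lam)
            diff = new - beta
            new_value, _ = log_pl_and_grad(new, X, time, event)
            if new_value >= value + grad @ diff - (diff @ diff) / (2 * step):
                break
            step *= 0.5
        beta = new
        if np.max(np.abs(diff)) < tol:
            break
    return beta

# Simulated data (not the study data): 50 subjects, 6 covariates, 2 true effects.
rng = np.random.default_rng(0)
n, p = 50, 6
X = rng.normal(size=(n, p))
true_beta = np.array([1.0, -1.0, 0.0, 0.0, 0.0, 0.0])
time = rng.exponential(1.0 / np.exp(X @ true_beta))
event = rng.random(n) < 0.7                # roughly 30% censored at random

for lam in (0.5, 2.0, 8.0):
    fit = lasso_cox(X, time, event, lam)
    print(lam, np.count_nonzero(fit), np.round(fit, 2))
```

The loop at the end prints, for each value of the tuning parameter, how many coefficients remain non-zero. A sufficiently large value sets every coefficient to zero.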
To obtain the optimal value, a variety of cross-validation methods such as 5-fold, 10-fold and generalized cross-validation can be used. In K-fold cross-validation, the data are divided into K equal subsets. Each time, one of these subsets is treated as the validation data, and the error is calculated on it using the estimates obtained from the other subsets (training data). This is repeated until each subset has been used exactly once as the validation data. The mean error over all repetitions is denoted c(λ), and the value of λ for which c(λ) is minimal is chosen as the optimal tuning parameter (Hastie et al., 2009; Goeman, 2010). In generalized cross-validation, the value of λ that minimizes the generalized cross-validation criterion, which involves $H = (X^T X)^{-1} X^T Y$, is chosen as the optimal λ (Craven and Wahba, 1978).
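The K-fold procedure described above can be sketched generically as follows. The callables `fit` and `error` are hypothetical placeholders: in a LASSO-Cox analysis they would fit the penalized model on the training indices and score it on the validation indices, for example by a cross-validated partial likelihood. To keep the example self-contained, a toy soft-thresholded mean with squared error is swapped in for the Cox model.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Randomly split the indices 0..n-1 into k (nearly) equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def select_lambda(lambdas, n, k, fit, error, seed=0):
    """Return the lambda with the smallest mean validation error c(lambda)."""
    folds = kfold_indices(n, k, seed)
    c = []
    for lam in lambdas:
        fold_errors = []
        for v in range(k):                 # each fold is the validation set once
            train = np.concatenate([folds[j] for j in range(k) if j != v])
            model = fit(train, lam)
            fold_errors.append(error(model, folds[v]))
        c.append(np.mean(fold_errors))
    c = np.asarray(c)
    return lambdas[int(np.argmin(c))], c

# Toy stand-in for the penalized model: a soft-thresholded sample mean.
rng = np.random.default_rng(1)
y = rng.normal(loc=0.5, scale=1.0, size=50)

def fit_shrunken_mean(train, lam):
    m = y[train].mean()
    return np.sign(m) * max(abs(m) - lam, 0.0)

def squared_error(model, valid):
    return float(np.mean((y[valid] - model) ** 2))

grid = [0.0, 0.1, 0.2, 0.5, 1.0]
best, c = select_lambda(grid, len(y), 5, fit_shrunken_mean, squared_error)
print(best, np.round(c, 3))
```

Using the same folds for every candidate value keeps the comparison between tuning parameters fair, since each c(λ) is computed on identical validation sets.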
In this research, λ values for each data set were obtained separately through the generalized cross-validation method. Because some variables had missing information, multiple imputation was performed 20 times using the Markov chain Monte Carlo method, and suitable values were imputed. The advantage of this method is that it allows researchers to use most of the existing information without compromising the validity of the results (Goeman, 2010).
For each of the 20 resulting data sets, Cox proportional hazards and LASSO-Cox models were fitted independently and the coefficients were reported. Standard errors of the LASSO-Cox coefficients were obtained through the bootstrap method with 1000 replications.
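The bootstrap step can be sketched as below. The `estimator` callable is a hypothetical placeholder that receives resampled row indices and returns the estimate of interest. In the setting described here it would refit the LASSO-Cox model on the resampled patients, while the toy example uses a sample mean so that the result can be compared with the textbook standard error.

```python
import numpy as np

def bootstrap_se(n, estimator, n_boot=1000, seed=0):
    """Standard deviation of an estimator over n_boot bootstrap resamples."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample n rows with replacement
        estimates.append(estimator(idx))
    return np.std(np.asarray(estimates, dtype=float), axis=0, ddof=1)

# Toy check with simulated data: the bootstrap standard error of a mean
# should be close to the sample standard deviation divided by sqrt(n).
rng = np.random.default_rng(2)
y = rng.normal(size=200)
se = bootstrap_se(len(y), lambda idx: y[idx].mean())
print(se, y.std(ddof=1) / np.sqrt(len(y)))
```

If the estimator returns a vector of coefficients, the function returns one standard error per coefficient because the standard deviation is taken along the resampling axis.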
For variables whose coefficients were zero in 50% or more of the imputed data sets, a zero coefficient was reported; for the other variables, the mean of the non-zero coefficients across the 20 data sets was reported as the coefficient of that variable. For the remaining variables, Rubin's formula was used to report the standard error of the non-zero regression coefficients as follows:

$$SE = \sqrt{U + \left(1 + \frac{1}{k}\right) B}, \qquad B = \frac{1}{k-1}\sum_{i=1}^{k} (Q_i - \bar{Q})^2.$$

In the above formula, $Q_i$, $U$, $B$ and $k$ are the $i$th regression coefficient, the mean variance of the non-zero coefficients, the sample variance of the non-zero coefficients and the number of non-zero regression coefficients, respectively, and $\bar{Q}$ is the mean of the $Q_i$ (Rubin, 1977; Rubin, 2009; Goeman, 2010).
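The pooling rule described above can be sketched as follows, assuming the standard form of Rubin's rules for the total variance. The function names are hypothetical, and returning a missing standard error for a coefficient reported as zero is an assumption of this example, since the text does not say what was reported in that case.

```python
import numpy as np

def pool_rubin(estimates, std_errors):
    """Pool k estimates and their standard errors with Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(std_errors, dtype=float) ** 2   # within-imputation variances
    k = q.size
    b = q.var(ddof=1) if k > 1 else 0.0            # between-imputation variance B
    total = u.mean() + (1.0 + 1.0 / k) * b         # U + (1 + 1/k) B
    return float(q.mean()), float(np.sqrt(total))

def pool_lasso_coefficient(coefs, std_errors):
    """Report zero if the coefficient is zero in at least half of the data sets,
    otherwise pool the non-zero fits only."""
    coefs = np.asarray(coefs, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    nonzero = coefs != 0
    if np.mean(~nonzero) >= 0.5:
        return 0.0, float("nan")                   # assumption: no SE for a zero
    return pool_rubin(coefs[nonzero], std_errors[nonzero])

# Made-up numbers for one variable across five imputed data sets.
print(pool_lasso_coefficient([0.0, 0.4, 0.6, 0.5, 0.45], [0.0, 0.1, 0.1, 0.1, 0.1]))
```

Because only the non-zero fits are pooled, $k$ here is the number of data sets in which the variable was retained, not the total number of imputations.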
The relative efficiency of the LASSO-Cox method versus the Cox proportional hazards method was calculated using the formula proposed by Casella and Berger (Casella and Berger, 1990).

Results
Of the 63 patients, 38 were alive during the study, 18 died, and for 7 the death history was missing. All available data were used in the descriptive reports, but for analytical purposes we used the complete or imputed information of 50 patients.
In 50 patients, the mean age at diagnosis was 59.6 (SD=12.8) years, with a minimum of 34 and a maximum of 84 years, and the mean survival time was 62 months. Eighteen patients (36%) had died and the others (64%) were censored. Three-, 5- and 10-year survival rates were 92%, 77% and 26%, respectively (Figure 1).
The negative log-likelihood reached its minimum of 140.78 at log λ = -4.02, giving an optimal λ of 0.018. The same procedure was applied to the other data sets. The variables under study, their classification, and the results of fitting the two models are shown in Table 1. The results indicate that the LASSO-Cox method set the coefficients of 8 low-effect variables to zero, while Cox's method retained all variables in the model. In addition, the standard errors of the regression coefficients in the Cox proportional hazards method are 2.5 to 7 times as high as the corresponding ones in the LASSO-Cox method (Table 1).
For an overall comparison of the two methods, the relative efficiency of the LASSO-Cox method compared with the Cox proportional hazards method was calculated as 22.39, indicating that the LASSO-Cox method is about 22 times more efficient.

Discussion
The 19-year follow-up of male breast cancer patients in Fars province showed that age, a history of alcohol use, nipple discharge, laterality, histological grade and duration of symptoms were the most important variables affecting patient survival. By omitting the low-effect variables, the LASSO-Cox method estimated the effects of the most important variables much more precisely and efficiently than the Cox proportional hazards method.
Large standard errors in the Cox proportional hazards method can indicate multicollinearity among the variables, which causes instability and high variability of the regression coefficients from one data set to another. In such circumstances, even the regression coefficients produced by the Cox proportional hazards method will not be reliable.
The coefficients obtained from the LASSO-Cox method indicate that increases in grade and tumor size are associated with decreased survival time; these results are in line with those of most studies in this area, although they disagree with a few others (Kuroi and Toi, 2003; Fentiman et al., 2006; Salehi et al., 2011).
In this study, nipple discharge and alcohol consumption emerged as two factors associated with increased patient survival. Because few studies have examined the effects of these two variables on survival time, firm conclusions are difficult to draw.
A limitation of this study is that variables such as marital status, metastasis and the type of treatment received were not considered. Our study is the first in the Islamic Republic of Iran to measure the simultaneous effect of several variables on the survival of cancer patients. While this study used all the information obtained from Shiraz hospitals, the center of southern Iran, multicenter studies in different settings with larger sample sizes seem necessary.