A Breast Cancer Nomogram for Prediction of Non-Sentinel Node Metastasis-Validation of Fourteen Existing Models

Axillary lymph node dissection (ALND), which has long been used to evaluate the status of the axillary nodes, is known to be associated with a high morbidity rate (Kell et al., 2010). Sentinel lymph node biopsy (SLNB) has successfully replaced ALND for evaluation of the axilla in breast cancer patients with clinically negative axilla (Kell et al., 2010). However, ALND performed in SLNB-positive patients demonstrated the absence of metastasis in 22-71% of the non-SLNs (Zhu et al., 2013). To avoid performing ALND for non-SLN-negative patients with SLN-positive axilla, nomograms and scoring systems for predicting the status of the axillary non-SLNs have been developed in many centers (Van Zee et al., 2003; Degnim et al., 2005; Smidt et al., 2005; Chapgar et al., 2006; Pal et al., 2008; Alran et al., 2007; Kohrt et al., 2008; Cho et al., 2008; Coufal et al., 2009; Coutant et al., 2009; Gur et al., 2010; Perhavec et al., 2010; Lombardi et al., 2011; Mittendorf et al., 2012; Meretoja et al., 2012; Derici et al., 2012). Many studies have tested these models and have obtained varying results (Van Zee et al., 2003; Kocsis et al., 2004; Degnim et al., 2005; Soni et al.,


Introduction
Axillary lymph node dissection (ALND), which has long been used to evaluate the status of the axillary nodes, is known to be associated with a high morbidity rate (Kell et al., 2010). Sentinel lymph node biopsy (SLNB) has successfully replaced ALND for evaluation of the axilla in breast cancer patients with clinically negative axilla (Kell et al., 2010). However, ALND performed in SLNB-positive patients demonstrated the absence of metastasis in 22-71% of the non-SLNs (Zhu et al., 2013).
The characteristics of breast cancer patients vary across centers (Pal et al., 2008; Coutant et al., 2009). Therefore, before employing a given model, each center should validate the nomograms or scoring systems and use the most suitable model or, if possible, create its own nomogram. However, the ACOSOG Z0011 study showed that among patients who underwent breast-conserving therapy, locoregional recurrence and survival with and without completion ALND were not significantly different for clinical T1-2 tumors with 1-2 positive SLNs without extranodal extension (Giuliano et al., 2010). For this subgroup of patients, the use of nomograms could be unnecessary, but for patients beyond the scope of the ACOSOG Z0011 study and for patients who are candidates for mastectomy, it may still be important to predict the status of the non-SLNs. To the best of our knowledge, the present study is also the first to validate the Rome and European models other than by using the datasets reported in the original studies (Lombardi et al., 2011; Meretoja et al., 2012).
Our aims were to investigate the factors that predict non-SLN metastasis, to create a nomogram, and to validate the 14 existing models in our patient group.

Materials and Methods
Two hundred and thirty-seven invasive breast cancer patients with positive SLNB who underwent ALND between 2003 and 2012 in the Department of General Surgery at Ondokuz Mayıs University School of Medicine were included in the study. These subjects were selected from 739 invasive breast cancer patients with T1-3 tumors and clinically negative axilla who had not received neoadjuvant chemotherapy and who underwent SLNB and breast-conserving surgery or mastectomy. The study design was approved by the Ondokuz Mayıs University Medical Research and Ethical Board.
Sentinel lymph node biopsy was performed using 5 mL of a sterile injectable 1% isosulphan blue solution. Patients with SLN metastases in frozen sections underwent immediate ALND, and patients found to have SLN metastasis by routine or serial section H&E later underwent a second surgery for ALND. All sentinel lymph nodes were sent for frozen section analysis. If the SLN was ≤1 cm, it was bisected parallel to the long axis. An imprint was taken from the two cut surfaces, which then underwent frozen sectioning. The frozen sections and imprint preparations were stained with H&E and analyzed under a microscope. If the SLN was >1 cm, it was cut into slices perpendicular to the long axis at 3 mm intervals. All cut surfaces underwent imprint and frozen section analyses. If the SLN contained apparent metastases at the macroscopic evaluation, only the imprint analysis was performed. After the frozen section analysis, the remaining frozen tissue was fixed in formalin and embedded in paraffin for routine pathological examination. The non-SLNs obtained from ALND that were ≤1 cm were bisected parallel to the long axis, and those that were >1 cm were cut into slices perpendicular to the long axis at 3 mm intervals. The evaluation of non-SLNs was usually performed by H&E only; serial sectioning and immunohistochemistry were not routinely performed.
The factors that were found to be associated with non-SLN metastasis were entered into the logistic regression analysis. Independent predictive factors and predictive probabilities of non-SLN metastasis were determined by backward logistic regression analysis. Receiver operating characteristic (ROC) curves were generated for the calculations of the area under the curve (AUC). We used a logistic regression modeling approach to describe the relationship between several predictive factors (pT, LVI, ENE, negSLN, SLNMS, and multifocality) and the expected (E) value of the dichotomous dependent variable Y (non-SLN metastasis). The general form of the logistic model describing the probability of occurrence of the outcome Y is as follows (Kleinbaum et al., 1998):

$$E(Y) = P(Y=1) = \frac{1}{1+\exp\left[-\left(\beta_0+\sum_{j=1}^{k}\beta_j X_j\right)\right]}$$

where $X_1, \dots, X_k$ are the predictive factors, $\beta_0$ is the intercept, and $\beta_1, \dots, \beta_k$ are the regression coefficients.

We validated 14 models that had over 100 patients with SLN metastasis. For calculation of the likelihood of non-SLN metastasis according to the MSKCC, Stanford and MD Anderson nomograms for the present series, we used the formulas available for use on the websites of MSKCC (http://nomograms.mskcc.org/Breast/BreastAdditionalNonSLNMetastasesPage.aspx), Stanford University (https://www3-hrpdcc.stanford.edu/nsln-calculator/), and MD Anderson Center (http://www3.mdanderson.org/app/medcalc/bc_nomogram2/index.cfm?pagename=nsln.). For the other models, we used the reported formulas or criteria described in the relevant articles. For the 14 previously reported models and our new Ondokuz Mayıs nomogram, ROC curves were generated, and the AUCs were calculated to assess the discrimination of the models. Discrimination, which refers to the ability of a model to distinguish patients at high risk for positive non-SLNs from low-risk patients, was quantified by the AUC (Hanley and McNeil, 1982). The AUC ranges from 0.5 to 1.0, and a higher value indicates better discrimination.
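As an illustration of the two calculations described in this paragraph, the following is a minimal Python sketch, not the SPSS procedure used in the study. It evaluates a logistic model for one patient and computes the AUC as the probability that a randomly chosen patient with non-SLN metastasis receives a higher predicted probability than a randomly chosen patient without it, which is the interpretation given by Hanley and McNeil (1982). The function names and all numeric inputs are hypothetical; the fitted coefficients of the Ondokuz Mayis nomogram are not reproduced here.

```python
from math import exp


def logistic_probability(intercept, coefficients, values):
    """Predicted probability P(Y=1) from a logistic model.

    `coefficients` and `values` are parallel sequences (one entry per
    predictive factor); the numbers passed in are placeholders only.
    """
    linear = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + exp(-linear))


def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    labels: 1 = non-SLN metastasis, 0 = no non-SLN metastasis.
    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical predicted probabilities for four patients.
    print(auc([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1]))
```

The pairwise form is quadratic in the number of patients, which is acceptable for a series of a few hundred cases; a rank-based implementation would be preferable for much larger datasets.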
Scatter plots were generated to assess the agreement of the model-predicted probabilities and were evaluated visually. The calibration (i.e., the ability of a predictive model to match the predicted and observed probabilities) was assessed graphically. The predicted probabilities were grouped into deciles, and the observed percentage of positive non-SLNs in each decile (actual probability) was calculated. The calibration curve was generated using the actual probability as the Y-axis and the mean predicted probability as the X-axis. The number and proportion of patients and the false negative rates at various scores or cut-off levels were also calculated for the Ondokuz Mayis nomogram and for each model validated using our patient series. The false negative rates were estimated as the number of patients with non-SLN metastasis divided by the number of patients at or below the scores or cut-off values.
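The calibration points and the false negative rate at a cut-off can be sketched as follows. This is a hedged illustration in Python rather than the authors' workflow; it assumes that "deciles" means ten equally sized groups of patients ordered by predicted probability, and the function names and sample values are hypothetical.

```python
def false_negative_rate(probs, labels, cutoff):
    """False negative rate as defined in the text.

    Returns (rate, n): the share of patients with predicted probability
    <= cutoff who have non-SLN metastasis (label 1), and the number of
    patients at or below the cut-off. The rate is None if n is 0.
    """
    selected = [y for p, y in zip(probs, labels) if p <= cutoff]
    if not selected:
        return None, 0
    return sum(selected) / len(selected), len(selected)


def calibration_points(probs, labels, groups=10):
    """(mean predicted, observed proportion) for each group of patients.

    Patients are sorted by predicted probability and split into `groups`
    nearly equal parts (an assumption about how the deciles were formed).
    """
    pairs = sorted(zip(probs, labels))
    n = len(pairs)
    points = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # fewer patients than groups
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        points.append((mean_pred, observed))
    return points


if __name__ == "__main__":
    # Hypothetical predicted probabilities and non-SLN status.
    probs = [0.05, 0.08, 0.15, 0.30, 0.60, 0.90]
    labels = [0, 0, 1, 0, 1, 1]
    print(false_negative_rate(probs, labels, 0.10))
    print(calibration_points(probs, labels, groups=3))
```

Plotting the second element of each pair against the first gives the calibration curve; points near the diagonal indicate that predicted and observed probabilities agree.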
The factors for comparison were recorded on a computer using the Statistical Package for the Social Sciences (SPSS) version 15.0. The categorical data were expressed as numbers and percentages, and the continuous data were expressed as the mean±standard deviation or median (range). Comparisons of positive non-SLNs with categorical data were performed using chi-square tests, and comparisons with numerical data were performed using Mann-Whitney U tests. A p-value of <0.05 was considered significant. Multivariate analysis was performed using logistic regression analysis, and odds ratios (OR) for positive non-SLNs with 95% confidence intervals (CI) were calculated. The calibration of the Ondokuz Mayis nomogram was evaluated using the Hosmer-Lemeshow goodness-of-fit test and visually by plots (Hosmer and Lemeshow, 2006).
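For readers who wish to reproduce the goodness-of-fit check outside SPSS, the Hosmer-Lemeshow statistic can be sketched as below. This is an illustrative Python sketch under stated assumptions, not the SPSS computation used in the study: patients are split into equally sized groups by predicted probability, the statistic is referred to a chi-square distribution with (groups - 2) degrees of freedom, and the helper `chi2_sf_even_df` is a hypothetical name for a closed-form upper-tail probability that is valid only for an even number of degrees of freedom (for example, 8 with ten groups).

```python
from math import exp, factorial


def hosmer_lemeshow(probs, labels, groups=10):
    """Hosmer-Lemeshow chi-square statistic over risk-ordered groups."""
    pairs = sorted(zip(probs, labels))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        variance = expected * (1.0 - expected / size)
        if variance > 0:  # skip groups whose predictions are all 0 or all 1
            stat += (observed - expected) ** 2 / variance
    return stat


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


if __name__ == "__main__":
    # Hypothetical data: ten patients in two risk groups.
    probs = [0.2] * 5 + [0.8] * 5
    labels = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]
    print(hosmer_lemeshow(probs, labels, groups=2))
    # With ten groups the reference distribution has 8 degrees of freedom.
    print(chi2_sf_even_df(11.0, 8))
```

A large p-value means that the test found no significant difference between predicted and observed event counts, which is how a non-significant result is read as adequate calibration.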

Results
The generated ROC curves for the present series and for the 14 other models are presented in Figures 1 and 2. The calculated AUC value for the Ondokuz Mayis nomogram was 0.871 (CI 0.82-0.91). The AUC values, sensitivity, specificity, false positive (FP) and false negative (FN) rates, and the negative and positive predictive values (NPV and PPV) for the Ondokuz Mayis nomogram and for the other 14 models validated using our dataset (and the number of patients and the AUC values reported in the original studies) are presented in Table 3. The calibration plot for the Ondokuz Mayis model is presented in Figure 3. The Hosmer-Lemeshow goodness-of-fit test yielded a P-value of 0.18, suggesting good calibration.

[Figure: agreement of the predicted probabilities of the Ondokuz Mayis model with those of the MD Anderson and European nomograms]

To assess the clinical utility of the nomograms, false negative rates estimated at various scores or cut-off levels were evaluated to define a subgroup of patients with a low predicted probability of non-SLN metastasis. Coutant et al. (2009) and Mittendorf et al. (2012) estimated the FN rates among patients with a predictive probability of ≤10% for clinical utility. This is a useful choice for clinical use because the false negative rate of SLNB is usually accepted as <10% (Kohrt et al., 2008; Poirier et al., 2008; Coufal et al., 2009; Hidar et al., 2011). However, predictive probability levels of ≤15% and ≤20% have also been reported as acceptable definitions of a subgroup with a low predicted probability of non-SLN metastasis in which ALND can be avoided or in which no axillary recurrence occurs in the absence of ALND (Lambert et al., 2006; Ponzone et al., 2007; Zakaria et al., 2008; Hidar et al., 2011). Therefore, we also investigated the FN rates and the proportion of patients at the ≤10%, ≤15%, and ≤20% cut-off levels of predicted probabilities obtained from the models (Table 4). In the Mayo and European models, the FN rates were 25% and 28% at the ≤10% cut-off level and 33% and 35% at the ≤20% cut-off level, respectively, whereas in the Ondokuz Mayis nomogram, the FN rates at the ≤10% and ≤20% cut-off levels were 0% and 9%, respectively. The median predictive probabilities in the SLN metastasis only group and the non-SLN metastasis group based on the present Ondokuz Mayis model were 20% and 80%, respectively.

Discussion
Our findings demonstrate that the non-SLNs were negative in 49% of patients with positive SLNs. Had completion ALND been omitted, about half of the patients with positive SLNs would have been spared a procedure that offered them no benefit for staging, outcome, or decision-making for adjuvant therapy. Furthermore, these patients would not have been exposed to the potential morbidity of ALND. The main problem is determining which patients with positive SLNs should undergo completion ALND and which should not. To find a solution to this problem, predictive factors for non-SLN metastasis among patients with positive SLNs have been investigated, and many nomograms or scoring systems have been developed to predict the likelihood of non-SLN metastasis using predictive factors (Van Zee et al., 2003; Degnim et al., 2005; Chapgar et al., 2006; Pal et al., 2008; Kohrt et al., 2008; Cho et al., 2008; Coufal et al., 2009; Coutant et al., 2009; Gur et al., 2010; Perhavec et al., 2010; Lombardi et al., 2011; Mittendorf et al., 2012; Meretoja et al., 2012; Derici et al., 2012).
The present Ondokuz Mayis model has an excellent discrimination capacity, with an AUC of 0.87, and to the best of our knowledge, this is the highest AUC among the models tested to date. An AUC of 0.50 indicates no discrimination, 0.70 to 0.80 indicates acceptable discrimination, and 0.81 to 0.90 indicates excellent discrimination (AUC ≥0.90 is rare) (Hosmer and Lemeshow, 2000). Inspection of the calibration curve of our model and the Hosmer-Lemeshow test suggest that our model fits well and is well calibrated, and there was no significant difference between the predicted and the observed probabilities. The definition of a subgroup with a low predicted value in the Ondokuz Mayis model showed that the false negative rates with a predicted probability (cut-off value) of ≤10% and ≤20% were 0% and 9%, respectively. Therefore, 67 (28%) or 36 (15%) patients at the predicted probability of ≤20% or ≤10%, respectively, could have been spared the ALND in our series. The AUC and the calibration quantification are important for the evaluation and validation of the models. However, as suggested by Degnim et al. (2005) and Coutant et al. (2009), the number of patients and the false negative rate at the selected low predictive probability level (i.e., ≤10%) should also be considered. Degnim et al. (2005) stated that although the MSKCC model yielded an excellent AUC of 0.86 with the Michigan dataset, the FN rate at the predicted probability of 5% or less was 14%.
The MSKCC model is the first described nomogram (Van Zee et al., 2003). Following the MSKCC model described by Van Zee et al., many models have been proposed and tested by many studies (Van Zee et al., 2003; Kocsis et al., 2004; Degnim et al., 2005; Smidt et al., 2005; Soni et al., 2005; Lambert et al., 2006; Alran et al., 2007; Ponzone et al., 2007; Cho et al., 2008; Klar et al., 2008; Kohrt et al., 2008; Pal et al., 2008; Poirier et al., 2008; Coufal et al., 2009; Coutant et al., 2009; Coutant et al., 2009; Scow et al., 2009; Gur et al., 2010; Moghaddam et al., 2010; Lombardi et al., 2011; Hessman et al., 2011; Hidar et al., 2011; Tan et al., 2011; Chen et al., 2012; Derici et al., 2012; Zhu et al., 2012; Sasada et al., 2013). The last proposed international model from multiple centers in Europe, however, has not been validated to date. To the best of our knowledge, the present study is the first to validate the Rome and European models (Lombardi et al., 2011; Meretoja et al., 2012). Applying the MD Anderson model to our dataset demonstrated that it was the most suitable model for our patients, generating an AUC of 0.86, which is the second highest AUC after the present model. Furthermore, the scatter plot of the predicted probabilities generated by the MD Anderson model and our model displayed a high agreement. The false negative rates at the predicted probability of 10% or less and 20% or less were 0% for the MD Anderson model. However, the proportions of patients at these probabilities were only 4% and 15%, respectively.
The least suitable models for our dataset were the Mayo, Louisville, Rome, and European models, which yielded AUC values of 0.73, 0.70, 0.75, and 0.75, respectively. The scatter plots of the predicted probabilities of the Ondokuz Mayis model and these 4 models showed a low agreement with a wide scatter around the line of agreement. Looking at the scatter plots, Degnim et al. (2005) reported that the MSKCC and Mayo models did not exhibit high levels of agreement for individual patients based on the Mayo dataset. The MSKCC nomogram, which was validated by the present dataset, showed a high AUC of 0.84, and the scatter plots of the predicted probabilities generated by the Ondokuz Mayis and MSKCC models displayed a modest agreement. The false negative rate at a predicted probability of 10% or less was 0% for the MSKCC nomogram, but the proportion of patients (4%) was very low at this level.
The selection of patients with a low predicted probability level of ≤20% with a false negative rate of <10% could spare ALND for an important subset of patients with SLN metastasis. The mean predicted probability for patients without non-SLN metastasis in our model was 28%. Could the selection of a higher level of predicted probability increase the risk of local recurrence? Zakaria et al. (2008) reported that a group of patients with SLN metastasis and a mean MSKCC predicted probability of non-SLN metastasis of 20% who did not undergo ALND had no axillary lymph node recurrences at a mean follow-up of 30 months. Morrow et al. (2009) also stated that omitting ALND for patients with less than a 30% risk of additional nodal involvement would likely not lead to axillary recurrence resulting in breast cancer-related death.
The European model applied to our dataset yielded a moderate AUC of 0.71, which is the same as in the original series, and the scatter plot showed that the agreement with our model was low. Moreover, when the European model was applied to our dataset, the false negative rate with a predicted probability of 10% or less was 28%. These findings demonstrate that the European model (EM) was not suitable for our patients. The EM includes SLN metastasis size categorized as ITC, micro- and macrometastasis, and does not include the largest SLN metastasis size in mm. In addition, the EM included a factor called "prevalence of non-SLN metastases", which was not included in our model or in the other models.
To date, the MSKCC nomogram has been the most tested and validated model, and the AUCs reported in different datasets range from 0.58 to 0.86 (Van Zee et al., 2003; Kocsis et al., 2004; Degnim et al., 2005; Soni et al., 2005; Smidt et al., 2005; Lambert et al., 2006; Alran et al., 2007; Ponzone et al., 2007; Cho et al., 2008; Klar et al., 2008; Kohrt et al., 2008; Pal et al., 2008; Poirier et al., 2008; Coufal et al., 2009; Coutant et al., 2009; Scow et al., 2009; Gur et al., 2010; Moghaddam et al., 2010; Hessman et al., 2011; Hidar et al., 2011; Lombardi et al., 2011; Tan et al., 2011; Chen et al., 2012; Derici et al., 2012; Zhu et al., 2012; Sasada et al., 2013). Among the studies validating the MSKCC model, our dataset generated the second highest AUC of 0.85, following the AUC of 0.86 yielded by the Michigan dataset. Kocsis et al. (2004) reported that the MSKCC nomogram could not be validated in their dataset. Pal et al. (2008), Coufal et al. (2009), Moghaddam et al. (2010), Tan et al. (2011), and Chen et al. (2012) have also reported AUCs of <0.70, thus not validating the MSKCC nomogram.
All models were validated by our dataset with an AUC of >0.70. This means that all of the models are of clinical utility for our patients if the AUC is considered the only criterion. The MSKCC, Stanford, MOU (Masaryk), Turkish, Ljubljana, MD Anderson, and DEU models exhibited AUCs of >0.80 and had excellent discrimination in our dataset. Considering the excellent AUC values, the agreement of predictive probabilities by scatter plot, and the number of patients and low false negative rates with a selected predictive probability, we concluded that the MD Anderson model is the most suitable and could be used for our patients. However, our Ondokuz Mayis model yielded the highest AUC among all models. The calibration was good, and the model had false negative rates of 0% and 9% at predicted probabilities of ≤10% (36 patients, 15%) and ≤20% (67 patients, 28%), respectively. The application of the present Ondokuz Mayis model for our patients could spare ALND for 36 (15%) or 67 (28%) patients, depending on the choice of predicted probability by the patient and the surgeon. Mittendorf et al. (2012) reported that only 4 (4%) of 101 patients with a predicted probability of 10% or less had a positive non-SLN in the MD Anderson model, whereas applying the MD Anderson model to our dataset showed that there were only 10 (4%) patients, with a 0% false negative rate, at the same predicted probability. Barranger et al. (2008) reported that the chances of having negative non-SLNs were 97.3% in patients with a score of 3.5 or less (median score) in the Tenon model. The application of the Tenon scoring system in our series showed that the probability of having negative non-SLNs with a score of 3.5 or less was 92%. However, the median Tenon score was 6 in our series. With the application of Tenon scoring, there were 155 (65%) patients with scores at or below the median of 6 in our series, and the false negative rate was 34%. These findings demonstrate that the number of patients and the false negative rates for a predicted probability in the models are different for different datasets.
In the present study, the median cut-off value (predicted probability) of the Ondokuz Mayis nomogram was 46%. For patients in our series, the chance of having negative non-SLNs with a median cut-off value of 46% or less was 81%. With a value of 20% or less, this chance was 91%, and with a 10% cut-off value, the chance was 100%. The numbers of patients in the 20% or less and the 10% or less categories in our series were 67 (28%) and 36 (15%), respectively. To make a decision about whether to avoid or perform ALND, along with the false negative rates (i.e., the chance of having positive non-SLNs), the chance of having negative non-SLNs (i.e., the negative predictive value) for a given predicted probability should also be considered (Hidar et al., 2011).
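The two quantities discussed in this paragraph are complements of each other. As a worked sketch using only the figures reported for our series, and assuming the false negative rate (FNR) is defined as in the Methods (patients with non-SLN metastasis divided by all patients at or below the cut-off $c$), the chance of having negative non-SLNs follows directly; the symbol $\hat{p}$ for the predicted probability is introduced here for illustration only.

```latex
\begin{align*}
P(\text{negative non-SLN} \mid \hat{p} \le c) &= 1 - \mathrm{FNR}(c) \\
c = 20\%:\quad 1 - 0.09 &= 0.91 \\
c = 10\%:\quad 1 - 0.00 &= 1.00
\end{align*}
```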
The present nomogram is based on a relatively small sample size and has not been validated by another series. Almost all patients underwent simultaneous lumpectomy or mastectomy and frozen section SLN analysis in our clinic. Therefore, some predictive factors were not available before the SLN analysis, and a second surgery for ALND may be required if the present nomogram were used.
The present Ondokuz Mayis model has an excellent discrimination capacity to distinguish patients at low risk for positive non-SLNs from high-risk patients, with an AUC of 0.87, and to the best of our knowledge, this is the highest AUC among the models tested to date. Our model also fits well and is well calibrated. Our findings showed that the false negative rates with a predicted probability of ≤10% and ≤20% were 0% and 9%, respectively. Therefore, 67 (28%) or 36 (15%) patients at the predicted probability of ≤20% or ≤10%, respectively, could have been spared the ALND in our series.
Nomograms, which are methods to predict the possibility of non-SLN metastasis, do not yet have the ability to replace ALND. Nevertheless, they are increasingly being used by many surgeons (Park et al., 2007). As Scow et al. (2009) noted, models always perform best in the population on which they are based. Thus, not all nomograms may have utility for all patient populations. Every clinic should validate a model before using it, or in the best case, every clinic should create a nomogram, analyze it, and consider the characteristics of the nomogram in making decisions about omitting or performing ALND. The characteristics of the nomogram could also be shared with patients and their families during counseling before surgery. The present nomogram to calculate the likelihood of non-SLN metastasis for SLN-positive breast cancer patients is available at http://tip.omu.edu.tr/bc_nomogram.ondokuzmayis/

Table 4. Number of Patients and False Negative Rates at Various Scores or Cut-off Values by Model for 237 Patients with Positive SLNs, Including the Ondokuz Mayis Nomogram
* No. of patients with scores or predicted probabilities equal to or lower than the cut-off values. FNR: false negative rate. ** Estimated as the number of patients with non-SLN metastasis among the No. of patients.

The agreement of the predicted probabilities of the Ondokuz Mayis model with those of the validated models was assessed by scatter plots (data not shown). The MD Anderson model was found to have high agreement with the Ondokuz Mayis model, whereas the MSKCC, Turkish, MOU (Masaryk), Ljubljana, and DEU models showed modest agreement. The Louisville, Tenon, SNUH (Seoul), European, Mayo, Rome, Cambridge, and Stanford models were found to exhibit low agreement, with wide scatter around the line of agreement. Our findings demonstrated that the MSKCC, Stanford, Turkish, MOU (Masaryk), Ljubljana, MD Anderson, and DEU models can effectively predict non-SLN metastasis.

DOI: http://dx.doi.org/10.7314/APJCP.2014.15.3.1481

Table 3. AUC Values, Sensitivity, Specificity, False Negativity and Positivity, NPV and PPV for the Ondokuz Mayis Nomogram and the 14 Other Models Validated in Our Series, and the Number of Patients and AUC Values in the Original Series
* 95% confidence interval (CI) was not reported. NPV: negative predictive value; PPV: positive predictive value.

Several studies have also reported AUCs of <0.70, thus not validating the MSKCC nomogram. The Stanford nomogram, which was the second most validated nomogram with a range of AUCs of 0.65-0.76, has also been validated by our patient series with an AUC of 0.82, which is the highest AUC among the studies validating the Stanford nomogram. Of the 13 studies, 8 validated the Stanford nomogram. The present study was the first to validate the Rome and European nomograms, with AUC values of 0.71 and 0.75, respectively.