Black Hispanic and black non-Hispanic breast cancer survival data analysis with half-normal model application

BACKGROUND
Breast cancer is the second leading cause of cancer death for women in the United States. Differences in breast cancer survival have been noted among racial and ethnic groups, but the reasons for these disparities remain unclear. This study presents the characteristics and survival curves of two racial and ethnic groups and evaluates the effect of race on survival times by fitting a half-normal model to lifetime data.


MATERIALS AND METHODS
The distributions among racial and ethnic groups are compared using female breast cancer patients from nine states in the United States, all taken from the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) cancer registry. The main end points observed are age at diagnosis, survival time in months, and marital status. The right-skewed half-normal probability model is used to show the differences in survival times between black Hispanic (BH) and black non-Hispanic (BNH) female breast cancer patients. The Kaplan-Meier estimator and the Cox proportional hazards model are used to estimate and compare the relative risk of death in the two minority groups, BH and BNH.


RESULTS
A probability random sampling method was used to select representative samples of BNH and BH female breast cancer patients who were diagnosed during 1973-2009 in the United States. The sample contained 1,000 BNH and 298 BH female breast cancer patients. The median age at diagnosis was 57.75 years among BNH and 54.11 years among BH. The half-normal model showed that the survival times followed positively skewed distributions, with higher variability in BNH than in BH. The Kaplan-Meier estimate was used to plot the survival curves for the cancer patients; the survival-time distribution was positively skewed. The Kaplan-Meier and Cox proportional hazards analyses showed that BNH had a significantly longer survival time than BH, which is consistent with the results of the half-normal model.


CONCLUSIONS
The findings with the proposed model strategy will assist in the healthcare field to measure future outcomes for BH and BNH, given their past history and conditions. These findings may provide an enhanced and improved outlook for the diagnosis and treatment of breast cancer patients in the United States.


Introduction
Projections for new invasive breast cancer cases in the U.S. suggest that in 2014 around 232,670 women will be diagnosed with breast cancer and around 40,000 will die from the disease (NCI, 2014a; NCI, 2014b; Siegel, Ma, Zhou, & Jemal, 2014). In addition, one out of every eight women will be diagnosed with breast cancer in her lifetime (DeSantis et al., 2011).
According to the Surveillance, Epidemiology, and End Results Program (SEER) cancer registry, women have a 12% chance of being diagnosed with breast cancer during their lifetime (SEER, 2014a, 2014b). Between 2002 and 2003, breast cancer incidence among women decreased by 7% (Jemal et al., 2006; ACS, 2013a). This decrease was due to a drop in the number of users of postmenopausal hormone therapy (HT) after results from the Women's Health Initiative were published in 2002 (Narod, 2011). The study linked the use of HT to an increased risk of breast cancer and heart disease. Since then, breast cancer incidence has been relatively steady (ACS, 2013a; Narod, 2011).
Several risk factors are associated with increased incidence of breast cancer, including genetic factors, socio-economic status, education level, race and ethnicity, and access to care (Liu et al., 2012; Cheung, 2013). Age is one of the most predictive factors of breast cancer; as a woman ages, her risk of being diagnosed with breast cancer also increases (Virnig et al., 2010; Van de Water et al., 2012). Around 66% of invasive breast cancers occur in women over the age of 50, and 12% in women younger than 45 (ACS, 2013b). In 2011, the highest incidence of breast cancer occurred in women between the ages of 50 and 64 (WHO, 2010). Additionally, delaying treatment may adversely affect patient prognosis. Women have reported several reasons for delaying treatment once a lump has been identified, including the belief that symptoms were harmless and temporary, use of alternative medicines, carelessness, fear of mastectomy, and fear of social isolation (Memon et al., 2013). However, the same study showed that women did return for treatment if the lump size increased, if the symptoms became severe, if they felt concerned, or if they were referred (Memon et al., 2013).
Breast cancer develops when abnormal cell division in the breast grows out of control. These cancer cells form a mass or lump known as a tumor, and such tumors are commonly classified as ductal carcinoma in situ (DCIS) or lobular carcinoma in situ (LCIS) depending on the tumor location and characterization (Virnig et al., 2010). Cancerous tumors in the breast can be detected through mammography screening followed by a confirmatory biopsy. Women aged 20 and older are asked to perform breast self-examinations (BSE); those in their 20s and 30s are asked to have a clinical breast exam (CBE) every three years, and women are asked to begin mammograms at age 40 (Smith et al., 2011).

Race, ethnicity, and breast cancer
During the years 2006-2010, the average age-adjusted annual incidence rate of breast cancer for white non-Hispanic women living in the United States was 127.3 per 100,000 women per year, followed by 118.4 per 100,000 women per year for black non-Hispanic (BNH) women and 91.1 per 100,000 women per year for Hispanic women, of all races (ACS, 2013a). White non-Hispanic women have the potential for the greatest number of years lost compared to other ethnic groups (Liu et al., 2012). They are also more likely to be diagnosed with breast cancer compared to other ethnic groups (Ooi et al., 2011). Breast cancer, however, is the second leading cause of death among black and white Hispanic women resulting in 17,100 new invasive breast cancer cases and 2,400 breast cancer deaths in 2012 (Siegel, Naishadham, & Jemal, 2011). Using SEER breast cancer data from 2004 to 2009, Blacks were identified as having the shortest survival outcome of most ethnic and racial groups (Cheung, 2013). In addition, BNH women and Hispanic women, of all races, are less likely than white non-Hispanic women to be diagnosed with early stage breast cancer and BNH women tend to have larger tumors (Ooi et al., 2011;ACS, 2013a).
Low socio-economic status and lack of health insurance put BNH women at risk for a higher mortality rate when diagnosed with breast cancer (Liu et al., 2012). BNH women have a lower survival rate than other racial and ethnic groups in the United States and also have a low chance of recovery due to more aggressive tumors associated with late-stage diagnosis (Siegel et al., 2012a, 2012b; ACS, 2013a). BNH women had a 1.5- to 2.5-fold increased risk of having stage IV breast cancer and were more likely to die a cancer-specific death compared to white non-Hispanic women (Banegas and Li, 2012; Khan et al., 2014e).
Few research studies exist on cancer-related health disparities among Hispanic women of different races (Black and White) (Banegas and Li, 2012; Khan et al., 2014). Hispanic women, of all races, are also more likely to die from breast cancer than white non-Hispanic women, even when diagnosed at the same stage and age (ACS, 2013a; Siegel et al., 2012b). Hispanic women, of all races, who do not speak English have lower breast cancer screening rates and a poorer prognosis once diagnosed. Hispanic women, of all races, are also more likely to have advanced-stage breast cancer and larger tumors compared to white non-Hispanic women (NCI, 2014a; NCI, 2014b). In 2010, the CDC showed that breast cancer was the most commonly diagnosed cancer among BNH women and was the leading cause of cancer deaths among this group (Liu et al., 2012). Evidence of racial and ethnic differences in screening and prognosis continues to emerge. For example, Jewish women are more likely to be screened for BRCA1 and BRCA2 compared to BNH and Hispanic women, of all races (Levy et al., 2011). In addition, BNH women are typically less trusting of their primary care physicians and cancer treatment teams. English-speaking Hispanic women, of all races, are significantly less trusting of their diagnosing physician. Low trust among BNH and Hispanic women, of all races, can signify lower efficacy in completing a course of therapy (Kaiser et al., 2011).
The differences in risk and breast cancer survival time between two ethnic subgroups can be measured by comparing the number of days patients were alive after the diagnosis. A lifetime is measured in different ways depending on the endpoint, such as the death of a person, administrative censoring, or the failure of a tool in use, a common observation in biomedical sciences. Lifetime data analysis has been in use since the 1970s, and there are many ways to model lifetime data statistically using probability models such as the exponential, gamma, Weibull, and half-normal models. The half-normal model has recently been applied to some challenging problems in the statistical community (Khan, 2012, 2013).
The half-normal model has been studied to a certain extent as a model for lifetime data, but it has not been used to compare survival between the BH and BNH ethnicities. Very few authors have used the half-normal model in data analysis; see, for example, Khan (2012, 2013), among others. The half-normal model has a single scale parameter, σ. The probability density function (PDF) of the half-normal model is given by
$$f(y;\sigma)=\frac{\sqrt{2}}{\sigma\sqrt{\pi}}\exp\left(-\frac{y^{2}}{2\sigma^{2}}\right),\qquad y\ge 0,\ \sigma>0.$$
Maximum likelihood estimation (MLE) is a method of estimating the parameter of the half-normal model. When applied to a data set, it provides an estimate of the model's parameter. The likelihood function based on a sample of n data points y_1, ..., y_n from the half-normal model is given by
$$L(\sigma)=\prod_{i=1}^{n}f(y_i;\sigma)=\left(\frac{2}{\pi}\right)^{n/2}\sigma^{-n}\exp\left(-\frac{\sum_{i=1}^{n}y_i^{2}}{2\sigma^{2}}\right).$$
The log-likelihood function of the half-normal model is given by
$$\log L(\sigma)=\frac{n}{2}\log\frac{2}{\pi}-n\log\sigma-\frac{\sum_{i=1}^{n}y_i^{2}}{2\sigma^{2}},$$
where the first term does not depend on σ and is often omitted. Given a set of data drawn from a half-normal probability model, the unknown parameter σ can be estimated by the method of maximum likelihood, which gives
$$\hat{\sigma}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}y_i^{2}}.$$
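The estimation step can be illustrated with a minimal Python sketch. It assumes the standard half-normal density with scale σ, for which setting the derivative of the log-likelihood to zero gives the closed-form estimate σ̂ = sqrt(Σy_i²/n). The function names and the survival times below are hypothetical and are not taken from the SEER sample; the log-likelihood here keeps the constant term (n/2)·log(2/π), which does not affect the maximizer.

```python
import math

def halfnormal_log_likelihood(y, sigma):
    """Half-normal log-likelihood, including the constant (n/2)*log(2/pi)."""
    n = len(y)
    sum_sq = sum(v * v for v in y)
    return (0.5 * n * math.log(2.0 / math.pi)
            - n * math.log(sigma)
            - sum_sq / (2.0 * sigma ** 2))

def halfnormal_mle(y):
    """Closed-form maximum likelihood estimate: sqrt(sum(y_i^2) / n)."""
    return math.sqrt(sum(v * v for v in y) / len(y))

# Hypothetical survival times in months (illustrative only, not SEER data).
survival_months = [3.0, 12.0, 25.0, 40.0, 66.0, 110.0, 150.0]
sigma_hat = halfnormal_mle(survival_months)

# The closed-form estimate should not be beaten by nearby values of sigma.
ll_at_mle = halfnormal_log_likelihood(survival_months, sigma_hat)
ll_below = halfnormal_log_likelihood(survival_months, 0.9 * sigma_hat)
ll_above = halfnormal_log_likelihood(survival_months, 1.1 * sigma_hat)
```

Comparing the log-likelihood at σ̂ with its value at nearby points is a quick sanity check that the closed form is indeed the maximizer.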
A few studies have examined the distribution of breast cancer among populations belonging to different Black ethnic subgroups (Banegas and Li, 2012; Ooi et al., 2011). Even fewer researchers have used BH and BNH survival data to model statistical probability distributions. The goals of this paper are to: i. obtain descriptive statistics of both ethnicities (BH and BNH); ii. review the right-skewed half-normal model and the estimation of its parameter; and iii. use the Kaplan-Meier survival curve to examine the lifetime distribution of both ethnicities.

Materials and Methods
Breast cancer data were taken from the SEER cancer registry of the United States. The data set consisted of 608,032 breast cancer cases diagnosed between 1973 and 2009 and provided information from twelve U.S. cancer registries. Stratified random sampling was used to select the sub-sample of BNH and BH breast cancer patients from all female breast cancer cases. For more about the selection of samples from the SEER cancer registry, readers are referred to Khan et al. (2014a, 2014b, 2014c, 2014d). A frequency distribution table of the selected sample was produced for both ethnicities. Descriptive statistics including mean, standard deviation (SD), range, and quartiles were computed for age at diagnosis and survival time. A statistical modeling approach is important for comparing lifetime data in engineering and the biomedical sciences. To compare the lifetime data from both ethnicities using a statistical probability distribution, the half-normal probability model was used to distinguish the two ethnicities graphically. Furthermore, survival analysis was conducted using the Kaplan-Meier (K-M) and Cox proportional hazards regression methods to compare the hazards of both ethnicities.
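As a rough illustration of the sampling step, the following Python sketch draws a stratified random sample from a list of records. It treats ethnicity as the stratum variable, which is an assumption; the study's exact strata and procedure are described in the cited Khan et al. papers. The function name, record layout, and toy registry are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_key, sizes, seed=0):
    """Draw a simple random sample of the requested size within each stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_stratum = defaultdict(list)
    for rec in records:
        by_stratum[rec[stratum_key]].append(rec)
    sample = []
    for stratum, size in sizes.items():
        pool = by_stratum[stratum]
        # Never request more records than the stratum contains.
        sample.extend(rng.sample(pool, min(size, len(pool))))
    return sample

# Hypothetical toy registry (not SEER data): 50 BNH and 20 BH records.
registry = (
    [{"id": i, "ethnicity": "BNH"} for i in range(50)]
    + [{"id": 50 + i, "ethnicity": "BH"} for i in range(20)]
)
subsample = stratified_sample(registry, "ethnicity", {"BNH": 10, "BH": 3}, seed=42)
```

Sampling within each stratum separately guarantees that the smaller group is represented at the chosen size rather than left to chance.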
The data analysis for the present study included frequency table generation, descriptive statistics, and survival analysis. Calculations were conducted using SPSS (IBM Corp., 2010) and SAS® software (SAS Institute Inc., 2011). Mathematica version 8.0 (Wolfram Research, 2012) was used to plot the half-normal probability curve based on the maximum likelihood estimate of the parameter. The half-normal densities fitted to the survival times of the two ethnicities were superimposed in one graph to show their differences. Mathematica was also used to draw histograms of survival times for both ethnicities. The geographic map of the nine states randomly selected out of the twelve was produced using the GMAP procedure in SAS® software. We used SAS® software to generate the bar graphs of survival times and age at diagnosis for both the black Hispanic and black non-Hispanic breast cancer patients. Furthermore, SAS® software was used to plot the K-M graph.
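The study produced its K-M curves in SAS. For readers who want to see the underlying calculation, here is a minimal Python sketch of the Kaplan-Meier product-limit estimator. It is an illustration rather than the authors' code; the function names and the small data set are hypothetical, with an event flag of 1 for an observed death and 0 for a censored observation.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set.
        n_at_risk -= removed
    return curve

def median_survival(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached

# Hypothetical follow-up times in months (not SEER data).
times = [6, 12, 12, 20, 30, 45]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
median = median_survival(curve)
```

Censored patients contribute to the number at risk up to their last follow-up time but never trigger a drop in the curve, which is what distinguishes this estimator from a simple empirical survival fraction.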
The histogram (Figure 2) shows that the survival time of both ethnicities is right skewed. The x-axis represents survival time in months and the y-axis represents frequency. For BNH, better survival among some cases can be inferred from the longer survival times on the x-axis. For both ethnicities, most survival times lie between 0 and 150 months, which is consistent with the 75th percentile values. Figure 3 shows the cancer registries that were selected by stratified random sampling. Most of the states belonged to either the eastern or the western part of the U.S. Despite lower representation in the total sample, cases from Iowa, New Mexico, and Utah showed longer survival times. Breast cancer cases from Michigan and California showed shorter survival than the rest of the total sample. In graphs stratified by ethnicity, Washington, New Mexico, and Iowa show longer survival for BNH cases. For BH cases, Georgia, Michigan, and Washington show longer survival times.
The maximum likelihood estimate (MLE) of the model's parameter was σ_BH = 93.728 for the BH sample and σ_BNH = 128.711 for the BNH sample. Two half-normal densities are plotted in Figure 4. The x-axis represents the survival times and the y-axis represents the probability density function (PDF). The red solid line is the half-normal density based on the BH survival times, and the purple dotted line is the half-normal density for the BNH survival times. The survival times for both ethnicities follow a positively skewed model, each with its own variability. The BNH group has higher variability than the BH group, as seen from the values of the corresponding maximum likelihood estimates.
DOI: http://dx.doi.org/10.7314/APJCP.2014.15.21.9453
The survival curve for BH cases approximately overlaps the BNH curve at the beginning (Figure 5). After around 50 months, the BH curve lies significantly below the BNH curve, and this continues until 300 months. By 100 months, survival has decreased substantially for both BH and BNH cases, with a significant difference in survival time between the two ethnicities.
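The link between the reported σ estimates and the variability of the fitted half-normal densities can be made concrete with a short Python sketch. It assumes the standard half-normal results: mean σ·sqrt(2/π), standard deviation σ·sqrt(1 − 2/π), and a median that solves 2Φ(m/σ) − 1 = 0.5. The function names are hypothetical, and the summaries describe the fitted model, not the empirical sample.

```python
import math
from statistics import NormalDist

def halfnormal_pdf(y, sigma):
    """Half-normal density with scale sigma; zero for negative y."""
    if y < 0:
        return 0.0
    return math.sqrt(2.0 / math.pi) / sigma * math.exp(-y * y / (2.0 * sigma ** 2))

def halfnormal_summary(sigma):
    """Model-implied mean, standard deviation, and median."""
    return {
        "mean": sigma * math.sqrt(2.0 / math.pi),
        "sd": sigma * math.sqrt(1.0 - 2.0 / math.pi),
        # The median solves 2*Phi(m/sigma) - 1 = 0.5, i.e. Phi(m/sigma) = 0.75.
        "median": sigma * NormalDist().inv_cdf(0.75),
    }

sigma_bh, sigma_bnh = 93.728, 128.711  # MLEs reported in the text
summary_bh = halfnormal_summary(sigma_bh)
summary_bnh = halfnormal_summary(sigma_bnh)

for label, summary in (("BH", summary_bh), ("BNH", summary_bnh)):
    print(label, {name: round(value, 2) for name, value in summary.items()})
```

Under this model all three summaries scale linearly with σ, so the larger σ for BNH implies both a longer typical survival time and a wider spread.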
For a binary covariate, the hazard ratio is the estimate of the ratio of the hazard rate in one group to the hazard rate in the other group. The hazard ratio for the BNH ethnicity, with BH as the reference, was 0.756 (95% CI: 0.663, 0.861), with χ² = 17.89 and p < 0.0001. This value indicates that BNH breast cancer patients have better survival than BH breast cancer patients. At the 50-month time point, BH cases have around a 50% chance of survival, compared with 58% for BNH cases. At 100 months, the difference between the two ethnicities is around 10%, with BH at around 25% and BNH at around 35% chance of survival.
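The reported hazard ratio can be unpacked with a little arithmetic, sketched below in Python. The sketch assumes the confidence interval is a Wald interval on the log scale, so the standard error of log(HR) can be backed out from the interval width. The source does not say which test produced χ² = 17.89 (a score or likelihood ratio statistic would differ slightly), so only approximate agreement should be expected. Variable names are hypothetical.

```python
import math

hr, ci_low, ci_high = 0.756, 0.663, 0.861  # values reported in the text

log_hr = math.log(hr)
# Assumption: 95% Wald interval, log(HR) +/- 1.96 * SE.
se_log_hr = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
wald_chi2 = (log_hr / se_log_hr) ** 2

# A hazard ratio below 1 is read as a relative reduction in the hazard of death.
hazard_reduction_pct = (1.0 - hr) * 100.0

print(f"SE of log(HR): {se_log_hr:.4f}")
print(f"Wald chi-square: {wald_chi2:.2f}")
print(f"Hazard reduction for BNH vs BH: {hazard_reduction_pct:.1f}%")
```

The hazard ratio describes the instantaneous risk of death, so 0.756 corresponds to roughly a 24% lower hazard for BNH relative to BH, which is not the same as a 24 percentage point difference in survival probability.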

Discussion
Past studies have shown that age is the strongest predictor of breast cancer, but race and ethnicity are also major risk factors (Akinyemiju et al., 2013; Van de Water et al., 2012; Virnig, Tuttle, Shamliyan, & Kane, 2010; ACS, 2013b; WHO, 2010). There is significant evidence of notable differences in breast cancer survival rates between states, across socioeconomic strata, and between racial and ethnic groups (Liu et al., 2012). However, few studies have systematically examined ethnic disparities among black racial subgroups such as Hispanic and non-Hispanic (Trapido et al., 1990, 1994). The present study deals with two ethnic groups within the black race and assesses the outcome of breast cancer. The study also proposes an alternative methodology for comparing survival times between these two ethnicities. Statistical methods were used to describe the survival or lifetime data for the patients, and inferences were drawn for the two ethnicities. To compare the survival times of the ethnicities with respect to the variability of the data, the half-normal probability model was used. The maximum likelihood method was used to measure the variability in the data from both ethnicities. In the K-M survival estimates, BNH had a median survival time of 66 months (95% CI: 58-71), compared with 49.5 months (95% CI: 45-60) for BH. In Figure 5, the BNH and BH survival curves separate at around the 50-month time point, and this significant gap between the two curves persists until the end. The hazard ratio suggests that BNH breast cancer cases have an approximately 24% lower hazard of death than BH breast cancer cases. The survival curve plotted from the K-M estimate was similar to the half-normal model, a right-skewed distribution. The x-axis, denoting survival time, in Figures 4 and 5 showed a similar range and significant differences.
In the selected sample, most of the BNH cases came from the Michigan registry (38.8%), and most of the BH cases came from the Georgia registry (31.5%). Both states are part of the eastern region of the United States, but their geographical locations differ. In our sample, the two least represented registries were Utah, with 0.4% of BNH cases, and Hawaii, with 0.3% of BH cases. The results suggest that BH cases are diagnosed at least 3 years earlier than BNH cases. Some previous studies have also reported that BH women are generally diagnosed at an earlier age than other ethnicities and that the risks of stage II and stage III breast cancer did not differ significantly across non-Hispanic black, Hispanic white, and Hispanic black women (Banegas and Li, 2012; Ooi et al., 2011). In this study, the PDF of the half-normal distribution suggests that BH women were at a higher risk of dying from breast cancer. Although age at diagnosis was statistically lower in BH cases, it did not translate into longer survival. Two possible explanations are that BH women develop more aggressive forms of breast cancer than BNH women and that breast cancer disparities within this subgroup are significantly greater than in the BNH population.
Our study has some limitations. The data were obtained from the SEER database, and some cases may have been misclassified in terms of race and ethnicity. The sample size differed for each ethnicity: BNH had a sample size of 1,000, whereas BH had only 300, of which two cases with missing information were excluded, leaving 298 for analysis. The cancer registries selected by stratified random sampling covered cases from the eastern and western coasts of the U.S., and cases from the central part were underrepresented.
The PDF of the half-normal distribution was positively skewed and suggested better survival for BNH cases. However, different sampling techniques may produce different sigma (σ) values depending on the selection of sample data points, resulting in a different graph. The Kaplan-Meier curve was also right skewed and similar to the positively skewed curve of the half-normal model. The Kaplan-Meier estimate showed that BNH had a longer survival time than BH. Although the difference in mean survival time was statistically significant, stratified random sampling with larger samples may change the direction of the result.
Our analysis showed that disparities in survival times differ between and within ethnic groups. Increased efforts should be used to address the care and diagnosing needs of black Hispanics in order to increase survival times. Black non-Hispanics having a later diagnosis time may also need to be addressed in order to ensure diagnosis is occurring in earlier stages of the disease.