Identification of Genetic and Non-genetic Risk Factors for Nasopharyngeal Carcinoma in a Southeast Asian Population

Introduction
Nasopharyngeal carcinoma (NPC) constitutes 75-95% of nasopharyngeal cancers in low-risk populations and almost all of those in high-risk populations (Whelan and Ferlay, 1992). Globally, NPC is considered a relatively rare disease, with an age-standardized incidence rate in both sexes of less than 1 per 100,000 persons per year, accounting for only ~0.7% of the global cancer burden (Jemal et al., 2011). However, NPC clusters in Southern Chinese and Southeast Asian populations (Clifford, 1970; Vokes et al., 1997; Yu and Yuan, 2002; Yoshizaki et al., 2012). Malaysia, located in Southeast Asia, has one of the highest NPC incidence rates in the world, together with two other Southeast Asian countries, Indonesia and Singapore (Whelan and Ferlay, 1992). In Malaysia, NPC is the fifth most common cancer nationwide (4.5% of all cancer cases), with age-standardized incidence rates of 8.5 and 2.6 per 100,000 in males and females, respectively (Zainal, 2006). NPC is the most prevalent cancer among young adult males (aged 15-49 years) and is co-dominant with colorectal, lung, prostate and liver cancers in the older age group (≥50 years).
Findings from epidemiological studies suggest that genetic predisposition, environmental risk factors and Epstein-Barr virus (EBV) infection may all play important roles in the development of NPC (Zheng et al., 1994; Zhang et al., 2004; Yoshizaki et al., 2012). However, the near-ubiquitous EBV infection and other environmental risk factors cannot fully explain the geographical and ethnic clustering of NPC, which suggests a strong genetic contribution to NPC carcinogenesis (Serraino et al., 2005). Studies of genetic predisposition to NPC, such as genome-wide association studies (GWAS), have reported some early successes (Ng et al., 2009; Tse et al., 2009; Bei et al., 2010).
Epidemiological studies on the associations of genetic and non-genetic factors with NPC have provided insights into its etiology and will benefit the construction of cancer risk prediction models. Risk prediction has great potential to benefit the public, because individuals at high risk of developing cancer can be offered intervention and prevention strategies (Spitz et al., 2007). Many statistical models built on both genetic and non-genetic risk factors have been developed to assess and manage cancer risk, particularly for breast, colorectal, ovarian and prostate cancers (Taplin et al., 1990; Hartge et al., 1994; Eastham et al., 1999; Rockhill et al., 2001; Selvachandran et al., 2002; Imperiale et al., 2003; Tice et al., 2005). However, such tools remain scarce for head and neck cancers, including NPC (Jiang and Liu, 2009; Bosch et al., 2011; Jin et al., 2012). This may be because NPC is uncommon globally compared with other cancer types, despite its geographical and ethnic clustering. In addition, the genetic markers associated with NPC have not yet been fully explored and catalogued.
In the present study, we conducted a case-control association study with the specific aim of identifying and confirming both genetic and non-genetic risk factors associated with NPC in a Southeast Asian population in Malaysia.

Study population
NPC and control subjects were recruited from four hospitals in Malaysia: Hospital Putrajaya (Putrajaya), Hospital Sultan Ibrahim (Johor Bahru), Hospital Kuala Lumpur (Kuala Lumpur), and Beacon International Specialist Centre (Selangor). Only individuals who had never been diagnosed with any cancer and had no family history of NPC were included as controls. Demographic data and information on the known clinical risk factors for NPC were collected by face-to-face interview.
The risk factors included personal and family history of cancer, exposure to occupational hazards (daily or near-daily exposure to ionizing radiation, heavy metals, fumes, wood dust, and volatile chemicals), cigarette smoking, alcohol drinking, and dietary intake of fruits and vegetables, red meat, and salt-cured food (Chang and Adami, 2006). In keeping with Malaysian dietary habits, each food category was quantified as its average proportion of total daily food intake over the past 10 years, rather than by the commonly used portion sizes. Salt-cured food intake was measured as the average number of meals per month that included salt-cured food over the past 10 years, and regular consumers of salt-cured food were defined as those who ate it at least once a month.
Informed consent was obtained from all subjects. The study protocol conformed to national ethics guidelines and was approved by the Medical Review & Ethics Committee (MREC), Ministry of Health, Malaysia (Registration ID: NMRR-10-652-6473).

DNA isolation and processing
Blood samples (1-5 ml) were collected at the time of recruitment. Genomic DNA was extracted using the QIAamp DNA Blood Mini Kit (Qiagen, Valencia, CA, USA). DNA quantity and purity were determined using a SmartSpec™ Plus spectrophotometer (Bio-Rad, Inc.), and DNA integrity was assessed by 0.8% agarose gel electrophoresis followed by visual inspection.

SNP selection and genotyping
A literature search identified a total of 768 candidate SNPs previously reported to be associated with various cancer types in East, Southeast and South Asian populations. The cancer types were breast, cervical, colorectal, gastric, liver, lung, NPC, oral, ovarian, prostate and thyroid cancers, as well as leukemia. Genotyping was performed using the Illumina GoldenGate Genotyping Assay platform (Illumina Inc., San Diego, USA) according to the manufacturer's protocols and recommendations. Custom genotyping probes were designed and submitted to Illumina Inc. for quality control and assessment using the Assay Design Tool (ADT) (Illumina Inc., San Diego, USA). All 768 SNPs achieved designability scores of 0.5 or 1.0. BeadChips were scanned using the BeadArray Reader system (Illumina Inc., San Diego, USA). Raw scan data were decoded and quality-checked using GenomeStudio software (version 2011.1, Illumina Inc., San Diego, USA).

Statistical analysis
The average call rate was >99% per sample and 93% per SNP; SNPs with a call rate <93% were excluded from further analysis. Odds ratios (ORs) and 95% confidence intervals (CIs) were calculated using STATA version 10 (StataCorp, Texas). Statistical significance was defined as p value<0.05. All SNPs were tested for deviation from Hardy-Weinberg equilibrium (HWE) using the chi-square (χ²) test, and SNPs that deviated from HWE (P_HWE<0.05) were excluded. The association of each SNP with NPC susceptibility was further analyzed under the best-fitting dominant or recessive genetic model, as previously described (Lerman, 1996). Linkage disequilibrium (LD) was analyzed using JLIN (Java Linkage disequilibrium plotter) software version 1.6.0 (Carter et al., 2006).
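The per-SNP quality-control steps described above (call-rate filtering followed by the HWE χ² test) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: genotypes are assumed to be coded as 0, 1 or 2 copies of one allele with None marking a failed call, and the names `hwe_chi_square` and `passes_qc` are hypothetical. The 1-degree-of-freedom p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)). The text does not say whether HWE was tested in controls only or in all subjects, so the sketch simply tests whatever genotype list it is given.

```python
import math
from typing import Optional, Sequence, Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Pearson chi-square test for Hardy-Weinberg equilibrium (1 degree of freedom).

    n_aa, n_ab, n_bb are the observed counts of the three genotypes.
    Returns (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped samples")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Skip cells with zero expectation (only possible for a monomorphic SNP,
    # in which case the observed count in that cell is also zero).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def passes_qc(genotypes: Sequence[Optional[int]],
              min_call_rate: float = 0.93,
              hwe_alpha: float = 0.05) -> bool:
    """Hypothetical per-SNP filter: keep if call rate >= min_call_rate and P_HWE >= hwe_alpha.

    Genotypes are coded 0 (AA), 1 (AB), 2 (BB); None marks a failed call.
    """
    if not genotypes:
        return False
    called = [g for g in genotypes if g is not None]
    if len(called) / len(genotypes) < min_call_rate:
        return False  # excluded: call rate below threshold
    _, p_hwe = hwe_chi_square(called.count(0), called.count(1), called.count(2))
    return p_hwe >= hwe_alpha  # excluded if P_HWE < alpha
```

For example, counts of 25/50/25 match HWE exactly (χ² = 0, p = 1), whereas 50/0/50 gives χ² = 100 and would be excluded. The default 93% threshold simply mirrors the value stated in the text.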

Results
The 96 NPC cases and the controls were matched by age, ethnicity and gender (Table 1). None of the controls had a family history of NPC or a personal history of any cancer. The mean age was 49.1 years (±10.6) for the controls and 45.8 years (±12.7) for the cancer subjects.
A total of 40 SNPs were significantly associated with NPC risk (p value<0.05; Table 2), of which 25 were best fitted by a recessive genetic model (Table 3). The ORs of these 40 SNPs ranged from 0.07 to 8.02 (Table 2), with 22 SNPs conferring risk (OR>1) and the remaining 18 having a protective effect (OR<1). Of the 40 SNPs, 11 reached a higher level of statistical significance (p value<0.01; shown in bold in Table 2). We next analyzed LD among the 40 SNPs. Three SNPs in the MMP2 gene on chromosome 16 (rs243842, rs243844 and rs243845) were in perfect LD (r²=1.000), while 2 SNPs in the TP53BP1 gene on chromosome 15 (rs2602141 and rs560191) were in high LD (r²=0.979, data not shown). None of the remaining SNP pairs showed strong LD (r²<0.8, data not shown).
Several known non-genetic risk factors for cancer were analyzed for their associations with NPC risk. Four of them were significantly associated with NPC: exposure to occupational hazards (OR=28.3, 95% CI=3.91-1204, p value<0.0001), low dietary intake of fruits and vegetables (OR=14.3, 95% CI=3.67-79.6, p value<0.0001), a diet high in red meat (OR=6.13, 95% CI=2.02-20.7, p value=0.0003), and regular consumption of salt-cured food (OR=3.43, 95% CI=1.15-11.0, p value=0.0129) (Table 3). However, the associations of cigarette smoking and alcohol consumption with NPC did not reach statistical significance (Table 3).
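The ORs reported above are derived from 2×2 tables of exposure by case status; for a SNP, the genotype is first collapsed into a binary exposure under the chosen genetic model. The sketch below illustrates that step under stated assumptions: genotypes are coded as 0, 1 or 2 copies of the allele of interest, the helper names `collapse`, `odds_ratio_ci` and `snp_odds_ratio` are hypothetical, and the 95% CI uses Woolf's logit method with a Haldane 0.5 correction for empty cells. The text does not state which CI method was used in STATA, and the reported intervals do not appear to be symmetric on the log scale around the point estimate (e.g. √(3.91 × 1204) ≈ 68.6, not 28.3), so a different method (for example an exact one) was probably used; this sketch would not necessarily reproduce those intervals, and it does not implement the model-selection step.

```python
import math
from typing import Optional, Sequence, Tuple


def collapse(genotype: int, model: str) -> int:
    """Map a genotype (0, 1 or 2 copies of the allele of interest) to a binary exposure."""
    if model == "dominant":
        return int(genotype >= 1)   # carriers (1 or 2 copies) vs non-carriers
    if model == "recessive":
        return int(genotype == 2)   # homozygous carriers vs all others
    raise ValueError(f"unknown genetic model: {model!r}")


def odds_ratio_ci(a: float, b: float, c: float, d: float,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio with a Woolf (logit) confidence interval.

    a = exposed cases, b = unexposed cases, c = exposed controls, d = unexposed controls.
    """
    if 0 in (a, b, c, d):
        # Haldane correction so the OR and its log standard error are finite.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


def snp_odds_ratio(cases: Sequence[Optional[int]],
                   controls: Sequence[Optional[int]],
                   model: str) -> Tuple[float, float, float]:
    """OR and 95% CI for one SNP under a dominant or recessive model (missing calls dropped)."""
    case_exp = [collapse(g, model) for g in cases if g is not None]
    ctrl_exp = [collapse(g, model) for g in controls if g is not None]
    a, c = sum(case_exp), sum(ctrl_exp)
    b, d = len(case_exp) - a, len(ctrl_exp) - c
    return odds_ratio_ci(a, b, c, d)
```

With toy genotypes, cases [2, 2, 1, 0] and controls [0, 0, 1, 2] give an OR of 3.0 under both the dominant (3/1 vs 2/2) and the recessive (2/2 vs 1/3) coding; in practice the two models usually give different ORs, which is why a best-fitting model is chosen per SNP.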

Discussion
Compared with other major cancer types, studies of genetic predisposition to NPC are lacking, even in endemic areas. In the present study, we demonstrated the association of 40 SNPs with NPC in a Southeast Asian population. A search of the catalog of published GWAS revealed that, of the 11 documented NPC-associated SNPs (rs1412829, rs1572072, rs189897, rs2517713, rs28421666, rs2860580, rs2894207, rs29232, rs3129055, rs6774494, and rs9510787), 3 SNPs (rs29232, rs2517713, and rs3129055) were statistically significant in the present study (Supplemental Table S1) (Ng et al., 2009; Tse et al., 2009; Bei et al., 2010; Hindorff et al., 2011). A fourth SNP, rs189897, was included in our screening panel, but its association did not reach statistical significance (data not shown). The remaining 7 SNPs from the GWAS catalog were not included in our original 768-SNP panel, so their associations could not be assessed. In addition to the 3 SNPs identified by GWAS, associations of rs16896923 and rs5009448 with NPC have also been reported previously in a Han Chinese population (Tse et al., 2009). Interestingly, all 5 of these previously reported NPC-associated SNPs are located in chromosomal region 6p21.3, and the associations of rs2517713, rs29232 and rs5009448 with NPC reached a higher statistical significance of p value<0.01 (Table 1). Mapping of chromosome 6p21.3 had previously been achieved by a positional cloning approach (Lu et al., 2003), and associations of a number of variants within 6p21.3 with NPC risk had been described in a Taiwanese population (Lu et al., 2005).
The known environmental and lifestyle risk factors for NPC include Epstein-Barr virus infection, occupational exposure to wood dust, consumption of salt-cured foods, cigarette smoking and alcohol drinking, whereas a diet high in fruits and vegetables may reduce the risk of NPC (Armstrong et al., 1983; Yu et al., 1986, 1988; Armstrong et al., 2000; Yuan et al., 2000; Chang and Adami, 2006; Ekburanawat et al., 2010). Our data agree with these previous reports, except for cigarette smoking and alcohol drinking, neither of which reached statistical significance (Table 3). In addition to wood dust, however, the occupational hazards recorded in this study also included ionizing radiation, heavy metals, fumes, and volatile chemicals. Because of the small sample size, the contribution of each individual type of hazard could not be determined. It would be worthwhile to investigate the contribution of each occupational hazard in a larger sample in the future.
The major limitation of the present study is its sample size. The power of a moderate-sized study decreases as allele frequency and effect size decrease. Nevertheless, selecting cases and controls based on a family history of the disease can considerably increase the power of a case-control association study, because including cases with affected relatives reduces the required sample size and thus the cost of such studies (Peng et al., 2010). Among the NPC subjects in this study, 45.9% reported a family history of cancer, including 14.6% with a family history of NPC. In contrast, only 6.3% of the control subjects had a family history of any cancer, and none had a family history of NPC (Table 1). Hence, the negative impact of the small sample size was minimized in this study, as case and control subjects were selected according to their family history of cancer. This is supported by the fact that the known environmental and dietary risk factors for NPC, as well as 3 of the 4 previously reported GWAS SNPs included in the panel, were successfully identified in this study.
The small sample size also made stratified analysis by ethnicity difficult. The subjects of this study were Malaysians of self-identified Malay and Chinese ethnicity (Table 1). However, ethnic self-identification does not necessarily translate into genetic differences at the population level. A previous genetic study demonstrated high similarity between the two ethnic groups, and between them and other East Asian populations (Teo et al., 2009).
In NPC-endemic areas such as Malaysia, a risk prediction method would be useful for identifying high-risk individuals and managing their risk. Such a tool can be realized only if both the genetic and non-genetic risk factors for NPC are identified and studied. It is hoped that the risk factors identified here will not only lead to a better understanding of NPC but also steer research towards better approaches to NPC risk prediction and management.