Inferring Single Nucleotide Polymorphisms in MicroRNA Binding Sites of Lung Cancer-related Inflammatory Genes

RESEARCH ARTICLE
Fei He 1, Ling-Ling Zheng 2, Wen-Ting Luo 3, Rong Yang 4, Xiao-Qin Xu 5, Lin Cai 1*

Introduction
Since the pathologist Rudolf Virchow first noted that inflammatory cells are present within tumors (Coussens et al., 2002), epidemiological evidence has accumulated that inflammation mediates oncogenesis and that chronic inflammation contributes to about 25% of all human cancers (Hussain et al., 2007). Inflammation may induce carcinogenesis through the following mechanism (Ohnishi et al., 2013): reactive oxygen species (ROS) and reactive nitrogen species (RNS), harmful endogenous genotoxic substances generated by inflammatory and epithelial cells, cause oxidative and nitrative DNA damage and are involved in the initiation and/or promotion of inflammation-mediated carcinogenesis.
Lung cancer is the most common malignant tumor and the leading cause of cancer death worldwide (Jemal et al., 2011). Numerous laboratory and clinical studies, as well as a recent review, have reported a relationship between inflammation and lung cancer (Cho et al., 2011; Hattar et al., 2013). Although the exact mechanism by which lung inflammation leads to carcinogenesis is not known, one hypothesis (Houghton et al., 2008) holds that chronic airway inflammation alters the bronchial epithelium and the lung microenvironment, and that the resulting changes in the expression of oncogenes and tumor suppressor genes may lead to neoplastic transformation.
Pulmonary inflammation is a risk factor that promotes the development of lung cancer. Several kinds of conditions bring about lung inflammation, including tobacco smoke, occupational hazards and pathogen infections. Cigarette smoke contains large amounts of carcinogens; it modulates inflammation and promotes chronic inflammation in the conducting airways by impairing innate host defense mechanisms (Lee et al., 2009; Lee et al., 2012). Pathogens are another cause of inflammation. Infection triggers an inflammatory response that is part of normal host defense and precedes tumor development. However, tumorigenic pathogens subvert host immunity and establish persistent infections associated with low-grade but chronic inflammation, which in turn dysregulates inflammatory cytokines and transcription factors (Grivennikov et al., 2010).
Lung cancer cells use inflammatory cytokines and their receptors for tumor growth, invasion, migration and metastasis. MicroRNAs (miRNAs) are endogenous, non-coding, single-stranded RNAs about 21-22 nucleotides long that regulate translation and modulate gene expression at the post-transcriptional level in most biological and pathological processes (Lagos-Quintana et al., 2001). Single nucleotide polymorphisms (SNPs) in the miRNA seed sequence are thought to have a higher probability of affecting miRNA function (Saunders et al., 2007). We therefore hypothesized that SNPs located in the binding sites between miRNAs and mRNAs (usually in the 3' untranslated region, UTR) of genes in inflammation signaling pathways may weaken or reinforce miRNA binding, thereby modifying miRNA targeting activity, altering the expression of the target genes and modulating the level of inflammation in response to various inflammatory stimuli. Many studies have evaluated possible associations between lung cancer and polymorphisms (Chen et al., 2013; Cheng et al., 2013; Zu et al., 2013; Kim et al., 2014).
In this study, we selected lung cancer-related genes belonging to inflammation pathways that respond to microorganisms and cigarette smoking, and aimed to catalogue SNPs that might affect the expression levels of these target genes. The catalogue is intended to provide data for follow-up studies on susceptibility or prognosis and for functional verification, and to build evidence for the diagnosis and treatment of lung cancer.

Selection of candidate genes
We focused on lung cancer-related genes that participate in the inflammatory response to tobacco smoking and to microorganisms such as Chlamydia pneumoniae, Mycobacterium tuberculosis and human immunodeficiency virus (HIV). Candidate genes were retrieved from the SABiosciences website (http://www.sabiosciences.com/Cytokines_Inflammation.php); from BioCarta and KEGG pathways, two common databases that display gene interactions for human cellular processes (http://cgap.nci.nih.gov/Pathways); and from the literature (Shen et al., 2011; Yu et al., 2011; McMillan et al., 2011; Lee et al., 2012), in which they are commonly acknowledged as important inflammatory genes associated with smoking.

Procedure
For each gene, we proceeded as follows:
1) The 3'UTR was identified using the UCSC Genome Browser.
2) Putative miRNA-binding sites within the 3'UTR were identified with the five specialized algorithms listed as materials 1-5.
3) Polymorphisms falling within the miRNA-binding sites identified in step 2 were searched for in the SNP database.
4) Polymorphisms were also predicted directly by entering the gene name into the websites listed as materials 6-8.
5) SNPs were selected on the basis of the frequencies reported for the Chinese population, and SNPs with a minor allele frequency (MAF) lower than 0.05 were excluded, because we plan to carry out a case-control association study in this ethnic group.
6) The algorithm listed as material 11 was run to assess the binding free energy (ΔG, Gibbs free energy, expressed in kJ/mol) for both the common and the variant alleles identified in steps 3 and 4.
7) The ability of each SNP to affect the binding of a miRNA to its target was evaluated as the variation of ΔG (ΔΔG), that is, the difference in binding free energy between the two alleles, which was used as the parameter for assessing the impact of each polymorphism on a given miRNA target site.
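The MAF filter and the ΔΔG calculation in the last steps of the procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the function names and the sign convention (variant minus common allele) are hypothetical, since the text specifies only that ΔΔG is the difference between the two alleles and that SNPs with MAF below 0.05 are excluded.

```python
from dataclasses import dataclass

MAF_CUTOFF = 0.05  # from the text: SNPs with MAF lower than 0.05 are excluded


@dataclass
class SitePrediction:
    """One SNP in one predicted miRNA-binding site (hypothetical layout)."""
    snp_id: str
    mirna: str
    maf: float         # minor allele frequency in the study population
    dg_common: float   # binding free energy with the common allele (kJ/mol)
    dg_variant: float  # binding free energy with the variant allele (kJ/mol)


def passes_maf(pred, cutoff=MAF_CUTOFF):
    """Keep the SNP only if its MAF is not lower than the cutoff."""
    return pred.maf >= cutoff


def delta_delta_g(pred):
    """ΔΔG as variant minus common allele; the sign convention is an assumption."""
    return pred.dg_variant - pred.dg_common


# Illustrative values only, not data from the study.
example = SitePrediction("rs_demo", "miR-demo", maf=0.20,
                         dg_common=-60.0, dg_variant=-50.0)
if passes_maf(example):
    print(example.snp_id, example.mirna, delta_delta_g(example))
```

Because the later ranking uses absolute values of ΔΔG, the choice of sign convention does not change the priority list.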

Results
In total, 335 inflammation-related candidate genes were retrieved from the SABiosciences website, the BioCarta and KEGG pathways and the literature; they include genes that respond to stimulation by tobacco smoke and microorganisms.
Currently, it is difficult to judge which algorithm produces the most reliable or sensitive target predictions. We therefore took the union of the results from the five algorithms to obtain a more reliable gene set. Among the 335 candidate genes, 149 had miRNA target sequences in their 3'UTRs as predicted by the five algorithms. For 32 SNPs, no binding free energy was obtained at the miRNA target binding site. The remaining 142 genes carried 269 SNPs in the predicted miRNA-binding sites that met the selection criteria (for lack of space, Table 1 lists only the SNPs with |ΔΔG| average ≥2.37 kJ/mol).
Some genes were predicted to be targeted by several miRNAs, whereas others were predicted to be targeted by only one. To account for these differences, we used the average of the absolute values of ΔΔG for each SNP (|ΔΔG| average) as the parameter for predicting the biological impact of each polymorphism. To produce a priority list of SNPs with an impact on miRNA binding, we ranked the |ΔΔG| average values and classified the SNPs into three groups using quartile cut-offs. The first grade (|ΔΔG| average ≥2.37 kJ/mol) comprises SNPs with a predicted high impact on the biology of the miRNA binding sites. The second grade (0.60 ≤ |ΔΔG| average <2.37 kJ/mol) comprises SNPs with a predicted mild biological activity, whereas the last grade (|ΔΔG| average <0.60 kJ/mol) comprises SNPs that probably have the weakest activity.
Among the 269 SNPs, 202 miRNAs were predicted to bind at sites containing more than one SNP (Table 2). For example, CDK6 rs42377 and rs42035 were both predicted to lie in binding sites of hsa-miRNA-144; GREM1 rs17816260 and rs33963919 in binding sites of hsa-miRNA-190; EDARADD rs61737025 and rs61736989 in binding sites of hsa-miRNA-329; and PRLR rs379899 and rs371913 in binding sites of hsa-miRNA-3616-5p, hsa-miRNA-4795-5p and hsa-miRNA-573.
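Taking the union of predictions from several algorithms and then locating SNPs inside the pooled sites can be illustrated as follows. This is a minimal sketch, not the pipeline used in the study: the tuple layouts, the tool labels, the coordinates and the function names are all hypothetical.

```python
def union_of_sites(predictions_by_tool):
    """Pool binding sites predicted by any tool.

    predictions_by_tool maps a tool label to a set of
    (gene, mirna, start, end) tuples (assumed layout).
    """
    merged = set()
    for sites in predictions_by_tool.values():
        merged |= set(sites)
    return merged


def snps_in_sites(snps, sites):
    """Return (snp_id, gene, mirna) for every SNP inside a predicted site.

    snps is an iterable of (snp_id, gene, position) tuples; positions and
    site boundaries are assumed to share one coordinate system.
    """
    hits = set()
    for snp_id, gene, pos in snps:
        for site_gene, mirna, start, end in sites:
            if site_gene == gene and start <= pos <= end:
                hits.add((snp_id, gene, mirna))
    return sorted(hits)


# Illustrative values only, not data from the study.
predictions = {
    "tool_a": {("GENE1", "miR-a", 100, 121)},
    "tool_b": {("GENE1", "miR-a", 100, 121), ("GENE1", "miR-b", 300, 322)},
}
snps = [("rs_demo_1", "GENE1", 110), ("rs_demo_2", "GENE1", 250)]
print(snps_in_sites(snps, union_of_sites(predictions)))
```

A union keeps every site reported by at least one tool, which favors sensitivity; an intersection would be the stricter alternative.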
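The averaging and grading described above can be sketched as follows. The cut-offs 2.37 and 0.60 kJ/mol come from the text; the record layout, the function names and the handling of values that fall exactly on a cut-off are assumptions made for this illustration.

```python
from collections import defaultdict

HIGH_CUTOFF = 2.37  # kJ/mol, first-grade threshold given in the text
LOW_CUTOFF = 0.60   # kJ/mol, boundary between second and last grade


def mean_abs_ddg(records):
    """Average |ΔΔG| per SNP over all miRNAs predicted to bind at its site.

    records is an iterable of (snp_id, mirna, ddg) tuples (assumed layout).
    """
    values = defaultdict(list)
    for snp_id, _mirna, ddg in records:
        values[snp_id].append(abs(ddg))
    return {snp: sum(v) / len(v) for snp, v in values.items()}


def grade(score):
    """Assign a grade; boundary values go to the higher grade (assumption)."""
    if score >= HIGH_CUTOFF:
        return "first"
    if score >= LOW_CUTOFF:
        return "second"
    return "third"


# Illustrative values only, not data from the study.
records = [("rs_demo_1", "miR-a", -4.0), ("rs_demo_1", "miR-b", 2.0),
           ("rs_demo_2", "miR-a", 0.5)]
for snp, score in sorted(mean_abs_ddg(records).items()):
    print(snp, score, grade(score))
```

Averaging absolute values keeps a SNP that strengthens binding to one miRNA and weakens binding to another from cancelling to zero.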
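Finding miRNAs whose predicted binding involves more than one SNP amounts to grouping SNP-miRNA pairs by miRNA. A minimal sketch, using two of the pairings reported above as input; the function name and the pair layout are hypothetical.

```python
from collections import defaultdict


def mirnas_shared_by_snps(pairs):
    """Map each miRNA to its SNPs, keeping only miRNAs with more than one SNP.

    pairs is an iterable of (snp_id, mirna) tuples (assumed layout).
    """
    by_mirna = defaultdict(set)
    for snp_id, mirna in pairs:
        by_mirna[mirna].add(snp_id)
    return {m: sorted(s) for m, s in by_mirna.items() if len(s) > 1}


# Pairings taken from the Results text (CDK6 and GREM1 examples).
pairs = [
    ("rs42377", "hsa-miRNA-144"),
    ("rs42035", "hsa-miRNA-144"),
    ("rs17816260", "hsa-miRNA-190"),
    ("rs33963919", "hsa-miRNA-190"),
]
print(mirnas_shared_by_snps(pairs))
```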

Discussion
In this study, we identified 269 SNPs within the miRNA-binding sites of 142 genes. We first chose 335 genes from inflammatory response pathways, including the toll-like receptor (TLR) signaling pathway, the JAK/STAT signaling pathway, the NF-κB pathway and the NOD-like receptor (NLR) pathway. Such genes play key roles in active and passive immunity, and, whether through the extrinsic or the intrinsic pathway, inflammatory cells and molecules can affect the genomes of cancer cells through a variety of mechanisms. By searching for SNPs in miRNA target sites we then obtained 149 genes, and finally 269 SNPs in 142 genes. We also found that 164 SNPs shared predicted binding miRNAs.
Among these SNPs, some were located in the seed regions of miRNA target sequences, whereas others were located elsewhere. Since the first discovery of a miRNA in 1993, a great number of studies have classified the genetic polymorphisms that affect miRNA regulation through various molecular mechanisms into three categories: polymorphisms within precursor miRNAs (pre-miRNAs), polymorphisms in miRNA-target mRNA sites, and variations in miRNA machinery genes (Mishra et al., 2009). It has been suggested that, because SNPs in miRNA-target mRNA sites are more likely to be under positive selection pressure, they tend to be deleterious, differ among populations and contribute to disease (Saunders et al., 2007). In the current study we focused on SNPs in this region, which should help in evaluating potentially causative SNPs.
Because experiments are expensive and suitable approaches are lacking, many researchers have used bioinformatics prediction and statistical analysis to investigate disease-associated SNPs in miRNA target sites. Some studies used bioinformatics methods to identify sets of SNPs within the miRNA binding sites of genes (Song et al., 2014), and others verified specific SNPs by molecular experiments (Bhat et al., 2013). SNPs in miRNA-binding sites are known to affect miRNA regulation and the expression of miRNAs or their target genes. Many algorithms and databases are now available for predicting miRNA target genes. A rule common to these methods is that the 6-8 nucleotides at the 5' end of the miRNA (the 'seed' sequence) contribute most of the binding free energy of the miRNA-target duplex, and that G:U wobble pairing is permitted in the duplex (John et al., 2004; Tomari et al., 2005). Most studies predict SNPs within miRNA binding sites and assess the potential functions of SNPs in the 3'UTR with well-developed algorithms, based on differences in alignment scores and variations in binding free energy (Betel et al., 2010; Liu et al., 2012). In addition, some studies investigated the effects of SNPs on the secondary structures of miRNA binding sites using RNAfold (Hariharan et al., 2009), and others employed a linear model to assess the effects of SNPs on gene expression phenotypes (Richardson et al., 2011). Using different programs and databases, we obtained different numbers of lung cancer-related inflammatory genes with miRNA-binding sites. Because it is difficult to judge, without experiments, which SNPs or miRNAs are more likely to play a role in lung cancer development, we combined the results from eight databases to increase the accuracy of the analysis.
In recent years, more and more databases have been used to explore SNPs in miRNA-binding sites, including miRanda, PicTar, MirSNP, TargetScan Human, miRNASNP, Patrocles, DIANA-microT and the PolymiRTS Database. The more databases that predict a SNP, the more likely it is to be a true target SNP (Song et al., 2014).
Many studies have focused on the association between diseases and SNPs in the miRNA-binding sites of specific genes, some addressing susceptibility and others prognosis. Genome-wide association studies (GWAS) have reported that scores of disease-related SNPs lie in non-coding regions. Such associations may arise through still unknown mechanisms or through linkage disequilibrium (LD) with functional polymorphisms; regulation of target genes by miRNAs may be one such mechanism. Starting from a list of GWAS SNPs and SNPs in strong LD with them, Richardson et al. (2011) performed a genome-wide scan for SNPs that abrogate or create miRNA recognition element (MRE) seed sites (MRESS) and identified high-priority candidate SNPs for functional studies and disease risk prediction. Other researchers (Wu et al., 2011), having found a relationship between diseases and the expression of specific genes or miRNA levels, reasoned that SNPs within miRNA target sites may influence the encoded target mRNAs and their downstream effectors; they predicted potential MREs in the 3'UTRs of these genes and tested whether the variants play a key role in regulating gene expression. At present, all studies of gene variation in miRNA-binding regions rely on several kinds of algorithms or on various websites. However, there is no uniform standard or procedure for selecting SNPs, so predictions need to be evaluated in association studies and confirmed by functional research. We will therefore carry out further investigations to verify these SNPs.
Among our results, some SNPs have been reported to be associated with cancer or other diseases; by changing the target sequence, they may alter binding to different miRNAs and thereby affect gene function. This paper provides the basis for a reasoned, algorithm-driven selection of SNPs. It is important to stress that all of the predicted polymorphisms require future investigations to validate these results in well-characterised populations by functional assays or case-control association studies. The proposed approach could ease the identification of functionally relevant SNPs and reduce the workload and costs.
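The idea that support from more databases makes a SNP a more likely true target can be expressed as a simple vote count. The sketch below is illustrative only: the function name, the input layout and the SNP identifiers are hypothetical, and only the database names are taken from the text.

```python
from collections import Counter


def rank_by_support(hits_by_database):
    """Rank SNPs by the number of databases that predict them.

    hits_by_database maps a database name to an iterable of SNP ids
    (assumed layout). Ties are broken alphabetically by SNP id.
    """
    counts = Counter()
    for snps in hits_by_database.values():
        counts.update(set(snps))  # each database votes at most once per SNP
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Illustrative values only, not data from the study.
hits = {
    "MirSNP": ["rs_demo_1", "rs_demo_2"],
    "PolymiRTS": ["rs_demo_1"],
    "miRNASNP": ["rs_demo_1", "rs_demo_3"],
}
for snp, support in rank_by_support(hits):
    print(snp, support)
```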