Bioinformatics Analysis Reveals Connection of Squamous Cell Carcinoma and Adenocarcinoma of the Lung


Introduction
Lung cancer is the second most common type of cancer in the world and the most common cause of cancer-related death in men and women; it is responsible for 1.3 million deaths annually worldwide (Yildiz et al., 2011). Squamous cell carcinoma (SCC) and adenocarcinoma (AC) are the major histological types of non-small cell lung cancer (NSCLC). Because they differ in histopathological and clinical characteristics and in their relationship with smoking, their etiologies may be different; for example, a different tumor suppressor gene may be related to the genesis of each type. Accounting for 25% of lung cancers (Travis, 2002), squamous cell lung carcinoma usually starts near a central bronchus. A hollow cavity and associated necrosis are commonly found at the center of the tumor. Adenocarcinoma accounts for 40% of non-small-cell lung cancers (Travis, 2002). It usually originates in peripheral lung tissue. Most cases of adenocarcinoma are associated with smoking; however, among people who have never smoked ("never-smokers"), adenocarcinoma is the most common form of lung cancer (Subramanian and Govindan, 2007).
Many genetic alterations associated with lung cancer have been reported to date. These include activation of oncogenes in the myc and ras families (Johnson et al., 1987; Rodenhuis et al., 1987; Gemma et al., 1988; Rodenhuis et al., 1988) and inactivation of several tumor suppressor genes (TP53, CDKN2A, STK11, NF1, ATM, RB1, APC, etc.), along with alterations in tyrosine kinase and related signaling genes (KRAS, EGFR, ERBB, EPHA3, NRAS, KDR, FGFR4, NTRK, etc.) that may function as proto-oncogenes. Further insight into the gene alterations underlying lung cancer has come from the study of signaling pathways. There is a significant excess of mutations and copy number alterations in genes from the MAPK, p53, Wnt, cell cycle and mTOR signaling pathways, suggesting links to the disease (Ding et al., 2008).
Microarray gene expression profiling has recently been used to define prognostic signatures in patients with lung carcinomas (Garber et al., 2001; Miura et al., 2002; Beer et al., 2002; Ramaswamy et al., 2003), and a high-throughput microarray experiment was designed to analyze gene expression patterns and identify potential target genes for AC (Li et al., 2006). However, whereas several studies have investigated gene expression profiles in lung AC, there have been no similarly large studies of SCC. AC and SCC are closely related in humans, especially among smokers. To investigate the similarity and specificity of the gene expression profiles of SCC and AC, we identified the differentially expressed genes of each subtype relative to normal lung and constructed three regulatory networks: the common regulatory network, the AC-specific network and the SCC-specific network. Furthermore, a set of transcription factors (TFs) that regulate the differentially expressed genes, and that may play an important role in the progression of SCC and AC, was identified in our networks.

Affymetrix microarray data
Our data were obtained from Bhattacharjee et al. (2001) and are based on the Affymetrix human U95A oligonucleotide arrays. Only 161 chips are available. The 161 specimens include squamous cell lung carcinoma (n=21), adenocarcinoma (n=123) and normal lung (n=17) specimens. A total of 8742 genes were detected in our experiment.

Pathway data
The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals (Kanehisa, 2002). The 'pathway' database records networks of molecular interactions in cells, and variants of them specific to particular organisms (http://www.genome.jp/kegg/).

Regulation data
There are approximately 2600 proteins in the human genome that contain DNA-binding domains, and most of these are presumed to function as transcription factors (Wachi et al., 2005). The combinatorial use of a subset of the approximately 2000 human transcription factors easily accounts for the unique regulation of each gene in the human genome during development (Brivanlou and Darnell, 2002).
These transcription factors are grouped into five superclasses, based on the presence of conserved DNA-binding domains. The TRANSFAC database contains data on transcription factors, their experimentally proven binding sites and their regulated genes (Wingender, 2008).
The Transcriptional Regulatory Element Database (TRED) was built to meet the growing need for an integrated repository of both cis- and trans-regulatory elements in mammals (Jiang et al., 2007). TRED curates transcriptional regulation information, including transcription factor binding motifs and experimental evidence. The curation currently focuses on the target genes of 36 cancer-related TF families.
Combining the two regulation datasets, we collected a total of 7201 regulatory relationships between 375 TFs and 2650 target genes.

Differentially expressed genes (DEGs) analysis
The t-test was used to identify DEGs. To circumvent the multiple-testing problem, which might produce too many false positive results, the BH method (Fromm et al., 1986) was used to adjust the raw P-values into false discovery rates (FDR). The original expression datasets from all conditions were processed into expression estimates using RMA with quantile normalization, with the default settings implemented in the affy package in R, and a linear model was then constructed. Only DEGs with a fold change larger than 1.2 and an FDR less than 0.05 were selected.
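The selection rule above can be sketched as follows. This is a minimal illustration, not the authors' R pipeline: it assumes raw P-values and fold changes have already been computed, it takes BH to mean the standard Benjamini-Hochberg step-up adjustment, and it assumes the fold-change cutoff applies symmetrically to up- and down-regulated genes. The function names `bh_adjust` and `select_degs` are hypothetical.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, pvalues, fold_changes, fc_cutoff=1.2, fdr_cutoff=0.05):
    """Keep genes with FDR < fdr_cutoff and fold change beyond fc_cutoff.

    Assumption: fold_changes are positive ratios (tumor / normal), and a
    ratio below 1 is treated as a down-regulation of magnitude 1 / ratio.
    """
    fdr = bh_adjust(pvalues)
    selected = []
    for gene, q, fc in zip(genes, fdr, fold_changes):
        magnitude = max(fc, 1.0 / fc)
        if q < fdr_cutoff and magnitude > fc_cutoff:
            selected.append(gene)
    return selected


# Toy input with placeholder gene labels (not data from the study).
genes = ["g1", "g2", "g3", "g4", "g5"]
pvalues = [0.001, 0.01, 0.02, 0.2, 0.5]
fold_changes = [2.0, 1.1, 0.5, 3.0, 1.0]
degs = select_degs(genes, pvalues, fold_changes)
```

In this toy input, g2 passes the FDR cutoff but fails the fold-change cutoff, and g4 has a large fold change but fails the FDR cutoff, so both filters are needed.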

Pathway enrichment analysis
To facilitate the functional annotation and analysis of the large lists of genes in our networks, the Database for Annotation, Visualization and Integrated Discovery (DAVID; Huang da et al., 2009) bioinformatics resources were used. DAVID consists of an integrated biological knowledgebase and analytic tools aimed at systematically extracting biological meaning from large gene/protein lists.
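The idea behind pathway enrichment can be illustrated with a plain hypergeometric test. This is a sketch under stated assumptions, not DAVID's implementation: DAVID reports its own modified Fisher statistic (the EASE score), which this example does not reproduce. The function name and the numbers in the usage lines are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_list, n_overlap):
    """One-sided p-value P(X >= n_overlap) for pathway over-representation.

    n_background: genes in the background (e.g. all genes on the array)
    n_pathway:    background genes annotated to the pathway
    n_list:       genes in the list of interest (e.g. one network)
    n_overlap:    list genes annotated to the pathway
    """
    upper = min(n_pathway, n_list)
    total = comb(n_background, n_list)
    # Sum the hypergeometric probabilities of overlaps at least this large.
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 3 in the pathway, a 3-gene list
# in which all 3 genes belong to the pathway.
p_all_three = hypergeom_enrichment_p(10, 3, 3, 3)
```

A small p-value indicates that the pathway contains more genes from the list than expected by chance; in practice the p-values for all tested pathways would also be corrected for multiple testing.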

Regulation network construction
Using the regulation data collected from the TRANSFAC and TRED databases, we matched the relationships between differentially expressed TFs and their differentially expressed target genes. We built the regulatory networks with Cytoscape (Shannon et al., 2003). We obtained 91 regulatory relationships between 21 TFs and their 72 differentially expressed target genes in adenocarcinoma, and 314 regulatory relationships between 39 TFs and their 204 differentially expressed target genes in squamous cell lung carcinoma. By integrating the two networks, 57 overlapping regulatory relationships between 11 TFs and their 50 target genes were obtained.
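The matching and integration steps described above can be sketched with set operations. This is a minimal illustration that assumes regulatory relationships are stored as (TF, target) pairs and that a subtype-specific network consists of the relationships not shared with the other subtype; the text does not state whether the specific networks also include the shared relationships. The function names and the targets "geneA" to "geneC" are hypothetical placeholders.

```python
def build_network(regulations, degs):
    """Keep (tf, target) pairs in which both genes are differentially expressed."""
    return {(tf, target) for tf, target in regulations
            if tf in degs and target in degs}


def split_networks(ac_edges, scc_edges):
    """Return (common, AC-specific, SCC-specific) sets of relationships."""
    common = ac_edges & scc_edges
    return common, ac_edges - common, scc_edges - common


# Toy regulation data; TF names are from the text, targets are placeholders.
regulations = {
    ("TP53", "geneA"),
    ("HNF4A", "geneB"),
    ("CREB1", "geneC"),
    ("SP3", "geneA"),
}
ac_degs = {"TP53", "HNF4A", "geneA", "geneB"}
scc_degs = {"TP53", "CREB1", "SP3", "geneA", "geneC"}

ac_net = build_network(regulations, ac_degs)
scc_net = build_network(regulations, scc_degs)
common, ac_specific, scc_specific = split_networks(ac_net, scc_net)
```

The resulting edge sets can be written out as two-column tables and imported into Cytoscape for visualization.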

Differentially expressed genes (DEGs) selection
After microarray analysis, the differentially expressed genes with a fold change larger than 1.2 and an FDR less than 0.05 were selected. We found 1376 DEGs between normal lung specimens and adenocarcinoma, and 2292 DEGs between normal lung specimens and squamous cell lung carcinoma. A total of 1050 DEGs overlapped between the two sets, of which 1039 changed in the same direction in both subtypes. PHF3, HAS2, STX1A, PRPF4, NCAPD2, CEL, GPSN2, NMU, LRP10 and APOBEC3F were expressed at a low level in adenocarcinoma but at a high level in squamous cell lung carcinoma. Only GABBR2 had high expression in adenocarcinoma and low expression in squamous cell lung carcinoma. The expression values of PHF3, HAS2, STX1A, PRPF4, NCAPD2 and GABBR2 in the three specimen types are shown in Figure 1. These opposite expression patterns suggest that these genes may play an important role in lung carcinoma. There are 326 genes found only in AC and 1242 genes found only in SCC, which suggests that although the pathogenesis of AC and SCC is largely similar, each subtype also has its own specificity.
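The comparison of the two DEG lists can be sketched as follows, assuming each list is stored as a mapping from gene to direction of change relative to normal lung (+1 for higher, -1 for lower). The function name and the genes "geneX" and "geneY" are hypothetical; the directions given for GABBR2 and PHF3 follow the text, and TP53 is marked as high in both subtypes as reported in the network analysis.

```python
def compare_degs(ac, scc):
    """Partition two gene -> direction mappings into shared and specific genes."""
    overlap = ac.keys() & scc.keys()
    same = {gene for gene in overlap if ac[gene] == scc[gene]}
    return {
        "overlap": overlap,
        "same_direction": same,
        "opposite_direction": overlap - same,
        "ac_only": ac.keys() - scc.keys(),
        "scc_only": scc.keys() - ac.keys(),
    }


# Toy input: +1 means higher than normal lung, -1 means lower.
ac = {"GABBR2": +1, "PHF3": -1, "TP53": +1, "geneX": -1}
scc = {"GABBR2": -1, "PHF3": +1, "TP53": +1, "geneY": +1}
result = compare_degs(ac, scc)
```

The counts reported above are consistent with this partition: 1376 - 1050 = 326 AC-only genes, 2292 - 1050 = 1242 SCC-only genes, and 1050 - 1039 = 11 genes changing in opposite directions (the ten genes low in AC and high in SCC, plus GABBR2).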

Regulation network construction and analysis
Because AC and SCC are two subtypes of non-small cell lung cancer, their regulatory networks may share subnetworks. We obtained microarray data from Bhattacharjee et al. (2001). After microarray analysis, the differentially expressed genes with a fold change larger than 1.2 and an FDR less than 0.05 were selected. We obtained 91 regulatory relationships between 21 TFs and their 72 differentially expressed target genes in adenocarcinoma, and 314 regulatory relationships between 39 TFs and their 204 differentially expressed target genes in squamous cell lung carcinoma. There are 57 overlapping regulatory relationships between AC and SCC. By integrating the regulatory relationships above, three regulatory networks (Figures 2-4) were built between TFs and their target genes. As shown in Figure 2, TP53, SP3, PGR, SMAD4 and ESR2, which have higher degrees, form a local network, which suggests that these TFs may play an important role in both AC and SCC. Of these TFs, TP53 and PGR have high expression in the two subtypes, while SP3, SMAD4 and ESR2 have low expression.
There is a specific regulatory network in AC (Figure 3). Compared with the common regulatory network, the AC-specific network adds some relationships involving TP53, ESR2, SP3, etc. The regulatory relationships of HNF4A are found only in the AC-specific network, which suggests that the regulatory relationships involving HNF4A play an important role in the carcinogenesis of AC.
The SCC-specific regulatory network is shown in Figure 4. In this network, there are also some new regulatory relationships involving TP53 and SP3. The most significant difference is that a large number of regulatory relationships involving CREB1, ESR1, RARA and SPI1 were found in this network, and these TFs do not appear in the AC-specific network.
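The degree-based reading of the networks above can be sketched as a simple ranking. This is an illustration that assumes a node's degree is the number of regulatory relationships it takes part in, whether as regulator or as target; the text does not define degree more precisely. The function names and the targets "g1" to "g3" are hypothetical placeholders.

```python
from collections import Counter


def node_degrees(edges):
    """Count how many distinct relationships each node takes part in."""
    degrees = Counter()
    for tf, target in set(edges):
        degrees[tf] += 1
        degrees[target] += 1
    return degrees


def hub_tfs(edges, top_n=5):
    """Return the top_n regulators ranked by degree (ties broken by name)."""
    degrees = node_degrees(edges)
    regulators = {tf for tf, _ in edges}
    ranked = sorted(regulators, key=lambda tf: (-degrees[tf], tf))
    return ranked[:top_n]


# Toy network; TF names are from the text, targets are placeholders.
edges = [
    ("TP53", "g1"), ("TP53", "g2"), ("TP53", "g3"),
    ("SP3", "g1"), ("SP3", "g2"),
    ("PGR", "g3"),
]
top_two = hub_tfs(edges, top_n=2)
```

Applying the same ranking separately to the common, AC-specific and SCC-specific edge sets would show which TFs dominate each network.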

GO analysis of the regulation network
To investigate the functional differences between the AC and SCC networks, we identified the Gene Ontology (GO) categories enriched among the genes in the common regulatory network, the AC-specific regulatory network and the SCC-specific regulatory network. The gene functions of the common regulatory network include regulation of biological quality, regulation of gene-specific transcription, positive regulation of macromolecule metabolic process and so on. The gene functions of the AC-specific regulatory network include positive regulation of transcription from RNA polymerase II promoter, natural killer cell differentiation, T cell differentiation and so on. The gene functions of the SCC-specific regulatory network include cell proliferation, response to organic substance, regulation of multicellular organismal process and so on (Table 1).

DAVID analysis of the regulation network
We used DAVID to perform pathway enrichment analysis of the three regulatory networks. Seven KEGG pathways were enriched among the genes in the common regulatory network, including 'Cell cycle', 'Wnt signaling pathway' and so on. For the AC-specific regulatory network, pathways including 'Cell cycle', 'Wnt signaling pathway', 'ErbB signaling pathway', etc. were enriched; for the SCC-specific regulatory network, the enriched pathways include 'p53 signaling pathway', 'TGF-beta signaling pathway', 'Focal adhesion', 'Jak-STAT signaling pathway' and so on (Table 2).

Discussion
The recent development of cDNA microarray or cDNA chip technology, a high-throughput method of monitoring gene expression, has made it possible to analyze the expression of thousands of genes at once (Ermolaeva et al., 1998; Joseph, 1996; Khan et al., 1998; Schena et al., 1995). Consequently, regulatory networks of TFs and their target genes can now be constructed on the basis of the altered expression of multiple genes in different tumor specimens.
In the study reported here, we analyzed the differentially expressed genes of AC and SCC and constructed three regulatory networks: the common regulatory network, the AC-specific network and the SCC-specific network. Furthermore, gene ontology analysis and DAVID analysis were performed to examine the functional annotation of the genes and the pathways involved in the three networks.
From the common regulatory network between AC and SCC, we found that TP53, SP3, PGR, SMAD4 and ESR2 are hub nodes, and some of them have been shown to be related to lung carcinomas by previous studies (Beattie et al., 1985; Sekido et al., 1998; Toyooka et al., 2003; Zhu et al., 2003; Marquez-Garban et al., 2007). Of these TFs, TP53 and PGR have high expression in the two subtypes, while SP3, SMAD4 and ESR2 have low expression, which suggests that these TFs may play an important role in both AC and SCC. These expression profiles were also found in previous studies of lung cancer (Takahashi et al., 1989; Bodner et al., 1992; Kosaka et al., 2009).
TP53 (also known as p53) is a tumor suppressor gene that encodes a tumor suppressor protein. Tumor protein p53 is a DNA-binding protein that responds to diverse cellular stresses to regulate target genes that induce cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism. TP53 mutations have been identified in lung carcinoma (Le Calvez et al., 2005). Furthermore, the frequency of TP53 mutations in invasive tumors was significantly higher than that in noninvasive ones (Nakanishi et al., 2009). SP3 belongs to a family of 16 different transcription factors containing a highly conserved zinc finger DNA-binding domain (Philipsen and Suske, 1999). SP3 is over-expressed in a variety of cancers, including gastric, colorectal, pancreatic, epidermal, thyroid, breast and lung cancers. Previous studies (Colon et al., 2011) showed that Sp proteins regulate a number of genes involved in cell proliferation, survival and angiogenesis.
HNF4A is a TF that was observed only in the AC-specific network; this suggests that the regulatory relationships involving HNF4A may play an important role in the carcinogenesis of AC. HNF4A is a nuclear transcription factor that binds DNA as a homodimer. The encoded protein controls the expression of several genes, including hepatocyte nuclear factor 1 alpha, a transcription factor that regulates the expression of several hepatic genes. Although a role for HNF4A in lung carcinoma has not been confirmed, some evidence suggests that average HNF4A expression levels are modestly increased in some lung diseases compared with normal human lungs. Compared with normal human lungs, average HNF4A expression levels were significantly higher in human adenocarcinoma. In contrast, the average expression levels of HNF4A in squamous cell carcinomas were significantly reduced (Qu et al., 2009).
A large number of regulatory relationships involving CREB1, ESR1, RARA and SPI1 were found in the SCC-specific network, and these TFs do not appear in the AC-specific network. CREB1 has 40 differentially expressed target genes, but only 6 of them were found in the AC-specific network; most of them belong to the SCC-specific network. Similarly, RARA has 24 differentially expressed target genes, but only 4 of them were found in the AC-specific network; most of them belong to the SCC-specific network. We hypothesize that the high expression of CREB1, ESR1, RARA and SPI1 and their corresponding regulatory relationships are important in the carcinogenesis of SCC.
In the past decade, many signaling pathways have been identified as associated with cancer development, such as JAK/STAT, MAPK/ERK, PI3K/AKT, NF-κB, Wnt, TGF-β, etc. (Dreesen and Brivanlou, 2007). Likewise, these pathways play a significant role in lung carcinomas. Mutations in the BRAF and KRAS genes occur in non-small-cell lung cancer patients and induce activation of the extracellular signal-regulated kinase (ERK)-1/2 (MAPK) pathway and the development of lung carcinoma (Ji et al., 2007). The Wnt signaling pathway, originally identified in Drosophila melanogaster, is an evolutionarily conserved pathway involved in homeostasis, cell proliferation, differentiation, motility, and apoptosis. It is deregulated in a number of cancers, including lung cancer. The transforming growth factor β (TGF-β) signaling pathway has also been shown to be related to lung carcinoma. Cell-autonomous TGF-β signaling is required for both induction and maintenance of in vitro invasiveness and metastasis during late-stage tumorigenesis (Martin, 1998).
A basic understanding of the mechanisms underlying the function of AC and SCC genes is important, and a deeper understanding of transcription factors and the genes they regulate will remain an area of intense research activity in the future. Our regulatory networks are useful for investigating the connection between AC and SCC. We also found some new transcription factors and target genes related to AC and SCC; in addition, many pathways, such as pathways in cancer, the p53 signaling pathway and the Jak-STAT signaling pathway, were linked with SCC by our method. However, further experiments are still needed to confirm these conclusions.

Figure 1. Expression Value of Part of DEGs Between AC and SCC in Three Specimens. The ordinate is the gene expression value; 'Normal' represents normal lung specimens, 'AC' represents adenocarcinoma and 'SCC' represents squamous cell lung carcinomas

Figure 2. The Common Regulatory Network Between AC and SCC. Same subnets were found in this common regulatory network. Rhombus (◊) represents transcription factors and circle (O) represents target genes