Transcriptome Network Analysis Reveals Potential Candidate Genes for Esophageal Squamous Cell Carcinoma

Esophageal (or oesophageal) cancers, which arise from the epithelium, or surface lining, of the esophagus, are typically carcinomas. Esophageal carcinomas fall into two classes. One is squamous cell carcinoma, which arises from the cells that line the upper part of the esophagus; it resembles head and neck cancer in appearance and is associated with tobacco and alcohol consumption. The other is adenocarcinoma, which arises from glandular cells present at the junction of the esophagus and stomach and is often associated with a history of gastroesophageal reflux disease and Barrett’s esophagus (Holmes and Vaughan, 2007). Recently, many studies have been carried out to understand the precise molecular mechanisms of esophageal squamous cell carcinoma (ESCC), which can be summarized in three aspects: tumour suppressor genes, oncogenes and apoptotic genes. p53 is a tumour suppressor that halts progression in both the G1 and G2 phases of the cell cycle to assess DNA damage. Accumulation of p53 in the normal oesophagus suggests that loss of p53 suppressor function might be an early event in carcinogenesis of the oesophagus. Mutations in codons 175, 248 and 273 of p53 are considered to confer growth advantages for progression to invasive squamous cell carcinoma and


Introduction

Zheng Ma1, Wei Guo1, Hui-Jun Niu1, Fan Yang1, Ru-Wen Wang1, Yao-Guang Jiang1, Yun-Ping Zhao1*
occurred most frequently. Loss of heterozygosity (LOH) of the Rb gene was found to be correlated with the loss of pRb protein expression and associated with p53 alterations in human oesophageal cancer. It is suggested that associated Rb and p53 inactivation may be the major event in the development and progression of oesophageal cancer, due to the greater selective advantage of the affected cells. The oncogenes most frequently activated in oesophageal cancer are cyclin D1, c-erbB1 and 2, c-myc, c-ras, Int-2/hst-1, and EGFR. Cyclin D1 gene amplification was found in about 32% of oesophageal tumours, and 92% of these tumours showed cyclin D1 overexpression. Frat1 expression was also found to be relatively high in human oesophageal cancer cell lines through the activation of the WNT-β-catenin-TCF signaling pathway. Proteins found to be aberrantly expressed in oesophageal cancer include the anti-apoptotic protein Bcl-2 (up-regulated) and the pro-apoptotic protein Bax (down-regulated) (Kuwano et al., 2005; McCabe and Dlamini, 2005).
DNA microarray analysis is applied as a global approach to investigate physiological mechanisms in health and disease (Spies et al., 2002). In one microarray experiment, a significant reduction of MDR1 expression was observed in patients, suggesting that MDR1 might affect the sensitivity to 5-FU and/or CDDP in adjuvant chemotherapy for esophageal cancers (Kihara et al., 2001). Genomic expression profiling is evolving as a useful tool to identify novel pathomechanisms in human cancer (Guo, 2003).
The purpose of this paper is to propose the hypothesis that a transcriptome network can be developed in which a set of transcription factors regulating the genes differentially expressed in ESCC can be identified and modulated. The genes and pathways in the network were then analysed further to identify potential mechanisms involved in the response to ESCC. The study does not address the regulation network but searches for the significant pathways related to ESCC.
Pathway data: KEGG (Kyoto Encyclopedia of Genes and Genomes) is a collection of online databases dealing with genomes, enzymatic pathways and biological chemicals (Kanehisa, 2002). The PATHWAY database records networks of molecular interactions in cells, and variants of them specific to particular organisms (http://www.genome.jp/kegg/). In total, 130 pathways involving 2,287 genes were collected from KEGG.
Regulation data: There are approximately 2,600 proteins in the human genome that contain DNA-binding domains, most of which are presumed to function as transcription factors (Wachi et al., 2005). The combinatorial use of a subset of the approximately 2,000 human transcription factors easily accounts for the unique regulation of each gene in the human genome during development (Brivanlou and Darnell, 2002).
These transcription factors are grouped into five superclass families, based on the presence of conserved DNA-binding domains. The TRANSFAC database contains data on transcription factors, their experimentally proven binding sites and their regulated genes (Wingender, 2008).
The Transcriptional Regulatory Element Database (TRED) was built to meet the increasing need for an integrated repository of both cis- and trans-regulatory elements in mammals (Jiang et al., 2007). TRED curates transcriptional regulation information, including transcription factor binding motifs and experimental evidence. The curation currently focuses on target genes of 36 cancer-related TF families.
Combining the two regulation datasets, a total of 6,328 regulatory relationships between 276 TFs and 3,002 target genes were collected (Table 1).
miRNA datasets: We integrated miRNA-disease datasets and miRNA-target gene datasets to find the relation between disease and genes. The human miRNA disease database (HMDD) (Lu et al., 2008), which manually retrieves associations between miRNAs and diseases from the literature, contains 444 miRNA genes, 259 diseases, 1,149 publications and 2,886 miRNA-disease associations. miR2Disease (Jiang et al., 2009) provides a comprehensive resource of miRNA deregulation in various human diseases. In total, 349 miRNAs and 163 diseases were collected. Merging the two databases yielded 5,036 relationships. We also integrated the miRNA-target gene databases starBase (Yang et al., 2011), miR2Disease, miRecords (Xiao et al., 2009) and TarBase (Papadopoulos et al., 2009), whether or not the relation is proved by experiment. In the end, 211,464 relations between miRNAs and genes were selected.

Methods
Differentially expressed genes (DEGs) analysis: For the GSE23400 dataset, the limma method (Smyth, 2004) was used to identify DEGs. The original expression data from all conditions were processed into expression estimates using the RMA method with the default settings implemented in Bioconductor, and a linear model was then constructed. Only genes with a fold change larger than 1.5 and a p-value less than 0.05 were selected as DEGs.
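The thresholding step can be illustrated with a minimal sketch. It does not reproduce the limma fit; it assumes that per-gene log2 fold changes and p-values have already been computed, and it assumes that "fold change larger than 1.5" means an absolute log2 fold change above log2(1.5) in either direction, which the text does not state explicitly. The function name, gene names and values are hypothetical.

```python
import math

FOLD_CHANGE_CUTOFF = 1.5   # threshold stated in the text
P_VALUE_CUTOFF = 0.05      # threshold stated in the text


def select_degs(results, fc_cutoff=FOLD_CHANGE_CUTOFF, p_cutoff=P_VALUE_CUTOFF):
    """Return the sorted genes passing both the fold-change and p-value cutoffs.

    `results` maps gene symbol -> (log2 fold change, p-value).
    Assumption: the cutoff applies to up- and down-regulation alike.
    """
    log2_cutoff = math.log2(fc_cutoff)
    return sorted(
        gene
        for gene, (log2_fc, p_value) in results.items()
        if abs(log2_fc) > log2_cutoff and p_value < p_cutoff
    )


# Hypothetical values for illustration only; not taken from GSE23400.
example_results = {
    "GENE_A": (1.20, 0.001),   # up-regulated and significant
    "GENE_B": (-0.90, 0.010),  # down-regulated and significant
    "GENE_C": (0.30, 0.001),   # expression change too small
    "GENE_D": (2.00, 0.200),   # p-value too large
}
degs = select_degs(example_results)
```

With these illustrative inputs only GENE_A and GENE_B pass both cutoffs.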
Co-expression analysis: To demonstrate the potential regulatory relationships, the Pearson Correlation Coefficient (PCC) was calculated for all pair-wise comparisons of gene-expression values between TFs and the DEGs. Regulatory relationships with an absolute PCC larger than 0.6 were considered significant.
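The co-expression filter can be sketched as follows, assuming each TF and each DEG is represented by an expression profile over the same samples. The helper names and the toy profiles are hypothetical; only the 0.6 cutoff on the absolute PCC comes from the text.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def coexpressed_pairs(tf_profiles, deg_profiles, cutoff=0.6):
    """Return (tf, gene, pcc) for every TF-DEG pair with |PCC| above the cutoff."""
    pairs = []
    for tf, tf_values in tf_profiles.items():
        for gene, gene_values in deg_profiles.items():
            if tf == gene:
                continue  # skip comparing a TF with itself
            pcc = pearson(tf_values, gene_values)
            if abs(pcc) > cutoff:
                pairs.append((tf, gene, pcc))
    return pairs


# Hypothetical expression profiles across five samples.
tf_profiles = {"TF_1": [1.0, 2.0, 3.0, 4.0, 5.0]}
deg_profiles = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises with TF_1
    "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls as TF_1 rises
    "GENE_C": [1.0, 3.0, 2.0, 3.0, 1.0],   # no linear relation to TF_1
}
pairs = coexpressed_pairs(tf_profiles, deg_profiles)
```

Using the absolute value keeps both positively and negatively correlated pairs, which matches the "PCC > 0.6 or PCC < -0.6" criterion.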
Gene Ontology analysis: The Biological Networks Gene Ontology tool (BiNGO) is an open-source Java tool to determine which Gene Ontology (GO) terms are significantly overrepresented in a set of genes. The BiNGO analysis (Maere et al., 2005) was used to identify overrepresented GO categories in biological process.
Regulation network construction: Using the regulation data collected from the TRANSFAC and TRED databases, we matched the relationships between differentially expressed TFs and their differentially expressed target genes.
Based on the above two regulation datasets and the pathway relationships of the target genes, we constructed the regulation networks with Cytoscape (Shannon et al., 2003). Based on the significant relationships (PCC > 0.6 or PCC < -0.6) between TFs and their target genes, 33 putative regulatory relationships were predicted between 7 TFs and 22 target genes.
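The matching step above can be sketched as a filter over curated TF-target pairs, followed by a degree count of the kind used to spot hub nodes. This is an illustrative sketch rather than the authors' pipeline: the function names, the placeholder gene names and the PCC values are hypothetical, and the curated pairs stand in for the TRANSFAC and TRED records.

```python
from collections import Counter


def build_regulatory_edges(curated_pairs, degs, pcc, cutoff=0.6):
    """Keep curated TF-target pairs where both genes are DEGs and |PCC| > cutoff."""
    edges = []
    for tf, target in curated_pairs:
        if tf not in degs or target not in degs:
            continue  # both ends must be differentially expressed
        r = pcc.get((tf, target))
        if r is not None and abs(r) > cutoff:
            edges.append((tf, target, r))
    return edges


def node_degrees(edges):
    """Count how many edges touch each node of the network."""
    degrees = Counter()
    for tf, target, _ in edges:
        degrees[tf] += 1
        degrees[target] += 1
    return degrees


# Hypothetical inputs for illustration only.
curated_pairs = [
    ("TF_1", "GENE_A"),
    ("TF_1", "GENE_B"),
    ("TF_2", "GENE_A"),
    ("TF_3", "GENE_C"),
]
degs = {"TF_1", "TF_2", "GENE_A", "GENE_B", "GENE_C"}  # TF_3 is not a DEG
pcc = {
    ("TF_1", "GENE_A"): 0.82,
    ("TF_1", "GENE_B"): -0.71,
    ("TF_2", "GENE_A"): 0.35,   # below the cutoff
    ("TF_3", "GENE_C"): 0.90,   # dropped because TF_3 is not a DEG
}
edges = build_regulatory_edges(curated_pairs, degs, pcc)
degrees = node_degrees(edges)
```

The resulting edge list can be exported to a network viewer such as Cytoscape; the node with the largest degree is the candidate hub.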
Significance analysis of pathway: We adopted an impact analysis that includes the statistical significance of the set of pathway genes and also considers other crucial factors such as the magnitude of each gene's expression change, the topology of the signaling pathway and their interactions (Draghici et al., 2007). In this model, the Impact Factor (IF) of a pathway $P_i$ is calculated as the sum of two terms:

$$IF(P_i) = \log\left(\frac{1}{p_i}\right) + \frac{\sum_{g \in P_i} \left|PF(g)\right|}{\left|\overline{\Delta E}\right| \cdot N_{de}(P_i)} \qquad (1)$$

The first term is a probabilistic term that captures the significance of the given pathway $P_i$ from the perspective of the set of genes contained in it. It is obtained by using the hypergeometric model, in which $p_i$ is the probability of obtaining at least the observed number of differentially expressed genes, $N_{de}$, just by chance (Tavazoie et al., 1999; Draghici et al., 2003).
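The probabilistic term can be illustrated with a short sketch of the hypergeometric upper tail, i.e. the chance of seeing at least the observed number of DEGs in a pathway when genes are drawn at random. The function name, parameter names and the small numbers in the example are hypothetical and chosen only so the result can be checked by hand.

```python
from math import comb


def hypergeom_upper_tail(total_genes, total_degs, pathway_size, pathway_degs):
    """P(X >= pathway_degs) for a hypergeometric draw.

    pathway_size genes are drawn without replacement from total_genes genes,
    of which total_degs are differentially expressed.
    """
    upper = min(pathway_size, total_degs)
    denominator = comb(total_genes, pathway_size)
    numerator = sum(
        comb(total_degs, k) * comb(total_genes - total_degs, pathway_size - k)
        for k in range(pathway_degs, upper + 1)
    )
    return numerator / denominator


# Hypothetical example: 20 genes measured, 5 of them DEGs,
# and a 4-gene pathway containing 3 DEGs.
p_value = hypergeom_upper_tail(20, 5, 4, 3)
```

Here the tail sums the cases of exactly 3 and exactly 4 DEGs in the pathway, giving (10·15 + 5·1)/4845 = 155/4845.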
The second term is a functional term that depends on the identity of the specific genes that are differentially expressed as well as on the interactions described by the pathway (i.e., its topology).
The second term sums up the absolute values of the perturbation factors (PFs) for all genes $g$ on the given pathway $P_i$.
The PF of a gene $g$ is calculated as follows:

$$PF(g) = \Delta E(g) + \sum_{u \in US_g} \beta_{ug} \cdot \frac{PF(u)}{N_{ds}(u)} \qquad (2)$$

In this equation, the first term, $\Delta E(g)$, captures the quantitative information measured in the gene expression experiment: it represents the normalized measured expression change of the gene $g$. The second term is a sum of all PFs of the genes $u$ directly upstream of the target gene $g$, normalized by the number of downstream genes of each such gene, $N_{ds}(u)$, and weighted by a factor $\beta_{ug}$, which reflects the type of interaction: $\beta_{ug} = 1$ for induction and $\beta_{ug} = -1$ for repression (KEGG supplies this information about the type of interaction between two genes in the description of the pathway topology). $US_g$ is the set of all such genes upstream of $g$. We need to normalize with respect to the size of the pathway by dividing the total perturbation by the number of differentially expressed genes on the given pathway, $N_{de}(P_i)$. To make the IFs as independent as possible from the technology, and also comparable between problems, we also divide the second term in equation 1 by the mean absolute fold change $\overline{\Delta E}$, calculated across all differentially expressed genes. The results of the significance analysis of pathways are shown in Table 3.
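The perturbation factor and impact factor described above can be sketched on a toy pathway. This sketch makes several assumptions that are not stated in the text: the pathway graph is acyclic (so the recursion terminates), the logarithm in the probabilistic term is the natural logarithm, and genes that are not differentially expressed carry an expression change of 0. All names and numbers are hypothetical.

```python
import math


def perturbation_factors(delta_e, edges):
    """Compute PF(g) = dE(g) + sum_u beta_ug * PF(u) / N_ds(u) for every gene.

    delta_e: gene -> normalized expression change (0.0 if unchanged).
    edges:   list of (upstream, downstream, beta), beta = +1 (induction)
             or -1 (repression). Assumes the graph has no cycles.
    """
    upstream = {gene: [] for gene in delta_e}
    n_downstream = {gene: 0 for gene in delta_e}
    for u, g, beta in edges:
        upstream[g].append((u, beta))
        n_downstream[u] += 1

    cache = {}

    def pf(gene):
        if gene not in cache:
            cache[gene] = delta_e[gene] + sum(
                beta * pf(u) / n_downstream[u] for u, beta in upstream[gene]
            )
        return cache[gene]

    return {gene: pf(gene) for gene in delta_e}


def impact_factor(p_value, pfs, n_de, mean_abs_delta_e):
    """Probabilistic term plus the normalized sum of absolute perturbations."""
    functional = sum(abs(v) for v in pfs.values()) / (mean_abs_delta_e * n_de)
    return math.log(1.0 / p_value) + functional


# Toy pathway: A induces B and represses C; A and B are differentially expressed.
delta_e = {"A": 2.0, "B": 1.0, "C": 0.0}
edges = [("A", "B", 1), ("A", "C", -1)]
pfs = perturbation_factors(delta_e, edges)
if_value = impact_factor(p_value=0.05, pfs=pfs, n_de=2, mean_abs_delta_e=1.5)
```

In this toy case A has two downstream genes, so half of its perturbation propagates to each: PF(A) = 2, PF(B) = 1 + 2/2 = 2 and PF(C) = 0 - 2/2 = -1. Pathways containing feedback loops would need an iterative scheme instead of plain recursion.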
Regulation network between TFs and pathways: To further investigate the regulatory relationships between TFs and pathways, we mapped DEGs to pathways and obtained a regulation network between TFs and pathways (Figure 2).
miRNA network construction: Using the miRNA-disease data collected from the HMDD and miR2Disease databases, we retained only the associations with esophageal disease. The esophageal-related miRNAs were selected and then mapped to their target genes. In the end, 14 miRNAs and 431 DEGs were used to construct the network.
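The mapping of DEGs to pathways used for the TF-pathway network can be sketched as a simple join: a TF is linked to every pathway that contains at least one of its target genes. The function name, gene names and pathway names below are hypothetical placeholders, and the pathway membership table stands in for the KEGG records.

```python
from collections import defaultdict


def tf_pathway_links(tf_target_edges, pathway_members):
    """Link each TF to the pathways that contain at least one of its targets."""
    gene_to_pathways = defaultdict(set)
    for pathway, genes in pathway_members.items():
        for gene in genes:
            gene_to_pathways[gene].add(pathway)

    links = defaultdict(set)
    for tf, target in tf_target_edges:
        links[tf].update(gene_to_pathways.get(target, ()))

    # Drop TFs whose targets fall in no collected pathway.
    return {tf: sorted(paths) for tf, paths in links.items() if paths}


# Hypothetical inputs for illustration only.
tf_target_edges = [
    ("TF_1", "GENE_A"),
    ("TF_1", "GENE_B"),
    ("TF_2", "GENE_B"),
    ("TF_3", "GENE_Z"),  # GENE_Z belongs to no pathway
]
pathway_members = {
    "Pathway X": {"GENE_A", "GENE_B"},
    "Pathway Y": {"GENE_B", "GENE_C"},
}
links = tf_pathway_links(tf_target_edges, pathway_members)
```

The same kind of join, with miRNA-target pairs in place of TF-target pairs, would connect the selected miRNAs to DEGs.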

Regulation network construction in ESCC
To obtain pathway-related DEGs of ESCC, we downloaded the publicly available microarray data set GSE23400 from GEO. After microarray analysis, genes with a fold change larger than 1.5 and a p-value less than 0.05 were selected, giving 2,008 DEGs from GSE23400. To obtain the regulatory relationships, a co-expression value of PCC ≥ 0.6 was chosen as the threshold. Finally, we obtained only 17 regulatory relationships between 8 TFs and their 16 differentially expressed target genes. By integrating these regulatory relationships, a regulation network of ESCC was built between TFs and their target genes (Figure 1). In this network, STAT1, which has a higher degree, forms a local network, suggesting that it may play an important role in ESCC. Both STAT1 and IRF9 directly regulate the target gene ISG15. In addition, SP1 regulating three DEGs and TFAP2A regulating CCNB1 were also observed in our network.

GO analysis of the regulation network in ESCC
Several Gene Ontology (GO) categories were enriched among the genes in the regulatory network, including negative regulation of biological process, tissue development and embryonic eye morphogenesis (Table 2).

Significant pathway in ESCC
To identify the relevant pathways changed in ESCC, we used a statistical approach at the pathway level. Significance analysis at the single-gene level may suffer from the limited number of samples and from experimental noise, which can severely limit the power of the chosen statistical test. Pathway-level analysis offers a way to relax the significance threshold applied to single genes and may lead to a better biological interpretation. We therefore adopted a pathway-based impact analysis method that takes into account several factors, including the statistical significance of the set of differentially expressed genes in the pathway, the magnitude of each gene's expression change, the topology of the signaling pathway and their interactions. The impact analysis method yielded many significant pathways, including Cardiac muscle contraction, Cell cycle and ECM-receptor interaction (Table 3).

Regulation network between TFs and pathways in ESCC
To further investigate the regulatory relationships between TFs and pathways, we mapped DEGs to pathways and obtained a regulation network between TFs and pathways (Figure 2). In the network, SP3 and STAT1 appeared as hub nodes linked to many esophageal-related pathways. Some TFs jointly regulated the same pathways: STAT1 and SP3 both regulated the Phagosome pathway, and STAT1 and IRF9 both regulated the RIG-I-like receptor signaling pathway.

miRNA network analysis
The miRNA network further confirms that the DEGs were highly related to ESCC. Finally, we selected 14 miRNAs and 431 DEGs related to esophageal cancer (Figure 3).

Discussion
From the results of regulation network construction in ESCC, we find that many TFs and pathways closely related to ESCC have been linked by our method. The gene STAT1 appears as a hub node in our transcriptome network. Although the roles of STAT1, Sp1 and ISG15 in ESCC have not been verified, some evidence suggests that they may play important roles in response to ESCC.
DOI: http://dx.doi.org/10.7314/APJCP.2012.13.3.767
The Sp1 transcription factor was over-expressed in ESCC cells, and increased Sp1 staining was observed in esophageal tumors from patients (Papineni et al., 2009). Sp1 also plays an important role in ESCC by regulating the expression of other target genes, such as fascin. Fascin expression was enhanced by Sp1 over-expression and blocked by Sp1 RNAi knockdown. Specific inhibition of ERK1/2 decreased phosphorylation of Sp1 and thus suppressed the transcription of FSCN1, resulting in the down-regulation of fascin. Stimulation with EGF could activate the ERK1/2 pathway and increase phosphorylation levels of Sp1 to enhance fascin expression (Lu et al., 2010).
STAT1 can be phosphorylated by receptor-associated kinases and then translocates to the cell nucleus, where it acts as a transcription activator in response to cytokines and growth factors such as EGF. EGF treatment leads to a strong growth arrest in three esophageal squamous cell carcinoma cell lines. STAT1 was also found to be activated by EGF in these three cell lines. Therefore, the EGF-STAT1 pathway may be intrinsic to the esophageal epithelial lineage of cells and is lost in a considerable fraction of the carcinomas (Watanabe et al., 2001). In addition, the introduction of a dominant-negative STAT1 construct into TE8 cells abolished not only the growth inhibition of ESCC cells but also p21/WAF1 induction by EGF (Ichiba et al., 2002).
RARG (retinoic acid receptor gamma 1) is a nuclear hormone receptor that acts as a ligand-dependent transcriptional regulator. RARs have numerous target genes, which have retinoic acid response elements in their promoter regions. RARG was down-regulated in esophageal adenocarcinoma tissues (Hourihan et al., 2003).
Cyclin B1 plays an important role as a mitotic cyclin in the G2-M phase transition during the cell cycle. One-, 3- and 5-year survival rates in patients whose esophageal squamous cell carcinoma (ESCC) expressed cyclin B1 were significantly lower than in those whose ESCC did not express cyclin B1 (Nozoe et al., 2002). ESCC cells over-expressing cyclin B1 show strong invasive growth and a high potential for metastasis to the lung in xenograft mice. Suppression of cyclin B1 expression by small interfering RNA specifically inhibits their ability to metastasize (Song et al., 2008).
The TAP1 protein is a member of the MDR/TAP subfamily of ATP-binding cassette (ABC) transporters. This protein is involved in the pumping of degraded cytosolic peptides across the endoplasmic reticulum into the membrane-bound compartment where class I molecules assemble. TAP1, as an antigen-processing machinery component, was down-regulated in 44% of ESCC lesions. TAP1 loss was significantly associated with tumor grading, lymph node metastasis and depth of invasion (Ayshamgul et al., 2011).
ISG15 is a ubiquitin-like protein that becomes conjugated to many cellular proteins upon activation by interferon-alpha and -beta. 5-Fluorouracil (5-FU) is a chemotherapeutic agent commonly used against ESCC. One study showed that three IFN-related genes, an IFN receptor gene (IFNAR2) and two IFN-stimulated genes (ISG15K, ISG-54K), were up-regulated after three cell lines derived from human ESCC (T.T, TE-2 and TE-6) were exposed to 5-FU, indicating that a combination of 5-FU and an IFN may be particularly efficacious in ESCC (Matsumura et al., 2005).
Prior laboratory prediction of individual drug response is of key importance in ESCC. The interferon-induced transmembrane protein 1 (IFITM1) gene, which has been suggested as a key gene of the Wnt pathway, was the only gene commonly selected in both 5-fluorouracil (5-FU) and cis-platinum (CDDP) screening methods using a new statistical analysis of oligonucleotide microarray expression data based on a two-dimensional mixed normal model. The results suggested that the IFITM1 gene is a novel critical biomarker of CDDP response in ESCC (Fumoto et al., 2008). IFITM1 was up-regulated in esophageal tumor tissue from patients (Chattopadhyay et al., 2009).
Using a case-control design, the authors measured the odds of being diagnosed with colorectal adenocarcinoma at some time in life among persons diagnosed with adenocarcinoma of the esophagus relative to persons diagnosed with squamous cell carcinoma of the esophagus. They concluded that men with esophageal adenocarcinoma may be more likely than expected to be diagnosed with colorectal cancer in their lifetime, but that the opposite association may exist for women, providing additional evidence that some colorectal and esophageal adenocarcinomas share a common etiology (Vaughan et al., 1995). Some studies also reported that the risk of colorectal cancer in Barrett's esophagus patients was more than six times greater than in patients participating in colonic screening programs (Morgan, 1996).
Primary malignant melanoma of the esophagus is a rare but aggressive tumor that accounts for <1% of all ESCC (Lin et al., 2006). To date, dozens of melanoma-associated antigens (MAGEs) have been identified in relation to tumorigenesis; they can be recognized by T lymphocytes to induce an immune reaction. Therefore, MAGE genes take part in the immune process by targeting some early tumor cells for immune destruction. Over-expression of MAGE 2b, 3, 4, 6, 9, 10 and 12 was found in esophageal adenocarcinoma relative to Barrett's metaplasia, which may provide potential targets for immunotherapy in patients with esophageal adenocarcinoma (Lin et al., 2004).
In one case of ESCC, blood biochemistry revealed elevated serum prostate-specific antigen (PSA) and γ-seminoprotein levels. Furthermore, PSA-positive mRNA was demonstrated in the tissue of the esophageal tumor by RT-PCR. Previous reports also suggested that esophageal cancer might result from prostate cancer metastasis (Nakamura et al., 1997). Annexin I is a pleiotropic, calcium-dependent phospholipid-binding protein. One study indicated that loss of annexin I appeared to be associated with a lack of cellular differentiation, and that annexin I protein expression was decreased in human esophageal and prostate cancer, which indirectly suggests some correlation between the onset of esophageal and prostate carcinoma (Paweletz et al., 2000).
A deeper understanding of transcription factors and their regulated genes remains an area of intense research activity. Our regulation network is useful for investigating the complex interacting mechanisms of transcription factors and their regulated genes in disease. However, further experiments are still needed to confirm the conclusions. Our network could explain the mechanism of esophageal cancer not only through the regulation network but also through the miRNA network. The 14 esophageal-related miRNAs were collected from the HMDD and miR2Disease databases.

Figure 1. Regulation Network Construction in ESCC

Figure 2. Regulation Network of TF-PATHWAY

Figure 3. miRNA Network. The red points stand for the miRNAs, the yellow points stand for the TFs and the pink points stand for target genes

Table 1. Regulation Data from TRANSFAC and TRED