Bicluster and Pathway Enrichment Analysis of HCV-induced Cirrhosis and Hepatocellular Carcinoma

Hepatocellular carcinoma (HCC) is the sixth most common cancer worldwide and the most common form of liver cancer, being responsible for 80% of the primary malignant liver tumors in adults (Avila et al., 2006; Ferlay et al., 2010). In 2008, an estimated 748,000 new cases of liver cancer occurred and approximately 696,000 people died of this cancer worldwide (Abedi-Ardekani et al., 2011). This disease is most prevalent in Eastern and Southeastern Asia, and Middle Africa, with more than half of the patients being reported from China (Avila et al., 2006). HCC is the third most common cancer in China with the age-standardized rate of 37.4 and 34.1 per 100,000 person-years in males and females, respectively (Ferlay et al., 2010). Much is known about the development and causes of HCC. The risk factors for liver carcinogenesis include chronic infections with hepatitis B (HBV) and C (HCV) viruses, chronic alcohol consumption (Donato et al., 2002), consumption of aflatoxin B1 (AFB1) (Soini et al., 1996), contaminated food (Abedi-Ardekani et al., 2011) and type 2 diabetes (Wideroff et al., 1997; El-Serag et al., 2006). Most HCC is thought to be associated with either chronic hepatitis C virus (HCV) or hepatitis B virus (HBV) infection (Barazani et al., 2007). HCV causes a chronic


Introduction
Hepatocellular carcinoma is the sixth most common cancer worldwide and the most common form of liver cancer, being responsible for 80% of the primary malignant liver tumors in adults (Avila et al., 2006; Ferlay et al., 2010). In 2008, an estimated 748,000 new cases of liver cancer occurred and approximately 696,000 people died of this cancer worldwide (Abedi-Ardekani et al., 2011). This disease is most prevalent in Eastern and Southeastern Asia, and Middle Africa, with more than half of the patients being reported from China (Avila et al., 2006). HCC is the third most common cancer in China with the age-standardized rate of 37.4 and 34.1 per 100,000 person-years in males and females, respectively (Ferlay et al., 2010).
Current evidence suggests that the early stages in HCC development are characterized by certain common traits governed by both genetic and epigenetic mechanisms. A variety of genomic studies have identified potential biomarkers for detection of early HCC (Marrero and Lok, 2004), such as GPC3 (Capurro et al., 2003), TERT, STK15, PLA2 (Smith et al., 2003) and HSP70 (Chuma et al., 2003). Most of the protein markers used for histopathological analysis are associated with cell proliferation and resistance to cell death (Hui et al., 1998; Thorgeirsson and Grisham, 2002). However, the process of HCV-induced carcinogenesis is still poorly understood. Therefore, it is important to better understand the roles of deregulated genes in the progression from healthy liver to cirrhosis and then to hepatocellular carcinoma.
Factor Analysis for Bicluster Acquisition (FABIA) is based on a multiplicative model, which accounts for linear dependencies between gene expression and conditions, and also captures heavy-tailed distributions as observed in real-world transcriptomic data. The generative framework allows the use of well-founded model selection methods and the application of Bayesian techniques (Hochreiter et al., 2010). In the present study, we performed bicluster analysis using the fabia package to classify the genes involved in the progression, and identified the differentially expressed genes (DEGs) among healthy liver samples, cirrhotic samples and HCC samples. Furthermore, we performed pathway enrichment analysis to identify the significant pathways related to the DEGs, and used BiNGO for GO enrichment analysis. We sought to explore the underlying molecular mechanism of the progression of HCV-induced hepatocarcinogenesis.

Materials and Methods
The transcription profile GSE6764 was downloaded from GEO (http://www.ncbi.nlm.nih.gov/geo/), a public functional genomics data repository. The data are based on the Affymetrix GPL570 platform (Affymetrix Human Genome U133 Plus 2.0 Array). A total of 75 samples were used for analysis, comprising 65 samples from HCV-infected patients and 10 samples of normal liver. The sixty-five samples were obtained from 38 patients with HCV infection and represent 13 samples of cirrhotic tissue, 17 dysplastic nodules, and 35 HCCs.
Kyoto Encyclopedia of Genes and Genomes (KEGG) (http://www.genome.jp/kegg/) is a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals (Kanehisa, 2002). The 'pathway' database records networks of molecular interactions in cells, and variants of them specific to particular organisms.
FABIA was developed to find biclusters that have correlated rows and columns. More precisely, the rows in the bicluster need to be correlated only across the columns of the bicluster, and vice versa. Bicluster analysis was performed with the fabia package in Bioconductor (Gentleman et al., 2004) to find the enriched biclusters. The package extracts biclusters from data sets based on a generative model according to the FABIA method (Hochreiter et al., 2010).
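To make the multiplicative model concrete, the following is a minimal sketch in Python, not the fabia implementation. It assumes the usual description of a single bicluster as the outer product of a sparse gene vector and a sparse sample vector plus additive noise. The function name `simulate_bicluster` and all parameter names are hypothetical and serve illustration only.

```python
import random


def simulate_bicluster(n_genes, n_samples, gene_idx, sample_idx,
                       noise_sd=0.0, seed=0):
    """Build a toy matrix X = lambda * z^T + noise for one bicluster.

    Assumption: one bicluster, uniform nonzero weights, Gaussian noise.
    FABIA itself fits several such terms and uses sparse priors.
    """
    rng = random.Random(seed)
    # Sparse gene loadings: nonzero only for genes in the bicluster.
    lam = [0.0] * n_genes
    for g in gene_idx:
        lam[g] = rng.uniform(1.0, 2.0)
    # Sparse sample factors: nonzero only for samples in the bicluster.
    z = [0.0] * n_samples
    for s in sample_idx:
        z[s] = rng.uniform(1.0, 2.0)
    # Outer product plus additive noise.
    return [[lam[g] * z[s] + rng.gauss(0.0, noise_sd)
             for s in range(n_samples)]
            for g in range(n_genes)]


# Genes 0 and 2 form a bicluster together with samples 1 and 3.
X = simulate_bicluster(5, 4, gene_idx=[0, 2], sample_idx=[1, 3])
```

With `noise_sd=0.0`, the rows of the bicluster are exactly proportional to each other across the bicluster columns and zero elsewhere, which is the "correlated only across the columns of the bicluster" property described above.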
Gene Ontology (GO) analysis has become a commonly used approach for functional studies of large-scale genomic or transcriptomic data (Hulsegge et al., 2009).
BiNGO (Maere et al., 2005) is an open-source Java tool to determine which Gene Ontology (GO) terms are significantly overrepresented in a set of genes. We used BiNGO to identify over-represented GO categories in the biological process ontology. Hypergeometric tests were used for statistical analysis, and the Benjamini-Hochberg (BH) false discovery rate (FDR) procedure (Benjamini, 1995) was used for multiple testing correction. An FDR of less than 0.05 was chosen as the threshold. We performed GO enrichment analysis for each bicluster separately, and selected the most significant GO term of each bicluster as its GO term annotation.
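The two statistical steps named above can be sketched as follows. This is a minimal standard-library Python illustration of a one-sided hypergeometric over-representation test and of BH adjustment, not BiNGO's own code. The function names and the argument names (`k`, `n`, `K`, `N`) are assumptions chosen for this example.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): probability of seeing at least k annotated genes
    in a set of n genes drawn from N genes, K of which carry the term."""
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# A GO term is called enriched when its adjusted p-value is below 0.05.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
enriched = [p < 0.05 for p in adjusted]
```

In practice one p-value is computed per GO term for a given gene set, and the adjustment is applied across all terms tested for that set.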
For the GSE6764 dataset, the SAM method (Li and Tibshirani, 2011) (http://www-stat.stanford.edu/~tibs/SAM/) was used to identify DEGs. Only DEGs with a fold change larger than 2 were selected.
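The fold-change filter can be illustrated with a short Python sketch. It does not implement SAM. It assumes expression values on a linear scale and reads "fold change larger than 2" as a more than two-fold change in either direction, which is our interpretation rather than something the text states. The function name `fold_change_filter` and the toy gene labels are hypothetical.

```python
from math import log2
from statistics import mean


def fold_change_filter(expr_a, expr_b, threshold=2.0):
    """Return {gene: log2 fold change of group B vs group A} for genes
    whose change exceeds the threshold in either direction.

    expr_a, expr_b: dicts mapping gene -> list of linear-scale values.
    """
    selected = {}
    for gene in expr_a.keys() & expr_b.keys():
        lfc = log2(mean(expr_b[gene]) / mean(expr_a[gene]))
        if abs(lfc) > log2(threshold):
            selected[gene] = lfc
    return selected


# Toy data: g1 goes up 4-fold, g2 barely changes, g3 goes down 4-fold.
group_a = {"g1": [10, 10], "g2": [10, 10], "g3": [40, 40]}
group_b = {"g1": [50, 30], "g2": [12, 12], "g3": [10, 10]}
degs = fold_change_filter(group_a, group_b)
```

A positive value marks up-regulation in group B and a negative value marks down-regulation.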

Results
For dataset GSE6764, we performed data preprocessing using R and Bioconductor. After removing the unqualified probes, 13127 probes remained for bicluster analysis with the fabia package. A total of 19 biclusters were obtained (Table 1). GO enrichment analysis was performed for each bicluster separately, and the most significant GO term of each bicluster was selected as its GO term annotation. As shown in Table 1, several biological processes were enriched, such as "cellular metabolic process" (biclusters 7, 8, 10, 13, 14, 16), "immune system process" (15, 19), "response to chemical stimulus" (12, 18) and "calcium-dependent cell-cell adhesion" (5).
To obtain differentially expressed genes, we used the publicly available microarray dataset GSE6764 from GEO. The SAM method was used to identify DEGs, and the differentially expressed genes with a fold change larger than 2 were selected. We obtained 501 DEGs between healthy liver samples and cirrhotic samples, of which 81% (408) were up-regulated and 19% (93) were down-regulated. A total of 328 DEGs, 31% (102) up-regulated and 69% (226) down-regulated, were identified between healthy liver samples and HCC samples, and 395 DEGs, 32% (125) up-regulated and 68% (270) down-regulated, were identified between cirrhotic samples and HCC samples (Figure 1). Seven overlapping genes were differentially expressed in all three comparisons: CLEC4M, MAP2, APOF, NAT2, CHST4, PROM1, and SFRP5. We propose that these 7 genes play an important role in the progression from healthy liver to cirrhosis and then to HCC. To obtain the functional annotation of the 7 DEGs, we performed GO enrichment analysis, and 6 biological processes were enriched, including "leukocyte cell-cell adhesion", "virion transport", "peptide antigen transport" and "regulation of Wnt receptor signaling pathway" (Table 2).
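The reported percentages and the overlap step can be checked with a few lines of Python. The counts are taken from the text above. The gene sets passed to `common_degs` are toy placeholders, since the full DEG lists are not given here, and both function names are hypothetical.

```python
def direction_split(n_up, n_down):
    """Return (total, percent up, percent down), rounded to whole percent."""
    total = n_up + n_down
    return total, round(100 * n_up / total), round(100 * n_down / total)


def common_degs(*deg_sets):
    """Genes that are differentially expressed in every comparison."""
    return set.intersection(*map(set, deg_sets))


# Counts reported for the three comparisons.
control_vs_ci = direction_split(408, 93)
control_vs_hcc = direction_split(102, 226)
ci_vs_hcc = direction_split(125, 270)

# Toy example of the overlap; "geneA" to "geneC" are placeholders.
shared = common_degs({"CLEC4M", "CHST4", "geneA"},
                     {"CLEC4M", "CHST4", "geneB"},
                     {"CLEC4M", "geneC"})
```

The three-way intersection corresponds to the central region of the Venn diagram in Figure 1.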
For the DEGs of GSE6764, we performed pathway enrichment analysis to find the most significant pathways related to the progression of HCV-induced hepatocarcinogenesis.
There were 501 DEGs between healthy liver samples and cirrhotic samples. Of these, 408 genes were up-regulated and 93 genes were down-regulated. We performed pathway enrichment analysis of the up-regulated and down-regulated genes separately. After p-value correction, pathways for which at least two of the Bonferroni, Benjamini, and FDR values were less than 0.05 were selected as significant. We obtained 11 enriched pathways for the up-regulated genes and no enriched pathways for the down-regulated genes (Table 3). The enriched pathways include "ECM-receptor interaction", "focal adhesion" and "cell adhesion molecules (CAMs)". There were 395 DEGs between cirrhotic samples and HCC samples. Of these, 125 genes were up-regulated and 270 genes were down-regulated. We performed pathway enrichment analysis and screening as described above. We obtained no enriched pathways for the up-regulated genes and 1 enriched pathway for the down-regulated genes. The enriched pathway was "complement and coagulation cascades".
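The selection rule used above, at least two of the three corrected values below 0.05, can be written as a small predicate. This is a sketch of the stated rule only. The function name and the example values are hypothetical and are not results from this study.

```python
def is_significant(bonferroni, benjamini, fdr, alpha=0.05):
    """True when at least two of the three corrected values are below alpha."""
    passed = sum(value < alpha for value in (bonferroni, benjamini, fdr))
    return passed >= 2


# Hypothetical corrected values for two pathways.
kept = is_significant(bonferroni=0.20, benjamini=0.01, fdr=0.03)
dropped = is_significant(bonferroni=0.20, benjamini=0.30, fdr=0.01)
```

Requiring agreement between two corrections is stricter than relying on any single one, and less strict than requiring all three.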

Discussion
Hepatocellular carcinoma (HCC) is a common primary cancer frequently associated with hepatitis C virus (HCV) (Smith et al., 2003). To gain insight into the molecular mechanism of HCV-induced hepatocarcinogenesis, we performed bicluster analysis of the cDNA microarray data obtained from GEO on 75 surgical liver samples from 48 HCV-infected patients, and identified the differentially expressed genes in the progressive stages of hepatocarcinogenesis. Furthermore, we performed pathway enrichment analysis of the DEGs.
In our bicluster analysis of the gene expression profile, a total of 19 biclusters were obtained. The "cellular metabolic process" was the most significant biological process. Cellular metabolism is the set of chemical reactions that occur in living organisms in order to maintain life. This process allows organisms to grow and reproduce, maintain their structures, and respond to environmental changes. Cells from some tumors use an altered metabolic pattern compared with that of normal differentiated cells in the body (Levine and Puzio-Kuter, 2010). Otto Warburg postulated that a change in metabolism is the fundamental cause of cancer (Warburg, 1956).
The other significant biological process was "immune system process". This reflects the transition from normal liver to carcinoma, during which liver function becomes impaired and extracellular matrix is deposited (Bieche et al., 2005). The immune system can specifically identify and eliminate tumor cells on the basis of their expression of tumor-specific antigens or molecules induced by cellular stress (Swann and Smyth, 2007). However, HCV has evolved a variety of mechanisms to establish persistent infections and evade the host immune response (Block et al., 2003). Previous studies have shown that expression of HCV core protein activates NF-κB, a transcription factor that is involved in regulating the immune response (Zhu et al., 2001). Hepatocytes from patients chronically infected with HCV show elevated levels of NF-κB protein and increased NF-κB DNA binding activity (Tai et al., 2000). It is suggested by Timothy et al. that HCV core protein promotes persistent infection by downregulating the host immune system.
The "response to chemical stimulus" was also observed in our bicluster analysis (bicluster 12 and bicluster 18). We checked the conditions of bicluster 12 and bicluster 18 and found that they contain only samples from the early, middle and advanced stages of HCC. Based upon these observations, we hypothesized that these biclusters resulted from chemotherapy of the tumor.
A total of 7 overlapping genes were identified as DEGs in GSE6764, including CLEC4M, MAP2, APOF, NAT2, CHST4, PROM1, and SFRP5. This result suggests that these genes play an important role in the progression of this disease, especially CLEC4M and CHST4.
CLEC4M (C-type lectin domain family 4 member M), also designated CD209L, encodes the transmembrane receptor L-SIGN. Liver/lymph node-specific intercellular adhesion molecule-3-grabbing integrin (L-SIGN) is a calcium-dependent lectin expressed mainly on endothelial cells of the liver and lymph nodes (Koppel et al., 2005). It has been demonstrated that L-SIGN is a liver-specific receptor for HCV, and that it may play important roles in HCV infection and immunity (Gardner et al., 2003; Cormier et al., 2004). L-SIGN may establish cellular interactions with T cells (Bashirova et al., 2001) and enable activated T cells to recirculate to the liver or to the lymph nodes through interactions with L-SIGN (Cocquerel et al., 2006). L-SIGN and DC-SIGN may instead contribute to the establishment or persistence of infection, both by the capture and delivery of HCV to the liver and by modulating dendritic-cell functions, as suggested by Cormier et al. (2004) and Lozach et al. (2004).
CHST4 (carbohydrate sulfotransferase 4) encodes an N-acetylglucosamine 6-O-sulfotransferase. This protein is involved in the modification of glycan structures on ligands of the lymphocyte homing receptor L-selectin and plays a central role in lymphocyte trafficking during chronic inflammation (Wu et al., 2011).
Of the 11 pathways related to the up-regulated genes, "ECM-receptor interaction" was the most significant. The extracellular matrix (ECM) consists of a complex mixture of structural and functional macromolecules and serves an important role in tissue and organ morphogenesis and in the maintenance of cell and tissue structure and function. Cirrhotic liver contains approximately six times more ECM overall than normal liver (Kumar and Sarin, 2007). ECM can directly influence the function of surrounding cells through interaction with cell surface receptors, including integrins and nonintegrin matrix receptors. The interaction of ECM with integrins is thought to be an important factor in cell-cell and cell-substrate adhesions in tumors, tumor invasion and metastases (Jaskiewicz et al., 1993). ECM can also indirectly affect cell function via release of soluble cytokines, which in turn are controlled by local metalloproteinases (Seyer et al., 1977; Jaskiewicz et al., 1993). This pathway is related not only to the transition from normal liver to cirrhosis, but also to the hepatocarcinogenesis that follows cirrhosis.
Only one pathway was enriched in the down-regulated genes between cirrhosis and HCC, namely "complement and coagulation cascades". This result suggests that the activity of blood coagulation factors decreases significantly in the progression from cirrhosis to HCC, which may be due to the damage of normal liver cells during the progression. Previous studies suggested that the level of coagulation factors can be used to diagnose cirrhosis and primary HCC, and is an indicator of prognosis of liver disease (Amitrano et al., 2002). Therefore, further analysis and investigation of this pathway will be of clinical significance for understanding this carcinogenesis.
In conclusion, we have used bioinformatics methods to analyse the potential molecular mechanism of HCV-induced hepatocarcinogenesis. Our analysis indicated that several differentially expressed genes might play crucial roles in hepatocarcinogenesis, including CLEC4M, MAP2, APOF, NAT2, CHST4, PROM1, and SFRP5. Furthermore, our results suggest that the differentially expressed genes might be involved in hepatocarcinogenesis through upregulating the pathways of ECM-receptor interaction, focal adhesion, cell adhesion molecules and other cancer-related pathways, and downregulating the pathway of "complement and coagulation cascades". We expect that further research on HCV-induced hepatocellular carcinoma in the coming years will build on our meta-analysis.

Figure 1. Venn Diagram Displaying the Information of Our Data. A total of 7 overlapping genes were selected as DEGs from GSE6764. 'control' denotes the healthy liver samples, 'ci' denotes the cirrhotic samples and 'HCC' denotes the hepatocellular carcinoma samples.