Construction of a Protein-Protein Interaction Network for Chronic Myelocytic Leukemia and Pathway Prediction of Molecular Complexes

Chao Zhou 1, Wen-Jing Teng 2, Jing Yang 1, Zhen-Bo Hu 3, Cong-Cong Wang 3, Bao-Ning Qin 2, Qing-Liang Lv 3, Ze-Wang Liu 2, Chang-Gang Sun 1 *

Introduction
Chronic myelogenous leukemia (CML) is a chronic myeloproliferative disorder of primitive pluripotent hematopoietic stem cells. The characteristic cytogenetic feature of CML cells is the Philadelphia chromosome, t(9;22)(q34;q11). The proto-oncogene c-ABL on the long arm of chromosome 9 translocates to the breakpoint cluster region (BCR) on the long arm of chromosome 22, forming the fusion gene BCR-ABL, which encodes a protein with strong tyrosine kinase activity. Through phosphorylation, this tyrosine kinase acts on a variety of signaling molecules such as RAS, MYC, c-CBL, phosphatidylinositol 3-kinase, and NF-kB. Downstream of tyrosine kinase stimulation is the expression of oncogenes including c-fos and c-jun, which stimulate growth factor-independent cell proliferation and block apoptosis, eventually leading to uncontrolled cell proliferation. The BCR-ABL gene is considered the molecular basis of the pathogenesis of CML and an effective indicator for diagnosis, treatment efficacy, and prognosis. Gene-targeted drugs for the treatment of leukemia have been effective, but they cannot cure the disease, and resistance gradually develops. A systematic understanding of CML is therefore required to find more key genes and signaling pathways, and research on multi-target therapy or multidrug combination therapy may bring hope for a cure. Recently, systems biology approaches such as network-based methods have been successfully applied to elucidate the mechanisms of diseases (Ideker et al., 2008; Zhao et al., 2008). From a systematic perspective, analysis of CML-related biomolecular interaction networks will improve the understanding of the complexity of the molecular pathways underlying CML and will help to uncover the dynamic processes of disease progression.
In this study, we screened confirmed disease-related genes using the Online Mendelian Inheritance in Man (OMIM) database, created a protein-protein interaction network based on biological function using Cytoscape software, detected the molecular complexes contained in the network, and predicted the pathways in which they may be involved. The results provide a basis for further explaining the mechanism of chronic myeloid leukemia development.

Materials and Methods
Design: A study of biological pathway enrichment at the genomic level.

Data acquisition
OMIM is a comprehensive, authoritative, daily updated database of human phenotypes. It covers more than 12 000 genes across human genetic diseases and focuses mainly on hereditary diseases. Textual descriptions, related references, sequence records, maps, and links to related databases are available for each gene (Hamosh et al., 2005; Amberger et al., 2009).

The construction of gene/protein interaction networks
Chronic myelogenous leukemia-associated genes were submitted to the Agilent Literature Search 2.7.7 plug-in (Agilent Technologies, USA) of Cytoscape 2.8.2 and to PubMed (Vailaya et al., 2005). False-positive interactions were removed from the retrieval results. The gene/protein interactions were then read into Cytoscape 2.8.2 and visualized (Shannon et al., 2003).
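As an illustration only (this is not the software used in the study), the step of turning a curated list of interaction pairs into an undirected network can be sketched in plain Python. The function names and the toy gene pairs below are hypothetical, and the clean-up rules (case normalisation, removal of duplicates and self-interactions) are assumptions rather than the authors' documented filtering criteria.

```python
def build_network(pairs):
    """Build an undirected interaction network as an adjacency dict."""
    adjacency = {}
    for a, b in pairs:
        # Normalise names so that "abl1" and "ABL1" map to the same node.
        a, b = a.strip().upper(), b.strip().upper()
        if not a or not b or a == b:
            continue  # skip empty names and self-interactions
        # Sets absorb duplicate pairs, whatever their order.
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def count_edges(adjacency):
    """Each undirected edge is stored twice, once per endpoint."""
    return sum(len(neighbours) for neighbours in adjacency.values()) // 2


# Hypothetical toy input: one duplicate pair and one self-pair are dropped.
toy_pairs = [("BCR", "ABL1"), ("abl1", "BCR"), ("ABL1", "MYC"),
             ("MYC", "MYC"), ("ABL1", "STAT3")]
network = build_network(toy_pairs)
print(len(network), "nodes,", count_edges(network), "edges")
```

The same node and edge counts are what Cytoscape reports for a loaded network, so a sketch of this kind can serve as a quick consistency check on an exported interaction table.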

Network analysis
The MCODE algorithm in the Clusterviz 1.2 network analysis plug-in of Cytoscape 2.8.2 was applied to analyze the correlation of regions within the constructed biological network (Rintaro et al., 2012). By analyzing the network structure, proteins were grouped into molecular complexes across the entire network and displayed in Cytoscape according to their correlation integral values. Regions with a correlation integral value higher than 3 were regarded as molecular complexes. The gene/protein names contained in the molecular complexes were submitted to the Database for Annotation, Visualization and Integrated Discovery (DAVID) (Huang et al., 2009). By searching the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, the biological pathways involved in chronic myelogenous leukemia were identified.
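To make the idea of scoring a densely connected region concrete, the following minimal sketch ranks a candidate group of proteins by edge density multiplied by the number of nodes, and keeps groups scoring above 3. This formula is an assumption chosen for illustration; the correlation integral value reported by the plug-in may be computed differently. The function name and the toy network are hypothetical.

```python
from itertools import combinations


def cluster_score(adjacency, members):
    """Score a candidate complex as edge density times node count.

    Density is the number of edges inside the group divided by the
    maximum possible number of edges among its members.
    """
    members = set(members)
    n = len(members)
    if n < 2:
        return 0.0
    internal_edges = sum(
        1 for u, v in combinations(sorted(members), 2)
        if v in adjacency.get(u, set())
    )
    density = internal_edges / (n * (n - 1) / 2)
    return density * n


# Hypothetical toy network: A, B, C, D are fully connected; E hangs off D.
toy = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C", "E"},
    "E": {"D"},
}
candidates = [["A", "B", "C", "D"], ["C", "D", "E"]]
complexes = [c for c in candidates if cluster_score(toy, c) > 3]
print(complexes)
```

Under this assumed score, a fully connected group of four proteins scores 4 and is kept, whereas the loosely connected group of three scores 2 and is discarded.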

Main outcome measures
The protein network constructed from the chronic myelogenous leukemia-related genes, with its nodes (proteins) and edges (interactions between proteins); the molecular complexes in the network, with their nodes and edges; and the biological pathways in which the molecular complexes are involved.

Results

Protein interaction networks
Text mining based on the 79 disease-related genes yielded a network with 638 nodes (proteins) and 1830 edges (interactions). As shown in Figure 1, triangles represent OMIM disease-related proteins, while diamonds represent proteins obtained from text mining.

Network topology attribute analysis
Network topology analysis shows that the connectivity of nodes in the network (the number of edges connected to a node) follows a decreasing distribution: as the number of edges connected to a node increases, the number of such nodes decreases. The gene-protein interaction network can therefore be regarded as a scale-free network (Burkard et al., 2010). The number of nodes drops sharply once connectivity is greater than or equal to 20, so nodes with a connectivity of 20 or more were regarded as key nodes (hubs). Key nodes included: TP53, AKT1, MAPK8, STAT3, EPHB2, MYC, TNFSF13, PIK3CA, JUN, JAK2, MAPK3, SRC, ABL1, ERVK2, IL6, MAPK14, FRAP1, BCL2L1, HCC, PCBP2, TNFSF10, BCR, CD4, CDKN1A, PARP1, etc.
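The hub criterion described here can be sketched as follows: compute the connectivity (degree) of every node, tabulate how many nodes share each degree, and keep the nodes whose degree is at least 20. The function names and the toy network are hypothetical; only the threshold of 20 comes from the text.

```python
from collections import Counter


def node_degrees(adjacency):
    """Connectivity of each node, i.e. the number of edges attached to it."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


def degree_distribution(degree_by_node):
    """Map each degree value to the number of nodes having that degree."""
    return Counter(degree_by_node.values())


def find_hubs(degree_by_node, threshold=20):
    """Nodes whose connectivity is greater than or equal to the threshold."""
    return sorted(n for n, d in degree_by_node.items() if d >= threshold)


# Hypothetical toy network: one protein linked to 20 partners,
# two of which also interact with each other.
toy_network = {"HUB": {f"P{i}" for i in range(20)}}
for i in range(20):
    toy_network[f"P{i}"] = {"HUB"}
toy_network["P0"].add("P1")
toy_network["P1"].add("P0")

degree_by_node = node_degrees(toy_network)
print(sorted(degree_distribution(degree_by_node).items()))
print(find_hubs(degree_by_node))
```

In a scale-free network the distribution printed by such a sketch is dominated by low-degree nodes, with only a few nodes at high degree.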
Comparison (the horizontal axis represents betweenness and the vertical axis represents the degree of connectivity; each symbol in the plot represents one node in the network). The number of nodes decreases as connectivity increases, and drops sharply once connectivity is greater than or equal to 20.

The detection of molecular complexes
MCODE algorithm analysis identified a total of five molecular complexes whose correlation integral values were higher than 3.

Molecular complex pathway enrichment
The names of the proteins in the 5 molecular complexes were submitted online to obtain the relevant pathways, which are listed in Table 1.
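For readers unfamiliar with pathway enrichment, the underlying question is whether a complex contains more members of a given pathway than would be expected by chance. A minimal sketch using the plain one-sided hypergeometric test is given below. This is an assumed, simplified statistic and not the scoring used by the online tool, so the values will not match its output; the function name and the numbers in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(population, pathway_size, sample_size, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    population   : number of annotated background genes
    pathway_size : background genes that belong to the pathway
    sample_size  : genes in the molecular complex
    overlap      : complex genes that belong to the pathway
    """
    upper = min(pathway_size, sample_size)
    total = comb(population, sample_size)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, sample_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 background genes, 4 in the pathway,
# a complex of 3 genes of which 2 are pathway members.
p = enrichment_p_value(population=10, pathway_size=4, sample_size=3, overlap=2)
print(round(p, 4))
```

When many pathways are tested for the same complex, the resulting p-values need a multiple-testing correction before they are interpreted.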

Discussion
In recent years, many studies have examined the molecular mechanisms of CML. K Sailaja et al. suggested that the CYP3A5*3 gene polymorphism is significantly associated with the risk of CML development and disease progression (K Sailaja et al., 2010); Gulzar et al. showed that the presence of the GSTT1 genotype may have a protective role against CML (Gulzar et al., 2012); and Khaldon Bodoor et al. highlighted an essential role of DAPK methylation in chronic leukemia, in contrast to p15 methylation in acute cases (Khaldon et al., 2014). Previous studies mainly used regression or variance analysis, without analyzing the entire biological network or predicting therapeutic targets. Our results provide a new understanding of CML pathogenesis and prognosis, and may offer theoretical support for future therapeutic studies.
Biomolecular network analysis is an important direction of systems biology research. Understanding the complexity of molecular networks is an important challenge of the post-genomic era. Biomolecular function often depends on modularization: a network module is made up of a number of interconnected nodes, has a stable structure, and often reflects similar properties among its nodes (Chen et al., 2010; Kovács et al., 2010).
Analysis of functional modules is the most common method for analyzing biomolecular networks. For example, Jin Gu et al. used the response of gene network modules to study inflammation and angiogenesis, and predicted that ADAM17, CD40, ETS1, FOXO1, SMAD3 and TLR4 are target genes of mir-145 (Jin et al., 2010). Francisco Azuaje et al. found that ITGB1 and FN1 are potential biomarkers associated with heart failure by analyzing related protein network modules (Francisco et al., 2010). Zhu XL et al. used the Weighted Gene Co-expression Network Analysis method to identify prognostic markers for EC and characterized the underlying molecular mechanisms by KEGG pathway enrichment and transcriptional regulation analyses (Zhu et al., 2012). Feng DQ et al. characterized different KEGG pathways and GO terms associated with altered gene expression using complex pathway analysis, and identified those potentially subject to miRNA regulation (Feng et al., 2012). It can be seen that the analysis of functional modules can reflect the overall network properties of biological molecules.
There are three main approaches to constructing biomolecular regulatory networks: constructing gene regulatory networks through a mathematical model; building a network through literature mining; and integrating multiple data sources (Friedman et al., 2000; Hwang et al., 2005; Mohamed et al., 2009). Liu Y et al. built a PPI network from an existing database using the nearest-neighbor expansion method and analyzed the key protein nodes (Liu et al., 2013). In our study, by contrast, we built the network by literature mining and then analyzed its modules to derive the relevant signaling pathways and hub proteins; this approach examines the disease from several angles. Building a network through literature mining means using bioinformatics, computational biology and other fields of computer science to analyze the knowledge contained in the literature, and building biomolecular regulatory networks from the gene/protein interactions reported in the existing literature. The advantage of this method is that regulatory relationships are established accurately and the resulting network is stable.
Therefore, in this paper we used OMIM gene information and literature mining to construct the chronic myeloid leukemia protein interaction network. Although the input data of this study are disease genes, their functions and their relationships with other molecules are mostly described and demonstrated at the protein level in the literature. For this reason, the network constructed with literature data mining should be regarded as a protein interaction network. In Cytoscape, each protein is represented as a node, and each interaction between two nodes is represented as an edge. To increase reliability and to limit the protein perturbation to a defined level, the network reconstruction was restricted to first protein neighbors (Wu et al., 2014).
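The restriction to first neighbors can be illustrated with a short sketch: starting from the seed genes, only interactions in which at least one partner is a seed are retained, so the network never extends beyond one step from the seed set. This reading of "first protein neighbors" is an assumption, and the function name, seed set and interaction list below are hypothetical.

```python
def first_neighbour_edges(interactions, seeds):
    """Keep only interactions that involve at least one seed protein."""
    seeds = set(seeds)
    kept = set()
    for a, b in interactions:
        if a != b and (a in seeds or b in seeds):
            # frozenset makes (a, b) and (b, a) the same undirected edge
            kept.add(frozenset((a, b)))
    return kept


# Hypothetical toy data: MYC-MAX is two steps from the seeds and is dropped.
seed_genes = {"ABL1", "BCR"}
reported = [("ABL1", "MYC"), ("MYC", "MAX"), ("BCR", "ABL1"), ("MYC", "ABL1")]
edges = first_neighbour_edges(reported, seed_genes)
nodes = set().union(*edges)
print(len(nodes), "nodes,", len(edges), "edges")
```

Limiting the expansion in this way keeps the network focused on proteins with a directly reported relationship to the disease genes, at the cost of missing indirect regulators.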
Based on the 79 genes provided by OMIM, this study built a chronic myelogenous leukemia protein interaction network containing 638 nodes (proteins) and 1830 edges (interactions). Can this network describe the molecular regulation of chronic myelogenous leukemia during its development? According to the existing literature (Biethahn et al., 1999; Reddy et al., 2000; Reilly et al., 2002; Margaret, 2002; Bader et al., 2003; Donald et al., 2005; Jayaraman et al., 2012), BCR and jun are major causative genes of chronic myelogenous leukemia; myc, an oncogene, is a downstream signaling molecule of the BCR-ABL fusion protein and is involved in the pathogenesis of chronic myelogenous leukemia; growing evidence suggests a role for the Stat protein family in the initiation and development of leukemia, and activation of Stat proteins is associated with poor prognosis and shortened survival in patients; the immunosuppressive effect of CD4 allows tumors to escape immune surveillance; IL-6 is a pleiotropic cytokine that regulates a variety of cellular functions, including cell proliferation, cell differentiation, immune defense mechanisms and blood cell production; abnormalities of the tumor suppressor gene p53 and the oncogene myc are associated with rapid progression of chronic myelogenous leukemia; the JAK2 V617F mutation is rare in chronic myeloid leukemia, CML patients with this mutation have a longer survival, and the mutation may therefore indicate a good prognosis; experiments using the radiolabeled molecule LymphoRad131 TNFSF13B to assist cancer chemotherapy have entered clinical trials; and the disordered temporal and spatial expression of the apoptosis-related gene bcl-x, the anti-apoptotic gene akt1 and the proto-oncogene src alters the quality and quantity of their protein products, which is an important cause of tumorigenesis. All of these are related to the pathogenesis of chronic myelogenous leukemia. The network we constructed contains these genes and proteins, which indicates that the gene-protein interaction network is reasonably reliable and can be used to describe the molecular interactions of chronic myeloid leukemia.

DOI: http://dx.doi.org/10.7314/APJCP.2014.15.13.5325
Because the network is very large, this study applied the MCODE algorithm to evaluate regional integration within the network through the correlation integral. The correlation integral describes the degree to which the proteins within a region are associated. Proteins of the same molecular complex generally share the same biological function, so unknown gene functions or new molecular functional groups can be discovered. The analysis identified 5 molecular complexes with a correlation integral higher than 3. The DAVID knowledge base not only provides extensive gene annotation across different species, but also enriches the biological information available for individual genes. No biological pathways were found for molecular complexes 2, 4 and 5. Two reasons can be proposed: although the correlation integral of these molecular complexes is high, this does not prove that they contain proteins with similar biological functions; or existing studies have not yet revealed the biological pathways in which they are involved. Molecular complexes 1 and 3 are involved in many biological pathways, and Figure 3 shows their complexity: there are 6 biological pathways with FDR < 1 in molecular complex 1, and 36 in molecular complex 3.
Molecular complex 1 is predicted to be related to the complex regulation of cytokines, specifically cytokines and inflammation, cytokine receptor interaction, and receptor signal transduction pathways. The existing literature indicates that myeloproliferative diseases are highly sensitive to cytokines, so the genes of these signaling pathways may provide research directions for the molecular therapy of CML. Molecular complex 3 shows the complex biological behavior of chronic myeloid leukemia and its extensive correlation with other cancers. CML is not simply controlled by a particular gene or signaling pathway, but by a complex, coordinately regulated network system consisting of a variety of signaling pathways and multiple genes. Within this signaling network, there are likely to be some "key regulatory points".
This study uncovered a variety of signaling pathways and genes. As applications of protein networks continue to expand, for example to transcriptome network analysis and pathway crosstalk analysis (Pan, 2012), these findings can provide reliable directions for research on the molecular mechanisms of CML treatment.

Figure 1. Network Map of Chronic Myeloid Leukemia Protein Interaction (Overall + Partial)

Figure 3. Molecular Complexes Obtained by MCODE Algorithm Analysis