RESEARCH ARTICLE Network Analyses of Gene Expression following Fascin Knockdown in Esophageal Squamous Cell Carcinoma Cells

Fascin-1 (FSCN1) is an actin-bundling protein that induces cell membrane protrusions, increases cell motility, and is overexpressed in various human epithelial cancers, including esophageal squamous cell carcinoma (ESCC). We analyzed various protein-protein interactions (PPI) of differentially-expressed genes (DEGs), in fascin knockdown ESCC cells, to explore the role of fascin overexpression. The node-degree distributions indicated these PPI sub-networks to be characterized as scale-free. Subcellular localization analysis revealed DEGs to interact with other proteins directly or indirectly, distributed in multiple layers of extracellular membrane-cytoskeleton/ cytoplasm-nucleus. The functional annotation map revealed hundreds of significant gene ontology (GO) terms, especially those associated with cytoskeleton organization of FSCN1. The Random Walk with Restart algorithm was applied to identify the prioritizations of these DEGs when considering their relationship with FSCN1. These analyses based on PPI network have greatly expanded our comprehension of the mRNA expression profile following fascin knockdown to future examine the roles and mechanisms of fascin action. and in vivo , suggesting fascin crucial roles in regulating neoplastic progression. Affymetrix GeneChip Human U133 plus 2.0 arrays the expression profile in ESCC stably fascin shRNA


Introduction
Fascin-1 (FSCN1) is an actin-bundling protein that induces cell membrane protrusions and increases cell motility (Hashimoto 2011;Tan et al., 2013). FSCN1 mainly accumulates in cellular structures containing actin bundles, including filopodia and stress fibers, causing F-actin to aggregate side-by-side into bundles (Adams 2004), but is usually absent or present at low levels in normal epithelial cells. Accumulated evidence shows fascin is overexpressed in several human epithelial cancers, such as gastric, colonic, skin, breast, and urothelial cancers (Tan et al., 2013). Overexpression of fascin in these tumors usually correlates with highgrade, extensive invasion, distant metastasis, or poor prognosis, serving as a biomarker for various aggressive carcinomas.
In our previous studies, we demonstrated that fascin becomes overexpressed in the malignant transformation of normal esophageal squamous cells to cancer cells (Rong et al., 2004), and upregulation of fascin is markedly correlated with cell proliferation and lymph node metastasis, suggesting fascin could be a potential novel biomarker for esophageal squamous cell carcinoma (ESCC) (Zhang et al., 2006). shRNA-mediated knockdown of fascin in ESCC cells decreases cell proliferation, cell invasiveness and tumor growth in vivo, suggesting fascin plays crucial roles in regulating neoplastic progression. Affymetrix GeneChip Human genome U133 plus 2.0 arrays has been applied to analyze the mRNA expression profile in ESCC cells stably expressing fascin shRNA (Xie et al., 2010).
Of the various different types of molecular interactions in living cells, protein-protein interactions (PPI) plays a crucial role in determining the multi-functionality of a single protein and mediating enormous biological processes. Virtually all proteins achieve their specific functions in biological contexts through cascades of interactions. PPI network based analyses have been developed quickly in recent years with various applications, including protein function prediction, interaction prediction, identification of disease candidate genes, gene regulation (Peng et al., 2014;Zhang et al., Deng et al., 2015;Srihari et al., 2015).
The biological information of mRNA expression profile generated from fascin knockdown has not been fully mining and described in our previous reports (Xie et al., 2010). Analyses must be pushed to make sense out of mRNA expression profiles, beyond the merely listing of affected genes in a traditional way. In this study, we analyzed the mRNA expression profile following fascin knockdown using the knowledge of protein-protein interaction networks.

Materials and Methods
The differentially-expressed genes GSE11373, the mRNA expression profile of fascin RNAi in ESCC EC109 cell line, is available from NCBI GEO database (http://www.ncbi.nlm.nih.gov/geo/). The expression data was subjected normalization and log transformation treatment, then the differentially-expressed genes (DEGs) were obtained by fold-change analysis.

PPI network construction
The latest human protein-protein interactions dataset is downloaded from HPRD (http://www.hprd.org/), which were collected from a manual search of the experimentally verified literatures (Goel et al., 2012). The current HPRD FLAT file contains 9617 unique proteins and 39140 edges (interactions). The HPRD dataset has been widely applied in human PPI network research because of its reliability (Liu et al., 2013;Wu et al., 2014). For visualization and analysis of PPI networks, open-source Cytoscape software was applied. In Cytoscape, PPI networks are represented as graphs in which the nodes are the proteins and the edges present as their interactions (Smoot et al., 2011). For visualization in the context of biological networks, the different node attribute files and visual style files were established and imported into Cytoscape.
We constructed various PPI sub-networks by mapping DEGs to the HPRD parent PPI network. First, the downregulated DEGs, up-regulated DEGs and total DEGs were mapped and extracted to their respective sub-network. To detect and extract the first level interactions between DEGs and their neighboring proteins, we used Cytoscape menus of "Select first neighbors of selected nodes" and "New network->from selected nodes, all edges". Second, fascin protein FSCN1 was used as the query node to extract interactions for the axis of FSCN1 g neighbors proteins g DEGs g neighbors proteins, constructing an FSCN1-central PPI network. Third, a sub-network was created by selecting nodes with all edges by Cytoscape after all DEGs were mapped to the HPRD PPI network to detect the internal interactions between DEGs. To avoid miscalculations of PPI network topological parameters, duplicates, single nodes and self-interactions of these sub-networks were also removed.

Network topological parameter analyses
The network topological parameters were analyzed using NetworkAnalyzer. NetworkAnalyzer is a java plugin for Cytoscape, efficiently computes a large number of topological network parameters for directed and undirected networks, such as distributions of node degrees, neighborhood connectivities, average clustering coefficients, topological coefficients, shortest path lengths, and shared neighbors of two nodes (Assenov et al., 2008). In this study, the power law of node degree distribution, one of most important network topological characteristics, was analyzed and carried out as we described previously . Briefly, the edges in all networks were treated as undirected. The degree of a protein was the number of its directly connecting neighbours in the network. Node degree distribution P (k) is defined as the number of nodes with a degree k for k=0, 1, 2, …. By fitting a line on the node degree distribution data, the pattern of their dependencies can be visualized. NetworkAnalyzer considers only data points with positive coordinate values for fitting the line where the power law curve of the form y = βxa. The R2 value is a statistical measure of the linearity of the curve fit and used to quantify the fit to the power line. When the fit is good, the R2 value is very close to 1.

Subcellular layout of PPI sub-network
The subcellular localization information of each protein in the total DEG PPI sub-network was retrieved from the Swiss-Uniprot database by a custom R program and was imported into Cytoscape as a node attribute. Then, the Cerebral (http://www.pathogenomics.ca/cerebral/) was applied to re-distribute the protein nodes according to their subcellular localization. Given an interaction network and subcellular localization annotation, Cerebral automatically generates a view of the network in the style of traditional pathway diagrams, providing an intuitive interface for the exploration of a biological pathway or system (Barsky et al., 2007). The assembled layered PPI network was divided into 10 layers according to their subcellular locations in this study: Secreted, Membrane, Cytoskeleton/Cytoplasm, Cytoplasm, Membrane/Nucleus, Membrane/Cytoplasm/ Nucleus, Cytoplasm/Nucleus, Cytoskeleton/Cytoplasm/ Nucleus, Nucleus and Downstream effectors (proteins with unknown subcellular location). Since protein translocation into nucleus could play an important role in gene expression regulation, proteins with multiple locations, especially in the nucleus, were classified in more detail when we curated the subcellular localization information of the DEGs (Zuleger et al., 2012).
The igraph R program was applied to find the shortest path between FSCN1 and FOS in the total DEG PPI sub-network. The shortest path algorithm is able to find the shortest connection between two nodes in the graph (Csardi and Nepusz, 2006). The protein members of these paths were also displayed according to their subcellular localization.

Generation of the functional annotation map
To determine whether interacting proteins in the network were clustered according to their Gene Ontology (GO) functional annotation, we integrated the GO annotation into the PPI networks by mining for overrepresented GO "Biological Process" terms for proteins using the ClueGO (http://www.ici.upmc.fr/cluego/), which allows the decoding and visualization of functional annotation of grouped GO terms in the form of networks (Bindea et al., 2009).

Random Walk with Restart to prioritize DEGs
Random Walk with Restart (RWR) is a ranking algorithm (Köhler et al., 2008). The RWR on graphs is defined as an iterative walker that starts on either a seed node or a set of seed nodes, and moves to its immediate neighbors randomly at each step. Finally, all the nodes in the graph are ranked by the probability of the random walker reaching this node. Let p0 be the initial probability vector and ps be a vector in which the i-th element holds the probability of finding the random walker at node i at step s. The probability vector at step s + 1 can be given by the following formula: p t+1 = (1-r) Wp t + rp 0 where r is the restart probability, W is the columnnormalized adjacency matrix of the network graph, and p t is a vector of size equal to the number of nodes in the graph where the i-th element holds the probability of being at node i at time step t. The initial probability vector p 0 was constructed such that equal probabilities were assigned to the nodes representing members of the disease, with the sum of the probabilities equal to 1.
In this study, RWR was carried out by a custom R program in the total DEG PPI sub-network with fascin protein FSCN1 being set as the seed node. The probabilities of DEGs were regarded as node attribute and displayed by Cytoscape.

PPI sub-network of DEGs
In total, 276 differentially-expressed genes (DEGs) were obtained using a two-fold change as the threshold. 193 genes were upregulated and 83 genes were downregulated in EC109 cells following FSCN1 knockdown. In order to understand how the DEGs could alter cell behavior, we analyzed their interactions with other proteins. Three PPI sub-networks were constructed by mapping the downregulated, upregulated and total DEGs to the HPRD parent PPI network. The downregulated DEG PPI subnetwork contained 286 nodes and 649 edges (interactions), including 42 downregulated DEGs ( Figure 1A). The upregulated DEG PPI sub-network contained 784 nodes and 2711 edges, including 94 upregulated DEGs ( Figure  1B). The total DEG PPI sub-network was comprised of 1015 nodes and 3837 edges, including 137 DEGs ( Figure  1C). These three sub-networks indicated that knockdown of fascin greatly altered the PPI network in ESCC, as hundreds of DEGs interact with thousands of proteins to enlarge the biological consequences of fascin knockdown. To focus on fascin protein FSCN1, the sub-network based on the axis of FSCN1 g neighbours g DEGs g neighbours was also built to detect the relationship between FSCN1 and the nearest DEG proteins. This axis sub-network was comprised of 32 nodes and 34 edges, including 8 DEGs: downregulated FSCN1, NEXN, TAGLN and PLD1, and upregulated SCIN, SLC1A1, PPM1A and NDRG1 ( Figure 1D). Moreover, DEG-DEG interactions were acquired ( Figure 1E). This sub-network contained 16 nodes (7 downregulated and 9 upregulated) and 10 edges, forming two four-DEG interactions and four two-DEG interactions.

Network topological properties
Because of its distinguishing topological characteristics, biological networks (e.g. PPI network) should be significantly different from random networks (Maslov and Sneppen, 2002;Zhu et al., 2007). As shown in Figure 2, the distributions of node degree approximately followed power law distributions, with an R 2 =0.866, 0.907 and 0.864 for the downregulated, upregulated and total DEG sub-networks, respectively. This indicates the three PPI sub-networks were scale-free, which is one of most important characteristics of true complex biological networks (Barabasi and Oltvai, 2004). These results also suggest that a relatively few protein nodes act as hubs to link with a large number of other protein nodes. Other topological parameters of these sub-networks, such as clustering coefficient, network centralization and network density, were showed in Table 1.

Protein layers of the total DEGs PPI sub-networks
The appropriate subcellular localization and protein translocations are crucial because it provides the physiological context for their function, such as complex formation, protein modification and signal transduction. The proteins in the total DEG sub-network were divided into 10 layers according to their subcellular localization ( Figure 3A). FSCN1 localizes to the cytoskeleton/ cytoplasm, especially in filopodia in various types of cancer cells (Lieleg et al., 2011;Zanet et al., 2012). Similarly, FSCN1-interacting proteins are mostly located in the cytoskeleton, or in the cytoplasm (e.g. DNAJB9, ACTC1, ACTA1, RAB1A), or in the membrane (e.g. NGFR and PRKCA). However, two interacting proteins CTNNB1 and PRKCD have the ability to translocate into the nucleus ( Figure 3B). Two FSCN1 neighboring proteins (PPM1A and NDRG1) and their interacting proteins can translocate into nucleus ( Figure 3C). These results show FSCN1 interacts with proteins directly or indirectly with various subcellular localizations, e.g. membrane, cytoskeleton, cytoplasm and nucleus. Then, the knockdown of fascin by RNAi might disturb the PPI Blue nodes represent interacting proteins that are not significantly differentially expressed FSCN1 affects the signal cascades of extracellular g membrane g cytoskeleton/cytoplasm g nucleus.
To future illustrate the strength of this kind analysis, we applied the shortest path algorithm to find the possible shortest path from FSCN1 to FOS, and identify the linking proteins between FSCN1 and FOS. We found 12 shortest paths from FSCN1 to FOS (Table 2) with all the lengths were 3. We also distributed these proteins members in the paths according to their sub-cellular localizations ( Figure  3D). Since most of the cell signals are transducted from network directly or indirectly in multiple subcellular localization layers, causing altered DEG expression, eventually changing cell proliferation, cell morphology, invasion and metastasis, as we confirmed in previous report (Xie et al., 2010). These results indicated that cytoplasm to nucleus, we presume the four following shortest paths had the maximum likelihood: a) FSCN1 g

Annotation of the total DEGs PPI sub-network
It is important to understand which aspects of cellular behavior are perturbed by the DEGs through the interactions in the PPI network. The over-represented GO "Biological Process" terms of the total DEGsnPPI sub-network were analyzed. Based on the interactome of the total DEG PPI sub-network, a functional annotation map was generated where members of the sub-network ended up in nodes corresponding to their enriched GO terms, and where edges connecting GO terms indicated that some of their respective proteins share the same enriched GO terms (Figure 4). Not surprisingly, several GO terms associated with cytoskeleton organization of FSCN1 protein were found, such as "cell surface receptor signaling pathway", "transmembrane receptor protein tyrosine kinase signaling pathway", "cell junction organization", "regulation of transmembrane transport" and "cell mobility". These results suggested that the PPI, sub-network disturbed by the knockdown of fascin, involved various biological processes which closely related to the functions of FSCN1, providing new insights into the functions of FSCN1.

DEG prioritization
Since knockdown of fascin perturbed expression of hundreds of genes, we ranked the DEGs by their importance in the context of FSCN1. In this study, RWR was done by a custom R program, for the total DEG PPI sub-network, with FSCN1 set as the seed node. The probability scores of all protein members were regarded as a node attribute and displayed by Cytoscape. The more significant the node, the bigger it appears ( Figure 5A). For a better illustration, the DEGs were extracted from the modified PPI sub-network ( Figure 5B). Since some of the DEGs were scored closely, and re-arranged after the scores were log10-transformed and classified by the ranges, for example, FSCN1 is classified as A, DEGs within score -2.0~-2.99 are classified as B, DEGs within -3.0~-3.99 are classified as C, and so on. (Figure 5C). We found the proteins neighboring FSCN1 (SCIN, TAGLN, NEXN, PPM1A, PRKD1, NDRG1, ATXN1, PLD1, SLC1A1) were classified into the first class of closeness to FSCN1, consistent with the FSCN1-central sub-network and the idea of the RWR algorithm. SCIN ranked the first closed DEGs to FSCN1. Except for these neighboring proteins, these results provided the priorities for other DEGs when considering their relationship with FSCN1.

Discussion
We developed a systems approach by linking fascin knockdown-mediated differential gene expression with publicly available PPI data to build sub-networks that delineate the cellular roles of fascin. First, three subnetworks for downregulated, upregulated and total DEGs are comprised of thousands of protein nodes, indicating FSCN1 influences other proteins directly or indirectly, and its knockdown perturbs the PPI network to cause global alterations in ESCC. Second, this analysis provides full screen of FSCN1-binding proteins and their neighboring proteins. It should be noted that the PPI node degree distributions of these three sub-networks follow a power law, a necessary condition for networks to be scale-free, rather than random. This indicates that fascin might be a key node such that changes in fascin levels will globally alter cell behavior in ESCC.
This kind of PPI sub-network analysis is comparable to "in silico pull-down" of molecules by iterative expansion, providing clues to detect protein complex formation. Moreover, our approach easily discovers multiple interactions between DEGs compared to a literature search, such as one group of HK2, PRMD1, PTGDS and ATXN1, and another group of COL7A1, DCN, COL4A3 and THBS1. This FSCN1-central sub-network indicates the poteintial functions of their neighboring proteins. Since FSCN1 is an acting-binding protein, plays important roles in carcinoma cell invasion and metastasis, many of its neighboring proteins relate to actin or the cytoskeleton, and have similar functions. For example, NEXN encodes a filamentous actin-binding protein that mediates motility of HeLa cells (Wang et al., 2005). TAGLN encodes a

Figure 5. Prioritization Analysis of DEGs in the PPI Sub-Network. A) A Random Walk with Restart algorithm was
used to score all proteins in the PPI network for their network proximity to the seed node FSCN1. Based on the scores, the size of each node in the PPI sub-network was designed in a gradient. B) The DEGs were extracted from (A) to show their size. C) DEGs were re-arranged according their closeness to FSCN1 protein transformation and shape change-sensitive actin crosslinking/gelling protein, and is upregulated in gastric carcinoma-associated fibroblasts to promote gastric cancer cell migration and invasion (Yu et al., 2013). PLD1 has been extensively implicated in the regulation of the actin cytoskeleton in a number of physiological processes, such as membrane trafficking, cell migration and adhesion (Rudge and Wakelam, 2009). Upon inhibition of PLD1 by quercetin, the invasion and proliferation of glioma cells are subsequently inhibited (Park et al., 2011). Therefore, our method is capable of identifying important fascinassociated proteins.
Subcellular localization of proteins is important for providing clues about protein function and the intricate pathways that regulate cellular activities at the subcellular level. In this study, we incorporate subcellular localization information into total DEG PPI sub-network, generating more biologically intuitive pathway-like layouts of a network. We presume that signaling is transduced by consecutive PPI sequences, since the composition and biological role of proteins vary with subcellular localization. For example, proteins located in the plasma membrane are primarily involved in cell adhesion, the cytoskeleton and cell signaling, whereas in the nucleus, proteins are mainly involved in transcription and ribosomal assembly. FSCN1 directly interacts with proteins, such as CTNNB1 (β-Catenin) and PRKCD, that have the ability to translocate into the nucleus. β-Catenin is an important component of protein complexes that constitute adherens junctions, regulating cell growth and adhesion between cells. β-Catenin anchors the actin cytoskeleton, but can translocate into the nucleus to regulate transcription of its target genes in cooperation with transcription factors of the LEF-1 family (Fagotto, 2013). Interestingly, the downregulated DEG CYR61 is also a transcriptional target of β-catenin, consistent with a previous report (Si et al., 2006). Another FSCN1 interacting protein with the ability to translocate into nucleus is PRKCD, a member of the PKC subfamily that plays a role in growth control, differentiation and apoptosis. PRKCD translocates into nucleus by various stimuli, such as IL6 and insulin (Horovitz-Fried et al., 2008;Wallerstedt et al., 2010). Our association of fascin with hundreds of proteins capable of translocating into the nucleus indicates that the knockdown of fascin should have a large impact on the ESCC gene expression profile. According to their subcellular localizations, most of these paths obey the principle of from extracellular to cytoplasm, till nucleus. With a number of proteins could translocate into nucleus, it is convinced that the knockdown of fascin caused great impact on the ESCC gene expression profile.
Choosing DEGs for subsequent functional experiments is still a huge challenge for the researchers after microarray analysis. We prioritized DEGs by using FSCN1 as the seed node, in an RWR algorithm, to identify DEGs immediately downstream of FSCN1. These result provided the priorities of other DEGs when considering their relationship with FSCN1and important clues for experiments to identify the DEGs.
In summary, the analyses based on PPI networks have greatly expanded our understanding of the mRNA expression profile following fascin knockdown. The PPI network is able to implicate gene function under a variety of conditions, especially when the genes lack all other functional annotations or are sparsely annotated. In addition, the PPI network can be used for gene prioritization irrespective of whether or not the genes have other functional annotations.