The extraction of targeted subnetworks is a powerful way to identify functional modules and pathways within complex networks. Here, we present SubNet, a Java-based standalone program for extracting subnetworks given a basal network and a set of selected nodes. Designed with both graphical user interface (GUI) and command line user interface (CUI), SubNet combines four different extraction methods, which offers the possibility to interrogate a biological network according to the question investigated.

SubNet: a Java application for subnetwork extraction. Zhang Q, Zhang ZD (2013) *Bioinformatics* 29(19):2509-2511. pubmed
reprint

» SubNet

- Source code (NetBeans project directory tree, 16 MB)
- Executables: with GUI (33 MB) & with CUI (33 MB)

» Network

Source | Version | Edge weights? | |

HPRD | 9 | / | No |

STRING | 9.05 | Yes | No |

GeneMANIA (Human) | Oct 2012 | ||

Coexpression (Shaykhiev,Crystal,2011) | Yes | No | |

Coexpression (Wang,Crystal,2011) | Yes | No | |

Coexpression (Barnes,Colbert,2009) | Yes | No | |

Coexpression (Tilley,Crystal,2011) | Yes | No | |

Coexpression (Barnes,Glass,2010) | Yes | No | |

Colocalization (Johnson,Shoemaker,2003) | Yes | No | |

Colocalization (Schadt,Shoemaker,2004) | Yes | No | |

Physical interaction (IREF.OPHID) | Yes | No | |

Physical interaction (IREF.SmallScaleStudies) | Yes | No | |

Physical interaction (IREF.HPRD) | Yes | No | |

Physical interaction (IREF.GRID) | Yes | No | |

Physical interaction (BIOGRID) | Yes | No | |

Notes: (1) A large number of coexpression and physical interaction networks are available at GeneMANIA. We only list the largest five of each type. (2) All networks are undirected. |

» System requirements

- Computer: RAM usage depends on the size of the network. We recommend to start with 4 GB RAM.
- Software: the latest Java from Oracle

» Input files

- Network: a text file with two or three tab-delimited columns. Each edge of the network is specified on one line, with its two nodes on the first and the second columns. These two columns are mandatory. The optional edge weight may be specified on the third column. If absent, by default all weights are set as 1. For directed network, the source node is on the first column and the target node on the second. Some useful networks with or without edge weights are available for download above.
- Seeds: a text file with each selected node on one line
- Network files in other formats (e.g., GML) need to be reformated for SubNet to use. Such a file can be imported into Cytoscape first and then saved in the sif format, a simple tab-delimited text format. The two columns of node names in the sif file can be selected and saved in a new text file for SubNet to use. On Unix or Mac OS X, this can be easily done with the 'cut' command. On Windows, this can be done with Microsoft Excel.

» Output files

- Location: the output files are generated in the directory where SubNet is located.
- Each method generates three text files with self-explanatary names.

Output file for | Generated by | Tab-delimited columns | ||

Network (.top) | All methods | Node1 name | Node2 name | |

Edges | Shortest path | Node1 name | Node2 name | Betweenness |

All other | Node1 name | Node2 name | ||

Nodes | Shell | Node name | Its shell number | Its degree |

Shortest path | Node name | Its betweenness | Its degree | |

Emission decay | Node name | Its decay value | ||

PageRank | Node name | Its PageRank score | Its degree | |

Notes: (1) The network file can also contain orphan nodes, which are not linked to any other nodes. (2) The node degree is its degree in the subnetwork. |

» Graphical user interface

- Lauch SubNet by double-clicking SubNet_GUI.jar
- (To request more memory allocation, use the -Xmx option with java on the command line. For example, to allocate 4000 Mb memory to SubNet, execute the command 'java -Xmx4000M -jar SubNet_GUI.jar' to lauch SubNet)
- Select the network file by clicking on the 'Browse' button in the first panel
- Choose the network type, directed or undirected
- Click on the 'Load' button to read and create the network
- Notice run-time messages displayed in the Analysis message box
- Select the seed file by clicking on the 'Browse' button in the second panel
- Select the subnetwork extraction method from the combo box in the third panel
- Specify the appropriate parameter value
- Click 'Start' to launch the extraction

» Command line user interface

- Synopsis: java -jar SubNet_CmdLine.jar [ --
*d*| --*ud*] [ --*shell*| --*shortest*| --*decay*| --*matrix*]*network.top**seeds.txt**other_option* - [ --
*d*| --*ud*]: network type, directed or undirected - [ --
*shell*| --*shortest*| --*decay*| --*matrix*]: subnetwork extraction methods *other_option*: shell number or decay threshold or page rank score threshold, used with the concomitant method specification- To request more memory allocation, use the -Xmx option with java

Formally, given a network *G *= (*V*, *E*) with its ensemble of *n* vertices *V* and edges *E* and a set of preselected nodes *S*, SubNet constructs a subnetwork *W* based on *S* so that *W* is a subset of *G*.

» Extraction by shell

A node in a network has neighbors on different levels: one edge away, on the so-called first shell, are the direct neighbors, two edges away are the indirect neighbors on the second shell, and so on. Directly based on the notion of "guilty by association" – the closest molecules are highly likely to be related to each other and involved in the same pathways and mechanisms (Schwikowski et al., 2000), this method extracts the neighbors of each of preselected nodes within a certain distance. Given the set of selected genes *S* = {*v _{i}*}, the set of nodes within the

in which {*v _{i}*}

» Extraction by shortest path

To make use of indirect interactions, one can take higher-order neighborhoods into consideration. In a complex molecular interaction network, given a pair of connected biomolecules, there are almost always a large number of routes to reach one node from the other. The shortest path between them is of particular interest and significance, as it is assumed to be the route used most for information communication between the two molecules and has been shown to provide a good measure for functional relatedness among genes (Sharan, et al., 2007; Managbanag et al., 2008). We use the shortest path in our second method – if the shortest path between two genes includes at least one transitive gene it is postulated that the transitive genes on the path will be involved in the same biological process as the terminal genes. To find the shortest path between two nodes, we use Dijkstra's algorithm (Dijkstra, 1959), which constructs a shortest-path tree from a selected node to every other node in the network. Because the shortest path between two nodes may not be unique, we will recover all the shortest paths if there is more than one. Given the set of selected genes, *S* = {*v _{i}*}, the set of nodes on the shortest paths among them,

in which {*v _{i}* <->

» Extraction by emission decay

Given the network connectivity, the preselected seeds can be considered as heat sources, emitting functionality along the edges to other nodes in the network. Such emission decays as the distance increases, and the total retainment at each node (excluding seeds) is the sum of functionality transmitted from all seeds:

in which *d _{i}* and

Each node sees its decay value updated after each seed is being considered, with the initial decay value at 0 before the first seed. Once every seed has been explored, the final decay value of each node is then compared to a threshold *τ*, which is initially set by the user. The resulting subnetwork *W* consists then in the nodes with decay value higher than *τ* and the edges connecting them.

» Extraction by PageRank

We also developed a method based on the node centrality/influence and adapted the Google PageRank algorithm (Page *et al.*, 1999) to incorporate preselected nodes. Let **A** be the adjacency matrix of network *G* with:

in which *w _{ij}* is the weight of the edge between nodes

The centrality/influence of each node (*p _{i}*) is the sum of the influence it receives from each of its source node (

It can be written in the matrix form as:

This is the characteristic equation of the eigensystem, and the solution to vector **P** is an eigenvector of **B** with the corresponding eigenvalue of 1. However, if **B** is a stochastic matrix that is also irreducible and aperiodic, 1 is the largest eigenvalue and **P **is the principal eigenvector (Perron–Frobenius theorem). To make such a matrix, we first replace every cell of a row with 1/*n* if that row contains only 0 and then add a link from each node to every node with a small transition probability. This operation generates a new matrix **C**:

in which **E** is an *n*×*n* matrix of ones and *d* is a parameter controlling the weight of the **E** component in **C**. To obtain **P**, we use the power iteration method (von Mises *et al.*, 1929), which can be applied to large sparse matrices of biological networks:

.

The resulting subnetwork *W* consists of the nodes with values in **P** higher than a preselected thresholdand the edges connecting them.

Schwikowski, B., Uetz, P. and Fields, S. (2000) A network of protein-protein interactions in yeast. *Nat Biotechnol*, **18**, 1257-1261.

Sharan, R., Ulitsky, I. and Shamir, R. (2007) Network-based prediction of protein function. *Mol Syst Biol*, **3**, 88-100.

Managbanag, J.R., Witten, T.M., Bonchev, D., Fox, L.A., Tsuchiya, M., Kennedy, B.K., Kaeberlein, M. (2008) Shortest-Path Network Analysis Is a Useful Approach toward Identifying Genetic Determinants of Longevity. *PLoS ONE*, **3**, e3802.

Dijkstra, E. W. (1959) A note on two problems in connexion with graphs, *Numerische Mathematik*, **1**, 269-271.

Page, L., Brin, S., Motwani, R. and Winograd, T. (1999) The PageRank citation ranking: Bringing order to the Web.

von Mises, R. and Pollaczek-Geiringer H. (1929) Praktische Verfahren der Gleichungsauflösung. ZAMM, *Zeitschrift für Angewandte Mathematik und Mechanik*, **9**, 152-164.