Weighted Gene Correlation Network Analysis

Welcome to the Weighted Gene Correlation Network Analysis (WGCNA) Shiny app. This app provides a user-friendly interface to the WGCNA R package (Langfelder and Horvath, WGCNA: an R package for weighted correlation network analysis, BMC Bioinformatics, 2008).

This application was developed for and is maintained by the eTRIKS IMI consortium.

Introduction

What is WGCNA?

Weighted Gene Correlation Network Analysis (WGCNA) is a widely used data mining and analysis method developed to study biological networks based on pairwise correlations between variables.

This method was first published by B. Zhang and S. Horvath (A general framework for weighted gene co-expression network analysis, Statistical applications in genetics and molecular biology, 4 (1), 2005).

Although mainly used to analyse gene expression data, WGCNA is suited to any type of continuous biological omics data. The rationale behind the method is to use the correlation levels between omics features to extract meaningful results, complementing the traditional methods of omics data analysis, which focus on statistically significant differences in expression and/or abundance of the omics features between groups.

What does this application do?

This application allows users with little or no coding skills to try, explore and play with the WGCNA method. In order to make this experience as straightforward as possible, we have restricted the data available in this application to two model datasets of gene expression in mice. Other implementations of this method within the eTRIKS project will allow users to upload their own data.

In general terms, this application will let the user:

  • choose between the two mouse gene expression datasets,
  • generate a correlation network of all genes available in the selected dataset,
  • interrogate this network to identify groups of genes (modules) correlated with each other,
  • cross these gene modules with available clinical data,
  • explore the relations between gene modules, and
  • generate a network of modules and export it to the user’s favorite network visualization software.

1. Data Input

In this section, the user chooses and gets a first look at the dataset to use in the analysis.

a. Choose Demo Dataset

Demo Datasets

The user can choose between a Female Mice Liver transcriptomics dataset (N = 135) and a Male Mice Liver transcriptomics dataset (N = 124), each with 3600 probesets. More details can be found in Ghazalpour et al., Integrating Genetic and Network Analysis to Characterize Genes Related to Mouse Weight, PLoS Genetics, 2006. These two datasets have been used by the authors of the WGCNA method in their tutorials.

Each dataset comprises two files, one for the transcriptomic expression (log2-transformed and Z-scored) and one for the clinical parameters, each with the subject IDs in the first column.

Tables

The tables of ‘Gene Expression’ and ‘Clinical Traits’ show the source data matrices. The user can choose to show more entries and search for a specific subject.

Visualisation

The sample dendrogram is computed from the Euclidean distances between the samples. The clinical traits heatmap is shown below it. The colors represent the Z-scores of the clinical traits: blue corresponds to low values, white to the mean (i.e. 0), red to high values and grey to missing values. Once the user is happy with the choice of dataset, he/she can proceed by clicking on the Go to Next Step button or on the Network Construction tab at the top of the page.

Table of Gene Expression (first 100 genes)

Table of Clinical Traits



2. Network Construction

In this section, the user will define the values of parameters and choose options to build the correlation network and define gene modules.

a. Choose Network Type

The user can choose between 3 types of network:

  • Unsigned: uses absolute value of correlations,
  • Signed: uses value and sign of correlations,
  • Signed Hybrid: uses only positive correlations and assigns 0 to negative correlations.
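The three network types can be sketched as a small function that turns one correlation value into a connection strength. This is a minimal illustration using the adjacency definitions from the WGCNA literature, not the app's own code; the function name and the `beta` argument (the soft power described in the next step) are chosen here for the example.

```python
def adjacency(cor, beta, network_type="unsigned"):
    """Convert a single correlation into a weighted adjacency value."""
    if network_type == "unsigned":
        # The sign is discarded: strong negative correlations count as links.
        return abs(cor) ** beta
    if network_type == "signed":
        # Correlations are rescaled from [-1, 1] to [0, 1] before the power.
        return ((1 + cor) / 2) ** beta
    if network_type == "signed hybrid":
        # Negative correlations are set to 0, positive ones are kept.
        return cor ** beta if cor > 0 else 0.0
    raise ValueError(f"unknown network type: {network_type}")


# The same negative correlation under each network type.
for kind in ("unsigned", "signed", "signed hybrid"):
    print(kind, adjacency(-0.8, 2, kind))
```

A correlation of -0.8 therefore gives a strong link in an unsigned network, a very weak one in a signed network and no link at all in a signed hybrid network.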

The user can then click on the Compute button to compute the network with his/her choice of network type.


b. Choose Soft Power

The correlation values are raised to the power β (called the soft power), in order to emphasise the high correlation values and improve the signal-to-noise ratio.

The value of this soft power has to be chosen such that the resulting network shows an approximate scale-free topology (SFT, meaning that the degree distribution of the network follows a power law), as presented in the top left plot.

The other three plots can also be used to check the scale-free topology fit: there should be few nodes with a high connectivity (hub genes), hence a low mean and median connectivity and a high maximum connectivity.
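The connectivity check described above can be sketched on a toy network. In this illustration the connectivity of a node is taken as the sum of its connection weights to all other nodes; the 4-node adjacency matrix is invented for the example and has one hub (node 0).

```python
from statistics import mean, median


def connectivity(adj):
    """Sum each node's connection weights to every other node."""
    n = len(adj)
    return [sum(adj[i][j] for j in range(n) if j != i) for i in range(n)]


# Illustrative symmetric adjacency matrix: node 0 is a hub.
adj = [
    [1.0, 0.9, 0.8, 0.7],
    [0.9, 1.0, 0.1, 0.0],
    [0.8, 0.1, 1.0, 0.0],
    [0.7, 0.0, 0.0, 1.0],
]

k = connectivity(adj)
summary = {"mean": mean(k), "median": median(k), "max": max(k)}
print(summary)
```

In a network with a few hubs, the maximum connectivity sits well above the mean and median, which is the pattern the three plots are meant to reveal.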

The first integer value of the soft power for which the scale-free topology fit is above 80% is highlighted in red in the plots and automatically selected (but it can be adjusted manually in the next step).
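The automatic selection rule can be sketched as follows. The fit values below are invented for illustration, and the function name is hypothetical; only the rule itself (first power whose fit exceeds 0.8) comes from the text above.

```python
def first_power_above(fit_by_power, threshold=0.8):
    """Return the smallest soft power whose SFT fit exceeds the threshold."""
    for power in sorted(fit_by_power):
        if fit_by_power[power] > threshold:
            return power
    return None  # no candidate power reaches the threshold


# Made-up scale-free topology fit values for soft powers 1 to 8.
sft_fit = {1: 0.12, 2: 0.35, 3: 0.58, 4: 0.71, 5: 0.79, 6: 0.86, 7: 0.90, 8: 0.92}

selected = first_power_above(sft_fit)
print(selected)
```

Returning `None` when no power passes the threshold leaves the decision to the user, who can then pick a value manually.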

The user can download the tables used to draw the plots in csv format by clicking on the Download Table button.


c. Create Network

Compute Network

The network can be constructed either automatically or manually (Automatic or Manual Construction Method respectively).

If the user chooses the Manual network computation, he/she will have access to the module detection parameters.

The user can specify the soft power (by default the value highlighted in red in the previous step) and the Topological Overlap Matrix (TOM) type (only for unsigned networks). The central idea of TOM is to take into account the direct connection strengths as well as the connection strengths mediated by shared neighbours.

TOM type can be:

  • Unsigned: consider all links as having the same effect,
  • Signed: consider the possibility of negative feedback.
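The idea of combining direct links with links mediated by shared neighbours can be sketched with the unsigned topological overlap measure from Zhang and Horvath (2005). This is a plain-Python illustration on a toy adjacency matrix, not the app's implementation; the signed variant is not shown.

```python
def tom_unsigned(adj):
    """Unsigned topological overlap for a symmetric adjacency matrix."""
    n = len(adj)
    # Connectivity of each node, excluding the diagonal.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Connection strength mediated by neighbours shared by i and j.
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom


# Toy network: nodes 1 and 2 are not linked directly but share neighbour 0.
adj = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.0],
    [0.5, 0.0, 1.0],
]
tom = tom_unsigned(adj)
print(tom[1][2])
```

Nodes 1 and 2 have no direct link, yet their overlap is non-zero because both are connected to node 0.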

Once the parameters are chosen the user can click on the Calculate Network button to compute the network accordingly.

The resulting modules are then represented by their module eigengenes (virtual genes representing each module). The tree of the module eigengenes (Clustering of module eigengenes) is found in the first part of the figure under the parameter box. It shows how correlated the module eigengenes are. The second part of the figure (Cluster dendrogram) represents the dendrogram of the genes; the first band of colors represents the initial modules, and the second band represents the merged modules, if any.
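A module eigengene is commonly defined as the first principal component of the standardised expression of the genes in a module. The sketch below illustrates that idea in plain Python; it swaps in power iteration instead of the singular value decomposition used by the WGCNA package, and the expression values and function names are invented for the example.

```python
from math import sqrt
from statistics import mean, stdev


def zscore(values):
    """Standardise one gene across samples."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def module_eigengene(expr, iterations=200):
    """expr: one row per sample, one column per gene of the module."""
    n = len(expr)
    genes = [zscore(col) for col in zip(*expr)]
    # Sample-by-sample cross-product matrix of the standardised data.
    s = [[sum(g[i] * g[j] for g in genes) for j in range(n)] for i in range(n)]
    v = [1.0 + i for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iterations):
        w = [sum(s[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Orient the eigengene so that it follows the average expression.
    avg = [mean(g[i] for g in genes) for i in range(n)]
    if sum(a * b for a, b in zip(avg, v)) < 0:
        v = [-x for x in v]
    return v


# Four samples, three genes that rise together across samples.
expr = [
    [1.0, 2.0, 1.5],
    [2.0, 4.1, 2.4],
    [3.0, 5.9, 3.9],
    [4.0, 8.0, 5.0],
]
eigengene = module_eigengene(expr)
print(eigengene)
```

The result is one value per sample, which is why an eigengene can be correlated with clinical traits in the same way as a real gene.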

The user can download the module assignments for all genes in the analysis by clicking on the Download Module Assignments button.

Once the user is happy with the definition of modules, he/she can go to the third part of the program by clicking on the Go to Next Step button, or the Module – Trait Relationship tab at the top of the page.

3. Module - Trait Relationship

In this section, the user will explore relationships between gene modules and clinical traits.

a. Eigengene - Trait Analysis

This figure represents the Pearson correlations between the module eigengenes defined in the previous step and the available clinical traits, as well as the associated p-values, in the format correlation (p-value). The intensity of the color corresponds to the strength of the correlation (red and blue for positive and negative correlations respectively, see the scale on the right side of the plot).
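Each cell of the figure can be reproduced in spirit with a Pearson correlation and a p-value. The text above does not state which test the app uses, so the p-value in this sketch is an assumption: it relies on the Fisher z-transform with a normal approximation, which keeps the example within the Python standard library and requires |r| < 1. The data and function names are invented.

```python
from math import atanh, sqrt
from statistics import NormalDist, mean


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fisher_p_value(r, n):
    """Two-sided p-value from the Fisher z-transform (assumes |r| < 1, n > 3)."""
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Illustrative eigengene values and one clinical trait for six samples.
eigengene = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
trait = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

r = pearson(eigengene, trait)
p = fisher_p_value(r, len(trait))
print(f"{r:.2f} ({p:.3f})")
```

The last line prints the pair in the same correlation (p-value) layout as the cells of the figure.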

If the values are not easily readable it is possible to download them as a csv file by clicking on the Download Module Trait Association button and open it with your preferred software.

A useful check of the module definitions can be performed here: there should be very few (if any) significant correlations between the grey module and the clinical traits. Otherwise, the user should probably go back to the Network Construction step, adjust the parameters, and compute the network and/or detect the modules again.


b. Gene - Trait Analysis

Here the user can further study gene modules correlated with a clinical trait of interest.

The figures represent:

  • The expression values of the module eigengene across all samples (top), which can help to identify modules that might be driven by only one or few sample(s) and which might not be of much interest,
  • The gene significance across modules (bottom left), which shows the biological relevance for all genes in all modules related to the clinical trait studied,
  • The module membership vs gene significance (bottom right), which plots, for each gene of the chosen module, its gene significance (the absolute correlation between the gene and the clinical trait studied) against its module membership (i.e. the correlation between the gene and the eigengene of the chosen module). The user can select dots on the plot (by brushing over the region of interest) to see more details about those genes.
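The two quantities plotted in the bottom right figure can be sketched as follows, assuming gene significance is the absolute correlation between a gene and the trait, and module membership is the correlation between a gene and the module eigengene. All values and function names below are invented for illustration.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def gene_significance(gene, trait):
    """Absolute correlation between a gene and a clinical trait."""
    return abs(pearson(gene, trait))


def module_membership(gene, eigengene):
    """Correlation between a gene and its module eigengene."""
    return pearson(gene, eigengene)


# Toy data for four samples.
eigengene = [-0.6, -0.2, 0.1, 0.7]
trait = [20.0, 24.0, 23.0, 30.0]
gene = [2 * v + 1 for v in eigengene]  # a gene that tracks the eigengene exactly

mm = module_membership(gene, eigengene)
gs = gene_significance(gene, trait)
print(mm, gs)
```

Genes with both a high module membership and a high gene significance land in the top right corner of the plot and are natural candidates for closer inspection.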

c. Network of Modules

The gene network of one or more selected modules is represented. A link threshold can be specified, which removes links with a lower weight. The user can also choose to keep or remove unlinked nodes by ticking the box. The names of the genes are shown when hovering the mouse over the nodes in this network.
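The effect of the link threshold and of the unlinked-nodes option can be sketched as a simple filter. The data layout (a list of node names and a list of weighted edges) and the function name are assumptions made for this example.

```python
def filter_network(nodes, edges, threshold, keep_unlinked=False):
    """Drop edges below the threshold and, optionally, nodes left without links."""
    kept_edges = [(a, b, w) for a, b, w in edges if w >= threshold]
    if keep_unlinked:
        return list(nodes), kept_edges
    linked = {name for a, b, _ in kept_edges for name in (a, b)}
    return [name for name in nodes if name in linked], kept_edges


# Illustrative network: the C-D link is too weak to survive a 0.1 threshold.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B", 0.9), ("B", "C", 0.2), ("C", "D", 0.05)]

shown_nodes, shown_edges = filter_network(nodes, edges, threshold=0.1)
print(shown_nodes, shown_edges)
```

With `keep_unlinked=True`, node D would stay in the view as an isolated node.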

The network can be exported as a zip file containing two csv tables, one for the nodes and one for the edges. These tables can be opened in external graph visualisation software. The Download Shown Network button exports only the current view of the network, whereas the Download Whole Network button exports the complete network (keep in mind that building the full network can be resource-demanding).
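An export of this shape, one zip archive holding a node table and an edge table, can be sketched with the Python standard library. The file names and column headers below are hypothetical; the app's actual names are not given in this text.

```python
import csv
import io
import zipfile


def to_csv(header, rows):
    """Render a header and rows as csv text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def network_zip_bytes(nodes, edges):
    """Pack node and edge tables into an in-memory zip archive."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w") as archive:
        # Hypothetical file names and column headers.
        archive.writestr("nodes.csv", to_csv(["id", "module"], nodes))
        archive.writestr("edges.csv", to_csv(["source", "target", "weight"], edges))
    return out.getvalue()


payload = network_zip_bytes(
    nodes=[("GeneA", "blue"), ("GeneB", "blue")],
    edges=[("GeneA", "GeneB", 0.42)],
)
print(len(payload), "bytes")
```

Keeping nodes and edges in separate tables matches what most graph tools expect on import: one table describing the nodes and one listing the weighted links between them.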

4. To Go Further

This application stops the analysis process at the identification of gene modules and their relation to clinical variables.

The next step in interpreting the results would be to submit modules correlated with a clinical trait of interest to an enrichment analysis tool, in order to explore the biological meaning of said modules.

This can be done by exporting the module definitions (by clicking on the Download Module Assignments button at the end of section 2) and uploading each module to the user’s favourite enrichment software. Usual choices include MetaCore, Ingenuity Pathway Analysis, g:Profiler or DAVID, among others.
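Before uploading, the downloaded module assignments usually need to be split into one gene list per module. The sketch below assumes a csv file with a `gene` column and a `module` column; these column names are hypothetical and should be adapted to the actual file. Genes in the grey module (unassigned genes in WGCNA) are skipped by default.

```python
import csv
import io
from collections import defaultdict


def genes_by_module(csv_text, gene_col="gene", module_col="module", skip=("grey",)):
    """Group gene identifiers by module, ignoring the modules listed in skip."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[module_col] in skip:
            continue
        groups[row[module_col]].append(row[gene_col])
    return dict(groups)


# Illustrative content of a module assignment file.
csv_text = "gene,module\nG1,blue\nG2,brown\nG3,blue\nG4,grey\n"

gene_lists = genes_by_module(csv_text)
print(gene_lists)
```

Each resulting list can then be pasted or uploaded as one query in the enrichment tool of choice.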

A clearer picture of the molecular mechanisms involved in the correlation with a clinical trait can be obtained by collating enrichment analysis results from all modules correlated with said clinical trait.