Graph Wavelet Denoising

Introduction

GWNet is graph-wavelet filtering method for single-cell gene expression profiles which denoises the expression data. This method has three modules : 1. Denoising gene expression python module takes gene expression profiles as input and returns filtered/denoised expression data. 2. Differential centrality python module takes ageing data (young and old) as input and gives differential pagerank and degree after building network for both datasets separately. User also gets individual centrality for both input datasets. 3. AUC for overlap between ground truth (Golden set of interaction between genes) and unfiltered, filtered graphs. User also gets option to select method to infer network. User can also use their own method to infer network.

Installation

Download the module which user wants to use. Install the dependencies. You have to download python code in your local machine/server. For execution you have to pass filename of expression csv as a input file. Expression csv file should not contain header and genes i.e. it consist of only data on which filtering is going to perform. Row represents samples and column represent genes in csv file. Single cell data should be fpkm. Following dependencies exist :

One needs to have python 3.0+ installed in their machine.

Python libraries : Numpy, Pandas, Pygsp, scipy, sklearn, networkx

R dependencies: minet,methods,GENIE3,propr

User can use following file to create a conda environment for run the modules:

Requirement.txt

Usage of Module : Denoising Gene Expression

One needs to have python 3.0+ installed in their machine. Following are the dependencies of the code :

Numpy, Pandas, Pygsp, scipy,Pygsp, sklearn

R dependencies : minet,methods,propr

You have to download python code in your local machine/server. For execution you have to pass filename of expression csv as a input file. Expression csv file should not contain header and genes i.e. it consist of only data on which filtering is going to perform. Row represents samples and column represent genes in csv file. Single cell data should be fpkm.

USE THE FOLLOWING COMMANDS :

bash
python3 NetWave.py -e (expression data) -f (filter_method) -t (data_type) -k (KNN) -p (cutoff(for wavelet only)) -o (output_path)
Eg.,
python3 NetWave.py -e net1.txt -f wavelet -t bulk -k 40 -p 70 -o net1_output.txt
'net1.txt' consist of expression data to be filtered.

Regarding each iput parameter :
-e expression Data : Path to expression data
-o output path : Path to save filterd expression
-f : Filter method to use (Options are : chebychev, wavelet, SureShrink, BayesShrink, Gaussian_MDL_hardthresh)
-t : data type (Options : single_cell / bulk)
-k : KNN value (min(genes,samples) in expression data)
-p : Percentile Frequency cutoff (for 'wavelet' only)

Usage of Module : Differential Centrality

One needs to have python 3.0+ installed in their machine. Following are the dependencies of the code :

Numpy, Pandas, Pygsp, scipy, sklearn, networkx

R dependencies : minet,methods,GENIE3,propr

You have to download python code in your local machine/server. For execution you have to pass filename of young and old filtered expression csv as a input file. Expression csv file should not contain header and genes i.e. it consist of only data on which filtering is going to perform. Row represents samples and column represent genes in csv file.

User gets following output from the module :

Differential degree and pagerank for old and young.

centrality (degree, pagerank, betweenness, closeness & eigen values) for both young and old in separate files by the default names "young_centrality.txt" and "old_centrality.txt"

USE THE FOLLOWING COMMANDS :

bash
python3 centrality.py -y (filtered_young) -o (filtered_old) -g (genes_list) -c (Network_Inference_method) -r (result_path) -t (top_edges)
e.g.

python3 centrality.py -y kimmel_young_filtered_pneumocyte.txt -o kimmel_old_filtered_pneumocyte.txt -g genes_lungs.txt -c pearson -r diff_output.txt -t 100000

'kimmel_young_filtered_pneumocyte.txt' consist of filtered young expression data, 'kimmel_old_filtered_pneumocyte.txt' consist of filtered old expression data.

Regarding each iput parameter :
-y Filtered young expression Data : Path to young filtered expression data
-o Filtered old expression Data : Path to Old filtered expression data
-g List of genes in expression Data : Path to Gene list
-r result_path : Path to save differential centrality result
-c Network Inference Method: Options : pearson, spearman, aracne, genie3, phi, rho
-t top_edges : Number of Top Edges to select to build network

User can further use enrichr on file differential centrality. User needs to sort in order of increasing degree / pagerank based centrality. Then, one can selecting top/bottom 500 genes to get important pathways in either young/old. If user selects top 500 genes, the genes are then important in young, else if user selects genes from bottom 500, those genes have importance in old.

Usage of Module : Differential Centrality with Already Inferred Network

One needs to have python 3.0+ installed in their machine. Following are the dependencies of the code :

Numpy, Pandas, Pygsp, scipy, sklearn, networkx

You have to download python code in your local machine/server. For execution you have to pass filename of young and old adjacency (Network Already inferred if user donot want to use the methods provided) csv as a input file. Adjancency csv file should not contain header and genes i.e. it consist of only data on which filtering is going to perform. Row and columns both should represent connection weights between genes in csv file.

User gets following output from the module :

Differential degree and pagerank for old and young.

centrality (degree, pagerank, betweenness, closeness & eigen values) for both young and old in separate files by the default names "young_centrality.txt" and "old_centrality.txt"

USE THE FOLLOWING COMMANDS :

bash
python3 centrality.py -y (adjancency_young) -o (adjacency_old) -g (genes_list) -r (result_path) -t (top_edges)
e.g.

python3 centrality.py -y correlated_mat_y.txt -o correlated_mat_o.txt -g genes_lungs.txt -c pearson -r diff_output.txt -t 100000

'correlated_mat_y.txt' consist of young adjancency matrix, 'correlated_mat_o.txt' consist of old adjacency matrix.

Regarding each iput parameter :
-y Adjancency matrix of young : Path to young adjacency matrix
-o Adjancency matrix of old : Path to Old adjacency matrix
-g List of genes in expression Data : Path to Gene list
-r result_path : Path to save differential centrality result
-t top_edges : Number of Top Edges to select to build network

Usage of Module : Improvement In Overlap with Ground Truth after Denoising

One needs to have python 3.0+ installed in their machine. Following are the dependencies of the code :

Numpy, Pandas, Pygsp, scipy, sklearn

R dependencies : minet,methods,GENIE3,propr

You have to download python code in your local machine/server. For execution you have to pass filename of expression csv as a input file. Expression csv file does not contain header and genes i.e. it consist of only data on which filtering is going to perform. Row represents samples and column reprent genes in csv file.

USE THE FOLLOWING COMMANDS :

bash
python3 NetWave.py -e (expression data) -g (ground truth) -f (filter method) -c (Network Inference method) -t (data type) -k (knn) -d (dx for AUC) -p (percentile (for wavelet only))
e.g.
python3 NetWave.py -e net1.txt -g goldset_net1.txt -f wavelet -c pearson -t bulk -k 40 -d 10000 -p 70
'net1.txt' consist of expression data to be filtered.

Regarding each iput parameter :
-e : Path to expression data
-g : Path to golden set
-f : Filter method to use (Options are : chebychev, wavelet, SureShrink, BayesShrink, Gaussian_MDL_hardthresh)
-c : Network Inference method to use (Options : pearson, spearman, aracne, genie3, phi, rho)
-t : data type (Options : single_cell / bulk)
-k : KNN value 
-d : dx for AUC
-p : Percentile Frequency cutoff (for 'wavelet' only)

Introduction

Installation

Requirement.txt

Usage of Module : Denoising Gene Expression

Usage of Module : Differential Centrality

Usage of Module : Differential Centrality with Already Inferred Network

Usage of Module : Improvement In Overlap with Ground Truth after Denoising

Download manual in pdf

Download Graph Wavelet Code

Denoising Gene Expression

Differential Centrality

Differential Centrality with already Inferred Network

IMPROVEMENT IN OVERLAP WITH GROUND TRUTH AFTER DENOISING