NIRD is a matrix factorization-based framework that decomposes high-dimensional data into interpretable modules in order to infer gene regulatory networks. It is designed to scale efficiently to large single-cell RNA-sequencing datasets, allowing robust identification of key regulatory signals as well as temporal dynamics.
This vignette is a tutorial on using the NIRD tool to interpret biological datasets with a network-based approach. NIRD helps researchers assess the contribution of regulatory components by transforming datasets into networks through dimensionality reduction and by evaluating the overlap between network structures with AUC metrics. The methodology can be applied to a range of biological settings, such as disease vs. control comparisons and temporal dynamics.
Installing NIRD within a Conda environment is recommended.
Step 1: Create a Conda environment
conda create -n nird python=3.8 pip -y
OR
conda env create -f nird.yml
Step 2: Activate the Conda environment
conda activate nird
Step 3: Install required dependencies
Install all required libraries using the provided requirements.txt file:
pip install -r requirements.txt
The NIRD tool requires a gene expression matrix as input. This matrix should be in CSV or TSV format. The matrix rows should represent samples (e.g. GSM IDs from GEO datasets) and columns should represent genes. Each cell should contain the expression value of a gene in a given sample (e.g., TPM, FPKM, or raw counts, depending on your preprocessing pipeline).
| | CDC42BPA | ARHGAP1 | CADM1 | CSNK1A1 | SLC25A3 |
| ---------- | --------- | --------- | --------- | --------- | --------- |
| GSM2172403 | 59.00775 | 14.55039 | 50.11628 | 31.77519 | 156.81395 |
| GSM2172457 | 0.74419 | 2.31783 | 573.31008 | 395.65891 | 119.65891 |
| GSM2172483 | 87.41085 | 44.32558 | 160.48837 | 63.75194 | 664.98450 |
| GSM2172489 | 68.16279 | 50.72868 | 84.61240 | 521.28682 | 0 |
| GSM2172507 | 208.06202 | 266.34884 | 351.10078 | 82.31008 | 163.42636 |
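Before running the tool, it can help to confirm that a file really has the samples-by-genes layout described above. The following is a minimal sketch using only the Python standard library; the helper name `load_expression_matrix` is hypothetical and is not part of NIRD, and the in-memory example reuses three rows and three genes from the table.

```python
import csv
import io

def load_expression_matrix(handle, delimiter=","):
    """Read a samples-by-genes table: header row of gene names, first column of sample IDs."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    genes = header[1:]  # the first header cell belongs to the sample-ID column
    samples, values = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            raise ValueError(
                f"row {row[0]!r} has {len(row) - 1} values, expected {len(genes)}"
            )
        samples.append(row[0])
        values.append([float(x) for x in row[1:]])
    return samples, genes, values

# Tiny in-memory example; pass delimiter="\t" for TSV input.
example = io.StringIO(
    ",CDC42BPA,ARHGAP1,CADM1\n"
    "GSM2172403,59.00775,14.55039,50.11628\n"
    "GSM2172457,0.74419,2.31783,573.31008\n"
    "GSM2172489,68.16279,50.72868,84.61240\n"
)
samples, genes, values = load_expression_matrix(example)
print(len(samples), "samples x", len(genes), "genes")
```

A row with a missing or extra value raises a `ValueError`, which is usually easier to diagnose here than inside a factorization routine.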
This tool also supports datasets that come with gold standard regulatory interactions, commonly used in benchmarks like DREAM5. You can use the provided script (Gold_Data) to load both expression data and gold standard regulatory networks for evaluation or training.
The gold standard file should contain three columns:
Regulator Target Label
Aff1 Wfdc18 1
Arnt2 Abca13 1
Atf3 Igf2 1
Note: Duplicates are allowed in the file; they are handled internally by the tool.
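To illustrate the three-column layout, here is a small sketch that parses such a file into a dictionary keyed by (regulator, target). It assumes whitespace- or tab-separated fields and an optional header row. How NIRD itself resolves duplicates is not documented here, so keeping the largest label for a repeated pair is only an assumption made for this example; `load_gold_standard` is a hypothetical helper.

```python
import io

def load_gold_standard(handle):
    """Parse 'Regulator Target Label' rows into {(regulator, target): label}."""
    edges = {}
    for line in handle:
        fields = line.split()  # splits on tabs or runs of spaces
        if not fields:
            continue  # skip blank lines
        if fields[:3] == ["Regulator", "Target", "Label"]:
            continue  # skip the header row
        regulator, target, label = fields[0], fields[1], int(fields[2])
        key = (regulator, target)
        # Assumption for this sketch: a repeated pair keeps the largest label seen.
        edges[key] = max(label, edges.get(key, 0))
    return edges

example = io.StringIO(
    "Regulator\tTarget\tLabel\n"
    "Aff1\tWfdc18\t1\n"
    "Arnt2\tAbca13\t1\n"
    "Atf3\tIgf2\t1\n"
    "Atf3\tIgf2\t1\n"  # duplicate row, collapsed by the dictionary key
)
gold = load_gold_standard(example)
print(len(gold), "unique regulator-target pairs")
```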
Apart from standard expression-expression (expr-expr) network inference, the Double_Expr.py script also supports gene regulatory network inference from transcription velocity data. This lets the tool infer Expr-Velo networks that link steady-state expression with future transcriptional dynamics.
This feature enables NIRD to capture causal, time-lagged, or dynamic regulatory relationships that are not detectable from static expression data alone.
🧬 Input Format for Velocity Data
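The transcription velocity matrix should be a CSV file where rows represent samples (e.g., SRR run IDs) and columns represent genes, with each cell holding the velocity value of a gene in a sample: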
| | LINC02593 | NOC2L | C1orf159 | SDF4 | UBE2J2 |
| ---------- | --------- | ------- | -------- | ------- | ------- |
| SRR2978582 | -0.4771 | -0.4657 | -1.2073 | -0.0402 | 0.2096 |
| SRR2978568 | -0.3028 | -0.5002 | -0.6872 | -0.0317 | -0.1525 |
| SRR2978599 | -0.2604 | -0.5261 | -0.4434 | -0.0435 | 0.1729 |
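When pairing an expression matrix with a velocity matrix, it is often useful to restrict both to the samples and genes they share. Whether NIRD requires or performs this matching is not stated here, so the sketch below is only an assumption about a sensible preprocessing step. The helpers `shared_labels` and `subset` are hypothetical, and the expression-side labels are made up for illustration.

```python
def shared_labels(labels_a, labels_b):
    """Return the labels present in both lists, in the order of labels_a."""
    in_b = set(labels_b)
    return [label for label in labels_a if label in in_b]

def subset(matrix, row_labels, col_labels, keep_rows, keep_cols):
    """Reorder and subset a list-of-lists matrix to the requested labels."""
    row_at = {label: i for i, label in enumerate(row_labels)}
    col_at = {label: j for j, label in enumerate(col_labels)}
    return [[matrix[row_at[r]][col_at[c]] for c in keep_cols] for r in keep_rows]

# Velocity values taken from the first three gene columns of the table above.
velo_samples = ["SRR2978582", "SRR2978568", "SRR2978599"]
velo_genes = ["LINC02593", "NOC2L", "C1orf159"]
velo = [
    [-0.4771, -0.4657, -1.2073],
    [-0.3028, -0.5002, -0.6872],
    [-0.2604, -0.5261, -0.4434],
]

# Hypothetical expression-side labels, in a different order and with fewer entries.
expr_samples = ["SRR2978599", "SRR2978582"]
expr_genes = ["NOC2L", "LINC02593"]

common_samples = shared_labels(expr_samples, velo_samples)
common_genes = shared_labels(expr_genes, velo_genes)
velo_aligned = subset(velo, velo_samples, velo_genes, common_samples, common_genes)
print(common_samples, common_genes)
```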
Once you’ve completed the primary setup, you’re ready to run the NIRD tool for network inference and evaluation.
The NIRD tool supports four different modes depending on the type of data available:
1. Single Expression Mode
Use this mode when only one expression dataset is available.
python NIRD.py \
--datasets single_expr \
--file1 MF_Datasets/mESC/smartSeq.csv \
--outdir inferred_networks
2. Double Expression Mode
Use this mode to infer and compare GRNs from two expression datasets.
python NIRD.py \
--datasets double_expr \
--file1 MF_Datasets/mESC/dropSeq.csv \
--file2 MF_Datasets/mESC/smartSeq.csv \
--outdir inferred_networks
3. Gold Data Mode
Use this mode when expression data, transcription factor data, and a gold standard network are available.
python NIRD.py \
--datasets gold_data \
--expr_file MF_Datasets/dream5/net2/dream5_net2_expression_data.tsv \
--tf_file MF_Datasets/dream5/net2/dream5_net2_transcription_factors.tsv \
--gold_file MF_Datasets/dream5/net2/dream5_net2_gold.tsv \
--outdir inferred_networks
4. NIRD_Velo Mode
Use this mode when time-course expression and RNA velocity data are available.
python NIRD_Velo.py \
--file1 MF_Datasets/transcription_velocity/00h_time_course_expr.csv \
--file2 MF_Datasets/transcription_velocity/0th_hr_endo_RNA_Velo.csv \
--outdir inferred_networks
If you're unsure about the available command-line options or want to check how to properly format your input arguments, you can always view the detailed usage information using:
python NIRD.py --help
python NIRD_Velo.py --help
NIRD.py Arguments
usage: NIRD.py [-h] [--datasets {single_expr,double_expr,gold_data}] [--methods METHODS] [--evaluations EVALUATIONS] [--do_eval] [--file1 FILE1] [--file2 FILE2]
[--expr_file EXPR_FILE] [--tf_file TF_FILE] [--gold_file GOLD_FILE] --outdir OUTDIR
Run matrix factorization methods on biological datasets.
optional arguments:
-h, --help Show this help message and exit.
--datasets {single_expr,double_expr,gold_data}
Dataset name: single_expr, double_expr or gold_data.
--methods METHODS Comma-separated list of method names.
--evaluations EVALUATIONS
Comma-separated list of evaluation function names.
--do_eval If set, perform evaluation and generate plots.
--file1 FILE1 For single_expr: expression data file | For double_expr: first expression data file.
--file2 FILE2 For double_expr: second expression data file.
--expr_file EXPR_FILE For gold_data: expression data file (.tsv).
--tf_file TF_FILE For gold_data: transcription factors file.
--gold_file GOLD_FILE For gold_data: gold standard file.
--outdir OUTDIR Directory where inferred networks and results will be saved.
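The help text above implies that each `--datasets` choice needs a different set of file arguments. The sketch below assembles a NIRD.py command line from Python and checks that the files a mode needs are present. The flag names come from the usage text, but the helper `build_command` and the `REQUIRED_FILES` mapping are hypothetical and are not shipped with the tool.

```python
# Which file flags each mode uses, as read from the help text above.
REQUIRED_FILES = {
    "single_expr": ["file1"],
    "double_expr": ["file1", "file2"],
    "gold_data": ["expr_file", "tf_file", "gold_file"],
}

def build_command(dataset, outdir, **files):
    """Return the argument list for one NIRD.py run, or raise if a file is missing."""
    if dataset not in REQUIRED_FILES:
        raise ValueError(f"unknown dataset mode: {dataset!r}")
    missing = [name for name in REQUIRED_FILES[dataset] if name not in files]
    if missing:
        raise ValueError(f"{dataset} is missing: {', '.join(missing)}")
    cmd = ["python", "NIRD.py", "--datasets", dataset]
    for name in REQUIRED_FILES[dataset]:
        cmd += [f"--{name}", files[name]]
    cmd += ["--outdir", outdir]
    return cmd

cmd = build_command(
    "double_expr",
    "inferred_networks",
    file1="MF_Datasets/mESC/dropSeq.csv",
    file2="MF_Datasets/mESC/smartSeq.csv",
)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` to launch the run from a script.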
NIRD_Velo.py Arguments
usage: NIRD_Velo.py [-h] [--datasets {double_expr}] [--methods METHODS] [--evaluations {Eval_EdgeOverlapping}] [--do_eval] --file1 FILE1 --file2 FILE2 --outdir OUTDIR
Run matrix factorization methods on biological datasets.
optional arguments:
-h, --help Show this help message and exit.
--datasets {double_expr}
Dataset name (only double_expr is supported in NIRD_Velo).
--methods METHODS Comma-separated list of method names.
--evaluations {Eval_EdgeOverlapping}
Evaluation function to use (only Eval_EdgeOverlapping is supported).
--do_eval If set, perform evaluation and generate plots.
--file1 FILE1 First expression data file.
--file2 FILE2 Second expression data file.
--outdir OUTDIR Directory where inferred networks and results will be saved.
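The exact computation behind `Eval_EdgeOverlapping` is not described in this document. As a rough illustration of what an edge-overlap comparison between two inferred networks can look like, the sketch below takes the top-k edges of each network and reports their Jaccard overlap. The function names, the gene labels, and the choice of Jaccard similarity are all assumptions made for this example.

```python
def top_edges(scores, k):
    """Return the k highest-scoring edges from a {(gene_a, gene_b): score} dict."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

def jaccard_overlap(scores_a, scores_b, k):
    """Jaccard similarity between the top-k edge sets of two networks."""
    a, b = top_edges(scores_a, k), top_edges(scores_b, k)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Two hypothetical networks over the same three placeholder genes.
net_a = {("G1", "G2"): 0.9, ("G1", "G3"): 0.5, ("G2", "G3"): 0.1}
net_b = {("G1", "G2"): 0.7, ("G2", "G3"): 0.6, ("G1", "G3"): 0.2}

overlap = jaccard_overlap(net_a, net_b, k=2)
print(f"top-2 edge overlap: {overlap:.3f}")
```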
NIRD includes the following 13 core matrix factorization-based methods for gene regulatory network (GRN) inference: SVD, NMF, ICM, BD, BMF, LSNMF, KLD_NMF, ENMF, PMF, SNMF, PMFCC, SepNMF, and Kernel_PCA.
These methods represent novel or hybrid GRN inference techniques tailored for expression and transcription velocity data.
Additionally, several traditional GRN inference methods are included for benchmarking purposes only: ARACNE, RELNET, MRNET, C3NET, GENIE3, and GrnBoost2.
This allows you to compare the performance of NIRD’s methods against widely used classical algorithms.
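One common way to compare methods against a gold standard is the area under the ROC curve (AUROC) computed over candidate edges. The sketch below implements AUROC through the rank-sum formulation using only the standard library. It is an illustration of the metric, not necessarily the code path that NIRD's evaluation functions use, and the scores and labels are made-up values.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formulation."""
    pairs = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    rank_sum = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1  # extend the block of tied scores
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

# Hypothetical edge scores from one method and their gold labels (1 = true edge).
edge_scores = [0.9, 0.8, 0.35, 0.3, 0.1]
edge_labels = [1, 0, 1, 0, 0]
print(f"AUROC = {auroc(edge_scores, edge_labels):.4f}")
```

Running the same function on the scores produced by each method gives a single number per method that can be compared directly.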
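The final inferred network is a square gene-by-gene matrix (the same genes label the rows and the columns) in which each cell holds a score for the interaction strength, or feature importance, between a pair of genes, derived from reduced-dimensional representations of the expression matrix. In the example below the diagonal is zero, and the two directions of a gene pair can receive different scores.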
| | Saal1 | Xrcc1 | Ldb1 | Nr6a1 | Slc7a6os | Chchd7 | Emc10 | Ptms | Meaf6 | Tor1b |
| -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
| Saal1 | 0 | 3.29E-05 | 2.38E-05 | 2.92E-05 | 3.01E-05 | 1.75E-04 | 7.83E-06 | 1.59E-05 | 2.83E-05 | 2.18E-05 |
| Xrcc1 | 3.29E-06 | 0 | 3.16E-05 | 1.66E-05 | 5.17E-05 | 1.23E-04 | 9.67E-06 | 2.32E-05 | 2.71E-05 | 3.00E-05 |
| Ldb1 | 4.35E-06 | 3.37E-05 | 0 | 3.78E-05 | 3.03E-05 | 1.19E-04 | 7.09E-06 | 2.62E-05 | 3.25E-05 | 2.50E-05 |
| Nr6a1 | 3.34E-06 | 2.54E-05 | 4.15E-05 | 0 | 2.78E-05 | 1.49E-04 | 6.92E-06 | 3.36E-05 | 3.30E-05 | 3.04E-05 |
| Slc7a6os | 3.55E-06 | 3.35E-05 | 2.14E-05 | 1.88E-05 | 0 | 1.20E-04 | 7.72E-06 | 1.82E-05 | 2.34E-05 | 2.22E-05 |
| Chchd7 | 3.56E-06 | 3.67E-05 | 3.05E-05 | 2.37E-05 | 2.67E-05 | 0 | 7.56E-06 | 2.81E-05 | 2.10E-05 | 4.22E-05 |
| Emc10 | 3.74E-06 | 3.90E-05 | 2.50E-05 | 1.96E-05 | 3.58E-05 | 1.35E-04 | 0 | 1.50E-05 | 2.24E-05 | 2.12E-05 |
| Ptms | 3.62E-06 | 3.30E-05 | 2.96E-05 | 2.60E-05 | 3.49E-05 | 1.92E-04 | 8.40E-06 | 0 | 2.64E-05 | 2.40E-05 |
| Meaf6 | 3.39E-06 | 3.39E-05 | 2.35E-05 | 1.84E-05 | 3.83E-05 | 1.45E-04 | 8.57E-06 | 1.45E-05 | 0 | 2.47E-05 |
| Tor1b | 2.86E-06 | 3.26E-05 | 2.60E-05 | 1.95E-05 | 3.20E-05 | 2.15E-04 | 9.34E-06 | 1.70E-05 | 2.52E-05 | 0 |
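For downstream work it is often more convenient to view such a matrix as a ranked edge list. The following sketch flattens a score matrix into (row gene, column gene, score) triples sorted by score. It uses the top-left 3 x 3 corner of the table above, and `ranked_edges` is a hypothetical helper rather than a NIRD function. Whether the row or the column gene acts as the regulator is not specified here, so the sketch does not assume a direction.

```python
def ranked_edges(genes, matrix, top=None):
    """Flatten a gene-by-gene score matrix into triples sorted by descending score."""
    edges = [
        (genes[i], genes[j], matrix[i][j])
        for i in range(len(genes))
        for j in range(len(genes))
        if i != j  # the diagonal carries no interaction score
    ]
    edges.sort(key=lambda edge: edge[2], reverse=True)
    return edges[:top]  # top=None keeps every edge

genes = ["Saal1", "Xrcc1", "Ldb1"]
matrix = [
    [0.0, 3.29e-05, 2.38e-05],
    [3.29e-06, 0.0, 3.16e-05],
    [4.35e-06, 3.37e-05, 0.0],
]

for row_gene, col_gene, score in ranked_edges(genes, matrix, top=3):
    print(row_gene, col_gene, score)
```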
1. Network Centrality Analysis
After inferring gene-to-gene interaction matrices using NIRD, centrality measures like PageRank and degree are calculated for each gene. These scores help identify influential or hub genes in the network. High-centrality genes are often key regulators or signaling components and may play essential roles in cellular processes or disease mechanisms.
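As a rough sketch of these two measures, the code below computes a thresholded degree and a weighted PageRank by power iteration on a small score matrix. The damping factor of 0.85, the threshold, and the toy matrix are assumptions chosen for illustration; NIRD's own implementation and parameters may differ.

```python
def degree(matrix, threshold=0.0):
    """Number of partners whose score exceeds the threshold in either direction."""
    n = len(matrix)
    return [
        sum(1 for j in range(n) if j != i and max(matrix[i][j], matrix[j][i]) > threshold)
        for i in range(n)
    ]

def pagerank(matrix, damping=0.85, tol=1e-10, max_iter=200):
    """Weighted PageRank by power iteration; matrix[i][j] is the weight of edge i -> j."""
    n = len(matrix)
    out_sums = [sum(row) for row in matrix]
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Nodes without outgoing weight spread their rank uniformly.
        dangling = sum(rank[i] for i in range(n) if out_sums[i] == 0)
        new = []
        for j in range(n):
            incoming = sum(
                rank[i] * matrix[i][j] / out_sums[i] for i in range(n) if out_sums[i] > 0
            )
            new.append((1 - damping) / n + damping * (incoming + dangling / n))
        if sum(abs(a - b) for a, b in zip(new, rank)) < tol:
            return new
        rank = new
    return rank

# Toy network: gene 0 is connected to genes 1 and 2, which are not connected to each other.
toy = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
pr = pagerank(toy)
deg = degree(toy)
print(deg, [round(p, 3) for p in pr])
```

In this toy network the hub gene receives both the highest degree and the highest PageRank, which is the pattern used to flag candidate regulators.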
2. Differential Network Analysis
NIRD enables comparative network analysis between conditions (e.g., normal vs disease). By computing differences in PageRank and degree for each gene, it identifies genes that gain or lose influence across conditions. These differential scores highlight candidate genes that may drive disease progression or represent therapeutic targets.
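The comparison can be sketched as a per-gene subtraction of centrality scores followed by a ranking on the absolute change. The gene names and PageRank values below are placeholders, and `differential_scores` is a hypothetical helper written for this example.

```python
def differential_scores(control, condition):
    """Rank genes present in both conditions by absolute change in a centrality score."""
    shared = control.keys() & condition.keys()
    diffs = {gene: condition[gene] - control[gene] for gene in shared}
    # Largest absolute change first; gene name breaks ties for a stable order.
    return sorted(diffs.items(), key=lambda item: (-abs(item[1]), item[0]))

# Placeholder PageRank values for two conditions.
pagerank_control = {"GeneA": 0.10, "GeneB": 0.30, "GeneC": 0.20}
pagerank_disease = {"GeneA": 0.25, "GeneB": 0.05, "GeneC": 0.20, "GeneD": 0.50}

ranked = differential_scores(pagerank_control, pagerank_disease)
for gene, delta in ranked:
    print(f"{gene}\t{delta:+.2f}")
```

A positive value marks a gene that gains influence in the second condition and a negative value marks one that loses it. Genes seen in only one condition are left out of this simple version.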
3. Functional Enrichment Analysis
Genes with the highest differential network scores are subjected to pathway enrichment analysis. This determines which biological pathways are significantly overrepresented, helping link network-level changes to known cellular processes such as inflammation, ECM remodeling, or signaling dysregulation.
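Over-representation of a pathway among selected genes is commonly assessed with a one-sided hypergeometric test. The sketch below computes that p-value with `math.comb` (available from Python 3.8). The counts are illustrative only, and which enrichment tool or test NIRD users apply in practice is not specified in this document.

```python
from math import comb

def enrichment_pvalue(n_universe, n_pathway, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn from n_universe genes,
    of which n_pathway belong to the pathway (hypergeometric upper tail)."""
    total = comb(n_universe, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Illustrative counts: 20 genes in total, 5 in the pathway, 4 selected, 2 of them in the pathway.
p_value = enrichment_pvalue(n_universe=20, n_pathway=5, n_selected=4, n_overlap=2)
print(f"p = {p_value:.4f}")
```

When many pathways are tested, the resulting p-values should be corrected for multiple testing before being interpreted.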
4. Module and Cluster Analysis
Community detection or clustering techniques can be applied to the inferred network to identify gene modules. These modules often correspond to co-regulated genes or functionally coherent groups, offering insight into coordinated biological responses or cell-type-specific activities.
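As a deliberately simple stand-in for community detection, the sketch below keeps only edges above a score threshold and reports the connected components of the remaining graph as modules. This is not a modularity-based method such as Louvain or Leiden; it only illustrates the idea of grouping genes by strong links. The gene labels, weights, and threshold are made up.

```python
def threshold_modules(genes, matrix, threshold):
    """Group genes into connected components of the graph of edges above threshold."""
    n = len(genes)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if max(matrix[i][j], matrix[j][i]) > threshold:
                parent[find(i)] = find(j)  # merge the two components

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(genes[i])
    # Largest module first; the gene list breaks ties.
    return sorted(groups.values(), key=lambda group: (-len(group), group))

# Hypothetical network: weak background scores plus three strong links.
genes = ["G1", "G2", "G3", "G4", "G5"]
strong = {("G1", "G2"): 0.9, ("G2", "G3"): 0.8, ("G4", "G5"): 0.7}
index = {gene: i for i, gene in enumerate(genes)}
matrix = [[0.0 if i == j else 0.1 for j in range(5)] for i in range(5)]
for (a, b), weight in strong.items():
    matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = weight

modules = threshold_modules(genes, matrix, threshold=0.5)
print(modules)
```

The result depends strongly on the threshold, so it is worth inspecting the score distribution before choosing one.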