FAQs - NIRD

1. What are gene regulatory networks (GRNs)? Why are they important?

Answer: Gene Regulatory Networks (GRNs) represent how genes interact with and regulate each other, often through transcription factors. They help researchers understand cellular functions, disease mechanisms, and biological responses, making them crucial for systems biology and precision medicine.

2. What is matrix factorization (MF) and how can it help in gene regulatory inference?

Answer: Matrix Factorization (MF) is a technique that breaks down a high-dimensional gene expression matrix into low-rank components. It helps uncover hidden patterns and relationships between genes, enabling more accurate and scalable inference of gene regulatory networks.
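
As an illustration of the idea, here is a minimal sketch of non-negative matrix factorization using the standard multiplicative updates for the Euclidean objective. It is not NIRD's own implementation; the function name `nmf`, its parameters, and the toy matrix are assumptions made for this example.

```python
import numpy as np

def nmf(X, rank, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix X (genes x samples) as W @ H."""
    rng = np.random.default_rng(seed)
    n_genes, n_samples = X.shape
    W = rng.random((n_genes, rank)) + eps   # gene loadings on each factor
    H = rng.random((rank, n_samples)) + eps # factor activity per sample
    for _ in range(n_iter):
        # Multiplicative updates for the Euclidean (Frobenius) objective.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy expression matrix: 6 genes x 8 samples generated from 2 hidden factors.
rng = np.random.default_rng(1)
X = rng.random((6, 2)) @ rng.random((2, 8))

W, H = nmf(X, rank=2)
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(W.shape, H.shape, rel_error)
```

Each row of W says how strongly a gene loads on each latent factor, and each column of H says how active the factors are in a sample. The relative error shows how much of the original matrix the low-rank product recovers.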

3. How is matrix factorization better than other dimension reduction techniques like PCA and t-SNE?

Answer: Unlike PCA and t-SNE, which focus on variance or visualization, MF emphasizes reconstructing meaningful latent factors. It offers better interpretability for regulatory inference, especially in noisy and sparse biological data like single-cell RNA-seq.

4. How does NIRD work?

Answer: NIRD reduces high-dimensional gene expression data into a low-dimensional space using matrix factorization. It then applies a Conditional Random Forest model to rank features based on their importance. By projecting these rankings back to the original space, NIRD estimates true feature contributions. These are used to reconstruct a gene regulatory network that reflects the most likely biological dependencies among genes.

5. Which methods are offered by NIRD?

Answer: NIRD provides a diverse set of matrix factorization techniques to capture various biological signals and network structures. The supported methods include:

Singular Value Decomposition (SVD)
Non-negative Matrix Factorization (NMF) and its variants:

Bayesian Decomposition (BD)
Binary Matrix Factorization (BMF)
Iterated Conditional Modes (ICM)
Fisher NMF for Learning Local Features (LFNMF)
Least Squares NMF (LSNMF)
Probabilistic NMF (PMF)
Kullback-Leibler Divergence-based NMF (KLD-NMF)
Euclidean NMF (ENMF)
Sparse NMF (SNMF)
Separable NMF (SepNMF)
Penalized Matrix Factorization for Constrained Clustering (PMFCC)

These methods offer flexibility to tailor the dimensionality reduction process based on the nature of your dataset and the specific inference goals.

6. What types of datasets are supported?

Answer: NIRD supports a variety of datasets, including bulk and single-cell RNA-seq data, time-course data, transcription velocity data, and categorical data such as mutations.

7. Can I use my own factorization method?

Answer: Yes. NIRD is modular and supports custom methods.

8. How are GRNs constructed from reduced data?

Answer: After dimensionality reduction, NIRD models the influence of regulatory genes (like transcription factors) on other genes using machine learning methods, typically random forests. These models operate in the low-dimensional space to capture non-linear dependencies. Feature importance scores from the regressors are combined with matrix factorization weights to estimate gene-gene regulatory links. This results in a weighted gene regulatory network that reflects the likelihood of biological interactions, with improved signal clarity and flexibility for downstream analysis.
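
The steps described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not NIRD's actual code: it uses a truncated SVD as the factorization, scikit-learn's `RandomForestRegressor` as the regressor, and an assumed rule for combining importances with factorization weights (each factor's importance is spread over the regulators in proportion to their absolute loadings). The function `infer_grn` and the toy data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def infer_grn(X_tf, X_target, rank=3, n_trees=100, seed=0):
    """Score TF -> target edges through a low-dimensional representation.

    X_tf:     samples x n_tfs expression of candidate regulators
    X_target: samples x n_targets expression of target genes
    Returns an n_tfs x n_targets matrix of non-negative edge weights.
    """
    # Step 1: reduce the regulator matrix with a truncated SVD.
    Xc = X_tf - X_tf.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = U[:, :rank] * s[:rank]        # samples x rank latent factors
    loadings = np.abs(Vt[:rank, :])   # rank x n_tfs factorization weights

    scores = np.zeros((X_tf.shape[1], X_target.shape[1]))
    for j in range(X_target.shape[1]):
        # Step 2: regress each target gene on the latent factors.
        forest = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
        forest.fit(Z, X_target[:, j])
        # Step 3 (assumed rule): project factor importances back onto TFs
        # by weighting them with the absolute loadings.
        scores[:, j] = forest.feature_importances_ @ loadings
    return scores

# Toy data: 60 samples, 5 candidate regulators, 2 target genes.
rng = np.random.default_rng(0)
X_tf = rng.normal(size=(60, 5))
X_target = np.column_stack([
    2.0 * X_tf[:, 0] + 0.1 * rng.normal(size=60),
    -1.5 * X_tf[:, 3] + 0.1 * rng.normal(size=60),
])
scores = infer_grn(X_tf, X_target, rank=3)
print(scores.shape)
```

Sorting the entries of `scores` in descending order gives a ranked edge list, which is the form of weighted network the answer above describes.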

9. How are inferred networks evaluated?

Answer: NIRD evaluates inferred gene regulatory networks (GRNs) using an edge-overlap approach that compares ranked gene-gene interactions across networks. It counts how many top-ranked edges from one network are shared with another and builds an edge-overlap curve from those counts. The area under this curve (AUC) quantifies similarity: a higher AUC means greater agreement between networks. This evaluation helps check whether the inferred GRNs are consistent and stable across datasets and conditions, which supports their biological relevance.
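
A minimal sketch of an edge-overlap curve and its AUC is shown below. The exact normalization NIRD uses is not stated here, so this example assumes the overlap at each cutoff k is the fraction of shared edges among the top k of both lists, and that the AUC is the mean height of the curve. The function names and the toy edge lists are made up for illustration.

```python
def edge_overlap_curve(ranked_a, ranked_b, max_k=None):
    """Fraction of edges shared by the top-k of two ranked edge lists."""
    if max_k is None:
        max_k = min(len(ranked_a), len(ranked_b))
    curve = []
    for k in range(1, max_k + 1):
        shared = set(ranked_a[:k]) & set(ranked_b[:k])
        curve.append(len(shared) / k)
    return curve

def overlap_auc(curve):
    """Assumed normalization: mean height of the curve, between 0 and 1."""
    return sum(curve) / len(curve)

# Toy ranked edge lists (regulator, target), strongest edge first.
net_a = [("TF1", "G1"), ("TF2", "G3"), ("TF1", "G2"), ("TF3", "G1")]
net_b = [("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3"), ("TF2", "G2")]

curve = edge_overlap_curve(net_a, net_b)
auc = overlap_auc(curve)
print(curve, auc)  # [1.0, 0.5, 1.0, 0.75] 0.8125
```

Two identical rankings give an AUC of 1.0 under this definition, and two rankings with no shared edges give 0.0.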

10. How do I select the most suitable method for a particular dataset?

Answer: To choose the most suitable matrix factorization method for your dataset in NIRD, you can follow two key approaches:

Compare AUC scores across methods: Run multiple MF methods on your dataset and evaluate the inferred networks using the edge-overlap AUC metric. The method with the highest AUC score generally reflects better consistency and reliability.
Validate against gold standard or PPI data: Compare the inferred network with known gene interactions (e.g., gold standard GRNs or protein-protein interaction networks). Calculate the AUC of overlapping edges to assess biological relevance. The method with the highest overlap provides the most biologically meaningful inference.

By combining these evaluations, you can systematically select the best-performing method tailored to your specific data type and experimental condition.
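
The gold-standard comparison could be sketched as follows. This is an illustrative assumption about the scoring, not NIRD's documented procedure: for each cutoff k it takes the fraction of the top-k predicted edges that appear in the gold standard, then averages those fractions. The method labels, rankings, and gold edges are invented toy values.

```python
def gold_overlap_auc(ranked_edges, gold_edges):
    """Mean fraction of top-k predicted edges found in the gold standard."""
    hits = 0
    fractions = []
    for k, edge in enumerate(ranked_edges, start=1):
        if edge in gold_edges:
            hits += 1
        fractions.append(hits / k)
    return sum(fractions) / len(fractions)

# Toy gold standard and toy rankings from two factorization methods.
gold = {("TF1", "G1"), ("TF2", "G3")}
predictions = {
    "svd": [("TF1", "G1"), ("TF2", "G3"), ("TF3", "G2")],
    "nmf": [("TF3", "G2"), ("TF1", "G1"), ("TF2", "G3")],
}

aucs = {method: gold_overlap_auc(edges, gold)
        for method, edges in predictions.items()}
best = max(aucs, key=aucs.get)
print(aucs, best)
```

In this toy case the ranking that places the known edges first scores higher, so it would be the one selected.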

NIRD: Network Inference by Reduced Dimensions