miRPipe


Published: October 28, 2025

miRPipe: A Smarter, Unified Pipeline for Small RNA Discovery

Unveiling the Hidden RNA World

In the vast cellular universe, microRNAs (miRNAs) and piwi-interacting RNAs (piRNAs) act as the molecular regulators of life — controlling gene expression, development, and even cancer progression. But while sequencing these tiny RNA molecules is easy, accurately detecting and quantifying them isn’t.

That’s where miRPipe comes in — a robust, reproducible, and high-accuracy computational framework designed to detect, annotate, and quantify small RNAs from Next Generation Sequencing (NGS) data.

Why miRPipe?

Most existing tools (such as miRDeep2, miRPro, and miRge2.0) share several limitations:

High rates of false positives and false negatives
No ability to detect functionally similar miRNAs (paralogues)
Missed reverse-complement miRNA sequences
No integrated piRNA analysis
No benchmarking against synthetic “ground truth” data

miRPipe fixes them all, combining biological accuracy with computational scalability.

The Complete miRPipe Workflow

Figure: miRPipe workflow (adapted from Frontiers in Bioinformatics, Ruhela et al. 2022)

Input & Pre-processing
- Accepts FASTQ/FASTQ.GZ files
- Removes adapters with TrimGalore
- Filters reads by quality and length
- Splits into 17–24 nt (miRNAs) and 25–31 nt (piRNAs)
Parallel Alignment
- Uses miRDeep* for miRNAs and Bowtie for piRNAs
- Multi-threaded execution for large cohorts
Post-processing & Re-annotation
- Detects reverse-complement miRNAs using DASHR BLAST search
- Identifies paralogues via seed-based clustering (CD-HIT)
- Functionally annotates novel miRNAs
Differential Expression Analysis
- Employs DESeq2 for expression statistics
- Generates final counts and dysregulated miRNA/piRNA tables
Output
- Annotated miRNAs, novel candidates, paralogues, and piRNAs
- DE tables ready for visualization or downstream analysis
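
Two of the steps above, the length split in pre-processing and the reverse-complement check in post-processing, can be illustrated with a short sketch. This is not miRPipe's own code: the function names are hypothetical, and the real pipeline relies on TrimGalore, miRDeep*, Bowtie, and a BLAST search rather than the exact string comparison used here. Only the 17–24 nt and 25–31 nt windows come from the workflow description.

```python
# Length windows from the workflow description (inclusive, in nucleotides).
MIRNA_RANGE = (17, 24)
PIRNA_RANGE = (25, 31)

# Complement table for a DNA alphabet; U is mapped like T so RNA input also works.
_COMPLEMENT = str.maketrans("ACGTUN", "TGCAAN")


def reverse_complement(seq):
    """Return the reverse complement of a read, written in the DNA alphabet."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def bin_by_length(reads):
    """Group reads into miRNA-sized, piRNA-sized, and discarded bins."""
    bins = {"miRNA": [], "piRNA": [], "discarded": []}
    for read in reads:
        n = len(read)
        if MIRNA_RANGE[0] <= n <= MIRNA_RANGE[1]:
            bins["miRNA"].append(read)
        elif PIRNA_RANGE[0] <= n <= PIRNA_RANGE[1]:
            bins["piRNA"].append(read)
        else:
            bins["discarded"].append(read)
    return bins


def matches_only_as_reverse_complement(read, reference):
    """True when the read equals the reference only after reverse-complementing."""
    read, reference = read.upper(), reference.upper()
    return read != reference and reverse_complement(read) == reference


# Toy 22-nt reference sequence, used only for illustration.
reference = "TAGCTTATCAGACTGATGTTGA"
reads = [reference, reverse_complement(reference), "ACGT" * 7, "ACGT"]
bins = bin_by_length(reads)
print({name: len(group) for name, group in bins.items()})
print([matches_only_as_reverse_complement(r, reference) for r in bins["miRNA"]])
```

An exact match is the simplest possible stand-in for a similarity search. It shows why a read can look unannotated in one orientation and still correspond to a known sequence in the other.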

Benchmarking Results

miRPipe outperformed seven popular pipelines on both synthetic and real RNA-Seq datasets:

RNA Type	Accuracy	F1-Score	Competing Average
Known miRNAs	96.58%	89.95%	~85%
Novel miRNAs	99.55%	97.55%	~80%
piRNAs	98.91%	94.35%	~74%
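
Accuracy and F1-score can differ noticeably for the same run, because accuracy also credits true negatives while F1 depends only on precision and recall for the positive class. The sketch below uses the standard definitions of both metrics. The confusion-matrix counts are invented for illustration and are not taken from the benchmark.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all calls (positive and negative) that were correct."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: many true negatives push accuracy above F1.
tp, fp, tn, fn = 90, 10, 890, 10
print(f"accuracy = {accuracy(tp, fp, tn, fn):.2%}")
print(f"F1-score = {f1_score(tp, fp, fn):.2%}")
```

With a large pool of correctly rejected sequences, accuracy stays high even when some real small RNAs are missed, so the F1 column is the stricter of the two.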

On Chronic Lymphocytic Leukemia, lung, and breast cancer datasets, miRPipe achieved the highest agreement with the literature and with RT-qPCR results (87–90%), which supports its reliability on real-world data.

Built-in Synthetic Data Engine: miRSim

To truly test miRNA pipelines, we need synthetic RNA-Seq data with known ground truth — that’s what miRSim provides.

(Figure: miRSim workflow)

miRSim generates realistic small-RNA reads from miRBase and piRNAdb references, including:

Seed-region and x-seed mutations
Reverse-complement variants
Adjustable error profiles and read depths
Output in FASTQ/FASTA with ground truth tables

Perfect for benchmarking any RNA-Seq pipeline.
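
To make the idea of synthetic reads with a ground truth table concrete, here is a minimal sketch. It is not miRSim's implementation. It assumes that the seed region is nucleotides 2–8 from the 5′ end, that "x-seed" means any position outside the seed, and that one substitution per mutated read is enough for illustration. All function names and the toy reference are hypothetical, and reverse-complement variants and error profiles are left out.

```python
import random

# Assumed seed definition: nucleotides 2-8, i.e. 0-based indices 1..7.
SEED_START, SEED_END = 1, 8
BASES = "ACGT"


def mutate(seq, region, rng):
    """Substitute one base inside the seed ("seed") or outside it ("xseed")."""
    if region == "seed":
        positions = list(range(SEED_START, SEED_END))
    elif region == "xseed":
        positions = [i for i in range(len(seq)) if not SEED_START <= i < SEED_END]
    else:
        raise ValueError(f"unknown region: {region}")
    pos = rng.choice(positions)
    new_base = rng.choice([b for b in BASES if b != seq[pos]])
    return seq[:pos] + new_base + seq[pos + 1:], pos


def simulate(references, depth, mutation_rate, seed=0):
    """Return FASTQ text plus ground-truth rows (read_id, source, variant, position)."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    names = sorted(references)
    fastq_lines, truth = [], []
    for i in range(depth):
        name = rng.choice(names)
        seq, variant, pos = references[name], "exact", None
        if rng.random() < mutation_rate:
            variant = rng.choice(["seed", "xseed"])
            seq, pos = mutate(seq, variant, rng)
        read_id = f"read_{i}"
        # Constant placeholder quality string; a real simulator models errors.
        fastq_lines += [f"@{read_id}", seq, "+", "I" * len(seq)]
        truth.append((read_id, name, variant, pos))
    return "\n".join(fastq_lines) + "\n", truth


toy_refs = {"toy-miR-1": "TAGCTTATCAGACTGATGTTGA"}
fastq_text, truth_rows = simulate(toy_refs, depth=5, mutation_rate=0.5, seed=1)
print(fastq_text)
for row in truth_rows:
    print(row)
```

Because every read carries its source and variant type in the truth table, a pipeline's calls can be scored directly as true or false positives.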

Installation

Option 1 — via Docker (Recommended)

# Pull docker image
docker pull vivekruhela/mirpipe:latest

# Run container
docker run -it --name mirpipe \
  -v /path/to/data:/data \
  vivekruhela/mirpipe bash

Option 2 — From Source (Linux / macOS)

# Clone repository
git clone https://github.com/vivekruhela/miRPipe.git
cd miRPipe

# Install dependencies (apt-get applies to Debian/Ubuntu; on macOS, install the same tools with your package manager)
sudo apt-get install bowtie bedtools trim-galore
pip install -r requirements.txt

Running miRPipe

# Example usage
python3 miRPipe.py \
  --input data/sample.fastq \
  --genome hg38 \
  --miRbase v22 \
  --threads 8 \
  --out results/

Or use the interactive Jupyter Notebook:

jupyter notebook miRPipe_notebook.ipynb
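
For a cohort of samples, the command-line call shown earlier in this section can be wrapped in a small helper that builds one command per FASTQ file. This is a sketch only: it reuses the flags exactly as they appear in the example usage, the helper name `build_command` and the per-sample output folder layout are assumptions, and the repository's own documentation should be checked before relying on it.

```python
import shlex
from pathlib import Path


def build_command(fastq, out_root, genome="hg38", mirbase="v22", threads=8):
    """Build the argument list for one sample (flags copied from the example usage)."""
    fastq = Path(fastq)
    sample = fastq.name
    # Strip common FASTQ suffixes to get a per-sample output folder name.
    for suffix in (".gz", ".fastq", ".fq"):
        if sample.endswith(suffix):
            sample = sample[: -len(suffix)]
    return [
        "python3", "miRPipe.py",
        "--input", str(fastq),
        "--genome", genome,
        "--miRbase", mirbase,
        "--threads", str(threads),
        "--out", str(Path(out_root) / sample),
    ]


# Dry run: print the commands instead of executing them.
for path in ["data/patient_01.fastq.gz", "data/patient_02.fastq"]:
    print(shlex.join(build_command(path, "results")))
```

Each list can then be passed to `subprocess.run(cmd, check=True)` once the printed commands look right.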

Example Output

miRNA ID	log2FC	p-adj	Type	Annotation
hsa-miR-155	+2.47	0.0004	Known	Oncogenic
novel-miR-45*	−1.33	0.009	Novel	Paralog of miR-30a
hsa-piR-32963	+3.46	0.01	piRNA	Lung cancer marker
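
A table like this is easy to filter downstream. The sketch below assumes a tab-separated file with the column names shown in the example, and it handles the typographic minus sign (U+2212) that can appear when values are copied from a rendered page. The thresholds are common conventions chosen for illustration, not miRPipe defaults, and the function names are hypothetical.

```python
import csv
import io


def parse_signed(value):
    """Parse a number that may use a Unicode minus (U+2212) or a leading '+'."""
    return float(value.strip().replace("\u2212", "-"))


def significant_rows(tsv_text, max_padj=0.05, min_abs_log2fc=1.0):
    """Return (id, log2FC, p-adj, type) tuples passing both cut-offs, sorted by p-adj."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    hits = []
    for row in reader:
        log2fc = parse_signed(row["log2FC"])
        padj = float(row["p-adj"])
        if padj <= max_padj and abs(log2fc) >= min_abs_log2fc:
            hits.append((row["miRNA ID"], log2fc, padj, row["Type"]))
    return sorted(hits, key=lambda hit: hit[2])


# The example output table, written as tab-separated text.
TABLE = (
    "miRNA ID\tlog2FC\tp-adj\tType\tAnnotation\n"
    "hsa-miR-155\t+2.47\t0.0004\tKnown\tOncogenic\n"
    "novel-miR-45*\t\u22121.33\t0.009\tNovel\tParalog of miR-30a\n"
    "hsa-piR-32963\t+3.46\t0.01\tpiRNA\tLung cancer marker\n"
)

for hit in significant_rows(TABLE, min_abs_log2fc=2.0):
    print(hit)
```

Raising `min_abs_log2fc` to 2.0 keeps only the two strongly up-regulated entries from the example, while the default settings keep all three rows.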

miRPipe vs. the Rest

(Figure: benchmark chart)

miRPipe achieves >95% accuracy, outperforming traditional tools like miRDeep2, miRPro, and miRge2.0 in identifying both known and novel small RNAs — all while running faster and more reproducibly in Docker.

Why Researchers Love miRPipe

✅ Detects paralogues and reverse-complement miRNAs
✅ Simultaneous analysis of miRNAs and piRNAs
✅ Built-in synthetic benchmarking (via miRSim)
✅ Dockerized & Jupyter-based for reproducibility and ease
✅ Works with non-human genomes after minor configuration

Citation

Ruhela, V., Gupta, A., Sriram, K., Ahuja, G., Kaur, G., & Gupta, R. (2022).
A unified computational framework for a robust, reliable, and reproducible identification of novel miRNAs from the RNA sequencing data.
Frontiers in Bioinformatics, 2:842051.
https://doi.org/10.3389/fbinf.2022.842051

Final Thoughts

miRPipe bridges the gap between experimental biology and computational precision — bringing trustworthy small RNA discovery to your lab, one read at a time.

Ready to try it?
👉 Download miRPipe on GitHub
👉 Generate synthetic data with miRSim


Vivek Ruhela