miRPipe: A Smarter, Unified Pipeline for Small RNA Discovery
Unveiling the Hidden RNA World
In the vast cellular universe, microRNAs (miRNAs) and piwi-interacting RNAs (piRNAs) act as molecular regulators, controlling gene expression and development and influencing cancer progression. Sequencing these tiny RNA molecules is easy; accurately detecting and quantifying them is not.
That is where miRPipe comes in: a robust, reproducible, high-accuracy computational framework for detecting, annotating, and quantifying small RNAs from Next Generation Sequencing (NGS) data.
Why miRPipe?
Most existing tools (such as miRDeep2, miRPro, and miRge2.0) suffer from:
- High false-positive and false-negative rates
- Inability to detect functionally similar miRNAs (paralogues)
- Missed reverse-complement miRNA sequences
- No integrated piRNA analysis
- No benchmarking against synthetic “ground truth” data
miRPipe addresses all of these, combining biological accuracy with computational scalability.
The Complete miRPipe Workflow

(Figure adapted from Frontiers in Bioinformatics, Ruhela et al. 2022)
- Input & Pre-processing
  - Accepts FASTQ/FASTQ.GZ files
  - Removes adapters with TrimGalore
  - Filters reads by quality and length
  - Splits reads into 17–24 nt (miRNAs) and 25–31 nt (piRNAs)
- Parallel Alignment
  - Uses miRDeep* for miRNAs and Bowtie for piRNAs
  - Multi-threaded execution for large cohorts
- Post-processing & Re-annotation
  - Detects reverse-complement miRNAs using a DASHR BLAST search
  - Identifies paralogues via seed-based clustering (CD-HIT)
  - Functionally annotates novel miRNAs
- Differential Expression Analysis
  - Employs DESeq2 for expression statistics
  - Generates final counts and dysregulated miRNA/piRNA tables
- Output
  - Annotated miRNAs, novel candidates, paralogues, and piRNAs
  - DE tables ready for visualization or downstream analysis
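The pre-processing split can be illustrated with a small sketch. It is not miRPipe's code (miRPipe delegates trimming to TrimGalore); it assumes a single known 3' adapter removed by exact match, a simple mean-Phred quality cut-off of 20, and the 17–24 nt / 25–31 nt length windows from the workflow. The adapter string `TGGAATTCTCGG` (the Illumina small-RNA 3' adapter prefix) and the names `trim_adapter`, `mean_phred`, and `split_reads` are illustrative choices.

```python
"""Hypothetical sketch of adapter trimming plus miRNA/piRNA length binning.

Assumptions: exact-match removal of one 3' adapter and a mean-Phred
quality filter. Real pipelines (e.g. TrimGalore) are far more tolerant.
"""
from collections import defaultdict

ADAPTER = "TGGAATTCTCGG"       # assumed Illumina small-RNA 3' adapter prefix
MIRNA_RANGE = range(17, 25)    # 17-24 nt inclusive
PIRNA_RANGE = range(25, 32)    # 25-31 nt inclusive


def trim_adapter(seq: str, qual: str, adapter: str = ADAPTER) -> tuple[str, str]:
    """Cut the read (and its quality string) at the first exact adapter hit."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, qual
    return seq[:idx], qual[:idx]


def mean_phred(qual: str, offset: int = 33) -> float:
    """Average Phred score of a Sanger/Illumina 1.8+ quality string."""
    return sum(ord(c) - offset for c in qual) / len(qual) if qual else 0.0


def split_reads(records, min_quality: float = 20.0) -> dict[str, list[str]]:
    """Bin (name, seq, qual) records into 'miRNA', 'piRNA', or 'discarded'."""
    bins = defaultdict(list)
    for name, seq, qual in records:
        seq, qual = trim_adapter(seq, qual)
        if mean_phred(qual) < min_quality:
            bins["discarded"].append(name)
        elif len(seq) in MIRNA_RANGE:
            bins["miRNA"].append(name)
        elif len(seq) in PIRNA_RANGE:
            bins["piRNA"].append(name)
        else:
            bins["discarded"].append(name)
    return dict(bins)


reads = [
    ("r1", "TAGCTTATCAGACTGATGTTGA" + ADAPTER, "I" * 34),  # 22 nt insert + adapter
    ("r2", "ACGT" * 7, "I" * 28),                            # 28 nt, no adapter
    ("r3", "ACGTACGTACGT", "I" * 12),                        # 12 nt, too short
]
print(split_reads(reads))
# {'miRNA': ['r1'], 'piRNA': ['r2'], 'discarded': ['r3']}
```

Binning by length before alignment is what lets the miRNA and piRNA branches run in parallel on disjoint read sets.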
Benchmarking Results
miRPipe outperformed all seven popular pipelines on synthetic and real RNA-Seq datasets:
| RNA Type | Accuracy | F1-Score | Competing Average |
|---|---|---|---|
| Known miRNAs | 96.58% | 89.95% | ~85% |
| Novel miRNAs | 99.55% | 97.55% | ~80% |
| piRNAs | 98.91% | 94.35% | ~74% |
On real Chronic Lymphocytic Leukemia, lung cancer, and breast cancer datasets, miRPipe showed the highest agreement with the literature and with RT-qPCR validation (87–90%), supporting its real-world reliability.
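To make the accuracy and F1 columns in the table concrete, here is a minimal sketch of how both metrics follow from a confusion matrix against a known ground truth. It assumes a fixed universe of candidate IDs (each truly present or absent, as in a synthetic ground-truth table) and that predictions come from that universe; the exact counting rules used in the miRPipe paper may differ, and the function names are hypothetical.

```python
"""Accuracy and F1 from a predicted set versus a ground-truth set.

Assumption: `universe` holds every candidate ID, and `truth` and
`predicted` are subsets of it.
"""


def confusion_counts(universe: set, truth: set, predicted: set):
    tp = len(truth & predicted)             # correctly detected
    fp = len(predicted - truth)             # reported but absent
    fn = len(truth - predicted)             # present but missed
    tn = len(universe - truth - predicted)  # correctly left out
    return tp, fp, fn, tn


def accuracy_and_f1(universe: set, truth: set, predicted: set):
    tp, fp, fn, tn = confusion_counts(universe, truth, predicted)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


universe = {f"miR-{i}" for i in range(10)}
truth = {"miR-0", "miR-1", "miR-2", "miR-3"}
predicted = {"miR-0", "miR-1", "miR-2", "miR-5"}
acc, f1 = accuracy_and_f1(universe, truth, predicted)
print(f"accuracy={acc:.2f}, F1={f1:.2f}")  # accuracy=0.80, F1=0.75
```

Because true negatives usually dominate in small-RNA detection, accuracy can stay high while F1 drops, which is why the table reports both.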
Built-in Synthetic Data Engine: miRSim
Rigorously testing miRNA pipelines requires synthetic RNA-Seq data with a known ground truth, and that is what miRSim provides.

miRSim generates realistic small-RNA reads from miRBase and piRNAdb references, including:
- Seed-region and x-seed mutations
- Reverse-complement variants
- Adjustable error profiles and read depths
- Output in FASTQ/FASTA with ground truth tables
This makes miRSim useful for benchmarking any small RNA-Seq pipeline.
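The kinds of variants miRSim produces can be sketched in a few lines. This is a hypothetical generator, not miRSim's implementation: it assumes the seed is nucleotides 2 to 8 of the mature sequence, treats "x-seed" as any position outside the seed, applies a single substitution per variant, and gives every base the same quality score instead of a real error profile. The names `simulate`, `mutate`, and `reverse_complement` are illustrative.

```python
"""Hypothetical miRSim-style read generator with a ground-truth table.

Assumptions: seed = nucleotides 2-8 (0-based indices 1..7), one
substitution per mutated variant, constant Phred 40 ('I') qualities.
"""
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")
SEED = range(1, 8)  # 0-based indices of nucleotides 2-8


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def mutate(seq: str, positions, rng: random.Random) -> str:
    """Substitute one base at a position drawn from `positions`."""
    pos = rng.choice(list(positions))
    new_base = rng.choice([b for b in "ACGT" if b != seq[pos]])
    return seq[:pos] + new_base + seq[pos + 1:]


def simulate(mature: dict[str, str], depth: int, seed: int = 0):
    """Yield (fastq_record, ground_truth_row) pairs for each variant."""
    rng = random.Random(seed)
    for mirna_id, rna in mature.items():
        dna = rna.replace("U", "T")  # miRBase stores RNA; reads are DNA
        xseed = [i for i in range(len(dna)) if i not in SEED]
        variants = {
            "original": dna,
            "seed_mutation": mutate(dna, SEED, rng),
            "xseed_mutation": mutate(dna, xseed, rng),
            "reverse_complement": reverse_complement(dna),
        }
        for kind, seq in variants.items():
            for copy in range(depth):
                name = f"{mirna_id}|{kind}|{copy}"
                record = f"@{name}\n{seq}\n+\n{'I' * len(seq)}\n"
                yield record, (name, mirna_id, kind)


mature = {"hsa-miR-21-5p": "UAGCUUAUCAGACUGAUGUUGA"}
pairs = list(simulate(mature, depth=2))
print(len(pairs))  # 8
```

Each FASTQ record is paired with a ground-truth row, so a pipeline's calls can later be scored against exactly what was injected.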
Installation
Option 1: via Docker (Recommended)
```bash
# Pull the Docker image
docker pull vivekruhela/mirpipe:latest
# Run the container with your data directory mounted at /data
docker run -it --name mirpipe -v /path/to/data:/data vivekruhela/mirpipe bash
```
Option 2: From Source (Linux / macOS)
```bash
# Clone the repository
git clone https://github.com/vivekruhela/miRPipe.git
cd miRPipe
# Install dependencies (apt-get is for Debian/Ubuntu; on macOS use an equivalent package manager)
sudo apt-get install bowtie bedtools trim-galore
pip install -r requirements.txt
```
Running miRPipe
```bash
# Example usage
python3 miRPipe.py --input data/sample.fastq --genome hg38 --miRbase v22 --threads 8 --out results/
```
Or use the interactive Jupyter Notebook:
```bash
jupyter notebook miRPipe_notebook.ipynb
```
Example Output
| miRNA ID | log2FC | p-adj | Type | Annotation |
|---|---|---|---|---|
| hsa-miR-155 | +2.47 | 0.0004 | Known | Oncogenic |
| novel-miR-45* | −1.33 | 0.009 | Novel | Paralog of miR-30a |
| hsa-piR-32963 | +3.46 | 0.01 | piRNA | Lung cancer marker |
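As an example of downstream use, a DE table like the one above can be split into up- and down-regulated entries. This sketch assumes the table is exported as CSV with the column names shown (the actual miRPipe output format may differ) and uses adjusted p < 0.05 and |log2FC| ≥ 1, which are common conventions rather than documented miRPipe defaults. Note that the example values use ASCII minus signs, since `float()` rejects the Unicode minus.

```python
"""Classify dysregulated small RNAs from a DE table.

Assumptions: CSV layout and column names follow the example table;
thresholds are conventional cut-offs, not miRPipe defaults.
"""
import csv
import io

TABLE = """miRNA ID,log2FC,p-adj,Type
hsa-miR-155,2.47,0.0004,Known
novel-miR-45*,-1.33,0.009,Novel
hsa-piR-32963,3.46,0.01,piRNA
"""


def dysregulated(rows, padj_cutoff: float = 0.05, lfc_cutoff: float = 1.0):
    """Return (up, down) lists of IDs passing both cut-offs."""
    up, down = [], []
    for row in rows:
        lfc, padj = float(row["log2FC"]), float(row["p-adj"])
        if padj >= padj_cutoff or abs(lfc) < lfc_cutoff:
            continue  # not significant or effect too small
        (up if lfc > 0 else down).append(row["miRNA ID"])
    return up, down


up, down = dysregulated(csv.DictReader(io.StringIO(TABLE)))
print("up:", up)      # up: ['hsa-miR-155', 'hsa-piR-32963']
print("down:", down)  # down: ['novel-miR-45*']
```

Tightening `padj_cutoff` to 0.01 would drop `hsa-piR-32963` (p-adj exactly 0.01), a reminder that borderline calls depend on the chosen threshold.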
miRPipe vs. the Rest

miRPipe achieves more than 95% accuracy, outperforming traditional tools such as miRDeep2, miRPro, and miRge2.0 at identifying both known and novel small RNAs, while also running faster and, thanks to Docker, more reproducibly.
Why Researchers Love miRPipe
- ✅ Detects paralogues and reverse-complement miRNAs
- ✅ Simultaneous analysis of miRNAs and piRNAs
- ✅ Built-in synthetic benchmarking (via miRSim)
- ✅ Dockerized & Jupyter-based for reproducibility and ease of use
- ✅ Works with non-human genomes after minor configuration
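The idea behind paralogue and reverse-complement detection can be illustrated with a deliberately simplified sketch. Per the workflow, miRPipe uses CD-HIT seed-based clustering and a DASHR BLAST search; this example swaps in exact seed matching (seed = nucleotides 2 to 8) and exact reverse-complement lookup purely for illustration. Sequences are kept in the RNA alphabet, and the function names are hypothetical.

```python
"""Simplified paralogue grouping and reverse-complement lookup.

Stand-in technique: exact seed equality instead of CD-HIT clustering,
exact string match instead of a BLAST search.
"""
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed(seq: str) -> str:
    """Nucleotides 2-8 (1-based) of a mature miRNA."""
    return seq[1:8]


def group_by_seed(mirnas: dict[str, str]) -> dict[str, list[str]]:
    """Group miRNAs sharing an identical seed; keep groups of two or more."""
    groups = defaultdict(list)
    for name, seq in mirnas.items():
        groups[seed(seq)].append(name)
    return {s: names for s, names in groups.items() if len(names) > 1}


def reverse_complement_hits(reads, mirnas: dict[str, str]) -> dict[str, str]:
    """Map reads equal to the reverse complement of a known miRNA to its name."""
    rc_index = {seq.translate(COMPLEMENT)[::-1]: name for name, seq in mirnas.items()}
    return {read: rc_index[read] for read in reads if read in rc_index}


mirnas = {
    "hsa-miR-30a-5p": "UGUAAACAUCCUCGACUGGAAG",
    "hsa-miR-30b-5p": "UGUAAACAUCCUACACUCAGCU",
    "hsa-miR-21-5p": "UAGCUUAUCAGACUGAUGUUGA",
}
print(group_by_seed(mirnas))
# {'GUAAACA': ['hsa-miR-30a-5p', 'hsa-miR-30b-5p']}
print(reverse_complement_hits(["UCAACAUCAGUCUGAUAAGCUA", "ACGUACGU"], mirnas))
# {'UCAACAUCAGUCUGAUAAGCUA': 'hsa-miR-21-5p'}
```

Exact matching misses near-identical seeds and mismatched reverse complements, which is why clustering and alignment-based search are used in practice.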
Citation
Ruhela, V., Gupta, A., Sriram, K., Ahuja, G., Kaur, G., & Gupta, R. (2022).
A unified computational framework for a robust, reliable, and reproducible identification of novel miRNAs from the RNA sequencing data.
Frontiers in Bioinformatics, 2:842051.
https://doi.org/10.3389/fbinf.2022.842051
Final Thoughts
miRPipe bridges the gap between experimental biology and computational precision — bringing trustworthy small RNA discovery to your lab, one read at a time.
Ready to try it?
👉 Download miRPipe on GitHub: https://github.com/vivekruhela/miRPipe
👉 Generate synthetic data with miRSim