miRPipe: A Smarter, Unified Pipeline for Small RNA Discovery

Unveiling the Hidden RNA World

In the vast cellular universe, microRNAs (miRNAs) and piwi-interacting RNAs (piRNAs) act as the molecular regulators of life — controlling gene expression, development, and even cancer progression. But while sequencing these tiny RNA molecules is easy, accurately detecting and quantifying them isn’t.

That’s where miRPipe comes in — a robust, reproducible, and high-accuracy computational framework designed to detect, annotate, and quantify small RNAs from Next Generation Sequencing (NGS) data.


Why miRPipe?

Most existing tools (such as miRDeep2, miRPro, and miRge2.0) suffer from one or more of the following:

  • High rates of false positives and false negatives
  • Inability to detect functionally similar miRNAs (paralogues)
  • Missed reverse-complement miRNA sequences
  • No integrated piRNA analysis
  • No benchmarking against synthetic “ground truth” data

miRPipe fixes them all, combining biological accuracy with computational scalability.


The Complete miRPipe Workflow

Figure: miRPipe workflow (adapted from Frontiers in Bioinformatics, Ruhela et al. 2022)

  1. Input & Pre-processing
    • Accepts FASTQ/FASTQ.GZ files
    • Removes adapters with TrimGalore
    • Filters reads by quality and length
    • Splits into 17–24 nt (miRNAs) and 25–31 nt (piRNAs)
  2. Parallel Alignment
    • Uses miRDeep* for miRNAs and Bowtie for piRNAs
    • Multi-threaded execution for large cohorts
  3. Post-processing & Re-annotation
    • Detects reverse-complement miRNAs using DASHR BLAST search
    • Identifies paralogues via seed-based clustering (CD-HIT)
    • Functionally annotates novel miRNAs
  4. Differential Expression Analysis
    • Employs DESeq2 for expression statistics
    • Generates final counts and dysregulated miRNA/piRNA tables
  5. Output
    • Annotated miRNAs, novel candidates, paralogues, and piRNAs
    • DE tables ready for visualization or downstream analysis
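
The length-based split in step 1 can be illustrated with a few lines of Python. This is a minimal sketch, not miRPipe's own code: the function names `parse_fastq` and `split_by_length` are hypothetical, and it assumes that both length ranges are inclusive and that adapter and quality trimming have already been done.

```python
import io
from typing import Iterable, Iterator, List, Tuple

Read = Tuple[str, str, str]  # (header, sequence, quality)

MIRNA_RANGE = (17, 24)  # nt, taken from step 1; treated as inclusive (assumption)
PIRNA_RANGE = (25, 31)  # nt, taken from step 1; treated as inclusive (assumption)


def parse_fastq(handle: Iterable[str]) -> Iterator[Read]:
    """Yield (header, sequence, quality) from well-formed four-line FASTQ records."""
    lines = (line.rstrip("\n") for line in handle)
    for header in lines:
        if not header:
            continue  # tolerate blank lines between records
        seq = next(lines)
        next(lines)  # skip the '+' separator line
        qual = next(lines)
        yield header, seq, qual


def split_by_length(reads: Iterable[Read]) -> Tuple[List[Read], List[Read]]:
    """Bin reads into miRNA-length and piRNA-length lists; drop everything else."""
    mirna: List[Read] = []
    pirna: List[Read] = []
    for read in reads:
        n = len(read[1])
        if MIRNA_RANGE[0] <= n <= MIRNA_RANGE[1]:
            mirna.append(read)
        elif PIRNA_RANGE[0] <= n <= PIRNA_RANGE[1]:
            pirna.append(read)
    return mirna, pirna


def _record(name: str, base: str, n: int) -> str:
    """Build one toy FASTQ record of length n (illustration only)."""
    return f"@{name}\n{base * n}\n+\n{'I' * n}\n"


demo = io.StringIO(_record("r1", "A", 22) + _record("r2", "C", 28) + _record("r3", "G", 12))
mirna_reads, pirna_reads = split_by_length(parse_fastq(demo))
print(len(mirna_reads), "miRNA-length reads;", len(pirna_reads), "piRNA-length reads")
```

Reads outside both ranges (the 12 nt toy read here) are simply discarded in this sketch; a real pipeline may log or keep them for other small-RNA classes.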

Benchmarking Results

miRPipe outperformed all seven popular pipelines on synthetic and real RNA-Seq datasets:

| RNA Type | Accuracy | F1-Score | Competing Average |
|---|---|---|---|
| Known miRNAs | 96.58% | 89.95% | ~85% |
| Novel miRNAs | 99.55% | 97.55% | ~80% |
| piRNAs | 98.91% | 94.35% | ~74% |

Validated across Chronic Lymphocytic Leukemia, Lung, and Breast Cancer datasets — miRPipe achieved the highest literature and RT-qPCR agreement (87–90%), proving its real-world reliability.
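
For readers who want to reproduce this kind of scoring on their own data, accuracy and F1 can be computed from a ground-truth set and a pipeline's predictions. The sketch below uses the standard definitions; the function name `score`, the toy IDs, and the way true negatives are counted (everything in a fixed candidate universe that is neither true nor predicted) are assumptions and may differ from the exact protocol in the paper.

```python
from typing import Dict, Set


def score(truth: Set[str], predicted: Set[str], universe: Set[str]) -> Dict[str, float]:
    """Compute accuracy and F1 for predicted IDs against a ground-truth set.

    `universe` is the full set of candidate IDs; it defines the true negatives.
    """
    tp = len(truth & predicted)             # correctly detected
    fp = len(predicted - truth)             # detected but not real
    fn = len(truth - predicted)             # real but missed
    tn = len(universe - truth - predicted)  # correctly left out
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1}


# Toy example with hypothetical IDs: 10 candidates, 4 truly present.
universe = {f"mir-{i}" for i in range(10)}
truth = {"mir-0", "mir-1", "mir-2", "mir-3"}
predicted = {"mir-0", "mir-1", "mir-2", "mir-4"}

result = score(truth, predicted, universe)
print(result)
```

Accuracy rewards true negatives, so it tends to look high when most candidates are absent; F1 ignores true negatives and is usually the stricter of the two numbers.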


Built-in Synthetic Data Engine: miRSim

To truly test miRNA pipelines, we need synthetic RNA-Seq data with known ground truth — that’s what miRSim provides.

Figure: miRSim workflow

miRSim generates realistic small-RNA reads from miRBase and piRNAdb references, including:

  • Seed-region and x-seed mutations
  • Reverse-complement variants
  • Adjustable error profiles and read depths
  • Output in FASTQ/FASTA with ground truth tables

Perfect for benchmarking any RNA-Seq pipeline.
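
Two of the variant types listed above, reverse complements and seed-region mutations, are easy to illustrate. The following is a minimal sketch and not miRSim's implementation: the helper names are hypothetical, the seed is assumed to be nucleotides 2 to 8 from the 5' end (the usual convention), and a mutation is modeled as a single random substitution.

```python
import random

# Complement table for RNA bases.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def mutate_seed(seq: str, rng: random.Random, seed_start: int = 1, seed_end: int = 8) -> str:
    """Substitute one base inside the seed region.

    Positions are 0-based and half-open, so the defaults cover nucleotides 2-8.
    """
    pos = rng.randrange(seed_start, min(seed_end, len(seq)))
    alternatives = [base for base in "ACGU" if base != seq[pos]]
    return seq[:pos] + rng.choice(alternatives) + seq[pos + 1:]


example = "UGAGGUAGUAGGUUGUAUAGUU"  # example 22 nt RNA sequence
rng = random.Random(0)              # fixed seed so runs are repeatable

rc = reverse_complement(example)
mutant = mutate_seed(example, rng)
print("original:          ", example)
print("reverse complement:", rc)
print("seed mutant:       ", mutant)
```

Because every introduced change is recorded by construction, a simulator built this way can emit a ground-truth table alongside the reads.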


Installation

Option 1: Docker

# Pull the Docker image
docker pull vivekruhela/mirpipe:latest

# Run the container with a host data directory mounted at /data
docker run -it --name mirpipe \
  -v /path/to/data:/data \
  vivekruhela/mirpipe bash

Option 2: From Source (Linux / macOS)

# Clone repository
git clone https://github.com/vivekruhela/miRPipe.git
cd miRPipe

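# Install dependencies (apt-get applies to Debian/Ubuntu; on other systems, install the same tools with your package manager)
sudo apt-get install bowtie bedtools trim-galore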
pip install -r requirements.txt

Running miRPipe

# Example usage
python3 miRPipe.py \
  --input data/sample.fastq \
  --genome hg38 \
  --miRbase v22 \
  --threads 8 \
  --out results/

Or use the interactive Jupyter Notebook:

jupyter notebook miRPipe_notebook.ipynb
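
For a cohort of FASTQ files, the command-line call from the example usage could be generated once per sample. The sketch below only builds the argument lists and does not execute anything. The flags are the ones shown in the example usage; the helper names and the idea of giving each sample its own output subdirectory are assumptions, so check them against the repository before relying on them.

```python
from pathlib import Path
from typing import List


def sample_name(fastq: Path) -> str:
    """Strip all extensions, so 's2.fastq.gz' becomes 's2'."""
    return fastq.name.split(".")[0]


def build_command(fastq: Path, out_root: Path, threads: int = 8) -> List[str]:
    """Assemble one miRPipe invocation using the flags from the example usage."""
    out_dir = out_root / sample_name(fastq)  # per-sample output folder (assumption)
    return [
        "python3", "miRPipe.py",
        "--input", fastq.as_posix(),
        "--genome", "hg38",
        "--miRbase", "v22",
        "--threads", str(threads),
        "--out", out_dir.as_posix(),
    ]


# Hypothetical input files, used for illustration only.
samples = [Path("data/s1.fastq"), Path("data/s2.fastq.gz")]
commands = [build_command(s, Path("results")) for s in samples]
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.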

Example Output

| miRNA ID | log2FC | p-adj | Type | Annotation |
|---|---|---|---|---|
| hsa-miR-155 | +2.47 | 0.0004 | Known | Oncogenic |
| novel-miR-45* | −1.33 | 0.009 | Novel | Paralog of miR-30a |
| hsa-piR-32963 | +3.46 | 0.01 | piRNA | Lung cancer marker |
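
A table like this is typically filtered by adjusted p-value and effect size before follow-up. The sketch below is a hedged illustration: it assumes the results are available as tab-separated text with hypothetical column names (`mirna_id`, `log2fc`, `padj`, `type`), and the cutoffs are common defaults rather than values prescribed by miRPipe.

```python
import csv
import io
from typing import Dict, List

# Rows copied from the example output; the column names are hypothetical.
TABLE = (
    "mirna_id\tlog2fc\tpadj\ttype\n"
    "hsa-miR-155\t2.47\t0.0004\tKnown\n"
    "novel-miR-45*\t-1.33\t0.009\tNovel\n"
    "hsa-piR-32963\t3.46\t0.01\tpiRNA\n"
)


def significant(
    rows: List[Dict[str, str]],
    max_padj: float = 0.05,
    min_abs_log2fc: float = 1.0,
) -> List[str]:
    """Return IDs with padj below the cutoff and |log2FC| at or above the cutoff."""
    return [
        row["mirna_id"]
        for row in rows
        if float(row["padj"]) < max_padj and abs(float(row["log2fc"])) >= min_abs_log2fc
    ]


rows = list(csv.DictReader(io.StringIO(TABLE), delimiter="\t"))
default_hits = significant(rows)
strict_hits = significant(rows, max_padj=0.01)
print("padj < 0.05:", default_hits)
print("padj < 0.01:", strict_hits)
```

With the stricter cutoff the piRNA row drops out, because its adjusted p-value equals the threshold and the comparison is strict.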

miRPipe vs. the Rest

Figure: benchmark chart

miRPipe achieves >95% accuracy, outperforming traditional tools like miRDeep2, miRPro, and miRge2.0 in identifying both known and novel small RNAs — all while running faster and more reproducibly in Docker.


Why Researchers Love miRPipe

  • ✅ Detects paralogues and reverse-complement miRNAs
  • ✅ Simultaneous analysis of miRNAs and piRNAs
  • ✅ Built-in synthetic benchmarking (via miRSim)
  • ✅ Dockerized and Jupyter-based for reproducibility and ease of use
  • ✅ Works with non-human genomes after minor configuration

Citation

Ruhela, V., Gupta, A., Sriram, K., Ahuja, G., Kaur, G., & Gupta, R. (2022).
A unified computational framework for a robust, reliable, and reproducible identification of novel miRNAs from the RNA sequencing data.
Frontiers in Bioinformatics, 2:842051.
https://doi.org/10.3389/fbinf.2022.842051


Final Thoughts

miRPipe bridges the gap between experimental biology and computational precision — bringing trustworthy small RNA discovery to your lab, one read at a time.

Ready to try it?
👉 Download miRPipe on GitHub: https://github.com/vivekruhela/miRPipe
👉 Generate synthetic data with miRSim