RNA-Seq Technical Documentation

Bioinformatics analysis

Our RNA-Seq analysis pipeline consists of the following steps:

  1. FASTQ generation and demultiplexing with BCL Convert v4.3.6 and fqtk v0.3.1.
  2. Read filtering using fastp v0.24.0: poly-X tail trimming, quality-based 3' tail trimming, a minimum Phred quality score of 15, and a minimum read length of 50 bp.
  3. Alignment to the appropriate reference genome using STAR v2.7.11, with non-canonical splice junctions removed and unmapped reads written to the output.
  4. Coordinate sorting of BAM files using samtools v1.22.1.
  5. UMI-based deduplication: removal of PCR and optical duplicates using UMICollapse v1.1.0.
  6. Mapping QC: alignment quality metrics, strand specificity, and read distribution across genomic features using RSeQC v5.0.4 and Qualimap v2.3.
  7. Generation of comprehensive QC report using MultiQC v1.32.
  8. Gene-expression quantification using featureCounts (Subread package v2.1.1) with strand-specific counting, fractional assignment of multi-mapping reads, exons and 3' UTRs as the feature types, and grouping by gene_id. Final gene counts are annotated with gene biotype and other metadata extracted from the reference GTF file.
  9. Sample-to-sample Pearson correlations, computed on TMM-normalized counts (TMM: trimmed mean of M-values), for the sample-sample heatmap and PCA.
  10. Differential expression analysis using edgeR v4.0.16, with lowly expressed genes filtered out by edgeR::filterByExpr at its default settings.
  11. Functional enrichment for human and mouse samples by gene set enrichment analysis (GSEA) with GSEApy v0.12 and the MSigDB Hallmark gene set collection.
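
To make the sample-sample correlation step more concrete, the following is a minimal Python sketch, not the pipeline's actual code. It assumes the TMM normalization factors have already been computed elsewhere (for example by edgeR's calcNormFactors) and are passed in as a plain array. The function names, the toy count matrix, and the log2 transform with a pseudocount of 1 are illustrative assumptions; the documentation above states only that Pearson correlation is calculated on TMM-normalized counts.

```python
import numpy as np


def normalized_log_cpm(counts, norm_factors, prior_count=1.0):
    """Convert raw counts (genes x samples) to log2 counts per million.

    norm_factors are per-sample scaling factors (e.g. TMM factors computed
    elsewhere). The log2 transform and prior_count are assumptions made for
    this sketch; they are not taken from the pipeline description.
    """
    counts = np.asarray(counts, dtype=float)
    norm_factors = np.asarray(norm_factors, dtype=float)
    # Effective library size = raw library size * normalization factor.
    effective_lib_sizes = counts.sum(axis=0) * norm_factors
    cpm = counts / effective_lib_sizes * 1e6
    return np.log2(cpm + prior_count)


def sample_correlation(expr):
    """Pearson correlation between samples (columns) of a genes x samples matrix."""
    # np.corrcoef treats each row as one variable, so transpose to samples x genes.
    return np.corrcoef(expr.T)


# Toy data: 4 genes x 3 samples. Sample 2 is sample 1 sequenced twice as deep.
counts = np.array([
    [100, 200, 10],
    [50, 100, 80],
    [0, 0, 30],
    [300, 600, 40],
])
# Hypothetical normalization factors; 1.0 means no adjustment beyond library size.
norm_factors = np.array([1.0, 1.0, 1.0])

expr = normalized_log_cpm(counts, norm_factors)
corr = sample_correlation(expr)
print(np.round(corr, 3))
```

In this toy input the first two samples differ only in sequencing depth, so their normalized profiles coincide and their correlation is 1, while the third sample correlates less strongly with both. The resulting symmetric matrix is the kind of input a sample-sample heatmap would be drawn from.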