RNA-Seq Technical Documentation

Technical details | Sequencing technology | Sample prep | Bioinformatics analysis | Data deliverables and file types | Troubleshooting

Technical details

Sequencing technology

Plasmidsaurus RNA-Seq uses Illumina sequencing and a 3’ end counting approach. We extract total RNA using a bead-based method from cells preserved in Zymo DNA/RNA Shield. Alternatively, you may send extracted total RNA, provided it is shipped on dry ice. Please send at least 30 μL of sample at a concentration of at least 10 ng/μL, free of DNA and nucleases.
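As a quick pre-shipment sanity check, the two purified-RNA thresholds above can be encoded in a few lines. This is only a sketch: the function names are hypothetical, and it checks volume and concentration only (it cannot tell you whether a sample is free of DNA or nucleases).

```python
MIN_VOLUME_UL = 30.0        # minimum sample volume stated above
MIN_CONC_NG_PER_UL = 10.0   # minimum concentration stated above


def check_purified_rna(volume_ul: float, conc_ng_per_ul: float) -> list:
    """Return a list of problems; an empty list means both thresholds are met."""
    problems = []
    if volume_ul < MIN_VOLUME_UL:
        problems.append(f"volume {volume_ul} uL is below {MIN_VOLUME_UL} uL")
    if conc_ng_per_ul < MIN_CONC_NG_PER_UL:
        problems.append(
            f"concentration {conc_ng_per_ul} ng/uL is below {MIN_CONC_NG_PER_UL} ng/uL"
        )
    return problems


def total_rna_ng(volume_ul: float, conc_ng_per_ul: float) -> float:
    """Total RNA mass in the tube, in nanograms."""
    return volume_ul * conc_ng_per_ul


# Hypothetical sample: 40 uL at 25 ng/uL
print(check_purified_rna(40.0, 25.0))
print(total_rna_ng(40.0, 25.0))
```

At exactly the minimum values (30 μL at 10 ng/μL), the tube holds 300 ng of total RNA.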

We measure the concentration of extracted RNA (or of purified RNA as received) using a fluorescence-based plate reader assay and normalize sample concentrations before proceeding. Please note that we do not assess an RNA Integrity Number (RIN) before proceeding, nor do we hold samples if the extracted RNA concentration is low. Making sure you send an adequate number of intact cells is key to getting optimal results.

For sequencing library preparation, we convert mRNA into complementary DNA (cDNA) via reverse transcription and second-strand synthesis, followed by tagmentation, library indexing, and amplification. Our approach uses 3’ end counting to capture differential gene expression. This method counts mRNA transcripts efficiently and accurately; for more information, please see this post.

Figure 1. We generate cDNA using a poly(dT)VN primer. Our single-end sequencing is stranded (i.e. the output reads are in the same orientation as the original input mRNA) and will sequence towards the 3' end. We use unique molecular identifiers (UMIs) to deduplicate reads, and unique dual indices (UDIs) to prevent index hopping.
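To illustrate the idea behind UMI deduplication: reads that align to the same position and strand and carry the same UMI are treated as PCR copies of one original molecule, so only one is kept. The sketch below is a simplified illustration, not the production method. The `Read` record and its fields are hypothetical, and it matches UMIs exactly, whereas dedicated tools typically also tolerate sequencing errors within the UMI.

```python
from collections import namedtuple

# Hypothetical minimal read record for illustration only.
Read = namedtuple("Read", ["name", "chrom", "pos", "strand", "umi"])


def deduplicate(reads):
    """Keep the first read seen for each (chrom, pos, strand, umi) combination."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.umi)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


example_reads = [
    Read("r1", "chr1", 100, "+", "AACG"),
    Read("r2", "chr1", 100, "+", "AACG"),  # same position and UMI as r1: duplicate
    Read("r3", "chr1", 100, "+", "TTGA"),  # same position, different UMI: kept
    Read("r4", "chr1", 250, "+", "AACG"),  # same UMI, different position: kept
]

print([r.name for r in deduplicate(example_reads)])
```

Here r2 is dropped as a duplicate of r1, while r3 and r4 are retained because they differ in UMI or position.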

Figure 2. Our method is heavily biased towards the 3' end. The read length is ~90 bp and all of the reads will align within the last ~400 nt of the transcript.

Sample prep

We ask you to submit ~100k cells in 50 μL Zymo DNA/RNA Shield. Because transcriptional activity varies widely, this cell number is approximate guidance for transcriptionally active cells. If you’re unsure how active your cells are, we strongly recommend performing a bulk RNA extraction and quantitation to calibrate the typical expression levels of your specific cells, then sending a cell count that will provide at least 500 ng of total bulk RNA.
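The calibration described above is simple arithmetic: measure how much RNA a known number of cells yields, then scale to the 500 ng target. The following is a minimal sketch with hypothetical function names; the yields used in the example are made-up numbers, not measured values for any cell type.

```python
import math

TARGET_RNA_NG = 500.0  # total bulk RNA target stated above


def rna_per_cell_pg(total_rna_ng: float, cells_extracted: int) -> float:
    """Per-cell RNA yield (picograms) from a bulk extraction of a known cell count."""
    if cells_extracted <= 0:
        raise ValueError("cells_extracted must be positive")
    return total_rna_ng * 1000.0 / cells_extracted


def cells_needed(pg_per_cell: float, target_ng: float = TARGET_RNA_NG) -> int:
    """Smallest whole number of cells expected to yield at least target_ng of RNA."""
    if pg_per_cell <= 0:
        raise ValueError("pg_per_cell must be positive")
    return math.ceil(target_ng * 1000.0 / pg_per_cell)


# Hypothetical calibration: 2000 ng recovered from 200,000 cells
yield_pg = rna_per_cell_pg(2000.0, 200_000)
print(yield_pg, cells_needed(yield_pg))
```

With this hypothetical yield of 10 pg per cell, 50,000 cells would reach 500 ng; at 5 pg per cell the same target needs 100,000 cells.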

You may also elect to send fewer cells (e.g. to accommodate culture in 96-well plates), but the number of deduplicated reads produced may be substantially lower. If your material is not limited, you can also send double or triple the volume as backup in case a process issue requires a rerun. If you do this, please do not exceed 200 μL total volume, and strongly prioritize maintaining a high RNA concentration over providing additional volume, as we cannot use more than 50 μL per processing iteration (or 30 μL per iteration for purified RNA).
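To see how much backup a given volume provides, divide the submitted volume by the per-iteration volume. This is a rough sketch under stated assumptions: the sample-type labels are hypothetical, dead volume and pipetting loss are ignored, and the 200 μL cap is assumed to apply to both sample types.

```python
MAX_TOTAL_VOLUME_UL = 200.0  # upper limit on submitted volume stated above

# Volume consumed per processing iteration, from the text above.
# The dictionary keys are hypothetical labels chosen for this example.
PER_ITERATION_UL = {"cells": 50.0, "purified_rna": 30.0}


def processing_iterations(volume_ul: float, sample_type: str = "cells") -> int:
    """Number of full processing iterations a submitted volume can support."""
    if volume_ul < 0:
        raise ValueError("volume_ul must not be negative")
    if volume_ul > MAX_TOTAL_VOLUME_UL:
        raise ValueError(f"volume exceeds the {MAX_TOTAL_VOLUME_UL} uL limit")
    return int(volume_ul // PER_ITERATION_UL[sample_type])


print(processing_iterations(150.0, "cells"))
print(processing_iterations(100.0, "purified_rna"))
```

For example, 150 μL of cells in DNA/RNA Shield covers three iterations (one run plus two possible reruns), provided the RNA concentration stays high.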

We aim to return 10M deduplicated reads from about 20M raw reads, as well as count tables, sequencing QC, and raw data. For more detailed sample prep information please check out our sample prep instructions.

Bioinformatics analysis

Our bioinformaticians have crafted industry-leading pipelines. Our major analysis goal is to build such robust workflows that even the pickiest of bioinformaticians wouldn’t want to squander their time by redoing any of the steps. Our RNA-seq analysis pipeline consists of the following steps:

FastQ generation and demux with BCL Convert v4.3.6 and fqtk v0.3.1.
Read filtering using fastp v0.24.0: poly-X tail trimming, 3' quality-based tail trimming, a minimum Phred quality score of 15, and a minimum length requirement of 50 bp.
Alignment to the appropriate reference genome using STAR aligner v2.7.11 with non-canonical splice junction removal and output of unmapped reads.
Coordinate sorting of BAM files using samtools v1.22.1.
UMI-based deduplication: removal of PCR and optical duplicates using UMICollapse v1.1.0.
Mapping QC: Alignment quality metrics, strand specificity, and read distribution across genomic features using RSeQC v5.0.4 and Qualimap v2.3.
Generation of comprehensive QC report using MultiQC v1.32.
Gene-expression quantification using featureCounts (subread package v2.1.1) with strand-specific counting, fractional assignment of multi-mapping reads, exons and 3' UTRs as the feature types, and grouping by gene_id. Final gene counts are annotated with gene biotype and other metadata extracted from the reference GTF file.
Sample-sample correlations for the sample-sample heatmap and PCA, calculated on normalized counts (TMM, trimmed mean of M-values) using Pearson correlation.
Differential expression using edgeR v4.0.16, with low-expressed genes filtered out by edgeR::filterByExpr with default values.
Functional enrichment performed for human and mouse samples using gene set enrichment analysis with GSEApy v0.12 using the MSigDB Hallmark gene set.
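To make the fractional assignment of multi-mapping reads in the quantification step more concrete: a read with N reported alignments contributes 1/N of a count to the gene at each alignment, so every read adds at most one count in total. The sketch below illustrates the principle only and is not featureCounts itself. The input structure and gene names are hypothetical, and it assumes every alignment has already been assigned to exactly one gene.

```python
from collections import defaultdict


def fractional_counts(read_to_genes):
    """Sum fractional counts per gene.

    read_to_genes maps a read ID to a list of gene IDs, one entry per
    reported alignment of that read (hypothetical input structure).
    """
    counts = defaultdict(float)
    for genes in read_to_genes.values():
        if not genes:
            continue  # read has no assigned alignment
        weight = 1.0 / len(genes)  # each alignment carries 1/N of the read
        for gene in genes:
            counts[gene] += weight
    return dict(counts)


example = {
    "read1": ["GENE_A"],            # uniquely mapped: full count to GENE_A
    "read2": ["GENE_A", "GENE_B"],  # two alignments: 0.5 each
    "read3": ["GENE_B", "GENE_C"],  # two alignments: 0.5 each
}

print(fractional_counts(example))
```

In this example GENE_A receives 1.5 counts, GENE_B 1.0, and GENE_C 0.5, which sums to the three input reads. This is why gene count tables produced with fractional assignment can contain non-integer values.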

Data deliverables and file types

Plasmidsaurus has a big team of bioinformaticians who are constantly improving the background analyses that you see on your results page. Their goal is for these detailed interactive visualizations to let you jump right into your experiments and start getting insight, without needing to do the data processing yourself.

The following are all available on the Results UI page, or downloadable:

Interactive volcano plot | Visualize differential gene expression patterns of up- and downregulation.
Functional enrichment plot | Track the most affected regulatory pathways in your samples (human and mouse only at this time).
Per-sample expression profiles | Select genes or pathways you want to analyze and see them visualized.
Gene count tables | Review and download information about differentially expressed genes.
Enriched pathway tables | Review and download information about cellular pathway representation.
Deduplicated .bam file | UMI-deduplicated reads aligned to the reference.
Sample correlation matrix | Review pairwise similarity (Pearson correlation) between your samples.
Quality control summary | FastQC .html plots showing sequencing summary metrics.
Reads | Raw reads. Please note that these reads are NOT delivered in the default download, but can be downloaded separately by clicking the "Download Raw fastq" button at the top of the "Order Information" page.
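For readers who want to see how a sample correlation matrix like the one listed above is built, here is a minimal pure-Python sketch of pairwise Pearson correlation. It assumes the per-gene values passed in are already normalized; the TMM normalization used in the pipeline is not reproduced here, and the sample names and values are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(samples):
    """samples maps a sample name to its list of normalized per-gene values."""
    names = list(samples)
    return {a: {b: pearson(samples[a], samples[b]) for b in names} for a in names}


# Invented normalized values for three samples across four genes
example_samples = {
    "control_1": [1.0, 2.0, 3.0, 4.0],
    "control_2": [2.0, 4.0, 6.0, 8.0],
    "treated_1": [4.0, 3.0, 2.0, 1.0],
}

matrix = correlation_matrix(example_samples)
print(matrix["control_1"]["control_2"], matrix["control_1"]["treated_1"])
```

Values close to 1 indicate samples with very similar expression profiles, which is what you would generally expect between replicates of the same condition.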

Troubleshooting

Guarantees and rerun policy

We typically observe 10M or more unique (deduplicated) reads when samples meet our input guidelines. We do not guarantee a specific read count per sample, because yield varies heavily with the number of cells sent. Low input can occur inadvertently, when RNA levels are overestimated, or intentionally (e.g. to accommodate culture in small volumes).

If we observe a process deviation (error) in our workflow, we will do our best to rerun your sample and increase the read count. As discussed above, sending additional sample volume, when practical, is a good approach to ensure we can do repeat analyses if necessary.