
FAQ

Shotgun Metagenomics

Technical Details

Plasmidsaurus Shotgun Metagenomics is performed using the newest short-read sequencing technology from Illumina. We use transposase-mediated library preparation chemistry to profile the total genomic DNA present in your sample—capturing bacteria, archaea, fungi, and viruses in a single assay—and return a comprehensive taxonomic analysis of your microbial community.

We have tested our pipeline and analysis across a range of communities. As Figure 1 shows, replicate runs are highly reproducible and correlate well with the known abundances of a mock community:

Figure 1: Replicate shotgun metagenomics sequencing runs compared to theoretical abundance for Zymo Research Microbial Community DNA Standard.

 

Raw paired-end reads undergo quality control, which filters low-quality bases (Phred < 15), removes short reads (< 50 bp), auto-detects and trims adapters, and trims poly-G tails. Samples must pass a minimum read count threshold (1M reads after QC) to proceed.
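
The thresholds above can be illustrated with a minimal Python sketch. This is not the production QC tool: the three constants come from the text, but the Phred+33 encoding, the trailing-base trimming rule, and all function names are assumptions made for illustration only.

```python
MIN_PHRED = 15                  # bases below this score count as low quality (from the text)
MIN_READ_LENGTH = 50            # reads shorter than this are removed (from the text)
MIN_READS_AFTER_QC = 1_000_000  # per-sample threshold after QC (from the text)


def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_low_quality_tail(sequence, quality_string):
    """Hypothetical trimming rule: drop trailing bases with Phred < MIN_PHRED."""
    scores = phred_scores(quality_string)
    end = len(scores)
    while end > 0 and scores[end - 1] < MIN_PHRED:
        end -= 1
    return sequence[:end], quality_string[:end]


def read_passes_length(sequence):
    """A read is kept only if it is at least MIN_READ_LENGTH bases long."""
    return len(sequence) >= MIN_READ_LENGTH


def sample_passes_qc(reads_after_qc):
    """A sample proceeds only if enough reads survive QC."""
    return reads_after_qc >= MIN_READS_AFTER_QC
```

For example, a 60 bp read whose last five bases have very low quality would be trimmed to 55 bp under this rule and still pass the length filter.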


Next, host-derived reads are removed by aligning against reference genomes (human and mouse by default) using minimap2. Only unmapped (non-host) read pairs are carried forward for taxonomic analysis.
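
As an illustration of what "unmapped read pairs" means in alignment output, the sketch below checks the standard SAM flag bits for a paired read. It assumes a pair is kept only when both mates fail to align to the host, which is one reasonable reading of the text rather than a confirmed detail of the pipeline; the function name is hypothetical.

```python
# Bit values defined by the SAM format specification.
READ_UNMAPPED = 0x4  # this read did not align
MATE_UNMAPPED = 0x8  # its mate did not align


def is_non_host_pair(flag):
    """Return True when both the read and its mate are unmapped.

    Assumption: a pair is carried forward only if neither mate aligned
    to a host reference genome.
    """
    return bool(flag & READ_UNMAPPED) and bool(flag & MATE_UNMAPPED)
```

Under this assumption, flags 77 and 141 (both mates unmapped) are kept, while 99 (properly paired and mapped) and 73 (read mapped, mate unmapped) are treated as host-derived.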


Taxonomic profiling is performed using Sylph, an ultrafast abundance-corrected MinHash profiler. Reads are profiled against four databases covering bacteria/archaea (GTDB r226), fungi (FungiRefSeq), protists, and eukaryotic DNA viruses. 

Sylph is a sketch-based profiler: it compares compressed k-mer representations of your reads against reference sketches rather than aligning reads individually. This makes it fast and accurate, but it also means there is no per-read classification. Abundance estimates reflect the fraction of your data belonging to each organism, not a one-to-one read-to-taxon mapping. Host-removed FASTQs are provided so you can see the non-host reads that went into the analysis.
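
To make the idea of sketch comparison concrete, here is a toy Python illustration. It is not Sylph's implementation: the k-mer length, the subsampling rate, the hash function, and every function name are assumptions, and it ignores reverse complements, sequencing errors, and the coverage correction a real profiler applies. It shows only the general principle of comparing subsampled k-mer sets and converting containment into an ANI-like value.

```python
import hashlib

K = 31           # k-mer length (assumed for illustration)
SAMPLE_RATE = 4  # keep roughly 1 in 4 k-mers (assumed; real tools differ)


def kmers(sequence, k=K):
    """All distinct k-mers in a sequence (no reverse-complement handling)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def stable_hash(kmer):
    """Deterministic 64-bit hash of a k-mer."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(sequence, k=K, rate=SAMPLE_RATE):
    """Keep only the k-mer hashes that fall in a fixed 1/rate slice of hash space."""
    return {h for h in map(stable_hash, kmers(sequence, k)) if h % rate == 0}


def containment(reference_sketch, sample_sketch):
    """Fraction of the reference sketch that is also present in the sample sketch."""
    if not reference_sketch:
        return 0.0
    return len(reference_sketch & sample_sketch) / len(reference_sketch)


def ani_from_containment(c, k=K):
    """Simple point estimate: a k-mer survives only if all k bases match."""
    return c ** (1.0 / k)
```

Because the comparison happens between whole sketches, nothing in this procedure assigns an individual read to a taxon, which is why per-read classifications are not part of the output.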

In our in-house mock community benchmarks, Sylph consistently outperformed other profiling methods. For example, read-level classifiers are prone to false positives and rely on heuristic thresholds. Sylph uses a statistically principled ANI-based approach and accurately detects low-abundance organisms that read-level tools often miss or misclassify. The tradeoff is no per-read assignments, but the result is a cleaner, more reliable profile.

Quality-controlled, host-filtered read pairs are passed into HUMAnN, which runs a tiered search: nucleotide mapping against pangenomes of detected species, followed by a translated search of unmapped reads against UniRef90 via DIAMOND. Hits are aggregated into UniRef90 gene family abundances and regrouped into functional pathways.

Differential analysis compares two user-defined sample groups using curated Sylph taxonomic profiles and HUMAnN pathway abundances. Abundances are transformed with a centered log-ratio approach appropriate for compositional data, then tested feature-by-feature for taxa and pathways with FDR correction. We also test overall community differences using PERMANOVA on Aitchison distances and provide PCoA coordinates for visualization.
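
The transformations named above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's code: the pseudocount used to handle zeros and the choice of Benjamini-Hochberg as the FDR procedure are assumptions (the text says only "FDR correction"), and the function names are hypothetical.

```python
import math


def clr(abundances, pseudocount=1e-6):
    """Centered log-ratio: log of each value minus the mean log across features.

    The pseudocount (an assumption) avoids taking the log of zero.
    """
    logs = [math.log(x + pseudocount) for x in abundances]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(a, b, pseudocount=1e-6):
    """Euclidean distance between the CLR-transformed profiles of two samples."""
    clr_a, clr_b = clr(a, pseudocount), clr(b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(clr_a, clr_b)))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A useful property of the CLR transform is that it depends only on ratios between features, so multiplying every abundance in a sample by the same constant leaves the Aitchison distance to other samples unchanged (exactly so when the pseudocount is zero).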

Every order includes interactive taxonomic and functional reports on the results page. These tools let you explore your community composition through dynamic Sankey and bar plots, and explore functional pathway abundances, within minutes of receiving your results. You also receive raw FASTQ files, QC metrics, and detailed abundance tables for downstream use.

When your order is ready, you'll receive an email with a download link and access to your interactive results on our website. We provide an order-level summary with exportable bar and area charts showing taxonomic and functional abundances, plus individual sample reports with more detail. You can also download the raw data for custom analyses or visualizations.

The limit of detection depends on the species composition of your community and the level of host DNA present. In our testing, we have frequently detected community members with <0.1% abundance.

We recommend selecting the Big tier when profiling samples with significant host DNA, studying low-abundance species, or analyzing functional pathways in diverse samples.

We can sequence samples with any level of host DNA. However, because this untargeted approach sequences all DNA present, host DNA will occupy a portion of the sequencing depth, proportionally reducing the data available for microbial profiling. Our pipeline identifies and filters out human and mouse host reads. 
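
The effect of host DNA on usable depth is simple arithmetic, sketched below in Python. The read counts and fractions are hypothetical values chosen for illustration, not figures for any service tier, and the model assumes that a taxon's share of non-host reads equals its relative abundance, which is a simplification.

```python
def microbial_reads(total_reads, host_fraction):
    """Reads left for microbial profiling after host reads are removed."""
    if not 0.0 <= host_fraction <= 1.0:
        raise ValueError("host_fraction must be between 0 and 1")
    return round(total_reads * (1.0 - host_fraction))


def expected_reads_for_taxon(total_reads, host_fraction, relative_abundance):
    """Approximate reads from one taxon, given its share of the non-host reads."""
    return microbial_reads(total_reads, host_fraction) * relative_abundance


# Hypothetical sample: 10 million reads, 90% of them host-derived.
remaining = microbial_reads(10_000_000, 0.90)
rare_taxon = expected_reads_for_taxon(10_000_000, 0.90, 0.001)
```

In this hypothetical case only about one tenth of the reads remain for profiling, and a community member at 0.1% abundance is represented by roughly a thousand reads instead of ten thousand, which is why deeper sequencing helps for host-rich samples.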

Results for the shotgun metagenomics services are typically returned within 6-8 business days.

There are numerous approaches to extracting metagenomic DNA from the source material (swabs, soil, feces, etc.), so we do not provide specific recommendations. Any extraction method that yields high-quality, high-purity, high-molecular-weight, double-stranded gDNA that is free of nicks, gaps, breaks, and contaminants (enzymatic inhibitors, RNA, etc.) is suitable for this service.


See sample prep for more information.