Shotgun Metagenomics Technical Documentation

Technical details

Plasmidsaurus Metagenomics enables you to comprehensively profile diverse microbial communities.

 

Sequencing technology

We sequence each sample using the newest short-read sequencing technology from Illumina. We use transposase-mediated library preparation chemistry to profile the total genomic DNA present in your sample. For sequencing, we use a 2 x 150 bp paired-end read configuration.

 

Bioinformatics analysis

All samples are processed through our automated pipeline: 

  • Quality control | Raw paired-end reads are filtered using fastp with the following parameters:
    • Minimum Phred quality score: 15
    • Minimum read length: 50 bp
    • Automatic adapter detection for paired-end reads
    • Poly-G tail trimming (removes artifacts from NovaSeq two-color chemistry)
    • Samples must yield at least 1M reads after QC to proceed to taxonomic analysis
  • Host removal | Reads are aligned against human and mouse reference genomes using minimap2 to remove host-derived sequences. Only unmapped (non-host) read pairs are retained for downstream analysis.
  • Taxonomic profiling | Non-host reads are profiled with Sylph, which uses abundance-corrected MinHash, against curated reference databases covering bacteria and archaea (GTDB r226), fungi (RefSeq), protists, and eukaryotic DNA viruses. Taxonomy is assigned via sylph-tax.
  • Functional profiling | Non-host reads are profiled with HUMAnN 3.9 to quantify the functional potential of the community. Reads are first screened against species-specific ChocoPhlAn nucleotide pangenomes using Bowtie 2; reads that do not map are then aligned against the UniRef90 protein database using DIAMOND translated search. All per-sample functional tables are delivered as community totals (unstratified). The resulting UniRef90 gene family abundances are:
    • Regrouped to EggNOG v5 orthologous groups and annotated with human-readable names
    • Aggregated into MetaCyc metabolic pathways, reported both as relative abundance and as pathway coverage scores
  • Differential analysis | On your results page, select up to two groups of samples to generate differential abundance comparisons of taxonomic and functional profiles. We use a centered log-ratio (CLR) transformation for compositional abundances, feature-level statistical testing with multiple-testing correction, and PERMANOVA on Aitchison distances for community-level comparisons. Outputs include:
    • Interactive plots showing changes in top pathways and taxa
    • Tables of all differentially abundant pathways and taxa, with FDR-adjusted p-values and fold changes
    • PERMANOVA results for overall community differences, with PCoA coordinates for visualization
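
The centered log-ratio transformation and Aitchison distance named in the differential analysis step can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function names, the pseudocount of 0.5 used to handle zeros, and the example abundance values are all assumptions made for this sketch.

```python
import numpy as np

def clr(abundances, pseudocount=0.5):
    """Centered log-ratio transform of a samples x features matrix.

    The pseudocount is an assumed way of handling zeros; the pipeline's
    actual zero-replacement strategy is not documented here.
    """
    x = np.asarray(abundances, dtype=float) + pseudocount
    log_x = np.log(x)
    # Subtract each sample's mean log value (the log of its geometric mean).
    return log_x - log_x.mean(axis=1, keepdims=True)

def aitchison_distances(abundances, pseudocount=0.5):
    """Pairwise Aitchison distance: Euclidean distance between CLR vectors."""
    z = clr(abundances, pseudocount)
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical abundance table: 3 samples (rows) x 4 taxa (columns).
example = np.array([
    [120, 30, 0, 50],
    [100, 40, 5, 55],
    [10, 200, 60, 30],
])

clr_values = clr(example)
distances = aitchison_distances(example)
print(np.round(distances, 3))
```

Each row of the CLR output sums to zero, and the resulting distance matrix is the kind of input a PERMANOVA or PCoA implementation would take.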

       

 

Service levels

  Sample inputs
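Service Level | Sequencing Configuration | Raw Data Target | Sample Submission
Standard | Illumina, 2 x 150 paired-end reads | 2 Gb | 20 µL at 10-20 ng/µL normalized concentration
Big | Illumina, 2 x 150 paired-end reads | 10 Gb | 20 µL at 10-20 ng/µL normalized concentration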
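
To relate the raw data targets to read counts, the arithmetic can be sketched as below. It assumes that "Gb" means gigabases (10^9 bases) and that every read is a full 150 bp, so the result is a rough estimate rather than a guaranteed read count. The function name is hypothetical.

```python
READ_LENGTH_BP = 150   # from the 2 x 150 configuration
READS_PER_PAIR = 2     # paired-end: two reads per fragment

def expected_read_pairs(target_gb):
    """Approximate read pairs needed to reach a raw data target.

    Assumes 1 Gb = 1,000,000,000 bases and full-length reads.
    """
    total_bases = int(target_gb * 1_000_000_000)
    return total_bases // (READ_LENGTH_BP * READS_PER_PAIR)

for level, target_gb in {"Standard": 2, "Big": 10}.items():
    pairs = expected_read_pairs(target_gb)
    print(f"{level}: about {pairs:,} read pairs for {target_gb} Gb")
```

Under these assumptions, the Standard target corresponds to roughly 6.7 million read pairs and the Big target to roughly 33.3 million.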

 

Data deliverables and file types

Per-sample outputs

  • Raw reads (.fastq.gz) — unfiltered paired-end Illumina reads
  • QC report (.html) — read quality, filtering, and adapter trimming summary from fastp
  • Taxonomic profile (.tsv) — species-level abundance table across bacteria, archaea, fungi, protists, and DNA viruses
  • Alpha diversity (.json / .tsv) — species- and genus-rank community metrics: observed taxa, Shannon, inverse-Simpson, Pielou evenness, Berger–Parker dominance, Hill numbers (q=0,1,2), and unclassified fraction
  • Pathway abundance (.tsv) — MetaCyc pathway relative abundances, community totals 
  • Pathway coverage (.tsv) — MetaCyc pathway coverage scores in the range [0, 1], indicating confidence that each pathway is present in the community
  • EggNOG gene families (.tsv) — UniRef90 families regrouped to EggNOG v5 orthologous groups with human-readable names
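
For readers who want to reproduce the alpha diversity metrics listed above from a taxonomic profile, here is a minimal sketch using their standard textbook definitions. The natural-log base, the dictionary keys, and the function name are assumptions; the delivered files may use different conventions, and the unclassified fraction is not computed here.

```python
import math

def alpha_diversity(abundances):
    """Standard alpha diversity metrics from one sample's taxon abundances.

    Zero entries are ignored; values are normalized to proportions.
    """
    values = [a for a in abundances if a > 0]
    if not values:
        raise ValueError("at least one taxon must have non-zero abundance")
    total = sum(values)
    p = [a / total for a in values]

    richness = len(p)                                   # observed taxa
    shannon = -sum(pi * math.log(pi) for pi in p)       # natural log (assumed)
    inv_simpson = 1.0 / sum(pi * pi for pi in p)
    pielou = shannon / math.log(richness) if richness > 1 else 0.0
    return {
        "observed_taxa": richness,
        "shannon": shannon,
        "inverse_simpson": inv_simpson,
        "pielou_evenness": pielou,
        "berger_parker": max(p),         # proportion of the dominant taxon
        "hill_q0": float(richness),      # Hill number of order 0
        "hill_q1": math.exp(shannon),    # Hill number of order 1
        "hill_q2": inv_simpson,          # Hill number of order 2
    }

# Hypothetical example: a perfectly even community of four taxa.
even = alpha_diversity([25, 25, 25, 25])
print(even)
```

For a perfectly even community such as this one, all three Hill numbers are approximately equal to the number of observed taxa and Pielou evenness is approximately 1.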

Order-level outputs

  • Genus abundance (.tsv) — genera × samples matrix, values = relative abundance percent
  • Species abundance (.tsv) — same format, species rank
  • Pathway abundance (.tsv) — MetaCyc pathways × samples matrix, values = relative abundance
  • Pathway completeness (.tsv) — same format, values = coverage scores

     

 

Troubleshooting

Likely causes of microbiome sequencing failure

For metagenomic samples, "failure" means that your sample did not produce the minimum raw data target. Because the outcomes of the taxonomic analyses vary widely with the specific microbial community, they are not used to determine the sequencing status (complete or fail) for this service. Status is based strictly on raw data yield.

Although we do not provide definitive reasons for failure, common reasons are:

  • Samples contain sequencing inhibitors or contaminants
    • Depending on the biological source material and extraction method, extracted metagenomic DNA may still contain abundant or persistent impurities that require additional purification before library prep in order to obtain high-quality sequencing results. Please purify your DNA using a spin column (we highly recommend the Zymo OneStep PCR Inhibitor Removal Kit for metagenomic gDNA), AMPure XP beads, or an equivalent method, and submit the purified gDNA in elution buffer (10 mM Tris, pH 8.5) or nuclease-free water.
  • Samples are not prepared at the required DNA concentration of 10 ng/µL
    • The most common cause of this is using a NanoDrop to quantify DNA concentration. Absorbance-based readings can overestimate DNA concentration when RNA or other contaminants are present, so we strongly recommend using a Qubit or an equivalent fluorometric method.

To achieve optimal sequencing results, please follow our recommended sample prep instructions.

 

Guarantees and rerun policy

In cases where the DNA concentration and quality are adequate but we do not hit the service level data target, we will evaluate the results of the initial sequencing attempt to determine whether additional sequencing may produce a more successful outcome. If so, we will repeat the sequencing at no additional charge and combine the data from the two runs to increase the chance of reaching the data target.

If we are not able to achieve the data target after the free rerun, we will not perform further reruns on the sample. We still charge for failed samples.