Shotgun Metagenomics Technical Documentation

Technical details

Plasmidsaurus Metagenomics enables you to comprehensively profile diverse microbial communities.

Sequencing technology

We sequence each sample using the newest short-read sequencing technology from Illumina. We use transposase-mediated library preparation chemistry to profile the total genomic DNA present in your sample. For sequencing, we use a 2 x 150 paired-end read configuration.

Bioinformatics analysis

All samples are processed through our automated pipeline:

Quality control | Raw paired-end reads are filtered using fastp with the following parameters:
- Minimum Phred quality score: 15
- Minimum read length: 50 bp
- Automatic adapter detection for paired-end reads
- Poly-G tail trimming (removes artifacts from NovaSeq two-color chemistry)
- Samples must yield at least 1M reads after QC to proceed to taxonomic analysis
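
The QC settings listed above can be sketched as a fastp invocation plus a read-count gate. This is an illustrative sketch only, not the pipeline's actual command: the mapping of "minimum Phred quality score" to `--qualified_quality_phred`, the file names, and the helper function names are assumptions. The check also assumes the 1M threshold refers to the `total_reads` value in fastp's JSON summary, which counts individual reads rather than pairs.

```python
import json

MIN_READS_AFTER_QC = 1_000_000  # threshold stated in the QC step


def build_fastp_command(r1, r2, out1, out2, json_report, html_report):
    """Assemble a fastp argument list mirroring the listed QC settings.

    Hypothetical helper: the flag choices are our reading of the settings,
    not a confirmed copy of the production command.
    """
    return [
        "fastp",
        "--in1", r1, "--in2", r2,
        "--out1", out1, "--out2", out2,
        "--qualified_quality_phred", "15",  # minimum Phred quality score
        "--length_required", "50",          # minimum read length in bp
        "--detect_adapter_for_pe",          # adapter auto-detection for paired-end data
        "--trim_poly_g",                    # poly-G tail trimming
        "--json", json_report,
        "--html", html_report,
    ]


def passes_read_threshold(fastp_json_text, minimum=MIN_READS_AFTER_QC):
    """Return True if the post-filter read count in a fastp JSON report meets the minimum."""
    report = json.loads(fastp_json_text)
    return report["summary"]["after_filtering"]["total_reads"] >= minimum
```

The argument list can be passed to `subprocess.run(cmd, check=True)` on a machine with fastp installed; the JSON report it writes is what `passes_read_threshold` expects as text.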
Host removal | Reads are aligned against human and mouse reference genomes using minimap2 to remove host-derived sequences. Only unmapped (non-host) read pairs are retained for downstream analysis.
Taxonomic profiling | Non-host reads are profiled using Sylph with abundance-corrected MinHash, against curated reference databases covering bacteria, archaea (GTDB r226), fungi (RefSeq), protists, and eukaryotic DNA viruses. Taxonomy is assigned via sylph-tax.
Functional profiling | Non-host reads are profiled with HUMAnN 3.9 to quantify the functional potential of the community. Reads are first screened against species-specific ChocoPhlAn nucleotide pangenomes using bowtie2; reads that do not map are then aligned against the UniRef90 protein database using diamond translated search. The resulting UniRef90 gene family abundances are:
- Regrouped to EggNOG v5 orthologous groups and annotated with human-readable names
- Aggregated into MetaCyc metabolic pathways, reported both as relative abundance and as pathway coverage scores.
- All per-sample functional tables are delivered as community totals (unstratified).
Differential analysis | On your results page, select up to two groups of samples to generate differential abundance comparisons of taxonomic and functional profiles. We use centered log-ratio transformation for compositional abundances, feature-level statistical testing with multiple-testing correction, and PERMANOVA on Aitchison distances for community-level comparisons. Outputs include:
- Interactive plots showing changes in top pathways and taxa
- Tables with all differentially abundant pathways and taxa with FDR-adjusted p-values and fold-changes.
- PERMANOVA results for overall community differences with PCoA coordinates for visualization.
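
To make the centered log-ratio (CLR) transformation and the Aitchison distance mentioned above concrete, here is a minimal pure-Python sketch. The pseudocount used to handle zero abundances is an assumption for illustration; the zero-handling strategy actually used in the pipeline is not specified here, and the function names are hypothetical.

```python
import math


def clr(abundances, pseudocount=1e-6):
    """Centered log-ratio: log of each value minus the mean of the logs.

    The pseudocount (an assumption of this sketch) avoids log(0).
    """
    logs = [math.log(a + pseudocount) for a in abundances]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(sample_a, sample_b, pseudocount=1e-6):
    """Euclidean distance between the CLR-transformed profiles of two samples."""
    clr_a = clr(sample_a, pseudocount)
    clr_b = clr(sample_b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(clr_a, clr_b)))
```

Because CLR values are relative to the geometric mean of each sample, two profiles that differ only by a constant scaling factor have an Aitchison distance of zero when no pseudocount is applied. This is why the distance suits compositional data such as relative abundances.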

Service levels

		Sample inputs
Service Level	Sequencing Configuration	Raw Data Target	Sample Submission
Standard	Illumina 2 x 150 paired-end reads	2 Gb	20 µL at 10-20 ng/µL normalized concentration
Big	Illumina 2 x 150 paired-end reads	10 Gb	20 µL at 10-20 ng/µL normalized concentration
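
As a rough guide to what these raw data targets mean in reads, the arithmetic can be sketched as follows. It assumes that "Gb" means 10^9 bases and that every read is a full 150 bp, so real counts will differ somewhat; the function name is hypothetical.

```python
READ_LENGTH_BP = 150  # from the 2 x 150 paired-end configuration


def approx_read_pairs(target_gb, read_length=READ_LENGTH_BP):
    """Estimate read pairs needed to reach a raw data target.

    Assumes 1 Gb = 1e9 bases and full-length reads (both are simplifications).
    """
    total_bases = target_gb * 1e9
    return total_bases / (2 * read_length)  # two reads per pair


standard_pairs = approx_read_pairs(2)   # Standard service level
big_pairs = approx_read_pairs(10)       # Big service level
```

Under these assumptions, the Standard target corresponds to roughly 6.7 million read pairs and the Big target to roughly 33.3 million.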

Data deliverables and file types

Per-sample outputs

Raw reads (.fastq.gz) — unfiltered paired-end Illumina reads
QC report (.html) — read quality, filtering, and adapter trimming summary from fastp
Taxonomic profile (.tsv) — species-level abundance table across bacteria, archaea, fungi, protists, and DNA viruses
Alpha diversity (.json / .tsv) — species- and genus-rank community metrics: observed taxa, Shannon, inverse-Simpson, Pielou evenness, Berger–Parker dominance, Hill numbers (q=0,1,2), and unclassified fraction
Pathway abundance (.tsv) — MetaCyc pathway relative abundances, community totals
Pathway coverage (.tsv) — MetaCyc pathway coverage scores [0, 1], confidence each pathway is present in community
EggNOG gene families (.tsv) — UniRef90 families regrouped to EggNOG v5 orthologous groups with human-readable names
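
For readers who want to see how the alpha diversity metrics listed above relate to one another, here is a minimal sketch using their standard textbook definitions. It assumes natural logarithms and that only classified taxa with non-zero abundance are counted; the delivered files may use different conventions, and the function name is hypothetical.

```python
import math


def alpha_diversity(abundances):
    """Compute common alpha diversity metrics from one sample's taxon abundances."""
    positive = [a for a in abundances if a > 0]
    if not positive:
        raise ValueError("at least one taxon with non-zero abundance is required")
    total = sum(positive)
    p = [a / total for a in positive]  # proportions summing to 1

    richness = len(p)
    shannon = -sum(pi * math.log(pi) for pi in p)
    inverse_simpson = 1.0 / sum(pi * pi for pi in p)
    return {
        "observed_taxa": richness,
        "shannon": shannon,
        "inverse_simpson": inverse_simpson,
        # Pielou evenness is undefined for a single taxon; reported as 0.0 here.
        "pielou_evenness": shannon / math.log(richness) if richness > 1 else 0.0,
        "berger_parker": max(p),            # proportion of the most abundant taxon
        "hill_q0": float(richness),         # Hill number q=0 is richness
        "hill_q1": math.exp(shannon),       # Hill number q=1 is exp(Shannon)
        "hill_q2": inverse_simpson,         # Hill number q=2 is inverse Simpson
    }
```

The Hill numbers are included to show that q=0, 1, and 2 are re-expressions of richness, Shannon, and inverse Simpson on a common "effective number of taxa" scale.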

Order-level outputs

Genus abundance (.tsv) — genera × samples matrix, values = relative abundance percent
Species abundance (.tsv) — same format, species rank
Pathway abundance (.tsv) — MetaCyc pathways × samples matrix, values = relative abundance
Pathway completeness (.tsv) — same format, values = coverage scores
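
The order-level matrices can be loaded with a few lines of standard-library Python. This sketch assumes a layout with one header row of sample names and the feature name (genus, species, or pathway) in the first column; check this against your actual files, since the exact column names are not specified here. The function name and the example values are hypothetical.

```python
import csv
import io


def read_abundance_matrix(handle):
    """Parse a tab-separated features x samples matrix.

    Returns the sample names and a dict of {feature: {sample: value}}.
    Assumes the first column holds feature names (layout is an assumption).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        matrix[row[0]] = dict(zip(samples, (float(v) for v in row[1:])))
    return samples, matrix


# Usage with made-up values; for a real file, pass open(path, newline="").
example = io.StringIO("genus\tS1\tS2\nBacteroides\t60.0\t10.5\nPrevotella\t40.0\t89.5\n")
samples, matrix = read_abundance_matrix(example)
```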

Troubleshooting

Likely causes of microbiome sequencing failure

For metagenomic samples, "failure" means that your sample did not produce the minimum raw data target. The outcomes of the taxonomic analyses will be highly variable depending on the specific microbial community, so these analyses are not used to determine the sequencing status (complete or fail) for this service. Status is strictly based on raw data yield.

Although we do not provide definitive reasons for failure, common reasons are:

Samples contain sequencing inhibitors or contaminants
- Depending on the biological source material and extraction method, extracted metagenomic DNA may still contain abundant or persistent impurities that must be removed before library prep to obtain high-quality sequencing results. Please purify your DNA using a spin column (we highly recommend the Zymo OneStep PCR Inhibitor Removal Kit for metagenomic gDNA), AMPure XP beads, or an equivalent method, and submit the purified gDNA in elution buffer (10 mM Tris, pH 8.5) or nuclease-free water.
Samples are not prepared at the required DNA concentration of 10 ng/µL
- The most common cause is quantifying DNA with a NanoDrop, since absorbance readings often overestimate DNA concentration. We strongly recommend using a Qubit or an equivalent fluorometric method.
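
Once you have a fluorometric concentration, normalizing a sample is a C1V1 = C2V2 calculation. The sketch below is only an illustration: the default target of 15 ng/µL is an assumed midpoint of the 10-20 ng/µL submission range, the 20 µL final volume comes from the service level table, and the function name is hypothetical.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul=15.0, final_volume_ul=20.0):
    """Return (DNA volume, diluent volume) in µL to reach the target concentration.

    Uses C1 * V1 = C2 * V2. The default target is an assumed midpoint,
    not a stated requirement.
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is more dilute than the target; concentrate the sample instead")
    dna_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return dna_ul, final_volume_ul - dna_ul


# A 60 ng/µL stock: mix the returned DNA volume with the returned diluent volume.
dna_ul, diluent_ul = dilution_volumes(60.0)
```

For a 60 ng/µL stock, this gives 5 µL of DNA plus 15 µL of elution buffer or nuclease-free water.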

To achieve optimal sequencing results, please follow our recommended sample prep instructions.

Guarantees and rerun policy

If the DNA concentration and quality are adequate but we do not hit the service-level data target, we will evaluate the results of the initial sequencing attempt to determine whether additional sequencing is likely to produce a more successful outcome. If so, we will repeat the sequencing at no additional charge and combine the data from both runs to increase the chance of reaching the target.

If we are not able to achieve the data target after the free rerun, we will not perform further reruns on the sample. We still charge for failed samples.