Technical Documentation

Whole Plasmid Technical Documentation


Technical details

Plasmidsaurus was the first company to bring a nanopore-based plasmid sequencing service to market. The service makes it easier for scientists to verify their full plasmid sequence using continuous long reads, which enables detection of critical errors. This document provides more technical detail about the service's technology and deliverables.

Sequencing technology

We sequence each sample with Oxford Nanopore long reads to very high depth before generating a consensus/assembly using the latest basecalling and polishing software:

We construct an amplification-free long-read sequencing library using the newest v14 library prep chemistry, including linearization of the circular input DNA in a sequence-independent manner.
We sequence the library with a primer-free protocol using the most accurate R10.4.1 flow cells (raw data is delivered in .fastq format).

We use the latest flow cells and chemistry kits from Oxford Nanopore, along with the latest Super Accurate basecalling model. The vast majority of our plasmid assemblies contain no errors compared to the reference, which you can read more about here. Consensus accuracy is often above Q60, which corresponds to 99.9999% accuracy, or one error per 1,000,000 bases.
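The relationship between a Q score and accuracy follows the standard Phred definition, Q = -10 * log10(P), where P is the per-base error probability. The short Python sketch below illustrates that arithmetic; the function names are our own and are not part of any Plasmidsaurus tooling.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Return the per-base error probability for a Phred quality score."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Return the Phred quality score for a per-base error probability."""
    return -10 * math.log10(p)


# Print accuracy and expected error spacing for a few common Q scores.
for q in (20, 30, 40, 60):
    p = phred_to_error_probability(q)
    accuracy_percent = 100 * (1 - p)
    bases_per_error = round(1 / p)
    print(f"Q{q}: accuracy {accuracy_percent:.4f}%, "
          f"about one error per {bases_per_error:,} bases")
```

Because the scale is logarithmic, every 10-point increase in Q corresponds to a tenfold reduction in the expected error rate.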

Bioinformatics analysis

We use the latest Super Accurate basecalling model. To learn more about common basecalling errors and how we address them, please see this post.

We generate a high-accuracy circular consensus sequence from the raw reads. For standard-size plasmids, we also return a set of feature annotations.

Data deliverables and file types

Consensus sequence (.fasta file): Provides the polished consensus sequence of the plasmid, generated from the raw reads.
Consensus sequence (.gbk file): Provides the polished consensus sequence of the plasmid, generated from the raw reads. Also includes a plasmid map and feature annotations from the excellent pLannotate tool from the Barrick Lab: McGuffie, M.J. and Barrick, J.E. (2021) pLannotate: engineered plasmid annotation. Nucleic Acids Research. DOI: 10.1093/nar/gkab374
Plasmid map (.html file): An interactive version of the pLannotate plasmid map.
Read length histogram (.png file): Displays the read length distribution of the raw reads produced by your sample, thereby providing unique insight into the contents of your samples. See more details about how to interpret your histograms here.
Virtual gel (.png file): Displays the raw read lengths from all samples in the order in a virtual gel format, resembling what you’d see if you ran the DNA fragments on a gel.
Chromatogram (.ab1 file): Displays the relative abundance of each nucleotide (A, T, G, C) for all raw reads that align to the consensus at each position of the sequence. See more details about how to interpret your chromatograms below.
Coverage plot (.png file): Displays the relative sequencing coverage at each position of the consensus sequence. A region with a large gap or a sudden large increase suggests either an assembly issue or a mixture of multiple plasmid species.
Per-base data (.txt and .tsv files): Includes 3 sub-files for each sample:
- SAMPLE.tsv: Indicates how well the raw reads agree with the consensus sequence at each position. The list includes the consensus basecall at each position, along with the total number of raw reads aligning at that position and the basecall distribution in the raw reads for that position (A, T, G, C, matches, mismatches, insertions, deletions, etc.).
- SAMPLE_multimer_analysis.txt: Indicates the % distribution of the various concatemer forms of the consensus sequence (monomer, dimer, trimer, etc.).
- SAMPLE_summary.tsv: Indicates the length, average coverage, relative composition (by moles and mass), total reads, total bases, and % E. coli genomic DNA contamination for the consensus sequence.
Raw read sequences (.fastq.gz file): Provides the sequences of individual raw reads that align to the consensus. Please note that these reads are NOT delivered in the default download, but can be downloaded separately by clicking the "Download Raw FASTQ" button at the top of the "Order Information" page. Note that any raw reads that do not align to the consensus (e.g. host genomic DNA, lower abundance molecular species) are excluded.
FASTQ is a sequence format similar to FASTA, with additional Phred quality score information that can be displayed graphically by most modern sequence viewers. Since each basecalled position has only one quality score, certain sequence features, such as insertions or deletions, must be inferred by looking at adjacent bases.
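To make the FASTQ layout concrete, here is a minimal Python sketch that reads four-line FASTQ records and decodes the quality string into one Phred score per base. It assumes the common Phred+33 ASCII encoding and unwrapped (single-line) sequences; the function names, the read name, and the example record are illustrative only and do not come from a real order.

```python
import gzip
import io
from typing import Iterator, List, TextIO, Tuple


def open_fastq(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read_id, sequence, phred_scores) for each 4-line FASTQ record."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file (or a blank line) ends the iteration
        sequence = handle.readline().rstrip("\r\n")
        handle.readline()  # the '+' separator line carries no data we need
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"Malformed FASTQ record: {header!r}")
        # Assumption: Phred+33 encoding, so '!' (ASCII 33) means Q0.
        scores = [ord(ch) - 33 for ch in quality]
        read_id = (header[1:].split() or [""])[0]
        yield read_id, sequence, scores


# Hypothetical single-record example, parsed from an in-memory string.
example = "@read1 illustrative\nACGT\n+\n!+5I\n"
for read_id, seq, scores in read_fastq(io.StringIO(example)):
    print(read_id, len(seq), scores)
```

For a downloaded file, the same generator can be used as `read_fastq(open_fastq(path))`, where `path` points to your own .fastq.gz file. Note that there is exactly one score per base, which is why an insertion or deletion has no score of its own.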

Causes of sample failure

For plasmids, "failure" means that your sample did not produce data of sufficient quality and quantity for the pipeline to generate a consensus sequence.

Our low sequencing prices and fast turnaround times do not include extensive QC to determine why your plasmid samples failed (or had low coverage). Although we do not provide definitive reasons for failure, by far the most common reasons are:

Samples are not prepared at the required DNA concentration.
The most common cause of this is using a Nanodrop to quantify DNA concentration. We strongly recommend using a Qubit or equivalent.
You may see evidence of this failure mode in the low amount of total data reported in the raw read length histogram and in the low consensus coverage reported in the SAMPLE_summary.tsv file.
Samples contain a mixture of plasmid species and/or fragmented genomic DNA or fragmented plasmids. You may see evidence of this failure mode in a wide range of read lengths reported in the raw read length histogram.
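If you want to look at these signals yourself, the sketch below summarizes a list of raw read lengths: the total amount of data, the median read length, and the fraction of reads close to the size you expect for your plasmid. This is only an illustration of the two failure modes described above. The function name, the 10% tolerance, and the example numbers are our own assumptions, not thresholds used by the Plasmidsaurus pipeline.

```python
from statistics import median
from typing import Dict, Sequence


def summarize_read_lengths(
    lengths: Sequence[int],
    expected_length: int,
    tolerance: float = 0.1,  # illustrative window: +/-10% of expected size
) -> Dict[str, float]:
    """Summarize read lengths relative to an expected plasmid size."""
    if not lengths:
        return {
            "reads": 0,
            "total_bases": 0,
            "median_length": 0,
            "fraction_near_expected": 0.0,
        }
    low = expected_length * (1 - tolerance)
    high = expected_length * (1 + tolerance)
    near = sum(1 for n in lengths if low <= n <= high)
    return {
        "reads": len(lengths),
        "total_bases": sum(lengths),
        "median_length": median(lengths),
        "fraction_near_expected": near / len(lengths),
    }


# Hypothetical read lengths for a plasmid expected to be about 5,000 bp.
summary = summarize_read_lengths(
    [5000, 5020, 4980, 1200, 800, 10010], expected_length=5000
)
print(summary)
```

A small `total_bases` value points toward the low-concentration failure mode, while a low `fraction_near_expected` with widely scattered lengths points toward a mixture or fragmented DNA.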

To achieve optimal sequencing results, please follow our recommended plasmid sample prep instructions.

Guarantees

We do not guarantee any specific level of coverage, as the number of raw reads generated can vary substantially depending on sample quality.

Successful samples sent at the required concentration typically yield high dozens to hundreds (or thousands!) of raw sequencing reads.

Average coverage is reported in the SAMPLE_summary.tsv file. Coverage over ~20x indicates a very accurate consensus.
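As a rough cross-check, average coverage can be approximated as the total number of sequenced bases divided by the consensus length. The Python sketch below shows that arithmetic under the simplifying assumption that every base of every read aligns to the consensus. The value reported in SAMPLE_summary.tsv is the authoritative one, and the function names and example numbers here are hypothetical.

```python
from typing import Sequence


def estimate_average_coverage(
    read_lengths: Sequence[int], consensus_length: int
) -> float:
    """Approximate average coverage as total read bases / consensus length."""
    if consensus_length <= 0:
        raise ValueError("consensus_length must be positive")
    return sum(read_lengths) / consensus_length


def meets_coverage_guideline(coverage: float, guideline: float = 20.0) -> bool:
    """Compare coverage against the ~20x guideline mentioned above."""
    return coverage >= guideline


# Hypothetical sample: 150 full-length reads of a 5,000 bp plasmid.
coverage = estimate_average_coverage([5000] * 150, consensus_length=5000)
print(f"Estimated coverage: {coverage:.0f}x, "
      f"meets guideline: {meets_coverage_guideline(coverage)}")
```

Because only reads that align to the consensus contribute to real coverage, this estimate is an upper bound when the sample contains host genomic DNA or other off-target molecules.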

Rerun policy

You are welcome to submit a rerun request for any failed sample through your Order Info page or via the support@plasmidsaurus.com email address. We will evaluate whether your sample quality and quantity permit rerunning the sample (and we may also ask you to provide a reference sequence).

Sample quality checks may require any of the following: