Keywords: phage sequencing, low DNA input, Illumina, Nanopore, PCR-free
Preparing phage DNA in sufficient quantities for sequencing is often a challenging task, especially when a sensitive bacterial host is not available for phage propagation.[1] This limitation poses a significant obstacle in phage research as the availability of adequate phage DNA is often considered crucial for various analyses, including genome sequencing, functional studies, and therapeutic developments. Also, because DNA extraction from phage samples (e.g., from bacterial induction) can yield low amounts of genomic DNA, many studies utilize tagmentation for amplification-free quantitative sequencing. However, this technique has the drawback of losing phage genome ends (termini) and creating biases in genome coverage.[2],[3]
Polymerase chain reaction (PCR)-free sequencing is often recommended, or even necessary, to obtain an unbiased characterization of phage genomes or communities. However, sequencing very low quantities of DNA without PCR amplification is challenging, and sequencing service providers, as well as library kit manufacturers, only guarantee products and results for relatively high DNA inputs. In this study, we aimed to assess the feasibility of sequencing phage genomic DNA from very low amounts of starting material and to determine the impact of decreasing DNA input on sequencing quality using both Illumina short-read and Nanopore long-read technologies. We analyzed the quantity and quality of output sequences (and their impact on genome assemblies) across a range of input DNA quantities, starting at the recommended DNA input for each technology. We conclude that high-quality sequencing is achievable with DNA inputs up to 1000-fold lower than manufacturers’ recommendations or requirements. We successfully sequenced phage genomic DNA (without PCR amplification) using as little as 1 ng of total input DNA (or 0.02 ng/uL in 50 uL eluted volume) for short-read sequencing with Illumina technology and 0.4 ng (or 0.036 ng/uL in 11 uL eluted volume) for long-read sequencing with Nanopore technology.
Address correspondence to: Julian R. Garneau, Department of Fundamental Microbiology, University of Lausanne, CH-1015 Lausanne, Switzerland (E-mail: [email protected]; Phone: +41782552723)
Conflict of Interest: The authors declare no competing financial interests.
Human or Animal Subjects: Not applicable.
Bacteriophage genomic DNA was purified from deoxyribonuclease-pretreated pure phage lysates by a standard phenol-chloroform extraction procedure as previously described[4]. DNA quantity was measured using the Qubit fluorometer, and DNA quality was assessed with the Fragment Analyzer (Agilent Technologies, Santa Clara, CA, USA). For DNA fragmentation, 1 uL of each sample (1000 ng/uL, 100 ng/uL, 10 ng/uL, or 1 ng/uL) was added to 54 uL of Resuspension Buffer (RSB) in a Covaris glass tube. DNA fragmentation was then performed using the Covaris system (Covaris, Woburn, MA, USA), targeting a fragment size of 500 bp. The fragmented DNA was then used to prepare four libraries with the Illumina TruSeq DNA PCR-Free kit (Cat # 20015963). Sequencing was conducted on the Illumina MiniSeq platform with a 2x150 bp paired-end configuration.
Quality control and trimming of the raw sequencing reads were performed using fastp v0.23.4 with default parameters, and multi-sample consolidated quality reports were generated with MultiQC version 1.21. Genome assembly was carried out using SPAdes version 3.13.1[5] with the command: $ spades.py -1 forward_reads_R1_001.fastq.gz -2 reversed_reads_R2_001.fastq.gz --isolate --threads 10. Phage termini identification and DNA packaging mechanism analyses were conducted with PhageTermVirome[4] version 4.3 using the command: $ PhageTerm.py -f phage_forward_reads.fastq -p phage_reverse_reads.fastq -r phage_assembled_genome.fasta --report_title phagetermvirome_results_report -c 10.
If the final DNA libraries for certain samples are unquantifiable, we recommend combining them in a pool with at least one quantifiable library to achieve the target pool concentration typically required for Illumina flow cells (e.g., 1 nM for the intermediate concentrated pool, which is prepared before the subsequent dilutions to the final flow cell loading concentration, typically around 10 pM). The volume of the intermediate concentrated pool should be predetermined, taking into account the volumes of the unquantifiable libraries to be added (these are usually added to the pool in their entirety). After mixing the quantifiable and unquantifiable libraries, the pool volume can be adjusted with RSB to reach the predetermined final volume and target library concentration for loading onto the flow cell. Before proceeding with deep sequencing of the pooled libraries, it is recommended (but not mandatory) to perform a low-depth, low-cost sequencing test run (e.g., using the iSeq 100 system). This test run provides an estimate of the number of reads (and the coverage) that the unquantifiable libraries will yield in a subsequent deeper sequencing run, and the user can then adjust the parameters of the high-depth run accordingly. For example, if the test run yields 10 000 reads for the unquantifiable libraries but the target is 100 000 reads, the user can calculate the adjustments needed (here, a 10-fold increase in the reads allocated to these libraries) to achieve the desired read count in the high-depth sequencing run.
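The pooling and read-scaling arithmetic described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' protocol: it assumes that the unquantifiable libraries contribute negligibly to the pool molarity, and that each library receives the same share of reads in the deep run as in the test run. The function names and the example volumes and concentrations are hypothetical.

```python
def quantifiable_volume_ul(stock_nm, target_nm, pool_volume_ul):
    """Volume of the quantifiable library needed so that the pool reaches
    target_nm in pool_volume_ul (C1 * V1 = C2 * V2, solved for V1).
    Assumption: unquantifiable libraries add negligible molarity."""
    if stock_nm < target_nm:
        raise ValueError("stock library is more dilute than the target pool")
    return target_nm * pool_volume_ul / stock_nm


def rsb_volume_ul(pool_volume_ul, quantifiable_ul, unquantifiable_uls):
    """Volume of RSB needed to bring the pool to its predetermined volume
    after adding every library (unquantifiable ones added in full)."""
    rsb = pool_volume_ul - quantifiable_ul - sum(unquantifiable_uls)
    if rsb < 0:
        raise ValueError("predetermined pool volume is too small")
    return rsb


def total_reads_needed(test_run_total, test_library_reads, target_library_reads):
    """Total reads the deep run must produce so that a library reaches its
    target, assuming its share of the run is unchanged (ceiling division)."""
    return -(-(test_run_total * target_library_reads) // test_library_reads)


# Hypothetical example: a 4 nM quantifiable library, a 1 nM target in a
# 40 uL pool, and two unquantifiable libraries of 5 uL each.
v_lib = quantifiable_volume_ul(stock_nm=4.0, target_nm=1.0, pool_volume_ul=40.0)
v_rsb = rsb_volume_ul(40.0, v_lib, [5.0, 5.0])
print(v_lib, v_rsb)

# Read counts from the example in the text (10 000 observed, 100 000 wanted),
# with a hypothetical test run of 1 000 000 reads in total.
print(total_reads_needed(1_000_000, 10_000, 100_000))
```

In practice, the same 10-fold adjustment could also be obtained by changing the proportion of the pool occupied by the low-input libraries rather than by increasing the total depth of the run.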
DNA quantity was measured using the Qubit fluorometer, and purity was measured using a NanoDrop spectrophotometer. DNA quality was assessed with the Fragment Analyzer. Libraries were prepared for the four different starting DNA inputs tested (400 ng, 40 ng, 4 ng, 0.4 ng) using the Native Barcoding Kit 24 V14 (Cat # SQK-NBD114.24, protocol version NBA_9168_v114_revM_15Sep2022, Oxford Nanopore, Oxford, United Kingdom). All DNA cleaning steps were carried out with a bead ratio of 1:1. Sequencing was performed on an Oxford Nanopore GridION using an R10.4.1 flow cell for 72 hours. High-accuracy basecalling was performed with MinKNOW version 24.02.16 software. For non-quantifiable libraries, the entire volume was used in the library pooling step.
Quality control of the raw sequencing data was performed using pycoQC version 2.5.2. Genome assembly was carried out using SPAdes version 3.15.5 on reads subsampled at length ≤ 3 kb, with the command: $ spades.py --only-assembler --nanopore phage_reads.fastq.gz -s phage_reads-copy.fastq.gz. Phage termini identification and DNA packaging mechanism analyses were conducted with PhageTermVirome[4] version 4.4 using the command: $ PhageTerm.py -f phage_reads.fastq.gz -r phage_assembled_genome.fasta --report_title phagetermvirome_results_report -c 10.
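The text does not state which tool was used to subsample the long reads. One possible way to do it is sketched below, assuming that the step means keeping only reads of at most 3 000 bases and that the FASTQ file uses exactly four lines per record (no wrapped sequences). The function name and the file names in the usage comment are hypothetical.

```python
import gzip
from itertools import islice


def filter_fastq_by_max_length(in_path, out_path, max_len=3000):
    """Copy records whose sequence is at most max_len bases long from a
    gzipped FASTQ file to a new gzipped FASTQ file; return how many were kept.
    Assumes four lines per record (header, sequence, '+', qualities)."""
    kept = 0
    with gzip.open(in_path, "rt") as fin, gzip.open(out_path, "wt") as fout:
        while True:
            record = list(islice(fin, 4))  # read one 4-line record
            if len(record) < 4:            # end of file or truncated record
                break
            if len(record[1].rstrip()) <= max_len:
                fout.writelines(record)
                kept += 1
    return kept


# Usage (hypothetical file names):
# n = filter_fastq_by_max_length("all_reads.fastq.gz", "reads_max3kb.fastq.gz")
```

Dedicated read-filtering tools offer the same operation with more options; the sketch above only makes the length criterion explicit.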
Raw sequences were deposited in the Sequence Read Archive (SRA) under the Bioproject accession PRJNA1187591.
In this study, we performed whole-genome sequencing on genomic DNA from purified phage particles (phage LF82_P10 for Illumina and HK97 for Nanopore), with the aim of assessing whether high-quality, fully assembled genomes can be obtained using very low genomic DNA input as starting material. DNA integrity and quality were verified, and serial dilutions were performed starting from the manufacturer’s recommended DNA input quantity (1000 ng for Illumina short-read PCR-free sequencing and 400 ng for Nanopore long-read sequencing) until a low DNA quantity was reached (1 ng for short-read and 0.4 ng for long-read sequencing). A further objective was to assess whether the quality of the sequences and assembled genomes obtained with very low DNA input is similar to that obtained with the recommended DNA input. The step-by-step workflow for this study is presented in Figure 1.
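As a quick check of the input quantities quoted in this study, the 10-fold dilution series and the corresponding eluate concentrations can be recomputed as below. This is a small sketch for illustration; the function names are hypothetical, and the eluted volumes (50 uL for Illumina, 11 uL for Nanopore) are the ones stated in the text.

```python
def tenfold_series(start_ng, steps=4):
    """Return `steps` DNA input masses (ng), each 10-fold lower than the last."""
    return [start_ng / 10**i for i in range(steps)]


def concentration_ng_per_ul(mass_ng, volume_ul):
    """Concentration of a given DNA mass dissolved in a given eluted volume."""
    return mass_ng / volume_ul


illumina_inputs = tenfold_series(1000)  # 1000, 100, 10, 1 ng
nanopore_inputs = tenfold_series(400)   # 400, 40, 4, 0.4 ng

# Lowest inputs expressed as concentrations in the stated eluted volumes.
print(round(concentration_ng_per_ul(illumina_inputs[-1], 50), 3))
print(round(concentration_ng_per_ul(nanopore_inputs[-1], 11), 3))
```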
Figure 1. Summary of the workflow for the comparison of sequencing with recommended and low DNA input. Green boxes represent steps performed up to sequencing. Blue boxes represent post-sequencing operations and analyses.
The number of raw reads obtained after sequencing the samples at different DNA inputs with the two technologies is reported in Table 1, together with the assembly results for each sample. Full genomes of identical length were obtained for every DNA input. The phages’ termini type and packaging mechanism were also properly identified with PhageTermVirome, which is another indication of successful genome sequencing and assembly. Importantly, read quality was highly stable and virtually unaffected by decreasing DNA input on both the Illumina and Nanopore platforms.
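To relate read counts to sequencing depth, a back-of-the-envelope estimate of mean coverage can be computed as (number of reads × read length) / genome length. The sketch below is not a result reported in this study: it assumes that "Total reads" counts individual reads, that every Illumina read is a full 150 bases, and that all reads map to the phage genome, so it gives an upper bound. The function names are hypothetical.

```python
import math


def mean_coverage(n_reads, read_length_bp, genome_length_bp):
    """Approximate mean depth of coverage: total sequenced bases / genome size."""
    return n_reads * read_length_bp / genome_length_bp


def reads_for_coverage(target_coverage, read_length_bp, genome_length_bp):
    """Smallest number of reads giving at least target_coverage (same assumptions)."""
    return math.ceil(target_coverage * genome_length_bp / read_length_bp)


# Lowest Illumina input in Table 1: 86 754 reads, 87 317 bp genome, 150 bp reads.
print(round(mean_coverage(86_754, 150, 87_317)))

# Reads needed for a hypothetical 100-fold target on the same genome.
print(reads_for_coverage(100, 150, 87_317))
```

Because Nanopore read lengths vary widely, the same estimate for HK97 would require the total number of sequenced bases rather than the read count alone.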
Table 1. Post-sequencing assembly information for phages LF82_P10 and HK97 using decreasing DNA inputs.
| | LF82_P10 (Illumina) | | | | HK97 (Nanopore) | | | |
|---|---|---|---|---|---|---|---|---|
| DNA input (ng) | 1000 | 100 | 10 | 1 | 400 | 40 | 4 | 0.4 |
| Total reads | 2 405 987 | 3 287 479 | 839 227 | 86 754 | 233 615 | 41 978 | 30 899 | 8 710 |
| % of bases >Q30 pre-filtering | 96.2 | 96.4 | 96.4 | 95.6 | - | - | - | - |
| Median Phred score (error probability P in %) | - | - | - | - | 16.5 (2.2%) | 16.3 (2.3%) | 15.7 (2.7%) | 14.8 (3.2%) |
| Assembly (all inputs) | Fully assembled (87 317 bp) | | | | Fully assembled (39 861 bp) | | | |
| Termini (all inputs) | DTR short | | | | Cos3' | | | |
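The error probabilities shown in parentheses in Table 1 follow from the standard Phred relation P = 10^(-Q/10), under which Q30 corresponds to an error probability of 0.1%. A minimal sketch of the conversion is given below; the function names are hypothetical, and values recomputed from the rounded medians in the table may differ in the last digit from those reported.

```python
import math


def phred_to_error_probability(q):
    """Per-base error probability for a Phred quality score q: 10^(-q/10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred quality score for a per-base error probability p: -10 * log10(p)."""
    return -10 * math.log10(p)


# Median Phred score of the 400 ng Nanopore library in Table 1.
print(f"{100 * phred_to_error_probability(16.5):.1f}%")
```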
Based on the experiments performed in this study, we conclude that sequencing with ultra-low inputs of genomic DNA (i.e., inputs up to 1000-fold lower than those recommended by the manufacturers) is feasible on both Illumina and Nanopore platforms without additional DNA amplification steps, thus limiting sequencing biases. Phage genomic DNA in this work was successfully sequenced using as little as 1 ng of total input DNA (or 0.02 ng/uL in 50 uL eluted volume) for short-read sequencing with Illumina technology and 0.4 ng (or 0.036 ng/uL in 11 uL eluted volume) for long-read sequencing with Oxford Nanopore technology.
The authors would like to express their gratitude to Marie-Agnès Petit (phage HK97) and Laurent Debarbieux (phage LF82_P10) for providing the phage gDNA used in this study. Biomics Platform, C2RT, Institut Pasteur, Paris, France, is supported by France Génomique (ANR-10-INBS-09) and IBISA. M.M., F.J., and J.G. were supported by the “CDPhages” ANR JCJC grant (ANR-18-CE35-0011).