Keywords: automation, full-length sequencing, miniaturization, PLATE-seq, RNA-seq, single-cell analysis, single-cell library prep
Single-cell RNA sequencing (scRNA-seq) has the ability to classify each cell and determine the transcriptomic profile of specific cell types and cells of a given disease state; however, sensitivity of the gene count for each cell can be a critical component to the success of a single-cell study. The recently introduced SMART-Seq Single Cell PLUS Kit (SSsc PLUS) claims to provide higher sensitivity and reproducibility versus popular methods for the sequencing analysis of single cells. Here, the cDNA-generation component of the kit, SMART-Seq Single Cell Kit (SSsc), was compared with the popular homebrew protocol, Smart-seq2, and its update, Smart-seq3. The SMART-Seq Library Prep Kit from SSsc PLUS was benchmarked against a commonly used scRNA-seq library preparation method, Illumina Nextera XT. Finally, the SSsc chemistry was tested in both full and fractional volumes on 2 popular liquid-handler devices to investigate whether the high sensitivity was maintained in miniaturization. We demonstrate that SSsc PLUS outperforms these other full-length methods in convenience, sensitivity, gene identification, and reproducibility while also offering full compatibility with automation platforms.
ADDRESS CORRESPONDENCE TO: Andrew Farmer, Takara Bio USA, Inc., 2560 Orchard Parkway, San Jose, CA 95131, USA (Phone: 650-919-7347; E-mail: [email protected]).
Conflict of interest: Authors are employees of Takara Bio USA, Inc.
Groundbreaking efforts, such as the Human Cell Atlas project,[1] have underscored the importance of deciphering the transcriptome of complex organisms at the most basic level—cells. Advances in sequencing technologies and library preparation have allowed the single-cell analysis community to investigate the nucleic-acid content of each cell with increasing accuracy. Leveraging the higher resolution of the transcriptome available using RNA sequencing (RNA-seq),[2] researchers are unraveling, with high definition, the biology that makes up single cells, especially under-expressed biological events such as rare fusions and isoforms. In particular, single-cell RNA-seq (scRNA-seq) can provide cell identification in heterogeneous cell types and pinpoint the regulators of cell function, such as in the mammalian brain,[3] or the presence of specific mutations in diseases like cancer.[4]
There are 2 main technologies used for scRNA-seq: droplet and full-length sequencing. Droplet sequencing can be used for an initial, high-level overview of a single-cell population; however, it is typically less sensitive and provides limited information on the full transcript, focusing on either the 3ʹ end of the mRNA or, less frequently, the 5ʹ end. Alternatively, full-length, pooled library amplification for transcriptome expression sequencing (PLATE-seq) mRNA methods give a deeper view into the datasets, both in terms of gene detection and transcript analysis, enabling identification of alternative splices, gene fusions, and single-nucleotide polymorphisms. Traditionally, full-length PLATE-seq methods were not amenable to high-throughput workflows; however, automation instrumentation has allowed the development of full-length scRNA-seq experiments on a larger scale.
Full-length, PLATE-seq, scRNA-seq workflows include homebrew chemistries (i.e., not available in a prefabricated kit) and commercial kits. Although homebrew chemistries are cost-effective in terms of per-unit prices, these methods have many disadvantages. One of the primary downsides is reproducibility, as each component is ordered independently from multiple sources rather than from a single kit or even a single source. Individual researchers, therefore, have to ensure quality control for each component and the overall results. In most cases, the quality of the components and their performance in concert with each other is not known until sequencing and analysis are complete, which can waste precious samples, take additional time, and increase sequencing costs. Another challenge researchers face with homebrew protocols is that they vary from laboratory to laboratory. This lack of uniformity makes it difficult to confidently compare results or findings from one laboratory to another. On the other hand, although kits have a higher per-unit price, they eliminate much of the uncertainty introduced by homebrew methods that was just described and can allow for more highly reproducible experiments.
In this paper, the most recent single-cell RNA-sequencing kit from Takara Bio USA, Inc., the SMART-Seq Single Cell Kit (SSsc), was benchmarked against 2 popular homebrew methods: Smart-seq2[5],[6] (SS2) and an updated method from the same laboratory, Smart-seq3[7] (SS3). As with all previous kits, SSsc incorporates Takara Bio’s SMART (Switching Mechanism at 5′ end of RNA Template) technology.[8] Next, cDNA generated by the SSsc kit was used to compare libraries prepared with the SMART-Seq Library Prep Kit (SSlp), the library-preparation component of the SMART-Seq Single Cell PLUS Kit (SSsc PLUS; an “all-in-one” solution for generating sequencing-ready libraries from a single cell or RNA), with libraries prepared with the commonly used Nextera XT kit from Illumina, Inc. In addition to the benchmarking analysis, the compatibility of the SSsc kit with automation and miniaturization was evaluated using 2 popular automation systems.
The lymphoblastoid cell line, GM12878 (American Type Culture Collection, Manassas, VA, USA), was cultured according to American Type Culture Collection recommendations. Frozen peripheral blood mononuclear cells (PBMCs) were obtained from BioIVT (Hicksville, NY, USA), thawed for use according to BioIVT recommendation, and labeled with anti-CD3-FITC (Sigma-Aldrich, St. Louis, MO, USA) to isolate T cells. Both GM12878 and PBMCs were labeled with 7-AAD (BioLegend, San Diego, CA, USA) to distinguish live cells from dead and then sorted with a FACSJazz instrument (BD, Franklin Lakes, NJ, USA) into 96-well plates. Single Chinese Hamster Ovary (CHO) cells were dispensed into 384-well plates using a single-cell dispenser.
Unless otherwise noted, all libraries were created with the SSsc kit (Takara Bio USA, Inc., San Jose, CA, USA) per the manufacturer’s instructions. RNA inputs were either 10 pg of Mouse Brain Total RNA (Takara Bio USA, Inc.) or single cells.
The SS2 samples were processed per the protocol.[6] For the SS2 versus SSsc comparisons, 19 cycles of PCR were used to amplify the cDNA.
The SS3 samples were processed per the protocol.[7] For the SS3 versus SSsc comparisons, 21 cycles of cDNA amplification were used for SS3 chemistry, and 23 cycles were used for SSsc chemistry.
cDNA generation with the MANTIS Liquid Handler (Formulatrix, Bedford, MA, USA) and mosquito HV (SPT Labtech, Boston, MA, USA) was done with standard or exact fractions (one-quarter and one-eighth volume, respectively) of all reagents in the SSsc kit. Reactions were run per the respective recommendations available from Takara Bio for the MANTIS (https://www.takarabio.com/a/111238) and mosquito HV (https://www.takarabio.com/a/111241).
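The miniaturized reactions use exact fractions of every reagent volume, so the scaling itself is simple arithmetic. The following is a minimal sketch of that calculation; the reagent names and full-volume amounts are made-up placeholders, not values from the SSsc user manual, and the function name `scale_reagents` is hypothetical.

```python
def scale_reagents(full_volumes_ul, fraction):
    """Scale every reagent volume (in microliters) by the same fraction."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    return {name: volume * fraction for name, volume in full_volumes_ul.items()}


# Placeholder full-volume recipe (hypothetical names and amounts).
full_volume = {"reagent_A": 8.0, "reagent_B": 4.0}

quarter = scale_reagents(full_volume, 1 / 4)  # fraction used on the MANTIS
eighth = scale_reagents(full_volume, 1 / 8)   # fraction used on the mosquito HV

print(quarter)
print(eighth)
```

Applying one fraction to all components keeps reagent ratios identical to the full-volume reaction, which is what "exact fractions" implies in the text.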
Three replicate cDNA samples were generated with SSsc from 10 pg of mouse brain RNA. Two methods of library preparation were used. SSsc PLUS sequencing libraries were prepared with the SMART-Seq Single Cell PLUS Kit (Takara Bio USA, Inc.) per the manufacturer’s instructions. Although SSlp allows for flexible library inputs from the SSsc kit, for the work presented here, libraries were generated from 1 ng of cDNA with 15 cycles of PCR. Nextera XT sequencing libraries were generated with the Nextera XT DNA Library Preparation Kit (Illumina, Inc., San Diego, CA, USA) and a protocol optimized for use with SSsc-generated cDNA. This optimized protocol uses 125 pg of SSsc cDNA and a 10-min incubation at 55°C in place of the cDNA input and incubation time specified in the standard Nextera XT protocol. For each method, libraries were prepared in duplicate for the 3 replicates of cDNA (i.e., 6 libraries per method). The libraries for each method were normalized and pooled separately for sequencing.
Libraries were sequenced on a NextSeq 500 instrument (Illumina, Inc.) using 2 × 75 bp paired-end reads. Sequencing analysis was performed with the Cogent NGS Analysis Pipeline (Takara Bio USA, Inc.) and CLC Genomics Workbench (Qiagen Digital Insights, Redwood City, CA, USA) mapping to the human (hg38) genome with Ensembl annotation.
The SS2 protocol,[6] which was updated recently by the release of the SS3 protocol,[7] and Takara Bio’s SMART-Seq technology are the most widely used methods in the scientific community to generate in-depth characterization of the transcriptome at the single-cell level. The goal was to compare the performance of these homebrew protocols with that of the commercial method. For comparing SSsc versus SS2, sorted single cells from the lymphoblastoid cell line GM12878 were used (Fig. 1A); for the SSsc versus SS3 comparison, single T cells were sorted from primary PBMCs (Fig. 1B).
SSsc shows greater exon mapping (light blue) (Fig. 2A) and greater sensitivity (Fig. 2B) compared with SS2. The median gene count for GM12878 cells using SSsc is significantly higher (P < 0.05) at 9980 genes, whereas the median for SS2 is 8801 genes. SS2 has a lower sensitivity because of the greater percentage of intron (purple), intergenic (green), and mitochondrial (dark blue) mapping (Fig. 2A). Although the exon mapping (light blue) was comparable between SSsc and SS3, SS3 had higher intron mapping (purple) (Fig. 2A). Moreover, SSsc showed significantly greater sensitivity than SS3 (median 6202 versus 5108 genes, P < 0.05; Fig. 2B). SSsc did have higher ribosomal mapping relative to both SS2 and SS3.
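Sensitivity here is reported as the number of genes detected per cell, summarized by the median across cells. The sketch below illustrates that summary under stated assumptions: the detection threshold (`min_count=1`) and the toy count matrices are illustrative choices, not values from this study, and the statistical test behind the reported P values is not specified in the text, so none is implemented.

```python
from statistics import median


def genes_detected(counts, min_count=1):
    """Number of genes in one cell with at least min_count reads (assumed threshold)."""
    return sum(1 for c in counts if c >= min_count)


def median_gene_count(cells, min_count=1):
    """Median number of detected genes across a list of cells."""
    return median(genes_detected(cell, min_count) for cell in cells)


# Toy count matrices (made up): each row is a cell, each column a gene.
method_a = [[5, 0, 2, 1], [3, 1, 0, 4], [1, 1, 1, 0]]
method_b = [[5, 0, 0, 0], [0, 1, 0, 4], [1, 0, 1, 0]]

print(median_gene_count(method_a))  # 3
print(median_gene_count(method_b))  # 2
```

Because the median depends on the detection threshold and on sequencing depth, comparisons between methods are only meaningful when both are held fixed across the samples being compared.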
The Spearman correlation (ρ) was calculated for every possible unique pairing of samples for the SS2 (n = 190) versus SSsc (n = 153) comparison and the SS3 (n = 762) versus SSsc (n = 762) comparison. For GM12878 cells, the median ρ was 0.684 for SS2 and 0.814 for SSsc, a significantly higher ρ for SSsc (P < 0.05). For T cells, the median ρ was 0.411 for SS3 and 0.458 for SSsc. A low ρ for T cells is expected, as these were primary cells, which tend to show less consistency of expression than cell lines. Although the difference in median ρ between SS3 and SSsc for T cells was small, the median ρ for SSsc was significantly greater (P < 0.05). Boxplots showing the distribution of the ρ values for each sample type are shown in Fig. 2C. Both SS2 and SS3 show slightly more uniform gene-body coverage than SSsc, which shows a slight 3ʹ bias (Fig. 2D).
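The pairwise reproducibility metric can be sketched as follows: compute Spearman's ρ for every unique pair of cells within a method, then take the median. This is an illustrative standard-library implementation, not the pipeline used in the study; the helper names and toy data are assumptions. The reported pair counts of 190 and 153 are consistent with 20 and 18 cells, since C(20, 2) = 190 and C(18, 2) = 153, although that reading is an inference rather than something the text states.

```python
from itertools import combinations
from math import comb, sqrt
from statistics import median


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def pairwise_spearman(cells):
    """Rho for every unique pair of cells (n * (n - 1) / 2 values)."""
    return [spearman(a, b) for a, b in combinations(cells, 2)]


# Toy expression vectors (made up): three cells, four genes.
cells = [[1, 2, 3, 4], [10, 20, 30, 40], [1, 3, 2, 4]]
rhos = pairwise_spearman(cells)
print(len(rhos), comb(len(cells), 2))  # both are 3
print(median(rhos))                    # approximately 0.8
```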
To compare reproducibility of library preparation between different samples using SSlp or Nextera XT, triplicate cDNA samples were first generated using SSsc from 10 pg of mouse brain RNA (Fig. 1C). This cDNA was then used as input in the SSlp or Nextera XT procedures to compare library preparation performance (Fig. 1C). Libraries were produced in duplicate for each cDNA sample, which resulted in a total of 6 libraries for each method (Fig. 3). As noted in the Materials and Methods section, library preparation using the Nextera XT kit with cDNA generated by the SSsc kit had previously been optimized by Takara Bio with a modification to the manufacturer’s recommended procedure. Library yields from the standard SSlp method for SSsc PLUS were approximately 5× higher than those from the optimized Nextera XT protocol (median = 55.4 nM and 10.5 nM, respectively; Fig. 3A). The average read distribution values (exon, intron, intergenic, rRNA, and mitochondria) across the 6 libraries prepared from the same cDNA for both SSlp and Nextera XT show comparable values (Fig. 3B); this indicates that SSlp performs as well as Nextera XT. The gene counts for the SSlp method are greater than those for the Nextera XT protocol for the same samples [medians = 14,643 (SSlp) and 14,494 (Nextera XT); Fig. 3C; P > 0.05].
Comparably high correlations were found between all possible pairwise comparisons (n = 15) of the 6 replicate libraries within each method: SSlp (average R² = 0.955) and Nextera XT (average R² = 0.949). For all possible pairwise comparisons (n = 21) of SSlp versus Nextera XT, the average R² = 0.911. The direct comparison of SSlp versus Nextera XT libraries prepared from the same cDNA shows an average R² = 0.952. Duplicate libraries prepared from the same cDNA with a single method were also highly correlated. For the 3 paired libraries for SSlp, R² = 0.998, 0.998, and 0.997. For the 3 paired libraries for Nextera XT, R² = 0.998, 0.996, and 0.991. Figure 3D shows representative x-y plots and the corresponding R² values from comparisons of libraries prepared from the same SSsc cDNA.
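The within-method comparison count follows from the number of unique pairs among 6 libraries, C(6, 2) = 15. The sketch below computes R² for each such pair and averages them. It assumes R² is the squared Pearson correlation of per-gene expression values; whether the study applied a transformation (for example, a log scale) before correlating is not stated, so none is applied here. The function names and toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def within_method_r2(libraries):
    """R^2 for every unique pair of libraries prepared with one method."""
    return [r_squared(a, b) for a, b in combinations(libraries, 2)]


# Toy data (made up): six libraries that differ slightly in one gene.
libraries = [[1.0, 2.0, 3.0, 4.0 + 0.1 * i] for i in range(6)]
pair_values = within_method_r2(libraries)

print(len(pair_values))            # 15 unique pairs among 6 libraries
print(round(mean(pair_values), 3))
```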
As automation and miniaturization are becoming increasingly important in order to increase scale, increase reproducibility, reduce hands-on time, and reduce costs, it was important to determine whether SSsc maintains high performance using automated liquid handlers, even at smaller reagent volumes, compared with manual methods. The Formulatrix MANTIS and SPT Labtech mosquito HV, 2 of the most commonly used liquid handlers in the single-cell RNA-seq community, were used to compare SSsc’s sensitivity at full volume (FV) versus the commonly used miniaturized volume for each instrument: quarter volume (Quarter) on the MANTIS and one-eighth volume (Eighth) on the mosquito HV.
For the tests on the MANTIS Liquid Handler, cells from the cell line GM12878 (B lymphocytes) were processed with either FV (standard) or Quarter volumes of all components of the SSsc kit (Fig. 1A). The distribution of read types was comparable between the 2 methods of volume processing (Fig. 4A). There was also no statistical difference between the FV and Quarter processing for gene count, which was a median value >9600 for both preparations (Fig. 4B; P > 0.05). The distribution of the ρ values is consistent and overlapping for both preparation types (Fig. 4B), with median ρ values for FV = 0.814 and Quarter = 0.800. The median ρ for all possible comparisons of the FV versus Quarter samples was 0.787.
For tests on the mosquito HV liquid handler, 384-well plates of dispensed single CHO cells were tested (Fig. 1A). These CHO cells were processed with the SSsc chemistry at FV (standard) and Eighth. Although exonic-mapped reads were greater for the FV samples relative to the Eighth protocol (Fig. 5A), there was no significant difference between the gene counts, with both methods showing a median value of >16,000 genes identified (Fig. 5B; P > 0.05). The distribution of the ρ values is consistent and overlapping for both preparation types (Fig. 5C), with median ρ values on the mosquito HV for FV = 0.606 and Eighth = 0.590. The median ρ for all possible comparisons of the FV versus Eighth volume samples was 0.592.
Unraveling the biology that underlies single-cell transcriptomes is critical to a full understanding of the multitude of cells that make up complex organisms and, by extension, complex diseases. The data in this paper indicate that the SSsc PLUS produces high-quality RNA-sequencing libraries from single cells. This kit bundles cDNA-generation and library-preparation reagents into a single method that was shown to perform better than the most popular cDNA-generation homebrew protocol and to be comparable with the popular library preparation method. In addition, SSsc maintains its high sensitivity even when adapted to automated workflows, which is a requirement that is increasingly important to large-scale sample processing laboratories.
SSsc outperforms SS2,[6] the popular homebrew full-length method, and SS3,[7] the update to SS2. Higher exonic counts were found for SSsc versus SS2, and although comparable exonic counts were seen between SSsc and SS3, the SSsc chemistry provided greater sensitivity and reproducibility than both SS2 and SS3 did. Homebrew protocols are attractive to researchers because of a reduced per-unit price, but these results argue that although the price point may initially seem appealing, there are high negative costs that relate to reproducibility and performance. SSsc provides a highly dependable kit to get the most out of single-cell studies that involve precious samples.
The quality of the sequencing library preparation can significantly affect confidence in the resulting data quality, which is particularly true for scRNA-seq studies. The new SSlp method of SSsc PLUS shows 5× greater yield than what could be achieved with the optimized SSsc-compatible Nextera XT protocol. Also, SSlp demonstrated the same great reproducibility and quality expected from Nextera XT. Of more interest are the greater gene counts (i.e., sensitivity) seen with the SSlp samples relative to those prepared with Nextera XT. SSsc PLUS provides an output with the continued, expected, exceptional reproducibility and high sensitivity.
A full-length, PLATE-seq, scRNA-seq workflow that can decrease required hands-on time, reduce human error, lower the cost of reagents, and increase throughput is greatly needed by core laboratories, consortia, and other groups processing large numbers of samples. One way to address this need is to translate the chemistry to automation. However, ease of use cannot supersede the quality of output, especially at miniaturized volumes. Using instrumentation to increase throughput and reproducibility while reducing hands-on time and costs is only useful if the performance of the original chemistry is maintained. The data presented using 2 widely used liquid handlers, the MANTIS and mosquito HV, indicate that SSsc chemistry is compatible with both automation and miniaturization, maintaining high and consistent reproducibility and sensitivity compared with manual methods.
Extracting meaningful biological information from single cells and the small amount of mRNA present in each is critical for true understanding of the heterogeneity that underlies normal and disease-related biology. A deeper understanding of cell states and the transcriptional networks underlying them requires more sensitive techniques that can detect more genes and enable analysis across the full length of a transcript to aid in the identification of alternative splicing or pathological changes, such as gene fusions. The cDNA generation and library preparation results obtained with SSsc/SSsc PLUS, compared with those of other popular scRNA-seq protocols, demonstrate that it is the superior option for producing high-quality, reproducible, and highly sensitive scRNA-seq data. This, plus the ability to automate the SSsc workflow at both full and miniaturized volumes without loss of sensitivity, allows for the deep interrogation of individual cells within a population of interest more quickly and on a larger scale, as demanded by today’s researchers.