Maximizing the potential of high-throughput next-generation sequencing through precise normalization

Next-generation sequencing technologies have enabled advances across diverse areas of biology, many of which benefit from increased sample size. Although the cost of running next-generation sequencing instruments has dropped substantially over time, the cost of sample preparation has lagged behind. To counter this, researchers have adapted library miniaturization protocols and large sample pools to maximize the number of samples that can be prepared with a given amount of reagents and sequenced in a single run. However, because sample quality is highly variable, over- and underrepresentation of samples in a sequencing run has become a major issue in high-throughput sequencing. This leads to misinterpretation of results because of increased noise, and to additional time and cost for rerunning underrepresented samples.

To overcome this problem, University of California San Diego researchers present a normalization method that uses shallow iSeq sequencing to accurately inform pooling volumes based on read distribution. This method is superior to the widely used fluorometry methods, which cannot specifically target adapter-ligated molecules that contribute to sequencing output. This normalization method not only quantifies adapter-ligated molecules but also allows normalization of feature space; for example, the researchers can normalize to reads of interest such as non-ribosomal reads. As a result, this normalization method improves the efficiency of high-throughput next-generation sequencing by reducing noise and producing higher average reads per sample with more even sequencing depth.

Flowchart of experimental design

1. KAPA HyperPlus shotgun libraries are quantified using the PicoGreen fluorescence assay (ThermoFisher, Inc) and pooled to approximately equimolar fractions.
2. The pool is sequenced on Illumina’s iSeq.
3. The resulting raw reads Passing Filter (PF) are used to calculate a Loading Factor for each library: the ratio between the highest proportion of total reads PF observed for any index and that library’s own proportion of total reads PF (Illumina. [Internet]. 2019. Available from: https://www.illumina.com/content/dam/illumina-marketing/documents/systems/iseq/single-cell-library-qc-app-note-770-2019-029.PDF). The Loading Factor in turn scales the fluorescence-based pooling volumes to give new pooling volumes. The new pooling volumes are clipped to a reasonable range for acoustic droplet ejection (typically between 10 nL and 1,000 nL, using the Labcyte Echo 550).
4. Libraries are pooled using the new pooling volumes.
5. The resulting read-count-normalized pool is sequenced on Illumina’s iSeq.
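The arithmetic in step 3 can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the example read counts, and the choice to give a library with zero reads PF the maximum volume are all assumptions made for illustration. Only the Loading Factor definition and the 10 nL to 1,000 nL clipping range come from the description above.

```python
def loading_factors(reads_pf):
    """Loading Factor per library from shallow-run reads PF.

    reads_pf maps a library name to its reads Passing Filter.
    The factor is (highest proportion of reads PF) / (this library's
    proportion). Every proportion shares the same total, so the total
    cancels and the ratio of raw counts gives the same value.
    """
    top = max(reads_pf.values())
    if top <= 0:
        raise ValueError("no reads PF in any library")
    # Libraries with zero reads have an undefined factor; mark them as None.
    return {lib: (top / n if n > 0 else None) for lib, n in reads_pf.items()}


def new_pooling_volumes(initial_nl, reads_pf, min_nl=10.0, max_nl=1000.0):
    """Scale the fluorescence-based volumes (nL) by the Loading Factor and clip."""
    factors = loading_factors(reads_pf)
    volumes = {}
    for lib, vol in initial_nl.items():
        factor = factors[lib]
        if factor is None:
            # Assumption (not from the source): zero-read libraries get max_nl.
            volumes[lib] = max_nl
        else:
            volumes[lib] = min(max(vol * factor, min_nl), max_nl)
    return volumes


# Hypothetical shallow-run counts and initial pooling volumes (nL).
reads_pf = {"lib_A": 4000, "lib_B": 2000, "lib_C": 1000, "lib_D": 10}
initial_nl = {"lib_A": 100.0, "lib_B": 100.0, "lib_C": 100.0, "lib_D": 100.0}

for lib, vol in new_pooling_volumes(initial_nl, reads_pf).items():
    print(f"{lib}: {vol:.0f} nL")
```

In this made-up example the best-represented library keeps its volume, libraries with half or a quarter as many reads get two or four times the volume, and the severely underrepresented library is capped at the upper clipping limit rather than being assigned a volume the liquid handler could not reasonably dispense.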

Brennan C, Salido RA, Belda-Ferre P, Bryant M, Cowart C, Tiu MD, González A, McDonald D, Tribelhorn C, Zarrinpar A, Knight R. (2023) Maximizing the potential of high-throughput next-generation sequencing through precise normalization based on read count distribution. mSystems [Epub ahead of print]. [article]
