High temporal resolution Nanopore sequencing dataset of SARS-CoV-2 and host cell RNAs



Recent studies have described the genome, transcriptome, and epigenetic composition of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and the effect of viral infection on host cell gene expression. It has been shown that, besides the major canonical transcripts, the viral genome also encodes noncanonical RNA molecules. While structural characterizations have revealed a detailed transcriptomic architecture of the virus, kinetic studies have yielded poor and often misleading results on the dynamics of both viral and host transcripts, owing to the low temporal resolution of the infection time course and the low virus-to-cell ratio (multiplicity of infection [MOI] = 0.1) used for infection. Moreover, it has never been tested whether changes in host gene expression are caused by the aging of the cells in culture or by the viral infection.
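The role of the MOI in this argument can be made concrete with the classic Poisson model of infection, which assumes that infectious particles are distributed randomly among cells, so the fraction of cells receiving at least one particle is 1 − e^(−MOI). This is a textbook simplification rather than a calculation from the study, and the function name below is illustrative:

```python
import math


def fraction_infected(moi: float) -> float:
    """Fraction of cells hit by at least one infectious particle,
    assuming a Poisson distribution of particles per cell."""
    if moi < 0:
        raise ValueError("MOI must be non-negative")
    # P(k >= 1) = 1 - P(k = 0) = 1 - exp(-MOI)
    return 1.0 - math.exp(-moi)


for moi in (0.1, 5.0):
    print(f"MOI {moi}: {fraction_infected(moi):.1%} of cells infected in the first round")
```

Under this model, MOI = 0.1 infects only about 9.5% of cells in the first round, leaving most cells to be infected later and asynchronously by progeny virus, whereas MOI = 5 reaches about 99.3% of cells at once, which helps keep the infection synchronized across the culture.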

Researchers at the University of Szeged used Oxford Nanopore's direct cDNA and direct RNA sequencing methods to generate a high-coverage, high temporal resolution transcriptomic dataset of SARS-CoV-2 and of the primate host cells, using a high infection titer (MOI = 5). The experiment covered 16 sampling time points from 1 to 96 hours postinfection, at varying time resolution, each in 3 biological replicates. In addition, corresponding noninfected samples were included for each infected sample. The raw reads were mapped to the viral and host reference genomes, yielding 49,661,499 mapped reads (54.62 Gb). The genome of the viral isolate was also sequenced and phylogenetically classified.
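Mapping reads to both the viral and the host references means each read ends up assigned to one of the two genomes, and that split can be tallied per sample. The sketch below assumes minimap2's tab-separated PAF output (read name in column 1, target name in column 6, mapping quality in column 12, with a `tp:A:P` tag marking primary alignments) and a viral contig named MN908947.3, the Wuhan-Hu-1 reference accession. The contig name, the MAPQ cutoff, and the function name are illustrative assumptions, not parameters reported for this dataset.

```python
from collections import Counter
from typing import Iterable

# Assumption: the SARS-CoV-2 reference contig is named after the
# Wuhan-Hu-1 accession; every other target is treated as host.
VIRAL_CONTIGS = {"MN908947.3"}


def tally_reads(paf_lines: Iterable[str], min_mapq: int = 1) -> Counter:
    """Count reads assigned to the viral or host reference.

    Expects minimap2 PAF lines: column 1 = read name, column 6 = target
    name, column 12 = mapping quality, optional SAM-like tags after that.
    Only primary alignments (tp:A:P) with MAPQ >= min_mapq are used, and
    each read is counted at most once.
    """
    counts: Counter = Counter()
    seen = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip empty or truncated lines
        read_id, target, mapq = fields[0], fields[5], int(fields[11])
        if "tp:A:P" not in fields[12:] or mapq < min_mapq or read_id in seen:
            continue  # secondary, low-quality, or already-counted read
        seen.add(read_id)
        counts["viral" if target in VIRAL_CONTIGS else "host"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical PAF records; "chr1" stands in for a host chromosome name.
    example = [
        "read1\t1000\t0\t990\t+\tMN908947.3\t29903\t100\t1090\t950\t990\t60\ttp:A:P",
        "read1\t1000\t0\t990\t+\tchr1\t5000000\t10\t1000\t600\t990\t0\ttp:A:S",
        "read2\t800\t0\t780\t+\tchr1\t5000000\t5000\t5780\t700\t780\t60\ttp:A:P",
        "read3\t500\t0\t480\t-\tMN908947.3\t29903\t20000\t20480\t450\t480\t0\ttp:A:P",
    ]
    print(dict(tally_reads(example)))  # {'viral': 1, 'host': 1}
```

Deduplicating by read name matters because a single long read can produce several PAF lines (secondary or chimeric alignments), and counting each line would inflate the totals.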


Schematic representation of the workflow applied in this project. (A) Isolation and detection of a Hungarian isolate of the SARS-CoV-2 virus. The sample was collected from a human nasopharyngeal swab, and SARS-CoV-2 infection was confirmed by reverse transcription PCR using RNA extracted from the sample. The virus was isolated from the sample and maintained on Vero cells. (B) Experimental workflow of the study. Vero cells were infected with SARS-CoV-2 and incubated at 37°C for 1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 24, 36, 48, 72, and 96 hours postinfection. Uninfected control cells were also propagated. Each time-point experiment was carried out in 3 biological replicates. RNA was purified from the samples, followed by library preparation and sequencing with the direct cDNA and direct RNA methods. Altogether, 9 MinION flow cells (ONT) were used in this study. (C) Bioinformatics workflow. ONT's Guppy basecaller was used to call the base sequence of the reads, which were then aligned to the viral and host reference genomes with the minimap2 mapper. Statistics were generated with seqtools [25] and a custom R workflow [33]. (D) RNA quality was assessed with a TapeStation 2200 system using RNA ScreenTape. The TapeStation gel image shows that intact, high-quality RNA was isolated from the samples and used for sequencing. The image shows the following samples: A1: marker; B1: 8-hour postinfection (pi) sample C; 12-hour pi sample A; 16-hour pi sample A; 18-hour pi sample B; 20-hour pi sample C; 36-hour pi sample A; 48-hour pi sample A; 96-hour pi sample B.
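The alignment step in panel C can be illustrated by assembling minimap2 commands for the two library types. The presets below follow the Nanopore direct RNA and cDNA examples in the minimap2 README and are an assumption: the caption does not report the exact parameters used, and the file names, library labels, and function name are hypothetical.

```python
import shlex
from typing import List


def minimap2_command(reference: str, reads: str, library: str) -> List[str]:
    """Build a minimap2 argument list for ONT transcriptome reads.

    Presets follow the minimap2 README examples; they are an assumption,
    not the exact settings used for this dataset.
    """
    if library == "direct_rna":
        # Direct RNA reads: spliced mode, junctions searched on the
        # transcript strand only (-uf), smaller k-mer for noisier reads.
        preset = ["-ax", "splice", "-uf", "-k14"]
    elif library == "direct_cdna":
        # cDNA reads may come from either strand, so -uf is not used.
        preset = ["-ax", "splice"]
    else:
        raise ValueError(f"unknown library type: {library!r}")
    return ["minimap2", *preset, reference, reads]


if __name__ == "__main__":
    cmd = minimap2_command("reference.fa", "reads.fastq", "direct_rna")
    print(shlex.join(cmd))
    # minimap2 -ax splice -uf -k14 reference.fa reads.fastq
```

Returning an argument list rather than a shell string keeps the command safe to pass to `subprocess.run` without shell quoting issues; the SAM written to standard output would then be filtered, sorted, and counted downstream.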


This dataset can serve as a valuable resource for profiling SARS-CoV-2 transcriptome dynamics, virus-host interactions, and RNA base modifications. Comparing host gene expression profiles in infected and noninfected cells at each time point makes it possible to distinguish the effects of cell aging in culture from those of the viral infection. The data can also inform potential novel gene annotations and can be used to evaluate currently available bioinformatics pipelines.
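The infected-versus-noninfected comparison can be illustrated with a simple per-time-point log2 fold change: because the noninfected cells age in culture alongside the infected ones, dividing each infected sample by its time-matched control removes the shared culture-aging trend. This is a minimal sketch assuming counts-per-million normalization and a pseudocount of 1; the gene name and counts are hypothetical, and replicate-aware statistical testing is left out.

```python
import math
from typing import Dict


def cpm(counts: Dict[str, int]) -> Dict[str, float]:
    """Counts per million: scale each gene's count by the library size."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_fc(infected: Dict[str, int], mock: Dict[str, int],
            pseudo: float = 1.0) -> Dict[str, float]:
    """log2((infected CPM + pseudo) / (mock CPM + pseudo)) for shared genes."""
    inf_cpm, mock_cpm = cpm(infected), cpm(mock)
    shared = inf_cpm.keys() & mock_cpm.keys()
    return {g: math.log2((inf_cpm[g] + pseudo) / (mock_cpm[g] + pseudo))
            for g in sorted(shared)}


# Hypothetical (infected, noninfected) counts at two time points (hours pi);
# "other" lumps together the rest of the library.
samples = {
    4: ({"GENE_X": 100, "other": 999_900}, {"GENE_X": 100, "other": 999_900}),
    48: ({"GENE_X": 400, "other": 999_600}, {"GENE_X": 200, "other": 999_800}),
}
for hpi, (infected, mock) in samples.items():
    print(hpi, "h pi:", round(log2_fc(infected, mock)["GENE_X"], 2))
```

In this made-up example GENE_X rises fourfold in infected cells between 4 and 48 hours, but the time-matched noninfected cells also double, so only about a twofold change (log2 ≈ 1) would be attributed to the infection itself.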


Tombácz D, Dörmő Á, Gulyás G, Csabai Z, Prazsák I, Kakuk B, Harangozó Á, Jankovics I, Dénes B, Boldogkői Z. (2022) High temporal resolution Nanopore sequencing dataset of SARS-CoV-2 and host cell RNAs. Gigascience 11:giac094. [article]