Researchers develop RNA-Seq analysis tool for the safer design of gene editing



Scheme of safety evaluation by DANGER analysis. When off-target regions are affected by genome editing, unexpected changes in the mRNA quantity and sequence can emerge. The DANGER analysis is designed to analyze the effects from RNA-seq data at the Gene Ontology (GO) level.

Genome editing, or gene editing, refers to technologies that allow researchers to change the genomic DNA of an organism. With these technologies, researchers can add, remove or alter genetic material in the genome.

CRISPR-Cas9 is a well-known gene editing technology. It has a reputation for being more accurate, faster, and less expensive than other similar technologies. However, gene editing using CRISPR technology presents some challenges. The first challenge is that the phenotypic, or observable, effects caused by unexpected CRISPR dynamics are not quantitatively monitored.

A second challenge is that CRISPR technology generally depends on basic genomic data, including the reference genome. The reference genome is like a template that provides researchers with general information on the genome. Editing can also occur at unintended sequences that contain mismatches to the intended target. These off-target sites are, by definition, unexpected, so researchers need a way to observe the actual genomic sequences and limit potential off-target effects.

“The design of genome editing requires a well-characterized genomic sequence. However, the genomic information of patients, cancers, and uncharacterized organisms is often incomplete,” said Kazuki Nakamae, an assistant professor of the PtBio Collaborative Research Laboratory at the Genome Editing Innovation Center, Hiroshima University.

The research team set out to devise a method to deal with the issues of the phenotypic effects and the dependence on a reference genome. The team’s DANGER analysis software overcomes these challenges. The team used gene-edited samples of human cells and zebrafish brains to conduct their risk-averse on- and off-target assessment in RNA-sequencing data.

The team demonstrated that the DANGER analysis pipeline achieves several goals. It detected potential DNA on- and off-target sites in the mRNA-transcribed regions of the genome using RNA-sequencing data. It evaluated the phenotypic effects of deleterious off-target sites based on the evidence provided by gene expression changes. It quantified the phenotypic risk at the gene ontology term level, without a reference genome. This success showed that DANGER analysis can be performed on various organisms, personal human genomes, and atypical genomes created by diseases and viruses.
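To make the idea of a mismatch-tolerant site search concrete, here is a minimal sketch in Python. It is not the pipeline's own search (the published workflow uses Crisflash for that step); it only illustrates the principle of scanning assembled transcript sequences for windows that resemble the protospacer and are followed by an NGG PAM, as used by SpCas9. The function names, the mismatch limit of 3, and the example sequences are assumptions made for illustration, and only the forward strand is scanned.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def find_candidate_sites(transcript: str, protospacer: str, max_mismatches: int = 3):
    """Return (position, site, mismatches) for every forward-strand window
    that is followed by an NGG PAM and differs from the protospacer in at
    most max_mismatches positions.

    Illustrative only: the mismatch limit and the forward-strand-only scan
    are simplifying assumptions, not the behaviour of Crisflash.
    """
    n = len(protospacer)
    hits = []
    # Each candidate needs n bases for the site plus 3 bases for the PAM.
    for i in range(len(transcript) - n - 3 + 1):
        site = transcript[i:i + n]
        pam = transcript[i + n:i + n + 3]
        if pam[1:] != "GG":  # NGG: any base, then two guanines
            continue
        mismatches = hamming(site, protospacer)
        if mismatches <= max_mismatches:
            hits.append((i, site, mismatches))
    return hits


# Hypothetical sequences, made up for this example.
protospacer = "GACGTTACCGGATTACGCTA"
near_match = "TACGTAACCGGATTACGCTA"  # differs from the protospacer at 2 positions
transcript = "TTT" + protospacer + "TGG" + "AAAA" + near_match + "AGG" + "CC"

for position, site, mismatches in find_candidate_sites(transcript, protospacer):
    kind = "on-target-like" if mismatches == 0 else "off-target candidate"
    print(position, site, mismatches, kind)
```

A perfect match corresponds to the intended on-target site, while near matches are candidate off-target sites that later steps would need to evaluate.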


Overview of DANGER analysis and on-target region constructed by de novo transcriptome assembly. (A) Bioinformatic workflow of DANGER analysis. The analysis requires RNA-seq data derived from WT and edited samples (n ≥ 3 each). DANGER analysis has two steps in the workflow: (i) de novo transcriptome assembly (upper background box) and (ii) annotation analysis (lower background box). The de novo transcriptome assembly step is processed with Trinity and preprocessing tools, such as cutadapt and bbduk.sh. Crisflash performs the search for on/off-target sequences. RSEM quantifies gene expression in the edited RNA-seq samples against the WT de novo transcriptome (dotted arrow). The annotation analysis step involves processing with TransDecoder, ggsearch, org.XX.eg.db (e.g. org.Hs.eg.db for human transcriptomes), and topGO. We implemented specific modules, colored in pink, for considering the phenotypic effect of deleterious off-targets. (B) Comparison between the hg38 reference genome and the transcript sequences constructed by de novo assembly of RNA-seq samples derived from WT iPSC-derived cortical neurons at the GRIN2B on-target region. The on-target region of the hg38 reference genome is illustrated with annotations of the GRIN2B CDS, the protospacer, and the NGG PAM sequence of SpCas9. The detected GRIN2B isoforms (1–5) are lined up in the box. The Cas9–sgRNA binding sites are highlighted. (C) Completeness of the de novo transcriptome assembly of RNA-seq data derived from WT iPSC-derived cortical neurons was assessed using conserved mammalian BUSCO genes (mammalia_odb10). The result was 79.1% “complete” (20.7% “single-copy” and 58.4% “duplicated”), 3.2% “fragmented,” and 17.7% “missing” (n = 9226).


The DANGER analysis pipeline identifies the genomic on- and off-target sites based on de novo transcriptome assembly using RNA-sequencing data. A transcriptome is the collection of all the active gene readouts in a cell. With de novo transcriptome assembly, the transcriptome is assembled without the help of a reference genome. Next, the DANGER analysis identifies the deleterious off-targets. These are off-targets in mRNA-transcribed regions whose expression is downregulated in edited samples compared to wild-type ones. Finally, the software quantifies the phenotypic risk using the gene ontology of the deleterious off-targets.
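The last two steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the pipeline's implementation: the published workflow quantifies expression with RSEM and works with gene ontology through topGO, whereas the sketch below simply filters off-target genes by a log2 fold change cutoff and counts how many deleterious genes carry each term. The function names, the cutoff of -1.0, the gene names, and the term labels are all hypothetical placeholders.

```python
from collections import defaultdict


def deleterious_off_targets(off_target_genes, log2_fold_change, threshold=-1.0):
    """Return off-target genes whose expression is downregulated in edited
    samples relative to wild type (log2 fold change at or below threshold).

    The threshold is an assumed cutoff chosen for illustration.
    Genes without an expression value are treated as unchanged.
    """
    return sorted(
        gene for gene in off_target_genes
        if log2_fold_change.get(gene, 0.0) <= threshold
    )


def summarize_by_go(genes, gene_to_go):
    """Count how many of the given genes are annotated with each GO term.

    A plain count stands in here for the pipeline's own risk score.
    """
    counts = defaultdict(int)
    for gene in genes:
        for term in gene_to_go.get(gene, ()):
            counts[term] += 1
    return dict(counts)


# Hypothetical inputs: genes with a candidate off-target site, their
# edited-vs-WT log2 fold changes, and placeholder GO annotations.
off_target_genes = {"gene_a", "gene_b", "gene_c"}
log2_fold_change = {"gene_a": -2.1, "gene_b": 0.3, "gene_c": -1.0}
gene_to_go = {
    "gene_a": ["GO_term_A", "GO_term_B"],
    "gene_b": ["GO_term_B"],
    "gene_c": ["GO_term_A"],
}

deleterious = deleterious_off_targets(off_target_genes, log2_fold_change)
go_summary = summarize_by_go(deleterious, gene_to_go)
print(deleterious)
print(go_summary)
```

In this toy input, gene_b has an off-target site but is not downregulated, so it does not contribute to the term-level summary.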

“Our DANGER analysis is a novel software that enables quantifying phenotypic effects caused by estimated off-target. Furthermore, our tool uses de novo transcriptome assembly whose sequences can be built from RNA-sequencing data of treated samples without a reference genome,” said Hidemasa Bono, a professor at the Genome Editing Innovation Center, Hiroshima University.

Looking ahead, the team hopes to expand their research using the DANGER analysis.

“We will apply the software to various genome editing samples from patients and crops to clarify the phenotypic effect and establish safer strategies for genome editing,” said Nakamae.

DANGER analysis is open-source and freely adjustable, so the algorithm of this pipeline could be repurposed for the analysis of genome editing systems beyond CRISPR-Cas9. It is also possible to enhance the specificity of DANGER analysis for CRISPR-Cas9 by incorporating CRISPR-Cas9-specific off-target scoring algorithms. The team believes that the DANGER analysis pipeline will expand the scope of genomic studies and industrial applications using genome editing.

Source: Hiroshima University

Availability – The script for the DANGER analysis pipeline is available at https://github.com/KazukiNakamae/DANGER_analysis.


Nakamae K, Bono H. (2023) DANGER analysis: risk-averse on/off-target assessment for CRISPR editing without a reference genome. Bioinform Adv 3(1):vbad114.