LeukemiaDB – a comprehensive human leukemia transcriptome database


A recent study aggregated RNA sequencing data from more than 3000 samples to create a user-friendly database of transcription profiles in leukemia and related cell lines.

In a recent study published in Blood Advances, researchers aggregated RNA sequencing data for 14 leukemia subtypes and 53 leukemia-related cell lines to develop a user-friendly database of key transcriptomic characteristics.

Leukemia is typically classified by its origin and differentiation stage, but the varied transcriptomic profiles and clinical characteristics seen in different subtypes suggest that distinct mechanisms underlie leukemogenesis at the transcriptional level.

“Thus, an integrated analysis of leukemia expression data can provide comprehensive insights for elucidating the common and specific regulatory mechanisms among different leukemia subtypes,” the authors wrote.

While previous studies have shown certain transcriptomic characteristics to play crucial roles in leukemia progression, many have focused on just one subtype or a limited number of subtypes. In the new study, 3036 samples from 14 leukemia subtypes and 53 related cell lines were analyzed and the findings were developed into a comprehensive, user-friendly source of information for the research community.

LeukemiaDB, the data repository created by the authors, encompasses 5 main modules that provide insight into important RNA features at the transcriptional level that are thought to play key roles in leukemia development or progression based on current research. These features include protein-coding genes, long noncoding RNA (lncRNA), circular RNA (circRNA), alternative splicing (AS), and fusion genes. The database integrates information on expression levels, regulatory modules, and molecular information related to each subtype or cell line.

LeukemiaDB website. (A) LeukemiaDB’s homepage. (B) Six LeukemiaDB modules (Protein-coding genes, Fusion genes, Alternative splicing, LncRNA, CircRNA, and TCGA-LAML) for integrated analysis of public RNA-Seq data.

The authors also analyzed the data within LeukemiaDB to explore different expression characteristics and RNA molecule variants in the various leukemia subtypes and cell lines. Approximately 20,000 protein-coding genes; 60,000 lncRNAs; 8882 circRNAs; 5 AS event types; and various fusion genes were found in the dataset.

In the study, different leukemia subtypes and related cell lines were found to have similar expression distributions of protein-coding genes and lncRNAs, and some of the AS events detected were shared among most of the leukemia subtypes. The data also indicated that some protein-coding genes and fusion genes are involved in leukemogenesis. Certain highly correlated regulatory modules were also found in different subtypes of leukemia.

Overall, the aggregated data provided significant insight into the oncogenesis and progression of leukemia, and LeukemiaDB is the most comprehensive resource of its kind to the authors’ knowledge.

“Collectively, we have provided comprehensive and multidimensional transcriptome profiles (protein-coding genes, lncRNAs, circRNAs, AS events, and fusion genes) of leukemia patients and leukemia cell lines,” the authors concluded. “In the future, we plan to update the LeukemiaDB data portal along with the massive cohorts released, and further analysis will be performed to solve urgent problems in the field of leukemia research.”



Luo M, Miao Y, Ke Y, Guo AY, Zhang Q. (2023) A comprehensive landscape of transcription profiles and data resource for human leukemia. Blood Adv [Epub ahead of print].