scMDR – learning cell annotation under multiple reference datasets by multisource domain adaptation

Accurate and efficient cell type annotation is essential for single-cell sequence analysis. Annotating cells with powerful models trained on well-annotated reference datasets has become increasingly popular. However, as the amount of single-cell data grows, there is an urgent need for annotation methods that can integrate multiple reference datasets to improve cell type annotation performance. Because of unwanted batch effects between individual reference datasets, integrating multiple references remains an open challenge. To address this, researchers at the Nanjing University of Science and Technology have developed two methods, scMDR and scMultiR, which use multisource domain adaptation to learn cell type-specific information from multiple reference datasets and query cells. Based on the learned cell type-specific information, scMDR and scMultiR assign the most likely cell type to each query cell. Benchmark experiments demonstrated state-of-the-art effectiveness for integrative single-cell assignment with multiple reference datasets.
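
The final step described above, assigning the most likely cell type to a query cell from several references, can be illustrated with a small sketch. This is not the scMDR or scMultiR implementation: it assumes, purely for illustration, that each reference dataset yields one vector of class scores for a query cell and that those scores are combined by averaging softmax probabilities. The function names, cell type labels, and score values are all hypothetical.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def assign_cell_type(per_reference_scores, cell_types):
    """Average per-reference class probabilities and pick the top cell type.

    per_reference_scores: one list of class scores per reference dataset
    (hypothetical outputs of reference-specific classifiers).
    """
    probs = [softmax(s) for s in per_reference_scores]
    n_refs = len(probs)
    mean_probs = [
        sum(p[k] for p in probs) / n_refs for k in range(len(cell_types))
    ]
    best = max(range(len(cell_types)), key=lambda k: mean_probs[k])
    return cell_types[best], mean_probs


# Illustrative values only: two references scoring one query cell.
cell_types = ["B cell", "T cell", "Monocyte"]
scores = [[0.2, 2.1, 0.1], [0.5, 1.7, 0.3]]
label, probs = assign_cell_type(scores, cell_types)
print(label)
```

Simple averaging is only one way to combine references; the published methods learn shared, batch-corrected representations through domain adaptation before making the assignment.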

Liu Y, Yan H, Shen LC, Yu DJ. (2022) Learning Cell Annotation under Multiple Reference Datasets by Multisource Domain Adaptation. J Chem Inf Model [Epub ahead of print]. [abstract]