NASA GeneLab releases workflow for processing RNA sequencing data

GeneLab has wrapped the RNA sequencing consensus pipeline (RCP) in a Nextflow workflow. Nextflow is a workflow manager that enables reproducible and scalable workflows using software containers such as Docker and Singularity. The workflow includes three subworkflows:

  • The Analysis Staging subworkflow – which extracts the metadata and raw reads files needed for processing from the Open Science Data Repository (OSDR)
  • The RNAseq Consensus Pipeline subworkflow – which processes those data using the RCP
  • The V&V Pipeline subworkflow – which performs validation and verification (V&V) on the processed data files in real time
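
The way the three subworkflows hand data to one another can be sketched in a few lines of Python. This is only an illustration of the staging, processing, and validation order described above; it is not GeneLab's Nextflow code. Every name in it (stage_analysis, run_consensus_pipeline, validate_output, run_workflow, the "OSD-EXAMPLE" accession, and the file names) is hypothetical, the repository is a plain dictionary standing in for the OSDR, and the "processing" step only renames files.

```python
from dataclasses import dataclass


@dataclass
class StagedDataset:
    """Metadata and raw reads files gathered for one dataset (hypothetical shape)."""
    accession: str
    metadata: dict
    raw_reads: list


def stage_analysis(accession, repository):
    """Stand-in for Analysis Staging: pull metadata and raw reads for one accession."""
    entry = repository[accession]  # raises KeyError for an unknown accession
    return StagedDataset(accession, dict(entry["metadata"]), list(entry["raw_reads"]))


def run_consensus_pipeline(staged):
    """Stand-in for the RCP: yield one (raw reads, output) pair at a time.

    A real pipeline would trim, align, and quantify reads; this placeholder
    only derives an output file name so the data flow can be followed.
    """
    for reads in staged.raw_reads:
        yield reads, f"{reads}.processed"


def validate_output(reads, output):
    """Stand-in for V&V: check that the output matches the input it came from."""
    return output.startswith(reads) and output.endswith(".processed")


def run_workflow(accession, repository):
    """Chain the three stages, validating each output as soon as it is produced."""
    staged = stage_analysis(accession, repository)
    results = {}
    for reads, output in run_consensus_pipeline(staged):
        if not validate_output(reads, output):
            raise ValueError(f"validation failed for {reads}")
        results[reads] = output
    return results


if __name__ == "__main__":
    # Toy repository with a made-up accession and made-up file names.
    toy_repository = {
        "OSD-EXAMPLE": {
            "metadata": {"organism": "example"},
            "raw_reads": ["sample_A_R1.fastq.gz", "sample_A_R2.fastq.gz"],
        }
    }
    for reads, output in run_workflow("OSD-EXAMPLE", toy_repository).items():
        print(reads, "->", output)
```

Validating inside the loop, rather than after all files are processed, mirrors the idea that V&V runs while processed files are being produced.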

The workflow is available for download, along with installation instructions, from the GeneLab Data Processing (DP) GitHub repository, and the GeneLab data processing team uses it to process RNAseq data hosted on the OSDR. So far the workflow has been used to process RNAseq data from numerous datasets, including Rodent Research-4 (RR-4) quadriceps (OSD-326), RR-3 brain (OSD-352), RR-3 heart (OSD-270), RR-9 retina (OSD-255), Arabidopsis thaliana seedlings from the NASA and European Space Agency (ESA) Seedling Growth-1 project (OSD-346), A. thaliana leaves from NASA’s APEX04-EPX project (OSD-427), A. thaliana seedlings from NASA’s NNX12AN71G project (OSD-321), and A. thaliana seedlings from the NASA and ESA Seedling Growth-2 and -3 projects (OSD-313). Additional processed datasets are listed in the processed section of the latest data releases page.

Because the workflow is available for download, you can now use the RCP to process your own novel RNAseq data or to process GeneLab datasets. RNAseq data processed with the GeneLab RCP, as well as existing processed data on the OSDR, can be re-analyzed in the context of other datasets to generate new hypotheses and knowledge about how the space environment alters gene expression. Let us know what you discover.

