RNA WTS

DRAGEN Recipe - RNA Whole Transcriptome Sequencing (WTS)

Overview

This recipe is for processing Whole Transcriptome Sequencing data for RNA workflows.

Example Command Line

For most scenarios, taking the union of the command line options from the single-caller scenarios will work.

  • Configure the INPUT options

  • Configure the OUTPUT options

  • Configure the RNA MAP/ALIGN options

  • Configure the QUANT options

  • Configure the SPLICE options

  • Configure the FUSION options

  • Configure the VARIANT options

We recommend using a linear (non-pangenome) reference for RNA analysis. For more details, refer to Dragen Reference Support.

The following are partial templates that can be used as starting points. Adjust them accordingly for your specific use case.

#!/bin/bash
set -euo pipefail

# Path to DRAGEN hashtable
DRAGEN_HASH_TABLE=<REF_DIR>

# Path to output directory for the DRAGEN run
OUTPUT=<OUT_DIR>

# File prefix for DRAGEN output files
PREFIX=<OUT_PREFIX>

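# Define the input sources. Choose one of: FASTQ list, FASTQ, BAM, or CRAM.
# Because of `set -u`, define every variable referenced below or delete the blocks you do not use.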
INPUT_FASTQ_LIST="
  --fastq-list $FASTQ_LIST \
  --fastq-list-sample-id $FASTQ_LIST_SAMPLE_ID \
"

INPUT_FASTQ="
  --fastq-file1 $FASTQ1 \
  --fastq-file2 $FASTQ2 \
  --RGSM $RGSM \
  --RGID $RGID \
"

# Alternatively, provide the FASTQ files through the tumor FASTQ options.
INPUT_TUMOR_FASTQ="
  --tumor-fastq1 $FASTQ1 \
  --tumor-fastq2 $FASTQ2 \
  --RGSM $RGSM \
  --RGID $RGID \
"

INPUT_BAM="
  --bam-input $BAM \
"

INPUT_CRAM="
  --cram-input $CRAM \
"

# Select the input source. This example uses INPUT_FASTQ_LIST.
INPUT_OPTIONS="
  --ref-dir $DRAGEN_HASH_TABLE \
  $INPUT_FASTQ_LIST \
"

OUTPUT_OPTIONS="
  --output-directory $OUTPUT \
  --output-file-prefix $PREFIX \
"

# RNA aligner requires an annotation file in GTF or GFF3 format.
GTF=<GTF_PATH>

# RNA pipeline requires map-align to be true.
RNA_MAP_OPTIONS="
  --enable-rna true \
  --enable-map-align true \
  --annotation-file $GTF \
"

# Set the library type to match the read orientation.
# The options are IU, ISR, ISF, U, SR, or SF. Set it to A to detect the read orientation automatically.
QUANT_OPTIONS="
  --enable-rna-quantification true \
  --rna-library-type IU \
  --rna-quantification-gc-bias true \
"

SPLICE_OPTIONS="
  --enable-rna-splice-variant true \
"

FUSION_OPTIONS="
  --enable-rna-gene-fusion true \
"

# To call variants, provide a BED file with the target regions to call.
# This BED file could contain all exons.
VARIANT_OPTIONS="
  --enable-variant-caller true \
  --vc-target-bed $TARGET_BED \
"

# Construct final command line
CMD="
  dragen \
  $INPUT_OPTIONS \
  $OUTPUT_OPTIONS \
  $RNA_MAP_OPTIONS \
  $QUANT_OPTIONS \
  $SPLICE_OPTIONS \
  $FUSION_OPTIONS \
  $VARIANT_OPTIONS 
"

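# Execute. $CMD is deliberately left unquoted so that word splitting drops the
# newlines embedded in the option strings (paths must not contain spaces).
# `bash -c $CMD` would run only `dragen` and pass the options as positional parameters.
echo $CMD
$CMD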
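The string-based template above relies on word splitting, which breaks on paths containing spaces. As a minimal sketch of an alternative, the same option groups can be kept in bash arrays so that each value stays a single argument. All paths and sample names below are hypothetical placeholders, only a subset of the option groups is shown, and the `RUN_DRAGEN` switch is an assumption added for illustration, not a DRAGEN feature.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical placeholder values; replace with real paths and IDs.
DRAGEN_HASH_TABLE="/path/to/ref dir"
OUTPUT="/path/to/out"
PREFIX="sample1"
GTF="/path/to/genes.gtf"
FASTQ_LIST="/path/to/fastq_list.csv"
FASTQ_LIST_SAMPLE_ID="sample1"

# Each option group is an array, so a value with spaces remains one argument.
INPUT_OPTIONS=(--ref-dir "$DRAGEN_HASH_TABLE"
               --fastq-list "$FASTQ_LIST"
               --fastq-list-sample-id "$FASTQ_LIST_SAMPLE_ID")
OUTPUT_OPTIONS=(--output-directory "$OUTPUT" --output-file-prefix "$PREFIX")
RNA_MAP_OPTIONS=(--enable-rna true --enable-map-align true --annotation-file "$GTF")

# Assemble the final command from the groups in use.
CMD=(dragen "${INPUT_OPTIONS[@]}" "${OUTPUT_OPTIONS[@]}" "${RNA_MAP_OPTIONS[@]}")

# Print a shell-quoted copy of the command for the log.
printf '%q ' "${CMD[@]}"; echo

# Dry run by default; set RUN_DRAGEN=1 (illustrative switch) to execute.
if [[ "${RUN_DRAGEN:-0}" == "1" ]]; then
  "${CMD[@]}"
fi
```

The QUANT, SPLICE, FUSION, and VARIANT groups can be added the same way, each as its own array appended to `CMD`.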

Additional Notes and Options

For SPLICE options, you can provide a list of normal splice variants to reduce noisy calls. The file should be a tab-separated file whose first four columns are:

  • contig name

  • first base of the splice junction (1-based)

  • last base of the splice junction (1-based)

  • strand (0: undefined, 1: +, 2: -)

Use the optional --rna-splice-variant-normals <SPLICE_NORMAL_FILE_PATH> option to provide the normal splice variants.
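To illustrate the four-column layout described above, the following sketch writes a small normals file and sanity-checks it before passing it to the SPLICE options. The junction coordinates and the `check_splice_normals` helper are hypothetical and exist only for this example; DRAGEN does not ship this helper, and real files may carry additional columns after the first four.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical file and junction coordinates, for illustration only.
SPLICE_NORMALS="$(mktemp)"

# Columns: contig, first base (1-based), last base (1-based), strand code.
printf 'chr1\t14830\t14969\t2\n'        >  "$SPLICE_NORMALS"
printf 'chr7\t55019366\t55142285\t1\n'  >> "$SPLICE_NORMALS"

# Hypothetical helper: require at least four tab-separated fields,
# numeric start <= end, and a strand code of 0, 1, or 2.
check_splice_normals() {
  awk -F'\t' '
    NF < 4 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 > $3 + 0 || $4 !~ /^[012]$/ {
      printf "bad line %d: %s\n", NR, $0
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}

check_splice_normals "$SPLICE_NORMALS"

# Option group with the normals file added.
SPLICE_OPTIONS=(--enable-rna-splice-variant true
                --rna-splice-variant-normals "$SPLICE_NORMALS")
printf '%s\n' "${SPLICE_OPTIONS[*]}"
```

Running the check before launching a long job catches space-separated files and invalid strand codes early.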
