DRAGEN RNA Pipeline
DRAGEN includes an RNA-seq (splice-aware) aligner, as well as RNA-specific analysis components for gene expression quantification, gene fusion detection, splice variant calling, and small variant calling. All of these analysis components require the aligner to be enabled.

Most of the functionality and options described in Host Software Options and DNA Mapping also apply to RNA applications. Additional RNA-specific aspects are described in this section.
Input Files
To pass in reads, you can use a FASTQ, BAM, or CRAM file as input. Use the following command line options for FASTQ input files.
Use the following command line options for a list of FASTQ input files.
Use the following command line options for a BAM input file.
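As a rough sketch of how the three input styles differ on the command line, the hypothetical helper below assembles an argument list for one run. The helper name and all paths are invented for illustration. The flag spellings (-r, -1, -2, --fastq-list, -b, --enable-rna, --output-directory, --output-file-prefix) are assumptions based on general DRAGEN usage and should be checked against the option reference for your version. Read group options that FASTQ input typically needs are omitted.

```python
from typing import List, Optional


def build_dragen_rna_command(
    ref_dir: str,
    output_dir: str,
    prefix: str,
    fastq1: Optional[str] = None,
    fastq2: Optional[str] = None,
    fastq_list: Optional[str] = None,
    bam: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for one RNA run (hypothetical helper)."""
    # Exactly one input style is expected: FASTQ file(s), a FASTQ list, or a BAM.
    chosen = [fastq1 is not None, fastq_list is not None, bam is not None]
    if sum(chosen) != 1:
        raise ValueError("specify exactly one of fastq1, fastq_list, or bam")

    # Options shared by every input style (flag names are assumptions).
    cmd = [
        "dragen",
        "-r", ref_dir,
        "--output-directory", output_dir,
        "--output-file-prefix", prefix,
        "--enable-rna", "true",
    ]
    if fastq1 is not None:
        cmd += ["-1", fastq1]
        if fastq2 is not None:  # second mate file for paired-end data
            cmd += ["-2", fastq2]
    elif fastq_list is not None:
        cmd += ["--fastq-list", fastq_list]
    else:
        cmd += ["-b", bam]
    return cmd
```

The returned list can be passed to subprocess.run on a system where DRAGEN is installed. Building a list rather than a single string avoids shell quoting problems with unusual file names.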
Gene Annotation File
In addition to the standard input files (reads from FASTQ or BAM, reference genome, and so on), DRAGEN can take a gene annotation file as input. A gene annotation file aids in the alignment of reads to known splice junctions and is required for gene expression quantification, splice variant calling, and gene fusion calling. For human data, the annotation files used for validation and benchmarking (associated with the hg19, hg38, and hs37d5 assemblies) are available for download from the DRAGEN Software Support site page.
To specify a gene annotation file, use the -a (--annotation-file) command line option. The input file must conform to the GTF or GFF3 specifications, including the following requirements:
Each gene record must include a gene_id attribute
Each transcript record must include a transcript_id attribute
If a GTF annotation file does not include gene records, gene identities are inferred from the transcript records, which must then include a gene_id attribute. If the file includes neither gene nor transcript records, both are inferred from the exon records, which must then include gene_id and transcript_id attributes. This inference applies to GTF files only.
If the annotation is in GFF3 format, the feature hierarchy is described explicitly. Gene records must have an ID attribute, and transcript and exon records must have ID and Parent attributes. These are required in addition to the gene_id and transcript_id attributes.
An example of a valid GTF file is shown below.
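A minimal sketch of GTF records that satisfy the stated requirements, together with a small check for the required attributes. The gene name, source label, and coordinates are invented for illustration, and the checker is a hypothetical helper, not part of DRAGEN.

```python
import re

# Hypothetical records: names and coordinates are invented for illustration.
SAMPLE_GTF = "\n".join([
    'chr1\tdemo\tgene\t1000\t2000\t.\t+\t.\tgene_id "geneA";',
    'chr1\tdemo\ttranscript\t1000\t2000\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA.t1";',
    'chr1\tdemo\texon\t1000\t1200\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA.t1";',
    'chr1\tdemo\texon\t1800\t2000\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA.t1";',
])

# Only the two requirements stated in the text are checked here.
REQUIRED = {"gene": {"gene_id"}, "transcript": {"transcript_id"}}


def gtf_attributes(field):
    """Parse the ninth GTF column (key "value"; pairs) into a dict."""
    return dict(re.findall(r'(\S+)\s+"([^"]*)"', field))


def missing_attributes(gtf_text):
    """Return (line_number, feature, missing_keys) for each incomplete record."""
    problems = []
    for number, line in enumerate(gtf_text.splitlines(), start=1):
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        feature = columns[2].lower()
        present = set(gtf_attributes(columns[8]))
        missing = REQUIRED.get(feature, set()) - present
        if missing:
            problems.append((number, feature, sorted(missing)))
    return problems
```

Calling missing_attributes(SAMPLE_GTF) returns an empty list. A transcript record without transcript_id would be reported with its line number.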
An example of a valid GFF3 file is shown below.
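A minimal sketch of GFF3 records with an explicit ID/Parent hierarchy, plus a check that every Parent resolves to a defined ID. The identifiers and coordinates are invented for illustration, and the checker is a hypothetical helper that simplifies by treating every non-gene feature as needing a Parent.

```python
# Hypothetical records: names and coordinates are invented for illustration.
SAMPLE_GFF3 = "\n".join([
    "##gff-version 3",
    "chr1\tdemo\tgene\t1000\t2000\t.\t+\t.\tID=gene:geneA;gene_id=geneA",
    "chr1\tdemo\tmRNA\t1000\t2000\t.\t+\t.\t"
    "ID=tx:geneA.t1;Parent=gene:geneA;gene_id=geneA;transcript_id=geneA.t1",
    "chr1\tdemo\texon\t1000\t1200\t.\t+\t.\t"
    "ID=exon:1;Parent=tx:geneA.t1;gene_id=geneA;transcript_id=geneA.t1",
    "chr1\tdemo\texon\t1800\t2000\t.\t+\t.\t"
    "ID=exon:2;Parent=tx:geneA.t1;gene_id=geneA;transcript_id=geneA.t1",
])


def gff3_attributes(field):
    """Parse the ninth GFF3 column (key=value;key=value) into a dict."""
    pairs = (item.split("=", 1) for item in field.strip().split(";") if "=" in item)
    return {key.strip(): value.strip() for key, value in pairs}


def hierarchy_problems(gff3_text):
    """Return (line_number, message) for records with a broken hierarchy."""
    records = []
    for number, line in enumerate(gff3_text.splitlines(), start=1):
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        records.append((number, columns[2].lower(), gff3_attributes(columns[8])))

    # Collect all IDs first so that record order does not matter.
    known_ids = {attrs["ID"] for _, _, attrs in records if "ID" in attrs}

    problems = []
    for number, feature, attrs in records:
        if "ID" not in attrs:
            problems.append((number, "missing ID"))
        if feature != "gene":  # simplification: all non-gene features need a Parent
            for parent in attrs.get("Parent", "").split(","):
                if parent not in known_ids:
                    problems.append((number, "unresolved Parent: " + (parent or "<none>")))
    return problems
```

Calling hierarchy_problems(SAMPLE_GFF3) returns an empty list, while an exon whose Parent names an undefined transcript is reported with its line number.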
For proper handling of genes in the PAR regions of chromosomes X and Y, the gene_id attribute of all exons of the same gene must differ between the two chromosomes, so that exons within the PAR region of chromosome X can be distinguished from those within the PAR region of chromosome Y. For example, annotations commonly assign gene_id=geneA to all exons of a geneA transcript on chromosome X, and gene_id=geneA_PAR_Y to the corresponding exons on chromosome Y. This allows the annotation parser and downstream components to separate data associated with PAR genes on chromosome X from data associated with the same PAR genes on chromosome Y.
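One way to check an annotation for this requirement before a run is to look for gene_id values that occur on both sex chromosomes. The sketch below is a hypothetical helper operating on (chromosome, gene_id) pairs already extracted from exon records. The contig names chrX and chrY are assumptions and may differ in your reference.

```python
from collections import defaultdict


def shared_gene_ids(exons, chrom_x="chrX", chrom_y="chrY"):
    """Return the sorted gene_ids used by exons on both chrom_x and chrom_y."""
    chroms_by_gene = defaultdict(set)
    for chrom, gene_id in exons:
        chroms_by_gene[gene_id].add(chrom)
    # A gene_id seen on both chromosomes cannot be told apart downstream.
    return sorted(
        gene_id
        for gene_id, chroms in chroms_by_gene.items()
        if {chrom_x, chrom_y} <= chroms
    )
```

An empty result means no gene_id is shared between the two chromosomes. Any gene_id returned would need a distinct value on one of them, for example with a _PAR_Y suffix.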
The DRAGEN host software parses the file for exons within the transcripts and produces splice junctions. After parsing is complete, it prints out the number of splice junctions detected.
The splice junctions that are detected from the annotation file are also written to *.sjdb.annotations.out.tab. Splice junctions below a minimum length are excluded, which helps filter annotation artifacts. This minimum annotation splice junction length is controlled by the --rna-ann-sj-min-len option, which has a default value of 6.
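The derivation of junctions from exons can be sketched as follows. This is an illustration, not DRAGEN's implementation: it assumes 1-based inclusive exon coordinates, treats a junction as the intron between two consecutive exons of one transcript, and assumes that the junction length is the intron length in bases and that junctions of exactly the minimum length are kept.

```python
def annotated_junctions(exons, min_len=6):
    """Derive junctions from one transcript's exons.

    exons: iterable of (start, end) pairs, 1-based inclusive.
    Returns (first_intron_base, last_intron_base) pairs whose length
    is at least min_len.
    """
    ordered = sorted(exons)
    junctions = []
    for (_, left_end), (right_start, _) in zip(ordered, ordered[1:]):
        start, end = left_end + 1, right_start - 1
        length = end - start + 1
        # Very short gaps are more likely annotation artifacts than introns.
        if length >= min_len:
            junctions.append((start, end))
    return junctions
```

For exons (100, 200), (300, 400), and (404, 500), the first gap spans 99 bases and is kept, while the second spans only 3 bases and is dropped with the default minimum of 6.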
Annotation parser options
The user can modify the behavior of the annotation parser using the following optional arguments:
--annotation-gene-features
Gene records to process, selected by the feature type in the 3rd column. If not specified, only records named "gene" are processed. Use a comma-separated list (case-insensitive) to specify allowed values, e.g., "gene,pseudogene".
--annotation-transcript-features
Transcript records to process based on the 3rd column. If not specified, the following records (from the RefSeq annotation) will be processed: transcript, primary_transcript, pseudogenic_transcript, unconfirmed_transcript, processed_transcript, mrna, mirna, snrna, snorna, ncrna, scrna, rrna, trna, telomerase_rna, antisense_rna, vault_rna, v_gene_segment, d_gene_segment, j_gene_segment, c_gene_segment, y_rna, rnase_mrp_rna, rnase_p_rna, lnc_rna. Use a comma-separated list (case-insensitive) to specify allowed values, e.g., "transcript,primary_transcript,mrna". If the parent gene of a transcript is excluded, the transcript will be excluded as well, even if its type is in the allowed list. Note: in most annotation files, only "transcript" is used as a feature type.
--annotation-exon-features
Exon records to process, selected by the feature type in the 3rd column. If not specified, only records named "exon" are processed. Use a comma-separated list (case-insensitive) to specify allowed values, e.g., "CDS,start_codon,stop_codon". If the parent gene or parent transcript of an exon is excluded, the exon is excluded as well, even if its type is in the allowed list.
--annotation-bed-file
Name of a .BED file with allowed ranges for parsing. Only features that overlap with any of the ranges described in the file will be included. Default: none.
--annotation-min-transcript-len
Restrict annotated features to transcripts of at least this number of bases in length.
--annotation-max-intron-len
Restrict annotated features to exclude transcripts with introns longer than this number of bases.
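The interaction between the three feature-type options can be sketched as below: feature names are compared case-insensitively, and excluding a parent excludes its children even when their own type is allowed. This is a simplified illustration, not DRAGEN's parser. The record layout (type, id, parent id) and both function names are hypothetical, and parents are assumed to appear before their children.

```python
def parse_feature_list(option_value, default):
    """Turn a comma-separated option value into a set of lowercase names."""
    if option_value is None:
        return set(default)
    return {name.strip().lower() for name in option_value.split(",") if name.strip()}


def filter_records(records, gene_types, transcript_types, exon_types):
    """Keep records whose type is allowed and whose parent was kept.

    records: list of (feature_type, record_id, parent_id) tuples, with
    parent_id set to None for genes. Parents must precede children.
    Returns the ids of the kept records in input order.
    """
    allowed = gene_types | transcript_types | exon_types
    kept_ids = set()
    for feature_type, record_id, parent_id in records:
        type_ok = feature_type.lower() in allowed
        parent_ok = parent_id is None or parent_id in kept_ids
        if type_ok and parent_ok:
            kept_ids.add(record_id)
    return [record_id for _, record_id, _ in records if record_id in kept_ids]
```

With the default gene list, a record of type "pseudogene" is dropped, and so are its transcript and exons. Passing "gene,pseudogene" as the gene list keeps the whole subtree.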
Two-Pass Splice-junction Alignment
Instead of using a GTF file for annotated splice junctions, DRAGEN can also read an SJ.out.tab file (see SJ.out.tab). This file enables DRAGEN to run in a two-pass mode, where the splice junctions discovered in the first pass (output as an SJ.out.tab file) are used to guide the mapping and alignment of reads during a second run through DRAGEN. This mode is useful for increasing the sensitivity of spliced alignments when a gene annotation file is not readily available for the target genome. If a well-curated GTF is already available for your target genome, there is no need to run a second pass with the SJ.out.tab file.
Depending on the characteristics of the input file (for example, read depth and distribution), the second pass using the first-pass SJ.out.tab file may take longer than the first pass.
NOTE: Components downstream of the aligner, such as gene expression quantification, gene fusion detection, and the splice variant caller, require a GTF file as the input annotation file and are NOT compatible with two-pass splice-junction alignment mode.
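Before starting a second pass, it can be useful to inspect the first-pass junctions. The sketch below parses SJ.out.tab content under the assumption that it uses the STAR-style nine-column, tab-separated layout (contig, first intron base, last intron base, strand code, motif code, annotated flag, unique read count, multi-mapped read count, maximum overhang). The SJ.out.tab section remains the authority on the actual format, and the Junction type and parse_sj_out_tab function are hypothetical.

```python
from typing import List, NamedTuple


class Junction(NamedTuple):
    chrom: str
    start: int          # first base of the intron, 1-based
    end: int            # last base of the intron, 1-based
    strand: str         # "+", "-", or "." when undefined
    annotated: bool
    unique_reads: int


# Assumed strand encoding: 0 = undefined, 1 = forward, 2 = reverse.
STRAND = {"0": ".", "1": "+", "2": "-"}


def parse_sj_out_tab(text: str) -> List[Junction]:
    """Parse SJ.out.tab content into Junction records."""
    junctions = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        junctions.append(
            Junction(
                chrom=fields[0],
                start=int(fields[1]),
                end=int(fields[2]),
                strand=STRAND[fields[3]],
                annotated=fields[5] == "1",
                unique_reads=int(fields[6]),
            )
        )
    return junctions
```

The parsed records make it easy to count the discovered junctions or to see how many are supported by only a few reads, which gives a rough sense of how much extra work the second pass may involve.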