RNA Variant Calling

DRAGEN RNA variant calling uses the DRAGEN Somatic Small Variant Caller to call SNVs and indels. Somatic calling is used because differential expression in RNA-seq data produces variant allele frequencies that deviate from germline expectations. To call variants, DRAGEN uses a probability model that weighs the evidence for a real variant against the evidence for various noise models. If the quality score for a variant exceeds a threshold, the variant is reported in the output VCF with the PASS label. Variants that do not reach the thresholds required for a passing call are marked with filters, such as weak_evidence and base_quality. For more information on DRAGEN DNA somatic variant calling, see Somatic Mode.

DRAGEN RNA also supports forced genotyping (ForceGT). A ForceGT VCF that contains variants of interest can be provided to DRAGEN RNA VC, and the output VCF will contain all variants from the input with annotation. ForceGT might be unable to accurately call complex variants or variants with long deletions (> 50 bp). Complex variants are variants that require more than one substitution, insertion, or deletion event to transform the REF allele into the ALT allele.
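
Because ForceGT might not call long deletions accurately, it can help to check the input VCF before a run. The following is a minimal sketch and not part of DRAGEN: flag_long_deletions is a hypothetical helper that assumes an uncompressed, tab-delimited VCF with one ALT allele per record, and it measures deletion length as the REF length minus the ALT length. It does not detect complex variants.

```shell
# Hypothetical helper (not a DRAGEN tool): list ForceGT VCF records whose
# deletion is longer than a cutoff (default 50 bp).
# Assumptions: uncompressed VCF, tab-delimited, one ALT allele per record.
# Symbolic alleles such as <DEL> are not handled.
flag_long_deletions() {
  vcf="$1"
  max_len="${2:-50}"
  awk -F'\t' -v max="$max_len" '
    /^#/ { next }                          # skip header lines
    {
      del_len = length($4) - length($5)    # REF length minus ALT length
      if (del_len > max) print $1 ":" $2 "\t" del_len
    }' "$vcf"
}
```

Each output line gives the position (CHROM:POS) and the deletion length of a record that might need manual review after the run.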

DRAGEN RNA does not attempt to genotype variants accurately as heterozygous or homozygous, because it uses the DRAGEN somatic caller and somatic variants do not normally fall into those classes. Instead, it applies a heuristic based on the variant allele frequency (AF): if the AF is at least 85%, the GT field is set to 1/1; otherwise, GT is reported as 0/1. This behavior and threshold can be adjusted with the following options:

--rna-vc-enable-homozygous-genotype (default=true)
--rna-vc-homozygous-genotype-af-threshold (default=0.85)
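
The heuristic can be illustrated with a short shell function. This is a sketch of the documented rule only, not DRAGEN's implementation: gt_from_af is a hypothetical name, and the behavior when the homozygous genotype option is disabled (always reporting 0/1) is an assumption.

```shell
# Illustration of the documented AF-based GT heuristic (not DRAGEN code).
# $1: variant allele frequency
# $2: threshold, mirrors --rna-vc-homozygous-genotype-af-threshold (default 0.85)
# $3: mirrors --rna-vc-enable-homozygous-genotype (default true)
# Assumption: with the option set to false, GT is always reported as 0/1.
gt_from_af() {
  af="$1"
  threshold="${2:-0.85}"
  enable_hom="${3:-true}"
  awk -v af="$af" -v t="$threshold" -v hom="$enable_hom" 'BEGIN {
    # "+ 0" forces a numeric comparison of the two values
    if (hom == "true" && af + 0 >= t + 0) print "1/1"; else print "0/1"
  }'
}
```

For example, gt_from_af 0.90 prints 1/1, and gt_from_af 0.40 prints 0/1.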

Input Options

You can use a FASTQ, BAM, or CRAM file as input. Optionally, you can provide a GTF annotation file for more accurate split junction mapping.

Use the following command line options for FASTQ input files.

--fastq-file1=<fastq1_file> \
--fastq-file2=<fastq2_file> \
--RGID=<read_group_id> \
--RGSM=<read_group_sample_name>

Use the following command line options for a list of FASTQ input files.

--fastq-list=<fq_list_file> \
--fastq-list-sample-id=<sample_id>

Use the following command line options for a BAM input file.

--bam-input=<bam_file> \
--enable-map-align=false \
--enable-sort=false \
--enable-duplicate-marking=false

Run RNA Variant Calling

To enable RNA variant calling, set --enable-rna and --enable-variant-caller to true. To enable ForceGT, use --vc-forcegt-vcf <forcegt_vcf_file>.

RNA variant calling outputs a VCF file that includes PASS variants and variants that did not pass, due to filters or weak evidence. For more information on filters and additional command line options, see Somatic Mode.
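
Because the output VCF mixes passing and filtered records, downstream steps often need to separate them. The following is a minimal sketch that relies only on the standard VCF layout, where FILTER is column 7. The function names keep_pass and filter_counts are hypothetical, and the input is assumed to be an uncompressed VCF; decompress a .vcf.gz file first.

```shell
# Hypothetical helpers for inspecting an output VCF (not DRAGEN tools).
# Assumption: uncompressed, tab-delimited VCF with FILTER in column 7.

# Print header lines and records whose FILTER value is PASS.
keep_pass() {
  awk -F'\t' '/^#/ || $7 == "PASS"' "$1"
}

# Count records per FILTER value, for example PASS or weak_evidence.
filter_counts() {
  awk -F'\t' '!/^#/ { n[$7]++ } END { for (f in n) print f "\t" n[f] }' "$1" |
    LC_ALL=C sort
}
```

filter_counts gives a quick view of why records did not pass before any are discarded with keep_pass.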

(NOTE: gVCF mode is not supported with RNA variant calling.)

The following is an example RNA variant calling command line.

dragen \
--fastq-file1=<fastq1_file> \
--fastq-file2=<fastq2_file> \
--RGID=<read_group_id> \
--RGSM=<read_group_sample_name> \
--enable-duplicate-marking=true \
--dupmark-version=hash \
--enable-rna=true \
--enable-variant-caller=true \
--ref-dir=<ref_hashtable_dir> \
--output-directory=<output_dir> \
--output-file-prefix=<output_prefix> \
--annotation-file=<gtf_annotation_file> \
--vc-forcegt-vcf=<forcegt_vcf_file> 

Running RNA Variant Calling with Other RNA Workflows (e.g., RNA Quantification and RNA Fusion Calling)

RNA quantification and/or fusion calling can be performed along with RNA variant calling by adding the appropriate option(s) in addition to --enable-rna=true and --enable-variant-caller=true.

Options:

  • RNA quantification: --enable-rna-quantification=true

  • RNA gene fusion calling: --enable-rna-gene-fusion=true

For more information and options related to RNA quantification and fusion calling, see the RNA quantification and RNA fusion calling sections of the user guide.
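
As a sketch of how these options combine, the hypothetical helper below prints the workflow flags for a run, one per line. Only flags named in this section are used. The helper name rna_vc_args and its two true/false arguments are illustrative assumptions. The remaining inputs (FASTQ or BAM options, --ref-dir, output options, and any annotation file) still have to be supplied as in the example command line.

```shell
# Hypothetical helper (not part of DRAGEN): print the workflow flags for an
# RNA variant calling run, optionally with quantification and fusion calling.
# $1: enable RNA quantification (true/false)
# $2: enable RNA gene fusion calling (true/false)
rna_vc_args() {
  printf '%s\n' "--enable-rna=true" "--enable-variant-caller=true"
  if [ "$1" = "true" ]; then
    printf '%s\n' "--enable-rna-quantification=true"
  fi
  if [ "$2" = "true" ]; then
    printf '%s\n' "--enable-rna-gene-fusion=true"
  fi
}

# Example: show the flags for variant calling plus both extra workflows.
rna_vc_args true true
```

The printed flags can then be appended to a dragen command line alongside the input, reference, and output options.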
