Multi Caller

Overview

The DRAGEN offering encompasses many bioinformatics tools and enables rapid end-to-end analysis of NGS data. The most common workflow runs FASTQ data through the DRAGEN map/align component and streams the alignments directly to the small variant caller. This removes the need to assemble a workflow from off-the-shelf tools and to deal with their interfaces, incompatibilities, and external library dependencies. This section describes how to combine several DRAGEN components, in particular multiple variant callers, in a single run to simplify common bioinformatics analyses.

Component Model

Most components in DRAGEN can be enabled or disabled independently, each with an enable-<component> flag on the command line. Based on which components are enabled, DRAGEN resolves any inconsistencies between them (if applicable) and constructs the desired workflow. Where possible, DRAGEN runs components in parallel to save time and compute cost. Some examples of the top-level options:

  • enable-map-align

  • enable-sort

  • enable-duplicate-marking

  • enable-variant-caller

  • enable-cnv

  • enable-sv
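Because every component follows the same enable-<component> naming pattern, the top-level switches can be generated from a list of component names. The following is a minimal bash sketch, assuming bash 4 or later (for mapfile). The helper name enable_flags is hypothetical; only the component names listed above come from this section.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: print "--enable-<component>" and "true" for each
# component name given, one word per line.
#   enable_flags <component>...
enable_flags() {
  local component
  for component in "$@"; do
    printf '%s\n' "--enable-$component" true
  done
}

# Collect the words into an array that can be spliced into a dragen command.
mapfile -t ENABLE_OPTIONS < <(enable_flags map-align variant-caller cnv sv)
printf '%s ' "${ENABLE_OPTIONS[@]}"; echo
```

Emitting one word per line keeps each flag and its value as separate array elements, so the array can be passed to the command without further quoting.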

Each component has its own set of options that configure its behavior. These options typically control specific input settings, internal algorithm parameters, or output files and filtering criteria. Refer to the individual component sections for details. For example, each caller can be given its own BED file:

  • cnv-target-bed

  • sv-call-regions-bed

  • vc-target-bed
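The following minimal sketch keeps each per-caller BED option together with that caller's switch, so a BED option is passed only when its caller runs. The ENABLE_* variables and the BED file names are hypothetical placeholders.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical switches and BED paths; replace with your own.
ENABLE_CNV=true
ENABLE_SV=true
ENABLE_VC=true
CNV_BED="cnv_targets.bed"
SV_BED="sv_regions.bed"
VC_BED="vc_targets.bed"

# Each caller has its own BED option; add it only for callers that are enabled.
BED_OPTIONS=()
if [ "$ENABLE_CNV" = true ]; then BED_OPTIONS+=(--cnv-target-bed "$CNV_BED"); fi
if [ "$ENABLE_SV" = true ]; then BED_OPTIONS+=(--sv-call-regions-bed "$SV_BED"); fi
if [ "$ENABLE_VC" = true ]; then BED_OPTIONS+=(--vc-target-bed "$VC_BED"); fi

# Print one word per line (skipped when no caller is enabled).
if [ "${#BED_OPTIONS[@]}" -gt 0 ]; then printf '%s\n' "${BED_OPTIONS[@]}"; fi
```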

Some options, such as output-directory and sample-sex, are shared among callers. Each variant caller produces its own set of VCF and metrics output files.

Input Formats

DRAGEN accepts the following common standard NGS input formats:

  • FASTQ (fastq-file1 and fastq-file2)

  • FASTQ List (fastq-list)

  • BAM (bam-input)

  • CRAM (cram-input)

Somatic workflows use the tumor equivalents of these options (for example, tumor-bam-input).
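The following hypothetical helper picks the input option from a file's extension. It makes two assumptions about your file names: FASTQ files follow the R1/R2 naming convention, and FASTQ lists use the .csv extension. Only the tumor BAM option is mapped, because it is the only tumor option named here.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: print the DRAGEN input option for a file.
#   input_option <path> [normal|tumor]
input_option() {
  local path="$1" role="${2:-normal}"
  if [ "$role" = tumor ]; then
    # Only tumor-bam-input is named in this section; look up other tumor
    # input options in the somatic documentation.
    case "$path" in
      *.bam) echo "--tumor-bam-input" ;;
      *) echo "no tumor option sketched for: $path" >&2; return 1 ;;
    esac
    return 0
  fi
  case "$path" in
    *.bam)  echo "--bam-input" ;;
    *.cram) echo "--cram-input" ;;
    *.csv)  echo "--fastq-list" ;;    # assumption: FASTQ list saved as .csv
    *_R1*.fastq.gz|*_R1*.fq.gz) echo "--fastq-file1" ;;   # assumption: R1/R2 naming
    *_R2*.fastq.gz|*_R2*.fq.gz) echo "--fastq-file2" ;;
    *) echo "unrecognized input: $path" >&2; return 1 ;;
  esac
}

for f in sample_R1.fastq.gz sample_R2.fastq.gz normal.cram; do
  printf '%s %s\n' "$(input_option "$f")" "$f"
done
```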

When running from unaligned reads (FASTQ), the reads first go through the map/align component, and the resulting alignments continue downstream to the variant callers. When running from pre-aligned reads (BAM or CRAM), you can either re-align them with the DRAGEN map/align component or use the existing alignments from the input. It is common to run with enable-map-align false if DRAGEN alignments are already available in BAM or CRAM format.
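The choice described above can be sketched as a small function. The name map_align_value and its realign argument are hypothetical. Any input that is not BAM or CRAM is treated as unaligned FASTQ, which always needs map/align.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: value for --enable-map-align.
#   map_align_value <input file> [realign: true|false, default false]
# BAM/CRAM input follows the realign choice (false = reuse existing
# alignments); anything else is treated as FASTQ and must be mapped.
map_align_value() {
  local input="$1" realign="${2:-false}"
  case "$input" in
    *.bam|*.cram) echo "$realign" ;;
    *) echo true ;;
  esac
}

MA_OPTIONS=(--enable-map-align "$(map_align_value sample.cram)")
printf '%s ' "${MA_OPTIONS[@]}"; echo
```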

Multicaller Command Line

In most scenarios, taking the union of the command-line options from the corresponding single-caller runs works. The following best practices help when doing so:

  • Configure the INPUT options

  • Configure the OUTPUT options

  • Configure MAP/ALIGN depending on whether realignment is desired

  • Configure the VARIANT CALLERs based on the application

  • Build up the options for each component separately, so that each group can be reused in the final command line.
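When the union is built by copying options from several single-caller command lines, a shared option such as --output-directory can end up in more than one group. The following minimal sketch reports flags that appear more than once across the option groups. The function find_duplicate_flags is hypothetical, and it treats any argument starting with -- as a flag.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: print each --flag that occurs more than once in the
# arguments (printed once, at its second occurrence). Values are ignored.
find_duplicate_flags() {
  printf '%s\n' "$@" | awk '/^--/ && seen[$0]++ == 1'
}

OUTPUT_OPTIONS=(--output-directory /out --output-file-prefix sample)
CNV_OPTIONS=(--enable-cnv true)
SV_OPTIONS=(--enable-sv true --output-directory /out)   # copied by mistake

dups="$(find_duplicate_flags "${OUTPUT_OPTIONS[@]}" "${CNV_OPTIONS[@]}" "${SV_OPTIONS[@]}")"
if [ -n "$dups" ]; then
  echo "duplicated options: $dups" >&2
fi
```

awk is used instead of grep so the pipeline still succeeds (under pipefail) when there are no flags or no duplicates.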

The following are partial templates that can be used as starting points. Adjust them accordingly for your specific use case.

#!/bin/bash
set -euo pipefail

# Replace each <...> placeholder with your own value.
DRAGEN_HASH_TABLE="<REF_DIR>"
FASTQ1="<fastq1>"
FASTQ2="<fastq2>"
RGSM="<RGSM>"
RGID="<RGID>"
OUTPUT="<OUT_DIR>"
PREFIX="<OUT_PREFIX>"

# Bash arrays keep each option and its value as separate, correctly quoted words.
INPUT_OPTIONS=(
  --ref-dir "$DRAGEN_HASH_TABLE"
  --fastq-file1 "$FASTQ1"
  --fastq-file2 "$FASTQ2"
  --RGSM "$RGSM"
  --RGID "$RGID"
)

OUTPUT_OPTIONS=(
  --output-directory "$OUTPUT"
  --output-file-prefix "$PREFIX"
)

MA_OPTIONS=(
  --enable-map-align true
  # ... <any other optional settings>
)

CNV_OPTIONS=(
  --enable-cnv true
  # ... <any other optional settings>
)

SNV_OPTIONS=(
  --enable-variant-caller true
  # ... <any other optional settings>
)

SV_OPTIONS=(
  --enable-sv true
  # ... <any other optional settings>
)

CMD=(
  dragen
  "${INPUT_OPTIONS[@]}"
  "${OUTPUT_OPTIONS[@]}"
  "${MA_OPTIONS[@]}"
  "${CNV_OPTIONS[@]}"
  "${SNV_OPTIONS[@]}"
  "${SV_OPTIONS[@]}"
)

# Print the full command for the log, then run it. (The unquoted
# "bash -c $CMD" form would run only the first word, "dragen".)
echo "${CMD[*]}"
"${CMD[@]}"

Germline

The following table summarizes the support for different input formats and variant callers.

GERMLINE      FASTQ w/ Map/Align   BAM/CRAM    BAM/CRAM w/ Map/Align
CNV+SNV       Supported            Supported   Supported
CNV+SV        Supported            Supported   Supported
SNV+SV        Supported            Supported   Supported
CNV+SNV+SV    Supported            Supported   Supported

For brevity, the table omits other features and callers, such as repeat genotyping, SMA, CYP2D6, and ploidy calling, even though they may be supported. DRAGEN can run all germline callers for WGS analysis in a single command line (CNV + SNV + SV + ...). Similar support exists for WES analysis, provided each component is supported in single-caller mode and the input configurations do not conflict.

Somatic

Somatic workflows are constructed in the same way, by specifying tumor and normal inputs. Extra care is needed for two reasons: there may be two input files (tumor and matched normal), and the Somatic CNV caller needs an SNV VCF from the matched normal.

One recommended tumor/normal workflow starts by running the matched normal through the germline workflow:

  1. Run the matched normal through the germline workflow (CNV + SNV + SV + ...). This step generates the matched normal SNV VCF. See the Somatic CNV section for more details.

  2. Run the tumor and matched normal through the somatic workflow (CNV + SNV + SV + ...).
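The following dry-run sketch shows the two steps. It assumes BAM input for both samples and reuses their existing alignments. The output directory and prefix values are hypothetical, and caller-specific required options are omitted. The Somatic CNV option that takes the step 1 normal SNV VCF is deliberately left out, because its name is documented in the Somatic CNV section.

```shell
#!/bin/bash
set -euo pipefail

# Placeholders; replace with your own paths.
DRAGEN_HASH_TABLE="<REF_DIR>"
TUMOR_BAM="<TUMOR_BAM>"
NORMAL_BAM="<NORMAL_BAM>"

# Step 1: germline run on the matched normal; this produces the normal SNV VCF.
STEP1=(
  dragen --ref-dir "$DRAGEN_HASH_TABLE"
  --bam-input "$NORMAL_BAM"
  --enable-map-align false
  --output-directory normal_germline --output-file-prefix normal
  --enable-variant-caller true --enable-cnv true --enable-sv true
)

# Step 2: tumor/normal somatic run. Add the Somatic CNV option that points to
# the step 1 SNV VCF (see the Somatic CNV section); it is not filled in here.
STEP2=(
  dragen --ref-dir "$DRAGEN_HASH_TABLE"
  --tumor-bam-input "$TUMOR_BAM" --bam-input "$NORMAL_BAM"
  --enable-map-align false
  --output-directory tumor_normal --output-file-prefix tumor
  --enable-variant-caller true --enable-cnv true --enable-sv true
)

# Dry run: print both commands in order. To execute, run "${STEP1[@]}" and
# then "${STEP2[@]}"; with set -e, a failed step 1 stops the script.
printf '%s ' "${STEP1[@]}"; echo
printf '%s ' "${STEP2[@]}"; echo
```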

Alternatively, a full tumor/normal analysis can run in a single execution with both the SNV and CNV components enabled. In this case, the CNV caller uses the B-allele frequency (BAF) information directly from the small variant caller. In brief, this requires --enable-variant-caller true and --cnv-use-somatic-vc-baf true. See the Somatic CNV section for more details.
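Because --cnv-use-somatic-vc-baf depends on BAF from the small variant caller, a script can check that the two settings appear together before launching the run. The following minimal sketch uses a hypothetical check_somatic_baf function that searches the assembled option words.

```shell
#!/bin/bash
set -euo pipefail

SNV_OPTIONS=(--enable-variant-caller true)
CNV_OPTIONS=(--enable-cnv true --cnv-use-somatic-vc-baf true)

# Hypothetical check: if "--cnv-use-somatic-vc-baf true" is present,
# require "--enable-variant-caller true" as well.
check_somatic_baf() {
  local opts=" $* "
  case "$opts" in
    *" --cnv-use-somatic-vc-baf true "*)
      case "$opts" in
        *" --enable-variant-caller true "*) ;;
        *)
          echo "option --cnv-use-somatic-vc-baf true needs --enable-variant-caller true" >&2
          return 1
          ;;
      esac
      ;;
  esac
  return 0
}

check_somatic_baf "${SNV_OPTIONS[@]}" "${CNV_OPTIONS[@]}"
```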

#!/bin/bash
set -euo pipefail

# Replace each <...> placeholder with your own value.
DRAGEN_HASH_TABLE="<REF_DIR>"
TUMOR_BAM="<TUMOR_BAM>"
NORMAL_BAM="<NORMAL_BAM>"
OUTPUT="<OUT_DIR>"
PREFIX="<OUT_PREFIX>"

INPUT_OPTIONS=(
  --ref-dir "$DRAGEN_HASH_TABLE"
  --tumor-bam-input "$TUMOR_BAM"
  --bam-input "$NORMAL_BAM"
)

OUTPUT_OPTIONS=(
  --output-directory "$OUTPUT"
  --output-file-prefix "$PREFIX"
)

# Reuse the existing alignments: re-aligning BAM/CRAM input is not
# supported in tumor/normal mode.
MA_OPTIONS=(
  --enable-map-align false
  # ... <any other optional settings>
)

CNV_OPTIONS=(
  --enable-cnv true
  # For a single-execution run, also add --cnv-use-somatic-vc-baf true.
  # ... <any other optional settings>
)

SNV_OPTIONS=(
  --enable-variant-caller true
  # ... <any other optional settings>
)

SV_OPTIONS=(
  --enable-sv true
  # ... <any other optional settings>
)

CMD=(
  dragen
  "${INPUT_OPTIONS[@]}"
  "${OUTPUT_OPTIONS[@]}"
  "${MA_OPTIONS[@]}"
  "${CNV_OPTIONS[@]}"
  "${SNV_OPTIONS[@]}"
  "${SV_OPTIONS[@]}"
)

# Print the full command for the log, then run it.
echo "${CMD[*]}"
"${CMD[@]}"

The following table lists the various combinations that are supported under the tumor/normal mode of operation.

TUMOR NORMAL  FASTQ w/ Map/Align   BAM/CRAM    BAM/CRAM w/ Map/Align
CNV+SNV       Supported            Supported   Not Supported
CNV+SV        Supported            Supported   Not Supported
SNV+SV        Supported            Supported   Not Supported
CNV+SNV+SV    Supported            Supported   Not Supported
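The tables in this section can be encoded as a pre-flight check, so that an unsupported combination is caught before DRAGEN starts. The following sketch uses a hypothetical is_supported_combo function. It encodes only the three tables here: every multi-caller combination listed is supported except tumor/normal with BAM/CRAM re-alignment.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical pre-flight check encoding the support tables in this section.
#   mode:      germline | tumor-normal | tumor-only
#   input:     fastq | bam-cram
#   map_align: true | false (value of --enable-map-align)
# Returns 1 for the only unsupported multi-caller case in the tables.
is_supported_combo() {
  local mode="$1" input="$2" map_align="$3"
  if [ "$mode" = tumor-normal ] && [ "$input" = bam-cram ] && [ "$map_align" = true ]; then
    return 1
  fi
  return 0
}

if ! is_supported_combo tumor-normal bam-cram true; then
  echo "tumor/normal re-alignment of BAM/CRAM input is not supported;" \
       "run with --enable-map-align false instead" >&2
fi
```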

To run in tumor-only mode, remove the matched normal input from the INPUT options and configure each caller for tumor-only mode (for example, CNV uses a population B-allele VCF instead of the matched normal SNV VCF).
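One way to derive the tumor-only INPUT options from the tumor/normal ones is to drop the normal input flag and its value. The following minimal sketch uses a hypothetical drop_option helper. The per-caller tumor-only settings, such as the CNV population B-allele VCF, still need to be configured separately and are not shown.

```shell
#!/bin/bash
set -euo pipefail

# Tumor/normal INPUT options (placeholders, as in the somatic template).
INPUT_OPTIONS=(
  --ref-dir "<REF_DIR>"
  --tumor-bam-input "<TUMOR_BAM>"
  --bam-input "<NORMAL_BAM>"
)

# Hypothetical helper: print every argument except <flag> and the value that
# follows it, one word per line.
#   drop_option <flag> <option words>...
drop_option() {
  local flag="$1"
  shift
  while [ "$#" -gt 0 ]; do
    if [ "$1" = "$flag" ]; then
      shift                               # drop the flag
      if [ "$#" -gt 0 ]; then shift; fi   # and its value, if present
    else
      printf '%s\n' "$1"
      shift
    fi
  done
}

# Tumor-only input: the same options without the matched normal BAM.
mapfile -t TUMOR_ONLY_INPUT < <(drop_option --bam-input "${INPUT_OPTIONS[@]}")
printf '%s ' "${TUMOR_ONLY_INPUT[@]}"; echo
```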

The following table lists the combinations that are supported under the tumor only mode of operation.

TUMOR ONLY    FASTQ w/ Map/Align   BAM/CRAM    BAM/CRAM w/ Map/Align
CNV+SNV       Supported            Supported   Supported
CNV+SV        Supported            Supported   Supported
SNV+SV        Supported            Supported   Supported
CNV+SNV+SV    Supported            Supported   Supported

These tables apply to WGS analysis. Similar support exists for WES analysis, provided the mode is supported in single-caller mode and the input configurations do not conflict. For WES analysis, note that CNV requires a panel of normals in both tumor/normal and tumor-only analysis.
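The following is a hypothetical guard for the WES panel-of-normals requirement. The CNV_PON_OPTIONS array is a placeholder for whichever panel-of-normals option the CNV section specifies; no option name is assumed here.

```shell
#!/bin/bash
set -euo pipefail

ANALYSIS=wes          # hypothetical setting: wgs or wes
ENABLE_CNV=true
CNV_PON_OPTIONS=()    # placeholder: add the CNV panel-of-normals option(s) here

# Hypothetical guard: WES CNV needs a panel of normals in both tumor/normal
# and tumor-only analysis.
#   check_wes_cnv <analysis> <enable_cnv> <number of PON option words>
check_wes_cnv() {
  local analysis="$1" enable_cnv="$2" pon_words="$3"
  if [ "$analysis" = wes ] && [ "$enable_cnv" = true ] && [ "$pon_words" -eq 0 ]; then
    echo "WES CNV analysis requires a panel of normals" >&2
    return 1
  fi
  return 0
}

# In a real script you would exit here instead of only warning.
check_wes_cnv "$ANALYSIS" "$ENABLE_CNV" "${#CNV_PON_OPTIONS[@]}" || echo "not building the command" >&2
```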
