Multi Caller
Overview
The DRAGEN offering encompasses a multitude of bioinformatics tools and allows for rapid end-to-end analysis of NGS data. The most common workflow is running FASTQ data through the DRAGEN map/align component and streaming directly to the small variant caller. This eliminates the need for a user to construct a workflow from off-the-shelf tools and to deal with interfaces, incompatibility issues, and external library dependencies. In this section, we expand on the capabilities of DRAGEN to ease the workflow needs of common bioinformatics analyses.
Component Model
Most components in DRAGEN can be enabled or disabled independently. These are controlled by enable-<component> flags on the command line. Based on which components are enabled, DRAGEN will resolve any inconsistencies (if applicable) and construct the desired workflow. Where possible, DRAGEN will run components in parallel to save time and compute costs. Some examples of the top level options are listed here:
enable-map-align
enable-sort
enable-duplicate-marking
enable-variant-caller
enable-cnv
enable-sv
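As a rough illustration of the component model, the following Python sketch turns a set of on/off choices into enable-<component> arguments. The helper name enable_flags is hypothetical and not part of DRAGEN. The component names are the ones listed above, and the sketch assumes the "--option value" form with true/false values used elsewhere in this section.

```python
# Hypothetical helper (not part of DRAGEN): builds enable-<component> arguments.
# Only the components listed in this section are accepted.
KNOWN_COMPONENTS = (
    "map-align", "sort", "duplicate-marking", "variant-caller", "cnv", "sv",
)


def enable_flags(settings):
    """Map {component: bool} to ['--enable-<component>', 'true'|'false', ...].

    Components that are not mentioned are left out, so DRAGEN keeps
    whatever default it applies to them.
    """
    args = []
    for name, enabled in settings.items():
        if name not in KNOWN_COMPONENTS:
            raise ValueError(f"unknown component: {name}")
        args += [f"--enable-{name}", "true" if enabled else "false"]
    return args


if __name__ == "__main__":
    print(" ".join(enable_flags({"map-align": True, "variant-caller": True, "cnv": True})))
```

Leaving unmentioned components out of the argument list, rather than forcing them to false, keeps the sketch from overriding defaults it knows nothing about.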
Each component has its own set of options which are used to configure the behavior of the component. These options typically control specific input settings, internal algorithm parameters, or output files and filtering criteria. Refer to the individual component sections for more details. As an example, a different BED file may be provided separately for each caller:
cnv-target-bed
sv-call-regions-bed
vc-target-bed
Additionally, some options are shared amongst callers, such as output-directory and sample-sex. Each variant caller will also produce its own set of VCFs and metric output files.
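The split between per-caller and shared options can be sketched as follows. The function name caller_options, the short keys "snv", "cnv", and "sv", and all file paths are illustrative assumptions. Only the option names come from the text above.

```python
# Per-caller region options named in this section. The dictionary keys are
# hypothetical labels chosen for this sketch.
CALLER_BED_OPTION = {
    "cnv": "cnv-target-bed",
    "sv": "sv-call-regions-bed",
    "snv": "vc-target-bed",
}


def caller_options(beds, output_directory, sample_sex=None):
    """Combine shared options with one BED option per caller.

    beds maps a caller label ('snv', 'cnv', 'sv') to a BED file path.
    """
    # Shared options are given once and apply to every caller.
    args = ["--output-directory", output_directory]
    if sample_sex is not None:
        args += ["--sample-sex", sample_sex]
    # Each caller can receive its own, different BED file.
    for caller, bed_path in beds.items():
        args += [f"--{CALLER_BED_OPTION[caller]}", bed_path]
    return args


if __name__ == "__main__":
    # Placeholder paths and values, for illustration only.
    print(caller_options({"cnv": "cnv.bed", "sv": "sv.bed"}, "/path/to/output", "female"))
```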
Input Formats
DRAGEN accepts the following common standard NGS input formats:
FASTQ (fastq-file1 and fastq-file2)
FASTQ List (fastq-list)
BAM (bam-input)
CRAM (cram-input)
Somatic workflows can use the tumor equivalents of these input options (for example, tumor-bam-input).
When running from unaligned reads, the reads first go through the map/align component to produce alignments which continue downstream to the variant callers. When running from prealigned reads, the user has the choice to re-align with the DRAGEN map/align component or to use the existing alignments from the source input. It is common to run with enable-map-align false if you already have DRAGEN alignments available in BAM or CRAM format.
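The choice between realigning and reusing existing alignments can be sketched as below. The function names are hypothetical, and picking the option from the file extension is an assumption made for this example. The sketch covers only the input and map/align options and omits everything else a real run needs.

```python
import os

# Input options for prealigned reads, keyed by file extension.
ALIGNED_INPUT_OPTION = {".bam": "bam-input", ".cram": "cram-input"}


def aligned_input_args(path, realign=False):
    """Arguments for a BAM or CRAM input, with or without realignment."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALIGNED_INPUT_OPTION:
        raise ValueError(f"expected a .bam or .cram file, got: {path}")
    return [
        f"--{ALIGNED_INPUT_OPTION[ext]}", path,
        # false reuses the alignments already present in the input file.
        "--enable-map-align", "true" if realign else "false",
    ]


def fastq_input_args(fastq1, fastq2):
    """Arguments for paired FASTQ input; unaligned reads must be mapped first."""
    return [
        "--fastq-file1", fastq1,
        "--fastq-file2", fastq2,
        "--enable-map-align", "true",
    ]


if __name__ == "__main__":
    print(aligned_input_args("sample.bam"))
    print(aligned_input_args("sample.cram", realign=True))
```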
Multicaller Command Line
For most scenarios, simply creating the union of the command line options from the single caller scenarios will work. In this section we outline some best practices for doing so.
Configure the INPUT options
Configure the OUTPUT options
Configure MAP/ALIGN depending on whether realignment is desired
Configure the VARIANT CALLERs based on the application
Build up the necessary options for each component separately, so that they can be reused in the final command line.
The following are partial templates that can be used as starting points. Adjust them accordingly for your specific use case.
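One way to build the option groups separately and then take their union is sketched here in Python. The helper names union_options and to_argv, the executable name dragen, and all paths are assumptions made for this example. Each caller also needs its own options from its component section, which are left out.

```python
import shlex


def union_options(*groups):
    """Merge option groups, rejecting conflicting values for the same option."""
    merged = {}
    for group in groups:
        for option, value in group.items():
            if option in merged and merged[option] != value:
                raise ValueError(
                    f"conflicting values for --{option}: {merged[option]!r} vs {value!r}"
                )
            merged[option] = value
    return merged


def to_argv(options, executable="dragen"):
    """Flatten {option: value} into an argument list (executable name assumed)."""
    argv = [executable]
    for option, value in options.items():
        argv += [f"--{option}", value]
    return argv


# Option groups built separately so they can be reused. Paths are placeholders.
INPUT = {"bam-input": "sample.bam"}
OUTPUT = {"output-directory": "/path/to/output"}
MAP_ALIGN = {"enable-map-align": "false"}
CALLERS = {"enable-variant-caller": "true", "enable-cnv": "true", "enable-sv": "true"}

if __name__ == "__main__":
    command = to_argv(union_options(INPUT, OUTPUT, MAP_ALIGN, CALLERS))
    print(shlex.join(command))
```

Raising on conflicting values makes it visible when two single-caller command lines disagree on a shared option, instead of letting the later one silently win.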
Germline
The following table summarizes the support for different input formats and variant callers.
CNV+SNV | Supported | Supported | Supported
CNV+SV | Supported | Supported | Supported
SNV+SV | Supported | Supported | Supported
CNV+SNV+SV | Supported | Supported | Supported
For brevity, other features and callers are not listed in the table even though they may be supported. Examples include repeat genotyping, SMA, CYP2D6, and ploidy calling. DRAGEN can run all germline callers for WGS analysis in a single command line (CNV + SNV + SV + ...). Similar support also exists for WES analysis, if the component is supported in single caller mode and there is no conflict with the input configurations.
Somatic
The somatic workflows can be constructed in a similar manner by specifying tumor and normal inputs. The need for potentially two input files (tumor and matched normal) as well as the need for a matched normal SNV VCF for the Somatic CNV caller means extra care has to be taken.
One recommended tumor/normal workflow starts by running the matched normal through the Germline workflow:
Run matched normal through Germline workflow (CNV + SNV + SV + ...). This is required to first generate the matched normal SNV VCF. See the Somatic CNV section for more details.
Run tumor and matched normal through Somatic workflow (CNV + SNV + SV + ...)
Optionally, a full tumor/normal analysis can be done in a single execution if both the SNV and CNV modules are enabled, by leveraging the BAF information directly from the small variant caller. See the Somatic CNV section for more details. In brief, this requires the use of --enable-variant-caller true and --cnv-use-somatic-vc-baf true.
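The single-execution tumor/normal setup described above might be assembled as in the following sketch. The function name is hypothetical, and passing the matched normal through bam-input alongside tumor-bam-input is an assumption of this example. Other options that a real run needs are omitted.

```python
def somatic_single_run_args(tumor_bam, normal_bam):
    """Arguments for SNV + CNV tumor/normal analysis in one execution."""
    return [
        "--tumor-bam-input", tumor_bam,
        # Assumption: the matched normal uses the regular (non-tumor) input option.
        "--bam-input", normal_bam,
        # Both the SNV and CNV modules must be enabled in this mode.
        "--enable-variant-caller", "true",
        "--enable-cnv", "true",
        # Take BAF information directly from the small variant caller.
        "--cnv-use-somatic-vc-baf", "true",
    ]


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    print(" ".join(somatic_single_run_args("tumor.bam", "normal.bam")))
```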
The following table lists the various combinations that are supported under the tumor/normal mode of operation.
CNV+SNV | Supported | Supported | Not Supported
CNV+SV | Supported | Supported | Not Supported
SNV+SV | Supported | Supported | Not Supported
CNV+SNV+SV | Supported | Supported | Not Supported
Running in tumor only mode just requires removing the matched normal input from the INPUT options and configuring each individual caller to run in tumor only mode (for example, CNV uses a population B-allele VCF instead of the matched normal SNP VCF).
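Deriving tumor only INPUT options from a tumor/normal set can be sketched as a simple filter. The helper name to_tumor_only is hypothetical, and the sketch assumes the matched normal is supplied through the non-tumor input options listed under Input Formats. Reconfiguring each caller for tumor only mode is a separate step that is not shown.

```python
# Assumption: in tumor/normal mode the matched normal is supplied through
# these regular (non-tumor) input options.
NORMAL_INPUT_OPTIONS = {
    "fastq-file1", "fastq-file2", "fastq-list", "bam-input", "cram-input",
}


def to_tumor_only(input_options):
    """Return a copy of the INPUT options without the matched normal input."""
    return {
        option: value
        for option, value in input_options.items()
        if option not in NORMAL_INPUT_OPTIONS
    }


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    tumor_normal_input = {"tumor-bam-input": "tumor.bam", "bam-input": "normal.bam"}
    print(to_tumor_only(tumor_normal_input))
```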
The following table lists the combinations that are supported under the tumor only mode of operation.
CNV+SNV | Supported | Supported | Supported
CNV+SV | Supported | Supported | Supported
SNV+SV | Supported | Supported | Supported
CNV+SNV+SV | Supported | Supported | Supported
These modes are for WGS analysis. Similar support also exists for WES analysis, if the mode is supported in single caller mode and there is no conflict in the input configurations. For WES analysis, note that CNV requires a panel of normals regardless of whether it is Tumor Normal or Tumor Only analysis.