Multi Caller
Overview
The DRAGEN offering encompasses a multitude of bioinformatics tools and allows for rapid end-to-end analysis of NGS data. The most common workflow runs FASTQ data through the DRAGEN map/align component and streams the alignments directly to the small variant caller. This eliminates the need for users to construct a workflow from off-the-shelf tools and to deal with their interfaces, incompatibility issues, and external library dependencies. This section expands on the capabilities of DRAGEN that ease the workflow needs of common bioinformatics analyses.
Component Model
Most components in DRAGEN can be enabled or disabled independently. These are controlled by enable-<component> flags on the command line. Based on which components are enabled, DRAGEN will resolve any inconsistencies (if applicable) and construct the desired workflow. Where possible, DRAGEN will run components in parallel to save time and compute costs. Some examples of the top level options are listed here:
enable-map-align
enable-sort
enable-duplicate-marking
enable-variant-caller
enable-cnv
enable-sv
Each component has its own set of options which are used to configure the behavior of the component. These options typically control specific input settings, internal algorithm parameters, or output files and filtering criteria. Refer to the individual component sections for more details. As an example, a different BED file may be provided separately for each caller:
cnv-target-bed
sv-call-regions-bed
vc-target-bed
Additionally, some options are shared amongst callers, such as output-directory and sample-sex. Each variant caller will also produce its own set of VCFs and metric output files.
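The component model described above can be sketched as a small helper that turns a set of enabled components, per-caller options, and shared options into command-line arguments. This is only an illustrative sketch: the helper names (component_flags, option_args), the file paths, and the sample-sex value are assumptions, and only option names mentioned in this section are used.

```python
def component_flags(enabled):
    """Map {"cnv": True, ...} to ["--enable-cnv", "true", ...]."""
    args = []
    for name, on in enabled.items():
        args += [f"--enable-{name}", "true" if on else "false"]
    return args


def option_args(options):
    """Map {"output-directory": "/out", ...} to ["--output-directory", "/out", ...]."""
    args = []
    for name, value in options.items():
        args += [f"--{name}", str(value)]
    return args


# Components to run (names taken from the list of top level options).
enabled = {"map-align": True, "variant-caller": True, "cnv": True, "sv": True}

# Each caller gets its own BED file (hypothetical paths).
per_caller = {
    "vc-target-bed": "/path/to/vc_targets.bed",
    "cnv-target-bed": "/path/to/cnv_targets.bed",
    "sv-call-regions-bed": "/path/to/sv_regions.bed",
}

# Options shared by all callers (hypothetical values).
shared = {"output-directory": "/path/to/output", "sample-sex": "female"}

args = component_flags(enabled) + option_args(per_caller) + option_args(shared)
print(" ".join(args))
```

Keeping the component switches, the caller-specific options, and the shared options in separate dictionaries makes it easy to toggle one caller without touching the others.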
Input Formats
DRAGEN accepts the following common standard NGS input formats:
FASTQ (fastq-file1 and fastq-file2)
FASTQ List (fastq-list)
BAM (bam-input)
CRAM (cram-input)
Somatic workflows can use the tumor-equivalent input options (e.g., tumor-bam-input).
When running from unaligned reads, the reads first go through the map/align component to produce alignments which continue downstream to the variant callers. When running from prealigned reads, the user has the choice to re-align with the DRAGEN map/align component or to use the existing alignments from the source input. By default, DRAGEN will re-map reads input in the BAM or CRAM formats. However, it is common to run with --enable-map-align false if you wish to bypass the mapper and use the existing BAM/CRAM alignments in variant calling.
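The choice between input formats and re-alignment can be sketched as follows. The function input_args and its parameters are hypothetical names introduced for illustration; the option names are the ones listed in this section, and the file paths are placeholders.

```python
def input_args(kind, path, mate=None, realign=True):
    """Build the input options for one sample.

    kind is one of "fastq", "fastq-list", "bam", "cram".
    realign only matters for prealigned input (BAM/CRAM).
    """
    if kind == "fastq":
        args = ["--fastq-file1", path]
        if mate is not None:
            args += ["--fastq-file2", mate]
        return args  # unaligned reads go through map/align
    if kind == "fastq-list":
        return ["--fastq-list", path]
    if kind in ("bam", "cram"):
        args = [f"--{kind}-input", path]
        if not realign:
            # Bypass the mapper and keep the existing alignments.
            args += ["--enable-map-align", "false"]
        return args
    raise ValueError(f"unsupported input kind: {kind}")


# Prealigned BAM, reusing its alignments for variant calling.
print(input_args("bam", "/path/to/sample.bam", realign=False))
# Paired FASTQ files, mapped and aligned by DRAGEN.
print(input_args("fastq", "/path/to/r1.fastq.gz", mate="/path/to/r2.fastq.gz"))
```

Because DRAGEN re-maps BAM and CRAM input by default, the sketch adds --enable-map-align false only when re-alignment is explicitly declined.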
Germline and Somatic Multi-Caller Workflows
DRAGEN can run all germline callers for WGS analysis in a single command line (CNV + SNV + SV + ...) from any input type. Similar support also exists for WES analysis, if the component is supported in single caller mode and there is no conflict with the input configurations. Other features such as repeat genotyping, SMA, CYP2D6, and ploidy calling may also be enabled in the germline workflow, allowing flexibility for custom analyses.
The somatic workflows can be constructed in a similar manner by specifying tumor and normal inputs and enabling the desired variant callers. DRAGEN supports running any combination of somatic variant callers (CNV + SNV + SV) from any input type for both WGS and WES data. Additional features such as HRD, TMB and MSI can also be enabled in the analysis. The need for potentially two input files (tumor and matched normal) as well as the need for additional, somatic-specific inputs to the callers (matched normal SNV VCF for the Somatic CNV caller, systematic noise files for the Somatic SNV and SV callers) means extra care has to be taken. Please refer to the documentation for each caller for details on command line options and required inputs.
Multicaller Command Line
Germline
For most scenarios, simply creating the union of the command line options from the single caller scenarios will work. In this section we outline some best practices for doing so.
Configure the INPUT options
Configure the OUTPUT options
Configure MAP/ALIGN depending on if realignment is desired or not
Configure the VARIANT CALLERs based on the application
Build up the necessary options for each component separately, so that it can be re-used in the final command line.
The following are partial templates that can be used as starting points. Adjust them accordingly for your specific use case.
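As one possible illustration of the steps above, the option groups can be kept as separate lists and concatenated into the final germline command line. This is a partial sketch, not a complete invocation: the paths are placeholders, the group variable names are hypothetical, and options not covered in this section (for example, the reference and caller-specific required settings) are deliberately left out.

```python
import shlex

# 1. INPUT options (hypothetical path)
INPUT = ["--fastq-list", "/path/to/fastq_list.csv"]

# 2. OUTPUT options (hypothetical path)
OUTPUT = ["--output-directory", "/path/to/output"]

# 3. MAP/ALIGN options: input is unaligned, so mapping stays enabled
MAP_ALIGN = [
    "--enable-map-align", "true",
    "--enable-sort", "true",
    "--enable-duplicate-marking", "true",
]

# 4. VARIANT CALLER options: SNV + CNV + SV in one run
CALLERS = [
    "--enable-variant-caller", "true",
    "--enable-cnv", "true",
    "--enable-sv", "true",
]

# Union of the single-caller option groups.
command = ["dragen", *INPUT, *OUTPUT, *MAP_ALIGN, *CALLERS]

# Print a shell-quoted command line for review before running it.
print(shlex.join(command))
```

Each group can be reused unchanged when a caller is added or removed, which keeps the final command line easy to audit.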
Somatic
One recommended tumor/normal workflow starts by running the matched normal through the Germline Workflow.
Run matched normal through Germline workflow (CNV + SNV + SV + ...). This is required to first generate the matched normal SNV VCF. See the Somatic CNV section for more details.
Run tumor and matched normal through Somatic workflow (CNV + SNV + SV + ...)
Optionally, a full tumor/normal analysis can be done in a single execution if both the SNV and CNV modules are enabled, by leveraging the BAF information directly from the small variant caller. See the Somatic CNV section for more details. In brief, this requires the use of --enable-variant-caller true and --cnv-use-somatic-vc-baf true.
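A minimal sketch of the single-execution tumor/normal option set is given below. It assumes prealigned BAM input, with the tumor passed via tumor-bam-input and the matched normal via bam-input; the function name somatic_args and the paths are hypothetical, and somatic-specific inputs such as systematic noise files are omitted because their options are covered in the individual caller sections.

```python
import shlex


def somatic_args(tumor_bam, normal_bam):
    """Options for a single-execution tumor/normal run (partial sketch)."""
    return [
        # INPUT: tumor plus matched normal, both prealigned (assumption)
        "--tumor-bam-input", tumor_bam,
        "--bam-input", normal_bam,
        # MAP/ALIGN: reuse the existing alignments
        "--enable-map-align", "false",
        # VARIANT CALLERS: SNV + CNV + SV
        "--enable-variant-caller", "true",
        "--enable-cnv", "true",
        "--enable-sv", "true",
        # CNV takes BAF information directly from the small variant caller
        "--cnv-use-somatic-vc-baf", "true",
    ]


args = somatic_args("/path/to/tumor.bam", "/path/to/normal.bam")
print(shlex.join(["dragen", *args]))
```

In the two-step alternative, the matched normal is first run through the germline workflow and the resulting SNV VCF is supplied to the Somatic CNV caller instead of setting --cnv-use-somatic-vc-baf true.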
The sample command line shown above is for tumor-normal WGS analysis. Running in tumor-only mode just requires removing the matched normal input from the INPUT options and configuring each individual caller to run in tumor-only mode (for example, CNV uses a population B-allele VCF instead of the matched normal SNP VCF). Similar support also exists for WES analysis, if the mode is supported in single caller mode and there is no conflict in the input configurations. For WES analysis, note that CNV requires a panel of normals regardless of whether the analysis is tumor-normal or tumor-only.