DNA Somatic Tumor-Only Heme WGS

A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.

  
/opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
--ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
--output-directory $OUTPUT 
--intermediate-results-dir $PATH        #e.g. SSD /staging 
--output-file-prefix $PREFIX 
# Inputs 
--tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
--tumor-fastq-list-sample-id $STRING 
# Mapper 
--enable-map-align true                 #optional with BAM/CRAM input 
--enable-map-align-output true          #optionally save the output BAM 
--enable-sort true                      #default=true 
--enable-duplicate-marking true         #default=true 
# Small variant caller 
--enable-variant-caller true 
--vc-systematic-noise $PATH             #Required 
--vc-target-vaf $NUM                    #Default = 0.03 (>= 3% VAF) 
# SV 
--heme-sv true 
--sv-systematic-noise $PATH             #Recommended 
--enable-oncovirus-detection true       #Optional 
--oncovirus-detection-db $PATH          #Optional 
# DUX4 
--enable-dux4-caller true 
# CNV 
--heme-cnv true 
--cnv-population-b-allele-vcf $POP_VCF 
--cnv-enable-self-normalization true 
# Annotation 
--variant-annotation-data $NIRVANA_PATH 
--vc-enable-germline-tagging true 
--vc-germline-tag-hotspots false        #With germline tagging enabled, do not apply germline tags to somatic hotspot variants
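The recipe above is a listing, not a copy-paste-ready command. A shell runs each line as a separate command, and a line-continuation backslash must be the last character on its line, so it cannot be followed by a `#` comment. Below is a minimal bash sketch that keeps the per-option comments by collecting the same options in an array. The variable names (`INTERMEDIATE_DIR`, `TUMOR_FASTQ_LIST`, `SNV_NOISE_BED`, `SV_NOISE_BEDPE`, `ONCOVIRUS_DB`, `TARGET_VAF`, `DRY_RUN`) and the function name are hypothetical stand-ins for the recipe's `$PATH`, `$STRING`, and `$NUM` placeholders. Do not reuse `$PATH` itself, because it is the shell's command search path. If you do not use oncovirus detection, drop the two oncovirus lines and remove `ONCOVIRUS_DB` from the check.

```shell
# Sketch: the recipe above as a bash function. Options live in an array so each
# one can keep its comment. All variable names are hypothetical placeholders.
run_heme_tumor_only() {
  # Refuse to run if a required placeholder is unset or empty.
  local v
  for v in DRAGEN_VERSION REF_DIR OUTPUT INTERMEDIATE_DIR PREFIX \
           TUMOR_FASTQ_LIST TUMOR_SAMPLE_ID SNV_NOISE_BED SV_NOISE_BEDPE \
           ONCOVIRUS_DB POP_VCF NIRVANA_PATH; do
    if [[ -z "${!v:-}" ]]; then
      echo "run_heme_tumor_only: \$$v is not set" >&2
      return 1
    fi
  done

  local -a args
  args=(
    --ref-dir "$REF_DIR"                              # DRAGEN linear hashtable
    --output-directory "$OUTPUT"
    --intermediate-results-dir "$INTERMEDIATE_DIR"    # e.g. SSD /staging
    --output-file-prefix "$PREFIX"
    # Inputs
    --tumor-fastq-list "$TUMOR_FASTQ_LIST"
    --tumor-fastq-list-sample-id "$TUMOR_SAMPLE_ID"
    # Mapper
    --enable-map-align true                           # optional with BAM/CRAM input
    --enable-map-align-output true                    # save the output BAM
    --enable-sort true                                # default=true
    --enable-duplicate-marking true                   # default=true
    # Small variant caller
    --enable-variant-caller true
    --vc-systematic-noise "$SNV_NOISE_BED"            # required
    --vc-target-vaf "${TARGET_VAF:-0.03}"             # recipe default 0.03
    # SV
    --heme-sv true
    --sv-systematic-noise "$SV_NOISE_BEDPE"           # recommended
    --enable-oncovirus-detection true                 # optional
    --oncovirus-detection-db "$ONCOVIRUS_DB"          # optional
    # DUX4
    --enable-dux4-caller true
    # CNV
    --heme-cnv true
    --cnv-population-b-allele-vcf "$POP_VCF"
    --cnv-enable-self-normalization true
    # Annotation
    --variant-annotation-data "$NIRVANA_PATH"
    --vc-enable-germline-tagging true
    --vc-germline-tag-hotspots false                  # no germline tags on hotspots
  )

  # With DRY_RUN set to a non-empty value, print the command instead of running it.
  ${DRY_RUN:+echo} "/opt/dragen/${DRAGEN_VERSION}/bin/dragen" "${args[@]}"
}
```

For example, `DRY_RUN=1 run_heme_tumor_only` prints the assembled command for review. Without `DRY_RUN`, the function runs DRAGEN from `/opt/dragen/$DRAGEN_VERSION/bin`.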

Notes and additional options

Hashtable

For DRAGEN somatic runs it is recommended to use the linear hashtable.

See: Product Files

Input options

DRAGEN input sources include FASTQ list, FASTQ, BAM, or CRAM files. For BCL input, first create FASTQs using BCL conversion.

FQ list Input

FQ Input

BAM Input

CRAM Input

Mapping and Aligning

Option
Description

--enable-map-align true

Set to false to skip mapping and alignment, for example with BAM/CRAM input (default=true).

--enable-map-align-output true

Optionally save the output BAM (default=false).

--Aligner.clip-pe-overhang 2

Clean up any unwanted UMI indexes. Use only when reads contain UMIs but UMI collapsing was not run.

Duplicate Marking

Option
Description

--enable-duplicate-marking true

By default, DRAGEN marks duplicate reads and excludes them from variant calling.

Fractional (Raw Reads) Downsampling

DRAGEN can subsample a random fraction of reads from an input file using the fractional downsampler. You can use downsampling to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering).

Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses, it is also recommended that the normal sample have lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler can be used to reduce coverage of the normal sample.

Option
Description

--enable-fractional-down-sampler

Set to true to enable fractional downsampling. The default value is false.

--down-sampler-normal-subsample

Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).

--down-sampler-tumor-subsample

Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).

--down-sampler-random-seed

Specify the random seed used to select reads. Reusing the same seed on the same input data reproduces the same subsample, and a different seed draws a different one. The default value is 42.
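Choosing a subsample fraction is simple arithmetic if you assume that coverage scales roughly linearly with the number of reads kept. The hypothetical `downsample_fraction` helper below is a bash sketch, not a DRAGEN feature. It computes target coverage divided by current coverage, capped at 1.0. The example prints the matching options for a matched normal at about 100x that should be brought down to about 50x, below a tumor at about 60x.

```shell
# Hypothetical helper: fraction of reads to keep so that a sample sequenced to
# about CURRENT_COV reaches about TARGET_COV. Assumes coverage scales roughly
# linearly with the number of reads kept; the result is capped at 1.0.
downsample_fraction() {
  local current_cov=$1 target_cov=$2
  awk -v c="$current_cov" -v t="$target_cov" 'BEGIN {
    if (c <= 0 || t <= 0) exit 1   # reject non-positive coverages
    f = t / c
    if (f > 1) f = 1               # cannot keep more than 100% of reads
    printf "%.4f\n", f
  }'
}

# Example: matched normal at ~100x, tumor at ~60x; bring the normal to ~50x.
frac=$(downsample_fraction 100 50)
printf '%s\n' "--enable-fractional-down-sampler true --down-sampler-normal-subsample ${frac}"
```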

SNV

Option
Description

--vc-target-bed

Limit variant calling to the regions of interest in the given BED file.

--vc-combine-phased-variants-distance INT

Maximum distance in base pairs (bp) over which phased variants are combined. Set to 0 to disable. Valid range is 0 to 15 bp (default=2).

--vc-emit-ref-confidence GVCF

Enable gVCF output.

--vc-enable-vcf-output

To enable VCF file output during a gVCF run, set to true. The default value is false.

--vc-systematic-noise $PATH

Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (e.g. systematic alignment errors, sequencing errors).

--vc-somatic-hotspots $PATH

DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.

--vc-sq-filter-threshold $NUM

Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 3. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

--vc-systematic-noise-filter-threshold $INT

Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Pipeline specific default value = 60. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

--vc-systematic-noise-filter-threshold-in-hotspot $INT

Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Pipeline specific default value = 20. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

--vc-excluded-regions-bed $BED

Hard filter variants that overlap with the regions in this BED file. Alu regions make up approximately 11% of the genome and are often exceptionally noisy in FFPE samples, so you can optionally filter them out using the DRAGEN excluded regions filter. Alu BED files can be downloaded as part of the Bed File Collection.

For more detail on the small variant caller in somatic mode, see Somatic Mode.
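To keep the optional small variant settings from the table readable, you can collect them in a separate bash array and append it to the recipe's options. This is an illustrative sketch. The values are examples (3 is the stated pipeline default for the SQ threshold), and `ALU_BED` is a hypothetical placeholder for an Alu BED file from the Bed File Collection. Set it before use, because an empty path would be passed through to DRAGEN.

```shell
# Illustrative optional small variant caller settings, kept in a bash array so
# they can be appended to the recipe's options. ALU_BED is a hypothetical
# placeholder for an Alu BED file.
snv_extra_args=(
  --vc-emit-ref-confidence GVCF          # write a gVCF ...
  --vc-enable-vcf-output true            # ... and also a VCF in the same run
  --vc-excluded-regions-bed "$ALU_BED"   # hard-filter calls overlapping Alu regions
  --vc-sq-filter-threshold 3             # pipeline default; raise for specificity
)
```

Pass the array as `"${snv_extra_args[@]}"` after the other DRAGEN options. Quoting the expansion keeps each option and value as a separate argument.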

CNV

Option
Description

--cnv-enable-gcbias-correction true

Enable or disable GC bias correction when generating target counts.

--cnv-segmentation-mode $SEG_MODE

Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.

--cnv-segmentation-bed $PATH

Specify a segmentation BED file of predefined segments to be called.

--cnv-population-b-allele-vcf $POP_VCF

Specify a population SNP VCF. This option can be used when a matched normal sample is not available and analysis must be performed in tumor-only mode.

--cnv-enable-cyto-output true

Enable Cytogenetics-compatible output (default false).

--heme-cnv true

Configures DRAGEN to use CNV settings for heme (hematologic malignancy) samples.

Annotation

For instructions on downloading the Nirvana annotation database, see Nirvana.

SV

Option
Description

--sv-call-regions-bed

Specifies a BED file containing the set of regions to call. The file can optionally be gzip- or bgzip-compressed.

--sv-exclusion-bed

Specifies a BED file containing the set of regions to exclude from SV calling. Optionally, you can compress the file in gzip or bgzip format.

--enable-variant-deduplication true

Relevant when both the SV and small variant callers are enabled in somatic workflows. Can increase sensitivity and prevents replicated variants within genes such as FLT3 and KMT2A. Filters out small indels in the SV VCF that are also present and passing in the small variant VCF: DRAGEN creates a new VCF containing the SV VCF variants that do not match a variant in the small variant VCF. The new deduplicated SV VCF file uses the prefix passed to --output-file-prefix, followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left-shifting them by up to 500 bases.

--sv-systematic-noise $BEDPE

Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip- or bgzip-compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. It has not been validated for WES, enrichment, or amplicon panels.

--sv-somatic-ins-tandup-hotspot-regions-bed $BED

Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes the FLT3, ARHGEF7, KMT2A, and UBTF exonic regions, padded by 300 bp on both sides.

--sv-min-candidate-variant-size

The SV caller identifies and reports all SVs/indels at or above this size. The default value is 10.

--sv-min-scored-variant-size

After candidate identification, only SVs/indels at or above this size are scored and reported. The default value is 50. This parameter does not affect the somatic hotspot regions.

--enable-oncovirus-detection

Set to enable detection of oncoviral integration. See Oncovirus Detection for more information.

--oncovirus-detection-db

Specifies the directory containing oncovirus detection resource files. See Oncovirus Detection for more information.

Option
Recommended Value for Liquid Tumors (e.g. AML/MLL)

--heme-sv true

Configures DRAGEN to use SV settings for Liquid Tumors (e.g., AML/MLL).

--sv-min-scored-variant-size $INT

100000

For more information, see Structural Variant Calling.

DUX4

Option
Description

--dux4-skip-sanity-check true

Bypass the requirement checks when the input datasets do not comply with the parameters listed under Prerequisites.

For more information, see DUX4-rearrangement Calling.

Resource Files

DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.

SNV Systematic Noise

Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.

DRAGEN has pre-built systematic noise files for WGS, WES and for Pillar Amplicons. For high sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.

Prebuilt

Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.

Prebuilt WES/WGS noise files
Description

WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

For WGS FF

FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

For WGS FFPE (only hg38)

WES_hg38_v2.0.0_systematic_noise.snv.bed.gz

For WES FF and FFPE

Custom

This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.

Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.
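Running the same tumor-only analysis over many normals is usually scripted. Below is a minimal bash sketch. It assumes a hypothetical tab-separated sample sheet that lists a sample ID and a FASTQ list path on each line, and that each ID matches the sample ID inside its FASTQ list. The helper passes only the input and output options itself. The rest of your tumor-only recipe options go after the first two arguments. `DRAGEN_VERSION` and `REF_DIR` are hypothetical placeholders that must be set, and `DRY_RUN=1` prints the commands instead of running them.

```shell
# Hypothetical helper: run DRAGEN tumor-only once per normal sample.
# Sample sheet (hypothetical format): <sample_id><TAB><fastq_list_path> per line;
# blank lines and lines starting with '#' are skipped.
# Usage: run_normals SHEET OUT_ROOT [extra DRAGEN options...]
run_normals() {
  local sheet=$1 out_root=$2
  shift 2
  local sample fq_list
  # Read the sheet on file descriptor 3 so DRAGEN cannot consume the loop's input.
  while IFS=$'\t' read -r -u 3 sample fq_list || [[ -n "$sample" ]]; do
    [[ -z "$sample" || "$sample" == \#* ]] && continue
    # One output directory per normal, named after the sample ID (skipped in dry runs).
    [[ -n "${DRY_RUN:-}" ]] || mkdir -p "${out_root}/${sample}" || return
    ${DRY_RUN:+echo} "/opt/dragen/${DRAGEN_VERSION}/bin/dragen" \
      --ref-dir "$REF_DIR" \
      --output-directory "${out_root}/${sample}" \
      --output-file-prefix "$sample" \
      --tumor-fastq-list "$fq_list" \
      --tumor-fastq-list-sample-id "$sample" \
      "$@" || return
  done 3< "$sheet"
}
```

The same pattern can be used for SV Step 1 (SV Systematic Noise) by including `--sv-detect-systematic-noise true` among the extra options.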

For WES and WGS pipelines, gather the full paths to the small variant hard-filtered VCFs (not gVCFs) from Step 1 and create a list file, ${VCF_LIST}, with one file path per line.
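Below is a minimal bash sketch for building `${VCF_LIST}`. It assumes that each normal was run into its own directory under a common root and that the hard-filtered VCFs follow DRAGEN's `<prefix>.hard-filtered.vcf.gz` naming, so the `.hard-filtered.gvcf.gz` files are not matched. The helper name `make_vcf_list` is hypothetical.

```shell
# Hypothetical helper: write the absolute paths of all hard-filtered small
# variant VCFs found under NORMALS_ROOT to VCF_LIST, one path per line.
# The pattern matches *.hard-filtered.vcf.gz, so gVCFs (*.gvcf.gz) are excluded.
# Usage: make_vcf_list NORMALS_ROOT VCF_LIST
make_vcf_list() {
  local normals_root=$1 vcf_list=$2 abs_root n
  abs_root=$(cd "$normals_root" && pwd) || return
  find "$abs_root" -type f -name '*.hard-filtered.vcf.gz' | sort > "$vcf_list"
  n=$(( $(wc -l < "$vcf_list") ))
  # 30-70 normals is the typical recommendation; fewer can be used.
  echo "make_vcf_list: ${n} VCFs written to ${vcf_list}" >&2
  (( n > 0 ))
}
```

The function returns a non-zero status when no VCFs are found, so a wrapper script can stop before Step 2.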

Step 2. Generate the final noise file.

This step generates a BED file containing mean and max noise estimates per position, which can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions that could be troubleshot (e.g. by modifying assay settings or DRAGEN settings) or blocklisted.
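A simple ranking helps you find the noisiest positions before you plot or blocklist them. The hypothetical bash helper below sorts a plain or gzipped noise BED in descending order by one numeric column. The column layout of the noise file is not described here, so the column to rank by is an argument. Check your file to see which column holds the mean or the max estimate.

```shell
# Hypothetical helper: print the N rows with the highest value in numeric
# column COL of a systematic noise BED (plain or .gz). Header-like lines
# (#, track, browser) and rows with fewer than COL fields are skipped.
# Usage: top_noisy_positions BED COL [N]
top_noisy_positions() {
  local bed=$1 col=$2 n=${3:-20}
  case "$bed" in
    *.gz) gzip -cd -- "$bed" ;;
    *)    cat -- "$bed" ;;
  esac |
    awk -F'\t' -v c="$col" '!/^(#|track|browser)/ && NF >= c' |
    sort -t $'\t' -k "${col},${col}" -g -r |
    head -n "$n"
}
```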

The SNV systematic noise files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.

SV Systematic Noise

SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available.

Prebuilt

Prebuilt WGS SV systematic noise files can be downloaded here: Product Files.

Prebuilt WGS noise files
Description

WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

For WGS, FF/FFPE

IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

For WGS, HEME

Custom

Custom systematic noise files can be generated for WES, panel, or amplicon assays. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-100 normals when building a noise file, but fewer can be used.

Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.

Step 2. Build the BEDPE file using the input VCFs from the previous step.

Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.
