Somatic WGS Tumor Normal
DRAGEN Recipe - Somatic WGS Tumor Normal
Overview
This recipe is for processing whole genome sequencing data for somatic tumor normal workflows.
Example Command Line
For most scenarios, taking the union of the command line options from the single-caller scenarios will work.
Configure the INPUT options
Configure the OUTPUT options
Configure MAP/ALIGN depending on whether realignment is desired
Configure the VARIANT CALLERs based on the application
Configure any additional options
Build up the necessary options for each component separately, so that they can be re-used in the final command line.
We recommend using a linear (non-pangenome) reference for somatic analysis. For more details, refer to DRAGEN Reference Support.
The following are partial templates that can be used as starting points. Adjust them accordingly for your specific use case.
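The build-up approach described above can be sketched in bash as follows. This is a minimal sketch, not a validated command line: all paths and names are hypothetical placeholders, and the input, output, and caller-enable options (`--ref-dir`, `--tumor-bam-input`, `--bam-input`, `--output-directory`, `--enable-map-align`, `--enable-variant-caller`, `--enable-sv`) are assumptions that are not listed on this page and should be checked against the full option list. The script only prints the assembled command.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholder values; replace with real paths and names.
REF_DIR="/path/to/linear_reference"
TUMOR_BAM="/path/to/tumor.bam"
NORMAL_BAM="/path/to/normal.bam"
OUT_DIR="/path/to/output"
PREFIX="tumor_normal"
REALIGN="false"   # set to "true" if realignment is desired

# One array per component, so each can be reused in other command lines.
# Options not listed in this recipe are assumptions to verify.
INPUT_OPTS=(--ref-dir "$REF_DIR" --tumor-bam-input "$TUMOR_BAM" --bam-input "$NORMAL_BAM")
OUTPUT_OPTS=(--output-directory "$OUT_DIR" --output-file-prefix "$PREFIX")
MAP_ALIGN_OPTS=(--enable-map-align "$REALIGN")
SNV_OPTS=(--enable-variant-caller true)
SV_OPTS=(--enable-sv true)
EXTRA_OPTS=(--enable-variant-deduplication true)

# The final command line is the union of the component options.
CMD=(dragen
  "${INPUT_OPTS[@]}"
  "${OUTPUT_OPTS[@]}"
  "${MAP_ALIGN_OPTS[@]}"
  "${SNV_OPTS[@]}"
  "${SV_OPTS[@]}"
  "${EXTRA_OPTS[@]}")

# Print the command for review instead of executing it.
printf '%s\n' "${CMD[*]}"
```

Keeping each component in its own array preserves arguments that contain spaces and makes it easy to drop or swap a caller. Once the options are verified, the command can be run with `"${CMD[@]}"`.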
Additional Notes and Options
Optional settings per component are listed below. Full option list at this page.
CNV
--heme-cnv true
Configures DRAGEN to use CNV settings for HEME.
--cnv-normal-cnv-vcf $CNV_NORMAL_VCF
SNV
--vc-sq-filter-threshold $THRESHOLD
Threshold for sensitivity-specificity tradeoff. The default threshold is 17.5. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise $SYSTEMATIC_NOISE_FILE
--vc-somatic-hotspots somatic_hotspots_GRCh38.vcf.gz
Hotspots file. By default, DRAGEN treats positions in the COSMIC database as hotspots, assigning an increased prior probability to variants at these positions. Use this option to override with a custom hotspots file if a list of positions of interest is available.
--vc-combine-phased-variants-distance $DIST
Combining phased variants. By default, DRAGEN does not combine nearby phased calls into MNVs or indels. To combine such calls, set this parameter to a value greater than zero, indicating the maximum distance at which calls should be combined. When combining is enabled, the recommended distance is 15 base pairs. The valid range is [0, 15].
--vc-enable-liquid-tumor-mode true
Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.
--vc-override-tumor-pcr-params-with-normal false
Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.
--vc-target-vaf FLOAT
This option is only available starting in V4.2. The vc-target-vaf option selects the variant allele frequencies of interest. The variant caller aims to detect variants with allele frequencies equal to or larger than this setting. The setting does not apply a hard filter, so variants with allele frequencies below the selected threshold can still be detected. On high-coverage, clean datasets, a lower target-vaf may help increase sensitivity. On noisy samples (such as FFPE), a higher target-vaf may help reduce false positives. Using a low target-vaf may also increase runtime. The valid range is [0, 1]. The default is 0.03 (or 0.001 when --vc-enable-umi-liquid=true).
--vc-systematic-noise-method
The 'max' method is recommended for WGS and results in a more aggressive filter. The 'mean' method is recommended for UMI/PANELs/WES and results in a less aggressive filter. The default is specified in the noise file header.
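Because two of the SNV options above have documented valid ranges, it can help to check values before launching a long run. The following is a minimal bash sketch under that assumption; the helper names `valid_phased_distance` and `valid_target_vaf` are hypothetical and not part of DRAGEN, and the script only prints the resulting options.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Succeeds if $1 is an integer in [0, 15], the range given for
# --vc-combine-phased-variants-distance.
valid_phased_distance() {
  local re='^[0-9]+$'
  [[ "$1" =~ $re ]] && (( 10#$1 <= 15 ))
}

# Succeeds if $1 is a decimal number in [0, 1], the range given for
# --vc-target-vaf. Accepts forms such as 0, 0.03, 1, and 1.0.
valid_target_vaf() {
  local re='^(0(\.[0-9]+)?|1(\.0+)?)$'
  [[ "$1" =~ $re ]]
}

DIST="15"          # recommended distance when combining is enabled
TARGET_VAF="0.03"  # documented default

valid_phased_distance "$DIST" || { echo "distance out of range: $DIST" >&2; exit 1; }
valid_target_vaf "$TARGET_VAF" || { echo "target VAF out of range: $TARGET_VAF" >&2; exit 1; }

# Hypothetical SNV component array for reuse in a full command line.
SNV_OPTS=(
  --vc-combine-phased-variants-distance "$DIST"
  --vc-target-vaf "$TARGET_VAF"
)
printf '%s\n' "${SNV_OPTS[*]}"
```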
SNV library specific settings
--vc-excluded-regions-bed $BED
Some FFPE samples may have a high rate of false positive (FP) calls in SINE (and specifically in ALU) regions. Optionally, use an ALU BED file to hard filter all calls in these regions. Steps to download an ALU region BED file are provided below.
SNV systematic noise file
Generic SNV noise files can be downloaded here: DRAGEN Software Support Site page
When possible, it is recommended to build a pipeline-specific systematic noise file that matches the library prep and sequencer of interest:
Step 1. Run DRAGEN somatic tumor-only on each of approximately 20-50 normal samples:
Gather the full paths to the VCFs from step 1 in ${VCF_LIST} by specifying 1 file per line.
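The list file can be assembled with standard shell tools. The following is a minimal bash sketch, assuming the step 1 outputs sit under a single hypothetical directory and that the small variant VCFs can be selected with the `*.vcf.gz` pattern. Both are assumptions; tighten the pattern if the output directories contain other VCFs.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write the absolute path of every matching VCF under $1 to the file $2,
# one path per line, sorted so the list is reproducible.
build_vcf_list() {
  local dir="$1" out="$2"
  find "$(cd "$dir" && pwd)" -type f -name '*.vcf.gz' | sort > "$out"
}

# Hypothetical locations; override through the environment.
NORMALS_DIR="${NORMALS_DIR:-./normals}"
VCF_LIST="${VCF_LIST:-./normal_vcfs.txt}"

if [ -d "$NORMALS_DIR" ]; then
  build_vcf_list "$NORMALS_DIR" "$VCF_LIST"
  # Report the count; the recipe suggests roughly 20-50 normal samples.
  printf '%s VCFs listed in %s\n' "$(wc -l < "$VCF_LIST" | tr -d ' ')" "$VCF_LIST"
fi
```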
Step 2. Generate the final noise file with:
To download a SINE/ALU regions bed for SNV excluded regions
ALUs comprise approximately 11% of the genome and are common in introns. High rates of deamination FP calls have been observed in some FFPE libraries. If the ALU regions are not clinically significant for a specific analysis, then it is recommended to simply filter out the entire ALU region using the DRAGEN excluded regions filter: --vc-excluded-regions-bed $BED.
The ALU bed file can be downloaded as part of the Bed File Collection: DRAGEN Software Support Site page
SV
--heme-sv true
Configure DRAGEN to use SV settings for HEME.
--sv-enable-liquid-tumor-mode true
--sv-tin-contam-tolerance $TIN_CONTAM_TOLERANCE
--sv-systematic-noise $SYSTEMATIC_NOISE_BEDPE
Generating SV systematic noise BEDPE file
You can generate systematic noise BEDPE files from normal samples collected using the same library prep, sequencing system, and panels.
To build the SV systematic noise file
Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.
Build the BEDPE file using the VCFs and the --sv-build-systematic-noise-vcfs-list option, which takes the list of input VCFs from the previous step. Enter one VCF per line. Example command line is provided below.
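The two steps can be sketched in bash as follows. This is a minimal sketch under stated assumptions, not the official command line: only `--sv-detect-systematic-noise` and `--sv-build-systematic-noise-vcfs-list` come from this recipe, while the paths, the function names `detect_cmd` and `build_cmd`, and the remaining options (`--ref-dir`, `--tumor-bam-input`, `--output-directory`, `--output-file-prefix`, `--enable-sv`) are assumptions to verify against the full option list. The functions print the commands rather than run them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholder paths.
REF_DIR="/path/to/linear_reference"
OUT_DIR="/path/to/sv_noise"
VCF_LIST="$OUT_DIR/sv_normal_vcfs.txt"   # one VCF path per line

# Step 1 (once per normal sample): tumor-only run with noise detection on.
detect_cmd() {
  local bam="$1" sample="$2"
  printf '%s ' dragen --ref-dir "$REF_DIR" --tumor-bam-input "$bam" \
    --output-directory "$OUT_DIR/$sample" --output-file-prefix "$sample" \
    --enable-sv true --sv-detect-systematic-noise true
  printf '\n'
}

# Step 2: build the BEDPE file from the per-sample VCFs in the list.
build_cmd() {
  printf '%s ' dragen --ref-dir "$REF_DIR" \
    --sv-build-systematic-noise-vcfs-list "$VCF_LIST" \
    --output-directory "$OUT_DIR" --output-file-prefix sv_noise
  printf '\n'
}

# Print both commands for review.
detect_cmd /path/to/normal1.bam normal1
build_cmd
```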
You can also build systematic noise BEDPE files in the cloud using the DRAGEN Baseline Builder App on BaseSpace.
SNV-SV deduplication
We recommend using --enable-variant-deduplication true to filter all small indels in the structural variant VCF that appear and are passing in the small variant VCF (PASS in the FILTER column of the small variant VCF file). With this feature, DRAGEN creates a new VCF containing the variants in the SV VCF that do not match a variant in the SNV VCF file. The new deduplicated SV VCF file has the prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases. This feature is useful when both the SV and SNV callers are used in a somatic workflow: running both callers can increase sensitivity, and deduplication prevents duplicated variants within genes such as FLT3 and KMT2A.
MSI
Microsatellite sites file
The microsatellite sites file can be downloaded here: DRAGEN Software Support Site page
HLA
--enable-hla true
Enables the HLA typer. By default, this setting genotypes only class 1 genes.
--hla-enable-class-2 true
Extends genotyping to HLA class 2 genes.