DRAGEN Recipe - Somatic WES Tumor Only
Overview
This recipe processes whole exome sequencing (WES) data for somatic tumor-only workflows.
Example Command Line
For most scenarios, simply creating the union of the command line options from the single caller scenarios will work.
Configure the INPUT options
Configure the OUTPUT options
Configure MAP/ALIGN depending on whether realignment is desired
Configure the VARIANT CALLERs based on the application
Configure any additional options
Build up the necessary options for each component separately, so that they can be re-used in the final command line.
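As a minimal sketch of that approach, the options for each component can be kept in separate bash arrays and joined at the end. Only option names that appear elsewhere in this recipe are used; the paths, the prefix, and the array names are hypothetical placeholders, and the INPUT, MAP/ALIGN, and caller-enabling options are deliberately left out. The command is printed rather than executed.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholder values; replace with real paths for your run.
PREFIX="tumor_sample"
PON="/path/to/pon.combined.counts.txt.gz"
BED="/path/to/alu_regions.bed"

# One array per component, so each can be reviewed and reused on its own.
OUTPUT_OPTS=(--output-file-prefix "$PREFIX")
CNV_OPTS=(--cnv-combined-counts "$PON")
SNV_OPTS=(--vc-excluded-regions-bed "$BED")
DEDUP_OPTS=(--enable-variant-deduplication true)

# The final command line is the union of the component arrays.
# INPUT, MAP/ALIGN and caller-enabling options are omitted in this sketch.
DRAGEN_CMD=(dragen "${OUTPUT_OPTS[@]}" "${CNV_OPTS[@]}" "${SNV_OPTS[@]}" "${DEDUP_OPTS[@]}")

# Print for inspection instead of executing.
printf '%s\n' "${DRAGEN_CMD[*]}"
```

Keeping the options in arrays rather than in one long string preserves quoting for paths that contain spaces.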
We recommend using a linear (non-pangenome) reference for somatic analysis. For more details, refer to Dragen Reference Support.
The following are partial templates that can be used as starting points. Adjust them accordingly for your specific use case.
Additional Notes and Options
Optional settings per component are listed below. Full option list at this page.
CNV
Generating Panel of Normals (PON)
Somatic WES CNV requires PON files. Follow the two steps below to generate a CNV PON:
Target counts generation (per normal sample): Generate target counts for each individual normal sample as a baseline. Any options used for panel of normals generation (BED file, GC bias correction, etc.) must match the options used when processing the case sample.
Combined counts generation: Individual PON counts can be merged into a single <prefix>.combined.counts.txt.gz file.
$CNV_NORMALS_LIST is a single text file with the path to each target counts file generated in step 1 (either .target.counts.gz or .target.counts.gc-corrected.gz). The output is a PON file with the suffix .combined.counts.txt.gz. Use the PON file in case sample runs of DRAGEN CNV with the --cnv-combined-counts option.
For more information, see Panel of Normals.
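The list file described above can be assembled with standard shell tools. The following is a sketch only: the scratch directory, sample names, and list file name are hypothetical, and it assumes the normals were processed with GC bias correction, so only the .target.counts.gc-corrected.gz files are collected.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical layout: a scratch directory stands in for the step 1 output.
COUNTS_DIR="$(mktemp -d)"
touch "$COUNTS_DIR/normal1.target.counts.gc-corrected.gz" \
      "$COUNTS_DIR/normal2.target.counts.gc-corrected.gz"

CNV_NORMALS_LIST="$COUNTS_DIR/cnv_normals.txt"

# Collect one kind of counts file only, sorted, one absolute path per line.
find "$COUNTS_DIR" -type f -name '*.target.counts.gc-corrected.gz' | sort > "$CNV_NORMALS_LIST"

# Fail early if nothing was found.
[ -s "$CNV_NORMALS_LIST" ] || { echo "no target counts files found" >&2; exit 1; }

cat "$CNV_NORMALS_LIST"
```

If GC bias correction was not used for the normals, the pattern would instead be '*.target.counts.gz'.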
SNV
SNV library specific settings
SNV systematic noise file
Generic SNV noise files can be downloaded here: DRAGEN Software Support Site page
When possible, build a pipeline-specific systematic noise file that matches the library prep and sequencer of interest:
Step 1. Run DRAGEN somatic tumor-only on each of approximately 20-50 normal samples:
Gather the full paths to the VCFs from step 1 in ${VCF_LIST}, one file per line.
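One way to build ${VCF_LIST} is sketched below. The directory layout and the '*.vcf.gz' name pattern are assumptions about how the step 1 outputs were written, not something this recipe specifies; adjust both to match your runs.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical layout: one subdirectory per normal sample, each holding one VCF.
RUNS_DIR="$(mktemp -d)"
for i in 1 2 3; do
  mkdir -p "$RUNS_DIR/normal$i"
  touch "$RUNS_DIR/normal$i/normal$i.vcf.gz"
done

VCF_LIST="$RUNS_DIR/vcf_list.txt"

# One full path per line; the *.vcf.gz pattern is an assumed naming convention.
find "$RUNS_DIR" -type f -name '*.vcf.gz' | sort > "$VCF_LIST"

N_VCFS="$(grep -c '' "$VCF_LIST" || true)"

# The text suggests roughly 20-50 normals; warn rather than fail below that range.
if [ "$N_VCFS" -lt 20 ]; then
  echo "warning: only $N_VCFS VCFs listed; about 20-50 normals are suggested" >&2
fi
```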
Step 2. Generate the final noise file with:
To download a SINE/ALU regions bed for SNV excluded regions
ALUs comprise approximately 11% of the genome and are common in introns. High rates of false positive (FP) calls caused by deamination have been observed in some FFPE libraries. If the ALU regions are not clinically significant for a specific analysis, we recommend filtering out the entire ALU region using the DRAGEN excluded regions filter: --vc-excluded-regions-bed $BED.
The ALU bed file can be downloaded as part of the Bed File Collection: DRAGEN Software Support Site page
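Before passing a downloaded BED file to --vc-excluded-regions-bed, a quick sanity check can catch a truncated or malformed download. This is a sketch under assumptions: the two-record file below is a hypothetical stand-in for the real ALU BED, and the check (at least three tab-separated columns, start less than end) is a generic BED validation rather than anything DRAGEN requires.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the downloaded ALU BED file.
BED="$(mktemp)"
printf 'chr1\t100\t400\nchr1\t900\t1200\n' > "$BED"

# Count records with fewer than 3 columns or with start >= end, skipping header lines.
BAD_LINES="$(awk -F'\t' '!/^(#|track|browser)/ && (NF < 3 || $2 + 0 >= $3 + 0) { n++ } END { print n + 0 }' "$BED")"
[ "$BAD_LINES" -eq 0 ] || { echo "malformed BED records: $BAD_LINES" >&2; exit 1; }

# Option pair to append to the SNV portion of the command line.
SNV_EXCLUDE_OPTS=(--vc-excluded-regions-bed "$BED")
printf '%s\n' "${SNV_EXCLUDE_OPTS[*]}"
```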
SNV-SV deduplication
We recommend using --enable-variant-deduplication true to filter all small indels in the structural variant VCF that also appear, and pass, in the small variant VCF (PASS in the FILTER column of the small variant VCF file). With this feature, DRAGEN creates a new VCF containing the variants in the SV VCF that do not match a variant in the SNV VCF file. The new deduplicated SV VCF file has the prefix passed by --output-file-prefix, followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases. One use case is running both the SV and SNV callers in a somatic workflow, which can increase sensitivity; deduplication then prevents the same variant from being reported twice in genes such as FLT3 and KMT2A.
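The matching idea can be illustrated with a toy example. This is not DRAGEN's implementation: it uses headerless, made-up records, matches on an exact CHROM, POS, REF, ALT key, and skips the trimming and left-shifting normalization entirely. It only shows that an SV record is dropped when the same variant is PASS in the small variant file, and kept otherwise.

```bash
#!/usr/bin/env bash
set -euo pipefail

WORK="$(mktemp -d)"

# Toy, headerless VCF-like records: CHROM POS ID REF ALT QUAL FILTER
printf 'chr13\t100\t.\tA\tAGG\t.\tPASS\nchr11\t200\t.\tCT\tC\t.\tweak_evidence\n' > "$WORK/snv.tsv"
printf 'chr13\t100\t.\tA\tAGG\t.\tPASS\nchr11\t200\t.\tCT\tC\t.\tPASS\nchr2\t300\t.\tG\tGTTT\t.\tPASS\n' > "$WORK/sv.tsv"

# First file: remember PASS small variants by CHROM:POS:REF:ALT.
# Second file: keep only SV records whose key was not remembered.
awk -F'\t' '
  NR == FNR { if ($7 == "PASS") seen[$1 ":" $2 ":" $4 ":" $5] = 1; next }
  !(($1 ":" $2 ":" $4 ":" $5) in seen)
' "$WORK/snv.tsv" "$WORK/sv.tsv" > "$WORK/sv.dedup.tsv"

cat "$WORK/sv.dedup.tsv"
```

In this toy input the chr13 record is removed because it is PASS in both files, while the chr11 record is kept because its small variant counterpart did not pass.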
MSI
Microsatellite sites file
Microsatellite sites file can be downloaded here: DRAGEN Software Support Site page
Build normal references of microsatellite repeat distribution
Normal reference files can be generated by running collect-evidence mode on a panel of normal samples. This works ONLY with DRAGEN germline mode.
The --msi-microsatellites-file should be the same file used for running tumor-only mode. --msi-coverage-threshold should also be the same value used for running tumor-only mode.
A minimum of 20 normal samples is required for tumor-only mode.
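One way to keep the two MSI settings identical between the normal reference runs and the tumor-only run is to define them once and reuse them, with a guard on the size of the normal panel. The sketch below assumes hypothetical paths and sample names, and the threshold value is an arbitrary placeholder rather than a recommendation.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholder values.
MSI_SITES="/path/to/microsatellites_file"
MSI_COV_THRESHOLD=50   # arbitrary placeholder, not a recommended value

# Defined once and reused for both modes so the two runs cannot drift apart.
MSI_SHARED_OPTS=(--msi-microsatellites-file "$MSI_SITES"
                 --msi-coverage-threshold "$MSI_COV_THRESHOLD")

# Hypothetical panel of 20 normal sample IDs.
NORMALS=()
for ((i = 1; i <= 20; i++)); do
  NORMALS+=("normal$i")
done

# Tumor-only mode requires at least 20 normal samples.
MIN_NORMALS=20
if [ "${#NORMALS[@]}" -lt "$MIN_NORMALS" ]; then
  echo "need at least $MIN_NORMALS normals, got ${#NORMALS[@]}" >&2
  exit 1
fi

printf '%s\n' "${MSI_SHARED_OPTS[*]}"
```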
HLA