DNA Somatic Tumor-Normal Solid WGS UMI
The DRAGEN recipe includes the recommended pipeline-specific commands.
New in DRAGEN V4.3.17:
The somatic tumor-normal pipeline now natively supports UMIs. UMIs can be present in either the tumor sample alone or in both the tumor and normal samples. It is no longer necessary to run a separate pre-processing step to generate BAMs. In addition, DRAGEN V4.3.17 now also supports starting from BAM or CRAM files with the --enable-map-align=true option enabled.
Important Note for Earlier V4.3.X Versions:
In these versions, UMI support is limited. You can only use UMIs in tumor-only mode or by first performing UMI collapsing separately on both the tumor and normal samples. After collapsing, the resulting BAM files can be used as input for the variant calling step.
Notes and additional options
Hashtable
For DRAGEN somatic runs it is recommended to use the linear hashtable.
Input options
DRAGEN input sources include: FASTQ list, FASTQ, BAM, or CRAM.
FQ list Input
FQ Input
BAM Input
CRAM Input
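Each of the four input types uses its own pair of tumor and normal flags. The sketch below prints the flag pair for one input type. The flag names are recalled from the DRAGEN user guide rather than taken from this page, and the file names, read group names, and sample IDs are placeholders, so verify them against your DRAGEN version before use.

```shell
#!/bin/sh
# Sketch only: print the tumor/normal input flags for one input type.
# Flag names are assumptions recalled from the DRAGEN user guide;
# file names, read groups, and sample IDs are placeholders.
input_args() {
  case "$1" in
    fastq-list)
      echo "--tumor-fastq-list tumor_fastq_list.csv --tumor-fastq-list-sample-id TUMOR_ID --fastq-list normal_fastq_list.csv --fastq-list-sample-id NORMAL_ID" ;;
    fastq)
      echo "--tumor-fastq1 tumor_R1.fastq.gz --tumor-fastq2 tumor_R2.fastq.gz --RGID-tumor TUMOR_RG --RGSM-tumor TUMOR_ID --fastq-file1 normal_R1.fastq.gz --fastq-file2 normal_R2.fastq.gz --RGID NORMAL_RG --RGSM NORMAL_ID" ;;
    bam)
      echo "--tumor-bam-input tumor.bam --bam-input normal.bam" ;;
    cram)
      echo "--tumor-cram-input tumor.cram --cram-input normal.cram" ;;
    *)
      echo "unknown input type: $1" >&2
      return 1 ;;
  esac
}

# Print the flags for BAM input.
input_args bam
```

The printed flags are meant to be appended to the rest of the recipe command line; the function itself never calls DRAGEN.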
Mapping and Aligning
--enable-map-align true
In the TN pipeline this must be set to false for BAM/CRAM input.
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
Fractional (Raw Reads) Downsampling
DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).
Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses it is also recommended to use a normal sample with coverage that is lower than that of the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler may be used to reduce coverage on the normal sample.
--enable-fractional-down-sampler
Set to true to enable fractional downsampling. The default value is false.
--down-sampler-normal-subsample
Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).
--down-sampler-tumor-subsample
Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).
--down-sampler-random-seed
Specify the random seed for different runs of the same input data. The default value is 42.
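As an illustration of the downsampling options above, the following sketch derives the fraction of normal reads to keep from a measured and a desired coverage, then prints the matching flags. The coverage values are hypothetical; only the four flags come from this page.

```shell
#!/bin/sh
# Hypothetical mean coverages; replace with measured values.
NORMAL_COV=120          # measured coverage of the matched normal
TARGET_NORMAL_COV=40    # desired normal coverage, below the tumor coverage

# Fraction of normal reads to keep, rounded to two decimals.
NORMAL_FRACTION=$(awk -v t="$TARGET_NORMAL_COV" -v n="$NORMAL_COV" \
  'BEGIN { printf "%.2f", t / n }')

# Keep all tumor reads (1.0) and use the default seed (42).
DOWNSAMPLE_ARGS="--enable-fractional-down-sampler true --down-sampler-normal-subsample $NORMAL_FRACTION --down-sampler-tumor-subsample 1.0 --down-sampler-random-seed 42"
echo "$DOWNSAMPLE_ARGS"
```

Keeping the seed fixed makes repeated runs on the same input select the same reads; change it to draw a different random subsample.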
UMI
--umi-source STRING
Specify the input type for the UMI sequence. Options: qname, fastq, bamtag.
--umi-library-type STRING
Set the batch option for the different UMI correction modes. Options: random-duplex, random-simplex, nonrandom-duplex.
--umi-nonrandom-whitelist $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.
--umi-correction-table $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.
--umi-min-supporting-reads INT
Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2.
--umi-metrics-interval-file $BED
Target region in BED format.
--umi-emit-multiplicity both
--umi-start-mask-length INT
Number of additional bases to ignore from the start of the read. The default is 0. To reduce false positives, optionally set to 1.
--umi-end-mask-length INT
Number of additional bases to ignore from the end of the read. The default is 0. To reduce false positives, optionally set to 3.
--tumor-normal-has-umi STRING
Specify if only the tumor, or if both the tumor and normal have UMIs. Options: 'both','tumor'.
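The UMI options above can be combined as in the following sketch, which assembles the flags for a given library type and appends a whitelist or a correction table only for nonrandom UMIs. The chosen values (qname, both, 2) and the file names are illustrative assumptions, not recommendations from this page.

```shell
#!/bin/sh
# Sketch only: assemble UMI flags from this section for one library type.
# Arguments: library type, whitelist path (may be empty),
#            correction table path (may be empty).
umi_args() {
  args="--umi-source qname --umi-library-type $1 --umi-min-supporting-reads 2 --tumor-normal-has-umi both"
  case "$1" in
    nonrandom-*)
      # Nonrandom UMIs take either a whitelist or a correction table.
      if [ -n "$2" ]; then
        args="$args --umi-nonrandom-whitelist $2"
      elif [ -n "$3" ]; then
        args="$args --umi-correction-table $3"
      fi ;;
  esac
  echo "$args"
}

# Random duplex UMIs need no extra file; nonrandom UMIs use a whitelist here.
umi_args random-duplex "" ""
umi_args nonrandom-duplex "umi_whitelist.txt" ""
```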
SNV
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance over which phased variants will be combined. Set to 0 to disable. Valid range is 0 to 15 (default=2).
--vc-systematic-noise $PATH
Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples.
--vc-somatic-hotspots $PATH
DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.
--vc-enable-liquid-tumor-mode true
Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.
--vc-override-tumor-pcr-params-with-normal false
Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.
--vc-sq-filter-threshold $INT
Threshold for sensitivity-specificity tradeoff. The default threshold is 3. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-excluded-regions-bed $BED
HLA
--enable-hla
Enable HLA typer (this setting by default will only genotype class 1 genes)
--hla-as-filter-min-threshold
Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.
--hla-as-filter-ratio-threshold
Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.
--hla-enable-class-2
Extend genotyping to HLA class 2 genes (default=true).
CNV
--cnv-enable-gcbias-correction true
Enable or disable GC bias correction when generating target counts.
--cnv-segmentation-mode $SEG_MODE
Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.
--cnv-segmentation-bed $PATH
If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.
--cnv-normal-cnv-vcf $CNV_NORMAL_VCF
Annotation
TMB
--tmb-vaf-threshold FLOAT
Variant minimum allele frequency for usable variants (default=0.05).
--vc-callability-tumor-thresh INT
Required read coverage to use a site (default=50).
--tmb-enable-proxi-filter BOOL
Use variant VAF information to increase germline filtering. Recommended for Tumor-Only, but not for Tumor-Normal. May be overly aggressive at tagging variants as germline (default=false).
MSI
--msi-coverage-threshold INT
Minimum coverage for a microsatellite: 60 (default)
--msi-distance-threshold FLOAT
Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default)
SV
--sv-call-regions-bed
Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format.
--sv-exclusion-bed
Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.
--enable-variant-deduplication true
Relevant when both SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent the occurrence of replicated variants within genes such as FLT3 and KMT2A. Filters all small indels in the structural variant VCF that also appear, and are passing, in the small variant VCF. DRAGEN creates a new VCF containing the variants from the SV VCF that do not match a variant in the SNV VCF. The new deduplicated SV VCF file has the prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.
--sv-systematic-noise $BEDPE
Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bgzip compressed). Optional for Tumor-Normal, but strongly recommended for Tumor-Only.
--sv-somatic-ins-tandup-hotspot-regions-bed $BED
Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with 300 bp of padding on both sides.
--sv-min-candidate-variant-size
Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.
--sv-min-scored-variant-size
After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.
--sv-enable-liquid-tumor-mode true
DRAGEN can account for Tumor-in-Normal (TiN) contamination by running liquid tumor mode.
--sv-tin-contam-tolerance $TIN_CONTAM_TOLERANCE
Set the Tumor-in-Normal (TiN) contamination tolerance level.
Resource Files
DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.
SNV Systematic Noise
Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.
Prebuilt
WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FF
FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FFPE (only hg38)
WES_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WES FF and FFPE
Custom
Prebuilt systematic noise files are available for WES or WGS applications. For these applications, it is considered optional to build custom noise files. For high-sensitivity applications, including panels, it is required to build custom noise files. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30–70 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.
For WES and WGS pipelines, gather the full paths to the small variant hard-filtered VCFs (not GVCFs) from step 1 and create a list file ${VCF_LIST} with one file path per line.
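The list file ${VCF_LIST} can be assembled with a short script such as the sketch below. It assumes a hypothetical layout with one DRAGEN output directory per normal sample under a common root, and it assumes the small variant VCFs follow the `<prefix>.hard-filtered.vcf.gz` naming pattern; adjust both to match your runs.

```shell
#!/bin/sh
# Sketch only: write the full paths of all hard-filtered small variant VCFs
# found under a root directory to a list file, one path per line.
# Arguments: root directory of the step 1 runs, output list file.
# The file name pattern is an assumption; it skips *.hard-filtered.gvcf.gz
# as well as SV and CNV VCFs.
build_vcf_list() {
  runs_dir=$(cd "$1" && pwd) || return 1   # resolve to an absolute path
  find "$runs_dir" -type f -name '*.hard-filtered.vcf.gz' | sort > "$2"
}

# Example call (paths are placeholders):
#   build_vcf_list ./normals_tumor_only ./snv_noise_vcfs.txt
```

Sorting the paths keeps the list stable between invocations, which makes it easier to confirm that the expected number of normals was picked up.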
Step 2. Generate the final noise file.
SV Systematic Noise
Systematic noise files are also recommended for Tumor-Normal workflows, but are considered essential for reducing FP calls in Tumor-Only workflows.
Prebuilt
WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz
For WGS/WES FF/FFPE
IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz
For HEME
Custom
It is considered optional to build a custom systematic noise file for WES or WGS applications, but for high-sensitivity applications like panels it is strongly recommended. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-100 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.
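For a cohort of normals, step 1 amounts to one tumor-only run per sample. The sketch below only prints a candidate command per normal BAM so the commands can be reviewed before anything is launched. Apart from --sv-detect-systematic-noise, --enable-map-align, and --output-file-prefix, the flag names (--ref-dir, --tumor-bam-input, --enable-sv, --output-directory) are recalled from the DRAGEN user guide rather than taken from this page, and all paths are placeholders.

```shell
#!/bin/sh
# Sketch only: print (do not run) one tumor-only command per normal BAM.
# Arguments: reference directory, output root, then one or more BAM paths.
# Flag names other than --sv-detect-systematic-noise are assumptions;
# check them against your DRAGEN version.
sv_noise_cmds() {
  ref_dir=$1
  out_root=$2
  shift 2
  for bam in "$@"; do
    sample=$(basename "$bam" .bam)   # sample name taken from the file name
    echo "dragen --ref-dir $ref_dir --tumor-bam-input $bam --enable-map-align false --enable-sv true --sv-detect-systematic-noise true --output-directory $out_root/$sample --output-file-prefix $sample"
  done
}

# Prints two command lines, one per normal sample.
sv_noise_cmds /path/to/ref normals_out normal1.bam normal2.bam
```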
Step 2. Build the BEDPE file using input VCFs from previous step.