Somatic WGS Tumor Normal

DRAGEN Recipe - Somatic WGS Tumor Normal

Overview

This recipe is for processing whole genome sequencing data for somatic tumor normal workflows.

Example Command Line

For most scenarios, simply creating the union of the command line options from the single caller scenarios will work.

  • Configure the INPUT options

  • Configure the OUTPUT options

  • Configure MAP/ALIGN depending on whether realignment is desired

  • Configure the VARIANT CALLERs based on the application

  • Configure any additional options

  • Build up the necessary options for each component separately, so that they can be re-used in the final command line.

The following are partial templates that can be used as starting points. Adjust them accordingly for your specific use case.

#!/bin/bash
set -euo pipefail

# Path to DRAGEN hashtable
DRAGEN_HASH_TABLE=<REF_DIR>

# Path to output directory for the DRAGEN run
OUTPUT=<OUT_DIR>

# File prefix for DRAGEN output files
PREFIX=<OUT_PREFIX>

# Path to VC systematic noise BED file. In tumor-normal variant calling, this filter
# is recommended for removing systematic noise observed in normal samples. Prebuilt
# systematic noise files are available for download on the DRAGEN Software
# Support Site page:
# https://support.illumina.com/sequencing/sequencing_software/dragen-bio-it-platform/product_files.html
# Alternatively, running the somatic TO (tumor-only) pipeline on normal samples can
# generate a systematic noise file. We recommend using a systematic noise file based
# on normal samples that match the library prep of the tumor samples.
VC_SYSTEMATIC_NOISE_FILE=<VC_SYSTEMATIC_NOISE_BED_FILE_PATH>

# Define the input sources: select fastq list, fastq, bam, or cram. Keep only the
# block you use, because set -u aborts the script on any unset variable.
INPUT_FASTQ_LIST="
  --tumor-fastq-list $TUMOR_FASTQ_LIST \
  --tumor-fastq-list-sample-id $TUMOR_FASTQ_LIST_SAMPLE_ID \
  --fastq-list $FASTQ_LIST \
  --fastq-list-sample-id $FASTQ_LIST_SAMPLE_ID \
"

INPUT_FASTQ="
  --tumor-fastq1 $TUMOR_FASTQ1 \
  --tumor-fastq2 $TUMOR_FASTQ2 \
  --RGSM-tumor $RGSM_TUMOR \
  --RGID-tumor $RGID_TUMOR \
  --fastq-file1 $FASTQ1 \
  --fastq-file2 $FASTQ2 \
  --RGSM $RGSM \
  --RGID $RGID \
"

INPUT_BAM="
  --tumor-bam-input $TUMOR_BAM \
  --bam-input $BAM \
"

INPUT_CRAM="
  --tumor-cram-input $TUMOR_CRAM \
  --cram-input $CRAM \
"

# Select the input source; this example uses INPUT_FASTQ_LIST
INPUT_OPTIONS="
  --ref-dir $DRAGEN_HASH_TABLE \
  $INPUT_FASTQ_LIST \
"

OUTPUT_OPTIONS="
  --output-directory $OUTPUT \
  --output-file-prefix $PREFIX \
"

MA_OPTIONS="
  --enable-map-align true \
  --enable-sort true \
  --enable-duplicate-marking true \
"

CNV_OPTIONS="
  --enable-cnv true \
  --cnv-use-somatic-vc-baf true \
"

# HRD requires enabling CNV
HRD_OPTIONS="
  --enable-hrd=true \
"

SNV_OPTIONS="
  --enable-variant-caller true \
  --vc-systematic-noise $VC_SYSTEMATIC_NOISE_FILE \
"

SV_OPTIONS="
  --enable-sv true \
"

SNV_SV_DEDUPLICATION_OPTIONS="
  --enable-variant-deduplication true \
"

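# TMB requires the Nirvana variant annotation settings below. Replace PATH with the
# annotation data location and set the assembly to GRCh37 or GRCh38.
TMB_OPTIONS="
  --enable-tmb=true \
  --enable-variant-annotation=true \
  --variant-annotation-data=PATH \
  --variant-annotation-assembly=GRCh37/8 \
"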

MSI_OPTIONS="
  --msi-command=tumor-normal \
  --msi-coverage-threshold=60 \
  --msi-microsatellites-file=$MSI_MICROSATELLITES_FILE \
"

# Set --hla-enable-class-2 only if coverage is sufficient for class II HLA typing
HLA_OPTIONS="
  --enable-hla=true \
  --hla-enable-class-2=true \
"

# Construct final command line
CMD="
  dragen \
  $INPUT_OPTIONS \
  $OUTPUT_OPTIONS \
  $MA_OPTIONS \
  $CNV_OPTIONS \
  $SNV_OPTIONS \
  $SV_OPTIONS \
  $SNV_SV_DEDUPLICATION_OPTIONS \
  $HRD_OPTIONS \
  $TMB_OPTIONS \
  $MSI_OPTIONS \
  $HLA_OPTIONS \
"

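# Execute. CMD must be quoted: unquoted, bash -c would receive only the first word
# ("dragen") as the command and drop every option.
echo "$CMD"
bash -c "$CMD"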
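
Building the command as one string relies on word splitting, so it breaks on paths that contain spaces. The following is a minimal sketch of the same assembly using bash arrays, shown for a subset of the components. All variable values are hypothetical placeholders; the DRAGEN options are the ones used in the template above. The sketch only prints the command (dry run).

```bash
#!/bin/bash
set -euo pipefail

# Hypothetical placeholder values; replace with real paths and sample IDs.
DRAGEN_HASH_TABLE="/path/to/ref dir"
OUTPUT="/path/to/out"
PREFIX="sample1"
TUMOR_FASTQ_LIST="/path/to/tumor_fastq_list.csv"
TUMOR_FASTQ_LIST_SAMPLE_ID="tumor1"
FASTQ_LIST="/path/to/normal_fastq_list.csv"
FASTQ_LIST_SAMPLE_ID="normal1"

# Each component is an array, so every option and value stays a single word.
INPUT_OPTIONS=(
  --ref-dir "$DRAGEN_HASH_TABLE"
  --tumor-fastq-list "$TUMOR_FASTQ_LIST"
  --tumor-fastq-list-sample-id "$TUMOR_FASTQ_LIST_SAMPLE_ID"
  --fastq-list "$FASTQ_LIST"
  --fastq-list-sample-id "$FASTQ_LIST_SAMPLE_ID"
)
OUTPUT_OPTIONS=(--output-directory "$OUTPUT" --output-file-prefix "$PREFIX")
MA_OPTIONS=(--enable-map-align true --enable-sort true --enable-duplicate-marking true)
SV_OPTIONS=(--enable-sv true)

# Construct the final command line from the component arrays.
CMD=(dragen "${INPUT_OPTIONS[@]}" "${OUTPUT_OPTIONS[@]}" "${MA_OPTIONS[@]}" "${SV_OPTIONS[@]}")

# Dry run: print the command with shell quoting applied.
printf '%q ' "${CMD[@]}"; echo
# To execute instead of printing, run: "${CMD[@]}"
```

The remaining components (CNV, SNV, HRD, TMB, MSI, HLA) can be added as further arrays in the same way.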

Additional Notes and Options

Optional settings per component are listed below. Full option list at this page.

CNV

SNV

SNV library specific settings

SNV systematic noise file

Generic SNV noise files can be downloaded here: DRAGEN Software Support Site page

When possible, build a pipeline-specific systematic noise file that matches the library prep and sequencer of interest:

Step 1. Run DRAGEN somatic tumor-only on each of approximately 20-50 normal samples:

### Choose one input source and comment out the other.
### i) BAM
INPUT="--tumor-bam-input ${NORMAL_BAM}"
### ii) FASTQs
# INPUT="--tumor-fastq-list ${NORMAL_FASTQ_LIST} \
#   --tumor-fastq-list-sample-id ${NORMAL_FASTQ_LIST_SAMPLE_ID}"
###

# REF_TYPE is GRCh37 or GRCh38
dragen \
-r ${REFERENCE} \
${INPUT} \
--vc-detect-systematic-noise=true \
--vc-enable-germline-tagging=true \
--enable-variant-annotation=true \
--variant-annotation-data ${NIRVANA_ANNOTATION_FOLDER} \
--variant-annotation-assembly ${REF_TYPE} \
--intermediate-results-dir ${TMP} \
--output-directory ${DIR} \
--output-file-prefix ${PREFIX}
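
Because Step 1 is repeated for every normal sample, it can help to generate the commands in a loop. The following is a minimal sketch that prints one command per normal BAM (dry run). It assumes a hypothetical text file listing one BAM path per line; the function name, the placeholder paths, and the choice to derive the output prefix from the BAM file name are all illustrative assumptions.

```bash
# Hypothetical placeholder values; replace with real paths.
REFERENCE="/path/to/hashtable"
NIRVANA_ANNOTATION_FOLDER="/path/to/nirvana"
REF_TYPE="GRCh38"
TMP="/path/to/tmp"
OUT_ROOT="/path/to/noise_runs"

# Print (dry run) one tumor-only command per normal BAM listed in the given file.
print_noise_commands() {
  bam_list="$1"
  while IFS= read -r bam; do
    [ -n "$bam" ] || continue          # skip blank lines
    sample=$(basename "$bam" .bam)     # assumption: prefix taken from the file name
    echo "dragen -r $REFERENCE --tumor-bam-input $bam" \
         "--vc-detect-systematic-noise=true --vc-enable-germline-tagging=true" \
         "--enable-variant-annotation=true" \
         "--variant-annotation-data $NIRVANA_ANNOTATION_FOLDER" \
         "--variant-annotation-assembly $REF_TYPE" \
         "--intermediate-results-dir $TMP" \
         "--output-directory $OUT_ROOT/$sample --output-file-prefix $sample"
  done < "$bam_list"
}
```

For example, `print_noise_commands normals.txt` prints the commands for review before they are run or submitted to a scheduler.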

Gather the full paths to the VCFs from Step 1 in ${VCF_LIST}, one file per line.
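
One way to produce such a list is sketched below. The function name is hypothetical, and the `*.vcf.gz` pattern is an assumption: narrow it so that it matches only the small variant VCFs written by your Step 1 runs.

```bash
# Write the full paths of all VCFs under a directory to a list file, one per line.
# Assumption: the "*.vcf.gz" pattern selects the Step 1 VCFs; adjust as needed.
make_vcf_list() {
  search_dir="$1"
  vcf_list="$2"
  abs_dir=$(cd "$search_dir" && pwd)   # resolve to an absolute path first
  find "$abs_dir" -type f -name '*.vcf.gz' | sort > "$vcf_list"
}
```

For example, `make_vcf_list /path/to/noise_runs vcf_list.txt` writes the list to vcf_list.txt, which can then be used as ${VCF_LIST}.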

Step 2. Generate the final noise file with:

# --build-sys-noise-method=max sets the default noise mode for this noise file by
# tagging the noise file header with '##NoiseMethod=max'
dragen \
-r ${REF_DIR} \
--build-sys-noise-vcfs-list ${VCF_LIST} \
--build-sys-noise-method=max \
--output-directory ${DIR} \
--output-file-prefix ${PREFIX}

To download a SINE/ALU regions bed for SNV excluded regions

ALUs comprise approximately 11% of the genome and are common in introns. High rates of false positive (FP) calls caused by deamination have been observed in some FFPE libraries. If the ALU regions are not clinically significant for a specific analysis, we recommend filtering out the ALU regions entirely with the DRAGEN excluded regions filter: --vc-excluded-regions-bed $BED.
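
As a sketch, the excluded regions filter can be added to the SNV component of the template as follows. ALU_BED and both paths are hypothetical placeholders.

```bash
# Hypothetical placeholder paths; replace with the downloaded files.
VC_SYSTEMATIC_NOISE_FILE="/path/to/systematic_noise.bed.gz"
ALU_BED="/path/to/alu_regions.bed"

SNV_OPTIONS="
  --enable-variant-caller true \
  --vc-systematic-noise $VC_SYSTEMATIC_NOISE_FILE \
  --vc-excluded-regions-bed $ALU_BED \
"

# Print the resulting options on one line for review.
echo $SNV_OPTIONS
```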

The ALU bed file can be downloaded as part of the Bed File Collection: DRAGEN Software Support Site page

SV

Generating SV systematic noise BEDPE file

You can generate systematic noise BEDPE files from normal samples collected using the same library prep, sequencing system, and panels as the tumor samples.

To build the SV systematic noise file

  1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.

  2. Build the BEDPE file from those VCFs with --sv-build-systematic-noise-vcfs-list, which takes a file listing the VCFs from the previous step, one VCF per line. An example command line is provided below.

dragen \
-r <HASHTABLE> \
--sv-build-systematic-noise-vcfs-list <LIST OF VCF FILES> \
--output-directory <OUTPUT> \
--output-file-prefix <SAMPLE>
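
The command for step 1 is not shown above. A minimal sketch of what it might look like for a single normal BAM is given below as a dry run that only prints the command. The function name and its arguments are hypothetical, and the option set is assembled from options that appear elsewhere in this recipe; your run may need additional options.

```bash
# Print (dry run) a step 1 command for one normal sample.
# Arguments (all placeholders): hashtable dir, normal BAM, output dir, output prefix.
print_sv_noise_step1() {
  echo "dragen -r $1 --tumor-bam-input $2 --enable-sv true" \
       "--sv-detect-systematic-noise true" \
       "--output-directory $3 --output-file-prefix $4"
}

print_sv_noise_step1 /path/to/hashtable /data/normal1.bam /path/to/out normal1
```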

You can also build systematic noise BEDPE files in the cloud using the DRAGEN Baseline Builder App on BaseSpace.

SNV-SV deduplication

We recommend using --enable-variant-deduplication true to filter out small indels in the structural variant VCF that also appear, and pass filters, in the small variant VCF (PASS in the FILTER column of the small variant VCF file). With this option, DRAGEN creates a new VCF containing the variants in the SV VCF that do not match a variant in the SNV VCF. The name of the deduplicated SV VCF file is the prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases. This feature is useful when both the SV and SNV callers are enabled in a somatic workflow: running both callers can increase sensitivity, and deduplication prevents the same variant from being reported twice in genes such as FLT3 and KMT2A.

MSI

Microsatellite sites file

The microsatellite sites file can be downloaded here: DRAGEN Software Support Site page

HLA
