Somatic Tumor Normal with UMI

DRAGEN Recipe - Somatic UMI Tumor Normal

Overview

This recipe is for processing sequencing data with unique molecular identifiers (UMIs) in somatic tumor-normal workflows.

Example Command Line

For Somatic UMI Tumor Normal inputs, the tumor and normal samples must be run separately through the Map/Align stage. Variant Calling is then started from the tumor and normal UMI-collapsed BAMs.

For Map/Align stage:

  • Configure the INPUT options

  • Configure the OUTPUT options

  • Configure MAP/ALIGN

  • Configure UMI options

For Variant Calling stage:

  • Configure the INPUT options

  • Configure the OUTPUT options

  • Configure the VARIANT CALLERs based on the application

  • Configure any additional options

  • Build up the necessary options for each component separately, so that they can be re-used in the final command line.

We recommend using a linear (non-pangenome) reference for somatic analysis. For more details, refer to DRAGEN Reference Support.

The following are partial templates that can be used as starting points. Adjust them accordingly for your specific use case.

Map/Align stage

#!/bin/bash
set -euo pipefail

# Path to DRAGEN hashtable
DRAGEN_HASH_TABLE=<REF_DIR>

# Path to output directory for the DRAGEN run
OUTPUT=<OUT_DIR>

# File prefix for DRAGEN output files
PREFIX=<OUT_PREFIX>

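# Define the input sources: select FASTQ list, FASTQ, BAM, or CRAM. Each Map/Align run takes either the tumor or the normal UMI input and generates one collapsed BAM; this example uses the tumor input options.
# Because of 'set -u', every variable referenced below must be set; remove the definitions for input types you do not use.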
INPUT_FASTQ_LIST="
  --tumor-fastq-list $TUMOR_FASTQ_LIST \
  --tumor-fastq-list-sample-id $TUMOR_FASTQ_LIST_SAMPLE_ID \
"

INPUT_FASTQ="
  --tumor-fastq1 $TUMOR_FASTQ1 \
  --tumor-fastq2 $TUMOR_FASTQ2 \
  --RGSM-tumor $RGSM_TUMOR \
  --RGID-tumor $RGID_TUMOR \
"

INPUT_BAM="
  --tumor-bam-input $TUMOR_BAM \
"

INPUT_CRAM="
  --tumor-cram-input $TUMOR_CRAM \
"

# Select the input source; this example uses INPUT_FASTQ_LIST
INPUT_OPTIONS="
  --ref-dir $DRAGEN_HASH_TABLE \
  $INPUT_FASTQ_LIST \
"

OUTPUT_OPTIONS="
  --output-directory $OUTPUT \
  --output-file-prefix $PREFIX \
"

MA_OPTIONS="
  --enable-map-align true \
  --enable-sort true \
"

UMI_OPTIONS="
  --enable-umi true \
  --umi-source $UMI_SOURCE \
  --umi-library-type $UMI_LIBRARY_TYPE \
"

# Construct final command line
CMD="
  dragen \
  $INPUT_OPTIONS \
  $OUTPUT_OPTIONS \
  $MA_OPTIONS \
  $UMI_OPTIONS \
"

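# Execute. $CMD is left unquoted on purpose: word splitting turns the multi-line
# option strings into separate arguments, so paths must not contain spaces.
echo $CMD
$CMD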
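Because the tumor and normal samples go through Map/Align in separate runs, it can help to generate both commands from a single definition. The sketch below builds the command as a bash array, which keeps arguments intact without relying on word splitting. The function name ma_cmd and the DRY_RUN switch are hypothetical, and passing the normal sample with --fastq-list/--fastq-list-sample-id is an assumption to check against the option reference for your DRAGEN version.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: print (default) or run the UMI Map/Align command for one sample.
# Usage: ma_cmd <tumor|normal> <fastq_list> <sample_id> <output_dir> <output_prefix>
# Expects DRAGEN_HASH_TABLE, UMI_SOURCE and UMI_LIBRARY_TYPE to be set; DRY_RUN=0 runs DRAGEN.
ma_cmd() {
  local role="$1" fastq_list="$2" sample_id="$3" out_dir="$4" prefix="$5"
  local -a input
  case "$role" in
    tumor)  input=(--tumor-fastq-list "$fastq_list" --tumor-fastq-list-sample-id "$sample_id") ;;
    # ASSUMPTION: the normal sample uses the non-tumor FASTQ list options.
    normal) input=(--fastq-list "$fastq_list" --fastq-list-sample-id "$sample_id") ;;
    *) echo "unknown role: $role (expected tumor or normal)" >&2; return 1 ;;
  esac
  local -a cmd=(
    dragen
    --ref-dir "$DRAGEN_HASH_TABLE"
    "${input[@]}"
    --output-directory "$out_dir"
    --output-file-prefix "$prefix"
    --enable-map-align true
    --enable-sort true
    --enable-umi true
    --umi-source "$UMI_SOURCE"
    --umi-library-type "$UMI_LIBRARY_TYPE"
  )
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    printf '%q ' "${cmd[@]}"   # shell-quoted, so the printed line can be copy-pasted
    echo
  else
    "${cmd[@]}"
  fi
}
```

For example, call ma_cmd tumor with the tumor FASTQ list, sample ID, and an output directory of its own, then ma_cmd normal with the normal sample's inputs and a separate output directory. Review the printed commands, then set DRY_RUN=0 to execute them.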

Variant Calling (and optional biomarkers) stage:

#!/bin/bash
set -euo pipefail

# Path to DRAGEN hashtable
DRAGEN_HASH_TABLE=<REF_DIR>

# Path to output directory for the DRAGEN run
OUTPUT=<OUT_DIR>

# File prefix for DRAGEN output files
PREFIX=<OUT_PREFIX>

# Path to the VC systematic noise BED file. In tumor-normal variant calling, this filter
# is recommended for removing systematic noise observed in normal samples. Prebuilt
# systematic noise files are available for download on the DRAGEN Software Support
# Site page:
# https://support.illumina.com/sequencing/sequencing_software/dragen-bio-it-platform/product_files.html
# Alternatively, running the somatic tumor-only (TO) pipeline on normal samples can
# generate a systematic noise file. We recommend using a systematic noise file built
# from normal samples that match the library prep of the tumor samples.
VC_SYSTEMATIC_NOISE_FILE=<VC_SYSTEMATIC_NOISE_BED_FILE_PATH>

# UMI-collapsed BAMs from the Map/Align stage: tumor via --tumor-bam-input, normal via --bam-input
INPUT_BAM="
  --tumor-bam-input $TUMOR_BAM \
  --bam-input $BAM \
"

INPUT_OPTIONS="
  --ref-dir $DRAGEN_HASH_TABLE \
  $INPUT_BAM \
"

OUTPUT_OPTIONS="
  --output-directory $OUTPUT \
  --output-file-prefix $PREFIX \
"

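# Enable exactly one UMI variant calling mode: --vc-enable-umi-solid for solid tumor
# samples, or --vc-enable-umi-liquid for liquid biopsy samples (swap the line below).
SNV_OPTIONS="
  --enable-variant-caller true \
  --vc-enable-umi-solid true \
  --vc-target-bed $VC_TARGET_BED \
  --vc-systematic-noise $VC_SYSTEMATIC_NOISE_FILE \
  --vc-systematic-noise-method mean \
"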

CNV_OPTIONS="
  --enable-cnv true \
  --cnv-target-bed $CNV_TARGET_BED \
  --cnv-combined-counts $CNV_PANEL_OF_NORMALS \
  --cnv-use-somatic-vc-baf true \
"

# HRD requires enabling CNV
HRD_OPTIONS="
  --enable-hrd=true \
"

SV_OPTIONS="
  --enable-sv true \
  --sv-exome true \
  --sv-call-regions-bed $SV_TARGET_BED \
"

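# TMB requires variant annotation (Nirvana). NIRVANA_ANNOTATION_FOLDER is the Nirvana
# annotation data path; REF_TYPE is GRCh37 or GRCh38, matching the reference.
TMB_OPTIONS="
  --enable-tmb=true \
  --enable-variant-annotation=true \
  --variant-annotation-data=$NIRVANA_ANNOTATION_FOLDER \
  --variant-annotation-assembly=$REF_TYPE \
"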

MSI_OPTIONS="
  --msi-command=tumor-normal \
  --msi-coverage-threshold=60 \
  --msi-microsatellites-file=$MSI_MICROSATELLITES_FILE \
"

# Enable --hla-enable-class-2 only if the panel has sufficient coverage for class II HLA typing.
HLA_OPTIONS="
  --enable-hla=true \
  --hla-as-filter-min-threshold=29.0 \
  --hla-as-filter-ratio-threshold=0.85 \
  --hla-enable-class-2=true \
"

# Construct final command line
CMD="
  dragen \
  $INPUT_OPTIONS \
  $OUTPUT_OPTIONS \
  $SNV_OPTIONS \
  $CNV_OPTIONS \
  $HRD_OPTIONS \
  $SV_OPTIONS \
  $TMB_OPTIONS \
  $MSI_OPTIONS \
  $HLA_OPTIONS \
"

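# Execute. $CMD is left unquoted on purpose: word splitting turns the multi-line
# option strings into separate arguments, so paths must not contain spaces.
echo $CMD
$CMD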
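The Variant Calling template expects $TUMOR_BAM and $BAM to point at the UMI-collapsed BAMs from the two separate Map/Align runs. A small guard that resolves those paths and fails early when a BAM is missing could look like the sketch below. The helper name is hypothetical, and the <output-directory>/<output-file-prefix>.bam naming is an assumption; confirm the collapsed BAM's file name in your Map/Align output directory.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: print the path of the collapsed BAM from one Map/Align run.
# ASSUMPTION: the BAM is named <output-directory>/<output-file-prefix>.bam.
collapsed_bam() {
  local out_dir="$1" prefix="$2"
  local bam="$out_dir/$prefix.bam"
  if [[ ! -s "$bam" ]]; then          # missing or zero-length file
    echo "missing or empty BAM: $bam" >&2
    return 1
  fi
  printf '%s\n' "$bam"
}

# Example (TUMOR_MA_OUT, NORMAL_MA_OUT and the prefixes are hypothetical names):
#   TUMOR_BAM=$(collapsed_bam "$TUMOR_MA_OUT" "$TUMOR_PREFIX")
#   BAM=$(collapsed_bam "$NORMAL_MA_OUT" "$NORMAL_PREFIX")
```

Run these assignments before INPUT_BAM is defined; with set -e, a failed lookup stops the script before DRAGEN starts.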

Additional Notes and Options

Optional settings per component are listed below. The full option list is available at this page.

UMI

SNV

SNV library specific settings

SNV systematic noise file

Generic SNV noise files can be downloaded from the DRAGEN Software Support Site page.

However, for UMI data and panels, we strongly recommend building a custom systematic noise file as follows:

Step 1. Run DRAGEN somatic tumor-only on each of approximately 20-50 normal samples:

### choose input either from
### i) BAM
INPUT="--tumor-bam-input ${NORMAL_BAM}"
### ii) FASTQs
INPUT="--tumor-fastq-list ${NORMAL_FASTQ_LIST} \
  --tumor-fastq-list-sample-id ${NORMAL_FASTQ_LIST_SAMPLE_ID}"
###

# --vc-detect-systematic-noise-mode=UMI detects the ultra-low noise levels relevant for UMI panels.
# ${REF_TYPE} is GRCh37 or GRCh38.
dragen \
-r ${REFERENCE} \
${INPUT} \
--vc-detect-systematic-noise=true \
--vc-detect-systematic-noise-mode=UMI \
--vc-enable-germline-tagging=true \
--enable-variant-annotation=true \
--variant-annotation-data ${NIRVANA_ANNOTATION_FOLDER} \
--variant-annotation-assembly ${REF_TYPE} \
--intermediate-results-dir ${TMP} \
--output-directory ${DIR} \
--output-file-prefix ${PREFIX}

List the full paths to the VCFs from step 1 in a text file, ${VCF_LIST}, one file per line.
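One way to build ${VCF_LIST} is to search the step 1 output directories. This sketch assumes each normal was run into its own directory under a common results root; the helper name is hypothetical, and the VCF file-name pattern is a parameter because the exact step 1 output name should be confirmed in your run directories. It also warns when the count falls outside the roughly 20-50 normals recommended for step 1.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: write the absolute paths of VCFs matching a name pattern,
# one per line, to a list file, and print how many were found.
# Usage: make_vcf_list <step1_results_root> <name_pattern> <list_file>
make_vcf_list() {
  local root="$1" pattern="$2" list="$3" n
  if [[ ! -d "$root" ]]; then
    echo "not a directory: $root" >&2
    return 1
  fi
  root=$(cd "$root" && pwd)           # absolute path, since the list needs full paths
  find "$root" -type f -name "$pattern" | sort > "$list"
  n=$(( $(wc -l < "$list") ))         # arithmetic strips any padding from wc
  if (( n == 0 )); then
    echo "no files matching '$pattern' under $root" >&2
    return 1
  fi
  # The recipe suggests roughly 20-50 normal samples.
  if (( n < 20 || n > 50 )); then
    echo "warning: $n VCFs listed; roughly 20-50 normals are suggested" >&2
  fi
  echo "$n"
}
```

For example, make_vcf_list /path/to/step1_results '<pattern>' "$VCF_LIST", with the pattern matching the VCF name your step 1 runs produced.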

Step 2. Generate the final noise file with:

# --build-sys-noise-method=mean sets the default noise method for this noise file by
# tagging the noise file header with '##NoiseMethod=mean'.
dragen \
-r ${REF_DIR} \
--build-sys-noise-vcfs-list ${VCF_LIST} \
--build-sys-noise-method=mean \
--output-directory ${DIR} \
--output-file-prefix ${PREFIX}
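Building the noise file with --build-sys-noise-method=mean tags its header with '##NoiseMethod=mean'. A quick way to confirm which method a noise file carries, for example before pairing it with --vc-systematic-noise-method mean in the Variant Calling template, is sketched below. The helper name is hypothetical; it only reads the '##NoiseMethod=' header line from a plain or gzip-compressed file.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: print the value of the '##NoiseMethod=' header line of a
# systematic noise file (plain text, or gzip-compressed when the name ends in .gz).
# Returns non-zero when no such line is present. awk reads the whole stream (no early
# exit) so the upstream decompressor is never cut off by a closed pipe under pipefail.
noise_method() {
  local f="$1"
  case "$f" in
    *.gz) gzip -dc -- "$f" ;;
    *)    cat -- "$f" ;;
  esac | awk '!found && /^##NoiseMethod=/ { sub(/^##NoiseMethod=/, ""); print; found = 1 }
              END { exit (found ? 0 : 1) }'
}
```

For example, [[ "$(noise_method "$VC_SYSTEMATIC_NOISE_FILE")" == mean ]] checks that the file matches the method used in the Variant Calling template.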

To download a SINE/ALU regions BED for SNV excluded regions

ALUs comprise approximately 11% of the genome and are common in introns. High rates of false-positive (FP) calls caused by deamination have been observed in some FFPE libraries. If the ALU regions are not clinically significant for a specific analysis, we recommend simply filtering out the entire ALU region with the DRAGEN excluded regions filter: --vc-excluded-regions-bed $BED.

The ALU BED file can be downloaded as part of the Bed File Collection from the DRAGEN Software Support Site page.
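In the option-string style of the Variant Calling template, the exclusion can be appended to SNV_OPTIONS as sketched here. ALU_BED is a hypothetical variable name and its default is only a placeholder path. Keep in mind that this drops all calls inside ALU regions, so it fits only when those regions are not clinically significant for the analysis.

```shell
#!/bin/bash
set -euo pipefail

# ALU_BED is a hypothetical variable: set it to your local copy of the ALU regions
# BED from the DRAGEN Bed File Collection (the default below is only a placeholder).
ALU_BED="${ALU_BED:-/path/to/alu_regions.bed}"

# Append the excluded-regions filter to the SNV options of the Variant Calling template
# (SNV_OPTIONS defaults to empty only so this fragment also runs on its own).
SNV_OPTIONS="${SNV_OPTIONS:-}
  --vc-excluded-regions-bed $ALU_BED \
"
```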

CNV

Please include the matched normal sample in the CNV panel of normals.

Generating Panel of Normals (PON)

Somatic WES CNV requires PON files. Follow the two steps below to generate CNV PON:

  1. Target counts generation (per normal sample): Generate target counts for each normal sample as a baseline. Any options used for panel of normals generation (BED file, GC bias correction, etc.) should also be used when processing the case sample.

CNV_PON_OPTIONS="
  --enable-cnv true \
  --cnv-target-bed $CNV_TARGET_BED \
"

CMD="
  dragen \
  $INPUT_OPTIONS \
  $OUTPUT_OPTIONS \
  $CNV_PON_OPTIONS \
"

  2. Combined counts generation: Merge the individual target counts files into a single <prefix>.combined.counts.txt.gz file.

CNV_COMBINED_COUNTS_OPTIONS="
  --enable-cnv true \
  --cnv-generate-combined-counts true \
  --cnv-normals-list $CNV_NORMALS_LIST \
"

CMD="
  dragen \
  $INPUT_OPTIONS \
  $OUTPUT_OPTIONS \
  $CNV_COMBINED_COUNTS_OPTIONS \
"

$CNV_NORMALS_LIST is a single text file listing the path to each target counts file generated in step 1 (either .target.counts.gz or .target.counts.gc-corrected.gz). The output is a PON file with the suffix .combined.counts.txt.gz. Pass this PON file to case sample runs of DRAGEN CNV with the --cnv-combined-counts option.
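A sketch for assembling $CNV_NORMALS_LIST from the step 1 outputs, assuming the per-normal runs sit under a common directory (the helper name is hypothetical). It takes a single suffix, on the assumption that the list should use either raw or GC-corrected counts consistently. Because the matched normal should be part of the panel of normals, it can optionally fail when a given counts file is missing from the list.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: build the CNV normals list from per-sample target counts files.
# Usage: make_cnv_normals_list <counts_root> <suffix> <list_file> [matched_normal_counts]
#   suffix: .target.counts.gz or .target.counts.gc-corrected.gz (one kind per list)
#   matched_normal_counts: absolute path of the matched normal's counts file (optional)
make_cnv_normals_list() {
  local root="$1" suffix="$2" list="$3" matched="${4:-}"
  case "$suffix" in
    .target.counts.gz|.target.counts.gc-corrected.gz) ;;
    *) echo "unexpected suffix: $suffix" >&2; return 1 ;;
  esac
  [[ -d "$root" ]] || { echo "not a directory: $root" >&2; return 1; }
  root=$(cd "$root" && pwd)            # list entries become absolute paths
  find "$root" -type f -name "*$suffix" | sort > "$list"
  [[ -s "$list" ]] || { echo "no *$suffix files under $root" >&2; return 1; }
  # The matched normal should be included in the CNV panel of normals.
  if [[ -n "$matched" ]] && ! grep -qxF -- "$matched" "$list"; then
    echo "matched normal not in list: $matched" >&2
    return 1
  fi
}
```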

For more information, see Panel of Normals.

SV

TMB library specific settings

MSI

Microsatellite sites file

The microsatellite sites file can be downloaded from the DRAGEN Software Support Site page.

For panels, we recommend post-processing the file by intersecting the WES or WGS sites with the panel manifest, so that no off-target reads are used in the MSI analysis. For small panels, you may need to generate a custom sites file to ensure the panel covers at least 2000 sites. To generate custom MSI sites files, refer to the MSI Biomarker section in the user guide.
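The intersection step can be sketched with awk. This is a minimal sketch under assumptions about the file layouts, stated in the code comments; check them against the actual sites file before use. The helper names are hypothetical, and an interval tool such as bedtools is a reasonable alternative for large inputs. The count helper can then be compared against the 2000-site guideline.

```shell
#!/bin/bash
set -euo pipefail

# Sketch: keep only microsatellite sites whose position falls inside a panel target.
# ASSUMPTIONS (verify against your files): the sites file is tab-delimited with one
# header line, chromosome in column 1 and a 1-based position in column 2; the BED
# file is tab-delimited with 0-based, half-open intervals (chrom, start, end).
filter_msi_sites() {
  local sites="$1" bed="$2"
  if [[ ! -s "$bed" ]]; then
    echo "empty or missing BED: $bed" >&2
    return 1
  fi
  awk -F'\t' '
    NR == FNR {                                   # first file: BED intervals
      if ($0 ~ /^(#|track|browser)/ || NF < 3) next
      n[$1]++; s[$1, n[$1]] = $2 + 0; e[$1, n[$1]] = $3 + 0
      next
    }
    FNR == 1 { print; next }                      # second file: keep its header line
    {
      p = $2 + 0                                  # 1-based site position
      for (i = 1; i <= n[$1]; i++)                # linear scan, fine for panel-sized BEDs
        if (p > s[$1, i] && p <= e[$1, i]) { print; break }
    }
  ' "$bed" "$sites"
}

# Number of sites (data rows, excluding the header line) in a sites file.
count_msi_sites() {
  awk 'END { print (NR > 0 ? NR - 1 : 0) }' "$1"
}
```

For example, filter_msi_sites "$MSI_MICROSATELLITES_FILE" "$PANEL_BED" > panel_msi_sites.tsv (PANEL_BED being a hypothetical name for the panel manifest BED), then check that count_msi_sites panel_msi_sites.tsv reports at least 2000.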

MSI library specific settings

HLA
