DNA Somatic Tumor-Normal MRD

For conceptual background and a pipeline overview, see DRAGEN MRD Pipeline Overview.

A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.

Step 0: FASTQ generation

  
/opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
--output-directory $OUTPUT_DIR 
--sample-sheet $SAMPLE_SHEET 
--bcl-input-directory $RUN_FOLDER 
--bcl-conversion-only true 
--strict-mode true 
# if using ora compression (.fastq.ora) rather than gzip (.fastq.gz) 
--ora-reference $ORA_REFERENCE 
--fastq-compression-format dragen 

BCL conversion is optional if FASTQ data already exists. If starting from BCL files, this step must be completed before running the MRD pipeline to ensure sample-specific FASTQs are available as input.

Step 1: Read alignment and targeted variant calling

Step 1A: Read alignment and targeted germline variant calling (FFPE)

Step 1B: Read alignment and targeted germline variant calling (BC/Plasma)

Read alignment and targeted variant calling notes

For consistency, use the linear reference. For FFPE samples, use --Aligner.hard-clips=7 to use hard clipping for all alignment types; omit this parameter for Buffy Coat or Plasma samples.
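The sample-type rule above can be sketched as a small helper that returns the extra aligner argument. This is only an illustration: the function name and the sample-type labels ("FFPE", "BC", "Plasma") are hypothetical, and only the --Aligner.hard-clips=7 option is taken from this recipe.

```python
def aligner_clip_args(sample_type: str) -> list[str]:
    """Return extra aligner arguments for a sample type (hypothetical helper)."""
    # The labels below are illustrative names, not values DRAGEN itself reads.
    if sample_type == "FFPE":
        # FFPE: hard clipping for all alignment types, as stated in the notes.
        return ["--Aligner.hard-clips=7"]
    if sample_type in ("BC", "Plasma"):
        # Buffy Coat and Plasma: the parameter is omitted entirely.
        return []
    raise ValueError(f"unknown sample type: {sample_type}")
```

The returned list is meant to be appended to an existing alignment command; it is not a complete command on its own.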

Step 2: Fingerprint generation + QC

Step 2A: Fingerprint generation and FFPE normal-aware contamination QC

Fingerprint generation notes

For the DRAGEN MRD pipeline, as for all somatic runs, the linear hashtable is recommended. DRAGEN hashtables can be downloaded from Product Files.

It is recommended to use --vc-target-bed $BED or --vc-excluded-regions-bed $BED to limit fingerprint calls to high-confidence regions. Construct a BED file covering only easily mapped regions, excluding Alu elements and other highly repetitive regions, where recurring noise tends to be more frequent.
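One way to build such a BED file is to subtract known repetitive intervals from a set of candidate regions. The sketch below shows that interval subtraction in plain Python. It assumes both inputs are already available as (chrom, start, end) tuples in BED coordinates (0-based, half-open); the function names are hypothetical and not part of DRAGEN. For genome-scale inputs, a dedicated interval tool would be the more usual choice.

```python
from collections import defaultdict

Interval = tuple[str, int, int]  # (chrom, start, end), 0-based half-open as in BED


def subtract_intervals(targets: list[Interval], excluded: list[Interval]) -> list[Interval]:
    """Return the parts of `targets` not overlapped by any `excluded` interval."""
    by_chrom = defaultdict(list)
    for chrom, start, end in excluded:
        by_chrom[chrom].append((start, end))
    for spans in by_chrom.values():
        spans.sort()

    kept = []
    for chrom, start, end in targets:
        cursor = start  # left edge of the part of the target not yet handled
        for ex_start, ex_end in by_chrom.get(chrom, []):
            if ex_end <= cursor:
                continue  # excluded span lies before the remaining target
            if ex_start >= end:
                break  # spans are sorted by start, so none later can overlap
            if ex_start > cursor:
                kept.append((chrom, cursor, ex_start))
            cursor = max(cursor, ex_end)
            if cursor >= end:
                break
        if cursor < end:
            kept.append((chrom, cursor, end))
    return kept


def to_bed(intervals: list[Interval]) -> str:
    """Format intervals as tab-separated BED lines."""
    return "".join(f"{chrom}\t{start}\t{end}\n" for chrom, start, end in intervals)
```

The text returned by to_bed can be written to a file and passed as $BED to --vc-target-bed.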

Use a systematic noise file to further reduce false positives. Prebuilt systematic noise BED files can be downloaded from Product Files:

Prebuilt WGS noise files                             Description
WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz          For WGS FF
FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz     For WGS FFPE (only hg38)

For more information see SNV systematic noise. To download germline annotation files, refer to Nirvana.

FFPE normal-aware contamination QC notes

It is recommended to use --qc-somatic-contam-vcf for FFPE normal-aware contamination detection. The somatic contamination VCF files are bundled with every DRAGEN installation at /opt/dragen/<version>/resources/qc/. For the hg38 reference, use somatic_sample_cross_contamination_resource_hg38.vcf.gz.

For samples not run in matched Tumor/Normal mode (i.e., BC and Plasma samples in Step 1B), a germline contamination VCF file can be passed to --qc-cross-cont-vcf. These files are also located at /opt/dragen/<version>/resources/qc/. For the hg38 reference, use sample_cross_contamination_resource_hg38.vcf.gz.
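The choice between the two contamination resources can be summarized in a short helper. This is a minimal sketch for the hg38 reference only. The function name and its arguments are hypothetical; the option names, directory layout, and file names are the ones given in the notes above.

```python
def contamination_args(dragen_version: str, matched_tumor_normal: bool) -> list[str]:
    """Return the contamination QC option and hg38 resource path (hypothetical helper)."""
    resource_dir = f"/opt/dragen/{dragen_version}/resources/qc"
    if matched_tumor_normal:
        # FFPE normal-aware contamination detection (matched Tumor/Normal mode).
        return [
            "--qc-somatic-contam-vcf",
            f"{resource_dir}/somatic_sample_cross_contamination_resource_hg38.vcf.gz",
        ]
    # BC and Plasma samples that are not run in matched Tumor/Normal mode.
    return [
        "--qc-cross-cont-vcf",
        f"{resource_dir}/sample_cross_contamination_resource_hg38.vcf.gz",
    ]
```

The result is an argument fragment to append to a full DRAGEN command, not a command in itself.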

Step 2B: FFPE/BC sample matching QC

Step 3: MRD detection

MRD detection notes

The command line parameters that control MRD detect are:

Parameter Name           Description
--enable-mrd             Enables MRD detect. Default = "false".
--mrd-probes-file        Path to the individual's tumor fingerprint VCF file.
--mrd-score-threshold    Threshold used to determine the presence/absence of residual cancer DNA in the plasma. Default = 4.0.
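The three parameters listed above can be assembled into an argument fragment as in the following sketch. The helper name is hypothetical, and the "option value" form is assumed to match the style used in Step 0 (for example, --strict-mode true). Other options required for a complete MRD run are not shown.

```python
def mrd_args(probes_vcf: str, score_threshold: float = 4.0) -> list[str]:
    """Build the MRD detect argument fragment (hypothetical helper)."""
    return [
        "--enable-mrd", "true",                          # default is "false"
        "--mrd-probes-file", probes_vcf,                 # tumor fingerprint VCF
        "--mrd-score-threshold", str(score_threshold),   # default is 4.0
    ]
```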

The MRD detect module generates an output summary file using the standard DRAGEN output directory and prefix: .mrd_summary.json. The file is a valid JSON file that contains an array of JSON objects. DRAGEN supports running one sample at a time, so the array will be of length one.

The output JSON will include the following two fields of interest:

  • Run[1].TumorEstimate.illumina.eVAF

  • Run[1].TumorEstimate.illumina.score

The "eVAF" (estimated Variant Allele Frequency) is the estimated fraction of cancer DNA in the plasma sample.

The "score" can be used to determine presence/absence of residual cancer DNA in the plasma. A higher score indicates that the presence of cancer DNA is more likely. The exact threshold score that indicates a positive ctDNA status may depend on sample quality and coverage, and can be optimized for a specific pipeline. This threshold is typically expected to fall between 4 and 7.
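Reading the two fields and applying a threshold might look like the sketch below. It rests on two assumptions that the text does not confirm: that the summary has a top-level "Run" key holding the one-element array (so Run[1] above is read as the first and only element), and that a score at or above the threshold counts as positive. Both function names are hypothetical.

```python
import json


def read_tumor_estimate(summary_text: str) -> dict:
    """Extract eVAF and score from MRD summary JSON text (assumed layout)."""
    data = json.loads(summary_text)
    # ASSUMPTION: top-level "Run" key with a one-element array; adjust the
    # lookup if the actual file is laid out differently.
    estimate = data["Run"][0]["TumorEstimate"]["illumina"]
    return {"eVAF": float(estimate["eVAF"]), "score": float(estimate["score"])}


def ctdna_detected(score: float, threshold: float = 4.0) -> bool:
    """Compare the MRD score with a threshold (4.0 is the documented default)."""
    # ASSUMPTION: scores equal to the threshold are treated as positive.
    return score >= threshold
```

Keeping the threshold as an argument makes it straightforward to test values in the 4 to 7 range when tuning for a specific pipeline.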

Step 4: Plasma QC

Step 4A: Plasma/BC sample matching QC

Step 4B: Plasma contamination QC

Plasma QC notes

Similar to the MRD detection step, the output JSON includes the "eVAF" field, which is the field of interest here.

The "eVAF", multiplied by two, can be used as a proxy for plasma contamination from a different human.
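As a minimal sketch, the doubling rule can be written as a one-line helper. The function name is hypothetical, and the range check assumes eVAF is reported as a fraction between 0 and 1.

```python
def contamination_proxy(evaf: float) -> float:
    """Approximate plasma contamination as twice the eVAF (hypothetical helper)."""
    # ASSUMPTION: eVAF is a fraction in [0, 1], not a percentage.
    if not 0.0 <= evaf <= 1.0:
        raise ValueError("eVAF is expected to be a fraction between 0 and 1")
    return 2.0 * evaf
```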
