DRAGEN MRD Pipeline

The DRAGEN MRD (Minimal Residual Disease) pipeline detects residual cancer cells in solid tumors, enabling the monitoring of treatment efficacy and disease progression. This pipeline utilizes a tumor-informed Whole Genome Sequencing (WGS) approach. To detect trace ctDNA in plasma, analysis targets sites and alleles identified as somatic variants in the patient's initial tumor (the tumor fingerprint). Due to the need for significantly higher sensitivity compared to standard ctDNA variant calling, a dedicated application is required to detect these rare molecules (down to tumor fractions as low as 10^-4). The MRD Detect component provides ultra-sensitive detection of tumor ctDNA and generates multiple quality control (QC) metrics that can be used to assess the validity of the results.

Initial Diagnosis:

At initial diagnosis, a solid tumor biopsy and a matched normal sample are collected. The DRAGEN small variant caller identifies specific genetic mutations (SNVs) unique to the patient's cancer from this matched sample pair. This set of unique markers constitutes the "tumor fingerprint." It is recommended to prepare libraries with greater than 80X average tumor coverage and greater than 30X average normal coverage. Tumor samples can be FFPE (Formalin-Fixed Paraffin-Embedded) or fresh frozen. Buffy Coat (BC) matched normal samples are recommended.

Follow-up Plasma Samples:

After treatment (e.g., surgery, chemotherapy, stem cell transplant), follow-up plasma samples are collected at various time points to detect residual cancer cells. The tumor fingerprint from the initial diagnosis is used to target the variant sites where residual disease is assessed. Follow-up samples are also evaluated against QC thresholds to ensure sufficient quality. An inter-sample contamination detection step is included to identify potential sample contamination. It is recommended to sequence plasma samples at approximately 50X average WGS coverage.
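The coverage recommendations above can be captured in a small pre-flight check. The following Python sketch is illustrative only: the function name, the sample-type labels, and the 10% tolerance applied to the "approximately 50X" plasma recommendation are assumptions and are not part of DRAGEN.

```python
# Recommended average WGS coverage per sample type, taken from the text above.
RECOMMENDED_COVERAGE = {
    "tumor": 80.0,   # greater than 80X
    "normal": 30.0,  # greater than 30X
    "plasma": 50.0,  # approximately 50X
}


def meets_recommended_coverage(sample_type, mean_coverage, plasma_tolerance=0.10):
    """Return True if mean_coverage satisfies the recommendation for sample_type.

    plasma_tolerance is an assumed interpretation of "approximately 50X":
    plasma samples pass when they reach the target minus this fraction.
    """
    target = RECOMMENDED_COVERAGE[sample_type]
    if sample_type == "plasma":
        return mean_coverage >= target * (1.0 - plasma_tolerance)
    # Tumor and normal recommendations are strict "greater than" bounds.
    return mean_coverage > target
```

A workflow script could call this on the mean coverage of each sample before launching the downstream steps.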

Pipeline

The DRAGEN MRD pipeline does not include a pre-built workflow script, but rather defines the required computational steps. Data management, sample tracking, and workflow scripts are left to the user.

BCL demultiplexing must be completed prior to running the pipeline to ensure that sample-specific FASTQs (or BAMs/CRAMs) are available as input to the pipeline.

The following table lists the main DRAGEN steps.

Step | Description
Fingerprint generation | Run the somatic small variant caller on the matched tumor-normal sample pair to generate a one-time fingerprint VCF.
Germline variant calling | Run the germline small variant caller on the normal sample to generate a germline VCF file that will be used during subsequent QC steps.
Sample matching | Compare the normal sample germline VCF to the tumor BAM from the fingerprint step to identify mismatched T/N sample pairs.
MRD detect | Run the MRD module on the plasma sample to detect residual disease.
Contamination detection | Run the MRD module on the plasma sample to detect human-to-human cross-sample contamination.
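Because workflow scripting is left to the user, one way to keep the steps in a valid order is to record which steps consume the outputs of which others. The sketch below uses the Python standard library module graphlib (Python 3.9 or later). The step identifiers are hypothetical labels, and the dependencies are read from the step descriptions above; this is not a DRAGEN feature.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Map each step to the set of steps whose outputs it consumes.
STEP_DEPENDENCIES = {
    "fingerprint_generation": set(),    # tumor-normal pair -> fingerprint VCF, tumor BAM
    "germline_variant_calling": set(),  # normal sample -> germline VCF
    "sample_matching": {"fingerprint_generation", "germline_variant_calling"},
    "mrd_detect": {"fingerprint_generation"},
    "contamination_detection": {"germline_variant_calling"},
}


def execution_order(dependencies):
    """Return one valid run order in which every step follows its inputs."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(STEP_DEPENDENCIES)
```

Fingerprint generation and germline variant calling do not depend on each other, so a scheduler is free to run them in parallel; the same holds for the two plasma steps once their inputs exist.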

Fingerprint Generation

The DRAGEN somatic Tumor/Normal small variant caller pipeline is used to generate a fingerprint VCF. The setting --mrd-fingerprint=true enables the small variant caller and also activates additional strict filters, such as more aggressive read position filtering. This helps to reduce false positive SNVs in the fingerprint.

DRAGEN Fingerprint generation command line:

/opt/dragen/$VERSION/bin/dragen      # DRAGEN install path 
--ref-dir $REF_DIR                   # path to DRAGEN linear hashtable 
--output-directory $OUTPUT 
--intermediate-results-dir $PATH     
--output-file-prefix $PREFIX 
# Inputs (e.g. FQ lists) 
--tumor-fastq-list $PATH              
--tumor-fastq-list-sample-id $STRING 
--fastq-list $PATH                    
--fastq-list-sample-id $STRING 
# Mapper 
--enable-map-align true              # optional for BAM/CRAM input 
--enable-map-align-output true       # optionally save the output BAM 
--enable-duplicate-marking true      # default=true
--Aligner.hard-clips=7               # remove any soft clips; this further helps reduce FP calls
# Small variant caller
--mrd-fingerprint=true 
--vc-target-bed $HIGH_CONFIDENCE_REGIONS 
--vc-systematic-noise $PATH          # e.g. FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
# Annotation 
--enable-variant-annotation=true 
--variant-annotation-data=$PATH
--vc-enable-germline-tagging=true
# QC
--qc-detect-contamination=true
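
A workflow script has to turn the annotated option listing above into a single invocation. The following Python sketch assembles such an argument list. The helper name, its parameters, and the example paths are hypothetical; only options that appear in the listing are used, and the annotation options are omitted for brevity. The sketch builds and prints the command but does not run DRAGEN.

```python
import shlex


def build_fingerprint_command(dragen_bin, ref_dir, output_dir, prefix,
                              tumor_fastq_list, tumor_sample_id,
                              normal_fastq_list, normal_sample_id,
                              target_bed=None, systematic_noise=None):
    """Assemble the fingerprint-generation argument list (hypothetical helper)."""
    cmd = [
        dragen_bin,
        "--ref-dir", ref_dir,
        "--output-directory", output_dir,
        "--output-file-prefix", prefix,
        "--tumor-fastq-list", tumor_fastq_list,
        "--tumor-fastq-list-sample-id", tumor_sample_id,
        "--fastq-list", normal_fastq_list,
        "--fastq-list-sample-id", normal_sample_id,
        "--enable-map-align", "true",
        "--enable-map-align-output", "true",  # keep the tumor BAM for sample matching
        "--enable-duplicate-marking", "true",
        "--Aligner.hard-clips=7",
        "--mrd-fingerprint=true",
        "--qc-detect-contamination=true",
    ]
    # Recommended but optional filters are added only when a file is supplied.
    if target_bed:
        cmd += ["--vc-target-bed", target_bed]
    if systematic_noise:
        cmd += ["--vc-systematic-noise", systematic_noise]
    return cmd


# Placeholder paths and sample IDs, for illustration only.
cmd = build_fingerprint_command(
    "/opt/dragen/VERSION/bin/dragen", "/ref/hg38_linear", "/out/patient1", "patient1",
    "/in/tumor_fastq_list.csv", "TUMOR1", "/in/normal_fastq_list.csv", "NORMAL1",
    target_bed="/in/high_confidence.bed",
)
print(shlex.join(cmd))
```

Keeping the arguments as a list, rather than a single string, lets the same list be passed to subprocess.run without shell quoting problems.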

For DRAGEN MRD (similar to all somatic runs) it is recommended to use the linear hashtable. DRAGEN hashtables can be downloaded here: Product Files

It is recommended to use either --vc-target-bed $BED or --vc-excluded-regions-bed $BED to limit fingerprint calls to high-confidence regions. A target BED file should generally cover only easily mapped regions and exclude Alu elements and other highly repetitive regions, where recurring noise tends to be more frequent.

It is also recommended to use a systematic noise file to further reduce false positives. Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.

Prebuilt WES/WGS noise files | Description
WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz | For WGS FF (fresh frozen)
FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz | For WGS FFPE (only hg38)

For more information on systematic noise files, see: SNV systematic noise.

To download germline annotation files, please refer to: Nirvana.

Germline Variant Calling

Run the DRAGEN germline small variant caller pipeline on the normal sample to generate a germline small variant VCF. This VCF will be used in downstream QC steps, including sample matching and the plasma cross-sample contamination detection steps.

DRAGEN germline small variant calling command line:

/opt/dragen/$VERSION/bin/dragen      # DRAGEN install path 
--ref-dir $REF_DIR                   # path to DRAGEN linear hashtable 
--output-directory $OUTPUT 
--intermediate-results-dir $PATH     
--output-file-prefix $PREFIX 
# Inputs (e.g. FQ list) 
--fastq-list $PATH                    
--fastq-list-sample-id $STRING 
# Mapper 
--enable-map-align true              # optional for BAM/CRAM input 
--enable-map-align-output false      # optionally save the output BAM (not required)
--enable-duplicate-marking true      # default=true
# Small variant caller
--enable-variant-caller true

Sample Matching

The tumor BAM from the fingerprint generation step can be compared to the normal sample germline VCF to ensure the tumor-normal samples are matched from the same individual.

For more details on sample matching (also referred to as DRAGEN checkfingerprint, but not to be confused with the tumor fingerprint) please refer to: Sample Matching.

Example sample matching command line:

/opt/dragen/$VERSION/bin/dragen     
--ref-dir $REF_DIR                     # path to DRAGEN linear hashtable 
-b $TUMOR_SAMPLE_BAM
--output-directory $PATH 
--output-file-prefix $STRING
--enable-checkfingerprint true
--checkfingerprint-expected-vcf $VCF   # Normal sample germline VCF

MRD Detect

MRD detection is based on observing variants at the fingerprint locations identified in the fingerprint generation step. Lower-quality reads, such as those with low Phred scores or with data from only one read direction, are removed from consideration. The number of variant signals seen at the patient-specific fingerprint sites is counted and compared to a statistical noise model, in which sample-specific noise is estimated from dynamically generated noise sites. When the "signal" at the fingerprint sites significantly exceeds the level expected from sequencing "noise", a detection call is made. When all samples in this process meet QC criteria, the algorithm produces a statistical "score". A sufficiently large score indicates the presence of tumor DNA in the plasma sample.
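
The signal-versus-noise comparison can be illustrated with a toy calculation. The sketch below is not DRAGEN's statistical model, which is not described here. It assumes a simple Poisson-style background in which the error rate measured at noise sites predicts how many variant-supporting reads would appear at fingerprint sites by chance. All names are hypothetical, and the resulting value is not comparable to the DRAGEN "score".

```python
import math


def toy_detection_zscore(signal_alt_reads, signal_total_reads,
                         noise_alt_reads, noise_total_reads):
    """Toy excess-over-noise statistic (NOT the DRAGEN score).

    signal_* are read counts pooled over fingerprint sites;
    noise_* are read counts pooled over noise sites.
    """
    if signal_total_reads <= 0 or noise_total_reads <= 0:
        raise ValueError("read totals must be positive")
    # Background rate of variant-supporting reads, estimated from noise sites.
    noise_rate = noise_alt_reads / noise_total_reads
    # Variant-supporting reads expected at fingerprint sites from noise alone.
    expected_noise = noise_rate * signal_total_reads
    if expected_noise == 0:
        return math.inf if signal_alt_reads > 0 else 0.0
    # Poisson approximation: the variance equals the expected count.
    return (signal_alt_reads - expected_noise) / math.sqrt(expected_noise)


z = toy_detection_zscore(60, 250_000, 100, 1_000_000)
```

In this example the noise rate is 100 / 1,000,000 = 10^-4, so 25 variant-supporting reads are expected by chance among 250,000 reads at fingerprint sites. Observing 60 gives (60 - 25) / sqrt(25) = 7 standard deviations above the expected noise.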

Example MRD detect command line:

/opt/dragen/$VERSION/bin/dragen      # DRAGEN install path 
--ref-dir $REF_DIR                   # path to DRAGEN linear hashtable 
--output-directory $OUTPUT 
--intermediate-results-dir $PATH     
--output-file-prefix $PREFIX 
# Inputs (e.g. FQ list) 
--fastq-list $PATH                    
--fastq-list-sample-id $STRING
# Mapper 
--enable-map-align true              # optional for BAM/CRAM input 
--enable-map-align-output true       # optionally save the output BAM (not required)
--enable-duplicate-marking true      # default=true
# MRD settings
--enable-mrd=true
--mrd-probes-file $FINGERPRINT_VCF 
--mrd-stats-mode production
--mrd-score-threshold $INT           # expected to be in the range 4 - 7

The command-line parameters that control MRD detect are:

Parameter Name | Description
--enable-mrd | Enables MRD detect. Default = "false".
--mrd-probes-file | Path to the patient fingerprint file (VCF).
--mrd-score-threshold | Threshold used to determine the presence/absence of tumor DNA.
--mrd-stats-mode | Set to "production" to output the "eVAF" (estimated VAF) and "score" values in the .mrd_summary.json output file.

The MRD detect module writes an output summary file to the standard DRAGEN output directory, named with the output file prefix followed by .mrd_summary.json. The file is valid JSON and contains an array of JSON objects. DRAGEN supports running one sample at a time, so the array will be of length one.

When --mrd-stats-mode=production is set, each JSON object contains at a minimum the following key-value pairs:

Run[i].TumorEstimate.illumina.eVAF
Run[i].TumorEstimate.illumina.score

The "eVAF" is the estimated fraction of cancer DNA in the sample.

The "score" can be used to indicate the presence or absence of residual cancer cells. A higher score indicates that the presence of cancer cells is more likely. The exact score threshold used to make a detection call may depend on sample quality and coverage, and can be optimized for a specific pipeline. It is expected that this threshold will typically lie between 4 and 7.
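
Reading the two values back out of the summary file might look like the following Python sketch. The exact file layout is an assumption: the code accepts either a top-level array of run objects or an object holding that array under a "Run" key, based on the Run[i] notation above. The function names are hypothetical, and treating a score equal to the threshold as a detection is also an assumption.

```python
import json


def extract_mrd_result(data, score_threshold):
    """Pull eVAF and score from parsed summary JSON and apply a threshold."""
    # Assumption: runs are the top-level array, or are stored under "Run".
    runs = data["Run"] if isinstance(data, dict) else data
    if len(runs) != 1:
        raise ValueError("expected exactly one run per summary file")
    estimate = runs[0]["TumorEstimate"]["illumina"]
    score = float(estimate["score"])
    return {
        "eVAF": float(estimate["eVAF"]),
        "score": score,
        # Assumption: a score equal to the threshold counts as detected.
        "detected": score >= score_threshold,
    }


def read_mrd_summary(summary_path, score_threshold):
    """Load a .mrd_summary.json file and extract the result."""
    with open(summary_path) as handle:
        return extract_mrd_result(json.load(handle), score_threshold)


# Made-up values, for illustration only.
example = {"Run": [{"TumorEstimate": {"illumina": {"eVAF": 0.0002, "score": 8.5}}}]}
result = extract_mrd_result(example, score_threshold=5)
```

Separating the parsing from the file access makes it easy to apply a different threshold to the same summary when tuning the threshold for a specific pipeline.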

Plasma Contamination Detection

The MRD module provides a highly sensitive method for detecting human-to-human cross-contamination, surpassing the detection capabilities of the default DRAGEN contamination module. This enhanced sensitivity allows for the identification of even very low levels of contamination. However, it should be noted that unlike the standard DRAGEN contamination module, this module does not inherently adjust for VAF distortions that can occur in CNV-rich somatic samples. Therefore, it is recommended to include a safety margin when considering reported contamination levels.

The module requires a Germline VCF and the plasma FQ files as input. The Germline VCF was generated in an earlier step using the matched normal sample (Buffy Coat). The detection module then analyzes pileups at loci with high population allele frequencies (approximately 50%), after excluding any germline variant sites known to be present in the primary sample. By doubling the estimated variant allele frequency (VAF) at these loci, the module approximates the fractional foreign contamination.

Example plasma contamination detection command line:

/opt/dragen/$VERSION/bin/dragen           # DRAGEN install path 
--ref-dir $REF_DIR                        # path to DRAGEN linear hashtable 
--output-directory $OUTPUT 
--intermediate-results-dir $PATH     
--output-file-prefix $PREFIX 
# Inputs (e.g. FQ list) 
--fastq-list $PATH                    
--fastq-list-sample-id $STRING
# Mapper 
--enable-map-align true                   # optional for BAM/CRAM input 
--enable-map-align-output true            # optionally save the output BAM (not required)
--enable-duplicate-marking true           # default=true
# MRD settings
--enable-mrd=true
--mrd-probes-file=$COMMON_GERMLINE_VCF    # VCF with common germline sites (population allele freq. ~50%)
--mrd-blocklist=$NORMAL_SAMPLE_VCF        # Normal sample germline VCF
--mrd-stats-mode=production

As with the MRD detect module, the output JSON will include two fields:

Run[i].TumorEstimate.illumina.eVAF
Run[i].TumorEstimate.illumina.score

The "eVAF" (estimated Variant Allele Frequency) can be used as a proxy for DNA contamination from individuals other than the patient of interest. Since the mrd-probes-file evaluates the signal at common germline sites (with population allele frequencies close to 50%), it is expected that only about half of the sites will actually overlap with germline sites from contaminating individuals. For this reason, the "eVAF" can be multiplied by 2 to get a more realistic point estimate of the amount of contaminating DNA.
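
The doubling rule, combined with the safety margin recommended earlier in this section, can be written as a short Python sketch. The function names, the max_contamination limit, and the safety_margin value are hypothetical, user-chosen quantities; DRAGEN does not define them.

```python
def contamination_fraction(evaf):
    """Point estimate of foreign DNA fraction: twice the reported eVAF."""
    if not 0.0 <= evaf <= 1.0:
        raise ValueError("eVAF must be between 0 and 1")
    # Only about half of the common sites are expected to carry the
    # contaminant's allele, so the eVAF is doubled (capped at 1.0).
    return min(1.0, 2.0 * evaf)


def passes_contamination_qc(evaf, max_contamination, safety_margin=0.0):
    """Return True if the estimate plus a safety margin stays within the limit."""
    return contamination_fraction(evaf) + safety_margin <= max_contamination


estimate = contamination_fraction(0.005)
```

For example, an eVAF of 0.005 corresponds to an estimated 1% of contaminating DNA.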
