Analysis Methods

The Heme pipeline is DNA-only analysis software based on the DRAGEN Secondary Analysis Software. Although it includes some of the default settings from the DNA Somatic Tumor-Only Heme WGS DRAGEN recipe, it uses a distinct recipe with different options. Users can override specific parameters via a custom configuration file.

The following example command highlights the inputs and outputs used in the DragenCaller step of the Heme pipeline. The command can be found in the log file. Any parameter not shown on the command line uses the default value for the DRAGEN variant caller module. The detailed parameters and default arguments for the individual modules within the DragenCaller step can be found in the replay.json output. See DRAGEN Command Line Options for detailed explanations of the parameters.

/opt/edico/bin/dragen \
--ref-dir /staging/dragen-app-manager/resources/Illumina_hg38-alt_masked.cnv.hla.methyl_cg.methylated_combined.rna-11_r5.0-1 \
--output-directory DragenCaller/Sample-001 \
--output-file-prefix Sample-001 \
--events-log-file DragenCaller/Sample-001/events.csv \
--vc-systematic-noise=/staging/dragen-app-manager/resources/Illumina_heme-wgs-to-resources_4.4.4.2/snv/IDPF_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz \
--vc-enable-germline-tagging=true \
--variant-annotation-data=/staging/dragen-app-manager/resources/Illumina_variant_annotation_data-tmb_annotations_4.4.4-1/tmb_annotations \
--vc-germline-tag-hotspots=false \
--logging-to-output-dir=true \
--gc-metrics-enable=true \
--enable-metrics-json=true \
--enable-map-align=true \
--enable-sort=true \
--enable-duplicate-marking=true \
--enable-variant-caller=true \
--heme-sv=true \
--sv-systematic-noise=/staging/dragen-app-manager/resources/Illumina_heme-wgs-to-resources_4.4.4.2/sv/WGS_FF_Heme_hg38_v3.1.0_systematic_noise.sv.bedpe.gz \
--heme-cnv=true \
--cnv-population-b-allele-vcf=/staging/dragen-app-manager/resources/Illumina_heme-wgs-to-resources_4.4.4.2/cnv/hg38_1000G_phase1.snps.high_confidence.vcf.gz \
--enable-variant-deduplication=true \
--vc-output-evidence-bam=false \
--qc-detect-contamination=true \
--enable-dux4-caller=true \
--max-base-quality=63 \
--tumor-fastq-list Sample-001.fastq_list.csv \
--tumor-fastq-list-sample-id Sample-001 \
--force
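
When comparing a logged command against the defaults, it can help to turn the command line into a dictionary of explicitly set options. The following is a minimal sketch, not part of the Heme pipeline: the function name parse_dragen_command is hypothetical, and the sketch assumes the command uses only the `--option=value`, `--option value`, and bare `--flag` forms seen in the example above.

```python
import shlex


def parse_dragen_command(command):
    """Return a dict of the options explicitly set on a DRAGEN command line.

    Hypothetical helper for illustration only.
    """
    # Join continuation lines (backslash followed by newline) before tokenizing.
    tokens = shlex.split(command.replace("\\\n", " "))
    options = {}
    i = 1  # tokens[0] is the executable path
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("--"):
            name, sep, value = token[2:].partition("=")
            if sep:
                # --option=value
                options[name] = value
            elif i + 1 < len(tokens) and not tokens[i + 1].startswith("--"):
                # --option value
                options[name] = tokens[i + 1]
                i += 1
            else:
                # bare flag such as --force
                options[name] = True
        i += 1
    return options


# Shortened copy of the example command, for demonstration.
command = r"""/opt/edico/bin/dragen \
--output-directory DragenCaller/Sample-001 \
--heme-sv=true \
--max-base-quality=63 \
--tumor-fastq-list Sample-001.fastq_list.csv \
--force
"""
options = parse_dragen_command(command)
```

Any option that is absent from the resulting dictionary was not set on the command line and therefore takes its default value.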

Reference Genomes

The Heme pipeline supports two reference genomes for the DRAGEN Map/Aligner: hg38 and hs37d5_chr.

The hs37d5_chr genome is the hg19 reference genome with the Chromosome Y PAR masked. It includes the NC_012920 mitochondrial genome. The contigs have the chr prefix added but do not use the native alternate loci names.

DRAGEN Map/Aligner

DNA alignment involves aligning sequencing reads derived from DNA libraries to a reference genome prior to variant calling.

The pipeline currently does not support UMI libraries by default. To analyze UMI libraries, use the DRAGEN DNA Pipeline UMI recipe to generate the collapsed BAM as input.

DRAGEN uses these final alignments as input for downstream steps such as gene amplification (copy number) calling, small variant calling (SNVs, indels, MNVs, and delins), and DNA library quality control.

Small Variant Calling and Filtering

DRAGEN supports calling SNVs, indels, MNVs, and delins in tumor-only samples by using mapped and aligned DNA reads from a tumor sample as input. Variants are detected via both column-wise pileup analysis and local de novo assembly of haplotypes. The de novo haplotypes allow the detection of much larger insertions and deletions than is possible through column-wise pileup analysis alone. DRAGEN insertions and deletions are validated for lengths of 0–25 bp; longer insertions and deletions can also be supported. In addition, DRAGEN uses the de novo assembly to detect SNVs, insertions, and deletions that are co-phased and part of the same haplotype. Any such co-phased variants that fall within a window of 15 bp can then be reassembled into complex variants (MNVs and delins). The tumor-only pipeline produces a VCF file containing both germline and somatic variants that can be further analyzed to identify tumor mutations. The pipeline makes no ploidy assumptions, enabling detection of low-frequency alleles.
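
The grouping of co-phased variants into candidates for complex variants can be illustrated with a short sketch. This is not the DRAGEN implementation: the Variant record, the haplotype ID field, and the group_cophased function are hypothetical, and the sketch assumes that the 15 bp window is measured from the previous variant in the group. It only groups variants and does not reassemble the merged alleles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    pos: int        # 1-based reference position
    ref: str
    alt: str
    haplotype: int  # hypothetical ID of the assembled haplotype carrying the variant


def group_cophased(variants, window=15):
    """Group variants on the same haplotype that lie within `window` bp
    of the previous variant in the group (assumed window definition)."""
    groups = []
    open_group = {}  # haplotype ID -> group currently being extended
    for v in sorted(variants, key=lambda v: (v.haplotype, v.pos)):
        current = open_group.get(v.haplotype)
        if current and v.pos - current[-1].pos <= window:
            current.append(v)
        else:
            current = [v]
            open_group[v.haplotype] = current
            groups.append(current)
    return groups


variants = [
    Variant(100, "A", "T", haplotype=1),
    Variant(104, "G", "C", haplotype=1),
    Variant(130, "C", "CA", haplotype=1),
    Variant(106, "T", "G", haplotype=2),
]
groups = group_cophased(variants)
grouped_positions = [[v.pos for v in g] for g in groups]
```

In this example, the variants at positions 100 and 104 share a haplotype and are 4 bp apart, so they form one group. The variant at 130 is too far away, and the variant at 106 sits on a different haplotype, so each stays on its own.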

DRAGEN small variant calling includes the following steps:

Detects regions with sufficient read coverage (callable regions).
Detects regions where the reads deviate from the reference and there is a possibility of a germline or somatic call (active regions).
Assembles de novo graph haplotypes from reads (haplotype assembly).
Extracts possible somatic or germline calls (events) from column-wise pileup analysis.
Calibrates read base qualities to account for background noise.
Computes read likelihoods for each read/haplotype pair.
Performs mutation calling by summing the genotype probabilities across all read/haplotype pairs.
Performs additional filtering to improve variant calling accuracy, including using a systematic noise file. The systematic noise file indicates the statistical probability of noise at specific positions in the genome. This noise file is constructed using clean (normal) samples. Regions where noise is common (for example, difficult-to-map regions) have higher noise values. The small variant caller penalizes those regions to reduce the probability of making false positive calls.
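
The idea behind the systematic noise filter in the last step can be sketched as follows. This is an illustration under stated assumptions, not the DRAGEN algorithm: it assumes the noise value at a site is the mean alt-allele fraction seen across normal samples, and that a call is kept only when its variant allele frequency exceeds the noise by a fixed factor. The function names and the min_ratio parameter are hypothetical.

```python
def site_noise(normal_alt_fractions):
    """Mean alt-allele fraction at one site across normal samples
    (assumed definition of the noise value)."""
    return sum(normal_alt_fractions) / len(normal_alt_fractions)


def passes_noise_filter(vaf, noise, min_ratio=5.0):
    """Keep a call only if its VAF is at least `min_ratio` times the
    site noise (hypothetical rule and threshold)."""
    return vaf >= min_ratio * noise


# Alt-allele fractions observed at the same site in four normal samples.
noise = site_noise([0.0, 0.02, 0.01, 0.01])

low_vaf_call = passes_noise_filter(vaf=0.03, noise=noise)   # close to the noise level
high_vaf_call = passes_noise_filter(vaf=0.10, noise=noise)  # well above the noise level
```

A site that is noisy in normal samples therefore requires stronger evidence before a variant is reported, which is the penalizing behavior described above.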

Additional information is available at DRAGEN DNA Pipeline Small Variant Calling.

Copy Number Variant Calling

The DRAGEN copy number variant caller performs amplification, reference, and deletion calling for CNV targets within the assay. It counts the coverage of each target interval on the panel, normalizes the target counts against a preprocessed panel of normal samples, corrects for GC coverage bias, scores candidate CNV events from the observed coverage, and makes copy number calls.
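
The normalization steps described above can be sketched in a few lines. This is a simplified illustration, not the DRAGEN CNV caller: it assumes median scaling of each coverage profile, a per-target median across the panel of normals as the baseline, GC correction by subtracting the median log2 ratio of each GC bin, and fixed log2 thresholds for calling. All function names and thresholds are hypothetical, and the sketch assumes nonzero counts.

```python
import math
from statistics import median


def scale_by_median(counts):
    """Divide a coverage profile by its own median to remove depth differences."""
    m = median(counts)
    return [c / m for c in counts]


def log2_ratios(sample_counts, normal_profiles):
    """Log2 ratio of the sample against the per-target median of the normals."""
    sample = scale_by_median(sample_counts)
    normals = [scale_by_median(p) for p in normal_profiles]
    baseline = [median(column) for column in zip(*normals)]
    return [math.log2(s / b) for s, b in zip(sample, baseline)]


def gc_correct(ratios, gc_fractions, bin_width=0.1):
    """Subtract the median log2 ratio of each GC bin (assumed correction)."""
    bins = {}
    for r, gc in zip(ratios, gc_fractions):
        bins.setdefault(int(gc / bin_width), []).append(r)
    centers = {b: median(values) for b, values in bins.items()}
    return [r - centers[int(gc / bin_width)] for r, gc in zip(ratios, gc_fractions)]


def call_target(ratio, gain=0.4, loss=-0.4):
    """Classify one target using hypothetical log2 ratio thresholds."""
    if ratio >= gain:
        return "amplification"
    if ratio <= loss:
        return "deletion"
    return "reference"


sample_counts = [100, 100, 200, 100, 50]
normal_profiles = [[100, 100, 100, 100, 100], [110, 110, 110, 110, 110]]
gc_fractions = [0.4, 0.4, 0.4, 0.4, 0.4]

ratios = gc_correct(log2_ratios(sample_counts, normal_profiles), gc_fractions)
calls = [call_target(r) for r in ratios]
```

Here the third target has twice the expected coverage and the fifth has half, so they are called as an amplification and a deletion respectively, while the remaining targets are called as reference.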

Additional information is available in the DRAGEN copy number variant calling documentation.

Structural Variant Calling

The DRAGEN Structural Variant (SV) Caller is described here. The DUX4 rearrangement caller is described here.

Variant Deduplication

Variant deduplication is described here.

Contamination Detection

The contamination analysis step detects foreign human DNA contamination using the SNP error file and pileup file generated during small variant calling, together with the TMB trace file. The software uses a contamination score to determine whether a sample contains foreign DNA. In contaminated samples, the variant allele frequencies of SNPs shift away from the expected values of 0%, 50%, or 100%. The algorithm collects all positions that overlap common SNPs and have variant allele frequencies of < 25% or > 75%. It then computes the likelihood that each position reflects an error or a real mutation. The contamination score is the sum of the log likelihood scores across the predefined SNP positions that have a minor allele frequency < 25% in the sample and are not likely due to CNV events.

The larger the contamination score, the more likely the sample contains foreign DNA. A sample is considered contaminated if its contamination score is above a predefined quality threshold. The contamination score was found to be high in samples with highly rearranged genomes or HRD samples. 1% of HRD samples were found to be above the threshold with no evidence of actual contamination.
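
The scoring idea can be illustrated with a small sketch. It is not the DRAGEN implementation: it assumes a binomial model in which each informative SNP contributes the log likelihood ratio between a real minor allele at the observed fraction and a sequencing error at a fixed rate. The function names and the error_rate value are hypothetical, and the exclusion of positions affected by CNV events is not modeled.

```python
import math


def log_binom_pmf(k, n, p):
    """Log probability of observing k successes in n trials with success rate p."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
    )


def contamination_score(pileups, error_rate=0.001):
    """Sum log likelihood ratios over common SNP positions.

    pileups: iterable of (alt_reads, depth) pairs at predefined SNP positions.
    """
    score = 0.0
    for alt, depth in pileups:
        vaf = alt / depth
        if 0.25 <= vaf <= 0.75:
            continue  # heterozygous-like site, not used for the score
        minor = min(alt, depth - alt)
        if minor == 0:
            continue  # clean homozygous site, no evidence of foreign DNA
        real = log_binom_pmf(minor, depth, minor / depth)
        error = log_binom_pmf(minor, depth, error_rate)
        score += real - error
    return score


# (alt_reads, depth) at three SNP positions per sample.
clean_score = contamination_score([(0, 100), (50, 100), (100, 100)])
shifted_score = contamination_score([(5, 100), (50, 100), (96, 100)])
```

A sample whose homozygous sites stay at 0% or 100% scores zero, while sites shifted to 5% or 96% raise the score. In practice the score would then be compared against the predefined threshold.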

Annotation

The Illumina Annotation Engine performs annotation of small variants, CNVs, and exon-level CNVs. The inputs are gVCF files and the outputs are annotated JSON files.

The Heme pipeline currently does not support annotation of gVCF files. Use Illumina Connected Insights to perform tertiary analysis.

Tumor Mutational Burden

Not supported in the current release. Please use the DNA Somatic Tumor-Only Heme WGS DRAGEN recipe.

Microsatellite Instability Status

Not supported in the current release. Please use the DNA Somatic Tumor-Only Heme WGS DRAGEN recipe.
