Evidence BAM

Overview

The DRAGEN small variant caller is a haplotype-based caller which performs local assembly of all reads in an active region into a de Bruijn graph (DBG). The assembly process uses all the read bases including the soft-clip bases of reads. The soft-clip bases provide evidence for the presence of variants, specifically longer insertions and deletions which are not present in the read cigar and hence cannot be directly viewed in IGV.

The assembly and realignment step (using pair-HMM) performed by variant caller aims to correct mapping errors made by the original aligner and improves the overall variant caller accuracy. Using the evidence BAM, we can view how the variant caller sees the read evidence and how the reads have been realigned making it a very useful debugging tool.

By default, the evidence BAM contains only a subset of regions processed by the small variant caller. Only regions which have candidate indel variants and some percentage of soft-clip reads in the pileup are realgned and output in the evidence BAM. This is done to reduce the run-time overhead needed to generate the evidence BAM.

Outputs

The output of the VC Evidence BAM feature will match the output format that the customer has selected using --output-format option. The default format is bam.

  • A bam/cram/sam file with the suffix _evidence.bam/cram/sam and the corresponding index file. The evidence BAM can be enabled along with the regular BAM output from the Map-Align step. When multiple BAM are passed as inputs to the variant caller, for e.g., in Tumor-Normal calling, then they will be combined in the evidence BAM output and tagged with appropriate read groups.

  • A bed file with regions that were realigned and output in VC Evidence BAM with suffix ".realigned-regions.bed".

Features

The evidence BAM consists of realigned reads, badly mated reads and reads that are disqualified by the variant caller based on the read likelihood scores.

  • Disqualified and Badly Mated reads

    Reads that are badly-mated (when the read and its mate are mapped to different chromosmes) are tagged with a BM tag (integer) and reads that are disqualified (based on read likelihoods) are tagged with the DQ tag (integer). These reads are filtered out by the genotyper in the variant caller. The alignment score tag AS is forced to 0 for such reads in the evidence BAM and hence, they can be filtered from the IGV pileup by setting the minimum AS score to be 1 instead of 0.

  • Graph Haplotypes

    When enabling graph haplotypes output using --vc-evidence-bam-output-haplotypes, all the haplotypes constructed by the de Bruijn graph are output in the evidence BAM as single reads covering the entire active region. The reads and haplotypes are tagged with different read groups which makes it easily distinguishable in IGV. In IGV, we can use “Color Alignments By” or “Group Alignments By” > read group to separate out the reads from the haplotypes. The haplotypes are tagged with read group EvidenceHaplotype and the reads are part of the EvidenceRead_Normal/Tumor read group.

    The haplotypes are named as Haplotype 1, Haplotype 2 and so on and have an additional ‘HC’ tag (integer). The realigned reads also have an HC tag which encodes which haplotype best matches the read based on the likelihood calculation. Only reads which are supported by a single unique haplotype have the HC tag, reads which match more than one haplotype well do not have an HC tag. The use of this tag is primarily intended to enable highlighting of reads in IGV. Go to "Color Alignments By > Tag" and enter "HC" to view which reads are uniquely supported by a certain graph haplotypes.

Command Line Arguments

NameDescriptionDefault Value

vc-output-evidence-bam

Enable evidence BAM output

False

vc-evidence-bam-output-haplotypes

Output graph haplotypes in evidence BAM

False

vc-evidence-bam-clipped-read-threshold

Percentage of clipped reads in active region to enable evidence BAM output for that region

10%

vc-evidence-bam-force-output

Force evidence BAM output for all active regions

False

Last updated