B-Allele Frequency Output

B-allele frequency (BAF) output is enabled by default in germline and somatic VCF and gVCF runs.

The BAF value is calculated as either AF or (1 - AF), where

  • AF = alt_count / (ref_count + alt_count)

  • BAF = 1 - AF when ref base < alt base, and BAF = AF otherwise. The order of priority for bases is A < T < G < C < N.

The B-allele frequency values are often plotted to visually inspect the spread away from a perfectly diploid heterozygous call (BAF=50%). This plot is more easily interpreted if it is symmetric about the BAF=50% line. To ensure the symmetry, a heuristic must be used to determine when BAF = AF and when BAF = 1-AF.

This definition of B-allele frequency is based on the definition that is used for bead arrays, as most users are accustomed to that implementation. On bead arrays, the choice of the B allele is based on the color of dye attached to each nucleotide: A and T get one color, G and C get the other color. The bead array implementation has a much more complex rule for tie-breaking between A and T or G and C that involves top and bottom strands. That complexity is unnecessary here, so the simpler hierarchical approach of using a priority for the nucleotides, A<T<G<C<N, is used.
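The rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the tool's implementation; the names `PRIORITY`, `allele_frequency`, and `b_allele_frequency` are hypothetical, and the inputs are assumed to be single uppercase bases with non-negative read counts.

```python
# Priority order from the text: A < T < G < C < N.
PRIORITY = {"A": 0, "T": 1, "G": 2, "C": 3, "N": 4}


def allele_frequency(ref_count, alt_count):
    """AF = alt_count / (ref_count + alt_count)."""
    total = ref_count + alt_count
    if total == 0:
        # AF is undefined when there are no supporting reads.
        raise ValueError("ref_count and alt_count are both zero")
    return alt_count / total


def b_allele_frequency(ref_base, alt_base, ref_count, alt_count):
    """Return 1 - AF when ref base < alt base in priority order, else AF."""
    af = allele_frequency(ref_count, alt_count)
    if PRIORITY[ref_base] < PRIORITY[alt_base]:
        return 1.0 - af
    return af


# Same site described two ways: 30 reads of A and 10 reads of G.
print(b_allele_frequency("A", "G", 30, 10))  # ref=A, alt=G
print(b_allele_frequency("G", "A", 10, 30))  # ref=G, alt=A
```

Both calls return the same value, because under this rule the BAF always equals the fraction of reads supporting whichever of the two bases comes first in the priority order, regardless of which one is the reference.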

For each small variant VCF entry with exactly one SNP alternate allele, the BAF output file contains a corresponding entry, with the following exceptions:

  • <NON_REF> lines are excluded.

  • ForceGT variants (as marked by the "FGT" tag in the INFO field) are not included in the output, unless the variant also contains the "NML" tag in the INFO field.

  • Variants where the ref_count and alt_count are both zero are not included in the output.
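The inclusion rules above can be expressed as a single predicate. The following is a hedged sketch: the function name `included_in_baf_output` and its parameters are hypothetical, and it assumes that a symbolic <NON_REF> allele is ignored when counting alternates, so that a record whose only alternate is <NON_REF> counts as a <NON_REF> line. The source does not spell out that detail.

```python
BASES = set("ACGTN")


def included_in_baf_output(ref, alts, info_tags, ref_count, alt_count):
    """Return True if a VCF record would get a BAF entry under the listed rules.

    ref        -- REF allele string
    alts       -- list of ALT allele strings
    info_tags  -- set of INFO keys present on the record
    """
    # Assumption: ignore the symbolic <NON_REF> allele when counting alternates.
    real_alts = [a for a in alts if a != "<NON_REF>"]
    # Exactly one alternate allele is required; none left means a <NON_REF> line.
    if len(real_alts) != 1:
        return False
    alt = real_alts[0]
    # The single alternate must be a SNP: one base replaced by one base.
    if ref not in BASES or alt not in BASES:
        return False
    # ForceGT variants are dropped unless they also carry the NML tag.
    if "FGT" in info_tags and "NML" not in info_tags:
        return False
    # No supporting reads at all: AF is undefined, so no entry is written.
    if ref_count == 0 and alt_count == 0:
        return False
    return True


print(included_in_baf_output("A", ["G"], {"FGT"}, 12, 9))
print(included_in_baf_output("A", ["G"], {"FGT", "NML"}, 12, 9))
```

The checks run from structural (allele shape) to record-specific (tags, counts), so a record is rejected at the first rule it fails.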

BAF Options

  • --vc-enable-baf Enable or disable B-allele frequency output. Enabled by default.

BAF Output

The BAF output files are BigWig-compressed files named <output-file-prefix>.baf.bw and <output-file-prefix>.hard-filtered.baf.bw. The hard-filtered file contains only entries for variants that pass the filters defined in the VCF (i.e., PASS entries).

Each entry contains the following information: Chromosome Start End BAF

Where:

  • Chromosome is a string matching a reference contig.

  • Start and End define a zero-based, half-open interval.

  • BAF is a floating point value.
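Because VCF positions are one-based while the Start and End values here are zero-based and half-open, relating an entry back to its VCF record requires an offset of one. The sketch below assumes each BAF entry spans the single base of its SNP, which the source implies but does not state; the function names are hypothetical.

```python
def snp_interval(pos):
    """Convert a one-based VCF POS to a zero-based, half-open (start, end)."""
    if pos < 1:
        raise ValueError("VCF positions are one-based and must be >= 1")
    return pos - 1, pos


def interval_to_pos(start, end):
    """Convert a one-base-wide zero-based, half-open interval back to VCF POS."""
    if end - start != 1:
        raise ValueError("expected an interval that is exactly one base wide")
    return end


# A SNP at VCF position 1000 on chr1 with BAF 0.48, as a (Chromosome, Start, End, BAF) tuple.
start, end = snp_interval(1000)
entry = ("chr1", start, end, 0.48)
print(entry)
```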
