# B-Allele Frequency Output

B-allele frequency (BAF) output is enabled by default in germline and somatic VCF and gVCF runs.

The BAF value is calculated as either `AF` or `(1 - AF)`, where

* `AF = (alt_count / (ref_count + alt_count))`
* `BAF = 1 - AF` only when the ref base < the alt base; otherwise `BAF = AF`. The order of priority for bases is `A < T < G < C < N`.

The B-allele frequency values are often plotted to visually inspect the spread away from a perfectly diploid heterozygous call (BAF=50%). This plot is more easily interpreted if it is symmetric about the BAF=50% line. To ensure the symmetry, a heuristic must be used to determine when `BAF = AF` and when `BAF = 1 - AF`.

The definition of B-allele frequency used here is based on the one used for bead arrays, as most users are accustomed to that implementation. On bead arrays, the choice of the B allele is based on the color of dye attached to each nucleotide: A and T get one color, and G and C get the other color. The bead array implementation has a much more complex rule for tie-breaking between A and T or between G and C, which involves top and bottom strands. That complexity is unnecessary here, so a simpler hierarchical approach is used, in which the nucleotides are prioritized as `A < T < G < C < N`.
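The calculation and priority rule described above can be sketched in Python as follows. This is a minimal illustration, not the tool's implementation: the function and variable names are hypothetical, and returning `None` when both counts are zero is an assumption made here to avoid dividing by zero.

```python
# Priority order taken from the text: A < T < G < C < N.
BASE_PRIORITY = {"A": 0, "T": 1, "G": 2, "C": 3, "N": 4}


def b_allele_frequency(ref_base, alt_base, ref_count, alt_count):
    """Return the BAF for a single-SNP call, or None if there is no coverage.

    Hypothetical helper: names and the None return value are assumptions.
    """
    total = ref_count + alt_count
    if total == 0:
        return None  # assumption: no value is computed without any reads

    af = alt_count / total

    # BAF = 1 - AF only when the ref base has lower priority value than the alt base.
    if BASE_PRIORITY[ref_base.upper()] < BASE_PRIORITY[alt_base.upper()]:
        return 1 - af
    return af
```

For example, with 30 reference reads and 10 alternate reads, `AF` is 0.25. A call with ref `A` and alt `G` gives `BAF = 0.75` because `A < G`, while the same counts with ref `G` and alt `A` give `BAF = 0.25`.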

For each small variant VCF entry with exactly one SNP alternate allele, the BAF output file contains a corresponding entry, with the following exceptions:

* `<NON_REF>` lines are excluded.
* ForceGT variants (as marked by the "FGT" tag in the INFO field) are not included in the output, unless the variant also contains the "NML" tag in the INFO field.
* Variants where `ref_count` and `alt_count` are both zero are not included in the output.
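The inclusion rules above can be expressed as a single predicate. The sketch below is illustrative only: the function name, its parameters, and the way a record is represented are all hypothetical. The text does not specify how gVCF records that list `<NON_REF>` alongside another allele are treated, so this sketch simply assumes a record must have a single ALT entry.

```python
# Single-base alleles considered here; N is included because the
# priority order in the text lists it.
SNP_BASES = set("ACGTN")


def include_in_baf(ref, alts, info_tags, ref_count, alt_count):
    """Return True if a record passes the inclusion rules listed above.

    Hypothetical parameters:
      ref        REF column as a string
      alts       list of ALT alleles
      info_tags  set of INFO keys present on the record
    """
    # Exactly one alternate allele is required (assumption: <NON_REF> counts).
    if len(alts) != 1:
        return False
    alt = alts[0]
    if alt == "<NON_REF>":
        return False
    # A SNP is a single-base REF replaced by a single-base ALT.
    if ref.upper() not in SNP_BASES or alt.upper() not in SNP_BASES:
        return False
    # ForceGT variants are skipped unless they also carry the NML tag.
    if "FGT" in info_tags and "NML" not in info_tags:
        return False
    # Records with no supporting reads at all are skipped.
    if ref_count == 0 and alt_count == 0:
        return False
    return True
```

With this sketch, `include_in_baf("A", ["G"], {"FGT"}, 5, 5)` is rejected, while the same record with both `FGT` and `NML` in `info_tags` is accepted.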

## BAF Options

* `--vc-enable-baf` Enable or disable B-allele frequency output. Enabled by default.

## BAF Output

The generated BAF files are BigWig-compressed files named `<output-file-prefix>.baf.bw` and `<output-file-prefix>.hard-filtered.baf.bw`. The hard-filtered file contains entries only for variants that pass the filters defined in the VCF (i.e., PASS entries).

Each entry contains the following information: `Chromosome Start End BAF`

Where:

* Chromosome is a string matching a reference contig.
* Start and End are zero-based coordinates that define a half-open interval.
* BAF is a floating-point value.
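To illustrate the coordinate convention, the sketch below converts a 1-based VCF `POS` for a SNP into a zero-based, half-open interval. The helper name is hypothetical, and the mapping of a SNP at `POS` to `[POS - 1, POS)` is an assumption that follows from the conventions stated above rather than a documented detail of the output writer.

```python
def baf_entry(chrom, vcf_pos, baf):
    """Build a (Chromosome, Start, End, BAF) tuple for a single SNP.

    Hypothetical helper. VCF POS is 1-based; a one-base interval in
    zero-based, half-open coordinates is assumed to be [POS - 1, POS).
    """
    start = vcf_pos - 1  # zero-based start, inclusive
    end = vcf_pos        # zero-based end, exclusive
    return (chrom, start, end, baf)
```

Under this assumption, a SNP at `chr1` position 1000 with a BAF of 0.75 would correspond to the entry `chr1 999 1000 0.75`.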
