GBA Caller

The GBA Caller is capable of detecting both recombinant-like and nonrecombinant-like variants in the GBA gene from whole-genome sequencing (WGS) data. Disruption of all copies of the GBA gene in an individual causes the autosomal recessive disorder Gaucher disease, and carriers are at increased risk of Parkinson's disease and Lewy body dementia. Due to high sequence similarity with its pseudogene paralog GBAP1, calling recombinant-like variants in GBA requires a specialized caller.

To enable the GBA Caller, use --enable-gba=true as part of a germline-only WGS analysis workflow. The GBA Caller is disabled by default and requires WGS data aligned to a human reference genome with at least 30x coverage.

The GBA Caller performs the following steps:

  1. Determine the total combined GBA and GBAP1 copy number

  2. Detect nonrecombinant-like variants from a set of 111 known variants

  3. Assemble phased haplotypes in the exon 9-11 region where recombinant variants occur

  4. Detect any GBAP1 -> GBA breakpoints that are consistent with one of the 7 known recombinant-like variants

Total Combined GBA and GBAP1 Copy Number

A 10 kb region of unique sequence in between GBA and GBAP1 is used to compute the copy number change due to reciprocal recombination events. Reads that align to this 10 kb region are counted and the count is normalized to a diploid baseline derived from 3000 preselected 2 kb regions across the genome. The 3000 normalization regions are randomly selected from the portion of the reference genome that has stable coverage across population samples. The total combined GBA and GBAP1 copy number is then calculated as two more than the copy number of this 10 kb region.
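The normalization above can be sketched in a few lines. This is an illustrative sketch only, not DRAGEN's implementation: the function name, the use of the median as the diploid baseline, and plain rounding are assumptions, and the real caller may apply further corrections.

```python
from statistics import median

def total_gba_gbap1_copy_number(region_reads, norm_counts,
                                region_length=10_000, norm_length=2_000):
    """Estimate the total combined GBA + GBAP1 copy number (hypothetical helper).

    region_reads: reads aligned to the 10 kb unique region between the genes
    norm_counts:  read counts for the preselected 2 kb normalization regions
    """
    # Per-base read density in the 10 kb region.
    region_density = region_reads / region_length
    # Diploid baseline (assumption: median density over the normalization regions).
    baseline = median(count / norm_length for count in norm_counts)
    # Copy number of the 10 kb region, given that the baseline represents 2 copies.
    region_cn = round(2 * region_density / baseline)
    # The source defines the total as two more than the region copy number.
    return region_cn + 2
```

With equal density in the 10 kb region and the baseline, the region copy number is 2 and the total is 4. Half the baseline density gives a total of 3.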

Nonrecombinant-like Variant Calling

Of the known nonrecombinant-like variants, some are in unique (nonhomologous) regions of GBA with high mapping quality. Only reads mapping to GBA are used for calling variants in nonhomologous regions. The other variants occur in homologous regions of GBA/GBAP1, where reads mapping to either GBA or GBAP1 are used for variant calling.

For each variant, reads containing the variant allele and the nonvariant alleles are counted. A binomial model that incorporates the sequencing error rate is then used to determine the most likely variant allele copy number (0 for nonvariant).
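A binomial model of this kind might look like the sketch below. The function name, the default error rate, and the way the error rate enters the expected allele fraction are assumptions for illustration; the source does not specify DRAGEN's exact likelihood.

```python
from math import log

def call_variant_copy_number(variant_reads, other_reads, total_copies,
                             error_rate=0.01):
    """Return the most likely variant allele copy number (hypothetical helper).

    total_copies: number of gene copies the reads are drawn from, for example
    the GBA copy number for unique regions or the combined GBA + GBAP1 copy
    number for homologous regions (assumption).
    """
    best_cn, best_loglik = 0, float("-inf")
    for cn in range(total_copies + 1):
        fraction = cn / total_copies
        # Probability that a read shows the variant allele, allowing for
        # sequencing errors in either direction.
        p = fraction * (1 - error_rate) + (1 - fraction) * error_rate
        # Binomial log-likelihood; the binomial coefficient is the same for
        # every candidate copy number, so it is omitted.
        loglik = variant_reads * log(p) + other_reads * log(1 - p)
        if loglik > best_loglik:
            best_cn, best_loglik = cn, loglik
    return best_cn
```

A result of 0 corresponds to a nonvariant call.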

For a list of the supported nonrecombinant-like variants, refer to the targeted/gba/target_variants_*.tsv files located in the resources directory of the DRAGEN install location.

Haplotype Phasing in the Exon 9-11 Region

A collection of 10 differentiating sites in the exon 9-11 region of GBA is used to detect the GBA and GBAP1 haplotypes present in the sample. An iterative phasing algorithm builds up haplotypes that are supported by the read data. The algorithm starts with seed sites, which are then iteratively extended to neighboring sites. At each iteration, reads that can be unambiguously assigned to one of the detected partial haplotypes are used to extend each partial haplotype to its next neighboring site. Iteration continues until all sites have been extended. Some haplotypes may have sites that are unresolved (i.e. ambiguous), but these haplotypes can still participate in GBA -> GBAP1 breakpoint detection.
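The seed-and-extend idea can be illustrated with a much simplified sketch. Everything here is an assumption made for illustration: reads are reduced to a mapping from site index to allele ("G" for the GBA base, "P" for the GBAP1 base), a single seed site is used, and a fixed read-count threshold decides support. DRAGEN's actual algorithm is not described at this level of detail in the source.

```python
from collections import Counter

UNRESOLVED = "?"

def _compatible(read, hap):
    """True if the read overlaps a resolved site of hap and agrees at all of them."""
    shared = [s for s in read if hap.get(s, UNRESOLVED) != UNRESOLVED]
    return bool(shared) and all(read[s] == hap[s] for s in shared)

def phase_haplotypes(reads, n_sites, seed, min_support=2):
    """Build haplotype strings by extending partial haplotypes site by site."""
    # Start one partial haplotype per well-supported allele at the seed site.
    seed_counts = Counter(r[seed] for r in reads if seed in r)
    haps = [{seed: a} for a, c in seed_counts.items() if c >= min_support]
    # Extend to the right of the seed first, then to the left.
    order = list(range(seed + 1, n_sites)) + list(range(seed - 1, -1, -1))
    for site in order:
        extended = []
        for hap in haps:
            support = Counter()
            for read in reads:
                if site not in read or not _compatible(read, hap):
                    continue
                # Use the read only if it fits exactly one partial haplotype.
                if sum(_compatible(read, other) for other in haps) == 1:
                    support[read[site]] += 1
            alleles = [a for a, c in support.items() if c >= min_support]
            # Fork when two alleles are supported; mark unresolved when none is.
            for allele in alleles or [UNRESOLVED]:
                extended.append({**hap, site: allele})
        haps = extended
    return ["".join(h[s] for s in range(n_sites)) for h in haps]

# Toy data: four sites, two underlying haplotypes (GGGG and GPPP), two reads per link.
example_reads = 2 * [
    {0: "G", 1: "G"}, {1: "G", 2: "G"}, {2: "G", 3: "G"},
    {0: "G", 1: "P"}, {1: "P", 2: "P"}, {2: "P", 3: "P"},
]
print(sorted(phase_haplotypes(example_reads, n_sites=4, seed=1)))
```

If no read links a site to an already resolved neighbor, that site stays marked as `?`, which mirrors the unresolved sites mentioned above.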

Nonallelic Homologous Recombination Variant Calling

If any of the 10 differentiating sites in exon 9-11 indicates that there are no wild-type GBA allele copies, the sample is called as homozygous variant, and the recombinant-like variant that best matches the depth calls at the 10 sites is reported.

When the sample is not homozygous variant, the phased haplotypes are used to detect heterozygous variants. The detected haplotypes are compared against a set of 7 known recombinant-like variants: A495P, L483P, D448H, c.1263del, RecNciI, RecTL, and c.1263del+RecTL. Whenever a detected haplotype has a GBA->GBAP1 or GBAP1->GBA transition that is consistent with one of these 7 known recombinant-like variants, the transition is considered a candidate breakpoint for calling that recombinant-like variant. Reads containing phasing information for the two sites flanking each candidate breakpoint are used for variant calling. When the read data supports the hypothesis that the sample contains at least one copy of a candidate breakpoint, the associated haplotype is a recombinant haplotype candidate. Recombinant haplotype candidates are sorted by likelihood and the number of variant sites. If no wild-type haplotype was detected, DRAGEN reports any detected homozygous recombinant haplotype, or up to two different recombinant haplotypes (i.e. compound heterozygous) if detected. If any wild-type haplotype was found, DRAGEN reports a maximum of one recombinant haplotype. When no recombinant haplotypes are detected, two wild-type haplotypes are reported.

The caller can detect the following recombinant variant haplotypes: A495P, L483P, D448H, c.1263del, RecNciI, RecTL, and c.1263del+RecTL. Note: RecNciI, RecTL, and c.1263del+RecTL may be deletion-like recombinant variants. A deletion-like recombinant variant haplotype (as opposed to a gene conversion-like recombinant variant haplotype) is defined as a haplotype with one or fewer switch sites (transitions from a GBAP1 allele to a GBA allele).

The table below shows the HGVS identifiers associated with each recombinant variant haplotype.

| Recombinant variant haplotype | HGVS identifiers |
| --- | --- |
| A495P | NM_000157.4:c.1483G>C |
| L483P | NM_000157.4:c.1448T>C |
| D448H | NM_000157.4:c.1342G>C |
| c.1263del | NM_000157.4:c.1265_1319del |
| RecNciI | NM_000157.4:c.1483G>C, NM_000157.4:c.1448T>C |
| RecTL | NM_000157.4:c.1483G>C, NM_000157.4:c.1448T>C, NM_000157.4:c.1342G>C |
| c.1263del+RecTL | NM_000157.4:c.1483G>C, NM_000157.4:c.1448T>C, NM_000157.4:c.1342G>C, NM_000157.4:c.1265_1319del |

GBA Caller Output File

The GBA Caller generates its output in the targeted caller output file <output-file-prefix>.targeted.json that also contains calls from other targets (see Targeted JSON File).

| Fields in JSON | Explanation | Type and Possible Values |
| --- | --- | --- |
| totalCopyNumber | Total copy number of all GBA and GBAP1 genes including hybrids | nonnegative integer |
| deletionBreakpointInGene | null (i.e. unknown) if totalCopyNumber > 3; true if CN <= 3 and a deletion-like recombinant variant haplotype is detected; false if CN <= 3 and no deletion-like recombinant variant is detected | true, false, null |
| recombinantHaplotypes | List of detected haplotypes arising from nonallelic homologous recombination variant calling | Array of two strings. Each string consists of all associated allele IDs (if any) within the haplotype. Consecutive IDs in the same haplotype are separated by a '+'. |
| variants | List of single site, nonrecombinant-like variants (i.e. not arising from nonallelic homologous recombination). An empty list if no variants are detected. | Array of nonrecombinant-like variants. |

Each nonrecombinant-like variant reported in the variants array will have the fields below.

| Fields in JSON | Explanation | Type and Possible Values |
| --- | --- | --- |
| alleleId | HGVS identifier of the variant allele | string |
| alleleCopyNumber | Copy number of the allele in the called genotype | nonnegative integer |
| genotypeQuality | Phred-scaled quality for the called genotype | nonnegative integer |
| filter | Filter for the called genotype | string. "PASS" when not filtered |

Recombinant-like and nonrecombinant-like variants are reported in VCF format. See Targeted VCF File for details about how these variants are reported in VCF.

Output File Example

An example of the GBA caller content in the <output-file-prefix>.targeted.json output file is shown below.

{
  "gba": {
    "totalCopyNumber": 4,
    "deletionBreakpointInGene": null,
    "recombinantHaplotypes": [
      "L483P",
      ""
    ],
    "variants": []
  }
}
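
Downstream code can read these fields with a standard JSON parser. The sketch below is one possible way to summarize the `gba` section, using only the field names documented above; the function name and the shape of the returned summary are hypothetical.

```python
import json

def summarize_gba(targeted):
    """Summarize the "gba" section of a parsed targeted.json document."""
    gba = targeted["gba"]
    return {
        "total_copy_number": gba["totalCopyNumber"],
        # None in Python corresponds to JSON null (unknown).
        "deletion_breakpoint_in_gene": gba["deletionBreakpointInGene"],
        # Each haplotype string lists allele IDs joined by '+'; an empty
        # string means no recombinant allele on that haplotype.
        "recombinant_alleles": [
            h.split("+") if h else [] for h in gba["recombinantHaplotypes"]
        ],
        # Keep only unfiltered nonrecombinant-like variants.
        "passing_variants": [
            v["alleleId"] for v in gba["variants"] if v["filter"] == "PASS"
        ],
    }

# The example output shown above, embedded as a string for illustration.
example = json.loads("""
{
  "gba": {
    "totalCopyNumber": 4,
    "deletionBreakpointInGene": null,
    "recombinantHaplotypes": ["L483P", ""],
    "variants": []
  }
}
""")
print(summarize_gba(example))
```

For a real run, the parsed document would come from `json.load` on the `<output-file-prefix>.targeted.json` file instead of the embedded string.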
