HBA Caller

The HBA Caller is capable of genotyping the HBA1 and HBA2 genes from whole-genome sequencing (WGS) and whole-exome sequencing (WES) data. Due to high sequence similarity between the genes, a specialized caller is necessary to resolve the possible genotypes of the pair of genes. We consider regions surrounding the HBA1 and HBA2 sites to resolve the possible HBA1 and HBA2 genotypes.

The HBA Caller performs the following steps:

  1. Determines total copy number from read depth of the regions surrounding the HBA1 and HBA2 sites.

  2. Determines HBA genotypes based on the copy number of the regions surrounding the HBA1 and HBA2 sites.

  3. Calls small variants in the HBA1 and HBA2 regions based on the region copy number derived from the genotype along with allele counts from read information.

For a comprehensive evaluation of the HBA caller, see HBA targeted caller blog post.

For information about enabling the HBA caller see Targeted Caller.

Total Copy Number of the regions surrounding the HBA1 and HBA2 sites

The first step of HBA calling is to determine the copy number of the regions surrounding the HBA1 and HBA2 sites. Reads aligned to the regions are counted. The counts in each region are corrected for GC-bias, and then normalized to a diploid baseline. The GC-bias correction and normalization factors are determined from read counts in preselected regions across the genome. Finally, a Gaussian Mixture Model (GMM) is used to obtain the integer region copy number from the region normalized counts.

Genotyping

The genotyping step attempts to identify the two likely haplotypes described in the following table, where "a" stands for a functional copy of either HBA1 or HBA2, "-" stands for a nonfunctional/missing copy of either HBA1 or HBA2, while "3.7", "4.2", and "20.5" describe the recombinant event that likely caused the deletion/duplication of the functional HBA copy.

Genotype

aaa3.7/aa

aaa4.2/aa

aaa20.5/aa

aaa20.5/aaa3.7

aaa20.5/aaa4.2

aaa20.5/aaa20.5

aa/aa

-a3.7/aa

-a4.2/aa

-a20.5/aa

--/aaa3.7

--/aaa4.2

--/aaa20.5

-a3.7/aaa20.5

-a4.2/aaa20.5

-a20.5/aaa3.7

-a20.5/aaa4.2

-a3.7/-a3.7

-a4.2/-a4.2

-a20.5/-a20.5

-a3.7/-a4.2

-a20.5/-a3.7

-a20.5/-a4.2

--/aa

--/-a3.7

--/-a4.2

--/-a20.5

--/--

The quality of fitting the region copy number calls to one of these genotypes is reported in the FORMAT/TargetedSVModelQual field for each SV VCF record and the FILTER/TargetedSVModelQual is applied to each record when the phred-scaled quality is below the threshold specified in the VCF header.

Small Variant Calling

18 small variants are detected from the read alignments. These variants occur in homologous regions of HBA1 and HBA2 where reads mapping to either HBA1 or HBA2 are used for variant calling.

For each variant, reads containing either the variant allele or the nonvariant allele are counted and a binomial model is used to determine the likelihood for each possible variant allele copy number up to the maximum possible as determined from the HBA1/HBA2 genotyping.

HBA Output File

The HBA Caller generates its output in the targeted caller output file <output-file-prefix>.targeted.json that also contains calls from other targets (see Targeted JSON File).

Fields in JSON
Explanation
Type and Possible Values

totalCopyNumber

Total copy number of HBA1 and HBA2 genes including hybrids

nonnegative integer

genotype

The HBA genotype.

string

genotypeFilter

The HBA genotype filter.

string, [PASS, HBALowGQ, HBALowPValue, No_call]

genotypeQual

The HBA Phred genotype quality.

double

variants

List of detected homology region variants in HBA1/HBA2.

Array of variants

Each variant reported in the variants array will have the fields below.

Fields in JSON
Explanation
Type and Possible Values

alleleId

HGVS identifier of the variant allele

string

alleleCopyNumber

Copy number of the allele in the called genotype

nonnegative integer

genotypeQuality

Phred-scaled quality for the called genotype

nonnegative integer

filter

Filter for the called genotype

string. "PASS" when not filtered

Structural variant and homology region variants are reported in VCF format. See Targeted VCF File for details about how these variants are reported in VCF.

Output File Example

An example of the HBA caller content in the <output-file-prefix>.targeted.json output file is shown below.

  "hba": {
    "totalCopyNumber": 4,
    "genotype": "aa/aa",
    "genotypeQuality": 93,
    "genotypeFilter": "PASS",
    "variants": [
      {
        "alleleId": "NM_000517.6:c.95+1G>A",
        "alleleCopyNumber": 1,
        "genotypeQuality": 26,
        "filter": "PASS"
      }
    ]
  }

Last updated

Was this helpful?