# Rh Caller

The Rh Caller identifies a common gene conversion between the *RHD* and *RHCE* genes, referred to as the *RHCE* Exon2 gene conversion, from whole-genome sequencing (WGS) data. Because the two genes have high sequence similarity, a specialized caller is necessary to resolve the gene conversion between them. The caller considers 798 loci, called differentiating sites, that represent differences between the *RHD* and *RHCE* genes and are well conserved in the population.

The Rh Caller performs the following steps:

1. Determines the total copy number from the read depth of the *RHD* and *RHCE* regions.
2. Detects *RHD* -> *RHCE* breakpoints that are consistent with the *RHCE* Exon2 gene conversion.

The Rh Caller requires WGS data aligned to a human reference genome with at least 30x coverage. Reference genome builds must be based on `hg19`, `GRCh37`, or `hg38`.

The Rh Caller runs by default when the small variant caller is enabled, the sample is not a tumor sample, and the Ploidy Estimator detects the sample as WGS.

## Total Combined *RHD* and *RHCE* Copy Number

The first step of Rh calling is to determine the total combined copy number of the *RHD* and *RHCE* regions. Reads aligned to the *RHD* and *RHCE* regions are counted according to their support of the differentiating sites. The counts in each region are corrected for GC bias and then normalized to a diploid baseline. The GC-bias correction and normalization factors are determined from read counts in 3000 preselected 2 kb regions across the genome. These 3000 normalization regions were randomly selected from the portion of the reference genome that has stable coverage across population samples.
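The normalization idea can be illustrated with a minimal Python sketch. It is not the DRAGEN implementation: the GC binning scheme, the use of medians, and the names `gc_bin` and `estimate_copy_number` are all assumptions made for illustration, and the counts are assumed to be already scaled to a common region length. The sketch folds GC correction and diploid normalization into one step by comparing the target count with the median count of diploid normalization regions that have similar GC content.

```python
from statistics import median

def gc_bin(gc_fraction, n_bins=10):
    """Map a GC fraction in [0, 1] to a bin index (hypothetical binning)."""
    return min(int(gc_fraction * n_bins), n_bins - 1)

def estimate_copy_number(target_count, target_gc, norm_regions, n_bins=10):
    """Estimate copy number for a target region.

    norm_regions is a list of (read_count, gc_fraction) pairs for
    normalization regions that are assumed to be diploid.
    """
    overall = median(count for count, _ in norm_regions)
    by_bin = {}
    for count, gc in norm_regions:
        by_bin.setdefault(gc_bin(gc, n_bins), []).append(count)
    same_bin = by_bin.get(gc_bin(target_gc, n_bins))
    # Fall back to the genome-wide median if no region shares the GC bin.
    expected_diploid = median(same_bin) if same_bin else overall
    return 2.0 * target_count / expected_diploid

# Toy data: three normalization regions near 40% GC, two near 60% GC.
norm = [(100, 0.41), (110, 0.42), (90, 0.45), (200, 0.61), (210, 0.62)]
total_cn = estimate_copy_number(target_count=200, target_gc=0.43, norm_regions=norm)
print(round(total_cn))
```

With these toy numbers the target count is twice the diploid expectation for its GC bin, so the estimate rounds to a total copy number of 4, which matches two copies each of *RHD* and *RHCE*.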

## Haplotype Phasing in the Exon 2 Region

A collection of four differentiating sites in the exon 2 region of *RHD* and *RHCE* is used to detect the presence of the *RHCE* Exon2 gene conversion in the sample. An iterative phasing algorithm builds up haplotypes that are supported by the read data. The phasing algorithm starts with candidate haplotypes formed from all possible bases at the first differentiating site. The haplotypes are then extended at the next differentiating site by considering all reads that can be uniquely assigned to a single candidate haplotype. If these reads support only a single base at the next differentiating site for a given candidate haplotype, then the haplotype is extended with that base. When a candidate haplotype can be extended by both bases at the next differentiating site, both possible extended haplotypes are included in the set of candidate haplotypes, growing the set by one. Subsequent extension steps are performed at neighboring differentiating sites until all sites have been processed. Some haplotypes may have sites that are unresolved (that is, ambiguous), but these haplotypes can still participate in *RHD* -> *RHCE* breakpoint detection.
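The extension procedure described above can be sketched in Python as follows. This is an illustrative sketch only, not the caller's actual code: the read representation (a dict from site index to observed base), the rule used to decide whether a read is compatible with a haplotype, and the names `compatible` and `phase_haplotypes` are assumptions, and the bases in the example are made up.

```python
def compatible(read, hap):
    """A read fits a haplotype if it overlaps a resolved site and never disagrees."""
    shared = [site for site in read if hap.get(site) is not None]
    return bool(shared) and all(read[site] == hap[site] for site in shared)

def phase_haplotypes(reads, sites, first_site_bases):
    """Iteratively extend candidate haplotypes across ordered sites."""
    # Seed one candidate per possible base at the first site.
    haplotypes = [{sites[0]: base} for base in first_site_bases]
    for site in sites[1:]:
        extended = []
        for hap in haplotypes:
            bases = set()
            for read in reads:
                if site not in read:
                    continue
                # Only reads assignable to exactly one candidate are used.
                fits = [h for h in haplotypes if compatible(read, h)]
                if len(fits) == 1 and fits[0] is hap:
                    bases.add(read[site])
            if not bases:
                # No informative read: mark the site as unresolved.
                extended.append({**hap, site: None})
            else:
                # One base extends the haplotype; several bases branch it.
                for base in sorted(bases):
                    extended.append({**hap, site: base})
        haplotypes = extended
    return haplotypes

sites = [0, 1, 2, 3]
reads = [
    {0: "A", 1: "C"}, {1: "C", 2: "G"}, {2: "G", 3: "T"},
    {0: "G", 1: "T"}, {1: "T", 2: "A"}, {2: "A", 3: "C"},
]
haps = phase_haplotypes(reads, sites, first_site_bases="AG")
print(["".join(h[s] or "?" for s in sites) for h in haps])
```

In this toy input each read spans two adjacent sites, so every extension step is supported by reads that match exactly one candidate, and the two haplotypes `ACGT` and `GTAC` are recovered. A site with no uniquely assignable read would be printed as `?`.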

## Recombinant-like Variant Calling

When the phased haplotypes support the *RHCE* Exon2 gene conversion, the caller visits all the differentiating sites and reports them as variants in the output VCF file, with ploidy determined from the copy number estimated from the read depth at the differentiating site.

## Rh Output File

The Rh Caller generates a `<output-file-prefix>.targeted.json` file in the output directory. The output file is a JSON formatted file containing the fields below.

| Fields in JSON | Explanation                             | Type and Possible Values |
| -------------- | --------------------------------------- | ------------------------ |
| sample         | The sample name.                        | string                   |
| dragenVersion  | The version of DRAGEN.                  | string                   |
| rh             | The Rh Caller-specific fields.          | dictionary               |

The `rh` fields are defined as below.

| Fields in JSON  | Explanation                                                                   | Type and Possible Values |
| --------------- | ----------------------------------------------------------------------------- | ------------------------ |
| totalCopyNumber | Total *RHD*/*RHCE* copy number                                                | integer                  |
| rhdCopyNumber   | *RHD* gene copy number                                                        | integer                  |
| rhceCopyNumber  | *RHCE* gene copy number                                                       | integer                  |
| variants        | List of known variants from recombination that were detected in *RHD*/*RHCE*. | list of variants         |

The `variants` fields are defined as below.

| Fields in JSON       | Explanation                               | Type and Possible Values                                         |
| -------------------- | ----------------------------------------- | ---------------------------------------------------------------- |
| hgvs                 | HGVS identifier of the variant            | string, "NC\_000001.11g.25405596\_25409676con25283766\_25287797" |
| qual                 | Phred QUAL score of the variant           | double                                                           |
| altCopyNumber        | Copy number of the ALT variant            | double                                                           |
| altCopyNumberQuality | Phred quality of the ALT copy number      | double                                                           |

Examples of the Rh Caller content in the output JSON file are shown below.

```
{
  "rh": {
    "totalCopyNumber": 4,
    "rhdCopyNumber": 2,
    "rhceCopyNumber": 2,
    "variants": [
      {
        "hgvs": "NC_000001.11g.25405596_25409676con25283766_25287797",
        "qual": 63.42360372675952,
        "altCopyNumber": 2,
        "altCopyNumberQuality": 74.27989942583051
      }
    ]
  }
}
```

```
{
  "rh": {
    "totalCopyNumber": 4,
    "rhdCopyNumber": 2,
    "rhceCopyNumber": 2,
    "variants": [
      {
        "hgvs": "NC_000001.11g.25405596_25409676con25283766_25287797",
        "qual": 0.0,
        "altCopyNumber": 0,
        "altCopyNumberQuality": 0.026368888668874203
      }
    ]
  }
}
```
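As a hedged illustration of how such a file might be consumed, the Python sketch below extracts the HGVS identifiers of variants with a nonzero ALT copy number. Treating `altCopyNumber > 0` as "conversion present" is an assumption based on the two examples above, not a documented rule, and the helper name `detected_conversions` is hypothetical.

```python
import json

def detected_conversions(report):
    """Return HGVS identifiers of variants with a nonzero ALT copy number."""
    rh = report.get("rh", {})
    return [
        variant["hgvs"]
        for variant in rh.get("variants", [])
        # Assumption: an ALT copy number of 0 means the variant is absent.
        if variant.get("altCopyNumber", 0) > 0
    ]

# Inline copy of the first example above; a real run would instead read
# the <output-file-prefix>.targeted.json file with json.load().
example = json.loads("""
{
  "rh": {
    "totalCopyNumber": 4,
    "rhdCopyNumber": 2,
    "rhceCopyNumber": 2,
    "variants": [
      {
        "hgvs": "NC_000001.11g.25405596_25409676con25283766_25287797",
        "qual": 63.42360372675952,
        "altCopyNumber": 2,
        "altCopyNumberQuality": 74.27989942583051
      }
    ]
  }
}
""")

print(detected_conversions(example))
```

Applied to the second example, where `altCopyNumber` is 0, the same function returns an empty list.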

The Rh Caller also generates a `<output-file-prefix>.targeted.vcf[.gz]` file in the output directory. The output file is a `VCFv4.2` formatted file, possibly compressed.

The following are example output files:

```
##fileformat=VCFv4.2
...
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG002
chr1 25405596 . G A 128.19 PASS EVENT=NC_000001.11g.25405596_25409676con25283766_25287797;EVENTTYPE=GENE_CONVERSION GT:GQ 1/1:128
chr1 25405674 . A T 134.41 PASS EVENT=NC_000001.11g.25405596_25409676con25283766_25287797;EVENTTYPE=GENE_CONVERSION GT:GQ 1/1:134
...
chr1 25409655 . G C 133.16 PASS EVENT=NC_000001.11g.25405596_25409676con25283766_25287797;EVENTTYPE=GENE_CONVERSION GT:GQ 1/1:133
chr1 25409676 . G A 104.46 PASS EVENT=NC_000001.11g.25405596_25409676con25283766_25287797;EVENTTYPE=GENE_CONVERSION GT:GQ 1/1:104
```

```
##fileformat=VCFv4.2
...
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00110
chr1 25405596 . G A 44.96 PASS EVENT=NC_000001.11g.25405596_25409676con25283766_25287797;EVENTTYPE=GENE_CONVERSION GT:GQ 0/0:45
chr1 25405674 . A T 35.71 PASS EVENT=NC_000001.11g.25405596_25409676con25283766_25287797;EVENTTYPE=GENE_CONVERSION GT:GQ 0/0:36
...
chr1 25409655 . G C 32.72 PASS EVENT=NC_000001.11g.25405596_25409676con25283766_25287797;EVENTTYPE=GENE_CONVERSION GT:GQ 0/0:33
chr1 25409676 . G A 27.11 PASS EVENT=NC_000001.11g.25405596_25409676con25283766_25287797;EVENTTYPE=GENE_CONVERSION GT:GQ 0/0:27
```
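A record from this file can be split into its columns with a few lines of Python. This is a minimal sketch that assumes the ten-column, single-sample layout shown in the examples above; `parse_rh_record` is a hypothetical helper, not part of any DRAGEN tooling, and a production workflow would more likely use a dedicated VCF library.

```python
def parse_rh_record(line):
    """Parse one data line of the targeted VCF into a plain dict."""
    # split() accepts tabs (as in a real VCF) or spaces (as printed above).
    fields = line.split()
    chrom, pos, _id, ref, alt, qual, flt, info, fmt, sample = fields[:10]
    info_map = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    sample_map = dict(zip(fmt.split(":"), sample.split(":")))
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "qual": float(qual),
        "filter": flt,
        "event": info_map.get("EVENT"),
        "event_type": info_map.get("EVENTTYPE"),
        "genotype": sample_map.get("GT"),
    }

line = (
    "chr1 25405596 . G A 128.19 PASS "
    "EVENT=NC_000001.11g.25405596_25409676con25283766_25287797;"
    "EVENTTYPE=GENE_CONVERSION GT:GQ 1/1:128"
)
record = parse_rh_record(line)
print(record["pos"], record["genotype"], record["event_type"])
```

Grouping the parsed records by their `event` value collects all differentiating sites that belong to the same gene conversion.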
