# HLA Typing

DRAGEN includes a dedicated genotyper for the Human Leukocyte Antigen (HLA) genes. For WGS data, DRAGEN can call HLA genes at full (four-field) resolution; see *Nomenclature for factors of the HLA system*¹. For WES, TSO, and other panels, DRAGEN reports HLA calls at up to three-field resolution. The DRAGEN HLA caller can detect and report variants in the coding regions of HLA genes and, for WGS samples, in the non-coding and UTR regions as well. It supports 41 HLA and HLA-related genes; for the full list, see the [Appendix](#list-of-supported-hla-genes).

To enable HLA typing, set the `--enable-hla` flag to `true`. For TSO500-solid or TSO500-liquid runs, enable HLA typing through the batch options `--tso500-solid-hla=true` and `--tso500-liquid-hla=true`, respectively. For WES or other exome-based panels, set the `--hla-exome` flag to `true`; it defaults to `false`. **NOTE: The TSO500 panel covers only the Class I HLA-A, -B, and -C genes, so HLA calls are limited to these genes.**
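As a quick reference, the flag choices above can be captured in a small lookup. This is only a sketch: the assay labels (`wgs`, `wes`, `tso500-solid`, `tso500-liquid`) and the helper name `hla_flags` are hypothetical, and the WES entry combines `--enable-hla` with `--hla-exome` as in the WES example later on this page.

```python
# Sketch: map an assay label to the HLA-related flags described above.
# The labels and the helper name are hypothetical; the flags come from this page.
HLA_FLAGS = {
    "wgs": ["--enable-hla=true"],
    "wes": ["--enable-hla=true", "--hla-exome=true"],
    "tso500-solid": ["--tso500-solid-hla=true"],
    "tso500-liquid": ["--tso500-liquid-hla=true"],
}


def hla_flags(assay: str) -> list[str]:
    """Return the HLA flags for an assay label (case-insensitive)."""
    try:
        return list(HLA_FLAGS[assay.lower()])
    except KeyError:
        raise ValueError(f"unknown assay type: {assay!r}") from None
```

For example, `hla_flags("WES")` returns the two flags to append to a WES command line.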

### HLA Workflow

The HLA Caller primarily executes the following steps after the initial DRAGEN map-align has completed:

1. Extract reads mapped to the HLA regions. The human reference genome version is auto-detected during this step. The human reference builds hg19, hs37d5, and GRCh38 are fully supported. CHM13 build is enabled but not supported.
2. Align the extracted HLA reads to a reference set of HLA alleles, obtained from IMGT (v3.61.0), using the DRAGEN map-align processor.
3. Filter out HLA-specific alignments with sub-maximal alignment scores, and estimate best alleles for each gene using an Expectation-Maximization (EM) based algorithm.
4. Select the most likely genotypes for each HLA gene based on the posterior probabilities obtained from the EM algorithm. These alleles are output to the file `*.hla_intermediate.tsv`.
5. Perform variant calling on the alleles reported by the EM algorithm. If variants are detected, then find the best match to an IMGT reference allele supporting the variants. The final HLA alleles found are output to the file `*.hla.tsv`. The variants are output to the file `*.hla_variants.tsv`.
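To build intuition for steps 3 and 4, the following toy EM loop estimates allele abundances from read-to-allele assignments. It is not DRAGEN's implementation: it assumes each read has already been reduced to the set of alleles it aligns to with the maximal score, it starts from uniform abundances instead of the packaged priors, and the names `em_abundance` and `read_hits` are hypothetical.

```python
def em_abundance(read_hits, n_iter=100):
    """Toy EM: estimate allele abundances from per-read sets of compatible alleles.

    read_hits: list of non-empty sets of allele names, one set per read.
    Returns a dict mapping allele -> estimated abundance (values sum to 1).
    """
    if not read_hits or any(not hits for hits in read_hits):
        raise ValueError("every read needs at least one compatible allele")
    alleles = sorted({a for hits in read_hits for a in hits})
    # Uniform start (an assumption of this sketch).
    theta = {a: 1.0 / len(alleles) for a in alleles}
    for _ in range(n_iter):
        counts = dict.fromkeys(alleles, 0.0)
        # E-step: split each read across its compatible alleles by current abundance.
        for hits in read_hits:
            total = sum(theta[a] for a in hits)
            for a in hits:
                counts[a] += theta[a] / total
        # M-step: the new abundance is the expected fraction of reads per allele.
        theta = {a: counts[a] / len(read_hits) for a in alleles}
    return theta


# Six reads unique to A*01:01, four unique to A*02:01, and five that are
# ambiguous between A*01:01 and A*01:02.
reads = ([{"A*01:01"}] * 6 + [{"A*02:01"}] * 4
         + [{"A*01:01", "A*01:02"}] * 5)
abundance = em_abundance(reads)
```

In this toy input, `A*01:02` has no uniquely supporting reads, so its abundance decays toward zero and the ambiguous reads are attributed to `A*01:01`.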

### Reference Requirement for HLA

The reference directory that is supplied at command-line with `--ref-dir` must contain `anchored_hla`, a specific subdirectory with HLA-specific reference files. **NOTE: The default DRAGEN reference directories already contain the recommended `anchored_hla` subdirectory.**
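A pre-flight check for this requirement might look like the sketch below. The helper name `has_hla_reference` is hypothetical, and the check only confirms that the subdirectory exists; it does not validate its contents.

```python
from pathlib import Path


def has_hla_reference(ref_dir) -> bool:
    """Return True if ref_dir contains an 'anchored_hla' subdirectory."""
    return (Path(ref_dir) / "anchored_hla").is_dir()
```

Calling `has_hla_reference` on the directory passed to `--ref-dir` before launching a run gives an early signal that the HLA reference still needs to be built.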

### Building the HLA-Specific Reference Subdirectory

An HLA-specific reference subdirectory can be built by executing:

```
dragen \
--build-hash-table true \
--ht-build-hla-hashtable=true \
--output-directory={REF-DIR}
```

This command will create `anchored_hla` as a subdirectory of the target `{REF-DIR}` supplied as an argument to `--output-directory` as above.

The HLA-specific reference subdirectory can be built at the same time as the primary reference. An example command line for this mode is:

```
dragen \
--build-hash-table true \
--ht-build-hla-hashtable=true \
--output-directory={REF-DIR} \
--ht-reference {PATH-TO}primary_reference.fasta
```

### HLA Resource Files

The HLA resource files are located in the `<INSTALL_PATH>/resources/hla/` directory. The following HLA resource files are packaged with DRAGEN:

1. HLA resource FASTA file, `HLA_resource.v4.fasta.gz`. This file is used by default when building the HLA-specific hash table; see [Building the HLA-Specific Reference Subdirectory](#building-the-hla-specific-reference-subdirectory).
2. HLA priors file, `HLA_AF.v4.tsv.gz`. This file is used to initialize the allele prior probabilities for the HLA EM algorithm.
3. HLA resource GFF file, `HLA_resource.v4.gff.gz`. This file is used by the HLA caller to annotate the HLA variants.

#### Using Custom HLA Reference Files

**NOTE: Using custom HLA reference files to generate the HLA-specific reference subdirectory `anchored_hla` is not recommended, as accuracy cannot be guaranteed.**

An HLA allele reference FASTA file can be used as input to the hash-table building option `--ht-hla-reference`.

Custom input FASTA files (which can be zipped or unzipped) must contain only HLA allele sequences, and all allele names must adhere to the HLA star-allele nomenclature¹, where the leading characters of each allele name indicate the HLA locus, e.g. A\*02:01:01:01. The allele name extracted from each header starts at the first character after `>` and ends at the last character of the name, or at the first `-` delimiter if one is present.

The following is an illustration of a valid HLA reference input file to option `--ht-hla-reference`:

```
>A*01:01:01:30
TCCCCAGACGCCGAGGATGGCCGTCATGGCGCC...
>A*01:01:01:47
TCCCCAGACGCCGAGGATGGCCGTCATGGCGCC...
>A*01:01:01:76
TCCCATTGGGTGTCGGGTTTCCAGAGAAGCCAA...
>A*01:01:01:91
TCCCCAGACGCCGAGGATGGCCGTCATGGCGCC...
...
```
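Before building a hash table from a custom file, it can help to screen the headers. The sketch below applies the extraction rule described above (text after `>` up to the first `-`) and checks the result against an approximate star-allele pattern. The regular expression and the names `allele_name` and `invalid_headers` are assumptions made for this example, not the check DRAGEN itself performs.

```python
import re

# Approximate star-allele pattern: locus, '*', one to four colon-separated
# numeric fields, and an optional expression suffix. This is an assumption.
ALLELE_RE = re.compile(r"^[A-Z][A-Z0-9]*\*\d+(?::\d+){0,3}[NLSCAQ]?$")


def allele_name(header: str) -> str:
    """Extract the allele name from a FASTA header line."""
    if not header.startswith(">"):
        raise ValueError(f"not a FASTA header: {header!r}")
    # The name runs from the first character after '>' to the first '-' (if any).
    return header[1:].strip().split("-", 1)[0]


def invalid_headers(lines):
    """Return header lines whose allele name does not match ALLELE_RE."""
    bad = []
    for line in lines:
        if line.startswith(">") and not ALLELE_RE.match(allele_name(line)):
            bad.append(line.rstrip("\n"))
    return bad
```

Passing an open text file handle (or `gzip.open(path, "rt")` for a gzip-compressed file) to `invalid_headers` lists the headers that need attention.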

Custom HLA reference files might require customized memory allocation, which can be specified with an argument to the command-line option `--ht-hla-ext-table-alloc`.

**NOTE: Use the `--hla-gff-file` option to specify the HLA annotation corresponding to the custom HLA sequences provided with the `--ht-hla-reference` option.**

The HLA caller relies on the HLA annotation file to annotate the HLA variants. If any HLA reference sequence is not present in the HLA GFF file, HLA calling will not proceed and a warning will be reported to the user.

### HLA Caller Pipeline Options

* `hla-exome`: Set this option to `true` (i.e. `--hla-exome=true`) when the input data comes from an exome-based panel, e.g. WES or a TSO500 panel.

Note: this HLA component replaces prior workflows. See the appropriate guide for the DRAGEN software version being used in order to determine valid parameters.

#### Map-Align DRAGEN Requirement for HLA

The HLA Caller requires the DRAGEN mapper-aligner to be enabled, either with the option `--enable-map-align=true` or through the TSO500 batch options.

### HLA Output Files

The HLA Caller generates a tab-delimited output file in which each row is a gene and the columns contain information about that gene. The final genotype output file is `<prefix>.hla.tsv`, located in the user-specified output directory. In tumor-only mode, the output is written to `<prefix>.hla.tumor.tsv`. In tumor-normal mode, two genotype files are generated: `<prefix>.hla.tumor.tsv` for the tumor sample and `<prefix>.hla.tsv` for the normal sample.

The intermediate EM algorithm genotype calls are output to `<prefix>.hla_intermediate.tsv` files. In tumor-normal mode, the files are named `<prefix>.hla_intermediate.tumor.tsv` and `<prefix>.hla_intermediate.tsv` for tumor and normal samples respectively.

The variants detected on HLA alleles selected by the EM algorithm are output to `<prefix>.hla_variants.tsv`. In tumor-normal mode, the files are named `<prefix>.hla_variants.tumor.tsv` and `<prefix>.hla_variants.tsv` for tumor and normal samples respectively.

An evidence BAM file containing reads supporting the reported HLA alleles (both intermediate and final outputs) is generated as `<prefix>.hla_evidence.bam`.
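The naming scheme above can be summarized in a short helper, for example to check that a run produced everything expected. This is a sketch with a hypothetical function name; it covers only the germline and tumor-normal cases, and only the files named so far in this section.

```python
def expected_hla_outputs(prefix: str, tumor_normal: bool = False) -> list[str]:
    """List the HLA output file names described above for a given prefix."""
    names = []
    for kind in ("hla", "hla_intermediate", "hla_variants"):
        names.append(f"{prefix}.{kind}.tsv")  # germline or normal sample
        if tumor_normal:
            names.append(f"{prefix}.{kind}.tumor.tsv")  # tumor sample
    names.append(f"{prefix}.hla_evidence.bam")
    return names
```

Joining each name with the output directory and testing for existence gives a simple completeness check after a run.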

The following is an example output file produced by DRAGEN HLA typing (showing only 5 genes):

| gene | num\_alleles | allele\_1  | reads\_supporting\_allele\_1 | EM\_posterior\_allele\_1 | allele\_2  | reads\_supporting\_allele\_2 | EM\_posterior\_allele\_2 | Notes                      |
| ---- | ------------ | ---------- | ---------------------------- | ------------------------ | ---------- | ---------------------------- | ------------------------ | -------------------------- |
| A    | 2            | A\*01:01   | 334                          | 0.513178                 | A\*02:01   | 318                          | 0.486822                 | NA                         |
| B    | 2            | B\*35:41   | 294                          | 0.509434                 | B\*37:01   | 284                          | 0.490566                 | NA                         |
| C    | 2            | C\*06:02   | 359                          | 0.509091                 | C\*04:01   | 349                          | 0.490909                 | NA                         |
| DMA  | 2            | DMA\*01:01 | 55                           | 0.816092                 | DMA\*01:02 | 35                           | 0.183908                 | Allele2:LowSupportingReads |
| DMB  | 1            | DMB\*01:01 | 1159                         | 0.998608                 | NA         | NA                           | NA                       | NA                         |

The columns are explained below:

* gene - Lists the gene name
* num\_alleles - Number of alleles found for a given gene: 0 (no result), 1 (homozygous call), or 2 (heterozygous call)
* allele\_1 - The two-field resolution allele found with the highest abundance. **NOTE: Some genes have only one-field resolution alleles in the reference database, and such an allele may be reported as the best match**
* reads\_supporting\_allele\_1 - The number of reads supporting allele\_1
* EM\_posterior\_allele\_1 - The posterior probability or abundance estimate reported by the EM algorithm for allele\_1
* allele\_2 - The two-field resolution allele found with the second highest abundance. If allele\_1 has an abundance of 85% or higher, a homozygous call is made and allele\_2 is reported as `NA`. **NOTE: Some genes have only one-field resolution alleles in the reference database, and such an allele may be reported as the best match**
* reads\_supporting\_allele\_2 - The number of reads supporting allele\_2 or NA if a homozygous call is made
* EM\_posterior\_allele\_2 - The posterior probability or abundance estimate reported by the EM algorithm for allele\_2 or NA if a homozygous call is made
* Notes - This column is used to display annotations or QC warnings such as LowSupportingReads when supporting reads are below 50. If NA, no annotations or warnings apply
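A minimal reader for this file could look like the sketch below. It assumes the TSV header line uses exactly the column names listed above and that missing values are the literal string `NA`; the function name and the shortened output keys (`reads_1`, `posterior_1`, `notes`, and so on) are choices made for this example.

```python
import csv


def _parse(value, cast):
    """Convert a TSV field with `cast`, mapping the literal 'NA' to None."""
    return None if value == "NA" else cast(value)


def read_hla_calls(handle):
    """Parse an HLA TSV (open text handle) into a list of per-gene dicts."""
    calls = []
    for row in csv.DictReader(handle, delimiter="\t"):
        call = {"gene": row["gene"], "num_alleles": int(row["num_alleles"])}
        for i in (1, 2):
            call[f"allele_{i}"] = _parse(row[f"allele_{i}"], str)
            call[f"reads_{i}"] = _parse(row[f"reads_supporting_allele_{i}"], int)
            call[f"posterior_{i}"] = _parse(row[f"EM_posterior_allele_{i}"], float)
        call["notes"] = _parse(row["Notes"], str)
        calls.append(call)
    return calls
```

With the parsed rows, genes carrying a QC annotation can be listed with `[c["gene"] for c in calls if c["notes"]]`.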

The HLA Caller generates an additional metrics file:

* `<prefix>.hla_metrics.csv` - Contains the number of reads supporting the EM solution alleles (individual reads may support multiple alleles), and the total number of HLA reads analyzed.

Internal checks for sufficient coverage at each HLA locus will trigger a warning message when fewer than 50 reads support any given allele call, or when fewer than 300 HLA reads are detected overall. In both settings, an allele call will still be attempted, but the results may be unreliable.
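The two thresholds can be reproduced downstream, for example when aggregating results across samples. The sketch below mirrors the rule as stated here (fewer than 50 reads per allele, fewer than 300 HLA reads overall); the function name, the input shape, and the message wording are hypothetical and do not match DRAGEN's own warning text.

```python
MIN_READS_PER_ALLELE = 50   # threshold stated above
MIN_TOTAL_HLA_READS = 300   # threshold stated above


def coverage_warnings(allele_reads, total_hla_reads):
    """Return warning strings for low-coverage alleles or a low overall read count.

    allele_reads: dict mapping allele name -> number of supporting reads.
    """
    warnings = [
        f"{allele}: only {n} supporting reads"
        for allele, n in allele_reads.items()
        if n < MIN_READS_PER_ALLELE
    ]
    if total_hla_reads < MIN_TOTAL_HLA_READS:
        warnings.append(f"only {total_hla_reads} HLA reads in total")
    return warnings
```

An empty list means neither threshold was crossed.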

**NOTE: The HLA TSV output format was updated in the DRAGEN v4.4 release. For information on older versions, see the guide for the DRAGEN software version being used. To report HLA output in the older format, use the option `--hla-enable-legacy-output-format=true`. However, this option is slated for deprecation, and users are encouraged to adopt the new HLA reporting format.**

HLA results are also reported in the targeted caller output file, `<output-file-prefix>.targeted.json`, which combines the results from all the targeted callers into a single [JSON file](https://help.dragen.illumina.com/product-guides/dragen-v4.5/targeted-caller#targeted-json-file). The following is an example of HLA caller output in the targeted caller JSON file (showing only 5 genes):

```
"hla": {
    "calls": [
      {
        "gene": "HLA-A",
        "genotype": "*01:01/*02:01"
      },
      {
        "gene": "HLA-B",
        "genotype": "*35:41/*37:01"
      },
      {
        "gene": "HLA-C",
        "genotype": "*06:02/*04:01"
      },
      {
        "gene": "HLA-DMA",
        "genotype": "*01:01/*01:02"
      },
      {
        "gene": "HLA-DMB",
        "genotype": "*01:01/*01:01"
      },
      ...
    ]
}
```
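To pull the HLA genotypes out of the targeted JSON, something like the following sketch may be enough. It assumes the `hla` object shown above sits at the top level of the file and that each `genotype` is a `/`-separated string; the function name is hypothetical, and other fields of the file are ignored.

```python
import json


def hla_genotypes(targeted: dict) -> dict:
    """Map gene name -> tuple of alleles from the 'hla' section of the targeted JSON."""
    result = {}
    for call in targeted.get("hla", {}).get("calls", []):
        result[call["gene"]] = tuple(call["genotype"].split("/"))
    return result


# Small inline document in the same shape as the excerpt above.
example = json.loads("""
{"hla": {"calls": [
    {"gene": "HLA-A", "genotype": "*01:01/*02:01"},
    {"gene": "HLA-DMB", "genotype": "*01:01/*01:01"}
]}}
""")
genotypes = hla_genotypes(example)
```

For a real run, the input would come from `json.load` on the `<output-file-prefix>.targeted.json` file.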

### Known Limitations

* Map-align must be enabled for HLA (see [Map-Align DRAGEN Requirement for HLA](#map-align-dragen-requirement-for-hla)).
* No HLA genotype will be returned with single-end DNA read inputs.

### Examples

The HLA Caller accepts standard input files in FASTQ or BAM format.

The following example command line uses FASTQ file inputs.

```
dragen \
--enable-hla=true \
--enable-map-align=true \
--enable-sort=true \
--output-directory={output_directory} \
--output-file-prefix={prefix} \
--ref-dir={reference_directory} \
--RGID={read_group_ID} \
--RGSM={read_group_sample} \
-1 {fq1} \
-2 {fq2}
```

The following example command line uses BAM file inputs (with map-align enabled).

```
dragen \
--enable-hla=true \
--enable-map-align=true \
--enable-sort=true \
--output-directory={output_directory} \
--output-file-prefix={prefix} \
--bam-input={bam} \
--ref-dir={reference_directory}
```

The following example command line uses tumor-normal paired file inputs from FASTQ.

```
dragen \
--enable-hla=true \
--enable-map-align=true \
--enable-sort=true \
--output-directory={output_directory} \
--output-file-prefix={prefix} \
--ref-dir={reference_directory} \
--tumor-fastq1={tumor_fq1} \
--tumor-fastq2={tumor_fq2} \
--RGID-tumor={tumor_group_ID} \
--RGSM-tumor={tumor_group_sample} \
-1 {normal_fq1} \
-2 {normal_fq2} \
--RGID={normal_group_ID} \
--RGSM={normal_group_sample}
```

The following example command line uses tumor-normal paired file inputs from WES BAM.

```
dragen \
--enable-hla=true \
--enable-map-align=true \
--enable-sort=true \
--output-directory={output_directory} \
--output-file-prefix={prefix} \
--ref-dir={reference_directory} \
--bam-input={normal_bam} \
--tumor-bam-input={tumor_bam} \
--hla-exome=true
```

The following example command line activates HLA typing in a TSO500-solid run from FASTQ input. A TSO500-compatible reference\_directory is one that uses the same reference genome as TSO500, i.e. hg19.

```
dragen \
--tso500-solid-umi=true \
--tso500-solid-hla=true \
--fastq-file1={tumor_fq1} \
--fastq-file2={tumor_fq2} \
--RGID={read_group_ID} \
--RGSM={read_group_sample} \
--ref-dir={TSO500-compatible reference_directory} \
--output-directory={output_directory} \
--output-file-prefix={prefix} 
```

The following example command line activates HLA typing in a TSO500-liquid run from FASTQ input. A TSO500-compatible reference\_directory is one that uses the same reference genome as TSO500, i.e. hg19.

```
dragen \
--tso500-liquid=true \
--tso500-liquid-hla=true \
--fastq-file1={tumor_fq1} \
--fastq-file2={tumor_fq2} \
--RGID={read_group_ID} \
--RGSM={read_group_sample} \
--ref-dir={TSO500-compatible reference_directory} \
--output-directory={output_directory} \
--output-file-prefix={prefix} 
```

The following example command line performs HLA typing on TSO500 BAM input. A TSO500-compatible reference\_directory is one that uses the same reference genome as TSO500, i.e. hg19.

```
dragen \
--enable-map-align=true \
--max-base-quality=63 \
--output-directory={output_directory} \
--output-file-prefix={prefix} \
--tso500-solid-hla=true \
--tumor-bam-input={tso_tumor_bam} \
--ref-dir={TSO500-compatible reference_directory}
```

¹Marsh SG, et al. Nomenclature for factors of the HLA system, 2010. Tissue Antigens. 2010;75:291-455.

## Appendix

### List of supported HLA genes

| Gene |             Gene Type            | Notes             | Covered by TSO 500 |
| :--: | :------------------------------: | ----------------- | ------------------ |
|   A  |              Class I             | Supported in v4.3 | Yes                |
|   B  |              Class I             | Supported in v4.3 | Yes                |
|   C  |              Class I             | Supported in v4.3 | Yes                |
|  DMA |             Class II             | New in v4.4       | No                 |
|  DMB |             Class II             | New in v4.4       | No                 |
|  DOA |             Class II             | New in v4.4       | No                 |
|  DOB |             Class II             | New in v4.4       | No                 |
| DPA1 |             Class II             | New in v4.4       | No                 |
| DPA2 |             Class II             | New in v4.4       | No                 |
| DPB1 |             Class II             | New in v4.4       | No                 |
| DPB2 |             Class II             | New in v4.4       | No                 |
| DQA1 |             Class II             | Supported in v4.3 | No                 |
| DQA2 |             Class II             | New in v4.4       | No                 |
| DQB1 |             Class II             | Supported in v4.3 | No                 |
| DQB2 |             Class II             | New in v4.4       | No                 |
|  DRA |             Class II             | New in v4.4       | No                 |
| DRB1 |             Class II             | Supported in v4.3 | No                 |
| DRB3 |             Class II             | New in v4.4       | No                 |
| DRB4 |             Class II             | New in v4.4       | No                 |
| DRB5 |             Class II             | New in v4.4       | No                 |
|   E  |           Class I minor          | New in v4.4       | No                 |
|   F  |           Class I minor          | New in v4.4       | No                 |
|   G  |           Class I minor          | New in v4.4       | No                 |
|   H  |        Class I pseudogene        | New in v4.4       | No                 |
|  HFE |  Class I like - haemochromatosis | New in v4.4       | No                 |
|   J  |        Class I pseudogene        | New in v4.4       | No                 |
|   K  |        Class I pseudogene        | New in v4.4       | No                 |
|   L  |        Class I pseudogene        | New in v4.4       | No                 |
| MICA |        HLA class I related       | New in v4.4       | No                 |
| MICB |        HLA class I related       | New in v4.4       | No                 |
|   N  |        Class I pseudogene        | New in v4.4       | No                 |
|   P  |        Class I pseudogene        | New in v4.4       | No                 |
|   R  |        Class I pseudogene        | New in v4.4       | No                 |
|   S  |        Class I pseudogene        | New in v4.4       | No                 |
|   T  |        Class I pseudogene        | New in v4.4       | No                 |
| TAP1 | HLA related - Antigen processing | New in v4.4       | No                 |
| TAP2 | HLA related - Antigen processing | New in v4.4       | No                 |
|   U  |        Class I pseudogene        | New in v4.4       | No                 |
|   V  |        Class I pseudogene        | New in v4.4       | No                 |
|   W  |        Class I pseudogene        | New in v4.4       | No                 |
|   Y  |        Class I pseudogene        | New in v4.4       | No                 |

