HLA Typing

DRAGEN includes a dedicated human leukocyte antigen (HLA) genotyper that calls HLA genes at two-field resolution (also known as four-digit resolution). At this resolution, the DRAGEN HLA genotyper can discern and report HLA alleles that differ in their protein sequence (see Nomenclature for factors of the HLA system¹). For the list of supported HLA genes, see the Appendix.

HLA typing is enabled by setting the --enable-hla option to true. For TSO500-solid or TSO500-liquid runs, enable HLA typing instead through the batch options --tso500-solid-hla=true or --tso500-liquid-hla=true, respectively.

NOTE: The TSO500 panel covers only the Class I genes HLA-A, -B, and -C, so HLA calls for TSO500 runs are limited to these genes.
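For pipeline scripts that launch DRAGEN for different run types, the choice of HLA option can be captured in a small lookup. The following Python sketch is illustrative only: the run-type labels and the functions `hla_enable_option` and `expected_genes` are hypothetical, and only the three option strings and the TSO500 gene restriction come from this section.

```python
from typing import List

# Hypothetical run-type labels; the option strings are the ones documented above.
HLA_ENABLE_OPTIONS = {
    "standard": "--enable-hla=true",
    "tso500-solid": "--tso500-solid-hla=true",
    "tso500-liquid": "--tso500-liquid-hla=true",
}

# TSO500 runs report only the Class I genes covered by the panel.
TSO500_HLA_GENES = ["A", "B", "C"]


def hla_enable_option(run_type: str) -> str:
    """Return the option that turns on HLA typing for the given run type."""
    if run_type not in HLA_ENABLE_OPTIONS:
        raise ValueError(f"unknown run type: {run_type!r}")
    return HLA_ENABLE_OPTIONS[run_type]


def expected_genes(run_type: str, all_genes: List[str]) -> List[str]:
    """Restrict the genes to expect in the output for TSO500 runs."""
    hla_enable_option(run_type)  # validates the run-type label
    if run_type.startswith("tso500"):
        return [g for g in all_genes if g in TSO500_HLA_GENES]
    return list(all_genes)
```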

HLA Workflow

The HLA Caller executes the following four main steps after the initial DRAGEN map-align stage has completed:

Extract reads mapped to the HLA regions. The human reference version is auto-detected during this step. The hg19, hs37d5, and GRCh38 reference builds are fully supported; the CHM13 build is enabled but not supported.
Align the extracted HLA reads to a reference set of HLA alleles, obtained from IMGT (v3.56.0), using the DRAGEN map-align processor.
Filter out HLA-specific alignments with sub-maximal alignment scores, then optimize the distribution of reads across alleles using Expectation-Maximization (EM).
Select the most likely genotype for each HLA locus based on the posterior probabilities obtained from the EM algorithm. A homozygous call is reported if the abundance (posterior probability) of an allele is found to be 85% or higher.
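Steps 3 and 4 can be illustrated with a simplified EM sketch in Python. It assumes each read has already been reduced to the set of alleles it aligns to with the maximal score (the step 3 filter), treats those alleles as equally likely sources of the read apart from their current abundance, and then applies the 85% homozygosity rule from step 4 to a single locus. The function names are hypothetical, and DRAGEN's actual likelihood model and normalization of posteriors may differ.

```python
from typing import Dict, List, Optional, Set, Tuple

HOMOZYGOUS_THRESHOLD = 0.85  # abundance at or above which a homozygous call is made


def em_abundances(read_hits: List[Set[str]], max_iter: int = 1000,
                  tol: float = 1e-10) -> Dict[str, float]:
    """Estimate allele abundances for one locus from reads that may align
    equally well to several alleles (each set holds a read's best-scoring alleles)."""
    if not read_hits:
        return {}
    alleles = sorted(set().union(*read_hits))
    theta = {a: 1.0 / len(alleles) for a in alleles}  # uniform starting point
    for _ in range(max_iter):
        # E-step: split each read across its candidate alleles in proportion
        # to the current abundance estimates.
        counts = dict.fromkeys(alleles, 0.0)
        for hits in read_hits:
            norm = sum(theta[a] for a in hits)
            for a in hits:
                counts[a] += theta[a] / norm
        # M-step: the new abundances are the expected fractions of reads.
        new_theta = {a: c / len(read_hits) for a, c in counts.items()}
        converged = max(abs(new_theta[a] - theta[a]) for a in alleles) < tol
        theta = new_theta
        if converged:
            break
    return theta


def call_genotype(theta: Dict[str, float]) -> Optional[Tuple[str, str]]:
    """Pick the diploid genotype for one locus from EM abundances."""
    if not theta:
        return None  # no result
    ranked = sorted(theta.items(), key=lambda kv: kv[1], reverse=True)
    top_allele, top_abundance = ranked[0]
    if top_abundance >= HOMOZYGOUS_THRESHOLD or len(ranked) == 1:
        return (top_allele, top_allele)  # homozygous call
    return (top_allele, ranked[1][0])  # heterozygous call
```

With 6 reads unique to A*01:01, 4 unique to A*02:01, and 10 shared between them, the estimate converges to abundances of about 0.6 and 0.4, which is below the 85% threshold and therefore gives a heterozygous call.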

Reference Requirement for HLA

The reference directory supplied on the command line with --ref-dir must contain anchored_hla, a subdirectory holding the HLA-specific reference files.

NOTE: The default DRAGEN reference directories already contain the recommended anchored_hla subdirectory.
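Before launching a run, a wrapper script can confirm that the directory passed to --ref-dir contains anchored_hla. This Python sketch (hypothetical helper `has_hla_reference`) only checks that the subdirectory exists and is non-empty; the specific files DRAGEN expects inside it are not listed in this section, so it cannot validate their contents.

```python
from pathlib import Path


def has_hla_reference(ref_dir: str) -> bool:
    """Return True if ref_dir has a non-empty anchored_hla subdirectory."""
    hla_dir = Path(ref_dir) / "anchored_hla"
    # Only presence and non-emptiness are checked; the individual files
    # inside anchored_hla are not documented in this section.
    return hla_dir.is_dir() and any(hla_dir.iterdir())
```

A wrapper might, for example, stop early with an error message when `has_hla_reference(ref_dir)` returns False, instead of discovering the missing subdirectory after map-align has run.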

Building the HLA-Specific Reference Subdirectory

An HLA-specific reference subdirectory can be built by running:

dragen \
--build-hash-table true \
--ht-build-hla-hashtable=true \
--output-directory={REF-DIR}

This command creates anchored_hla as a subdirectory of the {REF-DIR} passed to --output-directory.

The HLA-specific reference subdirectory can also be built together with the primary reference. An example command line for this mode:

dragen \
--build-hash-table true \
--ht-build-hla-hashtable=true \
--output-directory={REF-DIR} \
--ht-reference {PATH-TO}primary_reference.fasta

HLA Resource FASTA

An HLA resource file, HLA_resource.v3.fasta.gz, is packaged with DRAGEN and is located at <INSTALL_PATH>/resources/hla/HLA_resource.v3.fasta.gz.

This file is used by default when building the HLA-specific hash table (see Building the HLA-Specific Reference Subdirectory).

Using Custom HLA Reference Files

A custom HLA allele reference FASTA file can be supplied to the hash-table build with the --ht-hla-reference option.

Note: Using custom HLA reference files to generate the HLA-specific reference subdirectory anchored_hla is not recommended, as accuracy cannot be guaranteed.

Custom input FASTA files (compressed or uncompressed) must contain only HLA allele sequences, and all allele names must follow the HLA star-allele nomenclature¹, in which each allele name begins with its HLA locus, e.g. A*02:01:01:01. An allele name is read from each header line starting at the first character after '>' and ending at the last character of the name or at the first '-' delimiter, whichever comes first.

The following is an example of a valid HLA reference input file for --ht-hla-reference:

>A*01:01:01:30-full
TCCCCAGACGCCGAGGATGGCCGTCATGGCGCC...
>A*01:01:01:47-full
TCCCCAGACGCCGAGGATGGCCGTCATGGCGCC...
>A*01:01:01:76-full
TCCCATTGGGTGTCGGGTTTCCAGAGAAGCCAA...
>A*01:01:01:91-full
TCCCCAGACGCCGAGGATGGCCGTCATGGCGCC...
...
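The header rule above can be checked before passing a custom file to --ht-hla-reference. The Python sketch below extracts each allele name (the text after '>' up to the first '-'), validates it against a regular expression for star-allele names, and counts alleles per locus, reading gzip-compressed input transparently. The regular expression, including the optional expression-suffix letter, is an assumption of this sketch rather than the validation DRAGEN performs, and the function names are hypothetical.

```python
import gzip
import re
from collections import Counter
from typing import Iterator

# Assumed pattern: locus, '*', then one to four colon-separated fields,
# with an optional trailing letter (e.g. an expression suffix such as N).
ALLELE_RE = re.compile(r"^[A-Z][A-Z0-9]*\*\d{2,3}(:\d{2,3}){0,3}[A-Z]?$")


def allele_name(header: str) -> str:
    """Return the allele name from a FASTA header: everything after '>'
    up to the first '-' (or the end of the line)."""
    return header.strip()[1:].split("-", 1)[0]


def iter_allele_names(path: str) -> Iterator[str]:
    """Yield validated allele names from a plain or gzip-compressed FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                name = allele_name(line)
                if not ALLELE_RE.match(name):
                    raise ValueError(f"not a star-allele name: {name!r}")
                yield name


def alleles_per_locus(path: str) -> Counter:
    """Count alleles per locus (the part of each name before '*')."""
    return Counter(name.split("*", 1)[0] for name in iter_allele_names(path))
```

For a hypothetical file containing two A alleles and one B allele, `alleles_per_locus` would return `Counter({'A': 2, 'B': 1})`; a header that does not match the pattern raises ValueError instead.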

Custom HLA reference files might require a custom memory allocation, which can be set with the --ht-hla-ext-table-alloc option.

HLA Caller Pipeline Options

The HLA component has no additional user-settable command-line options.

Note: This HLA component replaces prior workflows. To determine valid parameters, see the guide for the DRAGEN software version being used.

Map-Align DRAGEN Requirement for HLA

The HLA Caller requires the DRAGEN mapper-aligner to be enabled, either with --enable-map-align=true or through the TSO500 batch options.

HLA Output Files

The HLA Caller generates a tab-delimited output file in which each row describes one gene and the columns hold information about that gene. The genotype output file, <prefix>.hla.tsv, is written to the user-specified output directory. In tumor-only mode, the output is written to <prefix>.hla.tumor.tsv. In tumor-normal mode, two genotype files are generated, one for the tumor sample and one for the normal sample: <prefix>.hla.tumor.tsv and <prefix>.hla.tsv, respectively.
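The file-naming rules above can be summarized as a small lookup, which is convenient when a downstream script needs to locate the genotype files. In this Python sketch, the mode labels ("germline" meaning a default single-sample run with no tumor input) and the function `hla_output_files` are hypothetical; only the file names come from this section.

```python
from pathlib import Path
from typing import List


def hla_output_files(output_dir: str, prefix: str, mode: str) -> List[Path]:
    """Return the HLA genotype TSV path(s) written for a run mode.

    mode is a hypothetical label: "germline", "tumor-only", or "tumor-normal".
    """
    out = Path(output_dir)
    default_tsv = out / f"{prefix}.hla.tsv"      # single sample, or the normal sample
    tumor_tsv = out / f"{prefix}.hla.tumor.tsv"  # tumor sample
    layout = {
        "germline": [default_tsv],
        "tumor-only": [tumor_tsv],
        "tumor-normal": [tumor_tsv, default_tsv],  # tumor first, then normal
    }
    if mode not in layout:
        raise ValueError(f"unknown mode: {mode!r}")
    return layout[mode]
```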

The following is an example output file produced by DRAGEN HLA typing (showing only 5 genes):

gene  num_alleles  allele_1   reads_supporting_allele_1  EM_posterior_allele_1  allele_2   reads_supporting_allele_2  EM_posterior_allele_2  Notes
A     2            A*01:01    334                        0.513178               A*02:01    318                        0.486822
B     2            B*35:41    294                        0.509434               B*37:01    284                        0.490566
C     2            C*06:02    359                        0.509091               C*04:01    349                        0.490909
DMA   2            DMA*01:01                             0.816092               DMA*01:02                             0.183908               Allele2:LowSupportingReads
DMB   1            DMB*01:01  1159                       0.998608               NA         NA                         NA

The columns are described below:

gene - the gene name
num_alleles - the number of alleles found for the gene: 0 (no result), 1 (homozygous call), or 2 (heterozygous call)
allele_1 - the two-field resolution allele with the highest abundance. NOTE: Some genes have only one-field resolution alleles in the reference database, and such an allele may be reported as the best match
reads_supporting_allele_1 - the number of reads supporting allele_1
EM_posterior_allele_1 - the posterior probability (abundance estimate) reported by the EM algorithm for allele_1
allele_2 - the two-field resolution allele with the second-highest abundance. If allele_1 has an abundance of 85% or higher, a homozygous call is made and allele_2 is reported as NA. NOTE: Some genes have only one-field resolution alleles in the reference database, and such an allele may be reported as the best match
reads_supporting_allele_2 - the number of reads supporting allele_2, or NA if a homozygous call is made
EM_posterior_allele_2 - the posterior probability (abundance estimate) reported by the EM algorithm for allele_2, or NA if a homozygous call is made
Notes - annotations or QC warnings, such as LowSupportingReads when fewer than 50 reads support an allele. NA means that no annotations or warnings apply
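A downstream script can load the genotype TSV with Python's standard `csv` module. This sketch (hypothetical function `read_hla_tsv`) assumes the column names listed above, converts NA and empty cells to None, and converts the read-count and posterior columns to numbers. It does not interpret the Notes column, because the format used when several annotations apply is not described here.

```python
import csv
from typing import Any, Callable, Dict, List, TextIO

# Columns converted to numbers; all other columns are kept as strings.
NUMERIC_COLUMNS: Dict[str, Callable[[str], Any]] = {
    "num_alleles": int,
    "reads_supporting_allele_1": int,
    "EM_posterior_allele_1": float,
    "reads_supporting_allele_2": int,
    "EM_posterior_allele_2": float,
}


def read_hla_tsv(handle: TextIO) -> List[Dict[str, Any]]:
    """Parse an HLA genotype TSV into one dict per gene row.

    NA and empty cells become None; numeric columns become int or float.
    """
    rows: List[Dict[str, Any]] = []
    for record in csv.DictReader(handle, delimiter="\t"):
        row: Dict[str, Any] = {}
        for column, value in record.items():
            if column is None:
                continue  # extra cells beyond the header row are ignored
            if value is None or value.strip() in ("", "NA"):
                row[column] = None
            elif column in NUMERIC_COLUMNS:
                row[column] = NUMERIC_COLUMNS[column](value)
            else:
                row[column] = value
        rows.append(row)
    return rows
```

For example, after `with open(tsv_path, newline="") as fh: calls = read_hla_tsv(fh)` (where `tsv_path` is a placeholder), homozygous calls are the rows with `num_alleles == 1`.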

The HLA Caller also generates a metrics file:

<prefix>.hla_metrics.csv: contains the number of reads supporting each allele result (individual reads may support multiple alleles) and the total number of HLA reads analyzed.

Internal coverage checks at each HLA locus trigger a warning message when fewer than 50 reads support any given allele call, or when fewer than 300 HLA reads are detected overall. In both cases, an allele call is still attempted, but the results may be unreliable.
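The same thresholds can be re-applied to parsed results, for example to surface low-coverage calls in a report. In this Python sketch, each call is a dict keyed by the TSV column names with NA represented as None, and the total HLA read count is passed in by the caller, since the layout of <prefix>.hla_metrics.csv is not described here. The function name and message wording are hypothetical and are not DRAGEN's own warnings.

```python
from typing import Any, Dict, List

MIN_READS_PER_ALLELE = 50   # per-allele support threshold from this section
MIN_TOTAL_HLA_READS = 300   # overall HLA read threshold from this section


def coverage_warnings(calls: List[Dict[str, Any]], total_hla_reads: int) -> List[str]:
    """Return human-readable warnings for calls below the coverage thresholds."""
    warnings: List[str] = []
    if total_hla_reads < MIN_TOTAL_HLA_READS:
        warnings.append(
            f"only {total_hla_reads} HLA reads detected (fewer than {MIN_TOTAL_HLA_READS})"
        )
    for call in calls:
        for i in (1, 2):
            allele = call.get(f"allele_{i}")
            reads = call.get(f"reads_supporting_allele_{i}")
            # Homozygous calls have allele_2 = None (NA), so they are skipped here.
            if allele is not None and reads is not None and reads < MIN_READS_PER_ALLELE:
                warnings.append(
                    f"{call.get('gene')} {allele}: {reads} supporting reads "
                    f"(fewer than {MIN_READS_PER_ALLELE})"
                )
    return warnings
```

As the text notes, a warning does not suppress the call; this sketch likewise only annotates results and leaves the calls themselves unchanged.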

NOTE: The HLA TSV output format was updated in the DRAGEN v4.4 release. For information on earlier formats, see the guide for the DRAGEN software version being used. To report HLA output in the earlier format, set --hla-enable-legacy-output-format=true. This option is scheduled for deprecation in a future release, so users are encouraged to adopt the new HLA reporting format.

HLA results are also reported in the targeted caller output file, <output-file-prefix>.targeted.json, which combines the results of all targeted callers into a single JSON file. The following is an example of HLA caller output in this file (showing only 5 genes):

"hla": {
    "calls": [
      {
        "gene": "HLA-A",
        "genotype": "*01:01/*02:01"
      },
      {
        "gene": "HLA-B",
        "genotype": "*35:41/*37:01"
      },
      {
        "gene": "HLA-C",
        "genotype": "*06:02/*04:01"
      },
      {
        "gene": "HLA-DMA",
        "genotype": "*01:01/*01:02"
      },
      {
        "gene": "HLA-DMB",
        "genotype": "*01:01/*01:01"
      }
    ]
}
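To consume the JSON output, each genotype string can be split on '/' and prefixed with the gene name to give full allele names. This Python sketch assumes that "hla" is a top-level key of <output-file-prefix>.targeted.json, as the excerpt suggests; the rest of the file's layout is not shown here, and the function name is hypothetical. Entries whose genotype does not contain exactly two alleles are skipped, because the format of no-call entries is not documented in this section.

```python
import json
from typing import Dict, Tuple


def hla_genotypes(targeted_json_text: str) -> Dict[str, Tuple[str, str]]:
    """Map each HLA gene to its two alleles, e.g. "HLA-A" -> ("HLA-A*01:01", "HLA-A*02:01")."""
    report = json.loads(targeted_json_text)
    genotypes: Dict[str, Tuple[str, str]] = {}
    # Assumption: the "hla" object is a top-level key of the targeted JSON file.
    for call in report.get("hla", {}).get("calls", []):
        alleles = (call.get("genotype") or "").split("/")
        if len(alleles) != 2:
            continue  # no-call format is not documented; skip such entries
        gene = call["gene"]
        # Genotypes omit the gene name ("*01:01"), so prepend it.
        genotypes[gene] = (gene + alleles[0], gene + alleles[1])
    return genotypes
```

A homozygous call such as HLA-DMB "*01:01/*01:01" yields the same allele twice, matching the diploid notation used in the JSON.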

Known Limitations

Map-align must be enabled for HLA (see Map-Align DRAGEN Requirement for HLA). As of DRAGEN v4.4, tumor-normal paired BAM inputs are supported for HLA calling.
No HLA genotype is returned for single-end DNA read inputs.
By default, DRAGEN genotypes only HLA alleles that have a sequence in the IMGT database and are therefore included in the anchored HLA reference. Discovery of novel genotypes is not supported.

Examples

The HLA Caller accepts standard input files in FASTQ or BAM format.

The following example command line uses FASTQ file inputs.

dragen \
--enable-hla=true \
--enable-map-align=true \
--enable-sort=true \
--output-directory={output_directory} \
--output-file-prefix={prefix} \
--ref-dir={reference_directory} \
--RGID={read_group_ID} \
--RGSM={read_group_sample} \
-1 {fq1} \
-2 {fq2}

The following example command line uses BAM file inputs (with map-align enabled).

dragen \
--enable-hla=true \
--enable-map-align=true \
--enable-sort=true \
--output-directory={output_directory} \
--output-file-prefix={prefix} \
--bam-input={bam} \
--ref-dir={reference_directory}

The following example command line uses tumor-normal paired file inputs from FASTQ.

dragen \
--enable-hla=true \
--enable-map-align=true \
--enable-sort=true \
--output-directory={output_directory} \
--output-file-prefix={prefix} \
--ref-dir={reference_directory} \
--tumor-fastq1={tumor_fq1} \
--tumor-fastq2={tumor_fq2} \
--RGID-tumor={tumor_group_ID} \
--RGSM-tumor={tumor_group_sample} \
-1 {normal_fq1} \
-2 {normal_fq2} \
--RGID={normal_group_ID} \
--RGSM={normal_group_sample}

The following example command line uses tumor-normal paired file inputs from BAM.

dragen \
--enable-hla=true \
--enable-map-align=true \
--enable-sort=true \
--output-directory={output_directory} \
--output-file-prefix={prefix} \
--ref-dir={reference_directory} \
--bam-input={normal_bam} \
--tumor-bam-input={tumor_bam}

The following example command line activates HLA typing in a TSO500-solid run from FASTQ input. A TSO500-compatible reference directory uses the same reference genome as TSO500, i.e. hg19.

dragen \
--tso500-solid-umi=true \
--tso500-solid-hla=true \
--fastq-file1={tumor_fq1} \
--fastq-file2={tumor_fq2} \
--RGID={read_group_ID} \
--RGSM={read_group_sample} \
--ref-dir={TSO500-compatible reference_directory} \
--output-directory={output_directory} \
--output-file-prefix={prefix}

The following example command line activates HLA typing in a TSO500-liquid run from FASTQ input. A TSO500-compatible reference directory uses the same reference genome as TSO500, i.e. hg19.

dragen \
--tso500-liquid=true \
--tso500-liquid-hla=true \
--fastq-file1={tumor_fq1} \
--fastq-file2={tumor_fq2} \
--RGID={read_group_ID} \
--RGSM={read_group_sample} \
--ref-dir={TSO500-compatible reference_directory} \
--output-directory={output_directory} \
--output-file-prefix={prefix}

The following example command line performs HLA typing on TSO500 BAM input. A TSO500-compatible reference directory uses the same reference genome as TSO500, i.e. hg19.

dragen \
--enable-map-align=true \
--max-base-quality=63 \
--output-directory={output_directory} \
--output-file-prefix={prefix} \
--tso500-solid-hla=true \
--tumor-bam-input={tso_tumor_bam} \
--ref-dir={TSO500-compatible reference_directory}

¹Marsh SG, et al. Nomenclature for factors of the HLA system, 2010. Tissue Antigens. 2010;75:291-455.

Appendix

List of supported HLA genes

Gene    Gene Type                          Notes              Covered by TSO 500
A       Class I                            Supported in v4.3  Yes
B       Class I                            Supported in v4.3  Yes
C       Class I                            Supported in v4.3  Yes
DMA     Class II                           New in v4.4        No
DMB     Class II                           New in v4.4        No
DOA     Class II                           New in v4.4        No
DOB     Class II                           New in v4.4        No
DPA1    Class II                           New in v4.4        No
DPA2    Class II                           New in v4.4        No
DPB1    Class II                           New in v4.4        No
DPB2    Class II                           New in v4.4        No
DQA1    Class II                           Supported in v4.3  No
DQA2    Class II                           New in v4.4        No
DQB1    Class II                           Supported in v4.3  No
DQB2    Class II                           New in v4.4        No
DRA     Class II                           New in v4.4        No
DRB1    Class II                           Supported in v4.3  No
DRB3    Class II                           New in v4.4        No
DRB4    Class II                           New in v4.4        No
DRB5    Class II                           New in v4.4        No
        Class I minor                      New in v4.4        No
        Class I minor                      New in v4.4        No
        Class I minor                      New in v4.4        No
        Class I pseudogene                 New in v4.4        No
HFE     Class I like - haemochromatosis    New in v4.4        No
        Class I pseudogene                 New in v4.4        No
        Class I pseudogene                 New in v4.4        No
        Class I pseudogene                 New in v4.4        No
MICA    HLA class I related                New in v4.4        No
MICB    HLA class I related                New in v4.4        No
        Class I pseudogene                 New in v4.4        No
        Class I pseudogene                 New in v4.4        No
        Class I pseudogene                 New in v4.4        No
        Class I pseudogene                 New in v4.4        No
        Class I pseudogene                 New in v4.4        No
TAP1    HLA related - Antigen processing   New in v4.4        No
TAP2    HLA related - Antigen processing   New in v4.4        No
        Class I pseudogene                 New in v4.4        No
        Class I pseudogene                 New in v4.4        No
        Class I pseudogene                 New in v4.4        No
        Class I pseudogene                 New in v4.4        No
