Star Allele Caller
Overview
The Star Allele Caller identifies the genotypes and metabolism status of the following PGx genes that are included in FDA's PGx recommendations or have CPIC Level A designation: CACNA1S, CFTR, CYP2C19, CYP2C9, CYP3A4, CYP3A5, CYP4F2, IFNL3, RYR1, NUDT15, SLCO1B1, TPMT, UGT1A1, VKORC1, DPYD, G6PD, MT-RNR1, BCHE, ABCG2, NAT2, F5 and UGT2B17. It finds optimal genotypes for these genes, based on star allele definitions from the resources listed below. It calls metabolism status based on a PharmCAT resource file that provides mappings between genotypes and phenotypes. The file is here. The Star Allele Caller is supported for human references hg38, hg19 and GRCh37.
Star allele definition resources for hg38
For genes CACNA1S, CFTR, CYP2C19, CYP2C9, CYP3A4, CYP3A5, CYP4F2, IFNL3, RYR1, NUDT15, SLCO1B1, TPMT, UGT1A1, VKORC1, DPYD, G6PD, MT-RNR1 and ABCG2, the allele definitions are sourced from PharmGKB (Snapshot 2025.07.17) and can be found here. For BCHE and NAT2, the alleles are sourced from this paper and this website (Snapshot 2025.07.17), respectively. For UGT2B17, the star alleles are defined here (Snapshot 2025.07.17). Note that since BCHE does not have defined star alleles, the Star Allele Caller checks whether a sample is positive for any of the variants that are reported in the paper.
Star allele definition resources for hg19/GRCh37
For genes CYP2C19, CYP2C9, CYP3A4, CYP3A5, CYP4F2, NUDT15, SLCO1B1 and DPYD, the definitions are sourced from PharmVar and can be found here (PharmVar Version 5.2.11 for DPYD, and Version 6.2.14 for the rest of the above genes). For the remaining Star Allele Caller genes, the allele definitions have been lifted over from their corresponding definitions for hg38 (which are sourced from PharmGKB as noted above).
Functionality
The Star Allele Caller has the following features.
It calls star allele genotypes from different types of genomic data, such as FASTQ, BAM, gVCF and VCF.
It provides additional details about the genotype call, including a confidence score.
It assumes genotypes for missing positions to be ref - these positions are listed in the output.
It assumes filtered genotype calls to be ref - these records are also listed in the output.
If multiple optimal diplotypes are satisfied, then it lists them all.
It supports different versions of the human reference: hg38, hg19 and GRCh37.
For the genes UGT2B17 and CYP2C19, the caller analyzes CNV calls to detect star alleles.
Input files and command line examples
The Star Allele Caller accepts different forms of sequence data as input, such as FASTQ files, BAM/CRAM files or gVCF/VCF files.
If small variant VCF/gVCF and CNV-VCF files are used as input, they should meet the following specifications.
Must be aligned to the same human reference that is passed through the -r option.
Variants should follow a parsimonious left aligned variant representation format.
Complex variants - for example, representing closely located, independent variants, in a single record - are NOT supported.
Note that VCF/gVCF files can also be substituted with a compressed GZ file (i.e. <file_name>.vcf.gz or <file_name>.gvcf.gz).
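The parsimony requirement above can be checked before a VCF is handed to the caller. The following is a minimal sketch and not part of DRAGEN; the function names `trim_to_parsimonious` and `is_parsimonious` are hypothetical. It trims shared trailing bases and then shared leading bases from a single REF/ALT pair. It does not perform left alignment, which needs the reference sequence; a normalization tool such as `bcftools norm` covers both steps.

```python
def trim_to_parsimonious(pos, ref, alt):
    """Return (pos, ref, alt) with redundant shared bases removed.

    pos is the 1-based VCF position of the first base of ref.
    Only a single ALT allele is handled; left alignment is NOT performed.
    """
    # Drop shared trailing bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Drop shared leading bases, shifting the position to match.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def is_parsimonious(pos, ref, alt):
    """True if the record is already in its trimmed form."""
    return trim_to_parsimonious(pos, ref, alt) == (pos, ref, alt)
```

A record for which `is_parsimonious` returns False carries extra padding bases and is a candidate for normalization before it is used as input.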
To run the caller, the human reference must always be passed as a command line option. The Star Allele Caller detects the reference version (i.e., hg19, GRCh37 or hg38) and reads in the corresponding allele definitions.
Configuration files
The Star Allele Caller uses configuration files that are included in the resources/star_allele directory of the DRAGEN install location. These files include the star allele definitions at resources/star_allele/star_allele_definitions_hg19.json and resources/star_allele/star_allele_definitions_hg38.json. It is possible to modify these files to customize the functionality of the caller, such as defining custom star alleles or modifying the names of the star alleles. Use the following steps to run DRAGEN with a custom set of star allele caller configuration files:
Copy the <dragen_install_dir>/resources/star_allele directory to a new location
Modify the configuration files in the new location as needed
Run DRAGEN with the additional command line option to specify the new resources directory:
--star_allele-resources-path /path/to/new/resources/star_allele
Note the underscore in --star_allele-resources-path. Modification of the star allele caller configuration files can cause unexpected results or errors and should be done with caution.
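The copy step above can be scripted. The sketch below is an illustration under stated assumptions: the helper name `prepare_custom_resources` is hypothetical, and both paths are placeholders supplied by the user. It copies the resources/star_allele directory and returns the extra option to append to an existing DRAGEN command line; editing the copied files remains a manual step.

```python
import shutil
from pathlib import Path


def prepare_custom_resources(dragen_install_dir, new_location):
    """Copy resources/star_allele and return the extra DRAGEN option.

    Both arguments are user-supplied placeholder paths.
    """
    source = Path(dragen_install_dir) / "resources" / "star_allele"
    target = Path(new_location) / "star_allele"
    # copytree raises an error if the target already exists, so an
    # earlier customized copy is never silently overwritten.
    shutil.copytree(source, target)
    # The option name is taken verbatim from the text, underscore included.
    return ["--star_allele-resources-path", str(target)]
```

The returned list is intended to be appended to the argument list of a DRAGEN run after the files in the new location have been modified.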
Recommended command line
From a BAM/CRAM/FASTQ input, the Star Allele Caller can be enabled in parallel with other components as part of a WGS germline analysis workflow using the option --enable-pgx (see DRAGEN Recipe - Germline WGS). This is the simplest and recommended way to run the Star Allele Caller.
The Star Allele Caller can also be enabled separately using the following command line options.
Command line with gVCF input
In the simplest case, the caller takes DRAGEN gVCF and DRAGEN CNV-VCF files as input. The following is an example of the command line for the basic use case.
Command line with VCF input
Unlike a variant-only VCF file, a DRAGEN gVCF file contains the genotypes for all positions in a genome. Although gVCF is the preferred format for the caller, it can also accept a standard variant-only VCF file as input. The command line for this case is the same as above, with the VCF file passed instead of a gVCF file. The CNV-VCF file is also optional; without it, the Star Allele Caller will not call star alleles that are detected through CNV analysis. An example of this use case, with only a variant-only VCF file as input, is as follows.
Command line with BAM input
To run the Star Allele Caller from a BAM input, the variant caller must also be enabled. Enabling the CNV caller is optional but recommended, so that CNV star alleles can be analyzed. An example of the command line for this use case is as follows.
Note that the Star Allele Caller supports the force genotyping option of the variant caller (set by --vc-forcegt-vcf), but other variant caller options, such as combining phased variants (set using --vc-combine-phased-variants-distance), are NOT supported at this time.
Command line with FASTQ input
If a FASTQ file is used as input, the additional options --RGID and --RGSM need to be set in the command line. An example of the command line for this use case is as follows.
Output files
After the DRAGEN Star Allele Caller run completes, the following output files are produced.
When the Star Allele Caller is run with small variant calling, or directly from genome VCF input, the main output file, <prefix>.targeted.json, contains the complete and detailed results for all genes. This is an example output for one gene, DPYD, and for one sample, NA19374.
The fields in the JSON file are as follows.
"genomeBuild": Reference version being used
"softwareVersion": Version of DRAGEN being run
"sampleId": Sample name
"phenotypeDatabaseSources": Resources used for calling metabolism status (phenotype)
"starAlleleDatabaseSources": Resources used for identifying star alleles (genotype)
"locusAnnotations": List of star allele caller results, one for each gene
"gene": Gene name
"geneId": Static HGNC or Ensembl ID of the gene
"starAlleleDatabaseSource": Resource for the star allele definitions file
"genotype": The detected star allele diplotype (or haplotype for a haploid gene)
"genotypeQuality": Phred-scaled quality score for the genotype
"phenotypeDatabaseAnnotation": Metabolism status corresponding to the genotype called
"supportingVariants": List of variants corresponding to the star allele genotype. The id field denotes the name of the star allele. Each non-ref star allele has a list of supportingVariants, which displays the variant details (the same as in the small variant VCF file). The quality field denotes the GQ field from the VCF record.
"missingVariantSites": List of relevant gene sites for which VCF records are missing or filtered
"variantStarAllelesFound": List of star allele haplotypes that are satisfied by the found variants
"variantStarAllelesChecked": List of all star alleles checked by the caller
The fields in "supportingVariants" are as follows.
"alleleId": The star allele associated with this variant
"chrom": Chromosome
"pos": Position
"ref": Reference allele
"alt": Alt alleles (comma separated)
"gt": Genotype call for the variant
"quality": Qual for the variant
Note that the fields other than alleleId correspond to the VCF record for the variant.
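The per-gene results can be pulled out of the JSON report with a few lines of Python. This is a minimal sketch that assumes each entry of "locusAnnotations" carries "gene", "genotype" and "genotypeQuality" as direct keys; the actual nesting in a given DRAGEN version may differ, so compare it against a real report first. The function names `load_report` and `summarize_calls` are hypothetical.

```python
import json


def load_report(path):
    """Read a <prefix>.targeted.json style report into a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_calls(report):
    """Return one (gene, genotype, genotypeQuality) tuple per locus.

    Assumes the three keys sit directly on each locusAnnotations entry;
    keys that are absent are reported as None rather than raising.
    """
    rows = []
    for locus in report.get("locusAnnotations", []):
        rows.append(
            (
                locus.get("gene"),
                locus.get("genotype"),
                locus.get("genotypeQuality"),
            )
        )
    return rows
```

Calling `summarize_calls(load_report(path))` on a report file gives a compact table that can be printed or compared across samples.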
The fields in missingVariantSites are as follows.
"id": An ID for a missing or filtered variant site. For a missing variant site the format is CHROM:REF:ALT. For a filtered variant site the format is CHROM:POS:REF:ALT:GT:GQ:DP:FILTER. These fields correspond to the VCF record for the filtered variant call. The ALT and FILTER fields may contain comma-separated alt alleles and filters.
"alleleIds": Star alleles that are associated with the missing or filtered variant sites.
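The two id formats for missingVariantSites can be told apart by the number of colon-separated fields. The sketch below assumes that no individual field contains a colon and uses the field orders exactly as documented (three fields for a missing site, eight for a filtered site); `parse_site_id` is a hypothetical helper name.

```python
MISSING_FIELDS = ("chrom", "ref", "alt")
FILTERED_FIELDS = ("chrom", "pos", "ref", "alt", "gt", "gq", "dp", "filter")


def parse_site_id(site_id):
    """Split a missingVariantSites id into named fields.

    Assumes no field contains a colon. Values are kept as strings,
    except ALT and FILTER, which are split on commas into lists.
    """
    parts = site_id.split(":")
    if len(parts) == len(MISSING_FIELDS):
        kind, names = "missing", MISSING_FIELDS
    elif len(parts) == len(FILTERED_FIELDS):
        kind, names = "filtered", FILTERED_FIELDS
    else:
        raise ValueError(f"unexpected number of fields in id: {site_id!r}")
    record = dict(zip(names, parts))
    record["kind"] = kind
    # ALT and FILTER may hold comma-separated values.
    record["alt"] = record["alt"].split(",")
    if kind == "filtered":
        record["filter"] = record["filter"].split(",")
    return record
```

Separating the two kinds makes it easy to count how many relevant sites were assumed to be ref because of missing records versus filtered calls.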
Each star allele genotype contains one or two haplotypes (a haplotype for the chrM gene MT-RNR1 and, in male samples, the chrX gene G6PD; a diplotype for all other genes). The two haplotypes of a diplotype are separated by a slash (e.g. *1/*2). Each haplotype is a pre-defined star allele, and the definitions are from the resources listed in the field "starAlleleDatabaseSources". Note that these resources are periodically updated by the agencies maintaining them, and may receive updates that are not yet covered by a specific version of the caller. When the Star Allele Caller cannot identify an optimal genotype for a gene, a no-call (./. or .) is made. In certain cases, more than one genotype is optimally satisfied; all satisfied genotypes are then listed, separated by a semicolon (e.g. *1/*2;*3/*4).
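The genotype string conventions described above (slash between haplotypes, semicolon between equally optimal genotypes, ./. or . for a no-call) can be unpacked as follows. This is a minimal sketch with a hypothetical function name, `parse_genotype`, and it assumes star allele names never contain a slash or a semicolon.

```python
def parse_genotype(genotype):
    """Return a list of candidate calls, each a tuple of star alleles.

    A diplotype yields a 2-tuple and a haplotype a 1-tuple.
    A no-call ("./." or ".") yields an empty list.
    """
    calls = []
    for candidate in genotype.split(";"):
        haplotypes = tuple(h.strip() for h in candidate.split("/"))
        # Every haplotype being "." marks a no-call.
        if all(h == "." for h in haplotypes):
            continue
        calls.append(haplotypes)
    return calls
```

A result with more than one tuple signals an ambiguous call in which several genotypes are optimally satisfied, which downstream code may want to flag for review.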
TSV and JSON files (<prefix>.star_allele.tsv and <prefix>.star_allele.json, respectively) are produced when the Star Allele Caller is run stand-alone from a gVCF or VCF file, or if the option --targeted-enable-legacy-output is set. The JSON file has the same format as <prefix>.targeted.json (shown above), while the TSV file contains summarized star allele calls for each gene. This is an example for one gene from the TSV output. The fields are gene name and genotype.
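The summarized calls can be loaded into a dictionary keyed by gene. The sketch below assumes the file has exactly two tab-separated columns (gene name, then genotype) and no header row; if a header is present in a given DRAGEN version, it would show up as an ordinary entry and should be dropped. `read_star_allele_tsv` is a hypothetical helper name.

```python
import csv


def read_star_allele_tsv(path):
    """Map gene name to genotype string from a two-column TSV file.

    Assumes no header row; lines with fewer than two columns are skipped.
    """
    calls = {}
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2:
                continue
            calls[row[0]] = row[1]
    return calls
```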