Star Allele Caller


Last updated 11 months ago


Overview

The Star Allele Caller identifies the genotypes and metabolism status of the following pharmacogenomic (PGx) genes, which are included in FDA's PGx recommendations or have a CPIC Level A designation: CACNA1S, CFTR, CYP2C19, CYP2C9, CYP3A4, CYP3A5, CYP4F2, IFNL3, RYR1, NUDT15, SLCO1B1, TPMT, UGT1A1, VKORC1, DPYD, G6PD, MT-RNR1, BCHE, ABCG2, NAT2, F5, and UGT2B17. It finds the optimal genotypes for these genes based on star allele definitions from the resources listed below, and it calls metabolism status using a PharmCAT resource file that maps genotypes to phenotypes. The Star Allele Caller primarily supports the human reference hg38, on which all of the genes above are supported. It also supports the following genes on the hg19 and GRCh37 references: CACNA1S, CYP2C19, CYP2C9, CYP3A4, CYP3A5, CYP4F2, IFNL3, NUDT15, SLCO1B1, VKORC1, DPYD, ABCG2, and F5.

Star allele definition resources for hg38

For CACNA1S, CFTR, CYP2C19, CYP2C9, CYP3A4, CYP3A5, CYP4F2, IFNL3, RYR1, NUDT15, SLCO1B1, TPMT, UGT1A1, VKORC1, DPYD, G6PD, MT-RNR1, and ABCG2, the allele definitions are sourced from PharmGKB. For BCHE and NAT2, the alleles are sourced from a published paper and a website, respectively. For UGT2B17, the star alleles are defined by a separate resource. Because BCHE does not have defined star alleles, the Star Allele Caller checks whether a sample is positive for any of the variants reported in the paper.

Star allele definition resources for hg19/GRCh37

For CYP2C19, CYP2C9, CYP3A4, CYP3A5, CYP4F2, NUDT15, SLCO1B1, and DPYD, the definitions are sourced from PharmVar. For the remaining hg19/GRCh37 genes (ABCG2, CACNA1S, IFNL3, F5, and VKORC1), the allele definitions have been lifted over from the corresponding hg38 definitions, which are sourced from PharmGKB as noted above.
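The per-reference gene support described above can be captured as a simple lookup, for example to validate a gene list before launching a run. The sketch below is a hypothetical bash helper: `gene_supported`, `HG38_GENES`, and `HG19_GENES` are illustrative names, not DRAGEN options, and the lists only encode the genes stated on this page.

```bash
#!/usr/bin/env bash
# Hypothetical helper: gene lists copied from this page (not read from DRAGEN).
HG38_GENES="CACNA1S CFTR CYP2C19 CYP2C9 CYP3A4 CYP3A5 CYP4F2 IFNL3 RYR1 NUDT15 SLCO1B1 TPMT UGT1A1 VKORC1 DPYD G6PD MT-RNR1 BCHE ABCG2 NAT2 F5 UGT2B17"
HG19_GENES="CACNA1S CYP2C19 CYP2C9 CYP3A4 CYP3A5 CYP4F2 IFNL3 NUDT15 SLCO1B1 VKORC1 DPYD ABCG2 F5"

# Returns 0 if GENE is supported on BUILD, 1 if not, 2 for an unknown build.
gene_supported() {
  local build=$1 gene=$2 list
  case "$build" in
    hg38)        list=$HG38_GENES ;;
    hg19|GRCh37) list=$HG19_GENES ;;
    *)           return 2 ;;
  esac
  # Pad with spaces so that only whole gene names match.
  case " $list " in
    *" $gene "*) return 0 ;;
    *)           return 1 ;;
  esac
}
```

For example, `gene_supported GRCh37 BCHE` fails because BCHE is listed for hg38 only.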

Functionality

The Star Allele Caller has the following features.

  • It calls star allele genotypes from different types of genomic data: FASTQ, BAM, gVCF, and VCF.

  • It provides additional details about the genotype call, including a confidence score.

  • It treats genotypes at missing positions as reference (ref) and lists these positions in the output.

  • It treats filtered genotype calls as ref and also lists these records in the output.

  • If multiple diplotypes are equally optimal, it lists them all.

  • It supports the human reference versions hg38, hg19, and GRCh37.

  • For the genes UGT2B17 and CYP2C19, the caller analyzes CNV calls to detect star alleles.
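The rule that missing and filtered positions default to ref, and are reported, can be illustrated with a small bash sketch. It assumes a hypothetical tab-separated file of observed calls (position, genotype, FILTER) and a file of required sites, one position per line; `fill_missing_as_ref` is an illustrative name and this is not how DRAGEN reads VCF records internally.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: assign a genotype to every required site, defaulting to
# 0/0 (ref) when the site is absent from the calls or its call is filtered.
fill_missing_as_ref() {
  local sites_file=$1 calls_file=$2
  awk -F'\t' '
    FILENAME == ARGV[1] {                 # first file: observed calls
      if ($3 == "PASS") gt[$1] = $2; else filtered[$1] = 1
      next
    }
    {                                     # second file: required sites
      if ($1 in gt)            print $1 "\t" gt[$1] "\tobserved"
      else if ($1 in filtered) print $1 "\t0/0\tfiltered_assumed_ref"
      else                     print $1 "\t0/0\tmissing_assumed_ref"
    }' "$calls_file" "$sites_file"
}
```

The third column mirrors the idea that assumed-ref positions are listed in the output rather than silently filled in.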

Input files and command line examples

The Star Allele Caller accepts different forms of sequence data as input: FASTQ files, BAM/CRAM files, or gVCF/VCF files.

If small variant VCF/gVCF and CNV-VCF files are used as input, they should meet the following specifications.

  • Must be generated against the same human reference that is passed with the -r option.

  • Variants should use a parsimonious, left-aligned representation.

  • Complex variants (for example, closely located independent variants represented in a single record) are NOT supported.

Note that VCF/gVCF files can also be supplied gzip-compressed (i.e., <file_name>.vcf.gz or <file_name>.gvcf.gz).
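Before using a third-party VCF as input, it can help to screen it for records that violate the requirements above. The bash sketch below is a hypothetical pre-flight check, not a DRAGEN tool: it reads plain or gzip-compressed files and, as a heuristic, flags any record in which both REF and an ALT allele are longer than one base. Such records are either not trimmed to a parsimonious form or are multi-base/complex substitutions; the check does not verify left alignment, which requires the reference sequence.

```bash
#!/usr/bin/env bash
# Stream a VCF/gVCF, decompressing .gz files transparently.
vcf_cat() {
  case "$1" in
    *.gz) gzip -dc -- "$1" ;;
    *)    cat -- "$1" ;;
  esac
}

# Hypothetical heuristic: print chrom:pos:ref:alt for every ALT allele where
# both REF and ALT are longer than one base. Symbolic alleles such as
# <NON_REF> (present in gVCF records), "*" and "." are skipped.
flag_non_parsimonious() {
  vcf_cat "$1" | awk -F'\t' '
    /^#/ { next }                         # skip header lines
    {
      n = split($5, alts, ",")
      for (i = 1; i <= n; i++) {
        a = alts[i]
        if (a ~ /^</ || a == "*" || a == ".") continue
        if (length($4) > 1 && length(a) > 1) print $1 ":" $2 ":" $4 ":" a
      }
    }'
}
```

An empty result does not guarantee that a file is valid input; it only rules out this one class of problem.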

The human reference must always be passed as a command line option. The Star Allele Caller detects the reference version (hg19, GRCh37, or hg38) and reads in the corresponding allele definitions.

The Star Allele Caller can also be enabled in parallel with other components as part of a WGS germline analysis workflow using the --enable-pgx option (see DRAGEN Recipe - Germline WGS).

Command line with gVCF input

In the simplest case, the caller takes DRAGEN gVCF and DRAGEN CNV-VCF files as input. The following is an example of the command line for the basic use case.

dragen \
-r /staging/human/reference/hg38_alt_aware/DRAGEN/${HASH_TABLE_VERSION} \
--star-allele-gvcf /staging/test/data/NA12878.gvcf \
--star-allele-cnv-vcf /staging/test/data/NA12878.cnv.vcf.gz \
--output-directory /staging/test/output \
--output-file-prefix NA12878_dragen \
--enable-star-allele true

Command line with VCF input

Unlike a variant-only VCF file, a DRAGEN gVCF file contains genotypes for all positions in the genome. Although gVCF is the preferred input format, the caller also accepts a standard variant-only VCF file. The command line is the same as above, with the VCF file passed in place of the gVCF file (still using --star-allele-gvcf). The CNV-VCF file is also optional; if it is omitted, the Star Allele Caller does not call star alleles that are detected through CNV analysis. An example with only a variant-only VCF file as input is as follows.

dragen \
-r /staging/human/reference/hg38_alt_aware+cnv+hla+rna_v2/DRAGEN/${HASH_TABLE_VERSION} \
--star-allele-gvcf /staging/test/data/NA12878.vcf \
--output-directory /staging/test/output \
--output-file-prefix NA12878_dragen \
--enable-star-allele true
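Because the CNV-VCF input is optional, a wrapper script may want to add --star-allele-cnv-vcf only when a CNV-VCF file is available. The bash sketch below uses only the options shown in the two examples above; `star_allele_cmd` is a hypothetical helper name, and it prints the assembled command for review rather than running it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: assemble a stand-alone Star Allele Caller command.
# Pass an empty string as the third argument when no CNV-VCF is available;
# in that case CNV-based star alleles (UGT2B17, CYP2C19) are not called.
star_allele_cmd() {
  local ref=$1 small_vcf=$2 cnv_vcf=$3 outdir=$4 prefix=$5
  local -a cmd=(dragen -r "$ref" --star-allele-gvcf "$small_vcf")
  if [ -n "$cnv_vcf" ]; then
    cmd+=(--star-allele-cnv-vcf "$cnv_vcf")
  fi
  cmd+=(--output-directory "$outdir" --output-file-prefix "$prefix"
        --enable-star-allele true)
  printf '%s\n' "${cmd[*]}"
}
```

To execute instead of print, the same array can be run directly as `"${cmd[@]}"`, which preserves arguments that contain spaces.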

Command line with BAM input

To run the Star Allele Caller from BAM input, the variant caller must also be enabled. Enabling the CNV caller is optional but recommended, so that star alleles detected through CNV analysis can be called. An example command line for this use case is as follows.

dragen \
-r /staging/human/reference/hg38_alt_aware/DRAGEN/${HASH_TABLE_VERSION} \
--bam-input /staging/test/data/NA12878.bam \
--output-directory /staging/test/output \
--output-file-prefix NA12878_dragen \
--enable-map-align false \
--enable-star-allele true \
--enable-variant-caller true \
--vc-emit-ref-confidence gvcf \
--enable-cnv true \
--cnv-enable-self-normalization true

Note that the Star Allele Caller supports the variant caller's force genotyping option (set by --vc-forcegt-vcf), but other variant caller options, such as combining phased variants (set using --vc-combine-phased-variants-distance), are NOT supported at this time.
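Pipelines that assemble DRAGEN arguments programmatically can guard against the unsupported option before launch. The bash sketch below is a hypothetical pre-flight check (`check_star_allele_args` is an illustrative name); it only knows about the one option named above as unsupported and lets everything else, including --vc-forcegt-vcf, through.

```bash
#!/usr/bin/env bash
# Hypothetical check: fail if the argument list contains an option that is
# documented as unsupported together with the Star Allele Caller.
check_star_allele_args() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      --vc-combine-phased-variants-distance)
        echo "not supported with the Star Allele Caller: $arg" >&2
        return 1 ;;
    esac
  done
  return 0
}
```

A typical use is `check_star_allele_args "${args[@]}" && dragen "${args[@]}"`, where `args` is the array holding the full command line options.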

Command line with FASTQ input

If FASTQ files are used as input, the --RGID and --RGSM options must also be set on the command line. An example command line for this use case is as follows.

dragen \
-r /staging/human/reference/hg38_alt_aware+cnv+hla+rna_v2/DRAGEN/${HASH_TABLE_VERSION} \
-1 /scratch/NA11829.fq1.gz \
-2 /scratch/NA11829.fq2.gz \
--RGID DRAGEN_RGID \
--RGSM DRAGEN_RGSM \
--enable-map-align true \
--output-directory /staging/test/output \
--output-file-prefix NA11829 \
--enable-star-allele true \
--enable-variant-caller true \
--vc-emit-ref-confidence gvcf \
--enable-cnv true \
--cnv-enable-self-normalization true

Output files

Following completion of the DRAGEN Star Allele Caller run, the following output files are produced.

  1. When the Star Allele Caller is run with small variant calling, or directly from gVCF input, the main output file, <prefix>.targeted.json, contains the complete, detailed results for all genes. The following example shows the output for two genes, CYP3A5 and F5, for sample HG00236.

{
  "genomeBuild": "hg38",
  "softwareVersion": "dragen v4.4.0-52-g09190b26",
  "sampleId": "HG00236",
  "phenotypeDatabaseSources": [
    "PharmCAT Phenotypes Version: Snapshot-2022.09.15"
  ],
  "starAlleleDatabaseSources": [
    "PharmGKB Database Version: Snapshot-2022.01.01",
    "PharmGKB Database Version: Snapshot-2022.03.01",
    "UGT Nomenclature Committee Version: Snapshot-01.01.2023",
    "Zhu et al. 2020, PMID: 33061533"
  ],
  "locusAnnotations": [
    {
      "gene": "CYP3A5",
      "geneId": "HGNC:2638",
      "starAlleleDatabaseSource": "PharmGKB Database Version: Snapshot-2022.01.01",
      "genotype": "*3/*3",
      "genotypeQuality": 43,
      "phenotypeDatabaseAnnotation": "Poor Metabolizer",
      "supportingVariants": [
        {
          "alleleId": "*3",
          "chrom": "chr7",
          "pos": 99672916,
          "ref": "T",
          "alt": "C,<NON_REF>",
          "gt": "1/1",
          "quality": 43
        }
      ],
      "variantStarAllelesFound": "*3",
      "missingVariantSites": []
    },
    {
      "gene": "F5",
      "geneId": "HGNC:3542",
      "starAlleleDatabaseSource": "PharmGKB Database Version: Snapshot-2022.01.01",
      "genotype": "rs6025reference(C)/rs6025reference(C)",
      "genotypeQuality": 0,
      "phenotypeDatabaseAnnotation": null,
      "supportingVariants": [],
      "variantStarAllelesFound": "",
      "missingVariantSites": [
        {
          "id": "169549811:C:T",
          "alleleIds": "rs6025variant(T)"
        }
      ]
    }
  ]
}

The fields in the JSON file are as follows.

  • "genomeBuild": Reference version being used

  • "softwareVersion": Version of DRAGEN being run

  • "sampleId": Sample name

  • "phenotypeDatabaseSources": Resources used for calling metabolism status (phenotype)

  • "starAlleleDatabaseSources": Resources used for identifying star alleles (genotype)

  • "locusAnnotations": List of star allele caller results, one for each gene

  • "gene": Gene name

  • "geneId": Static HGNC or Ensembl identifier of the gene

  • "starAlleleDatabaseSource": Resource for the star allele definitions file

  • "genotype": The detected star allele diplotype (or haplotype for a haploid gene)

  • "genotypeQuality": Phred-scaled quality score for the genotype

  • "phenotypeDatabaseAnnotation": Metabolism status corresponding to the called genotype

  • "supportingVariants": List of variants supporting the non-reference star alleles in the call. The alleleId field gives the name of the star allele that each variant supports. The remaining variant details (chrom, pos, ref, alt, gt) are the same as in the small variant VCF file, and the quality field is the GQ value of the VCF record.

  • "missingVariantSites": List of relevant gene sites for which VCF records are missing or filtered. The id field identifies the site, and alleleIds lists the associated allele(s).

  • "variantStarAllelesFound": List of star allele haplotypes that are satisfied by the found variants
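For a quick per-gene overview, the gene, genotype, and phenotype fields can be pulled out of <prefix>.targeted.json. The bash sketch below is a hypothetical helper (`summarize_targeted_json` is an illustrative name) that assumes the pretty-printed layout of the example above, with one key per line; for anything more robust, a real JSON parser should be used instead of line matching.

```bash
#!/usr/bin/env bash
# Hypothetical summary: print gene<TAB>genotype<TAB>phenotype for each entry
# in locusAnnotations. Relies on one key per line, and on "gene" and
# "genotype" appearing before "phenotypeDatabaseAnnotation" in each entry.
summarize_targeted_json() {
  awk '
    function val(line) {                  # value after the key, unquoted
      sub(/^[^:]*:[ \t]*/, "", line)
      sub(/,[ \t]*$/, "", line)
      gsub(/"/, "", line)
      return line
    }
    /"gene":/     { gene = val($0) }      # does not match "geneId"
    /"genotype":/ { gt = val($0) }        # does not match "genotypeQuality"
    /"phenotypeDatabaseAnnotation":/ { print gene "\t" gt "\t" val($0) }
  ' "$1"
}
```

A JSON null phenotype, as for F5 in the example, is printed as the literal text null.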

Each star allele genotype contains one or two haplotypes: a single haplotype for the chrM gene MT-RNR1 and for the chrX gene G6PD in male samples, and a diplotype for all other genes, with the two haplotypes separated by a slash (e.g., *1/*2). Each haplotype is a predefined star allele, whose definition can be found in the allele definition resources listed above. Note that star allele definitions and notation may vary depending on the resource and when it was last updated. When the Star Allele Caller cannot identify an optimal genotype for a gene, it makes a no-call (./. or .). In some cases, more than one genotype is equally optimal; all such genotypes are then listed, separated by a semicolon (e.g., *1/*2;*3/*4).
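The genotype notation described above (slash between haplotypes, semicolon between equally optimal genotypes, ./. or . for a no-call) can be parsed with plain string handling. The bash sketch below uses hypothetical helper names, `classify_genotype` and `split_genotype`; it treats the notation purely as text and does not interpret star allele names.

```bash
#!/usr/bin/env bash
# Classify a genotype string from the "genotype" field.
classify_genotype() {
  local gt=$1
  local -a alts
  case "$gt" in
    ./.|.)
      echo "no-call" ;;
    *\;*)                                  # several equally optimal genotypes
      IFS=';' read -ra alts <<<"$gt"
      echo "multiple:${#alts[@]}" ;;
    */*)
      echo "diplotype" ;;
    *)
      echo "haplotype" ;;                  # e.g. MT-RNR1, or G6PD in males
  esac
}

# Print every haplotype of every alternative genotype, one per line.
split_genotype() {
  local -a alts haps
  local alt
  IFS=';' read -ra alts <<<"$1"
  for alt in "${alts[@]}"; do
    IFS='/' read -ra haps <<<"$alt"
    printf '%s\n' "${haps[@]}"
  done
}
```

For example, `classify_genotype '*1/*2;*3/*4'` prints `multiple:2`, and `split_genotype` on the same string prints `*1`, `*2`, `*3`, and `*4` on separate lines.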

  2. TSV and JSON files (<prefix>.star_allele.tsv and <prefix>.star_allele.json, respectively) are produced when the Star Allele Caller is run stand-alone from a gVCF or VCF file, or when the --targeted-enable-legacy-output option is set. The JSON file has the same format as <prefix>.targeted.json (shown above), while the TSV file contains a summarized star allele call for each gene. The following is an example line from the TSV output for one gene; the fields are the gene name and the genotype.

UGT1A1  *36/*80+*37
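Because the TSV output has one gene and one genotype per line, calls from two runs (for example, the same sample analyzed from gVCF input and from BAM input) can be compared with standard tools. The bash sketch below is a hypothetical concordance check (`diff_star_allele_tsv` is an illustrative name); it assumes tab-separated lines with no header and reports only genes present in both files whose genotype strings differ.

```bash
#!/usr/bin/env bash
# Hypothetical concordance check between two <prefix>.star_allele.tsv files.
# Output: gene<TAB>genotype_in_file1<TAB>genotype_in_file2 for each mismatch.
# Genes that appear in only one of the files are not reported.
diff_star_allele_tsv() {
  join -t $'\t' \
    <(sort -t $'\t' -k1,1 "$1") \
    <(sort -t $'\t' -k1,1 "$2") |
    awk -F'\t' '$2 != $3 { print $1 "\t" $2 "\t" $3 }'
}
```

The comparison is on the literal genotype strings, so two runs that list the same equally optimal genotypes in a different order would be reported as a mismatch.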