# Targeted Caller

Repetitive regions in the human genome pose a challenge for general variant calling approaches, which typically cannot make use of potentially misplaced MAPQ0 reads. Furthermore, high sequence homology between some genes and a pseudogene paralog can lead to a wide variety of common structural variants (SVs) in the population, requiring specialized targeted calling approaches. DRAGEN supports targeted calling for a number of genes/targets, as described in the target-specific sections that follow.

The targeted caller can be enabled using the command line option `--enable-targeted=true`, or a subset of targets can be enabled by passing a space-separated list of target names to the same option. The supported target names for WGS are `cyp2b6`, `cyp2d6`, `cyp21a2`, `gba`, `hba`, `lpa`, `rh`, and `smn`. For WGS TruPath data, only `lpa`, `hba`, and `smn` run by default when the targeted caller is enabled, but a custom list of supported targets can be specified on the command line. The supported target names for WES are `hba` and `smn`. For a list of all supported targeted caller options along with their default values, see [Targeted Caller Options](https://help.dragen.illumina.com/product-guides/command-line-options#targeted-caller-options).

The targeted caller produces a `<output-file-prefix>.targeted.json` file containing a summary of the variant caller results for each target. Additional details of individual variant calls are reported in VCF format in the `<output-file-prefix>.targeted.vcf.gz` output file.
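Before launching a run, a requested target list can be checked against the supported names above. The following Python sketch is illustrative only: `build_targeted_args` and `SUPPORTED_TARGETS` are hypothetical names, and it assumes the target list is passed as separate arguments after `--enable-targeted`, mirroring the `--targeted-merge-vc rh hba` form used for merging.

```python
# Supported target names per data type, copied from the list above.
SUPPORTED_TARGETS = {
    "wgs": {"cyp2b6", "cyp2d6", "cyp21a2", "gba", "hba", "lpa", "rh", "smn"},
    "wes": {"hba", "smn"},
}


def build_targeted_args(data_type, targets=None):
    """Return the --enable-targeted portion of a DRAGEN command line (hypothetical helper).

    targets=None enables the default targets with --enable-targeted=true;
    otherwise each name becomes its own argument, i.e. a space-separated list.
    """
    if targets is None:
        return ["--enable-targeted=true"]
    supported = SUPPORTED_TARGETS[data_type.lower()]
    unsupported = sorted(set(targets) - supported)
    if unsupported:
        raise ValueError(f"not supported for {data_type}: {unsupported}")
    return ["--enable-targeted", *targets]


print(" ".join(build_targeted_args("wgs", ["cyp2d6", "smn"])))
# prints: --enable-targeted cyp2d6 smn
```

Rejecting unsupported names up front (for example `cyp2d6` on WES data) surfaces a typo before any compute time is spent.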

## Input Data

The targeted caller requires WES or WGS data aligned to a human reference genome. WGS data should have at least 30x coverage, because the caller may be less reliable at lower coverage. Human reference genome builds based on `hg19`, `hs37d5` (including `GRCh37`), or `hg38` are supported.

## Configuration files

The targeted caller utilizes several configuration files that are included in the `resources/targeted` directory of the DRAGEN install location. These files include information about the target regions, known variants, and known haplotypes for each target. It is possible to specify additional known variants by modifying these configuration files. Use the following steps to run DRAGEN with a custom set of targeted caller configuration files:

1. Copy the `<dragen_install_dir>/resources/targeted` directory to a new location
2. Modify the configuration files in the new location as needed
3. Run DRAGEN with the additional command line option to specify the new targeted resources directory: `--targeted-resources-path /path/to/new/resources/targeted`
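Steps 1 and 3 above can be scripted. This is a minimal Python sketch, assuming a hypothetical helper name `prepare_custom_targeted_resources` and placeholder paths; it copies the directory and returns the `--targeted-resources-path` option, leaving the actual edits of step 2 to the user.

```python
import shutil
from pathlib import Path


def prepare_custom_targeted_resources(dragen_install_dir, new_location):
    """Copy the targeted resources (step 1) and return the option for step 3.

    Step 2, editing the copied configuration files, is left to the user.
    """
    src = Path(dragen_install_dir) / "resources" / "targeted"
    dst = Path(new_location) / "targeted"
    # copytree raises FileExistsError if dst exists, so an earlier copy is never overwritten.
    shutil.copytree(src, dst)
    return ["--targeted-resources-path", str(dst)]


# Hypothetical paths; substitute the real install and working locations.
# args = prepare_custom_targeted_resources("/path/to/dragen", "/path/to/new/resources")
```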

Note that modification of the targeted caller configuration files can cause unexpected results or errors and should be done with caution.

## Output Files

### Targeted JSON File

The targeted caller generates a `<output-file-prefix>.targeted.json` file in the output directory. The output file is a JSON formatted file containing the fields below.

| Fields in JSON           | Explanation                                                                                                                                          | Type and Possible Values | Present                       |
| ------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------ | ----------------------------- |
| sampleId                 | The sample name.                                                                                                                                     | string                   | always                        |
| softwareVersion          | The version of DRAGEN.                                                                                                                               | string                   | always                        |
| genomeBuild              | The reference genome build.                                                                                                                          | string                   | always                        |
| phenotypeDatabaseSources | Resources used for calling metabolism status (phenotype).                                                                                            | JSON array of strings    | CYP2B6 or CYP2D6 caller is enabled |
| cyp2b6                   | The CYP2B6 caller fields.                                                                                                                            | dictionary               | CYP2B6 caller is enabled      |
| cyp2d6                   | The CYP2D6 caller fields.                                                                                                                            | dictionary               | CYP2D6 caller is enabled      |
| cyp21a2                  | The CYP21A2 caller fields.                                                                                                                           | dictionary               | CYP21A2 caller is enabled     |
| gba                      | The GBA caller fields.                                                                                                                               | dictionary               | GBA caller is enabled         |
| hba                      | The HBA caller fields.                                                                                                                               | dictionary               | HBA caller is enabled         |
| lpa                      | The LPA caller fields.                                                                                                                               | dictionary               | LPA caller is enabled         |
| rh                       | The RH caller fields.                                                                                                                                | dictionary               | RH caller is enabled          |
| smn                      | The SMN caller fields.                                                                                                                               | dictionary               | SMN caller is enabled         |
| hla                      | The HLA caller fields, see [HLA Typing](https://help.dragen.illumina.com/product-guides/dragen-v4.5/hla-typing#hla-output-files).                    | dictionary               | HLA caller is enabled         |
| locusAnnotations         | The Star Allele caller fields, see [Star Allele Caller](https://help.dragen.illumina.com/product-guides/dragen-v4.5/star-allele-caller#output-files) | dictionary               | Star Allele caller is enabled |
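As an illustration of the top-level layout, the Python sketch below loads a targeted JSON file and reports which per-target dictionaries are present. The function name and file path are hypothetical; only the top-level keys come from the table above, and the contents of each target dictionary differ per target.

```python
import json

# Per-target keys from the table above (hla and locusAnnotations come from other callers).
TARGET_KEYS = ["cyp2b6", "cyp2d6", "cyp21a2", "gba", "hba", "lpa", "rh", "smn"]


def summarize_targeted_json(path):
    """Return (sampleId, genomeBuild, targets present) from a targeted JSON file."""
    with open(path) as fh:
        data = json.load(fh)
    # A target key is only present when that caller was enabled.
    present = [key for key in TARGET_KEYS if key in data]
    return data["sampleId"], data["genomeBuild"], present


# Hypothetical file name:
# print(summarize_targeted_json("sample.targeted.json"))
```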

#### JSON content for recombinant variant detection

For cyp21a2 and gba, the fields below are included in the JSON output. Note that in the descriptions, target gene refers to CYP21A2 or GBA1 and nontarget gene refers to their respective pseudogene paralogs CYP21A1P or GBAP1. The `recombinantHaplotypes` field is limited to reporting only two phased haplotypes for the target gene. For call sets containing all phased haplotypes for both target and nontarget genes, see the `phasedHaplotypes`->`depthMatchedHaplotypes`->`topHaplotypeSets` field.

| Fields in JSON              | Explanation                                                                                                                                                                                                                                                            | Type and Possible Values                                                                                                                                             |
| --------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| totalCopyNumber             | Total copy number of target and nontarget genes including hybrids                                                                                                                                                                                                      | nonnegative integer                                                                                                                                                  |
| deletionBreakpointInGene    | <ul><li>null (i.e. unknown) if totalCopyNumber > 3</li><li>true if totalCopyNumber <= 3 and a deletion-like recombinant variant haplotype is detected</li><li>false if totalCopyNumber <= 3 and no deletion-like recombinant variant haplotype is detected</li></ul> | true, false, null |
| recombinantHaplotypes       | List of detected haplotypes arising from nonallelic homologous recombination variant calling. Limited to two haplotypes. For all phased haplotypes in both target and nontarget genes, see the `phasedHaplotypes`->`depthMatchedHaplotypes`->`topHaplotypeSets` field. | Array of two strings. Each string consists of all associated allele IDs (if any) within the haplotype. Consecutive IDs in the same haplotype are separated by a '+'. |
| recombinantHaplotypesFilter | The filter status for the recombinant haplotypes call. Filter will be `RecombinantSiteDepthMismatch` if the reported haplotypes are incompatible with depth-based calls at all phasing sites within the haplotypes.                                                    | string (`PASS` or `RecombinantSiteDepthMismatch`)                                                                                                                    |
| phasedHaplotypes            | Summary of all detected phased haplotypes in target and nontarget gene regions                                                                                                                                                                                         | JSON object                                                                                                                                                          |

Note: A deletion-like recombinant variant haplotype (as opposed to a gene conversion-like recombinant variant haplotype) is defined as a haplotype with one or fewer switch sites (transitions from a target gene allele to a nontarget gene allele) after excluding some sites with common gene conversions in the nontarget gene.
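The three-way `deletionBreakpointInGene` rule from the table can be restated as a small Python function. This is a sketch, not DRAGEN code: the function name is hypothetical, and the classification of each recombinant haplotype as deletion-like is assumed to be done upstream and passed in as booleans. Python `None` serializes to JSON `null`.

```python
from typing import Optional, Sequence


def deletion_breakpoint_in_gene(total_copy_number: int,
                                deletion_like_flags: Sequence[bool]) -> Optional[bool]:
    """Restate the documented deletionBreakpointInGene rules.

    deletion_like_flags holds one boolean per detected recombinant haplotype,
    True when that haplotype is deletion-like (classification done upstream).
    Returns None (JSON null) when the total copy number exceeds 3.
    """
    if total_copy_number > 3:
        return None
    return any(deletion_like_flags)
```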

The `phasedHaplotypes` JSON object has the fields below.

| Fields in JSON         | Explanation                                                                                                                                                                               | Type and Possible Values                                                                                                                                                                    |
| ---------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| targetAlleleDepths     | Array consisting of the most likely set of target allele copy numbers at each phasing site in the haplotypes.                                                                             | Array of nonnegative integers                                                                                                                                                               |
| targetAlleleDepthsQual | Phred-scaled quality for the set of depth calls in `targetAlleleDepths`                                                                                                                   | nonnegative decimal number                                                                                                                                                                  |
| rawHaplotypes          | The set of possible haplotypes that were phased from the read data.                                                                                                                       | Array of strings. Each character in a string corresponds to a phasing site in the haplotype (`T` for target gene allele, `N` for nontarget gene allele and `.` for unknown/unphased allele) |
| depthMatchedHaplotypes | Summary of sets of haplotypes that are consistent with depth-based calls at each site within the haplotypes. This field will not be present if no consistent set of haplotypes was found. | JSON object                                                                                                                                                                                 |

The `depthMatchedHaplotypes` JSON object, when present, has the fields below.

| Fields in JSON           | Explanation                                                                                                                                                                                                                                                                                            | Type and Possible Values      |
| ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ----------------------------- |
| targetAlleleDepths       | Array consisting of the set of target allele copy numbers at each phasing site in the haplotypes. The set of target allele copy numbers is consistent with the haplotypes reported in `topHaplotypeSets`.                                                                                              | Array of nonnegative integers |
| targetAlleleDepthsQual   | Phred-scaled quality for the set of matched depth calls in `targetAlleleDepths`                                                                                                                                                                                                                        | nonnegative decimal number    |
| numMatchingHaplotypeSets | Total number of sets of haplotypes that matched the `targetAlleleDepths`                                                                                                                                                                                                                               | positive integer              |
| topHaplotypeSets         | Array of top matching sets of haplotypes. When available, at least 2 matching sets will be reported. If haplotype frequencies are available, the sets are prioritized by population prior. Any sets that are consistent with the haplotypes reported in the `recombinantHaplotypes` are also reported. | Array of JSON objects         |

Each haplotype set reported in the `topHaplotypeSets` array has the fields below.

| Fields in JSON      | Explanation                                                                                                                               | Type and Possible Values   |
| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- | -------------------------- |
| populationPriorQual | Phred-scaled population prior for this set of haplotypes. Only reported if population prior information is available for the target gene. | nonnegative decimal number |
| haplotypes          | Array of JSON objects summarizing each unique haplotype within the set                                                                    | Array of JSON objects      |

Each haplotype reported in the `haplotypes` array has the fields below.

| Fields in JSON       | Explanation                                                                                                                                                                                                 | Type and Possible Values |
| -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------ |
| recombinantAlleleIDs | The relevant identifiers for the variants in the haplotype. Multiple variant identifiers are delimited by `+`. `N/A` when the haplotype cannot be matched to a set of identifiers.                          | string                   |
| haplotype            | The sequence of alleles within the haplotype. Each character corresponds to a phasing site in the haplotype (`T` for target gene allele, `N` for nontarget gene allele and `.` for unknown/unphased allele) | string                   |
| copyNumber           | The number of copies of this haplotype in the set of haplotypes that match the `targetAlleleDepths`                                                                                                         | positive integer         |
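To make the nesting of these objects concrete, the Python sketch below walks `phasedHaplotypes` -> `depthMatchedHaplotypes` -> `topHaplotypeSets` and checks each set against `targetAlleleDepths`. The consistency rule is our reading of the field descriptions, not a documented algorithm: at each phasing site, copies carrying `T` give a lower bound on the target allele copy number, and copies carrying `.` may or may not add to it. Function names and the example data are hypothetical.

```python
def haplotype_set_matches(depths, haplotype_set):
    """Check one topHaplotypeSets entry against targetAlleleDepths (assumed rule)."""
    for site, depth in enumerate(depths):
        certain = 0  # copies with a target allele ('T') at this site
        unknown = 0  # copies with an unknown/unphased allele ('.') at this site
        for hap in haplotype_set["haplotypes"]:
            allele = hap["haplotype"][site]
            if allele == "T":
                certain += hap["copyNumber"]
            elif allele == ".":
                unknown += hap["copyNumber"]
        if not certain <= depth <= certain + unknown:
            return False
    return True


def check_top_sets(phased):
    """Return one boolean per top haplotype set, or [] if no depth match exists."""
    matched = phased.get("depthMatchedHaplotypes")
    if matched is None:  # the field is absent when no consistent set was found
        return []
    depths = matched["targetAlleleDepths"]
    return [haplotype_set_matches(depths, s) for s in matched["topHaplotypeSets"]]
```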

#### JSON content for nonrecombinant variant detection

A `variants` field is used to report each nonrecombinant-like variant (i.e. not arising from nonallelic homologous recombination). Each variant will have the fields below.

| Fields in JSON   | Explanation                                      | Type and Possible Values         |
| ---------------- | ------------------------------------------------ | -------------------------------- |
| alleleId         | HGVS identifier of the variant allele            | string                           |
| alleleCopyNumber | Copy number of the allele in the called genotype | nonnegative integer              |
| genotypeQuality  | Phred-scaled quality for the called genotype     | nonnegative integer              |
| filter           | Filter for the called genotype                   | string. "PASS" when not filtered |
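A typical downstream step is to keep only unfiltered variants that are actually present. The Python sketch below assumes the `variants` array sits inside a target's dictionary in the targeted JSON; the function name, the optional `min_gq` threshold, and the allele IDs in any example data are hypothetical.

```python
def passing_variants(target_result, min_gq=0):
    """Return (alleleId, alleleCopyNumber) pairs for unfiltered variants with >= 1 copy."""
    return [
        (v["alleleId"], v["alleleCopyNumber"])
        for v in target_result.get("variants", [])
        if v["filter"] == "PASS"
        and v["alleleCopyNumber"] > 0
        and v["genotypeQuality"] >= min_gq
    ]
```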

Recombinant-like and nonrecombinant-like variants are also reported in VCF format. See [Targeted VCF File](#targeted-vcf-file) for details about how these variants are represented in VCF.

### Targeted VCF File

The targeted caller generates a `<output-file-prefix>.targeted.vcf.gz` file in the output directory. The output file is a `VCFv4.2` formatted file. The targets that have VCF output are: cyp2d6, cyp21a2, gba, hba, lpa, rh, and smn.

Small variants, structural variants, and copy number variants are reported in the same VCF file.

The `<output-file-prefix>.targeted.vcf.gz` file includes the following `source` header line:

```
##source=DRAGEN_TARGETED
```

For lpa, rh and smn targets, the `EVENT` and `EVENTTYPE` INFO fields are used to identify the called variants.

The `EVENT` and `EVENTTYPE` INFO fields were formally introduced in `VCFv4.4` to enable the representation of complex rearrangements: the `EVENT` field groups all related VCF records together, and the `EVENTTYPE` field classifies the event. The corresponding header lines are:

```
##INFO=<ID=EVENT,Number=A,Type=String,Description="Event name">
##INFO=<ID=EVENTTYPE,Number=A,Type=String,Description="Type of associated event">
```

However, the use of `EVENT` is not limited to complex rearrangements and can be used to associate nonsymbolic alleles, for example in cases of variant position ambiguity in high homology regions.

Since the `EVENTTYPE` values are implementation-defined, custom `EVENTTYPE` header lines are included to describe each `EVENTTYPE`.

```
##EVENTTYPE=<ID=GENE_CONVERSION,Description="Gene conversion event">
##EVENTTYPE=<ID=VARIANT_IN_HOMOLOGY_REGION,Description="Variant in homology region">
##EVENTTYPE=<ID=VNTR,Description="Variable number tandem repeat">
```
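Because `EVENT` ties several records together, a common first step when reading the targeted VCF is to group records by event. The Python sketch below uses hypothetical function names and treats `EVENT` as a single value per record; since the header declares `Number=A`, a multiallelic record could carry one comma-separated value per ALT allele, which this sketch does not split.

```python
from collections import defaultdict


def parse_info(info):
    """Parse a VCF INFO column into a dict; flags map to True."""
    fields = {}
    for entry in info.split(";"):
        if not entry:  # tolerate empty entries such as a trailing ';'
            continue
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


def group_by_event(vcf_lines):
    """Group data lines by INFO/EVENT, keeping (CHROM, POS, EVENTTYPE)."""
    groups = defaultdict(list)
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")  # VCF columns are tab-delimited
        info = parse_info(cols[7])
        if "EVENT" in info:
            groups[info["EVENT"]].append((cols[0], int(cols[1]), info.get("EVENTTYPE")))
    return dict(groups)


# Usage with a hypothetical file name:
# import gzip
# with gzip.open("sample.targeted.vcf.gz", "rt") as fh:
#     events = group_by_event(fh)
```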

For cyp2d6, cyp21a2, gba, and hba targets, the `ALLELE_ID` INFO field is used to identify the called variant alleles.

```
##INFO=<ID=ALLELE_ID,Number=R,Type=String,Description="Identifier for each allele">
```

The missing value `.` is used when no identifier is available (e.g. a wild type allele) or applicable (e.g. allele index 0 for a structural variant record).

Additionally, the `TargetedCaller` INFO field indicates which targeted caller generated the current VCF record:

```
##INFO=<ID=TargetedCaller,Number=1,Type=String,Description="Targeted Caller Name">
```

#### Nonrecombinant-like Variants In High Homology Regions

For target variants in a high homology region, each variant is reported ambiguously at all corresponding homologous positions (i.e. in both the pseudogene and the target gene). If it must be established with certainty that these variants are located in the target gene (e.g. in gba or cyp21a2), additional analysis can be performed.

For lpa and smn the ploidy of the called genotype (`FORMAT/GT` field) corresponds to the combined copy number from all the homologous positions. For cyp21a2, gba and hba, this "joint" genotype from all the homologous positions is instead reported in a separate `FORMAT/JGT` field which is then collapsed into a diploid genotype and reported in the `FORMAT/GT` field. The following fields are reported for "joint" calls:

```
##INFO=<ID=JIDS,Number=.,Type=String,Description="IDs (from ID column) of calls associated with a joint genotype call in duplicated regions">
##FORMAT=<ID=JGT,Number=1,Type=String,Description="Joint genotype in duplicated regions">
##FORMAT=<ID=JGQ,Number=1,Type=Integer,Description="Quality of joint genotype in duplicated regions">
##FORMAT=<ID=JPL,Number=.,Type=Integer,Description="Normalized, Phred-scaled likelihoods for joint genotypes as defined in the VCF specification">
##FORMAT=<ID=JQL,Number=1,Type=Float,Description="Phred-scaled likelihood for homozygous reference joint genotype call">
##FORMAT=<ID=JVQL,Number=1,Type=Float,Description="Phred-scaled likelihood for nonvariant joint genotype call where overlapping deletion (*) ALT alleles are not considered to be variant.">
##FORMAT=<ID=JDP,Number=1,Type=Integer,Description="Total depth from all alleles in duplicated regions">
##FORMAT=<ID=JAD,Number=R,Type=Integer,Description="Total read depth for each allele in duplicated regions">
##FORMAT=<ID=JAF,Number=A,Type=Float,Description="Allele frequency for each alt allele in duplicated regions">
```

Note that the `FORMAT/GQ` and `FORMAT/JGQ` fields contain the unconditional genotype quality, unlike the VCF spec where `FORMAT/GQ` is defined as the genotype quality conditioned on the site being variant.

![High Homology Region Variant Example](https://25033470-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FG9szlFZupV6Q2DasL98y%2Fuploads%2Fgit-blob-e43ba60ee72d294f505e348e92d78d494d8508ad%2Fhigh-homology-region-variants.png?alt=media)

In the depicted example, there are two genes A and B that share a high homology region. The usual process to call variants in these regions is to build a joint pileup of the reads aligning to both genes A and B and to call the variants using a model whose ploidy equals the total copy number of the homologous regions. This yields alternative genotypes that are equally likely, since the variant cannot be confidently placed in either gene A or gene B. For lpa and smn, the variant is reported as follows:

```
chr1 100 . A T . TargetedRepeatConflict EVENT=GeneA-B:50A>T;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT 0/0/0/1
chr1 200 . A T . TargetedRepeatConflict EVENT=GeneA-B:50A>T;EVENTTYPE=VARIANT_IN_HOMOLOGY_REGION GT 0/0/0/1
```

Given the unconventional ploidy of the `FORMAT/GT` field used in this representation, a `TargetedRepeatConflict` filter is applied to these records. The header line for the filter is the following.

```
##FILTER=<ID=TargetedRepeatConflict,Description="Set if call is in a targeted repeat region that cannot be placed">
```

For cyp21a2, gba and hba, a conventional diploid `FORMAT/GT` is reported, so no `TargetedRepeatConflict` filter is applied. Due to the ambiguity in placing target variants in high homology regions, the corresponding `QUAL` and `FORMAT/GQ` fields can be much lower than for conventional small variant calls (e.g. Phred 3 for a single variant allele copy across two homologous diploid positions). Therefore, instead of filtering on `QUAL` and `FORMAT/GQ`, these records are filtered based on the `FORMAT/JVQL` and `FORMAT/JGQ` fields:

```
##FILTER=<ID=TargetedLowJGQ,Description="Set if call has JGQ < 3.">
##FILTER=<ID=TargetedLowJVQL,Description="Set if call has JVQL < 3.00.">
```
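The "Phred 3" figure follows from the Phred scale, Q = -10 log10(p). Our reading of the example is that a single variant copy equally likely to sit at either of two homologous positions leaves a per-position genotype wrong half the time, so p = 0.5. The Python sketch below computes that value and applies the two joint filters using the thresholds from the header lines above; the function names are hypothetical.

```python
import math


def phred(error_probability):
    """Convert an error probability to a Phred-scaled quality."""
    return -10.0 * math.log10(error_probability)


# One variant copy, equally likely at either of two homologous diploid positions.
print(round(phred(0.5), 2))  # 3.01


def joint_filters(jgq, jvql):
    """FILTER values implied by the TargetedLowJGQ / TargetedLowJVQL header lines."""
    filters = []
    if jgq < 3:
        filters.append("TargetedLowJGQ")
    if jvql < 3.0:
        filters.append("TargetedLowJVQL")
    return filters or ["PASS"]
```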

Since the wild type alleles at homologous positions may be different from each other or different from the reference alleles, an additional filter is applied when only wild type alleles are detected across the homologous positions. This avoids making ambiguous variant calls when no target variant of interest is detected.

```
##FILTER=<ID=TargetedWT,Description="Region-ambiguous targeted call with GT containing only wild type alleles, ignoring any overlapping deletions.">
```

#### Rh Gene Conversion Events

In the case of an identified gene conversion event in rh, a small variant is reported at each differentiating site in the acceptor region.

![Gene Conversion Example](https://25033470-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FG9szlFZupV6Q2DasL98y%2Fuploads%2Fgit-blob-6d3761fadb3a2a51842ce18c1a458e744160a52b%2Fgene-conversion.png?alt=media)

In the depicted example, there are two genes A and B, and gene A is the acceptor of a gene conversion from gene B (green box in the figure). Gene conversions are identified by observing changes in copy number at differentiating sites (blue and pink bars in the figure) across consecutive regions. Copy number changes between regions define the breakends of the gene conversion. An equivalent VCF representation of a gene conversion would use CNV and SV records with breakends corresponding to the donor and acceptor regions; however, only the small variant representation is currently supported.

```
chr1 121 .   A T    . PASS EVENT=GC_AB;EVENTTYPE=GENE_CONVERSION GT:PS 0|1:121
...
chr1 280 .   G A    . PASS EVENT=GC_AB;EVENTTYPE=GENE_CONVERSION GT:PS 0|1:121
```

In the case of a detected gene conversion event, some differentiating sites may have a genotype that is inconsistent with that event. In these cases, the `RecombinantConflict` filter is applied. The filter is defined by the following header line:

```
##FILTER=<ID=RecombinantConflict,Description="Set if call has a copy number that conflicts with a recombinant variant">
```

In the example, the resulting representation is as follows.

```
chr1 121 .   A T    . PASS EVENT=GC_AB;EVENTTYPE=GENE_CONVERSION GT:PS 0|1:121
...
chr1 144 .   C T    . RecombinantConflict EVENT=GC_AB;EVENTTYPE=GENE_CONVERSION GT:PS 1|1:121
chr1 153 .   A G    . RecombinantConflict EVENT=GC_AB;EVENTTYPE=GENE_CONVERSION GT 0/0
...
chr1 280 .   G A    . PASS EVENT=GC_AB;EVENTTYPE=GENE_CONVERSION GT:PS 0|1:121
```
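When reviewing a gene conversion call, it can help to separate the consistent differentiating sites from the conflicting ones. The Python sketch below (hypothetical function name) collects the positions of one event and splits them on the `RecombinantConflict` filter; it expects tab-delimited VCF lines, as in a real `targeted.vcf.gz` file.

```python
def summarize_gene_conversion(vcf_lines, event_id):
    """Split the records of one gene conversion event into passing and conflicting positions."""
    passing, conflicting = [], []
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # partition("=")[::2] keeps (key, value) from "key=value"; flags get an empty value.
        info = dict(e.partition("=")[::2] for e in cols[7].split(";") if e)
        if info.get("EVENT") != event_id:
            continue
        pos = int(cols[1])
        if "RecombinantConflict" in cols[6].split(";"):
            conflicting.append(pos)
        else:
            passing.append(pos)
    return passing, conflicting
```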

#### Nonallelic Homologous Recombination

For cyp21a2 and gba, nonallelic homologous recombination can result in gene deletion or duplication (reciprocal recombination) or in gene conversion (nonreciprocal recombination). Both gene deletion and gene conversion can introduce loss-of-function variants, and in both cases the targeted caller reports these variants in the target gene. In the case of gene deletion, the differentiating sites at the nontarget (i.e. pseudogene) positions contain the overlapping deletion allele `*`, while the differentiating sites in the target gene contain any variant alleles. Although an equivalent VCF representation would be to report the deletion as a single structural variant VCF record, reporting small variant VCF records in the target gene identifies the specific mutations that may occur in a gene transcript and is well suited to annotation with HGVS nomenclature. Similarly, for gene conversions, variants are reported at differentiating sites in the target gene rather than as pairs of structural variant breakends.

Calls at differentiating sites within the recombinant variant calling region will contain the same "joint" fields as are reported for nonrecombinant-like variants in high homology regions (see [Nonrecombinant-like Variants In High Homology Regions](#nonrecombinant-like-variants-in-high-homology-regions)). However, the collapsed diploid `FORMAT/GT` will be based on any detected recombination events. Because detected recombinant variants are placed in the target gene, these records are filtered differently than the ambiguously placed, nonrecombinant-like variants in high homology regions. The `INFO/Recombinant` flag is added to calls derived from recombinant variant calling to distinguish them from nonrecombinant-like variant calls in high homology regions. The `FORMAT/VQL` field is used to apply the `RecombinantLowVQL` filter for low quality recombinant variants and the `RecombinantREF` filter is applied when the collapsed diploid `FORMAT/GT` contains only reference alleles.

```
##FORMAT=<ID=VQL,Number=1,Type=Float,Description="Phred-scaled likelihood for nonvariant genotype call where overlapping deletion (*) ALT alleles are not considered to be variant.">
##FILTER=<ID=RecombinantLowVQL,Description="Region-ambiguous targeted call at recombinant site with VQL below 0.50.">
##FILTER=<ID=RecombinantREF,Description="Region-ambiguous targeted call at recombinant site with GT containing only reference alleles, ignoring any overlapping deletions.">
```
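The two recombinant filters can be sketched from their header descriptions. This Python function is hypothetical and assumes the rules apply only to records carrying the `INFO/Recombinant` flag; `*` ALT alleles are ignored when deciding whether the collapsed diploid `GT` contains only reference alleles, and the `0.50` threshold comes from the `RecombinantLowVQL` description.

```python
def recombinant_filters(alts, gt, vql, is_recombinant):
    """Sketch of the RecombinantREF / RecombinantLowVQL rules.

    alts: ALT alleles of the record; "*" is the overlapping deletion allele.
    gt:   collapsed diploid FORMAT/GT such as "0/1" or "0|2".
    """
    if not is_recombinant:  # INFO/Recombinant flag absent: these rules do not apply
        return []
    filters = []
    indices = [int(a) for a in gt.replace("|", "/").split("/") if a != "."]
    # Drop overlapping deletions (*) before checking for reference-only genotypes.
    kept = [i for i in indices if i == 0 or alts[i - 1] != "*"]
    if all(i == 0 for i in kept):
        filters.append("RecombinantREF")
    if vql < 0.5:
        filters.append("RecombinantLowVQL")
    return filters
```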

#### Overlapping Structural Variant Representation

The use of `GT=0` for symbolic structural variant alleles is formally disambiguated in `VCFv4.4`, specifying that *"GT=0 indicates the absence of any of the ALT symbolic structural variants defined in the record"*. With this convention we can report compound overlapping heterozygous structural variants.

![Overlapping Variants Representation Example](https://25033470-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FG9szlFZupV6Q2DasL98y%2Fuploads%2Fgit-blob-010a80b182e07a93942d96e23dc48334979be703%2Foverlapping-variants-representation.png?alt=media)

In the hba genotype depicted above, two overlapping SVs can be represented as follows:

```
chr16	170262	.	G	<DEL>,<DUP>	.	.	END=174517;IMPRECISE;SVLEN=4255,4255;SVCLAIM=DJ,DJ;ALLELE_ID=.,-a4.2,aaa4.2	GT	0/2
chr16	173301	.	A	<DEL>,<DUP>	.	.	END=177104;IMPRECISE;SVLEN=3804,3804;SVCLAIM=DJ,DJ;ALLELE_ID=.,-a3.7,aaa3.7	GT	0/1
```
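Because `ALLELE_ID` has `Number=R`, entry 0 belongs to REF and entry *i* to the *i*-th ALT allele, so the `GT` indices select the reported identifiers directly. The Python sketch below (hypothetical function name) decodes the two hba records above: the first yields `aaa4.2` and the second `-a3.7`.

```python
def called_allele_ids(alt_field, allele_id_field, gt):
    """Map each allele index in GT to its ALT allele and INFO/ALLELE_ID entry."""
    alleles = ["REF"] + alt_field.split(",")
    ids = allele_id_field.split(",")  # Number=R: one entry for REF plus one per ALT
    calls = []
    for index in gt.replace("|", "/").split("/"):
        if index == ".":
            continue
        i = int(index)
        calls.append((alleles[i], ids[i]))
    return calls


print(called_allele_ids("<DEL>,<DUP>", ".,-a4.2,aaa4.2", "0/2"))
# prints: [('REF', '.'), ('<DUP>', 'aaa4.2')]
```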

The relevant header lines for the VCF records above are:

```
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=SVLEN,Number=A,Type=Integer,Description="Length of structural variant">
##INFO=<ID=SVCLAIM,Number=A,Type=String,Description="Claim made by the structural variant call. Valid values are D, J, DJ for abundance, adjacency and both respectively.">
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">
```

#### Variable Number Tandem Repeat Representation

![VNTR Example](https://25033470-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FG9szlFZupV6Q2DasL98y%2Fuploads%2Fgit-blob-2f52b64a69c3e05051df1e1ceef1868be94bba49%2Fvntr.png?alt=media)

In the depicted example, the reference contains a Variable Number Tandem Repeat (VNTR) region composed of three repeat units. The `CN` INFO field reports the copy number of each allele, the `CN` FORMAT field reports the total copy number of the region (the sum of the allele copy numbers), and the `REPCN` FORMAT field reports the repeat unit copy number of each allele (the allele copy number multiplied by the number of repeat units in the reference).

This VNTR can be represented as follows:

```
chr1 100 . A <DUP>,<DUP> . . END=400;EVENT=A;EVENTTYPE=VNTR;SVCLAIM=D;SVLEN=300;CN=2.6,4.3   GT:CN:REPCN 1|2:6.9:8|13
```

The `REPCN` and `CN` header lines are:

```
##FORMAT=<ID=REPCN,Number=1,Type=String,Description="Number of repeat units spanned by the allele">
##INFO=<ID=CN,Number=A,Type=Float,Description="Copy number of CNV / breakpoint">
##FORMAT=<ID=CN,Number=1,Type=Float,Description="Estimated copy number">
```
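The arithmetic relating these fields can be checked with a short sketch. It assumes `REPCN` is rounded to the nearest integer and that its separator follows the phasing of `GT`; both are inferred from the example record (2.6 × 3 and 4.3 × 3 reported as `8|13`), not from a specification. The function name `vntr_fields` is hypothetical.

```python
# Minimal sketch relating the VNTR fields of the example record.
# Assumptions: REPCN is rounded to the nearest integer, and its separator
# follows GT phasing ('|' phased, '/' unphased), as in the example.

def vntr_fields(allele_cn, ref_repeat_units, phased=True):
    """Return (FORMAT CN, FORMAT REPCN) from per-allele INFO CN values."""
    total_cn = round(sum(allele_cn), 1)  # region total copy number
    repcn = [round(cn * ref_repeat_units) for cn in allele_cn]
    sep = "|" if phased else "/"
    return total_cn, sep.join(str(r) for r in repcn)


cn, repcn = vntr_fields([2.6, 4.3], ref_repeat_units=3)
print(f"CN={cn} REPCN={repcn}")  # CN=6.9 REPCN=8|13
```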

#### Additional Filters

For lpa, rh and smn, the `TargetedLowQual` filter is applied if the `QUAL` of a target variant is less than `3.00`.

```
##FILTER=<ID=TargetedLowQual,Description="Set if call has QUAL < 3.00">
```

Similarly, for cyp21a2 and gba, the `TargetedLowVQL` filter is applied if the `VQL` of a target variant in a low-homology region is less than `3.00`.

```
##FORMAT=<ID=VQL,Number=1,Type=Float,Description="Phred-scaled likelihood for nonvariant genotype call where overlapping deletion (*) ALT alleles are not considered to be variant.">
##FILTER=<ID=TargetedLowVQL,Description="Set if call has VQL < 3.00.">
```

The `TargetedLowGQ` filter is applied if the targeted variant has a `GQ` less than `3` and no `JGQ` value is present.

```
##FILTER=<ID=TargetedLowGQ,Description="Set if call has GQ < 3 and JGQ is not present.">
```
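The three thresholds above can be summarized in a hypothetical helper. It covers only `TargetedLowQual`, `TargetedLowVQL`, and `TargetedLowGQ` (not the recombinant filters), and it assumes that `TargetedLowGQ` applies to every target, since the text does not restrict it.

```python
# Hypothetical sketch of the three per-record filter rules described above.
# Assumption: TargetedLowGQ applies to all targets; other filters (e.g. the
# Recombinant* filters) are out of scope, so "PASS" here means only that
# none of these three rules fired.

LOW_QUAL_TARGETS = {"lpa", "rh", "smn"}
LOW_VQL_TARGETS = {"cyp21a2", "gba"}


def targeted_filters(target, qual=None, vql=None, gq=None, jgq=None,
                     in_low_homology_region=False):
    """Return the targeted filters that would apply to one record."""
    filters = []
    if target in LOW_QUAL_TARGETS and qual is not None and qual < 3.00:
        filters.append("TargetedLowQual")
    if (target in LOW_VQL_TARGETS and in_low_homology_region
            and vql is not None and vql < 3.00):
        filters.append("TargetedLowVQL")
    # TargetedLowGQ only applies when no JGQ value is present.
    if gq is not None and gq < 3 and jgq is None:
        filters.append("TargetedLowGQ")
    return filters or ["PASS"]


print(targeted_filters("smn", qual=2.5))            # ['TargetedLowQual']
print(targeted_filters("rh", qual=10, gq=2, jgq=20))  # ['PASS']
```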

### Merging Targeted Calls In The `hard-filtered` Files

When the small variant caller is enabled, the targeted small variant VCF calls can be merged into the `<output-file-prefix>.hard-filtered.vcf.gz` and `<output-file-prefix>.hard-filtered.gvcf.gz` files (collectively, the `hard-filtered` files). The `--targeted-merge-vc` command line option controls which targets have their small variant VCF records merged into the `hard-filtered` files. For example, `--targeted-merge-vc rh` enables merging of the calls from the `rh` target, and `--targeted-merge-vc rh hba` enables merging of the calls from the `rh` and `hba` targets. The value `true` merges calls from all supported targets into the `hard-filtered` files, while the value `false` merges no calls.

The targeted calls merged into the `hard-filtered` files are marked with a `TARGETED` INFO flag.

When enabled, targeted small variants are merged into the `hard-filtered` files regardless of any regions that may be provided using the `--vc-target-bed` option.

#### Merging Strategy

The merging strategy for targeted small variant calls prioritizes targeted calls over calls from the germline small variant caller. When a germline small variant call overlaps a targeted caller call, the small variant call is filtered with a `TargetedConflict` filter if any of the following holds:

* The targeted caller call is `PASS`.
* The small variant call and targeted caller call have incompatible genotypes and the targeted caller call is not filtered with the `TargetedLowGQ` filter.

The strategy is summarized in the following examples.

1. The `TARGETED` call is `PASS`.

```
chr1 100 . A	C	. TargetedConflict 	.			GT 0/1
chr1 100 . A	C	. PASS 				TARGETED 	GT 1/1
```

2. The `TARGETED` call and the small variant call are not overlapping.

```
chr1 110 . T	TCA	. PASS 				. 			GT 0/1
chr1 111 . G	A	. PASS 				TARGETED 	GT 0/1
```

3. The `TARGETED` call is filtered with `VARIANT_IN_HOMOLOGY_REGION` and has a discordant variant representation with the overlapping small variant call.

```
chr1 120 . ATTC A	. TargetedConflict	.			GT 0/1
chr1 121 . T	A	. TargetedLowQual	TARGETED 	GT 0/1
chr1 125 . TCAC T	. TargetedLowQual	TARGETED	GT 0/1
chr1 126 . C	G	. TargetedConflict	.			GT 0/1
```

4. The `TARGETED` call is filtered with `TargetedLowQual` and has a discordant genotype with the overlapping small variant call.

```
chr1 130 . C	G	. TargetedConflict	.			GT 0/1
chr1 130 . C	G	. TargetedLowQual	TARGETED 	GT 1/1
```

5. The `TARGETED` call is filtered with `TargetedLowGQ` and has a discordant genotype with the overlapping small variant call.

```
chr1 140 . AC	A	. PASS			.			GT:GQ 0/1:5
chr1 140 . A	T	. TargetedLowGQ	TARGETED 	GT:GQ 1/1:2
```
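The two conditions that trigger `TargetedConflict` can be sketched as a hypothetical predicate. Whether two genotypes are "incompatible" is passed in rather than computed, because the exact comparison DRAGEN performs is not described here; the function name is illustrative only.

```python
# Hypothetical sketch of the TargetedConflict rule for an overlapping
# germline small variant call. Genotype compatibility is an input, since the
# exact comparison used by DRAGEN is not documented in this section.

def gets_targeted_conflict(overlaps: bool, targeted_filter: str,
                           genotypes_compatible: bool) -> bool:
    """Return True if the small variant call should receive TargetedConflict."""
    if not overlaps:
        return False
    filters = set(targeted_filter.split(";"))
    if filters == {"PASS"}:
        return True
    return not genotypes_compatible and "TargetedLowGQ" not in filters


examples = {
    "1 (PASS targeted call)": (True, "PASS", False),
    "2 (no overlap)": (False, "PASS", False),
    "4 (TargetedLowQual, discordant GT)": (True, "TargetedLowQual", False),
    "5 (TargetedLowGQ, discordant GT)": (True, "TargetedLowGQ", False),
}
for name, args in examples.items():
    print(name, gets_targeted_conflict(*args))
```

Run against the numbered examples, the predicate reproduces their outcomes: the small variant call is filtered in examples 1 and 4 and keeps `PASS` in examples 2 and 5.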

## Exome calling using in-run PON

Targeted calling from WES data is supported for hba and smn. It uses an in-run panel of normals (PON) for coverage normalization of the target regions by automatically identifying copy-neutral samples from a single sequencing run. All samples in the panel are expected to come from the same sequencing run and library prep batch as the case samples being analyzed, and samples must be prepared using the Illumina CS/PGx Custom Enrichment Research Panel. If targeted calling is enabled on WES data without a PON, targeted calling is skipped and no targeted calling output files are generated.

The first step in targeted calling from WES data is to generate exome counts files for each of the samples in the PON. A minimum of 30 samples is required in the PON, and the PON must be sufficiently diverse that, for a given target region, a large subset of samples is copy-neutral. For example, a PON where all samples are positive for alpha thalassemia (HBA1/2 deletion) would not be sufficiently diverse for accurately calling variants in HBA1/2. Similarly, a PON consisting of a large pedigree of related samples would not be sufficiently diverse. No more than \~6% of the samples in the PON should be related to any case sample being analyzed; for example, a PON of 50 samples containing a quad is acceptable, since it contains 3 samples related to the proband (mother, father, and sibling), or \~6% of the samples in the PON. If the samples in the sequencing run are sufficiently diverse, the PON should include as many samples from the sequencing run as possible, although it can be limited to 96 samples without significantly impacting the accuracy of coverage normalization.
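The numeric guidance above can be turned into a hypothetical pre-flight check before building a PON. It encodes only the sample-count and relatedness limits, treating "\~6%" as a hard cutoff of 0.06 (an assumption); diversity of copy-number status still needs manual review.

```python
# Hypothetical pre-flight check for a proposed in-run PON.
# Assumption: the "~6%" relatedness guidance is treated as a hard 0.06 cutoff.
# Diversity (e.g. not all alpha-thalassemia carriers) is not checked here.

MIN_PON_SIZE = 30            # minimum number of PON samples
MAX_RELATED_FRACTION = 0.06  # max fraction of PON related to a case sample


def pon_warnings(pon_size: int, related_to_case: int) -> list:
    """Return warnings for a proposed PON; an empty list means no issues found."""
    if pon_size <= 0:
        raise ValueError("the PON must contain at least one sample")
    warnings = []
    if pon_size < MIN_PON_SIZE:
        warnings.append(f"PON has {pon_size} samples; at least {MIN_PON_SIZE} are required")
    related_fraction = related_to_case / pon_size
    if related_fraction > MAX_RELATED_FRACTION:
        warnings.append(f"{related_fraction:.0%} of the PON is related to the case sample")
    return warnings


# The quad example: 3 relatives of the proband in a 50-sample PON is 3/50 = 6%.
print(pon_warnings(50, 3))  # []
```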

The table below summarizes the available options and high-level steps for running the Targeted Caller using an in-run PON. CNV and Targeted Caller require separate PON files, but the intermediate counts files can be generated in the same DRAGEN command line invocation. For additional details click on the link for each option.

| Analysis option                                                                                                                                                                                                     | Steps                                                                                                                                                                                                                                                          |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [BSSH planned sequencing run (from BCLs)](https://help.dragen.illumina.com/product-guides/dragen-apps/dragen-germline-enrichment-from-bcls-bssh-app#dragen-germline-enrichment-from-bcls-through-run-planning-tool) | <ol><li>Create run using the Run Planning tool in BSSH</li><li>Start planned run in Control Software on instrument</li></ol>                                                                                                                                   |
| [BSSH existing sequencing run (from BCLs)](https://help.dragen.illumina.com/product-guides/dragen-apps/dragen-germline-enrichment-from-bcls-bssh-app#dragen-germline-enrichment-from-bcls-from-existing-run)        | <ol><li>Run DRAGEN Germline Enrichment from BCLs App</li></ol>                                                                                                                                                                                                 |
| [ICA from FASTQs/BAMs/CRAMs](https://help.dragen.illumina.com/product-guides/dragen-v4.5/dragen-apps/dragen-germline-enrichment-ica-app)                                                                            | <ol><li>Run DRAGEN Germline Enrichment App</li></ol>                                                                                                                                                                                                           |
| [BSSH from FASTQs/BAMs/CRAMs](https://help.dragen.illumina.com/product-guides/dragen-v4.5/dragen-apps/dragen-enrichment-bssh-app)                                                                                   | <ol><li>Run DRAGEN Enrichment App</li></ol>                                                                                                                                                                                                                    |
| [Local or AMI](https://help.dragen.illumina.com/product-guides/dragen-v4.5/dragen-recipes/dna-germline-wes)                                                                                                         | <ol><li>BCL to FASTQ conversion</li><li>Generate CNV target counts and Targeted Caller exome counts for each PON sample</li><li>Generate CNV combined counts PON file</li><li>Generate Targeted Caller PON file</li><li>Perform case sample analyses</li></ol> |

### Exome counts generation

Exome counts file generation can be enabled using the command line option `--targeted-generate-exome-counts=true`. A `<output-file-prefix>.targeted.exome.counts.json.gz` file will be generated in the output directory. Note that the `--enable-targeted` option is not required, but can be used to specify a subset of targets.

### Exome PON generation

An exome PON file can be generated by using the command line option `--targeted-pon-counts-list` to pass a text file containing a list of exome counts files, one for each sample in the panel. A `<output-file-prefix>.targeted.pon.json.gz` file will be generated in the output directory. Note that this is a standalone DRAGEN run that cannot be combined with other DRAGEN components, and no read input (BAM/CRAM/FASTQ) file is used.

### Exome case sample analysis

Exome targeted calling on a case sample is performed by passing in a PON file and a systematic noise file using the command line options `--targeted-pon` and `--targeted-systematic-noise`, respectively. The PON file should be from the same batch as the case sample. A systematic noise file and the corresponding pre-built pangenome reference can be downloaded from the [DRAGEN Software Support Site page](https://support.illumina.com/sequencing/sequencing_software/dragen-bio-it-platform/product_files.html). A JSON file, `<output-file-prefix>.targeted.json`, and a VCF file, `<output-file-prefix>.targeted.vcf.gz`, are generated in the output directory with the calls for the enabled targets. In WES mode, an additional field, `ponQualityFilter`, is added to the JSON output for each enabled target. It denotes the quality of the PON and the confidence of the resulting calls. If the case sample does not correlate well with the PON, `ponQualityFilter` is set to `LowPonCorrelation`, indicating that the calls are considered low confidence. Note that the `--enable-targeted` option is not required, but can be used to specify a subset of targets.
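To find low-confidence WES calls after a run, one option is to scan the JSON output for `ponQualityFilter` values. The exact nesting of the JSON is not documented in this section, so this sketch walks the whole document rather than assuming a layout; the function names are hypothetical.

```python
# Sketch: find every ponQualityFilter value in <output-file-prefix>.targeted.json
# and flag LowPonCorrelation. The JSON layout is not assumed; the whole tree
# is walked and each hit is reported with a JSONPath-like location.
import json


def find_pon_quality(node, path="$"):
    """Yield (json_path, value) for every ponQualityFilter key in the tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "ponQualityFilter":
                yield child, value
            else:
                yield from find_pon_quality(value, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_pon_quality(item, f"{path}[{i}]")


def low_confidence_paths(json_file):
    """Return the locations whose ponQualityFilter is LowPonCorrelation."""
    with open(json_file) as fh:
        report = json.load(fh)
    return [p for p, v in find_pon_quality(report) if v == "LowPonCorrelation"]
```

For example, `low_confidence_paths("NA12878_dragen.targeted.json")` returns an empty list when no enabled target is flagged.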

## Command-Line Examples

The Targeted Caller can be enabled in parallel with other components as part of a human WGS or WES germline analysis workflow (see [DRAGEN Recipe - Germline WGS](https://help.dragen.illumina.com/product-guides/dragen-v4.5/dragen-recipes/dna-germline-wgs) and [DRAGEN Recipe - Germline WES](https://help.dragen.illumina.com/product-guides/dragen-v4.5/dragen-recipes/dna-germline-wes)).

### FASTQ Input Example

The following command-line example runs the targeted caller from FASTQ input:

```
dragen \
	-r /staging/human/reference/hg38_alt_aware/DRAGEN/${HASH_TABLE_VERSION} \
	--fastq-file1 /staging/test/data/NA12878_R1.fastq \
	--fastq-file2 /staging/test/data/NA12878_R2.fastq \
	--output-directory /staging/test/output \
	--output-file-prefix NA12878_dragen \
	--RGID DRAGEN_RGID \
	--RGSM NA12878 \
	--enable-targeted=true
```

### Prealigned BAM Input Example

The following command-line example runs cyp21a2 only using BAM input without realignment:

```
dragen \
	-r /staging/human/reference/hg38_alt_aware/DRAGEN/${HASH_TABLE_VERSION} \
	--bam-input /staging/test/data/NA12878.bam \
	--output-directory /staging/test/output \
	--output-file-prefix NA12878_dragen \
	--enable-map-align=false \
	--enable-targeted=cyp21a2
```

### Exome counts generation from prealigned BAM Input Example

```
dragen \
	-r /staging/human/reference/hg38_alt_aware/DRAGEN/${HASH_TABLE_VERSION} \
	--bam-input /staging/test/data/NA12878.bam \
	--output-directory /staging/test/output \
	--output-file-prefix NA12878_dragen \
	--enable-map-align=false \
	--targeted-generate-exome-counts=true
```
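When a PON has dozens of samples, the command above is typically repeated per sample. This hypothetical helper builds one command per BAM using only the options shown in the example; the per-sample output directory and the naming scheme (prefix taken from the BAM file name) are assumptions.

```python
# Hypothetical helper that builds one exome counts command per PON sample BAM,
# using only the options from the example above. Assumptions: each sample gets
# its own output directory, and the prefix is the BAM file name without ".bam".
import shlex
from pathlib import Path


def exome_counts_command(bam: str, reference_dir: str, output_root: str) -> list:
    """Build the argv for one exome counts generation run."""
    prefix = Path(bam).stem  # e.g. /staging/pon/S1.bam -> S1
    return [
        "dragen",
        "-r", reference_dir,
        "--bam-input", bam,
        "--output-directory", str(Path(output_root) / prefix),
        "--output-file-prefix", prefix,
        "--enable-map-align=false",
        "--targeted-generate-exome-counts=true",
    ]


if __name__ == "__main__":
    # Placeholder paths. Pass a fully resolved reference directory: shlex.join
    # quotes "$", so the shell would not expand ${HASH_TABLE_VERSION}.
    for bam in ["/staging/pon/S1.bam", "/staging/pon/S2.bam"]:
        print(shlex.join(exome_counts_command(bam, "/staging/ref/hg38", "/staging/pon/out")))
```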

### Exome PON generation from exome counts files from a single sequencing run

```
dragen \
--output-directory /staging/test/output \
--output-file-prefix run1 \
--targeted-pon-counts-list run1_exome_counts_list.txt
```
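The `run1_exome_counts_list.txt` file passed to `--targeted-pon-counts-list` can be assembled from the per-sample outputs. This sketch assumes the list is plain text with one path per line and uses absolute paths; the search directory layout is hypothetical. It also enforces the 30-sample minimum stated earlier.

```python
# Sketch: collect exome counts files into the list passed to
# --targeted-pon-counts-list. Assumptions: plain text, one absolute path per
# line; counts files live somewhere under a single (hypothetical) root dir.
from pathlib import Path

COUNTS_SUFFIX = ".targeted.exome.counts.json.gz"
MIN_PON_SIZE = 30


def write_counts_list(search_root, list_path):
    """Write one counts file path per line and return the sorted paths."""
    counts = sorted(p.resolve() for p in Path(search_root).rglob(f"*{COUNTS_SUFFIX}"))
    if len(counts) < MIN_PON_SIZE:
        raise ValueError(f"found {len(counts)} counts files; a PON needs at least {MIN_PON_SIZE}")
    Path(list_path).write_text("".join(f"{p}\n" for p in counts))
    return counts
```

For example, `write_counts_list("/staging/pon/out", "run1_exome_counts_list.txt")` would gather the files produced by per-sample runs that each wrote into a subdirectory of `/staging/pon/out`.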

### Exome case sample analysis from prealigned BAM Input Example

```
dragen \
-r /staging/human/reference/hg38_alt_aware/DRAGEN/${HASH_TABLE_VERSION} \
--bam-input /staging/test/data/NA12878.bam \
--output-directory /staging/test/output \
--output-file-prefix NA12878_dragen \
--enable-map-align=false \
--targeted-pon run1.targeted.pon.json.gz \
--targeted-systematic-noise dragen4.4.targeted.systematic_noise.json.gz
```
