# Force Genotyping

DRAGEN supports force genotyping (ForceGT) for small variant calling. Use `--vc-forcegt-vcf` to specify a VCF file containing variants to force genotype. The input list of small variants can be a \*.vcf or \*.vcf.gz file.

## Supported Modes

* **Germline**: Supported. In joint genotyping, specifying `--vc-forcegt-vcf` restricts the output joint VCF to variants tagged with `FGT`. Without this option, joint genotyping skips FGT-tagged variants.
* **Somatic**: Supported in both Tumor-Only (T/O) and Tumor-Normal (T/N) modes.

## Input Requirements

DRAGEN supports only a single ForceGT VCF input file. The input VCF must meet the following requirements:

* It must be a valid VCF 4.2 file (minimum 8 tab-delimited columns, sorted by contig and position).
* Its header must list the same contig names as the reference used for variant calling, and every variant must refer to one of these contig names.
* Its variants must be normalized (parsimonious and left-aligned).
* It should not contain multinucleotide or complex variants (e.g., `AT → C`), that is, variants that require more than one substitution, insertion, or deletion to go from the REF allele to the ALT allele. Such variants are ignored.
* It should not contain deletions longer than 50 bp. Such deletions are filtered out.
* It should not contain duplicate entries (same POS, REF, ALT). Duplicates are ignored.
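The requirements above can be pre-checked before a run. The following is a minimal sketch, not part of DRAGEN: the names `classify` and `precheck` are hypothetical, each record is assumed to carry a single ALT allele, and the variants are assumed to be already normalized, so that an insertion or deletion shares exactly its first base between REF and ALT.

```python
from typing import Iterable, List, Tuple

# (CHROM, POS, REF, ALT); one ALT allele per record is an assumption of this sketch.
Variant = Tuple[str, int, str, str]

MAX_DELETION_BP = 50  # deletions longer than this are filtered out


def classify(ref: str, alt: str) -> str:
    """Classify a normalized REF/ALT pair (hypothetical helper)."""
    if len(ref) == 1 and len(alt) == 1:
        return "snv"
    if len(ref) == 1 and len(alt) > 1 and alt[0] == ref:
        return "insertion"
    if len(alt) == 1 and len(ref) > 1 and ref[0] == alt:
        return "deletion"
    # Anything else needs more than one edit to go from REF to ALT.
    return "complex"


def precheck(variants: Iterable[Variant]) -> List[Tuple[Variant, str]]:
    """Return (variant, reason) for every record that would be ignored or filtered."""
    problems = []
    seen = set()
    for variant in variants:
        _chrom, _pos, ref, alt = variant
        if variant in seen:
            problems.append((variant, "duplicate"))
            continue
        seen.add(variant)
        kind = classify(ref, alt)
        if kind == "complex":
            problems.append((variant, "multinucleotide or complex"))
        elif kind == "deletion" and len(ref) - len(alt) > MAX_DELETION_BP:
            problems.append((variant, "deletion longer than 50 bp"))
    return problems
```

The sketch does not check sort order, header contigs, or left-alignment, which require the full VCF file and the reference.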

**Example of normalization:**

```
# Wrong (not parsimonious):
chrX  153592402  GC  GCG

# Correct (parsimonious):
chrX  153592403  C   CG
```

A non-normalized variant causes undefined behavior in DRAGEN.
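The parsimony step in the example above can be sketched as follows. The function name `trim_to_parsimonious` is hypothetical, and the sketch only trims shared bases. It does not left-align, because left-alignment requires the reference sequence; a dedicated normalization tool is the safer choice for real inputs.

```python
from typing import Tuple


def trim_to_parsimonious(pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Trim shared bases so REF and ALT are as short as possible.

    Hypothetical helper: it does NOT left-align against the reference.
    """
    # Drop shared trailing bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Drop shared leading bases, shifting POS right for each base removed.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


print(trim_to_parsimonious(153592402, "GC", "GCG"))  # (153592403, 'C', 'CG')
```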

## Output Behavior

The output VCF contains both regular variant calls and ForceGT variants. Each variant is tagged in the INFO field to indicate its origin:

| Scenario                                 | INFO Tag  |
| ---------------------------------------- | --------- |
| Regular call only (not in ForceGT input) | *(none)*  |
| ForceGT only (not called by pipeline)    | `FGT`     |
| Both regular and ForceGT (germline)      | `FGT;NML` |
| Both regular and ForceGT (somatic)       | `FGT;SOM` |

**Notes:**

* `NML` (normal): Indicates the variant was independently called by the pipeline in germline mode AND present in the ForceGT input.
* `SOM` (somatic): Indicates the variant was independently called by the pipeline in somatic mode AND present in the ForceGT input.
* `NML` and `SOM` appear **only** paired with `FGT`, never alone.
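When post-processing the output VCF, the tag combinations above can be mapped back to a variant's origin. This is a minimal sketch that works on the raw INFO string; the function name `variant_origin` and the returned labels are hypothetical.

```python
def variant_origin(info: str) -> str:
    """Map a VCF INFO string to the origin of the variant (hypothetical helper)."""
    # Keep only the key of each INFO entry, so "DP=30" becomes "DP".
    tags = {field.split("=", 1)[0] for field in info.split(";") if field and field != "."}
    if "FGT" not in tags:
        if "NML" in tags or "SOM" in tags:
            # NML and SOM are documented to appear only together with FGT.
            raise ValueError(f"unexpected tag combination: {info!r}")
        return "regular call only"
    if "NML" in tags:
        return "regular call and ForceGT (germline)"
    if "SOM" in tags:
        return "regular call and ForceGT (somatic)"
    return "ForceGT only"
```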

**FILTER and INFO field behavior:**

* If a ForceGT variant matches a regular call with the same POS, REF, ALT, it inherits all FILTER and INFO fields from the regular call.
* If a ForceGT variant is at a novel site (no regular call), FILTER and INFO fields are calculated independently for that variant.

## Genotype Reporting

All variants in the ForceGT input VCF are genotyped and included in the output with the following GT values:

| Condition                            | Germline GT        | Somatic T/N GT | Somatic T/O GT |
| ------------------------------------ | ------------------ | -------------- | -------------- |
| No coverage at position              | `./.`              | `./.`          | `./.`          |
| Coverage but no ALT-supporting reads | `0/0`              | `0/0`          | `0/0`          |
| Coverage with ALT-supporting reads   | `0/1`, `1/1`, etc. | `0/1`          | `0/1` or `1/1` |
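Read in the other direction, the table lets a downstream script infer the evidence behind a ForceGT record from its GT value. The sketch below assumes the GT string comes from a record tagged `FGT`; the function name `gt_condition` is hypothetical.

```python
def gt_condition(gt: str) -> str:
    """Map the GT of a ForceGT record to the matching table row (hypothetical helper)."""
    alleles = gt.replace("|", "/").split("/")
    if all(allele == "." for allele in alleles):
        return "no coverage at position"
    if all(allele == "0" for allele in alleles):
        return "coverage but no ALT-supporting reads"
    return "coverage with ALT-supporting reads"
```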

## ForceGT and Multiallelic Sites

In somatic mode, `--vc-split-multiallelic-calls` is enabled by default, which outputs multiallelic variants on separate lines. **It is not recommended to disable this option.**

ForceGT variants are combined into a single output line with regular calls only when they have an exact match (same POS, REF, and ALT). Otherwise, a separate ForceGT call is emitted.

**Example 1: ForceGT variant differs from regular call**

Both variants are output on separate lines:

```
chrX  100  .  G  C  .  PASS  .           ...  # Regular call (no tags)
chrX  100  .  G  A  .  PASS  FGT         ...  # ForceGT variant (different ALT)
```

**Example 2: ForceGT variant matches regular call exactly**

Combined into a single line with both tags:

```
# Germline mode:
chrX  100  .  G  A  .  PASS  FGT;NML     ...  # Called by pipeline AND in ForceGT input

# Somatic mode:
chrX  100  .  G  A  .  PASS  FGT;SOM     ...  # Called by pipeline AND in ForceGT input
```

**Example 3: Multiallelic site with partial ForceGT overlap**

If the pipeline calls a multiallelic site (e.g., G→A and G→T) and ForceGT input contains only G→A:

```
chrX  100  .  G  A  .  PASS  FGT;SOM     ...  # Matches ForceGT
chrX  100  .  G  T  .  PASS  .           ...  # Regular call only (no tags)
```
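The matching rule illustrated by the three examples can be sketched as below. This illustrates the documented behavior and is not DRAGEN's implementation: the function name `merge_forcegt` is hypothetical, INFO is reduced to the origin tags only, and contigs are ordered as plain strings rather than by the reference contig order.

```python
from typing import List, Tuple

Variant = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def merge_forcegt(
    regular: List[Variant], forcegt: List[Variant], somatic: bool
) -> List[Tuple[Variant, str]]:
    """Return (variant, origin tags) pairs following the exact-match rule."""
    paired_tag = "SOM" if somatic else "NML"
    regular_set = set(regular)
    forcegt_set = set(forcegt)
    output = []
    # Regular calls: tagged only when the ForceGT input has the same POS, REF, and ALT.
    for variant in regular:
        tags = f"FGT;{paired_tag}" if variant in forcegt_set else "."
        output.append((variant, tags))
    # ForceGT variants with no exact match get their own line; duplicates are ignored.
    seen = set()
    for variant in forcegt:
        if variant in regular_set or variant in seen:
            continue
        seen.add(variant)
        output.append((variant, "FGT"))
    # Stable sort: at the same position, regular calls stay ahead of ForceGT-only lines.
    output.sort(key=lambda item: (item[0][0], item[0][1]))
    return output
```

For instance, calling `merge_forcegt` with regular calls G→A and G→T at `chrX:100`, a ForceGT input containing only G→A, and `somatic=True` yields the two lines of Example 3.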

## Target BED Filtering

If a target BED file is provided via `--vc-target-bed`, only ForceGT variants overlapping the BED regions are included in the output.
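To predict which ForceGT variants survive this filter, the overlap test can be sketched as follows. The function name `overlaps_bed` is hypothetical, and treating the variant as spanning its whole REF allele is an assumption, since this section does not state the exact overlap rule DRAGEN applies. BED intervals are 0-based and half-open, whereas VCF positions are 1-based.

```python
from typing import Dict, List, Tuple

# contig -> list of (start, end) in BED coordinates (0-based, half-open)
Regions = Dict[str, List[Tuple[int, int]]]


def overlaps_bed(chrom: str, pos: int, ref: str, regions: Regions) -> bool:
    """Return True if the variant's REF span overlaps any BED interval (hypothetical helper)."""
    start = pos - 1          # convert 1-based VCF POS to a 0-based start
    end = start + len(ref)   # half-open end of the REF span
    return any(
        start < region_end and end > region_start
        for region_start, region_end in regions.get(chrom, [])
    )
```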

