# Force Genotyping

DRAGEN supports force genotyping (ForceGT) for small variant calling. Use `--vc-forcegt-vcf` to specify a VCF file listing the variants to force genotype. The file can be uncompressed (\*.vcf) or compressed (\*.vcf.gz).

## Supported Modes

* **Germline**: Supported. When using joint genotyping with the `--vc-forcegt-vcf` option, the output joint VCF contains only variants tagged with `FGT`. Without this option, FGT-tagged variants are skipped.
* **Somatic**: Supported in both Tumor-Only (T/O) and Tumor-Normal (T/N) modes.

## Input Requirements

DRAGEN supports only a single ForceGT VCF input file. The input VCF must:

* Be a valid VCF 4.2 file (at least 8 tab-delimited columns, sorted by contig and position).
* List in its header the same contig names as the reference used for variant calling. Every variant must refer to one of these contigs.
* Contain only normalized variants (parsimonious and left-aligned).
* Not contain multinucleotide or complex variants (e.g., `AT → C`), that is, variants that need more than one substitution, insertion, or deletion to turn the REF allele into the ALT allele. Such variants are ignored.
* Not contain deletions longer than 50 bp. Such deletions are filtered out.
* Not contain duplicate entries (same POS, REF, and ALT). Duplicates are ignored.
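Before running DRAGEN, it can help to screen the ForceGT VCF for records that the rules above say are ignored or filtered. The following is a minimal, hypothetical pre-check, not DRAGEN's own validation. It assumes records are already parsed into `(CHROM, POS, REF, ALT)` tuples with a single ALT each, assumes "longer than 50 bp" means more than 50 deleted bases, and keys duplicates on CHROM as well as POS, REF, and ALT. It does not check sorting or contig names.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT), one ALT per record

# Assumption: deletion length is the number of deleted bases (len(REF) - len(ALT)).
MAX_DELETION_LEN = 50


def classify(ref: str, alt: str) -> str:
    """Classify a biallelic record as snv, insertion, deletion, or complex."""
    if len(ref) == 1 and len(alt) == 1:
        return "snv"
    # A normalized indel has a one-base allele equal to the other allele's first base.
    if ref[0] == alt[0] and (len(ref) == 1 or len(alt) == 1):
        return "insertion" if len(alt) > len(ref) else "deletion"
    # Anything else (e.g. AT -> C) is complex/MNV, or simply not normalized.
    return "complex"


def precheck(records: Iterable[Record]) -> List[Tuple[Record, str]]:
    """Return (record, reason) pairs for records the input rules say are dropped."""
    problems: List[Tuple[Record, str]] = []
    seen = set()
    for rec in records:
        _, _, ref, alt = rec
        kind = classify(ref, alt)
        if kind == "complex":
            problems.append((rec, "complex/MNV: ignored"))
        elif kind == "deletion" and len(ref) - len(alt) > MAX_DELETION_LEN:
            problems.append((rec, "deletion > 50 bp: filtered out"))
        if rec in seen:
            problems.append((rec, "duplicate: ignored"))
        seen.add(rec)
    return problems


records = [
    ("chrX", 100, "G", "A"),
    ("chrX", 200, "AT", "C"),
    ("chrX", 300, "A" + "C" * 60, "A"),  # 60-bp deletion
    ("chrX", 100, "G", "A"),  # duplicate of the first record
]
for rec, reason in precheck(records):
    print(rec[0], rec[1], reason)
```

A record reported as "complex" may also just be an un-normalized simple variant, so normalize first and then re-check.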

**Example of normalization:**

```
# Wrong (not parsimonious):
chrX  153592402  GC  GCG

# Correct (parsimonious):
chrX  153592403  C   CG
```

A non-normalized variant causes undefined behavior in DRAGEN.
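The parsimony part of normalization can be sketched by trimming bases shared by REF and ALT: first the common suffix, then the common prefix, shifting POS for each leading base removed, while always keeping at least one base per allele. This minimal sketch does **not** left-align, because left-alignment requires the reference sequence; a reference-aware tool such as `bcftools norm -f` is needed for full normalization. The function name is illustrative.

```python
from typing import Tuple


def trim_to_parsimonious(pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Trim shared bases so the record is parsimonious (no left-alignment)."""
    # Drop shared trailing bases, keeping at least one base in each allele.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Drop shared leading bases; each removal moves the record one base right.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# The "wrong" record from the example above.
print(trim_to_parsimonious(153592402, "GC", "GCG"))
```

Applied to the non-parsimonious record `GC → GCG` at 153592402, this yields the corrected `C → CG` at 153592403.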

## Output Behavior

The output VCF contains both regular variant calls and ForceGT variants. Each variant is tagged in the INFO field to indicate its origin:

| Scenario                                 | INFO Tag  |
| ---------------------------------------- | --------- |
| Regular call only (not in ForceGT input) | *(none)*  |
| ForceGT only (not called by pipeline)    | `FGT`     |
| Both regular and ForceGT (germline)      | `FGT;NML` |
| Both regular and ForceGT (somatic)       | `FGT;SOM` |

**Notes:**

* `NML` (normal): the variant was called independently by the pipeline in germline mode and is also present in the ForceGT input.
* `SOM` (somatic): the variant was called independently by the pipeline in somatic mode and is also present in the ForceGT input.
* `NML` and `SOM` appear **only** together with `FGT`, never alone.
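The tagging table can be restated as a small decision function. This is an illustrative sketch of the documented rules, not DRAGEN code; the function name and the `"germline"`/`"somatic"` mode strings are hypothetical.

```python
from typing import Optional


def forcegt_info_tag(in_forcegt_input: bool, called_by_pipeline: bool, mode: str) -> Optional[str]:
    """Return the ForceGT INFO tag for a variant; None means no ForceGT tag."""
    if not in_forcegt_input:
        return None  # regular call only
    if not called_by_pipeline:
        return "FGT"  # ForceGT only
    # Called by the pipeline AND present in the ForceGT input.
    if mode == "germline":
        return "FGT;NML"
    if mode == "somatic":
        return "FGT;SOM"
    raise ValueError(f"unknown mode: {mode!r}")
```

Note that `NML` or `SOM` can only be returned on the branch where `FGT` is also present, which mirrors the "never alone" rule.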

**FILTER and INFO field behavior:**

* If a ForceGT variant matches a regular call exactly (same POS, REF, and ALT), it inherits all FILTER and INFO fields from the regular call.
* If a ForceGT variant is at a novel site (no matching regular call), its FILTER and INFO fields are calculated independently.
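The inheritance rule amounts to an exact-key lookup. In the hypothetical sketch below, regular calls are indexed by `(CHROM, POS, REF, ALT)`; a hit returns a copy of the regular call's FILTER and INFO, and a miss returns `None` to signal that the fields must be computed for the ForceGT variant on its own (that computation is DRAGEN-internal and not modeled here). The INFO values in the example are made up for illustration.

```python
from typing import Dict, NamedTuple, Optional, Tuple

Key = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


class Fields(NamedTuple):
    filter: str
    info: Dict[str, str]


def inherited_fields(fgt_key: Key, regular_calls: Dict[Key, Fields]) -> Optional[Fields]:
    """Return FILTER/INFO to inherit, or None if they must be computed independently."""
    match = regular_calls.get(fgt_key)
    if match is None:
        return None  # novel site: no exact POS/REF/ALT match
    return Fields(match.filter, dict(match.info))  # exact match: copy the regular call's fields


# Hypothetical regular call with made-up INFO values.
regular = {("chrX", 100, "G", "A"): Fields("PASS", {"DP": "35"})}
print(inherited_fields(("chrX", 100, "G", "A"), regular))
print(inherited_fields(("chrX", 100, "G", "T"), regular))
```

Because the key includes ALT, a ForceGT variant at the same position but with a different ALT is treated as a novel site.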

## Genotype Reporting

All variants in the ForceGT input VCF are genotyped and included in the output with the following GT values:

| Condition                            | Germline GT        | Somatic T/N GT | Somatic T/O GT |
| ------------------------------------ | ------------------ | -------------- | -------------- |
| No coverage at position              | `./.`              | `./.`          | `./.`          |
| Coverage but no ALT-supporting reads | `0/0`              | `0/0`          | `0/0`          |
| Coverage with ALT-supporting reads   | `0/1`, `1/1`, etc. | `0/1`          | `0/1` or `1/1` |
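The table reads as a short decision procedure. In this sketch, `depth` and `alt_reads` stand for the total and ALT-supporting read counts at the site, and the mode strings are hypothetical labels. Where the table allows more than one genotype (germline, and tumor-only), the choice comes from DRAGEN's genotyping model, so the sketch returns `None` instead of guessing.

```python
from typing import Optional


def forcegt_gt(depth: int, alt_reads: int, mode: str) -> Optional[str]:
    """Map coverage at a ForceGT site to the GT values in the table above.

    mode is one of "germline", "somatic_tn", "somatic_to" (hypothetical labels).
    Returns None when the GT is chosen by the genotyping model.
    """
    if mode not in ("germline", "somatic_tn", "somatic_to"):
        raise ValueError(f"unknown mode: {mode!r}")
    if depth == 0:
        return "./."  # no coverage at the position
    if alt_reads == 0:
        return "0/0"  # covered, but no ALT-supporting reads
    if mode == "somatic_tn":
        return "0/1"  # tumor-normal always reports 0/1
    return None  # germline: 0/1, 1/1, etc.; tumor-only: 0/1 or 1/1
```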

## ForceGT and Multiallelic Sites

In somatic mode, `--vc-split-multiallelic-calls` is enabled by default, so each ALT allele of a multiallelic site is output on its own line. **Disabling this option is not recommended.**

ForceGT variants are combined into a single output line with regular calls only when they have an exact match (same POS, REF, and ALT). Otherwise, a separate ForceGT call is emitted.

**Example 1: ForceGT variant differs from regular call**

Both variants are output on separate lines:

```
chrX  100  .  G  C  .  PASS  .           ...  # Regular call (no tags)
chrX  100  .  G  A  .  PASS  FGT         ...  # ForceGT variant (different ALT)
```

**Example 2: ForceGT variant matches regular call exactly**

Combined into a single line with both tags:

```
# Germline mode:
chrX  100  .  G  A  .  PASS  FGT;NML     ...  # Called by pipeline AND in ForceGT input

# Somatic mode:
chrX  100  .  G  A  .  PASS  FGT;SOM     ...  # Called by pipeline AND in ForceGT input
```

**Example 3: Multiallelic site with partial ForceGT overlap**

In somatic mode, if the pipeline calls a multiallelic site (e.g., G→A and G→T) and the ForceGT input contains only G→A, only the matching allele is tagged:

```
chrX  100  .  G  A  .  PASS  FGT;SOM     ...  # Matches ForceGT
chrX  100  .  G  T  .  PASS  .           ...  # Regular call only (no tags)
```
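The three examples can be reproduced with a small sketch: split the regular call into one line per ALT, tag a line `FGT;NML` or `FGT;SOM` only when its exact `(CHROM, POS, REF, ALT)` is in the ForceGT input, and emit any unmatched ForceGT variant at the same position as its own `FGT` line. This is an illustration of the documented matching rule, not DRAGEN's implementation; the function name is hypothetical, `"."` stands for "no tag", and split alleles are not re-normalized (which matters for multiallelic indels).

```python
from typing import List, Set, Tuple

Key = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def split_and_tag(chrom: str, pos: int, ref: str, alts: List[str],
                  forcegt: Set[Key], mode_tag: str) -> List[Tuple[Key, str]]:
    """Split a regular call into biallelic lines and attach ForceGT tags.

    mode_tag is "NML" (germline) or "SOM" (somatic).
    """
    lines: List[Tuple[Key, str]] = []
    for alt in alts:
        key = (chrom, pos, ref, alt)
        # Combine with ForceGT only on an exact POS/REF/ALT match.
        lines.append((key, f"FGT;{mode_tag}" if key in forcegt else "."))
    called = {key for key, _ in lines}
    # ForceGT variants at this position without an exact match get their own line.
    for key in sorted(forcegt):
        if key[:2] == (chrom, pos) and key not in called:
            lines.append((key, "FGT"))
    return lines


fgt_input = {("chrX", 100, "G", "A")}
print(split_and_tag("chrX", 100, "G", ["C"], fgt_input, "SOM"))       # Example 1
print(split_and_tag("chrX", 100, "G", ["A"], fgt_input, "SOM"))       # Example 2
print(split_and_tag("chrX", 100, "G", ["A", "T"], fgt_input, "SOM"))  # Example 3
```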

## Target BED Filtering

If a target BED file is provided via `--vc-target-bed`, only ForceGT variants overlapping the BED regions are included in the output.
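To preview which ForceGT variants fall inside the targets, the VCF and BED coordinate systems must be reconciled: BED intervals are 0-based and half-open, while VCF POS is 1-based. The sketch below assumes a variant "overlaps" a target when any base of its REF allele lies in a BED interval; DRAGEN's exact overlap criterion is not stated here, so treat this as an approximation. Function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Targets = Dict[str, List[Tuple[int, int]]]  # contig -> [(start, end)], 0-based half-open


def load_bed(lines: Iterable[str]) -> Targets:
    """Parse the first three BED columns, skipping header/track/browser lines."""
    targets: Targets = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        targets.setdefault(chrom, []).append((int(start), int(end)))
    return targets


def overlaps_target(chrom: str, pos: int, ref: str, targets: Targets) -> bool:
    """True if the REF span of a 1-based VCF record intersects any target interval."""
    start0 = pos - 1            # convert VCF POS to a 0-based start
    end0 = start0 + len(ref)    # half-open end covering the whole REF allele
    return any(s < end0 and start0 < e for s, e in targets.get(chrom, []))


bed = load_bed(["track name=targets", "chrX\t99\t100\n"])  # covers 1-based position 100
print(overlaps_target("chrX", 100, "G", bed))
print(overlaps_target("chrX", 101, "G", bed))
```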
