Force Genotyping

DRAGEN supports force genotyping (ForceGT) for small variant calling. Use --vc-forcegt-vcf to specify a VCF file containing variants to force genotype. The input list of small variants can be a *.vcf or *.vcf.gz file.

Supported Modes

  • Germline: Supported. When using joint genotyping with the --vc-forcegt-vcf option, the output joint VCF contains only variants tagged with FGT. Without this option, FGT-tagged variants are skipped.

  • Somatic: Supported in both Tumor-Only (T/O) and Tumor-Normal (T/N) modes.

Input Requirements

DRAGEN supports only a single ForceGT VCF input file. The input VCF must:

  • Be a valid VCF 4.2 file (minimum 8 tab-delimited columns, sorted by contig and position).

  • Have a header that lists the same contig names as the reference used for variant calling. Every variant must refer to one of these contigs.

  • Contain only normalized variants (parsimonious and left-aligned).

  • Not contain multinucleotide or complex variants (e.g., AT → C), that is, variants that need more than one substitution, insertion, or deletion to turn the REF allele into the ALT allele. Such variants are ignored.

  • Not contain deletions longer than 50 bp. Longer deletions are filtered out.

  • Not contain duplicate entries (same POS, REF, and ALT). Duplicates are ignored.
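
Before running DRAGEN, it can help to pre-screen a ForceGT VCF against the requirements above. The Python sketch below is hypothetical (`screen_forcegt_vcf`, `classify`, and `MAX_DELETION_BP` are illustrative names, not DRAGEN components). It assumes already-normalized input with one plain-base ALT allele per record, and it assumes that for a repeated (CHROM, POS, REF, ALT) the first occurrence is kept. It reports records that the rules above say would be ignored or filtered, and raises on structural problems (too few columns, undeclared contig, unsorted records).

```python
import re
from typing import Iterable, List, Tuple

Record = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)

MAX_DELETION_BP = 50  # deletions longer than this are filtered out
CONTIG_RE = re.compile(r"^##contig=<ID=([^,>]+)")


def classify(ref: str, alt: str) -> str:
    """Classify a normalized REF/ALT pair with a single ALT allele."""
    if len(ref) == 1 and len(alt) == 1:
        return "snv"
    if len(ref) == 1 and alt[0] == ref[0]:
        return "insertion"
    if len(alt) == 1 and ref[0] == alt[0]:
        return "deletion"
    return "complex"  # MNV or complex variant, e.g. AT -> C


def screen_forcegt_vcf(lines: Iterable[str]) -> Tuple[List[Record], List[Tuple[Record, str]]]:
    """Split records into kept ones and (record, reason) pairs that would be dropped."""
    contigs: List[str] = []
    kept: List[Record] = []
    dropped: List[Tuple[Record, str]] = []
    seen = set()
    last = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            match = CONTIG_RE.match(line)
            if match:
                contigs.append(match.group(1))
            continue
        if not line or line.startswith("#"):
            continue  # blank line or the #CHROM column header
        cols = line.split("\t")
        if len(cols) < 8:
            raise ValueError(f"fewer than 8 tab-delimited columns: {line!r}")
        rec: Record = (cols[0], int(cols[1]), cols[3].upper(), cols[4].upper())
        if rec[0] not in contigs:
            raise ValueError(f"contig {rec[0]!r} is not declared in the header")
        # Sort order: contig order as declared in the header, then position.
        order = (contigs.index(rec[0]), rec[1])
        if last is not None and order < last:
            raise ValueError(f"records are not sorted at {rec[0]}:{rec[1]}")
        last = order
        if rec in seen:
            dropped.append((rec, "duplicate"))
            continue
        seen.add(rec)
        kind = classify(rec[2], rec[3])
        if kind == "complex":
            dropped.append((rec, "complex"))
        elif kind == "deletion" and len(rec[2]) - len(rec[3]) > MAX_DELETION_BP:
            dropped.append((rec, "deletion > 50 bp"))
        else:
            kept.append(rec)
    return kept, dropped
```

Raising on structural errors while merely collecting dropped records mirrors the text: a malformed or unsorted file is invalid input, whereas complex variants, long deletions, and duplicates are silently skipped.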

Example of normalization:

# Wrong (not parsimonious):
chrX  153592402  GC  GCG

# Correct (parsimonious):
chrX  153592403  C   CG

A non-normalized variant causes undefined behavior in DRAGEN.
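
The trim-and-extend procedure commonly used for normalization (for example by `bcftools norm` and `vt normalize`) can be sketched in Python as below. This is an illustrative sketch, not DRAGEN code: `normalize_variant` is a hypothetical helper, it handles one ALT allele at a time, and it assumes the contig sequence is available as an in-memory string. For real files, a dedicated tool such as `bcftools norm` is the safer choice.

```python
from typing import Tuple


def normalize_variant(seq: str, pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Return a parsimonious, left-aligned (pos, ref, alt) for one biallelic variant.

    seq is the full contig sequence as a string; pos is the 1-based VCF POS.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("REF and ALT are identical")
    if seq[pos - 1 : pos - 1 + len(ref)].upper() != ref:
        raise ValueError("REF does not match the reference sequence")

    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base; together with the re-anchoring
        # below, this slides an indel leftward through repeats.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, re-anchor on the preceding reference base.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot left-extend past the start of the contig")
            pos -= 1
            base = seq[pos - 1].upper()
            ref, alt = base + ref, base + alt
            changed = True

    # Drop shared leading bases while both alleles keep at least one base.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt
```

On a toy contig `TTGCAT`, the non-parsimonious `3 GC GCG` becomes `4 C CG`, mirroring the chrX example above. In the repeat `GCACACAT`, a `CA` deletion written as `5 ACA A` slides left to `1 GCA G`.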

Output Behavior

The output VCF contains both regular variant calls and ForceGT variants. Each variant is tagged in the INFO field to indicate its origin:

Scenario                                   INFO Tag
----------------------------------------   --------
Regular call only (not in ForceGT input)   (none)
ForceGT only (not called by pipeline)      FGT
Both regular and ForceGT (germline)        FGT;NML
Both regular and ForceGT (somatic)         FGT;SOM

Notes:

  • NML (normal): Indicates the variant was independently called by the pipeline in germline mode AND present in the ForceGT input.

  • SOM (somatic): Indicates the variant was independently called by the pipeline in somatic mode AND present in the ForceGT input.

  • NML and SOM only appear paired with FGT, never alone.

FILTER and INFO field behavior:

  • If a ForceGT variant exactly matches a regular call (same POS, REF, and ALT), it inherits all FILTER and INFO fields from that regular call.

  • If a ForceGT variant is at a novel site (no regular call), FILTER and INFO fields are calculated independently for that variant.
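
The tagging rules in the table and the FILTER/INFO rules above can be summarized as a small merge step. This Python sketch models the documented behavior; it is not DRAGEN's implementation. `Call`, `merge_forcegt`, and `info_tags` are hypothetical names, and the FILTER/INFO values that DRAGEN computes independently for novel ForceGT sites are left as empty placeholders.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


@dataclass
class Call:
    key: Key
    filter: str = "."
    info: Dict[str, str] = field(default_factory=dict)
    tags: List[str] = field(default_factory=list)


def merge_forcegt(regular: Iterable[Call], forcegt: Iterable[Key], somatic: bool) -> Dict[Key, Call]:
    # Copy regular calls so the caller's objects are not mutated.
    out = {c.key: replace(c, info=dict(c.info), tags=list(c.tags)) for c in regular}
    # dict.fromkeys drops duplicate ForceGT keys while preserving order.
    for key in dict.fromkeys(forcegt):
        if key in out:
            # Exact match: keep the regular call's FILTER and INFO,
            # and add FGT plus NML (germline) or SOM (somatic).
            out[key].tags += ["FGT", "SOM" if somatic else "NML"]
        else:
            # Novel site: DRAGEN computes FILTER/INFO itself; placeholders here.
            out[key] = Call(key, tags=["FGT"])
    return out


def info_tags(call: Call) -> str:
    """Render the origin tags as they would appear in INFO, or "(none)"."""
    return ";".join(call.tags) or "(none)"
```

Keying on the full (CHROM, POS, REF, ALT) tuple is what makes the match exact: a ForceGT variant at the same position but with a different ALT allele becomes a separate FGT-only record.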

Genotype Reporting

Every ForceGT input variant that is not ignored or filtered out (see Input Requirements and Target BED Filtering) is genotyped and included in the output with the following GT values:

Condition                              Germline GT      Somatic T/N GT   Somatic T/O GT
------------------------------------   --------------   --------------   --------------
No coverage at position                ./.              ./.              ./.
Coverage but no ALT-supporting reads   0/0              0/0              0/0
Coverage with ALT-supporting reads     0/1, 1/1, etc.   0/1              0/1 or 1/1
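
When reviewing ForceGT output, the table can be turned into a consistency check. The sketch below is a hypothetical Python helper, not part of DRAGEN. The mode names `germline`, `somatic_tn`, and `somatic_to` are invented labels, and `depth` and `alt_reads` are assumed to be the coverage and ALT-supporting read count for the same sample as `gt`. Because the germline ALT-supported case ("0/1, 1/1, etc.") depends on the genotyping model, the check only requires a non-reference allele there.

```python
from typing import Optional, Set


def expected_gts(mode: str, depth: int, alt_reads: int) -> Optional[Set[str]]:
    """Allowed GT strings per the table, or None for germline sites with ALT support."""
    if mode not in ("germline", "somatic_tn", "somatic_to"):
        raise ValueError(f"unknown mode: {mode}")
    if depth == 0:
        return {"./."}
    if alt_reads == 0:
        return {"0/0"}
    if mode == "somatic_tn":
        return {"0/1"}
    if mode == "somatic_to":
        return {"0/1", "1/1"}
    return None  # germline: exact GT comes from the genotyping model


def gt_matches_table(gt: str, mode: str, depth: int, alt_reads: int) -> bool:
    gt = gt.replace("|", "/")  # treat phased and unphased GTs alike
    allowed = expected_gts(mode, depth, alt_reads)
    if allowed is not None:
        return gt in allowed
    # Germline with ALT support: require at least one non-reference allele.
    return any(allele not in ("0", ".") for allele in gt.split("/"))
```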

ForceGT and Multiallelic Sites

In somatic mode, --vc-split-multiallelic-calls is enabled by default, so each ALT allele at a multiallelic site is written on its own output line. Disabling this option is not recommended.

ForceGT variants are combined into a single output line with regular calls only when they have an exact match (same POS, REF, and ALT). Otherwise, a separate ForceGT call is emitted.
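
The exact-match rule can be sketched in two steps: split each multiallelic regular call into one record per ALT allele, then combine a ForceGT variant with a regular record only when (CHROM, POS, REF, ALT) are identical. In this hypothetical Python sketch, `trim`, `split_multiallelic`, and `match_forcegt` are illustrative names. Trimming each split allele to a parsimonious form (without left-alignment) is an assumption about how split records are represented, not documented DRAGEN behavior.

```python
from typing import List, Tuple

Key = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def trim(pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Trim shared suffix, then shared prefix, keeping at least one base per allele."""
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def split_multiallelic(chrom: str, pos: int, ref: str, alts: str) -> List[Key]:
    """Return one biallelic key per comma-separated ALT allele."""
    keys = []
    for alt in alts.split(","):
        p, r, a = trim(pos, ref, alt)
        keys.append((chrom, p, r, a))
    return keys


def match_forcegt(regular: List[Key], forcegt: List[Key]) -> Tuple[List[Key], List[Key]]:
    """Pair ForceGT keys with regular records only on an exact match."""
    regular_set = set(regular)
    combined = [k for k in forcegt if k in regular_set]
    forcegt_only = [k for k in forcegt if k not in regular_set]
    return combined, forcegt_only
```

For a regular call `G → A,T` and a ForceGT input of `G → A` and `G → C`, this sketch pairs `G → A` with its split regular record and leaves `G → C` as a ForceGT-only record; the regular `G → T` record is untouched.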

Example 1: ForceGT variant differs from regular call

Both variants are output on separate lines:

Example 2: ForceGT variant matches regular call exactly

Combined into a single line with both tags:

Example 3: Multiallelic site with partial ForceGT overlap

If the pipeline calls a multiallelic site (e.g., G→A and G→T) and ForceGT input contains only G→A:

Target BED Filtering

If a target BED file is provided via --vc-target-bed, only ForceGT variants overlapping the BED regions are included in the output.
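
A target-BED check has to reconcile BED's 0-based, half-open intervals with VCF's 1-based POS. The Python sketch below is hypothetical (`load_bed` and `overlaps_target` are illustrative names, not DRAGEN functions) and assumes a variant overlaps a region when any base of its REF allele falls inside it; the text above does not specify how DRAGEN defines overlap for indels.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Targets = Dict[str, Tuple[List[int], List[int]]]  # chrom -> (starts, ends)


def load_bed(lines: Iterable[str]) -> Targets:
    """Read BED lines (0-based, half-open) and merge overlapping intervals per contig."""
    raw = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        raw[chrom].append((int(start), int(end)))
    targets: Targets = {}
    for chrom, intervals in raw.items():
        merged: List[List[int]] = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        targets[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return targets


def overlaps_target(targets: Targets, chrom: str, pos: int, ref: str) -> bool:
    """True if the REF span of a VCF record (1-based POS) overlaps any target interval."""
    if chrom not in targets:
        return False
    starts, ends = targets[chrom]
    start = pos - 1          # convert VCF POS to a 0-based start
    end = start + len(ref)   # half-open end of the REF span
    # Merged intervals are disjoint, so ends are sorted: find the first one ending after start.
    i = bisect.bisect_right(ends, start)
    return i < len(starts) and starts[i] < end
```

For example, the BED line `chr1 99 200` covers VCF positions 100 through 200, so a SNV at POS 99 falls outside it while one at POS 200 falls inside.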
