Force Genotyping

DRAGEN supports force genotyping (ForceGT) for small variant calling. To enable ForceGT, pass a list of small variants to force genotype with the --vc-forcegt-vcf option. The input list of small variants can be a *.vcf or *.vcf.gz file.
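As a rough illustration, the sketch below assembles a germline command line with ForceGT enabled. Only --vc-forcegt-vcf is taken from this section. The other options are written from memory of common DRAGEN usage and should be checked against the documentation for your DRAGEN version. All paths, the sample name, and the helper name build_forcegt_command are hypothetical.

```python
import shlex


def build_forcegt_command(reference_dir, fastq1, fastq2, forcegt_vcf,
                          output_dir, prefix, sample):
    """Assemble (but do not run) a DRAGEN command line with ForceGT enabled.

    Assumption: every option other than --vc-forcegt-vcf is recalled from
    typical DRAGEN usage, not taken from this section.
    """
    return [
        "dragen",
        "-r", reference_dir,
        "-1", fastq1,
        "-2", fastq2,
        "--RGID", sample,
        "--RGSM", sample,
        "--enable-variant-caller", "true",
        "--vc-forcegt-vcf", forcegt_vcf,   # the ForceGT input (*.vcf or *.vcf.gz)
        "--output-directory", output_dir,
        "--output-file-prefix", prefix,
    ]


# Hypothetical paths, for illustration only.
cmd = build_forcegt_command(
    "/ref/hg38", "/data/s1_R1.fastq.gz", "/data/s1_R2.fastq.gz",
    "/data/forcegt_sites.vcf.gz", "/out", "s1", "s1",
)
print(shlex.join(cmd))
```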

The current limitations of ForceGT are as follows:

  • ForceGT is supported for germline small variant calling in the V3 mode. The V1, V2, and V2+ modes are not supported.

  • ForceGT is also supported for somatic small variant calling.

  • ForceGT variants do not propagate through joint genotyping.

ForceGT Input

DRAGEN supports only a single ForceGT VCF input file, which must meet the following requirements:

  • The input has to be a valid VCF file according to version 4.2 of the VCF standard. For instance, it has to have at least eight tab-delimited columns and records need to be sorted by reference contig and position.

  • The header has to list the same contigs as the reference used for variant calling. All variants must refer to one of these contig names.

  • Variants have to be normalized (parsimonious and left-aligned, see below).

  • The file must not contain any multinucleotide or complex variants (for example, AT -> C). These are variants that require more than one substitution, insertion, or deletion to go from the REF allele to the ALT allele. Such variants are ignored.

  • Any deletions longer than 50bp are filtered out.

  • Any variant will only be called once. Duplicate entries will be ignored.
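Several of these requirements can be checked before running DRAGEN. The following is a minimal sketch, not DRAGEN's own validation logic: it assumes records have already been parsed into (chrom, pos, ref, alt) tuples with a single ALT allele each, and the names classify and check_records are hypothetical. It cannot detect variants that are not left-aligned, because that check needs the reference sequence.

```python
MAX_DELETION_BP = 50  # deletions longer than this are filtered out


def classify(ref, alt):
    """Return 'snv', 'insertion', 'deletion', or 'unsupported'."""
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "snv"
    if len(ref) == 1 and len(alt) > 1 and alt[0] == ref:
        return "insertion"
    if len(alt) == 1 and len(ref) > 1 and ref[0] == alt:
        return "deletion"
    # MNVs, complex variants, and non-parsimonious representations land here.
    return "unsupported"


def check_records(records, contig_order):
    """Return a list of (record, problem) pairs for a ForceGT candidate list.

    contig_order is the list of contig names in reference order.
    """
    rank = {name: i for i, name in enumerate(contig_order)}
    problems, seen, last_key = [], set(), None
    for rec in records:
        chrom, pos, ref, alt = rec
        if chrom not in rank:
            problems.append((rec, "contig not in reference"))
            continue
        key = (rank[chrom], pos)
        if last_key is not None and key < last_key:
            problems.append((rec, "not sorted"))
        last_key = key
        if rec in seen:
            problems.append((rec, "duplicate"))
            continue
        seen.add(rec)
        kind = classify(ref, alt)
        if kind == "unsupported":
            problems.append((rec, "multinucleotide, complex, or not parsimonious"))
        elif kind == "deletion" and len(ref) - len(alt) > MAX_DELETION_BP:
            problems.append((rec, "deletion longer than 50 bp"))
    return problems
```

For example, check_records([("chr1", 100, "AT", "C")], ["chr1"]) reports the record as multinucleotide, complex, or not parsimonious.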

A variant that is not normalized causes undefined behavior in DRAGEN. For example, instead of the following non-parsimonious representation:

chrX 153592402 GC GCG

use the parsimonious representation:

chrX 153592403 C CG
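The trimming step in this example can be sketched as follows. This is an illustrative helper (trim_to_parsimonious is a hypothetical name), and it covers only the parsimony half of normalization. Left-alignment needs the reference sequence and is better left to a dedicated normalization tool.

```python
def trim_to_parsimonious(pos, ref, alt):
    """Trim bases shared by REF and ALT, keeping at least one base in each.

    Assumption: this only makes the representation parsimonious; it does not
    left-align the variant, which requires the reference sequence.
    """
    # Drop shared trailing bases first.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then drop shared leading bases, advancing POS for each base removed.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


print(trim_to_parsimonious(153592402, "GC", "GCG"))  # (153592403, 'C', 'CG')
```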

ForceGT Operation and Expected Outcome

Force genotyping requires an input VCF and can be used with DRAGEN software in VCF, GVCF, or VCF+GVCF mode. In all cases, the output files contain all regular calls and the ForceGT variants, as follows:

  • For a ForceGT call that was not called by the variant caller (that is, not common to the ForceGT input and the regular calls), the call is tagged with FGT in the INFO field.

  • For a germline ForceGT call that was also called by the variant caller and filter field is PASS, the call is tagged with NML;FGT in the INFO field (NML denotes normal). In somatic mode, the call is tagged with FGT;SOM.

  • For a regular call by the variant caller (with filter field PASS) that has no matching ForceGT variant, no extra tags are added (no NML tag, no FGT tag).

This scheme distinguishes among three kinds of calls: calls that are present only because of ForceGT, calls that are common to both the ForceGT input and regular calling, and regular calls.
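When post-processing the output, the tagging scheme above can be turned into a small classifier over the INFO column. This is a sketch based only on the tags described in this section (FGT, NML, SOM). The function name forcegt_origin and the returned labels are hypothetical.

```python
def forcegt_origin(info):
    """Classify a record by the ForceGT-related tags in its INFO string.

    Returns 'forcegt_only', 'forcegt_and_caller', or 'caller_only'.
    """
    # INFO is a ';'-separated list of flags and KEY=VALUE pairs.
    keys = {field.split("=", 1)[0] for field in info.split(";") if field}
    if "FGT" not in keys:
        return "caller_only"
    # NML;FGT (germline) and FGT;SOM (somatic) mark calls the caller also made.
    if "NML" in keys or "SOM" in keys:
        return "forcegt_and_caller"
    return "forcegt_only"
```

For instance, forcegt_origin("DP=30;FGT") returns 'forcegt_only', and forcegt_origin("NML;FGT;DP=12") returns 'forcegt_and_caller'.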

All the variants in the input ForceGT VCF are genotyped and present in the output file. The following table lists the reported GTs for the variants.

| Condition | Reported GT |
| --- | --- |
| At a position with no coverage | ./. or . |
| At a position with coverage but no reads supporting ALT allele | 0/0 or 0 |
| At a position with coverage and reads supporting ALT allele | Dependent on pipeline (germline/somatic) |

If DRAGEN calls a variant that is different from the one specified in the input ForceGT VCF, the output contains the following multiple entries at the same position:

  • One entry for the default DRAGEN variant call

  • One entry for each variant present in the input ForceGT VCF at that position

chrX 100 G C [Default DRAGEN variant call]
chrX 100 G A [Variant in ForceGT vcf]

If a target BED file is provided along with the input ForceGT VCF, then the output file only contains ForceGT variants that overlap the BED file positions.
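To predict which ForceGT variants survive this restriction, an overlap test along the following lines can help. The exact overlap rule DRAGEN applies is not stated here, so this sketch assumes a variant overlaps when any base of its REF span falls inside a BED interval. It also accounts for BED intervals being 0-based and half-open while VCF POS is 1-based. The name overlaps_bed is hypothetical.

```python
def overlaps_bed(pos, ref, intervals):
    """Return True if the variant's REF span overlaps any BED interval.

    pos is the 1-based VCF POS; intervals is a list of (start, end) pairs for
    the same contig in BED coordinates (0-based, end-exclusive).
    Assumption: overlap is judged on the REF span, which may differ from
    DRAGEN's own rule.
    """
    var_start = pos - 1              # convert to 0-based
    var_end = var_start + len(ref)   # end-exclusive
    return any(start < var_end and var_start < end for start, end in intervals)
```

With the interval (99, 100), which covers only 1-based position 100, a SNV at POS 100 overlaps, while a SNV at POS 101 does not.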
