CheckFingerprint

CheckFingerprint determines whether two or more sequencing datasets originate from the same individual. It supports multiple comparison modes depending on the available inputs and the desired trade-off between statistical rigor, runtime, and scalability.

LOD-based CheckFingerprint modes are broadly based on Picard CheckFingerprint and report logarithmic odds (LOD) scores to assess sample identity using probabilistic genotype modeling and haplotype blocks.

Pairwise pileup comparison mode is a new, experimental method that reports a simple MatchRate based on direct genotype concordance. This mode is intended for rapid screening and large-scale comparisons.

A positive LOD score or a high MatchRate indicates that samples are likely derived from the same individual.


CheckFingerprint Modes (Summary)

CheckFingerprint supports two types of identity metrics:

  • LOD score (logarithmic odds) quantifies how much more likely two samples are to come from the same person than from different people. A positive score indicates a likely match, with higher values indicating stronger evidence.

  • Match rate (MatchRate) is a simpler 0–1 concordance score based on direct genotype comparison. Values at or above roughly 0.90–0.95 indicate a likely match.

| Mode | What you need | Metric | Runtime | When to use |
| --- | --- | --- | --- | --- |
| From reads (generate VCF) | BAM/CRAM + expected genotype VCF. Requires the DRAGEN germline small variant caller to be enabled. | LOD score | Medium | WGS or larger datasets; best general-purpose option |
| Precomputed VCF | One or more observed genotype VCFs + expected genotype VCF (no BAM needed). Small variant caller is skipped. | LOD score | Fast | VCFs (must be germline) already available |
| Pairwise pileup (experimental) | Pileup files generated during DRAGEN contamination detection | Match rate | Very fast | Batch screening across many samples; requires contamination detection to have been run |


Processing Flow

LOD-Based Modes (On-the-fly VCF / Precomputed Germline VCF)

In LOD-based modes, CheckFingerprint uses reference-specific haplotype map files (*.map), bundled with DRAGEN and automatically selected based on the reference, to define curated SNPs grouped into haplotype (linkage disequilibrium) blocks. Genotype likelihoods are estimated from VCF PL values, and evidence is aggregated at the haplotype level to avoid over-counting correlated variants. A logarithmic odds (LOD) score is then computed to quantify how much more likely the samples originate from the same individual than from different individuals.

Processing steps:

  1. Select SNPs from reference-specific haplotype maps

  2. Estimate genotype likelihoods from VCF PL values

  3. Aggregate evidence across haplotype blocks

  4. Compute LOD scores for sample pairs

LOD-based modes provide a statistically rigorous identity assessment and are recommended for final confirmation.
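
The processing steps above can be illustrated with a minimal sketch. It assumes a simplified model in the style of Picard's fingerprinting and is not DRAGEN's actual implementation: each haplotype block is represented by a single biallelic SNP, genotype priors follow Hardy-Weinberg proportions, and the function names and example values are hypothetical.

```python
import math

def pl_to_likelihoods(pl):
    """Convert Phred-scaled genotype likelihoods (VCF PL) to linear scale."""
    return [10 ** (-p / 10) for p in pl]

def hardy_weinberg_priors(alt_af):
    """Assumed genotype priors (hom-ref, het, hom-alt) for a biallelic SNP."""
    ref_af = 1 - alt_af
    return [ref_af ** 2, 2 * ref_af * alt_af, alt_af ** 2]

def block_lod(pl_a, pl_b, alt_af):
    """LOD for one block: log10 of P(data | same) / P(data | different)."""
    lik_a = pl_to_likelihoods(pl_a)
    lik_b = pl_to_likelihoods(pl_b)
    prior = hardy_weinberg_priors(alt_af)
    # Same individual: both datasets share a single underlying genotype.
    same = sum(p * a * b for p, a, b in zip(prior, lik_a, lik_b))
    # Different individuals: each dataset has its own independent genotype.
    diff = (sum(p * a for p, a in zip(prior, lik_a))
            * sum(p * b for p, b in zip(prior, lik_b)))
    return math.log10(same / diff)

def total_lod(blocks):
    """Sum per-block LODs; blocks are treated as independent evidence."""
    return sum(block_lod(pl_a, pl_b, af) for pl_a, pl_b, af in blocks)
```

In this sketch, two confident heterozygous calls at a SNP with allele frequency 0.5 contribute only a small positive LOD (about 0.3), while confident opposite homozygous calls contribute a strongly negative one. Many concordant blocks are therefore needed before the summed score becomes decisive.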


Pairwise Pileup Mode (Experimental)

In pairwise pileup mode, CheckFingerprint performs a fast, direct comparison of genotypes using pileup files, without haplotype modeling or probabilistic inference. This mode is optimized for rapid screening and large-scale, multi-sample comparisons.

Processing steps:

  1. Load pileup files for all input samples

  2. Select overlapping marker sites across samples

  3. Apply minimum depth and heterozygosity filters

  4. Exclude uninformative sites (e.g. homozygous reference in both samples)

  5. Compare genotypes at remaining sites

  6. Compute a MatchRate for all pairwise sample comparisons
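
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the DRAGEN implementation: each sample is a hypothetical dictionary mapping a site to a (depth, genotype) pair, genotypes have already been called from allele fractions upstream, passing sites are counted after the depth filter, and None stands in for NA.

```python
from itertools import combinations

HOM_REF, HET, HOM_ALT = "0/0", "0/1", "1/1"

def match_rate(sample_a, sample_b, min_depth=10, min_passing_sites=500):
    """Compare two samples given {site: (depth, genotype)} dictionaries."""
    shared = sample_a.keys() & sample_b.keys()  # overlapping marker sites
    passing = uninformative = matching = mismatching = 0
    for site in shared:
        depth_a, gt_a = sample_a[site]
        depth_b, gt_b = sample_b[site]
        if depth_a < min_depth or depth_b < min_depth:
            continue  # fails the minimum depth filter
        passing += 1
        if gt_a == HOM_REF and gt_b == HOM_REF:
            uninformative += 1  # excluded from the rate
        elif gt_a == gt_b:
            matching += 1
        else:
            mismatching += 1
    compared = matching + mismatching
    enough = passing >= min_passing_sites and compared > 0
    return {
        "OverlappingSites": len(shared),
        "PassingSites": passing,
        "UninformativeSites": uninformative,
        "MatchingGenotypes": matching,
        "MismatchingGenotypes": mismatching,
        "MatchRate": matching / compared if enough else None,
    }

def all_pairs(samples, **kwargs):
    """Run match_rate for every unordered pair in {name: sample}."""
    return {(a, b): match_rate(samples[a], samples[b], **kwargs)
            for a, b in combinations(sorted(samples), 2)}
```

Because every pair is compared independently, the work grows with the square of the number of samples, which is why the per-pair comparison is kept this simple.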


Interpretation of Results

LOD-Based Modes

  • LOD > 0: samples likely from the same individual

  • LOD < 0: samples likely from different individuals

  • LOD ≈ 0: inconclusive (often due to low coverage)

LOD scores are reported on a base-10 logarithmic scale. For example, a LOD of 4 indicates the data are 10,000× more likely if the samples come from the same individual than if they come from different individuals.
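
The base-10 scale can be made concrete with a short sketch. The helper names are hypothetical, and the width of the inconclusive band around zero is an assumption: the documentation describes LOD ≈ 0 as inconclusive but does not define a cutoff.

```python
def likelihood_ratio(lod):
    """Convert a base-10 LOD score back to a likelihood ratio."""
    return 10 ** lod

def interpret_lod(lod, inconclusive_band=0.0):
    """Label a LOD score; inconclusive_band is a user-chosen assumption."""
    if abs(lod) <= inconclusive_band:
        return "inconclusive"
    if lod > 0:
        return "likely same individual"
    return "likely different individuals"

# A LOD of 4 corresponds to a likelihood ratio of 10**4.
ratio = likelihood_ratio(4)
label = interpret_lod(4)
```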


Pairwise Pileup Mode

  • MatchRate at or above roughly 0.90–0.95: samples likely from the same individual

  • Lower MatchRate: samples likely from different individuals

  • MatchRate = NA: insufficient overlapping informative sites

MatchRate is intended for screening and triage, not formal identity confirmation.


Command-Line Options

[Required]

  • --enable-checkfingerprint true


[Required for LOD-Based Modes]

  • --checkfingerprint-expected-vcf <expected.vcf>

The expected VCF may contain one or multiple samples. The input sample is compared independently against each expected sample.


[Mode Selection Options]

| Option | Description |
| --- | --- |
| --checkfingerprint-enable-vcf-comparison true | Enable VCF comparison mode (required for either precomputed or on the fly) |
| --checkfingerprint-observed-vcf <vcf> | Enable precomputed VCF comparison mode |
| --checkfingerprint-pairwise-read-files <pileup> | Enable pairwise pileup mode (repeatable) |


[Optional – Advanced (LOD-Based Modes)]

  • --checkfingerprint-haplotype-map <map_file>: Specify a custom haplotype map file. By default, DRAGEN automatically selects a reference-specific haplotype map bundled with the software.


[Pairwise Pileup Mode Settings]

| Setting | Description | Default |
| --- | --- | --- |
| --checkfingerprint-pairwise-min-depth | Minimum depth required at a locus | 10 |
| --checkfingerprint-pairwise-het-width | Total AF window around 0.5 used to classify heterozygous sites (e.g. 0.5 → AF 0.25–0.75) | 0.5 |
| --checkfingerprint-pairwise-min-passing-sites | Minimum overlapping passing sites required to compute MatchRate | 500 |
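
How the depth and het-width settings interact can be sketched as below. This is a hedged illustration: the function names are hypothetical, and treating the window boundaries as inclusive is an assumption the documentation does not confirm.

```python
def het_window(het_width=0.5):
    """Return the (low, high) allele-fraction bounds centred on 0.5."""
    half = het_width / 2
    return 0.5 - half, 0.5 + half

def classify_site(alt_af, depth, min_depth=10, het_width=0.5):
    """Classify one pileup site, or return None if depth is too low."""
    if depth < min_depth:
        return None
    low, high = het_window(het_width)
    if low <= alt_af <= high:  # inclusive bounds are an assumption
        return "het"
    return "hom_ref" if alt_af < low else "hom_alt"
```

With the default width of 0.5, a site with an alternate allele fraction of 0.30 is classified as heterozygous. Narrowing the width to 0.2 shrinks the window to roughly 0.4–0.6, so the same site is then classified as homozygous reference.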


[Tumor-Aware Settings – LOD Modes]

| Setting | Description | Default |
| --- | --- | --- |
| --checkfingerprint-enable-tumor-aware true | Enable tumor-aware LOD computation | |
| --checkfingerprint-loss-of-het-rate | Rate at which heterozygous sites become homozygous due to LOH | 0.5 |


Command-Line Examples

On-the-fly VCF Comparison Mode

Most applicable for: Whole-genome sequencing (WGS) datasets (≈30× coverage) and general-purpose identity checking.


Standalone VCF Comparison Mode

Most applicable for: VCF-only workflows where both observed and expected VCFs are already available.


Pairwise Pileup Comparison Mode (Experimental)

Most applicable for: Rapid batch-level screening of many samples (e.g. WGS runs), duplicate detection, and large-scale identity sanity checks.

Pileup files can be generated by:

  • DRAGEN contamination detection, run during the map-align steps (--qc-detect-contamination true)

  • External tools such as samtools mpileup
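
The CheckFingerprint options for the three modes above can be assembled as in the following sketch. It uses only the options listed on this page. The helper name, the mode labels, and the file names are hypothetical, and the remaining DRAGEN arguments (reference, input, and output options) are deliberately left to the caller because they are not described here.

```python
import shlex

def checkfingerprint_args(mode, expected_vcf=None, observed_vcf=None, pileups=()):
    """Build the CheckFingerprint options for one of the three modes."""
    args = ["--enable-checkfingerprint", "true"]
    if mode in ("on-the-fly", "precomputed"):
        if expected_vcf is None:
            raise ValueError("LOD-based modes require an expected VCF")
        args += ["--checkfingerprint-enable-vcf-comparison", "true",
                 "--checkfingerprint-expected-vcf", expected_vcf]
        if mode == "precomputed":
            if observed_vcf is None:
                raise ValueError("precomputed mode requires an observed VCF")
            args += ["--checkfingerprint-observed-vcf", observed_vcf]
        # On-the-fly mode also needs the germline small variant caller
        # enabled elsewhere on the command line (not added here).
    elif mode == "pairwise":
        if len(pileups) < 2:
            raise ValueError("pairwise mode needs at least two pileup files")
        for pileup in pileups:  # the option is repeated once per file
            args += ["--checkfingerprint-pairwise-read-files", pileup]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return args

# Placeholder: extend with the reference, input, and output options for your run.
base_args = ["dragen"]
command = base_args + checkfingerprint_args(
    "pairwise", pileups=["s1.pileup", "s2.pileup"])
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run once the base arguments are complete. Keeping the mode logic in one place makes it harder to combine options from different modes by accident.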


Outputs

LOD-Based Modes

  • <prefix>.CheckFingerprint.summary.txt

  • <prefix>.CheckFingerprint.detail.txt


Pairwise Pileup Mode Output

  • <prefix>.CheckFingerprint.pairwise.csv

The CSV file contains all pairwise sample comparisons, sorted by MatchRate (highest to lowest):

| Column | Description |
| --- | --- |
| SampleA / SampleB | Input pileup file names |
| OverlappingSites | Total shared loci |
| PassingSites | Loci passing depth and genotype filters |
| UninformativeSites | Loci where both samples are homozygous reference |
| MatchingGenotypes | Matching genotype calls |
| MismatchingGenotypes | Mismatching genotype calls |
| MatchRate | Matching / (Matching + Mismatching), or NA |

If PassingSites is below the value of --checkfingerprint-pairwise-min-passing-sites, MatchRate is reported as NA.
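
A minimal sketch of screening the pairwise CSV for likely matches follows. It assumes the file has a header row with SampleA and SampleB as separate columns and the remaining column names as listed above; the exact layout is not confirmed here, and the example rows and the 0.90 threshold are made up for illustration.

```python
import csv
import io

def likely_matches(csv_text, threshold=0.90):
    """Return (SampleA, SampleB, MatchRate) for pairs at or above threshold."""
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["MatchRate"] == "NA":
            continue  # too few passing sites to judge this pair
        rate = float(row["MatchRate"])
        if rate >= threshold:
            hits.append((row["SampleA"], row["SampleB"], rate))
    return hits

# Made-up example content; real files come from <prefix>.CheckFingerprint.pairwise.csv
example = """SampleA,SampleB,OverlappingSites,PassingSites,UninformativeSites,MatchingGenotypes,MismatchingGenotypes,MatchRate
s1.pileup,s2.pileup,1200,900,300,594,6,0.99
s1.pileup,s3.pileup,1200,880,290,295,295,0.5
s2.pileup,s3.pileup,400,120,40,70,10,NA
"""

matches = likely_matches(example)
```

Pairs reported as NA are skipped rather than treated as mismatches, since the absence of a MatchRate says nothing about identity.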


Limitations

Pairwise pileup mode:

  • Experimental; intended for rapid screening

  • Non-probabilistic and haplotype-free

  • Less sensitive for low-coverage or targeted panels

LOD-based modes:

  • Tumor-aware LOD assumes loss of heterozygosity

  • Observed and expected VCFs should originate from the same pipeline

  • Compatible only with DRAGEN germline and tumor-only pipelines
