CheckFingerprint

CheckFingerprint identifies whether two or more sequencing datasets originate from the same individual. It supports multiple comparison modes depending on the available inputs and the desired trade-off between statistical rigor, runtime, and scalability.

LOD-based CheckFingerprint modes are broadly based on Picard CheckFingerprint and report logarithmic odds (LOD) scores to assess sample identity using probabilistic genotype modeling and haplotype blocks.

Pairwise pileup comparison mode is a new, experimental method that reports a simple MatchRate based on direct genotype concordance. This mode is intended for rapid screening and large-scale comparisons.

A positive LOD score or a high MatchRate indicates that samples are likely derived from the same individual.

CheckFingerprint Modes (Summary)

CheckFingerprint supports two types of identity metrics:

LOD score (logarithmic odds) quantifies how much more likely two samples are to come from the same person than from different people. A positive score indicates a likely match, with higher values indicating stronger evidence.
MatchRate is a simpler concordance score between 0 and 1, based on direct genotype comparison. Values of roughly 0.90–0.95 or higher indicate a likely match.

| Mode | What you need | Metric | Runtime | When to use |
| --- | --- | --- | --- | --- |
| From reads (generate VCF) | BAM/CRAM + expected genotype VCF. Requires the DRAGEN germline small variant caller to be enabled. | LOD score | Medium | WGS or larger datasets; best general-purpose option |
| Precomputed VCF | One or more observed genotype VCFs + expected genotype VCF (no BAM needed). Small variant caller is skipped. | LOD score | Fast | VCFs (must be germline) already available |
| Pairwise pileup (experimental) | Pileup files generated during DRAGEN contamination detection | Match rate | Very fast | Batch screening across many samples; requires contamination detection to have been run |

Processing Flow

LOD-Based Modes (On-the-fly VCF / Precomputed Germline VCF)

In LOD-based modes, CheckFingerprint uses reference-specific haplotype map files (*.map), bundled with DRAGEN and automatically selected based on the reference, to define curated SNPs grouped into haplotype (linkage disequilibrium) blocks. Genotype likelihoods are estimated from VCF PL values, and evidence is aggregated at the haplotype level to avoid over-counting correlated variants. A logarithmic odds (LOD) score is then computed to quantify how much more likely the samples originate from the same individual than from different individuals.

Processing steps:

Select SNPs from reference-specific haplotype maps
Estimate genotype likelihoods from VCF PL values
Aggregate evidence across haplotype blocks
Compute LOD scores for sample pairs

LOD-based modes provide a statistically rigorous identity assessment and are recommended for final confirmation.
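The steps above can be illustrated with a minimal sketch. It is not DRAGEN's implementation: it assumes one biallelic SNP per haplotype block, PL values ordered as (hom-ref, het, hom-alt), and Hardy-Weinberg genotype priors derived from an assumed alternate allele frequency strictly between 0 and 1. All function names are hypothetical.

```python
import math

def pl_to_likelihoods(pl):
    """Convert phred-scaled PLs (hom-ref, het, hom-alt) to normalized likelihoods."""
    raw = [10.0 ** (-value / 10.0) for value in pl]
    total = sum(raw)
    return [r / total for r in raw]

def hardy_weinberg_priors(alt_af):
    """Genotype priors for an assumed alternate allele frequency (0 < alt_af < 1)."""
    ref_af = 1.0 - alt_af
    return [ref_af * ref_af, 2.0 * ref_af * alt_af, alt_af * alt_af]

def block_lod(observed_pl, expected_pl, alt_af):
    """log10 of P(data | same individual) / P(data | different individuals) for one block."""
    obs = pl_to_likelihoods(observed_pl)
    exp = pl_to_likelihoods(expected_pl)
    prior = hardy_weinberg_priors(alt_af)
    # Same individual: both datasets share one underlying genotype.
    p_same = sum(o * e * p for o, e, p in zip(obs, exp, prior))
    # Different individuals: the two genotypes are drawn independently.
    p_obs = sum(o * p for o, p in zip(obs, prior))
    p_exp = sum(e * p for e, p in zip(exp, prior))
    return math.log10(p_same / (p_obs * p_exp))

def sample_lod(blocks):
    """Sum per-block LODs; blocks is a list of (observed_pl, expected_pl, alt_af)."""
    return sum(block_lod(obs, exp, af) for obs, exp, af in blocks)

concordant = block_lod((30, 0, 30), (30, 0, 30), 0.5)   # both confidently het
discordant = block_lod((0, 30, 60), (60, 30, 0), 0.5)   # hom-ref vs hom-alt
print(concordant > 0, discordant < 0)
```

Summing one value per block, rather than one per SNP, is what keeps correlated variants within a block from being counted as independent evidence.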

Pairwise Pileup Mode (Experimental)

In pairwise pileup mode, CheckFingerprint performs a fast, direct comparison of genotypes using pileup files, without haplotype modeling or probabilistic inference. This mode is optimized for rapid screening and large-scale, multi-sample comparisons.

Processing steps:

Load pileup files for all input samples
Select overlapping marker sites across samples
Apply minimum depth and heterozygosity filters
Exclude uninformative sites (e.g. homozygous reference in both samples)
Compare genotypes at remaining sites
Compute a MatchRate for all pairwise sample comparisons
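The steps above can be sketched as follows. This is an illustration only: the pileup file format is not described here, so each sample is represented as a hypothetical dictionary mapping a site to (ref_count, alt_count). The sketch also assumes that sites are counted as passing before uninformative sites are excluded, which may differ from DRAGEN's accounting. The function names are hypothetical, and min_depth is left as a required argument because no default is listed.

```python
def classify_genotype(ref_count, alt_count, het_width=0.5):
    """Call a genotype from allele counts using an AF window centered on 0.5."""
    af = alt_count / (ref_count + alt_count)
    half = het_width / 2.0
    if af < 0.5 - half:
        return "HOM_REF"
    if af > 0.5 + half:
        return "HOM_ALT"
    return "HET"

def match_rate(sample_a, sample_b, min_depth, het_width=0.5, min_passing_sites=500):
    """Return Matching / (Matching + Mismatching), or None when it cannot be computed."""
    passing = matching = mismatching = 0
    # Only sites present in both samples can be compared.
    for site in sample_a.keys() & sample_b.keys():
        a_ref, a_alt = sample_a[site]
        b_ref, b_alt = sample_b[site]
        # Depth filter; max(..., 1) also guards against dividing by zero depth.
        if min(a_ref + a_alt, b_ref + b_alt) < max(min_depth, 1):
            continue
        passing += 1
        gt_a = classify_genotype(a_ref, a_alt, het_width)
        gt_b = classify_genotype(b_ref, b_alt, het_width)
        if gt_a == "HOM_REF" and gt_b == "HOM_REF":
            continue  # uninformative: carries no identity signal
        if gt_a == gt_b:
            matching += 1
        else:
            mismatching += 1
    compared = matching + mismatching
    if passing < min_passing_sites or compared == 0:
        return None  # corresponds to MatchRate = NA
    return matching / compared

# Made-up counts for illustration.
a = {("chr1", 100): (10, 10), ("chr1", 200): (0, 20), ("chr1", 300): (20, 0),
     ("chr1", 400): (20, 0), ("chr1", 500): (2, 1)}
b = {("chr1", 100): (12, 9), ("chr1", 200): (10, 10), ("chr1", 300): (21, 0),
     ("chr1", 400): (0, 25), ("chr1", 500): (20, 0)}
print(match_rate(a, b, min_depth=10, min_passing_sites=2))
```

In this toy input one site matches, two mismatch, one is uninformative, and one fails the depth filter. With the default minimum of 500 passing sites the same call returns None.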

Interpretation of Results

LOD-Based Modes

LOD > 0: samples likely from the same individual
LOD < 0: samples likely from different individuals
LOD ≈ 0: inconclusive (often due to low coverage)

LOD scores are reported on a base-10 logarithmic scale. For example, a LOD of 4 indicates that the observed data are 10,000× (10^4) more likely if the samples come from the same individual than if they come from different individuals.
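As a small worked example of the base-10 scale, the sketch below converts a LOD score into a likelihood ratio and applies the interpretation rules listed above. The function names and the inconclusive_band parameter are hypothetical; this page gives no numeric cutoff for "LOD ≈ 0", so the band must be chosen by the user.

```python
def lod_to_likelihood_ratio(lod):
    """A LOD is log10 of the likelihood ratio, so the ratio is 10 ** LOD."""
    return 10.0 ** lod

def interpret_lod(lod, inconclusive_band):
    """Classify a LOD; inconclusive_band is a user-chosen width around zero."""
    if abs(lod) <= inconclusive_band:
        return "inconclusive"
    if lod > 0:
        return "likely same individual"
    return "likely different individuals"

print(lod_to_likelihood_ratio(4))                  # 10000.0
print(interpret_lod(4, inconclusive_band=1.0))     # band of 1.0 is illustrative only
print(interpret_lod(-6.5, inconclusive_band=1.0))
```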

Pairwise Pileup Mode

MatchRate ≥ 0.90–0.95: samples likely from the same individual
Lower MatchRate: samples likely from different individuals
MatchRate = NA: insufficient overlapping informative sites

MatchRate is intended for screening and triage, not formal identity confirmation.

Command-Line Options

[Required]

--enable-checkfingerprint true

[Required for LOD-Based Modes]

--checkfingerprint-expected-vcf <expected.vcf>

The expected VCF may contain one or multiple samples. The input sample is compared independently against each expected sample.

[Mode Selection Options]

| Option | Description |
| --- | --- |
| --checkfingerprint-enable-vcf-comparison true | Enable VCF comparison mode (required for either precomputed or on the fly) |
| --checkfingerprint-observed-vcf <vcf> | Enable precomputed VCF comparison mode |
| --checkfingerprint-pairwise-read-files <pileup> | Enable pairwise pileup mode (repeatable) |

[Optional – Advanced (LOD-Based Modes)]

--checkfingerprint-haplotype-map <map_file>

Specify a custom haplotype map file. By default, DRAGEN automatically selects a reference-specific haplotype map bundled with the software.

[Pairwise Pileup Mode Settings]

| Setting | Description | Default |
| --- | --- | --- |
| --checkfingerprint-pairwise-min-depth | Minimum depth required at a locus | |
| --checkfingerprint-pairwise-het-width | Total AF window around 0.5 used to classify heterozygous sites (e.g. 0.5 → AF 0.25–0.75) | 0.5 |
| --checkfingerprint-pairwise-min-passing-sites | Minimum overlapping passing sites required to compute MatchRate | 500 |

[Tumor-Aware Settings – LOD Modes]

| Setting | Description | Default |
| --- | --- | --- |
| --checkfingerprint-enable-tumor-aware true | Enable tumor-aware LOD computation | |
| --checkfingerprint-loss-of-het-rate | Rate at which heterozygous sites become homozygous due to LOH | 0.5 |
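To make the loss-of-heterozygosity rate concrete, the sketch below shows one way such a rate could reshape genotype probabilities. It is an assumption for illustration, not DRAGEN's documented model: the function name is hypothetical, and the equal split of lost heterozygosity between the two homozygous states is assumed.

```python
def apply_loss_of_het(genotype_probs, loss_of_het_rate=0.5):
    """Shift probability from het to the homozygous states.

    genotype_probs is (hom_ref, het, hom_alt) for the germline genotype.
    Assumes the lost heterozygosity splits evenly between hom-ref and hom-alt.
    """
    hom_ref, het, hom_alt = genotype_probs
    lost = het * loss_of_het_rate
    return (hom_ref + lost / 2.0, het - lost, hom_alt + lost / 2.0)

# A certain germline het, viewed through a tumor with a 0.5 LOH rate.
print(apply_loss_of_het((0.0, 1.0, 0.0), loss_of_het_rate=0.5))
```

Under this assumed model, a germline heterozygous site that appears homozygous in the tumor is no longer treated as strong evidence of a mismatch.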

Command-Line Examples

On-the-fly VCF Comparison Mode

Most applicable for: Whole-genome sequencing (WGS) datasets (≈30× coverage) and general-purpose identity checking.

dragen -r <ref_dir> -b <input.bam> \
  --output-directory <outdir> \
  --output-file-prefix sample \
  --enable-checkfingerprint true \
  --checkfingerprint-expected-vcf expected.vcf \
  --checkfingerprint-enable-vcf-comparison true \
  --enable-variant-caller true

Standalone VCF Comparison Mode

Most applicable for: VCF-only workflows where both observed and expected VCFs are already available.

dragen -r <ref_dir> \
  --output-directory <outdir> \
  --output-file-prefix sample \
  --enable-checkfingerprint true \
  --checkfingerprint-enable-vcf-comparison true \
  --checkfingerprint-expected-vcf expected.vcf \
  --checkfingerprint-observed-vcf observed.vcf

Pairwise Pileup Comparison Mode (Experimental)

Most applicable for: Rapid batch-level screening of many samples (e.g. WGS runs), duplicate detection, and large-scale identity sanity checks.

dragen -r <ref_dir> \
  --enable-checkfingerprint true \
  --checkfingerprint-pairwise-read-files sampleA.pileup.txt \
  --checkfingerprint-pairwise-read-files sampleB.pileup.txt \
  --checkfingerprint-pairwise-read-files sampleC.pileup.txt \
  --checkfingerprint-pairwise-min-depth 20 \
  --checkfingerprint-pairwise-het-width 0.5 \
  --output-directory <outdir> \
  --output-file-prefix batch

Pileup files can be generated by:

DRAGEN contamination detection (--qc-detect-contamination true), during the DRAGEN map-align steps
External tools such as samtools mpileup
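For large batches, the repeated --checkfingerprint-pairwise-read-files option can be generated rather than typed by hand. The helper below is a hypothetical convenience sketch that only assembles the argument list from the options shown in the example above; it does not run DRAGEN.

```python
import shlex

def build_pairwise_command(ref_dir, pileup_files, output_dir, prefix,
                           min_depth=None, het_width=None):
    """Assemble a dragen argument list for pairwise pileup mode."""
    cmd = ["dragen", "-r", ref_dir, "--enable-checkfingerprint", "true"]
    # The read-files option is repeatable: one occurrence per pileup file.
    for pileup in pileup_files:
        cmd += ["--checkfingerprint-pairwise-read-files", pileup]
    if min_depth is not None:
        cmd += ["--checkfingerprint-pairwise-min-depth", str(min_depth)]
    if het_width is not None:
        cmd += ["--checkfingerprint-pairwise-het-width", str(het_width)]
    cmd += ["--output-directory", output_dir, "--output-file-prefix", prefix]
    return cmd

command = build_pairwise_command(
    "ref_dir",
    ["sampleA.pileup.txt", "sampleB.pileup.txt", "sampleC.pileup.txt"],
    "outdir", "batch", min_depth=20, het_width=0.5)
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in command))
```

Returning a list rather than a single string keeps file names with spaces intact if the command is later passed to subprocess.run.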

Outputs

LOD-Based Modes

<prefix>.CheckFingerprint.summary.txt
<prefix>.CheckFingerprint.detail.txt

Pairwise Pileup Mode Output

<prefix>.CheckFingerprint.pairwise.csv

The CSV file contains all pairwise sample comparisons, sorted by MatchRate (highest to lowest):

| Column | Description |
| --- | --- |
| SampleA / SampleB | Input pileup file names |
| OverlappingSites | Total shared loci |
| PassingSites | Loci passing depth and genotype filters |
| UninformativeSites | Loci where both samples are homozygous reference |
| MatchingGenotypes | Matching genotype calls |
| MismatchingGenotypes | Mismatching genotype calls |
| MatchRate | Matching / (Matching + Mismatching), or NA |

If PassingSites is less than the value of --checkfingerprint-pairwise-min-passing-sites, MatchRate is reported as NA.
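A short post-processing sketch for this file is shown below. It assumes the CSV header uses the column names from the table above, with SampleA and SampleB as separate columns; the rows are made-up illustration data, and the function name and the 0.95 threshold are hypothetical choices.

```python
import csv
import io

# Made-up example content, not real DRAGEN output.
EXAMPLE_CSV = """SampleA,SampleB,OverlappingSites,PassingSites,UninformativeSites,MatchingGenotypes,MismatchingGenotypes,MatchRate
a.pileup.txt,b.pileup.txt,1200,1000,400,590,10,0.9833
a.pileup.txt,c.pileup.txt,1200,1000,400,300,300,0.5
b.pileup.txt,d.pileup.txt,300,120,50,60,10,NA
"""

def likely_matches(csv_text, threshold=0.95):
    """Return (SampleA, SampleB, MatchRate) for pairs at or above the threshold."""
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["MatchRate"] == "NA":
            continue  # too few passing sites; no rate was computed
        rate = float(row["MatchRate"])
        if rate >= threshold:
            hits.append((row["SampleA"], row["SampleB"], rate))
    return hits

print(likely_matches(EXAMPLE_CSV))
```

Pairs reported as NA are skipped rather than treated as mismatches, since NA means the comparison could not be made.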

Limitations

Pairwise pileup mode:

Experimental; intended for rapid screening
Non-probabilistic and haplotype-free
Less sensitive for low-coverage or targeted panels

LOD-based modes:

Tumor-aware LOD assumes loss of heterozygosity
Observed and expected VCFs should originate from the same pipeline
Compatible only with DRAGEN germline and tumor-only pipelines
