
CheckFingerprint

Check Sample Identity with CheckFingerprint

CheckFingerprint is broadly based on Picard CheckFingerprint. It outputs a LOD score that indicates whether the genetic data in two files comes from the same individual.

A positive LOD score indicates that the two samples come from the same individual. A negative LOD score indicates that they come from different individuals. A score close to 0 is inconclusive.

In general, the sign of the LOD score in the summary file should be consistent with the Picard CheckFingerprint summary file, but the exact values may differ.

Validation was performed on whole-genome sequencing (WGS) data and on comparisons that mix WGS samples with whole-exome sequencing (WES) data.

Usage

Modes

The checks can run in one of three modes:

  • Read comparison mode. Aligned reads are compared with the expected VCF.

  • VCF comparison mode. The output VCF is compared with the expected VCF.

  • Standalone VCF comparison mode. Two VCF files are provided, one as the observed VCF and the other as the expected VCF, and are compared with each other.

Options

To enable the CheckFingerprint module, the following command-line options are required:

  • --enable-checkfingerprint true

  • --checkfingerprint-expected-vcf [path_to_expected_sample_vcf]

Read comparison mode is enabled by default. It is recommended for small datasets or whole-exome sequencing data.

To switch to VCF comparison mode, use the following options:

  • --checkfingerprint-enable-vcf-comparison true

  • --enable-variant-caller true

VCF comparison mode is recommended for larger samples, such as whole-genome sequencing data with an average coverage of 30x or whole-exome sequencing data.

Command-line Examples

Read mode. Takes BAM/FASTQ/CRAM input, examines the individual reads in the input sample, and compares them with the expected VCF file.

./bin/dragen -r [dragen_hash_table] -b [bam] --output-directory [output_dir] \
--output-file-prefix [output_prefix] --enable-checkfingerprint true --checkfingerprint-expected-vcf [input_expected_vcf]

VCF mode. Takes BAM/FASTQ/CRAM input, generates a VCF file first, and then compares that VCF file with the expected VCF file.

./bin/dragen -r [dragen_hash_table] -b [bam] --output-directory [output_dir] \
--output-file-prefix [output_prefix] --enable-checkfingerprint true --checkfingerprint-expected-vcf [input_expected_vcf] \
--checkfingerprint-enable-vcf-comparison true --enable-variant-caller true

Standalone VCF mode. Takes an observed VCF file as input and compares it with the expected VCF file.

./bin/dragen -r [dragen_hash_table] --output-directory [output_dir] \
--output-file-prefix [output_prefix] --enable-checkfingerprint true --checkfingerprint-expected-vcf [input_expected_vcf] \
--checkfingerprint-observed-vcf [input_observed_vcf]
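
When these invocations are driven from a script, the mode-specific flags can be assembled programmatically. The following is a minimal Python sketch and not part of DRAGEN: the helper name build_checkfingerprint_args, its parameters, and the file names in the usage line are hypothetical, and only flags shown in the examples above are used.

```python
import shlex
from typing import List, Optional


def build_checkfingerprint_args(
    hash_table: str,
    output_dir: str,
    output_prefix: str,
    expected_vcf: str,
    bam: Optional[str] = None,
    observed_vcf: Optional[str] = None,
    vcf_comparison: bool = False,
) -> List[str]:
    """Hypothetical helper: assemble dragen arguments for one CheckFingerprint mode.

    vcf_comparison is only applied to BAM input; it is ignored in standalone mode.
    """
    if (bam is None) == (observed_vcf is None):
        raise ValueError("provide exactly one of bam or observed_vcf")
    args = [
        "dragen", "-r", hash_table,
        "--output-directory", output_dir,
        "--output-file-prefix", output_prefix,
        "--enable-checkfingerprint", "true",
        "--checkfingerprint-expected-vcf", expected_vcf,
    ]
    if observed_vcf is not None:
        # Standalone VCF comparison mode: no read input is passed.
        args += ["--checkfingerprint-observed-vcf", observed_vcf]
        return args
    # Read comparison mode (the default) takes the BAM as input.
    args += ["-b", bam]
    if vcf_comparison:
        # VCF comparison mode additionally enables the variant caller.
        args += ["--checkfingerprint-enable-vcf-comparison", "true",
                 "--enable-variant-caller", "true"]
    return args


# Hypothetical file names, for illustration only.
print(shlex.join(build_checkfingerprint_args(
    "hash_table_dir", "out_dir", "sample1", "expected.vcf",
    bam="sample1.bam", vcf_comparison=True)))
```

The returned list can be passed to subprocess.run without shell quoting concerns.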

Advanced Usage

Provide a customized haplotype map. Without user input, DRAGEN CheckFingerprint uses the default haplotype map. The format of the haplotype map is described below in the "Inputs: a) Haplotype Map" section.

  • --checkfingerprint-haplotype-map [input_haplotype_map]

Enable tumor-aware LOD. The default --checkfingerprint-loss-of-het-rate is 0.5. This setting assumes that the tumor sample has undergone loss of heterozygosity (LoH), in which large sections of chromosomes are lost. As a result, haplotypes that are heterozygous in the normal sample appear homozygous in the corresponding tumor sample.

  • --checkfingerprint-enable-tumor-aware true

  • --checkfingerprint-loss-of-het-rate [float]

Inputs

The input files used by DRAGEN CheckFingerprint are: a) haplotype map (configuration files), b) FASTQ/BAM/CRAM (user input) or observed VCF file (user input), c) expected VCF file (user input).

a) Haplotype Map

Haplotype maps for hg19, hg38, and chm13 are packaged with DRAGEN and automatically selected by the software. The haplotype map is a set of SNPs grouped into haplotype blocks (also known as linkage disequilibrium blocks). The SNPs in the haplotype map are used for fingerprinting.

The haplotype map is a tab-delimited text file.

@SQ     SN:X    LN:156040895    M5:2b3a55ff7f58eb308420c8a9b11cac50     AS:38   UR:/seq/references/Homo_sapiens_assembly38/v0/Homo_sapiens_assembly38.fasta     SP:Homo sapiens
#CHROMOSOME POSITION    NAME    MAJOR_ALLELE    MINOR_ALLELE    MAF ANCHOR_SNP  PANELS
1       122872  chr1:25 T       G       0.235623                        
1       789502  chr1:84 T       C       0.480232                        
1       789503  chr1:85 G       A       0.480232        chr1:84         
1       796338  chr1:89 T       C       0.154353                        
1       798969  chr1:91 T       C       0.152556        chr1:89  	

The following columns are of interest:

Field        Description
CHROMOSOME   Chromosome
POSITION     Position
NAME         SNP identifier
MAF          Minor allele frequency
ANCHOR_SNP   The NAME of a SNP. SNPs with the same ANCHOR_SNP are in high linkage disequilibrium with each other.
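
To inspect how the SNPs of a custom haplotype map fall into blocks, the file can be read with a few lines of Python. This is a minimal sketch, not DRAGEN code. It assumes that lines starting with "@" are sequence headers, that the line starting with "#" names the columns, and that a SNP with an empty ANCHOR_SNP field is itself the anchor of its block. That last convention is inferred from the example rows and is not stated in this section. The function name read_haplotype_blocks is hypothetical.

```python
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def read_haplotype_blocks(handle: TextIO) -> Dict[str, List[str]]:
    """Group SNP names by haplotype block, keyed by the anchor SNP name."""
    blocks: Dict[str, List[str]] = defaultdict(list)
    columns: List[str] = []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and @SQ-style header lines
        if line.startswith("#"):
            # Column header line, e.g. "#CHROMOSOME<TAB>POSITION<TAB>NAME..."
            columns = [c.strip() for c in line.lstrip("#").split("\t")]
            continue
        row = {c: f.strip() for c, f in zip(columns, line.split("\t"))}
        # Assumption: an empty ANCHOR_SNP means this SNP anchors its own block.
        anchor = row.get("ANCHOR_SNP", "")
        blocks[anchor or row["NAME"]].append(row["NAME"])
    return dict(blocks)


# Rows taken from the example above, rebuilt with explicit tab separators.
EXAMPLE_MAP = (
    "#CHROMOSOME\tPOSITION\tNAME\tMAJOR_ALLELE\tMINOR_ALLELE\tMAF\tANCHOR_SNP\tPANELS\n"
    "1\t122872\tchr1:25\tT\tG\t0.235623\t\t\n"
    "1\t789502\tchr1:84\tT\tC\t0.480232\t\t\n"
    "1\t789503\tchr1:85\tG\tA\t0.480232\tchr1:84\t\n"
    "1\t796338\tchr1:89\tT\tC\t0.154353\t\t\n"
    "1\t798969\tchr1:91\tT\tC\t0.152556\tchr1:89\t\n"
)

for anchor, names in read_haplotype_blocks(io.StringIO(EXAMPLE_MAP)).items():
    print(anchor, names)
```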

b) Sample Input

Samples are input from BAM/CRAM/FASTQ files or from observed VCF files.

The following command-line example uses FASTQ input:

dragen \
	-r [dragen_hash_table] \
	--fastq-file1 /staging/test/data/NA12878_R1.fastq \
	--fastq-file2 /staging/test/data/NA12878_R2.fastq \
	--output-directory /staging/test/output \
	--output-file-prefix NA12878_dragen \
	--RGID DRAGEN_RGID \
	--RGSM NA12878 \
	--enable-checkfingerprint true \
	--checkfingerprint-expected-vcf [input_expected_vcf] \
	--checkfingerprint-enable-vcf-comparison true \
	--enable-variant-caller true

The following command-line example uses VCF input:

dragen \
	-r [dragen_hash_table] \
	--output-directory /staging/test/output \
	--output-file-prefix NA12878_dragen \
	--enable-checkfingerprint true \
	--checkfingerprint-expected-vcf [input_expected_vcf] \
	--checkfingerprint-observed-vcf [input_observed_vcf]

c) Expected VCF Input

VCF output from DRAGEN is recommended. The expected VCF can contain multiple samples: several single-sample VCFs can be combined into one file and passed to --checkfingerprint-expected-vcf. CheckFingerprint calculates the LOD between the input sample (BAM/CRAM/FASTQ or VCF) and each sample in the expected VCF file.

Outputs

There are two main output files:

  • [output-file-prefix].CheckFingerprint.summary.txt : contains the LOD scores between the input sample and each expected sample.

  • [output-file-prefix].CheckFingerprint.detail.txt : contains the LOD scores for individual SNPs.

CheckFingerprint.detail.txt example

READ_GROUP      EXPECTED_SAMPLE SNP     SNP_ALLELES     CHROM   POSITION        EXPECTED_GENOTYPE       OBSERVED_GENOTYPE       LOD     LOD_EXPECTED_TUMOR_OBSERVED_NORMAL      LOD_EXPECTED_NORMAL_OBSERVED_TUMOR      OBS_A   OBS_B
IGNORE  hg002   chr1:274        AG      1       908025  AG      AG      5.92214 -9.7141e-06     -0.301035       0       0
IGNORE  hg002   chr1:308        GA      1       916119  GG      GG      7.15017 -0.0459602      -1.39646e-06    0       0
IGNORE  hg002   chr1:473        CT      1       984039  CT      CT      4.60476 -0.000568506    -0.301314       0       0 

CheckFingerprint.summary.txt example

LOD_EXPECTED_SAMPLE is the LOD score between the two samples. LOD_OBS_TUMOR_EXP_NORMAL is the LOD score when the observed sample is a tumor sample and the expected sample is a normal sample. LOD_OBS_NORMAL_EXP_TUMOR is the LOD score when the observed sample is a normal sample and the expected sample is a tumor sample. LOD_OBS_TUMOR_EXP_NORMAL and LOD_OBS_NORMAL_EXP_TUMOR have values only when ENABLE_TUMOR_AWARE is true.

READ_GROUP      EXPECTED_SAMPLE LL_EXPECTED_SAMPLE      LL_RANDOM_SAMPLE        LOD_EXPECTED_SAMPLE     ENABLE_TUMOR_AWARE      LOD_OBS_TUMOR_EXP_NORMAL        LOD_OBS_NORMAL_EXP_TUMOR        HAPLOTYPES_WITH_GENOTYPES       HAPLOTYPES_CONFIDENTLY_CHECKED  HAPLOTYPES_CONFIDENTLY_MATCHING HET_AS_HOM      HOM_AS_HET      HOM_AS_OTHER_HOM
IGNORE  hg002   -18237  -6517.62        -11719.4        true    -5907.94        -4499.57        12423   6725    4307    1193    1225    0
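
The summary file can be checked automatically after a run. The sketch below is illustrative only and makes two assumptions that this section does not state: that the file is tab-delimited, and that a margin around 0 is a reasonable way to flag inconclusive scores (the margin parameter and both function names are hypothetical). In the example row, LOD_EXPECTED_SAMPLE equals LL_EXPECTED_SAMPLE minus LL_RANDOM_SAMPLE up to rounding (-18237 - (-6517.62) = -11719.38), which the code also verifies.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_summary(handle: TextIO) -> List[Dict[str, str]]:
    """Read summary rows into dictionaries keyed by column name (assumes tabs)."""
    return list(csv.DictReader(handle, delimiter="\t"))


def classify(lod: float, margin: float = 0.0) -> str:
    """Interpret a LOD score; margin is a hypothetical band treated as inconclusive."""
    if lod > margin:
        return "match"
    if lod < -margin:
        return "mismatch"
    return "inconclusive"


# Header and row taken from the example above, rebuilt with explicit tabs.
HEADER = [
    "READ_GROUP", "EXPECTED_SAMPLE", "LL_EXPECTED_SAMPLE", "LL_RANDOM_SAMPLE",
    "LOD_EXPECTED_SAMPLE", "ENABLE_TUMOR_AWARE", "LOD_OBS_TUMOR_EXP_NORMAL",
    "LOD_OBS_NORMAL_EXP_TUMOR", "HAPLOTYPES_WITH_GENOTYPES",
    "HAPLOTYPES_CONFIDENTLY_CHECKED", "HAPLOTYPES_CONFIDENTLY_MATCHING",
    "HET_AS_HOM", "HOM_AS_HET", "HOM_AS_OTHER_HOM",
]
VALUES = [
    "IGNORE", "hg002", "-18237", "-6517.62", "-11719.4", "true", "-5907.94",
    "-4499.57", "12423", "6725", "4307", "1193", "1225", "0",
]
EXAMPLE_SUMMARY = "\t".join(HEADER) + "\n" + "\t".join(VALUES) + "\n"

for row in read_summary(io.StringIO(EXAMPLE_SUMMARY)):
    lod = float(row["LOD_EXPECTED_SAMPLE"])
    difference = float(row["LL_EXPECTED_SAMPLE"]) - float(row["LL_RANDOM_SAMPLE"])
    # The reported LOD agrees with the log-likelihood difference up to rounding.
    assert abs(difference - lod) < 0.1
    print(row["EXPECTED_SAMPLE"], lod, classify(lod))
```

Reading by column name rather than by position keeps the script working when the tumor-aware columns are empty.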

Method of Operation

CheckFingerprint calculates the LOD score to identify whether two samples are from the same individual. A positive value indicates that the two samples are from the same individual. A negative value indicates that the two samples do not match. The LOD is on a logarithmic scale (base 10), so a LOD of 4 indicates that the data is 10,000 times more likely to match the genotypes than not. A score close to 0 is inconclusive and can result from low coverage or missing informative genotypes.

The identity check takes advantage of the haplotype blocks defined in the configuration files (hg38_nochr.map, hg19_nochr.map). Checking SNPs within haplotype blocks can improve the statistical power of identity detection.

In VCF mode, CheckFingerprint uses the PL (Phred-scaled genotype likelihood) field to estimate genotype probabilities.
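
Because the LOD is a base-10 logarithm, it can be converted back to a likelihood ratio by raising 10 to the LOD. The short Python sketch below illustrates that arithmetic only; the function name is hypothetical, and very large scores are reported as infinity because they exceed the range of a floating-point number.

```python
import math


def lod_to_likelihood_ratio(lod: float) -> float:
    """Convert a base-10 LOD score to a likelihood ratio (match vs. non-match)."""
    try:
        return 10.0 ** lod
    except OverflowError:
        return math.inf  # ratio too large to represent as a float


print(lod_to_likelihood_ratio(4))   # 10000.0
print(lod_to_likelihood_ratio(-2))  # 0.01
```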

Limitations:

  • VCF mode is recommended for general use.

  • Currently, VCF mode is designed for whole-genome sequencing samples with 30x coverage.

  • Read mode is designed for whole-exome sequencing. Larger datasets may encounter timeout errors.

  • Read mode should be used in isolation, without other components enabled, and only if VCF mode does not provide sufficient accuracy.

  • DRAGEN CheckFingerprint is compatible only with DRAGEN germline and tumor-only pipelines.

  • Tumor-aware settings assume tumor samples with loss of heterozygosity and should be used with caution.

  • The observed and expected sample VCFs should originate from the same pipeline, because using different pipelines can lead to inaccurate LOD calculations.
