Filter Duplicate Variants
DRAGEN can find and remove variants that are common to separate VCF files. DRAGEN supports the following modes:
Small indel deduplication: Given a structural variant (SV) VCF and a small variant VCF, DRAGEN filters out every small indel in the SV VCF that also appears in the small variant VCF with a passing status (PASS in the FILTER column of the small variant VCF file). DRAGEN does not change the SV and small variant input VCF files. Instead, it creates a new VCF that contains the variants from the SV VCF that do not match any variant in the small variant VCF. The name of the new deduplicated SV VCF file consists of the prefix passed by --output-file-prefix followed by the suffix sv.small_indel_dedup.vcf.gz. The diagram below describes the small indel deduplication pipeline. You must provide the reference genome used to generate the VCF files so that DRAGEN can normalize the variants. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases. For example, this feature is useful in somatic workflows that combine SV and SNV callers: running both callers can increase sensitivity, and deduplication prevents the same variant from being reported twice in genes such as FLT3 and KMT2A.
SMN deduplication: Given a small variant VCF and an ExpansionHunter VCF, DRAGEN filters any lines in the small variant VCF that have the same chromosome and position as lines in the ExpansionHunter VCF with the INFO tag VARID=SMN. A reference genome is not required.
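The two modes above can be illustrated with a small Python sketch. This is not DRAGEN's implementation: the Variant record, the function names, and the choice to match small indels on the normalized (chromosome, position, REF, ALT) tuple are assumptions made for illustration. The normalization step follows the common trim and left-shift procedure, capped at 500 bases as described above.

```python
from typing import NamedTuple, Optional

class Variant(NamedTuple):
    """Hypothetical minimal VCF record, used only for this sketch."""
    chrom: str
    pos: int                     # 1-based position, as in VCF
    ref: str
    alt: str
    filter: str = "PASS"
    info: Optional[dict] = None

def normalize(ref_seq, pos, ref, alt, max_shift=500):
    """Trim shared bases and left-shift an allele pair by at most max_shift bases."""
    start = pos
    # Right-trim while both alleles end in the same base. When trimming would
    # empty an allele, first extend both alleles one reference base to the left.
    while ref[-1] == alt[-1]:
        if min(len(ref), len(alt)) == 1:
            if pos == 1 or start - pos >= max_shift:
                break            # cannot shift any further
            pos -= 1
            base = ref_seq[pos - 1]
            ref, alt = base + ref, base + alt
        ref, alt = ref[:-1], alt[:-1]
    # Left-trim shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt

def _key(v, genome):
    # Assumed matching key: chromosome plus the normalized position and alleles.
    return (v.chrom, *normalize(genome[v.chrom], v.pos, v.ref, v.alt))

def dedup_small_indels(sv_records, small_records, genome):
    """Keep SV records that match no PASS record in the small variant list."""
    passing = {_key(v, genome) for v in small_records if v.filter == "PASS"}
    return [v for v in sv_records if _key(v, genome) not in passing]

def dedup_smn(small_records, eh_records):
    """Drop small variant records located at an ExpansionHunter VARID=SMN site."""
    smn_sites = {(v.chrom, v.pos) for v in eh_records
                 if (v.info or {}).get("VARID") == "SMN"}
    return [v for v in small_records if (v.chrom, v.pos) not in smn_sites]
```

In this sketch, genome is a plain dictionary that maps a chromosome name to its sequence string. Only the small indel mode needs it, which mirrors the statement that SMN deduplication does not require a reference genome.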
Use the following command line options to input VCF or gVCF files. The input files are not altered.
vd-sv-vcf: Specify a structural variant VCF or gVCF.
vd-small-variant-vcf: Specify a small variant VCF or gVCF.
vd-eh-vcf: Specify an ExpansionHunter VCF or gVCF.
DRAGEN determines the name and type of the output file as follows.
Command-Line Options
You can use the following command line options for variant deduplication.
The following is an example command for an SMN deduplication standalone run:
You can also run small indel deduplication automatically on outputs from the DRAGEN joint caller where both structural variant and small variant callers are enabled. To run small indel deduplication automatically, set enable-variant-deduplication to true, and make sure the vd-sv-vcf, vd-small-variant-vcf, and vd-eh-vcf input options are not set. Only small indel deduplication can be run automatically.
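The condition for automatic mode can be expressed as a small check. The following Python sketch is illustrative only: the options dictionary and the function name are hypothetical, and the input option names are taken from the list of input options given earlier in this section.

```python
# Input options that must be left unset for automatic mode.
VD_INPUT_OPTIONS = ("vd-sv-vcf", "vd-small-variant-vcf", "vd-eh-vcf")

def runs_automatic_small_indel_dedup(options):
    """Return True when the option set matches the automatic-mode rule.

    `options` is a hypothetical dict of option name to value, for example
    parsed from a command line or a configuration file.
    """
    enabled = str(options.get("enable-variant-deduplication", "false")).lower() == "true"
    no_explicit_inputs = not any(options.get(name) for name in VD_INPUT_OPTIONS)
    return enabled and no_explicit_inputs
```

For example, an option set with enable-variant-deduplication set to true and no vd-* inputs satisfies the check, while adding vd-eh-vcf makes it fail.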
The following is an example command for an automatic small indel deduplication run.