DUX4 Rearrangement Caller

Overview

The DUX4 Rearrangement Caller identifies potential structural rearrangements between DUX4 and other genes, including IGH. It is primarily supported for the human reference genome hg38.

Functionality

The DUX4 Rearrangement Caller has the following features:

  • Calls DUX4 rearrangement events from genomic data in FASTQ, BAM, or CRAM format.

  • Scans the whole genome to identify potential DUX4 rearrangement events.

  • Runs in parallel with the host DRAGEN software with minimal overhead.

Prerequisites

  • The sequencing dataset must be tumor-only, paired-end, whole-genome sequencing data.

  • The mean coverage must be between 25X and 120X.

  • The mean fragment length must be between 300 and 500 bp.

  • The mean read length must be between 100 and 151 bp.

  • A reference genome that is compatible with DRAGEN software. You can download prebuilt reference genomes from our website or build your own customized version with:

dragen --build-hash-table true --output-directory <HASHTABLE_DIR> --ht-reference <REF_FASTA> [options]

The DRAGEN DUX4 caller has been validated on a cohort of samples that fall within the parameters defined above. If your datasets do not comply with these parameters, you can bypass the requirements check by specifying --dux4-skip-santiy-check true to obtain experimental results.
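As a rough illustration of the validated ranges listed above, the following sketch checks three dataset metrics before a run. The helper names in_range and check_dux4_prereqs are hypothetical and are not part of DRAGEN; the metric values are assumed to come from your own QC reports.

```shell
# in_range VALUE MIN MAX: succeeds when MIN <= VALUE <= MAX (decimals allowed).
in_range() {
    awk -v v="$1" -v lo="$2" -v hi="$3" \
        'BEGIN { exit !(v + 0 >= lo + 0 && v + 0 <= hi + 0) }'
}

# check_dux4_prereqs COVERAGE FRAGMENT_LEN READ_LEN
# Hypothetical helper (not a DRAGEN command): prints one line per metric that
# falls outside the validated range and returns non-zero if any metric does.
check_dux4_prereqs() {
    status=0
    in_range "$1" 25 120  || { echo "mean coverage $1 outside 25-120X"; status=1; }
    in_range "$2" 300 500 || { echo "mean fragment length $2 outside 300-500 bp"; status=1; }
    in_range "$3" 100 151 || { echo "mean read length $3 outside 100-151 bp"; status=1; }
    return "$status"
}

# Example: check_dux4_prereqs 60 400 150 && echo "within validated ranges"
```

This only mirrors the numeric ranges stated above; it does not verify that the data is tumor-only, paired-end, or whole-genome.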

Basic usage

The basic syntax of the DRAGEN command line is:

dragen [global options] [pipeline options] [output options]

  • The global options are common to all pipelines and control the general behavior of DRAGEN, such as the input and output files/directories, the reference genome, and the license file.

  • The pipeline options are specific to each pipeline and control the parameters and features of the analysis, such as the variant callers, the filters and the annotations.

  • The output options control the format and content of the output files, such as the VCF, BAM, and the metrics files.

Input files and command line options

For the DUX4 caller, a minimal example is:

dragen \
-r ${HASHTABLE_DIR} \
--enable-map-align=true \
--enable-map-align-output=false \
--enable-sort=true \
--enable-duplicate-marking=true \
--tumor-fastq1 test_IGH_DUX4.bam.r1.fastq \
--tumor-fastq2 test_IGH_DUX4.bam.r2.fastq \
--enable-dux4-caller=true \
--output-dir=${OUT_DIR} \
--output-file-prefix=${OUTPUT_PREFIX}

Here, DRAGEN takes sequencing data in FASTQ format (BAM, CRAM, and ORA are also accepted) and maps and aligns the reads to the reference genome. The DUX4 caller then consumes the mapped and sorted reads.

Alternatively, the DRAGEN DUX4 caller can start from BAM input and skip the map/align step, assuming the BAM file is sorted and has duplicates marked:

dragen \
-r ${HASHTABLE_DIR} \
--enable-map-align=false \
--enable-map-align-output=false \
--enable-sort=false \
--tumor-bam-input test_IGH_DUX4.bam \
--enable-dux4-caller=true \
--output-dir=${OUT_DIR} \
--output-file-prefix=${OUTPUT_PREFIX}
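Because the BAM-input command above assumes a sorted input, it can help to confirm the sort order first. The sketch below inspects the @HD header line for SO:coordinate; it assumes that "sorted" means coordinate-sorted and that samtools is installed, neither of which is stated in this guide. The function name is_coordinate_sorted is hypothetical. It does not check whether duplicates are marked.

```shell
# is_coordinate_sorted: reads a SAM/BAM text header on stdin and succeeds
# when the @HD line declares SO:coordinate.
is_coordinate_sorted() {
    grep -q '^@HD.*SO:coordinate'
}

# Example usage (assumes samtools is available on PATH):
#   samtools view -H test_IGH_DUX4.bam | is_coordinate_sorted \
#       && echo "BAM header reports coordinate sort order"
```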

The DUX4 caller can also run in parallel with other variant callers:

dragen \
-r ${HASHTABLE_DIR} \
--enable-map-align=true \
--enable-map-align-output=false \
--enable-sort=true \
--enable-duplicate-marking=true \
--tumor-fastq1 test_IGH_DUX4.bam.r1.fastq \
--tumor-fastq2 test_IGH_DUX4.bam.r2.fastq \
--enable-dux4-caller=true \
--enable-sv=true \
--enable-variant-caller=true \
--output-dir=${OUT_DIR} \
--output-file-prefix=${OUTPUT_PREFIX}

The DUX4 VCF results are written to the directory specified by --output-dir, with the file prefix specified by --output-file-prefix.

Output format

The DUX4 VCF contains positive calls that represent translocation events across gene pairs. Each event consists of a set of four VCF breakend records that describe the potential translocation. Each record carries PR:SR:SRPB tags that describe the number of fragments supporting the event: PR is the number of spanning paired reads, SR is the number of spanning split reads, and SRPB is the number of supporting read pairs per billion reads processed.

Two sets of genomic target regions are predefined to optimize event detection: "CoreDUX4" regions and "ExtendedDUX4" regions, where the "CoreDUX4" regions are a subset of the "ExtendedDUX4" regions.
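The SRPB normalization can be checked by hand. The exact formula is not stated in this guide; the sketch below assumes SRPB = (alt PR + alt SR) / TotalReadsNum × 1e9, an inference that reproduces the values in the example records of this section. The function name srpb is hypothetical.

```shell
# srpb PR_ALT SR_ALT TOTAL_READS
# Prints (PR_ALT + SR_ALT) / TOTAL_READS * 1e9, rounded to two decimals.
# Assumption: this formula is inferred from the example VCF, not documented.
srpb() {
    awk -v pr="$1" -v sr="$2" -v total="$3" \
        'BEGIN { printf "%.2f\n", (pr + sr) / total * 1e9 }'
}

srpb 129 42 1837703714   # ExtendedDUX4 example records: prints 93.05
srpb 128 42 1837703714   # CoreDUX4 example records: prints 92.51
```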

An example output VCF looks like this:

##FILTER=<ID=CoreDUX4Present,Description="CoreDUX4 regions demonstrated sufficient evidence to call the arrangement">
##INFO=<ID=TotalReadsNum,Number=1,Type=Integer,Description="Total number of reads for SRPB normalization">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PR,Number=.,Type=Integer,Description="Spanning paired-read support for the ref and alt alleles in the order listed">
##FORMAT=<ID=SR,Number=.,Type=Integer,Description="Split reads for the ref and alt alleles in the order listed">
##FORMAT=<ID=SRPB,Number=.,Type=FLOAT,Description="Supporting read pairs per billion">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample001
chr3    75667931        ExtendedDUX4:IGH:Bnd_W  T       T[chr14:105586938[      .       CoreDUX4Present SVTYPE=BND;CIPOS=.,.;EVENTTYPE=TRA;MATEID=ExtendedDUX4:IGH:Bnd_X;TotalReadsNum=1837703714       GT:PR:SR:SRPB   1/1:.,129:.,42:93.05
chr3    75667932        ExtendedDUX4:IGH:Bnd_V  G       ]chr14:105586937]G      .       CoreDUX4Present SVTYPE=BND;CIPOS=.,.;EVENTTYPE=TRA;MATEID=ExtendedDUX4:IGH:Bnd_U;TotalReadsNum=1837703714       GT:PR:SR:SRPB   1/1:.,129:.,42:93.05
chr4    190020407       CoreDUX4:IGH:Bnd_W      C       C[chr14:105586938[      .       PASS    SVTYPE=BND;CIPOS=.,.;EVENTTYPE=TRA;MATEID=CoreDUX4:IGH:Bnd_X;TotalReadsNum=1837703714   GT:PR:SR:SRPB   1/1:.,128:.,42:92.51
chr4    190020408       CoreDUX4:IGH:Bnd_V      C       ]chr14:105586937]C      .       PASS    SVTYPE=BND;CIPOS=.,.;EVENTTYPE=TRA;MATEID=CoreDUX4:IGH:Bnd_U;TotalReadsNum=1837703714   GT:PR:SR:SRPB   1/1:.,128:.,42:92.51
chr14   105586937       CoreDUX4:IGH:Bnd_U      T       T[chr4:190020408[       .       PASS    SVTYPE=BND;CIPOS=.,.;EVENTTYPE=TRA;MATEID=CoreDUX4:IGH:Bnd_V;TotalReadsNum=1837703714   GT:PR:SR:SRPB   1/1:.,128:.,42:92.51
chr14   105586937       ExtendedDUX4:IGH:Bnd_U  T       T[chr3:75667932[        .       CoreDUX4Present SVTYPE=BND;CIPOS=.,.;EVENTTYPE=TRA;MATEID=ExtendedDUX4:IGH:Bnd_V;TotalReadsNum=1837703714       GT:PR:SR:SRPB   1/1:.,129:.,42:93.05
chr14   105586938       CoreDUX4:IGH:Bnd_X      A       ]chr4:190020407]A       .       PASS    SVTYPE=BND;CIPOS=.,.;EVENTTYPE=TRA;MATEID=CoreDUX4:IGH:Bnd_W;TotalReadsNum=1837703714   GT:PR:SR:SRPB   1/1:.,128:.,42:92.51
chr14   105586938       ExtendedDUX4:IGH:Bnd_X  A       ]chr3:75667931]A        .       CoreDUX4Present SVTYPE=BND;CIPOS=.,.;EVENTTYPE=TRA;MATEID=ExtendedDUX4:IGH:Bnd_W;TotalReadsNum=1837703714       GT:PR:SR:SRPB   1/1:.,129:.,42:93.05
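To pull the supporting evidence out of records like those above, a short awk filter is usually enough. This is a minimal sketch, assuming an uncompressed, single-sample VCF whose FORMAT column is exactly GT:PR:SR:SRPB as in the example; summarize_dux4_vcf and the DUX4_VCF variable are hypothetical names.

```shell
# summarize_dux4_vcf: reads an uncompressed DUX4 VCF on stdin and prints
# ID, FILTER, alt PR, alt SR, and SRPB for every record, tab-separated.
# Assumes a single sample column with FORMAT GT:PR:SR:SRPB.
summarize_dux4_vcf() {
    awk '
        /^#/ { next }                      # skip meta and header lines
        NF >= 10 {
            split($10, sample, ":")        # GT, PR, SR, SRPB
            split(sample[2], pr, ",")      # ref,alt paired-read support
            split(sample[3], sr, ",")      # ref,alt split-read support
            printf "%s\t%s\t%s\t%s\t%s\n", $3, $7, pr[2], sr[2], sample[4]
        }
    '
}

# Example usage (DUX4_VCF is a placeholder for your output file path):
#   summarize_dux4_vcf < "$DUX4_VCF"
# Keep only records whose FILTER column is PASS:
#   summarize_dux4_vcf < "$DUX4_VCF" | awk '$2 == "PASS"'
```

Default awk field splitting is used so that both tab-delimited files and whitespace-aligned copies of the example parse the same way.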
