Somatic

Overview

DRAGEN provides somatic copy number variant (CNV) calling workflows that detect copy number aberrations and regions with loss of heterozygosity (LOH) in whole genome sequencing (WGS) and whole exome sequencing (WES) data. The CNV workflows leverage both depth of coverage and B-allele frequencies (BAFs) to provide comprehensive detection of:

Copy number gains (duplications) and losses (deletions)
Copy-neutral loss of heterozygosity (CNLOH)
Subclonal alterations (WGS only, enabled by default)
Minor allele copy number estimation

Workflow

The DRAGEN somatic CNV pipeline consists of the following modules:

Target Counts — Binning of read counts and other signals from alignments
B-Allele Counts — Extraction of allelic read counts
Bias Correction — Correction of GC bias and other systematic biases
Normalization — Detection of normal ploidy levels and normalization
Segmentation — Breakpoint detection via segmentation of normalized depth and BAF signals
Allele Specific Copy Number (ASCN) Calling — Integration of depth and BAF segments to determine copy number states and allele-specific information

B-Allele Frequency Inputs: The pipeline supports multiple input options for estimating B‑allele frequencies (BAF), depending on the availability of a matched normal sample.

Matched normal already processed
- If the matched normal sample has been processed with the germline small variant caller, the resulting VCF file can be provided directly.
Matched normal not yet processed
- If the matched normal has not been processed, the user may provide raw reads or aligned reads and enable concurrent execution of the germline small variant caller.
- In this case, DRAGEN CNV consumes the small variant caller output to estimate B‑allele frequencies from germline SNVs.
No matched normal available
- A population SNV VCF may be provided.
- DRAGEN estimates B‑allele frequencies using variants from the population SNV VCF.

For WES, population SNVs are intersected with the regions defined by --cnv-target-bed. The target BED file must contain the same target intervals used to generate the panel of normals (PON).

Depth-Only Workflow (Legacy): For applications that require only fold-change detection without purity/ploidy model estimation, a legacy depth-only workflow is also available for WES and targeted panels. See Depth-Only Workflow for details.

Example Command Lines

WGS — Tumor-Normal (concurrent SNV caller)

If the matched normal has not been pre-processed, you can run the somatic SNV caller concurrently with CNV, which feeds germline heterozygous sites directly to the CNV caller:

dragen \
-r <HASHTABLE> \
--output-directory <OUTPUT> \
--output-file-prefix <SAMPLE> \
--enable-map-align false \
--enable-cnv true \
--tumor-bam-input <TUMOR_BAM> \
--bam-input <NORMAL_BAM> \
--enable-variant-caller true \
--cnv-use-somatic-vc-baf true

To additionally enable Germline-aware Mode and VAF-aware Mode, add the following flags:

--cnv-normal-cnv-vcf <CNV_NORMAL_VCF>   # germline-aware mode
# VAF-aware mode is enabled by default for tumor/matched-normal runs with --enable-variant-caller true

WGS — Tumor-Only (population SNP VCF)

If no matched normal is available, run in tumor-only mode using a population SNP catalog:

dragen \
-r <HASHTABLE> \
--output-directory <OUTPUT> \
--output-file-prefix <SAMPLE> \
--enable-map-align false \
--enable-cnv true \
--tumor-bam-input <TUMOR_BAM> \
--cnv-population-b-allele-vcf <POP_SNP_VCF>

WES — Tumor-Normal (concurrent SNV caller)

dragen \
-r <HASHTABLE> \
--output-directory <OUTPUT> \
--output-file-prefix <SAMPLE> \
--enable-map-align false \
--tumor-bam-input <TUMOR_BAM> \
--bam-input <NORMAL_BAM> \
--enable-cnv true \
--enable-variant-caller true \
--cnv-use-somatic-vc-baf true \
--cnv-normals-list <PANEL_OF_NORMALS> \
--cnv-target-bed <TARGET_BED> \
--vc-target-bed <TARGET_BED>

WES — Tumor-Only (population SNP VCF)

dragen \
-r <HASHTABLE> \
--output-directory <OUTPUT> \
--output-file-prefix <SAMPLE> \
--enable-map-align false \
--enable-cnv true \
--tumor-bam-input <TUMOR_BAM> \
--cnv-population-b-allele-vcf <POP_SNP_VCF> \
--cnv-normals-list <PANEL_OF_NORMALS> \
--cnv-target-bed <TARGET_BED>
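
The four command lines above differ only in a few options, which the following minimal Python sketch makes explicit. The function name build_somatic_cnv_command and its parameters are hypothetical; it uses only options that appear in the examples above and does not cover other input combinations (for example, a pre-computed matched normal VCF).

```python
def build_somatic_cnv_command(hashtable, output_dir, prefix, tumor_bam,
                              normal_bam=None, population_vcf=None,
                              normals_list=None, target_bed=None):
    """Assemble a dragen argument list mirroring the four examples above.

    Hypothetical helper: it only strings together documented options and
    does not validate that the files exist.
    """
    # The examples use either a matched normal BAM or a population SNP VCF.
    if (normal_bam is None) == (population_vcf is None):
        raise ValueError("provide exactly one of normal_bam or population_vcf")
    # The WES examples always pass a panel of normals together with a target BED.
    if (normals_list is None) != (target_bed is None):
        raise ValueError("WES runs need both normals_list and target_bed")

    cmd = [
        "dragen",
        "-r", hashtable,
        "--output-directory", output_dir,
        "--output-file-prefix", prefix,
        "--enable-map-align", "false",
        "--enable-cnv", "true",
        "--tumor-bam-input", tumor_bam,
    ]
    if normal_bam is not None:
        # Tumor-normal: run the SNV caller concurrently and reuse its sites.
        cmd += ["--bam-input", normal_bam,
                "--enable-variant-caller", "true",
                "--cnv-use-somatic-vc-baf", "true"]
    else:
        # Tumor-only: take B-allele sites from a population SNP VCF.
        cmd += ["--cnv-population-b-allele-vcf", population_vcf]
    if normals_list is not None:
        cmd += ["--cnv-normals-list", normals_list,
                "--cnv-target-bed", target_bed]
        if normal_bam is not None:
            cmd += ["--vc-target-bed", target_bed]
    return cmd


wgs_tumor_only = build_somatic_cnv_command(
    "ref_hashtable", "out", "sample1", "tumor.bam",
    population_vcf="pop_snps.vcf.gz")
print(" ".join(wgs_tumor_only))
```

The returned list can be passed to subprocess.run on a system where dragen is installed.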

Required Options

| Option | Description |
| --- | --- |
| --enable-cnv | Enable CNV processing (set to true) |

Input Options

DNA inputs

| Option | Description |
| --- | --- |
| --tumor-fastq1, --tumor-fastq2 | FASTQ input files (requires --enable-map-align true) |
| --tumor-bam-input | BAM input file |
| --tumor-cram-input | CRAM input file |

B-Allele inputs

| Option | Description |
| --- | --- |
| --cnv-normal-b-allele-vcf | Specify a matched normal SNV VCF. |
| --cnv-population-b-allele-vcf | Specify a population SNP catalog. |
| --cnv-use-somatic-vc-baf | In tumor-normal mode with the SNV caller enabled, set this option to true to take the B-allele loci from the germline heterozygous sites identified by the SNV caller. |

For more information on specifying b-allele loci, see Specification of B-Allele Loci.

PON inputs

| Option | Description |
| --- | --- |
| --cnv-target-bed | BED file defining exome capture regions (only for WES) |
| --cnv-normals-file | Specify an individual normal counts file (target.counts.gz or target.counts.gc-corrected.gz) for the PON. You can use this option multiple times, once for each file. |
| --cnv-normals-list | Specify a text file listing the paths of the reference target counts files to use as a panel of normals, one path per line. |
| --cnv-combined-counts | Specify a combined PON file (.combined.counts.txt.gz). |

Other inputs

| Option | Description |
| --- | --- |
| --ref-dir | DRAGEN reference genome hashtable directory |
| --enable-map-align | Enable mapper and aligner module |
| --sample-sex | Sample sex (e.g., male, female). If not specified, sex is estimated from data. |

Pop SNP download

Population VCF files can be downloaded from the links below:

| Reference | Size | Download |
| --- | --- | --- |
| hg38 CNV Population SNP VCF v1.0 | 1.8GB | Download |
| hg19 CNV Population SNP VCF v1.0 | 1.8GB | Download |
| hs37d5 CNV Population SNP VCF v1.0 | 1.8GB | Download |
| CHM13-v2 CNV Population SNP VCF v1.0 | 4.0GB | Download |

The download links in the table open an external Illumina download page: https://support.illumina.com/sequencing/sequencing_software/dragen-bio-it-platform/product_files.html

Output Options

| Option | Description |
| --- | --- |
| --output-directory | Output directory for all results |
| --output-file-prefix | Prefix prepended to all output file names |
| --cnv-enable-cyto-output | Enable cytogenetics-compatible output VCF (default false). Only available for WGS. |

Target Counting Options

| Option | Description |
| --- | --- |
| --cnv-counts-method | Specifies the counting method for an alignment to be counted in a target bin. Values are midpoint, start, or overlap. The default value is overlap when using the panel of normals approach, which means if an alignment overlaps any part of the target bin, the alignment is counted for that bin. In the self-normalization mode, the default counting method is start. |
| --cnv-min-mapq | Specifies the minimum MAPQ for an alignment to be counted during target counts generation. The default value is 3 for self-normalization and 20 otherwise. When generating counts for panel of normals, all MAPQ0 alignments are counted. |
| --cnv-target-bed | Specifies a properly formatted BED file that indicates the target intervals to sample coverage over. For use in WES analysis. |
| --cnv-interval-width | Specifies the width of the sampling interval for CNV processing. This option controls the effective window size. The default is 1000 for WGS analysis and 500 for WES analysis. |
| --cnv-skip-contig-list | Specifies a comma-separated list of contig identifiers to skip when generating intervals for WGS analysis. The default contigs that are skipped, if not specified, are chrM,MT,m,chrm. |
| --cnv-filter-duplicate-alignments | Filter duplicate marked alignments during target counts if option is set to true. The default setting is true unless map/align is enabled and duplicate marking is disabled. |

Note that --cnv-filter-duplicate-alignments is only available when duplicate marking is enabled. For more information, see Filter Duplicate Alignments.

For a description of the target counting methods, see Target Counts.
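
To make the three values of --cnv-counts-method concrete, here is a minimal Python sketch of how an alignment could be assigned to a target bin under each rule. Only the overlap rule is described in the text above; the start and midpoint rules are inferred from their names and are assumptions, as are the function name and the half-open coordinate convention.

```python
def counts_in_bin(aln_start, aln_end, bin_start, bin_end, method="overlap"):
    """Return True if an alignment is counted for a bin.

    Coordinates are half-open [start, end). Only "overlap" follows the
    documented description; "start" and "midpoint" are assumed semantics.
    """
    if method == "overlap":
        # Counted if the alignment overlaps any part of the bin.
        return aln_start < bin_end and aln_end > bin_start
    if method == "start":
        # Assumed: counted only in the bin containing the alignment start.
        return bin_start <= aln_start < bin_end
    if method == "midpoint":
        # Assumed: counted only in the bin containing the alignment midpoint.
        midpoint = (aln_start + aln_end) // 2
        return bin_start <= midpoint < bin_end
    raise ValueError(f"unknown counting method: {method}")


# An alignment spanning the boundary between two 1000 bp bins.
alignment = (950, 1100)
for method in ("overlap", "start", "midpoint"):
    left = counts_in_bin(*alignment, 0, 1000, method)
    right = counts_in_bin(*alignment, 1000, 2000, method)
    print(method, left, right)
```

Under the overlap rule a boundary-spanning alignment contributes to both bins, while the other two rules assign it to exactly one bin.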

GC Bias Correction Options

| Option | Description |
| --- | --- |
| --cnv-enable-gc-bias-correction | Enable or disable GC bias correction when generating target counts. The default is true. |
| --cnv-enable-gcbias-smoothing | Enable or disable smoothing the GC bias correction across adjacent GC bins with an exponential kernel. The default is true. |
| --cnv-num-gc-bins | Specifies the number of bins for GC bias correction. Each bin represents the GC content percentage. Allowed values are 10, 20, 25, 50, or 100. The default is 25. |

For more information, see GC Bias Correction

Normalization Options

A Panel of normals (PON) is used to provide the reference baseline for copy number variants. PON is required for WES, while WGS can use either self (recommended) or PON normalization.

Self-normalization option

| Option | Description |
| --- | --- |
| --cnv-enable-self-normalization | Enable/disable self normalization mode, which does not require a panel of normals (only available for WGS). |

PON normalization options

| Option | Description |
| --- | --- |
| --cnv-extreme-percentile | Specifies the extreme median percentile value at which to filter out samples. The default is 2.5. |
| --cnv-max-percent-zero-samples | Specifies the maximum percentage of zero-coverage samples allowed for a target. If a target exceeds the threshold, the target is filtered out. The default value is 5%. The option is sensitive to the number of normal samples in use, so adjust the threshold accordingly. If the panel of normals is small and the threshold is not adjusted, the option can filter out targets unintentionally. |
| --cnv-max-percent-zero-targets | Specifies the maximum percentage of zero-coverage targets allowed for a sample. If a sample exceeds the threshold, the sample is filtered out. The default value is 2.5%. The option is sensitive to the total number of target intervals, so adjust the threshold accordingly. If the capture kit has a small number of probes and the threshold is not adjusted, the option can filter out samples unintentionally. |
| --cnv-target-factor-threshold | Specifies the bottom percentile of panel of normals medians to filter out useable targets. The default is 1% for whole genome processing and 5% for targeted sequencing processing. |
| --cnv-truncate-threshold | Specifies a percentage threshold for truncating extreme outliers. The default is 0.1%. |
| --cnv-enable-gender-matched-pon | Enable/disable gender matched PON normalization. If enabled, DRAGEN uses matched gender PON for sex chromosome normalization. Sex chromosome intervals are filtered if PON has no matched gender sample. The default value is true. |
| --cnv-enable-cross-gender-adjustments-chrX | Enable normalization on chrX by adjusting coverage of PON samples according to the expected number of copies of chrX in male and female samples. If the case sample is male, coverage of female PON samples is scaled down by a factor of 2 on chrX. If the case sample is female, coverage of male PON samples is scaled up by a factor of 2 on chrX. If no male PON samples are available, chrY intervals will be filtered. This feature is only supported for germline enrichment runs. The default value is false; if set to true, then --cnv-enable-gender-matched-pon must also be true. |

DRAGEN will select PON normalization if PON is provided. For more information, see normalization
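
The sensitivity of the two zero-coverage thresholds to panel size can be illustrated with a short Python sketch. This is not DRAGEN's implementation: the function name, the in-memory layout, and the choice to evaluate both filters independently on the unfiltered matrix are assumptions made for illustration.

```python
def zero_coverage_filters(counts, max_pct_zero_samples=5.0,
                          max_pct_zero_targets=2.5):
    """Apply both zero-coverage filters to a small PON count matrix.

    counts maps a sample name to its per-target counts (equal lengths).
    Returns (kept_target_indices, kept_sample_names). A target or sample
    is dropped when its percentage of zeros exceeds the threshold.
    """
    samples = list(counts)
    n_targets = len(counts[samples[0]])

    kept_targets = []
    for t in range(n_targets):
        zeros = sum(1 for s in samples if counts[s][t] == 0)
        if 100.0 * zeros / len(samples) <= max_pct_zero_samples:
            kept_targets.append(t)

    kept_samples = []
    for s in samples:
        zeros = sum(1 for c in counts[s] if c == 0)
        if 100.0 * zeros / n_targets <= max_pct_zero_targets:
            kept_samples.append(s)

    return kept_targets, kept_samples


# With only four normals, a single zero is already 25% of the samples.
pon = {
    "PON1": [10, 0, 12, 9],
    "PON2": [11, 5, 13, 8],
    "PON3": [9, 6, 0, 7],
    "PON4": [12, 4, 11, 10],
}
print(zero_coverage_filters(pon))
print(zero_coverage_filters(pon, max_pct_zero_samples=25.0,
                            max_pct_zero_targets=25.0))
```

With the default thresholds, the two targets that have a single zero are dropped, which is why small panels usually need a higher threshold.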

Segmentation Options

The segmentation method for both WGS and WES somatic workflows, and in both tumor-normal and tumor-only configurations, is a variant of shifting level models (SLM) called adaptive shifting level models, or ASLM. This can be overridden with the option --cnv-segmentation-mode (see segmentation), but is not recommended.

| Option | Description |
| --- | --- |
| --cnv-slm-eta | Probability that the segmenter changes to any other state than the current state going from the current target to the next target. This could also be expressed as the probability that the true depth for adjacent targets is different for reasons that simple counting noise does not adequately explain. Likewise, the stay-in-state probability is (1.0 - eta). The effective default value is 3e-3, the range is (0.0, 1.0) excluding endpoints. Decreasing this value results in longer segments and reduced fragmentation; increasing produces shorter segments with more fragmentation. |
| --cnv-slm-bafeta | Similar to --cnv-slm-eta, but between adjacent B allele sites. The default value is 1e-7 for somatic WES, 1e-12 for somatic WGS tumor-normal, and 1e-20 for somatic WGS tumor-only. The range is (0.0, 1.0) excluding endpoints. Decreasing this value results in longer segments and reduced fragmentation; increasing produces shorter segments with more fragmentation. However, see below for the limited purpose of segmentation on B allele frequencies. |

The B allele segmentation is performed separately and independently of the depth segmentation. It is a crude segmentation to find the segments which have a balanced B allele frequency, indicating both parental haplotypes are present at equal copy number. A subset of these B allele balanced segments, subject to some additional criteria, are then used to identify a common variance parameter for the depth domain. The ordinary SLM method used for depth-based segmentation is then extended to have a state-dependent emission variance computed from the common variance and scaled by the state mean. The B allele segmentation is not directly used after this, but it plays a critical role in determining the parameterization of the depth-based segmentation. However, there is no analogous parameter in ASLM to --cnv-slm-omega in SLM and HSLM as described for germline analyses.

The following options are documented here in proximity to segmentation options because of their direct relevance to each other. Once provisional calls for copy number (CN) and minor copy number (MCN) have been made on the resulting segments from the segmentation stage, given the selected purity/ploidy model, adjacent segments with the same CN and MCN are joined to form a single segment. This is continued until no two adjacent segments satisfy the merging criteria. Segment merging is a critical step which compensates for over-segmentation or over-fragmentation happening at the segmentation stage. However, segment merging cannot split segments apart, so it cannot compensate in the other direction. Thus, segmentation can afford to produce a degree of over-segmentation, but there is no compensatory mechanism for under-segmentation. These options control segment merging in somatic analyses and do not depend on the segmentation option settings.

| Option | Description |
| --- | --- |
| --cnv-merge-distance | Maximum gap in base pairs between two adjacent segments that still allows them to be merged. The default is 10000 for somatic WGS, meaning segments must be within 10 kb of each other. For WES, the default is effectively unlimited, since target intervals are inherently non-contiguous. |
| --cnv-merge-threshold | Maximum difference in segment mean (linear copy ratio) between two adjacent segments that still allows them to be merged. The default is 0.025 for somatic WGS and 0.4 for somatic WES. |

Setting --cnv-merge-threshold to zero disables segment merging entirely. This is not recommended.
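
The merging rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not DRAGEN's code: the Segment type, the probe-weighted mean of a merged segment, and the strict comparison against the threshold (chosen so that a threshold of zero disables merging) are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    contig: str
    start: int
    end: int
    mean: float      # linear copy ratio
    cn: int          # provisional copy number
    mcn: int         # provisional minor copy number
    n_probes: int


def can_merge(a, b, merge_distance, merge_threshold):
    """Adjacent segments merge if CN and MCN agree and both limits hold."""
    return (a.contig == b.contig
            and a.cn == b.cn and a.mcn == b.mcn
            and b.start - a.end <= merge_distance
            and abs(a.mean - b.mean) < merge_threshold)


def join(a, b):
    """Combine two segments; the probe-weighted mean is an assumption."""
    total = a.n_probes + b.n_probes
    mean = (a.mean * a.n_probes + b.mean * b.n_probes) / total
    return Segment(a.contig, a.start, b.end, mean, a.cn, a.mcn, total)


def merge_segments(segments, merge_distance=10000, merge_threshold=0.025):
    """Repeat left-to-right passes until no adjacent pair can be merged."""
    merged = list(segments)
    changed = True
    while changed:
        changed = False
        out = []
        for seg in merged:
            if out and can_merge(out[-1], seg, merge_distance, merge_threshold):
                out[-1] = join(out[-1], seg)
                changed = True
            else:
                out.append(seg)
        merged = out
    return merged


segments = [
    Segment("chr1", 0, 1000, 1.00, 2, 1, 10),
    Segment("chr1", 1000, 2000, 1.01, 2, 1, 10),
    Segment("chr1", 2000, 3000, 1.50, 3, 1, 10),
]
for seg in merge_segments(segments):
    print(seg)
```

The first two segments share CN and MCN and differ by 0.01 in mean, so they are joined; the third has a different CN and stays separate. Note that nothing in this procedure can split a segment, which matches the point above that under-segmentation cannot be repaired at this stage.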

When circular binary segmentation (CBS) is selected with --cnv-segmentation-mode, you can specify the following additional options:

| Option | Description |
| --- | --- |
| --cnv-cbs-alpha | Specifies the significance level for the test to accept change points. The default is 0.01. |
| --cnv-cbs-eta | Specifies the Type I error rate of the sequential boundary for early stopping when using the permutation method. The default is 0.05. |
| --cnv-cbs-kmax | Specifies maximum width of smaller segment for permutation. The default is 25. |
| --cnv-cbs-min-width | Specifies the minimum number of markers for a changed segment. The default is 2. |
| --cnv-cbs-nmin | Specifies the minimum length of data for maximum statistic approximation. The default is 200. |
| --cnv-cbs-nperm | Specifies the number of permutations used for p-value computation. The default is 10000. |
| --cnv-cbs-trim | Specifies the proportion of data to be trimmed for variance calculations. The default is 0.025. |

For more information, see segmentation

Purity/Ploidy model selection options

| Option | Description |
| --- | --- |
| --cnv-use-somatic-vc-vaf | Use the variant allele frequencies (VAFs) from the somatic SNVs to help select the tumor model for the sample. For more information, see VAF-aware Mode. |
| --cnv-somatic-essential-genes-bed | BED file containing genes where the model should not predict HOMDEL |
| --cnv-somatic-enable-het-calling | Enable HET-calling mode for heterogeneous segments. |
| --cnv-somatic-enable-lower-ploidy-limit | Enable check on lower ploidy limit based on essential genes |
| --cnv-normal-cnv-vcf | Specify germline CNVs from the matched normal sample. For more information, see Germline-aware Mode. |
| --cnv-somatic-min-purity | Specify minimum purity to consider |
| --cnv-somatic-max-purity | Specify maximum purity to consider |
| --cnv-ascn-min-ploidy | Specify minimum ploidy to consider |
| --cnv-ascn-max-ploidy | Specify maximum ploidy to consider |

For more information, see ASCN calling

Filtering Options

| Option | Description |
| --- | --- |
| --cnv-enable-ref-calls | Emit copy-neutral (REF) calls in output VCF (default true for WGS, false for WES) |
| --cnv-filter-qual | QUAL value at which to hard filter CNV VCF (default 40 for WGS/WES, 90 for WES depth-only) |
| --cnv-filter-length | Minimum event length (bp) for PASS calls (default 10000 for WGS, 0 for WES) |
| --cnv-filter-del-mean | SM value used to hard filter DELs in CNV VCF (Somatic WGS) |
| --cnv-filter-dup-mean | SM value used to hard filter DUPs in CNV VCF (Somatic WGS) |
| --cnv-filter-cnloh-maf | MAF value used to hard filter CNLOHs in CNV VCF (Somatic WGS) |
| --cnv-somatic-filter-het-length | Minimum event length to hard filter subclonal CNV VCF |
| --cnv-post-vcf-target-bed | BED file to keep only VCF entries overlapping with target regions |

If --cnv-post-vcf-target-bed is specified, VCF records that do not overlap the provided BED intervals are filtered out. This is a post‑processing hard filter applied only to the output VCF and does not affect any upstream workflow steps or CNV modeling.

Other Options

| Option | Description |
| --- | --- |
| --cnv-enable-tracks | Enables generation of IGV track files |
| --cnv-generate-pon-metric-file | Generate PON metric file for WES/targeted panel |
| --cnv-exclude-bed | BED file specifying intervals to exclude from analysis |
| --cnv-exclude-bed-min-overlap | Minimum overlap fraction for exclusion (default 0.5) |
| --cnv-sex-genotyper-num-interval-requirement | Number of sex contig intervals required by the sex genotyper (default 300) |

CNV Output Files

The somatic CNV workflow generates the following output files:

| File | Description | Format |
| --- | --- | --- |
| <prefix>.tumor.target.counts.gz | Raw target counts before bias correction | gzipped TSV |
| <prefix>.tumor.target.counts.gc-corrected.gz | GC-bias corrected target counts | gzipped TSV |
| <prefix>.tumor.ballele.counts.gz | B-allele counts at population SNP sites | gzipped TSV |
| <prefix>.baf.bedgraph.gz | B-allele frequency in bedgraph format | gzipped bedGraph |
| <prefix>.tn.tsv.gz | Tangent-normalized coverage signal | gzipped TSV |
| <prefix>.cnv.excluded_intervals.bed.gz | List of target regions excluded | gzipped BED |
| <prefix>.cnv.pon_metrics.tsv.gz | Coverage statistics of PON per interval | gzipped TSV |
| <prefix>.cnv.pon_correlation.txt.gz | Correlation between CASE and PON | gzipped TSV |
| <prefix>.seg | Segmentation results (depth and BAF) | TSV |
| <prefix>.cnv.purity.coverage.models.tsv | Model likelihood score for purity/ploidy estimation | TSV |
| <prefix>.cnv.vcf.gz | Primary CNV calls (VCF v4.4 by default) | gzipped VCF |
| <prefix>.cyto.vcf.gz | Cytogenetics-compatible calls (if enabled) | gzipped VCF |
| <prefix>.cnv_metrics.csv | Summary metrics including predicted sex | CSV |
| <prefix>.cnv.gff3 | Variant calls in GFF format | GFF |
| <prefix>.tn.bw | Tangent-normalized signal track | BigWig |

Target Counts Output

<prefix>.tumor.target.counts.gz

Compressed tab-delimited file containing the number of read counts per target interval. This is the raw signal as extracted from the alignments of the BAM or CRAM file. The format is identical for both the case sample and any panel of normals samples. There is also a bigWig representation of a target.counts.diploid file, which is normalized to the normal ploidy level of 2 instead of raw counts.

Columns:

Contig identifier
Start position
End position
Target interval name
Count of alignments in this interval
Count of improperly paired alignments in this interval

Header lines starting with # contain the DRAGEN version, command line, and other meta information.

Example:

#TARGET COUNTS FILE
##DRAGENVersion=<VERSION_INFO>
##DRAGENCommandLine=<CommandLineOptions>
#TargetCountOptions=<CNV_COUNTS_OPTIONS>
...
#Input target file: BED_FILENAME
contig  start   stop    name    WES_EA_N_1      improper_pairs
chr1    12080   12251   target-wes-chr1-12080:12251     662     0
chr1    12595   12802   target-wes-chr1-12595:12802     220     1
...

For more information, see Target Counts File
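
A file in this layout can be read with a few lines of Python. The sketch below assumes tab-delimited columns in the order listed above and treats every line starting with # as metadata; the function names are hypothetical.

```python
import gzip
import io


def read_target_counts(handle):
    """Parse a target counts style file from a text-mode handle.

    Returns (meta_lines, column_names, rows). The fifth column is stored
    as a float so the same reader also handles non-integer values.
    """
    meta_lines, column_names, rows = [], None, []
    for raw in handle:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            meta_lines.append(line)
            continue
        fields = line.split("\t")
        if column_names is None:
            # First non-comment line holds the column names.
            column_names = fields
            continue
        rows.append({
            "contig": fields[0],
            "start": int(fields[1]),
            "stop": int(fields[2]),
            "name": fields[3],
            "value": float(fields[4]),
            "improper_pairs": int(fields[5]),
        })
    return meta_lines, column_names, rows


def read_target_counts_gz(path):
    """Convenience wrapper for a gzipped file on disk."""
    with gzip.open(path, "rt") as handle:
        return read_target_counts(handle)


example = io.StringIO(
    "#TARGET COUNTS FILE\n"
    "contig\tstart\tstop\tname\tWES_EA_N_1\timproper_pairs\n"
    "chr1\t12080\t12251\ttarget-wes-chr1-12080:12251\t662\t0\n"
    "chr1\t12595\t12802\ttarget-wes-chr1-12595:12802\t220\t1\n"
)
meta, columns, rows = read_target_counts(example)
print(columns[4], [row["value"] for row in rows])
```

The fifth column name is the sample name, so it differs from file to file.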

GC-Corrected Counts Output

<prefix>.tumor.target.counts.gc-corrected.gz

Contains GC-corrected read counts per target interval. The format is equivalent to the *.target.counts.gz file:

Contig identifier
Start position
End position
Target interval name
GC-corrected read counts in this interval
Count of improperly paired alignments in this interval

Example:

#GC CORRECTED FILE
##DRAGENVersion=<VERSION_INFO>
##DRAGENCommandLine=<CommandLineOptions>
#TargetCountOptions=<CNV_COUNTS_OPTIONS>
#Original input file: sample.target.counts.gz // raw counts filename
contig  start   stop    name    SampleName      improper_pairs
chr1    12080   12251   target-wes-chr1-12080:12251     981.529698      0
chr1    12595   12802   target-wes-chr1-12595:12802     50.05497673     1
chr1    13163   13658   target-wes-chr1-13163:13658     1086.20189      4
...

For more information, see GC bias correction

B-Allele Counts

In somatic ASCN runs, B-allele counts are calculated at sites in the tumor sample where the normal sample is likely to be heterozygous. When analyzed in conjunction with a matched normal sample, the sites are those that are called as heterozygous SNVs in the normal sample. When analyzed in tumor-only mode, sites are selected from a population collection (similar to germline ASCN runs). Each B-allele site consists of a reference allele and a variant allele, and the number of reads in the sample supporting each of these alleles is counted.

B-allele counts are written to both a gzipped TSV file (*.ballele.counts.gz) and a gzipped bedGraph file (*.baf.bedgraph.gz).

<prefix>.ballele.counts.gz

Columns:

Contig identifier
Start, BED-style (zero-based inclusive) start position of the reference allele
Stop, BED-style (one-based inclusive) stop position of the reference allele
Base sequence for the reference allele
Base sequence for the first allele being counted
Base sequence for the second allele being counted
The number of qualified reads containing a sequence matching the first allele
The number of qualified reads containing a sequence matching the second allele

Additionally, in the case of B-allele sites from a population VCF, the following two additional columns are added after the columns listed above:

Population frequency for the first allele
Population frequency for the second allele

Example:

contig  start   stop    refAllele       allele1 allele2 allele1Count    allele2Count    allele1AF       allele2AF
chr1    51478   51479   T       T       A       4       2       0.6747  0.3253
chr1    82733   82734   T       T       C       111     36      0.79346 0.20654
chr1    83083   83084   T       T       A       0       0       0.1538  0.8462
chr1    86330   86331   A       A       G       9       9       0.87384 0.12616
chr1    88315   88316   G       G       A       0       0       0.8926  0.1074

B-Allele Counts BED Graph

<prefix>.baf.bedgraph.gz

B-allele frequency in bedgraph format. Allele count ratios are calculated by sorting alleles according to base priority {A, T, G, C} (descending), producing frequencies deterministically distributed above and below 0.5. This provides easy visualization in IGV of significant BAF changes between neighboring segments.

Example:

chr1    11021   11022   0.333333
chr1    14463   14464   0.755102
chr1    16494   16495   0.317708
chr1    38741   38742   0.5
chr1    39014   39015   0.44186

Normalization Output

<prefix>.tn.tsv.gz

Contains the normalized signal of the case sample per target interval, i.e., the log2-transformed copy ratio signal. A strong signal deviation from 0.0 indicates a potential for a CNV event. The format is equivalent to the *.target.counts.gz file:

Contig identifier
Start position
End position
Target interval name
Log2-transformed copy ratio in this interval
Count of improperly paired alignments in this interval

Header lines are also included that start with #. In some cases, the normalization counts could be patched internally with intervals from other processes, such as the SegDups extension. In such cases, patches are indicated (sorted in order of application) with header lines starting with #patch:

#patch 1 = <normalized_counts_patch_1_filename>
#patch 2 = <normalized_counts_patch_2_filename>
...

and the original (unpatched) *.tn.tsv.gz is renamed as *.tn.unpatched.tsv.gz. Note: this file is reported in output for inspection, but most use cases will use the (patched) *.tn.tsv.gz file downstream of normalization.

An example of a *.tn.tsv.gz file is shown below.

#title = Tangent normalized coverage profile
#sex = MALE
contig  start   stop    name    SampleName      improper_pairs
chr1    12080   12251   target-wes-chr1-12080:12251     -0.3025426810360819     0
chr1    12595   12802   target-wes-chr1-12595:12802     -0.10691600293612752    0
chr1    13163   13658   target-wes-chr1-13163:13658     -0.55258557719170587    6
...

For more information, see Normalization
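
Because the values in this file are log2-transformed, converting them back to a linear copy ratio is a single exponentiation. The short Python sketch below shows the conversion; the function name is hypothetical.

```python
def log2_ratio_to_linear(log2_ratio):
    """Convert a log2 copy ratio to a linear copy ratio."""
    return 2.0 ** log2_ratio


# A log2 ratio of 0.0 is copy-ratio neutral; -1.0 is half the baseline.
for value in (0.0, -1.0, 1.0, -0.3025426810360819):
    print(value, log2_ratio_to_linear(value))
```

On the linear scale the neutral value is 1.0 rather than 0.0.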

Excluded Intervals Output

<prefix>.cnv.excluded_intervals.bed.gz

To improve accuracy, the DRAGEN CNV Pipeline excludes genomic intervals when one or more target intervals fail at least one quality requirement. The excluded intervals are reported in the *.cnv.excluded_intervals.bed.gz file. The file is in BED format: it identifies the regions of the genome that are not callable for CNV analysis and gives the reason each interval was excluded in the fourth column. The example below shows some of the possible reasons.

Example:

chr1    258648  258852  PON_TARGET_FACTOR_THRESHOLD
...
chrX    151717091       151717377       EXCLUDE_BED
chrY    348335  348455  PON_UNMATCHED_GENDER
...

The fourth column gives the reason each interval was excluded.

For more information, see Excluded Intervals File
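
To see which quality requirement removes the most territory, the reasons in the fourth column can be tallied. The Python sketch below is one way to do this, assuming whitespace-delimited BED lines; the function name is hypothetical.

```python
from collections import Counter


def tally_exclusion_reasons(lines):
    """Count excluded intervals and excluded bases per reason."""
    interval_counts = Counter()
    base_counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            # Skip blank or malformed lines.
            continue
        start, stop, reason = int(fields[1]), int(fields[2]), fields[3]
        interval_counts[reason] += 1
        base_counts[reason] += stop - start
    return interval_counts, base_counts


example_lines = [
    "chr1\t258648\t258852\tPON_TARGET_FACTOR_THRESHOLD",
    "chrX\t151717091\t151717377\tEXCLUDE_BED",
    "chrY\t348335\t348455\tPON_UNMATCHED_GENDER",
]
intervals, bases = tally_exclusion_reasons(example_lines)
print(dict(intervals))
print(dict(bases))
```

For a real file, pass a handle opened with gzip.open(path, "rt") instead of the example list.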

PON Metrics Output

<prefix>.cnv.pon_metrics.tsv.gz

The DRAGEN CNV Pipeline generates the PON Metrics File (.cnv.pon_metrics.tsv.gz) if a Panel of Normals is provided and --cnv-generate-pon-metric-file is set to true. If PON size is less than 2, then an empty file will be generated.

The PON Metrics File includes basic statistics of the coverage profile for each interval. To remove sample coverage bias, DRAGEN applies sample median normalization before computing the metrics.

Example:

contig  start   stop    name    mean    std     normalizedStd min     25%     50%     75%     max     intervalSize    gcContents
1       12098   12178   target-wes-1-12098:12178/1      3.6259044560802365      0.46661435469856077      0.1286890927079175     2.7961783439490446      3.2573018790849675      3.7105263157894739      4.0162683823529415      4.3298969072164946      80      0.49382716049382713
1       12178   12258   target-wes-1-12178:12258/2      5.0685579775753595      0.70638315915955963      0.13936570564740217     3.9044585987261144      4.5225944682508761      5.067708333333333       5.5778115844038769      6.3277777777777775      80      0.46913580246913578
1       12553   12637   target-wes-1-12553:12637/1      4.6990858287992054      0.62537786269786677      0.13308500535681309     3.7417218543046356      4.0305632538350444      5.0382165605095546      5.2151580459770113      5.5773195876288657      84      0.6705882352941176
...

For more information, see PON Metrics File

PON Correlation Output

<prefix>.cnv.pon_correlation.txt.gz

The DRAGEN CNV Pipeline generates the PON Correlation File (.cnv.pon_correlation.txt.gz) if a Panel of Normals is provided. The file reports the correlation between the case sample and each PON sample.

Example:

Correlation of case sample CASE_SAMPLE_NAME
  PON1: 0.9786
  PON2: 0.9868
  PON3: 0.9912
  ...

For more information, see PON Correlation File

PON Combined Counts Output

<prefix>.combined.counts.txt.gz

If PON samples are provided with --cnv-normals-file or --cnv-normals-list, the CNV caller generates a single combined PON file that can be reused in later runs through the --cnv-combined-counts option.

Example:

#COMBINED COUNTS FILE
##DRAGENVersion=<VERSION_INFO>
##DRAGENCommandLine=<CommandLineOptions>
#TargetCountOptions=<CNV_COUNTS_OPTIONS>
contig  start   stop    name    PON1  PON2  PON3  PON4  PON5  PON6  PON7  PON8  PON9
chr1    69411   69541   target-wes-chr1-69411:69541/1   0       1.8140869319999999      0       0       3.6301639140000002      0       3.6322517749999998      2.7239599229999998      0
chr1    69541   69670   target-wes-chr1-69541:69670/2   0       1.732555405     0       0       3.4641199290000002      0       3.4667661330000001      3.4653864840000002      0
chr1    785931  786282  target-wes-chr1-785931:786282/1 41.683179699999997      37.341243050000003      52.024789929999997      59.030795980000001      53.800898459999999      50.370179270000001      43.404515570000001      43.349519780000001      46.891320659999998
chr1    817466  817596  target-wes-chr1-817466:817596/1 1.9608427310000001      0       0.98101929590000003     1.9595376019999999      0       0.9799059067    1.957367598     4.9118208430000001      0.97657295369999997
chr1    826645  826950  target-wes-chr1-826645:826950/1 67.020948630000007      66.92125953     76.833943899999994      64.963436049999999      37.414978750000003      90.559157929999998      61.079245899999997      78.87869293     69.023087360000005
...

Segmentation Results

<prefix>.seg

Contains the segments produced by the segmentation algorithm. The Segment_Mean value of a segment is the ratio of the mean of that segment to the whole-sample median, without log transformation (linear copy-ratio). A strong signal deviation from 1.0 indicates a potential for a CNV event.

The file has the following columns:

Sample name
Contig identifier
Start position
End position
Number of intervals in the segment
Linear copy-ratio of the segment

An example of a *.seg file is shown below.

Sample  Chromosome      Start   End     Num_Probes      Segment_Mean
<SampleName> chr1    818022  1117426 224     0.82500341336435279
<SampleName> chr1    1117426 4063702 2438    0.91726081432236528
<SampleName> chr1    4063702 4067591 3       0.38861386123247205
<SampleName> chr1    4067591 7705829 3302    0.93021316913709917
<SampleName> chr1    7705829 9357003 1405    0.98147825043799442
<SampleName> chr1    9357003 9377365 19      0.50269670724395654
<SampleName> chr1    9377365 12859821        2905    1.0684818476332989

BAF Segmentation Output

<prefix>.baf.seg

In addition to segmentation of target counts, some workflows perform segmentation of B-allele loci. The output file has the suffix *.baf.seg and the same format as the *.seg file, with two modifications. First, the Segment_Mean value is the mean over B-allele loci of the smaller observed allele fraction. Second, there is an additional column:

BAF_SLM_STATE: Integer between 0 and 10, indicating bins of minor-allele fraction (low to high), or . when the BAF data are too variable to estimate a minor-allele fraction

An example of BAF segmentation output file is shown below:

Sample  Chromosome      Start   End     Num_Probes      Segment_Mean    BAF_SLM_STATE
<SampleName> chr1    820348  1104646 194     0.29301737166888697     6
<SampleName> chr1    1105091 1533754 444     0.26185904799069076     5
<SampleName> chr1    1533810 1534166 9       0.41958837071702065     8
<SampleName> chr1    1534217 9356793 6689    0.26034515815016335     5
<SampleName> chr1    9358304 9376529 27      0.46450553586280602     10

Purity/Coverage Models Output

<prefix>.cnv.purity.coverage.models.tsv

Contains the tested purity and diploid-coverage models along with their log-likelihood scores. Each row corresponds to a candidate model evaluated by the ASCN caller during model selection.

Columns:

Model purity (Cellularity) — fraction of cells in the sample due to tumor [0, 1]
Model diploid coverage — expected read count for a target bin in a diploid region
Model log-likelihood — log-likelihood score for this purity/coverage hypothesis
Approximate ploidy - approximate sample ploidy estimation before CNV calling, derived from the sample mean coverage
Failed constraints - model search constraints that were not satisfied by the model

The model with the highest log-likelihood is selected as the best estimate of tumor purity and ploidy. The selected purity is reported as EstimatedTumorPurity in the VCF header.

Example:

#Purity Coverage        logL    ApproxPloidy    FailedConstraints
0.05    400     -19966231.8629  5.99    MIN_MHD,HOM_DEL,USER_MIN_PURITY
0.05    401     -19952483.5915  5.88    MIN_MHD,HOM_DEL,USER_MIN_PURITY
0.10    402     -19938276.3649  5.77    
...
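
The ranking described above can be reproduced from the TSV with a short script. This is a sketch that assumes tab-separated columns with the header shown in the example; it ranks by log-likelihood only, as the text describes, and merely reports failed constraints without using them, since how they interact with model selection is not specified here. The function name read_models is illustrative.

```python
import csv
import io

# Rows taken from the example above; tab separation is an assumption.
MODELS_TEXT = (
    "#Purity\tCoverage\tlogL\tApproxPloidy\tFailedConstraints\n"
    "0.05\t400\t-19966231.8629\t5.99\tMIN_MHD,HOM_DEL,USER_MIN_PURITY\n"
    "0.05\t401\t-19952483.5915\t5.88\tMIN_MHD,HOM_DEL,USER_MIN_PURITY\n"
    "0.10\t402\t-19938276.3649\t5.77\t\n"
)

def read_models(handle):
    """Parse candidate purity/coverage models into a list of dicts."""
    reader = csv.reader(handle, delimiter="\t")
    header = [name.lstrip("#") for name in next(reader)]
    models = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        row = dict(zip(header, fields))
        failed = row.get("FailedConstraints", "")
        models.append({
            "purity": float(row["Purity"]),
            "coverage": float(row["Coverage"]),
            "logL": float(row["logL"]),
            "approx_ploidy": float(row["ApproxPloidy"]),
            "failed": [name for name in failed.split(",") if name],
        })
    return models

models = read_models(io.StringIO(MODELS_TEXT))
best = max(models, key=lambda m: m["logL"])
print(best["purity"], best["coverage"], best["failed"])
```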

VCF Output

<prefix>.cnv.vcf.gz

The CNV VCF file follows the standard VCF format v4.4. The VCF header is annotated with ##source=<DRAGEN_SOURCE>, where <DRAGEN_SOURCE> identifies the caller which produced the VCF, e.g.:

DRAGEN_ASCN: CNV caller
DRAGEN_ASCN_SV: CNV caller + SV support
DRAGEN_CNV: legacy depth-only CNV caller (note: for legacy reasons this caller uses VCF version v4.2)

Due to the nature of how CNV events are represented, not all fields are applicable. In general, if more information is available about an event, then the information is annotated. To include copy neutral (REF) calls, set --cnv-enable-ref-calls to true. AOH/LOH events are not available in the legacy depth-only caller.

Example Records

# Example REF call
chr1    819841  DRAGEN:REF:chr1:819841-6103865  N       .       1000    PASS
  END=6103865;REFLEN=5284025
  GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:SM:SD:MAF:BC:AS:PE:OBF
  0/0:2:1:1000:1000:2.00155:1.000775:1.000775:129.1:0.5:4544:10920:66,10:0.00368019

# Example copy-neutral LOH call
chr1    6104347 DRAGEN:CNLOH:chr1:6104348-6727324       N       <LOH>   1000    PASS
  END=6727324;REFLEN=622977;SVLEN=622977;LOHTYPE=AOH;SVTYPE=CNV
  GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:SM:SD:MAF:BC:AS:PE:OBF
  1/1:2:0:1000:1000:1.9876:0.001988:0.993798:128.2:0.001:528:916:10,12:0.00766703

# Example GAIN call
chr1    16715826        DRAGEN:GAIN:chr1:16715827-16949283      N       <DUP>   744     PASS
  END=16949283;REFLEN=233457;SVLEN=233457;SVTYPE=CNV
  GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:SM:SD:MAF:BC:AS:PE:OBF
  0/1:3:1:1000:99:3.08217:1.134239:1.541085:198.8:0.368:49:26:20,14:0.0384615

# Example GAIN LOH call
chr15   20212550        DRAGEN:GAINLOH:chr15:20212551-20421468  N       <LOH>   390     PASS
  END=20421468;REFLEN=208918;SVLEN=208918;LOHTYPE=AOH;SVTYPE=CNV
  GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:SM:SD:MAF:BC:AS:PE:OBF
  1/1:6:0:1:1:5.90559:0.000000:2.952793:380.91:0:76:1:9,8:0

# Example LOSS call
chr1    25274774        DRAGEN:LOSS:chr1:25274775-25331683      N       <DEL>   226     PASS
  END=25331683;REFLEN=56909;SVLEN=56909;SVTYPE=CNV
  GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:SM:SD:MAF:BC:AS:PE:OBF
  0/1:1:0:1000:1000:1.01085:0.000000:0.505426:65.2:0:7:10:5,1:0

Header

The VCF header includes somatic-specific fields in addition to the common CNV header lines:

##fileformat=VCFv4.4
##ModelSource=DEPTH+BAF
##EstimatedTumorPurity=0.72
##DiploidCoverage=384.000000
##OverallPloidy=2.103412
##OutlierBafFraction=0.031287
##AlternativeModelDedup=0.72,192
##AlternativeModelDup=0.72,768
...

Description

ModelSource

basis on which the final tumor model was chosen (e.g., DEPTH+BAF, DEPTH+BAF_DOUBLED, VAF, SAMPLE_MEDIAN).

EstimatedTumorPurity

fraction of cells in the sample due to tumor. Range: [0, 1] or NA if a confident model could not be determined.

DiploidCoverage

expected read count for a target bin in a diploid region.

OverallPloidy

length-weighted average of copy number for PASS events in the tumor fraction.

OutlierBafFraction

fraction of B-allele frequencies incompatible with their segment call. High values may indicate a mismatched normal, cross-sample contamination, or bone marrow transplantation.

AlternativeModelDedup/AlternativeModelDup

alternative models corresponding to one fewer or one more whole-genome duplication, given as (purity, diploid_coverage). Useful for manual investigation.
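
The somatic header fields are plain key=value lines, so they can be collected without a full VCF parser. The sketch below assumes the header lines have already been read into a list; the helper names parse_simple_header and purity_or_none are illustrative, and structured lines such as ##INFO=<...> are deliberately skipped.

```python
HEADER_LINES = [
    "##fileformat=VCFv4.4",
    "##ModelSource=DEPTH+BAF",
    "##EstimatedTumorPurity=0.72",
    "##DiploidCoverage=384.000000",
    "##OverallPloidy=2.103412",
    "##OutlierBafFraction=0.031287",
    "##AlternativeModelDedup=0.72,192",
    "##AlternativeModelDup=0.72,768",
]

def parse_simple_header(lines):
    """Collect ##key=value header lines, ignoring structured <...> entries."""
    meta = {}
    for line in lines:
        if not line.startswith("##"):
            break  # reached the #CHROM line or the first record
        key, sep, value = line[2:].partition("=")
        if sep and not value.startswith("<"):
            meta[key] = value
    return meta

def purity_or_none(meta):
    """EstimatedTumorPurity is a number in [0, 1] or the string NA."""
    raw = meta.get("EstimatedTumorPurity", "NA")
    return None if raw == "NA" else float(raw)

meta = parse_simple_header(HEADER_LINES)
purity = purity_or_none(meta)
# Alternative models are given as (purity, diploid_coverage).
alt_purity, alt_coverage = (float(x) for x in meta["AlternativeModelDup"].split(","))
print(meta["ModelSource"], purity, alt_purity, alt_coverage)
```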

Records

All coordinates in the VCF are 1-based.

Description

CHROM

The chromosome (or contig) on which the copy number variant occurs.

POS

Start position of the variant. If any of the ALT alleles is a symbolic allele (e.g., <DEL>), POS denotes the coordinate of the base preceding the polymorphism.

ID

Encodes the event type and coordinates of the event (1-based, inclusive). Event types include GAIN, LOSS, REF, CNLOH, and GAINLOH.

REF

Contains N for all CNV events.

ALT

Specifies the type of CNV event: <DEL>, <DUP>, or <LOH>. REF calls have . in the ALT field. With --cnv-enable-legacy-vcf-format (VCF v4.2), the ALT field contains <DEL>,<DUP> in place of <LOH> for AOH/LOH events.

QUAL

Estimated quality score used in hard filtering. Note: different workflows provide different QUAL score distributions - it is recommended to compare QUAL scores only within results from the same workflow (e.g., it is incorrect to compare QUAL scores between the CNV caller and the legacy (depth-only) CNV caller).

FILTER

The FILTER column contains PASS if the CNV event passes all filters, otherwise the column contains the name of the failed filter. Default values are defined in the header line for each available FILTER.

Description

binCount

CNV events with a bin count lower than a threshold.

cnvLength

The length of the CNV is lower than a threshold.

cnvQual

The QUAL of the CNV is lower than a threshold.

INFO

The INFO column contains information representing the event.

Description

REFLEN

Length of the event.

SVLEN

Length of the event. Only present for non-REF records. Note: in VCF v4.2 format (enabled with --cnv-enable-legacy-vcf-format), SVLEN is a signed representation of REFLEN (e.g., a negative value indicates a deletion).

SVTYPE

Always CNV. Only present for non-REF records.

END

End position of the event (1-based, inclusive).

LOHTYPE

Type of loss of heterozygosity. Possible values: CNLOH (Copy-Neutral LOH), GAINLOH (LOH with copy number gain).

HET

Tag identifying subclonal (heterogeneous) calls, present when --cnv-somatic-enable-het-calling is set.

CIPOS

Confidence interval around the nominal POS.

CIEND

Confidence interval around the nominal END.

The meanings of the SVLEN, SVTYPE, END, CIPOS, and CIEND fields match their VCF v4.2 definitions.
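
The coordinate conventions can be checked against the example records. The sketch below encodes three relations observed in those examples rather than a formal specification: the ID span is 1-based inclusive, REFLEN equals end minus start plus one, and POS is one base before the ID start when ALT is symbolic. The helper names are illustrative, and the ID parser assumes the span follows the last colon in the ID.

```python
def parse_event_id(event_id):
    """Split an ID such as DRAGEN:LOSS:chr1:25274775-25331683."""
    _source, event_type, rest = event_id.split(":", 2)
    chrom, _, span = rest.rpartition(":")  # tolerates ':' inside contig names
    start, end = (int(x) for x in span.split("-"))
    return event_type, chrom, start, end

def parse_info(info):
    """Turn an INFO string into a dict; flags without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields

def check_record(pos, event_id, alt, info):
    """Compare POS, END and REFLEN with the coordinates encoded in the ID."""
    _type, _chrom, start, end = parse_event_id(event_id)
    fields = parse_info(info)
    symbolic = alt.startswith("<")
    return {
        "pos_ok": pos == (start - 1 if symbolic else start),
        "end_ok": int(fields["END"]) == end,
        "reflen_ok": int(fields["REFLEN"]) == end - start + 1,
    }

loss = check_record(25274774, "DRAGEN:LOSS:chr1:25274775-25331683", "<DEL>",
                    "END=25331683;REFLEN=56909;SVLEN=56909;SVTYPE=CNV")
ref = check_record(819841, "DRAGEN:REF:chr1:819841-6103865", ".",
                   "END=6103865;REFLEN=5284025")
print(loss, ref)
```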

If a segment BED file is used, the segment identifier is carried over from the input to the SEGID field.

When Germline-aware Mode is enabled, DRAGEN annotates somatic VCF entries with:

Description

NCN

Germline copy number from the matched normal sample.

SCND

Somatic copy number difference relative to the germline copy number.

When matching CNV with SV output, additional INFO annotations are added.

FORMAT

The common FORMAT fields are described in the header:

Description

GT

Genotype

SM

Linear copy ratio of the segment mean

CN

Estimated total copy number of tumor fraction

BC

Number of read count bins

PE

Number of improperly paired end reads at start and stop breakpoints

AS

Number of allelic read count sites

CNF

Floating point estimate of copy number

CNQ

Exact total copy number Q-score

MAF

Estimate for the minor allele frequency

MCN

Estimated minor-haplotype copy number

MCNF

Floating point estimate of minor-haplotype copy number

MCNQ

Minor copy number Q-score

Mosaic fraction estimate (for MOSAIC calls)

OBF

Per-segment Outlier BAF Fraction. Fraction of BAF counts that are considered outliers with respect to the chosen segment call. Higher values might indicate segments where BAF counts are problematic.

SD

Best estimate of segment's bias-corrected read count

For more information, see CNV VCF.
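
Because the FORMAT keys and the sample values are both colon-separated and in the same order, they can be paired directly. The sketch below uses the GAIN record from the examples; the major copy number is derived here as CN minus MCN, which is an illustrative calculation and not a field reported by DRAGEN.

```python
def parse_sample(format_field, sample_field):
    """Pair FORMAT keys with the per-sample values of one VCF record."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))

def to_float(value):
    """Convert a VCF value to float, mapping the missing marker '.' to None."""
    return None if value == "." else float(value)

fmt = "GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:SM:SD:MAF:BC:AS:PE:OBF"
sample = "0/1:3:1:1000:99:3.08217:1.134239:1.541085:198.8:0.368:49:26:20,14:0.0384615"

call = parse_sample(fmt, sample)
total_cn = int(call["CN"])
minor_cn = int(call["MCN"])
major_cn = total_cn - minor_cn  # derived value, not a VCF field
pe_start, pe_end = (int(x) for x in call["PE"].split(","))
print(call["GT"], total_cn, minor_cn, major_cn, to_float(call["MAF"]), pe_start, pe_end)
```

Records from depth-only callsets can carry . for MCN, MCNQ, MCNF and MAF, so integer conversions should be guarded in the same way as to_float when such records are expected.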

Cytogenetics Output

<prefix>.cyto.vcf.gz

The Cytogenetics modality output has a similar format to the standard CNV VCF (*.cnv.vcf.gz). A list of differences is indicated below:

Records can have the INFO/RES field, which indicates the resolution(s) associated with the record.
Records can have the INFO/SEGID field. This field indicates either custom predefined segments supplied as input by the user (as in the standard CNV VCF), or Cytogenetics-specific predefined segments, typically whole-arm or whole-chromosome segments injected automatically during caller execution. In the latter case, the field holds the ID or name of the arm or chromosome.
The VCF header is annotated with ##source=DRAGEN_CYTO to indicate the file is generated by the Cytogenetics modality.

Note: The Cyto VCF also provides resolution-specific homozygosity indexes (i.e., computed on each specific resolution's callset). The default minimum size considered is the same as the main HomozygosityIndex, and for each resolution in output, there will be an additional header line on the Cyto VCF indicating the resulting metric, e.g., ##HomozygosityIndex(25k)=0.001015.

CNV Metrics Output

<prefix>.cnv_metrics.csv

The following metrics are reported:

Sex Genotyper

Metric

Description

Estimated sex

Estimated sex of the case sample (and panel of normals samples if applicable).

Confidence score

Range: [0.0, 1.0]. If the sample sex is specified via --sample-sex, this value is 0.0.

DRAGEN Sex Genotyper requires a minimum of 300 target intervals to confidently determine sex genotype; if the panel covers fewer intervals on the sex chromosomes, genotyping will fail and an undetermined genotype is returned. Users may lower this requirement by setting --cnv-sex-genotyper-num-interval-requirement to a smaller value, at the risk of increased false genotype calls.

CNV Summary

Bases in reference genome in use
Average alignment coverage over genome - The average alignment coverage over the genome is calculated by dividing the total number of bases from processed alignment records (excluding those filtered by the Target Counts stage in DRAGEN CNV) by the genome length. Alignment records are filtered taking into consideration duplicate marking status (if available), MAPQ, and mapping status.
Number of alignment records processed
- Number of filtered records (total)
- Number of filtered records (due to duplicates)
- Number of filtered records (due to MAPQ)
- Number of filtered records (due to being unmapped)
PMAD - Pairwise Median Absolute Deviation measures the variation in read coverage between adjacent bins. It captures variability due to factors such as DNA degradation, extraction, amplification, or library preparation. Higher values indicate noisier sample data. PMAD is calculated as follows:
- Define a vector v[i] as normalized counts of i-th interval in log scale, and d[i] as pairwise differences of consecutive normalized counts between i and i+1 intervals, i.e. d[i] = (v[i] - v[i+1])
- PMAD is median absolute deviation of d, i.e. PMAD = Median(|d[i]-Median(d)|)
Coverage MAD - Median absolute deviation of normalized case counts. Higher values indicate noisier sample data.
Median Bin Count - Median of raw counts normalized by interval size.
Number of target intervals
Number of normal samples
Number of segments
Number of amplifications - Note: GAINLOH events (ALT=LOH and CN > 2) are also included here
Number of deletions
Number of CNLOHs (Copy-Neutral LOHs)
Number of PASS amplifications - Note: GAINLOH events (ALT=LOH and CN > 2) are also included here
Number of PASS deletions
Number of PASS CNLOHs (Copy-Neutral LOHs)
Post-Normalization Bin Count Sigma - Standard deviation of post-PoN-normalization median-normalized coverage values.

Coverage MAD and Median Bin Count are only printed for WES germline/somatic CNV. Post-Normalization Bin Count Sigma is only printed when PoN normalization has been applied.
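
The PMAD definition given in the list above translates directly into a few lines of code. This is a sketch of the stated formula only, assuming the input is the vector of normalized counts already in log scale and ordered by interval; it is not DRAGEN's implementation.

```python
import statistics

def pmad(log_counts):
    """Pairwise median absolute deviation of consecutive log-scale counts.

    d[i] = v[i] - v[i+1];  PMAD = median(|d[i] - median(d)|)
    """
    if len(log_counts) < 2:
        raise ValueError("PMAD needs at least two intervals")
    diffs = [a - b for a, b in zip(log_counts, log_counts[1:])]
    center = statistics.median(diffs)
    return statistics.median(abs(x - center) for x in diffs)

# Toy values chosen so the arithmetic is easy to follow by hand:
# d = [-0.5, 0.5, -1.0, 1.0], median(d) = 0, |d| = [0.5, 0.5, 1.0, 1.0]
value = pmad([1.0, 1.5, 1.0, 2.0, 1.0])
print(value)
```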

Example:

SEX GENOTYPER,,sample,UNDETERMINED,0.0000
SEX GENOTYPER,,v1r1_normal_60,UNDETERMINED,0.0000
...
CNV SUMMARY,,Bases in reference genome,3217346917
CNV SUMMARY,,OutlierBafFraction,0.049278
CNV SUMMARY,,beta-binomial overdispersion M,184.400000
CNV SUMMARY,,PMAD,0.067799
CNV SUMMARY,,Coverage MAD,0.06750
CNV SUMMARY,,Median Bin Count,1.80
...

For more information, see CNV Metrics.

Track Files (IGV)

To generate additional equivalent bigWig and gff files, set the --cnv-enable-tracks option to true. These files can be loaded into IGV along with other tracks that are available, such as RefSeq genes. Using these tracks alongside publicly available tracks allows for easier interpretation of calls. DRAGEN autogenerates an IGV session XML file if tracks are generated by DRAGEN CNV. The *.cnv.igv_session.xml can be loaded directly into IGV for analysis.

The following IGV tracks are automatically populated in the output IGV session file:

Track File

Description

Recommended View

*.target.counts.bw

BigWig representation of target counts bins. Values are GC-corrected if GC correction was performed.

Barchart or points

*.improper_pairs.bw

BigWig representation of improper pairs counts.

Barchart

*.tn.bw

BigWig representation of the tangent normalized signal.

Points

*.seg.bw

BigWig representation of the segments.

Points

*.baf.seg.bw

BigWig representation of BAF segments (if available).

Points

*.baf.bedgraph.gz

BED graph representation of B-allele frequency (if available).

Points

*.cnv.gff3

GFF3 representation of CNV events: DEL=blue, DUP=red, filtered=light gray, REF=green (if enabled), AOH/LOH=magenta. An example is shown below (different workflows may output different attributes on the 9th column).

—

Example GFF3 output:

##gff-version 3
chr1    DRAGEN  LOSS    12779193        12859821        30      .       .       Alt=DEL;LinearCopyRatio=0.576;CopyNumber=1;Genotype=0/1;Qual=30;Filter=PASS;Start=12779192;Stop=12859821;Length=80629;BinCount=24;ImproperPairsCount=16,7;color=#0000FF;
chr1    DRAGEN  REF     13106280        13122338        19      .       .       Alt=REF;LinearCopyRatio=1.05981;CopyNumber=2;Genotype=./.;Qual=19;Filter=PASS;Start=13106279;Stop=13122338;Length=16059;BinCount=8;ImproperPairsCount=3,1;color=#00FF00;
chr1    DRAGEN  GAIN    13225213        13247040        66      .       .       Alt=DUP;LinearCopyRatio=2.016;CopyNumber=4;Genotype=./1;Qual=66;Filter=PASS;Start=13225212;Stop=13247040;Length=21828;BinCount=9;ImproperPairsCount=7,5;color=#FF0000;
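
The ninth GFF3 column is a semicolon-separated list of key=value attributes, which makes the track file easy to post-process. The sketch below parses the LOSS line from the example, assuming tab-separated columns; as noted above, different workflows may emit different attributes, so the keys used here are only those visible in the example.

```python
GFF3_LINE = (
    "chr1\tDRAGEN\tLOSS\t12779193\t12859821\t30\t.\t.\t"
    "Alt=DEL;LinearCopyRatio=0.576;CopyNumber=1;Genotype=0/1;Qual=30;"
    "Filter=PASS;Start=12779192;Stop=12859821;Length=80629;BinCount=24;"
    "ImproperPairsCount=16,7;color=#0000FF;"
)

def parse_gff3_line(line):
    """Split one GFF3 feature line into fixed columns and an attribute dict."""
    cols = line.rstrip("\n").split("\t")
    attrs = {}
    for item in cols[8].split(";"):
        if item:  # the example lines end with a trailing ';'
            key, _, value = item.partition("=")
            attrs[key] = value
    return {
        "seqid": cols[0],
        "type": cols[2],
        "start": int(cols[3]),
        "end": int(cols[4]),
        "attributes": attrs,
    }

feature = parse_gff3_line(GFF3_LINE)
length = feature["end"] - feature["start"] + 1
print(feature["type"], length, feature["attributes"]["Filter"])
```

In this example the GFF3 start column is 1-based and matches the Length attribute, while the Start attribute holds the preceding base, mirroring POS in the VCF.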

IGV Session

File extension: *.igv_session.xml

The IGV session XML file is prepopulated with track files generated by DRAGEN. The session file loads the reference genome that best matches the standard reference genomes in an IGV installation, by comparing the name of the --ref-dir specified on the command-line. Standard UCSC human reference genomes are autodetected, but any variations from the standard reference genomes might not be autodetected. To edit the genome detection, alter the genome attribute in the Session element to the reference genome you would like for analysis before loading into IGV. The reference identifier used by IGV might differ from the actual name of the genome. The following is an example edited session file.

<?xml version="1.0" encoding="utf-8"?>
<Session genome="b37" hasGeneTrack="false" hasSequenceTrack="true" version="8">
    <Resources>
        <Resource path="example.cnv.gff3"/>
        <Resource path="example.cnv.excluded_intervals.bed.gz"/>
        <Resource path="example.target.counts.bw"/>
        <Resource path="example.improper.pairs.bw"/>
        <Resource path="example.tn.bw"/>
        <Resource path="example.seg.bw"/>
    </Resources>
    <Panel height="500" width="1200" name="DataPanel">
        ...
    </Panel>
</Session>

Note that depending on the IGV version installed, it may come prepackaged with different flavors of GRCh37. The reference naming conventions have changed so a user may have to edit the genome field in the XML file directly. For example, IGV has traditionally packaged a b37 reference genome, but may also include a 1kg_v37 or a 1kg_b37+decoy, which will appear on the IGV user interface as "1kg, b37" or "1kg, b37+decoy" respectively.

You can determine the correct encoding of a reference genome by going to File > Save Session... and then inspecting the generated igv_session.xml file.
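
Editing the genome attribute can also be scripted. The sketch below uses Python's standard xml.etree.ElementTree on an abbreviated copy of the session shown above; the function name set_session_genome is illustrative, and the genome identifier must be one that the local IGV installation recognizes.

```python
import xml.etree.ElementTree as ET

# Abbreviated session document based on the example above.
SESSION_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<Session genome="b37" hasGeneTrack="false" hasSequenceTrack="true" version="8">
    <Resources>
        <Resource path="example.cnv.gff3"/>
        <Resource path="example.seg.bw"/>
    </Resources>
</Session>
"""

def set_session_genome(xml_bytes, genome_id):
    """Return the session XML as text with the Session genome attribute replaced."""
    root = ET.fromstring(xml_bytes)
    if root.tag != "Session":
        raise ValueError("not an IGV session document")
    root.set("genome", genome_id)
    # tostring(..., encoding="unicode") returns str without an XML declaration.
    return ET.tostring(root, encoding="unicode")

updated = set_session_genome(SESSION_XML, "1kg_v37")
print(updated)
```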

Germline-aware Mode

To specify germline CNVs from a matched normal sample, use --cnv-normal-cnv-vcf. When specified, CNV records marked as PASS in the normal sample are used during tumor-sample segmentation to make sure that confident germline CNV boundaries are also boundaries in the somatic output. Segments with germline copy number changes that are relative to reference ploidy are excluded from somatic model selection. During somatic copy number calling and scoring, the germline copy number is used to modify the expected depth contribution from the normal contamination fraction of the tumor sample. The process leads to more accurate assignment of somatic copy number in regions of germline CNV. DRAGEN then annotates the somatic WGS CNV VCF entries with germline copy number (NCN) and the somatic copy number difference relative to germline (SCND) for the segments that have germline CNVs.

Example:

dragen \
-r <HASHTABLE> \
--output-directory <OUTPUT> \
--output-file-prefix <SAMPLE> \
--enable-map-align false \
--enable-cnv true \
--tumor-bam-input <TUMOR_BAM> \
--bam-input <NORMAL_BAM> \
--enable-variant-caller true \
--cnv-use-somatic-vc-baf true \
--cnv-normal-cnv-vcf <CNV_NORMAL_VCF>

VAF-aware Mode

If both the small variant caller and the CNV caller are enabled in a tumor-matched normal run, somatic SNV variant allele frequencies (VAFs) can inform the purity and ploidy model selection. VAF-based modeling is particularly useful when a tumor has limited copy number variation and/or CNVs are mostly subclonal (e.g., many liquid tumors), preventing the depth+BAF signal from reaching a clear model.

VAF information can also help determine the presence or absence of a whole-genome duplication even in clonal tumors with clear CNVs.

For tumor/matched-normal runs with --enable-variant-caller true, VAF-based modeling is enabled by default. To disable it, set --cnv-use-somatic-vc-vaf false.

Advanced Topics

Cytogenetics Modality

Conventional cytogenetics methodologies typically focus on larger alterations than the ones provided by NGS analyses. The Cytogenetics modality for the CNV caller allows the user to visualize CNAs at different resolutions, aiming at providing a more flexible workspace for different use cases.

It is enabled with --cnv-enable-cyto-output (default true for germline workflows). Not available for somatic WES workflows.

From the same sample, and during the same run, the Cytogenetics modality starts from the high resolution results (before smoothing) provided in the standard output CNV VCF. The output callset then undergoes multiple rounds of smoothing, going progressively from finer resolution to coarser resolution calls (larger alterations). Each round of smoothing produces a smoothed callset which is set aside and becomes the starting point for callsets with higher degree of smoothing.

At the end of the smoothing procedure, the Cytogenetics modality produces several outputs, e.g.:

Multiple GFF3 files, one for each round of smoothing (extension *.cyto.<resolution_ID>.gff3).
A single VCF file, with extension *.cyto.vcf.gz. This file contains all callsets identified through the smoothing iterations, where the iteration identifier is stored on the INFO/RES field. Identical alterations across resolutions are deduplicated. In such case, the INFO/RES field will contain a comma-separated list of resolution identifiers.
- Some resolutions will be based on depth of coverage only (no BAF). Their INFO/RES value will reflect the original callset used as a starting point, with added suffix _depth. E.g., for depth-only calls derived from resolution 1M, the new callset will have resolution ID 1M_depth. Note: calls made at different resolutions or with different information (depth+BAF versus depth-only) may occasionally conflict. For instance, in a region that is AOH that also has a mosaic DEL, the region may be reported as AOH for the depth+BAF calling but may be reported as (mosaic) DEL for the depth-only track. The event type with the strongest evidence will be output for each resolution.
- An additional callset which does not conform to the ones above (no INFO/RES field) is the one containing whole-arm/-chromosome aneuploidies. For this callset, all reported records have the chromosome name or arm name in the INFO/SEGID field. Entries for this callset will not be present on any GFF3 file. For more details see the section on whole-chromosome aneuploidies below.
A single IGV session file, with extension *.cyto.igv_session.xml, which provides a convenient way to load the multiple GFF3 files together with the other tracks found in the standard *.cnv.igv_session.xml. In such a session:
- The first 5 tracks provide the DRAGEN CNV calls (Blue/DEL, Green/REF, Magenta/AOH, Red/DUP) at decreasing resolution (from high to low, top to bottom).
- The remaining tracks are similar to those in the standard *.cnv.igv_session.xml, e.g., poor mappability regions, target counts coverage, improper pairs, B-allele frequency, etc.

Below, an example set of calls from the *.cyto.vcf.gz output file (note additional INFO/RES annotation with respect to *.cnv.vcf.gz output file):

# Example REF call
chr1    819841  DRAGEN:REF:chr1:819841-6103865  N       .       1000    PASS
  END=6103865;REFLEN=5284025;RES=25k,500k,50k
  GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:SM:SD:MAF:BC:AS:PE:OBF
  0/0:2:1:1000:1000:2.00155:1.000775:1.000775:129.1:0.5:4544:10920:66,10:0.00368019

# Example GAIN call
chr1    16605768        DRAGEN:GAIN:chr1:16605769-16645359      N       <DUP>   427     PASS
  END=16645359;REFLEN=39591;RES=25k;SVLEN=39591;SVTYPE=CNV
  GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:SM:SD:MAF:BC:AS:PE
  ./1:6:.:1:.:6.27065:.:3.135326:404.457:.:23:0:6,11

# Example LOSS call
chr1    25274774        DRAGEN:LOSS:chr1:25274775-25331683      N       <DEL>   226     PASS
  END=25331683;REFLEN=56909;RES=25k,50k;SVLEN=56909;SVTYPE=CNV
  GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:SM:SD:MAF:BC:AS:PE:OBF
  0/1:1:0:1000:1000:1.01085:0.000000:0.505426:65.2:0:7:10:5,1:0

Selection of appropriate resolution

Since the most-informative resolution may vary depending on circumstances (event sizes, distance between calls, presence of smaller calls causing fragmentation, etc), no one-size-fits-all recommendation can work for all cases. However, some practical recommendations to consider are the following:

Each resolution INFO/RES ID identifies the minimum size for alterations to be considered PASS.
If only minimal call smoothing is necessary, resolution 25k can provide a good balance and provide calls in size ranges compatible with Chromosomal Microarray (CMA).
When comparing against technologies such as karyotyping, resolution 1M may be more appropriate to reduce call fragmentation.

Note: if the use case under consideration is not impacted by call fragmentation, it is typically recommended to use the *.cnv.vcf.gz or *.cnv_sv.vcf.gz output results (instead of the ones in *.cyto.vcf.gz), to take full advantage of the superior detail of NGS.
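
Once a resolution has been chosen, the corresponding callset can be pulled out of *.cyto.vcf.gz by testing membership in INFO/RES. The sketch below works on (ID, INFO) pairs taken from the example records in this section; the last pair is a whole-arm record, which carries SEGID and no RES and is therefore never selected by resolution. The function names are illustrative.

```python
RECORDS = [
    ("DRAGEN:REF:chr1:819841-6103865",
     "END=6103865;REFLEN=5284025;RES=25k,500k,50k"),
    ("DRAGEN:GAIN:chr1:16605769-16645359",
     "END=16645359;REFLEN=39591;RES=25k;SVLEN=39591;SVTYPE=CNV"),
    ("DRAGEN:LOSS:chr1:25274775-25331683",
     "END=25331683;REFLEN=56909;RES=25k,50k;SVLEN=56909;SVTYPE=CNV"),
    ("DRAGEN:GAIN:chr21:12000001-46709983",
     "END=46709983;REFLEN=34709983;SEGID=chr21q;SVLEN=34709983;SVTYPE=CNV"),
]

def resolutions(info):
    """Return the list of resolution IDs in INFO/RES, or [] if absent."""
    for item in info.split(";"):
        if item.startswith("RES="):
            return item[len("RES="):].split(",")
    return []

def select_resolution(records, resolution):
    """Keep the IDs of records whose INFO/RES lists the requested resolution."""
    return [rid for rid, info in records if resolution in resolutions(info)]

at_25k = select_resolution(RECORDS, "25k")
at_50k = select_resolution(RECORDS, "50k")
print(len(at_25k), len(at_50k))
```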

Additional options

Option

Description

--cnv-cyto-keep-resolutions=<resolution_list>

Comma-separated list of resolutions to output (currently supported: 25k,50k,500k,1M,1M_depth)

Whole-chromosome Aneuploidy Detection

For some use cases, it is necessary to inspect a sample at the arm or whole-chromosome level. Typically this would require an additional caller alongside the standard CNV caller with automated segment detection. In the same run, the Cytogenetics modality provides such a set of calls within the same VCF file (with extension *.cyto.vcf.gz).

chr21  12000000   DRAGEN:GAIN:chr21:12000001-46709983  N   <DUP>  1000  PASS
  END=46709983;REFLEN=34709983;SEGID=chr21q;SVLEN=34709983;SVTYPE=CNV
  GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:SM:SD:MAF:BC:AS:PE:OBF
  0/1:3:1:1000:1000:3.00155:1.002518:1.500775:193.6:0.334:29570:66224:0,0:0.0016016

chrX   1        DRAGEN:LOSS:chrX:2-156040895     N     <DEL>  1000  PASS
  END=156040895;REFLEN=156040894;SEGID=chrX;SVLEN=156040894;SVTYPE=CNV
  GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:SM:SD:MAF:BC:AS:PE:OBF
  0/1:1:0:1000:1000:0.996364:0.000996:0.498182:82.2:0.001:122580:144548:0,0:0.00995089

The example above shows two calls from this callset. The segment ID annotation (INFO/SEGID) names the segment for each call (in this example, the q-arm of chromosome 21 and the entire chromosome X). REF calls are not reported unless explicitly requested with --cnv-enable-ref-calls true. Note: this enables REF calls in both the CNV and Cyto VCF files.

Note: acrocentric chromosomes (13, 14, 15, 21, and 22) have short arms characterized by repetitive regions. These regions create mappability issues and are typically excluded from analysis. Calling short-arm alterations for these chromosomes is therefore challenging, because a call would rest on a small percentage of the arm's total length. To avoid false positive calls (in this case, reporting an alteration of the full short arm with evidence from only a minimal portion of it), the algorithm applies a hard threshold (default 500 intervals) on the minimum number of intervals required when calling whole-arm alterations. When a chromosome arm call does not satisfy this threshold, the call is filtered with FILTER chromArmBinCount. The default can be changed with the --cnv-filter-chrom-arm-bin-count option.

Joint SV/CNV calling

Somatic joint calling performs copy number segment matching against all SVs with the starts and ends being matched independently.

Somatic joint calling is not enabled by default and must be enabled with --enable-cnv-sv-somatic true.

To ensure copy number neutral SVs have matching copy number segments, whenever --enable-cnv-sv-somatic is enabled, --cnv-enable-ref-calls is automatically enabled as well.

The following steps are performed:

SV calling is performed.
The SV call set is filtered to only PASS SV records.
For each SV, the breakpoint(s) at which a copy number transition would occur, if it were base-pair consistent with the SV, are obtained.
CNV segmentation is performed to obtain CNV breakpoints.
If --cnv-enable-sv-forced-segmentation is enabled, SV breakpoints are added to the CNV breakpoints. Segments are generated from the combined CNV and SV breakpoints.
- If a matching CNV breakpoint is found, the CNV breakpoint is adjusted to the SV breakpoint rather than adding a new breakpoint.
- If a matching CNV breakpoint is not found, the SV breakpoint is added. CNV segments are therefore split at the internal SV breakpoints.
CNV calling is performed on the segments.
Adjacent CNV segments in which the END/CIEND of the left segment overlaps the POS/CIPOS of the right segment are adjusted to remove the gap.
CNV segment start and end are independently matched to SV breakends based on POS/CIPOS and END/CIEND respectively. When there are multiple matching SVs, the inner-most position is matched.
If a segmentation gap is created due to SV matching, short CNV segments are created to fill the gaps between SVs. The CN of each short CNV segment is set to the CN of the containing pre-adjusted segment.
SV <DEL>/<DUP> records that correspond to a single CNV <DEL>/<DUP> record are merged into a single VCF record. As with germline joint CNV+SV calling, these VCF records contain both the SV and CNV INFO and FORMAT fields.
The joint call set is written to the .cnv_sv.vcf.gz output file. The .cnv.vcf.gz and .sv.vcf.gz outputs are unaffected.

When --cnv-enable-sv-forced-segmentation is enabled, the somatic joint CNV+SV call set forms a breakpoint graph.
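
The forced-segmentation step listed above (adjust a matching CNV breakpoint to the SV position, otherwise add the SV breakpoint) can be illustrated with a toy model. The matching rule used here, a fixed distance window called max_distance, is purely an assumption for illustration; the criteria DRAGEN uses to decide that a CNV breakpoint matches an SV breakpoint are not described in this section, and the function names are hypothetical.

```python
def merge_breakpoints(cnv_bps, sv_bps, max_distance):
    """Combine CNV and SV breakpoints on one contig.

    A CNV breakpoint within `max_distance` of an SV breakpoint is moved to
    the SV position; SV breakpoints with no such neighbor are added.
    """
    points = [[pos, "cnv"] for pos in sorted(cnv_bps)]
    for sv in sorted(sv_bps):
        candidates = [p for p in points
                      if p[1] == "cnv" and abs(p[0] - sv) <= max_distance]
        if candidates:
            nearest = min(candidates, key=lambda p: abs(p[0] - sv))
            nearest[0], nearest[1] = sv, "sv"  # snap to the SV breakpoint
        else:
            points.append([sv, "sv"])          # split the containing segment
    return sorted(p[0] for p in points)

def to_segments(breakpoints):
    """Turn an ordered breakpoint list into consecutive (start, end) pairs."""
    return list(zip(breakpoints, breakpoints[1:]))

merged = merge_breakpoints([1000, 5000, 9000], [5020, 7000], max_distance=100)
segments = to_segments(merged)
print(merged, segments)
```

In this toy input the CNV breakpoint at 5000 is moved to the SV position 5020, while 7000 has no nearby CNV breakpoint and splits the segment that contains it.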

Example command lines

dragen \
-r <HASHTABLE> \
--output-directory <OUTPUT> \
--output-file-prefix <SAMPLE> \
--bam-input <NORMALBAM> \
--tumor-bam-input <TUMORBAM> \
--enable-map-align false \
--enable-cnv true \
--enable-sv true \
--enable-cnv-sv-somatic true \
--cnv-enable-sv-forced-segmentation true

Joint SV/CNV VCF Output

The original CNV and SV VCF output files, prior to integration, are available for users in the DRAGEN output directory, as described elsewhere. Additionally, there is an enhanced CNV VCF available with the *.cnv_sv.vcf.gz extension. The VCF header lines in the *.cnv_sv.vcf.gz mostly correspond to a concatenation of the individual header lines from the CNV and SV VCFs, with a few lines deduplicated and some new ones added. For details on the legacy header lines, please refer to the individual CNV and SV user guide sections.

Newly added header lines are described in the following table.

Header Field

Number

Type

Description

END_LEFT_BND_OF

String

ID of CNV whose left end is matched to the end of SV

END_RIGHT_BND_OF

String

ID of CNV whose right end is matched to the end of SV

LEFT_BND

String

ID of SV that matches the left end of CNV record

LEFT_BND_OF

String

ID of CNV whose left end is matched to SV

MatchSv

Integer

ID of original SV that was merged with CNV record

OrigCnvEnd

Integer

Coordinate of original CNV END

OrigCnvPos

Integer

Coordinate of original CNV POS

RIGHT_BND

String

ID of SV that matches the right end of CNV record

RIGHT_BND_OF

String

ID of CNV whose right end is matched to SV

SVCLAIM

String

Claim made by the structural variant call. Valid values are D, J, DJ for: abundance, adjacency and both respectively

Records that can be matched or rescued will have annotations indicating the breakpoint linkage between a CNV and SV record. If a complete match is found, then the MatchSv annotation will be present in the record, indicating the SV record's ID field for this CNV record. In this case, BND notations refer to the merged record ID itself rather than the SV before merging. Furthermore, the use of the SVCLAIM field will indicate if the record has evidence arising from depth signal D, or junction signals J, or both DJ.

Because of the mixing of standalone SV records and CNV records, the FORMAT field may have different annotations. For details on the CNV or SV specific annotations, please refer to the individual CNV and SV user guide sections.

Records that can be matched or rescued will have FILTER set to PASS. The original FILTERs are retained for records that were not matched or rescued. For example, the cnvLength FILTER will still be applied to standalone CNV records (those with SVCLAIM=D).

Example records are shown below.

# Merged record, note presence of SVCLAIM=DJ and MatchSv
chr1    9357666 DRAGEN:LOSS:chr1:9357667-9377061        N       <DEL>   1000    PASS    END=9377061;REFLEN=19395;SVLEN=19395;SVTYPE=DEL;LEFT_BND=DRAGEN:LOSS:chr1:9357667-9377061;OrigCnvPos=9357666;CIPOS=0,2;RIGHT_BND=DRAGEN:LOSS:chr1:9357667-9377061;OrigCnvEnd=9377061;CIEND=0,2;SVCLAIM=DJ;MatchSv=DRAGEN:DEL:1268:0:1:0:0:0;HOMLEN=2;HOMSEQ=TC;SOMATIC;SOMATICSCORE=444.26;LCF;RIGHT_BND_OF=DRAGEN:GAINLOH:chr1:4066343-9357666;LEFT_BND_OF=DRAGEN:LOSS:chr1:9357667-9377061;END_RIGHT_BND_OF=DRAGEN:LOSS:chr1:9357667-9377061;END_LEFT_BND_OF=DRAGEN:GAINLOH:chr1:9377062-9495567      GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:SM:SD:MAF:BC:AS:PE:PR:SR:VF:VF1:VAF1:VF2:VAF2       1/1:0:0:1000:585:0.007000:.:0.003500:0.7:.:19:0:95,103:0,100:0,49:0,119:0,119:1.000000:0,119:1.000000
 
# CNV record that did not match, note presence of SVCLAIM=D
chr1    143540109       DRAGEN:GAIN:chr1:143540110-143751543    N       <DUP>   1000    PASS    END=143751543;CIPOS=-269657,1792;CIEND=-1808,799863;REFLEN=211434;SVLEN=211434;SVTYPE=CNV;SVCLAIM=D     GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:SM:SD:MAF:BC:AS:PE:OBF       0/1:3:1:1000:1000:3.000000:1.134000:1.500000:300:0.378:139:119:24,19:0.0168067
  
# SV record that did not match, note presence of SVCLAIM=J
chr1    156918006       DRAGEN:DUP:TANDEM:15023:0:1:0:0:0       N       <DUP:TANDEM>    .       PASS    END=156930478;SVTYPE=DUP;SVLEN=12472;CIPOS=0,3;CIEND=0,3;HOMLEN=3;HOMSEQ=GTG;SOMATIC;SOMATICSCORE=174.11;LCF;RIGHT_BND_OF=DRAGEN:GAIN:chr1:156918005-156918006;LEFT_BND_OF=DRAGEN:GAIN:chr1:156918007-156930478;END_RIGHT_BND_OF=DRAGEN:GAIN:chr1:156918007-156930478;END_LEFT_BND_OF=DRAGEN:GAIN:chr1:156930479-157982548;SVCLAIM=J  PR:SR:VF:VF1:VAF1:VF2:VAF2:PSL  114,38:70,27:162,65:93,65:0.411392:69,65:0.485075:DRAGEN_BND_15023_1_1_2_4_0_0
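
The three record types above can be told apart programmatically from the SVCLAIM and MatchSv annotations. The sketch below is a minimal Python illustration, not DRAGEN code. The helper names `parse_info` and `describe_claim` and the wording of the evidence labels are hypothetical, the input is an abbreviated version of the merged example record, and tab-delimited columns are assumed.

```python
# Evidence labels keyed by SVCLAIM value (label wording is illustrative).
EVIDENCE = {
    "D": "depth signal only (standalone CNV record)",
    "J": "junction signals only (standalone SV record)",
    "DJ": "depth and junction signals (merged record)",
}


def parse_info(info):
    """Split a VCF INFO string into a dict; flags without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def describe_claim(line):
    """Summarize ID, FILTER, SVCLAIM, and MatchSv for one VCF data line."""
    cols = line.rstrip("\n").split("\t")
    info = parse_info(cols[7])
    claim = info.get("SVCLAIM")
    return {
        "id": cols[2],
        "filter": cols[6],
        "claim": claim,
        "match_sv": info.get("MatchSv"),  # None unless a complete match was found
        "evidence": EVIDENCE.get(claim, "unknown"),
    }


# Abbreviated merged record (most INFO fields and the sample columns omitted).
merged_line = "\t".join([
    "chr1", "9357666", "DRAGEN:LOSS:chr1:9357667-9377061", "N", "<DEL>",
    "1000", "PASS",
    "END=9377061;SVTYPE=DEL;SVCLAIM=DJ;MatchSv=DRAGEN:DEL:1268:0:1:0:0:0;SOMATIC",
])

print(describe_claim(merged_line))
```

Using `partition("=")` keeps flag-style INFO entries such as SOMATIC and LCF, which have no value, from raising an error during parsing.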


