Somatic
Overview
DRAGEN provides somatic copy number variant (CNV) calling workflows that detect copy number aberrations and regions with loss of heterozygosity (LOH) in whole genome sequencing (WGS) and whole exome sequencing (WES) data. The CNV workflows leverage both depth of coverage and B-allele frequencies (BAFs) to provide comprehensive detection of:
Copy number gains (duplications) and losses (deletions)
Copy-neutral loss of heterozygosity (CNLOH)
Subclonal alterations (WGS only, enabled by default)
Minor allele copy number estimation
Workflow
The DRAGEN somatic CNV workflow follows this processing pipeline: 
The pipeline consists of the following modules:
Target Counts — Binning of read counts and other signals from alignments
B-Allele Counts — Extraction of allelic read counts
Bias Correction — Correction of GC bias and other systematic biases
Normalization — Detection of normal ploidy levels and normalization
Segmentation — Breakpoint detection via segmentation of normalized depth and BAF signals
Allele Specific Copy Number (ASCN) Calling — Integration of depth and BAF segments to determine copy number states and allele-specific information
B-Allele Frequency Inputs: The pipeline supports multiple input options for estimating B‑allele frequencies (BAF), depending on the availability of a matched normal sample.
Matched normal already processed
If the matched normal sample has been processed with the germline small variant caller, the resulting VCF file can be provided directly.
Matched normal not yet processed
If the matched normal has not been processed, the user may provide raw reads or aligned reads and enable concurrent execution of the germline small variant caller.
In this case, DRAGEN CNV consumes the small variant caller output to estimate B‑allele frequencies from germline SNVs.
No matched normal available
A population SNV VCF may be provided.
DRAGEN estimates B‑allele frequencies using variants from the population SNV VCF.
For WES, population SNVs are intersected with the regions defined in --cnv-target-bed. The target BED file must contain the same target intervals used to generate the PON.
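The WES intersection step can be illustrated with a short Python sketch. It is not DRAGEN's implementation. It assumes the target BED is already merged (no overlapping intervals on a contig) and that SNV positions are VCF POS values (1-based). The helper names load_bed and in_targets are hypothetical.

```python
import bisect

def load_bed(lines):
    """Index BED lines as {contig: (starts, ends)}; BED is 0-based, half-open.

    Assumes intervals on the same contig do not overlap (merge them first otherwise).
    """
    by_contig = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        contig, start, end = line.split()[:3]
        by_contig.setdefault(contig, []).append((int(start), int(end)))
    index = {}
    for contig, intervals in by_contig.items():
        intervals.sort()
        index[contig] = ([s for s, _ in intervals], [e for _, e in intervals])
    return index

def in_targets(index, contig, pos):
    """True if the 1-based VCF position `pos` lies inside a target interval."""
    if contig not in index:
        return False
    starts, ends = index[contig]
    pos0 = pos - 1                                 # convert to 0-based
    i = bisect.bisect_right(starts, pos0) - 1      # last interval starting at or before pos0
    return i >= 0 and pos0 < ends[i]

targets = load_bed(["chr1\t100\t200", "chr1\t500\t600", "chr2\t0\t50"])
snvs = [("chr1", 100), ("chr1", 101), ("chr1", 200), ("chr1", 201), ("chr2", 50), ("chr3", 10)]
print([s for s in snvs if in_targets(targets, *s)])
# [('chr1', 101), ('chr1', 200), ('chr2', 50)]
```

The BED interval chr1 100 200 covers 1-based positions 101 to 200, which is why position 100 is dropped and position 200 is kept.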
Depth-Only Workflow (Legacy): For applications that require only fold-change detection without purity/ploidy model estimation, a legacy depth-only workflow is also available for WES and targeted panels. See Depth-Only Workflow for details.
Example Command Lines
WGS — Tumor-Normal (concurrent SNV caller)
If the matched normal has not been pre-processed, you can run the somatic SNV caller concurrently with CNV, which feeds germline heterozygous sites directly to the CNV caller:
To additionally enable Germline-aware Mode and VAF-aware Mode, add the following flags:
WGS — Tumor-Only (population SNP VCF)
If no matched normal is available, run in tumor-only mode using a population SNP catalog:
WES — Tumor-Normal (concurrent SNV caller)
WES — Tumor-Only (population SNP VCF)
Required Options
--enable-cnv
Enable CNV processing (set to true)
Input Options
DNA inputs
--tumor-fastq1, --tumor-fastq2
FASTQ input files (requires --enable-map-align true)
--tumor-bam-input
BAM input file
--tumor-cram-input
CRAM input file
B-Allele inputs
--cnv-normal-b-allele-vcf
Specify a matched normal SNV VCF.
--cnv-population-b-allele-vcf
Specify a population SNP catalog.
--cnv-use-somatic-vc-baf
If running in tumor-normal mode with the SNV caller enabled, set this option to use the germline heterozygous sites reported by the SNV caller as B-allele loci.
For more information on specifying B-allele loci, see Specification of B-Allele Loci.
PON inputs
--cnv-target-bed
BED file defining exome capture regions (only for WES)
--cnv-normals-file
Specify an individual normal counts file (target.counts.gz or target.counts.gc-corrected.gz) for the PON. Use this option once per file to add multiple files.
--cnv-normals-list
Specify a text file listing the paths of the reference target counts files to use as a panel of normals, one path per line.
--cnv-combined-counts
Specify combined PON file (.combined.counts.txt.gz).
Other inputs
--ref-dir
DRAGEN reference genome hashtable directory
--enable-map-align
Enable mapper and aligner module
--sample-sex
Sample sex (e.g., male, female). If not specified, sex is estimated from data.
Pop SNP download
Population VCF files can be downloaded from the link below.
Download links in the table open an external Illumina download page: https://support.illumina.com/sequencing/sequencing_software/dragen-bio-it-platform/product_files.html
Output Options
--output-directory
Output directory for all results
--output-file-prefix
Prefix prepended to all output file names
--cnv-enable-cyto-output
Enable cytogenetics-compatible output VCF (default false) - only available for WGS
Target Counting Options
--cnv-counts-method
Specifies the counting method for an alignment to be counted in a target bin. Values are midpoint, start, or overlap. The default value is overlap when using the panel of normals approach, which means if an alignment overlaps any part of the target bin, the alignment is counted for that bin. In the self-normalization mode, the default counting method is start.
--cnv-min-mapq
Specifies the minimum MAPQ for an alignment to be counted during target counts generation. The default value is 3 for self-normalization and 20 otherwise. When generating counts for a panel of normals, all MAPQ0 alignments are counted.
--cnv-target-bed
Specifies a properly formatted BED file that indicates the target intervals to sample coverage over. For use in WES analysis.
--cnv-interval-width
Specifies the width of the sampling interval for CNV processing. This option controls the effective window size. The default is 1000 for WGS analysis and 500 for WES analysis.
--cnv-skip-contig-list
Specifies a comma-separated list of contig identifiers to skip when generating intervals for WGS analysis. The default contigs that are skipped, if not specified, are chrM,MT,m,chrm.
--cnv-filter-duplicate-alignments
Filters duplicate-marked alignments during target counting when set to true. The default is true unless map/align is enabled and duplicate marking is disabled.
Note that --cnv-filter-duplicate-alignments is only available when the duplicate marking option is set to true. For more information, see Filter Duplicate Alignments.
For more information on the target counting methods, see Target Counts.
GC Bias Correction Options
--cnv-enable-gc-bias-correction
Enable or disable GC bias correction when generating target counts. The default is true.
--cnv-enable-gcbias-smoothing
Enable or disable smoothing the GC bias correction across adjacent GC bins with an exponential kernel. The default is true.
--cnv-num-gc-bins
Specifies the number of bins for GC bias correction. Each bin represents the GC content percentage. Allowed values are 10, 20, 25, 50, or 100. The default is 25.
For more information, see GC Bias Correction
Normalization Options
A panel of normals (PON) provides the reference baseline against which copy number changes are measured. A PON is required for WES, while WGS can use either self-normalization (recommended) or PON normalization.
Self-normalization option
--cnv-enable-self-normalization
Enable/disable self normalization mode, which does not require a panel of normals (only available for WGS).
PON normalization options
--cnv-extreme-percentile
Specifies the extreme median percentile value at which to filter out samples. The default is 2.5.
--cnv-max-percent-zero-samples
Specifies the maximum percentage of zero-coverage PON samples allowed for a target. If a target exceeds this threshold, it is filtered out. The default value is 5%. Because this option is sensitive to the number of normal samples, adjust the threshold accordingly: with a small panel of normals and an unadjusted threshold, the option can filter out targets unintentionally.
--cnv-max-percent-zero-targets
Specifies the maximum percentage of zero-coverage targets allowed for a sample. If a sample exceeds this threshold, it is filtered out. The default value is 2.5%. Because this option is sensitive to the total number of target intervals, adjust the threshold accordingly: with a capture kit that has a small number of probes and an unadjusted threshold, the option can filter out samples unintentionally.
--cnv-target-factor-threshold
Specifies the bottom percentile of panel of normals medians below which targets are filtered out as unusable. The default is 1% for whole genome processing and 5% for targeted sequencing processing.
--cnv-truncate-threshold
Specifies a percentage threshold for truncating extreme outliers. The default is 0.1%.
--cnv-enable-gender-matched-pon
Enable/disable gender-matched PON normalization. If enabled, DRAGEN uses gender-matched PON samples for sex chromosome normalization, and sex chromosome intervals are filtered if the PON contains no gender-matched sample. The default value is true.
--cnv-enable-cross-gender-adjustments-chrX
Enable normalization on chrX by adjusting coverage of PON samples according to the expected number of copies of chrX in male and female samples. If the case sample is male, coverage of female PON samples is scaled down by a factor of 2 on chrX. If the case sample is female, coverage of male PON samples is scaled up by a factor of 2 on chrX. If no male PON samples are available, chrY intervals will be filtered. This feature is only supported for germline enrichment runs. The default value is false; if set to true, then --cnv-enable-gender-matched-pon must also be true.
DRAGEN selects PON normalization whenever a PON is provided. For more information, see Normalization.
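The two zero-coverage filters (--cnv-max-percent-zero-targets applied per sample and --cnv-max-percent-zero-samples applied per target) can be sketched as below. The order of application (samples first, then targets) and the strict "exceeds" comparison are assumptions, and filter_pon is a hypothetical name rather than a DRAGEN function.

```python
def filter_pon(coverage, max_pct_zero_samples=5.0, max_pct_zero_targets=2.5):
    """coverage[s][t] is the coverage of PON sample s at target t.

    Returns (kept sample indices, kept target indices).
    """
    n_targets = len(coverage[0])
    # Per-sample filter: drop samples whose share of zero-coverage targets exceeds the limit
    kept_samples = [
        s for s, row in enumerate(coverage)
        if 100.0 * row.count(0) / n_targets <= max_pct_zero_targets
    ]
    # Per-target filter: evaluated on the remaining samples only
    kept_targets = []
    for t in range(n_targets):
        zeros = sum(1 for s in kept_samples if coverage[s][t] == 0)
        if kept_samples and 100.0 * zeros / len(kept_samples) <= max_pct_zero_samples:
            kept_targets.append(t)
    return kept_samples, kept_targets

pon = [
    [10, 12, 0, 9, 11],
    [11, 10, 8, 9, 10],
    [0, 0, 7, 0, 12],    # 60% zero-coverage targets
    [9, 11, 10, 10, 9],
]
print(filter_pon(pon, max_pct_zero_samples=5.0, max_pct_zero_targets=25.0))
# ([0, 1, 3], [0, 1, 3, 4])
```

With only three retained PON samples, a single zero-coverage sample already represents 33% of the panel, so target 2 is dropped at the default 5% threshold. This is the small-PON sensitivity described for --cnv-max-percent-zero-samples.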
Segmentation Options
The segmentation method for both the WGS and WES somatic workflows, in both tumor-normal and tumor-only configurations, is a variant of shifting level models (SLM) called adaptive shifting level models (ASLM). The method can be overridden with --cnv-segmentation-mode (see Segmentation), but doing so is not recommended.
--cnv-slm-eta
Probability that the segmenter changes from the current state to any other state when moving from the current target to the next. Equivalently, it is the probability that the true depth of adjacent targets differs for reasons that simple counting noise does not adequately explain; the stay-in-state probability is (1.0 - eta). The effective default value is 3e-3, and the valid range is the open interval (0.0, 1.0). Decreasing this value produces longer segments and less fragmentation; increasing it produces shorter segments and more fragmentation.
--cnv-slm-bafeta
Similar to above, but between adjacent B allele sites. The default value is 1e-7 for somatic WES, 1e-12 for somatic WGS tumor-normal, and 1e-20 for somatic WGS tumor-only. The range is (0.0, 1.0) excluding endpoints. Decreasing this value results in longer segments and reduced fragmentation; increasing produces shorter segments with more fragmentation. However, see below for the limited purpose of segmentation on B allele frequencies.
The B allele segmentation is performed separately and independently of the depth segmentation. It is a crude segmentation to find the segments which have a balanced B allele frequency, indicating both parental haplotypes are present at equal copy number. A subset of these B allele balanced segments, subject to some additional criteria, are then used to identify a common variance parameter for the depth domain. The ordinary SLM method used for depth-based segmentation is then extended to have a state-dependent emission variance computed from the common variance and scaled by the state mean. The B allele segmentation is not directly used after this, but it plays a critical role in determining the parameterization of the depth-based segmentation. However, there is no analogous parameter in ASLM to --cnv-slm-omega in SLM and HSLM as described for germline analyses.
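To show how an eta parameter such as --cnv-slm-eta trades fragmentation against sensitivity, here is a minimal shifting level model written as a discrete-state hidden Markov model and decoded with the Viterbi algorithm. It is a plain SLM with a fixed, state-independent emission variance, not DRAGEN's ASLM (which derives a state-dependent variance from BAF-balanced segments). The candidate levels, sigma, and the function name slm_segment are hypothetical.

```python
import math

def slm_segment(values, levels, eta=3e-3, sigma=0.2):
    """Segment a 1-D signal into runs of constant level with a shifting level HMM.

    values: per-target signal; levels: candidate state means.
    Returns a list of (start_index, end_index_exclusive, level).
    """
    k = len(levels)
    log_stay = math.log(1.0 - eta)        # probability of keeping the current state
    log_move = math.log(eta / (k - 1))    # eta split evenly over the other states

    def emit(x, mu):
        # Gaussian log-density up to an additive constant (same sigma for every state)
        return -0.5 * ((x - mu) / sigma) ** 2

    def trans(i, j):
        return log_stay if i == j else log_move

    score = [emit(values[0], mu) for mu in levels]  # uniform prior over the first state
    backptrs = []
    for x in values[1:]:
        ptr, new_score = [], []
        for j, mu in enumerate(levels):
            best_i = max(range(k), key=lambda i: score[i] + trans(i, j))
            ptr.append(best_i)
            new_score.append(score[best_i] + trans(best_i, j) + emit(x, mu))
        backptrs.append(ptr)
        score = new_score

    # Trace back the most likely state path
    state = max(range(k), key=lambda j: score[j])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()

    # Collapse runs of identical states into segments
    segments, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            segments.append((start, i, levels[path[start]]))
            start = i
    return segments

signal = [0.0] * 10 + [0.6] + [0.0] * 10
print(slm_segment(signal, [0.0, 0.5, 1.0], eta=3e-3))
# [(0, 21, 0.0)]
print(slm_segment(signal, [0.0, 0.5, 1.0], eta=0.5))
# [(0, 10, 0.0), (10, 11, 0.5), (11, 21, 0.0)]
```

With a small eta, the single outlier is absorbed into the surrounding segment because two state changes cost more than one poorly fitting point; with a large eta, the same outlier becomes its own segment, matching the option description.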
The following options are documented alongside the segmentation options because the two are closely related. Once provisional copy number (CN) and minor copy number (MCN) calls have been made on the segments from the segmentation stage, given the selected purity/ploidy model, adjacent segments with the same CN and MCN are joined into a single segment. This is repeated until no two adjacent segments satisfy the merging criteria. Segment merging is a critical step that compensates for over-segmentation (over-fragmentation) at the segmentation stage. However, segment merging cannot split segments apart, so it cannot compensate in the other direction: segmentation can afford some over-segmentation, but there is no compensatory mechanism for under-segmentation. These options control segment merging in somatic analyses and do not depend on the segmentation option settings.
--cnv-merge-distance
Maximum gap in base pairs between two adjacent segments that still allows them to be merged. The default is 10000 for somatic WGS, meaning segments must be within 10 kb of each other. For WES, the default is effectively unlimited, since target intervals are inherently non-contiguous.
--cnv-merge-threshold
Maximum difference in segment mean (linear copy ratio) between two adjacent segments that still allows them to be merged. The default is 0.025 for somatic WGS and 0.4 for somatic WES.
Setting --cnv-merge-threshold to zero disables segment merging entirely. This is not recommended.
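The merging rules described above can be sketched as a repeated left-to-right pass over sorted segments. How DRAGEN combines the CN/MCN match with --cnv-merge-distance and --cnv-merge-threshold, and how it recomputes the merged segment mean, are assumptions here (a length-weighted mean is used). The Segment class and the merge_segments function are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass
class Segment:
    contig: str
    start: int
    end: int
    cn: int       # provisional total copy number
    mcn: int      # provisional minor copy number
    mean: float   # linear copy ratio

def can_merge(a, b, merge_distance, merge_threshold):
    return (
        a.contig == b.contig
        and a.cn == b.cn
        and a.mcn == b.mcn
        and b.start - a.end <= merge_distance
        and abs(b.mean - a.mean) < merge_threshold  # threshold 0 means never merge
    )

def merge_segments(segments, merge_distance=10_000, merge_threshold=0.025):
    # Work on copies so the caller's segments are left untouched
    segs = sorted((replace(s) for s in segments), key=lambda s: (s.contig, s.start))
    changed = True
    while changed:  # repeat until no adjacent pair satisfies the criteria
        changed = False
        merged = []
        for seg in segs:
            prev = merged[-1] if merged else None
            if prev is not None and can_merge(prev, seg, merge_distance, merge_threshold):
                len_a, len_b = prev.end - prev.start, seg.end - seg.start
                prev.mean = (prev.mean * len_a + seg.mean * len_b) / (len_a + len_b)
                prev.end = max(prev.end, seg.end)
                changed = True
            else:
                merged.append(seg)
        segs = merged
    return segs

segments = [
    Segment("chr1", 0, 50_000, 2, 1, 1.00),
    Segment("chr1", 55_000, 90_000, 2, 1, 1.01),     # 5 kb gap, similar mean: merged
    Segment("chr1", 200_000, 300_000, 2, 1, 1.00),   # 110 kb gap: kept separate
    Segment("chr1", 300_000, 400_000, 3, 1, 1.50),   # different CN: kept separate
]
merged = merge_segments(segments)
print([(s.start, s.end) for s in merged])
# [(0, 90000), (200000, 300000), (300000, 400000)]
```

Using a strict less-than comparison on the mean difference reproduces the documented behavior that setting the merge threshold to zero disables merging.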
You can also specify the following additional CBS (circular binary segmentation) options:
--cnv-cbs-alpha
Specifies the significance level for the test to accept change points. The default is 0.01.
--cnv-cbs-eta
Specifies the Type I error rate of the sequential boundary for early stopping when using the permutation method. The default is 0.05.
--cnv-cbs-kmax
Specifies maximum width of smaller segment for permutation. The default is 25.
--cnv-cbs-min-width
Specifies the minimum number of markers for a changed segment. The default is 2.
--cnv-cbs-nmin
Specifies the minimum length of data for maximum statistic approximation. The default is 200.
--cnv-cbs-nperm
Specifies the number of permutations used for p-value computation. The default is 10000.
--cnv-cbs-trim
Specifies the proportion of data to be trimmed for variance calculations. The default is 0.025.
For more information, see segmentation
Purity/Ploidy model selection options
--cnv-use-somatic-vc-vaf
Use the variant allele frequencies (VAFs) from the somatic SNVs to help select the tumor model for the sample. For more information, see VAF-aware Mode.
--cnv-somatic-essential-genes-bed
BED file containing genes where the model should not predict HOMDEL
--cnv-somatic-enable-het-calling
Enable HET-calling mode for heterogeneous segments.
--cnv-somatic-enable-lower-ploidy-limit
Enable check on lower ploidy limit based on essential genes
--cnv-normal-cnv-vcf
Specify germline CNVs from the matched normal sample. For more information, see Germline-aware Mode.
--cnv-somatic-min-purity
Specify minimum purity to consider
--cnv-somatic-max-purity
Specify maximum purity to consider
--cnv-ascn-min-ploidy
Specify minimum ploidy to consider
--cnv-ascn-max-ploidy
Specify maximum ploidy to consider
For more information, see ASCN calling
Filtering Options
--cnv-enable-ref-calls
Emit copy-neutral (REF) calls in the output VCF (default true for WGS, false for WES)
--cnv-filter-qual
QUAL value at which to hard filter CNV VCF (default 40 for WGS/WES, 90 for WES depth-only)
--cnv-filter-length
Minimum event length (bp) for PASS calls (default 10000 for WGS, 0 for WES)
--cnv-filter-del-mean
SM value used to hard filter DELs in CNV VCF (Somatic WGS)
--cnv-filter-dup-mean
SM value used to hard filter DUPs in CNV VCF (Somatic WGS)
--cnv-filter-cnloh-maf
MAF value used to hard filter CNLOHs in CNV VCF (Somatic WGS)
--cnv-somatic-filter-het-length
Minimum event length used to hard filter subclonal calls in the CNV VCF
--cnv-post-vcf-target-bed
BED file to keep only VCF entries overlapping with target regions
If --cnv-post-vcf-target-bed is specified, VCF records that do not overlap the provided BED intervals are filtered out. This is a post‑processing hard filter applied only to the output VCF and does not affect any upstream workflow steps or CNV modeling.
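The QUAL and length hard filters, plus a --cnv-post-vcf-target-bed style overlap test, can be illustrated with a small sketch that assigns the cnvQual and cnvLength FILTER values documented under VCF Output. The thresholds mirror the WGS defaults listed above. The record layout, the function names, and the 0-based, half-open coordinate convention are assumptions, and the other filters (for example the SM- or MAF-based ones) are omitted.

```python
def assign_filters(records, min_qual=40.0, min_length=10_000):
    """Return a FILTER string per record: 'PASS' or ';'-joined failed filter names."""
    filters = []
    for rec in records:
        failed = []
        if rec["qual"] < min_qual:
            failed.append("cnvQual")
        if rec["length"] < min_length:
            failed.append("cnvLength")
        filters.append(";".join(failed) if failed else "PASS")
    return filters

def overlaps_targets(contig, start, end, targets):
    """True if [start, end) overlaps any (contig, start, end) target interval."""
    return any(c == contig and s < end and start < e for c, s, e in targets)

records = [
    {"qual": 55.0, "length": 250_000},
    {"qual": 12.0, "length": 250_000},
    {"qual": 80.0, "length": 4_000},
    {"qual": 5.0, "length": 100},
]
print(assign_filters(records))
# ['PASS', 'cnvQual', 'cnvLength', 'cnvQual;cnvLength']

targets = [("chr1", 1_000, 2_000)]
print(overlaps_targets("chr1", 1_999, 5_000, targets), overlaps_targets("chr1", 2_000, 5_000, targets))
# True False
```

As described above, the target BED overlap check is a post-processing step: it only decides which records are written, and it does not feed back into segmentation or model selection.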
Other Options
--cnv-enable-tracks
Enables generation of IGV track files
--cnv-generate-pon-metric-file
Generate PON metric file for WES/targeted panel
--cnv-exclude-bed
BED file specifying intervals to exclude from analysis
--cnv-exclude-bed-min-overlap
Minimum overlap fraction for exclusion (default 0.5)
--cnv-sex-genotyper-num-interval-requirement
Number of sex contig intervals required by the sex genotyper (default 300)
CNV Output Files
The somatic CNV workflow generates the following output files:
<prefix>.tumor.target.counts.gz: Raw target counts before bias correction (gzipped TSV)
<prefix>.tumor.target.counts.gc-corrected.gz: GC-bias corrected target counts (gzipped TSV)
<prefix>.tumor.ballele.counts.gz: B-allele counts at candidate heterozygous SNP sites, from the matched normal or a population SNP catalog (gzipped TSV)
<prefix>.baf.bedgraph.gz: B-allele frequency in bedGraph format (gzipped bedGraph)
<prefix>.tn.tsv.gz: Tangent-normalized coverage signal (gzipped TSV)
<prefix>.cnv.excluded_intervals.bed.gz: Target regions excluded from analysis (gzipped BED)
<prefix>.cnv.pon_metrics.tsv.gz: Coverage statistics of the PON per interval (gzipped TSV)
<prefix>.cnv.pon_correlation.txt.gz: Correlation between the case sample and the PON samples (gzipped TSV)
<prefix>.seg: Segmentation results for depth and BAF (TSV)
<prefix>.cnv.purity.coverage.models.tsv: Model likelihood scores for purity/ploidy estimation (TSV)
<prefix>.cnv.vcf.gz: Primary CNV calls, VCF v4.4 by default (gzipped VCF)
<prefix>.cyto.vcf.gz: Cytogenetics-compatible calls, if enabled (gzipped VCF)
<prefix>.cnv_metrics.csv: Summary metrics including predicted sex (CSV)
<prefix>.cnv.gff3: Variant calls in GFF format (GFF)
<prefix>.tn.bw: Tangent-normalized signal track (BigWig)
Target Counts Output
<prefix>.tumor.target.counts.gz
Compressed tab-delimited file containing the read count per target interval. This is the raw signal as extracted from the alignments in the BAM or CRAM file. The format is identical for the case sample and for any panel of normals samples. A bigWig representation of a target.counts.diploid file is also available; it is normalized to the normal ploidy level of 2 instead of containing raw counts.
Columns:
Contig identifier
Start position
End position
Target interval name
Count of alignments in this interval
Count of improperly paired alignments in this interval
Header lines starting with # contain the DRAGEN version, command line, and other meta information.
Example:
For more information, see Target Counts File
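For downstream analysis, the target counts file can be read with a few lines of Python. This sketch assumes only what is described above: # header lines followed by six tab-separated columns. It parses counts as floats so that the same reader also works for the GC-corrected file. The demo file and its values are made up, and read_target_counts is a hypothetical helper.

```python
import gzip
import os
import tempfile
from typing import NamedTuple

class TargetCount(NamedTuple):
    contig: str
    start: int
    end: int
    name: str
    count: float          # raw count (or GC-corrected count in the gc-corrected file)
    improper_pairs: int

def read_target_counts(path):
    """Read a *.target.counts.gz-style file, skipping '#' header lines."""
    rows = []
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue
            contig, start, end, name, count, improper = line.rstrip("\n").split("\t")[:6]
            rows.append(TargetCount(contig, int(start), int(end), name, float(count), int(improper)))
    return rows

# Usage with a small, made-up file
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.tumor.target.counts.gz")
    with gzip.open(demo, "wt") as fh:
        fh.write("#hypothetical header line\n")
        fh.write("chr1\t0\t1000\ttarget-1\t153\t2\n")
        fh.write("chr1\t1000\t2000\ttarget-2\t140\t0\n")
    rows = read_target_counts(demo)
print(sum(r.count for r in rows))  # 293.0
```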
GC-Corrected Counts Output
<prefix>.tumor.target.counts.gc-corrected.gz
Contains GC-corrected read counts per target interval. The format is equivalent to the *.target.counts.gz file:
Contig identifier
Start position
End position
Target interval name
GC-corrected read counts in this interval
Count of improperly paired alignments in this interval
Example:
For more information, see GC bias correction
B-Allele Counts
In somatic ASCN runs, B-allele counts are calculated at sites in the tumor sample where the normal sample is likely to be heterozygous. When analyzed in conjunction with a matched normal sample, the sites are those that are called as heterozygous SNVs in the normal sample. When analyzed in tumor-only mode, sites are selected from a population collection (similar to germline ASCN runs). Each B-allele site consists of a reference allele and a variant allele, and the number of reads in the sample supporting each of these alleles is counted.
B-allele counts are written both to a gzipped TSV file (*.ballele.counts.gz) and to a gzipped bedGraph file (*.baf.bedgraph.gz).
<prefix>.ballele.counts.gz
Columns:
Contig identifier
Start, BED-style (zero-based inclusive) start position of the reference allele
Stop, BED-style (one-based inclusive) stop position of the reference allele
Base sequence for the reference allele
Base sequence for the first allele being counted
Base sequence for the second allele being counted
The number of qualified reads containing a sequence matching the first allele
The number of qualified reads containing a sequence matching the second allele
Additionally, in the case of B-allele sites from a population VCF, the following two additional columns are added after the columns listed above:
Population frequency for the first allele
Population frequency for the second allele
Example:
B-Allele Counts BED Graph
<prefix>.baf.bedgraph.gz
B-allele frequency in bedgraph format. Allele count ratios are calculated by sorting alleles according to base priority {A, T, G, C} (descending), producing frequencies deterministically distributed above and below 0.5. This provides easy visualization in IGV of significant BAF changes between neighboring segments.
Example:
Normalization Output
<prefix>.tn.tsv.gz
Contains the normalized signal of the case sample per target interval, i.e., the log2-transformed copy ratio signal. A strong signal deviation from 0.0 indicates a potential for a CNV event. The format is equivalent to the *.target.counts.gz file:
Contig identifier
Start position
End position
Target interval name
Log2-transformed copy ratio in this interval
Count of improperly paired alignments in this interval
Header lines starting with # are also included. In some cases, the normalization counts can be patched internally with intervals from other processes, such as the SegDups extension. In such cases, the patches are listed, in order of application, in header lines starting with #patch:, and the original (unpatched) *.tn.tsv.gz is renamed to *.tn.unpatched.tsv.gz. The unpatched file is reported for inspection only; most use cases should use the patched *.tn.tsv.gz file downstream of normalization.
An example of a *.tn.tsv.gz file is shown below.
For more information, see Normalization
Excluded Intervals Output
<prefix>.cnv.excluded_intervals.bed.gz
To improve accuracy, the DRAGEN CNV Pipeline excludes genomic intervals where one or more target intervals failed at least one quality requirement. The excluded intervals are reported in the *.cnv.excluded_intervals.bed.gz file. This BED-format file identifies the regions of the genome that are not callable for CNV analysis, and its fourth column describes why each interval was excluded. The following are the possible reasons for exclusion.
Example:
4th column provides reason for excluded intervals
For more information, see Excluded Intervals File
PON Metrics Output
<prefix>.cnv.pon_metrics.tsv.gz
The DRAGEN CNV Pipeline generates the PON Metrics File (.cnv.pon_metrics.tsv.gz) if a panel of normals is provided and --cnv-generate-pon-metric-file is set to true. If the PON contains fewer than two samples, an empty file is generated.
The PON Metric File includes basic statistics of the coverage profile for each interval. To remove sample coverage bias, DRAGEN applies sample median normalization, and then computes the metrics.
Example:
For more information, see PON Metrics File
PON Correlation Output
<prefix>.cnv.pon_correlation.txt.gz
The DRAGEN CNV Pipeline generates the PON Correlation File (.cnv.pon_correlation.txt.gz) if a Panel of Normals is provided. The PON Correlation File includes correlation between CASE sample and each PON sample.
Example:
For more information, see PON Correlation File
PON Combined Counts Output
<prefix>.combined.counts.txt.gz
If PON samples are provided with --cnv-normals-file or --cnv-normals-list, DRAGEN CNV generates a single combined PON file that can be reused in later runs with the --cnv-combined-counts option.
Example:
Segmentation Results
<prefix>.seg
Contains the segments produced by the segmentation algorithm. The Segment_Mean value of a segment is the ratio of the mean of that segment to the whole-sample median, without log transformation (linear copy-ratio). A strong signal deviation from 1.0 indicates a potential for a CNV event.
The file has the following columns:
Sample name
Contig identifier
Start position
End position
Number of intervals in the segment
Linear copy-ratio of the segment
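The Segment_Mean definition above amounts to dividing each segment's mean by the median of the whole sample. A minimal sketch, assuming the per-interval signal is a list of bias-corrected counts and that segments are given as half-open index ranges (segment_means is a hypothetical name):

```python
import statistics

def segment_means(values, segments):
    """Linear Segment_Mean: mean of each segment divided by the whole-sample median.

    values: per-interval signal; segments: (start_index, end_index_exclusive) pairs.
    """
    sample_median = statistics.median(values)
    return [statistics.mean(values[s:e]) / sample_median for s, e in segments]

counts = [100, 100, 100, 100, 100, 150, 150, 150, 50]
print(segment_means(counts, [(0, 5), (5, 8), (8, 9)]))  # [1.0, 1.5, 0.5]
```

In a pure, diploid sample, a Segment_Mean of 1.5 corresponds to three copies and 0.5 to one copy; in tumor samples, purity and overall ploidy shift these ratios.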
An example of a *.seg file is shown below.
BAF Segmentation Output
<prefix>.baf.seg
In addition to segmentation of target counts, some workflows perform segmentation of B-allele loci. The output file has the suffix *.baf.seg and the same format as the *.seg file, with two modifications. First, the Segment_Mean value is the mean, over B-allele loci, of the smaller observed allele fraction. Second, there is an additional column:
BAF_SLM_STATE: Integer between 0 and 10 indicating bins of minor-allele fraction (low to high), or "." when the BAF data are too variable to estimate a minor-allele fraction
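The BAF Segment_Mean described above can be sketched as follows, assuming per-locus allele counts like those in the *.ballele.counts.gz columns (baf_segment_mean is a hypothetical name). Loci with zero total reads are skipped.

```python
def baf_segment_mean(allele_counts):
    """Mean over loci of the smaller observed allele fraction.

    allele_counts: (first_allele_reads, second_allele_reads) per B-allele locus.
    """
    fractions = [min(a, b) / (a + b) for a, b in allele_counts if a + b > 0]
    return sum(fractions) / len(fractions) if fractions else float("nan")

balanced = [(20, 20), (15, 25), (25, 15)]
loh = [(0, 30), (1, 29), (0, 25), (0, 0)]
print(round(baf_segment_mean(balanced), 4))  # 0.4167
print(round(baf_segment_mean(loh), 4))       # 0.0111
```

Because the smaller allele fraction is taken at each locus, counting noise pulls balanced segments somewhat below 0.5, whereas segments with loss of heterozygosity approach 0.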
An example of BAF segmentation output file is shown below:
Purity/Coverage Models Output
<prefix>.cnv.purity.coverage.models.tsv
Contains the tested purity and diploid-coverage models along with their log-likelihood scores. Each row corresponds to a candidate model evaluated by the ASCN caller during model selection.
Columns:
Model purity (Cellularity) — fraction of cells in the sample due to tumor [0, 1]
Model diploid coverage — expected read count for a target bin in a diploid region
Model log-likelihood — log-likelihood score for this purity/coverage hypothesis
Approximate ploidy - approximate sample ploidy estimation before CNV calling, derived from the sample mean coverage
Failed constraints - model search constraints that were not satisfied by the model
The model with the highest log-likelihood is selected as the best estimate of tumor purity and ploidy. The selected purity is reported as EstimatedTumorPurity in the VCF header.
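Selecting the best model from *.cnv.purity.coverage.models.tsv reduces to taking the row with the highest log-likelihood. This sketch assumes tab-separated rows whose first three columns are in the order listed above and skips # lines and any non-numeric header row. The example values are invented, Model and parse_models are hypothetical names, and the Failed constraints column is ignored rather than interpreted as DRAGEN does.

```python
from typing import NamedTuple

class Model(NamedTuple):
    purity: float
    diploid_coverage: float
    log_likelihood: float

def parse_models(lines):
    models = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        try:
            purity, coverage, loglik = (float(x) for x in fields[:3])
        except ValueError:
            continue  # non-numeric row, e.g. a column header
        models.append(Model(purity, coverage, loglik))
    return models

def best_model(models):
    return max(models, key=lambda m: m.log_likelihood)

rows = [
    "purity\tdiploid_coverage\tlog_likelihood",   # hypothetical header row
    "0.35\t41.0\t-10234.5",
    "0.60\t38.5\t-9876.1",
    "0.90\t36.0\t-10011.7",
]
best = best_model(parse_models(rows))
print(best.purity, best.diploid_coverage)  # 0.6 38.5
```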
Example:
VCF Output
<prefix>.cnv.vcf.gz
The CNV VCF file follows the standard VCF format v4.4. The VCF header is annotated with ##source=<DRAGEN_SOURCE>, where <DRAGEN_SOURCE> identifies the caller which produced the VCF, e.g.:
DRAGEN_ASCN: CNV caller
DRAGEN_ASCN_SV: CNV caller + SV support
DRAGEN_CNV: legacy depth-only CNV caller (note: for legacy reasons this caller uses VCF version v4.2)
Due to the nature of how CNV events are represented, not all fields are applicable. In general, if more information is available about an event, then the information is annotated. To include copy neutral (REF) calls, set --cnv-enable-ref-calls to true. AOH/LOH events are not available in the legacy depth-only caller.
Example Records
Header
The VCF header includes somatic-specific fields in addition to the common CNV header lines:
ModelSource
basis on which the final tumor model was chosen (e.g., DEPTH+BAF, DEPTH+BAF_DOUBLED, VAF, SAMPLE_MEDIAN).
EstimatedTumorPurity
fraction of cells in the sample due to tumor. Range: [0, 1] or NA if a confident model could not be determined.
DiploidCoverage
expected read count for a target bin in a diploid region.
OverallPloidy
length-weighted average of copy number for PASS events in the tumor fraction.
OutlierBafFraction
fraction of B-allele frequencies incompatible with their segment call. High values may indicate a mismatched normal, cross-sample contamination, or bone marrow transplantation.
AlternativeModelDedup/AlternativeModelDup
alternative models corresponding to one fewer or one more whole-genome duplication, given as (purity, diploid_coverage). Useful for manual investigation.
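OverallPloidy, as defined above, is a length-weighted mean of copy number over PASS events. A sketch under the assumption that each event is given as (length, CN, FILTER); whether REF records are included in DRAGEN's calculation is not stated here, so the function simply averages whatever PASS records it receives (overall_ploidy is a hypothetical name).

```python
import math

def overall_ploidy(events):
    """Length-weighted mean copy number over PASS events.

    events: (length_bp, copy_number, filter_value) tuples.
    """
    passing = [(length, cn) for length, cn, flt in events if flt == "PASS"]
    total = sum(length for length, _ in passing)
    if total == 0:
        return math.nan
    return sum(length * cn for length, cn in passing) / total

events = [
    (100_000_000, 2, "PASS"),
    (60_000_000, 3, "PASS"),
    (40_000_000, 1, "PASS"),
    (5_000, 0, "cnvLength"),   # filtered, so ignored
]
print(overall_ploidy(events))  # 2.1
```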
Records
All coordinates in the VCF are 1-based.
CHROM
The chromosome (or contig) on which the copy number variant occurs.
POS
Start position of the variant. If any of the ALT alleles is a symbolic allele (e.g., <DEL>), POS denotes the coordinate of the base preceding the polymorphism.
ID
Encodes the event type and coordinates of the event (1-based, inclusive). Event types include GAIN, LOSS, REF, CNLOH, and GAINLOH.
REF
Contains N for all CNV events.
ALT
Specifies the type of CNV event: <DEL>, <DUP>, or <LOH>. REF calls have ALT set to ".". With --cnv-enable-legacy-vcf-format (VCF v4.2), the ALT field contains <DEL>,<DUP> in place of <LOH> for AOH/LOH events.
QUAL
Estimated quality score used in hard filtering. Note: different workflows provide different QUAL score distributions - it is recommended to compare QUAL scores only within results from the same workflow (e.g., it is incorrect to compare QUAL scores between the CNV caller and the legacy (depth-only) CNV caller).
FILTER
The FILTER column contains PASS if the CNV event passes all filters, otherwise the column contains the name of the failed filter. Default values are defined in the header line for each available FILTER.
binCount
CNV events with a bin count lower than a threshold.
cnvLength
The length of the CNV is lower than a threshold.
cnvQual
The QUAL of the CNV is lower than a threshold.
INFO
The INFO column contains information representing the event.
REFLEN
Length of the event.
SVLEN
Length of the event. Only present for non-REF records. Note: in VCF v4.2 format (enabled with --cnv-enable-legacy-vcf-format), SVLEN is a signed representation of REFLEN (e.g., a negative value indicates a deletion).
SVTYPE
Always CNV. Only present for non-REF records.
END
End position of the event (1-based, inclusive).
LOHTYPE
Type of loss of heterozygosity. Possible values: CNLOH (Copy-Neutral LOH), GAINLOH (LOH with copy number gain).
HET
Tag identifying subclonal (heterogeneous) calls, present when --cnv-somatic-enable-het-calling is set
CIPOS
Confidence interval around the nominal POS.
CIEND
Confidence interval around the nominal END.
The meanings of the SVLEN, SVTYPE, END, CIPOS, and CIEND fields match their VCF v4.2 definitions.
If a segment BED file is used, the segment identifier is carried over from the input to the SEGID field.
When Germline-aware Mode is enabled, DRAGEN annotates somatic VCF entries with:
NCN
Germline copy number from the matched normal sample.
SCND
Somatic copy number difference relative to the germline copy number.
When matching CNV with SV output, additional INFO annotations are added.
FORMAT
The common FORMAT fields are described in the header:
GT
Genotype
SM
Linear copy ratio of the segment mean
CN
Estimated total copy number of tumor fraction
BC
Number of read count bins
PE
Number of improperly paired end reads at start and stop breakpoints
AS
Number of allelic read count sites
CNF
Floating point estimate of copy number
CNQ
Exact total copy number Q-score
MAF
Estimate for the minor allele frequency
MCN
Estimated minor-haplotype copy number
MCNF
Floating point estimate of minor-haplotype copy number
MCNQ
Minor copy number Q-score
MF
Mosaic fraction estimate (for MOSAIC calls)
OBF
Per-segment Outlier BAF Fraction. Percentage of BAF counts which are considered "outlier" with respect to the chosen segment call. Higher values might indicate segments where BAF counts are problematic.
SD
Best estimate of segment's bias-corrected read count
For more information, see CNV VCF.
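The INFO and FORMAT fields above are easiest to work with as dictionaries. Below is a minimal parser for a single data line of *.cnv.vcf.gz, assuming a single-sample VCF. The example record is invented for illustration and is not actual DRAGEN output, and parse_cnv_record is a hypothetical name; for production use, an established VCF parsing library is the safer choice.

```python
def parse_cnv_record(line):
    """Split one single-sample VCF data line into fields, INFO dict, and FORMAT dict."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, rec_id, ref, alt, qual, flt, info_str, fmt, sample = fields[:10]
    info = {}
    for item in info_str.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            info[key] = value
        elif item and item != ".":
            info[item] = True  # flag-style INFO entry without a value
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": rec_id,
        "ref": ref,
        "alt": alt,
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        "info": info,
        "format": dict(zip(fmt.split(":"), sample.split(":"))),
    }

# Invented record for illustration only
example = ("chr1\t1000000\tDRAGEN:LOSS:chr1:1000001-2000000\tN\t<DEL>\t85\tPASS\t"
           "REFLEN=1000000;SVLEN=1000000;SVTYPE=CNV;END=2000000\tSM:CN:MCN:BC\t0.62:1:0:1000")
rec = parse_cnv_record(example)
print(rec["info"]["SVTYPE"], rec["info"]["END"], rec["format"]["CN"])  # CNV 2000000 1
```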
Cytogenetics Output
<prefix>.cyto.vcf.gz
The Cytogenetics modality output has a similar format to the standard CNV VCF (*.cnv.vcf.gz). A list of differences is indicated below:
Records can have the INFO/RES field, which indicates the resolution(s) associated with the record.
Records can have the INFO/SEGID field. This field indicates either custom predefined segments provided as input by the user (as in the standard CNV VCF) or Cytogenetics-specific predefined segments, typically whole-arm or whole-chromosome segments injected automatically during caller execution. In the latter case, the field gives the ID or name of the arm or chromosome.
The VCF header is annotated with ##source=DRAGEN_CYTO to indicate that the file was generated by the Cytogenetics modality.
Note: The Cyto VCF also provides resolution-specific homozygosity indexes (i.e., computed on each specific resolution's callset). The default minimum size considered is the same as the main HomozygosityIndex, and for each resolution in output, there will be an additional header line on the Cyto VCF indicating the resulting metric, e.g., ##HomozygosityIndex(25k)=0.001015.
CNV Metrics Output
<prefix>.cnv_metrics.csv
The following metrics are reported:
Sex Genotyper
Estimated sex
Estimated sex of the case sample (and panel of normals samples if applicable).
Confidence score
Range: [0.0, 1.0]. If the sample sex is specified via --sample-sex, this value is 0.0.
DRAGEN Sex Genotyper requires a minimum of 300 target intervals to confidently determine sex genotype; if the panel covers fewer intervals on the sex chromosomes, genotyping will fail and an undetermined genotype is returned. Users may lower this requirement by setting --cnv-sex-genotyper-num-interval-requirement to a smaller value, at the risk of increased false genotype calls.
CNV Summary
Bases in reference genome in use
Average alignment coverage over genome - The average alignment coverage over the genome is calculated by dividing the total number of bases from processed alignment records (excluding those filtered by the Target Counts stage in DRAGEN CNV) by the genome length. Alignment records are filtered taking into consideration duplicate marking status (if available), MAPQ, and mapping status.
Number of alignment records processed
Number of filtered records (total)
Number of filtered records (due to duplicates)
Number of filtered records (due to MAPQ)
Number of filtered records (due to being unmapped)
PMAD - Pairwise Median Absolute Deviation measures the variation in read coverage between adjacent bins. It reflects variability due to various factors, such as DNA degradation, extraction, amplification, or library preparation. Higher values indicate noisier sample data. PMAD is calculated as follows:
Define a vector v[i] as normalized counts of i-th interval in log scale, and d[i] as pairwise differences of consecutive normalized counts between i and i+1 intervals, i.e. d[i] = (v[i] - v[i+1])
PMAD is median absolute deviation of d, i.e. PMAD = Median(|d[i]-Median(d)|)
Coverage MAD - Median absolute deviation of normalized case counts. Higher values indicate noisier sample data.
Median Bin Count - Median of raw counts normalized by interval size.
Number of target intervals
Number of normal samples
Number of segments
Number of amplifications - Note: GAINLOH events (ALT=LOH and CN > 2) are also included here
Number of deletions
Number of CNLOHs (Copy-Neutral LOHs)
Number of PASS amplifications - Note: GAINLOH events (ALT=LOH and CN > 2) are also included here
Number of PASS deletions
Number of PASS CNLOHs (Copy-Neutral LOHs)
Post-Normalization Bin Count Sigma - Standard deviation of post-PoN-normalization median-normalized coverage values.
Coverage MAD and Median Bin Count are only printed for WES germline/somatic CNV. Post-Normalization Bin Count Sigma is only printed when PoN normalization has been applied.
Example:
For more information, see CNV Metrics
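The PMAD, Coverage MAD, and Median Bin Count definitions above translate directly into code. The following Python sketch is illustrative only: it assumes the inputs are already normalized (log scale for PMAD), and it assumes an unscaled MAD (no 1.4826 consistency factor), which the text does not state. The function names are hypothetical.

```python
from statistics import median
from typing import Sequence


def pmad(log_normalized_counts: Sequence[float]) -> float:
    """PMAD as defined above: Median(|d[i] - Median(d)|), with d[i] = v[i] - v[i+1]."""
    v = list(log_normalized_counts)
    if len(v) < 2:
        raise ValueError("PMAD needs at least two intervals")
    d = [v[i] - v[i + 1] for i in range(len(v) - 1)]
    center = median(d)
    return median(abs(x - center) for x in d)


def coverage_mad(normalized_counts: Sequence[float]) -> float:
    """Unscaled median absolute deviation of normalized case counts (scaling assumed)."""
    center = median(normalized_counts)
    return median(abs(x - center) for x in normalized_counts)


def median_bin_count(raw_counts: Sequence[float], interval_sizes: Sequence[int]) -> float:
    """Median of raw counts divided by their interval sizes."""
    if len(raw_counts) != len(interval_sizes):
        raise ValueError("counts and sizes must have the same length")
    return median(c / s for c, s in zip(raw_counts, interval_sizes))
```

Because PMAD works on differences between adjacent bins, a true copy number step contributes only a single large difference, whereas bin-to-bin noise inflates every difference; this is why PMAD tracks sample noise rather than biology.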
Track Files (IGV)
To generate additional equivalent bigWig and GFF files, set the --cnv-enable-tracks option to true. These files can be loaded into IGV along with other available tracks, such as RefSeq genes. Using these tracks alongside publicly available tracks makes calls easier to interpret. When DRAGEN CNV generates tracks, DRAGEN also autogenerates an IGV session XML file. The *.cnv.igv_session.xml file can be loaded directly into IGV for analysis.
The following IGV tracks are automatically populated in the output IGV session file:
*.target.counts.bw
BigWig representation of target counts bins. Values are GC-corrected if GC correction was performed.
Barchart or points
*.improper_pairs.bw
BigWig representation of improper pairs counts.
Barchart
*.tn.bw
BigWig representation of the tangent normalized signal.
Points
*.seg.bw
BigWig representation of the segments.
Points
*.baf.seg.bw
BigWig representation of BAF segments (if available).
Points
*.baf.bedgraph.gz
bedGraph representation of B-allele frequency (if available).
Points
*.cnv.gff3
GFF3 representation of CNV events: DEL=blue, DUP=red, filtered=light gray, REF=green (if enabled), AOH/LOH=magenta. An example is shown below (different workflows may output different attributes in the 9th column).
—
Example GFF3 output:
IGV Session

File extension: *.igv_session.xml
The IGV session XML file is prepopulated with track files generated by DRAGEN. The session file loads the reference genome that best matches the standard reference genomes in an IGV installation, by comparing the name of the --ref-dir specified on the command-line. Standard UCSC human reference genomes are autodetected, but any variations from the standard reference genomes might not be autodetected. To edit the genome detection, alter the genome attribute in the Session element to the reference genome you would like for analysis before loading into IGV. The reference identifier used by IGV might differ from the actual name of the genome. The following is an example edited session file.
Note that, depending on the installed IGV version, IGV may come prepackaged with different flavors of GRCh37. Because reference naming conventions have changed, you may have to edit the genome attribute in the XML file directly. For example, IGV has traditionally packaged a b37 reference genome, but may also include a 1kg_v37 or a 1kg_b37+decoy, which appear in the IGV user interface as "1kg, b37" or "1kg, b37+decoy", respectively.
You can determine the correct encoding of a reference genome by going to File > Save Session... and then inspecting the generated igv_session.xml file.
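If you prefer to script the edit rather than change the file by hand, the following Python sketch rewrites the genome attribute of the Session element using the standard library's xml.etree.ElementTree. It assumes the session root element is <Session>, as described above; the identifier to use (for example 1kg_v37) must be the one your IGV installation reports, as determined with the Save Session procedure. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def set_session_genome(session_xml: str, genome_id: str) -> str:
    """Return session XML with the Session element's genome attribute replaced."""
    root = ET.fromstring(session_xml)
    if root.tag != "Session":
        raise ValueError(f"expected a <Session> root element, found <{root.tag}>")
    root.set("genome", genome_id)
    return ET.tostring(root, encoding="unicode")


def rewrite_session_file(src: str, dst: str, genome_id: str) -> None:
    """File-based variant: read src, set the genome attribute, and write dst."""
    tree = ET.parse(src)
    root = tree.getroot()
    if root.tag != "Session":
        raise ValueError(f"expected a <Session> root element, found <{root.tag}>")
    root.set("genome", genome_id)
    tree.write(dst, encoding="UTF-8", xml_declaration=True)
```

ElementTree discards XML comments by default, so writing to a new file (dst) and keeping the original session file is the safer choice.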

Germline-aware Mode
To specify germline CNVs from a matched normal sample, use --cnv-normal-cnv-vcf. When specified, CNV records marked as PASS in the normal sample are used during tumor-sample segmentation to ensure that confident germline CNV boundaries are also boundaries in the somatic output. Segments with germline copy number changes relative to the reference ploidy are excluded from somatic model selection. During somatic copy number calling and scoring, the germline copy number is used to modify the expected depth contribution from the normal contamination fraction of the tumor sample. This leads to more accurate assignment of somatic copy number in regions of germline CNV. DRAGEN then annotates the somatic WGS CNV VCF entries with the germline copy number (NCN) and the somatic copy number difference relative to germline (SCND) for segments that have germline CNVs.
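To illustrate why the germline copy number matters for the normal contamination fraction, the following Python sketch uses a textbook two-component purity mixture. This is an assumption for illustration, not DRAGEN's documented scoring model; SCND is also assumed here to be the plain difference between tumor CN and NCN. The function names are hypothetical.

```python
def expected_depth_ratio(
    purity: float,
    tumor_cn: float,
    normal_cn: float = 2.0,
    tumor_ploidy: float = 2.0,
    normal_ploidy: float = 2.0,
) -> float:
    """Expected segment depth relative to the sample-wide average depth.

    Assumed two-component mixture: a fraction `purity` of cells carries tumor_cn
    copies; the contaminating normal cells carry the germline normal_cn copies.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be in [0, 1]")
    segment_copies = purity * tumor_cn + (1.0 - purity) * normal_cn
    average_copies = purity * tumor_ploidy + (1.0 - purity) * normal_ploidy
    return segment_copies / average_copies


def somatic_cn_difference(tumor_cn: int, germline_cn: int) -> int:
    """SCND-style value, assumed to be the plain difference from the germline CN (NCN)."""
    return tumor_cn - germline_cn
```

For example, at 50% purity a tumor CN of 3 over a germline duplication (normal_cn=3) gives an expected ratio of 1.5. A model that assumed a diploid normal (normal_cn=2) would need a tumor CN of 4 to explain the same ratio, which is the kind of misassignment that germline-aware mode is designed to avoid.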
Example:
VAF-aware Mode
If both the small variant caller and the CNV caller are enabled in a tumor/matched-normal run, somatic SNV variant allele frequencies (VAFs) can inform purity and ploidy model selection. VAF-based modeling is particularly useful when a tumor has limited copy number variation and/or its CNVs are mostly subclonal (e.g., many liquid tumors), which can prevent the depth+BAF signal from converging on a clear model.
VAF information can also help determine the presence or absence of a whole-genome duplication even in clonal tumors with clear CNVs.
For tumor/matched-normal runs with --enable-variant-caller true, VAF-based modeling is enabled by default. To disable it, set --cnv-use-somatic-vc-vaf false.
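The link between VAF and purity can be made concrete with the standard mixture relation VAF = p*m / (p*c + (1 - p)*n), where p is purity, m is the number of mutated tumor copies, c is the tumor copy number, and n is the normal copy number. The Python sketch below is an illustration under that assumed relation, not DRAGEN's VAF model; the function names are hypothetical.

```python
def expected_vaf(purity: float, mutated_copies: int, tumor_cn: int, normal_cn: int = 2) -> float:
    """VAF of a clonal somatic SNV carried on mutated_copies of tumor_cn tumor copies."""
    alt = purity * mutated_copies
    total = purity * tumor_cn + (1.0 - purity) * normal_cn
    return alt / total


def purity_from_vaf(vaf: float, mutated_copies: int, tumor_cn: int, normal_cn: int = 2) -> float:
    """Invert expected_vaf for purity, given an assumed copy number configuration."""
    denominator = mutated_copies - vaf * tumor_cn + vaf * normal_cn
    if denominator <= 0:
        raise ValueError("VAF is inconsistent with this copy number configuration")
    purity = vaf * normal_cn / denominator
    if not 0.0 <= purity <= 1.0:
        raise ValueError("implied purity is outside [0, 1]")
    return purity
```

A clonal heterozygous SNV with VAF 0.2 implies purity 0.4 in a diploid tumor, but purity 0.25 if the tumor has undergone whole-genome duplication and the mutation sits on 2 of 4 copies. Comparing such VAF-implied purities with the depth+BAF solutions is one way VAFs can help distinguish whole-genome duplication hypotheses.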
Advanced Topics
Cytogenetics Modality
Conventional cytogenetics methodologies typically focus on larger alterations than the ones provided by NGS analyses. The Cytogenetics modality for the CNV caller allows the user to visualize CNAs at different resolutions, aiming at providing a more flexible workspace for different use cases.
It is enabled with --cnv-enable-cyto-output (default true for germline workflows). It is not available for somatic WES workflows.
From the same sample, and during the same run, the Cytogenetics modality starts from the high resolution results (before smoothing) provided in the standard output CNV VCF. The output callset then undergoes multiple rounds of smoothing, going progressively from finer resolution to coarser resolution calls (larger alterations). Each round of smoothing produces a smoothed callset which is set aside and becomes the starting point for callsets with higher degree of smoothing.
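The text does not specify the smoothing algorithm itself. Purely to illustrate the chaining described above (each round starts from the previous round's callset and moves to a coarser resolution), the following Python toy merges nearby same-type calls and then drops calls below the round's minimum size. The merge rule, the gap threshold, and all names are hypothetical; this is not DRAGEN's smoothing method.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Call:
    start: int  # 0-based, inclusive
    end: int    # exclusive
    kind: str   # e.g. "DEL", "DUP", "AOH"

    @property
    def length(self) -> int:
        return self.end - self.start


def smooth_once(calls: List[Call], min_size: int, max_gap: int) -> List[Call]:
    """One toy round: merge same-type calls separated by <= max_gap, then drop short calls."""
    merged: List[Call] = []
    for call in sorted(calls, key=lambda c: c.start):
        previous = merged[-1] if merged else None
        if previous is not None and previous.kind == call.kind and call.start - previous.end <= max_gap:
            merged[-1] = Call(previous.start, max(previous.end, call.end), previous.kind)
        else:
            merged.append(call)
    return [c for c in merged if c.length >= min_size]


def progressive_smoothing(calls: List[Call], resolutions: Dict[str, int]) -> Dict[str, List[Call]]:
    """Run rounds from finest to coarsest; each round starts from the previous callset."""
    callsets: Dict[str, List[Call]] = {}
    current = list(calls)
    for res_id, min_size in sorted(resolutions.items(), key=lambda item: item[1]):
        current = smooth_once(current, min_size=min_size, max_gap=min_size)
        callsets[res_id] = current  # set aside this round's callset
    return callsets
```

With two nearby DEL fragments and a small DUP, the 25k round merges the DELs and drops the DUP, and the merged DEL then carries over into the coarser rounds until it falls below their minimum size.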

At the end of the smoothing procedure, the Cytogenetics modality produces several outputs, including:
Multiple GFF3 files, one for each round of smoothing (extension *cyto.<resolution_ID>.gff3).
A single VCF file, with extension *.cyto.vcf.gz. This file contains all callsets identified through the smoothing iterations, with the iteration identifier stored in the INFO/RES field. Identical alterations across resolutions are deduplicated; in that case, the INFO/RES field contains a comma-separated list of resolution identifiers.
Some resolutions are based on depth of coverage only (no BAF). Their INFO/RES value reflects the original callset used as a starting point, with the suffix _depth added. For example, depth-only calls derived from resolution 1M have resolution ID 1M_depth. Note: calls made at different resolutions or with different information (depth+BAF versus depth-only) may occasionally conflict. For instance, a region that is AOH and also carries a mosaic DEL may be reported as AOH in the depth+BAF calling but as a (mosaic) DEL in the depth-only track. For each resolution, the event type with the strongest evidence is output.
An additional callset that does not conform to the ones above (no INFO/RES field) contains whole-arm/-chromosome aneuploidies. For this callset, all reported records have the chromosome or arm name in the INFO/SEGID field. Entries for this callset are not present in any GFF3 file. For more details, see the section on whole-chromosome aneuploidies below.
A single IGV session file, with extension *.cyto.igv_session.xml, which provides a convenient way to load the multiple GFF3 files and the other typical tracks found in the standard *.cnv.igv_session.xml. Below is an example screenshot of one such IGV session. The first 5 tracks show the DRAGEN CNV calls (Blue/DEL, Green/REF, Magenta/AOH, Red/DUP) at decreasing resolution (from high to low, top to bottom).
The remaining tracks are similar to those in a standard *cnv.igv_session.xml run, e.g., poor mappability regions, target counts coverage, improper pairs, B-allele frequency, etc.

Below is an example set of calls from the *.cyto.vcf.gz output file (note the additional INFO/RES annotation compared with the *.cnv.vcf.gz output file):
Selection of appropriate resolution
Since the most informative resolution may vary depending on circumstances (event sizes, distance between calls, presence of smaller calls causing fragmentation, etc.), no one-size-fits-all recommendation works in all cases. However, some practical recommendations to consider are the following:
Each resolution's INFO/RES ID identifies the minimum size for alterations to be considered PASS.
If only minimal call smoothing is necessary, resolution 25k can provide a good balance, producing calls in size ranges compatible with Chromosomal Microarray (CMA).
When comparing against technologies such as karyotyping, resolution 1M may be more appropriate to reduce call fragmentation.
Note: if the use case under consideration is not impacted by call fragmentation, it is typically recommended to use the *.cnv.vcf.gz or *.cnv_sv.vcf.gz output results (instead of the ones in *.cyto.vcf.gz), to take full advantage of the superior detail of NGS.
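When working with *.cyto.vcf.gz, the callset for a single resolution can be extracted by splitting the comma-separated INFO/RES list, and the whole-arm/-chromosome callset can be identified by the absence of INFO/RES. A minimal Python sketch, assuming plain tab-separated VCF text (for example, lines read with gzip.open(path, "rt")) and treating PASS as the only passing FILTER value; the function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List


def parse_info(info: str) -> Dict[str, str]:
    """Parse a VCF INFO column into a dict; flags map to an empty string."""
    fields: Dict[str, str] = {}
    if info == ".":
        return fields
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key:
            fields[key] = value
    return fields


def records_for_resolution(lines: Iterable[str], resolution: str, pass_only: bool = True) -> Iterator[List[str]]:
    """Yield data records whose INFO/RES list contains `resolution` (e.g. '25k', '1M_depth')."""
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        info = parse_info(cols[7])
        res_ids = info["RES"].split(",") if "RES" in info else []
        if resolution in res_ids and (not pass_only or cols[6] == "PASS"):
            yield cols


def aneuploidy_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the whole-arm/-chromosome callset: records without INFO/RES that carry INFO/SEGID."""
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        info = parse_info(cols[7])
        if "RES" not in info and "SEGID" in info:
            yield cols
```

Because identical alterations are deduplicated across resolutions, a single record can be returned for several resolution IDs.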
Additional options
--cnv-cyto-keep-resolutions=<resolution_list>
Comma-separated list of resolutions to output (currently supported: 25k,50k,500k,1M,1M_depth)
Whole-chromosome Aneuploidy Detection
Some use cases require inspecting a sample at the arm or whole-chromosome level. Typically, this would require an additional caller alongside the standard CNV caller with automated segment detection. In the same run, the Cytogenetics modality provides this set of calls within the same VCF file (with extension *.cyto.vcf.gz).
In the example above, two calls are derived from this callset. The segment ID annotation (INFO/SEGID) provides the name of the segment under consideration (in this example, the q-arm of chromosome 21 and the entire chromosome X). REF calls are not displayed by default unless explicitly requested by the user (i.e., with --cnv-enable-ref-calls true; note that this enables REF calls in both the CNV and CYTO VCF files).
Note: acrocentric chromosomes (13, 14, 15, 21, and 22) have short arms characterized by repetitive regions. These regions create mappability issues and are typically excluded from analysis. Calling short-arm alterations on these chromosomes is therefore challenging, because such calls rest on only a small percentage of the arm's total length. To avoid false positive calls (here, an alteration reported on the full short arm with evidence coming from only a minimal portion of it), the algorithm applies a hard threshold (default 500 intervals) on the minimum number of intervals required when calling whole-arm alterations. When a chromosome arm call does not satisfy this threshold, it is filtered with FILTER chromArmBinCount. The default can be changed with the option --cnv-filter-chrom-arm-bin-count.
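The whole-arm bin-count filter reduces to a simple threshold test. The Python sketch below assumes that a call "satisfies" the threshold when its interval count is at least the minimum (so only strictly smaller counts are filtered); the ArmCall type and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

DEFAULT_MIN_ARM_INTERVALS = 500  # documented default; adjustable via --cnv-filter-chrom-arm-bin-count


@dataclass
class ArmCall:
    segid: str          # arm or chromosome name, as in INFO/SEGID
    num_intervals: int  # intervals supporting the call
    filters: List[str] = field(default_factory=list)

    @property
    def filter_value(self) -> str:
        return ";".join(self.filters) if self.filters else "PASS"


def apply_arm_bin_filter(calls: List[ArmCall], min_intervals: int = DEFAULT_MIN_ARM_INTERVALS) -> List[ArmCall]:
    """Tag calls supported by fewer than min_intervals intervals with chromArmBinCount."""
    for call in calls:
        if call.num_intervals < min_intervals and "chromArmBinCount" not in call.filters:
            call.filters.append("chromArmBinCount")
    return calls
```

A short-arm call on an acrocentric chromosome supported by only a few hundred intervals is filtered, while a q-arm call with thousands of intervals keeps PASS.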
Joint SV/CNV calling
Somatic joint calling matches copy number segments against all SVs, with segment starts and ends matched independently.
Somatic joint calling is not enabled by default and must be enabled with --enable-cnv-sv-somatic true.
To ensure copy number neutral SVs have matching copy number segments, whenever --enable-cnv-sv-somatic is enabled, --cnv-enable-ref-calls is automatically enabled as well.
The following steps are performed:
SV calling is performed.
The SV call set is filtered to only PASS SV records.
For each SV, the breakpoint(s) at which a copy number transition would occur, if it were base-pair consistent with the SV, are obtained.
CNV segmentation is performed to obtain CNV breakpoints.
If --cnv-enable-sv-forced-segmentation is enabled, SV breakpoints are added to the CNV breakpoints, and segments are generated from the combined CNV and SV breakpoints:
If a matching CNV breakpoint is found, the CNV breakpoint is adjusted to the SV breakpoint rather than adding a new breakpoint.
If a matching CNV breakpoint is not found, the SV breakpoint is added. CNV segments are therefore split at the internal SV breakpoints.
CNV calling is performed on the segments.
Adjacent CNV segments in which the END/CIEND of the left segment overlaps the POS/CIPOS of the right segment are adjusted to remove the gap.
CNV segment start and end are matched independently to SV breakends based on POS/CIPOS and END/CIEND, respectively. When multiple SVs match, the innermost position is used.
If SV matching creates a segmentation gap, short CNV segments are created to fill the gaps between SVs. The CN of each short segment is set to the CN of the containing pre-adjusted segment.
SV <DEL>/<DUP> records that correspond to a single CNV <DEL>/<DUP> record are merged into a single VCF record. As with germline joint CNV+SV calling, these VCF records contain both the SV and CNV INFO and FORMAT fields.
The joint call set is written to the .cnv_sv.vcf.gz output file. The .cnv.vcf.gz and .sv.vcf.gz outputs are unaffected.
When --cnv-enable-sv-forced-segmentation is enabled, the somatic joint CNV+SV call set forms a breakpoint graph.
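The breakpoint-combination and segment-matching steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: a CNV breakpoint "matches" an SV breakpoint if it lies within a hypothetical tolerance, and a segment boundary matches an SV breakend if their confidence intervals (CIPOS/CIEND-style offsets) overlap. DRAGEN's actual matching criteria are not given in the text, and all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Breakend:
    sv_id: str
    pos: int
    ci: Tuple[int, int] = (0, 0)  # (lower, upper) offsets, like CIPOS/CIEND

    def window(self) -> Tuple[int, int]:
        return self.pos + self.ci[0], self.pos + self.ci[1]


def _overlaps(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def combine_breakpoints(cnv_bps: Sequence[int], sv_bps: Sequence[int], tolerance: int) -> List[int]:
    """Snap the nearest CNV breakpoint to each SV breakpoint within tolerance; else add it."""
    result = sorted(cnv_bps)
    for sv_bp in sorted(sv_bps):
        nearest = min(result, key=lambda b: abs(b - sv_bp), default=None)
        if nearest is not None and abs(nearest - sv_bp) <= tolerance:
            result[result.index(nearest)] = sv_bp  # adjust instead of adding a breakpoint
        else:
            result.append(sv_bp)  # no match: the new breakpoint splits a segment
        result.sort()
    return result


def match_segment(
    pos: int, cipos: Tuple[int, int], end: int, ciend: Tuple[int, int], breakends: Sequence[Breakend]
) -> Tuple[Optional[Breakend], Optional[Breakend]]:
    """Match segment start and end independently, preferring the innermost breakend."""
    start_window = (pos + cipos[0], pos + cipos[1])
    end_window = (end + ciend[0], end + ciend[1])
    start_hits = [b for b in breakends if _overlaps(start_window, b.window())]
    end_hits = [b for b in breakends if _overlaps(end_window, b.window())]
    # Innermost: the largest position for the start, the smallest position for the end.
    start = max(start_hits, key=lambda b: b.pos, default=None)
    stop = min(end_hits, key=lambda b: b.pos, default=None)
    return start, stop
```

Matching the start and end separately mirrors the independent POS/CIPOS and END/CIEND matching described above, so a segment can be linked to two different SVs.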
Example command lines
Joint SV/CNV VCF Output
The original CNV and SV VCF output files, prior to integration, are available for users in the DRAGEN output directory, as described elsewhere. Additionally, there is an enhanced CNV VCF available with the *.cnv_sv.vcf.gz extension. The VCF header lines in the *.cnv_sv.vcf.gz mostly correspond to a concatenation of the individual header lines from the CNV and SV VCFs, with a few lines deduplicated and some new ones added. For details on the legacy header lines, please refer to the individual CNV and SV user guide sections.
Newly added header lines are described in the following table.
END_LEFT_BND_OF
1
String
ID of CNV whose left end is matched to the end of SV
END_RIGHT_BND_OF
1
String
ID of CNV whose right end is matched to the end of SV
LEFT_BND
1
String
ID of SV that matches the left end of CNV record
LEFT_BND_OF
1
String
ID of CNV whose left end is matched to SV
MatchSv
1
Integer
ID of original SV that was merged with CNV record
OrigCnvEnd
1
Integer
Coordinate of original CNV END
OrigCnvPos
1
Integer
Coordinate of original CNV POS
RIGHT_BND
1
String
ID of SV that matches the right end of CNV record
RIGHT_BND_OF
1
String
ID of CNV whose right end is matched to SV
SVCLAIM
A
String
Claim made by the structural variant call. Valid values are D, J, and DJ, for abundance, adjacency, and both, respectively
Records that can be matched or rescued have annotations indicating the breakpoint linkage between a CNV and an SV record. If a complete match is found, the MatchSv annotation is present in the record and gives the ID field of the SV record matched to this CNV record. In this case, BND notations refer to the merged record ID itself rather than to the SV before merging. In addition, the SVCLAIM field indicates whether the record has evidence from the depth signal (D), from junction signals (J), or from both (DJ).
Because of the mixing of standalone SV records and CNV records, the FORMAT field may have different annotations. For details on the CNV or SV specific annotations, please refer to the individual CNV and SV user guide sections.
Records that can be matched or rescued will have FILTER set to PASS. The original FILTERs are retained for records that were not matched or rescued. For example, the cnvLength FILTER will still be applied to standalone CNV records (those with SVCLAIM=D).
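The annotations above can be summarized per record when post-processing *.cnv_sv.vcf.gz. The following Python sketch reads only the INFO keys listed in the table and the FILTER value; it assumes a single ALT allele (SVCLAIM is declared Number=A), and the function names are hypothetical.

```python
from typing import Dict

# INFO keys from the table above that record CNV/SV breakpoint linkage.
LINK_KEYS = ("LEFT_BND", "RIGHT_BND", "LEFT_BND_OF", "RIGHT_BND_OF", "END_LEFT_BND_OF", "END_RIGHT_BND_OF")
EVIDENCE = {"D": "depth", "J": "junction", "DJ": "depth+junction"}


def parse_info(info: str) -> Dict[str, str]:
    """Parse a VCF INFO column; flags map to an empty string."""
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key and key != ".":
            fields[key] = value
    return fields


def summarize_joint_record(info_column: str, filter_column: str) -> dict:
    """Summarize evidence, merge status, and breakpoint links of a joint CNV+SV record."""
    info = parse_info(info_column)
    claim = info.get("SVCLAIM", "")  # Number=A; a single ALT allele is assumed here
    return {
        "evidence": EVIDENCE.get(claim, "unspecified"),
        "merged_sv": info.get("MatchSv"),  # present only for a complete CNV/SV match
        "links": {key: info[key] for key in LINK_KEYS if key in info},
        "is_pass": filter_column == "PASS",
    }
```

For example, a standalone CNV record with SVCLAIM=D and FILTER cnvLength is reported as depth-only evidence that does not pass, consistent with the FILTER behavior described above.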
Example records are shown below.