Somatic CNV Calling ASCN (WGS)
To detect somatic copy number aberrations and regions with loss of heterozygosity, run the DRAGEN CNV Caller on a tumor sample together with a VCF that contains germline SNVs. The output is a VCF file. The somatic algorithm reuses components of the germline CNV caller and adds a somatic modeling component, which estimates tumor purity and ploidy. For more details, see the Preprocessing, Segmentation, and ASCN Calling sections.
The germline SNVs are used to compute B-allele ratios in the tumor, which allows for allele-specific copy number calling on the tumor sample. Where possible, use of the small-variant VCF from a matched normal sample is preferred (tumor-normal mode) for best results, but a catalog of population SNPs can be used when a matched normal sample is not available (tumor-only mode). See section on for more details.
When a matched normal sample is available, the sample should first be processed using the germline small variant caller. In this case, only germline-heterozygous SNV sites are used for determining B-allele ratios. If no matched normal is available, population SNP B-allele ratios are computed as for matched normal heterozygous loci, but are treated as variants of unknown germline genotype; possible genotype assignments are statistically integrated to determine allele-specific copy number.
In matched normal mode, a VCF containing the individual's germline copy number changes can optionally be provided as input. This ensures that germline CNVs are output as separate segments in the somatic whole-genome sequencing (WGS) CNV VCF and are annotated with the germline copy number, so that it is clear whether a region carries copy number changes that are specifically somatic.
You can use the following somatic WGS CNV calling command-line options:
--tumor-fastq1, --tumor-fastq2, --tumor-bam-input, --tumor-cram-input
Specify a tumor input file.
--cnv-normal-b-allele-vcf
--cnv-population-b-allele-vcf
--cnv-use-somatic-vc-baf
--sample-sex
If known, specify the sex of the sample. If the sample sex is not specified, the caller attempts to estimate the sample sex from tumor alignments.
--cnv-normal-cnv-vcf
--cnv-use-somatic-vc-vaf
--cnv-somatic-enable-het-calling
The following is an example command line for running tumor-normal somatic WGS CNV calling with a matched normal SNV VCF.
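A minimal sketch of such a command is shown here. The options `--enable-cnv`, `--tumor-bam-input`, `--cnv-normal-b-allele-vcf`, and `--sample-sex` are taken from this section; `-r`, `--output-directory`, `--output-file-prefix`, and `--enable-map-align` are general DRAGEN options assumed for illustration, and every path, the prefix, and the sex value are placeholders. The script only assembles and prints the command so that it can be reviewed before it is run.

```shell
# Placeholder inputs (assumptions): replace with real locations before running.
REF_DIR="/path/to/reference_hashtable"
OUT_DIR="/path/to/output"
PREFIX="tumor_sample"
TUMOR_BAM="/path/to/tumor.bam"
NORMAL_SNV_VCF="/path/to/normal_snv.vcf.gz"

# Assemble the command as positional parameters so each argument stays intact.
set -- dragen \
  -r "$REF_DIR" \
  --output-directory "$OUT_DIR" \
  --output-file-prefix "$PREFIX" \
  --enable-map-align false \
  --enable-cnv true \
  --tumor-bam-input "$TUMOR_BAM" \
  --cnv-normal-b-allele-vcf "$NORMAL_SNV_VCF" \
  --sample-sex female

# Dry run: print the assembled command for review.
CMD="$*"
printf '%s\n' "$CMD"
# To execute on a DRAGEN server, run: "$@"
```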
If a matched normal is not available, you must disable CNV calling or run in tumor-only mode. Running with a mismatched normal in tumor-normal mode yields unexpected results. The following example command line runs tumor-only somatic WGS CNV calling with a population SNV VCF.
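A minimal sketch of a tumor-only command is shown here. The option `--cnv-population-b-allele-vcf` is taken from this section; `-r`, `--output-directory`, `--output-file-prefix`, and `--enable-map-align` are general DRAGEN options assumed for illustration, and all paths and values are placeholders. The command is printed rather than executed.

```shell
# Placeholder inputs (assumptions): replace with real locations before running.
REF_DIR="/path/to/reference_hashtable"
OUT_DIR="/path/to/output"
PREFIX="tumor_sample"
TUMOR_BAM="/path/to/tumor.bam"
POPULATION_SNP_VCF="/path/to/population_snps.vcf.gz"

# Tumor-only mode: B-allele loci come from a population SNP catalog.
set -- dragen \
  -r "$REF_DIR" \
  --output-directory "$OUT_DIR" \
  --output-file-prefix "$PREFIX" \
  --enable-map-align false \
  --enable-cnv true \
  --tumor-bam-input "$TUMOR_BAM" \
  --cnv-population-b-allele-vcf "$POPULATION_SNP_VCF" \
  --sample-sex male

# Dry run: print the assembled command for review.
CMD="$*"
printf '%s\n' "$CMD"
# To execute on a DRAGEN server, run: "$@"
```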
The following example command line runs tumor-normal somatic WGS CNV calling concurrently with the Somatic SNV Caller, which allows you to use the matched normal germline heterozygous sites directly from the SNV Caller by setting --cnv-use-somatic-vc-baf true.
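A minimal sketch of a concurrent run is shown here. The options `--enable-variant-caller`, `--enable-cnv`, `--tumor-bam-input`, `--bam-input`, and `--cnv-use-somatic-vc-baf` appear in this section; `-r`, `--output-directory`, `--output-file-prefix`, and `--enable-map-align` are general DRAGEN options assumed for illustration, and all paths and values are placeholders. The command is printed rather than executed.

```shell
# Placeholder inputs (assumptions): replace with real locations before running.
REF_DIR="/path/to/reference_hashtable"
OUT_DIR="/path/to/output"
PREFIX="tumor_sample"
TUMOR_BAM="/path/to/tumor.bam"
NORMAL_BAM="/path/to/normal.bam"

# Tumor-normal run with the SNV caller enabled; the CNV caller takes its
# germline heterozygous sites from the SNV caller instead of a separate VCF.
set -- dragen \
  -r "$REF_DIR" \
  --output-directory "$OUT_DIR" \
  --output-file-prefix "$PREFIX" \
  --enable-map-align false \
  --tumor-bam-input "$TUMOR_BAM" \
  --bam-input "$NORMAL_BAM" \
  --enable-variant-caller true \
  --enable-cnv true \
  --cnv-use-somatic-vc-baf true \
  --sample-sex female

# Dry run: print the assembled command for review.
CMD="$*"
printf '%s\n' "$CMD"
# To execute on a DRAGEN server, run: "$@"
```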
You can enable additional features when a matched normal sample and the outputs from DRAGEN Germline analysis are also available. If a matched normal sample is available, enable germline-aware mode and VAF-aware mode using the following example command line. For more information on germline-aware mode and VAF-aware mode, see Germline-aware Mode and VAF-aware Mode.
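A minimal sketch combining both modes is shown here. It assumes that germline-aware mode is driven by `--cnv-normal-cnv-vcf` and that VAF-aware mode is driven by running the small variant caller on tumor and normal alignments with `--cnv-use-somatic-vc-vaf true`, as described in this section. The options `-r`, `--output-directory`, `--output-file-prefix`, and `--enable-map-align` are general DRAGEN options assumed for illustration, and all paths and values are placeholders. The command is printed rather than executed.

```shell
# Placeholder inputs (assumptions): replace with real locations before running.
REF_DIR="/path/to/reference_hashtable"
OUT_DIR="/path/to/output"
PREFIX="tumor_sample"
TUMOR_BAM="/path/to/tumor.bam"
NORMAL_BAM="/path/to/normal.bam"
NORMAL_SNV_VCF="/path/to/normal_snv.vcf.gz"   # from the germline analysis
NORMAL_CNV_VCF="/path/to/normal_cnv.vcf.gz"   # from the germline analysis

set -- dragen \
  -r "$REF_DIR" \
  --output-directory "$OUT_DIR" \
  --output-file-prefix "$PREFIX" \
  --enable-map-align false \
  --tumor-bam-input "$TUMOR_BAM" \
  --bam-input "$NORMAL_BAM" \
  --enable-variant-caller true \
  --enable-cnv true \
  --cnv-normal-b-allele-vcf "$NORMAL_SNV_VCF" \
  --cnv-normal-cnv-vcf "$NORMAL_CNV_VCF" \
  --cnv-use-somatic-vc-vaf true \
  --sample-sex female

# Dry run: print the assembled command for review.
CMD="$*"
printf '%s\n' "$CMD"
# To execute on a DRAGEN server, run: "$@"
```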
To specify germline CNVs from a matched normal sample, use --cnv-normal-cnv-vcf. When specified, CNV records marked as PASS in the normal sample are used during tumor-sample segmentation to make sure that confident germline CNV boundaries are also boundaries in the somatic output. Segments with germline copy number changes that are relative to reference ploidy are excluded from somatic model selection. During somatic copy number calling and scoring, the germline copy number is used to modify the expected depth contribution from the normal contamination fraction of the tumor sample. The process leads to more accurate assignment of somatic copy number in regions of germline CNV. DRAGEN then annotates the somatic WGS CNV VCF entries with germline copy number (NCN) and the somatic copy number difference relative to germline (SCND) for the segments that have germline CNVs.
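The effect of the germline copy number on expected depth can be illustrated with a simple mixture model. This is a sketch under stated assumptions, not necessarily the exact model DRAGEN uses: $p$ denotes tumor purity, $C_T$ the tumor copy number of a segment, and $C_G$ the germline copy number (NCN) of that segment, and SCND is read here as tumor copy number minus NCN.

```latex
% Expected depth of a segment, relative to a diploid baseline (assumed model)
\[
  \frac{D}{D_{\text{diploid}}} \;=\; \frac{p\,C_T + (1-p)\,C_G}{2}
\]
% Worked example: p = 0.5, germline heterozygous deletion (C_G = 1),
% and no further somatic change (C_T = 1):
\[
  \frac{0.5 \cdot 1 + 0.5 \cdot 1}{2} = 0.5,
  \qquad \mathrm{SCND} = C_T - C_G = 0
\]
% If the normal fraction were instead assumed to have C_G = 2, the same
% observed depth would require 0.5\,C_T + 0.5 \cdot 2 = 1, that is, C_T = 0.
```

In this example, ignoring the germline deletion would make the segment look like a homozygous somatic deletion, whereas accounting for it shows no somatic change.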
If both the small variant caller and the CNV caller are enabled in a tumor-matched normal run, the somatic SNV results can affect the estimated purity and ploidy of the tumor sample. The somatic SNV variant allele frequencies (VAFs) that are captured by the allele depth values from passing somatic SNVs reflect the combination of tumor purity, total tumor copy number at a somatic SNV locus, and the number of tumor copies bearing the somatic allele. Clusters of somatic SNVs with similar allele depths inform the tumor model.
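One common way to write this relationship is given below as an illustrative assumption, not as DRAGEN's documented internal model. Here $p$ is tumor purity, $C_T$ is the total tumor copy number at the somatic SNV locus, $C_N$ is the copy number in normal cells at that locus (2 on autosomes), and $m$ is the number of tumor copies bearing the somatic allele.

```latex
\[
  \mathrm{VAF} \;=\; \frac{p\,m}{p\,C_T + (1-p)\,C_N}
\]
% Worked example: p = 0.6, C_T = 3, C_N = 2
%   m = 1:  0.6 / (1.8 + 0.8) = 0.6 / 2.6 \approx 0.23
%   m = 2:  1.2 / (1.8 + 0.8) = 1.2 / 2.6 \approx 0.46
```

Under this model, somatic SNVs in the same segment fall into a small number of VAF clusters, one per possible value of $m$, which is why clusters of similar allele depths constrain purity and copy number jointly.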
When a tumor has limited copy number variation and/or CNVs are mostly subclonal, such as in many liquid tumors, VAFs can help prevent incorrect or low-confidence estimated tumor models. Incorrect or low-confidence estimated tumor models can lead to wrong or filtered calls. VAF information can also help determine the presence or absence of a genome duplication even in samples from clonal tumors with clear CNVs.
To utilize VAF information, run somatic WGS CNV calling with small variant calling on tumor and matched-normal read alignment inputs. For example, you could use the following command line:
--enable-variant-caller=true --enable-cnv=true --tumor-bam-input <TUMOR_BAM> --bam-input <NORMAL_BAM>
For tumor/matched-normal runs with --enable-variant-caller true, VAF-based modeling is enabled by default. To disable VAF-based modeling, set --cnv-use-somatic-vc-vaf to false.
--cnv-normal-b-allele-vcf
Specify a matched normal SNV VCF. For more information on specifying b-allele loci, see .
--cnv-population-b-allele-vcf
Specify a population SNP catalog. For more information on specifying b-allele loci, see .
--cnv-use-somatic-vc-baf
If running in tumor-normal mode with the SNV caller enabled, use this option to specify the germline heterozygous sites. For more information on specifying b-allele loci, see .
--cnv-normal-cnv-vcf
Specify germline CNVs from the matched normal sample. For more information, see .
--cnv-use-somatic-vc-vaf
Use the variant allele frequencies (VAFs) from the somatic SNVs to help select the tumor model for the sample. For more information, see .
--cnv-somatic-enable-het-calling
Enable HET-calling mode for heterogeneous segments. For more information, see .
The target counting stage and its output are the same as for germline CNV calling. The target intervals with their read counts are output in a *.target.counts.gz file. If insufficient read depth coverage is detected, processing halts. For low-depth tumor samples, the value of --cnv-interval-width can be increased from its default value to capture more alignments. B-allele counting occurs in parallel with the read counting phase, and the values are output in a *.baf.bedgraph.gz file. This file can be loaded into IGV along with the BigWig files generated by DRAGEN for visualization. See for more details on output files.
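A minimal sketch of widening the target intervals for a low-depth tumor sample is shown here. The option `--cnv-interval-width` is taken from this section, but the value 2000 is purely illustrative and is not a recommendation. The options `-r`, `--output-directory`, `--output-file-prefix`, and `--enable-map-align` are general DRAGEN options assumed for illustration, and all paths are placeholders. The command is printed rather than executed.

```shell
# Placeholder inputs (assumptions): replace with real locations before running.
REF_DIR="/path/to/reference_hashtable"
OUT_DIR="/path/to/output"
PREFIX="low_depth_tumor"
TUMOR_BAM="/path/to/tumor.bam"
NORMAL_SNV_VCF="/path/to/normal_snv.vcf.gz"
INTERVAL_WIDTH=2000   # illustrative value only, in base pairs

set -- dragen \
  -r "$REF_DIR" \
  --output-directory "$OUT_DIR" \
  --output-file-prefix "$PREFIX" \
  --enable-map-align false \
  --enable-cnv true \
  --tumor-bam-input "$TUMOR_BAM" \
  --cnv-normal-b-allele-vcf "$NORMAL_SNV_VCF" \
  --cnv-interval-width "$INTERVAL_WIDTH"

# Dry run: print the assembled command for review.
CMD="$*"
printf '%s\n' "$CMD"
# To execute on a DRAGEN server, run: "$@"
# After a real run, the B-allele track could be inspected with, for example:
#   gzip -dc "$OUT_DIR"/*.baf.bedgraph.gz | head
```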