Ploidy Caller

The Ploidy Caller uses the per-contig median coverage values from the Ploidy Estimator to detect aneuploidy and chromosomal mosaicism in mammalian germline samples from whole genome sequencing data.

The Ploidy Caller runs by default except in the following circumstances:

  • The Ploidy Estimator cannot determine if the input data is from whole genome sequencing, for example when the data is from exome or targeted sequencing.

  • The reference genome does not contain any autosomes following the expected naming convention (e.g. chr1 or 1).

  • There is no germline sample, for example in a tumor-only analysis.

Calling Model

Chromosomal mosaicism is detected when there is a significant shift in median coverage of a chromosome compared to the overall autosomal median coverage.

The following table displays some examples of expected shifts in coverage for a given aneuploidy and mosaic fraction.

Neutral Copy Number    Variant Copy Number    Mosaic Fraction    Expected Coverage Shift
2                      1                      10%                -5%
2                      1                      5%                 -2.5%
2                      3                      5%                 +2.5%
2                      3                      10%                +5%
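The shifts in the table are consistent with coverage scaling linearly with the average copy number across cells. The following is a minimal sketch under that assumption. The function name is illustrative and is not part of DRAGEN.

```python
def expected_coverage_shift(neutral_cn: int, variant_cn: int, mosaic_fraction: float) -> float:
    """Return the expected relative coverage shift for a mosaic aneuploidy.

    ASSUMPTION: coverage scales linearly with the average copy number,
    where a fraction `mosaic_fraction` of cells carries `variant_cn`
    copies and the remaining cells carry `neutral_cn` copies.
    """
    if neutral_cn <= 0:
        raise ValueError("neutral_cn must be positive")
    if not 0.0 <= mosaic_fraction <= 1.0:
        raise ValueError("mosaic_fraction must be between 0 and 1")
    return mosaic_fraction * (variant_cn - neutral_cn) / neutral_cn


# Print the shift for each row of the table above.
for neutral, variant, fraction in [(2, 1, 0.10), (2, 1, 0.05), (2, 3, 0.05), (2, 3, 0.10)]:
    shift = expected_coverage_shift(neutral, variant, fraction)
    print(f"{neutral} -> {variant} copies at {fraction:.0%} mosaic: {shift:+.1%}")
```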

The Ploidy Caller models coverage as a normal distribution for both the null (neutral) and the alternative (mosaic) hypotheses. The two normal distributions have equal mean at the median autosomal coverage for the sample, but the variance of the alternative normal distribution is greater than that of the null normal distribution. The baseline variance of the two models at 30x coverage was determined empirically from a cohort of ~2500 WGS samples. The actual variance used for the two models is calculated from the baseline variance at 30x coverage, adjusting for the median autosomal coverage of the sample. Below are the likelihood distributions for the null and alternative hypotheses for a sample with 35x median autosomal coverage.

After applying an empirically estimated prior for chromosomal mosaicism, the Ploidy Caller generates ploidy calls according to the posterior probability of the null and alternative hypotheses, as shown below for a sample with 35x median autosomal sequencing coverage.

At 35x median autosomal coverage, the threshold for deciding between a neutral (REF) and an alternative (DEL or DUP) call is roughly at +/- 5% shift in coverage for an autosome. At 100x median autosomal coverage, the threshold is at roughly +/- 3% shift in coverage for an autosome. A Q20 threshold is used to filter low quality calls.
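The calling model described above can be sketched as a two-hypothesis Bayesian comparison. The following example is only an illustration: the baseline standard deviations, the prior, the quality cap, and the rule for scaling variance with coverage are all placeholder assumptions, because the empirically determined DRAGEN values are not given here. Coverage is expressed as a normalized value where 1.0 means no shift, and none of the function names are part of DRAGEN.

```python
import math


def scaled_sd(baseline_sd_30x: float, autosomal_coverage: float) -> float:
    """Scale a 30x baseline standard deviation to the sample coverage.

    ASSUMPTION: variance is inversely proportional to coverage.
    """
    return baseline_sd_30x * math.sqrt(30.0 / autosomal_coverage)


def log_normal_pdf(x: float, mean: float, sd: float) -> float:
    """Log density of a normal distribution."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def posterior_alt(ndc: float, null_sd: float, alt_sd: float, prior_alt: float) -> float:
    """Posterior probability of the alternative (mosaic) hypothesis.

    Both hypotheses share the same mean (1.0); the alternative is wider.
    Computed in log space to avoid underflow for large shifts.
    """
    log_null = math.log1p(-prior_alt) + log_normal_pdf(ndc, 1.0, null_sd)
    log_alt = math.log(prior_alt) + log_normal_pdf(ndc, 1.0, alt_sd)
    top = max(log_null, log_alt)
    w_null = math.exp(log_null - top)
    w_alt = math.exp(log_alt - top)
    return w_alt / (w_null + w_alt)


def call_ploidy(ndc: float, autosomal_coverage: float,
                baseline_null_sd: float = 0.02,   # placeholder value
                baseline_alt_sd: float = 0.10,    # placeholder value
                prior_alt: float = 1e-3,          # placeholder value
                qual_cap: float = 150.0):         # placeholder value
    """Return (call, Phred-scaled quality) for one chromosome."""
    p_alt = posterior_alt(ndc,
                          scaled_sd(baseline_null_sd, autosomal_coverage),
                          scaled_sd(baseline_alt_sd, autosomal_coverage),
                          prior_alt)
    if p_alt > 0.5:
        call, p_error = ("DEL" if ndc < 1.0 else "DUP"), 1.0 - p_alt
    else:
        call, p_error = "REF", p_alt
    qual = qual_cap if p_error <= 0.0 else min(qual_cap, -10.0 * math.log10(p_error))
    return call, qual


call, qual = call_ploidy(ndc=1.5, autosomal_coverage=35.0)
print(call, "LowQual" if qual < 20.0 else "PASS")
```

Because the parameters are placeholders, this sketch does not reproduce the thresholds quoted above. It only shows how a wider alternative distribution, a prior, and a Q20 cutoff combine into a call.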

Reference Sex Karyotype

In addition to detecting aneuploidy and chromosomal mosaicism in autosomes, where the expected reference ploidy is 2, the Ploidy Caller can also detect these variants in allosomes. The reference sex karyotype used for making calls on the allosomes is determined from the sex karyotype of the sample, which is either provided on the command line using the --sample-sex option or determined by the Ploidy Estimator. If the sex karyotype of the sample is neither provided on the command line nor determined by the Ploidy Estimator, it is assumed to be XX. If the sex karyotype contains at least one Y chromosome, the reference sex karyotype is XY. Otherwise, the reference sex karyotype is XX. The following table displays the X and Y reference ploidy for each possible sample sex karyotype. If the Y chromosome reference ploidy is zero, then ploidy calling is not performed on the Y chromosome.

Sex Karyotype    X Reference Ploidy    Y Reference Ploidy
XX               2                     0
XY               1                     1
XXY              1                     1
XYY              1                     1
X0               2                     0
XXXY             1                     1
XXX              2                     0
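The rule behind the table can be sketched as follows. The function names are illustrative and are not part of DRAGEN. Giving the provided karyotype precedence over the estimated one is an assumption inferred from the example VCF header in this section, where the provided karyotype is XY, the estimated karyotype is X0, and the reference karyotype is XY.

```python
from typing import Optional, Tuple


def reference_sex_karyotype(provided: Optional[str], estimated: Optional[str]) -> str:
    """Return the reference sex karyotype ("XX" or "XY") for a sample.

    ASSUMPTION: a karyotype provided on the command line takes precedence
    over the Ploidy Estimator result. If neither is available, XX is used.
    """
    sample_karyotype = provided or estimated or "XX"
    return "XY" if "Y" in sample_karyotype.upper() else "XX"


def reference_ploidy(reference_karyotype: str) -> Tuple[int, int]:
    """Return (X reference ploidy, Y reference ploidy)."""
    return (1, 1) if reference_karyotype == "XY" else (2, 0)


for karyotype in ["XX", "XY", "XXY", "XYY", "X0", "XXXY", "XXX"]:
    ref = reference_sex_karyotype(karyotype, None)
    x_ploidy, y_ploidy = reference_ploidy(ref)
    print(f"{karyotype}: reference {ref}, X ploidy {x_ploidy}, Y ploidy {y_ploidy}")
```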

Ploidy Caller Output File

The Ploidy Caller generates a <output-file-prefix>.ploidy.vcf.gz file in the output directory. The output file follows the VCF 4.2 Specification. A single record is reported for each reference autosome and allosome, except for the Y chromosome if the reference sex karyotype is XX. Calls are not made for other sequences in the reference genome, such as mitochondrial DNA, unlocalized or unplaced sequences, alternate contigs, decoy contigs, or the Epstein-Barr virus sequence. The VCF header is annotated with ##source=DRAGEN_PLOIDY to indicate the file is generated by the DRAGEN PLOIDY pipeline.

The following information is provided in the VCF file.

  • Meta-information--The VCF output file contains common meta-information such as DRAGENVersion and DRAGENCommandLine, as well as Ploidy Caller specific information. The VCF header contains meta-information for the median autosome depth of coverage, the provided sex karyotype if available, the sex karyotype estimated by the Ploidy Estimator if available, and the reference sex karyotype. The following is an example of the header lines:

##autosomeDepthOfCoverage=36.635
##providedSexKaryotype=XY
##estimatedSexKaryotype=X0
##referenceSexKaryotype=XY

  • FILTER Fields--The VCF output file includes the LowQual filter, which filters results with quality score below 20.

  • INFO Fields--The VCF output INFO fields include the following:

    • END—End position of the variant described in this record.

    • SVTYPE—Type of structural variant.

  • FORMAT Fields--The VCF output file includes the following FORMAT fields. There is no GT FORMAT field. A variant call displays either <DUP> or <DEL> in the ALT column, and a non-variant call displays . in the ALT column. If a downstream tool requires a GT field, add one using ./1 for variant calls on a diploid contig and 1 for variant calls on a haploid contig. For non-variant calls, use 0/0 for a diploid contig and 0 for a haploid contig.

    • DC—Depth of coverage.

    • NDC—Normalized depth of coverage.
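The GT convention described in the FORMAT Fields item can be applied with a small post-processing step. The following is a sketch only: the function names are illustrative, the caller must supply the reference ploidy of each contig, and the record is assumed to be a single-sample, tab-separated VCF data line. GT is placed first in the FORMAT column, as the VCF specification requires when GT is present.

```python
def genotype_for_record(alt: str, reference_ploidy: int) -> str:
    """Return the GT string for a ploidy record using the convention above."""
    if reference_ploidy not in (1, 2):
        raise ValueError("reference_ploidy must be 1 (haploid) or 2 (diploid)")
    is_variant = alt in ("<DEL>", "<DUP>")
    if reference_ploidy == 2:
        return "./1" if is_variant else "0/0"
    return "1" if is_variant else "0"


def add_gt(record_line: str, reference_ploidy: int) -> str:
    """Prepend GT to the FORMAT and sample columns of one VCF data line."""
    fields = record_line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError("expected a single-sample VCF record with 10 columns")
    gt = genotype_for_record(fields[4], reference_ploidy)
    fields[8] = "GT:" + fields[8]
    fields[9] = gt + ":" + fields[9]
    return "\t".join(fields)


# Example record built from the column values of a chrY <DEL> call.
record = "\t".join(["chrY", "1", ".", "N", "<DEL>", "150", "PASS",
                    "END=57227415;SVTYPE=DEL", "DC:NDC", "5.7:0.311178"])
print(add_gt(record, reference_ploidy=1))
```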

Example Output File

The following is an example output file for a sample with mosaic loss of the Y chromosome.

##fileformat=VCFv4.2
...
##autosomeDepthOfCoverage=36.635
##providedSexKaryotype=XY
##estimatedSexKaryotype=X0
##referenceSexKaryotype=XY
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##ALT=<ID=DEL,Description="Deletion relative to the reference">
##ALT=<ID=DUP,Description="Region of elevated copy number relative to the reference">
##FILTER=<ID=LowQual,Description="QUAL below 20">
##FORMAT=<ID=DC,Number=1,Type=Float,Description="Depth of coverage">
##FORMAT=<ID=NDC,Number=1,Type=Float,Description="Normalized depth of coverage">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	MySampleName
chr1	1	.	N	.	31.1252	PASS	END=248956422	DC:NDC	36.836:1.00549
chr2	1	.	N	.	31.451	PASS	END=242193529	DC:NDC	36.668:1.0009
...
chr21	1	.	N	.	31.4499	PASS	END=46709983	DC:NDC	36.6:0.999045
chr22	1	.	N	.	28.8148	PASS	END=50818468	DC:NDC	37.2:1.01542
chrX	1	.	N	.	29.7892	PASS	END=156040895	DC:NDC	18:0.982667
chrY	1	.	N	<DEL>	150	PASS	END=57227415;SVTYPE=DEL	DC:NDC	5.7:0.311178

The following is an example output file for a sample with trisomy 21.

##fileformat=VCFv4.2
...
##autosomeDepthOfCoverage=36.635
##providedSexKaryotype=XY
##estimatedSexKaryotype=XY
##referenceSexKaryotype=XY
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##ALT=<ID=DEL,Description="Deletion relative to the reference">
##ALT=<ID=DUP,Description="Region of elevated copy number relative to the reference">
##FILTER=<ID=LowQual,Description="QUAL below 20">
##FORMAT=<ID=DC,Number=1,Type=Float,Description="Depth of coverage">
##FORMAT=<ID=NDC,Number=1,Type=Float,Description="Normalized depth of coverage">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	MySampleName
chr1	1	.	N	.	31.1252	PASS	END=248956422	DC:NDC	36.836:1.00549
chr2	1	.	N	.	31.451	PASS	END=242193529	DC:NDC	36.668:1.0009
...
chr21	1	.	N	<DUP>	31.4499	PASS	END=46709983;SVTYPE=DUP	DC:NDC	54.9:1.49857
chr22	1	.	N	.	28.8148	PASS	END=50818468	DC:NDC	37.2:1.01542
chrX	1	.	N	.	29.7892	PASS	END=156040895	DC:NDC	18:0.982667
chrY	1	.	N	.	29.5322	PASS	END=57227415	DC:NDC	18.6:1.0153
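The records in these examples can be read with a few lines of plain Python. The sketch below parses one data line and compares the reported NDC against DC divided by the coverage expected for the contig's reference ploidy. That relationship is inferred from the example values above and is not stated as a formal definition, so treat it as an assumption. The function names are illustrative.

```python
def parse_ploidy_record(line: str) -> dict:
    """Parse one single-sample data line of a ploidy VCF into a dict."""
    chrom, pos, _id, ref, alt, qual, flt, info, fmt, sample = line.rstrip("\n").split("\t")
    info_map = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        info_map[key] = value
    sample_map = dict(zip(fmt.split(":"), sample.split(":")))
    return {
        "chrom": chrom,
        "alt": alt,
        "qual": float(qual),
        "filter": flt,
        "end": int(info_map["END"]),
        "svtype": info_map.get("SVTYPE"),
        "dc": float(sample_map["DC"]),
        "ndc": float(sample_map["NDC"]),
    }


def expected_ndc(dc: float, autosome_depth: float, reference_ploidy: int) -> float:
    """ASSUMPTION: NDC = DC / (autosome depth * reference ploidy / 2)."""
    return dc / (autosome_depth * reference_ploidy / 2.0)


# chrY record from the mosaic loss of Y example; reference ploidy is 1 for XY.
line = "\t".join(["chrY", "1", ".", "N", "<DEL>", "150", "PASS",
                  "END=57227415;SVTYPE=DEL", "DC:NDC", "5.7:0.311178"])
rec = parse_ploidy_record(line)
print(rec["chrom"], rec["alt"], rec["ndc"], round(expected_ndc(rec["dc"], 36.635, 1), 6))
```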

Cell Line Artifacts

Samples derived from cell lines frequently have coverage artifacts that might result in variant ploidy calls on some chromosomes. These artifacts occur most often on chromosomes 17, 19, and 22. When assessing the accuracy of ploidy calls on cell line samples, filter out chromosomes with known cell line artifacts.
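One way to apply this filtering during an accuracy assessment is to drop calls on the affected chromosomes before comparing against a truth set. The following is a sketch: the record layout (a dict with a "chrom" key) and the function names are illustrative, and the default exclusion list contains only the chromosomes named above. Adjust the list to the artifacts known for the specific cell line.

```python
CELL_LINE_ARTIFACT_CHROMS = frozenset({"17", "19", "22"})


def normalize_chrom(chrom: str) -> str:
    """Strip an optional "chr" prefix so chr17 and 17 compare equal."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def drop_artifact_chromosomes(calls, excluded=CELL_LINE_ARTIFACT_CHROMS):
    """Return only the calls that are not on an excluded chromosome."""
    return [call for call in calls if normalize_chrom(call["chrom"]) not in excluded]


calls = [
    {"chrom": "chr17", "alt": "<DUP>"},
    {"chrom": "chr21", "alt": "<DUP>"},
    {"chrom": "22", "alt": "<DEL>"},
]
print([call["chrom"] for call in drop_artifact_chromosomes(calls)])
```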
