# Germline

## Overview

DRAGEN provides germline copy number variant (CNV) calling workflows that detect copy number aberrations and regions with absence of heterozygosity (AOH) in whole genome sequencing (WGS) and whole exome sequencing (WES) data. The CNV workflows leverage both depth of coverage and B-allele frequencies (BAFs) to provide comprehensive detection of:

* Copy number gains (duplications) and losses (deletions)
* Copy-neutral loss of heterozygosity (CNLOH)
* Whole-arm and whole-chromosome aneuploidies (via Cytogenetics modality)
* Mosaic alterations (enabled by default for WGS only)
* Minor allele copy number estimation

For applications that do not require allele-specific information, mosaic calling, or whole-arm/whole-chromosome aneuploidy detection, our legacy depth-only workflow is also available. See [Depth-Only Workflow](https://github.com/illumina-swi/dragen-docs/blob/release/4.5-prod/product-guides/dragen-v4.5/user-guide/dragen-dna-pipeline/cnv-calling/legacy/cnv-germline-legacy.md) for details.

## Workflow

The germline CNV workflow follows this processing pipeline:

![](https://25033470-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FG9szlFZupV6Q2DasL98y%2Fuploads%2Fgit-blob-b881a7bf771ea40fdb3ea3fc287411f0348f8a53%2Fcnv-calling.germline.png?alt=media)

The pipeline consists of the following modules:

1. **Target Counts** — Binning of read counts and other signals from alignments
2. **B-Allele Counts** — Extraction of allelic read counts
3. **Bias Correction** — Correction of GC bias and other systematic biases
4. **Normalization** — Detection of normal ploidy levels and normalization
5. **Segmentation** — Breakpoint detection via segmentation of normalized depth and BAF signals
6. **ASCN Calling** — Integration of depth and BAF segments to determine copy number states and allele-specific information

## Example Command Lines

### WGS

Note: add `--cnv-stop-after-intrinsic-corrections=true` if you only need target counts generation and bias correction.

```bash
dragen \
-r <HASHTABLE> \
--output-directory <OUTPUT> \
--output-file-prefix <SAMPLE> \
--enable-map-align false \
--enable-cnv true \
--bam-input <BAM> \
--cnv-population-b-allele-vcf <POP_SNP_VCF>
```

### WES

```bash
dragen \
-r <HASHTABLE> \
--output-directory <OUTPUT> \
--output-file-prefix <SAMPLE> \
--enable-map-align false \
--enable-cnv true \
--bam-input <BAM> \
--cnv-target-bed <CNV_TARGET_BED> \
--cnv-normals-list <PANEL_OF_NORMALS> \
--cnv-population-b-allele-vcf <POP_SNP_VCF>
```

Alternatively, you can use a pre-combined panel of normals file:

```bash
dragen \
-r <HASHTABLE> \
--output-directory <OUTPUT> \
--output-file-prefix <SAMPLE> \
--enable-map-align false \
--enable-cnv true \
--bam-input <BAM> \
--cnv-target-bed <CNV_TARGET_BED> \
--cnv-combined-counts <CNV_PANEL_OF_NORMALS> \
--cnv-population-b-allele-vcf <POP_SNP_VCF>
```

## Required Options

| Option       | Description                           |
| ------------ | ------------------------------------- |
| --enable-cnv | Enable CNV processing (set to `true`) |

### Input

| Option                        | Description                                                                       |
| ----------------------------- | --------------------------------------------------------------------------------- |
| --fastq-file1, --fastq-file2  | FASTQ input files (requires `--enable-map-align true`)                            |
| --bam-input                   | BAM input file                                                                    |
| --cram-input                  | CRAM input file                                                                   |
| --ref-dir                     | DRAGEN reference genome hashtable directory                                       |
| --enable-map-align            | Enable mapper and aligner module                                                  |
| --cnv-population-b-allele-vcf | Population SNP catalog for BAF estimation                                         |
| --cnv-target-bed              | BED file defining exome capture regions (only for WES)                            |
| --sample-sex                  | Sample sex (e.g., `male`, `female`). If not specified, sex is estimated from data |

You can download a suitable population SNP catalog (resource file "CNV Population SNP VCF") for your reference genome from [this page](https://support.illumina.com/sequencing/sequencing_software/dragen-bio-it-platform/product_files.html).

### Segmentation

The default segmentation mode depends on the sample type: germline WGS samples use SLM, and germline WES samples use HSLM. To set the mode explicitly, use `--cnv-segmentation-mode (SLM|HSLM)`. See [Segmentation](https://help.dragen.illumina.com/product-guides/dragen-v4.5/dragen-dna-pipeline/cnv-reference#segmentation) in the CNV reference section for a description of SLM and its HSLM variant.

| Option          | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
| --------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| --cnv-slm-eta   | Probability that the segmenter switches from the current state to any other state when moving from one target to the next. Equivalently, this is the probability that the true depth of adjacent targets differs for reasons that simple counting noise does not adequately explain. The stay-in-state probability is therefore (1.0 - eta). The default value is 4e-5; valid values lie in the open interval (0.0, 1.0). Decreasing this value results in longer segments and reduced fragmentation; increasing it produces shorter segments with more fragmentation. |
| --cnv-slm-omega | Scaling parameter modulating the relative weight between experimental and biological variance. The default is 0.3; valid values lie in the open interval (0.0, 1.0). In general, decreasing this value produces longer segments with less fragmentation; increasing it produces shorter segments with more fragmentation.                                                                                                                                                                                                                                                                      |
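
As an illustration of how `--cnv-slm-eta` shapes segmentation, the Python sketch below builds the transition matrix of a `K`-state hidden Markov model in which each state persists with probability `1 - eta`, as described above. The function name `slm_transition_matrix` is hypothetical, and spreading the switch probability `eta` evenly over the other `K - 1` states is an assumption of this sketch rather than documented DRAGEN behavior.

```python
def slm_transition_matrix(num_states: int, eta: float = 4e-5) -> list:
    """Return a num_states x num_states transition matrix.

    Each state persists with probability 1 - eta. The remaining probability
    eta is split evenly across the other states (an assumption of this sketch).
    """
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie in the open interval (0.0, 1.0)")
    if num_states < 2:
        raise ValueError("need at least two states")
    switch = eta / (num_states - 1)
    return [
        [1.0 - eta if i == j else switch for j in range(num_states)]
        for i in range(num_states)
    ]


if __name__ == "__main__":
    matrix = slm_transition_matrix(num_states=5)
    print("stay:", matrix[0][0], "switch to one other state:", matrix[0][1])
```

With the default eta of 4e-5, each step from one target to the next stays in the current state with probability 0.99996, which is why lowering eta favors longer segments.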

The following options apply to the HSLM segmentation method, which is only used in germline WES.

| Option            | Description                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| ----------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| --cnv-slm-stepeta | Distance normalization parameter. The default is 10000. Modifies the effective eta based on the genomic distance between consecutive target intervals, progressively relaxing the segmenter's tendency to stay in its current state ("stickiness") as adjacent targets become farther apart. This makes the method adaptive to unequal target spacing. Decreasing this value produces shorter segments with more fragmentation; increasing it produces longer segments with less fragmentation. |
| --cnv-slm-fw      | Minimum number of depth bins or targets required for a segment to be retained. This is an internal hard filter at the segmentation stage. The default is 0, which disables it. This is largely vestigial; use of this option is not recommended.                                                                                                                                                                                                         |
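
The exact way HSLM adjusts eta by distance is described in the CNV reference section rather than here. As an illustration only, the sketch below uses a hypothetical exponential form, `eta_eff = eta + (eta_max - eta) * (1 - exp(-d / stepeta))`, where `d` is the gap between consecutive targets and `eta_max` is an invented ceiling. This is not DRAGEN's formula, and `effective_eta` and `eta_max` are hypothetical names; the sketch only reproduces the documented direction of the effect: a smaller `--cnv-slm-stepeta` raises the effective switch probability at a given distance and so yields more fragmentation.

```python
import math


def effective_eta(
    distance_bp: float,
    eta: float = 4e-5,
    stepeta: float = 10000.0,
    eta_max: float = 0.5,
) -> float:
    """Hypothetical distance-adjusted switch probability for HSLM-style segmentation.

    Illustrative assumption, not DRAGEN's formula: eta_eff rises smoothly from
    eta (adjacent targets, distance 0) toward eta_max (very distant targets),
    with stepeta setting the distance scale of that rise.
    """
    if distance_bp < 0:
        raise ValueError("distance must be non-negative")
    relaxation = 1.0 - math.exp(-distance_bp / stepeta)
    return eta + (eta_max - eta) * relaxation


if __name__ == "__main__":
    for gap in (0, 1_000, 10_000, 100_000):
        print(gap, effective_eta(gap), effective_eta(gap, stepeta=1_000))
```

Under this assumed form, halving `--cnv-slm-stepeta` makes every gap look twice as far, so the segmenter switches states more readily across it.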

The segment-merging options below are documented next to the segmentation options because the two interact directly. Once provisional copy number (CN) and minor copy number (MCN) calls have been made on the segments produced by the segmentation stage, adjacent segments with the same CN and MCN are joined into a single segment. This is repeated until no two adjacent segments satisfy the merging criteria. Segment merging is a critical step that compensates for over-segmentation (over-fragmentation) at the segmentation stage. However, merging cannot split segments apart, so it cannot compensate in the other direction. **Thus, segmentation can afford to produce a degree of over-segmentation, but there is no compensatory mechanism for under-segmentation.** The following options control segment merging in germline analyses and are independent of the segmentation method and the segmentation options in use.

| Option                | Description                                                                                                                                                                                                                            |
| --------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| --cnv-merge-distance  | Maximum gap in base pairs between two adjacent segments that still allows them to be merged. The default is 1000 for germline WGS. For WES the default is effectively unlimited, since target intervals are inherently non-contiguous. |
| --cnv-merge-threshold | Maximum difference in segment mean (linear copy ratio) between two adjacent segments that still allows them to be merged. The default is 0.2 for germline WGS and 0.4 for germline WES.                                                |

Setting `--cnv-merge-threshold` to zero disables segment merging entirely. This is not recommended.
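
The merging loop described above can be sketched as follows. This is a minimal Python illustration, assuming each segment already carries its provisional CN and MCN calls and a linear copy-ratio mean; the `Segment` class and the `can_merge` and `merge_segments` functions are hypothetical, and recomputing the merged mean as a probe-weighted average is an assumption. The defaults mirror the germline WGS values in the table, and a threshold of zero is treated as "merging disabled".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    chrom: str
    start: int
    end: int
    num_probes: int        # assumed >= 1 for every segment
    mean: float            # linear copy ratio (Segment_Mean)
    cn: int                # provisional copy number call
    mcn: Optional[int]     # provisional minor copy number, None if unavailable


def can_merge(left: Segment, right: Segment, max_distance: int, max_diff: float) -> bool:
    """Neighbors merge only if their calls match and gap and mean difference are small."""
    return (
        left.chrom == right.chrom
        and left.cn == right.cn
        and left.mcn == right.mcn
        and right.start - left.end <= max_distance
        and abs(left.mean - right.mean) <= max_diff
    )


def merge_segments(
    segments: List[Segment], max_distance: int = 1000, max_diff: float = 0.2
) -> List[Segment]:
    """Repeatedly join qualifying neighbors until no adjacent pair qualifies."""
    if max_diff <= 0:
        return list(segments)  # a zero threshold disables merging
    current = list(segments)
    changed = True
    while changed:
        changed = False
        result: List[Segment] = []
        for seg in current:
            if result and can_merge(result[-1], seg, max_distance, max_diff):
                prev = result[-1]
                probes = prev.num_probes + seg.num_probes
                # Assumption: the merged mean is the probe-weighted average.
                mean = (prev.mean * prev.num_probes + seg.mean * seg.num_probes) / probes
                result[-1] = Segment(prev.chrom, prev.start, seg.end, probes, mean, prev.cn, prev.mcn)
                changed = True
            else:
                result.append(seg)
        current = result
    return current
```

Because merging changes a segment's mean, one left-to-right pass is not always enough to reach a state where no pair qualifies, hence the outer `while` loop.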

### Normalization

The following options are mutually exclusive:

| Option                          | Description                                                                |
| ------------------------------- | -------------------------------------------------------------------------- |
| --cnv-normals-list              | Text file containing paths to reference target counts files (one per line) |
| --cnv-normals-file              | Individual normal counts file (use multiple times for multiple files)      |
| --cnv-combined-counts           | Combined panel of normals file (`.combined.counts.txt.gz`)                 |
| --cnv-enable-self-normalization | Use self-normalization for sample normalization (only available for WGS)   |

### Output

| Option               | Description                               |
| -------------------- | ----------------------------------------- |
| --output-directory   | Output directory for all results          |
| --output-file-prefix | Prefix prepended to all output file names |

### Workflow Configuration

| Option                                 | Description                                                            | Default | WGS | WES |
| -------------------------------------- | ---------------------------------------------------------------------- | ------- | :-: | :-: |
| --cnv-enable-mosaic-calling            | Enable detection of mosaic alterations                                 | `true`  |  ✓  |     |
| --cnv-enable-cyto-output               | Enable cytogenetics-compatible output VCF                              | `true`  |  ✓  |  ✓  |
| --cnv-enable-legacy-vcf-format         | Use VCF v4.2 format instead of VCF v4.4                                | `false` |  ✓  |  ✓  |
| --cnv-stop-after-intrinsic-corrections | Stop processing after generating target counts and GC-corrected counts | `false` |  ✓  |  ✓  |

**Note:** Mosaic calling can be enabled for WES, but this is not recommended because it has not been extensively validated; it is disabled by default.

### Output Filtering

| Option                        | Description                                                | Default        |
| ----------------------------- | ---------------------------------------------------------- | -------------- |
| --cnv-enable-ref-calls        | Emit copy-neutral (REF) calls in output VCF                | `true` for WGS |
| --cnv-filter-length           | Minimum event length (bp) for PASS calls                   | `10000`        |
| --cnv-exclude-bed             | BED file specifying intervals to exclude from analysis     | Not set        |
| --cnv-exclude-bed-min-overlap | Minimum overlap fraction for exclusion                     | `0.5`          |
| --cnv-post-vcf-target-bed     | Emit only calls that overlap intervals in this BED file    | Not set        |
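
The table does not spell out which interval the `--cnv-exclude-bed-min-overlap` fraction is measured against. The Python sketch below assumes it is the fraction of an interval's own length covered by exclude regions, with regions on the same contig, half-open, and non-overlapping. `overlap_fraction` and `is_excluded` are hypothetical helpers, and the `0.5` default mirrors the documented default.

```python
from typing import Iterable, Tuple


def overlap_fraction(start: int, end: int, regions: Iterable[Tuple[int, int]]) -> float:
    """Fraction of the half-open interval [start, end) covered by `regions`.

    Assumes the regions are on the same contig, half-open, and non-overlapping.
    """
    length = end - start
    if length <= 0:
        return 0.0
    covered = sum(
        max(0, min(end, r_end) - max(start, r_start)) for r_start, r_end in regions
    )
    return covered / length


def is_excluded(
    start: int, end: int, regions: Iterable[Tuple[int, int]], min_overlap: float = 0.5
) -> bool:
    """Hypothetical rule: exclude when at least `min_overlap` of the interval is covered."""
    return overlap_fraction(start, end, regions) >= min_overlap
```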

## Output Files

The germline CNV workflow generates the following output files:

| File                           | Description                                | Format           |
| ------------------------------ | ------------------------------------------ | ---------------- |
| .target.counts.gz              | Raw target counts before bias correction   | gzipped TSV      |
| .target.counts.gc-corrected.gz | GC-bias corrected target counts            | gzipped TSV      |
| .tn.tsv.gz                     | Tangent-normalized coverage signal         | gzipped TSV      |
| .ballele.counts.gz             | B-allele counts at population SNP sites    | gzipped TSV      |
| .baf.bedgraph.gz               | B-allele frequency in bedgraph format      | gzipped bedGraph |
| .seg                           | Segmentation results (depth and BAF)       | TSV              |
| .cnv.vcf.gz                    | Primary CNV calls (VCF v4.4 by default)    | gzipped VCF      |
| .cyto.vcf.gz                   | Cytogenetics-compatible calls (if enabled) | gzipped VCF      |
| .cnv\_metrics.csv              | Summary metrics including predicted sex    | CSV              |
| .cnv.gff3                      | Variant calls in GFF format                | GFF              |
| .tn.bw                         | Tangent-normalized signal track            | BigWig           |

### Target Counts Output

`<prefix>.target.counts.gz`

Compressed tab-delimited file containing the number of reads per target interval. This is the raw signal as extracted from the alignments in the BAM or CRAM file. The format is identical for the case sample and for any panel-of-normals samples. A bigWig representation of a `target.counts.diploid` file is also available, in which counts are normalized to the normal ploidy level of 2 rather than reported as raw counts.

Columns:

1. Contig identifier
2. Start position
3. End position
4. Target interval name
5. Count of alignments in this interval
6. Count of improperly paired alignments in this interval

Header lines starting with `#` contain the DRAGEN version, command line, and other meta information.

Example:

```
#TARGET COUNTS FILE
##DRAGENVersion=<VERSION_INFO>
##DRAGENCommandLine=<CommandLineOptions>
...
contig  start  stop   name                <SampleName> improper_pairs
1       565480 565959 target-wgs-1-565480 7          6
1       566837 567182 target-wgs-1-566837 9          0
1       713984 714455 target-wgs-1-713984 34         4
1       721116 721593 target-wgs-1-721116 47         1
1       724219 724547 target-wgs-1-724219 24         21
```
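
For downstream analysis or plotting, the counts file can be read with a few lines of Python. The sketch below is a hypothetical reader (`parse_target_counts` and `read_target_counts` are not part of DRAGEN); it assumes, as in the example, that metadata lines start with `#` and that the first remaining line is the column header whose fifth column is the sample name. Because the GC-corrected file described next shares this layout, the same reader should apply to it as well, with fractional counts.

```python
import gzip
from typing import Dict, Iterable, Iterator, List


def parse_target_counts(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one record per target interval from a target counts file."""
    header = None
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and metadata such as ##DRAGENVersion
        fields = line.split()  # the file is tab-delimited; split() also tolerates spaces
        if header is None:
            header = fields  # contig, start, stop, name, <SampleName>, improper_pairs
            continue
        yield {
            "contig": fields[0],
            "start": int(fields[1]),
            "stop": int(fields[2]),
            "name": fields[3],
            "sample": header[4],
            # Raw counts are integers; GC-corrected counts are fractional.
            "count": float(fields[4]),
            "improper_pairs": int(fields[5]),
        }


def read_target_counts(path: str) -> List[Dict]:
    """Read a gzipped *.target.counts.gz or *.target.counts.gc-corrected.gz file."""
    with gzip.open(path, "rt") as handle:
        return list(parse_target_counts(handle))
```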

`<prefix>.target.counts.gc-corrected.gz`

Contains GC-corrected read counts per target interval. The format is equivalent to the `*.target.counts.gz` file:

1. Contig identifier
2. Start position
3. End position
4. Target interval name
5. GC-corrected read counts in this interval
6. Count of improperly paired alignments in this interval

Example:

```
#GC CORRECTED FILE
##DRAGENVersion=<VERSION_INFO>
##DRAGENCommandLine=<CommandLineOptions>
...
contig  start   stop    name    <SampleName> improper_pairs
chr1    818022  819840  target-wgs-chr1-818022:819840   1071.353133     6
chr1    819840  821337  target-wgs-chr1-819840:821337   1051.014997     19
chr1    821337  822485  target-wgs-chr1-821337:822485   1098.6502       10
chr1    822485  824431  target-wgs-chr1-822485:824431   1117.28308      7
```

For more information, see [Target Counts File](https://help.dragen.illumina.com/product-guides/dragen-v4.5/dragen-dna-pipeline/cnv-reference#target-counts-file) and [GC Bias Correction](https://help.dragen.illumina.com/product-guides/dragen-v4.5/dragen-dna-pipeline/cnv-reference#gc-bias-correction).

### Normalized Coverage Output

`<prefix>.tn.tsv.gz`

Contains the normalized signal of the case sample per target interval, i.e., the log2-transformed copy ratio. A strong deviation from 0.0 indicates a potential CNV event. The format is equivalent to the `*.target.counts.gz` file:

1. Contig identifier
2. Start position
3. End position
4. Target interval name
5. Log2-transformed copy ratio in this interval
6. Count of improperly paired alignments in this interval

The file also includes header lines starting with `#`. In some cases, the normalized counts may be patched internally with intervals from other processes, such as the SegDups extension. In such cases, patches are listed in order of application in header lines starting with `#patch`:

```
#patch 1 = <normalized_counts_patch_1_filename>
#patch 2 = <normalized_counts_patch_2_filename>
...
```

and the original (unpatched) `*.tn.tsv.gz` is renamed to `*.tn.unpatched.tsv.gz`. Note: the unpatched file is included in the output for inspection, but most use cases should use the patched `*.tn.tsv.gz` file downstream of normalization.

An example of a `*.tn.tsv.gz` file is shown below.

```
#title = Normalized coverage profile
#sex = UNDETERMINED
contig  start   stop    name    <SampleName> improper_pairs
chr1    818022  819840  target-wgs-chr1-818022:819840   -0.18479358083014644    6
chr1    819840  821337  target-wgs-chr1-819840:821337   -0.21244441644669046    19
chr1    821337  822485  target-wgs-chr1-821337:822485   -0.14849555308041734    10
chr1    822485  824431  target-wgs-chr1-822485:824431   -0.12423291178926463    7
chr1    830446  832304  target-wgs-chr1-830446:832304   -0.1438261733656668     1
```
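
To relate these values to copy number, a log2 copy ratio `r` corresponds to a linear copy ratio of `2**r`. The Python sketch below further assumes a diploid baseline, so it does not apply as-is to, for example, chrX in male samples; `log2_ratio_to_copy_number` and `copy_number_to_log2_ratio` are hypothetical helper names.

```python
import math


def log2_ratio_to_copy_number(log2_ratio: float, baseline_ploidy: int = 2) -> float:
    """Approximate copy number from a log2 copy ratio, assuming a fixed baseline ploidy."""
    return baseline_ploidy * math.pow(2.0, log2_ratio)


def copy_number_to_log2_ratio(copy_number: float, baseline_ploidy: int = 2) -> float:
    """Expected log2 copy ratio for a given copy number (copy_number must be > 0)."""
    return math.log2(copy_number / baseline_ploidy)
```

For example, a single-copy loss in a diploid region corresponds to a log2 ratio of -1 and a single-copy gain to about 0.585, so the first row of the example above (about -0.18) sits much closer to the diploid baseline than to either event.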

For more information, see [Normalization](https://help.dragen.illumina.com/product-guides/dragen-v4.5/dragen-dna-pipeline/cnv-reference#normalization).

### B-Allele Counts

In germline ASCN runs, B-allele counts are calculated at bi-allelic sites taken from a collection of high-frequency SNVs in the population. Each B-allele site consists of a reference allele and a variant allele, and the number of reads in the sample supporting each of these alleles is counted.

B-allele counts are written both to a gzipped TSV file (`*.ballele.counts.gz`) and to a gzipped bedGraph file (`*.baf.bedgraph.gz`).

`<prefix>.ballele.counts.gz`

Columns:

1. Contig identifier
2. Start, BED-style (zero-based inclusive) start position of the reference allele
3. Stop, BED-style (one-based inclusive) stop position of the reference allele
4. Base sequence for the reference allele
5. Base sequence for the first allele being counted
6. Base sequence for the second allele being counted
7. The number of qualified reads containing a sequence matching the first allele
8. The number of qualified reads containing a sequence matching the second allele
9. Population frequency for the first allele
10. Population frequency for the second allele

Example:

```
contig  start   stop    refAllele       allele1 allele2 allele1Count    allele2Count    allele1AF       allele2AF
chr1    51478   51479   T       T       A       4       2       0.6747  0.3253
chr1    82733   82734   T       T       C       111     36      0.79346 0.20654
chr1    83083   83084   T       T       A       0       0       0.1538  0.8462
chr1    86330   86331   A       A       G       9       9       0.87384 0.12616
chr1    88315   88316   G       G       A       0       0       0.8926  0.1074
```
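
Allele fractions can be derived directly from the count columns. In the Python sketch below, `parse_ballele_line`, `allele2_fraction`, and `minor_allele_fraction` are hypothetical helpers. In the example rows `allele1` matches `refAllele`, so `allele2_fraction` is the non-reference fraction there. Sites with no qualified reads (the rows with counts `0` and `0` above) have no defined fraction, and the helpers return `None` for them.

```python
from typing import Dict, Optional


def parse_ballele_line(line: str) -> Dict:
    """Split one data row of *.ballele.counts.gz into named fields."""
    f = line.split()
    return {
        "contig": f[0],
        "start": int(f[1]),  # BED-style, zero-based
        "stop": int(f[2]),
        "ref_allele": f[3],
        "allele1": f[4],
        "allele2": f[5],
        "allele1_count": int(f[6]),
        "allele2_count": int(f[7]),
        "allele1_af": float(f[8]),  # population frequency
        "allele2_af": float(f[9]),
    }


def allele2_fraction(allele1_count: int, allele2_count: int) -> Optional[float]:
    """Fraction of counted reads supporting allele2, or None at zero depth."""
    depth = allele1_count + allele2_count
    return allele2_count / depth if depth else None


def minor_allele_fraction(allele1_count: int, allele2_count: int) -> Optional[float]:
    """Fraction of reads supporting the less-observed allele (folded to [0, 0.5])."""
    depth = allele1_count + allele2_count
    return min(allele1_count, allele2_count) / depth if depth else None
```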

`<prefix>.baf.bedgraph.gz`

B-allele frequency in bedGraph format. Allele count ratios are calculated after sorting the two alleles by base priority {A, T, G, C} (descending), so frequencies are deterministically distributed above and below 0.5. This makes significant BAF changes between neighboring segments easy to see in IGV.

Example:

```
chr1    11021   11022   0.333333
chr1    14463   14464   0.755102
chr1    16494   16495   0.317708
chr1    38741   38742   0.5
chr1    39014   39015   0.44186
```
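
The ordering rule above can be sketched in Python as follows. The text does not say which of the two sorted alleles supplies the numerator; this sketch assumes the higher-priority allele does (if DRAGEN uses the other allele, values simply mirror around 0.5). It handles single-base alleles only, and `priority_sorted_fraction` is a hypothetical name.

```python
from typing import Optional

# Descending priority from the text above: A > T > G > C (lower rank sorts first).
BASE_RANK = {"A": 0, "T": 1, "G": 2, "C": 3}


def priority_sorted_fraction(
    allele1: str, allele2: str, count1: int, count2: int
) -> Optional[float]:
    """Fraction of reads for the higher-priority of two single-base alleles.

    Assumption: the bedGraph value uses the allele that sorts first as numerator.
    Raises KeyError for anything other than a single A/T/G/C base.
    """
    depth = count1 + count2
    if depth == 0:
        return None
    if BASE_RANK[allele1] <= BASE_RANK[allele2]:
        return count1 / depth
    return count2 / depth
```

Because the result depends only on the bases, not on which allele is listed first or which is the reference, heterozygous sites scatter on both sides of 0.5 instead of being folded below it.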

### Segmentation Results

`<prefix>.seg`

Contains the segments produced by the segmentation algorithm. The `Segment_Mean` value of a segment is the ratio of the mean of that segment to the whole-sample median, without log transformation (linear copy ratio). A strong deviation from 1.0 indicates a potential CNV event.

The file has the following columns:

1. Sample name
2. Contig identifier
3. Start position
4. End position
5. Number of intervals in the segment
6. Linear copy-ratio of the segment

An example of a `*.seg` file is shown below.

```
Sample  Chromosome      Start   End     Num_Probes      Segment_Mean
<SampleName> chr1    818022  1117426 224     0.82500341336435279
<SampleName> chr1    1117426 4063702 2438    0.91726081432236528
<SampleName> chr1    4063702 4067591 3       0.38861386123247205
<SampleName> chr1    4067591 7705829 3302    0.93021316913709917
<SampleName> chr1    7705829 9357003 1405    0.98147825043799442
<SampleName> chr1    9357003 9377365 19      0.50269670724395654
<SampleName> chr1    9377365 12859821        2905    1.0684818476332989
```
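
A minimal Python reader for the `*.seg` file, with a helper that scales `Segment_Mean` to an approximate copy number under an assumed diploid baseline. Both function names (`parse_seg`, `approximate_copy_number`) are hypothetical. Under that assumption, the chr1:9357003-9377365 segment above (mean of about 0.50) would correspond to roughly one copy.

```python
from typing import Dict, Iterable, Iterator


def parse_seg(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield segments from a *.seg file, skipping its column-header line."""
    rows = iter(lines)
    next(rows, None)  # Sample Chromosome Start End Num_Probes Segment_Mean
    for line in rows:
        fields = line.split()
        if not fields:
            continue
        yield {
            "sample": fields[0],
            "chrom": fields[1],
            "start": int(fields[2]),
            "end": int(fields[3]),
            "num_probes": int(fields[4]),
            "segment_mean": float(fields[5]),  # linear copy ratio
        }


def approximate_copy_number(segment_mean: float, baseline_ploidy: int = 2) -> float:
    """Scale a linear copy ratio to copy number, assuming a fixed baseline ploidy."""
    return baseline_ploidy * segment_mean
```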

`<prefix>.baf.seg`

In addition to segmentation of target counts, some workflows perform segmentation of B-allele loci. The output file has suffix `*.baf.seg` and has the same format as the `*.seg` file, with two modifications. First, the `Segment_Mean` value is the mean over B-allele loci of the smaller observed allele fraction. Second, there is an additional column:

7. `BAF_SLM_STATE`: Integer between 0 and 10, indicating bins of minor-allele fraction (low to high), or `.` when the BAF data are too variable to estimate a minor-allele fraction

An example of BAF segmentation output file is shown below:

```
Sample  Chromosome      Start   End     Num_Probes      Segment_Mean    BAF_SLM_STATE
<SampleName> chr1    820348  1104646 194     0.29301737166888697     6
<SampleName> chr1    1105091 1533754 444     0.26185904799069076     5
<SampleName> chr1    1533810 1534166 9       0.41958837071702065     8
<SampleName> chr1    1534217 9356793 6689    0.26034515815016335     5
<SampleName> chr1    9358304 9376529 27      0.46450553586280602     10
```
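
The BAF segmentation file can be read the same way; the only special case is the `.` placeholder in `BAF_SLM_STATE`, which this hypothetical `parse_baf_seg` helper maps to `None` so that segments without an assigned state are easy to filter out.

```python
from typing import Dict, Iterable, Iterator


def parse_baf_seg(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield BAF segments from a *.baf.seg file, skipping its column-header line."""
    rows = iter(lines)
    next(rows, None)
    for line in rows:
        fields = line.split()
        if not fields:
            continue
        state = fields[6]
        yield {
            "sample": fields[0],
            "chrom": fields[1],
            "start": int(fields[2]),
            "end": int(fields[3]),
            "num_probes": int(fields[4]),
            "mean_minor_af": float(fields[5]),  # mean of the smaller allele fraction
            # '.' means the BAF data were too variable to assign a state.
            "baf_slm_state": None if state == "." else int(state),
        }
```

For example, `[s for s in parse_baf_seg(lines) if s["baf_slm_state"] is not None]` keeps only segments with an assigned state.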

### VCF Output

`<prefix>.cnv.vcf.gz`

The CNV VCF file follows the standard VCF format [v4.4](https://samtools.github.io/hts-specs/VCFv4.4.pdf). The VCF header is annotated with `##source=<DRAGEN_SOURCE>`, where `<DRAGEN_SOURCE>` identifies the caller which produced the VCF, e.g.:

* `DRAGEN_ASCN`: CNV caller
* `DRAGEN_ASCN_SV`: CNV caller + SV support
* `DRAGEN_CNV`: [legacy depth-only CNV caller](https://github.com/illumina-swi/dragen-docs/blob/release/4.5-prod/product-guides/dragen-v4.5/user-guide/dragen-dna-pipeline/cnv-calling/legacy/cnv-germline-legacy.md) (note: for legacy reasons this caller uses VCF version [v4.2](https://samtools.github.io/hts-specs/VCFv4.2.pdf))

Because of how CNV events are represented, not all fields apply to every event; in general, fields are annotated whenever the corresponding information is available. Copy-neutral (REF) calls are controlled by `--cnv-enable-ref-calls`, which is enabled by default for WGS. AOH/LOH events are not available in the legacy depth-only caller.

#### Example Records

```
# Example REF call
chr1    819841  DRAGEN:REF:chr1:819841-6103865  N       .       1000    PASS
  END=6103865;REFLEN=5284025
  GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:SM:SD:MAF:BC:AS:PE:OBF
  0/0:2:1:1000:1000:2.00155:1.000775:1.000775:129.1:0.5:4544:10920:66,10:0.00368019

# Example copy-neutral LOH call
chr1    6104347 DRAGEN:CNLOH:chr1:6104348-6727324       N       <LOH>   1000    PASS
  END=6727324;REFLEN=622977;SVLEN=622977;LOHTYPE=AOH;SVTYPE=CNV
  GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:SM:SD:MAF:BC:AS:PE:OBF
  1/1:2:0:1000:1000:1.9876:0.001988:0.993798:128.2:0.001:528:916:10,12:0.00766703

# Example GAIN call
chr1    16715826        DRAGEN:GAIN:chr1:16715827-16949283      N       <DUP>   744     PASS
  END=16949283;REFLEN=233457;SVLEN=233457;SVTYPE=CNV
  GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:SM:SD:MAF:BC:AS:PE:OBF
  0/1:3:1:1000:99:3.08217:1.134239:1.541085:198.8:0.368:49:26:20,14:0.0384615

# Example GAIN LOH call
chr15   20212550        DRAGEN:GAINLOH:chr15:20212551-20421468  N       <LOH>   390     PASS
  END=20421468;REFLEN=208918;SVLEN=208918;LOHTYPE=AOH;SVTYPE=CNV
  GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:SM:SD:MAF:BC:AS:PE:OBF
  1/1:6:0:1:1:5.90559:0.000000:2.952793:380.91:0:76:1:9,8:0

# Example LOSS call
chr1    25274774        DRAGEN:LOSS:chr1:25274775-25331683      N       <DEL>   226     PASS
  END=25331683;REFLEN=56909;SVLEN=56909;SVTYPE=CNV
  GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:SM:SD:MAF:BC:AS:PE:OBF
  0/1:1:0:1000:1000:1.01085:0.000000:0.505426:65.2:0:7:10:5,1:0

# Example MOSAIC GAIN call
chr17   26781858        DRAGEN:GAIN:chr17:26781859-26940176  N  <DUP>   70      PASS
  END=26940176;CIPOS=-6985,1424;CIEND=-1519,1732;REFLEN=158318;MOSAIC;SVLEN=158318;SVTYPE=CNV
  GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:MF:SM:SD:MAF:BC:AS:PE
  ./1:3:.:0:.:2.267890:.:0.268000:1.133945:123.6:.:78:0:17,118

# Example MOSAIC LOSS call
chr17   21884022        DRAGEN:LOSS:chr17:21884023-21988202  N  <DEL>   1000    PASS
  END=21988202;CIPOS=-1254,1259;CIEND=-1574,1412;REFLEN=104180;MOSAIC;SVLEN=104180;SVTYPE=CNV
  GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:MF:SM:SD:MAF:BC:AS:PE
  0/1:1:0:0:1000:1.135780:.:0.864000:0.567890:61.9:.:79:0:80,348
```
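
The records above are wrapped across several lines for readability; in the file, each record is a single tab-delimited line. The Python sketch below (`parse_cnv_record` is a hypothetical helper) splits one such line, turns INFO into a dictionary with flags such as `MOSAIC` mapped to `True`, and pairs FORMAT keys with the sample values. It assumes a single-sample VCF and leaves FORMAT values as strings, since they may contain `.`.

```python
from typing import Dict


def parse_cnv_record(line: str) -> Dict:
    """Parse one single-sample CNV VCF data line into a dictionary."""
    chrom, pos, record_id, ref, alt, qual, filt, info, fmt, sample = (
        line.rstrip("\n").split("\t")[:10]
    )
    info_fields: Dict[str, object] = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        info_fields[key] = value if sep else True  # flags such as MOSAIC have no value
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": record_id,
        "ref": ref,
        "alt": alt,
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_fields,
        "sample": dict(zip(fmt.split(":"), sample.split(":"))),
    }
```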

#### Header

The following example shows some of the CNV-specific header lines:

```
##fileformat=VCFv4.4
##ModelSource=DEPTH+BAF
##DiploidCoverage=371.000000
##OverallPloidy=1.998571
##HomozygosityIndex=0.001064
##OutlierBafFraction=0.024958
...
```
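
These CNV-specific header values can be collected programmatically. The hypothetical `parse_cnv_header` below reads simple `##Key=Value` lines up to the `#CHROM` line, converts numeric values to `float`, and skips structured definitions such as `##INFO=<...>`.

```python
from typing import Dict, Iterable, Union


def parse_cnv_header(lines: Iterable[str]) -> Dict[str, Union[float, str]]:
    """Collect simple ##Key=Value header lines, stopping at the #CHROM line."""
    meta: Dict[str, Union[float, str]] = {}
    for line in lines:
        if not line.startswith("##"):
            break  # the #CHROM column line (or data) ends the meta-information
        key, sep, value = line[2:].rstrip("\n").partition("=")
        if not sep or value.startswith("<"):
            continue  # skip lines without a value and structured definitions
        try:
            meta[key] = float(value)
        except ValueError:
            meta[key] = value
    return meta
```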

The following header lines are specific to the germline WGS ASCN caller:

| ID                 | Description                                                                                                                                                                                                                                                                                 |
| ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ModelSource        | The primary basis on which the final model was chosen. Value: `DEPTH+BAF`.                                                                                                                                                                                                                  |
| DiploidCoverage    | Expected read count for a target bin in a diploid region.                                                                                                                                                                                                                                   |
| OverallPloidy      | Length-weighted average of copy number for PASS events.                                                                                                                                                                                                                                     |
| OutlierBafFraction | A QC metric that measures the fraction of B-allele frequencies that are incompatible with the segment the BAFs belong to. High values might indicate substantial cross-sample contamination or another source of a mosaic genome, such as bone marrow transplantation. Range: \[0, 1].     |
| HomozygosityIndex  | Autosomal AOH/LOH percentage, considering only PASS AOH/LOH ≥ 2 Mb (default). Used as a proxy for consanguinity. A custom minimum size can be set through `--cnv-min-length-homozygosity-index`. The Cyto VCF (`*.cyto.vcf.gz`) also provides resolution-specific homozygosity indexes.     |
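
To make the definitions of `OverallPloidy` and `HomozygosityIndex` concrete, the Python sketch below computes a length-weighted copy number over PASS calls and the autosomal fraction covered by PASS AOH/LOH calls of at least 2 Mb. Several details are assumptions rather than documented behavior: AOH/LOH calls are identified by ALT `<LOH>`, lengths are inclusive interval lengths, and the denominator of the homozygosity fraction (`denominator_bp`) is supplied by the caller because the genome length DRAGEN uses is not stated here (multiply by 100 to express the result as a percentage). The `CnvCall` class and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CnvCall:
    chrom: str
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    cn: int
    alt: str      # "<DEL>", "<DUP>", "<LOH>", or "." for REF calls
    is_pass: bool

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def is_autosome(chrom: str) -> bool:
    """True for 1-22 or chr1-chr22."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return name.isdigit() and 1 <= int(name) <= 22


def overall_ploidy(calls: Iterable[CnvCall]) -> float:
    """Length-weighted mean copy number over PASS calls."""
    passing = [c for c in calls if c.is_pass]
    total = sum(c.length for c in passing)
    if total == 0:
        return float("nan")
    return sum(c.cn * c.length for c in passing) / total


def homozygosity_fraction(
    calls: Iterable[CnvCall], denominator_bp: int, min_length: int = 2_000_000
) -> float:
    """Fraction of denominator_bp covered by PASS autosomal <LOH> calls >= min_length."""
    aoh_bp = sum(
        c.length
        for c in calls
        if c.is_pass and c.alt == "<LOH>" and is_autosome(c.chrom) and c.length >= min_length
    )
    return aoh_bp / denominator_bp
```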

#### Records

All coordinates in the VCF are 1-based.

| ID    | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
| ----- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| CHROM | The chromosome (or contig) on which the copy number variant occurs.                                                                                                                                                                                                                                                                                                                                                                                                                               |
| POS   | Start position of the variant. If any of the ALT alleles is a symbolic allele (e.g., `<DEL>`), POS denotes the coordinate of the base preceding the polymorphism.                                                                                                                                                                                                                                                                                                                                 |
| ID    | Encodes the event type and coordinates of the event (1-based, inclusive). Event types include `GAIN`, `LOSS`, `REF`, `CNLOH`, and `GAINLOH`.                                                                                                                                                                                                                                                                                                                                                      |
| REF   | Contains `N` for all CNV events.                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| ALT   | Specifies the type of CNV event: `<DEL>`, `<DUP>`, or `<LOH>`. REF calls have ALT `.`. With `--cnv-enable-legacy-vcf-format` (VCF v4.2), the `ALT` field contains `<DEL>,<DUP>` in place of `<LOH>` for AOH/LOH events.                                                                                                                                                                                                                                                                           |
| QUAL  | Estimated quality score used in hard filtering. Note: different workflows provide different QUAL score distributions - it is recommended to compare QUAL scores only within results from the same workflow (e.g., it is incorrect to compare QUAL scores between the CNV caller and the [legacy (depth-only) CNV caller](https://github.com/illumina-swi/dragen-docs/blob/release/4.5-prod/product-guides/dragen-v4.5/user-guide/dragen-dna-pipeline/cnv-calling/legacy/cnv-germline-legacy.md)). |
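To illustrate how the `ID`, `POS`, and `INFO/END` fields relate, the following sketch parses an `ID` such as `DRAGEN:LOSS:chr1:25274775-25331683` (taken from the example records later in this section). It assumes every `ID` follows the layout `DRAGEN:<TYPE>:<CHROM>:<START>-<END>`; the names `CnvId` and `parse_cnv_id` are hypothetical and not part of DRAGEN.

```python
from typing import NamedTuple


class CnvId(NamedTuple):
    event_type: str  # e.g. GAIN, LOSS, REF, CNLOH, GAINLOH
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def parse_cnv_id(record_id: str) -> CnvId:
    """Split an ID of the assumed form DRAGEN:<TYPE>:<CHROM>:<START>-<END>."""
    source, event_type, rest = record_id.split(":", 2)
    if source != "DRAGEN":
        raise ValueError(f"unexpected ID prefix: {record_id}")
    # rpartition keeps any ':' that may occur inside the contig name
    chrom, _, coords = rest.rpartition(":")
    start, end = (int(x) for x in coords.split("-"))
    return CnvId(event_type, chrom, start, end)


# LOSS record from the Cytogenetics example: POS=25274774, REFLEN=56909
cnv = parse_cnv_id("DRAGEN:LOSS:chr1:25274775-25331683")
assert cnv.start == 25274774 + 1         # symbolic ALT: POS is the preceding base
assert cnv.end - cnv.start + 1 == 56909  # inclusive span matches INFO/REFLEN
```

In the REF example record, which has ALT `.`, the `ID` start equals `POS`, so the `POS + 1` relation above applies only to records with a symbolic ALT allele.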

#### FILTER

The FILTER column contains `PASS` if the CNV event passes all filters, otherwise the column contains the name of the failed filter. Default values are defined in the header line for each available FILTER.

| ID               | Description                                                                                                                                                                                                                                                |
| ---------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| binCount         | CNV events with a bin count lower than a threshold.                                                                                                                                                                                                        |
| chromArmBinCount | A whole-arm alteration call is supported by fewer intervals than the required minimum (default 500), covering only a small portion of the arm (e.g., the short arms of acrocentric chromosomes, which consist mainly of poor-mappability regions that are ignored during copy number calling). |
| cnvLength        | The length of the CNV is lower than a threshold.                                                                                                                                                                                                           |
| cnvMosaicLength  | A MOSAIC call is shorter than a length threshold and is filtered as a candidate false positive.                                                                                                                                                           |
| cnvQual          | The QUAL of the CNV is lower than a threshold.                                                                                                                                                                                                             |
| mosaicFraction   | The mosaic fraction of a CNV is below a defined threshold (`--cnv-filter-mosaic-fraction`). This filter is applied only to small CNVs with lengths shorter than the specified size threshold (`--cnv-filter-mosaic-fraction-max-length`, default: 200000). |
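The `mosaicFraction` rule combines two thresholds. A minimal sketch of that logic, assuming `min_mosaic_fraction` stands in for `--cnv-filter-mosaic-fraction` (whose default is not stated here, so the caller must supply it) and `max_length` for `--cnv-filter-mosaic-fraction-max-length`. Treating a missing estimate (`MF=.`) as "not filtered" is an assumption, and the function name is hypothetical.

```python
from typing import Optional


def fails_mosaic_fraction(
    mosaic_fraction: Optional[float],
    cnv_length: int,
    min_mosaic_fraction: float,
    max_length: int = 200_000,
) -> bool:
    """Sketch of the mosaicFraction FILTER rule described in the table above."""
    if mosaic_fraction is None:
        # Assumption: MF='.' (no estimate) leaves nothing to compare
        return False
    if cnv_length >= max_length:
        # Only CNVs shorter than max_length are subject to this filter
        return False
    return mosaic_fraction < min_mosaic_fraction


# Hypothetical threshold of 0.2 for illustration only
print(fails_mosaic_fraction(0.1, 50_000, min_mosaic_fraction=0.2))
```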

#### INFO

The INFO column contains information representing the event.

| ID      | Description                                                                                                                                                                                                                    |
| ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| REFLEN  | Length of the event.                                                                                                                                                                                                           |
| SVLEN   | Length of the event. Only present for non-REF records. Note: in VCF v4.2 format (enabled with `--cnv-enable-legacy-vcf-format`), `SVLEN` is a signed representation of `REFLEN` (e.g., a negative value indicates a deletion). |
| SVTYPE  | Always `CNV`. Only present for non-REF records.                                                                                                                                                                                |
| END     | End position of the event (1-based, inclusive).                                                                                                                                                                                |
| LOHTYPE | Type of loss of heterozygosity. Possible values: `AOH` (Absence of Heterozygosity).                                                                                                                                            |
| MOSAIC  | Tag identifying mosaic calls (if mosaic calling is enabled).                                                                                                                                                                   |
| CIPOS   | Confidence interval around the nominal `POS`.                                                                                                                                                                                  |
| CIEND   | Confidence interval around the nominal `END`.                                                                                                                                                                                  |

If a segment BED file is used, the segment identifier is carried over from the input to the `SEGID` field.

When matching CNV with SV output, additional INFO annotations are added.
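INFO entries are either `KEY=VALUE` pairs or bare flags such as `MOSAIC`. A minimal parsing sketch, assuming the standard VCF `;`-separated layout; values are kept as strings, so multi-valued fields such as `RES`, `CIPOS`, or `CIEND` stay comma-separated. The function name `parse_info` is hypothetical.

```python
def parse_info(info: str) -> dict:
    """Parse a VCF INFO column into a dict; flags such as MOSAIC map to True."""
    fields: dict = {}
    if info in ("", "."):
        return fields
    for entry in info.split(";"):
        if not entry:  # tolerate a trailing ';'
            continue
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


# INFO column of the low-pass mosaic example record
info = parse_info("END=145138636;REFLEN=145138635;SEGID=chr8;MOSAIC;SVLEN=145138635;SVTYPE=CNV")
is_mosaic = info.get("MOSAIC") is True
```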

#### FORMAT

The common FORMAT fields are described in the header:

| ID   | Description                                                                                                                                                                                                |
| ---- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| GT   | Genotype                                                                                                                                                                                                   |
| SM   | Linear copy ratio of the segment mean                                                                                                                                                                      |
| CN   | Estimated total copy number of sample                                                                                                                                                                      |
| BC   | Number of read count bins                                                                                                                                                                                  |
| PE   | Number of improperly paired end reads at start and stop breakpoints                                                                                                                                        |
| AS   | Number of allelic read count sites                                                                                                                                                                         |
| CNF  | Floating point estimate of copy number                                                                                                                                                                     |
| CNQ  | Exact total copy number Q-score                                                                                                                                                                            |
| MAF  | Estimate for the minor allele frequency                                                                                                                                                                    |
| MCN  | Estimated minor-haplotype copy number                                                                                                                                                                      |
| MCNF | Floating point estimate of minor-haplotype copy number                                                                                                                                                     |
| MCNQ | Minor copy number Q-score                                                                                                                                                                                  |
| MF   | Mosaic fraction estimate (for MOSAIC calls)                                                                                                                                                                |
| OBF  | Per-segment Outlier BAF Fraction. Percentage of BAF counts which are considered "outlier" with respect to the chosen segment call. Higher values might indicate segments where BAF counts are problematic. |
| SD   | Best estimate of segment's bias-corrected read count                                                                                                                                                       |

For more information, see [CNV VCF](https://help.dragen.illumina.com/product-guides/dragen-v4.5/dragen-dna-pipeline/cnv-reference#cnv-vcf-file).
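Because several FORMAT fields can be missing (`.`) for a given record, it helps to pair the FORMAT keys with the sample values explicitly. A minimal sketch using the GAIN example record from this page; `parse_sample` is a hypothetical helper, and padding dropped trailing fields with "missing" follows the general VCF convention rather than anything DRAGEN-specific.

```python
def parse_sample(format_keys: str, sample_values: str) -> dict:
    """Pair FORMAT keys with one sample's values; '.' (missing) becomes None."""
    keys = format_keys.split(":")
    values = sample_values.split(":")
    if len(values) > len(keys):
        raise ValueError("more sample values than FORMAT keys")
    # VCF allows trailing sample fields to be dropped; treat them as missing
    values += ["."] * (len(keys) - len(values))
    return {k: (None if v == "." else v) for k, v in zip(keys, values)}


sample = parse_sample(
    "GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:SM:SD:MAF:BC:AS:PE",
    "./1:6:.:1:.:6.27065:.:3.135326:404.457:.:23:0:6,11",
)
# PE holds improper-pair counts at the start and stop breakpoints
pe_start, pe_end = (int(x) for x in sample["PE"].split(","))
```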

### Cytogenetics Output

`<prefix>.cyto.vcf.gz`

The Cytogenetics modality output has a format similar to the standard CNV VCF (`*.cnv.vcf.gz`), with the following differences:

* Records can have an `INFO/RES` field, which indicates the resolution(s) associated with the record.
* Records can have an `INFO/SEGID` field, which indicates either a custom predefined segment supplied by the user (as in the standard CNV VCF) or a Cytogenetics-specific predefined segment, typically a whole-arm or whole-chromosome segment injected automatically during caller execution. In the latter case, the field contains the ID or name of the arm or chromosome.
* The VCF header is annotated with `##source=DRAGEN_CYTO` to indicate that the file was generated by the Cytogenetics modality.

**Note:** The Cyto VCF also provides resolution-specific homozygosity indexes (i.e., computed on each specific resolution's callset). The default minimum size considered is the same as the main `HomozygosityIndex`, and for each resolution in output, there will be an additional header line on the Cyto VCF indicating the resulting metric, e.g., `##HomozygosityIndex(25k)=0.001015`.
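The resolution-specific header lines can be collected with a short script. This sketch assumes the lines follow exactly the `##HomozygosityIndex(<resolution>)=<value>` pattern of the example above; the function names and the input path are hypothetical.

```python
import gzip
import re
from typing import Iterable, Iterator

_HI_HEADER = re.compile(r"^##HomozygosityIndex\((?P<res>[^)]+)\)=(?P<value>\S+)$")


def read_vcf_header(path: str) -> Iterator[str]:
    """Yield header lines ('#...') of a bgzipped VCF, stopping at the first record."""
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if not line.startswith("#"):
                break
            yield line


def resolution_homozygosity_indexes(header_lines: Iterable[str]) -> dict:
    """Map each resolution ID (e.g. '25k') to its HomozygosityIndex value."""
    indexes = {}
    for line in header_lines:
        match = _HI_HEADER.match(line.strip())
        if match:
            indexes[match["res"]] = float(match["value"])
    return indexes


# Usage (hypothetical path):
# resolution_homozygosity_indexes(read_vcf_header("sample.cyto.vcf.gz"))
```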

### CNV Metrics Output

`<prefix>.cnv_metrics.csv`

DRAGEN CNV outputs metrics in CSV format. The following metrics are reported:

**Sex Genotyper**

| Metric           | Description                                                                               |
| ---------------- | ----------------------------------------------------------------------------------------- |
| Estimated sex    | Estimated sex of the case sample (and panel of normals samples if applicable).            |
| Confidence score | Range: \[0.0, 1.0]. If the sample sex is specified via `--sample-sex`, this value is 0.0. |

DRAGEN Sex Genotyper requires a minimum of 300 target intervals to confidently determine sex genotype; if the panel covers fewer intervals on the sex chromosomes, genotyping will fail and an undetermined genotype is returned. Users may lower this requirement by setting `--cnv-sex-genotyper-num-interval-requirement` to a smaller value, at the risk of increased false genotype calls.

**CNV Summary**

* Bases in reference genome in use
* Average alignment coverage over genome - The average alignment coverage over the genome is calculated by dividing the total number of bases from processed alignment records (excluding those filtered by the Target Counts stage in DRAGEN CNV) by the genome length. Alignment records are filtered taking into consideration duplicate marking status (if available), MAPQ, and mapping status.
* Number of alignment records processed
  * Number of filtered records (total)
  * Number of filtered records (due to duplicates)
  * Number of filtered records (due to MAPQ)
  * Number of filtered records (due to being unmapped)
* PMAD - Pairwise Median Absolute Deviation measures the variation in read coverage between adjacent bins. It captures variability due to factors such as DNA degradation, extraction, amplification, or library preparation. Higher values indicate noisier sample data. PMAD is calculated as follows:
  * Define a vector v\[i] as the normalized counts of the i-th interval in log scale, and d\[i] as the pairwise differences of consecutive normalized counts between intervals i and i+1, i.e., d\[i] = v\[i] - v\[i+1]
  * PMAD is the median absolute deviation of d, i.e., PMAD = Median(|d\[i] - Median(d)|)
* Coverage MAD - Median absolute deviation of normalized case counts. Higher values indicate noisier sample data.
* Median Bin Count - Median of raw counts normalized by interval size.
* Number of target intervals
* Number of normal samples
* Number of segments
* Number of amplifications - Note: GAINLOH events (ALT=LOH and CN > 2) are also included here
* Number of deletions
* Number of CNLOHs (Copy-Neutral LOHs)
* Number of PASS amplifications - Note: GAINLOH events (ALT=LOH and CN > 2) are also included here
* Number of PASS deletions
* Number of PASS CNLOHs (Copy-Neutral LOHs)
* Post-Normalization Bin Count Sigma - Standard deviation of post-PoN-normalization median-normalized coverage values.

Coverage MAD and Median Bin Count are only printed for WES germline/somatic CNV. Post-Normalization Bin Count Sigma is only printed when PoN normalization has been applied.
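The PMAD definition above, and the related Coverage MAD, can be sketched as follows. This is an illustration, not DRAGEN's implementation: the input is assumed to already be normalized counts on a log scale (the log base is not specified above, so it is left to the caller), and the names `mad` and `pmad` are hypothetical.

```python
from statistics import median
from typing import Sequence


def mad(values: Sequence[float]) -> float:
    """Median absolute deviation: median(|x - median(x)|)."""
    center = median(values)
    return median(abs(x - center) for x in values)


def pmad(log_normalized_counts: Sequence[float]) -> float:
    """PMAD: MAD of the differences between consecutive intervals."""
    v = log_normalized_counts
    if len(v) < 2:
        raise ValueError("need at least two intervals")
    d = [v[i] - v[i + 1] for i in range(len(v) - 1)]
    return mad(d)


# Coverage MAD would be mad(normalized_case_counts) on the same kind of input
noise = pmad([0.0, 0.1, -0.1, 0.0, 0.2])
```

Because PMAD works on differences between neighbouring bins, a smooth trend across many bins (for example, a large CNV) contributes little, whereas bin-to-bin noise dominates the result.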

Example (not all metrics are shown):

```
SEX GENOTYPER,,<SampleName>,FEMALE,0.0000
CNV SUMMARY,,Bases in reference genome,3217346917
CNV SUMMARY,,Average alignment coverage over genome,34.5
CNV SUMMARY,,PMAD,0.031799
CNV SUMMARY,,Number of target intervals,2873465
CNV SUMMARY,,Number of segments,1247
CNV SUMMARY,,Number of amplifications,87
CNV SUMMARY,,Number of deletions,54
CNV SUMMARY,,Number of CNLOHs,12
CNV SUMMARY,,Number of PASS amplifications,65
CNV SUMMARY,,Number of PASS deletions,38
CNV SUMMARY,,Number of PASS CNLOHs,8
```
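To load the summary section for downstream QC, the CSV can be read row by row. A minimal sketch assuming the `SECTION,,metric,value` layout shown in the example above; only `CNV SUMMARY` rows are collected, and `read_cnv_summary` is a hypothetical name.

```python
import csv
import io


def read_cnv_summary(text: str) -> dict:
    """Collect 'CNV SUMMARY' rows of a *.cnv_metrics.csv into {metric: value}."""
    summary = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) >= 4 and row[0] == "CNV SUMMARY":
            try:
                summary[row[2]] = float(row[3])
            except ValueError:  # keep non-numeric values as strings
                summary[row[2]] = row[3]
    return summary


# Usage (hypothetical path):
# with open("sample.cnv_metrics.csv") as fh:
#     metrics = read_cnv_summary(fh.read())
#     pass_fraction = metrics["Number of PASS deletions"] / metrics["Number of deletions"]
```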

For more information, see [CNV Metrics](https://help.dragen.illumina.com/product-guides/dragen-v4.5/dragen-dna-pipeline/cnv-reference#cnv-metrics-file).

### Track Files (IGV)

To generate additional bigWig and GFF3 track files, set the `--cnv-enable-tracks` option to true. These files can be loaded into IGV together with other available tracks, such as RefSeq genes; viewing them alongside publicly available tracks makes calls easier to interpret. When DRAGEN CNV generates tracks, it also generates an IGV session XML file, `*.cnv.igv_session.xml`, which can be loaded directly into IGV for analysis.

The following IGV tracks are automatically populated in the output IGV session file:

| Track File            | Description                                                                                                                                                                                                            | Recommended View   |
| --------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------ |
| \*.target.counts.bw   | BigWig representation of target counts bins. Values are GC-corrected if GC correction was performed.                                                                                                                   | Barchart or points |
| \*.improper\_pairs.bw | BigWig representation of improper pairs counts.                                                                                                                                                                        | Barchart           |
| \*.tn.bw              | BigWig representation of the tangent normalized signal.                                                                                                                                                                | Points             |
| \*.seg.bw             | BigWig representation of the segments.                                                                                                                                                                                 | Points             |
| \*.baf.seg.bw         | BigWig representation of BAF segments (if available).                                                                                                                                                                  | Points             |
| \*.baf.bedgraph.gz    | BED graph representation of B-allele frequency (if available).                                                                                                                                                         | Points             |
| \*.cnv.gff3           | GFF3 representation of CNV events: DEL=blue, DUP=red, filtered=light gray, REF=green (if enabled), AOH/LOH=magenta. An example is shown below (different workflows may output different attributes on the 9th column). | —                  |

Example GFF3 output:

```
##gff-version 3
chr1    DRAGEN  LOSS    12779193        12859821        30      .       .       Alt=DEL;LinearCopyRatio=0.576;CopyNumber=1;Genotype=0/1;Qual=30;Filter=PASS;Start=12779192;Stop=12859821;Length=80629;BinCount=24;ImproperPairsCount=16,7;color=#0000FF;
chr1    DRAGEN  REF     13106280        13122338        19      .       .       Alt=REF;LinearCopyRatio=1.05981;CopyNumber=2;Genotype=./.;Qual=19;Filter=PASS;Start=13106279;Stop=13122338;Length=16059;BinCount=8;ImproperPairsCount=3,1;color=#00FF00;
chr1    DRAGEN  GAIN    13225213        13247040        66      .       .       Alt=DUP;LinearCopyRatio=2.016;CopyNumber=4;Genotype=./1;Qual=66;Filter=PASS;Start=13225212;Stop=13247040;Length=21828;BinCount=9;ImproperPairsCount=7,5;color=#FF0000;
```
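The ninth GFF3 column carries the CNV attributes as `key=value` pairs. A minimal reading sketch, assuming tab-separated columns as required by the GFF3 specification (the example above is displayed with spaces) and the attribute names shown in the example; function names are hypothetical.

```python
from typing import Iterable, Iterator
from urllib.parse import unquote


def parse_gff3_attributes(column9: str) -> dict:
    """Split 'Key=Value;Key=Value;' into a dict, URL-unescaping values per GFF3."""
    attrs = {}
    for item in column9.strip().split(";"):
        if item:  # skip the empty piece after a trailing ';'
            key, _, value = item.partition("=")
            attrs[key] = unquote(value)
    return attrs


def read_cnv_gff3(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per feature line of a *.cnv.gff3 file."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        seqid, _source, ftype, start, end, _score, _strand, _phase, attributes = (
            line.rstrip("\n").split("\t")
        )
        yield {"seqid": seqid, "type": ftype, "start": int(start), "end": int(end),
               **parse_gff3_attributes(attributes)}
```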

#### IGV Session

![](https://25033470-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FG9szlFZupV6Q2DasL98y%2Fuploads%2Fgit-blob-5b52de28953daaaf9d7ca6cea3fb035b9a9ad341%2Fcnv-calling.IGVTracks.png?alt=media)

File extension: `*.igv_session.xml`

The IGV session XML file is prepopulated with track files generated by DRAGEN. The session file loads the reference genome that best matches the standard reference genomes in an IGV installation, by comparing the name of the `--ref-dir` specified on the command-line. Standard UCSC human reference genomes are autodetected, but any variations from the standard reference genomes might not be autodetected. To edit the genome detection, alter the `genome` attribute in the `Session` element to the reference genome you would like for analysis before loading into IGV. The reference identifier used by IGV might differ from the actual name of the genome. The following is an example edited session file.

```
<?xml version="1.0" encoding="utf-8"?>
<Session genome="b37" hasGeneTrack="false" hasSequenceTrack="true" version="8">
    <Resources>
        <Resource path="example.cnv.gff3"/>
        <Resource path="example.cnv.excluded_intervals.bed.gz"/>
        <Resource path="example.target.counts.bw"/>
        <Resource path="example.improper_pairs.bw"/>
        <Resource path="example.tn.bw"/>
        <Resource path="example.seg.bw"/>
    </Resources>
    <Panel height="500" width="1200" name="DataPanel">
        ...
    </Panel>
</Session>
```

Depending on the IGV version installed, IGV may come prepackaged with different flavors of GRCh37. Because reference naming conventions have changed, you may have to edit the `genome` attribute in the XML file directly. For example, IGV has traditionally packaged a `b37` reference genome, but may also include `1kg_v37` or `1kg_b37+decoy`, which appear in the IGV user interface as "1kg, b37" and "1kg, b37+decoy", respectively.

You can determine the correct identifier of a reference genome by selecting `File > Save Session...` in IGV and then inspecting the `genome` attribute in the generated igv\_session.xml file.
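Editing the `genome` attribute can also be scripted, for example when retargeting many session files at once. A minimal sketch using Python's standard `xml.etree.ElementTree`; the function names and the `b37` identifier in the usage line are illustrative, and the correct identifier must still be determined for your IGV installation as described above.

```python
import xml.etree.ElementTree as ET


def set_session_genome(session_xml: str, genome_id: str) -> str:
    """Return session XML text with the <Session> genome attribute replaced."""
    root = ET.fromstring(session_xml)
    if root.tag != "Session":
        raise ValueError("root element is not <Session>")
    root.set("genome", genome_id)
    return ET.tostring(root, encoding="unicode")


def retarget_session_file(path: str, genome_id: str) -> None:
    """Rewrite a *.igv_session.xml file in place with a new genome identifier."""
    tree = ET.parse(path)
    root = tree.getroot()
    if root.tag != "Session":
        raise ValueError("root element is not <Session>")
    root.set("genome", genome_id)
    tree.write(path, encoding="utf-8", xml_declaration=True)


# Usage (hypothetical path): retarget_session_file("example.cnv.igv_session.xml", "b37")
```

Note that `ElementTree` does not preserve comments or original attribute formatting, which is usually harmless for IGV session files.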

When the Cytogenetics modality is enabled, DRAGEN CNV produces an additional IGV session XML file, `*.cyto.igv_session.xml`, shown below.

![](https://25033470-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FG9szlFZupV6Q2DasL98y%2Fuploads%2Fgit-blob-3d9e4f253516593e404d240f2a3b64d10793f64a%2Fcyto.IGVExample.png?alt=media)

## Advanced Topics

### Cytogenetics Modality

Conventional cytogenetics methods typically focus on larger alterations than those detected by NGS analyses. The Cytogenetics modality of the CNV caller allows the user to visualize copy number alterations (CNAs) at different resolutions, providing a more flexible workspace for different use cases.

It is enabled with `--cnv-enable-cyto-output` (default true for germline workflows). It is not available for somatic WES workflows.

Using the same sample and run, the Cytogenetics modality starts from the high-resolution results (before smoothing) reported in the standard CNV VCF output. This callset then undergoes multiple rounds of smoothing, progressing from finer-resolution to coarser-resolution calls (larger alterations). Each round produces a smoothed callset that is set aside and becomes the starting point for the next, more heavily smoothed callset.

![](https://25033470-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FG9szlFZupV6Q2DasL98y%2Fuploads%2Fgit-blob-c21aae6744a21423831bbe81a8961166c8549eeb%2Fcyto.Smoothing.BlockDiagram.png?alt=media)

At the end of the smoothing procedure, the Cytogenetics modality produces several outputs, including:

* Multiple GFF3 files, one for each round of smoothing (extension `*cyto.<resolution_ID>.gff3`).
* A single VCF file, with extension `*.cyto.vcf.gz`. This file contains all callsets identified through the smoothing iterations, with the iteration identifier stored in the `INFO/RES` field. Identical alterations across resolutions are deduplicated; in that case, the `INFO/RES` field contains a comma-separated list of resolution identifiers.
  * Some resolutions are based on depth of coverage only (no BAF). Their `INFO/RES` value reflects the original callset used as the starting point, with the suffix `_depth` added. For example, depth-only calls derived from resolution `1M` have resolution ID `1M_depth`. Note: calls made at different resolutions or with different information (depth+BAF versus depth-only) may occasionally conflict. For instance, a region that is AOH and also carries a mosaic DEL may be reported as AOH in the depth+BAF callset but as a (mosaic) DEL in the depth-only callset. For each resolution, the event type with the strongest evidence is output.
  * An additional callset that does not follow the scheme above (no `INFO/RES` field) contains whole-arm/-chromosome aneuploidies. All records in this callset have the chromosome or arm name in the `INFO/SEGID` field. Entries for this callset are not present in any GFF3 file. For more details, see the section on whole-chromosome aneuploidies below.
* A single IGV session file, with extension `*.cyto.igv_session.xml`, which provides a convenient way to load the multiple GFF3 files and the other typical tracks found in the standard `*.cnv.igv_session.xml`. An example screenshot of such an IGV session is shown below:
  * The first 5 tracks show the DRAGEN CNV calls (Blue/DEL, Green/REF, Magenta/AOH, Red/DUP) at decreasing resolution (from high to low, top to bottom).
  * The remaining tracks are similar to those in the standard `*.cnv.igv_session.xml`, e.g., poor mappability regions, target counts coverage, improper pairs, and B-allele frequency.

![](https://25033470-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FG9szlFZupV6Q2DasL98y%2Fuploads%2Fgit-blob-3d9e4f253516593e404d240f2a3b64d10793f64a%2Fcyto.IGVExample.png?alt=media)

Below is an example set of calls from the `*.cyto.vcf.gz` output file (note the additional `INFO/RES` annotation compared with the `*.cnv.vcf.gz` output file):

```
# Example REF call
chr1    819841  DRAGEN:REF:chr1:819841-6103865  N       .       1000    PASS
  END=6103865;REFLEN=5284025;RES=25k,500k,50k
  GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:SM:SD:MAF:BC:AS:PE:OBF
  0/0:2:1:1000:1000:2.00155:1.000775:1.000775:129.1:0.5:4544:10920:66,10:0.00368019

# Example GAIN call
chr1    16605768        DRAGEN:GAIN:chr1:16605769-16645359      N       <DUP>   427     PASS
  END=16645359;REFLEN=39591;RES=25k;SVLEN=39591;SVTYPE=CNV
  GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:SM:SD:MAF:BC:AS:PE
  ./1:6:.:1:.:6.27065:.:3.135326:404.457:.:23:0:6,11

# Example LOSS call
chr1    25274774        DRAGEN:LOSS:chr1:25274775-25331683      N       <DEL>   226     PASS
  END=25331683;REFLEN=56909;RES=25k,50k;SVLEN=56909;SVTYPE=CNV
  GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:SM:SD:MAF:BC:AS:PE:OBF
  0/1:1:0:1000:1000:1.01085:0.000000:0.505426:65.2:0:7:10:5,1:0
```

**Selection of appropriate resolution**

Because the most informative resolution depends on circumstances (event sizes, distance between calls, presence of smaller calls causing fragmentation, etc.), no single recommendation works for all cases. However, consider the following practical recommendations:

* Each resolution `INFO/RES` ID identifies the *minimum size* for alterations to be considered PASS.
* If only minimal call smoothing is necessary, resolution 25k can provide a good balance, with calls in size ranges compatible with Chromosomal Microarray (CMA).
* When comparing against technologies such as karyotyping, resolution 1M may be more appropriate to reduce call fragmentation.

Note: if the use case under consideration is not impacted by call fragmentation, it is typically recommended to use the `*.cnv.vcf.gz` or `*.cnv_sv.vcf.gz` output results (instead of the ones in `*.cyto.vcf.gz`), to take full advantage of the superior detail of NGS.
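Once a resolution has been chosen, the corresponding calls can be pulled out of `*.cyto.vcf.gz` by checking `INFO/RES`. A minimal sketch assuming tab-separated VCF records; the function name and the default of keeping only `PASS` records are assumptions, not DRAGEN behaviour. Records without `INFO/RES` (the whole-arm/-chromosome callset) are never selected.

```python
from typing import Iterable, Iterator, List


def calls_at_resolution(
    vcf_lines: Iterable[str], resolution: str, pass_only: bool = True
) -> Iterator[List[str]]:
    """Yield the columns of records whose INFO/RES list contains `resolution`."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # (key, value) pairs from INFO; flags get an empty value
        info = dict(item.partition("=")[::2] for item in cols[7].split(";") if item)
        if resolution not in info.get("RES", "").split(","):
            continue
        if pass_only and cols[6] != "PASS":
            continue
        yield cols


# Usage (hypothetical path):
# import gzip
# with gzip.open("sample.cyto.vcf.gz", "rt") as fh:
#     one_mb_calls = list(calls_at_resolution(fh, "1M"))
```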

**Additional options**

| Option                                          | Description                                                                                    |
| ----------------------------------------------- | ---------------------------------------------------------------------------------------------- |
| --cnv-cyto-keep-resolutions=\<resolution\_list> | Comma-separated list of resolutions to output (currently supported: 25k,50k,500k,1M,1M\_depth) |

**Whole-chromosome Aneuploidy Detection**

Some use cases require inspecting a sample at the arm or whole-chromosome level. This typically requires an additional caller alongside the standard CNV caller with automated segment detection. In the same run, the Cytogenetics modality provides this callset within the same VCF file (with extension `*.cyto.vcf.gz`).

```
chr21  12000000   DRAGEN:GAIN:chr21:12000001-46709983  N   <DUP>  1000  PASS
  END=46709983;REFLEN=34709983;SEGID=chr21q;SVLEN=34709983;SVTYPE=CNV
  GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:SM:SD:MAF:BC:AS:PE:OBF
  0/1:3:1:1000:1000:3.00155:1.002518:1.500775:193.6:0.334:29570:66224:0,0:0.0016016

chrX   1        DRAGEN:LOSS:chrX:2-156040895     N     <DEL>  1000  PASS
  END=156040895;REFLEN=156040894;SEGID=chrX;SVLEN=156040894;SVTYPE=CNV
  GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:SM:SD:MAF:BC:AS:PE:OBF
  0/1:1:0:1000:1000:0.996364:0.000996:0.498182:82.2:0.001:122580:144548:0,0:0.00995089
```

The example above shows two calls from this callset. The segment ID annotation (`INFO/SEGID`) gives the name of the segment under consideration (in this example, the q-arm of chromosome 21 and the entire chromosome X). REF calls are not displayed by default unless explicitly requested (with `--cnv-enable-ref-calls true`; note that this enables REF calls in both the CNV and Cyto VCF files).

Note: the short arms of acrocentric chromosomes (13, 14, 15, 21, and 22) consist of repetitive regions. These regions cause mappability issues and are typically excluded from analysis, so calling short-arm alterations on these chromosomes is challenging: such calls are based on only a small percentage of the arm's total length. To avoid false positive calls (here, an alteration reported on the full short arm with evidence from only a minimal portion of it), the algorithm applies a hard threshold (default 500 intervals) on the minimum number of intervals required to call a whole-arm alteration. Whole-arm calls that do not meet this threshold are filtered with `FILTER` `chromArmBinCount`. The default can be changed with the `--cnv-filter-chrom-arm-bin-count` option.
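The whole-arm/-chromosome callset can be separated from the resolution-specific callsets by its `INFO/SEGID` annotation and the absence of `INFO/RES`. A minimal sketch under that assumption; note that if a user-supplied segment BED was also used, its records may carry `SEGID` too, so this heuristic could then need adjusting. The function name and the output dictionary layout are hypothetical.

```python
from typing import Iterable, Iterator


def aneuploidy_calls(vcf_lines: Iterable[str]) -> Iterator[dict]:
    """Yield whole-arm/-chromosome records (SEGID present, no RES) from a cyto VCF."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        info = dict(item.partition("=")[::2] for item in cols[7].split(";") if item)
        if "SEGID" in info and "RES" not in info:
            filters = cols[6].split(";")  # VCF separates multiple filters with ';'
            yield {
                "segid": info["SEGID"],
                "id": cols[2],
                "filter": cols[6],
                # True when the arm had too few intervals to support the call
                "arm_bin_count_filtered": "chromArmBinCount" in filters,
            }
```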

### MOSAIC fraction estimation

For MOSAIC alterations, DRAGEN attempts to infer the mosaic fraction (`MF`), that is, the fraction of cells carrying the alteration.

After copy number calling, the call's mosaic fraction is preliminarily estimated from the total and minor-allele copy-number (`CN`, `MCN`) and floating point estimates (`CNF`, `MCNF`). For example: in the case of `CN=4`, `CNF=4.48`, the `CN` of the population without the alteration is considered $CN'=5$, and then the mosaic fraction preliminary estimate is $MF=1-0.48=0.52$.
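The numbers in this example are consistent with treating `CNF` as the cell-fraction-weighted mean of the two populations' copy numbers, the same mixture that appears in the denominator of the expected-`MAF` formula below. Under that assumption, with $n_2 = 4$ (the `CN` of the population carrying the alteration) and $n_1 = CN' = 5$, solving for $q$ reproduces the preliminary estimate:

$$CNF = n_1(1-q) + n_2 q \quad\Longrightarrow\quad q = \frac{n_1 - CNF}{n_1 - n_2} = \frac{5 - 4.48}{5 - 4} = 0.52$$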

The call's observed `MAF` is then cross-checked against the expected `MAF`:

$$MAF=\frac{M_1 n_1 (1-q) + M_2 n_2 q}{n_1 (1-q) + n_2 q}$$

where:

* $MAF$ is the expected `MAF` of the mixture
* $n_1$ and $n_2$ are the expected `CN` of the 2 cell populations
* $M_1$ and $M_2$ are the expected `MAF` of the 2 cell populations
* $q$ is the mosaic fraction (`MF`, the fraction of cells in the 2nd population)

Note: this algorithm assumes only 2 cell populations (population 2: cells carrying the MOSAIC alteration called with `CN` and `MCN`; population 1: all remaining cells).

If the observed `MAF` is consistent with the expected `MAF` (considering a 5% tolerance on the `MF` value), the `MF` value is returned. Otherwise, the algorithm investigates alternative ($n_1$, $q$) configurations that are compatible with $n_2$ and the copy number floating point estimate (`CNF`). If at least one alternative passes the expected `MAF` compatibility check, the updated $q$ is returned in the `MF` field. In all other cases, `MF=.`.
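The expected-`MAF` formula and the compatibility check can be sketched as below. This is not DRAGEN's implementation: reading the "5% tolerance on the `MF` value" as "the observed `MAF` lies within the expected `MAF` range for $q \pm 0.05$" is an assumption, as are the function names. Because the expected `MAF` is a ratio of two linear functions of $q$, it is monotonic in $q$ wherever the denominator stays positive, so checking the interval endpoints is enough. The per-population values in the usage line are hypothetical: a population with copy number 5 and minor copy number 2 has an expected `MAF` of 2/5 = 0.4, and one with copy number 4 and minor copy number 1 has 1/4 = 0.25.

```python
def expected_maf(m1: float, n1: float, m2: float, n2: float, q: float) -> float:
    """Expected MAF of a two-population mixture (formula above)."""
    return (m1 * n1 * (1 - q) + m2 * n2 * q) / (n1 * (1 - q) + n2 * q)


def maf_consistent(observed_maf: float, m1: float, n1: float, m2: float, n2: float,
                   q: float, mf_tolerance: float = 0.05) -> bool:
    """Check the observed MAF against the expected MAF over q +/- mf_tolerance."""
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sketch assumes positive copy numbers for both populations")
    lo_q = max(0.0, q - mf_tolerance)
    hi_q = min(1.0, q + mf_tolerance)
    a = expected_maf(m1, n1, m2, n2, lo_q)
    b = expected_maf(m1, n1, m2, n2, hi_q)
    return min(a, b) <= observed_maf <= max(a, b)


# Hypothetical values based on the CN=4, CNF=4.48 example (q = 0.52)
print(maf_consistent(0.33, m1=0.4, n1=5, m2=0.25, n2=4, q=0.52))
```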

### Low-pass WGS support

The germline WGS caller supports reliable detection of CNVs from low-pass WGS data. Low-pass WGS is a highly cost-effective approach for CNV detection, providing genome-wide resolution at substantially lower cost than standard WGS or WES:

* Cost-effective CNV detection at low sequencing depth (1× to 10×)
* Comparable performance to WGS for cytogenetic-scale events (>1 Mb)
* Detects CNVs down to a few hundred kilobases
* Supports whole-chromosome aneuploidy and mosaic events

#### CNV Detection Capabilities

* **Variant types**:
  * Deletions
  * Duplications
* **Resolution tiers**:
  * Cytogenetic (coarse): ≥ 1 Mb
  * CNV (fine): 200 kb – 1 Mb
* **Minimum event size**:
  * 200 kb hard filter
* **B-allele frequency (BAF)**:
  * Not estimated in low-pass mode

#### Output Files

| Output File | Resolution           | Size Range    |
| ----------- | -------------------- | ------------- |
| cyto.vcf.gz | Coarse (cytogenetic) | ≥ 1 Mb        |
| cnv.vcf.gz  | Fine                 | 200 kb – 1 Mb |

#### Command-Line Usage

Enable low-pass CNV calling using the `--cnv-enable-lowpass=true` option:

```bash
dragen \
  -1 sample_R1.fastq.gz \
  -2 sample_R2.fastq.gz \
  --RGID RGID \
  --RGSM RGSM \
  --enable-map-align=true \
  --enable-map-align-output=true \
  --ref-dir=<REFERENCE> \
  --output-file-prefix=dragen \
  --output-directory=<OUTPUT_DIR> \
  --cnv-enable-lowpass=true
```

#### Example records

**CNV**

```
chr4       123918579       DRAGEN:LOSS:chr4:123918580-124314854       N       <DEL>   190     PASS
  END=124314854;CIPOS=-59095,63223;CIEND=-54689,56222;REFLEN=396275;SVLEN=396275;SVTYPE=CNV
  GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:MF:SM:SD:MAF:BC:AS:PE
  0/1:1:0:1000:190:0.987000:.:.:0.493500:197.4:.:7:0:7,10
```

**Cytogenetics**

```
chrX       1       DRAGEN:GAIN:chrX:2-156040895       N       <DUP>   1000    PASS
  END=156040895;REFLEN=156040894;SEGID=chrX;SVLEN=156040894;SVTYPE=CNV
  GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:MF:SM:SD:MAF:BC:AS:PE
  1:2:0:1000:.:1.942077:.:.:1.942077:355.4:.:2472:0:0,0
```

**Mosaic events**

```
chr8       1       DRAGEN:GAIN:chr8:2-145138636       N       <DUP>   1000    PASS
  END=145138636;REFLEN=145138635;SEGID=chr8;MOSAIC;SVLEN=145138635;SVTYPE=CNV
  GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:MF:SM:SD:MAF:BC:AS:PE
  ./1:3:.:0:.:2.272202:.:0.272000:1.136101:314.7:.:2530:0:0,0
```

#### Hard Filter Options

Low-pass CNV calling applies filters based on CNV length and bin count to reduce noise associated with low sequencing coverage.

| Option                 | Default | Description                         |
| ---------------------- | ------- | ----------------------------------- |
| --cnv-filter-length    | 200 kb  | Minimum CNV length for a PASS call. |
| --cnv-filter-bin-count | 4       | Minimum bin count for a PASS call.  |
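The two hard filters amount to a simple threshold check, sketched below in Python. This is an illustration, not DRAGEN code: the defaults mirror the table, a call exactly at the minimum is assumed to pass, length is taken from `REFLEN` and bin count from the `BC` FORMAT field, `cnvLength` is the FILTER name used elsewhere in this guide, and `binCount` is a hypothetical placeholder name.

```python
from typing import List


def lowpass_filter(length_bp: int, bin_count: int,
                   min_length_bp: int = 200_000, min_bin_count: int = 4) -> str:
    """Return a VCF FILTER value for a low-pass CNV call (illustrative sketch)."""
    failed: List[str] = []
    if length_bp < min_length_bp:
        failed.append("cnvLength")
    if bin_count < min_bin_count:
        failed.append("binCount")  # hypothetical filter name
    return "PASS" if not failed else ";".join(failed)
```

For instance, the CNV example record above (`REFLEN=396275`, `BC=7`) clears both thresholds.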

### CNV with SV Support

The DRAGEN CNV caller uses depth and BAF as its primary signals for calling copy number variants. These signals alone make it difficult to call events shorter than 10 kbp. Sensitivity for such events can be improved by leveraging junction signals from the DRAGEN structural variant (SV) caller.

When the DRAGEN CNV and SV callers are both run in a single invocation, an additional integration step is performed at the end of the run to refine the CNV calls. This feature is enabled automatically when DRAGEN detects a germline WGS analysis.

The SV/CNV Integration module takes the DEL and DUP calls from the output data structures of the germline CNV and SV callers, identifies putative matches, updates annotations, filters, and scores, and outputs the refined records in the CNV VCF. By combining junction signals from the SV caller with depth/BAF signals from the CNV caller, this approach enables sensitive CNV detection down to 1 kbp while also improving recall and precision across length scales. It does so by rescuing previously low-quality calls when both callers provide evidence, and by adjusting CNV breakends to the more accurate SV breakends. The matching algorithm considers, among other factors, the proximity of the events and the transition states at the breakends.
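The breakend-matching idea can be illustrated with a deliberately simplified Python sketch. It matches only on event type and breakend proximity (a hypothetical `max_offset` window), and it ignores the transition states, rescoring, and rescue logic described above; `Call`, `match_cnv_to_sv`, and `merge_records` are hypothetical names. The output keys mirror the `OrigCnvPos`, `OrigCnvEnd`, `MatchSv`, and `SVCLAIM` annotations described under VCF Output.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# CNV LOSS/GAIN records are compared with SV DEL/DUP records of the same direction.
SV_TYPE_FOR_CNV = {"LOSS": "DEL", "GAIN": "DUP"}


@dataclass
class Call:
    record_id: str
    kind: str  # "LOSS"/"GAIN" for CNV records, "DEL"/"DUP" for SV records
    pos: int
    end: int


def match_cnv_to_sv(cnv: Call, svs: List[Call], max_offset: int = 1000) -> Optional[Call]:
    """Return the closest same-type SV whose breakends both lie within max_offset bp."""
    best: Optional[Call] = None
    best_dist: Optional[int] = None
    for sv in svs:
        if SV_TYPE_FOR_CNV.get(cnv.kind) != sv.kind:
            continue
        d_left, d_right = abs(sv.pos - cnv.pos), abs(sv.end - cnv.end)
        if d_left <= max_offset and d_right <= max_offset:
            if best_dist is None or d_left + d_right < best_dist:
                best, best_dist = sv, d_left + d_right
    return best


def merge_records(cnv: Call, sv: Call) -> Dict[str, object]:
    """Adopt the SV breakends and keep the original CNV coordinates as annotations."""
    return {"POS": sv.pos, "END": sv.end, "OrigCnvPos": cnv.pos,
            "OrigCnvEnd": cnv.end, "MatchSv": sv.record_id,
            "SVCLAIM": "DJ", "FILTER": "PASS"}
```

With the coordinates of the merged chr1 example record under VCF Output (original CNV 24477572-24481505, SV 24478046-24480950), both breakends fall within a 1000 bp window, so the sketch adopts the SV breakends and records the CNV coordinates in `OrigCnvPos`/`OrigCnvEnd`.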

#### Example command lines

The following is an example command line for running a germline WGS analysis for both CNV and SV.

```
dragen \
-r <HASHTABLE> \
--output-directory <OUTPUT> \
--output-file-prefix <SAMPLE> \
--bam-input <BAM> \
--enable-map-align false \
--enable-cnv true \
--cnv-population-b-allele-vcf <POP_SNP_VCF> \
--enable-sv true
```

Other optional CNV or SV parameters can also be added.

Note: There is a high sensitivity mode that can be enabled with `--sv-cnv-enable-high-sensitivity-mode=true`. This option is experimental and will disable many filters in the processing chain to allow for more SV+CNV calls to pass. It is recommended that users apply their own training and downstream filters when using this option.

#### VCF Output

CNV calls with SV support are output in the CNV VCF (`*.cnv.vcf.gz`). The VCF header includes all header information from the individual CNV and SV callers, with some header lines deduplicated and additional header lines added from SV/CNV integration. For details on the individual caller header lines, please refer to the CNV and SV sections of the user guide. In cases where users want to obtain a separate CNV/SV VCF file while keeping the original CNV and SV VCF outputs, they can specify `--sv-cnv-output-as-cnv-vcf=false`. CNV calls with SV support are then output in a separate CNV/SV VCF with the `*.cnv_sv.vcf.gz` extension. In this case, the original CNV and SV VCF files prior to integration are also available in the DRAGEN output directory, as described elsewhere.

Newly added header lines from SV/CNV integration are described in the following table.

| Header Field        | Number | Type    | Description                                                                                                         |
| ------------------- | ------ | ------- | ------------------------------------------------------------------------------------------------------------------- |
| END\_LEFT\_BND\_OF  | 1      | String  | ID of CNV whose left end is matched to the end of SV                                                                |
| END\_RIGHT\_BND\_OF | 1      | String  | ID of CNV whose right end is matched to the end of SV                                                               |
| LEFT\_BND           | 1      | String  | ID of SV that matches the left end of CNV record                                                                    |
| LEFT\_BND\_OF       | 1      | String  | ID of CNV whose left end is matched to SV                                                                           |
| MatchSv             | 1      | Integer | ID of original SV that was merged with CNV record                                                                   |
| OrigCnvEnd          | 1      | Integer | Coordinate of original CNV end                                                                                      |
| OrigCnvPos          | 1      | Integer | Coordinate of original CNV pos                                                                                      |
| RIGHT\_BND          | 1      | String  | ID of SV that matches the right end of CNV record                                                                   |
| RIGHT\_BND\_OF      | 1      | String  | ID of CNV whose right end is matched to SV                                                                          |
| SVCLAIM             | A      | String  | Claim made by the structural variant call. Valid values are D, J, DJ for abundance, adjacency and both respectively |

Records that can be matched or rescued carry annotations indicating the breakpoint linkage between a CNV record and an SV record. If a complete match is found, the `MatchSv` annotation is present in the record and gives the `ID` of the SV record that was merged into this CNV record. In this case, the BND annotations refer to the merged record ID itself rather than to the SV record before merging. The `SVCLAIM` field indicates whether the record's evidence comes from the depth/BAF signal (`D`), from junction signals (`J`), or from both (`DJ`).

Because the VCF mixes standalone SV records and CNV records, the FORMAT fields can differ between records. For details on the CNV- and SV-specific annotations, refer to the individual CNV and SV sections of the user guide.

Records that can be matched or rescued will have FILTER set to PASS. The original FILTERs are retained for records that were not matched or rescued. For example, the `cnvLength` FILTER will still be applied to standalone CNV records (those with `SVCLAIM=D`).

Example records are shown below.

```
# Merged record, note presence of SVCLAIM=DJ and MatchSv
chr1    24478046        DRAGEN:LOSS:chr1:24478047-24480950      N       <DEL>   1000    PASS
  END=24480950;CIPOS=0,9;CIEND=0,9;REFLEN=2904;SVLEN=2904;SVTYPE=DEL;LEFT_BND=DRAGEN:LOSS:chr1:24478047-24480950;OrigCnvPos=24477572;RIGHT_BND=DRAGEN:LOSS:chr1:24478047-24480950;OrigCnvEnd=24481505;SVCLAIM=DJ;MatchSv=DRAGEN:DEL:3301:0:1:0:0:0;CIGAR=1M2904D;HOMLEN=9;HOMSEQ=CCACCACGC;RIGHT_BND_OF=DRAGEN:REF:chr1:21465759-24478046;LEFT_BND_OF=DRAGEN:LOSS:chr1:24478047-24480950;END_RIGHT_BND_OF=DRAGEN:LOSS:chr1:24478047-24480950;END_LEFT_BND_OF=DRAGEN:REF:chr1:24480951-25405351   GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:MF:SM:SD:MAF:BC:AS:PE:OBF:GQ:PL:PR:SR:SB:FS:MLQS:VF:VF1:VAF1:VF2:VAF2       1/1:1:0:15:15:1.212644:0.369856:.:0.606322:105.5:0.305:3:10:54,59:0:11:999,14,0:59,45:28,16:15,13,6,10:4.442:.:71,54:36,54:0.600000:35,54:0.606742
 
# CNV record that did not match, note presence of SVCLAIM=D
chr1    169257422       DRAGEN:LOSS:chr1:169257423-169273108    N       <DEL>   277     PASS
  END=169273108;CIPOS=-7669,1182;CIEND=-1265,10511;REFLEN=15686;SVLEN=15686;SVTYPE=CNV;SVCLAIM=D  GT:CN:MCN:CNQ:MCNQ:CNF:MCNF:MF:SM:SD:MAF:BC:AS:PE:OBF   0/1:1:0:1000:1000:1.067816:0.000000:.:0.533908:92.9:0:14:38:1,5:0
  
# SV record that did not match, note presence of SVCLAIM=J
chr1    16744928        DRAGEN:LOSS:chr1:16744929-16746692      N       <DEL>   999     PASS
  END=16746692;SVTYPE=DEL;SVLEN=1764;CIGAR=1M1764D;CIPOS=0,24;CIEND=0,24;HOMLEN=24;HOMSEQ=GCCAACATGGTGAAACCCTGTCTC;SVCLAIM=J GT:GQ:PL:PR:SR:SB:FS:MLQS:VF:VF1:VAF1:VF2:VAF2  1/1:4:999,6,0:22,21:49,33:22,27,18,15:3.011:.:63,43:38,43:0.530864:25,43:0.632353
```
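To inspect these annotations programmatically, a small Python helper can split the INFO column of a record and report its `SVCLAIM` and `MatchSv` values. This is a sketch that assumes standard tab-separated VCF data lines (the records above are wrapped across lines for display); the function names are hypothetical, and production code would typically use a dedicated VCF library instead.

```python
from typing import Dict, Union

InfoValue = Union[str, bool]


def parse_info(info: str) -> Dict[str, InfoValue]:
    """Split a VCF INFO column into key/value pairs; flag entries map to True."""
    parsed: Dict[str, InfoValue] = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        parsed[key] = value if sep else True
    return parsed


def evidence_summary(vcf_line: str) -> Dict[str, object]:
    """Summarize the SV/CNV integration annotations of one VCF data line."""
    cols = vcf_line.rstrip("\n").split("\t")
    info = parse_info(cols[7])
    claim = info.get("SVCLAIM")
    return {
        "id": cols[2],
        "filter": cols[6],
        "svclaim": claim,
        # D = depth/BAF evidence, J = junction evidence, DJ = both
        "depth_evidence": isinstance(claim, str) and "D" in claim,
        "junction_evidence": isinstance(claim, str) and "J" in claim,
        "match_sv": info.get("MatchSv"),
    }
```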

### Coverage Uniformity

The DRAGEN CNV pipeline provides a measure of the quality of the data for a sample. If using the WGS self-normalization method, the additional `CoverageUniformity` metric is present in the VCF header. The CNV pipeline assumes that post-normalization target counts are independently and identically distributed (IID). Coverage in most high-quality WGS samples is uniform enough for the CNV caller to produce accurate calls, but some samples violate the IID assumption. Issues during library preparation or sample contamination can lead to several extreme outliers and/or waviness of target counts, which can result in a large number of false positive CNV calls. The `CoverageUniformity` metric quantifies the degree of local coverage correlation in the sample to help identify poor-quality samples.

A larger value for this metric means that coverage in the sample is less uniform, which indicates more nonrandom noise and can mark the sample as poor quality. The `CoverageUniformity` metric also depends on factors other than sample quality, such as the `cnv-interval-width` setting and the sample mean coverage. Use this score only to compare samples with similar mean coverage that were processed with the same command-line options. For this reason, DRAGEN CNV only reports the metric and takes no action based on it.
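The exact `CoverageUniformity` formula is not given here, so the sketch below uses a different, well-known statistic, the lag-1 autocorrelation of normalized per-bin coverage, only to illustrate what "local coverage correlation" means. It is a stand-in, not DRAGEN's computation, and `lag1_autocorrelation` is a hypothetical name. Independent noise gives values near zero, while wavy coverage, where neighboring bins move together, gives large positive values.

```python
from typing import Sequence


def lag1_autocorrelation(values: Sequence[float]) -> float:
    """Lag-1 autocorrelation of per-bin normalized coverage (illustrative proxy only)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two bins")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0  # perfectly flat coverage: no correlated structure
    return sum(dev[i] * dev[i + 1] for i in range(n - 1)) / denom
```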

### Call Smoothing

The segmentation stage might produce adjacent or nearby segments that are assigned the same copy number and have similar depth and BAF data, fragmenting a region with a consistent true copy number into several pieces. This fragmentation might be undesirable for downstream use of copy number estimates. For some uses, it can also be preferable to smooth out short segments that would be assigned a different copy number, whether due to a true copy number change or an artifact. To reduce undesirable fragmentation, initial segments can be merged during a post-calling segment smoothing step.

After initial calling, segments shorter than the value of `--cnv-filter-length` are deemed negligible. Among the remaining nonnegligible segments, successive pairs are evaluated for merging. The caller combines two successive segments that are within `--cnv-merge-distance` of one another and have the same CN and MCN assignments, together with any intervening negligible segments, into a single segment that is recalled and rescored. If the merged segment receives the same CN and MCN as its constituent nonnegligible pieces with a sufficiently high quality score, the original segments are replaced with the merged segment. The merged segment might be further merged with other initial or merged segments on either side. Merging proceeds until all segment pairs that meet the criteria have been merged.
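The merging loop can be sketched as follows. This is a simplified Python illustration, not DRAGEN's implementation: `filter_length` and `merge_distance` play the roles of `--cnv-filter-length` and `--cnv-merge-distance` (the distance is measured here as the number of bases between the two segments), while the `recall` callback and the `min_qual` threshold are hypothetical stand-ins for DRAGEN's internal re-calling and its "sufficiently high quality score" criterion. Keeping negligible segments that are not absorbed by any merge is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Segment:
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    cn: int
    mcn: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


# Hypothetical re-caller: (start, end) -> (CN, MCN, quality) for a merged span.
Recaller = Callable[[int, int], Tuple[int, int, float]]


def smooth(segments: List[Segment], filter_length: int, merge_distance: int,
           recall: Recaller, min_qual: float) -> List[Segment]:
    ordered = sorted(segments, key=lambda s: s.start)
    kept = [s for s in ordered if s.length >= filter_length]  # nonnegligible
    merged = True
    while merged:  # repeat until no eligible pair is left
        merged = False
        for i in range(len(kept) - 1):
            a, b = kept[i], kept[i + 1]
            if b.start - a.end - 1 > merge_distance or (a.cn, a.mcn) != (b.cn, b.mcn):
                continue
            # The span a.start..b.end also covers any negligible pieces in between.
            cn, mcn, qual = recall(a.start, b.end)
            if (cn, mcn) == (a.cn, a.mcn) and qual >= min_qual:
                kept[i:i + 2] = [Segment(a.start, b.end, cn, mcn)]
                merged = True
                break  # rescan so the new segment can merge again
    leftovers = [s for s in ordered if s.length < filter_length
                 and not any(k.start <= s.start and s.end <= k.end for k in kept)]
    return sorted(kept + leftovers, key=lambda s: s.start)
```

Restarting the scan after every successful merge lets a merged segment absorb further neighbors, matching the description that merging continues until no qualifying pair remains.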

### QUAL Model

QUAL estimation is based on a model associated with the most likely diploid coverage estimated from depth of coverage and B-allele frequency.

Given such diploid coverage, for each segment, the algorithm calls the most likely copy number state (complete with total copy number CN, and minor allele copy number MCN).

The probability of the REF state is used as input to the scoring algorithm, which outputs the QUAL value (a PHRED score capped at 1000). The QUAL value is the PHRED score whose error probability is the probability of REF when an alteration is called, or the probability of a non-REF state when the segment is called REF.
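The error-probability rule can be written as a short Python function. It assumes the probability of the REF state, `p_ref`, is already available from the copy number model; the function name `qual_from_p_ref` is hypothetical.

```python
import math


def qual_from_p_ref(p_ref: float, called_ref: bool, cap: float = 1000.0) -> float:
    """PHRED-scaled QUAL from the probability of the REF state, capped at `cap`."""
    # Error = P(REF) for an alteration call, or P(non-REF) for a REF call.
    p_error = (1.0 - p_ref) if called_ref else p_ref
    if p_error <= 0.0:
        return cap
    return min(cap, -10.0 * math.log10(p_error))
```

For example, an alteration call whose segment has a 1% probability of being REF scores about 20.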

Note: this is different from how QUAL is computed in the legacy [depth-only caller](https://github.com/illumina-swi/dragen-docs/blob/release/4.5-prod/product-guides/dragen-v4.5/user-guide/dragen-dna-pipeline/cnv-calling/legacy/cnv-germline-legacy.md).

## Comparison with ROH caller

Both the [ROH caller](https://help.dragen.illumina.com/product-guides/dragen-v4.5/dragen-dna-pipeline/small-variant-calling/roh-caller) and the germline CNV caller can detect runs-of-homozygosity (ROH) regions.

The two algorithms underlying the two different approaches might occasionally disagree. The differences are due to the following:

* The ROH caller requires minor-allele frequency to be \~0. In contrast, the germline CNV caller will assign to each segment its most likely copy-number state. This can include MOSAIC alterations, not available in the ROH caller.
* The ROH caller is dependent on the small variant caller, and only uses the SNPs that it calls. In contrast, the germline CNV caller works with a catalog of SNPs from population variation studies, such as 1000 Genomes.
* The ROH caller uses a blacklist BED file to filter certain sites and reduce call fragmentation. In contrast, the germline CNV caller does not need to filter any sites; instead, it provides an alternative smoothing algorithm to reduce call fragmentation, which is agnostic to the sample under consideration.
* The ROH caller identifies ROH regions but does not provide the total copy number of the region under consideration. In contrast, the germline CNV caller also reports the copy number for the region (which could be different from reference ploidy).

## Limitations

The following features (available in the [depth-only workflow](https://github.com/illumina-swi/dragen-docs/blob/release/4.5-prod/product-guides/dragen-v4.5/user-guide/dragen-dna-pipeline/cnv-calling/legacy/cnv-germline-legacy.md)) are not yet supported:

* Multisample/Pedigree mode

## Multisample Germline CNV Calling

Multisample germline CNV calling is possible starting from tangent-normalized count files (`*.tn.tsv.gz`) specified with the `--cnv-input` option (one per sample). Multisample CNV analysis benefits from joint segmentation, which increases the sensitivity of detecting copy number variable segments. For each copy number variable segment identified, the copy number genotype of each sample is emitted in a single VCF entry to facilitate annotation and interpretation.

Multisample Germline CNV analysis is supported for [legacy (depth-only) WGS and WES workflows](https://github.com/illumina-swi/dragen-docs/blob/release/4.5-prod/product-guides/dragen-v4.5/user-guide/dragen-dna-pipeline/cnv-calling/legacy/cnv-germline-legacy.md).

### Example command lines

The following is an example command line for running a trio analysis:

```
dragen \
-r <HASHTABLE> \
--output-directory <OUTPUT> \
--output-file-prefix <SAMPLE> \
--enable-cnv true \
--cnv-input <FATHER_TN_TSV> \
--cnv-input <MOTHER_TN_TSV> \
--cnv-input <PROBAND_TN_TSV> \
--pedigree-file <PEDIGREE_FILE>
```

### De Novo CNV Calling Options

Make sure all input samples have gone through the same single sample workflow and have identical intervals. If the samples are WES inputs, then you must generate the samples using the same panel of normals, and the autosomal intervals for all samples must match.

The following options are used in de novo CNV calling:

| Option                    | Description                                                                                                                      |
| ------------------------- | -------------------------------------------------------------------------------------------------------------------------------- |
| --cnv-input               | Input tangent-normalized signal files (`*.tn.tsv.gz`) from single sample runs. Can be specified multiple times, once per sample. |
| --cnv-filter-de-novo-qual | Phred-scaled threshold for calling an event as de novo in the proband. Default: `0.125`.                                         |
| --pedigree-file           | Pedigree file specifying the relationship between input samples.                                                                 |

### Joint Segmentation

First, CNV calling is performed on each sample independently. Joint segmentation then uses the copy number variable segments from each single sample analysis to derive a set of joint copy number variable segments. This set of joint segments is determined simply by taking the union of all breakpoints from the copy number variable segments of all samples. This results in the splitting of any partially overlapping segments across different samples. For example:

![](https://25033470-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FG9szlFZupV6Q2DasL98y%2Fuploads%2Fgit-blob-f92097350c021cae34d6d787223d79a21d80621a%2Fmultisample-cnv-calling.JointSegmentation.png?alt=media)
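The union-of-breakpoints rule can be sketched in Python. This is an illustration with hypothetical names, assuming 1-based inclusive intervals and that joint segments are only reported where at least one sample has a segment.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive


def joint_segments(per_sample: Sequence[Sequence[Interval]]) -> List[Interval]:
    """Split at the union of all samples' breakpoints (illustrative sketch)."""
    cuts = set()
    for segments in per_sample:
        for start, end in segments:
            cuts.add(start)    # a segment opens here
            cuts.add(end + 1)  # the next segment can open right after it ends
    ordered = sorted(cuts)

    def covered(start: int, end: int) -> bool:
        return any(a <= start and end <= b
                   for segments in per_sample for a, b in segments)

    return [(left, right - 1) for left, right in zip(ordered, ordered[1:])
            if covered(left, right - 1)]
```

Given per-sample segments consistent with the chr22 records shown later in this section (restricted to that window: one sample with segments 21917617-23055519 and 23055520-23241095, the other with five segments), the sketch yields the same six joint segments as those records.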

Following joint segmentation, copy number calling is again performed independently on each sample using the joint segments. Segments can be merged as in the single-sample analysis, but each joint segment is emitted in the multisample VCF as a single entry. The quality score (`QS` in the VCF) from the sample's merged segment, if applicable, is used for filtering the call. Sample calls are filtered using the sample's `FT` field in the multisample VCF. The `QUAL` column of the multisample VCF is always missing (i.e., "."). The `FILTER` column of the multisample VCF is `PASS` if any sample's `FT` field is `PASS`, and `SampleFT` otherwise.

Note, however, that when a single segment in one sample overlaps multiple segments in another sample, the annotation of the larger segment is replicated across multiple records. For example (only the relevant VCF fields are shown):

```
DRAGEN:REF:chr22:21917617-22385563	GT:SM:CN:BC:PE:QS:FT	./.:1.01773:2:867:0,0:62:PASS	./.:1.00693:2:379:0,0:61:PASS
DRAGEN:LOSS:chr22:22385564-22549952	GT:SM:CN:BC:PE:QS:FT	./.:1.01773:2:867:0,0:62:PASS	0/1:0.695867:1:135:0,0:7:cnvQual
DRAGEN:LOSS:chr22:22549953-23041393	GT:SM:CN:BC:PE:QS:FT	./.:1.01773:2:867:0,0:62:PASS	0/1:0.614398:1:341:0,0:40:PASS
DRAGEN:LOSS:chr22:23041394-23055519	GT:SM:CN:BC:PE:QS:FT	./.:1.01773:2:867:0,0:62:PASS	0/1:0.31226:1:141:0,0:52:PASS
DRAGEN:LOSS:chr22:23055520-23198595	GT:SM:CN:BC:PE:QS:FT	0/1:0.57652:1:168:0,0:41:PASS	0/1:0.31226:1:141:0,0:52:PASS
DRAGEN:LOSS:chr22:23198596-23241095	GT:SM:CN:BC:PE:QS:FT	0/1:0.57652:1:168:0,0:41:PASS	1/1:0.128:0:39:0,0:42:PASS
```

These records can be visualized as follows:

![](https://25033470-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FG9szlFZupV6Q2DasL98y%2Fuploads%2Fgit-blob-1bb69f40ac266ad8ea957220ceca5b08895c4f3b%2Fmultisample-cnv-calling.JointSegmentationMergeExample.png?alt=media)

### De Novo Calling Stage

A de novo event is defined as the existence of a genotype at a particular locus in a proband's genome that did not result from standard Mendelian inheritance from the parents. The de novo calling stage identifies putative de novo events in the proband of each trio of a multisample analysis. Some of these putative de novo events may be real, but others can arise from sequencing or analysis artifacts. Consequently, a de novo quality score is assigned to each putative de novo event and used to filter out low-quality de novo events. Trios are defined in a PED file passed with the `--pedigree-file` option. Multiple trios can be specified (e.g., in a quad analysis), and all valid trios are processed.

For each joint segment in a trio, the de novo caller determines if there is a Mendelian inheritance conflict for the called copy number genotypes. The CNV caller does not identify the copy number for each allele of a given diploid segment, which means assumptions are made about the possible allelic composition of the parent genotypes.

The assumption is that the copy number 0 allele is not present for diploid regions of a parent's genome (sex dependent) when the assigned copy number is 2 or greater. This results in simplifications, as follows:

| Parent Copy Number Genotype | Possible Copy Number Alleles | Assumed Possible Copy Number Alleles |
| --------------------------- | ---------------------------- | ------------------------------------ |
| 2                           | 0/2, 1/1                     | 1/1                                  |
| 3                           | 0/3, 1/2                     | 1/2                                  |
| 4                           | 0/4, 1/3, 2/2                | 1/3, 2/2                             |
| N                           | x/(N-x) for x <= N/2         | x/(N-x) for 1 <= x <= N/2            |

The following are examples of consistent and inconsistent copy number genotypes for diploid regions using these assumptions:

| Mother Copy Number | Father Copy Number | Proband Copy Number | Mendelian Consistent? |
| ------------------ | ------------------ | ------------------- | --------------------- |
| 2                  | 2                  | 2                   | Yes                   |
| 2                  | 2                  | 1                   | No                    |
| 3                  | 2                  | 4                   | No                    |
| 3                  | 2                  | 2                   | Yes                   |
| 2                  | 0                  | 2                   | No                    |
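The allele-composition assumptions and the consistency check can be sketched in Python. This is an illustrative sketch with hypothetical function names; it covers diploid regions only and does not model sex-dependent chrX/chrY ploidy. A proband copy number is treated as consistent if one allele from each parent can add up to it.

```python
from itertools import product
from typing import List, Tuple


def parent_allele_pairs(cn: int) -> List[Tuple[int, int]]:
    """Assumed allele compositions x/(N-x) for a parent's diploid copy number N."""
    lowest = 1 if cn >= 2 else 0  # no CN-0 allele when the parent's CN is 2 or more
    return [(x, cn - x) for x in range(lowest, cn // 2 + 1)]


def is_mendelian_consistent(mother_cn: int, father_cn: int, proband_cn: int) -> bool:
    """True if one allele from each parent can sum to the proband's copy number."""
    for m_pair, f_pair in product(parent_allele_pairs(mother_cn),
                                  parent_allele_pairs(father_cn)):
        if any(m + f == proband_cn for m in m_pair for f in f_pair):
            return True
    return False


# Rows of the consistency table above: (mother, father, proband, expected).
TABLE = [(2, 2, 2, True), (2, 2, 1, False), (3, 2, 4, False),
         (3, 2, 2, True), (2, 0, 2, False)]
for mother, father, proband, expected in TABLE:
    assert is_mendelian_consistent(mother, father, proband) == expected
```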

If a joint segment has a Mendelian inheritance conflict, a Phred-scaled de novo quality score (`DQ` field in the VCF) is calculated using the likelihoods for each copy number state (see Quality Scoring section) of each sample in the trio, combined with a prior for the trio genotypes:

$$DQ = -10\log\_{10} \left( 1-\frac{\sum\_C{p(CN\_m|data) \cdot p(CN\_f|data) \cdot p(CN\_p|data) \cdot p(CN\_m,CN\_f,CN\_p)}}{\sum\_G{p(CN\_m|data) \cdot p(CN\_f|data) \cdot p(CN\_p|data) \cdot p(CN\_m,CN\_f,CN\_p)}} \right)$$

Where:

* $$G$$ is the set of all trio genotypes
* $$C$$ is the set of trio genotypes with a Mendelian inheritance conflict
* $$CN\_m$$ is the Mother copy number
* $$CN\_f$$ is the Father copy number
* $$CN\_p$$ is the Proband copy number
* $$p(CN\_m,CN\_f,CN\_p)$$ is the prior for the trio genotype

The `DN` field in the VCF is used to indicate the de novo status for each segment. Possible values are:

* `Inherited` - the called trio genotype is consistent with Mendelian inheritance
* `LowDQ` - the called trio genotype is inconsistent with Mendelian inheritance and DQ is less than the de novo quality threshold (default 0.125)
* `DeNovo` - the called trio genotype is inconsistent with Mendelian inheritance and `DQ` is greater than or equal to the de novo quality threshold (default 0.125)
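The DQ formula and the `DN` classification can be sketched together in Python. Assumptions: each likelihood vector is indexed by copy number (index 0 is CN 0, and so on), the trio prior is supplied by the caller (uniform in the example, which is not DRAGEN's prior), Mendelian conflicts follow the allele-composition assumptions in the tables above, and the threshold comparison follows the `DN` definitions above. All function names are hypothetical.

```python
import math
from itertools import product
from typing import Callable, List, Sequence, Tuple


def parent_allele_pairs(cn: int) -> List[Tuple[int, int]]:
    lowest = 1 if cn >= 2 else 0  # no CN-0 allele when the parent's CN is 2 or more
    return [(x, cn - x) for x in range(lowest, cn // 2 + 1)]


def mendelian_conflict(cn_m: int, cn_f: int, cn_p: int) -> bool:
    return not any(m + f == cn_p
                   for mp, fp in product(parent_allele_pairs(cn_m), parent_allele_pairs(cn_f))
                   for m in mp for f in fp)


def de_novo_quality(lik_m: Sequence[float], lik_f: Sequence[float],
                    lik_p: Sequence[float],
                    prior: Callable[[int, int, int], float]) -> float:
    """DQ = -10 log10(1 - sum over conflicting trio genotypes / sum over all)."""
    total = conflict = 0.0
    for cm, cf, cp in product(range(len(lik_m)), range(len(lik_f)), range(len(lik_p))):
        weight = lik_m[cm] * lik_f[cf] * lik_p[cp] * prior(cm, cf, cp)
        total += weight
        if mendelian_conflict(cm, cf, cp):
            conflict += weight
    p_consistent = 1.0 - conflict / total  # requires total > 0
    return math.inf if p_consistent <= 0.0 else -10.0 * math.log10(p_consistent)


def de_novo_status(cn_m: int, cn_f: int, cn_p: int, dq: float,
                   threshold: float = 0.125) -> str:
    if not mendelian_conflict(cn_m, cn_f, cn_p):
        return "Inherited"
    return "DeNovo" if dq >= threshold else "LowDQ"


if __name__ == "__main__":
    uniform = lambda m, f, p: 1.0  # illustrative prior only
    # Both parents certainly CN 2; proband CN 1 with likelihood 0.99.
    dq = de_novo_quality([0, 0, 1], [0, 0, 1], [0, 0.99, 0.01], uniform)
    print(dq, de_novo_status(2, 2, 1, dq))
```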

### Multisample CNV VCF Output

The records in a multisample CNV VCF differ slightly from the single sample case. The major differences are as follows:

The records correspond to the joint segments defined by the union of all input samples' breakpoints, so the VCF contains more entries overall.

The `QUAL` column is not used and its value is ".". The per-sample quality is carried over into the `SAMPLE` columns with the `QS` tag.

The `FILTER` column indicates `PASS` if any of the individual `SAMPLE` columns `PASS`. Otherwise, it indicates `SampleFT`.

The per-sample annotations are carried over from their originating calls. The single sample filters are applied at the sample level and are emitted in the `FT` annotation.

Additionally, if a valid pedigree is used, then de novo calling is performed, which adds the following two annotations to the proband sample.

```
##FORMAT=<ID=DQ,Number=1,Type=Float,Description="De novo quality">
##FORMAT=<ID=DN,Number=1,Type=String,Description="Possible values are `Inherited', 'DeNovo' or 'LowDQ'. Threshold for a passing de novo call is DQ > 0.125000">
```

Because joint segmentation produces many VCF entries, de novo events are most easily identified by extracting the entries that carry `DN` and `DQ` annotations. In the de novo calling case, these records are also extracted and converted to GFF3.
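The record-level `FILTER` rule and the extraction of de novo entries can be sketched in Python. This sketch assumes tab-separated VCF data lines with FORMAT in column 9 and one column per sample after it; it treats an entry as a de novo candidate when some sample carries non-missing `DN` and `DQ` values. The function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator


def sample_annotations(format_col: str, sample_col: str) -> Dict[str, str]:
    """Pair the FORMAT keys with one sample's values."""
    return dict(zip(format_col.split(":"), sample_col.split(":")))


def record_filter(sample_fts: Iterable[str]) -> str:
    """Record-level FILTER: PASS if any sample's FT is PASS, otherwise SampleFT."""
    return "PASS" if any(ft == "PASS" for ft in sample_fts) else "SampleFT"


def de_novo_records(vcf_lines: Iterable[str]) -> Iterator[str]:
    """Yield data lines in which some sample has non-missing DN and DQ values."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        fmt, samples = cols[8], cols[9:]
        for sample in samples:
            ann = sample_annotations(fmt, sample)
            if ann.get("DN", ".") != "." and ann.get("DQ", ".") != ".":
                yield line
                break
```

Applied to the second chr22 record shown under Joint Segmentation, whose sample `FT` values are `PASS` and `cnvQual`, `record_filter` returns `PASS`.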

### Chromosome X and Y behavior

The sample sex from the single-sample analysis (either estimated or overridden using the `--sample-sex` option) can be overridden by specifying the sex in the input pedigree file (i.e., 1 for male and 2 for female). To use the sample sex from the single-sample analysis, specify an unknown sex in the pedigree file using the value 0.

Note that when all samples in the pedigree are female, then no calls on chrY will be emitted for any sample. When the pedigree includes at least one male sample, only the male samples will have genotype info reported in the VCF for chrY and any VCF entries on chrY will have a "missing" Genotype column (i.e. ".") for all corresponding female samples in the pedigree.
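The sex resolution and chrY reporting rules can be summarized in a small Python sketch. The function names are hypothetical, sexes are represented as the strings `"male"` and `"female"` for illustration, and any pedigree code other than 1 or 2 is treated like the unknown value 0.

```python
from typing import Dict, Optional

PED_SEX = {"1": "male", "2": "female"}  # pedigree sex codes; "0" means unknown


def resolve_sex(ped_sex_code: str, single_sample_sex: str) -> str:
    """Pedigree sex overrides the single-sample sex unless the code is unknown."""
    return PED_SEX.get(ped_sex_code, single_sample_sex)


def chry_genotype(sample_sex: str, pedigree_sexes: Dict[str, str],
                  genotype: str) -> Optional[str]:
    """Genotype reported on chrY, or None when no chrY records are emitted."""
    if all(sex == "female" for sex in pedigree_sexes.values()):
        return None  # all-female pedigree: no chrY calls for any sample
    return genotype if sample_sex == "male" else "."  # females get a missing GT
```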
