# Contamination Detection

The DRAGEN cross-sample contamination module estimates the fraction of sequencing reads originating from another human sample using a probabilistic mixture model.

DRAGEN provides **two contamination detection modes**. The appropriate mode depends on sample type, coverage, and expected contamination level.

***

## Quick Decision Guide

| What are you running?                 | Sample characteristics                | Setting to use                   | What DRAGEN does                                                                                                         |
| ------------------------------------- | ------------------------------------- | -------------------------------- | ------------------------------------------------------------------------------------------------------------------------ |
| General germline or somatic (default) | ≥ 20× coverage; FFPE/CNV/LOH allowed  | `--qc-detect-contamination=true` | Runs GATK-based model; automatically falls back to legacy VerifyBamID-like model if GATK fails (e.g. high contamination) |
| RNA-seq                               | Variable expression and coverage      | `--qc-detect-contamination=true` | Runs GATK-based model in experimental mode; results are best-effort and qualitative                                      |
| Low coverage germline                 | Low coverage (\~10×), no FFPE/CNV/LOH | `--qc-cross-cont-vcf`            | Runs legacy VerifyBamID-like model directly; robust at low coverage                                                      |
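The table above can be read as a small selection rule. The following is a minimal sketch of that rule, not part of DRAGEN: the `Sample` class, its fields, and the `contamination_args` helper are hypothetical names, and the 20× boundary is an assumption taken from the "≥ 20×" row (the table does not say how to treat coverage between roughly 10× and 20×).

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical description of a sample; not a DRAGEN data structure."""
    is_rna: bool
    is_germline: bool
    mean_coverage: float
    has_ffpe_cnv_or_loh: bool


def contamination_args(sample: Sample, legacy_markers_vcf: str) -> list[str]:
    """Return the contamination-related command-line arguments for a sample."""
    # Assumption: anything below the 20x boundary counts as "low coverage".
    low_coverage = sample.mean_coverage < 20
    clean_germline = (
        sample.is_germline
        and not sample.is_rna
        and not sample.has_ffpe_cnv_or_loh
    )
    if low_coverage and clean_germline:
        # Low-coverage, clean germline data: run the legacy model directly.
        return ["--qc-cross-cont-vcf", legacy_markers_vcf]
    # Default and experimental RNA-seq rows: GATK-based model with fallback.
    return ["--qc-detect-contamination=true"]
```

The returned list can be appended to the rest of a DRAGEN command line; the marker VCF path is supplied by the caller.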

***

## Fallback Mechanism

When `--qc-detect-contamination=true` is specified, DRAGEN:

1. First attempts contamination estimation using the **GATK-based model**
2. Automatically falls back to the **legacy VerifyBamID-like model** if the GATK-based model fails to converge, most commonly at high contamination levels

No additional settings are required to enable fallback behavior.
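The control flow described above resembles a standard try-then-fall-back pattern. The sketch below illustrates only that pattern; it does not reflect DRAGEN internals. `ConvergenceError`, `estimate_with_fallback`, and the two callables are hypothetical stand-ins for the GATK-based and legacy estimators.

```python
from typing import Callable


class ConvergenceError(Exception):
    """Hypothetical signal that an estimator failed to converge."""


def estimate_with_fallback(
    primary: Callable[[], float],
    fallback: Callable[[], float],
) -> tuple[float, str]:
    """Try the primary estimator; use the fallback only if it fails to converge."""
    try:
        return primary(), "primary"
    except ConvergenceError:
        # Per the text above, this is most common at high contamination.
        return fallback(), "fallback"
```

Returning the name of the model that produced the value is a design choice of this sketch, made so that a caller can tell which estimator was used.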

***

## GATK-Based Contamination Detection (Default)

**Use for:**\
Germline, tumor-only, and tumor-normal workflows. This is the **recommended default**.

**Enable**

```
--qc-detect-contamination=true
```

**Population Marker Resources**

```
/opt/dragen/<VERSION>/resources/qc/somatic_sample_cross_contamination_resource_*.vcf.gz
```

Resource files are provided for hg19, hg38, and hs37d5.

Markers can also be supplied explicitly:

```
--qc-somatic-contam-vcf <population_markers.vcf>
```

**Behavior**

* Accounts for FFPE damage, copy number variation (CNV), and loss of heterozygosity (LOH)
* Empirically adjusts base qualities to reduce FFPE deamination and oxidation noise
* Optimized for low-to-moderate contamination levels

***

### RNA-seq Support (Experimental)

`--qc-detect-contamination=true` can be run on RNA-seq data.

**Limitations**

* Less stable than DNA due to expression and coverage variability
* Results are qualitative indicators only
* Feature is experimental

***

## Legacy Contamination Detection (VerifyBamID-like)

**Use for:**\
Clean germline samples, especially at **low coverage (\~10×)**. DRAGEN also runs this model automatically when the GATK-based model falls back.

**Enable**

```
--qc-cross-cont-vcf <population_markers.vcf>
```

**Population Marker Resources**

```
/opt/dragen/<VERSION>/resources/qc/sample_cross_contamination_resource_*.vcf.gz
```

Resource files are provided for hg19, hg38, and hs37d5.

**Behavior**

* Models the sample as a mixture of individuals
* Performs well on clean germline data
* Robust at low coverage
* Can remain informative at high contamination
* Not robust to FFPE damage, CNVs, or extended runs of homozygosity (ROH)

***

## Output and Interpretation

The contamination estimate is reported as a fraction:

```
MAPPING/ALIGNING SUMMARY Estimated sample contamination 0.011
```

This corresponds to **1.1% contamination**.
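As a rough illustration of reading this metric programmatically, the sketch below extracts the value from a line laid out like the example above. `parse_contamination` is a hypothetical helper. Splitting on commas as well as whitespace is an assumption intended to tolerate a comma-delimited metrics file; check the layout of your actual output before relying on it.

```python
import re
from typing import Optional


def parse_contamination(line: str) -> Optional[float]:
    """Return the contamination fraction from a metrics line, or None for NA."""
    if "Estimated sample contamination" not in line:
        raise ValueError("not a contamination metric line")
    # The value is the last field; split on commas or whitespace (assumption).
    value = re.split(r"[,\s]+", line.strip())[-1]
    return None if value.upper() == "NA" else float(value)


line = "MAPPING/ALIGNING SUMMARY Estimated sample contamination 0.011"
fraction = parse_contamination(line)
print(f"{fraction:.1%}")  # prints 1.1%
```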

**Interpretation Guidance**

* Contamination should be well below the minimum allele frequency (AF) of interest
* Example: at 1% contamination, variants below \~5% AF may be unreliable
* The estimate saturates at approximately 30% contamination, so values near that level may understate the true contamination
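The guidance above can be turned into a quick sanity check. This is a sketch under stated assumptions, not a documented DRAGEN rule: the factor of 5 is inferred solely from the 1% to \~5% example, and the `tolerance` used to flag values close to 30% is a hypothetical choice. Both helper names are illustrative.

```python
SATURATION_FRACTION = 0.30  # approximate saturation point stated above


def min_reliable_af(contamination: float, safety_factor: float = 5.0) -> float:
    """Smallest allele frequency treated as reliable at this contamination.

    safety_factor=5 is an assumption inferred from the 1% -> ~5% example;
    it is not a documented threshold.
    """
    return contamination * safety_factor


def may_be_saturated(contamination: float, tolerance: float = 0.02) -> bool:
    """True when the estimate is close to ~30% and may understate the truth.

    tolerance is a hypothetical margin chosen for illustration.
    """
    return contamination >= SATURATION_FRACTION - tolerance
```

For the 0.011 estimate shown earlier, `min_reliable_af` suggests treating variants below roughly 5.5% AF with caution under these assumptions.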

***

## Coverage and Validity Requirements

Contamination estimation requires **≥100 valid pileups**.

A pileup is valid if:

* Coverage ≥ **10×**
* ≥ **95% of the reads in the pileup are valid**

Soft-clipped reads are excluded, so excessive soft clipping, which is often caused by untrimmed adapters, can leave too few valid pileups. If contamination is reported as **NA**, inspect marker loci in IGV and correct adapter issues upstream.
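The validity rules above can be sketched as a simple check. This is an illustration only: `Pileup`, `is_valid_pileup`, and `can_estimate` are hypothetical names, and the sketch assumes that "coverage" counts all reads overlapping a marker locus while "valid" reads are those left after exclusions such as soft clipping.

```python
from dataclasses import dataclass


@dataclass
class Pileup:
    """Hypothetical per-marker summary; not a DRAGEN data structure."""
    total_reads: int  # reads overlapping the marker locus (assumed = coverage)
    valid_reads: int  # reads remaining after exclusions such as soft clipping


def is_valid_pileup(p: Pileup, min_cov: int = 10, min_valid_ratio: float = 0.95) -> bool:
    """Apply the two per-pileup criteria: coverage and valid-read fraction."""
    if p.total_reads == 0 or p.total_reads < min_cov:
        return False
    return p.valid_reads / p.total_reads >= min_valid_ratio


def can_estimate(pileups: list[Pileup], min_valid_pileups: int = 100) -> bool:
    """True when enough pileups pass; otherwise the estimate would be NA."""
    return sum(is_valid_pileup(p) for p in pileups) >= min_valid_pileups
```

A sample with heavy adapter-induced soft clipping would see many pileups fail the ratio test, which is one way an **NA** result can arise.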

***

## Legacy Model–Specific Settings

| Setting                            | Description                                                                                                                     |
| ---------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
| `--qc-contam-min-cov`              | Minimum coverage per pileup (default: 10).                                                                                      |
| `--qc-contam-min-valid-read-ratio` | Minimum fraction of valid reads (default: 0.95). Can be lowered to \~0.75, but adapter trimming issues should be fixed instead. |

***

## Key Takeaways

* Use **GATK-based contamination detection** for most workflows
* Use the **legacy model** for low-coverage clean germline samples
* High contamination triggers **automatic fallback** when using `--qc-detect-contamination=true`
* RNA-seq support is **experimental**

