# Minimal Checklist

This section provides a succinct checklist of critical Quality Control (QC) metrics for evaluating DRAGEN run performance. These metrics are located in the standard CSV output files in the run directory.
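DRAGEN metrics CSV files generally store one metric per row: a section name, an optional read-group field, the metric name, its value, and, for some metrics, a percentage. The minimal Python sketch below loads such a file into a lookup table. It assumes that column layout, which is based on typical DRAGEN output and should be checked against the files your DRAGEN version writes. `load_metrics` is a hypothetical helper, and the example rows are illustrative, not real run data.

```python
import csv
import io
from typing import Dict, Optional, Tuple

# Key: (section, read group, metric name) -> (raw value, optional percentage).
# Assumed row layout: SECTION,READGROUP,Metric name,value[,percent]
MetricTable = Dict[Tuple[str, str, str], Tuple[str, Optional[float]]]


def _to_float(text: str) -> Optional[float]:
    """Convert a CSV field to float, returning None for blanks or 'NA'."""
    try:
        return float(text)
    except ValueError:
        return None


def load_metrics(text: str) -> MetricTable:
    """Parse DRAGEN-style metrics CSV text into a lookup table (hypothetical helper)."""
    table: MetricTable = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 4:
            continue  # skip blank or malformed lines
        section, read_group, name, value = (field.strip() for field in row[:4])
        pct = _to_float(row[4]) if len(row) > 4 else None
        table[(section, read_group, name)] = (value, pct)
    return table


# Illustrative input only; real files come from <output_prefix>.mapping_metrics.csv.
example = """MAPPING/ALIGNING SUMMARY,,Total input reads,800000000,100.00
MAPPING/ALIGNING SUMMARY,,Mapped reads,788000000,98.50
"""
metrics = load_metrics(example)
print(metrics[("MAPPING/ALIGNING SUMMARY", "", "Mapped reads")])  # ('788000000', 98.5)
```

To read a real file, pass its contents, for example `load_metrics(Path(path).read_text())` with `pathlib.Path`. The read group is kept in the key so that per-read-group rows do not silently overwrite whole-sample summary rows.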

## 1. Basic Run & Alignment QC

**Source File:** `<output_prefix>.mapping_metrics.csv`\
**Default Status:** Enabled

Note that mapping metrics are computed on the raw input sample (prior to optional UMI collapsing).

| Metric Name                             | Critical Trend / Success Criteria                                                                                                                                                                                                                                                                                                                                                                                 |
| --------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Total input reads**                   | **Volume check:** Confirm read count matches sequencer output and assay expectations. E.g \~**600–900M reads** for 30X human WGS. Low counts indicate flow cell loading issues or demultiplexing failures.                                                                                                                                                                                                        |
| **Q30 bases**                           | **Base quality:** Generally **> 85–90%**. A global drop indicates chemistry or flow cell issues (see FastQC metrics for positional detail).                                                                                                                                                                                                                                                                       |
| **Mapped reads**                        | **Alignment success:** Aim for **> 95%** (human WGS). Low mapping (< 90%) suggests wrong reference genome, severe contamination, or poor library quality.                                                                                                                                                                                                                                                         |
| **Supplementary (chimeric) alignments** | **Structural integrity:** Reads split across distant loci. For high-quality human germline WGS, expected values are typically **< 2–3%**. Values **> 3–5%** warrant investigation and may indicate library chimera artifacts (e.g. PCR stitching, FFPE-related fragmentation) or, in somatic samples, true structural variation. Sustained levels **> 5%** in germline samples are generally considered abnormal. |
| **Soft-clipped bases (R1/R2)**          | **Adapter/quality trimming:** High percentages indicate adapter read-through, short insert sizes, or poor-quality read ends (e.g. FFPE artifacts). For high-quality libraries, typical values are **< 2–3%**. Values **> 3–5%** suggest suboptimal trimming or degraded DNA. Levels **> 5%** should be treated as a QC concern and reviewed alongside FastQC metrics.                                             |
| **Estimated sample contamination**      | **Purity check:** (Requires `--qc-detect-contamination=true`.) For human samples, contamination **> 1–2%** can materially reduce variant calling accuracy and is especially problematic for detecting low-frequency somatic variants.                                                                                                                                                                             |
| **Duplicate reads**                     | **Library complexity:** Elevated duplication rates suggest reduced library complexity or over-amplification. For high-quality germline WGS, typical values are **< 20%**. For WES and other targeted assays, higher duplication rates (e.g. **20–50%**) are common and should be interpreted in the context of on-target coverage and assay design.                                                               |
| **Insert length (median)**              | **Fragment size:** Should match the library prep target (e.g. \~350 bp). Deviations can affect coverage uniformity.                                                                                                                                                                                                                                                                                               |
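The read-count expectation in the first row follows from simple arithmetic: reads needed ≈ target coverage × genome size / read length. The Python sketch below shows that calculation together with a small review-flagging helper. It assumes a human genome of roughly 3.1 Gb and 150 bp reads, ignores duplicates, unmapped reads, and low-quality bases, and uses simplified metric labels (not exact DRAGEN row names) with limits taken from the lower bounds in the table. Values are assumed to be supplied as percentages; `expected_read_count`, `REVIEW_RULES`, and `flag_for_review` are hypothetical names.

```python
from typing import Dict, List


def expected_read_count(coverage: float, genome_size_bp: float, read_length_bp: int) -> float:
    """Reads needed for a target mean depth: coverage x genome size / read length."""
    return coverage * genome_size_bp / read_length_bp


# Review limits (percent) from the lower bounds in the table above for
# high-quality human germline WGS. Keys are simplified labels, not exact
# DRAGEN row names; adjust names and limits per assay.
REVIEW_RULES = {
    "Q30 bases": ("min", 85.0),
    "Mapped reads": ("min", 95.0),
    "Supplementary (chimeric) alignments": ("max", 3.0),
    "Soft-clipped bases": ("max", 3.0),
    "Estimated sample contamination": ("max", 2.0),
    "Duplicate reads": ("max", 20.0),
}


def flag_for_review(percentages: Dict[str, float]) -> List[str]:
    """Return notes for metrics outside the review limits.

    A flag means "investigate", not "fail"; metrics that are absent are skipped.
    """
    notes = []
    for name, (kind, limit) in REVIEW_RULES.items():
        value = percentages.get(name)
        if value is None:
            continue
        if (kind == "min" and value < limit) or (kind == "max" and value > limit):
            notes.append(f"{name}: {value:.2f}% (review: {kind} {limit}%)")
    return notes


# 30x human WGS (~3.1 Gb genome) with 150 bp reads:
print(f"{expected_read_count(30, 3.1e9, 150) / 1e6:.0f}M reads")  # 620M reads
print(flag_for_review({"Mapped reads": 88.0, "Duplicate reads": 12.0}))
```

The 620M estimate sits at the low end of the quoted 600–900M range, which is consistent with real runs needing extra reads to absorb duplicates and unmapped reads.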

## 2. FastQC (Sequence Composition)

**Source File:** `<output_prefix>.fastqc_metrics.csv`\
**Default Status:** Enabled

Note that FastQC metrics are computed on the raw input sample (prior to optional UMI collapsing).

| Metric Name                       | Critical Trend / Success Criteria                                                                                                                                                                                        |
| --------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **Positional base mean quality**  | **Cycle decay:** Identify quality drop-off at read ends or specific cycles (e.g. fluidics issues).                                                                                                                       |
| **Read GC content**               | **Contamination/bias:** Deviation from expected distribution (e.g. human \~40–45% GC) suggests contamination or severe PCR bias.                                                                                         |
| **Sequence positions (adapters)** | **Adapter content:** High levels indicate untrimmed adapters or short inserts, which can inflate the soft-clipped base percentage reported in the mapping metrics (e.g. **> 5%**). Confirm that adapter trimming options are enabled. |
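To make the GC check concrete, the Python sketch below computes a per-read GC fraction and compares the mean against an expected window. It is an illustration of the idea, not how DRAGEN computes its FastQC metrics; the default window is the approximate human range from the table, and `gc_fraction` and `mean_gc_check` are hypothetical names.

```python
from statistics import mean
from typing import Iterable, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of called bases (A/C/G/T) that are G or C; N and other symbols are ignored."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def mean_gc_check(
    reads: Iterable[str], expected: Tuple[float, float] = (0.40, 0.45)
) -> Tuple[float, bool]:
    """Mean per-read GC fraction and whether it falls inside the expected window.

    The default window is the approximate human range; change it for other
    organisms. Raises statistics.StatisticsError if `reads` is empty.
    """
    avg = mean(gc_fraction(read) for read in reads)
    low, high = expected
    return avg, low <= avg <= high


print(mean_gc_check(["ACGTACGTAA", "GGCATTATAC", "GGGGCCCCAT"]))
```

A mean alone can hide a secondary peak (for example, from a contaminating organism with a different GC content), so the per-read values are also worth inspecting as a histogram.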

## 3. UMI QC (Applies Only to UMI Designs)

**Source File:** `<output_prefix>.umi_metrics.csv`\
**Default Status:** Enabled as part of the UMI pipeline (requires `--umi-enable=true`)

| Metric Name          | Critical Trend / Success Criteria                                                                                                                                                                      |
| -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
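| **Consensus reads**  | **Conversion efficiency:** A low ratio relative to total input reads means many reads collapse into few families, suggesting low molecular complexity (few unique input molecules). A low absolute consensus read count can also reflect insufficient sequencing depth. |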
| **Mean family size** | **Saturation:** Family sizes near **1.0** indicate under-sequencing; error correction is ineffective without duplicate families. Very high mean family sizes may indicate excessive PCR amplification. |
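The two UMI metrics are tightly linked. In a simplified model where every input read joins exactly one family and each family yields one consensus read, mean family size = input reads / consensus reads, so the consensus ratio is its reciprocal. The Python sketch below encodes that relationship. The model ignores filtered families and whether DRAGEN counts reads, pairs, or fragments, and the cut-offs `UNDER_SEQUENCED_MAX` and `OVER_AMPLIFIED_MIN` are hypothetical values chosen for illustration.

```python
from typing import Dict, Union

# Hypothetical cut-offs for illustration only; choose values that fit the assay.
UNDER_SEQUENCED_MAX = 1.5
OVER_AMPLIFIED_MIN = 20.0


def umi_summary(input_reads: int, consensus_reads: int) -> Dict[str, Union[float, str]]:
    """Relate consensus yield to mean family size under a simplified model.

    Assumes every input read joins exactly one family and each family produces
    one consensus read, so mean family size = input / consensus.
    """
    if input_reads <= 0 or consensus_reads <= 0:
        raise ValueError("read counts must be positive")
    mean_family_size = input_reads / consensus_reads
    if mean_family_size < UNDER_SEQUENCED_MAX:
        note = "families near 1: likely under-sequenced, limited error correction"
    elif mean_family_size > OVER_AMPLIFIED_MIN:
        note = "very large families: low complexity or excessive amplification"
    else:
        note = "intermediate family size"
    return {
        "consensus_ratio": consensus_reads / input_reads,
        "mean_family_size": mean_family_size,
        "note": note,
    }


print(umi_summary(input_reads=1_000_000, consensus_reads=250_000))
```

Because the ratio and the family size are reciprocals in this model, a low consensus ratio and a large mean family size describe the same situation and are best read together.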

## 4. Coverage QC

**Source File:** `<output_prefix>.*_coverage_metrics.csv`\
(e.g. `wgs_coverage_metrics.csv` or `target_bed_coverage_metrics.csv`)\
**Default Status:** Enabled

Note that coverage metrics are computed after read deduplication or UMI collapsing, and reads with MAPQ=0 are excluded.

| Metric Name                                             | Critical Trend / Success Criteria                                                                                                                                      |
| ------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Average alignment coverage over target bed / genome** | **Depth check:** Primary driver of sensitivity. Ensure it meets the assay target (e.g. **30×** germline, **100×+** somatic).                                           |
| **Uniformity of coverage (PCT > 0.2× mean)**            | **Bias check:** Low uniformity indicates coverage bias (e.g. GC bias), leading to variant calling blind spots. Germline WGS typically expects **≥ 80–90%** uniformity. |
| **PCT of genome with coverage \[x: inf)**               | **Callability:** (e.g. `PCT ≥ 20×`). Germline WGS typically requires **> 95% at 20×**.                                                                                 |
| **Aligned bases in target bed / genome**                | **Yield:** Total usable data. Useful for normalizing performance across runs or flow cells.                                                                            |
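The coverage metrics above can be illustrated on a toy region. The Python sketch below builds a per-base depth profile while skipping duplicates and MAPQ=0 reads, then derives mean coverage, uniformity (PCT > 0.2× mean), PCT at or above a depth threshold, and aligned bases. It is a simplification, not DRAGEN's implementation: it does not model other filters DRAGEN may apply (for example base-quality filters or overlapping read pairs), and `Alignment`, `depth_profile`, and `coverage_metrics` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Alignment:
    start: int          # 0-based, inclusive
    end: int            # 0-based, exclusive
    mapq: int
    is_duplicate: bool


def depth_profile(alignments: Iterable[Alignment], region_len: int) -> List[int]:
    """Per-base depth over [0, region_len), skipping duplicates and MAPQ=0 reads."""
    depth = [0] * region_len
    for aln in alignments:
        if aln.is_duplicate or aln.mapq == 0:
            continue  # mirror the exclusions described for coverage metrics
        for pos in range(max(aln.start, 0), min(aln.end, region_len)):
            depth[pos] += 1
    return depth


def coverage_metrics(depth: List[int], min_depth: int = 20) -> Dict[str, float]:
    """Mean depth, uniformity (PCT > 0.2x mean), PCT >= min_depth, aligned bases."""
    n = len(depth)
    aligned_bases = sum(depth)
    mean_cov = aligned_bases / n
    return {
        "mean_coverage": mean_cov,
        "pct_gt_0.2x_mean": 100.0 * sum(d > 0.2 * mean_cov for d in depth) / n,
        f"pct_ge_{min_depth}x": 100.0 * sum(d >= min_depth for d in depth) / n,
        "aligned_bases": float(aligned_bases),
    }


# Toy 10 bp region (illustrative alignments, not real data).
alns = [
    Alignment(0, 10, 60, False),
    Alignment(0, 5, 60, False),
    Alignment(0, 10, 0, False),   # MAPQ=0: excluded
    Alignment(5, 10, 60, True),   # duplicate: excluded
]
print(coverage_metrics(depth_profile(alns, 10), min_depth=2))
```

The uniformity threshold scales with the mean, so it measures how evenly depth is spread, whereas the fixed-depth percentage measures callability at an absolute depth.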

## 5. Variant-Level Sanity Check (Optional)

**Source File:** `<output_prefix>.vc_metrics.csv`\
**Default Status:** Conditional (variant caller enabled)

| Metric Name     | Critical Trend / Success Criteria                                                                                                                                                                                      |
| --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Ti/Tv Ratio** | **Biological plausibility:** For human germline WGS, expect approximately **1.9–2.2**. Significantly lower values (e.g. **< 1.7**) often indicate elevated false-positive rates due to sequencing or alignment errors. |

**Note:**\
Ti/Tv is a biological sanity check, not a standalone QC pass/fail metric. Expected values depend on organism, assay type, and genomic region.
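Ti/Tv is the number of transition SNVs (purine to purine, A↔G, or pyrimidine to pyrimidine, C↔T) divided by the number of transversion SNVs (all other single-base substitutions). The Python sketch below computes it from (ref, alt) pairs. It is a simplified count, not DRAGEN's implementation: which records are included (for example PASS-only or biallelic sites) is not modeled, and `ti_tv_ratio` and `is_transition` are hypothetical names.

```python
import math
from typing import Iterable, Tuple

BASES = {"A", "C", "G", "T"}
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """A<->G and C<->T substitutions are transitions; other SNVs are transversions."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Ti/Tv over single-base (ref, alt) pairs; indels, MNVs and non-ACGT alleles are skipped."""
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    if tv == 0:
        return math.inf if ti else math.nan
    return ti / tv


print(ti_tv_ratio([("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("AT", "A")]))
```

Random substitutions would give about 0.5, because there are twice as many possible transversions as transitions; an observed ratio well below the expected range therefore points toward a call set diluted by random errors.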

## How to Use This Checklist

* Treat this as a **minimum QC review**, not an exhaustive list.
* Values outside these ranges warrant investigation but are **not automatic failures**.
* Some metrics are assay-specific and may require optimization for particular use cases.
