Minimal Checklist
This section provides a succinct checklist of critical Quality Control (QC) metrics for evaluating DRAGEN run performance. These metrics are located in the standard CSV output files in the run directory.
1. Basic Run & Alignment QC
Source File: <output_prefix>.mapping_metrics.csv
Default Status: Enabled
Note that mapping metrics are computed on the raw input sample (prior to optional UMI collapsing).
Total input reads
Volume check: Confirm that the read count matches sequencer output and assay expectations (e.g. ~600–900M reads for 30× human WGS). Low counts can indicate flow cell loading issues or demultiplexing failures.
Q30 bases
Base quality: Generally > 85–90%. A global drop indicates chemistry or flow cell issues (see FastQC metrics for positional detail).
Mapped reads
Alignment success: Aim for > 95% (human WGS). Low mapping (< 90%) suggests wrong reference genome, severe contamination, or poor library quality.
Supplementary (chimeric) alignments
Structural integrity: Reads split across distant loci. For high-quality human germline WGS, expected values are typically < 2–3%. Values > 3–5% warrant investigation and may indicate library chimera artifacts (e.g. PCR stitching, FFPE-related fragmentation) or, in somatic samples, true structural variation. Sustained levels > 5% in germline samples are generally considered abnormal.
Soft-clipped bases (R1/R2)
Adapter/quality trimming: High percentages indicate adapter read-through, short insert sizes, or poor-quality read ends (e.g. FFPE artifacts). For high-quality libraries, typical values are < 2–3%. Values > 3–5% suggest suboptimal trimming or degraded DNA. Levels > 5% should be treated as a QC concern and reviewed alongside FastQC metrics.
Estimated sample contamination
Purity check (requires --qc-detect-contamination=true): For human samples, contamination above 1–2% can materially reduce variant calling accuracy in germline analyses, and is especially damaging when calling low-frequency somatic variants.
Duplicate reads
Library complexity: Elevated duplication rates suggest reduced library complexity or over-amplification. For high-quality germline WGS, typical values are < 20%. For WES and other targeted assays, higher duplication rates (e.g. 20–50%) are common and should be interpreted in the context of on-target coverage and assay design.
Insert length (median)
Fragment size: Should match the library prep target (e.g. ~350 bp). Deviations can affect coverage uniformity.
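The percentage-based checks in this section can be scripted. The following is a minimal sketch, not an official parser. It assumes each CSV row has the layout section, sample/read group, metric name, value, percent, and that the summary rows use the section label shown in the sample text. The metric names are copied from this checklist and may not match the labels in your file exactly, so adjust the CHECKS table to your DRAGEN version and assay.

```python
import csv
import io

# Percent limits copied from the checklist ranges above. Metric names are
# ASSUMED to match the CSV rows exactly; edit them to fit your output.
CHECKS = {
    "Q30 bases": ("min", 85.0),
    "Mapped reads": ("min", 95.0),
    "Supplementary (chimeric) alignments": ("max", 3.0),
    "Duplicate reads": ("max", 20.0),
}


def parse_percentages(text, section="MAPPING/ALIGNING SUMMARY"):
    """Return {metric name: percent} for rows in the given section.

    Assumed row layout: section, sample/read group, metric, value, percent.
    Rows without a numeric percent column are skipped.
    """
    percents = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 5 or row[0].strip() != section:
            continue
        try:
            percents[row[2].strip()] = float(row[4])
        except ValueError:
            continue
    return percents


def flag_metrics(percents, checks=CHECKS):
    """Return (name, value, limit) for every metric outside its limit."""
    flags = []
    for name, (kind, limit) in checks.items():
        value = percents.get(name)
        if value is None:
            continue  # metric absent from this file; nothing to check
        too_low = kind == "min" and value < limit
        too_high = kind == "max" and value > limit
        if too_low or too_high:
            flags.append((name, value, limit))
    return flags


# Made-up sample rows for illustration only.
sample = """\
MAPPING/ALIGNING SUMMARY,,Total input reads,800000000,100.00
MAPPING/ALIGNING SUMMARY,,Q30 bases,110000000000,91.50
MAPPING/ALIGNING SUMMARY,,Mapped reads,712000000,89.00
MAPPING/ALIGNING SUMMARY,,Duplicate reads,96000000,12.00
"""

flags = flag_metrics(parse_percentages(sample))
for name, value, limit in flags:
    print(f"REVIEW {name}: {value}% (checklist limit {limit}%)")
```

A flagged metric is a prompt for review rather than a failure; the limits above are the germline WGS values quoted in this section and would need loosening for WES or targeted assays.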
2. FastQC (Sequence Composition)
Source File: <output_prefix>.fastqc_metrics.csv
Default Status: Enabled
Note that FastQC metrics are computed on the raw input sample (prior to optional UMI collapsing).
Positional base mean quality
Cycle decay: Identify quality drop-off at read ends or specific cycles (e.g. fluidics issues).
Read GC content
Contamination/bias: Deviation from expected distribution (e.g. human ~40–45% GC) suggests contamination or severe PCR bias.
Sequence positions (adapters)
Adapter content: High levels indicate untrimmed adapters or short inserts. This can lead to high levels (e.g. > 5%) of reported soft-clipped bases in the mapping metrics. Confirm trimming options are enabled.
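To illustrate what the positional base mean quality metric measures, the sketch below computes a per-cycle mean from FASTQ quality strings. It assumes standard Phred+33 encoding and uses made-up quality strings; it does not read the DRAGEN FastQC output, which already reports this metric.

```python
def positional_mean_quality(quality_strings):
    """Return the mean Phred score at each read position (cycle).

    Assumes Phred+33 encoding: score = ord(character) - 33.
    Reads may differ in length; each position averages the reads that reach it.
    """
    sums, counts = [], []
    for qualities in quality_strings:
        for i, ch in enumerate(qualities):
            if i == len(sums):
                sums.append(0)
                counts.append(0)
            sums[i] += ord(ch) - 33
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


# Made-up example: quality is high for four cycles and drops at the last one.
quals = ["IIII5", "IIIF#", "IIFF5"]
means = positional_mean_quality(quals)
for cycle, q in enumerate(means, start=1):
    print(f"cycle {cycle}: mean Q{q:.1f}")
```

A profile that stays flat and then falls sharply at the final cycles, as in this toy input, is the read-end decay pattern described above.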
3. UMI QC (Applies Only to UMI Designs)
Source File: <output_prefix>.umi_metrics.csv
Default Status: Enabled as part of the UMI pipeline (requires --umi-enable=true)
Consensus reads
Conversion efficiency: Low ratio relative to total input reads suggests low molecular complexity or insufficient sequencing depth.
Mean family size
Saturation: Family sizes near 1.0 indicate under-sequencing; error correction is ineffective without duplicate families. Very high mean family sizes may indicate excessive PCR amplification.
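The relationship between family size and saturation can be made concrete with a small calculation. The sketch below is illustrative only: the histogram input (family size mapped to number of families) is a hypothetical structure, not a documented DRAGEN output, and it idealizes each family as producing exactly one consensus read.

```python
def family_stats(histogram):
    """Summarize a UMI family-size histogram.

    histogram maps family size (reads per family) -> number of families.
    Returns (mean family size, singleton fraction, consensus-per-input ratio),
    assuming one consensus read per family.
    """
    families = sum(histogram.values())
    if families == 0:
        raise ValueError("histogram contains no families")
    reads = sum(size * count for size, count in histogram.items())
    mean_size = reads / families
    singleton_fraction = histogram.get(1, 0) / families
    conversion = families / reads
    return mean_size, singleton_fraction, conversion


# Hypothetical libraries: one under-sequenced, one with deeper families.
under_sequenced = {1: 900, 2: 80, 3: 20}
well_sequenced = {1: 50, 4: 300, 6: 400, 8: 250}

for label, hist in [("under-sequenced", under_sequenced),
                    ("well-sequenced", well_sequenced)]:
    mean_size, singletons, conversion = family_stats(hist)
    print(f"{label}: mean family size {mean_size:.2f}, "
          f"singletons {singletons:.0%}, consensus/input {conversion:.2f}")
```

In the first library 90% of families contain a single read, so there is almost nothing for consensus error correction to work with, which is why a mean family size near 1.0 is a warning sign.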
4. Coverage QC
Source File: <output_prefix>.*_coverage_metrics.csv
(e.g. wgs_coverage_metrics.csv or target_bed_coverage_metrics.csv)
Default Status: Enabled
Note that coverage metrics are computed after read deduplication or UMI collapsing. Reads with MAPQ=0 are ignored.
Average alignment coverage over target bed / genome
Depth check: Primary driver of sensitivity. Ensure it meets the assay target (e.g. 30× germline, 100×+ somatic).
Uniformity of coverage (PCT > 0.2× mean)
Bias check: Low uniformity indicates coverage bias (e.g. GC bias), leading to variant calling blind spots. Germline WGS typically expects ≥ 80–90% uniformity.
PCT of genome with coverage [x: inf)
Callability: Fraction of the genome covered at or above a given depth (e.g. PCT ≥ 20×). Germline WGS typically requires > 95% of the genome at ≥ 20×.
Aligned bases in target bed / genome
Yield: Total usable data. Useful for normalizing performance across runs or flow cells.
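The definitions behind the coverage metrics can be shown on a toy per-base depth vector. This is a sketch of the arithmetic only, using made-up depths; it assumes uniformity means the percentage of positions with depth greater than 0.2 × the mean, as the metric name suggests, and it does not reproduce DRAGEN's own filtering of duplicates or MAPQ=0 reads.

```python
def coverage_summary(depths, min_depth=20):
    """Return (mean depth, uniformity %, % of positions at >= min_depth).

    Uniformity is taken here as the percentage of positions whose depth
    exceeds 0.2 x the mean depth.
    """
    if not depths:
        raise ValueError("no positions supplied")
    n = len(depths)
    mean = sum(depths) / n
    uniformity = 100 * sum(d > 0.2 * mean for d in depths) / n
    pct_at_min = 100 * sum(d >= min_depth for d in depths) / n
    return mean, uniformity, pct_at_min


# Made-up depths for ten positions, including a dropout region (4 and 0).
depths = [30, 32, 28, 35, 4, 0, 31, 29, 33, 18]
mean, uniformity, pct_20x = coverage_summary(depths)
print(f"mean {mean:.1f}x, uniformity {uniformity:.0f}%, >=20x {pct_20x:.0f}%")
```

The toy region has a respectable mean depth but only 70% of positions at 20× or more, which shows why average coverage alone does not establish callability.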
5. Variant-Level Sanity Check (Optional)
Source File: <output_prefix>.vc_metrics.csv
Default Status: Conditional (variant caller enabled)
Ti/Tv Ratio
Biological plausibility: For human germline WGS, expect approximately 1.9–2.2. Significantly lower values (e.g. < 1.7) often indicate elevated false-positive rates due to sequencing or alignment errors.
Note: Ti/Tv is a biological sanity check, not a standalone QC pass/fail metric. Expected values depend on organism, assay type, and genomic region.
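For reference, Ti/Tv is the number of transitions (A↔G, C↔T) divided by the number of transversions (all other single-base substitutions). The sketch below computes it from a made-up list of (ref, alt) pairs; it is not how DRAGEN derives the value in vc_metrics.csv, only an illustration of the ratio.

```python
# Purine<->purine and pyrimidine<->pyrimidine substitutions are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
BASES = set("ACGT")


def ti_tv_ratio(snvs):
    """Return transitions / transversions for a list of (ref, alt) SNVs."""
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            raise ValueError(f"not a single-nucleotide variant: {ref}>{alt}")
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions; ratio is undefined")
    return ti / tv


# Made-up calls: four transitions and two transversions.
snvs = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
ratio = ti_tv_ratio(snvs)
print(f"Ti/Tv = {ratio:.2f}")
```

Random errors favor transversions, since each base has two possible transversions and only one transition, which is why false positives pull the observed ratio downward.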
How to Use This Checklist
Treat this as a minimum QC review, not an exhaustive list.
Values outside these ranges warrant investigation but are not automatic failures.
Some metrics are assay-specific and may require optimization for particular use cases.