Somatic ML for Small VC (Beta)

DRAGEN Somatic Tumor-Normal small variant calling includes an optional workflow that recalibrates variant calls using machine learning. A supervised model processes read-level and other contextual evidence to remove false positives and recover false negatives for both SNVs and INDELs, which improves variant calling accuracy while remaining computationally efficient.

DRAGEN Somatic ML can be applied to WGS or WES samples and supports both FF and FFPE sample types. It should not be used when mutational signatures analysis or the HRD or TMB biomarkers are enabled in the same DRAGEN run. See Limitations below.

Setup

DRAGEN Somatic ML is enabled using --vc-ml-enable-recalibration true. DRAGEN Somatic ML runs concurrently with DRAGEN Somatic SNV VC.
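As a minimal sketch of enabling the option from a wrapper script, the Python helper below adds the flag to an existing tumor-normal argument list. Only --vc-ml-enable-recalibration comes from this section. The helper name with_somatic_ml and the placeholder base arguments are hypothetical, and the sketch assumes the option may appear either as two tokens or in --option=value form.

```python
from typing import List

# Option name taken from this section of the guide.
ML_FLAG = "--vc-ml-enable-recalibration"


def with_somatic_ml(base_args: List[str]) -> List[str]:
    """Return a copy of base_args with Somatic ML recalibration enabled.

    Any existing occurrence of the option is dropped first, so the
    result contains the flag exactly once, set to "true".
    """
    cleaned: List[str] = []
    skip_next = False
    for arg in base_args:
        if skip_next:
            # This token is the value of a previously seen ML flag.
            skip_next = False
            continue
        if arg == ML_FLAG:
            # Two-token form: drop the flag and the value that follows.
            skip_next = True
            continue
        if arg.startswith(ML_FLAG + "="):
            # Single-token form (assumed): drop it.
            continue
        cleaned.append(arg)
    return cleaned + [ML_FLAG, "true"]


if __name__ == "__main__":
    # Hypothetical placeholder for an existing tumor-normal command line.
    base = ["dragen", "<existing tumor-normal options>"]
    print(" ".join(with_somatic_ml(base)))
```

Because the recalibration runs concurrently with the small variant caller, nothing else in the command needs to change for this sketch. The remaining options are whatever the existing tumor-normal run already uses.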

Inputs

DRAGEN Somatic ML requires a run with BAM, CRAM or FASTQ input, since the machine learning model extracts information from the read pile-up. Recalibration of existing VCF files is not supported.

For optimal performance, it is recommended to supply a pre-built systematic noise file (v2) when running Somatic ML. Pre-built files are available on the DRAGEN Software Support Site page.

Outputs

DRAGEN Somatic ML recalibrates all quality scores, changing the value of the SQ field in the output VCF/GVCF.

The PHRED-scaled SQ scores produced with DRAGEN Somatic ML enabled are better calibrated than those produced with ML disabled. They also differ significantly from them and are generally lower. For this reason, the default SQ filtering thresholds are much lower when DRAGEN Somatic ML is enabled: 5 for SNVs and 3 for INDELs, compared to 17.5 for both SNVs and INDELs when DRAGEN Somatic ML is disabled.
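The effect of the two sets of defaults can be illustrated with a short Python sketch. The threshold values are the ones quoted above. The function names, the rule that a call passes when its SQ is at or above the threshold, and the classification of any allele pair with unequal REF and ALT lengths as an INDEL are simplifying assumptions made for illustration, not a description of DRAGEN's internal logic.

```python
# Default SQ filtering thresholds quoted in this section,
# keyed by whether Somatic ML is enabled.
DEFAULT_SQ_THRESHOLDS = {
    True: {"snv": 5.0, "indel": 3.0},     # Somatic ML enabled
    False: {"snv": 17.5, "indel": 17.5},  # Somatic ML disabled
}


def variant_kind(ref: str, alt: str) -> str:
    """Simplified classification: unequal allele lengths count as an INDEL."""
    return "indel" if len(ref) != len(alt) else "snv"


def passes_default_sq(sq: float, ref: str, alt: str, ml_enabled: bool) -> bool:
    """Return True if sq meets the default threshold for this call.

    Assumption: a call passes when its SQ is at or above the threshold.
    """
    threshold = DEFAULT_SQ_THRESHOLDS[ml_enabled][variant_kind(ref, alt)]
    return sq >= threshold


if __name__ == "__main__":
    # The same numeric SQ is judged differently depending on the ML setting.
    print(passes_default_sq(6.0, "A", "T", ml_enabled=True))
    print(passes_default_sq(6.0, "A", "T", ml_enabled=False))
```

The practical point is that SQ values from runs with and without Somatic ML are on different scales, so a custom threshold tuned for one setting should not be reused for the other without re-evaluation.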

Calls in the following categories are recalibrated:

  • Autosomes and sex chromosomes

  • ForceGT calls

  • Non-primary contigs

Empirically, the SQ output by DRAGEN Somatic ML is more accurately calibrated than the SQ output by DRAGEN Somatic SNV VC without ML. Note, however, that a small number of individual variant calls may be less accurate with ML enabled than with ML disabled.

Limitations

The following features should not be run together with DRAGEN Somatic ML:

  • Mutational signatures analysis

  • TMB biomarker

  • HRD biomarker

These use cases are untested and not fully supported in v4.5, but will be supported in future DRAGEN releases.
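A pipeline wrapper can guard against these combinations before launching a run. The Python sketch below is one possible pre-flight check. The feature labels ("mutational_signatures", "tmb", "hrd") and the function name are hypothetical identifiers chosen for this example and are not DRAGEN option names.

```python
from typing import Iterable, List

# Hypothetical labels for the features listed above as incompatible
# with Somatic ML in v4.5. These are not DRAGEN command-line options.
INCOMPATIBLE_WITH_SOMATIC_ML = frozenset({"mutational_signatures", "tmb", "hrd"})


def find_ml_conflicts(ml_enabled: bool, enabled_features: Iterable[str]) -> List[str]:
    """Return the enabled features that should not run with Somatic ML.

    The result is an empty list when Somatic ML is disabled or when
    none of the listed features is enabled.
    """
    if not ml_enabled:
        return []
    return sorted(INCOMPATIBLE_WITH_SOMATIC_ML & set(enabled_features))


if __name__ == "__main__":
    conflicts = find_ml_conflicts(True, ["tmb", "cnv"])
    if conflicts:
        print("Disable Somatic ML or these features:", ", ".join(conflicts))
```

Returning the list of conflicts, rather than raising immediately, lets the caller decide whether to abort the run or to turn off Somatic ML and continue.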
