Microsatellite Instability

Microsatellites are genomic regions of short DNA motifs that are repeated 5–50 times and are associated with high mutation rates. Microsatellite Instability (MSI) results from deficiencies in the DNA mismatch repair pathway and can be used as a critical biomarker to predict immunotherapy responses in multiple tumor types.

DRAGEN MSI can run in tumor-normal or tumor-only mode. Tumor-only mode requires a panel of normals, which can be generated using the collect-evidence mode.

The default microsatellite site lists and the panel of normals are available for WES and WGS (see the DRAGEN Software Support Site page). Custom panels other than WES and WGS may require more extensive validation and possibly require generating a new sites file.

MSI Algorithm

The MSI algorithm performs the following steps:

  1. Tabulate the number of read alignments for each microsatellite site in tumor and normal samples.

    • A read is counted toward a repeat length only if its sequence contains the repeat sequence together with the 5 flanking bases on each side, as specified in the microsatellite site list.

    • When msi-read-stitching is turned on, a pair of overlapping reads is counted as one read.

  2. Calculate the Jensen-Shannon (JS) distance between the tumor and normal distributions.

    • In tumor-normal mode, the JS distance is calculated between the tumor sample and the normal sample.

    • In tumor-only mode, we first calculate intra-normal JS distances between all pairs of normal samples. Then, we normalize the mean JS distance between the tumor sample and all normal samples by the mean intra-normal distance.

  3. Compute P-values for each site using

    • chi-square testing between tumor and normal distributions in tumor-normal mode, and

    • student-t testing between mean tumor and normal distributions in tumor-only mode.

  4. Determine whether the site is assessed. A site is assessed if the following criteria are satisfied:

    • the total number of supporting reads is greater than SpanningCoverageThreshold in both tumor and normal samples

    • the number of reads supporting the reference repeat length is larger than MinReferencePeakHeight.

  5. Determine if a site is unstable based on both the Jensen-Shannon distance and P-values. A site is unstable if JS distance is larger than DistanceThreshold (default=0.1), and P-value is smaller than PValueThreshold (default=0.01).

  6. Determine whether the site passes the peak-height filters. A site passes if the following criteria are satisfied:

    • the number of reads supporting (reference repeat length - 1) is greater than or equal to MinLeftPeakHeight

    • the ratio of the number of reads supporting the reference repeat length to the number supporting (reference repeat length - 1) is between MinLeftPeakRatio and MaxLeftPeakRatio.

    If a filter is not passed, the site is counted toward the total assessed sites but not toward the unstable sites, even if its distance and P-value pass the thresholds.

  7. Summarize the statistics and produce a report containing the assessed site count, the unstable site count, the percentage of unstable sites among all assessed sites, and the sum of the Jensen-Shannon distances of all unstable sites. The parameter values mentioned above are also reported.
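
The distance and decision steps can be illustrated with a minimal Python sketch. It is not DRAGEN's implementation. It assumes the standard definition of the Jensen-Shannon distance (the square root of the base-2 JS divergence) and assumes that "normalize by" in tumor-only mode means division. The function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def normalize(counts):
    """Turn per-repeat-length read counts into a probability distribution."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads support this site")
    return [c / total for c in counts]


def js_distance(p_counts, q_counts):
    """Jensen-Shannon distance, assumed here to be sqrt of the base-2 JS divergence."""
    p, q = normalize(p_counts), normalize(q_counts)
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with zero probability contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = (kl(p, m) + kl(q, m)) / 2
    return math.sqrt(max(divergence, 0.0))


def tumor_only_distance(tumor_counts, normal_counts_list):
    """Mean tumor-vs-normal distance divided by the mean intra-normal distance."""
    intra = mean(js_distance(a, b) for a, b in combinations(normal_counts_list, 2))
    if intra == 0:
        raise ValueError("normal samples are identical; cannot normalize")
    return mean(js_distance(tumor_counts, n) for n in normal_counts_list) / intra


def is_unstable(distance, p_value, passes_filter,
                distance_threshold=0.1, p_value_threshold=0.01):
    """A site is unstable only if it passes the filters and both thresholds."""
    return (passes_filter
            and distance > distance_threshold
            and p_value < p_value_threshold)
```

Identical distributions give a distance of 0 and non-overlapping distributions give 1, so the default DistanceThreshold of 0.1 sits near the low end of that range.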

Command-Line Options

Example command for tumor-only mode

It is recommended to use tumor-only mode rather than tumor-normal if a panel of normals is available. It is also recommended to match the sample types of the panel of normals and the tumor sample for optimal performance. For example, a panel of normals that are FFPE samples should be used with FFPE or FF (Fresh-Frozen) tumor samples.

The TSO500 panels do not have normal controls, and are only tested and validated in tumor-only mode.

dragen \
--msi-command tumor-only \
--msi-coverage-threshold 60 \
--msi-microsatellites-file ${microsatellite_file} \
--msi-ref-normal-dir ${normal_reference_directory} \
--output-directory ${output_directory} \
--output-file-prefix ${prefix} \
--enable-map-align=true \
--RGID=read_group_ID \
--RGSM=read_group_sample \
--ref-dir ${reference_directory} \
--enable-map-align-output=true \
--enable-sort true \
--enable-duplicate-marking=true \
--tumor-fastq1 ${tumor_fq1} \
--tumor-fastq2 ${tumor_fq2}

Example command for tumor-normal mode

The paired normal sample is specified by --fastq-file1 and --fastq-file2.

dragen \
--msi-command tumor-normal \
--msi-coverage-threshold 60 \
--msi-microsatellites-file ${microsatellite_file} \
--output-directory ${output_directory} \
--output-file-prefix ${prefix} \
--enable-map-align true \
--RGID=read_group_ID \
--RGSM=read_group_sample \
--ref-dir ${reference_directory} \
--enable-map-align-output true \
--enable-sort true \
--enable-duplicate-marking true \
--tumor-fastq1 ${tumor_fq1} \
--tumor-fastq2 ${tumor_fq2} \
--fastq-file1 ${fq1} \
--fastq-file2 ${fq2}
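
| Option | Description |
| --- | --- |
| msi-command | Mode of execution: tumor-only, tumor-normal, or collect-evidence. |
| msi-microsatellites-file | Specify the file containing the microsatellite sites. DRAGEN has tested with ≥ 10 bp homopolymers for solid samples, and 6-7 bp homopolymers for liquid samples. |
| msi-ref-normal-dir | Full name of the directory containing files with the normal reference repeat length distribution. These files can be generated by running collect-evidence on each normal sample. A site is only evaluated if at least 20 normal samples have enough coverage for that site. |
| msi-ref-normal-input | Full name of a combined file with reference normal repeat length distributions from multiple samples. |
| msi-read-stitching | Whether to count overlapping reads as one fragment. It is recommended to set this option to True for libraries with short fragments. When read stitching is turned on, the read coverage at each site is lower, so it is recommended to lower msi-coverage-threshold, especially for lower coverage samples. |
| msi-coverage-threshold | Specify the minimum spanning read coverage for a microsatellite. Microsatellites that do not meet the specified threshold are not assessed in the analysis. DRAGEN recommends using 60 as the default value for solid samples. If coverage is low, the threshold can be lowered to 30 to increase the number of microsatellite sites assessed. For TSO500 liquid, a value of 500 is recommended. See MSI Algorithm for details on how spanning reads are counted. |
| msi-distance-threshold | Threshold for distance distributions to be considered different. Default is 0.1. For liquid samples, a value of 0.02 is recommended. |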

Assay-Specific Settings

For TSO500 Solid, a sample is classified as microsatellite unstable when "PercentageUnstableSites >= 20". It is generally recommended to use "PercentageUnstableSites" as the metric for determining MSI status. This metric is normalized and is expected to be more consistent across pipelines and input site files. The exact thresholds for other assays may still depend on the sample noise characteristics (PCR, UMI, etc.) and may need some empirical calibration.

| Sample Type | Assay | Microsatellite file | Specific Settings | PercentageUnstableSites Threshold |
| --- | --- | --- | --- | --- |
| Solid | TSO500 | Part of TSO500 resource bundle. Repeats 10 - 50. 130 sites. | msi-distance-threshold=0.1 | 20 |
| Heme | TSO500 | N/A | N/A | N/A |
| Liquid (cfDNA) | TSO500 | Part of TSO500 resource bundle. Repeats 6-7. 2344 sites. | msi-distance-threshold=0.02 | TBD |
| Solid, Heme | WES | Available for download. Repeats 10 - 50. Approx. 3.5K sites. | msi-distance-threshold=0.1 | TBD |
| Liquid (cfDNA) | WES | Available for download. Repeats 10 - 50. Approx. 3.5K sites. | msi-distance-threshold=0.02 | TBD |
| Solid, Heme | WGS | Available for download. Repeats 10 - 50. Approx. 1 million sites. | msi-distance-threshold=0.1 | TBD |
| Liquid (cfDNA) | WGS | Available for download. Repeats 10 - 50. Approx. 1 million sites. | msi-distance-threshold=0.02 | TBD |

Microsatellite sites files

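Default WES and WGS microsatellite site files can be downloaded from the DRAGEN Software Support Site page.

The following is an example of a microsatellite file: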

#chromosome     location        repeat_unit_length      repeat_unit_binary      repeat_times    left_flank_binary       right_flank_binary      repeat_unit_bases       left_flank_bases    right_flank_bases
chr1	985443	1	2	15	676	992	G	GGGCA	TTGAA
chr1	7980985	1	0	10	231	1020	A	ATGCT	TTTTA
chr1	8022800	1	3	19	13	41	T	AAATC	AAGGC
chr1	8029500	1	2	10	39	0	G	AAGCT	AAAAA
chr1	9146447	1	3	15	887	248	T	TCTCT	ATTGA
chr1	9767837	1	3	12	704	195	T	GTAAA	ATAAT

For panels it is recommended to post-process the file by intersecting the WES or WGS sites with the panel of interest. This will avoid using any off-target reads in the MSI analysis.

Microsatellite site list columns

| Column name | Description |
| --- | --- |
| chromosome | Chromosome of the site |
| location | Start location of the site |
| repeat_unit_length | Size of the repeat unit |
| repeat_unit_binary | Binary encoding of the repeat unit base converted to decimal (A: 0, C: 1, G: 2, T: 3) |
| repeat_times | Number of repeat units in the reference |
| left_flank_binary | Left flank bases in binary encoding, converted to decimal |
| right_flank_binary | Right flank bases in binary encoding, converted to decimal |
| repeat_unit_bases | Repeat unit base in A/T/C/G |
| left_flank_bases | Five bases on the left flank of the microsatellite site |
| right_flank_bases | Five bases on the right flank of the microsatellite site |
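
The binary columns in the example file are consistent with reading each sequence as a base-4 number, two bits per base with the first base most significant. This interpretation is inferred from the example rows rather than stated in this guide, and the helper name below is hypothetical.

```python
# Two-bit code per base, as listed for repeat_unit_binary.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_bases(bases):
    """Encode a DNA string as a decimal number, first base most significant."""
    value = 0
    for base in bases.upper():
        # Shift the running value by one base-4 digit, then add the new base.
        value = value * 4 + BASE_CODE[base]
    return value


# First example row: repeat unit G, left flank GGGCA, right flank TTGAA.
print(encode_bases("G"), encode_bases("GGGCA"), encode_bases("TTGAA"))
# prints: 2 676 992
```

A check of this kind can help confirm that a custom site file has consistent binary and base columns before it is passed to DRAGEN.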

Custom Microsatellite files

Custom Microsatellite site files may be required if a small panel is targeted and/or the default site files do not have sufficient overlapping sites.

Custom Microsatellite site files can be generated by using MSIsensor-Pro https://github.com/xjtu-omics/msisensor-pro/wiki/Best-Practices.

msisensor-pro scan -d /path/to/reference.fa -o ${microsatellite_file}

A subsequent post-processing step is required for the site list to be used by DRAGEN:

  • rearrange the columns to match the format of a DRAGEN microsatellite site list (see Microsatellite site list columns)

  • keep only microsatellite sites with a repeat unit of length 1

  • keep sites with 10 - 50 bp repeats (a maximum length of 100 bp repeats is supported)

  • remove any sites containing Ns in the left or right anchors

  • downsample the remaining sites to no more than 1 million sites (to avoid excessive run time)

An error occurs if long (>100 bp) microsatellite sites are present in the file.

The microsatellite site file output by MSIsensor-Pro is in a different format from the DRAGEN site file, so the post-processing step must also convert the format.
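
The filtering rules above can be sketched in Python as follows. The sketch assumes each site has already been parsed into a dictionary keyed by the DRAGEN column names; it does not read or convert the MSIsensor-Pro output format. Random downsampling with a fixed seed is an assumption, since this guide does not say how sites should be selected, and the function names are hypothetical.

```python
import random

MAX_SITES = 1_000_000  # upper limit recommended to avoid excessive run time


def keep_site(site):
    """Apply the per-site rules: homopolymer, 10-50 repeats, no N in the flanks."""
    return (
        int(site["repeat_unit_length"]) == 1
        and 10 <= int(site["repeat_times"]) <= 50
        and "N" not in site["left_flank_bases"].upper()
        and "N" not in site["right_flank_bases"].upper()
    )


def filter_sites(sites, max_sites=MAX_SITES, seed=0):
    """Filter sites, then downsample (original order kept) if too many remain."""
    kept = [s for s in sites if keep_site(s)]
    if len(kept) > max_sites:
        rng = random.Random(seed)  # fixed seed so the site list is reproducible
        chosen = sorted(rng.sample(range(len(kept)), max_sites))
        kept = [kept[i] for i in chosen]
    return kept
```

For a repeat unit of length 1, repeat_times equals the repeat length in bp, which is why the 10 - 50 bp rule is applied to that column.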

Germline variant filtering

We recommend filtering out microsatellite sites that overlap with known population variants. A locus affected by small variants shows artificially inflated differences between samples. In the example below, the site in the normal sample overlaps with a heterozygous variant (possibly a one-base insertion or deletion). In the paired tumor sample, the heterozygosity is lost (LOH). The difference observed between the two distributions is therefore due to LOH, not microsatellite instability.

[Figure: msi-snv]

We recommend using gnomAD as the reference database to filter all sites that overlap with small variants with population allele frequencies > 1%.

Normal references of microsatellite repeat distribution

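The normal reference can be provided in two formats: as separate files in one directory, or as a single file containing distributions from multiple samples.

  • Separate files can be provided with msi-ref-normal-dir. The directory should contain only the .dist files that are used as normal references.

  • A combined file can be provided with msi-ref-normal-input. The combined .dist file must contain an additional column that specifies the name of the sample for each distribution.

Normal reference files can be generated by running collect-evidence mode on a panel of normal samples. The output is in the same format as the .dist file described in Distribution of repeat lengths. The default normal reference files are also available for WES and WGS on the DRAGEN Software Support Site page.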

dragen -f \
--msi-command collect-evidence \
--ref-dir ${reference_directory} \
--msi-microsatellites-file ${microsatellite_file} \
--msi-coverage-threshold 60 \
--output-directory ${output_directory} \
--output-file-prefix ${prefix} \
-1 ${normal_fq1} \
-2 ${normal_fq2}

Please note:

  • The collect-evidence mode MUST be run in DRAGEN germline mode, as indicated by fastq options -1 and -2.

  • The --msi-microsatellites-file and --msi-coverage-threshold settings used in collect-evidence mode must be consistent with the settings used during tumor-only MSI calling.

  • At least 20 normal samples are required.
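
When building a panel of normals, the same collect-evidence command is repeated for each normal sample. The Python sketch below only assembles the argument lists and does not run DRAGEN. It uses the options shown in the command above. The function names, the (sample name, FASTQ 1, FASTQ 2) tuple layout, and the use of the sample name as the output prefix are illustrative assumptions.

```python
MIN_NORMALS = 20  # minimum panel size stated in the notes above


def collect_evidence_command(reference_dir, sites_file, output_dir, prefix,
                             fastq1, fastq2, coverage_threshold=60):
    """Build the collect-evidence argument list for one normal sample."""
    return [
        "dragen", "-f",
        "--msi-command", "collect-evidence",
        "--ref-dir", reference_dir,
        "--msi-microsatellites-file", sites_file,
        "--msi-coverage-threshold", str(coverage_threshold),
        "--output-directory", output_dir,
        "--output-file-prefix", prefix,
        "-1", fastq1,
        "-2", fastq2,
    ]


def panel_commands(normals, reference_dir, sites_file, output_dir,
                   coverage_threshold=60):
    """Build one command per normal; normals is a list of (name, fq1, fq2)."""
    if len(normals) < MIN_NORMALS:
        raise ValueError(f"need at least {MIN_NORMALS} normal samples, "
                         f"got {len(normals)}")
    # The same sites file and coverage threshold are reused for every sample,
    # and must also match the later tumor-only run.
    return [
        collect_evidence_command(reference_dir, sites_file, output_dir,
                                 name, fq1, fq2, coverage_threshold)
        for name, fq1, fq2 in normals
    ]
```

Each returned list could be passed to subprocess.run on a DRAGEN server. Generating every command from a single sites file and threshold helps keep the panel consistent with the tumor-only settings.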

MSI Output

DRAGEN outputs the following files during the MSI workflow:

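| File name | Description |
| --- | --- |
| <prefix>.microsat_output.json | Reports MSI status and parameters used in JSON format. |
| <prefix>.microsat_diffs.txt | Reports the statistical distance between tumor and normal samples for each site, and the statistics used to determine the status of the site. This file is not part of the output in collect-evidence mode. |
| <prefix>.microsat_normal.dist | Reports the repeat length distribution of each site in a normal sample. This file is not part of the output in tumor-only mode. |
| <prefix>.microsat_tumor.dist | Reports the repeat length distribution of each site in the tumor sample. This file is not part of the output in collect-evidence mode. |
| <prefix>.microsat_log.txt | Logs the runtime and MSI results. |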

MSI score report

The JSON file <prefix>.microsat_output.json contains the parameters needed to reproduce the analysis, and the MSI results (including the MSI score PercentageUnstableSites).

{
    "Settings":{
        "Command": "tumor-normal",
        ...,
    },
    "TotalMicrosatelliteSitesAssessed": "20020",
    "TotalMicrosatelliteSitesUnstable": "4374",
    "PercentageUnstableSites": "21.850000000000001",
    "ResultIsValid": "true",
    "ResultMessage": "",
    "SumDistance": "1214.174"
}

The "SumDistance" is the sum of the Jensen-Shannon distances of all unstable sites, based on the distances between tumor and normal distributions. The "SumDistance" depends on the size of the microsatellite file and is not normalized. In general, it is recommended to set MSI thresholds based on "PercentageUnstableSites" rather than "SumDistance".

For TSO500 Solid, microsatellite instability is defined as "PercentageUnstableSites >= 20". The exact thresholds for other assays with different site files and noise characteristics may need some empirical calibration.
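
Reading the report and applying a threshold can be sketched in Python as follows. The default threshold of 20 is the TSO500 Solid value given in this guide and may not suit other assays. The function name is hypothetical. As a precaution, the percentage is looked up under two spellings of the key, because the spelling written by a given DRAGEN version is an assumption here.

```python
import json

# Candidate spellings of the percentage key; which one a given DRAGEN
# version writes is an assumption of this sketch.
PERCENT_KEYS = ("PercentageUnstableSites", "PecentageUnstableSites")


def msi_call(report_text, threshold=20.0):
    """Return (percentage, is_unstable), or None if the result is not valid."""
    report = json.loads(report_text)
    # Values in the report are JSON strings, including the validity flag.
    if report.get("ResultIsValid") != "true":
        return None
    for key in PERCENT_KEYS:
        if key in report:
            percentage = float(report[key])
            return percentage, percentage >= threshold
    raise KeyError("no unstable-site percentage found in report")
```

Returning None for an invalid result keeps a failed analysis from being read as a stable sample.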

Distribution of repeat lengths

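DRAGEN MSI computes the number of repeat units (repeat lengths) supported by each read fragment.

[Figure: mock example of a read pileup (left) at a pre-specified homopolymer site with 10 repeat units of T in the reference, with two abnormal alignments at the bottom, and the corresponding distribution of repeat lengths (right).]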

The distribution is recorded in <prefix>.microsat_normal.dist and <prefix>.microsat_tumor.dist for normal and tumor samples, respectively.

Example .dist file:

#chromosome     location        repeat_unit_bases       reference_allele        covered length_distribution
chr1    985443  G       15      false   0,0,0,0,0,0,0,0,0,0,0,0,...,0
chr1    7980985 A       10      true    0,0,0,0,0,0,2,0,8,393,14,1,...,0
chr1    8022800 T       19      true    0,0,0,0,0,0,0,0,0,0,0,0,0,2,3,3,4,35,42,13,2,2,0,0...0,0

Summing the numbers in the last column gives the total number of reads covering the site.

Columns in .dist files:

| Column name | Description |
| --- | --- |
| chromosome | Chromosome of the site |
| location | Start position of the site |
| repeat_unit_bases | The base(s) of the repeat unit in the reference, as an A/T/C/G string |
| reference_allele | The number of repeats in the reference |
| covered | Whether the site is covered by sufficient reads (determined by msi-coverage-threshold) |
| length_distribution | A vector of size 100 that records read support for each repeat length from 1 to 100 |
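
A line of a .dist file can be parsed with a short Python sketch such as the one below. It assumes whitespace-separated columns and that the first entry of length_distribution corresponds to repeat length 1, following the column description above. The function names are hypothetical. The sample line is built from the second example row, with the elided entries assumed to be zero.

```python
def parse_dist_line(line):
    """Parse one data line of a .dist file into a dictionary."""
    chrom, location, unit, ref_len, covered, dist = line.split()
    return {
        "chromosome": chrom,
        "location": int(location),
        "repeat_unit_bases": unit,
        "reference_allele": int(ref_len),
        "covered": covered == "true",
        "length_distribution": [int(x) for x in dist.split(",")],
    }


def total_reads(site):
    """Total number of reads covering the site (sum of the last column)."""
    return sum(site["length_distribution"])


def reads_at_length(site, repeat_length):
    """Read support for a repeat length; index 0 holds repeat length 1."""
    return site["length_distribution"][repeat_length - 1]


# Sample line: 100 counts, elided entries of the example row assumed to be 0.
counts = [0] * 100
counts[6:12] = [2, 0, 8, 393, 14, 1]
line = "chr1\t7980985\tA\t10\ttrue\t" + ",".join(map(str, counts))

site = parse_dist_line(line)
print(total_reads(site), reads_at_length(site, site["reference_allele"]))
# prints: 418 393
```

In this row the largest count falls at the reference repeat length of 10, which is consistent with the 1-based reading of the vector.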

Difference between tumor and normal samples

Example <prefix>.microsat_diffs.txt file:

#Chromosome	Start	RepeatUnit	Assessed	Distance	PValue	PassFilter
chr1	69106	T	true	0.04105300052	0.4786448589	true
chr1	69116	TC	false	0	0	false

Columns in <prefix>.microsat_diffs.txt:

| Column name | Description |
| --- | --- |
| Chromosome | Chromosome of the site |
| Start | Start position of the site |
| RepeatUnit | The base(s) of the repeat unit in the reference, as an A/T/C/G string |
| Assessed | Whether the site is assessed, based on read coverage and the number of reads supporting the reference length |
| Distance | The Jensen-Shannon distance between tumor and normal distributions |
| PValue | Statistical significance of the difference observed between distributions |
| PassFilter | Whether the site passes filters based on specific peak heights |

The details of how column values are computed can be found in MSI Algorithm.
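
The per-site columns can be combined into summary counts with a Python sketch like the one below. It applies the rules from MSI Algorithm with the default thresholds (distance > 0.1 and P-value < 0.01) and assumes whitespace-separated columns in the order shown. It is an illustration and is not guaranteed to reproduce the values in the JSON report. The function name is hypothetical.

```python
def summarize_diffs(lines, distance_threshold=0.1, p_value_threshold=0.01):
    """Return (assessed, unstable, percentage_unstable) from diffs lines."""
    assessed = unstable = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header
        _chrom, _start, _unit, is_assessed, distance, p_value, pass_filter = line.split()
        if is_assessed != "true":
            continue
        assessed += 1
        # Filtered sites still count as assessed but never as unstable.
        if (pass_filter == "true"
                and float(distance) > distance_threshold
                and float(p_value) < p_value_threshold):
            unstable += 1
    percentage = 100.0 * unstable / assessed if assessed else 0.0
    return assessed, unstable, percentage
```

For liquid samples, distance_threshold could be set to 0.02 to match the recommended msi-distance-threshold.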
