
# Tandem Repeats

Short tandem repeats (STRs) are regions of the genome consisting of repetitions of short DNA segments called repeat units. STRs can expand beyond their normal length range; these mutations are called repeat expansions. For more information, refer to the [DRAGEN user guide](/dragen-v4.5/product-guides/dragen-v4.5/dragen-dna-pipeline/repeat-expansions.md).
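To make the notion of a repeat unit concrete, the following minimal sketch counts the longest uninterrupted run of a given motif in a DNA sequence. It is an illustration only: it assumes perfect (uninterrupted) repeats, the function name `longest_repeat_run` is hypothetical, and this is not how DRAGEN-STR genotypes repeats.

```python
def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of consecutive, uninterrupted copies of motif in sequence.

    Hypothetical helper for illustration; assumes perfect repeats (no interruptions tolerated).
    """
    if not motif:
        raise ValueError("motif must be non-empty")
    seq, unit = sequence.upper(), motif.upper()
    k = len(unit)
    best = 0
    for start in range(k):  # scan every offset (reading frame) of the motif
        run = 0
        for i in range(start, len(seq) - k + 1, k):
            if seq[i:i + k] == unit:
                run += 1  # extend the current run of copies
                best = max(best, run)
            else:
                run = 0  # an interruption ends the run
    return best


# A CAG repeat interrupted by a CAA unit, flanked by non-repeat sequence.
example = "TTGA" + "CAG" * 5 + "CAA" + "CAG" * 2 + "GGT"
print(longest_repeat_run(example, "CAG"))  # → 5
```

The interruption splits the repeat into runs of five and two copies, so only the longer uninterrupted run is reported.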

DRAGEN-STR ships with two STR catalogs. The *default* catalog contains a small set of well-studied loci whose expansions are linked to various diseases. The *expanded* catalog contains \~174,000 highly polymorphic STR loci located in and around genes and is better suited to whole-genome exploration.

## STR size genotyping accuracy in the expanded catalog

Size-genotyping accuracy on the expanded-catalog loci was evaluated against the GIAB tandem repeat benchmark v1.0 (1), using Truvari (2) for variant comparison.

**DRAGEN**: DRAGEN 4.5.4 | **Truthset**: HG002 GIABTR v1.0 | **Reference**: GRCh38

<details>

<summary>Standard WGS STR table (click to expand)</summary>

| Subtype | Recall | Precision | F1-score | FN   | FP  |
| ------- | ------ | --------- | -------- | ---- | --- |
| STR     | 0.9617 | 0.9871    | 0.9742   | 2007 | 664 |

</details>

![STR for Standard WGS](/files/u2qsHsasArrdVOd4n79B)
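The F1-score in the table is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The following sketch recomputes it from the table's recall and precision columns; it only checks the arithmetic and is not part of the Truvari or DRAGEN output.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recall and precision from the Standard WGS STR table.
recall, precision = 0.9617, 0.9871
print(round(f1_score(precision, recall), 4))  # → 0.9742
```

Because the harmonic mean is pulled toward the smaller value, the F1-score here sits closer to recall than to precision.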

## Classification of samples with known pathogenic STR expansions

We sequenced 40 Coriell cell lines with known STR expansions and compared DRAGEN's STR classification against orthogonal validation methods. The plot below shows long-allele size distributions in repeat-count units. Dots are colored by their [STRipy database](https://www.stripy.org/database) range classification (normal, intermediate, or pathogenic), based on each locus's thresholds and the orthogonally validated size. Shaded regions and classification thresholds are from Ibañez K. et al., Lancet Neurology 21, 234–245 (2022).
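The range classification compares a long-allele repeat count against per-locus thresholds. The sketch below illustrates that comparison under stated assumptions: the `LocusThresholds` type, the inclusive lower-bound convention, and the numeric cutoffs in the example are hypothetical placeholders, not values from the STRipy database or Ibañez et al.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocusThresholds:
    """Hypothetical per-locus cutoffs, in repeat units (lower bounds are inclusive)."""
    intermediate_min: int  # smallest repeat count treated as intermediate
    pathogenic_min: int    # smallest repeat count treated as pathogenic


def classify_long_allele(repeat_count: int, t: LocusThresholds) -> str:
    """Assign a repeat count to the normal, intermediate, or pathogenic range."""
    if repeat_count >= t.pathogenic_min:
        return "pathogenic"
    if repeat_count >= t.intermediate_min:
        return "intermediate"
    return "normal"


# Placeholder thresholds for illustration only (not real locus values).
example_locus = LocusThresholds(intermediate_min=30, pathogenic_min=40)
for count in (18, 35, 62):
    print(count, classify_long_allele(count, example_locus))
# prints: "18 normal", "35 intermediate", "62 pathogenic" (one per line)
```

If a locus has no intermediate range, setting `intermediate_min` equal to `pathogenic_min` makes the intermediate branch unreachable.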

<details>

<summary>STR classification metrics table (click to expand)</summary>

| Subtype            | Recall | Precision | F1-score |
| ------------------ | ------ | --------- | -------- |
| STR classification | 0.9824 | 1.0000    | 0.9911   |

</details>

Distribution of STR lengths across 20 loci for the 40 Coriell samples with pathogenic STR expansions. Shaded regions show the normal, intermediate, and pathogenic ranges (STRipy database, scaled by motif length). Dots are colored by classification.

![STR size distribution, 20 loci, bp, filled intermediate](/files/q6WyQWO4BOcp3VU4pvAA)
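Sizes can be expressed in base pairs or in repeat-count units; converting between them scales by the motif length. The following sketch assumes the length covers only the repeat region itself (no flanking sequence); the helper names are hypothetical.

```python
def bp_to_repeat_units(length_bp: int, motif: str) -> float:
    """Convert a repeat-region length in base pairs to repeat-count units."""
    if not motif:
        raise ValueError("motif must be non-empty")
    return length_bp / len(motif)


def repeat_units_to_bp(repeat_count: float, motif: str) -> float:
    """Convert a repeat count back to a length in base pairs."""
    return repeat_count * len(motif)


print(bp_to_repeat_units(150, "CAG"))    # → 50.0
print(repeat_units_to_bp(12, "GGGGCC"))  # → 72
```

Scaling by motif length lets loci with different motif sizes be compared on a common repeat-count axis.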

#### References

1. <https://www.nature.com/articles/s41587-024-02225-z>
2. <https://link.springer.com/article/10.1186/s13059-022-02840-6>


