De Novo Repeat Expansion Detection

Short tandem repeats (STRs) are regions of the genome consisting of repetitions of short DNA segments called repeat units. STRs can expand to lengths beyond the normal range and cause mutations called repeat expansions. Repeat expansions are responsible for many diseases, including Fragile X syndrome, amyotrophic lateral sclerosis, and Huntington's disease.

STR profiling allows the discovery of expanded STR regions from paired-end reads across a cohort of samples. It is designed to work with PCR-free samples of 100-200bp paired-end reads at >30X coverage.

Note:

  • STRs shorter than the read length are ignored; the program is appropriate only for detecting expansions that exceed the read length.

  • The location of each reported STR is approximate (it can be off by about 500 bp to 1 kbp).

  • STRs are not genotyped. Instead, the program reports a depth-normalized count of reads originating inside each STR, which can be used as a very approximate measure of the repeat length.

  • To achieve the best results, all samples must be sequenced on the same instrument to similar coverage, have the same read and fragment lengths, and undergo the same computational pre-processing (e.g., reads must be aligned by the same aligner).

Briefly, the workflow consists of two distinct steps: profiling and analysis. In the profiling step, repetitive reads are found and used to infer the locations of potential STR regions. The regions and their respective read counts are then saved in a "profile" on disk. The profiling step is run for each sample, and the resulting profiles are merged into a single dataset for the analysis. In the analysis step, the user needs to provide a table describing the experimental design in order to run either an outlier analysis, which tests one sample against the rest, or a case-control analysis, where the samples are split into two groups.

Running DRAGEN

The two steps of the workflow, profiling and analysis, are performed by two separate DRAGEN commands.

Profiling

The first step computes a profile for each sample and saves it as a ProtoBuf message (<out_prefix>.data). The profile can be saved in a specific directory with the --str-profiler-output-directory flag. The sample name is stored in the profile and can be set at the profiling stage with the --str-profiler-sample-name flag. If it is not specified, the sample name in the RGSM field is used instead.

DRAGEN has to be called once for each sample, for example with the command:

# --str-profiler-sample-name is optional; if not set, the RGSM value is used.
# --str-profiler-output-directory is optional; if not set, --output-directory is used.
dragen \
  --enable-map-align=false \
  --output-directory=<out_dir> \
  --output-file-prefix=<out_prefix> \
  --bam-input <bam_input> \
  --enable-str-profiler=true \
  --str-profiler-sample-name=<optional_name> \
  --str-profiler-output-directory=<path_to_directory> \
  -r=<dragen_ref>

After all the profiles are computed, they must be divided into 'cases' and 'controls' directories. This can be done while computing the profiles by passing the appropriate directory with the --str-profiler-output-directory flag. DRAGEN can also take a list of FASTQ files as input with the --fastq-list option and save each profile in the directory specified with --str-profiler-output-directory. A list of cases and a list of controls can each be run in this manner.

Example command:

# --fastq-list-all-samples and --enable-multi-sample are necessary to run samples separately based on sample-id.
# --enable-per-sample-map-align-output can be set to false if map align output (BAM) is not needed.
dragen \
  --enable-map-align=true \
  --output-directory=<out_dir> \
  --output-file-prefix=<out_prefix> \
  --fastq-list=<fastq_list> \
  --fastq-list-all-samples=true \
  --enable-multi-sample=true \
  --enable-per-sample-map-align-output=true \
  --enable-str-profiler=true \
  --str-profiler-output-directory=<path_to_directory> \
  -r=<dragen_ref>
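
When a cohort is profiled one BAM at a time, the per-sample command and the cases/controls split can be scripted. The Python sketch below only assembles argument lists from the flags shown in this section; it does not run DRAGEN. The helper names (profiling_command, plan_cohort), all file paths, the "cases" and "controls" subdirectory names, and the use of the BAM file name as output prefix are hypothetical choices made for illustration.

```python
import shlex
from pathlib import PurePosixPath


def profiling_command(bam, out_dir, profile_dir, reference, sample_name=None):
    """Assemble one per-sample profiling call from the flags documented above."""
    argv = [
        "dragen",
        "--enable-map-align=false",
        f"--output-directory={out_dir}",
        # Hypothetical choice: reuse the BAM file name as the output prefix.
        f"--output-file-prefix={PurePosixPath(bam).stem}",
        "--bam-input", str(bam),
        "--enable-str-profiler=true",
        f"--str-profiler-output-directory={profile_dir}",
    ]
    if sample_name is not None:
        # Optional: without this flag the RGSM value is used.
        argv.append(f"--str-profiler-sample-name={sample_name}")
    argv.append(f"-r={reference}")
    return argv


def plan_cohort(cases, controls, out_dir, profiles_root, reference):
    """Route case and control profiles into separate directories."""
    commands = []
    for group, bams in (("cases", cases), ("controls", controls)):
        profile_dir = f"{profiles_root}/{group}"
        commands.extend(
            profiling_command(bam, out_dir, profile_dir, reference) for bam in bams
        )
    return commands


if __name__ == "__main__":
    plan = plan_cohort(
        cases=["/data/case1.bam"],
        controls=["/data/ctrl1.bam", "/data/ctrl2.bam"],
        out_dir="/scratch/out",
        profiles_root="/scratch/profiles",
        reference="/refs/hg38",
    )
    for argv in plan:
        # Review the printed commands before running them,
        # e.g. with subprocess.run(argv, check=True).
        print(shlex.join(argv))
```

Keeping the two groups under one root means the resulting directories can be passed directly as the cases and controls directories in the analysis step.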

Analysis

The analysis is performed with a separate DRAGEN command, which takes as input the paths to the two directories.

Two analysis types can be specified:

  • outlier = bootstraps the sampling distribution of the 95% quantile and then calculates the z-scores for the case samples

  • casecontrol = case and control counts are compared with a one-sided Wilcoxon rank-sum test, and a Bonferroni correction is applied to the resulting p-values
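
To make the case-control statistic concrete, here is a minimal Python sketch of a one-sided rank-sum test followed by a Bonferroni correction. It is an illustration, not DRAGEN's implementation: the test direction (cases larger than controls), the normal approximation without tie correction, and all function names and numbers are assumptions.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_pvalue(cases, controls):
    """One-sided p-value for 'cases tend to be larger than controls'.

    Uses the normal approximation of the Mann-Whitney U statistic
    (equivalent to the Wilcoxon rank-sum test), without tie correction.
    """
    n1, n2 = len(cases), len(controls)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one sample")
    ranks = average_ranks(list(cases) + list(controls))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return 1.0 - NormalDist().cdf((u - mean_u) / sd_u)


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


if __name__ == "__main__":
    # Made-up depth-normalized counts for two candidate regions.
    regions = {
        "region_a": ([10.0, 12.0, 11.0], [1.0, 2.0, 3.0, 4.0]),
        "region_b": ([2.0, 3.0, 1.0], [2.5, 1.5, 3.5, 0.5]),
    }
    raw = [rank_sum_pvalue(c, k) for c, k in regions.values()]
    for name, p, adj in zip(regions, raw, bonferroni(raw)):
        print(f"{name}: p={p:.4f} bonferroni={adj:.4f}")
```

Because the correction multiplies by the number of tested regions, a cohort with many candidate regions needs correspondingly small raw p-values to remain significant.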

Providing the --str-profiler-analysis flag triggers the analysis workflow. Example command:

dragen \
  --output-directory=<out_dir>  \
  --output-file-prefix=<out_prefix>  \
  --enable-str-profiler=true \
  --str-profiler-analysis=<outlier|casecontrol> \
  --str-profiler-cases-directory=<directory_with_cases_profiles> \
  --str-profiler-controls-directory=<directory_with_controls_profiles> \
  -r=<dragen_ref>

A note about bootstrapping

DRAGEN uses bootstrapping to approximate the distribution for the outlier analysis, with a default of 1000 iterations. This number can be adjusted with the flag --str-profiler-resampling-rounds. Increasing the number of resampling rounds improves the accuracy of the approximation but also increases the compute time linearly.
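
The idea can be sketched in a few lines of Python: resample the counts with replacement, take the 95% quantile of each resample, and use the mean and standard deviation of those quantiles to turn a case count into a z-score. This is one plausible reading of the description, not DRAGEN's code. Which samples form the background, the interpolation used for the quantile, and all names and numbers below are assumptions.

```python
import math
import random
import statistics


def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list (0 <= q <= 1)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def bootstrap_quantile(counts, rounds=1000, q=0.95, seed=0):
    """Mean and standard deviation of the q-quantile over bootstrap resamples."""
    rng = random.Random(seed)
    estimates = [
        quantile(rng.choices(counts, k=len(counts)), q) for _ in range(rounds)
    ]
    return statistics.mean(estimates), statistics.stdev(estimates)


def case_zscores(case_counts, background_counts, rounds=1000, seed=0):
    """Z-score of each case count against the bootstrapped quantile."""
    mu, sigma = bootstrap_quantile(background_counts, rounds=rounds, seed=seed)
    if sigma == 0:
        raise ValueError("bootstrapped quantile has zero spread; z-scores undefined")
    return {name: (count - mu) / sigma for name, count in case_counts.items()}


if __name__ == "__main__":
    # Made-up depth-normalized counts for one region.
    background = [float(x) for x in range(1, 21)]
    cases = {"case1": 100.0, "case2": 5.0}
    for name, z in case_zscores(cases, background).items():
        print(f"{name}: z={z:.2f}")
```

Each round sorts one resample of the cohort, so the cost grows linearly with the number of rounds, matching the trade-off described above.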

DRAGEN will spread the computation across 48 threads by default, but the number can be adjusted on the command line with the flag --str-profiler-threads.
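
As a convenience, the analysis call and its two tuning flags can be assembled programmatically. The Python sketch below builds the argument list only and does not run DRAGEN. The flag names come from this section; passing the values of --str-profiler-resampling-rounds and --str-profiler-threads in the `=value` form is an assumption based on the other flags, and the helper name and paths are hypothetical.

```python
import shlex


def analysis_command(mode, cases_dir, controls_dir, out_dir, out_prefix,
                     reference, resampling_rounds=None, threads=None):
    """Assemble the analysis call from the flags documented in this section."""
    if mode not in ("outlier", "casecontrol"):
        raise ValueError("mode must be 'outlier' or 'casecontrol'")
    argv = [
        "dragen",
        f"--output-directory={out_dir}",
        f"--output-file-prefix={out_prefix}",
        "--enable-str-profiler=true",
        f"--str-profiler-analysis={mode}",
        f"--str-profiler-cases-directory={cases_dir}",
        f"--str-profiler-controls-directory={controls_dir}",
    ]
    if resampling_rounds is not None:
        # Assumed '=value' syntax; DRAGEN's default is 1000 rounds.
        argv.append(f"--str-profiler-resampling-rounds={resampling_rounds}")
    if threads is not None:
        # Assumed '=value' syntax; DRAGEN's default is 48 threads.
        argv.append(f"--str-profiler-threads={threads}")
    argv.append(f"-r={reference}")
    return argv


if __name__ == "__main__":
    argv = analysis_command(
        "outlier",
        cases_dir="/scratch/profiles/cases",
        controls_dir="/scratch/profiles/controls",
        out_dir="/scratch/out",
        out_prefix="cohort",
        reference="/refs/hg38",
        resampling_rounds=5000,
        threads=16,
    )
    print(shlex.join(argv))
```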

Output

The output is composed of two tables, one for the "locus" level analysis and one for the "motif" level analysis, which are saved as <output-prefix>.str_profiler_locus.tsv and <output-prefix>.str_profiler_motif.tsv respectively. Below is a description of the locus analysis output. The motif table is the same as the locus table but without the contig, start and end columns.

Outlier analysis (locus) output

Column            Description
contig            Contig of the repeat region
start             Approximate start of the repeat
end               Approximate end of the repeat
motif             Inferred repeat motif
top_case_zscore   Top z-score of a case sample
high_case_counts  Counts of case samples corresponding to z-score greater than 1.0
counts            Nonzero counts for all samples

Case-control analysis (locus) output

Column       Description
contig       Contig of the repeat region
start        Approximate start of the repeat
end          Approximate end of the repeat
motif        Inferred repeat motif
pvalue       P-value from Wilcoxon rank-sum test
bonf_pvalue  P-value after Bonferroni correction
counts       Depth-normalized counts of anchored in-repeat reads for each sample (omitting samples with zero count)
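
The locus tables can be screened with a short script. The Python sketch below assumes each TSV starts with a header row containing the column names listed above, which is not confirmed here. The thresholds (0.05 for bonf_pvalue, 1.0 for top_case_zscore), the function names, and the demo rows are hypothetical; the counts column is left out of the demo because its encoding is not described in this section.

```python
import csv
import io


def read_locus_table(handle):
    """Read a tab-separated locus table into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def significant_loci(rows, alpha=0.05):
    """Case-control output: keep loci whose Bonferroni-corrected p-value is below alpha."""
    return [row for row in rows if float(row["bonf_pvalue"]) < alpha]


def outlier_loci(rows, min_zscore=1.0):
    """Outlier output: keep loci whose top case z-score exceeds min_zscore."""
    return [row for row in rows if float(row["top_case_zscore"]) > min_zscore]


if __name__ == "__main__":
    # Made-up rows standing in for <output-prefix>.str_profiler_locus.tsv.
    demo = io.StringIO(
        "contig\tstart\tend\tmotif\tpvalue\tbonf_pvalue\n"
        "chr1\t1000\t1600\tCAG\t0.0001\t0.01\n"
        "chr2\t5000\t5900\tCCG\t0.2\t1.0\n"
    )
    for row in significant_loci(read_locus_table(demo)):
        print(row["contig"], row["start"], row["end"], row["motif"])
```

With a real file, replace the in-memory demo with `open(path, newline="")`. Because reported locations are approximate, hits are best treated as candidate regions for follow-up rather than exact coordinates.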
