Gene Fusion Detection

The DRAGEN Gene Fusion module uses the DRAGEN RNA splice-aware aligner to detect gene fusion events. The supplementary (chimeric) alignments are used to find potential breakpoints and read evidence is accumulated for the resulting fusion event candidates. Then, an ML model is applied to score the putative fusion events to filter potential false positives. The ML scoring model is currently available on human samples only, and does not support non-human reference genomes.

Running DRAGEN Gene Fusion

You can run the DRAGEN Gene Fusion module together with a regular RNA-Seq map/align job. To enable the DRAGEN Gene Fusion module, set --enable-rna-gene-fusion to "true". The DRAGEN Gene Fusion module requires a gene annotations file in GTF or GFF format.

The following is an example command line for running an end-to-end RNA-Seq experiment with RNA fusion detection.

dragen \
-r <HASHTABLE> \
-1 <FASTQ1> \
-2 <FASTQ2> \
-a <ANNOTATION_FILE> \
--output-dir <OUT_DIRECTORY> \
--output-file-prefix <OUTPUT_PREFIX> \
--RGID <READ_GROUP_ID> \
--RGSM <SAMPLE_NAME> \
--enable-rna true \
--enable-rna-gene-fusion true \
--enable-duplicate-marking true 

At the end of a run, DRAGEN prints a summary of the detected gene fusion events, similar to the following example.

==================================================================
Completed DRAGEN Gene Fusion Detection
==================================================================
Chimeric alignments: 3072
Total fusion candidates: 259
Final fusion candidates: 223
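
The counts in this summary can be collected programmatically. The following is a minimal sketch, assuming the log keeps the same "label: integer" layout as the example above; the function name parse_fusion_summary is illustrative and not part of DRAGEN.

```python
import re


def parse_fusion_summary(log_text):
    """Collect 'label: integer' lines from a Gene Fusion summary.

    Assumes the layout shown in the example summary; banner lines and
    any other log lines that do not match are ignored.
    """
    counts = {}
    for line in log_text.splitlines():
        match = re.fullmatch(r"\s*([A-Za-z ]+?)\s*:\s*(\d+)\s*", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


example_log = """\
==================================================================
Completed DRAGEN Gene Fusion Detection
==================================================================
Chimeric alignments: 3072
Total fusion candidates: 259
Final fusion candidates: 223
"""

summary = parse_fusion_summary(example_log)
# Candidates that did not make it into the final list.
removed = summary["Total fusion candidates"] - summary["Final fusion candidates"]
print(summary, removed)
```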

Gene Fusion Output

The <output-file-prefix>.fusion_candidates.features.csv file lists the detected gene fusion events. The output CSV file includes the following columns.

  • #FusionGene: Parent gene names (in 5' to 3' order of transcript) participating in the fusion; hereafter referred to as Gene 1 and Gene 2. If a fusion breakpoint overlaps multiple genes, the genes are listed by default as separate candidates (rows). To show them as a semi-colon separated gene list on the same row, the option --rna-gf-merge-calls can be set to "true" as described in the Gene Fusion Options and Filters section.

  • Score: Fusion call confidence score predicted by the ML model. If the ML model is used, the score can be 0 (low confidence) to 1 (high-confidence call). Currently the ML model only supports human references. In the case an ML model is not available, the number of supporting reads will be reported as the score.

  • LeftBreakpoint: Gene 1 breakpoint formatted as <Chromosome>:<Position>:<Strand>.

  • RightBreakpoint: Gene 2 breakpoint formatted as <Chromosome>:<Position>:<Strand>.

  • Filter: Semicolon-separated list of filter flags. The LOW_SCORE filter is used to filter low-confidence fusion candidates. If --rna-gf-enable-post-filters is set to "true", the other confidence filters are also applied. Informative filters, on the other hand, do not fail the fusion. In the absence of ML model scoring (that is, when a non-human reference is used), more aggressive post-filtering takes place and all confidence and informative filters are applied.

The following are the available filters.

| Filter | Type | Description | Option to set threshold |
| --- | --- | --- | --- |
| LOW_SCORE | Confidence (always applied) | The fusion candidate has a low score (< 0.5) as determined by the ML model. | --rna-gf-min-score |
| MIN_SUPPORT | Confidence (optional) | The fusion has < 2 supporting read pairs. | --rna-gf-min-split-support |
| LOW_UNIQUE_ALIGNMENTS | Confidence (optional) | The minimum number of unique supporting read alignments required at each breakpoint is not met. Unique alignments have unique start and end positions and are not PCR duplicates. | --rna-gf-min-unique-alignments |
| LOW_MAPQ | Confidence (optional) | All fusion-supporting read alignments at either breakpoint have MAPQ < 20. | --rna-gf-min-breakpoint-mapq |
| DOUBLE_BROKEN_EXON | Confidence (optional) | Both breakpoints are > 50 bp away from annotated exon boundaries, which indicates an intronic fusion, and the number of supporting reads does not satisfy the high threshold required in that case (≥ 10 supporting reads). | --rna-gf-exon-snap, --rna-gf-min-support-be |
| UNENRICHED_GENES | Confidence (optional) | If an enrichment list is provided, neither parent gene is enriched. If amplicon mode is enabled, at least one parent gene is not enriched. | --rna-gf-enriched-only |
| MITOCHONDRIAL_GENES | Confidence (optional) | The fusion candidate involves mitochondrial genes. Set --rna-gf-filter-chrm=false to disable this filter. Default value is "true". | --rna-gf-filter-chrm |
| READ_THROUGH | Confidence (optional) | The breakpoints are cis neighbors (< 200,000 bp) on the reference genome. | --rna-gf-min-cis-distance |
| ANCHOR_SUPPORT | Information only | Read alignments of fusion-supporting reads are not long enough (less than 12 bp) at either breakpoint. | --rna-gf-min-anchor |
| HOMOLOGOUS | Information only | The candidate is likely to be a false candidate generated because the two genes involved have high gene homology. Default threshold is 1e-100. | --rna-gf-min-blast-pairs-eval |
| LOW_ALT_TO_REF | Information only | The number of reads supporting the fusion is < 1% of the number of reads supporting the reference transcript at either breakpoint. | --rna-gf-min-alt-to-ref |
| LOW_GENE_COVERAGE | Information only | Either breakpoint has less than 125 bp with nonzero read coverage. | --rna-gf-min-covered-bases |

Note that the specific features and column values are subject to change in future DRAGEN versions as more RNA data is analyzed.
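
The distinction between confidence and information-only flags can be applied when reading the Filter column. The following is a minimal sketch for the ML-scored (human reference) case, using the flag names from the filter table above. The function name split_filter_flags is illustrative. Tokens that are not in either set are ignored, because the value written for a passing candidate is not described here.

```python
# Flag names taken from the filter table; grouping follows its Type column.
CONFIDENCE_FILTERS = {
    "LOW_SCORE", "MIN_SUPPORT", "LOW_UNIQUE_ALIGNMENTS", "LOW_MAPQ",
    "DOUBLE_BROKEN_EXON", "UNENRICHED_GENES", "MITOCHONDRIAL_GENES",
    "READ_THROUGH",
}
INFORMATIONAL_FILTERS = {
    "ANCHOR_SUPPORT", "HOMOLOGOUS", "LOW_ALT_TO_REF", "LOW_GENE_COVERAGE",
}


def split_filter_flags(filter_value):
    """Split a semicolon-separated Filter value into two lists.

    Returns (confidence_flags, informational_flags). Only confidence
    flags fail a candidate when the ML model is used for scoring.
    """
    flags = [flag.strip() for flag in filter_value.split(";") if flag.strip()]
    confidence = [flag for flag in flags if flag in CONFIDENCE_FILTERS]
    informational = [flag for flag in flags if flag in INFORMATIONAL_FILTERS]
    return confidence, informational


confidence, informational = split_filter_flags("LOW_SCORE;ANCHOR_SUPPORT")
print(confidence, informational)
```

A candidate whose confidence list is empty carries only informational flags. For non-human references the text above states that all filters are applied, so this split would not be the right pass/fail rule there.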

  • #SplitScore: Combined count of fusion-supporting read pairs reported as split reads and soft-clipped reads

  • #NumSplitReads: Number of fusion-supporting read pairs with at least one split read alignment.

  • #NumSoftClippedReads: Number of fusion-supporting read pairs with no split read alignment, but at least one soft clipped alignment. Included in SplitScore and includes soft-clipped reads for both Gene1 and Gene2

  • #NumSoftClippedReadsGene1: Number of fusion-supporting read pairs with no split read alignment, but at least one soft clipped alignment to Gene 1

  • #NumSoftClippedReadsGene2: See above (NumSoftClippedReadsGene1) for Gene 2

  • #NumPairedReads: Number of fusion-supporting read pairs such that one of the reads maps to Gene1 and the other maps to Gene2, without any breakpoint overlap

  • #NumRefSplitReadsGene1: Number of read pairs that map fully within Gene 1 such that at least one of the reads aligns across the breakpoint. These reads support the reference transcript and do not support the fusion.

  • #NumRefSplitReadsGene2: See above (NumRefSplitReadsGene1) for Gene 2

  • #NumRefPairedReadsGene1: Number of read pairs such that one of the reads maps on the left side of the Gene 1 breakpoint and the other maps on the right side of the Gene 1 breakpoint, without overlapping the break. These reads support the reference transcript and do not support the fusion.

  • #NumRefPairedReadsGene2: See above (NumRefPairedReadsGene1) for Gene 2

  • #RefToAlt: Log2 value of the ratio max(NumRefSplitReadsGene1, NumRefSplitReadsGene2) / (fusion split reads + soft-clipped reads); used for the LOW_ALT_TO_REF filter

  • #UniqueAlignmentsGene1: Unique (start-end) positions of fusion-supporting read alignments to Gene 1 (after dedup); used for the LOW_UNIQUE_ALIGNMENTS filter

  • #UniqueAlignmentsGene2: Unique (start-end) positions of fusion-supporting read alignments to Gene 2 (after dedup); used for the LOW_UNIQUE_ALIGNMENTS filter

  • #MaxMapqGene1: Maximum MAPQ for fusion-supporting reads in Gene 1

  • #AvgMapqGene1: Average MAPQ for fusion-supporting reads in Gene 1

  • #MaxMapqGene2: Maximum MAPQ for fusion-supporting reads in Gene 2

  • #AvgMapqGene2: Average MAPQ for fusion-supporting reads in Gene 2

  • #CoverageBasesGene1: Bases in Gene 1 with read coverage within a certain distance (default 1000 bp) of the breakpoint in the direction of the breakpoint strand which is part of the fusion transcript

  • #CoverageBasesGene2: See above (CoverageBasesGene1) for Gene 2

  • #DeltaExonBoundaryGene1: Distance from the Gene 1 breakpoint for the closest fusion-supporting alignment (higher distance to boundary lowers score)

  • #DeltaExonBoundaryGene2: See above (DeltaExonBoundaryGene1) for Gene 2

  • #IsRestrictedGene1: Indicator variable of whether Gene 1 is tagged as protein coding in the annotation file

  • #IsRestrictedGene2: Indicator variable of whether Gene 2 is tagged as protein coding in the annotation file

  • #IsEnrichedGene1: If enrichment or amplicon assay, then indicates whether Gene 1 is enriched. If whole transcriptome sequencing, then set to 1

  • #IsEnrichedGene2: See above (IsEnrichedGene1) for Gene 2

  • #CisDistance: Distance between breakpoints if they are adjacent to each other and on the same strand. Large value (3.2G) if not a CIS break; used for the READ_THROUGH filter.

  • #BreakpointDistance: Distance between breakpoints if they are adjacent. Large value (3.2G) if not within same chromosome

  • #GenePairHomologyEval: E-value of pairwise BLAST alignment of the parent genes

  • #AnchorLength1: Longest alignment of a fusion-supporting read to Gene 1

  • #AnchorLength2: Longest alignment of a fusion-supporting read to Gene 2

  • #NormalizedAnchorLength1: Normalized value of AnchorLength1 by the maximum read length.

  • #NormalizedAnchorLength2: Normalized value of AnchorLength2 by the maximum read length.

  • #FusionLengthGene1: Distance from breakpoint to the end of Gene 1

  • #FusionLengthGene2: Distance from breakpoint to the end of Gene 2

  • #NonFusionLengthGene1: Breakpoint distance to the end of transcript not part of the fusion for Gene 1

  • #NonFusionLengthGene2: Breakpoint distance to the end of transcript not part of the fusion for Gene 2

  • #Gene1Id: Gene ID reported in the annotation file for Gene 1

  • #Gene2Id: Gene ID reported in the annotation file for Gene 2

  • #Gene1Location:

    • IntactExon: Breakpoint matches exon boundary,

    • BrokenExon: Breakpoint is within an exon but does not match the exon boundary,

    • Intron: Breakpoint is within an intron,

    • Intergenic: Breakpoint does not overlap any gene

  • #Gene2Location: See above (Gene1Location) for Gene 2

  • #Gene1Sense: "TRUE" if the Gene 1 5' to 3' direction matches the breakpoint order, indicating that the gene is the upstream gene in the fusion transcript

  • #Gene2Sense: See above (Gene1Sense) for Gene 2
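
As an illustration of the RefToAlt feature defined in the list above, the following sketch computes the log2 ratio from the four read counts. The function name and the handling of zero counts are assumptions; how DRAGEN treats zero counts is not described here.

```python
import math


def ref_to_alt(num_ref_split_gene1, num_ref_split_gene2,
               num_split_reads, num_soft_clipped_reads):
    """Log2 of reference-supporting reads over fusion-supporting reads.

    Reference support is the larger of the two per-gene reference split
    read counts; fusion support is split plus soft-clipped read pairs.
    Returns None when either side is zero (an assumption of this sketch).
    """
    ref_support = max(num_ref_split_gene1, num_ref_split_gene2)
    alt_support = num_split_reads + num_soft_clipped_reads
    if ref_support == 0 or alt_support == 0:
        return None
    return math.log2(ref_support / alt_support)


# Hypothetical counts: 400 reference reads versus 3 + 1 fusion reads.
value = ref_to_alt(400, 120, 3, 1)
print(value)
```

Under this definition, fusion support below 1% of reference support (the LOW_ALT_TO_REF condition in the filter table) corresponds to a RefToAlt value above log2(100), which is about 6.64.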

In addition, if --rna-gf-merge-calls is enabled, DRAGEN will merge the fusion candidates that overlap the same breakpoint into a single row reporting the feature values for the highest scoring passing candidate (or highest scoring failing candidate if no passing candidate is reported). For each breakpoint, in the column #FusionGene, it reports a semi-colon separated list of names of all overlapping genes with a passing candidate. The following two columns are added to the features.csv output file:

  • #AdditionalGenes1: If a mix of passing and failing candidates are reported for the same breakpoint of Gene 1, genes with only failing candidates are listed. If no passing candidate exists, then all overlapping genes are reported in the #FusionGene column.

  • #AdditionalGenes2: See above (AdditionalGenes1) for Gene 2
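
The merge rule described above can be sketched as follows. This is an illustration only, not DRAGEN's implementation: the dictionary keys, the grouping by exact breakpoint pair, and the alphabetical gene ordering are all assumptions of the sketch.

```python
from collections import defaultdict


def merge_candidates(candidates):
    """Merge candidates that share a breakpoint pair into one row each.

    Each candidate is a dict with the hypothetical keys 'gene1', 'gene2',
    'left', 'right', 'score' and 'passing'.
    """
    groups = defaultdict(list)
    for cand in candidates:
        groups[(cand["left"], cand["right"])].append(cand)

    merged = []
    for (left, right), group in groups.items():
        passing = [c for c in group if c["passing"]]
        # Genes named in FusionGene: passing ones, or all if none pass.
        named = passing or group
        # Representative row: highest-scoring passing candidate, else
        # the highest-scoring failing candidate.
        best = max(named, key=lambda c: c["score"])
        genes1 = sorted({c["gene1"] for c in named})
        genes2 = sorted({c["gene2"] for c in named})
        # Genes that only have failing candidates at this breakpoint.
        extra1 = sorted({c["gene1"] for c in group} - set(genes1))
        extra2 = sorted({c["gene2"] for c in group} - set(genes2))
        merged.append({
            "FusionGene": ";".join(genes1) + "--" + ";".join(genes2),
            "Score": best["score"],
            "LeftBreakpoint": left,
            "RightBreakpoint": right,
            "AdditionalGenes1": ";".join(extra1),
            "AdditionalGenes2": ";".join(extra2),
        })
    return merged


# Hypothetical candidates sharing one breakpoint pair.
rows = merge_candidates([
    {"gene1": "A", "gene2": "X", "left": "chr1:100:+", "right": "chr2:200:-",
     "score": 0.9, "passing": True},
    {"gene1": "B", "gene2": "X", "left": "chr1:100:+", "right": "chr2:200:-",
     "score": 0.3, "passing": False},
])
print(rows)
```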

The <output-file-prefix>.fusion_candidates.final output file lists each passing fusion with its score and the names of the reads that support it, including split reads, soft-clipped reads, and paired (discordant) reads. These reads can be extracted from the output BAM file and then used to visualize the fusions (for example, in IGV). The same information for the non-passing fusions is provided in the <output-file-prefix>.filter_info output file.

The <output-file-prefix>.fusion_candidates.vcf.gz output file provides the VCF representation for all of the breakpoints for the candidate fusions using structural variant-style BND notation. The VCF header is annotated with ##source=DRAGEN_RNA_GF to indicate the file is generated by the DRAGEN RNA Gene Fusion pipeline. All fusion candidates (passing and failing) are represented in the VCF output with one entry for each side of the fusion breakpoint (Gene 1 and Gene 2).

The <output-file-prefix>.final.fusion_candidates.vcf.gz output file provides the filtered VCF for all passing breakpoints for the candidate fusions.

The <output-file-prefix>.fusion_metrics.csv output file provides a simple count of the total number of fusion candidates, those passing the scoring filter, and the number of unique left-right gene combinations that are found.

Gene Fusion Options and Filters

The following thresholds and options may be used to configure the fusion caller:

  • --rna-gf-enriched-regions Alternative to --rna-gf-enriched-genes, but input is provided as a bed-file with regions' coordinates instead of a gene list. All the genes in the provided annotation file that overlap such regions are included. Genes that are extracted in this way are summarized in output in the *.fusion.enriched_genes.txt file. This option cannot be provided together with --rna-gf-enriched-genes.

  • --rna-repeat-genes Text file that contains the names or IDs (from the annotation file) of targeted repetitive genes for sensitive fusion detection. Mutually exclusive with --rna-repeat-intervals. This option overrides the default BED file. The repeat genes list should contain only genes listed in the input annotation file.

  • --rna-repeat-intervals BED file that contains a target list of repeat intervals for sensitive fusion detection. Mutually exclusive with --rna-repeat-genes. This option overrides the default files, which contain the genes CIC, DUX4, NPM1, PSPH, and SEPTIN14 for the GRCh38 and hg19 reference genomes.

  • --rna-gf-restrict-genes When parsing the gene annotations file for use in the DRAGEN Gene Fusion module, you can use this option to restrict the entries of interest to only protein-coding regions. Restricting the annotation to only the protein-coding genes reduces false positive rates in currently studied fusion events. To report non-coding gene fusions such as pseudo genes and lincRNAs, turn off this option. The default value is "true".

  • --rna-gf-merge-calls If multiple genes overlap a fusion breakpoint, DRAGEN generates and scores a separate fusion candidate for each gene pair overlapping the breakpoint. The default value is "false", so that each reported fusion event has only one left and one right gene and overlapping genes are output as separate events. Set this option to "true" to merge the candidates that overlap the same breakpoint into a single row, as described in the Gene Fusion Output section.

  • --rna-gf-allow-overlapping-genes Allows for fusion calls between overlapping genes. The default value is "false".

  • --rna-gf-enable-post-filters Enable post-filtering of RNA gene fusion candidates by confidence flags. The filter flags are listed in the table above. The default value is "false".

  • --rna-gf-output-fusion-sequence Add a "FusionSequence" column for all passing fusions in the <output-file-prefix>.fusion_candidates.final file based on the contig assembly of all supporting reads. If no assembly was generated, then "NoAssembly" is reported. Setting this option to "true" also updates the fusion breakpoints based on the alignment of the assembled contig to the reference for the passing fusions in the <output-file-prefix>.fusion_candidates.final file and the output VCF files. The left and right breakpoint positions are chosen such that the alignment score between the assembled contig and the reference sequences is maximized. If there are multiple maximal alignments, the positions that minimize the distance from the original fusion breakpoints are reported. An additional "BreakpointLeeway" column is also added to the <output-file-prefix>.fusion_candidates.final file. This column has the form "-XX|+YY", where XX is the number of bases the reported breakpoint can be shifted left (relative to the left side of the fusion) while maintaining the maximal alignment score, and YY is the number of bases it can be shifted right. For example, "-2|+1" indicates that the breakpoint could be shifted 2 bases to the left or 1 base to the right and still have a maximal alignment score (due to identical bases occurring adjacent to the breakpoint). "NA" is output if no assembly is generated. The default value for this option is "true".

  • --rna-gf-sv-vcf Structural Variant VCF file output from DRAGEN DNA structural variant caller run in somatic mode. See below for more information.
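
The "BreakpointLeeway" format described for --rna-gf-output-fusion-sequence in the list above can be read as follows. This is a minimal sketch based only on the documented "-XX|+YY" and "NA" forms; the function name is illustrative.

```python
import re


def parse_breakpoint_leeway(value):
    """Parse a BreakpointLeeway value of the form '-XX|+YY'.

    Returns (left_shift, right_shift) as non-negative integers, or None
    for 'NA', which is reported when no assembly was generated.
    """
    if value == "NA":
        return None
    match = re.fullmatch(r"-(\d+)\|\+(\d+)", value)
    if match is None:
        raise ValueError(f"unexpected BreakpointLeeway value: {value!r}")
    return int(match.group(1)), int(match.group(2))


# The example from the option description: 2 bases left, 1 base right.
print(parse_breakpoint_leeway("-2|+1"))
```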

Merging Fusion Caller with the Splice Variant Caller

When the splice variant caller and gene fusion caller are both enabled, the passing and failed intergenic splice variants will be passed to the gene fusion caller to be reported as candidate fusion events. This merging only occurs for genomes supported by the ML model for gene fusions (currently only Human genomes are supported). The passing calls are output to the fusion caller's <output-file-prefix>.fusion_candidates.final file. The tab separated fields are described below.

| Field Names | Description |
| --- | --- |
| FusionGene | Left and right gene names (separated by "--") |
| Score | Value between 0 and 1, from the splice variant caller |
| LeftBreakpoint, RightBreakpoint | The location of the left and right sides of the splice, given as three colon-separated fields: chromosome:coordinate:strand(+/-) |
| Gene1Location, Gene2Location | The splice variant caller always outputs "SpliceVar" here instead of an exon/intron location |
| Gene1Sense, Gene2Sense | Always TRUE by design |
| Gene1Id, Gene2Id | Long-form ID (for example, for Gencode it is usually "ENSG.version") |
| NumSplitReads | Taken from the split_unique_reads_alt column value of the splice_varian_fusions.tsv file |
| NumSoftClippedReads, NumPairedReads | These values are not used by the splice variant caller and are set to "0" |
| ReadNames | Not provided by this caller and set to "N/A" |

The passing splice variant calls will also be output to the VCF outputs: <output-file-prefix>.final.fusion_candidates.vcf and <output-file-prefix>.fusion_candidates.vcf with the "SPLICE_VARIANT" flag in the info field.
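
To separate splice-variant-derived entries from other fusion breakpoints in the VCF outputs, the INFO column can be checked for the flag. The following is a minimal sketch that relies only on the standard tab-separated VCF layout (INFO is the eighth column). The example record is made up for illustration and is not real DRAGEN output.

```python
def has_info_flag(vcf_record, flag):
    """Return True if a VCF data line carries the given INFO key or flag."""
    if vcf_record.startswith("#"):
        return False  # header lines carry no records
    columns = vcf_record.rstrip("\n").split("\t")
    if len(columns) < 8:
        return False
    # INFO entries are ';'-separated; flags have no '=value' part.
    info_keys = {entry.split("=", 1)[0] for entry in columns[7].split(";")}
    return flag in info_keys


# Hypothetical BND record, for illustration only.
record = "chr1\t100\tbnd_1\tN\tN[chr2:200[\t.\tPASS\tSVTYPE=BND;SPLICE_VARIANT"
print(has_info_flag(record, "SPLICE_VARIANT"))
```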

Reporting read-through fusions

Read-through gene fusions occur when neighboring genes are spliced together. These fusions are detected by the Splice Variant Caller as intergenic splice variants on adjacent genes and by default are not passed to the gene fusion caller. To detect them, enable the gene fusion and splice caller together with the following options:

  --enable-rna=true \
  --enable-rna-gene-fusion=true \
  --enable-rna-splice-variant=true \
  --rna-splice-variant-enable-readthrough=true

Running RNA fusion detection with somatic SV evidence

dragen \
-r <HASHTABLE> \
-1 <FASTQ1> \
-2 <FASTQ2> \
-a <ANNOTATION_FILE> \
--output-dir <OUT_DIRECTORY> \
--output-file-prefix <OUTPUT_PREFIX> \
--RGID <READ_GROUP_ID> \
--RGSM <SAMPLE_NAME> \
--enable-rna true \
--enable-rna-gene-fusion true \
--rna-gf-sv-vcf <SV_VCF_PATH>

When the SV VCF input is provided to the RNA fusion caller, the following additional features will be reported in the features.csv output file:

  • #SvEvent: A semi-colon separated string representation of SV events matching the fusion candidate.

  • #SvType: A semi-colon separated list of type of the matching SV events.

  • #SomaticScore: The highest SomaticScore value of the matching SV events.

  • #SvDistance: The maximum distance between any SV breakpoint to any fusion breakpoints (if multiple matching SV events, then minimum of all maximum distances over all SV events).

  • #LeftSvDistance: The distance between the left fusion breakpoint and the corresponding SV breakpoint (if multiple matching SV events, then minimum over all SV events).

  • #RightSvDistance: The distance between the right fusion breakpoint and the corresponding SV breakpoint (if multiple matching SV events, then minimum over all SV events).

  • #SvPresent: Set to 1 if matching SV event is present, otherwise 0.

  • #SvAbsent: Set to 1 if no matching SV event is present, otherwise 0.
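
The distance features above combine a per-event maximum with a minimum across events. The following sketch shows one reading of those definitions. It assumes that each matching SV event has already been paired with the fusion, so that its first position corresponds to the left fusion breakpoint and its second to the right one; that pairing, the function name, and the omission of distance values when no event matches are assumptions of this sketch.

```python
def sv_distance_features(fusion_left, fusion_right, sv_events):
    """Compute SV distance features for one fusion candidate.

    sv_events is a list of (sv_left_pos, sv_right_pos) tuples for the
    matching SV events, on the same chromosomes as the fusion breakpoints.
    """
    if not sv_events:
        return {"SvPresent": 0, "SvAbsent": 1}
    left = [abs(sv_left - fusion_left) for sv_left, _ in sv_events]
    right = [abs(sv_right - fusion_right) for _, sv_right in sv_events]
    return {
        "SvPresent": 1,
        "SvAbsent": 0,
        "LeftSvDistance": min(left),
        "RightSvDistance": min(right),
        # Worst side per event, then the best event overall.
        "SvDistance": min(max(l, r) for l, r in zip(left, right)),
    }


# Hypothetical positions: two SV events near a fusion at 1000 / 5000.
features = sv_distance_features(1000, 5000, [(1010, 5200), (1100, 5050)])
print(features)
```

In this example the first event is closest on the left side and the second on the right, so the per-side minimums come from different events, while SvDistance comes from the event whose worse side is smallest.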


Additional Filter and Option Details

The following descriptions supplement the filter table and the option list in the Gene Fusion Options and Filters section.

  • UNENRICHED_GENES filter: If an enrichment list is provided, neither parent gene is enriched. If amplicon mode is enabled, at least one parent gene is not enriched. See DRAGEN Amplicon Pipeline for further information.

  • --rna-gf-blast-pairs A tab-separated file listing gene pairs that have a high level of similarity. The first and second columns are the gene names, and the third column is the E-value. This list of gene pairs is used as a homology filter to reduce false positives. For runs on human genome assemblies GRCh38 and hg19, DRAGEN automatically applies a default file generated using Gencode Human Release 32 annotations for primary chromosomes if no other file is specified on the command line.

  • --rna-gf-enriched-genes For RNA enrichment assays, a list of targeted genes specified as one gene name per line. Only fusion calls involving at least one gene on the list are reported. The enriched genes list should contain only genes listed in the input annotation file. This option cannot be provided together with --rna-gf-enriched-regions. If RNA amplicon mode is enabled and the amplicon BED file already includes the gene name, then you do not need to set this option; DRAGEN reads the enriched gene names from the amplicon BED file (fifth column). See DRAGEN Amplicon Pipeline for further information.

  • --enable-variant-annotation=true, --variant-annotation-assembly, and --variant-annotation-data Enable Illumina Annotation Engine (IAE) to report fusion annotations in JSON format. --enable-variant-annotation must be set to "true". For more information, see Illumina Annotation Engine.

  • --enable-rna-amplicon A separate fusion filtering model is trained for RNA amplicon mode. Duplicate removal for fusion-supporting reads is disabled for RNA amplicon mode and both genes are required to be in the list of enriched genes. By default, the DRAGEN fusion caller filters candidates if a breakpoint overlaps both transcripts (e.g. fusions such as FIP1L1--PDGFRA and GOPC--ROS1). In RNA amplicon mode, such candidates are not filtered. See DRAGEN Amplicon Pipeline for further information. The default is "false".

  • --rna-gf-sv-vcf You can run the DRAGEN Gene Fusion module with a VCF file containing somatic Structural Variant (SV) calls. DRAGEN reports SV events matching each fusion candidate in the *.features.csv output file for informational purposes but does not use this data in the scoring or filtering of the fusion candidates. The SV events must be called in somatic mode (for more information, see the DRAGEN Structural Variant Calling pipeline). The example command line in the Running RNA fusion detection with somatic SV evidence section shows an end-to-end RNA-Seq experiment with a somatic SV VCF file.