DRAGEN v4.4

Overview

Product Guide

Deployment Options

DRAGEN analysis is available on multiple platforms.

Platform
Description

DRAGEN on-premises server

DRAGEN on-premises server offers highly accurate secondary analysis in a fraction of the time required by a traditional CPU-based system.

  • Analyze and store data locally

  • Supports varying levels of command line interface

  • Replace up to 30 traditional compute instances

  • Fully process a 34× whole human genome in ~30 minutes (1)

  • One unit supports two NovaSeq 6000 Systems running at full capacity

DRAGEN analysis on Illumina Connected Analytics

Couples the accuracy and speed of DRAGEN with the ability to customize analysis pipelines and operationalize informatics on a secure platform.

DRAGEN on BaseSpace Sequence Hub (BSSH)

Push-button analysis in an intuitive, easy-to-use interface, with the compliance and storage features of BaseSpace Sequence Hub and Amazon Web Services (AWS).

DRAGEN onboard NovaSeq X Series

  • Flexibly runs multiple secondary analysis pipelines in parallel

  • Performs up to four simultaneous applications per flow cell in a single run

  • Brings up to 5x lossless data compression, and analysis with supported applications

  • Provides savings on analysis, which over five years can exceed the price of the sequencer

DRAGEN onboard NextSeq 1000 and NextSeq 2000 Systems

(1) HG002 from PrecisionFDA truth challenge V2 run with DRAGEN analysis v4.0 on DRAGEN server v4, all callers

(2) When run according to sample recommendations

Illumina® DRAGEN™ Secondary Analysis

Illumina DRAGEN (Dynamic Read Analysis for GENomics) secondary analysis was developed to address important challenges associated with analyzing NGS (Next Generation Sequencing) data for a range of applications, including genome, exome, transcriptome, and methylome studies. DRAGEN secondary analysis processes NGS data and enables tertiary analysis to drive insights. The available tools make up a highly accurate, comprehensive, and efficient solution that enables labs of all sizes and disciplines to do more with their genomic data.

Product highlights

Accurate results:

  • Pangenome reference genome and machine learning drive unprecedented accuracy

  • 99.89% accuracy score with the PrecisionFDA Truth Challenge V2 benchmark data (2,3)

Comprehensive platform:

  • Analyze NGS data from whole genomes, exomes, methylomes, and transcriptomes

  • Available on platform of choice and scalable based on needs

Efficient analysis:

  • Process a 34x genome in ~ 30 minutes, with all supported callers with DRAGEN server v4 (1)

  • Reduce FASTQ file sizes up to 5x with DRAGEN ORA Compression

References:

  1. Illumina data on file, 2022.

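  2. Illumina DRAGEN Secondary Analysis is the first single platform to achieve 99.89% accuracy based on PrecisionFDA v2 Truth Challenge Benchmark Data. Details here: DRAGEN sets new standard for data accuracy in PrecisionFDA benchmark data. Accessed March 22, 2023.

  3. PrecisionFDA Truth Challenge V2: Calling Variants from Short and Long Reads in Difficult-to-Map Regions. precision.fda.gov/challenges/10. Accessed November 3, 2020.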

DRAGEN Applications

Applications

DRAGEN analysis offers a large selection of application pipelines.

Pipeline
Description
Variant Types Detected
Metrics Provided

  • Provides access to select DRAGEN analysis informatics pipelines

  • Enables users to generate results in as little as two hours

  • Uses intuitive pipeline algorithms to reduce reliance on external informatics experts

DRAGEN onboard MiSeq i100 Series

Intuitive, ultra-rapid analysis including DRAGEN BCL convert, DRAGEN Library QC, DRAGEN small WGS, and DRAGEN Microbial Enrichment Plus.

  • Rapid results with comprehensive secondary analysis generated in two hours or less (2)

  • Highly efficient workflow with a single user touchpoint to VCF and/or HTML report and no intermediate file transfers

  • Exceptionally easy with an intuitive interface for non-expert users

DRAGEN on AWS, Azure

DRAGEN supports the FPGA-enabled instance types on AWS and Azure. The RPM installers and the kernel driver can be installed on images managed by the user, and DRAGEN can be run after purchasing a license.

DRAGEN on AWS and Azure Marketplace

Pre-configured Amazon Machine Images (AMI) and Azure Virtual Machines with DRAGEN installed can be accessed from the respective marketplace offerings in a Pay-As-You-Use model.

DRAGEN on GCP

DRAGEN is made available on the Google Cloud Platform. Pre-configured instances with DRAGEN installed can be accessed through the GCP application interface. Limited availability. Please reach out to your Illumina representative for access.

PrecisionFDA v2 Truth Challenge Benchmark Data
DRAGEN sets new standard for data accuracy in PrecisionFDA benchmark data
precision.fda.gov/challenges/10

N/A

N/A

DRAGEN ORA Compression

DRAGEN ORA compression is optimized for high compression ratios of FASTQ files, as well as rapid compression and decompression, all while preserving data integrity.

  • Variant types detected: N/A

  • Metrics provided: Compression Ratio, Run Time

DRAGEN Map + Align

DRAGEN Map + Align can be run standalone or as part of DRAGEN's suite of pipelines.

  • Variant types detected: N/A

  • Metrics provided: Mapping Metrics, Duration Metrics, Coverage Metrics

DRAGEN Germline

The DRAGEN Germline Pipeline provides end-to-end NGS analysis, including advanced error model calibration for increased accuracy, and repeat expansion detection and genotyping through Illumina Expansion Hunter.

  • Variant types detected: SNV/Indel, CNV, SV, Repeat Expansions

  • Metrics provided: Mapping Metrics, Duration Metrics, Coverage Metrics, Variant Metrics, Callability Report

DRAGEN Somatic

The DRAGEN Somatic Pipeline includes tumor-only and tumor–normal modes, designed for detecting somatic variants in tumor samples. Both modes make no ploidy assumptions, enabling detection of low-frequency alleles.

  • Variant types detected: SNV/Indel, CNV, SV, TMB, MSI, HLA

  • Metrics provided: Mapping Metrics, Duration Metrics, Coverage Metrics, Variant Metrics, Callability Report

DRAGEN Enrichment

The DRAGEN Enrichment Pipeline combines DRAGEN's germline and somatic callers into a pipeline designed specifically for analyzing enrichment samples. It includes a full suite of enrichment metrics and reporting.

  • Variant types detected: SNV/Indel, CNV, SV

  • Metrics provided: Mapping Metrics, Duration Metrics, Coverage Metrics, Variant Metrics, Callability Report

DRAGEN RNA

The DRAGEN RNA Pipeline performs transcriptome analysis starting with splice junction discovery and alignment, followed by rapid alignment and splice junction mapping and quantification. For differential expression, Illumina recommends the DRAGEN Differential Expression app on BaseSpace Sequence Hub.

  • Variant types detected: Gene fusion, SNV/Indel

  • Metrics provided: Mapping Metrics, Duration Metrics, Coverage Metrics, Variant Metrics, Callability Report

DRAGEN Single Cell RNA

The DRAGEN Single Cell RNA pipeline performs demultiplexing, cell-barcode and UMI error correction, sequence alignment, and quantification of gene expression.

  • Variant types detected: N/A

  • Metrics provided: Mapping Metrics, Duration Metrics, Coverage Metrics, Callability Report, Cell Metrics

DRAGEN Joint Genotyping

The DRAGEN Joint Genotyping/Population Pipeline calls variants jointly across multiple genomes and scales to large cohorts of samples at expedited speeds with uncompromising accuracy.

  • Variant types detected: SNV/Indel, CNV, SV, Repeat Expansions

  • Metrics provided: Mapping Metrics, Duration Metrics, Coverage Metrics, Variant Metrics, Callability Report

DRAGEN Methylation

The DRAGEN Methylation Pipeline performs alignment and methyl calling, and calculates alignment and methylation metrics.

  • Variant types detected: N/A

  • Metrics provided: Mapping Metrics, Duration Metrics, Coverage Metrics, Variant Metrics, Callability Report

DRAGEN Reference Builder

Accepts FASTA files and builds the proprietary reference used by the DRAGEN apps.

  • Variant types detected: N/A

  • Metrics provided: N/A

DRAGEN TruSight Oncology 500 ctDNA Analysis Software

Secondary analysis support for Illumina's TruSight Oncology 500 ctDNA. Available on the local DRAGEN Server version 3 and later.

  • Variant types detected: SNV/Indel, CNV, DNA fusions, MSI, TMB

  • Metrics provided: Mapping Metrics, Duration Metrics, Coverage Metrics, Variant Metrics, Callability Report

DRAGEN Imputation

The DRAGEN Imputation pipeline is an end-to-end, user-friendly tool that enables scalable low-pass whole genome sequencing analysis.

  • Variant types detected: N/A

  • Metrics provided: Impute ≤100 samples simultaneously 1.7x faster compared to original GLIMPSE code

Analysis Uses

DRAGEN analysis can be used in numerous fields in the biological sciences.

Analysis
Description

Genetic Diseases

Reduce time required for genomic analysis, with high accuracy and comprehensiveness

Oncology

Analyze tumor-only and tumor/normal samples with accuracy, comprehensiveness, and efficiency

Cell and Molecular Biology

Advance understanding of cellular mechanisms with rapid analysis pipelines for bulk and single cell samples

Population Genomics

Accurately and efficiently analyze sequenced genomes at scale. Accelerate re-analysis as computational tools improve over time

Infectious Disease

Detect and characterize infectious diseases with a comprehensive solution

Agrigenomics

Efficiently analyze animals and plants of varying genomic complexities with custom reference

DRAGEN Demultiplexing

Rapid demultiplexing of NGS analysis

Clinical Research Workflows

DRAGEN v4.4 introduces support for DRAGEN server apps. These apps, which consist of Docker images, Nextflow workflows, a CLI shell script, and packaged resource bundles, can be downloaded and installed on the on-premises server. The packaged resource bundles include all the resource files required to run the application, such as the hash tables, noise baseline files, and BED files.

Server apps make it easy to run complex workflows such as Tumor Normal somatic analysis by simplifying the management of external resources and applying the correct command line parameters for the selected analysis type. The DRAGEN server can support multiple installed server apps and DRAGEN on-prem for command line use at the same time.

Run Planning

We recommend using the BSSH Run Planner tool to minimize errors when creating the sample sheet. For instruments such as NovaSeq X, sample sheets created in the BSSH Run Planner tool are automatically downloaded into the run folder on the instrument.

Local Specific Output

Local output management

Common output files for cloud and local pipelines are described in the Analysis Output.

On DRAGEN server, Nextflow logs are contained in the Work folder in a hierarchical folder structure organized by the tasks in the pipeline_trace.txt. These files are prefixed with "." and hidden from normal view.

  • 📂 Work — (DRAGEN server only) - Contains information and files related to Nextflow execution

    • 📄 .command.log - Contains Nextflow pipeline step execution log.

    • 📄 .command.out - Contains Nextflow pipeline step standard output log.

    • 📄 .command.err - Contains Nextflow pipeline step standard error log.

    • 📄 .exit.code - Contains Nextflow pipeline step execution exit code.
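Because these log files are hidden, a failed step can be easy to miss. The following is a minimal sketch of how the Work folder could be scanned for steps with a non-zero exit code. The function names are hypothetical, the file names are the ones listed above and are passed as parameters in case an installation names them differently, and the sketch assumes each exit-code file holds a single integer.

```python
from pathlib import Path


def find_failed_tasks(work_dir, exit_code_name=".exit.code"):
    """Return (task_dir, exit_code) pairs for steps whose exit code is non-zero."""
    failed = []
    # pathlib globbing does not skip dot-files, so hidden logs are found.
    for code_file in sorted(Path(work_dir).rglob(exit_code_name)):
        text = code_file.read_text().strip()
        if not text.isdigit():
            continue  # skip empty or malformed exit-code files
        code = int(text)
        if code != 0:
            failed.append((code_file.parent, code))
    return failed


def tail_stderr(task_dir, lines=20, err_name=".command.err"):
    """Return the last few lines of a step's standard error log, if present."""
    err_file = Path(task_dir) / err_name
    if not err_file.is_file():
        return []
    return err_file.read_text(errors="replace").splitlines()[-lines:]
```

Calling `find_failed_tasks` on the Work folder of an analysis and then `tail_stderr` on each returned directory gives a quick first look before opening the full logs.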

Templates

The pipeline only supports starting from FASTQ, BAM or CRAM in the current release. The sample sheet below only contains the minimally required sections for starting the analysis. It is not a valid sample sheet for other purposes.

Users can visit the Sample Sheet guidelines section to learn additional details on required fields and values as they fill in their sample information, or download a template from Sample Sheet Template.

[Header],,,,,,,,,,
FileFormatVersion,2,,,,,,,,,
RunName,DRAGEN TN Start From FASTQ Only,,,,,,,,,
InstrumentType,NovaSeq,,,,,,,,,
InstrumentPlatform,NovaSeq,,,,,,,,,

[TN_Data],,,,,,,,,,
Sample_ID,Specimen_Type,Sample_Type,Case_ID,Sample_Description,Sample_Classification
tumorSample,FFPE,DNA,SampleA,Description1,Tumor
normalSample,FFPE,DNA,SampleA,Description2,Normal
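Before launching, it can help to confirm that each case in the sample sheet has both a tumor and a normal row. The sketch below reads the `[TN_Data]` section of a sheet laid out like the template above and groups samples by `Case_ID`. Pairing by `Case_ID` is an assumption drawn from the template, where both rows share `SampleA`, and the function names are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_section(sheet_text, section):
    """Return the rows of one [Section] as dicts keyed by that section's header row."""
    rows, inside = [], False
    for record in csv.reader(io.StringIO(sheet_text)):
        cells = [c.strip() for c in record]
        if not any(cells):
            continue  # blank line, or a line holding only commas
        if cells[0].startswith("[") and cells[0].endswith("]"):
            inside = cells[0] == f"[{section}]"
            continue
        if inside:
            rows.append(cells)
    if not rows:
        return []
    header = [h for h in rows[0] if h]
    return [dict(zip(header, row)) for row in rows[1:]]


def pair_tumor_normal(sheet_text):
    """Group TN_Data rows by Case_ID: {case_id: {"Tumor": id, "Normal": id}}."""
    cases = defaultdict(dict)
    for row in read_section(sheet_text, "TN_Data"):
        cases[row["Case_ID"]][row["Sample_Classification"]] = row["Sample_ID"]
    incomplete = [c for c, s in cases.items() if set(s) != {"Tumor", "Normal"}]
    if incomplete:
        raise ValueError(f"cases missing a tumor or a normal sample: {incomplete}")
    return dict(cases)
```

This check is independent of the pipeline's own validation, which remains the authority on whether a sample sheet is accepted.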

DRAGEN Heme WGS Tumor Only Pipeline

Overview

DRAGEN Heme WGS Tumor Only Pipeline, henceforth referred to as the Heme pipeline, is a comprehensive and unbiased whole genome sequencing solution intended to replace conventional cytogenetic and panel sequencing approaches for detecting all types of mutation using a limited amount of DNA. It can be applied to Heme samples to detect clinically actionable mutations for cancer spanning a wide range of genomic events, e.g., structural variants (SV), copy number alterations (CNA), small variants (SNV/insertion/deletion/delins), internal tandem duplications (ITD), and DUX4 variants.

The Heme pipeline includes a DNA-only workflow designed to analyze whole genome sequencing data generated on supported instruments. It may be run as a local off-instrument solution installable on a DRAGEN server or accessible through the Illumina Connected Analytics (ICA) cloud environment. The Heme pipeline is for Research Use Only (RUO).

Troubleshooting

Help

Support Request

For debugging or support requests, include the files from the top level of the analysis output folder and the contents of the work and errors directories, in addition to MetricsOutput.tsv from the Results folder.

Common Product Features

Run Planning

Application
Supported in BSSH Run Planner
Note

ICA Cloud App

Analysis in the ICA Cloud environment

Prerequisites

Illumina Connected Insights Local

The Illumina Connected Insights (ICI) Local platform can be used to interpret and visualize analysis results from a clinical research workflow pipeline on a local DRAGEN server. See Variant Interpretation on-premises via a local DRAGEN server.

Advanced Topics

CRAM input

When CRAM is used as input, the reference genome used to generate the CRAM files is required. This may be provided using the custom configuration file.

ICA Specific Output

Output Folder

  • This section describes only the output files specific to ICA. Standard output files are described in the common output files.

  • Nextflow output folders differ across platforms.

Advanced Topics

Overview

The pipeline supports advanced use cases:

  • Selected custom parameters may be configured using a configuration file, and associated custom pipeline resource files.

DRAGEN DNA Pipeline

The DRAGEN DNA Pipeline accelerates the secondary analysis of NGS data by harnessing the tremendous power available on the DRAGEN Platform. The pipeline includes highly optimized algorithms for mapping, aligning, sorting, duplicate marking, and haplotype variant calling. In addition to haplotype variant calling, the pipeline supports calling of copy number and structural variants as well as detection of repeat expansions and targeted calls.

Troubleshooting

Help

Stopping Analysis

Pressing Ctrl+C during a DRAGEN step stops the currently running analysis and might cause an FPGA error. To recover from an FPGA error, shut down and restart the server.

Using a Non-root User

CAUTION: Do not run analyses as the root user as it can lead to permissions issues when managing data generated by the software.

Using SSH

When running the analysis software over SSH, Illumina recommends using a terminal multiplexer such as screen or tmux to prevent unexpected termination of the analysis.

Using DAM

The Heme pipeline depends on the DRAGEN Application Manager (DAM). For issues related to the DRAGEN Application Manager installation, refer to the DRAGEN Application Manager Installation Guide.

  • Ensure DRAGEN App Manager is running properly.

Using Docker

  • Ensure Docker is running properly. For Docker configuration help, refer to the DRAGEN Application Manager Installation Guide and the docker.org documentation.

Limitations

CIFS Support

  • In CIFS (SMB 1.0), the mounted volume may have a permission check issue and cause the Nextflow workflow to exit prematurely when a non-root user account is used for analysis, unless the filesystem permission check is disabled. The workaround is to use newer SMB protocols and configure Windows Active Directory for analysis with non-root users.

Using multiple FASTQ files for increased coverage (top-off)

  • To increase the coverage of a sample using multiple FASTQ files, the FASTQ files must follow the Illumina naming convention. The current limit is up to 16 FASTQ files from 8 lanes based on available flow cell types.

  • If there are more than 16 FASTQ files, then use cat or other command line utility to concatenate the FASTQ files as a single FASTQ file to get around the file number restriction.
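The concatenation step can also be scripted. Below is a minimal sketch that appends files byte for byte, the same way `cat` does; the function name is hypothetical. It assumes all inputs share one format, either plain FASTQ or gzip. Appending gzip files yields a valid multi-member gzip stream, but nothing here should be read as applying to `.ora` files.

```python
import shutil
from pathlib import Path


def concatenate_fastqs(inputs, output):
    """Append the input files, in the order given, into a single output file."""
    inputs = [Path(p) for p in inputs]
    suffixes = {p.suffix for p in inputs}
    if len(suffixes) != 1:
        raise ValueError(f"inputs mix different formats: {sorted(suffixes)}")
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)  # byte-wise append, like `cat`
    return Path(output)
```

Read 1 and Read 2 files need to be concatenated separately and in the same file order, so that the read pairs stay aligned between the two outputs.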

Basic ICA Subscription

  • Basic ICI Subscription (if desired)

  • Additional information is available from the ICA support site.

  • The pipeline supports automatic data streaming from instruments and automatic launch of analysis in ICA, followed by tertiary interpretation in ICI.

  • The pipeline supports mixed flow cells, where different assays are sequenced in the same flow cell.

    Variant Interpretation on-premises via a local DRAGEN server
    custom configuration file

    📂 Heme_Nextflow_logs—(ICA only) - Contains information related to the execution of the pipeline as a whole and for specific nodes (when an analysis is split across multiple nodes). It contains files used to execute parts of the workflow on different nodes as well as records of the Nextflow execution on those nodes.

  • Standard output files
    Features
    • Superb performance based on the DRAGEN BioIT platform Release 4.4.4

    • Supports starting the analysis from BCL, FASTQ, BAM or CRAM as inputs.

    • Flexible custom configurable options on top of well established DRAGEN recipes for Heme WGS analysis.

    • Available on local DRAGEN servers and Illumina Connected Analytics (ICA)

    • Seamless integration with Illumina Connected Insights (ICI) for tertiary interpretation

    Supported Library Prep Kits (LPKs)

    • Illumina DNA PCR Free Prep Kit

    • Illumina DNA Prep Kit

    • Custom LPKs

    Supported Sequencing Instruments

    • NovaSeq 6000 or 6000Dx in RUO mode

    • NovaSeq X or NovaSeq X Plus

    Note: Data from unsupported instruments can still be analyzed, but a warning is generated.

    Supported Flow Cells

    • NovaSeq 6000 or 6000Dx S4

    • NovaSeq X or NovaSeq X Plus 10B, 25B

    Figure 1. DRAGEN Heme WGS Tumor Only Workflow

    Advanced Topics

    Advanced Use Cases

    • Users may be able to rerun a completed analysis using the "rerun" option in ICA.

    • Users may be able to use the icav2 client to perform any analysis that can be performed through the UI.

    No

    Only Header and TN_Data sections required in SampleSheet.csv

    Sample Sheets

    All clinical research workflow pipelines support only v2 sample sheets and require index2 to be in forward orientation, with bcl-convert SoftwareVersion >= 4.4. The pipelines are still compatible with legacy sample sheets where the BCLConvert_Settings section has SoftwareVersion < 4.4. Sample sheet v1 is no longer supported.

    ICA Cloud Applications

    The clinical research workflow pipelines in the ICA cloud support post-processing scripts to be executed after the completion of the pipeline analysis.

    ICI Variant Interpretation

    The clinical research workflow pipelines support automatic data ingestion or manual upload into ICI for variant interpretation after the analysis is completed in ICA, or using a DRAGEN on-premises server locally.

    Heme WGS TO

    Yes

    Mixed flow cell, auto-launch

    Data Management

    Copying data to local /staging drive

    • Copy the run or FASTQ folder to the DRAGEN server into the staging folder with the following recommended organization: /staging/runs/{RunID}. You can copy the run folder onto the DRAGEN server using Linux commands such as rsync. The sample sheet within the run folder is used unless otherwise specified through the command line.

    • Run folder must be intact.

    • If the analysis output folder path is different from the default, provide the analysis output folder path.
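The copy into the recommended /staging/runs/{RunID} layout might be scripted along the following lines. This sketch assumes the RunID is simply the name of the run folder and that `rsync` is installed. The function names are hypothetical; `-a` is the standard rsync archive flag.

```python
import subprocess
from pathlib import Path


def staging_destination(run_folder, staging_root="/staging/runs"):
    """Return /staging/runs/{RunID}, taking RunID from the run folder's name."""
    return Path(staging_root) / Path(run_folder).name


def build_rsync_command(run_folder, staging_root="/staging/runs"):
    """Build an rsync command that copies the run folder's contents into staging."""
    # A trailing slash on the source copies the folder's contents, not the folder itself.
    source = str(Path(run_folder)) + "/"
    dest = str(staging_destination(run_folder, staging_root)) + "/"
    return ["rsync", "-a", source, dest]


def copy_run_to_staging(run_folder, staging_root="/staging/runs"):
    """Run the copy; raises CalledProcessError if rsync reports a failure."""
    staging_destination(run_folder, staging_root).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_rsync_command(run_folder, staging_root), check=True)
```

Keeping the command construction separate from its execution makes it possible to print and review the command before any data is moved.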

    Analysis output directory

    Before running the analysis, confirm that the output directory for the software to write to is empty and does not include results of previous analyses.
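A small pre-flight check along these lines can confirm that the output directory is empty before launch. The function name is hypothetical, and treating a directory that does not exist yet as acceptable is an assumption rather than documented behavior.

```python
from pathlib import Path


def output_dir_is_clean(path):
    """True if the directory does not exist yet, or exists and contains nothing."""
    p = Path(path)
    if not p.exists():
        return True
    if not p.is_dir():
        raise NotADirectoryError(f"{p} exists and is not a directory")
    # iterdir() yields hidden entries too, so leftover dot-files are detected.
    return next(p.iterdir(), None) is None
```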

    Storage Requirements

    The DRAGEN server provides an NVMe SSD in the /staging directory to use as the software output directory. Network-attached storage is required for long-term storage.

    When running the Heme pipeline, use the default settings or set the --analysisFolder command-line option to a directory in /staging so that the DRAGEN server processes read and write data on the NVMe SSD.

    Before beginning analysis, develop a strategy to copy data from the DRAGEN server to a network‑attached storage. Delete output data on the DRAGEN server as soon as possible.

    The following are the run folder output size estimates and the minimum free space requirements for fastq.gz or fastq.ora output format.

    Sequencing System
    Run Folder Output (Gb)
    Minimum Disk Space .gz (Gb)
    Minimum Disk Space .ora (Gb)

    When launching the analysis, the software checks that the minimum disk space required is available. If the minimum disk space is not available, the software shows an error message and prevents analysis from starting. If disk space is exhausted during a run, the run shows an error and stops analyzing.
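The software performs this check itself, but the same comparison can be run ahead of time, for example before copying data. The sketch below is illustrative only. The minimums are the size estimates given in this guide, the unit is assumed to be gigabytes, and reading 5300 as the NovaSeq X 25B .ora figure is an inference from how the numbers appear in the source. The dictionary keys and function names are hypothetical.

```python
import shutil

# Minimum free space per sequencing system and FASTQ format.
# Assumption: the guide's values are gigabytes (10**9 bytes).
MIN_FREE_GB = {
    "NovaSeq 6000 S4": {"gz": 4000, "ora": 2500},
    "NovaSeq X 10B": {"gz": 4000, "ora": 2500},
    "NovaSeq X 25B": {"gz": 8500, "ora": 5300},
    "Other": {"gz": 4000, "ora": 2500},
}


def has_enough_space(free_bytes, system, fastq_format):
    """Compare free space in bytes with the minimum for this system and format."""
    required_bytes = MIN_FREE_GB[system][fastq_format] * 10**9
    return free_bytes >= required_bytes


def staging_has_enough_space(system, fastq_format, path="/staging"):
    """Check the free space currently reported for the staging file system."""
    return has_enough_space(shutil.disk_usage(path).free, system, fastq_format)
```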

    Moving or modifying files during an analysis may cause the analysis to fail or provide incorrect results.

    Data streaming from Network Filesystem

    Analysis of data stored on a network file system may be slow when multiple DRAGEN servers read from and write to it simultaneously. However, streaming large datasets directly from NFS is advisable when copying the data to the local /staging drive would take a significant amount of time, especially for NovaSeq X 25B flow cells. Discuss the network considerations for the DRAGEN server with your system administrator.

    Custom Config Support

    This feature allows users to customize a specific set of DRAGEN command-line options to override the default values pre-defined in the pipeline.

    ICA Setup

    In the Illumina Connected Analytics (ICA) user interface (UI), you can specify the Custom Parameters Config File and Custom Resources Directory directly. Supported customizable options are described below.

    Examples

    heme_custom_param.config Content

    custom_resources_Heme_dir Folder Structure on ICA

    ℹ️ Note: Custom resource files and the custom configuration file must be uploaded to the same ICA project where the run is created. You can use the icav2 client or other supported methods. See Data Transfer Options with ICA Platform for details.

    ICA Input Files UI Example
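To show how such overrides relate to the defaults, here is a sketch that parses a file of `key = value` lines and applies it over a set of defaults. It assumes the simple syntax seen in this guide's custom parameter example (`##` comment lines, and `true`, `false`, or integer values). The function names are hypothetical, and any defaults passed in are placeholders, not the pipeline's actual defaults or its actual parser.

```python
def convert(value):
    """Turn 'true'/'false' into booleans and digits into int; otherwise keep the text."""
    if value in ("true", "false"):
        return value == "true"
    try:
        return int(value)
    except ValueError:
        return value.strip("'\"")


def parse_custom_config(text):
    """Parse `key = value` lines; blank lines and lines starting with # are skipped."""
    params = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"line {number}: expected 'key = value', got {raw!r}")
        params[key.strip()] = convert(value.strip())
    return params


def apply_overrides(defaults, overrides):
    """Return defaults updated with overrides, rejecting options that are not customizable."""
    unknown = sorted(set(overrides) - set(defaults))
    if unknown:
        raise KeyError(f"options not open to customization: {unknown}")
    return {**defaults, **overrides}
```

Rejecting unknown keys mirrors the statement that only a specific set of options can be customized; the supported set itself is defined by the pipeline documentation.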

    Custom Config Support

    This feature allows users to customize a specific set of DRAGEN command-line options to override the default values pre-defined in the pipeline.

    ICA Setup

    In the Illumina Connected Analytics (ICA) user interface (UI), you can specify the Custom Parameters Config File and Custom Resources Directory directly. Supported customizable options are described below.

    Examples

    solid_custom_param.config Content

    custom_resources_Heme_dir Folder Structure on ICA

    ℹ️ Note: Custom resource files and the custom configuration file must be uploaded to the same ICA project where the run is created. You can use the icav2 client or other supported methods. See Data Transfer Options with ICA Platform for details.

    ICA Input Files UI Example

    Launching Analysis

    Overview

    Run on DRAGEN Server

    The DRAGEN Heme WGS Tumor Only Pipeline is launched with the bash script called run_Heme_WGS_TO_{version}.sh, which is installed in the /usr/local/bin directory. The bash script is executed on the command line and runs the software using DRAGEN Application Manager. For a full list of command-line options, refer to Command-Line Options.

    Getting Started

    To launch an analysis, you must provide the --inputType and --inputFolder arguments. The --inputType argument can be bcl, fastq, bam, or cram. When starting from a sequencing system run folder containing BCL files, --inputType must be bcl and --inputFolder is the absolute path to the full run folder. When starting from FASTQ, BAM, or CRAM files, --inputFolder may also be a comma-separated list of folders. If more than one input folder is specified, the --sampleSheet argument must also be provided with the absolute path to a valid sample sheet (refer to Sample Sheet Requirements). If the --sampleSheet argument is not provided, the software checks for a file named SampleSheet.csv in the input folder.

    Analysis output is written to /staging/DRAGEN_Heme_WGS_Tumor_Only_Pipeline_{version}_Analysis_{datetimestamp} by default. To write to a different output directory, run the bash script with --analysisFolder <FULL_PATH_TO_ANALYSIS_FOLDER>.

    The --demultiplexOnly flag runs the pipeline through FASTQ Generation only, and these outputs can be used for splitting a run into smaller batch analyses with --inputType fastq and the --sampleIDs argument.
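The argument rules above can be captured in a small helper that assembles the launch command and rejects combinations the text rules out. This is a sketch under stated assumptions: the helper name is hypothetical, the version string is left to the caller because its format is not given here, and only the options named in this section are used.

```python
VALID_INPUT_TYPES = ("bcl", "fastq", "bam", "cram")


def build_launch_command(version, input_type, input_folders,
                         sample_sheet=None, analysis_folder=None):
    """Assemble the Heme launch command as an argument list (it is not executed here)."""
    if input_type not in VALID_INPUT_TYPES:
        raise ValueError(f"--inputType must be one of {VALID_INPUT_TYPES}")
    folders = list(input_folders)
    if not folders:
        raise ValueError("at least one input folder is required")
    if any(not f.startswith("/") for f in folders):
        raise ValueError("input folders must be absolute paths")
    if input_type == "bcl" and len(folders) != 1:
        raise ValueError("BCL input takes a single run folder")
    if len(folders) > 1 and sample_sheet is None:
        raise ValueError("--sampleSheet is required with more than one input folder")

    command = [
        f"/usr/local/bin/run_Heme_WGS_TO_{version}.sh",
        "--inputType", input_type,
        "--inputFolder", ",".join(folders),
    ]
    if sample_sheet is not None:
        command += ["--sampleSheet", sample_sheet]
    if analysis_folder is not None:
        command += ["--analysisFolder", analysis_folder]
    return command
```

The returned list can be passed to `subprocess.run` or joined for display. Returning a list rather than a shell string avoids quoting problems with paths that contain spaces.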

    Variant Interpretation with Illumina Connected Insights

    The Illumina Connected Insights (ICI) platform can be used to interpret and visualize analysis results from the Heme pipeline. Analysis results can be provided to ICI via manual upload for local analyses and via auto-ingestion for Illumina Connected Analytics (ICA) analyses.

    Automatic Ingestion of Heme Analysis on ICA to ICI

    • Access to Illumina Connected Analytics

    • Access to Illumina Connected Insights

    Refer to the ICI support site page for information on setting up the data upload from ICA or BSSH.

    Launching Analysis

    Overview

    Run on DRAGEN Server

    The DRAGEN Solid WGS Tumor Normal Pipeline is launched with the bash script called run_Solid_WGS_TN_{version}.sh, which is installed in the /usr/local/bin directory. The bash script is executed on the command line and runs the software using DRAGEN Application Manager. For a full list of command-line options, refer to Command-Line Options.

    Getting Started

    To launch an analysis, you must provide the --inputType and --inputFolder arguments. The --inputType argument can be fastq, bam, or cram. The --inputFolder argument may be the absolute path to the input folder or a comma-separated list of paths. If more than one input folder is specified, the --sampleSheet argument must also be provided with the absolute path to a valid sample sheet (refer to Sample Sheet Requirements). If the --sampleSheet argument is not provided, the software checks for a file named SampleSheet.csv in the input folder.

    Analysis output is written to /staging/DRAGEN_Solid_WGS_Tumor_Normal_Pipeline_{version}_Analysis_{datetimestamp} by default. To write to a different output directory, run the bash script with --analysisFolder <FULL_PATH_TO_ANALYSIS_FOLDER>.

    DRAGEN Server App

    Analysis on DRAGEN Server

    Prerequisites

    Quick Start

    Quick Start Guide

    Table 1. Release Information

    Execution Environment

    Templates

    Description

    Sample Sheet templates for the Heme pipeline for standalone DRAGEN server and ICA manual launch analysis can be found in the table below. For auto-launch compatible sample sheets, use BaseSpace Run Planner.

    The Heme pipeline is compatible with several instruments and assay workflows (standard, XP), each of which has implications for the sample sheet.

    Quick Start

    Quick Start Guide

    Table 1. Release Information

    Execution Environment

    DRAGEN Solid WGS Tumor Normal Pipeline

    Overview

    DRAGEN Solid WGS Tumor Normal Pipeline, henceforth referred to as the Solid WGS TN pipeline, is a comprehensive and unbiased whole genome sequencing solution for detecting all types of mutation in matched tumor and normal samples. It can be applied to detect clinically actionable mutations for cancer spanning a wide range of genomic events, e.g., structural variants (SV), copy number alterations (CNA), and small variants (SNV/insertion/deletion/delins).

    The Solid WGS TN pipeline includes a DNA-only workflow designed to analyze whole genome sequencing data generated on supported instruments. It may be run as a local off-instrument solution installable on a DRAGEN server or accessible through the Illumina Connected Analytics (ICA) cloud environment. The Solid WGS TN pipeline is for Research Use Only (RUO).

    Sample Sheet Requirements

    The TN pipeline may contain additional user defined fields such as Sex, Tumor Type or Case ID for use with variant interpretation in ICI.

    Standard Sample Sheet Requirements

    The following sample sheet requirements describe the required and optional fields for the TN pipeline. The sample sheet must contain the following sections.

    The analysis fails if the sample sheet requirements are not met.

    Advanced Topics

    Demultiplex only option

    In order to break up the workflow, one may wish to run the software with the demux only option. The pipeline will perform FASTQ generation with the settings provided by default or as specified in the sample sheet. Then the subsequent analysis may start from FASTQ.

    CRAM input

    Advanced Topics

    The pipeline may be downloaded and installed on a local DRAGEN server. A download utility may be obtained from the Illumina download site, and the download utility will manage all the dependencies. Once the required installers are downloaded, the software may be installed by running the installers.

    Using NFS for data streaming

    With the NovaSeq X 25B flow cells, the amount of data is on the order of terabytes, which may take a few hours or more to copy to the /staging folder on the local DRAGEN server. Using NFS storage directly for input and output is recommended in this case.
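A rough estimate shows why the copy step matters at this scale. The sketch below computes an idealized transfer time. It assumes the ~4250 run folder size quoted for NovaSeq X 25B is in gigabytes, and the link speeds in the usage note are illustrative values, not measured or recommended figures. The function name is hypothetical.

```python
def transfer_hours(size_gb, link_gbit_per_s, efficiency=1.0):
    """Hours needed to move size_gb gigabytes over a link of the given speed.

    efficiency scales the nominal link speed to account for protocol
    and storage overhead (1.0 means the full nominal rate is achieved).
    """
    if link_gbit_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = size_gb * 8e9  # gigabytes to bits
    seconds = bits / (link_gbit_per_s * 1e9 * efficiency)
    return seconds / 3600
```

Under these assumptions, `transfer_hours(4250, 10)` comes to a little under one hour and `transfer_hours(4250, 1)` to more than nine hours. Real transfers are slower than the nominal rate, which is consistent with the "few hours or more" stated above.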

    Custom Config Support

    BSSH Run Planner Setup

    On the BSSH Run Planner, custom parameters and custom resource files can also be specified during Run Planning.

    Custom resource files must be uploaded to BaseSpace under the same project to be selectable during run planning. Supported customizable options are described in the Custom Configuration Support section of each application.

    See for additional details.

    When CRAM is used as input, the reference genome used to generate the CRAM files is required. This may be provided using the custom configuration file.
    Variant Interpretation

    ICI supports variant interpretation with advanced visualization capabilities. It is available in the cloud or on a local DRAGEN server.

    setting up the data upload from ICA or BSSH
    Command-Line Options
    Sample Sheet Requirements
    Solid WGS TN
    Features
    • Superb performance based on the DRAGEN BioIT platform Release 4.4.4

    • Supports starting the analysis from FASTQ (.gz or .ora format), BAM or CRAM as inputs

    • Flexible, custom-configurable options on top of well-established DRAGEN recipes for Solid WGS TN analysis

    • Available on local DRAGEN servers and Illumina Connected Analytics (ICA)

    • Seamless integration with Illumina Connected Insights (ICI) for tertiary interpretation

    Supported Library Prep Kits (LPKs)

    There are no specific requirements on LPKs because the pipeline does not support starting from BCL in the current release.

    Supported Sequencing Instruments

    There are no specific requirements on instruments because the pipeline does not support starting from BCL in the current release.

    Solid Tumor Normal Pipeline Workflow

    5300

    Other Instruments

    ~2000

    4000

    2500

    NovaSeq 6000/6000Dx (RUO) S4 Flow Cell

    ~2000

    4000

    2500

    NovaSeq X 10B

    ~2000

    4000

    2500

    NovaSeq X 25B

    ~4250

    Network Considerations

    8500

    If the --sampleSheet argument is not provided, the software checks for a file named SampleSheet.csv in the input folder.
    Command-Line Options
    Sample Sheet Requirements
    Data Transfer Options with ICA Platform
    ICA Setup Screenshot
    ## custom parameters
    vc_output_evidence_bam = false
    qc_detect_contamination = true
    aligner_clip_pe_overhang = 0
    
    ## custom reference files
    vc_systematic_noise = '/snv/WGS_hg38_v1.0_systematic_noise.snv.bed.gz'
    sv_systematic_noise = '/sv/WGS_FF_Heme_hg38_v1.0_systematic_noise.sv.bedpe.gz'
    vc_somatic_hotspots = '/snv/somatic_hotspots_GRCh38.vcf.gz'
    custom_resources_Solid/
    ├── snv
    │   ├── WGS_Solid_hg38_v1.0_systematic_noise.snv.bed.gz
    │   └── somatic_hotspots_GRCh38.vcf.gz
    └── sv
        └── WGS_Solid_FF_solid_hg38_v1.0_systematic_noise.sv.bedpe.gz
    ## custom parameters
    somatic_vc_output_evidence_bam = false
    germline_qc_detect_contamination = true
    germline_aligner_clip_pe_overhang = 0
    
    ## custom reference files
    somatic_vc_systematic_noise = '/snv/WGS_hg38_v1.0_systematic_noise.snv.bed.gz'
    somatic_sv_systematic_noise = '/sv/WGS_FF_solid_hg38_v1.0_systematic_noise.sv.bedpe.gz'
    somatic_vc_somatic_hotspots = '/snv/somatic_hotspots_GRCh38.vcf.gz'
    custom_resources_Solid/
    ├── snv
    │   ├── WGS_Solid_hg38_v1.0_systematic_noise.snv.bed.gz
    │   └── somatic_hotspots_GRCh38.vcf.gz
    └── sv
        └── WGS_Solid_FF_solid_hg38_v1.0_systematic_noise.sv.bedpe.gz
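A custom configuration file of the kind shown above can be sanity-checked before launch. The following is a minimal sketch, not part of the pipeline: it assumes the file uses simple `key = value` lines with `#` comments, and that values beginning with `/` are resolved relative to the directory passed with --customResourceDir. Both points are assumptions drawn from the example layout, and the function names are hypothetical.

```python
import os
import re

# One "key = value" assignment per line (assumed file syntax).
ASSIGNMENT = re.compile(r"^(\w+)\s*=\s*(.+?)$")


def parse_custom_config(text):
    """Return (key, value) pairs in file order, skipping blanks and comments."""
    entries = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        match = ASSIGNMENT.match(stripped)
        if match is None:
            raise ValueError(f"unrecognized config line: {line!r}")
        entries.append((match.group(1), match.group(2).strip("'\"")))
    return entries


def duplicate_keys(entries):
    """Return keys that are assigned more than once, in first-repeat order."""
    seen, duplicates = set(), []
    for key, _ in entries:
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates


def missing_resources(entries, resource_dir):
    """Return (key, path) for file values not found under resource_dir.

    Assumption: values starting with '/' are relative to resource_dir.
    """
    missing = []
    for key, value in entries:
        if value.startswith("/"):
            path = os.path.join(resource_dir, value.lstrip("/"))
            if not os.path.isfile(path):
                missing.append((key, path))
    return missing
```

Running `duplicate_keys` and `missing_resources` on a configuration before starting an analysis can catch a repeated parameter name or a file name that does not match the resource directory contents.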

    DRAGEN Phase 3 or 4 server

  • DRAGEN License

  • Network storage server

  • DRAGEN server

    A DRAGEN phase 4 server is recommended, especially for datasets from NovaSeq X instruments. The server has 12 TB of intermediate data storage space for full processing of a NovaSeq X 25B flow cell.

    The DRAGEN phase 3 server has 6 TB of intermediate data storage space, which can accommodate flow cells from the NovaSeq 6000 or 6000Dx instruments.

    DRAGEN license

    The Heme pipeline uses the standard DRAGEN license without requiring any special licenses.

    NFS and CIFS file servers

    The Heme pipeline is designed to stream data from a network file server onto the DRAGEN server, complete the analysis using the /staging area of the high performance SSD and then stream the analysis output back to the network file server.

    The network file server may be mounted to the DRAGEN server using the NFS or CIFS protocol (SMB 1.0). SMB 2.0 or higher is recommended with Active Directory support if the SMB protocol is used.

    Starting from BCL Files

    If starting from BCL (*.bcl) files, the Heme pipeline requires the run folder to contain certain files and folders.

    The run folder contains data from the sequencing run. Make sure that the folder contains the following files and folders:

    Folder/File
    Description

    Config folder

    Configuration files

    Data folder

    *.bcl files

    Images folder

    [Optional] Raw sequencing image files.

    Interop folder

    Interop metric files.

    Logs folder

    [Optional] Sequencing system log files.

    RTALogs folder

    Real-Time Analysis (RTA) log files.

    Starting from FASTQ Files

    The following inputs are required for running the pipeline using FASTQ (*.fastq) files.

    • Full path to an existing FASTQ folder.

    • The FASTQ folder structure conforms to the folder structure in FASTQ File Organization.

    • The sample sheet is in the FASTQ folder path, or you can set the path to the sample sheet with the --sampleSheet override command line option.

    Make sure there is sufficient disk space for the analysis to complete. Refer to the --help command line argument details for disk space requirements.

    Use BCL Convert to produce FASTQ files for the Heme pipeline. Using bcl2fastq does not produce the same results and is discouraged.

    FASTQ File Organization

    Store FASTQ files in individual subfolders that correspond to a specific Sample_ID, and keep file pairs together in the same folder. Alternatively, store all FASTQ files in a single flat folder.

    The Heme pipeline requires separate FASTQ files per sample. Do not merge FASTQ files.

    For paired-end runs, the instrument generates two FASTQ files (Read 1 and Read 2) per sample for each flow cell lane. For example, a flow cell with four lanes yields eight FASTQ files per sample.

    Sample1_S1_L001_R1_001.fastq.gz

    • Sample1 represents the Sample ID.

    • The S in S1 means sample, and the 1 in S1 is based on the order of samples in the sample sheet, so S1 is the first sample.

    • L001 represents the flow cell lane number.

    • The R in R1 means Read, so R1 refers to Read 1.
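The naming convention described above can be checked programmatically. The following is a minimal sketch, assuming names of the form shown in the example (Sample1_S1_L001_R1_001.fastq.gz); the function name `parse_fastq_name` is hypothetical, and the trailing `_001` segment is matched but not interpreted.

```python
import re

# Matches names such as Sample1_S1_L001_R1_001.fastq.gz.
FASTQ_NAME = re.compile(
    r"^(?P<sample_id>.+)_S(?P<sample_number>\d+)_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])_\d{3}\.fastq(?:\.gz)?$"
)


def parse_fastq_name(name):
    """Split a FASTQ file name into sample ID, sample number, lane, and read."""
    match = FASTQ_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognized FASTQ file name: {name!r}")
    return {
        "sample_id": match["sample_id"],
        "sample_number": int(match["sample_number"]),
        "lane": int(match["lane"]),
        "read": int(match["read"]),
    }
```

Because the sample ID group is matched greedily and then backtracked, sample IDs that themselves contain underscores or dashes are still separated correctly from the `_S<n>_L<lane>_R<read>` suffix.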

    software version
    Client program
    location
    Note

    Local DRAGEN Server

    4.4.4.62

    run_Heme_WGS_TO_{version}.sh

    /usr/local/bin

    See

    ICA

    a11697ba-1144-4dc6-9e22-f21dff29f747

    icav2

    ICA Pipelines

    See

    ICA

    urn:ilmn:ica:pipeline:a11697ba-1144-4dc6-9e22-f21dff29f747#Heme_WGS_TO_v4_4_4_62

    supported browser

    ICA UI

    See

    • {version} is used to represent the software version number in Table 1 above. Similarly, <pipeline_run_script> is used to indicate the client program name in this document.

    Download, Install and Execute on a Local Server

    The software may be downloaded and installed by following the installation guide. It may be executed using a local DRAGEN server or on a local computer which launches the analysis in the ICA cloud environment.

    Run analysis on a local DRAGEN Server

    The command line program may be used to launch an analysis by using the <pipeline_run_script> with the appropriate options.

    start from bcl

    start from one or more input folders when using FASTQ, BAM or CRAM files

    Multiple folders may be specified as input folders in comma separated values when using FASTQ, BAM or CRAM files as input.
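When analyses are launched from a script, the argument list can be assembled in one place. The following is a minimal sketch using only the --inputType, --inputFolder, --sampleSheet, and --analysisFolder options described in this guide; the helper name `build_pipeline_args` is hypothetical, and the client program name must still be supplied by the caller.

```python
ALLOWED_INPUT_TYPES = {"bcl", "fastq", "bam", "cram"}


def build_pipeline_args(input_type, input_folders, analysis_folder=None,
                        sample_sheet=None):
    """Build the option list to pass after the pipeline run script name."""
    if input_type not in ALLOWED_INPUT_TYPES:
        raise ValueError(f"unsupported input type: {input_type!r}")
    if not input_folders:
        raise ValueError("at least one input folder is required")
    if input_type == "bcl" and len(input_folders) != 1:
        # Multiple folders are only accepted for FASTQ, BAM, or CRAM input.
        raise ValueError("BCL input accepts a single run folder")
    if any("," in folder for folder in input_folders):
        raise ValueError("folder paths must not contain commas")

    args = ["--inputType", input_type,
            "--inputFolder", ",".join(input_folders)]
    if sample_sheet:
        args += ["--sampleSheet", sample_sheet]
    if analysis_folder:
        args += ["--analysisFolder", analysis_folder]
    return args
```

The returned list can be appended to the client program path and passed to a process launcher such as `subprocess.run`, which avoids shell quoting issues with folder names.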

    Pressing Ctrl+C during a DRAGEN step stops the currently running analysis and might cause an FPGA error. To recover from an FPGA error, shut down and restart the server.

    Run analysis on ICA using the icav2 client

    The following example starts an analysis using the ICA client by providing the necessary command parameters and specifying a particular storage size for the analysis in ICA.

    Run analysis on ICA using UI

    The same analysis example above may be completed using the ICA UI by logging into the appropriate domain of your company and project where the Heme pipeline is set up.

    Find more information in the ICA Cloud App Launch Guide.

    Turn Around Time Comparison

    ICA

    Coming soon.

    Local Server

    Local Server Only

    Coming soon.

    Data Streaming from NFS

    Coming soon.

    Templates

    Sample sheet templates contain all required fields, including index sequences in the proper orientation for all indexes from a given library prep kit. The templates are provided as a starting point for creating a sample sheet manually when launching analysis on a standalone DRAGEN server or on ICA using manual launch.

    For interactive run planning or to create a sample sheet for ICA Autolaunch, use BaseSpace Run Planner to create valid sample sheets for either local or cloud analysis. To set up a run in BaseSpace run planner, refer to Sample Sheet Creation in BaseSpace Run Planner.

    See the Sample Sheet guidelines section for additional details on required fields and values when filling in sample information. Use the lookup table below to select and download the sample sheet template that matches your instrument, assay, and workflow configuration:

    Instrument
    Workflow
    File

    NovaSeq 6000Dx (RUO)

    Standard or XP

    -

    NovaSeq 6000

    -

    NovaSeq X

    Standard or XP

    -

    -

    *Lane numbers cannot exceed what is supported by the flow cell in use.

    software version
    Client program
    location
    Note

    Local DRAGEN Server

    4.4.4.53

    run_Solid_WGS_TN_{version}.sh

    /usr/local/bin

    See

    ICA

    c18e9e69-0a74-4c43-a419-a62cb7c6abc0

    icav2

    ICA Pipelines

    See

    ICA

    urn:ilmn:ica:pipeline:c18e9e69-0a74-4c43-a419-a62cb7c6abc0#Solid_WGS_TN_v4_4_4_53

    supported browser

    ICA UI

    See

    • {version} is used to represent the software version number in Table 1 above. Similarly, <pipeline_run_script> is used to indicate the client program name in this document.

    Download, Install and Execute on a Local Server

    The software may be downloaded and installed by following the installation guide. It may be executed using a local DRAGEN server or on a local computer which launches the analysis in the ICA cloud environment.

    Run analysis on a local DRAGEN Server

    The command line program may be used to launch an analysis by using the <pipeline_run_script> with the appropriate options.

    start from one or more input folders when using FASTQ, BAM or CRAM files

    Multiple folders may be specified as input folders in comma separated values when using FASTQ, BAM or CRAM files as input.

    Pressing Ctrl+C during a Solid_WGS_TN_DRAGEN step stops the currently running analysis and might cause an FPGA error. To recover from an FPGA error, shut down and restart the server.

    Run analysis on ICA using the icav2 client

    The following example starts an analysis using the ICA client by providing the necessary command parameters and specifying a particular storage size for the analysis in ICA.

    Run analysis on ICA using UI

    The same analysis example above may be completed using the ICA UI by logging into the appropriate domain of your company and project where the pipeline is set up.

    Find more information in the ICA Cloud App Launch Guide.

    [Header] Section
    Parameter
    Required
    Details

    FileFormatVersion

    2

    v2 sample sheet format

    [TN_Data] Section

    Sample Parameter
    Required
    Details

    Sample_ID

    Required

    The unique ID to identify a sample. The sample ID is included in the output file names. Sample IDs are not case sensitive. Sample IDs must have the following characteristics:

    • Unique for the run.

    • 1–70 characters.

    • No spaces.

    • Alphanumeric characters with underscores and dashes. If you use an underscore or dash, enter an alphanumeric character before and after the underscore or dash, for example, Sample1-T5B1_022515.

    • Cannot be called all, default, none, unknown, undetermined, stats, or reports.

    • Must match a Sample_ID listed in the [BCLConvert_Data] section.

    Each sample must have a unique combination of Lane (if applicable), sample ID, and index ID, or the analysis will fail.

    Case_ID

    Required

    A unique ID that links the biological samples from the same individual. It is used for variant interpretation in downstream software such as Illumina Connected Insights.

    Sample_Type

    Required

    Possible value is DNA.

    Sample_Classification

    Required

    To ensure a successful analysis, follow these guidelines:

    1. Avoid any blank lines at the end of the sample sheet; these can cause the analysis to fail.

    2. When running local analysis using the command line save the sample sheet in the sequencing run folder with the default name SampleSheet.csv, or choose a different name and specify the path in the command-line options.
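The per-sample Sample_ID rules from the [TN_Data] table can be checked before launching an analysis. The following is a minimal sketch with a hypothetical function name. It covers the length, character, and reserved-name rules only. Treating reserved names as case-insensitive is an assumption based on the statement that sample IDs are not case sensitive. Cross-section checks, such as matching [BCLConvert_Data], are not covered.

```python
import re

RESERVED_NAMES = {"all", "default", "none", "unknown",
                  "undetermined", "stats", "reports"}

# Alphanumeric runs joined by single underscores or dashes, so every
# underscore or dash has an alphanumeric character before and after it.
SAMPLE_ID_PATTERN = re.compile(r"[A-Za-z0-9]+(?:[_-][A-Za-z0-9]+)*")


def sample_id_problems(sample_id):
    """Return a list of rule violations; an empty list means the ID passes."""
    problems = []
    if not 1 <= len(sample_id) <= 70:
        problems.append("length must be 1-70 characters")
    if SAMPLE_ID_PATTERN.fullmatch(sample_id) is None:
        problems.append("use alphanumeric characters, with single underscores "
                        "or dashes only between alphanumeric characters")
    if sample_id.lower() in RESERVED_NAMES:
        problems.append("reserved name")
    return problems
```

Returning a list of messages rather than a single boolean makes it easier to report every problem in a sample sheet in one pass.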

    BSSH Run Planner UI Example
    BSSH Setup Screenshot
    Uploading Reference Files to BaseSpace

    Command Line Options

    Overview

    Command line options

    For command-line options, refer to Table 1 (below) for details.

    Table 1: Shell Script Command-Line Options

    CAUTION: Do not run analyses as the root user as it can lead to permissions issues when managing data generated by the software.

    Argument
    Required
    Description

    Command Line Options

    Overview

    Command line options

    For command-line options, refer to Table 1: Shell Script Command-Line Options for details.

    Table 1: Shell Script Command-Line Options

    CAUTION: Do not run analyses as the root user as it can lead to permissions issues when managing data generated by the software.

    Argument
    Required
    Description

    DRAGEN Secondary Analysis

    The DRAGEN secondary analysis software utilizes a highly reconfigurable Field Programmable Gate Array (FPGA) card and is available on a preconfigured DRAGEN server that can be seamlessly integrated into bioinformatics workflows. The platform can be loaded with highly optimized algorithms for many different NGS secondary analysis pipelines, including the following:

    • Whole genome

    • Exome

    • RNA-Seq

    • Methylome

    • Cancer

    All user interaction is accomplished via DRAGEN software that runs on the host server and manages all communication with the FPGA card. This user guide summarizes the technical aspects of the system and provides detailed information for all DRAGEN command line options. If you are working with DRAGEN for the first time, Illumina recommends that you first read the Getting Started section, which provides a short introduction to DRAGEN, including running a test of the server, generating a reference genome, and running example commands.

    DNA Pipeline

    DRAGEN DNA Pipeline

    The DRAGEN DNA Pipeline massively accelerates the secondary analysis of NGS data. For example, the time taken to process an entire human genome at 30x coverage is reduced from approximately 10 hours (using the current industry standard, BWA-MEM+GATK-HC software) to approximately 20 minutes. Time scales linearly with coverage depth.
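The linear scaling statement can be turned into a rough estimate. The following is a minimal sketch that assumes run time is directly proportional to coverage, anchored at the approximate figure quoted above (about 20 minutes at 30x). Real run times depend on hardware, input format, and enabled callers, so treat the result as a planning aid only. The function name is hypothetical.

```python
def estimate_minutes(coverage, ref_coverage=30.0, ref_minutes=20.0):
    """Estimate processing time, assuming time is proportional to coverage."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return ref_minutes * coverage / ref_coverage
```

Under this assumption, a 60x genome would be expected to take roughly twice as long as a 30x genome.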

    These pipelines harness the tremendous power of the DRAGEN server and include highly optimized algorithms for mapping, aligning, sorting, duplicate marking, and haplotype variant calling. They also use platform features such as hardware-accelerated compression and optimized BCL conversion, together with the full set of platform tools.

    Unlike all other secondary analysis methods, DRAGEN DNA Applications do not reduce accuracy to achieve speed improvements. Accuracy for both SNPs and INDELs is improved over that of BWA-MEM+GATK-HC in side-by-side comparisons.

    In addition to haplotype variant calling, the pipeline supports calling of copy number and structural variants as well as detection of repeat expansions.

    RNA Pipeline

    DRAGEN secondary analysis includes an RNA-seq (splicing-aware) aligner, as well as RNA-specific analysis components for gene expression quantification and gene fusion detection.

    The DRAGEN RNA Pipeline shares many components with the DNA Pipeline. Mapping of short seed sequences from RNA-Seq reads is performed similarly to mapping DNA reads. In addition, splice junctions (the joining of noncontiguous exons in RNA transcripts) near the mapped seeds are detected and incorporated into the full read alignments.

    DRAGEN secondary analysis uses hardware-accelerated algorithms to map and align RNA-Seq-based reads faster and more accurately than popular software tools. For instance, it can align 100 million paired-end RNA-Seq-based reads in about three minutes. With simulated benchmark RNA-Seq data sets, its splice junction sensitivity and specificity are unsurpassed.

    Methylation Pipeline

    The DRAGEN Methylation Pipeline provides support for automating the processing of bisulfite sequencing data to generate a BAM with the tags required for methylation analysis and reports detailing the locations with methylated cytosines.

    Other scRNA prep

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.

    Notes and additional options

    Hashtable

    ICA Cloud App

    Requirements

    Analysis on ICA requires an account with a valid subscription and a project with the following configuration.

    Manual Launch

    DRAGEN Recipes

    Overview

    The following sub-pages contain recommended command line options for specific DRAGEN pipelines. For an overview of DRAGEN command line parsing, also see

    Germline Pipelines

    ICA Cloud App

    Requirements

    Analysis on ICA requires an account with a valid subscription and a project with the following configuration.

    Manual Launch

    RNA WTS

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.

    Notes and additional options

    Hashtable

    <pipeline_run_script> --help # list all supported parameters

    <pipeline_run_script> --inputType bcl \
    --inputFolder /staging/input-folder \
    --analysisFolder /staging/output-folder

    <pipeline_run_script> --inputType <fastq|bam|cram> \
    --inputFolder /staging/input-folder-1,/staging/input-folder-2 \
    --analysisFolder /staging/output-folder

    icav2 projectpipelines start nextflow ${PIPELINE_ID} \
    --project-id ${ANY_PROJECT_ID} \
    --storage-size Large \
    -o json \
    --input ${ANY_SAMPLE_SHEET} \
    --input ${ANY_INPUT_DIR} \
    --parameters inputType:'bcl' \
    --parameters referenceGenome:'hg38' \
    --parameters oraCompressionEnabled:'true' \
    --parameters sampleIds:'1267-Prostate-Del-R1,741-Lung-SNV-R1' \
    --user-reference ${ANY_USER_REFERENCE}

    <pipeline_run_script> --inputType <fastq|bam|cram> \
    --inputFolder /staging/input-folder-1,/staging/input-folder-2 \
    --analysisFolder /staging/output-folder

    icav2 projectpipelines start nextflow ${PIPELINE_ID} \
    --project-id ${ANY_PROJECT_ID} \
    --storage-size Large \
    -o json \
    --input ${ANY_SAMPLE_SHEET} \
    --input ${ANY_INPUT_DIR} \
    --parameters inputType:'fastq' \
    --parameters referenceGenome:'hg38' \
    --parameters sampleIds:'Sample1,Sample2' \
    --user-reference ${ANY_USER_REFERENCE}

    RunInfo.xml file

    Run information.

    RunParameters.xml file

    Run parameters.

    SampleSheet.csv file

    Sample information. If you want to use a sample sheet that is not in the run folder or a sample sheet named something other than SampleSheet.csv, provide the full path.

    Possible values are Tumor or Normal.

    Specimen_Type

    Required

    Possible values are FFPE (Formalin-Fixed, Paraffin-Embedded) or FF (Fresh Frozen) for samples classified as Tumor. There are no restrictions for samples classified as Normal.

    Sex

    Optional

    Possible values are Male, Female, or Unknown.

    Tumor_Type

    Optional

    Supports tumor type codes based on the SNOMED ontology.

    Sample_Description

    Optional

    Free-text description of the sample.

    Dragen Server
    ICA Cloud
    ICA Cloud

    - Local Mixed Flow Cell

    - Local

    Cloud
    Local
    Cloud Mixed Flow Cell
    Cloud
    Dragen Server
    ICA Cloud
    ICA Cloud

    --sampleOrCaseIDs

    No

    The comma-delimited sample IDs (or CaseID) that are processed by the run. For example, Sample_1,Sample_2.

    --referenceGenome

    No

    Specify the reference genome to use for alignment. Possible values: hg38 or hs37d5_chr. Default is hg38.

    --disableOraCompression

    No

    Specify to disable Ora compression.

    --customResourceDir

    No

    Provide custom resource directory path.

    --customConfig

    No

    Provide custom config file path.

    --keepFullWorkDir

    No

    Copy the entire work directory to the analysis output folder. The default behavior is to copy only the Nextflow logs.

    --version

    No

    Displays the version of the software, and then exits.

    --help

    No

    Displays the help text.

    --inputType

    Yes

    Possible values include fastq, bam, cram.

    --inputFolder

    Yes

    Input folder containing {input type} files. Multiple folders can be specified as a comma separated list.

    --sampleSheet

    No

    Full path to the sample sheet file. If the sample sheet is named SampleSheet.csv and is located in the single input folder (depending on how the analysis is initiated), this option is not required.

    --analysisFolder

    No

    Full path to the alternative analysis folder. Default is /staging/DRAGEN_Solid_WGS_Tumor_Normal_Pipeline_{version}_Analysis_{datetimestamp} if not specified. This folder must have enough available free space for the analysis and be on an NVMe SSD partition to achieve high performance.

    --sampleIDs

    No

    The comma-delimited sample IDs that are processed by the run. For example, Sample_1,Sample_2.

    --referenceGenome

    No

    Specify the reference genome to use for alignment. Possible values: hg38 or hs37d5_chr. Default is hg38.

    --disableOraCompression

    No

    Specify to disable Ora compression.

    --demultiplexOnly

    No

    Demultiplex to generate FASTQ files only without further analysis.

    --customResourceDir

    No

    Provide custom resource directory path.

    --customConfig

    No

    Provide custom config file path.

    --keepFullWorkDir

    No

    Copy the entire work directory to the analysis output folder. The default behavior is to copy only the Nextflow logs.

    --version

    No

    Displays the version of the software, and then exits.

    --help

    No

    Displays the help text.

    --inputType

    Yes

    Possible values include bcl, fastq, bam, cram.

    --inputFolder

    Yes

    Input folder containing {input type} files. Multiple {input type, except bcl} folders can be specified as a comma separated list.

    --sampleSheet

    No

    Full path to the sample sheet file. If the sample sheet is named SampleSheet.csv and is located in the run or FASTQ folder (depending on how the analysis is initiated), this option is not required.

    --analysisFolder

    No

    Full path to the alternative analysis folder. Default is /staging/DRAGEN_Heme_WGS_Tumor_Only_Pipeline_{version}_Analysis_{datetimestamp} if not specified. This folder must have enough available free space for the analysis and be on an NVMe SSD partition to achieve high performance.

    For DRAGEN RNA/scRNA runs, it is recommended to use the linear hashtable.

    See: Product Files

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    --enable-map-align true

    Optionally disable map & align (default=true).

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    Duplicate Marking

    Option
    Description

    --enable-duplicate-marking true

    By default, DRAGEN marks duplicate reads and excludes them from variant calling.

    RNA Variant Calling

    Option
    Description

    --vc-target-bed $PATH

    Restrict the variants called to a target bed. For WTS, a bed file specifying the gene-coding regions should be provided to avoid calling erroneous variants in non-coding regions due to noisy reads.

    RNA Quant

    Option
    Description

    --rna-library-type

    Set the library according to the read orientations. Set to 'A' to auto detect the correct read orientation. Alternatively select 'IU', 'ISR', 'ISF', 'U', 'SR', or 'SF'.

    RNA Splice

    Option
    Description

    --rna-splice-variant-normals $PATH

    Optional list of normal splice variants that is used to filter false positive calls. The file should be tab-separated, with the following first four columns: (1) contig name, (2) first base of the splice junction (1-based), (3) last base of the splice junction (1-based), (4) strand (0: undefined, 1: +, 2: -).

    --rna-splice-variant-regions $PATH

    Target region bed file. Required for panels. The name of the region must be specified in the fourth column.
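The four-column layout described for --rna-splice-variant-normals can be validated before a run. The following is a minimal sketch with a hypothetical function name, assuming a plain tab-separated file without a header row. Columns beyond the fourth are ignored, and only the constraints stated in the option description are checked.

```python
import csv
import io

# Strand codes as listed in the option description.
STRAND_CODES = {"0": "undefined", "1": "+", "2": "-"}


def read_splice_normals(handle):
    """Parse (contig, first_base, last_base, strand) records from a TSV stream."""
    records = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip empty lines
        if len(row) < 4:
            raise ValueError(f"expected at least four columns: {row!r}")
        contig, first, last, strand = row[:4]
        if strand not in STRAND_CODES:
            raise ValueError(f"unknown strand code: {strand!r}")
        first_base, last_base = int(first), int(last)
        if first_base < 1 or last_base < 1:
            raise ValueError("junction coordinates are 1-based")
        records.append((contig, first_base, last_base, STRAND_CODES[strand]))
    return records
```

For example, `read_splice_normals(io.StringIO("chr1\t100\t200\t1\n"))` parses a single forward-strand junction on chr1.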

    RNA Fusion

    Option
    Description

    --rna-gf-enriched-regions $PATH

    For panels, the list of enriched genes should be set, either as a list of genes or a list of regions in BED format.

      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN pangenome hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
    --fastq-list-sample-id $STRING 
    # Mapper 
    --enable-rna true 
    --annotation-file $GTF                  #GTF or GFF3 format 
    --enable-map-align true                 #required for RNA/scRNA 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    --enable-duplicate-marking true         #default=true 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-target-bed $VC_TARGET_BED 
    # RNA Quantification 
    --enable-rna-quantification true 
    --rna-library-type A                    #see 'RNA Quant' 
    --rna-quantification-gc-bias true 
    # RNA Splice Variants 
    --enable-rna-splice-variant true 
    # RNA Gene Fusions 
    --enable-rna-gene-fusion true 

    --fastq-list $PATH 
    --fastq-list-sample-id $STRING 

    --fastq-file1 $PATH 
    --fastq-file2 $PATH 
    --RGSM $STRING 
    --RGID $STRING 

    --bam-input $PATH 

    --cram-input $PATH 

    For DRAGEN RNA/scRNA runs, it is recommended to use the linear hashtable.

    See: Product Files

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    --enable-map-align true

    Optionally disable map & align (default=true).

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    Single-cell RNA options

    To change the barcode or binning index positions, use --scrna-barcode-position and --scrna-umi-position. These settings should be provided in the form <startPos>_<endPos> for each barcode. Connect multiple barcode sequence positions with a '+'.

    For example, a library with the cell-barcode split into three blocks of 9 bp separated by fixed linker sequences and an 8 bp UMI would be set to: --scrna-barcode-position 0_8+21_29+43_51, and --scrna-umi-position 52_59.
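The position strings in this example can be checked against the expected block lengths. The following is a minimal sketch with hypothetical function names. It assumes positions are zero-based and inclusive, which is inferred from the example (0_8 covering 9 bp) rather than stated explicitly.

```python
def parse_positions(spec):
    """Parse '<startPos>_<endPos>' blocks joined by '+' into (start, end) pairs."""
    blocks = []
    for part in spec.split("+"):
        start_text, end_text = part.split("_")
        start, end = int(start_text), int(end_text)
        if start < 0 or end < start:
            raise ValueError(f"invalid block: {part!r}")
        blocks.append((start, end))
    return blocks


def block_lengths(spec):
    """Return the length in bp of each block, treating both ends as inclusive."""
    return [end - start + 1 for start, end in parse_positions(spec)]
```

With the values from the example, `block_lengths("0_8+21_29+43_51")` gives three 9 bp barcode blocks and `block_lengths("52_59")` gives one 8 bp UMI, matching the library description.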

    The following table lists some optional settings:

    Option
    Description

    --enable-single-cell-rna true

    Option to enable single-cell RNA mode.

    --scrna-barcode-position

    See example above or refer to

    --scrna-umi-position

    See example above or refer to

    --single-cell-threshold

    Cell filtering can be set to ['fixed', 'ratio', or 'inflection'].

    --scrna-barcode-sequence-list

    A known barcode sequence list can be optionally provided.

    --umi-source

    Optionally override the default barcode/BI source. Valid options include ['read1', 'read2', 'qname', 'fastq'].

    For more details on single-cell RNA options, refer to the DRAGEN Single-Cell RNA User Guide.

    How to Set up Project
    1. Create a Project: The Project can be specific to the DRAGEN Heme WGS Tumor Only v4.4.4 Pipeline, or it can contain multiple Pipelines and/or Tools. For information on creating Projects, refer to the Projects section in Illumina Connected Analytics help. ICA standard storage is used by default as soon as the Project is saved. To connect a different storage source, set it up before creating your Project. For details and options, refer to the Storage section in Illumina Connected Analytics help.

    2. Edit Project and Add Bundle: Edit the Project and add the bundle titled, "Heme WGS TO v4.4.4 (XX)." XX is a 2-letter code designating the region from which you are launching the analysis. Adding the Bundle automatically adds the pipeline and associated resource files and datasets to the Project. For information on Bundles, refer to the Bundles section in Illumina Connected Analytics help. After adding the Bundle to the Project, an example dataset becomes available in the Demo_Data folder for the Project.

    3. Upload the sequencing data: For information on viewing and uploading data, refer to the Data section in Illumina Connected Analytics help.

    4. Start Analysis: In the Project, navigate to Pipelines, select the Heme WGS TO v4_4_4_x Pipeline, and then select "Start New Analysis". Set up the new analysis by configuring the parameters listed in the Analysis Parameters on ICA section. When the required files are completed, start analysis.

    5. Download Results: After analysis is complete, navigate to results in the configured output location.

    Please see the Illumina Support Shorts for guidance on how to set up and run DRAGEN Heme WGS Tumor Only analysis on ICA.

    Analysis Parameters on ICA

    To launch an analysis via the ICA user interface, configure a DRAGEN Heme WGS Tumor Only pipeline analysis with the following parameters.

    Parameter Name
    Description

    User Reference

    The analysis run name

    User Tags

    Text labels to help index the analysis.

    Notify me when task is completed

    Option to receive an email notification when analysis is complete.

    Output Folder

    The path to the analysis output folder. The default path is the project output folder.

    Entitlement Bundle

    Automatically populated from the project details.

    Samplesheet

    Select a sample sheet in CSV format for the analysis. Note: Sample sheet selection is optional if starting from a run folder, and required when submitting a FASTQ folder.

    For information about using pipelines, refer to Illumina Connected Analytics support site page.

    DNA Germline WGS

  • DNA Germline WES

  • DNA Germline Panel

  • 5 Base DNA Germline WGS

  • 5 Base DNA Germline Panel

  • Germline with UMI Pipelines

    • DNA Germline WGS UMI

    • DNA Germline WES UMI

    • DNA Germline Panel UMI

    • 5 Base DNA Germline WGS UMI

    RNA and scRNA Pipelines

    • RNA WTS

    • RNA Panel

    • Illumina scRNA

    • Other scRNA prep

    Somatic Pipelines

    • DNA Somatic Tumor-Normal Solid WGS

    • DNA Somatic Tumor-Normal Solid WES

    • DNA Somatic Tumor-Normal Solid Panel

    • DNA Somatic Tumor-Only Solid WGS

    Somatic with UMI Pipelines

    • DNA Somatic Tumor-Normal Solid WGS UMI

    • DNA Somatic Tumor-Normal Solid WES UMI

    • DNA Somatic Tumor-Normal Solid Panel UMI

    • DNA Somatic Tumor-Only Solid WGS UMI

    Amplicon Pipelines

    • DNA Amplicon

    • RNA Amplicon

    Multicaller Workflows
    How to Launch Analysis
    1. Create a Project: The Project can be specific to the DRAGEN Solid WGS Tumor Normal v4.4.4 Pipeline, or it can contain multiple Pipelines and/or Tools. For information on creating Projects, refer to the Projects section in Illumina Connected Analytics help.

    ICA standard storage is used by default as soon as the Project is saved. To connect a different storage source, set it up before creating your Project. For details and options, refer to the Storage section in Illumina Connected Analytics help.

    2. Edit Project and Add Bundle: Edit the Project and add the bundle titled "Solid WGS TN v4.4.4 (XX)", where XX is a 2-letter code designating the region from which you are launching the analysis. Adding the Bundle automatically adds the pipeline and associated resource files and datasets to the Project. For information on Bundles, refer to the Bundles section in Illumina Connected Analytics help.

    After adding the Bundle to the Project, an example dataset becomes available in the Demo_Data folder for the Project.

    3. Upload the sequencing data: For information on viewing and uploading data, refer to the Data section in Illumina Connected Analytics help.

    4. Start Analysis: In the Project, navigate to Pipelines, select the Solid WGS TN v4_4_4_x Pipeline, and then select "Start New Analysis". Set up the new analysis by configuring the parameters listed in the table below. After the required parameters are configured, start the analysis.

    5. Download Results: After analysis is complete, navigate to the results in the configured output location.

    Please see the Illumina Support Shorts for guidance on how to set up and run DRAGEN Solid WGS Tumor Normal analysis on ICA.

    Analysis Parameters on ICA

    To launch an analysis via the ICA user interface, configure a DRAGEN Solid WGS Tumor Normal pipeline analysis with the following parameters.

    Parameter Name
    Description

    User Reference

    The analysis run name

    User Tags

    Text labels to help index the analysis.

    Notify me when task is completed

    Option to receive an email notification when analysis is complete.

    Output Folder

    The path to the analysis output folder. The default path is the project output folder.

    Entitlement Bundle

    Automatically populated from the project details.

    Samplesheet

    Select a sample sheet in CSV format for the analysis. Note: Sample sheet selection is optional when starting from a run folder and required when submitting a FASTQ folder.

    For information about using pipelines, refer to the Illumina Connected Analytics support site page.

    For more information about using ICA and BaseSpace Sequence Hub, or about running a pipeline analysis on ICA, refer to the relevant support pages on the Illumina support site.

    Getting Started

    DRAGEN provides tests you can run to make sure that your DRAGEN system is properly installed and configured. Before running the tests, make sure that the DRAGEN server has adequate power and cooling, and is connected to a network that is fast enough to move your data to and from the machine with adequate performance.

    Please refer to the Server Site Prep & Installation Guide when installing a new system.

    On-premises Installation

    The software can be installed on an on-premises server by executing the .run installer for the desired version. Installers are made available for all releases at the DRAGEN Software Support Site page.

    Installation procedure:

    • Download the desired installer from the support website and unzip the package

    • The archive integrity can be checked using: ./<dragen .run file> --check

    • Install the appropriate release based on your Linux OS with the command: sudo sh <dragen .run file>
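    The procedure above can be sketched as a small shell helper. This is a minimal sketch and not part of the DRAGEN distribution: the function names install_dragen and run_step, the DRY_RUN variable, and the installer file name in the usage comment are hypothetical. Only the --check and sudo sh invocations come from this guide.

```shell
# Run a command, or only print it when DRY_RUN=1 (hypothetical helper).
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Check the integrity of a DRAGEN .run installer, then install it.
install_dragen() {
    installer=$1                                # path to the downloaded .run file
    run_step "$installer" --check || return 1   # stop if the integrity check fails
    run_step sudo sh "$installer"               # root privileges are required
}

# Usage (hypothetical file name):
#   DRY_RUN=1
#   install_dragen ./dragen-installer.run
```

    Running with DRY_RUN=1 first shows the two commands without touching the system, which can help confirm the installer path before the existing version is removed.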

    The .run file includes a script that handles uninstallation of any existing software, integrity checking of the package and files, and installation of the new DRAGEN software version. The DRAGEN software is installed in part through the Linux RPM Package Manager (rpm), and several RPM packages make up the installation of a single DRAGEN software version. The RPM packages also configure the system for DRAGEN, for example by raising user ulimits, and the .run script starts the services needed for functionality, such as the licensing daemon, dragen_licd, and the hugepages daemon, dragend_hp.

    NOTE: Root privileges are required for the installation.

    Single Version Installation

    Up to DRAGEN Software v4.2, only one version of the DRAGEN software can be installed at a time. Executing the .run file will remove any existing installed version and (re)install the new version.

    After installation, the application and associated files are available at /opt/edico.

    The single version installer will add /opt/edico to the Linux $PATH, so that the user can just call dragen without specifying the full path.

    Multi-Version Installation

    Starting with DRAGEN Software v4.3, multiple compatible versions of the DRAGEN software can be installed at the same time. Executing the .run file adds the new version to the system.

    After installation, the application files are available at /opt/dragen/{version}/bin and FPGA files are located at /opt/bitstream/{bitstream version}.

    The multi-version installer does NOT add /opt/dragen/{version}/bin to the Linux $PATH, because multiple versions can be present at a given time. Users should manage the paths to the specific version they want to run. Command line examples in this guide assume that the Linux $PATH is set to the correct DRAGEN version and refer simply to dragen <options>.
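    One way to manage the path in an interactive shell is sketched below. The use_dragen function and the DRAGEN_ROOT override are hypothetical and exist only for illustration; the default location follows the /opt/dragen/{version}/bin layout described above, and the version string in the usage comment is a placeholder.

```shell
# Put one installed DRAGEN version first on PATH for the current shell.
# DRAGEN_ROOT is a hypothetical override for testing; it defaults to /opt/dragen.
use_dragen() {
    version=$1
    bin_dir="${DRAGEN_ROOT:-/opt/dragen}/$version/bin"
    if [ ! -d "$bin_dir" ]; then
        echo "no DRAGEN installation found at $bin_dir" >&2
        return 1
    fi
    PATH="$bin_dir:$PATH"
    export PATH
}

# Usage (placeholder version string):
#   use_dragen 4.4.4
```

    Because the function only prepends to PATH, it affects the current shell session and leaves the other installed versions untouched.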

    Notes on multi-version installation:

    • Installers released for DRAGEN v4.2 and earlier are single version packages

    • Single version packages and multi-version packages cannot be mixed

      • Installation of a prior single version package will remove all the multi-version packages

      • Installation of a multi-version package will remove any installed single version package

    Example:

    Location of dragen and resource files

    DRAGEN Version
    on-premises server
    cloud instance

    Throughout this guide we will refer to <INSTALL_PATH> which will be either of the locations above

    Licensing

    DRAGEN requires license(s) for most functionality, please refer to the for guidance on how to install and/or review your current licenses.

    Running the System Check

    After turning on the server, you can make sure that your DRAGEN server is functioning properly by running <INSTALL_PATH>/self_test/self_test.sh, which does the following:

    • Automatically indexes chromosome M from the hg19 reference genome

    • Loads the reference genome and index

    • Maps and aligns a set of reads

    • Saves the aligned reads in a BAM file

    Each server ships with the test input FASTQ data for this script, which is located in <INSTALL_PATH>/self_test. The system check takes approximately 25–30 minutes.

    The following example shows how to run the script and shows the output from a successful test.

    If the output BAM file does not match expected results, then the last line of the above text is as follows:

    SELF TEST RESULT : FAIL

    If you experience a FAIL result after running this test script immediately after turning on your DRAGEN server, contact Illumina Technical Support.
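    If the self-test output is captured to a file, the FAIL check can be scripted. The sketch below is an assumption-laden example: the self_test_failed function and the log file name are hypothetical, and the only string taken from this guide is the SELF TEST RESULT : FAIL line, which is expected on the last line of the output.

```shell
# Return success (0) when the last line of a captured self-test log
# reports a failure.
self_test_failed() {
    log_file=$1
    tail -n 1 "$log_file" | grep -q 'SELF TEST RESULT : FAIL'
}

# Usage (hypothetical log file name; replace <INSTALL_PATH> as described above):
#   <INSTALL_PATH>/self_test/self_test.sh > self_test.log 2>&1
#   if self_test_failed self_test.log; then
#       echo "self test failed; contact Illumina Technical Support" >&2
#   fi
```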

    Running Your Own Test

    When you are satisfied that your DRAGEN system is performing as expected, you are ready to run some of your own data through the machine, as follows:

    • Load the reference table for the reference genome

    • Determine location of input and output files

    • Process input data

    Loading the Reference Genome

    Before a reference genome can be used with DRAGEN, it must be converted from FASTA format into a custom binary format for use with the DRAGEN hardware. For more information, see .

    The reference hash table specified on the command line is automatically loaded onto the board the first time you process data with a pipeline. You can manually load the hash table for your reference genome by using the following command:

    dragen -r <reference_hash-table_directory>

    Make sure that the reference hash table directory is on the fast file IO drive.

    The default location for the hash table for hg19 is as follows.

    /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149

    The command to load reference genome hg19 from the default location is as follows.

    dragen -r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149

    This command loads the binary reference genome into memory on the DRAGEN board, where it is used for processing any number of input data sets. You do not need to reload the reference genome unless you restart the system or need to switch to a different reference genome. It can take up to a minute to load a reference genome.

    DRAGEN checks whether the specified reference genome is already resident on the board. If it is, then the upload of the reference genome is automatically skipped. You can force reloading of the same reference genome using the force-load-reference (-l) command line option.
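    The two ways of loading a reference can be wrapped in one helper, sketched below under the assumption that dragen is on the $PATH. The load_reference function and the DRY_RUN variable are hypothetical; the -r and -l options are the ones described above.

```shell
# Load a reference hash table onto the DRAGEN board.
# Pass "force" as the second argument to reload a reference that is
# already resident on the board.
load_reference() {
    ref_dir=$1
    mode=${2:-}
    if [ "$mode" = "force" ]; then
        set -- dragen -l -r "$ref_dir"   # -l is force-load-reference
    else
        set -- dragen -r "$ref_dir"
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"                        # print the command only
    else
        "$@"
    fi
}

# Usage:
#   load_reference /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149
#   load_reference /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 force
```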

    The command to load the reference genome prints the software and hardware versions to standard output. For example:

    After the reference genome has been loaded, the following message is printed to standard output:

    Determine Input and Output File Locations

    The DRAGEN Pipeline is very fast, which requires careful planning for the locations of the input and output files. If the input or output files are on a slow file system, then the overall performance of the system is limited by the throughput of that file system. It is recommended that inputs and outputs are streamed directly from/to a mounted external storage system.

    The DRAGEN system is preconfigured with at least one fast file system consisting of a set of fast SSD disks grouped with RAID-0 for performance. This file system is mounted at /staging. This name was chosen to emphasize the fact that this area was built to be large and fast, but is not redundant. Failure of any of the file system's constituent disks leads to the loss of all data stored there.

    During processing, DRAGEN generates and reads back temporary files. It is highly recommended to always direct temporary files to the fast SSD (or /staging) by using the --intermediate-results-dir option. If the --intermediate-results-dir option is not provided, temporary files are written to the --output-directory. Streaming inputs and outputs using a mounted external storage system is recommended.

    Process Your Input Data

    To analyze FASTQ data, use the dragen command. For example, the following command can be used to analyze a single-ended FASTQ file:

    For detailed information on the command line options, see .

    For recommended command lines in typical use cases, see .

    Custom Workflow

    Autolaunch requires additional BaseSpace Sequence Hub and sample sheet settings.

    BaseSpace Sequence Hub Requirements for ICA Autolaunch

    Autolaunch uses the BaseSpace Sequence Hub (BSSH) run planning tool to create and export a v2 format sample sheet to enable streaming of sequencing run data to the project and requires the following additional settings. See Figure 1 below.

    • Access to BaseSpace Sequence Hub.

    • ICA Run Storage is enabled under BaseSpace Sequence Hub settings.

    Refer to the BaseSpace Sequence Hub support site page for information on .

    Illumina Cloud Run Planning and Auto-Launch Workflow

    Autolaunch requires a v2 format sample sheet with specific parameters that instruct the BSSH project to automatically initiate a Heme pipeline analysis in ICA. Use the run planning option in BaseSpace Sequence Hub to generate the sample sheet. The exported sample sheet is automatically populated with the required fields. Using an invalid sample sheet can result in failed runs and analyses.

    Refer to Table 1 below for descriptions of the added fields. Enter the following required run parameters in BaseSpace Sequence Hub Run Planning:

    Parameter Name
    Setting

    For more information on run planning, refer to the

    Figure 1. BSSH Run Planning Enabled End to End Workflow

    The BaseSpace Sequence Hub setting for run monitoring and storage must be selected on the instrument to use Heme pipeline analysis Autolaunch. For information on preparing your instrument for Heme pipeline Autolaunch, refer to the documentation for your instrument.

    1. Use Run Planning in BaseSpace Sequence Hub to create and export a sample sheet.

    2. Import the sample sheet to the instrument and start the sequencing run. Data is uploaded to BaseSpace Sequence Hub and then pushed to ICA. You can monitor the run in BaseSpace Sequence Hub.

    3. When sequencing and the upload are complete, analysis autolaunches in ICA. You can monitor the status of the analysis in BaseSpace Sequence Hub or ICA.

    4. If necessary, requeue the analysis via the run's Summary page in BaseSpace Sequence Hub. Refer to the BaseSpace Sequence Hub support site page for more information on requeuing an analysis.

    Table 1. Additional Sample Sheet Fields for Autolaunch

    Autolaunch-compatible sample sheets contain the following fields specific to autolaunch configuration.

    Section
    Parameter
    Details
    Required

    Analysis Methods

    The Heme pipeline is a DNA-only analysis software based on the DRAGEN Secondary Analysis Software. Although it includes some of the default settings from the DNA Somatic Tumor-Only Heme WGS DRAGEN recipe, it uses a distinct recipe with different options. Users can override specific parameters via a custom configuration file.

    Figure 1. DRAGEN Variant Calling Workflow

    An example command is provided that highlights the inputs and outputs used in the DragenCaller step of the Heme pipeline; the command can also be found in the log file. Any parameter not displayed on the command line uses the default value for the DRAGEN variant caller module. The detailed parameters and default arguments for the individual modules within the DragenCaller step can be found in the replay.json output. See DRAGEN Command Line Options for detailed explanations of the parameters.

    Reference Genomes

    The Heme pipeline supports two reference genomes for the DRAGEN Map/Aligner: hg38 and hs37d5_chr.

    The hs37d5_chr genome is the hg19 reference genome with the Chromosome Y PAR masked. It includes the NC_012920 mitochondrial genome. The contigs have the chr prefix added, but without the native alternate loci names.

    DRAGEN Map/Aligner

    The DRAGEN Map/Aligner aligns sequencing reads derived from DNA libraries to a reference genome prior to variant calling.

    The pipeline currently does not support UMI libraries by default. Please use the to generate the collapsed BAM as input, if so desired.

    DRAGEN continues to use these final alignments as input for various variant calls such as gene amplification (copy number) calling, small variant calling (SNV, indel, MNV, delin), and DNA library quality control.

    Small Variant Calling and Filtering

    DRAGEN supports calling SNVs, indels, MNVs, and delins in tumor-only samples by using mapped and aligned DNA reads from a tumor sample as input. Variants are detected via both column-wise pileup analysis and local de novo assembly of haplotypes. The de novo haplotypes allow the detection of much larger insertions and deletions than is possible through column-wise pileup analysis alone. DRAGEN insertion and deletion calls are validated for lengths of 0–25 bp, and lengths of more than 25 bp can also be supported. DRAGEN also uses the de novo assembly to detect SNVs, insertions, and deletions that are co-phased and part of the same haplotype. Any such co-phased variants that fall within a window of 15 bp can then be reassembled into complex variants (MNVs and delins). The tumor-only pipeline produces a VCF file containing both germline and somatic variants, which can be further analyzed to identify tumor mutations. The pipeline makes no ploidy assumptions, enabling detection of low-frequency alleles.

    DRAGEN small variant calling includes the following steps:

    1. Detects regions with sufficient read coverage (callable regions).

    2. Detects regions where the reads deviate from the reference and there is a possibility of a germline or somatic call (active regions).

    3. Assembles de novo graph haplotypes from reads (haplotype assembly).

    4. Extracts possible somatic or germline calls (events) from column wise pileup analysis.

    Additional information is available at .

    Copy Number Variant Calling

    The DRAGEN copy number variant caller performs amplification, reference, and deletion calling for CNV targets within the assay. It counts the coverage of each target interval on the panel, uses a preprocessed panel of normal samples to normalize target counts, corrects for GC coverage bias, and calculates scores of a CNV event from observed coverage and makes copy number calls.

    Additional information is available at .

    Structural Variant Calling

    The DRAGEN Structural Variant (SV) Caller is described . The DUX4 rearrangement caller is described .

    Variant Deduplication

    The Variant Deduplication is described

    Contamination Detection

    The contamination analysis step detects foreign human DNA contamination using the SNP error file and pileup file that are generated during small variant calling, together with the TMB trace file. The software determines whether a sample has foreign DNA using the contamination score. In contaminated samples, the variant allele frequencies in SNPs shift from the expected values of 0%, 50%, or 100%. The algorithm collects all positions that overlap with common SNPs that have variant allele frequencies of < 25% or > 75%. Then, the algorithm computes the likelihood that the positions are an error or a real mutation. The contamination score is the sum of all the log likelihood scores across the predefined SNP positions that have a minor allele frequency < 25% in the sample and are not likely due to CNV events.

    The larger the contamination score, the more likely it is that there is foreign DNA contamination. A sample is considered contaminated if the contamination score is above a predefined quality threshold. The contamination score was found to be high in samples with highly rearranged genomes or HRD samples: 1% of HRD samples were found to be above the threshold with no evidence of actual contamination.

    Annotation

    The Illumina Annotation Engine performs annotation of small variants, CNVs, and exon-level CNVs. The inputs are gVCF files and the outputs are annotated JSON files.

    The Heme pipeline currently does not support annotation of gVCF files. Please use the to perform tertiary analysis.

    Tumor Mutational Burden

    Not Supported in the current release. Please use the .

    Microsatellite Instability Status

    Not supported in the current release. Please use the .

    Post Processing

    Post Processing is a reusable Nextflow component designed for executing various post-processing tasks at the end of pipeline execution. It can be used to enhance, transform, or modify outputs in the Logs_Intermediates and Results folders, making it versatile for addressing specific requirements.

    This component is highly configurable, supporting fine-tuned control of computational resources (CPU, memory), containerization, and output management. Users can integrate custom containers and scripts to implement their own logic for post-processing, all configured through parameters. Externalized process scripts allow for seamless execution of containerized processes.

    Key Features

    • Customizability: Easily adaptable to different post-processing requirements.

    • Reusability: Can be used in multiple pipelines, reducing development effort.

    • Data transformation: Can be used to transform or modify output data in various ways.

    What You Need

    1. A config file that contains the Post-Processing parameters and values

    2. A bash script that implements the desired functionality

    3. Any other custom resources/files required by the bash script

    4. A Docker container that has the dependencies needed to run the bash script

    Process

    1. Upload and configure

    2. Modify the config file: set postProcessing_container to the uploaded container

    3. Upload all the required files (config, script, resources) to a project directory, e.g., custom-resources, in ICA using the icav2 client.

    4. Configure ICA Web-UI on 'Start Analysis' Page:

    Config File - <file-name>.config

    Configurable Parameters in Config file

    Parameter
    Description

    Allowed values for postProcessing_cpusMemoryConfig in the config file

    Value
    Description

    Post-Processing : Sample Script (bam2cram.sh)

    A Post-Processing bash script has access to the paths and variables defined in the parent Nextflow process. In this case, directories and subdirectories such as {params.analysisDir}/Results and {params.analysisDir}/Logs_Intermediates can be accessed from the bash script. The output files generated should be stored in the {params.postProcessing.stepName} directory. Note: For BAM to CRAM conversion, genome.fa and .fai files must be uploaded to the custom resources directory.
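    As an illustration of what such a script might contain, the sketch below converts every BAM file under a results folder to CRAM. It is not the bam2cram.sh shipped with the pipeline. It assumes that samtools is available in the post-processing container, and the bam2cram function, its three arguments, and the DRY_RUN variable are hypothetical stand-ins for the values that the parent Nextflow process would provide.

```shell
# Convert all BAM files under a results folder to CRAM.
#   $1  results folder to scan (stand-in for the analysis Results folder)
#   $2  output folder (stand-in for the post-processing step directory)
#   $3  reference FASTA (genome.fa, with its .fai index beside it)
bam2cram() {
    results_dir=$1
    out_dir=$2
    ref_fasta=$3
    mkdir -p "$out_dir"
    find "$results_dir" -name '*.bam' | sort | while IFS= read -r bam; do
        cram="$out_dir/$(basename "${bam%.bam}").cram"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Print the command instead of running it.
            echo "samtools view -C -T $ref_fasta -o $cram $bam"
        else
            samtools view -C -T "$ref_fasta" -o "$cram" "$bam"
        fi
    done
}

# Usage (hypothetical paths):
#   bam2cram /path/to/analysis/Results /path/to/step_output /path/to/genome.fa
```

    The reference FASTA is passed with -T because CRAM encoding is reference-based, which is consistent with the requirement above to upload genome.fa and its .fai index.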

    Custom Workflow

    Autolaunch requires additional BaseSpace Sequence Hub and sample sheet settings.

    BaseSpace Sequence Hub Requirements for ICA Autolaunch

    Autolaunch uses the BaseSpace Sequence Hub (BSSH) run planning tool to create and export a v2 format sample sheet to enable streaming of sequencing run data to the project and requires the following additional settings. See Figure 1 below.

    • Access to BaseSpace Sequence Hub.

    • ICA Run Storage is enabled under BaseSpace Sequence Hub settings.

    Refer to the BaseSpace Sequence Hub support site page for information on .

    Illumina Cloud Run Planning and Auto-Launch Workflow

    Autolaunch requires a v2 format sample sheet with specific parameters that instruct the BSSH project to automatically initiate a pipeline analysis in ICA. Use the run planning option in BaseSpace Sequence Hub to generate the sample sheet. The exported sample sheet is automatically populated with the required fields. Using an invalid sample sheet can result in failed runs and analyses.

    Refer to Table 1 below for descriptions of the added fields. Enter the following required run parameters in BaseSpace Sequence Hub Run Planning:

    Parameter Name
    Setting

    For more information on run planning, refer to the

    Figure 1. BSSH Run Planning Enabled End to End Workflow

    The BaseSpace Sequence Hub setting for run monitoring and storage must be selected on the instrument to use pipeline analysis Autolaunch. For information on preparing your instrument for Autolaunch, refer to the documentation for your instrument.

    1. Use Run Planning in BaseSpace Sequence Hub to create and export a sample sheet.

    2. Import the sample sheet to the instrument and start the sequencing run. Data is uploaded to BaseSpace Sequence Hub and then pushed to ICA. You can monitor the run in BaseSpace Sequence Hub.

    3. When sequencing and the upload are complete, analysis autolaunches in ICA. You can monitor the status of the analysis in BaseSpace Sequence Hub or ICA.

    4. If necessary, requeue the analysis via the run's Summary page in BaseSpace Sequence Hub. Refer to the BaseSpace Sequence Hub support site page for more information on requeuing an analysis.

    Table 1. Additional Sample Sheet Fields for Autolaunch

    Autolaunch-compatible sample sheets contain the following fields specific to autolaunch configuration.

    Section
    Parameter
    Details
    Required

    Illumina scRNA

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.

    Notes and additional options

    Hashtable

    For DRAGEN RNA/scRNA runs, it is recommended to use the linear hashtable.

    See:

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using .

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    Single-cell RNA PIPseq options

    PIPseq mode is a batch option that automatically sets the barcode/BI source, the barcode and binning index positions, and the barcode sequence list options.

    By default the barcode/BI is read from read 1 and the transcript is obtained from read 2.

    To change the barcode or binning index positions, use --scrna-barcode-position and --scrna-umi-position. These settings should be provided in the form <startPos>_<endPos> for each barcode. Connect multiple barcode sequence positions with a '+'.

    For example, a library with the cell-barcode split into three blocks of 9 bp separated by fixed linker sequences and an 8 bp UMI would be set to: --scrna-barcode-position 0_8+21_29+43_51, and --scrna-umi-position 52_59.
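    The position format can be checked with a few lines of shell. The sketch below assumes that positions are 0-based and inclusive, which is consistent with the 9 bp blocks and the 8 bp UMI in the example above; the spec_length function is hypothetical and is not a DRAGEN tool.

```shell
# Print the total number of bases covered by a position spec such as
# "0_8+21_29+43_51" (blocks of <startPos>_<endPos> joined with '+').
spec_length() {
    spec=$1
    total=0
    old_ifs=$IFS
    IFS='+'                       # split the spec into its blocks
    for block in $spec; do
        start=${block%_*}         # text before the underscore
        end=${block#*_}           # text after the underscore
        total=$(( $total + $end - $start + 1 ))
    done
    IFS=$old_ifs
    echo "$total"
}

spec_length "0_8+21_29+43_51"     # three 9 bp barcode blocks: prints 27
spec_length "52_59"               # 8 bp UMI: prints 8
```

    Comparing these totals with the expected barcode and UMI lengths of the library is a quick way to catch off-by-one errors before launching a run.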

    The following table lists some optional settings:

    Option
    Description

    For more details on PIPseq pipeline options, refer to the

    Analysis Output

    When the analysis run completes, the software generates an analysis output in a folder named /staging/DRAGEN_Heme_WGS_Tumor_Only_Pipeline_{version}_Analysis_{datetimestamp}, unless a specific location is specified on the command line. In ICA, analysis output is listed in the Output section of the analysis, where the folder name is a combination of user reference, pipeline name, and a UUID. Within the analysis folder, each analysis step generates a subfolder within the Logs_Intermediates folder.

    Launching Analysis

    UI Options

    Manual Launch of Heme Pipeline Analysis

    To manually launch an analysis, configure a Heme pipeline analysis run in ICA with the following parameters.

    RNA Amplicon

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.

    DRAGEN Amplicon Pillar Panel Specific Settings

    To support the varied designs of amplicon panels and the specific requirements of different analysis types (e.g., SNV, CNV, SV, MSI, RNA fusion, RNA splice variants, and RNA 3'/5' imbalance ratio), panel-specific parameter settings have been integrated into the command-line options. Each supported Pillar panel has a dedicated option, and the details for these RNA panels are listed in the table below:

      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    # Mapper 
    --enable-rna true 
    --annotation-file $GTF                  #GTF or GFF3 format 
    --enable-map-align true                 #required for RNA/scRNA 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    # Single Cell 
    --enable-single-cell-rna true 
    --umi-source qname                      #default='qname' 
    --scrna-barcode-position $BARCODE_POS 
    --scrna-umi-position $UMI_POS           #see notes 
    --scrna-barcode-sequence-list $PATH     #optional 
    --single-cell-threshold ratio           #['fixed', 'ratio', 'inflection'] 
    --single-cell-threshold-filterby umi    #['umi', 'read'] 
    # FQ list Input 
    --fastq-list $PATH 
    --fastq-list-sample-id $STRING 
    # FQ Input 
    --fastq-file1 $PATH 
    --fastq-file2 $PATH 
    --RGSM $STRING 
    --RGID $STRING 
    # BAM Input 
    --bam-input $PATH 
    # CRAM Input 
    --cram-input $PATH 
    /opt/edico/bin/dragen \
    --ref-dir /staging/dragen-app-manager/resources/Illumina_hg38-alt_masked.cnv.hla.methyl_cg.methylated_combined.rna-11_r5.0-1 \
    --output-directory DragenCaller/Sample-001 \
    --output-file-prefix Sample-001 \
    --events-log-file DragenCaller/Sample-001/events.csv \
    --vc-systematic-noise=/staging/dragen-app-manager/resources/Illumina_heme-wgs-to-resources_4.4.4.2/snv/IDPF_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz  \
    --vc-enable-germline-tagging=true \
    --variant-annotation-data=/staging/dragen-app-manager/resources/Illumina_variant_annotation_data-tmb_annotations_4.4.4-1/tmb_annotations \
    --vc-germline-tag-hotspots=false \
    --logging-to-output-dir=true \
    --gc-metrics-enable=true \
    --enable-metrics-json=true \
    --enable-map-align=true  \
    --enable-sort=true \
    --enable-duplicate-marking=true \
    --enable-variant-caller=true \
    --heme-sv=true \
    --sv-systematic-noise=/staging/dragen-app-manager/resources/Illumina_heme-wgs-to-resources_4.4.4.2/sv/WGS_FF_Heme_hg38_v3.1.0_systematic_noise.sv.bedpe.gz \
    --heme-cnv=true \
    --cnv-population-b-allele-vcf=/staging/dragen-app-manager/resources/Illumina_heme-wgs-to-resources_4.4.4.2/cnv/hg38_1000G_phase1.snps.high_confidence.vcf.gz \
    --enable-variant-deduplication=true \
    --vc-output-evidence-bam=false \
    --qc-detect-contamination=true \
    --enable-dux4-caller=true \
    --max-base-quality=63 \
    --tumor-fastq-list Sample-001.fastq_list.csv \
    --tumor-fastq-list-sample-id Sample-001 \
    --force
    Note: The Post-Processing feature is available only in the ICA environment.
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    # Mapper 
    --enable-rna true 
    --annotation-file $GTF                  #GTF or GFF3 format 
    --enable-map-align true                 #required for RNA/scRNA 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    # Single Cell PIPseq 
    --scrna-enable-pipseq-mode true 
    --single-cell-threshold ratio           #['fixed', 'ratio', 'inflection'] 

    Input Directory

    The run folder or FASTQ folder that contains files to analyze.

    Input Type

    Select the input type for the analysis. Options include bcl, fastq, bam, and cram.

    Sample or Pair IDs

    Optional subset of Sample IDs or Pair IDs to analyze.

    Reference Genome

    Select the reference genome. hs37d5_chr is the hg19 reference genome with the Chromosome Y PAR masked. It includes the NC_012920 mitochondrial genome. The contigs have the chr prefix added, but without the native alternate loci names.

    Enable Ora Compression

    Enable Ora Compression (True or False). Only applicable when Input Type is bcl

    Enable Post Processing

    Enable Post Processing (True or False) to run custom scripts at the end of pipeline

    Storage Size

    The storage size to allocate for the analysis. The default and recommended value is Large.

    Custom Parameters Config File

    Optional. Select a Custom Parameters Config File that overrides the default config.

    Custom Resources Directory

    Optional. Select Custom Resources Directory to use with Custom Parameters Config File

    CAUTION - This parameter ...

    Optional. Configuration entries marked with this comment apply only to auto-launching the DRAGEN Solid WGS Tumor Normal analysis from FASTQs after BCL conversion. Do not set them when starting the analysis from the ICA UI.

    Input Directory

    The run folder or FASTQ folder that contains files to analyze.

    Input Type

    Select the input type that the analysis will run on. Options include bcl, fastq, bam, and cram.

    Sample or Pair IDs

    Optional subset of Sample IDs or Pair IDs to analyze.

    Reference Genome

    Select the reference genome. hs37d5_chr is the hg19 reference genome with the Chromosome Y PAR masked. It includes the NC_012920 mitochondrial genome. The contigs have the chr prefix added, but without the native alternate loci names.

    Enable Ora Compression

    Enable Ora Compression (True or False). Only applicable when Input Type is bcl

    Enable Post Processing

    Enable Post Processing (True or False) to run custom scripts at the end of pipeline

    Storage Size

    The storage size to allocate for the analysis. The default and recommended value is Large.

    Custom Parameters Config File

    Optional. Select a Custom Parameters Config File that overrides the default config.

    Custom Resources Directory

    Optional. Select Custom Resources Directory to use with Custom Parameters Config File

    CAUTION - This parameter ...

    Optional. Configuration entries marked with this comment apply only to auto-launching the DRAGEN Heme WGS Tumor Only analysis from FASTQs after BCL conversion. Do not set them when starting the analysis from the ICA UI.

    Illumina Connected Analytics help
    table below
    5 Base DNA Germline Panel UMI
    DNA Somatic Tumor-Only Heme WGS
    DNA Somatic Tumor-Only Solid WES
    DNA Somatic Tumor-Only Solid Panel
    5 Base DNA Somatic Tumor-Normal Solid WGS
    5 Base DNA Somatic Tumor-Normal Solid Panel
    5 Base DNA Somatic Tumor-Only Solid WGS
    5 Base DNA Somatic Tumor-Only Solid Panel
    DNA Somatic Tumor-Only Solid WES UMI
    DNA Somatic Tumor-Only Solid Panel UMI
    DNA Somatic Tumor-Only ctDNA Panel UMI
    5 Base DNA Somatic Tumor-Only Solid WGS UMI
    5 Base DNA Somatic Tumor-Only Solid Panel UMI
    5 Base DNA Somatic Tumor-Only ctDNA Panel UMI
    scRNA
    scRNA

  • Calibrates read base qualities to account for background noise.

  • Computes read likelihoods for each read/haplotype pair.

  • Performs mutation calling by summing the genotype probabilities across all read/haplotype pairs.

  • Performs additional filtering to improve variant calling accuracy, including using a systematic noise file. The systematic noise file indicates the statistical probability of noise at specific positions in the genome and is constructed from clean (normal) samples. Regions where noise is common (e.g., difficult-to-map regions) have higher noise values. The small variant caller penalizes those regions to reduce the probability of making false positive calls.

  • DNA alignment
    DRAGEN DNA Pipeline UMI recipe
    DRAGEN DNA Pipeline Small Variant Calling
    Illumina Connected Insights
    DNA Somatic Tumor-Only Heme WGS DRAGEN recipe

    After installing a multi-version package, see a list of installed versions at any time by running /usr/bin/dragen_versions

  • To remove any multi-version package, call yum remove on its Path

  • Adding PATH="/opt/dragen/{version}/bin:$PATH" to the last line of .bashrc file avoids the need to set the path upon each server login

  • Asserts that the alignments exactly match the expected results
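The PATH tip above can be sketched as a small helper that is safe to keep in .bashrc, because it does nothing when the directory is already on PATH. The function name add_dragen_to_path is hypothetical, and the version string is only a placeholder taken from the sample dragen_versions listing in this guide.

```shell
# Prepend /opt/dragen/<version>/bin to PATH unless it is already there.
# add_dragen_to_path is an illustrative helper, not part of DRAGEN.
add_dragen_to_path() {
    local version="$1"
    local bin_dir="/opt/dragen/${version}/bin"
    case ":${PATH}:" in
        *":${bin_dir}:"*) ;;                  # already on PATH, nothing to do
        *) PATH="${bin_dir}:${PATH}" ;;
    esac
    export PATH
}

# Placeholder version; substitute one reported by dragen_versions.
add_dragen_to_path "4.4.3"
```

Because the check is idempotent, opening nested shells or re-sourcing .bashrc does not grow PATH with duplicate entries.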

    4.3 and later

    /opt/dragen/{version}

    /opt/edico/

    4.2 and earlier

    /opt/edico/

    /opt/edico/

    Licensing Reference Section
    Prepare a Reference Genome
    DRAGEN Host Software
    DRAGEN Recipes
  • Enable post-processing: set 'Enable Post Processing' to 'true'.

  • Add 'Custom Parameters Config File', and set it to the filename uploaded to the custom-resource directory above

  • Add 'Custom Resources Directory', set it to the custom-resource directory above.

    postProcessing_container: Docker container URI. The container must be present in (uploaded to) ICA.

    postProcessing_cpusMemoryConfig: Compute option to use. Allowed values are listed below.

    postProcessing_shellScript: File name of the shell script.

    single_threaded_low_mem (default): CPUs: 2, Mem(GB): 8

    single_threaded_medium_mem: CPUs: 4, Mem(GB): 16

    single_threaded_high_mem: CPUs: 8, Mem(GB): 32

    multi_threaded_low_mem: CPUs: 16, Mem(GB): 64

    multi_threaded_medium_mem: CPUs: 32, Mem(GB): 128

    multi_threaded_high_mem: CPUs: 64, Mem(GB): 128
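When preparing a custom parameters config file, it can help to check the postProcessing_cpusMemoryConfig value before uploading it. The sketch below encodes the compute options listed above as a lookup. The function name post_processing_resources and its output format are assumptions made for illustration; the pipeline itself does not provide this helper.

```shell
# Print the CPU and memory allocation for a postProcessing_cpusMemoryConfig
# value, using the figures listed in this guide. Falls back to the documented
# default when no argument is given. Illustrative helper only.
post_processing_resources() {
    case "${1:-single_threaded_low_mem}" in
        single_threaded_low_mem)    echo "cpus=2 mem_gb=8" ;;
        single_threaded_medium_mem) echo "cpus=4 mem_gb=16" ;;
        single_threaded_high_mem)   echo "cpus=8 mem_gb=32" ;;
        multi_threaded_low_mem)     echo "cpus=16 mem_gb=64" ;;
        multi_threaded_medium_mem)  echo "cpus=32 mem_gb=128" ;;
        multi_threaded_high_mem)    echo "cpus=64 mem_gb=128" ;;
        *) echo "unknown postProcessing_cpusMemoryConfig: $1" >&2; return 1 ;;
    esac
}
```

A non-zero return status flags a misspelled option name locally, before the analysis is launched.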

    Custom Docker
    Nextflow Template

    --enable-map-align true

    Optionally disable map & align (default=true).

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --scrna-enable-pipseq-mode

    Option to enable PIPseq mode.

    --scrna-barcode-position

    See example above or refer to scRNA PIPseq

    --scrna-umi-position

    See example above or refer to scRNA PIPseq

    --single-cell-threshold

    Cell filtering can be set to ['fixed', 'ratio', or 'inflection'].

    --scrna-barcode-sequence-list

    A known barcode sequence list can be optionally provided.

    --umi-source

    Optionally override the default barcode/BI source. Valid options include ['read1', 'read2', 'qname', 'fastq'].

    Product Files
    BCL conversion
    scRNA PIPseq Pipeline User Guide

    Panel Code

    Sample Type

    Default variant caller enabled

    Command Line Options

    oncoReveal Heme

    Heme

    P-HFU-01

    RNA

    RNA fusion

    --amplicon-enable-rna-heme

    oncoReveal Fusion LBx

    Fusion LBx

    P-LBX-03

    cfRNA

    RNA fusion, RNA splice-variant

    --amplicon-enable-cfrna-lbxfusion

    oncoReveal Multi-Cancer RNA Fusion v2

    Multi-Cancer with Fusion

    SF-V2

    RNA

    RNA fusion, RNA splice-variant, RNA 3'/5' imbalance-ratio

    --amplicon-enable-rna-multicancer

    For more detail on the amplicon pipeline, please refer to DRAGEN Amplicon Pipeline
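As a convenience when scripting runs for several panels, the panel codes and command-line options from the table above can be kept in one lookup. This is a minimal sketch: the function name amplicon_flag_for_panel is hypothetical, and it returns only the option name as listed in the table, without assuming what value (if any) the option takes.

```shell
# Map a panel code from the table above to its amplicon command-line option.
# Illustrative helper; not part of the DRAGEN software.
amplicon_flag_for_panel() {
    case "$1" in
        P-HFU-01) echo "--amplicon-enable-rna-heme" ;;            # oncoReveal Heme
        P-LBX-03) echo "--amplicon-enable-cfrna-lbxfusion" ;;     # oncoReveal Fusion LBx
        SF-V2)    echo "--amplicon-enable-rna-multicancer" ;;     # oncoReveal Multi-Cancer RNA Fusion v2
        *) echo "unsupported panel code: $1" >&2; return 1 ;;
    esac
}
```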

    Notes and additional options

    Hashtable

    For DRAGEN RNA amplicon runs, it is recommended to use the linear hashtable.

    See: Product Files

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    --enable-map-align true

    Optionally disable map & align (default=true).

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    Duplicate Marking

    Option
    Description

    --enable-duplicate-marking false

    The Amplicon Pipeline disables duplicate marking. In amplicon assays, fragments originate from a limited number of unique start and end positions, making conventional duplicate detection inappropriate.

    RNA Fusion

    Option
    Description

    --rna-gf-enriched-regions $PATH

    Fusion calling parameters are automatically set in RNA amplicon mode but can be overridden in the command line. If fusion targets are not listed in the amplicon BED file, users can explicitly set it to a file containing fusion gene IDs or symbols.

    Panel Name

    Short Name

    $ dragen_versions
    The output format of this command may change. Use --json for machine readable output.
    
    Dragen Version           Size (MB)  Install Date         Path
    4.3.2                    1378.03    2024-03-10 18:26:17  /opt/dragen/4.3.2
    4.4.3                    1381.41    2024-03-18 20:56:39  /opt/dragen/4.4.3
    4.3.5                    1379.25    2024-03-11 15:20:24  /opt/dragen/4.3.5
    
    Bitstream Version        Size (MB)  Install Date         Path
    07.031.732 (0x18101306)  598.95     2024-03-10 18:26:03  /opt/bitstream/07.031.732
    07.031.745 (0x18101306)  598.95     2024-03-18 20:56:18  /opt/bitstream/07.031.745
     
    To remove a dragen version, call `yum remove` on its Path.
    $ /opt/dragen/4.3.4/self_test/self_test.sh
    #############################################################
    Logging to /var/log/dragen/self_test.1714627157_160164.0.details.log
    Using dragen executables in /opt/dragen/4.3.4/bin
    Using board(s): 0 
    #############################################################
    Running tests for board 0 (u200)
    Using scratch directory /tmp/self_test.4BO0pfPST9/0
    -------------------------------------------------------------
    Board 0 test 1, FPGA MEMORY TEST
    Loading DIAG bitstream
    Running fpga memory test, this will take ~13 minutes
    Board 0 test 1, FPGA MEMORY TEST: PASS
    -------------------------------------------------------------
    Board 0 test 2, BAR REGISTER ACCESS
    Board 0 test 2, BAR REGISTER ACCESS: PASS
    -------------------------------------------------------------
    Board 0 test 3, FPGA TEMP REG ACCESS
    FPGA Temperature: 27C  (Max Temp: 36C, Min Temp: 22C)
    Board 0 test 3, FPGA TEMP REG ACCESS: PASS
    -------------------------------------------------------------
    Board 0 test 4, BOARD SERIAL # REG ACCESS
    Serial Number: 2130069BM05V
    Board 0 test 4, BOARD SERIAL # REG ACCESS: PASS
    -------------------------------------------------------------
    Board 0 test 5, DRAGEN GENOME LICENSE
    Board 0 test 5, DRAGEN GENOME LICENSE: PASS
    -------------------------------------------------------------
    Board 0 test 6, CPLD DATE TEST
    cpld date is n/a
    Board 0 test 6, CPLD DATE TEST: PASS
    -------------------------------------------------------------
    Board 0 test 7, ENCRYPTION KEY EXISTENCE TEST
    Board 0 test 7, ENCRYPTION KEY EXISTENCE TEST: PASS
    -------------------------------------------------------------
    Board 0 test 8, PARTIAL RECONFIGURATION
    DNA-MAPPER: ok
    RNA-MAPPER: ok
    HMM: ok
    ZIP: ok
    UNZIP: ok
    DIAG: ok
    Board 0 test 8, PARTIAL RECONFIGURATION: PASS
    -------------------------------------------------------------
    Board 0 test 9, HASH TABLE GENERATION
    Board 0 test 9, HASH TABLE GENERATION: PASS
    -------------------------------------------------------------
    Board 0 test 10, MAP AND ALIGNER
    running mapper aligner: ok
    unmapped input records percentages: ok
    md5sum check dbam sorted: pass
    Board 0 test 10, MAP AND ALIGNER: PASS
    -------------------------------------------------------------
    Board 0 test 11, VARIANT CALLER E2E
    running variant caller: ok
    md5sum check dbam sorted: ok
    md5sum check VCF: ok
    Board 0 test 11, VARIANT CALLER E2E: PASS
    #############################################################
    SELF TEST COMPLETED
    SELF TEST RESULT : PASS
    #############################################################
    Log file at /var/log/dragen/self_test.1714627157_160164.0.details.log
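For unattended checks, the final result line of the self-test output shown above can be tested from a script. The sketch below assumes that the console output has been saved to a file (for example with tee); whether the details log under /var/log/dragen contains the same summary line is not confirmed here. The function name self_test_passed is hypothetical.

```shell
# Return 0 if a saved self-test transcript contains the PASS summary line,
# 1 if it does not, and 2 if the file cannot be read.
# Assumes the console output was captured to the given file.
self_test_passed() {
    local log_file="$1"
    [ -r "$log_file" ] || { echo "cannot read $log_file" >&2; return 2; }
    grep -q '^[[:space:]]*SELF TEST RESULT : PASS[[:space:]]*$' "$log_file"
}
```

A typical use is `self_test_passed saved_output.txt && echo "server ready"`, where saved_output.txt is whatever file the transcript was captured to.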
    
    DRAGEN Host Software Version 01.001.035.01.00.30.6682 and
    
    Bio-IT Processor Version 0x1001036
    DRAGEN finished normally
    dragen \
    -r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
    -1 /staging/test/data/SRA056922.fastq \
    --output-directory /staging/test/output \
    --output-file-prefix SRA056922_dragen \
    --RGID DRAGEN_RGID \
    --RGSM DRAGEN_RGSM
    
    postProcessing_container = '079623148045.dkr.ecr.us-east-1.amazonaws.com/cp-prod/0f7f12a0-a6c8-4289-86c3-3e5310b97275:latest'
    postProcessing_cpusMemoryConfig = 'single_threaded_low_mem'
    postProcessing_shellScript = 'bam2cram.sh'
    
    
    #========================================================#
    # This is a SAMPLE Script only for illustration purpose  #
    # Modify it, according to your specific Use Case         #
    #========================================================#
    
    #must create this folder to save output files
    mkdir -p "${params.postProcessing.stepName}"
    
    cd "${params.postProcessing.stepName}"
    
    #BAMs are located in the 'Results' folder of the analysis directory
    resultsdir="${params.analysisDir}/Results"
    #this file must be uploaded to custom-resources-dir
    genomefa="${params.customResourceDir}/genome.fa"
    
    sleep_interval=30 # seconds
    max_attempts=3
    
    #set sample ids
    sample_ids=("Mariner_1_Feasibility_Biosample_45-smoke" "sample_id_2")
    
    for sample_id in "\${sample_ids[@]}"; do
        counter=0
        while : ; do
            if [ "\$counter" -eq "\$max_attempts" ]; then
                echo "WARNING! \${sample_id}.bam was NOT found!"
                break
            fi
            counter=\$((counter + 1))
            bam_file=\$(find "\$resultsdir" -type f -name "\${sample_id}.bam")
            if [ -z "\$bam_file" ]; then
                echo "Attempt \$counter : Waiting for \${sample_id}.bam"
                sleep \$sleep_interval
            else
                #process and break
                filename=\$(basename -s .bam "\$bam_file")
                samtools view -C -T "\$genomefa" -o "./\$filename.cram" "\$bam_file"
                break
            fi
        done
    done
    
    exit 0
    --fastq-list $PATH 
    --fastq-list-sample-id $STRING 
    --fastq-file1 $PATH 
    --fastq-file2 $PATH 
    --RGSM $STRING 
    --RGID $STRING 
    --bam-input $PATH 
    --cram-input $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN pangenome hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
    --fastq-list-sample-id $STRING 
    # RNA amplicon 
    --enable-rna-amplicon true 
    --amplicon-target-bed $PATH 
    # Mapper 
    --enable-rna true 
    --annotation-file $GTF                  #GTF or GFF3 format 
    --enable-map-align true                 #required for RNA/scRNA 
    --enable-map-align-output true          #optionally save the output BAM 
    # RNA Splice Variants 
    --enable-rna-splice-variant true 
    # RNA Gene Fusions 
    --enable-rna-gene-fusion true 
    --rna-gf-enriched-regions $PATH         #see 'RNA Fusion' auto-generated from amplicon target bed
    # RNA 3'/5' imbalance-ratio             #optional for panels that support 3'/5' imbalance-ratio  
    --amplicon-enable-imbalance-ratio true  
    --fastq-list $PATH 
    --fastq-list-sample-id $STRING 
    --fastq-file1 $PATH 
    --fastq-file2 $PATH 
    --RGSM $STRING 
    --RGID $STRING 
    --bam-input $PATH 
    --cram-input $PATH 
  • View the analysis output results in either BaseSpace Sequence Hub or ICA.

  • - 1–70 characters.

    - Alphanumeric characters, underscores, dashes, and spaces. If you enter an underscore, dash, or space, enter an alphanumeric character before and after it.

    Cloud_Heme_Settings

    SoftwareVersion

    The Heme software version

    No

    StartsFromFastq

    Set the value to TRUE or FALSE. If autolaunching from BCL files, this must be set to FALSE.

    Yes

    Cloud_Data

    Sample_ID

    The same sample ID used in the Cloud_Heme_Data section.

    No

    ProjectName

    The BaseSpace Sequence Hub project name.

    No

    LibraryName

    Combination of sample ID and index values in the following format: sampleID_Index_Index2.

    No

    LibraryPrepKitName

    The Library Prep Kit used.

    No

    IndexAdapterKitName

    The Index Adapter Kit used.

    No

    Cloud_Settings

    GeneratedVersion

    The cloud GSS version used to create the sample sheet. Optional if manually updating a sample sheet.

    No

    CloudWorkflow

    ica_workflow_1

    Yes

    Cloud_Heme_Pipeline

    This value is a universal record number (URN). The valid value is defined in

    Yes

    Secondary Analysis

    BaseSpace Sequence Hub / Illumina Connected Analytics

    Application

    DRAGEN Heme App for Whole-genome Sequencing

    Cloud_Heme_Data

    Sample_ID

    The unique ID to identify a sample. Must match a Sample_ID used in the Heme_Data section.

    Yes

    Sample_Type

    Sample type.

    No

    Sample_Description

    Must meet the following requirements:

    setting up a BaseSpace Sequence Hub project
    run planning section
    BSSH enabled workflow

    No

  • View the analysis output results in either BaseSpace Sequence Hub or ICA.

  • - 1–70 characters.

    - Alphanumeric characters, underscores, dashes, and spaces. If you enter an underscore, dash, or space, enter an alphanumeric character before and after it.

    Cloud_TN_Settings

    SoftwareVersion

    The software version

    No

    StartsFromFastq

    Set the value to TRUE or FALSE. If autolaunching from BCL files, this must be set to FALSE.

    Yes

    Cloud_Data

    Sample_ID

    The same sample ID used in the Cloud_TN_Data section.

    No

    ProjectName

    The BaseSpace Sequence Hub project name.

    No

    LibraryName

    Combination of sample ID and index values in the following format: sampleID_Index_Index2.

    No

    LibraryPrepKitName

    The Library Prep Kit used.

    No

    IndexAdapterKitName

    The Index Adapter Kit used.

    No

    Cloud_Settings

    GeneratedVersion

    The cloud GSS version used to create the sample sheet. Optional if manually updating a sample sheet.

    No

    CloudWorkflow

    ica_workflow_1

    Yes

    Cloud_TN_Pipeline

    This value is a universal record number (URN). The valid value is defined in

    Yes

    Secondary Analysis

    BaseSpace Sequence Hub / Illumina Connected Analytics

    Application

    DRAGEN App for Whole-genome Sequencing

    Cloud_TN_Data

    Sample_ID

    The unique ID to identify a sample. Must match a Sample_ID used in the TN_Data section.

    Yes

    Sample_Type

    Sample type.

    No

    Sample_Description

    Must meet the following requirements:

    setting up a BaseSpace Sequence Hub project
    run planning section
    BSSH enabled workflow

    No
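The Sample_Description requirements listed above (1–70 characters, with an alphanumeric character before and after any underscore, dash, or space) can be checked before a sample sheet is submitted. The sketch below assumes that underscores, dashes, and spaces are all permitted as separators, which is how the second requirement reads. The function name valid_sample_description is hypothetical, and the pipeline's own validation may differ.

```shell
# Return 0 if the value is 1-70 characters long and consists of alphanumeric
# runs joined by single underscores, dashes, or spaces. This is one reading
# of the Sample_Description rules, not the pipeline's validator.
valid_sample_description() {
    local value="$1"
    [ "${#value}" -ge 1 ] || return 1
    [ "${#value}" -le 70 ] || return 1
    printf '%s' "$value" | grep -Eq '^[A-Za-z0-9]+([_ -][A-Za-z0-9]+)*$'
}
```

The pattern rejects leading, trailing, and doubled separators, because each separator must sit between two alphanumeric characters.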

    Output Folders

    This section describes each output folder generated during analysis and where to find metric and analytic files when the pipeline is executed.

    • 📂 Results - Contains the final result files from the pipeline.

      • 📄 MetricsOutput.tsv - Contains summary metrics for all samples.

        • 📂 Sample1

          • 📄 Sample1_MetricsOutput.tsv - Contains summary metrics for the specific sample.

          • 📄 Sample1.tumor.baf.bedgraph.gz - Contains the BED graph representation of the B-allele frequency (if available).

          • 📄 Sample1.sv.small_indel_dedup.filtered.vcf.gz - Contains DNA structural variants excluding the indels already present in the hard-filtered.vcf file after applying the DragenSvExtraFilters.

          • 📄 Sample1.hard-filtered.vcf.gz - Contains small variants VCF.

          • 📄 Sample1.cnv.vcf.gz - Contains copy number variants VCF.

    • 📂 Logs_Intermediates - Contains all intermediate files for each step of the pipeline.

      • 📂 SampleSheetValidation

      • 📂 ResourceVerification

      • 📂 RunQc (only when started from BCLs)

    • 📂 work - Contains Nextflow execution details for debugging purposes.

    • 📂 errors - Contains an Errors.tsv file if any pipeline analysis step failed.

    • 📄 SampleSheet.csv - User input sample sheet as provided.

    • 📄 pipeline_trace.txt - Contains Nextflow pipeline step execution status.

    • 📄 timeline_${timestamp}.html - Contains Nextflow pipeline task timeline information.

    • 📄 report_${timestamp}.html - Contains Nextflow pipeline task execution details.

    • 📄 receipt - Contains pipeline analysis CLI parameters and execution environment information.

    • 📄 payload.json - Contains pipeline analysis setup parameters and execution environment information.

    • 📄 nextflow.log - Contains Nextflow pipeline execution log.

    • 📄 analysis.log - Contains Nextflow pipeline standard output.
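After an analysis finishes, a quick check for the per-sample result files described above can catch incomplete output early. The sketch below assumes the per-sample files live in a folder named after the sample inside Results, as the listing suggests, and it skips the B-allele frequency file because that file is only produced when available. The function name missing_result_files is hypothetical.

```shell
# Print the path of every expected per-sample result file that is missing.
# Prints nothing when all files are present. The folder layout
# (Results/<sample>/<sample><suffix>) is an assumption based on the listing.
missing_result_files() {
    local results_dir="$1"
    local sample_id="$2"
    local suffix path
    for suffix in \
        "_MetricsOutput.tsv" \
        ".hard-filtered.vcf.gz" \
        ".cnv.vcf.gz" \
        ".sv.small_indel_dedup.filtered.vcf.gz"; do
        path="${results_dir}/${sample_id}/${sample_id}${suffix}"
        [ -f "$path" ] || echo "$path"
    done
}
```

For example, `missing_result_files /path/to/analysis/Results Sample1` lists what is absent for Sample1; an empty result means the four files were found.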

    File Overview

    This section describes the summary output files generated during analysis.

    Metrics Output

    File name: MetricsOutput.tsv

    The metrics output file is a final combined metrics report that provides sample status, key analysis metrics, and metadata in a tab-separated values (TSV) file. Sample metrics within the report indicate guideline-suggested lower specification limits (LSL) and upper specification limits (USL) for each sample in the run. One metrics output file is generated for the entire run. An additional file is generated for each sample.

    Run Metrics

    Run metrics from the analysis module indicate the quality of the sequencing run. Review the following metrics to assess run data quality:

    Metric
    Description
    Recommended Threshold

    PCT_Q30_R1

    Percentage of bases with a quality score ≥ 30 from Read 1.

    ≥ 80.0 (≥85.0 for NovaSeq X Plus)

    PCT_Q30_R2

    Percentage of bases with a quality score ≥ 30 from Read 2.

    ≥ 80.0 (≥85.0 for NovaSeq X Plus)

    The values in the Run Metrics section are listed as NA in the following situations:

    • The analysis was started from FASTQ files.

    • The analysis was started from BCL files, and the InterOp files are missing or corrupt.
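The run metric thresholds and the NA cases above can be applied in a script once a PCT_Q30_R1 or PCT_Q30_R2 value has been read from the metrics file. The sketch below takes the value directly as an argument, because the exact row and column layout of MetricsOutput.tsv is not described here. The function name q30_status and its output strings are hypothetical.

```shell
# Classify a PCT_Q30 value against a threshold (default 80.0; pass 85.0 for
# NovaSeq X Plus). NA is passed through, since run metrics are reported as NA
# when starting from FASTQ or when InterOp files are missing or corrupt.
q30_status() {
    local value="$1"
    local threshold="${2:-80.0}"
    if [ "$value" = "NA" ]; then
        echo "NA"
        return 0
    fi
    awk -v v="$value" -v t="$threshold" \
        'BEGIN { if (v + 0 >= t + 0) print "PASS"; else print "BELOW_THRESHOLD" }'
}
```

awk is used for the comparison because shell integer tests cannot compare decimal values.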

    Sample QC Metrics

    Review the following metrics to assess sample data quality:

    Metric (UOM)
    Recommended Threshold
    Description

    TUMOR_ESTIMATED_SAMPLE_CONTAMINATION (NA)

    NA

    The estimated fraction of reads in a sample that may be from another human source

    TUMOR_MAPPED_READS_PCT (%)

    NA

    Percent of mapped reads in the tumor sample

    TUMOR_INSERT_LENGTH_MEDIAN (count)

    NA

    Median insert length of tumor sample

    TUMOR_Q30_BASES_EXCL_DUPS_AND_CLIPPED_BASES (bp)

    NA

    General
    Parameter Name
    Description
    Required
    Default

    User Reference

    The custom name of the analysis for later identification.

    Yes

    Empty

    User Tags

    Tags for the analysis to help with categorization and identification, enhancing organization and searchability.

    No

    Empty

    Notification

    Add a user to be notified when the analysis completes.

    No

    Input Files

    Parameter Name
    Description
    Required
    Default

    Samplesheet

    The SampleSheet.csv for the analysis

    Yes

    SampleSheet.csv in Input Folder

    Input Directory

    The input folder that contains [bcl, fastq, bam, cram] files to analyze. Multiple input [fastq, bam, cram] folders can be specified.

    Yes

    No folder selected

    Custom Parameters Config File

    The custom parameters config file for the analysis.

    No

    Settings

    Parameter Name
    Description
    Required
    Default

    Input Type

    The type of files in the Input Folder(s): bcl, fastq, bam, cram.

    Yes

    bcl

    Reference Genome

    The reference genome used for the analysis: [hs37d5_chr, hg38].

    Yes

    hg38

    Enable Ora Compression

    Compress fastq files using ora compression. [Only applies when Input Type is bcl].

    No

    Resources

    Parameter Name
    Description
    Required
    Default

    Storage Size

    The storage size to allocate for the analysis. The minimum required value is Large.

    Yes

    Large

    Other available options for the storage size:

    • "1.2TB" if option selected is Small

    • "2.4TB" if option selected is Medium

    • "7.2TB" if option selected is Large

    • "16TB" if option selected is XLarge

    • "32TB" if option selected is 2XLarge

    • "64TB" if option selected is 3XLarge

    Note: Reserve storage of at least twice the size of the BCL run folder or of the input fastq.gz or bam files, four times the size of the cram files (a cram is 30-70% of the size of the bam), and eight times the size of the fastq.ora files (a fastq.ora is about 25% of the size of the fastq.gz).
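The sizing guidance above amounts to a multiplier per input type, followed by choosing a storage option that is large enough. The sketch below works in whole gigabytes and assumes 1 TB = 1000 GB. It never returns less than Large, since that is the stated minimum. Both function names are hypothetical.

```shell
# Recommended storage in GB for a given input type and input size in GB,
# using the multipliers from the note above.
recommended_storage_gb() {
    local input_type="$1"
    local input_gb="$2"
    local factor
    case "$input_type" in
        bcl|fastq.gz|bam) factor=2 ;;
        cram)             factor=4 ;;
        fastq.ora)        factor=8 ;;
        *) echo "unknown input type: $input_type" >&2; return 1 ;;
    esac
    echo $(( input_gb * factor ))
}

# Smallest listed storage option that covers the requirement, starting at
# Large (7.2TB). Capacities assume 1 TB = 1000 GB.
smallest_storage_option() {
    local needed_gb="$1"
    if   [ "$needed_gb" -le 7200 ];  then echo "Large"
    elif [ "$needed_gb" -le 16000 ]; then echo "XLarge"
    elif [ "$needed_gb" -le 32000 ]; then echo "2XLarge"
    elif [ "$needed_gb" -le 64000 ]; then echo "3XLarge"
    else echo "requirement exceeds 3XLarge" >&2; return 1
    fi
}
```

For instance, 1500 GB of fastq.ora input gives a recommendation of 12000 GB, which the second function maps to XLarge.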

    Using the icav2 client

    Customers can use the icav2 client to launch an analysis from the CLI. The supported parameters are listed in the Project Pipeline details under the XML configuration tab.

    XML Configuration Parameters

    For information about using pipelines, refer to the Illumina Connected Analytics documentation

    Sample Sheets

    Overview

    A sample sheet is required for each analysis with the pipeline. A sample sheet is a comma-separated values (*.csv) file used by Illumina instruments, platforms, and analysis pipelines to store settings and data for sequencing and analysis. The pipeline is compatible with the sample sheet v2. For general information on the sample sheet v2, refer to Illumina Connected Software Sample Sheet.

    A full sample sheet includes multiple sections, including a [BCLConvert_Settings] section with a list of samples and their index sequences, along with additional information required to run the pipeline in the [{app}_Data] section. For example, the Library Prep Kit is a required field in the sample sheet for the DRAGEN Heme WGS Tumor Only Pipeline. Both Illumina and custom library prep kits are supported.

    In contrast, the DRAGEN Solid WTS Tumor Normal Pipeline may require only a minimal sample sheet, with just a [Header] section and a [TN_Data] section, when starting the analysis from FASTQ. This partial sample sheet is not valid when starting analysis from a run folder.
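A quick structural check can confirm that a minimal sample sheet contains the two sections named above before an analysis is started from FASTQ. The sketch below only looks for the section headers at the start of a line and does not validate any fields. The function names has_section and is_minimal_tn_sheet are hypothetical.

```shell
# Return 0 if the sample sheet has a line starting with "[<section>]".
has_section() {
    local sheet="$1"
    local section="$2"
    grep -q "^\[${section}]" "$sheet"
}

# Return 0 if both sections of the minimal sheet described above are present.
is_minimal_tn_sheet() {
    has_section "$1" "Header" && has_section "$1" "TN_Data"
}
```

Passing this check does not make a sheet valid for starting from a run folder, which needs the full set of sections.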

    When running analysis on a standalone DRAGEN server or ICA, a valid sample sheet can be created by:

    • BaseSpace Run Planner (preferred), see for details.

    • Downloading and modifying a sample sheet template following the requirements, see for details.

    When running analysis on a standalone DRAGEN server or on ICA, a minimal sample sheet for starting from FASTQ, BAM or CRAM can be created by:

    • Modifying a sample sheet template following the requirements, see product specific templates for more information.

    Note: A minimal sample sheet may be invalid for other purposes. It is always advisable to use a valid sample sheet generated from the BaseSpace Run Planner.

    The Run Planning section of this guide is available for specific instructions to plan a run and set up a valid sample sheet for the pipeline when supported.

    New Sample Sheet options available in DRAGEN 4.4+ release

    Forward orientation for index2

    With the v2 sample sheet and DRAGEN 4.4+, users must specify index2 sequences in forward orientation only. For additional information, see .

    [BCLConvert_Settings]
    Required
    Description

    Summary of Valid Settings for Index Orientation

    As indicated in the following table, index2 is always specified in forward orientation for simplicity. The two new flags (RunInfoIndex2ReverseComplement and Index2ColumnReverseComplement) are especially useful when custom library prep kits (LPKs) are used and when a consistent index2 orientation is desired for all run folders. The IndexOrientation field is present in sample sheets generated by BaseSpace Run Planner and indicates that the sample sheet index2/i5 sequences are in forward orientation.

    Look up table for index2 orientations in DRAGEN 4.4+

    • Bcl-convert SoftwareVersion must be >=4.4.

    • * indicates the situation where the IsReverseComplement flag in the RunInfo.xml is overridden by the RunInfoIndex2ReverseComplement value. NA means that the IsReverseComplement flag for index2 is not present in the RunInfo.xml file.

    • ** indicates that legacy run folders may use the two paired flags to ensure that index2 Forward orientation is consistently applied.

    Instrument Type
    IndexOrientation
    RunInfoIndex2ReverseComplement
    Index2ColumnReverseComplement
    IsReverseComplement
    Index2 Orientation
    Condition
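When an existing sample sheet lists index2 sequences in reverse-complement orientation, they need to be converted before they can be entered in forward orientation. The sketch below shows one way to reverse-complement a sequence with standard tools. It assumes the rev utility is installed (it ships with util-linux on most Linux systems), and the function name reverse_complement is hypothetical. Whether a given sheet needs converting depends on the instrument and settings in the lookup table.

```shell
# Reverse-complement a DNA sequence: reverse the string, then swap
# A<->T and C<->G. N is left unchanged and case is preserved.
reverse_complement() {
    printf '%s\n' "$1" | rev | tr 'ACGTNacgtn' 'TGCANtgcan'
}
```

Applying the function twice returns the original sequence, which gives a simple sanity check on converted values.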

    Summary of Legacy Settings for index2 orientations

    For backward compatibility, when the specified bcl-convert version is lower than 4.4, the index2 orientation may vary depending on the instrument. In a sample sheet generated by BaseSpace Run Planner, the IndexOrientation field may still indicate Forward, but it is ignored in this situation.

    Look up table for index2 orientations in earlier DRAGEN versions

    • Bcl-convert SoftwareVersion must be <4.4.

    • *indicates the situation where the IsReverseComplement flag in the RunInfo.xml is different depending on the control software version.

    Instrument Type
    IsReverseComplement
    Index2 Orientation
    Condition

    DRAGEN Server App

    Installation Procedure on DRAGEN Server

    Downloader

    A separate lightweight downloader for Windows, macOS, and Linux operating systems is available at the DRAGEN Installer Download Site.

    Choose the downloader appropriate for your platform. When executed, it prompts you for a path to download the assets to. The required software packages are downloaded into the dragen_pipelines directory under that path. If the path was used for a previous execution of the downloader, any incomplete downloads are resumed, existing files are checksummed, and any files with invalid checksums are re-downloaded.

    The downloaded directory content may be moved to the installation target DRAGEN server using a USB key with at least 128 GB of free space or by copying to Network Storage which is reachable from the target DRAGEN Server.

    Additional download information is available at the download site.

    Downloader System Requirements

    Downloader Name
    System Requirements

    Expected downloaded content

    • 📂 dragen_pipelines

      • dragen-app-manager-1.0.14-1.x86_64-el8-offline.run

      • README

      • 📂 Solid_WGS_TN_4.4.4.53

    Installer

    Installation Requirements

    DRAGEN and DRAGEN Application Manager

    The pipeline requires DRAGEN v4.4.4 or higher. If this version (or higher) is not present when the app is installed, the installer installs it.

    The pipeline also requires DRAGEN Application Manager to be installed, and an installer is included. DRAGEN Application Manager configuration is controlled by the config.toml file located in /etc/dragen-app-manager directory. See for additional information.

    Minimum System Operating Requirements

    Hardware

    • v3 DRAGEN server or v4 DRAGEN server

    • mkfifo is enabled on the network-attached storage (NAS).

    Software

    The software installed by default on the DRAGEN server includes the following items:

    • DRAGEN server software. Refer to sample sheet settings for the DRAGEN version number.

    • Oracle Linux 8

    Storage

    • DRAGEN server v3 provides a 6.4 TB NVMe SSD. This SSD is located at the /staging directory and is suitable for storing only one or two runs of the analysis pipeline.

    • DRAGEN server v4 provides 12.8 TB via a 2 x 6.4 TB NVMe U.2 SSD configuration.

    • Consider the following when making data storage decisions.

      • A NovaSeq 6000 sequencing run that uses an S4 flow cell can produce up to 3 TB of output.

      • The pipeline can produce an additional 4-6 TB of analysis output. For optimal performance when writing to a non-default directory, specify an analysis folder location on /staging. This ensures that the DRAGEN-related processes read and write data to the DRAGEN server's high-speed NVMe SSD.

    Installation Instructions

    • Installing the pipeline requires root privileges.

    • Contact Illumina Customer Care to request a link to the Downloader or visit and confirm that the Genome DRAGEN license is enabled for your server.

      • Follow the instructions for DRAGEN license installation provided by Illumina Customer Care or refer to the DRAGEN server documentation.

    • Copy the directory structure from the downloader directory to the target DRAGEN server (or a path accessible with sudo privileges)

    Run Self-Test Script

    The self-test script, present after app installation, checks the following functions:

    • All required services are running.

    • All resources are in place.

    • The analysis workflow image can be launched.

    • The pipeline can run successfully on a test dataset.

    To run the self-test script, execute: /usr/local/bin/check_Solid_WGS_TN_{version}.sh

    If the self-test prints a failure message, contact Illumina Technical Support, and provide the output file found in /staging/check_Solid_WGS_TN_{version}_{datetimestamp}.tgz.

    When running an analysis on the DRAGEN server via SSH, Illumina recommends that you use a terminal multiplexer utility, which allows you to resume analysis in the event of a disconnection from the DRAGEN server.

    Uninstall pipeline

    To uninstall the pipeline, run the following command as the root user (or with sudo privileges): /usr/local/bin/uninstall_Solid_WGS_TN_{version}.sh

    Executing the uninstall script removes the following assets:

    • All scripts, including:

      • run_Solid_WGS_TN_{version}.sh

      • check_Solid_WGS_TN_{version}.sh

      • uninstall_Solid_WGS_TN_{version}.sh

    If the uninstall script is executed with the -r or --removeResources flag, dependencies of the application being uninstalled will be removed if no other applications depend on them.

    You are not required to uninstall DRAGEN Application Manager, Docker, or the DRAGEN server software.

    To remove Docker, review the install instructions for your operating system in the Docker documentation

    Custom Config Support

    Local App Setup

    Overview

    This document describes how to use the Custom Configuration Support feature for the pipeline software. This feature allows users to customize a specific set of DRAGEN command-line options to override the default values pre-defined in the pipeline.

    Customization with customConfig and customResourceDir

    Users can customize pipeline behavior and file inputs using:

    • --customConfig : path to a custom configuration file listing customized parameter values.

    • --customResourceDir : path to a directory containing custom resource files.

    Both options should be used together if file-based overrides are required.

    Important note for using File Parameters

    • For file parameters (parameters that require a file), users must specify relative paths in the customConfig file. The software will join customResourceDir and the relative path to form the full file path.

    • Additionally, the value assigned to a file parameter must be enclosed in single quotes ('').
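The file-parameter rules above can be illustrated with a minimal Python sketch. It is not the pipeline's actual implementation: the function names `parse_custom_config` and `resolve_file_param` and the `/data/...` directory are hypothetical, the parser assumes the simple `key = value` layout with `#` comments, and dropping a leading slash before joining is an assumption made only because the example values in this guide begin with `/`.

```python
import posixpath

def parse_custom_config(text):
    """Parse 'key = value' lines; skip blank lines and '#' comments."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'key = value', got: {raw!r}")
        params[key.strip()] = value.strip()
    return params

def resolve_file_param(custom_resource_dir, quoted_value):
    """Join customResourceDir with a single-quoted relative path."""
    if len(quoted_value) < 2 or quoted_value[0] != "'" or quoted_value[-1] != "'":
        raise ValueError("file parameter values must be enclosed in single quotes")
    # Assumption: a leading slash is dropped so the path stays relative.
    relative = quoted_value[1:-1].lstrip("/")
    return posixpath.join(custom_resource_dir, relative)

config = parse_custom_config("""
# custom parameters
qc_detect_contamination = true
vc_somatic_hotspots = '/snv/somatic_hotspots_GRCh38.vcf.gz'
""")

# Hypothetical resource directory, used only for illustration.
hotspots = resolve_file_param("/data/custom_resources_Heme_dir",
                              config["vc_somatic_hotspots"])
print(hotspots)
```

A check like this can be run before launching the pipeline to confirm that every file parameter resolves to a file that exists under the custom resource directory.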

    Examples

    Command Line

    heme_custom_param.config Content

    custom_resources_Heme_dir Folder Structure

    customConfig Template (with default value)

    Supported Parameters

    Display Name
    Parameter Name
    Component
    Allowed Values
    Default Value
    Optional

    ℹ️ Note: For CRAM Input Reference Genome, a list of commonly used human reference FASTA files can be downloaded from the Illumina support site: Illumina DRAGEN Product Files

    5 Base DNA Germline Panel

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.

    Notes and additional options

    Hashtable

    For DRAGEN germline runs, it is recommended to use the pangenome hashtable.

    See: Product Files

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    Duplicate Marking

    Option
    Description

    5-Base Methylation

    Option
    Description

    For more information see: 5-Base Pipeline.

    SNV

    Option
    Description

    For more detail on the small variant caller in somatic mode please refer to Somatic Mode

    Annotation

    For instructions on how to download the Nirvana annotation database, please refer to Nirvana

    Launching Analysis

    UI Options

    Manual Launch of Pipeline Analysis

    To manually launch an analysis, configure a pipeline analysis run in ICA with the following parameters.

    General

    Parameter Name
    Description
    Required
    Default

    Input Files

    Parameter Name
    Description
    Required
    Default

    Settings

    Parameter Name
    Description
    Required
    Default

    Resources

    Parameter Name
    Description
    Required
    Default

    Other available options for the storage size:

    • "1.2TB" if option selected is Small

    • "2.4TB" if option selected is Medium

    • "7.2TB" if option selected is Large

    • "16TB" if option selected is XLarge

    Note: It is recommended to reserve storage of twice the size of the BCL run folder or of the input fastq.gz or bam files, four times the size of cram input (a cram file is 30-70% of the size of the bam), and eight times the size of fastq.ora input (a fastq.ora file is about 25% of the size of the fastq.gz).
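As a rough worked example of the sizing note above, the following Python sketch multiplies the input size by the stated factor and then picks the smallest listed Storage Size option that fits, never going below Large (the stated minimum). The function names are hypothetical and the tier list is copied from the options in this guide; treat the result as a starting estimate rather than a guarantee.

```python
# Multipliers from the storage note, keyed by input type.
MULTIPLIER = {"bcl": 2, "fastq.gz": 2, "bam": 2, "cram": 4, "fastq.ora": 8}

# Storage Size options listed in this guide, in TB, smallest first.
TIERS_TB = [("Small", 1.2), ("Medium", 2.4), ("Large", 7.2),
            ("XLarge", 16), ("2XLarge", 32), ("3XLarge", 64)]

def recommended_storage_tb(input_type, input_size_tb):
    """Return the recommended reservation in TB for the given input."""
    return MULTIPLIER[input_type] * input_size_tb

def smallest_tier(required_tb, minimum="Large"):
    """Return the smallest tier at or above `minimum` that holds required_tb."""
    names = [name for name, _ in TIERS_TB]
    for name, size_tb in TIERS_TB[names.index(minimum):]:
        if size_tb >= required_tb:
            return name
    raise ValueError(f"{required_tb} TB exceeds the largest storage option")

need = recommended_storage_tb("cram", 1.5)   # 4 x 1.5 TB
print(need, smallest_tier(need))
```

For instance, 1.5 TB of cram input gives a 6 TB reservation, which fits in Large, while 1 TB of fastq.ora input gives 8 TB and needs XLarge.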

    For information about using pipelines, refer to the Illumina Connected Analytics documentation.

    RNA Panel

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.

    Notes and additional options

    Hashtable

    For DRAGEN RNA/scRNA runs, it is recommended to use the linear hashtable.

    See: Product Files

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    Duplicate Marking

    Option
    Description

    RNA Variant Calling

    Option
    Description

    RNA Quant

    Option
    Description

    RNA Splice

    Option
    Description

    RNA Fusion

    Option
    Description

    RNA Amplicon

    To enable RNA amplicon, set:

    • --enable-rna-amplicon true, and

    • --amplicon-target-bed $PATH.

    If RNA amplicon mode is enabled and the amplicon BED file already includes the gene name, it is not necessary to set the ENRICH options, because DRAGEN reads the enriched gene names from the fifth column of the amplicon BED file.
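To confirm that an amplicon BED file carries gene names in its fifth column before relying on this behavior, a small check along the following lines may help. This is a sketch, not DRAGEN's parser: the function name, the sample records, and the meaning assigned to the fourth column are assumptions for illustration.

```python
def gene_names_from_bed(bed_text):
    """Collect unique fifth-column values from tab-separated BED records."""
    genes = []
    for line in bed_text.splitlines():
        # Skip blank lines and common BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) >= 5 and fields[4] and fields[4] not in genes:
            genes.append(fields[4])
    return genes

# Hypothetical records: contig, start, end, amplicon name, gene name.
bed = (
    "chr1\t100\t200\tamp1\tGENE_A\n"
    "chr1\t300\t400\tamp2\tGENE_A\n"
    "chr2\t10\t50\tamp3\tGENE_B\n"
    "chr3\t1\t2\tamp4\n"          # no fifth column, so no gene name
)
genes = gene_names_from_bed(bed)
print(genes)
```

Records that lack a fifth column contribute no gene name, which is a quick way to spot BED files that still need the ENRICH options.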

    Sample Sheet Creation in BaseSpace

    How to Create Sample Sheets in BaseSpace Run Planning tool

    The BaseSpace Sequence Hub Run Planning tool generates a valid sample sheet in v2 format for use on a supported sequencer, for both the ICA and standalone DRAGEN server analysis options. Filling out the form in the user interface produces an exportable sample sheet with the required fields filled in. Refer to Requirements for descriptions of fields that appear in ICA sample sheets.

    The sections below represent each step in the BaseSpace Run Planning tool.

    Note that NovaSeq X Series has a different run setup configuration screen than other instrument platforms. The software supports multi-analysis; to complete run setup on NovaSeq X Series, enter the appropriate Read 1, Read 2, Index 1, and Index 2 values described in the instructions below.

    5 Base DNA Germline WGS

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.

    Notes and additional options

    Hashtable

    DNA Germline Panel

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.

    Notes and additional options

    Hashtable

      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN pangenome hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
    --fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    --enable-duplicate-marking true         #default=true 
    # 5-Base 
    --methylation-conversion illumina 
    --methylation-generate-cytosine-report true 
    --methylation-compress-cx-report true 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-target-bed $VC_TARGET_BED 
    # Annotation 
    --variant-annotation-data $PATH 
    --enable-variant-annotation true 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable (recommended for RNA/scRNA) 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
    --fastq-list-sample-id $STRING 
    # Mapper 
    --enable-rna true 
    --annotation-file $GTF                  #GTF or GFF3 format 
    --enable-map-align true                 #required for RNA/scRNA 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    --enable-duplicate-marking true         #default=true 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-target-bed $VC_TARGET_BED 
    # RNA Quantification 
    --enable-rna-quantification true 
    --rna-library-type A                    #see 'RNA Quant' 
    --rna-quantification-gc-bias true 
    # RNA Splice Variants 
    --enable-rna-splice-variant true 
    --rna-splice-variant-regions $PATH 
    # RNA Gene Fusions 
    --enable-rna-gene-fusion true 
    --rna-gf-enriched-regions $PATH         #see 'RNA Fusion' 

    📂 FastqGeneration (only when started from BCLs)

  • 📂 FastqValidation

  • 📂 DragenCaller

  • 📂 AdditionalSarjMetrics

  • 📂 SampleAnalysisResults

  • 📂 MetricsOutput

  • 📂 DragenSvExtraFilters

  • 📄 passing_sample_steps.json

  • Bases with a Phred quality score of 30 or higher, excluding duplicated reads and clipped bases

    AVERAGE_AUTOSOMAL_COVERAGE_OVER_GENOME (count)

    NA

    Average coverage or sequencing depth across the autosomes (chromosomes 1-22)

    GC_NORMALIZED_COVERAGE_AT_GCS_20_39 (count)

    NA

    Normalized sequencing coverage in genomic regions with GC content between 20% and 39%

    GC_NORMALIZED_COVERAGE_AT_GCS_60_79 (count)

    NA

    Normalized sequencing coverage in genomic regions with GC content between 60% and 79%

    Quick Start

    Parameter Name | Description | Required | Default
    Output Folder | The path to the analysis output folder. | No | Project output folder
    Custom Resources Directory | The custom resources directory used for the analysis. | No | No folder selected
    Enable Post Processing | Use the post-processing scripts at the end of the pipeline analysis. | No | false
    Sample IDs | Optional subset of Sample IDs or Pair IDs to analyze. A comma-separated list. | No | Empty

    Forward

    Y**

    N**

    NA

    Forward

    When SbsConsumableVersion >=3

    NovaSeq 6000Dx

    Forward

    Y

    N

    Y

    Forward

    When non-SP flow cell is used

    Forward

    Y

    N

    N*

    Forward

    When SP flow cell is used and control software is <2.4

    NovaSeq X

    Forward

    Y

    N

    N

    Forward

    When non-SP flow cell is used

    Y*

    Forward

    When SP flow cell is used and control software is >2.4

    N*

    Reverse

    When SP flow cell is used and control software is <2.4

    NovaSeq X

    Y

    Forward

    SoftwareVersion

    Required

    If SoftwareVersion >=4.4, index2 orientation must be forward; otherwise, legacy behavior is supported.

    RunInfoIndex2ReverseComplement

    Optional

    Allowed values: Y/N. If SoftwareVersion >=4.4, this setting must be present together with Index2ColumnReverseComplement. In case of conflict, this value overrides the RunInfo.xml isReverseComplement = Y/N flag for index2 orientation.

    Index2ColumnReverseComplement

    Optional

    Allowed values: Y/N. If SoftwareVersion >=4.4, this setting must be present together with RunInfoIndex2ReverseComplement. This value indicates whether the index2 column sequence is reverse complemented.
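The pairing rule for the two index2 settings can be sketched as a pre-flight check on a sample sheet. The sketch below is illustrative only: the function name and the dictionary representation of the settings are assumptions, and it reads "paired presence" as meaning that both keys must be present when either one is used.

```python
def check_index2_settings(settings):
    """Return a list of problems found in a dict of sample sheet settings."""
    version = settings.get("SoftwareVersion")
    if version is None:
        return ["SoftwareVersion is required"]

    problems = []
    keys = ("RunInfoIndex2ReverseComplement", "Index2ColumnReverseComplement")
    present = [k for k in keys if k in settings]

    # Each setting that is present must be Y or N.
    for key in present:
        if settings[key] not in ("Y", "N"):
            problems.append(f"{key} must be Y or N")

    # From 4.4 onward, one setting without the other is rejected.
    major_minor = tuple(int(part) for part in version.split(".")[:2])
    if major_minor >= (4, 4) and len(present) == 1:
        missing = next(k for k in keys if k not in settings)
        problems.append(f"{missing} must be set together with {present[0]}")
    return problems

result = check_index2_settings({
    "SoftwareVersion": "4.4.4",
    "RunInfoIndex2ReverseComplement": "Y",
})
print(result)
```

An empty list means the settings satisfy the rules as read here; any other result lists what to correct before starting the run.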

    NovaSeq 6000

    Forward

    N**

    N**

    NA

    Forward

    When SbsConsumableVersion <3

    NovaSeq 6000

    NA

    Forward

    When SbsConsumableVersion <3

    NA

    Reverse

    When SbsConsumableVersion >=3

    NovaSeq 6000Dx

    Y

    BaseSpace Run Planner
    Requirements
    Index Orientation Guide

    Forward

  • "32TB" if option selected is 2XLarge

  • "64TB" if option selected is 3XLarge

    Parameter Name | Description | Required | Default
    User Reference | The custom name of the analysis for later identification. | Yes | Empty
    User Tags | Tags for the analysis to help with categorization and identification, enhancing organization and searchability. | No | Empty
    Notification | Add a user to be notified when the analysis completes. | No | No user selected
    Samplesheet | The SampleSheet.csv for the analysis. | Yes | SampleSheet.csv in Input Folder
    Input Directory | The input folder that contains [bcl, fastq, bam, cram] files to analyze. Multiple input [fastq, bam, cram] folders can be specified. | Yes | No folder selected
    Custom Parameters Config File | The custom parameters config file for the analysis. | No | No file selected
    Input Type | The type of files in the Input Folder(s): bcl, fastq, bam, cram. | Yes | bcl
    Reference Genome | The reference genome used for the analysis: [hs37d5_chr, hg38]. | Yes | hg38
    Enable Ora Compression | Compress fastq files using ora compression. [Only applies when Input Type is bcl]. | No | true
    Storage Size | The storage size to allocate for the analysis. The minimum required value is Large. | Yes | Large

  • install_Solid_WGS_TN_v4.4.4.53.run

  • Solid_WGS_TN_4.4.4.53.iapp

  • README

  • 📂 common

    • solid-wgs-tn-resources_4.4.4.2.ires

    • dpf-core_1.0.0.36.ires

    • dpf-templates_4.4.4.52.ires

    • dpf-docker-images_4.4.4.52.ires

    • dragen-4.4.4-12.multi.el8.x86_64.run

    • hg38-alt_masked.cnv.graph.hla.methyl_cg.rna-11-r5.0-1.ires

    • hg38-alt_masked.cnv.hla.methyl_cg.methylated_combined.rna-11-r5.0-1.ires

    • hs37d5_chr-cnv.graph.hla.methyl_cg.rna-11-r5.0-1.ires

    • hs37d5_chr-cnv.hla.methyl_cg.methylated_combined.rna-11-r5.0-1.ires

    • variant_annotation_data-tmb_annotations-4.4.4-1.ires

  • Network-attached storage is required for long-term storage of sequencing runs and pipeline output.

  • Managing data storage is your responsibility.

    • Illumina recommends developing a strategy to copy data from the DRAGEN server to network-attached storage.

    • Delete output data on the DRAGEN server as soon as possible. For additional information on data output and storage, refer to Illumina Instrument Control Computer Security and Networking.

  • Ensure the installer has the correct privileges by running chmod +x install_Solid_WGS_TN_v{version}.run

  • Launch the installer with root privileges sudo /path/to/install_Solid_WGS_TN_v{version}.run

    • If DRAGEN Application Manager is not already installed, the installer will exit and direct you to the path to the DRAGEN Application Manager installer

  • The application installed under DRAGEN Application Manager

    Solid_WGS_TN_{version}_Downloader_unix

    x86_64 platform with glibc 2.25+

    Solid_WGS_TN_{version}_Downloader_mac

    arm64 macOS

    Solid_WGS_TN_{version}_Downloader_windows.exe

    64-bit Windows 10+

    DRAGEN Resource Files
    DRAGEN Application Manager
    DRAGEN Installer Download Site

    Display Name | Parameter Name | Component | Allowed Values | Default Value | Optional
    CRAM Input Reference Genome | cram_reference | Mapper | file | included | Yes
    Aligner Clip Paired End Reads Overhang | aligner_clip_pe_overhang | Mapper | 0,1,2 | 0 | Yes
    Enable Map Align | enable_map_align | Mapper | true / false | true | Yes
    SV Somatic Hotspot BED File | sv_somatic_ins_tandup_hotspot_regions_bed | Structural VC | file | included | Yes
    SV Systematic Noise File | sv_systematic_noise | Structural VC | file | included | Yes
    Output SNV Evidence BAM | vc_output_evidence_bam | Debug | true / false | false | Yes
    QC Detect Contamination | qc_detect_contamination | QC | true / false | true | Yes
    VC Systematic Noise File | vc_systematic_noise | Variant Caller | file | included | Yes
    VC Somatic Hotspots File | vc_somatic_hotspots | Variant Caller | file | included | Yes

    Illumina DRAGEN Product Files

    --methylation-compress-cx-report true

    Set to true to enable compression of the CX_report (default=true).

    --methylation-keep-ref-cytosine true

    Set to true to keep all reference cytosines in the CX_report file, even if they don't appear in the input reads (default=false).

    --enable-cpg-methylated-mapping true

    Enable methylated mapping with base conversions restricted to CpG context (default=true). When false, runs DRAGEN Methylation 3-base map/align instead.

    --methylation-report-to-vcf

    Specify methylation type (none, cg, or c) which is reported in VCF files (default=c).

    --methylation-report-to-gvcf

    Specify methylation type (none, cg, or c) which is reported in gVCF files (default=cg).

    --enable-map-align true

    Optionally disable map & align (default=true).

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    --enable-duplicate-marking true

    By default, DRAGEN marks duplicate reads and excludes them from variant calling.

    --enable-positional-collapsing true

    Alternative to --enable-duplicate-marking=true. Instead of discarding duplicate reads, DRAGEN can optionally perform positional collapsing, merging them into higher-quality consensus reads. This is beneficial for small panels without UMIs and coverage between 300X and 1000X. However, it's slower than standard duplicate marking and less effective on samples with coverage lower than 300X. For very high coverage (1000X+), avoid it due to potential read collisions. For high-sensitivity panels with 1000X+ coverage, consider using UMIs.

    --methylation-conversion STRING

    Library conversion for methylation analysis. Options: none, c_t, mc_t, illumina (default=none).

    --methylation-protocol STRING

    Library protocol for methylation analysis. Options: none, directional, non-directional, directional-complement, pbat. The default value for methylation-conversion=illumina is directional, otherwise it is none.

    --methylation-mapq-threshold INT

    Only reads with MAPQ greater than or equal to the threshold are included in methyl-seq analysis (default=0).

    --methylation-generate-mbias-report true

    Whether to generate a per-sequencer-cycle methylation bias report (default=true).

    --mbias-report-include-overlaps

    Calculate methylation stats for overlapping bases between mates (default=false).

    --methylation-generate-cytosine-report true

    Whether to generate a genome-wide cytosine methylation CX_report file (default=false).

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    Product Files
    BCL conversion
    5-Base Pipeline
    Somatic Mode
    Nirvana

    --enable-map-align true

    Optionally disable map & align (default=true).

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --enable-duplicate-marking true

    By default, DRAGEN marks duplicate reads and excludes them from variant calling.

    --enable-positional-collapsing true

    Alternative to --enable-duplicate-marking=true. Instead of discarding duplicate reads, DRAGEN can optionally perform positional collapsing, merging them into higher-quality consensus reads. This is beneficial for small panels without UMIs and coverage between 300X and 1000X. However, it's slower than standard duplicate marking and less effective on samples with coverage lower than 300X. For very high coverage (1000X+), avoid it due to potential read collisions. For high-sensitivity panels with 1000X+ coverage, consider using UMIs.

    --vc-target-bed $PATH

    Restrict the variants called to a target bed. For WTS, a bed file specifying the gene-coding regions should be provided to avoid calling erroneous variants in non-coding regions due to noisy reads.

    --rna-library-type

    Set the library according to the read orientations. Set to 'A' to auto detect the correct read orientation. Alternatively select 'IU', 'ISR', 'ISF', 'U', 'SR', or 'SF'.

    --rna-splice-variant-normals $PATH

    Optional list of normal splice variants that is used to filter false positive calls. The file must be tab-separated, with the following first four columns: (1) contig name, (2) first base of the splice junction (1-based), (3) last base of the splice junction (1-based), (4) strand (0: undefined, 1: +, 2: -).

    --rna-splice-variant-regions $PATH

    Target region bed file. Required for panels. The name of the region must be specified in the fourth column.

    --rna-gf-enriched-regions $PATH

    For panels, the list of enriched genes should be set, either as a list of genes or a list of regions in BED format.
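The four-column layout described above for the --rna-splice-variant-normals file can be checked with a short Python sketch before the file is passed to DRAGEN. The function name is hypothetical, and the requirement that the first base not exceed the last base is an added sanity check rather than a documented rule.

```python
STRAND = {"0": "undefined", "1": "+", "2": "-"}

def parse_normal_junction(line):
    """Parse one tab-separated record: contig, first base, last base, strand."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 4:
        raise ValueError("expected at least four tab-separated columns")
    contig, first, last, strand = fields[:4]
    first, last = int(first), int(last)
    # Assumption: coordinates are 1-based and ordered.
    if first < 1 or last < first:
        raise ValueError("junction coordinates must be 1-based with first <= last")
    if strand not in STRAND:
        raise ValueError("strand must be 0, 1, or 2")
    return contig, first, last, STRAND[strand]

record = parse_normal_junction("chr1\t1000\t2000\t1\textra-column")
print(record)
```

Columns after the fourth are ignored here, which matches the description of the layout as the "first four columns".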

    Product Files
    BCL conversion
    For DRAGEN germline runs, it is recommended to use the pangenome hashtable.

    See: Product Files

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    --enable-map-align true

    Optionally disable map & align (default=true).

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    Duplicate Marking

    Option
    Description

    --enable-duplicate-marking true

    By default, DRAGEN marks duplicate reads and excludes them from variant calling.

    --enable-positional-collapsing true

    Alternative to --enable-duplicate-marking=true. Instead of discarding duplicate reads, DRAGEN can optionally perform positional collapsing, merging them into higher-quality consensus reads. This is beneficial for small panels without UMIs and coverage between 300X and 1000X. However, it's slower than standard duplicate marking and less effective on samples with coverage lower than 300X. For very high coverage (1000X+), avoid it due to potential read collisions. For high-sensitivity panels with 1000X+ coverage, consider using UMIs.
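The coverage guidance above can be summarized in a small helper for panels without UMIs. This is a sketch of the stated rule of thumb, not a DRAGEN feature: the function name is hypothetical, and treating exactly 300X as inside the range and exactly 1000X as outside it is an assumption about the boundaries.

```python
def suggest_duplicate_handling(mean_coverage):
    """Map mean coverage of a small panel without UMIs to a DRAGEN option."""
    # Positional collapsing is described as beneficial between 300X and 1000X.
    if 300 <= mean_coverage < 1000:
        return "--enable-positional-collapsing true"
    # Below 300X it is less effective; at 1000X+ read collisions are a risk.
    return "--enable-duplicate-marking true"

for coverage in (100, 500, 1000):
    print(coverage, suggest_duplicate_handling(coverage))
```

For high-sensitivity panels at 1000X or more, the guidance above points to UMIs rather than either of these options.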

    SNV

    DRAGEN SNV VC employs machine learning based variant recalibration (DRAGEN-ML). It processes read and other contextual evidence to remove false positives, recover false negatives and reduce zygosity errors. No additional setup is required. DRAGEN-ML is enabled by default as needed, when running the germline SNV VC on hg19 or hg38.

    Note that we do not recommend changing the default QUAL thresholds of 3 for DRAGEN-ML and 10 for DRAGEN without ML. These values differ from each other because DRAGEN-ML improves the calibration of QUAL scores, leading to a change in the scoring range.

    Option
    Description

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    --vc-emit-ref-confidence GVCF

    To enable gVCF output.

    --vc-enable-vcf-output

    To enable VCF file output during a gVCF run, set to true. The default value is false.

    For more detail on the small variant caller in somatic mode please refer to Somatic Mode

    Annotation

    For instructions on how to download the Nirvana annotation database, please refer to Nirvana

    HLA

    Option
    Description

    --enable-hla

    Enable HLA typer (this setting by default will only genotype class 1 genes)

    --hla-as-filter-min-threshold

    Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.

    --hla-as-filter-ratio-threshold

    Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.

    --hla-enable-class-2

    Extend genotyping to HLA class 2 genes (default=true).

    CNV

    Option
    Description

    --cnv-enable-gcbias-correction true

    Enable or disable GC bias correction when generating target counts.

    --cnv-segmentation-mode $SEG_MODE

    Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.

    --cnv-segmentation-bed $PATH

    If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.

    For more information, see CNV Calling.

    In-run PON

    For CNV PON requirements and generation options see CNV Preprocessing | Panel of Normals.

    Step 1. Generate CNV target counts of individual samples from the sequencing run.

    Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.

    Step 2. CNV combined counts file generation.

    $CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.
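One way to build the $CNV_NORMALS_LIST file described above is to collect the target counts files from step 1 and write one path per line. The following Python sketch assumes all step 1 outputs sit in a single directory; the function name, file names, and the `cnv_normals.txt` list name are hypothetical. It selects either the GC-corrected or the uncorrected files, never both, so that the normals are processed consistently.

```python
import tempfile
from pathlib import Path

def write_cnv_normals_list(counts_dir, list_path, gc_corrected=False):
    """Write one target counts path per line and return the paths written."""
    suffix = ".target.counts.gc-corrected.gz" if gc_corrected else ".target.counts.gz"
    paths = sorted(str(p) for p in Path(counts_dir).glob("*" + suffix))
    if not paths:
        raise FileNotFoundError(f"no *{suffix} files found in {counts_dir}")
    Path(list_path).write_text("\n".join(paths) + "\n")
    return paths

# Demonstration on empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("normal1.target.counts.gz",
                 "normal2.target.counts.gz",
                 "normal1.target.counts.gc-corrected.gz"):
        (Path(tmp) / name).touch()
    list_file = Path(tmp) / "cnv_normals.txt"
    written = write_cnv_normals_list(tmp, list_file)
    lines = list_file.read_text().splitlines()

print(len(written), "paths written")
```

The resulting list file is what step 2 reads through --cnv-normals-list.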

    /usr/local/bin/check_Solid_WGS_TN_{version}.sh
    /usr/local/bin/uninstall_Solid_WGS_TN_{version}.sh
    run_Heme_WGS_TO_{version}.sh \
      --inputType bcl \
      --inputFolder /heme_input_bcl \
      --customConfig /path/heme_custom_param.config \
      --customResourceDir custom_resources_Heme_dir
    # custom parameters
    vc_output_evidence_bam = false
    qc_detect_contamination = true
    aligner_clip_pe_overhang = 0
    
    # custom reference files
    vc_systematic_noise = '/snv/WGS_hg38_v1.0_systematic_noise.snv.bed.gz'
    sv_systematic_noise = '/sv/WGS_FF_Heme_hg38_v1.0_systematic_noise.sv.bedpe.gz'
    vc_somatic_hotspots = '/snv/somatic_hotspots_GRCh38.vcf.gz'
    custom_resources_Heme_dir/
    ├── snv
    │   ├── WGS_hg38_v1.0_systematic_noise.snv.bed.gz
    │   └── somatic_hotspots_GRCh38.vcf.gz
    └── sv
        └── WGS_FF_Heme_hg38_v1.0_systematic_noise.sv.bedpe.gz
    #vc_systematic_noise = ''
    #enable_map_align = true
    #sv_systematic_noise = ''
    #vc_output_evidence_bam = false
    #qc_detect_contamination = true
    #vc_somatic_hotspots = ''
    #sv_somatic_ins_tandup_hotspot_regions_bed = ''
    #cram_reference = ''
    #aligner_clip_pe_overhang = 0
    --fastq-list $PATH 
    --fastq-list-sample-id $STRING 
    --fastq-file1 $PATH 
    --fastq-file2 $PATH 
    --RGSM $STRING 
    --RGID $STRING 
    --bam-input $PATH 
    --cram-input $PATH 
    --fastq-list $PATH 
    --fastq-list-sample-id $STRING 
    --fastq-file1 $PATH 
    --fastq-file2 $PATH 
    --RGSM $STRING 
    --RGID $STRING 
    --bam-input $PATH 
    --cram-input $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN pangenome hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
    --fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    --enable-duplicate-marking true         #default=true 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-target-bed $VC_TARGET_BED 
    # Annotation 
    --variant-annotation-data $PATH 
    --enable-variant-annotation true 
    # SV 
    --enable-sv true 
    --sv-exome true 
    --sv-call-regions-bed $SV_TARGET_BED 
    # CNV 
    --enable-cnv true 
    --cnv-target-bed $PATH 
    --cnv-combined-counts $PATH             #CNV PON. See 'In-run PON' section below. 
    # HLA genotyper 
    --enable-hla true 
    --hla-as-filter-min-threshold 29.0      #panel specific setting 
    --hla-as-filter-ratio-threshold 0.85    #panel specific setting 
    --fastq-list $PATH 
    --fastq-list-sample-id $STRING 
    --fastq-file1 $PATH 
    --fastq-file2 $PATH 
    --RGSM $STRING 
    --RGID $STRING 
    --bam-input $PATH 
    --cram-input $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN pangenome hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
    --fastq-list-sample-id $STRING 
    # CNV 
    --enable-cnv true 
    --cnv-target-bed $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN pangenome hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --enable-cnv true 
    --cnv-generate-combined-counts true 
    --cnv-normals-list $CNV_NORMALS_LIST 
    Step 1: Run Settings
    Parameter Name
    Required
    Description

    Run Name

    Required

    Run Name can contain up to 255 characters (alphanumeric characters, dashes, underscores, periods, and spaces) and must start with an alphanumeric character, a dash, or an underscore.

    Run Description

    Optional

    Run Description can contain up to 255 characters, excluding square brackets, asterisks, and commas.

    Instrument Platform

    Required

    Choose from supported instruments:

    • NovaSeq 6000/6000Dx

    • NovaSeq X Series

    Secondary Analysis

    Required
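The Run Name and Run Description rules in the Step 1 table can be checked before entering values in the UI. The following is a minimal sketch, not part of BaseSpace or DRAGEN: the function names are hypothetical, and it assumes that "alphanumeric" means ASCII letters and digits.

```shell
# Hypothetical helpers that mirror the Step 1 naming rules.
# Assumption: "alphanumeric" means ASCII letters and digits.

valid_run_name() {
    run_name=$1
    # Between 1 and 255 characters in total.
    [ "${#run_name}" -ge 1 ] && [ "${#run_name}" -le 255 ] || return 1
    case $run_name in
        # First character must be alphanumeric, a dash, or an underscore.
        [!A-Za-z0-9_-]*) return 1 ;;
        # Only alphanumerics, underscores, periods, spaces, and dashes are allowed.
        *[!A-Za-z0-9_.\ -]*) return 1 ;;
    esac
    return 0
}

valid_run_description() {
    run_desc=$1
    # The description is optional, so an empty value is accepted.
    [ "${#run_desc}" -le 255 ] || return 1
    case $run_desc in
        # Square brackets, asterisks, and commas are not allowed.
        *\[* | *\]* | *\** | *,*) return 1 ;;
    esac
    return 0
}
```

For example, `valid_run_name "Run_01 test.v2"` succeeds, while `valid_run_name ".run"` fails because the name starts with a period.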

    Step 2: Configuration

    Note: On NovaSeq X Series, this page is called "Configuration 1". The right-hand corner of the UI displays the Read 1, Read 2, Index 1, and Index 2 values entered on the previous Run Settings screen.

    Parameter Name
    Required
    Description

    Application*

    Required

    the pipeline name

    Description

    Optional

    Optional text field

    Library Prep Kit

    Required

    - Illumina DNA Prep Kit (IDP)

    Required

    Step 3: Sample Settings

    Users can manually enter sample information, or download a template file to bulk upload sample information. Users can import the completed template or a compatible sample sheet.

    Parameter Name
    Required
    Description

    Read Lengths: Read 1 and Read 2

    Required. Not applicable on NovaSeq X Series.

    Auto filled with the standard values, but can be optionally overwritten.

    Override Cycles

    Required on NovaSeq X Series

    Entered based on the Read 1, Read 2, Index 1, and Index 2 lengths from Run Settings.

    Lane Usage

    Not applicable on NovaSeq X Series or NextSeq 1000 / 2000

    Checkbox allows users to apply the same lane across samples.

    Lane

    Required if Lane Usage is unchecked. Not applicable on NextSeq 1000 / 2000.
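The Override Cycles value in the table above is derived from the read and index lengths. As a rough sketch, assuming the BCL Convert notation (Y for read cycles, I for index cycles, in the order Read 1; Index 1; Index 2; Read 2), the string can be assembled as shown below. The function name is hypothetical, and libraries with UMIs need additional cycle types that this sketch does not cover.

```shell
# Hypothetical helper: build an Override Cycles string from the four cycle
# counts entered in Run Settings.
# Assumption: BCL Convert notation, Y = read cycles, I = index cycles.
override_cycles() {
    read1=$1 index1=$2 index2=$3 read2=$4
    printf 'Y%s;I%s;I%s;Y%s\n' "$read1" "$index1" "$index2" "$read2"
}

override_cycles 151 10 10 151   # prints Y151;I10;I10;Y151
```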

    Step 4: Run Review

    Once all details are captured and pass validation, the user can review the details on the Run Review screen. From here they can choose to edit details in previous screens or export the sample sheet. Once completed, press the Cancel button to finish run planning.

    Note: After you leave this screen, the run and sample sheet are no longer accessible.

    For NovaSeq X Plus users, the run can be saved as a draft or as a planned run (via the “Save as Draft” and “Save as Planned” buttons, respectively). Either selection saves the run to the Planned Runs screen on BaseSpace. There is no option to export the sample sheet on this screen.

    Planned Runs Screen (NovaSeq X Series only)

    The Planned Runs screen lists all planned or drafted runs. Users can set drafted runs to planned, export the sample sheet, and edit or delete a run on this screen.

    Once the run is saved as Planned, it will appear on the NovaSeq X Series instrument where it can be selected for sequencing.

    For more information on run planning, refer to the BaseSpace Sequence Hub support site page.

    Guided Examples based on TSO 500

    Review these guided examples of TSO 500 analysis workflows, which include setting up a run in the BaseSpace Run Planning tool:

    • NovaSeq 6000Dx: TSO 500 pipeline Auto-launch Analysis in Cloud

    • NextSeq 500/550Dx: TSO 500 pipeline and Connected Insights Auto-launch Analysis in Cloud

    ICA Auto-launch Sample Sheet Requirements
    For DRAGEN germline runs, it is recommended to use the pangenome hashtable.

    See: Product Files

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input
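Each of the four input types listed above uses a different option (--fastq-list, --fastq-file1 with --fastq-file2, --bam-input, or --cram-input). The sketch below picks the option from the file extension. The function name and the extension mapping are assumptions made for illustration and are not DRAGEN behavior. The remaining options for each input type (for example --fastq-list-sample-id, or --RGSM and --RGID) still need to be supplied separately.

```shell
# Hypothetical helper: print the DRAGEN input option that matches a file.
# Assumption: the input type can be inferred from the file extension.
dragen_input_option() {
    case $1 in
        *.csv)                           printf '%s\n' "--fastq-list" ;;
        *.fastq.gz|*.fq.gz|*.fastq|*.fq) printf '%s\n' "--fastq-file1" ;;
        *.bam)                           printf '%s\n' "--bam-input" ;;
        *.cram)                          printf '%s\n' "--cram-input" ;;
        *)                               return 1 ;;   # unknown input type
    esac
}

dragen_input_option sample.cram   # prints --cram-input
```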

    Mapping and Aligning

    Option
    Description

    --enable-map-align true

    Optionally disable map & align (default=true).

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    Duplicate Marking

    Option
    Description

    --enable-duplicate-marking true

    By default, DRAGEN marks duplicate reads and excludes them from variant calling.

    5-Base Methylation

    Option
    Description

    --methylation-conversion STRING

    Library conversion for methylation analysis. Options: none, c_t, mc_t, illumina (default=none).

    --methylation-protocol STRING

    Library protocol for methylation analysis. Options: none, directional, non-directional, directional-complement, pbat. The default value for methylation-conversion=illumina is directional, otherwise it is none.

    --methylation-mapq-threshold INT

    Only reads with MAPQ greater than or equal to the threshold are included in methyl-seq analysis (default=0).

    --methylation-generate-mbias-report true

    Whether to generate a per-sequencer-cycle methylation bias report (default=true).

    --mbias-report-include-overlaps

    Calculate methylation stats for overlapping bases between mates (default=false).

    --methylation-generate-cytosine-report true

    Whether to generate a genome-wide cytosine methylation CX_report file (default=false).

    For more information see: 5-Base Pipeline.

    SNV

    Option
    Description

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (bp) over which phased variants are combined. Set to 0 to disable. Valid range is [0, 15] bp (default=2).

    For more detail on the small variant caller in somatic mode, refer to Somatic Mode.

    Annotation

    For instructions on how to download the Nirvana annotation database, refer to Nirvana.

    CNV

    Option
    Description

    --cnv-enable-gcbias-correction true

    Enable or disable GC bias correction when generating target counts.

    --cnv-segmentation-mode $SEG_MODE

    Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.

    For more information, see CNV Calling.

    DNA Germline Panel UMI

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.

    Notes and additional options

    Hashtable

    For DRAGEN germline runs, it is recommended to use the pangenome hashtable.

    See: Product Files

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    UMI

    Option
    Description

    For more information see: .

    SNV

    DRAGEN SNV VC employs machine learning based variant recalibration (DRAGEN-ML). It processes read and other contextual evidence to remove false positives, recover false negatives and reduce zygosity errors. No additional setup is required. DRAGEN-ML is enabled by default as needed, when running the germline SNV VC on hg19 or hg38.

    Note that we do not recommend changing the default QUAL thresholds of 3 for DRAGEN-ML and 10 for DRAGEN without ML. These values differ from each other because DRAGEN-ML improves the calibration of QUAL scores, leading to a change in the scoring range.

    Option
    Description

    For more detail on the small variant caller in somatic mode, refer to Somatic Mode.

    Annotation

    For instructions on how to download the Nirvana annotation database, refer to Nirvana.

    HLA

    Option
    Description

    CNV

    Option
    Description

    For more information, see CNV Calling.

    In-run PON

    For CNV PON requirements and generation options see .

    Step 1. Generate CNV target counts of individual samples from the sequencing run.

    Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.

    Step 2. CNV combined counts file generation.

    $CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.
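The list file described in step 2 can be assembled from the step 1 outputs. The following is a minimal sketch under stated assumptions: the function name is hypothetical, all target counts files sit in one directory, and only one of the two suffixes (raw or GC-corrected) is collected at a time.

```shell
# Hypothetical helper: write one target counts path per line to a list file.
# Arguments: directory with the step 1 outputs, list file to create, and
# optionally the suffix to collect (defaults to the raw target counts).
make_cnv_normals_list() {
    counts_dir=$1
    list_file=$2
    suffix=${3:-.target.counts.gz}
    : > "$list_file"                      # start with an empty list
    for counts in "$counts_dir"/*"$suffix"; do
        [ -f "$counts" ] || continue      # skip the unmatched glob pattern
        printf '%s\n' "$counts" >> "$list_file"
    done
    [ -s "$list_file" ]                   # fail if no counts file was found
}
```

The resulting file is what the step 2 command expects as $CNV_NORMALS_LIST for the --cnv-normals-list option. To collect GC-corrected counts instead, pass `.target.counts.gc-corrected.gz` as the third argument.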

    DRAGEN Server App

    Installation Procedure on DRAGEN Server

    Downloader

    A separate lightweight downloader for Windows, macOS, and Linux operating systems is available at the DRAGEN Installer Download Site.

    Choose the downloader appropriate for your platform. When executed, it prompts for a path to download the assets to. The required software packages are downloaded into the dragen_pipelines directory under that path. If the path was used for a previous execution of the downloader, any incomplete downloads are resumed, existing files are checksummed, and any files with invalid checksums are re-downloaded.

    The downloaded directory content may be moved to the installation target DRAGEN server using a USB key with at least 128 GB of free space or by copying to Network Storage which is reachable from the target DRAGEN Server.

    Additional download information is available at the download site.

    Downloader System Requirements

    Downloader Name
    System Requirements

    Expected downloaded content

    • 📂 dragen_pipelines

      • dragen-app-manager-1.0.14-1.x86_64-el8-offline.run

      • README

      • 📂 Heme_WGS_TO_4.4.4.62

    Installer

    Installation Requirements

    DRAGEN and DRAGEN Application Manager

    The pipeline requires DRAGEN v4.4.4 or higher. If this version of DRAGEN (or higher) is not present when the app is installed, the installer installs it.

    The pipeline also requires DRAGEN Application Manager to be installed, and an installer is included. DRAGEN Application Manager configuration is controlled by the config.toml file located in /etc/dragen-app-manager directory. See for additional information.

    Minimum System Operating Requirements

    Hardware

    • v3 DRAGEN server or v4 DRAGEN server

    • mkfifo is enabled on the network-attached storage (NAS).

    Software

    The software installed by default on the DRAGEN server includes the following items:

    • DRAGEN server software. Refer to sample sheet settings for the DRAGEN version number.

    • Oracle Linux 8

    Storage

    • DRAGEN server v3 provides a 6.4 TB NVMe SSD. This SSD is located at the /staging directory and is suitable for storing only one or two runs of the analysis pipeline.

    • DRAGEN server v4 provides 12.8 TB via a 2 x 6.4 TB NVMe U.2 SSD configuration.

    • Consider the following when making data storage decisions.

      • A NovaSeq 6000 sequencing run that uses an S4 flow cell can produce up to 3 TB of output.

      • The Heme pipeline can produce an additional 4-6 TB of analysis output.

      • For optimal performance when writing to a non-default directory, specify an analysis folder location on /staging. This ensures that the DRAGEN-related processes read and write data to the DRAGEN server's high-speed NVMe SSD.
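The storage figures above can be turned into a quick pre-flight check. The sketch below is illustrative only: the function names are hypothetical, sizes are whole terabytes, the free-space figure is computed in binary units from df output, and it assumes the run output and the analysis output are kept on the same volume.

```shell
# Hypothetical helpers for a rough storage check before starting an analysis.

free_tb() {
    # Available space on the filesystem that holds $1, rounded down to whole
    # TB (binary units, which is close enough for a rough check).
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 / 1024 }'
}

fits() {
    # Succeeds if the available space covers the run output plus the
    # analysis output. All arguments are whole terabytes.
    avail_tb=$1 run_tb=$2 analysis_tb=$3
    [ "$avail_tb" -ge $((run_tb + analysis_tb)) ]
}

# Worst case from the figures above: 3 TB of run output plus 6 TB of analysis.
if fits 12 3 6; then echo "12 TB free: enough"; fi
if ! fits 6 3 6; then echo "6 TB free: not enough"; fi
```

On a server, `fits "$(free_tb /staging)" 3 6` applies the same check to the space currently available on /staging.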

    Installation Instructions

    • Installing the Heme pipeline requires root privileges.

    • Contact Illumina Customer Care to request a link to the Downloader or visit and confirm that the Genome DRAGEN license is enabled for your server.

      • Follow the instructions for DRAGEN license installation provided by Illumina Customer Care or refer to the DRAGEN server documentation.

    • Copy the directory structure from the downloader directory to the target DRAGEN server (or a path accessible with sudo privileges)

    Run Self-Test Script

    The self-test script, present after app installation, checks the following functions:

    • All required services are running.

    • All resources are in place.

    • The analysis workflow image can be launched.

    • The Heme pipeline can run successfully on a test dataset.

    To run the self-test script, execute:

    The following output will show if installation is completed successfully.

    If the self-test prints a failure message, contact Illumina Technical Support, and provide the output file found in /staging/check_Heme_WGS_TO_{timestamp}.tgz.

    When running an analysis on the DRAGEN server via SSH, Illumina recommends that you use a terminal multiplexer utility, which allows you to resume analysis in the event of a disconnection from the DRAGEN server.

    Uninstall Heme pipeline

    To uninstall the Heme pipeline, run the following command:

    Executing the uninstall script removes the following assets:

    • All scripts, including:

      • run_Heme_WGS_TO_{version}.sh

      • check_Heme_WGS_TO_{version}.sh

      • uninstall_Heme_WGS_TO_{version}.sh

    If the uninstall script is executed with the -r or --removeResources flag, dependencies of the application being uninstalled will be removed if no other applications depend on them.

    You are not required to uninstall DRAGEN Application Manager, Docker, or the DRAGEN server software.

    To remove Docker, review the instructions for your operating system in the Docker documentation.

    DNA Germline WGS UMI

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.

    Notes and additional options

    Hashtable

    For DRAGEN germline runs, it is recommended to use the pangenome hashtable.

    See: Product Files

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    UMI

    Option
    Description

    For more information see: .

    SNV

    DRAGEN SNV VC employs machine learning based variant recalibration (DRAGEN-ML). It processes read and other contextual evidence to remove false positives, recover false negatives and reduce zygosity errors. No additional setup is required. DRAGEN-ML is enabled by default as needed, when running the germline SNV VC on hg19 or hg38.

    Note that we do not recommend changing the default QUAL thresholds of 3 for DRAGEN-ML and 10 for DRAGEN without ML. These values differ from each other because DRAGEN-ML improves the calibration of QUAL scores, leading to a change in the scoring range.

    Option
    Description

    For more detail on the small variant caller in somatic mode, refer to Somatic Mode.

    Annotation

    For instructions on how to download the Nirvana annotation database, refer to Nirvana.

    HLA

    Option
    Description

    CNV

    Option
    Description

    For more information, see CNV Calling.

    Multi-Region Joint Detection (MRJD)

    Option
    Description

    For further details refer to .

    DNA Germline WES

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.

    Notes and additional options

    Hashtable

    For DRAGEN germline runs, it is recommended to use the pangenome hashtable.

    See: Product Files

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    Duplicate Marking

    Option
    Description

    SNV

    DRAGEN SNV VC employs machine learning based variant recalibration (DRAGEN-ML). It processes read and other contextual evidence to remove false positives, recover false negatives and reduce zygosity errors. No additional setup is required. DRAGEN-ML is enabled by default as needed, when running the germline SNV VC on hg19 or hg38.

    Note that we do not recommend changing the default QUAL thresholds of 3 for DRAGEN-ML and 10 for DRAGEN without ML. These values differ from each other because DRAGEN-ML improves the calibration of QUAL scores, leading to a change in the scoring range.

    Option
    Description

    For more detail on the small variant caller in somatic mode, refer to Somatic Mode.

    Annotation

    For instructions on how to download the Nirvana annotation database, refer to Nirvana.

    HLA

    Option
    Description

    CNV

    Option
    Description

    For more information, see CNV Calling.

    In-run PON

    For CNV PON requirements and generation options see .

    For Targeted Caller PON requirements and generation options see .

    CNV and Targeted Caller require separate PON files, but the intermediate counts files can be generated in the same DRAGEN command line invocation. Follow the steps below to generate the CNV and Targeted Caller PON files. Note that Targeted Caller is only supported with the Illumina CS/PGx Custom Enrichment Research Panel.

    Step 1. Generate CNV target counts and Targeted exome counts of individual samples from the sequencing run.

    Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.

    Step 2. CNV combined counts file generation.

    $CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.

    Step 3. Targeted Caller PON file generation.

    $TARGETED_PON_COUNTS_LIST is a text file with one line for each path to a Targeted Caller exome counts file generated in step 1 (<output-file-prefix>.targeted.exome.counts.json.gz). Individual exome counts files are merged into a single <output-file-prefix>.targeted.pon.json.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN Targeted Caller using the --targeted-pon option.
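Because both PON steps read a plain list of paths, a typo in either list is easy to miss. The following is a minimal pre-flight sketch, not a DRAGEN feature: the function name is hypothetical, and it only checks that every non-empty line names an existing file with the expected suffix.

```shell
# Hypothetical pre-flight check for a PON list file.
# Arguments: the list file and the suffix every entry is expected to have.
check_pon_list() {
    list_file=$1
    suffix=$2
    status=0
    # "|| [ -n ... ]" also handles a last line without a trailing newline.
    while IFS= read -r entry || [ -n "$entry" ]; do
        [ -n "$entry" ] || continue       # ignore blank lines
        case $entry in
            *"$suffix") ;;
            *) echo "unexpected suffix: $entry" >&2; status=1 ;;
        esac
        [ -f "$entry" ] || { echo "missing file: $entry" >&2; status=1; }
    done < "$list_file"
    return "$status"
}
```

For example, `check_pon_list "$TARGETED_PON_COUNTS_LIST" .targeted.exome.counts.json.gz` checks the step 3 list, and the same function can be used for $CNV_NORMALS_LIST with the matching target counts suffix.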

    Targeted Caller

    A systematic noise file corresponding to one of the pre-built pangenome references can be downloaded from the DRAGEN Software Support Site page (https://support.illumina.com/sequencing/sequencing_software/dragen-bio-it-platform/product_files.html).

    5 Base DNA Germline WGS UMI

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.

    Notes and additional options

    Hashtable

    For DRAGEN germline runs, it is recommended to use the pangenome hashtable.

    See: Product Files

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    UMI

    Option
    Description

    For more information see: .

    5-Base Methylation

    Option
    Description

    For more information see: 5-Base Pipeline.

    SNV

    Option
    Description

    For more detail on the small variant caller in somatic mode, refer to Somatic Mode.

    Annotation

    For instructions on how to download the Nirvana annotation database, refer to Nirvana.

    CNV

    Option
    Description

    For more information, see CNV Calling.

    DNA Germline WGS

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.

    Notes and additional options

    Hashtable

    For DRAGEN germline runs, it is recommended to use the pangenome hashtable.

    See: Product Files

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    Duplicate Marking

    Option
    Description

    SNV

    DRAGEN SNV VC employs machine learning based variant recalibration (DRAGEN-ML). It processes read and other contextual evidence to remove false positives, recover false negatives and reduce zygosity errors. No additional setup is required. DRAGEN-ML is enabled by default as needed, when running the germline SNV VC on hg19 or hg38.

    Note that we do not recommend changing the default QUAL thresholds of 3 for DRAGEN-ML and 10 for DRAGEN without ML. These values differ from each other because DRAGEN-ML improves the calibration of QUAL scores, leading to a change in the scoring range.

    Option
    Description

    For more detail on the small variant caller in somatic mode, refer to Somatic Mode.

    Annotation

    For instructions on how to download the Nirvana annotation database, refer to Nirvana.

    HLA

    Option
    Description

    CNV

    Option
    Description

    For more information, see CNV Calling.

    Multi-Region Joint Detection (MRJD)

    Option
    Description

    For further details refer to .

    5 Base DNA Germline Panel UMI

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.

    Notes and additional options

    Hashtable

    For DRAGEN germline runs, it is recommended to use the pangenome hashtable.

    See: Product Files

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    UMI

    Option
    Description

    For more information see: .

    5-Base Methylation

    Option
    Description

    For more information see: 5-Base Pipeline.

    SNV

    Option
    Description

    For more detail on the small variant caller in somatic mode, refer to Somatic Mode.

    Annotation

    For instructions on how to download the Nirvana annotation database, refer to Nirvana.

    Sample Sheet Requirements

    The pipeline has fields that are required in addition to general sample sheet requirements. Follow the steps below to create a valid sample sheet.

    Standard Sample Sheet Requirements

    The following sample sheet requirements describe required and optional fields for the pipeline. Depending on the deployment (standalone DRAGEN server, ICA with auto-launch, ICA with manual launch), certain sections and required values can deviate from the standard requirements. These deviations are noted in the information below.

    The analysis fails if the sample sheet requirements are not met.

    Use the following steps to create a valid sample sheet.

      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN pangenome hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
    --fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    --enable-duplicate-marking true         #default=true 
    # 5-Base 
    --methylation-conversion illumina 
    --methylation-generate-cytosine-report true 
    --methylation-compress-cx-report true 
    # Small variant caller 
    --enable-variant-caller true 
    # Annotation 
    --variant-annotation-data PATH 
    --enable-variant-annotation true 
    # SV 
    --enable-sv true 
    # CNV 
    --enable-cnv true 
    --cnv-enable-self-normalization true 
    # Inputs: FQ list 
    --fastq-list $PATH 
    --fastq-list-sample-id $STRING 
    # Inputs: FQ 
    --fastq-file1 $PATH 
    --fastq-file2 $PATH 
    --RGSM $STRING 
    --RGID $STRING 
    # Inputs: BAM 
    --bam-input $PATH 
    # Inputs: CRAM 
    --cram-input $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN pangenome hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
    --fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    # UMI 
    --umi-enable true 
    --umi-source STRING                     #Default='qname' 
    --umi-library-type STRING               #e.g. random-duplex 
    --umi-min-supporting-reads 1            #Default=2 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-target-bed $VC_TARGET_BED 
    # Annotation 
    --variant-annotation-data PATH 
    --enable-variant-annotation true 
    # SV 
    --enable-sv true 
    --sv-exome true 
    --sv-call-regions-bed $SV_TARGET_BED 
    # CNV 
    --enable-cnv true 
    --cnv-target-bed $PATH 
    --cnv-combined-counts $PATH             #CNV PON. See 'In-run PON' section below. 
    # HLA genotyper 
    --enable-hla true 
    --hla-as-filter-min-threshold 29.0      #panel specific setting 
    --hla-as-filter-ratio-threshold 0.85    #panel specific setting 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN pangenome hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
    --fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    # UMI 
    --umi-enable true 
    --umi-source STRING                     #Default='qname' 
    --umi-library-type STRING               #e.g. random-duplex 
    --umi-min-supporting-reads 1            #Default=2 
    # Small variant caller 
    --enable-variant-caller true 
    # Annotation 
    --variant-annotation-data PATH 
    --enable-variant-annotation true 
    # SV 
    --enable-sv true 
    # CNV 
    --enable-cnv true 
    --cnv-population-b-allele-vcf $POP_VCF  #optional to enable germline ASCN 
    --cnv-enable-self-normalization true 
    # HLA genotyper 
    --enable-hla true 
    # Targeted caller 
    --enable-targeted true 
    # Star allele 
    --enable-star-allele true 
    # PGX 
    --enable-pgx true                       #PGX 
    # Short tandem repeats 
    --repeat-genotype-enable true 
    # Multi-Region Joint Detection (MRJD) 
    --enable-mrjd true 
    --mrjd-enable-high-sensitivity-mode true 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN pangenome hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
    --fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    --enable-duplicate-marking true         #default=true 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-target-bed $VC_TARGET_BED 
    # Annotation 
    --variant-annotation-data PATH 
    --enable-variant-annotation true 
    # SV 
    --enable-sv true 
    --sv-exome true 
    --sv-call-regions-bed $SV_TARGET_BED 
    # CNV 
    --enable-cnv true 
    --cnv-target-bed $PATH 
    --cnv-combined-counts $PATH             #CNV PON. See 'In-run PON' section below. 
    # HLA genotyper 
    --enable-hla true 
    # Targeted caller (only if using the Illumina CS/PGx Custom Enrichment Research Panel) 
    --enable-targeted true 
    --targeted-pon $PATH                    #Targeted PON. See 'In-run PON' section below. 
    --targeted-systematic-noise $PATH       #Targeted systematic noise file 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN pangenome hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
    --fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    # UMI 
    --umi-enable true 
    --umi-min-supporting-reads 1            #Default=2 
    # 5-Base 
    --methylation-conversion illumina 
    --methylation-generate-cytosine-report true 
    --methylation-compress-cx-report true 
    # Small variant caller 
    --enable-variant-caller true 
    # Annotation 
    --variant-annotation-data PATH 
    --enable-variant-annotation true 
    # SV 
    --enable-sv true 
    # CNV 
    --enable-cnv true 
    --cnv-enable-self-normalization true 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN pangenome hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
    --fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    --enable-duplicate-marking true         #default=true 
    # Small variant caller 
    --enable-variant-caller true 
    # Annotation 
    --variant-annotation-data PATH 
    --enable-variant-annotation true 
    # SV 
    --enable-sv true 
    # CNV 
    --enable-cnv true 
    --cnv-population-b-allele-vcf $POP_VCF  #optional to enable germline ASCN 
    --cnv-enable-self-normalization true 
    # HLA genotyper 
    --enable-hla true 
    # Targeted caller 
    --enable-targeted true 
    # Star allele 
    --enable-star-allele true 
    # PGX 
    --enable-pgx true                       #PGX 
    # Short tandem repeats 
    --repeat-genotype-enable true 
    # Multi-Region Joint Detection (MRJD) 
    --enable-mrjd true 
    --mrjd-enable-high-sensitivity-mode true 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN pangenome hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
    --fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    # UMI 
    --umi-enable true 
    --umi-min-supporting-reads 1            #Default=2 
    # 5-Base 
    --methylation-conversion illumina 
    --methylation-generate-cytosine-report true 
    --methylation-compress-cx-report true 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-target-bed $VC_TARGET_BED 
    # Annotation 
    --variant-annotation-data PATH 
    --enable-variant-annotation true 
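
    The command listings above place one option per line without shell line continuations, so they cannot be pasted into a terminal as written. The following is a minimal sketch, not an official launcher, of how such a command could be assembled and rendered with backslash continuations. The helper names, the reference and output paths, and the 4.4.4 version in the install path are illustrative assumptions; only the option names are taken from the listings above.

```python
import shlex


def build_dragen_command(dragen_bin, options):
    """Return the argument list for one DRAGEN invocation.

    `options` maps option names (without the leading dashes) to values.
    Booleans are rendered as the lowercase true/false used in the listings.
    """
    args = [dragen_bin]
    for flag, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        args += [f"--{flag}", str(value)]
    return args


def format_multiline(args):
    """Render the command with one option per line and backslash continuations."""
    lines = [shlex.quote(args[0])]
    for flag, value in zip(args[1::2], args[2::2]):
        lines.append(f"    {flag} {shlex.quote(value)}")
    return " \\\n".join(lines)


# Hypothetical paths and sample name, for illustration only.
options = {
    "ref-dir": "/staging/ref",
    "output-directory": "/staging/out",
    "intermediate-results-dir": "/staging",
    "output-file-prefix": "sample1",
    "fastq-list": "/staging/fastq_list.csv",
    "fastq-list-sample-id": "sample1",
    "enable-map-align": True,
    "enable-variant-caller": True,
}
cmd = build_dragen_command("/opt/dragen/4.4.4/bin/dragen", options)
print(format_multiline(cmd))
```

    The argument list in cmd can be passed directly to a process launcher, while the formatted text is convenient for logging or for pasting into a shell.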
    • BaseSpace/Illumina Connected Analytics (to generate sample sheet for cloud analysis)

    • Local

    Read 1

    Required on Instrument Platform NovaSeq X Series

    Fill with value 151 or custom values

    Index 1

    Required on Instrument Platform NovaSeq X Series

    Fill the value depending on the Library Prep Kit used: 10

    Index 2

    Required on Instrument Platform NovaSeq X Series

    Fill the value depending on the Library Prep Kit used: 10

    Read 2

    Required on Instrument Platform NovaSeq X Series

    Fill with value 151 or custom values

    Sample Container ID

    Optional

    Unique Identifier for the container that holds the sample

    - Illumina DNA PCR Free Prep Kit (IDPFP)

    Index Adapter Kit

    Optional

    - IDT for Illumina DNA/RNA UD Indexes Set A B C D, Tagmentation (both IDP and IDPFP)

    Optional

    - Illumina DNA/RNA UD Indexes Set A B C D, Tagmentation (IDP)

    Specify lanes for each sample. The unmarked checkbox at the top of the dropdown selects all lanes.

    Case ID

    Optional

    The identifier used to pair DNA and RNA samples in a run. The field is mandatory whether a sample is part of a pair, or not.

    Note: The Sample ID field in the generated sample sheet is auto-filled from the Pair ID values. “_dna” or “_rna” (for DNA and RNA samples, respectively) is appended to the Pair ID value to create the Sample ID.

    Index ID

    Required

    Index set ID options are based on selected Index Adapter Kit

    Project

    Optional

    Optional field to describe the associated project

    Starts from Fastq

    Required

    True or False

    If auto-launching the pipeline from BCL files, set the value to False. If auto-launching the pipeline from FASTQ after auto-launching BCL Convert, set the value to True.

    DNA Barcode Mismatches Index 1**

    DNA Barcode Mismatches Index 2**

    RNA Barcode Mismatches Index 1**

    RNA Barcode Mismatches Index 2**

    Required on NovaSeq X

    Default value is set to 1.

    These fields are required by NovaSeq X and represent BCL Convert settings for index diversity checks when demultiplexing. These values are not used in the pipeline analysis.
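
    The Sample ID auto-fill rule described in the note on Pair ID values earlier in this table can be sketched as follows. This is an illustration of the naming rule only, not the run planning tool's code; the function name and the Case01 identifier are hypothetical.

```python
# Suffix appended to the Pair ID for each sample type, as stated in the note.
SUFFIX_BY_TYPE = {"DNA": "_dna", "RNA": "_rna"}


def sample_id_from_pair_id(pair_id, sample_type):
    """Return the Sample ID derived from a Pair ID and a sample type (DNA or RNA)."""
    try:
        suffix = SUFFIX_BY_TYPE[sample_type.upper()]
    except KeyError:
        raise ValueError(f"unsupported sample type: {sample_type!r}") from None
    return f"{pair_id}{suffix}"


print(sample_id_from_pair_id("Case01", "DNA"))
print(sample_id_from_pair_id("Case01", "RNA"))
```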

    --methylation-compress-cx-report true

    Set to true to enable compression of the CX_report (default=true).

    --methylation-keep-ref-cytosine true

    Set to true to keep all reference cytosines in the CX_report file, even if they don't appear in the input reads (default=false).

    --enable-cpg-methylated-mapping true

    Enable methylated mapping with base conversions restricted to CpG context (default=true). When false, runs DRAGEN Methylation 3-base map/align instead.

    --methylation-report-to-vcf

    Specify methylation type (none, cg, or c) which is reported in VCF files (default=c).

    --methylation-report-to-gvcf

    Specify methylation type (none, cg, or c) which is reported in gVCF files (default=cg).

  • install_Heme_WGS_TO_v4.4.4.62.run

  • Heme_WGS_TO_4.4.4.62.iapp

  • README

  • 📂 common

    • dpf-core_1.0.0.36.ires

    • dpf-templates_4.4.4.52.ires

    • dpf-docker-images_4.4.4.52.ires

    • dragen-4.4.4-12.multi.el8.x86_64.run

    • heme_wgs_to_resources_4.4.4.2.ires

    • hg38-alt_masked.cnv.hla.methyl_cg.methylated_combined.rna-11-r5.0-1.ires

    • hs37d5_chr-cnv.hla.methyl_cg.methylated_combined.rna-11-r5.0-1.ires

    • variant_annotation_data-tmb_annotations-4.4.4-1.ires

  • Network-attached storage is required for long-term storage of sequencing runs and Heme pipeline output.

  • Managing data storage is your responsibility.

    • Illumina recommends developing a strategy to copy data from the DRAGEN server to network-attached storage.

    • Delete output data on the DRAGEN server as soon as possible. For additional information on data output and storage, refer to Illumina Instrument Control Computer Security and Networking.

  • Make the installer executable: chmod +x install_Heme_WGS_TO_v{version}.run

  • Launch the installer with root privileges: sudo /path/to/install_Heme_WGS_TO_v{version}.run

    • If DRAGEN Application Manager is not already installed, the installer exits and directs you to the path of the DRAGEN Application Manager installer.

  • The application is installed under DRAGEN Application Manager.

    Heme_WGS_TO_{version}_Downloader_unix

    x86_64 platform with glibc 2.25+

    Heme_WGS_TO_{version}_Downloader_mac

    arm64 macOS

    Heme_WGS_TO_{version}_Downloader_windows.exe

    64-bit Windows 10+
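
    As a rough sketch, the downloader that matches a host can be chosen from the operating system and CPU architecture listed above. The function name and the version string are assumptions, and the sketch does not verify the glibc 2.25+ or Windows 10+ requirements.

```python
import platform


def downloader_name(version, system=None, machine=None):
    """Return the downloader file name for a platform.

    `system` and `machine` default to the values reported for the current host.
    """
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    if system == "Linux" and machine in ("x86_64", "amd64"):
        return f"Heme_WGS_TO_{version}_Downloader_unix"
    if system == "Darwin" and machine in ("arm64", "aarch64"):
        return f"Heme_WGS_TO_{version}_Downloader_mac"
    if system == "Windows" and machine in ("amd64", "x86_64"):
        return f"Heme_WGS_TO_{version}_Downloader_windows.exe"
    raise ValueError(f"no downloader listed for {system} on {machine}")


# Explicit arguments keep the example independent of the host it runs on.
print(downloader_name("4.4.4", system="Linux", machine="x86_64"))
```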

    DRAGEN Resource Files
    DRAGEN Application Manager
    DRAGEN Installer Download Site

    --enable-map-align true

    Optionally disable map & align (default=true).

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    --enable-duplicate-marking true

    By default, DRAGEN marks duplicate reads and excludes them from variant calling.

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    --vc-emit-ref-confidence GVCF

    To enable gVCF output.

    --vc-enable-vcf-output

    To enable VCF file output during a gVCF run, set to true. The default value is false.

    --enable-hla

    Enable HLA typer (this setting by default will only genotype class 1 genes)

    --hla-as-filter-min-threshold

    Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.

    --hla-as-filter-ratio-threshold

    Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.

    --hla-enable-class-2

    Extend genotyping to HLA class 2 genes (default=true).

    --cnv-enable-gcbias-correction true

    Enable or disable GC bias correction when generating target counts.

    --cnv-segmentation-mode $SEG_MODE

    Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.

    --cnv-segmentation-bed $PATH

    If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.

    Product Files
    BCL conversion
    Somatic Mode
    Nirvana
    CNV Calling
    CNV Preprocessing | Panel of Normals
    Targeted Caller | Exome calling

    --umi-end-mask-length INT

    Number of additional bases to ignore from end of read. The default is 0. To reduce FP optionally set to 3.

    --methylation-compress-cx-report true

    Set to true to enable compression of the CX_report (default=true).

    --methylation-keep-ref-cytosine true

    Set to true to keep all reference cytosines in the CX_report file, even if they don't appear in the input reads (default=false).

    --enable-cpg-methylated-mapping true

    Enable methylated mapping with base conversions restricted to CpG context (default=true). When false, runs DRAGEN Methylation 3-base map/align instead.

    --methylation-report-to-vcf

    Specify methylation type (none, cg, or c) which is reported in VCF files (default=c).

    --methylation-report-to-gvcf

    Specify methylation type (none, cg, or c) which is reported in gVCF files (default=cg).

    --enable-map-align true

    Optionally disable map & align (default=true).

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    --umi-nonrandom-whitelist $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.

    --umi-correction-table $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.

    --umi-min-supporting-reads INT

    Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).

    --umi-metrics-interval-file $BED

    Target region in BED format.

    --umi-emit-multiplicity both

    Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see Merge Duplex UMIs.

    --umi-start-mask-length INT

    Number of additional bases to ignore from start of read. The default is 0. To reduce FP optionally set to 1.

    --methylation-conversion STRING

    Library conversion for methylation analysis. Options: none, c_t, mc_t, illumina (default=none).

    --methylation-protocol STRING

    Library protocol for methylation analysis. Options: none, directional, non-directional, directional-complement, pbat. The default value for methylation-conversion=illumina is directional, otherwise it is none.

    --methylation-mapq-threshold INT

    Only reads with MAPQ greater than or equal to the threshold are included in methyl-seq analysis (default=0).

    --methylation-generate-mbias-report true

    Whether to generate a per-sequencer-cycle methylation bias report (default=true).

    --mbias-report-include-overlaps

    Calculate methylation stats for overlapping bases between mates (default=false).

    --methylation-generate-cytosine-report true

    Whether to generate a genome-wide cytosine methylation CX_report file (default=false).

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    --cnv-enable-gcbias-correction true

    Enable or disable GC bias correction when generating target counts.

    --cnv-segmentation-mode $SEG_MODE

    Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.

    Product Files
    BCL conversion
    UMI Options
    5-Base Pipeline
    Somatic Mode
    Nirvana
    CNV Calling

    --enable-map-align true

    Optionally disable map & align (default=true).

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    --enable-duplicate-marking true

    By default, DRAGEN marks duplicate reads and excludes them from variant calling.

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    --vc-emit-ref-confidence GVCF

    To enable gVCF output.

    --vc-enable-vcf-output

    To enable VCF file output during a gVCF run, set to true. The default value is false.

    --enable-hla

    Enable HLA typer (this setting by default will only genotype class 1 genes)

    --hla-as-filter-min-threshold

    Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.

    --hla-as-filter-ratio-threshold

    Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.

    --hla-enable-class-2

    Extend genotyping to HLA class 2 genes (default=true).

    --cnv-enable-gcbias-correction true

    Enable or disable GC bias correction when generating target counts.

    --cnv-segmentation-mode $SEG_MODE

    Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.

    --cnv-population-b-allele-vcf $POP_VCF

    Specify a population SNP VCF. This option is available for both the germline and the somatic workflows. In germline it is only supported for WGS. In somatic, it can be used when a matched normal sample is not available and analysis must be performed in tumor-only mode.

    --cnv-enable-cyto-output true

    Enable Cytogenetics-compatible output (default true), see Cytogenetics Modality. Only available with the Germline ASCN caller.

    --cnv-enable-mosaic-calling true

    Enable MOSAIC-calling mode (default true). Only available with the Germline ASCN caller.

    --enable-mrjd

    If set to true, MRJD is enabled for the DRAGEN pipeline.

    --mrjd-enable-high-sensitivity-mode

    If set to true, MRJD high sensitivity mode is enabled for the DRAGEN pipeline. See the MRJD section in the user guide for information on variant types reported in MRJD default mode and high-sensitivity mode (default=false).

    Product Files
    BCL conversion
    Somatic Mode
    Nirvana
    CNV Calling
    MRJD

    --umi-end-mask-length INT

    Number of additional bases to ignore from end of read. The default is 0. To reduce FP optionally set to 3.

    --methylation-compress-cx-report true

    Set to true to enable compression of the CX_report (default=true).

    --methylation-keep-ref-cytosine true

    Set to true to keep all reference cytosines in the CX_report file, even if they don't appear in the input reads (default=false).

    --enable-cpg-methylated-mapping true

    Enable methylated mapping with base conversions restricted to CpG context (default=true). When false, runs DRAGEN Methylation 3-base map/align instead.

    --methylation-report-to-vcf

    Specify methylation type (none, cg, or c) which is reported in VCF files (default=c).

    --methylation-report-to-gvcf

    Specify methylation type (none, cg, or c) which is reported in gVCF files (default=cg).

    --enable-map-align true

    Optionally disable map & align (default=true).

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    --umi-nonrandom-whitelist $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.

    --umi-correction-table $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.

    --umi-min-supporting-reads INT

    Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).

    --umi-metrics-interval-file $BED

    Target region in BED format.

    --umi-emit-multiplicity both

    Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see Merge Duplex UMIs.

    --umi-start-mask-length INT

    Number of additional bases to ignore from start of read. The default is 0. To reduce FP optionally set to 1.

    --methylation-conversion STRING

    Library conversion for methylation analysis. Options: none, c_t, mc_t, illumina (default=none).

    --methylation-protocol STRING

    Library protocol for methylation analysis. Options: none, directional, non-directional, directional-complement, pbat. The default value for methylation-conversion=illumina is directional, otherwise it is none.

    --methylation-mapq-threshold INT

    Only reads with MAPQ greater than or equal to the threshold are included in methyl-seq analysis (default=0).

    --methylation-generate-mbias-report true

    Whether to generate a per-sequencer-cycle methylation bias report (default=true).

    --mbias-report-include-overlaps

    Calculate methylation stats for overlapping bases between mates (default=false).

    --methylation-generate-cytosine-report true

    Whether to generate a genome-wide cytosine methylation CX_report file (default=false).

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    Product Files
    BCL conversion
    UMI Options
    5-Base Pipeline
    Somatic Mode
    Nirvana
    /usr/local/bin/check_Heme_WGS_TO_{version}.sh
    Checking system configuration...OK!
    Now running a test execution of the pipeline.
    This could take up to 15 minutes...
    Verifying analysis output.
    Successfully validated test analysis results.
    SUCCESS!
    DRAGEN Heme WGS Tumor Only Pipeline is correctly configured and ready for use.
    /usr/local/bin/uninstall_Heme_WGS_TO_{version}.sh
    --fastq-list $PATH 
    --fastq-list-sample-id $STRING 
    --fastq-file1 $PATH 
    --fastq-file2 $PATH 
    --RGSM $STRING 
    --RGID $STRING 
    --bam-input $PATH 
    --cram-input $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN pangenome hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
    --fastq-list-sample-id $STRING 
    # CNV 
    --enable-cnv true 
    --cnv-target-bed $PATH 
    # Targeted caller (only if using the Illumina CS/PGx Custom Enrichment Research Panel) 
    --targeted-generate-exome-counts true 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN pangenome hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --enable-cnv true 
    --cnv-generate-combined-counts true 
    --cnv-normals-list $CNV_NORMALS_LIST 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN pangenome hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --targeted-pon-counts-list $TARGETED_PON_COUNTS_LIST 
    --fastq-list $PATH 
    --fastq-list-sample-id $STRING 
    --fastq-file1 $PATH 
    --fastq-file2 $PATH 
    --RGSM $STRING 
    --RGID $STRING 
    --bam-input $PATH 
    --cram-input $PATH 

    --umi-emit-multiplicity both

    Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see Merge Duplex UMIs.

    --umi-start-mask-length INT

    Number of additional bases to ignore from start of read. The default is 0. To reduce FP optionally set to 1.

    --umi-end-mask-length INT

    Number of additional bases to ignore from end of read. The default is 0. To reduce FP optionally set to 3.

    --enable-map-align true

    Optionally disable map & align (default=true).

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    --umi-source STRING

    Specify the input type for the UMI sequence. Options: qname, fastq, bamtag.

    --umi-library-type STRING

    Set the batch option for different UMIs correction. Options: random-duplex, random-simplex, nonrandom-duplex.

    --umi-nonrandom-whitelist $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.

    --umi-correction-table $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.

    --umi-min-supporting-reads INT

    Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).

    --umi-metrics-interval-file $BED

    Target region in BED format.

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    --vc-emit-ref-confidence GVCF

    To enable gVCF output.

    --vc-enable-vcf-output

    To enable VCF file output during a gVCF run, set to true. The default value is false.

    --enable-hla

    Enable HLA typer (this setting by default will only genotype class 1 genes)

    --hla-as-filter-min-threshold

    Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.

    --hla-as-filter-ratio-threshold

    Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.

    --hla-enable-class-2

    Extend genotyping to HLA class 2 genes (default=true).

    --cnv-enable-gcbias-correction true

    Enable or disable GC bias correction when generating target counts.

    --cnv-segmentation-mode $SEG_MODE

    Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.

    --cnv-segmentation-bed $PATH

    If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.

    Product Files
    BCL conversion
    UMI Options
    Somatic Mode
    Nirvana
    CNV Calling
    CNV Preprocessing | Panel of Normals

    --umi-emit-multiplicity both

    Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see Merge Duplex UMIs.

    --umi-start-mask-length INT

    Number of additional bases to ignore from start of read. The default is 0. To reduce FP optionally set to 1.

    --umi-end-mask-length INT

    Number of additional bases to ignore from end of read. The default is 0. To reduce FP optionally set to 3.

    --enable-map-align true

    Optionally disable map & align (default=true).

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    --umi-source STRING

    Specify the input type for the UMI sequence. Options: qname, fastq, bamtag.

    --umi-library-type STRING

    Set the batch option for different UMIs correction. Options: random-duplex, random-simplex, nonrandom-duplex.

    --umi-nonrandom-whitelist $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.

    --umi-correction-table $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.

    --umi-min-supporting-reads INT

    Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).

    --umi-metrics-interval-file $BED

    Target region in BED format.

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    --vc-emit-ref-confidence GVCF

    To enable gVCF output.

    --vc-enable-vcf-output

    To enable VCF file output during a gVCF run, set to true. The default value is false.

    --enable-hla

    Enable HLA typer (this setting by default will only genotype class 1 genes)

    --hla-as-filter-min-threshold

    Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.

    --hla-as-filter-ratio-threshold

    Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.

    --hla-enable-class-2

    Extend genotyping to HLA class 2 genes (default=true).

    --cnv-enable-gcbias-correction true

    Enable or disable GC bias correction when generating target counts.

    --cnv-segmentation-mode $SEG_MODE

    Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.

    --cnv-population-b-allele-vcf $POP_VCF

    Specify a population SNP VCF. This option is available for both the germline and the somatic workflows. In germline it is only supported for WGS. In somatic, it can be used when a matched normal sample is not available and analysis must be performed in tumor-only mode.

    --cnv-enable-cyto-output true

    Enable Cytogenetics-compatible output (default true), see Cytogenetics Modality. Only available with the Germline ASCN caller.

    --cnv-enable-mosaic-calling true

    Enable MOSAIC-calling mode (default true). Only available with the Germline ASCN caller.

    --enable-mrjd

    If set to true, MRJD is enabled for the DRAGEN pipeline.

    --mrjd-enable-high-sensitivity-mode

    If set to true, MRJD high sensitivity mode is enabled for the DRAGEN pipeline. See the MRJD section in the user guide for information on variant types reported in MRJD default mode and high-sensitivity mode (default=false).

    Product Files
    BCL conversion
    UMI Options
    Somatic Mode
    Nirvana
    CNV Calling
    MRJD

    Download the sample sheet v2 template that matches the instrument & assay run.

  • In the Sequencing Settings section, enter the following required parameters:

  • [Sequencing_Settings] Section

    Sample Parameter
    Required
    Details

    LibraryPrepKits

    Required

    Accepted values are: IlluminaDNAPrep or IlluminaDNAPCRFree

    1. In the BCL Convert Settings section, enter the following required parameters:

    [BCLConvert_Settings] Section

    Sample Parameter
    Required
    Details

    SoftwareVersion

    Required

    The DRAGEN component software version. The pipeline requires 4.4.4 or higher. To ensure you are using the latest compatible version, refer to the software release notes.

    AdapterRead1

    Required

    Analysis fails if incorrect adapter sequences are used. If using 10 bp indexes with UDP: CTGTCTCTTATACACATCTCCGAGCCCACGAGAC

    AdapterRead2

    Required

    Analysis fails if incorrect adapter sequences are used. If using 10 bp indexes with UDP: CTGTCTCTTATACACATCTGACGCTGCCGACGA

    AdapterBehavior

    Optional
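
    The SoftwareVersion requirement (4.4.4 or higher) and the two UDP adapter sequences above can be checked before a sample sheet is written. The sketch below assumes that versions are plain dot-separated integers and that settings are written as key,value lines under the section header; the function names are hypothetical.

```python
# Adapter sequences listed above for 10 bp indexes with UDP.
ADAPTER_READ1 = "CTGTCTCTTATACACATCTCCGAGCCCACGAGAC"
ADAPTER_READ2 = "CTGTCTCTTATACACATCTGACGCTGCCGACGA"


def meets_minimum_version(version, minimum=(4, 4, 4)):
    """Return True if a dotted numeric version is at least `minimum`."""
    parts = tuple(int(p) for p in version.split("."))
    # Compare only as many components as the minimum specifies.
    return parts[:len(minimum)] >= minimum


def bclconvert_settings_lines(version):
    """Return the lines of a [BCLConvert_Settings] section (assumed key,value layout)."""
    if not meets_minimum_version(version):
        raise ValueError(f"SoftwareVersion {version} is lower than 4.4.4")
    return [
        "[BCLConvert_Settings]",
        f"SoftwareVersion,{version}",
        f"AdapterRead1,{ADAPTER_READ1}",
        f"AdapterRead2,{ADAPTER_READ2}",
    ]


print("\n".join(bclconvert_settings_lines("4.4.4")))
```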

    1. In the BCL Convert Data section, enter the following parameters for each sample.

    [BCLConvert_Data] Section

    Sample Parameter
    Required
    Details

    Sample_ID

    Required

    Must match a Sample_ID listed in the [Heme_Data] section.

    Index

    Required

    Index 1 sequence valid for Index_ID assigned to matching Sample_ID in the [Heme_Data] section.

    Index2

    Required

    Index 2 sequence valid for Index_ID assigned to matching Sample_ID in the [Heme_Data] section.

    Lane

    Only for NovaSeq 6000 XP, NovaSeq 6000Dx, or NovaSeq X workflows
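
    Because every Sample_ID in [BCLConvert_Data] must match one in [Heme_Data], a quick cross-check before launching can catch mismatches early. The following is a minimal sketch that assumes a comma-separated sheet in which each data section is a header row followed by sample rows; the function names, sample names, and index sequences are placeholders, and IDs are compared exactly as written.

```python
import csv
import io


def read_data_section(sheet_text, section):
    """Return the rows of one data section as dictionaries keyed by column name."""
    rows, header, inside = [], None, False
    for record in csv.reader(io.StringIO(sheet_text)):
        cells = [c.strip() for c in record]
        if not any(cells):
            continue  # skip blank lines
        if cells[0].startswith("["):
            inside = cells[0] == f"[{section}]"
            header = None
            continue
        if not inside:
            continue
        if header is None:
            header = cells  # first row after the section header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def mismatched_sample_ids(sheet_text, heme_section="Heme_Data"):
    """Return Sample_IDs that appear in only one of the two sections."""
    bcl = {r["Sample_ID"] for r in read_data_section(sheet_text, "BCLConvert_Data")}
    heme = {r["Sample_ID"] for r in read_data_section(sheet_text, heme_section)}
    return sorted(bcl ^ heme)


# Placeholder sheet fragment; the index sequences are not real kit indexes.
SHEET = """[BCLConvert_Data]
Sample_ID,Index,Index2
SampleA,AAAAAAAAAA,CCCCCCCCCC
SampleB,GGGGGGGGGG,TTTTTTTTTT

[Heme_Data]
Sample_ID,Sample_Type
SampleA,DNA
"""
print(mismatched_sample_ids(SHEET))
```

    For ICA with Auto-launch, the same check could be run with heme_section set to Cloud_Heme_Data.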

    1. In the [Heme_Data] section, enter the following parameters:

    The section header changes depending on the deployment:

    • Standalone DRAGEN Server and ICA with Manual Launch: Heme_Data

    • ICA with Auto-launch: Cloud_Heme_Data

    [Heme_Data] Section

    Sample Parameter
    Required
    Details

    Sample_ID

    Required

    The unique ID to identify a sample. The sample ID is included in the output file names. Sample IDs are not case sensitive. Sample IDs must have the following characteristics:

    • Unique for the run.

    • 1–70 characters.

    • No spaces.

    • Alphanumeric characters with underscores and dashes. If you use an underscore or dash, enter an alphanumeric character before and after the underscore or dash. For example, Sample1-T5B1_022515.

    • Cannot be called all, default, none, unknown, undetermined, stats, or reports.

    • Must match a Sample_ID listed in the [BCLConvert_Data] section.

    Each sample must have a unique combination of Lane (if applicable), sample ID, and index ID or the analysis will fail.

    Sample_Type

    Optional

    Enter DNA

    Case_ID

    Optional

    A unique ID that links biological samples from the same individual. It is used for variant interpretation in downstream software such as Illumina Connected Insights.

    Sample_Description

    Optional
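
    The Sample_ID characteristics listed in this table lend themselves to an automated check. The sketch below is one possible interpretation of those rules, not the pipeline's own validator: the regular expression requires every underscore or dash to sit between alphanumeric characters, and uniqueness and reserved names are compared case-insensitively because sample IDs are not case sensitive. The function name is hypothetical.

```python
import re

RESERVED = {"all", "default", "none", "unknown", "undetermined", "stats", "reports"}
# Alphanumeric runs, optionally joined by single underscores or dashes.
PATTERN = re.compile(r"[A-Za-z0-9]+(?:[_-][A-Za-z0-9]+)*")


def sample_id_problems(sample_ids):
    """Return (sample_id, reason) pairs for every rule a Sample_ID breaks."""
    problems = []
    seen = set()
    for sid in sample_ids:
        key = sid.lower()
        if not 1 <= len(sid) <= 70:
            problems.append((sid, "length must be 1-70 characters"))
        elif not PATTERN.fullmatch(sid):
            problems.append((sid, "only alphanumerics, with _ or - between them"))
        elif key in RESERVED:
            problems.append((sid, "reserved name"))
        if key in seen:
            problems.append((sid, "not unique for the run"))
        seen.add(key)
    return problems


for sid, reason in sample_id_problems(["Sample1-T5B1_022515", "bad id", "All", "S1", "s1", "x_"]):
    print(f"{sid}: {reason}")
```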

    To ensure a successful analysis, follow these guidelines:

    1. Avoid any blank lines at the end of the sample sheet; these can cause the analysis to fail.

    2. When running local analysis from the command line, save the sample sheet in the sequencing run folder with the default name SampleSheet.csv, or choose a different name and specify the path in the command-line options.
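
    Trailing blank lines are easy to introduce when a sample sheet is exported from a spreadsheet. The following sketch removes them from the sheet text before it is saved. Treating rows that contain only commas as blank is an assumption about spreadsheet exports, and the function name is hypothetical.

```python
def strip_trailing_blank_lines(text):
    """Return the sheet text without blank (or comma-only) lines at the end."""
    lines = text.splitlines()
    # Drop lines from the end while they contain nothing but commas or whitespace.
    while lines and not lines[-1].strip(", \t"):
        lines.pop()
    return "\n".join(lines) + "\n" if lines else ""


raw = "[Heme_Data]\nSample_ID\nS1\n\n,,\n"
print(repr(strip_trailing_blank_lines(raw)))
```

    The cleaned text ends with a single newline after the last data row; blank lines inside the sheet are left untouched.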

    ICA with Auto-launch: Sample Sheet Requirements

    Refer to the following requirements to create sample sheets for running the analysis on ICA with Auto-launch. For sample sheet requirements common between deployments, see Standard Sample Sheet Requirements. Sample sheets can be created using the BaseSpace Run Planning Tool or manually by downloading and editing a sample sheet template.

    To auto-launch analysis from the sequencer run folder, ensure the StartsFromFastq and SampleSheetRequested fields are set to FALSE. To auto-launch analysis from FASTQs after BCL Convert auto-launch, the StartsFromFastq and SampleSheetRequested fields must be set to TRUE.

    [Cloud_Heme_Data] Section

    Refer to [Heme_Data] Section for this section's requirements.

    [Cloud_Heme_Settings] Section

    Parameters
    Required
    Details

    SoftwareVersion

    Not Required

    The Heme pipeline software version

    StartsFromFastq

    Required

    Set the value to TRUE or FALSE. To auto-launch from BCL files, set to FALSE. To auto-launch from FASTQ files after auto-launch of BCL Convert, set to TRUE.

    SampleSheetRequested

    Required

    Set the value to TRUE or FALSE. To auto-launch from BCL files, set to FALSE. To auto-launch from FASTQ files after auto-launch of BCL Convert, set to TRUE.
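The two settings above are described as moving together: both FALSE to start from BCL files, both TRUE to start from FASTQ files. The sketch below turns that into a consistency check on a parsed [Cloud_Heme_Settings] section. The function name `auto_launch_input` and the dictionary input are hypothetical, and treating a mixed combination as an error is an assumption, because the guide does not describe that case.

```python
def auto_launch_input(settings: dict[str, str]) -> str:
    """Return "BCL" or "FASTQ" for a [Cloud_Heme_Settings] key/value mapping."""
    flags = {}
    for key in ("StartsFromFastq", "SampleSheetRequested"):
        value = settings.get(key, "").strip().upper()
        if value not in ("TRUE", "FALSE"):
            raise ValueError(f"{key} must be TRUE or FALSE, got {settings.get(key)!r}")
        flags[key] = value == "TRUE"

    # Assumption: the guide only documents the cases where both values agree.
    if flags["StartsFromFastq"] != flags["SampleSheetRequested"]:
        raise ValueError("StartsFromFastq and SampleSheetRequested must have the same value")

    return "FASTQ" if flags["StartsFromFastq"] else "BCL"
```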

    [Cloud_Data] Section

    Parameters
    Required
    Details

    Sample_ID

    Not Required

    The same sample ID used in the [Cloud_Heme_Data] section.

    ProjectName

    Not Required

    The BaseSpace project name.

    LibraryName

    Not Required

    Combination of sample ID and index values in the following format: sampleID_Index_Index2

    LibraryPrepKitName

    Required

    The Library Prep Kit used.

    [Cloud_Settings] Section

    Parameter
    Required
    Details

    GeneratedVersion

    Not Required

    The cloud GSS version used to create the sample sheet. Optional if manually updating a sample sheet.

    CloudWorkflow

    Not Required

    Ica_workflow_1

    Cloud_Heme_Pipeline

    Required

    This value is a universal record number (URN). The valid values are described in the Release Information.

    BCLConvert_Pipeline

    Required

    The value is a URN in the following format: urn:ilmn:ica:pipeline:<pipeline-ID>#<pipeline-name>
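As an illustration of the URN format above, the following sketch splits a pipeline URN into its ID and name. It is not part of any Illumina tooling. The function name `parse_pipeline_urn` is hypothetical, and the ID and name used in any call are placeholders, not real pipeline identifiers.

```python
URN_PREFIX = "urn:ilmn:ica:pipeline:"


def parse_pipeline_urn(urn: str) -> tuple[str, str]:
    """Split urn:ilmn:ica:pipeline:<pipeline-ID>#<pipeline-name> into (ID, name)."""
    urn = urn.strip()
    if not urn.startswith(URN_PREFIX):
        raise ValueError(f"expected a URN starting with {URN_PREFIX!r}")
    # Tolerate whitespace after the prefix, then split once on the '#' separator.
    pipeline_id, separator, pipeline_name = urn[len(URN_PREFIX):].strip().partition("#")
    if not separator or not pipeline_id or not pipeline_name:
        raise ValueError("expected <pipeline-ID>#<pipeline-name> after the prefix")
    return pipeline_id, pipeline_name
```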

    DNA Germline WES UMI

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.

    Notes and additional options

    Hashtable

    For DRAGEN germline runs, it is recommended to use the pangenome hashtable.

    See:

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using .

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    UMI

    Option
    Description

    For more information see: .

    SNV

    DRAGEN SNV VC employs machine-learning-based variant recalibration (DRAGEN-ML). It processes read evidence and other contextual evidence to remove false positives, recover false negatives, and reduce zygosity errors. No additional setup is required. DRAGEN-ML is enabled by default, as needed, when running the germline SNV VC on hg19 or hg38.

    Note that we do not recommend changing the default QUAL thresholds of 3 for DRAGEN-ML and 10 for DRAGEN without ML. These values differ from each other because DRAGEN-ML improves the calibration of QUAL scores, leading to a change in the scoring range.

    Option
    Description

    For more detail on the small variant caller in somatic mode please refer to

    Annotation

    For instructions on how to download the Nirvana annotation database, please refer to

    HLA

    Option
    Description

    CNV

    Option
    Description

    For more information, see .

    In-run PON

    For CNV PON requirements and generation options see .

    For Targeted Caller PON requirements and generation options see .

    CNV and Targeted Caller require separate PON files, but the intermediate counts files can be generated in the same DRAGEN command line invocation. Follow the steps below to generate the CNV and Targeted Caller PON files. Note that Targeted Caller is only supported with the Illumina CS/PGx Custom Enrichment Research Panel.

    Step 1. Generate CNV target counts and Targeted exome counts of individual samples from the sequencing run.

    Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.

    Step 2. CNV combined counts file generation.

    $CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.

    Step 3. Targeted Caller PON file generation.

    $TARGETED_PON_COUNTS_LIST is a text file with one line for each path to a Targeted Caller exome counts file generated in step 1 (<output-file-prefix>.targeted.exome.counts.json.gz). Individual exome counts files are merged into a single <output-file-prefix>.targeted.pon.json.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN Targeted Caller using the --targeted-pon option.
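Both list files described in steps 2 and 3 are plain text files with one counts-file path per line. The sketch below builds such a file from the step 1 outputs. It assumes that all step 1 outputs sit under one directory. The function name `write_counts_list` and the directory and list-file names in the comments are hypothetical.

```python
from pathlib import Path


def write_counts_list(counts_dir: Path, suffix: str, list_path: Path) -> list[Path]:
    """Write one absolute counts-file path per line and return the paths written."""
    # Collect every file under counts_dir whose name ends with the given suffix.
    paths = sorted(p.resolve() for p in counts_dir.rglob(f"*{suffix}"))
    list_path.write_text("".join(f"{p}\n" for p in paths))
    return paths


# Example calls (directory and file names are placeholders):
#   write_counts_list(Path("step1_out"), ".target.counts.gc-corrected.gz", Path("cnv_normals.txt"))
#   write_counts_list(Path("step1_out"), ".targeted.exome.counts.json.gz", Path("targeted_pon_counts.txt"))
```

Samples that should be left out of the PON can be removed from the resulting list before it is passed to DRAGEN.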

    Targeted Caller

    A systematic noise file corresponding to one of the pre-built pangenome references can be downloaded from the DRAGEN Software Support Site page (https://support.illumina.com/sequencing/sequencing_software/dragen-bio-it-platform/product_files.html).

    Analysis Methods

    DNA Analysis Methods

    The software is a DNA only analysis pipeline based on the DRAGEN Secondary Analysis Software. Even though it includes some of the default settings from the DNA Somatic Tumor-Normal Solid WGS DRAGEN recipe, it uses a distinct recipe with different options. A user has the ability to override specific parameters via a custom configuration file.

    Figure 1. DRAGEN Variant Calling Workflow

    The software performs germline variant calling on the normal sample, and reports the following variants:

    • SNV (annotated)

    • CNV (annotated)

    • SV (annotated)

    • Targeted callers (cyp2b6, cyp2d6, cyp21a2, gba, hba, lpa, rh and smn)

    • Expansion Hunter

    • VNTR

    The software performs somatic variant calling on the tumor sample and reports the following variants:

    • SNV (annotated)

    • MNV

    • CNV (annotated, requires germline SNV and CNV VCF)

    • SV (annotated, with variant deduplication)

    An example command is provided that highlights the input and output used in the DragenCaller step of the software. The command may be found in the DRAGEN run log file. Any parameter option not displayed on the command line uses the default value for the DRAGEN variant caller module. The detailed parameters and default arguments for the individual modules within the DragenCaller step may be found in the replay.json output. See DRAGEN Command Line Options for detailed explanations of the parameters.

    Reference Genomes

    The pipeline supports two reference genomes for the DRAGEN Map/Aligner: hg38 and hs37d5_chr.

    The hs37d5_chr genome is the hg19 reference genome with the Chromosome Y PAR masked. It includes the NC_012920 mitochondrial genome. The contigs have the chr prefix added, but without the native alternate loci names.

    DRAGEN Map/Aligner

    DNA alignment involves aligning sequencing reads derived from DNA libraries to a reference genome prior to variant calling.

    The software currently supports both tumor and normal samples with UMI. Please use the DRAGEN DNA Pipeline UMI documentation to get details on the options.

    DRAGEN then uses these final alignments as input for the downstream steps, such as gene amplification (copy number) calling, small variant calling (SNV, indel, MNV, delins), and DNA library quality control.

    Small Variant Calling and Filtering

    DRAGEN supports calling SNVs, indels, MNVs, and delins in tumor-only samples by using mapped and aligned DNA reads from a tumor sample as input. Variants are detected via both column-wise pileup analysis and local de novo assembly of haplotypes. The de novo haplotypes allow the detection of much larger insertions and deletions than is possible through column-wise pileup analysis alone. DRAGEN insertion and deletion calls are validated for lengths of 0–25 bp, and lengths greater than 25 bp can also be supported. DRAGEN also uses the de novo assembly to detect SNVs, insertions, and deletions that are co-phased and part of the same haplotype. Any such co-phased variants that fall within a window of 15 bp can then be reassembled into complex variants (MNVs and delins). The tumor-only pipeline produces a VCF file containing both germline and somatic variants that can be further analyzed to identify tumor mutations. The pipeline makes no ploidy assumptions, enabling detection of low-frequency alleles.

    DRAGEN small variant calling includes the following steps:

    1. Detects regions with sufficient read coverage (callable regions).

    2. Detects regions where the reads deviate from the reference and there is a possibility of a germline or somatic call (active regions).

    3. Assembles de novo graph haplotypes from reads (haplotype assembly).

    4. Extracts possible somatic or germline calls (events) from column-wise pileup analysis.

    Additional information is available at DRAGEN DNA Pipeline Small Variant Calling.

    Somatic mode

    The DRAGEN Somatic Pipeline supports both matched tumor-normal pairs and tumor-only samples. The germline mode of the small variant caller is used to analyze the normal sample in the matched pair.

    Copy Number Variant Calling

    The DRAGEN copy number variant caller performs amplification, reference, and deletion calling for CNV targets within the assay. It counts the coverage of each target interval on the panel, uses a preprocessed panel of normal samples to normalize target counts, corrects for GC coverage bias, calculates CNV event scores from the observed coverage, and makes copy number calls.

    Additional information is available at .

    Absolute Copy Numbers (ABCN)

    Absolute copy numbers are calculated by the CNV ASCN Caller. See ASCN Caller.

    Loss of Heterozygosity

    More information is available at DRAGEN DNA Pipeline - LOH.

    Structural Variant Calling

    The DRAGEN Structural Variant (SV) Caller is described .

    The DUX4 rearrangement caller is described .

    Variant Deduplication

    The Variant Deduplication is described

    Contamination Detection

    The contamination analysis step detects foreign human DNA contamination using the SNP error file and pileup file that are generated during small variant calling, together with the TMB trace file. The software uses the contamination score to determine whether a sample contains foreign DNA. In contaminated samples, the variant allele frequencies of SNPs shift away from the expected values of 0%, 50%, or 100%. The algorithm collects all positions that overlap with common SNPs and have variant allele frequencies of < 25% or > 75%. Then, the algorithm computes the likelihood that each position is an error or a real mutation. The contamination score is the sum of the log likelihood scores across the predefined SNP positions that have a minor allele frequency < 25% in the sample and are not likely due to CNV events.

    The larger the contamination score, the more likely it is that there is foreign DNA contamination. A sample is considered contaminated if the contamination score is above a predefined quality threshold. The contamination score was found to be high in samples with highly rearranged genomes and in HRD samples. 1% of HRD samples were found to be above the threshold with no evidence of actual contamination.
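To see why contamination shifts allele frequencies away from 0%, 50%, and 100%, consider a simple mixing model. The sketch below is an illustration only and is not the DRAGEN contamination score. It assumes diploid genotypes, no CNV at the site, and reads sampled in proportion to the DNA mix. The function names `expected_vaf` and `in_shifted_band` are hypothetical.

```python
def expected_vaf(sample_alt_fraction: float, contaminant_alt_fraction: float,
                 contamination: float) -> float:
    """Expected VAF at a SNP when a fraction of the DNA comes from another person.

    Genotype alt fractions are 0.0 (homozygous reference), 0.5 (heterozygous),
    or 1.0 (homozygous alternate).
    """
    if not 0.0 <= contamination <= 1.0:
        raise ValueError("contamination must be between 0 and 1")
    return ((1.0 - contamination) * sample_alt_fraction
            + contamination * contaminant_alt_fraction)


def in_shifted_band(vaf: float) -> bool:
    """True for VAFs in the < 25% or > 75% bands that the algorithm collects."""
    return vaf < 0.25 or vaf > 0.75
```

For example, with 10% contamination, a site that is homozygous reference in the sample and heterozygous in the contaminant has an expected VAF of about 5%. That value falls in the < 25% band even though the uncontaminated sample would show 0%.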

    Annotation

    The Illumina Annotation Engine performs annotation of small variants and CNVs. The inputs are gVCF files and the outputs are annotated JSON files.

    The Illumina Annotation Engine processes each variant entry and annotates with available information from databases such as dbSNP, gnomAD genome and exome, 1000 genomes, ClinVar, COSMIC, RefSeq, and Ensembl. The header includes version information and general details. Each annotated variant is included as a nested dictionary structure in separate lines following the header.

    The database content included with the Nirvana database is described in the Nirvana online documentation.

    The pipeline currently does not support annotation of gVCF files. Please use Illumina Connected Insights to perform tertiary analysis.

    Biomarkers

    Tumor Mutational Burden

    DRAGEN is used to compute tumor mutational burden (TMB) in coding regions where there is sufficient coverage.

    The following variants are excluded from the TMB calculation:

    • Non-PASS variants.

    • Mitochondrial variants.

    • MNVs.

    • Variants that do not meet a minimum depth threshold.

    Variants with a population allele count ≥ 10 that are observed in either the 1000 Genomes or gnomAD databases are marked as germline. MNVs, which do not count towards TMB, may be marked as germline when all their component small variants are marked as germline. The proxy filter scans the variants surrounding a specific variant and identifies those variants with similar variant allele frequencies (VAF). If the majority of surrounding variants of similar VAF are germline, then the variant is also marked as germline.

    The formula for TMB calculation is:

    Outputs are captured in a .tmb.trace.tsv file that contains information on variants used in the TMB calculation and a .tmb.metrics.json file that contains the TMB score calculation and configuration details.

    Please see DRAGEN DNA Pipeline - Biomarkers - TMB for details about the TMB biomarker analysis.
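The TMB score is a ratio: the number of variants that survive the exclusions, divided by the eligible region size in Mbp. The sketch below shows that arithmetic under stated assumptions. It is not the DRAGEN implementation. The `Variant` record and its field names are hypothetical, the mitochondrial contig names are assumed, and the depth and allele frequency thresholds are caller-supplied because this guide does not state their values. The germline flag is taken as an input; the database and proxy filters are not reimplemented.

```python
from dataclasses import dataclass

# Contig names treated as mitochondrial (an assumption for this sketch).
MITOCHONDRIAL_CONTIGS = {"chrM", "MT"}


@dataclass
class Variant:
    """Hypothetical per-variant record; field names are illustrative only."""
    chrom: str
    passes_filters: bool       # True for PASS variants
    is_mnv: bool
    depth: int
    vaf: float
    in_eligible_region: bool
    is_tumor_driver: bool
    is_germline: bool          # outcome of the database and proxy filters
    is_nonsynonymous: bool


def counts_toward_tmb(v: Variant, min_depth: int, min_vaf: float) -> bool:
    """Apply the exclusions listed for the TMB calculation."""
    return (
        v.passes_filters
        and v.chrom not in MITOCHONDRIAL_CONTIGS
        and not v.is_mnv
        and v.depth >= min_depth
        and v.vaf >= min_vaf
        and v.in_eligible_region
        and not v.is_tumor_driver
        and not v.is_germline
    )


def tmb_scores(variants: list[Variant], eligible_region_bp: int,
               min_depth: int, min_vaf: float) -> tuple[float, float]:
    """Return (TMB, nonsynonymous TMB) in variants per Mbp."""
    if eligible_region_bp <= 0:
        raise ValueError("eligible_region_bp must be positive")
    region_mbp = eligible_region_bp / 1_000_000
    kept = [v for v in variants if counts_toward_tmb(v, min_depth, min_vaf)]
    nonsynonymous = sum(1 for v in kept if v.is_nonsynonymous)
    return len(kept) / region_mbp, nonsynonymous / region_mbp
```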

    Microsatellite Instability Status

    DRAGEN can determine the MSI status of a sample. It uses a normal reference file, which was created from a set of normal samples. During sequencing, normal reference files are generated by tabulating read counts for each microsatellite site. The normal file contains the read count distribution for each microsatellite.

    MSI calling for a tumor-only sample is performed by first tabulating tumor counts from the read alignments for each microsatellite site. Then, the Jensen-Shannon distance (JSD) is calculated between each pair of tumor and normal baseline samples. DRAGEN determines unstable sites by performing Chi-square testing of tumor JSD and normal JSD distributions. Unstable sites are called if the mean distance difference of the two JSD distributions is ≥ the distance threshold and the Chi-square p-value is ≤ the p-value threshold. Lastly, DRAGEN produces an MSI status given the assessed site count, the unstable site count, the percentage of unstable sites in all assessed sites, and the sum of the Jensen-Shannon distances of all the unstable sites.

    Please see DRAGEN DNA Pipeline - Biomarkers - MSI for details about the MSI biomarker analysis.
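The Jensen-Shannon distance compares two read-count distributions at a microsatellite site. The sketch below computes the standard distance (the square root of the Jensen-Shannon divergence) with base-2 logarithms, which keeps the result between 0 and 1. The log base and the normalization used by DRAGEN are not stated in this guide, so treat both as assumptions. The function names are hypothetical, and the thresholds are parameters because their values are not given here.

```python
import math


def _normalize(counts: list[float]) -> list[float]:
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return [c / total for c in counts]


def _kl_divergence(p: list[float], q: list[float]) -> float:
    # Terms with p_i == 0 contribute nothing to the sum.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(counts_a: list[float], counts_b: list[float]) -> float:
    """Jensen-Shannon distance between two read-count histograms."""
    if len(counts_a) != len(counts_b):
        raise ValueError("histograms must have the same number of bins")
    p, q = _normalize(counts_a), _normalize(counts_b)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * _kl_divergence(p, m) + 0.5 * _kl_divergence(q, m)
    return math.sqrt(max(divergence, 0.0))


def is_unstable_site(mean_distance_difference: float, p_value: float,
                     distance_threshold: float, p_value_threshold: float) -> bool:
    """Decision rule from the text: both conditions must hold."""
    return (mean_distance_difference >= distance_threshold
            and p_value <= p_value_threshold)
```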

    HRD

    The Homologous Recombination Deficiency (HRD) score is a whole-genome signature measurement of genomic instability. The HRD score is the sum of three components: loss of heterozygosity (LOH), telomeric allele imbalance (TAI), and large-scale state transition (LST). A panel of normal samples is used for both bias reduction and normalization prior to HRD score estimation. Final HRD results can be found in the *.hrdscore.csv file.

    Please see DRAGEN DNA Pipeline - Biomarkers - HRD for details about the HRD biomarker analysis.

    HLA Typing

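    Please see DRAGEN DNA Pipeline - HLA Typing for details.

    Targeted Callers

    Please see DRAGEN DNA Pipeline - Targeted Callers.

    Expansion Hunter

    Please see DRAGEN DNA Pipeline - Expansion Hunter.

    Variable Number Tandem Repeat (VNTR)

    Please see DRAGEN DNA Pipeline - VNTR.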

    5 Base DNA Somatic Tumor-Only Solid Panel

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.

    Notes and additional options

    Hashtable

    For DRAGEN somatic runs it is recommended to use the linear hashtable.

    See:

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using .

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    Duplicate Marking

    Option
    Description

    Fractional (Raw Reads) Downsampling

    DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).

    Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses, it is also recommended to use a normal sample with coverage that is less than that of the tumor sample. If the matched normal has deeper coverage than the tumor sample, fractional downsampling may be used to reduce coverage on the normal sample.
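The fraction needed to bring a deep normal sample down to a target coverage is the ratio of the two coverages. The sketch below shows that arithmetic, plus a simple seeded per-read subsampler to illustrate what a fractional downsample does. It is not the DRAGEN downsampler, and how DRAGEN handles read pairs or seeding is not described here. The function names are hypothetical.

```python
import random


def fraction_for_target_coverage(current_coverage: float, target_coverage: float) -> float:
    """Fraction of reads to keep so that coverage drops to about target_coverage."""
    if current_coverage <= 0 or target_coverage <= 0:
        raise ValueError("coverages must be positive")
    # Never ask for more reads than exist.
    return min(1.0, target_coverage / current_coverage)


def subsample_reads(reads: list, fraction: float, seed: int = 0) -> list:
    """Keep each read independently with probability `fraction`, unmodified."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed makes the selection reproducible
    return [read for read in reads if rng.random() < fraction]
```

For example, a 60× matched normal paired with a tumor sample where a 30× normal is wanted gives `fraction_for_target_coverage(60, 30)`, which is 0.5.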

    Option
    Description

    5-Base Methylation

    Option
    Description

    For more information see: .

    SNV

    Option
    Description

    High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports 3 main settings for improved sensitivity on low VAF variant calls.

    High Sensitivity Option
    Description

    For more detail on the small variant caller in somatic mode please refer to

    Annotation

    For instructions on how to download the Nirvana annotation database, please refer to

    Resource Files

    DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.

    SNV Systematic Noise

    Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.

    DRAGEN has prebuilt systematic noise files for WES/WGS. These files should also be used in 5-Base workflows. The 5-Base workflows have not been tested with custom noise files.

    Prebuilt

    Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: .

    Prebuilt WES/WGS noise files
    Description

    5 Base DNA Somatic Tumor-Normal Solid WGS

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.

    Notes and additional options

    Hashtable

    For DRAGEN somatic runs it is recommended to use the linear hashtable.

    See:

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using .

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    Duplicate Marking

    Option
    Description

    Fractional (Raw Reads) Downsampling

    DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).

    Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses, it is also recommended to use a normal sample with coverage that is less than that of the tumor sample. If the matched normal has deeper coverage than the tumor sample, fractional downsampling may be used to reduce coverage on the normal sample.

    Option
    Description

    5-Base Methylation

    Option
    Description

    For more information see: .

    SNV

    Option
    Description

    For more detail on the small variant caller in somatic mode please refer to

    CNV

    Option
    Description

    For more information, see .

    Annotation

    For instructions on how to download the Nirvana annotation database, please refer to

    Resource Files

    DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.

    SNV Systematic Noise

    Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.

    DRAGEN has prebuilt systematic noise files for WES/WGS. These files should also be used in 5-Base workflows. The 5-Base workflows have not been tested with custom noise files.

    Prebuilt

    Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: .

    Prebuilt WES/WGS noise files
    Description

    5 Base DNA Somatic Tumor-Only ctDNA Panel UMI

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.

    Notes and additional options

    Hashtable

    5 Base DNA Somatic Tumor-Only Solid WGS

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.

    Notes and additional options

    5 Base DNA Somatic Tumor-Normal Solid Panel

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.

    Notes and additional options

    --fastq-list $PATH 
    --fastq-list-sample-id $STRING 
    --fastq-file1 $PATH 
    --fastq-file2 $PATH 
    --RGSM $STRING 
    --RGID $STRING 
    --bam-input $PATH 
    --cram-input $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN pangenome hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
    --fastq-list-sample-id $STRING 
    # CNV 
    --enable-cnv true 
    --cnv-target-bed $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN pangenome hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --enable-cnv true 
    --cnv-generate-combined-counts true 
    --cnv-normals-list $CNV_NORMALS_LIST 
    --fastq-list $PATH 
    --fastq-list-sample-id $STRING 
    --fastq-file1 $PATH 
    --fastq-file2 $PATH 
    --RGSM $STRING 
    --RGID $STRING 
    --bam-input $PATH 
    --cram-input $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN pangenome hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
    --fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    # UMI 
    --umi-enable true 
    --umi-source STRING                     #Default='qname' 
    --umi-library-type STRING               #e.g. random-duplex 
    --umi-min-supporting-reads 1            #Default=2 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-target-bed $VC_TARGET_BED 
    # Annotation 
    --variant-annotation-data PATH 
    --enable-variant-annotation true 
    # SV 
    --enable-sv true 
    --sv-exome true 
    --sv-call-regions-bed $SV_TARGET_BED 
    # CNV 
    --enable-cnv true 
    --cnv-target-bed $PATH 
    --cnv-combined-counts $PATH             #CNV PON. See 'In-run PON' section below. 
    # HLA genotyper 
    --enable-hla true 
    # Targeted caller (only if using the Illumina CS/PGx Custom Enrichment Research Panel) 
    --enable-targeted true 
    --targeted-pon $PATH                    #Targeted PON. See 'In-run PON' section below. 
    --targeted-systematic-noise $PATH       #Targeted systematic noise file 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    --enable-duplicate-marking true         #default=true 
    # 5-Base 
    --methylation-conversion illumina 
    --methylation-generate-cytosine-report true 
    --methylation-compress-cx-report true 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-target-bed $VC_TARGET_BED 
    --vc-systematic-noise $PATH             #Required 
    --vc-excluded-regions-bed $BED          #FFPE: optionally mask ALUs 
    # Annotation 
    --variant-annotation-data PATH 
    --vc-enable-germline-tagging true 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
    --fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    --enable-duplicate-marking true         #default=true 
    # 5-Base 
    --methylation-conversion illumina 
    --methylation-generate-cytosine-report true 
    --methylation-compress-cx-report true 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-systematic-noise $PATH             #Recommended 
    --vc-excluded-regions-bed $BED          #FFPE: optionally mask ALUs 
    # CNV 
    --enable-cnv true 
    --cnv-use-somatic-vc-baf true 
    --cnv-enable-self-normalization true 
    # Annotation 
    --variant-annotation-data PATH 
    --enable-variant-annotation true 
    Merge Duplex UMIs

    Enter trim. This indicates that the BCL Convert software trims the specified adapter sequences from each read.

    MinimumTrimmedReadLength

    Optional

    Enter 35. Reads with a length trimmed below this point are masked.

    MaskShortReads

    Optional

    Enter 35. Reads with a length trimmed below this point are masked.

    Indicates which lane corresponds to a given sample. Enter a single numeric value per row. Cannot be empty, that is, the analysis fails if the Lane column is present without a value in each row.

    Sample description must meet the following requirements: - 1–50 characters. - Alphanumeric characters with underscores, dashes, and spaces. If you enter an underscore, dash, or space, enter an alphanumeric character before and after, for example, heme-WGS_213.

    IndexAdapterKitName

    Not Required

    The Index Adapter Kit used.

    Release Information
  • TMB

  • MSI

  • HRD

  • ASCN

  • LOH

  • DUX4

  • HLA

  • Calibrates read base qualities to account for background noise.

  • Computes read likelihoods for each read/haplotype pair.

  • Performs mutation calling by summing the genotype probabilities across all reads/haplotype pairs.

  • Performs additional filtering to improve variant calling accuracy, including using a systematic noise file. The systematic noise file indicates the statistical probability of noise at specific positions in the genome. This noise file is constructed using clean (normal) samples. Regions where noise is common (for example, difficult-to-map regions) have higher noise values. The small variant caller penalizes those regions to reduce the probability of making false positive calls.

  • Variants that do not meet the minimum variant allele threshold.

  • Variants that fall outside the eligible regions.

  • Tumor driver mutations. Variants with a population allele count ≥ 50 are treated as tumor driver mutations.

    Germline variants are not counted towards TMB. Variants are determined as germline based on a database and a proxy filter.

    $$\mathrm{TMB} = \frac{\text{Filtered Variants}}{\text{Eligible Region Size (Mbp)}}$$

    $$\text{Nonsynonymous TMB} = \frac{\text{Filtered Nonsynonymous Variants}}{\text{Eligible Region Size (Mbp)}}$$
    /opt/edico/bin/dragen \
    --ref-dir /staging/dragen-app-manager/resources/Illumina_hg38-alt_masked.cnv.graph.hla.methyl_cg.rna-11_r5.0-1 \
    --output-directory DragenCaller/Sample-001 \
    --output-file-prefix Sample-001 \
    --events-log-file DragenCaller/Sample-001/events.csv \
    --enable-map-align=true \
    --enable-map-align-output=true \
    --enable-variant-caller=true \
    --vc-emit-ref-confidence=GVCF \
    --vc-enable-vcf-output=true \
    --enable-targeted=true \
    --targeted-merge-vc=true \
    --enable-star-allele=true \
    --enable-cnv=true \
    --cnv-enable-self-normalization=true \
    --repeat-genotype-enable=true \
    --enable-sv=true \
    --enable-vntr=true \
    --sv-vntr-merge=false \
    --enable-hla=true \
    --hla-enable-class-2=true \
    --vc-output-evidence-bam=false \
    --qc-detect-contamination=true \
    --qc-coverage-ignore-overlaps=false \
    --logging-to-output-dir=true \
    --max-base-quality=63 \
    --enable-duplicate-marking false \
    --tumor-normal-has-umi both \
    --umi-source qname \
    --umi-library-type nonrandom-duplex \
    --umi-min-supporting-reads 1 \
    --umi-correction-table /staging/dragen-app-manager/resources/Illumina_solid-wgs-tn-resources_4.4.4.2/umi/umi_correction_table.txt.gz \
    --bam-input Sample-001.bam \
    --force 

    --umi-emit-multiplicity both

    Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see .

    --umi-start-mask-length INT

    Number of additional bases to ignore from start of read. The default is 0. To reduce FP optionally set to 1.

    --umi-end-mask-length INT

    Number of additional bases to ignore from end of read. The default is 0. To reduce FP optionally set to 3.

    --enable-map-align true

    Optionally disable map & align (default=true).

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    --umi-source STRING

    Specify the input type for the UMI sequence. Options: qname, fastq, bamtag.

    --umi-library-type STRING

    Set the batch option for the UMI correction type. Options: random-duplex, random-simplex, nonrandom-duplex.

    --umi-nonrandom-whitelist $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.

    --umi-correction-table $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.

    --umi-min-supporting-reads INT

    Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).

    --umi-metrics-interval-file $BED

    Target region in BED format.

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    --vc-emit-ref-confidence GVCF

    To enable gVCF output.

    --vc-enable-vcf-output

    To enable VCF file output during a gVCF run, set to true. The default value is false.

    --enable-hla

    Enable HLA typer (this setting by default will only genotype class 1 genes)

    --hla-as-filter-min-threshold

    Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.

    --hla-as-filter-ratio-threshold

    Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.

    --hla-enable-class-2

    Extend genotyping to HLA class 2 genes (default=true).

    --cnv-enable-gcbias-correction true

    Enable or disable GC bias correction when generating target counts.

    --cnv-segmentation-mode $SEG_MODE

    Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.

    --cnv-segmentation-bed $PATH

    If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.


    --methylation-compress-cx-report true

    Set to true to enable compression of the CX_report (default=true).

    --methylation-keep-ref-cytosine true

    Set to true to keep all reference cytosines in the CX_report file, even if they don't appear in the input reads (default=false).

    --enable-cpg-methylated-mapping true

    Enable methylated mapping with base conversions restricted to CpG context (default=true). When false, runs DRAGEN Methylation 3-base map/align instead.

    --methylation-report-to-vcf

    Specify methylation type (none, cg, or c) which is reported in VCF files (default=c).

    --methylation-report-to-gvcf

    Specify methylation type (none, cg, or c) which is reported in gVCF files (default=cg).

    --vc-sq-filter-threshold $NUM

    Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 3. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold $INT

    Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 60. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold-in-hotspot $INT

    Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 20. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-excluded-regions-bed $BED

    Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome, and are often exceptionally noisy regions in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU bed files can be downloaded as part of the Bed File Collection:

    --enable-map-align true

    Optionally disable map & align (default=true).

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    --enable-duplicate-marking true

    By default, DRAGEN marks duplicate reads and excludes them from variant calling.

    --enable-positional-collapsing true

    Alternative to --enable-duplicate-marking=true. Instead of discarding duplicate reads, DRAGEN can optionally perform positional collapsing, merging them into higher-quality consensus reads. This is beneficial for small panels without UMIs and coverage between 300X and 1000X. However, it's slower than standard duplicate marking and less effective on samples with coverage lower than 300X. For very high coverage (1000X+), avoid it due to potential read collisions. For high-sensitivity panels with 1000X+ coverage, consider using UMIs.

    --enable-fractional-down-sampler

    Set to true to enable fractional downsampling. The default value is false.

    --down-sampler-normal-subsample

    Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).

    --down-sampler-tumor-subsample

    Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).

    --down-sampler-random-seed

    Specify the random seed for different runs of the same input data. The default value is 42.

    --methylation-conversion STRING

    Library conversion for methylation analysis. Options: none, c_t, mc_t, illumina (default=none).

    --methylation-protocol STRING

    Library protocol for methylation analysis. Options: none, directional, non-directional, directional-complement, pbat. The default value for methylation-conversion=illumina is directional, otherwise it is none.

    --methylation-mapq-threshold INT

    Only reads with MAPQ greater than or equal to the threshold are included in methyl-seq analysis (default=0).

    --methylation-generate-mbias-report true

    Whether to generate a per-sequencer-cycle methylation bias report (default=true).

    --mbias-report-include-overlaps

    Calculate methylation stats for overlapping bases between mates (default=false).

    --methylation-generate-cytosine-report true

    Whether to generate a genome-wide cytosine methylation CX_report file (default=false).

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    --vc-systematic-noise $PATH

    Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.). When working with panels it is recommended that a custom systematic noise file be created for each assay.

    --vc-somatic-hotspots $PATH

    DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.

    --vc-enable-liquid-tumor-mode true

    Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.

    --vc-override-tumor-pcr-params-with-normal false

    Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.

    --vc-target-vaf FLOAT

    The default is 0.03 (3%). Set to e.g. 0.01 to improve SNV sensitivity on 1% VAF variants (assuming sufficient coverage).

    --vc-enable-umi-solid true

    Optimized for 1% and higher VAFs on UMI (or read position collapsed) samples with approx 300-1000X coverage.

    --vc-enable-umi-liquid true

    Optimized for 0.1% and higher VAFs on UMI samples with 1000X or higher coverage as expected in liquid biopsies.

    WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FF

    FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FFPE (only hg38)

    WES_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WES FF and FFPE


    --methylation-compress-cx-report true

    Set to true to enable compression of the CX_report (default=true).

    --methylation-keep-ref-cytosine true

    Set to true to keep all reference cytosines in the CX_report file, even if they don't appear in the input reads (default=false).

    --enable-cpg-methylated-mapping true

    Enable methylated mapping with base conversions restricted to CpG context (default=true). When false, runs DRAGEN Methylation 3-base map/align instead.

    --methylation-report-to-vcf

    Specify methylation type (none, cg, or c) which is reported in VCF files (default=c).

    --methylation-report-to-gvcf

    Specify methylation type (none, cg, or c) which is reported in gVCF files (default=cg).

    --vc-sq-filter-threshold $NUM

    Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 17.5. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold $INT

    Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold-in-hotspot $INT

    Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-excluded-regions-bed $BED

    Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome, and are often exceptionally noisy regions in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU bed files can be downloaded as part of the Bed File Collection:

    --enable-map-align true

    In the TN pipeline this must be set to false for BAM/CRAM input.

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    --enable-duplicate-marking true

    By default, DRAGEN marks duplicate reads and excludes them from variant calling.

    --enable-fractional-down-sampler

    Set to true to enable fractional downsampling. The default value is false.

    --down-sampler-normal-subsample

    Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).

    --down-sampler-tumor-subsample

    Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).

    --down-sampler-random-seed

    Specify the random seed for different runs of the same input data. The default value is 42.

    --methylation-conversion STRING

    Library conversion for methylation analysis. Options: none, c_t, mc_t, illumina (default=none).

    --methylation-protocol STRING

    Library protocol for methylation analysis. Options: none, directional, non-directional, directional-complement, pbat. The default value for methylation-conversion=illumina is directional, otherwise it is none.

    --methylation-mapq-threshold INT

    Only reads with MAPQ greater than or equal to the threshold are included in methyl-seq analysis (default=0).

    --methylation-generate-mbias-report true

    Whether to generate a per-sequencer-cycle methylation bias report (default=true).

    --mbias-report-include-overlaps

    Calculate methylation stats for overlapping bases between mates (default=false).

    --methylation-generate-cytosine-report true

    Whether to generate a genome-wide cytosine methylation CX_report file (default=false).

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    --vc-systematic-noise $PATH

    Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.).

    --vc-somatic-hotspots $PATH

    DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.

    --vc-enable-liquid-tumor-mode true

    Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.

    --vc-override-tumor-pcr-params-with-normal false

    Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.

    --cnv-enable-gcbias-correction true

    Enable or disable GC bias correction when generating target counts.

    --cnv-segmentation-mode $SEG_MODE

    Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.

    --cnv-normal-cnv-vcf $CNV_NORMAL_VCF

    Specify germline CNVs from the matched normal sample. Germline-aware Mode.

    WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FF

    FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FFPE (only hg38)

    WES_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WES FF and FFPE

    For DRAGEN somatic runs it is recommended to use the linear hashtable.

    See: Product Files

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    --enable-map-align true

    Optionally disable map & align (default=true).

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    Fractional (Raw Reads) Downsampling

    DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).

    Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses, it is also recommended to use a normal sample with lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler can be used to reduce coverage on the normal sample.

    Option
    Description

    --enable-fractional-down-sampler

    Set to true to enable fractional downsampling. The default value is false.

    --down-sampler-normal-subsample

    Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).

    --down-sampler-tumor-subsample

    Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).

    --down-sampler-random-seed

    Specify the random seed for different runs of the same input data. The default value is 42.
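
    The following is a minimal sketch of how a value for --down-sampler-normal-subsample might be derived, assuming coverage scales roughly linearly with the fraction of reads kept. The coverage figures and the helper name are hypothetical; only the option names come from the table above.

```python
def normal_subsample_fraction(normal_cov: float, target_cov: float) -> float:
    """Fraction of normal reads to keep so coverage drops to about target_cov."""
    if normal_cov <= 0 or target_cov <= 0:
        raise ValueError("coverage values must be positive")
    # Never request more than 100% of the reads.
    return min(1.0, target_cov / normal_cov)


# Hypothetical example: a 120X normal reduced to roughly 60X.
fraction = normal_subsample_fraction(normal_cov=120.0, target_cov=60.0)
args = [
    "--enable-fractional-down-sampler", "true",
    "--down-sampler-normal-subsample", f"{fraction:.2f}",
    "--down-sampler-random-seed", "42",  # documented default seed
]
print(" ".join(args))
```

    Keeping the seed fixed makes repeated runs on the same input select the same reads.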

    UMI

    Option
    Description

    --umi-nonrandom-whitelist $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.

    --umi-correction-table $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.

    --umi-min-supporting-reads INT

    Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).

    --umi-metrics-interval-file $BED

    Target region in BED format.

    --umi-emit-multiplicity both

    Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see .

    --umi-start-mask-length INT

    Number of additional bases to ignore from start of read. The default is 0. To reduce FP optionally set to 1.

    For more information see: UMI Options.
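
    To illustrate the effect of --umi-min-supporting-reads, the sketch below groups reads into families and discards families with too few supporting reads. It is a simplified illustration and not DRAGEN's implementation: the family key (UMI plus start position) and the sample reads are assumptions.

```python
from collections import Counter


def surviving_families(reads, min_supporting_reads=2):
    """Return {family: read count} for families with enough supporting reads.

    reads is an iterable of (umi, position) tuples; each distinct tuple is
    treated as one family (a simplification made for this example).
    """
    family_sizes = Counter(reads)
    return {
        family: size
        for family, size in family_sizes.items()
        if size >= min_supporting_reads
    }


# Hypothetical reads: one family of two reads and two singleton families.
reads = [("ACGT", 100), ("ACGT", 100), ("TTGA", 100), ("ACGT", 250)]
kept_default = surviving_families(reads, min_supporting_reads=2)
kept_relaxed = surviving_families(reads, min_supporting_reads=1)
print(len(kept_default), len(kept_relaxed))
```

    With a threshold of 2, the singleton families are dropped. With a threshold of 1, every family yields a consensus read, which is why the lower setting retains more coverage.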

    5-Base Methylation

    Option
    Description

    --methylation-conversion STRING

    Library conversion for methylation analysis. Options: none, c_t, mc_t, illumina (default=none).

    --methylation-protocol STRING

    Library protocol for methylation analysis. Options: none, directional, non-directional, directional-complement, pbat. The default value for methylation-conversion=illumina is directional, otherwise it is none.

    --methylation-mapq-threshold INT

    Only reads with MAPQ greater than or equal to the threshold are included in methyl-seq analysis (default=0).

    --methylation-generate-mbias-report true

    Whether to generate a per-sequencer-cycle methylation bias report (default=true).

    --mbias-report-include-overlaps

    Calculate methylation stats for overlapping bases between mates (default=false).

    --methylation-generate-cytosine-report true

    Whether to generate a genome-wide cytosine methylation CX_report file (default=false).

    For more information see: 5-Base Pipeline.
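
    The default for --methylation-protocol depends on --methylation-conversion. A minimal sketch of that rule, using only the option values listed in the table above (the helper name is hypothetical):

```python
VALID_CONVERSIONS = {"none", "c_t", "mc_t", "illumina"}
VALID_PROTOCOLS = {
    "none", "directional", "non-directional", "directional-complement", "pbat",
}


def effective_protocol(conversion="none", protocol=None):
    """Return the protocol in effect for the given conversion setting."""
    if conversion not in VALID_CONVERSIONS:
        raise ValueError(f"unknown conversion: {conversion}")
    if protocol is None:
        # Documented default: directional for illumina, otherwise none.
        return "directional" if conversion == "illumina" else "none"
    if protocol not in VALID_PROTOCOLS:
        raise ValueError(f"unknown protocol: {protocol}")
    return protocol


print(effective_protocol("illumina"))
print(effective_protocol("c_t"))
```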

    SNV

    Option
    Description

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    --vc-systematic-noise $PATH

    Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.). When working with panels it is recommended that a custom systematic noise file be created for each assay.

    --vc-somatic-hotspots $PATH

    DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.

    --vc-enable-liquid-tumor-mode true

    Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.

    --vc-override-tumor-pcr-params-with-normal false

    Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.

    High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports 3 main settings for improved sensitivity on low VAF variant calls.

    High Sensitivity Option
    Description

    --vc-target-vaf FLOAT

    The default is 0.03 (3%). Set to e.g. 0.01 to improve SNV sensitivity on 1% VAF variants (assuming sufficient coverage).

    --vc-enable-umi-solid true

    Optimized for 1% and higher VAFs on UMI (or read position collapsed) samples with approx 300-1000X coverage.

    --vc-enable-umi-liquid true

    Optimized for 0.1% and higher VAFs on UMI samples with 1000X or higher coverage as expected in liquid biopsies.
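
    As a rough rule of thumb, the coverage ranges in the table above can be turned into a small selection helper. This is a sketch only: the cutoffs are the approximate ranges quoted in the table, the helper name is hypothetical, and any choice should be validated on the assay in question.

```python
def high_sensitivity_args(coverage, has_umi, position_collapsed=False,
                          target_vaf=0.01):
    """Suggest one high-sensitivity option based on coverage and library type."""
    if has_umi and coverage >= 1000:
        # UMI samples at 1000X or higher, as expected in liquid biopsies.
        return ["--vc-enable-umi-liquid", "true"]
    if (has_umi or position_collapsed) and 300 <= coverage < 1000:
        # UMI or read-position-collapsed samples at roughly 300-1000X.
        return ["--vc-enable-umi-solid", "true"]
    # Otherwise lower the target VAF from its 0.03 default.
    return ["--vc-target-vaf", str(target_vaf)]


print(high_sensitivity_args(coverage=2000, has_umi=True))
print(high_sensitivity_args(coverage=500, has_umi=False, position_collapsed=True))
print(high_sensitivity_args(coverage=150, has_umi=False))
```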

    For more detail on the small variant caller in somatic mode please refer to Somatic Mode

    Annotation

    For instructions on how to download the Nirvana annotation database, please refer to Nirvana

    Resource Files

    DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.

    SNV Systematic Noise

    Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.

    DRAGEN has prebuilt systematic noise files for WES/WGS. These files should also be used in 5-Base workflows. The 5-Base workflows have not been tested with custom noise files.

    Prebuilt

    Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.

    Prebuilt WES/WGS noise files
    Description

    WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FF

    FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FFPE (only hg38)

    WES_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WES FF and FFPE
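
    The table above can be expressed as a simple lookup. The sketch below returns the --vc-systematic-noise argument for a given assay and sample preparation. The resource directory is a hypothetical location for the downloaded files, and the helper name is illustrative.

```python
import posixpath

# File names as listed in the prebuilt noise file table (hg38).
PREBUILT_NOISE_FILES = {
    ("WGS", "FF"): "WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz",
    ("WGS", "FFPE"): "FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz",
    ("WES", "FF"): "WES_hg38_v2.0.0_systematic_noise.snv.bed.gz",
    ("WES", "FFPE"): "WES_hg38_v2.0.0_systematic_noise.snv.bed.gz",
}


def systematic_noise_args(assay, sample_prep, resource_dir):
    """Build the --vc-systematic-noise argument for a prebuilt noise file."""
    key = (assay.upper(), sample_prep.upper())
    if key not in PREBUILT_NOISE_FILES:
        raise ValueError(f"no prebuilt noise file for {key}")
    path = posixpath.join(resource_dir, PREBUILT_NOISE_FILES[key])
    return ["--vc-systematic-noise", path]


# "/path/to/noise" is a placeholder, not a DRAGEN default.
print(systematic_noise_args("wgs", "ffpe", "/path/to/noise"))
```

    Panels are deliberately absent from the lookup, since a custom systematic noise file is recommended for each panel assay.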

    Hashtable

    For DRAGEN somatic runs it is recommended to use the linear hashtable.

    See: Product Files

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    --enable-map-align true

    Optionally disable map & align (default=true).

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    Duplicate Marking

    Option
    Description

    --enable-duplicate-marking true

    By default, DRAGEN marks duplicate reads and excludes them from variant calling.

    Fractional (Raw Reads) Downsampling

    DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).

    Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses, it is also recommended to use a normal sample with lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler can be used to reduce coverage on the normal sample.

    Option
    Description

    --enable-fractional-down-sampler

    Set to true to enable fractional downsampling. The default value is false.

    --down-sampler-normal-subsample

    Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).

    --down-sampler-tumor-subsample

    Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).

    --down-sampler-random-seed

    Specify the random seed for different runs of the same input data. The default value is 42.

    5-Base Methylation

    Option
    Description

    --methylation-conversion STRING

    Library conversion for methylation analysis. Options: none, c_t, mc_t, illumina (default=none).

    --methylation-protocol STRING

    Library protocol for methylation analysis. Options: none, directional, non-directional, directional-complement, pbat. The default value for methylation-conversion=illumina is directional, otherwise it is none.

    --methylation-mapq-threshold INT

    Only reads with MAPQ greater than or equal to the threshold are included in methyl-seq analysis (default=0).

    --methylation-generate-mbias-report true

    Whether to generate a per-sequencer-cycle methylation bias report (default=true).

    --mbias-report-include-overlaps

    Calculate methylation stats for overlapping bases between mates (default=false).

    --methylation-generate-cytosine-report true

    Whether to generate a genome-wide cytosine methylation CX_report file (default=false).

    For more information see: 5-Base Pipeline.

    SNV

    Option
    Description

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    --vc-systematic-noise $PATH

    Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.).

    --vc-somatic-hotspots $PATH

    DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.

    --vc-enable-liquid-tumor-mode true

    Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.

    --vc-override-tumor-pcr-params-with-normal false

    Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.

    For more detail on the small variant caller in somatic mode please refer to Somatic Mode

    CNV

    Option
    Description

    --cnv-enable-gcbias-correction true

    Enable or disable GC bias correction when generating target counts.

    --cnv-segmentation-mode $SEG_MODE

    Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.

    For more information, see CNV Calling.

    Annotation

    For instructions on how to download the Nirvana annotation database, please refer to Nirvana

    Resource Files

    DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.

    SNV Systematic Noise

    Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.

    DRAGEN has prebuilt systematic noise files for WES/WGS. These files should also be used in 5-Base workflows. The 5-Base workflows have not been tested with custom noise files.

    Prebuilt

    Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.

    Prebuilt WES/WGS noise files
    Description

    WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FF

    FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FFPE (only hg38)

    WES_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WES FF and FFPE

    Hashtable

    For DRAGEN somatic runs it is recommended to use the linear hashtable.

    See: Product Files

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    --enable-map-align true

    In the TN pipeline this must be set to false for BAM/CRAM input.

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    Duplicate Marking

    Option
    Description

    --enable-duplicate-marking true

    By default, DRAGEN marks duplicate reads and excludes them from variant calling.

    --enable-positional-collapsing true

    Alternative to --enable-duplicate-marking=true. Instead of discarding duplicate reads, DRAGEN can optionally perform positional collapsing, merging them into higher-quality consensus reads. This is beneficial for small panels without UMIs and coverage between 300X and 1000X. However, it's slower than standard duplicate marking and less effective on samples with coverage lower than 300X. For very high coverage (1000X+), avoid it due to potential read collisions. For high-sensitivity panels with 1000X+ coverage, consider using UMIs.
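
    The coverage guidance for positional collapsing can be sketched as a small helper for panels sequenced without UMIs. The cutoffs are the 300X and 1000X figures quoted above, and the helper name is hypothetical. Above 1000X the text suggests UMIs instead, which this sketch does not model.

```python
def duplicate_handling_args(coverage):
    """Suggest a duplicate-handling option for a panel without UMIs."""
    if 300 <= coverage <= 1000:
        # Merge duplicates into consensus reads instead of discarding them.
        return ["--enable-positional-collapsing", "true"]
    # Below 300X collapsing is less effective; above 1000X read collisions
    # become a concern, so fall back to standard duplicate marking.
    return ["--enable-duplicate-marking", "true"]


for cov in (100, 500, 2000):
    print(cov, " ".join(duplicate_handling_args(cov)))
```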

    Fractional (Raw Reads) Downsampling

    DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).

    Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses, it is also recommended to use a normal sample with lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler can be used to reduce coverage on the normal sample.

    Option
    Description

    --enable-fractional-down-sampler

    Set to true to enable fractional downsampling. The default value is false.

    --down-sampler-normal-subsample

    Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).

    --down-sampler-tumor-subsample

    Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).

    --down-sampler-random-seed

    Specify the random seed for different runs of the same input data. The default value is 42.
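    As an illustration of choosing a value for --down-sampler-normal-subsample, the sketch below derives a fraction that brings the normal coverage down to the tumor coverage. The helper name normal_subsample_fraction and the example coverages are hypothetical, and matching the tumor coverage exactly is an assumption; pick a smaller fraction if the normal should end up strictly below the tumor. The script only prints options and does not run DRAGEN.

```shell
# Hypothetical helper: fraction of normal reads to keep so that the normal
# coverage does not exceed the tumor coverage. Coverages are mean depths.
normal_subsample_fraction() {
  local tumor_cov=$1
  local normal_cov=$2
  awk -v t="$tumor_cov" -v n="$normal_cov" 'BEGIN {
    # Keep all reads (1.0) unless the normal is deeper than the tumor.
    f = (n > t) ? t / n : 1.0
    printf "%.2f\n", f
  }'
}

# Example: 60X tumor with a 120X matched normal.
fraction=$(normal_subsample_fraction 60 120)
echo "--enable-fractional-down-sampler true"
echo "--down-sampler-normal-subsample $fraction"
echo "--down-sampler-random-seed 42"
```

    The fraction is rounded to two decimals, so the resulting coverage is approximate.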

    5-Base Methylation

    Option
    Description

    --methylation-conversion STRING

    Library conversion for methylation analysis. Options: none, c_t, mc_t, illumina (default=none).

    --methylation-protocol STRING

    Library protocol for methylation analysis. Options: none, directional, non-directional, directional-complement, pbat. The default value for methylation-conversion=illumina is directional, otherwise it is none.

    --methylation-mapq-threshold INT

    Only reads with MAPQ greater than or equal to the threshold will be included in methyl-seq analysis (default=0).

    --methylation-generate-mbias-report true

    Whether to generate a per-sequencer-cycle methylation bias report (default=true).

    --mbias-report-include-overlaps

    Calculate methylation stats for overlapping bases between mates (default=false).

    --methylation-generate-cytosine-report true

    Whether to generate a genome-wide cytosine methylation CX_report file (default=false).

    For more information see: 5-Base Pipeline.

    SNV

    Option
    Description

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (bp) over which phased variants will be combined. Set to 0 to disable. Valid range is [0, 15] bp (default=2).

    --vc-systematic-noise $PATH

    Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.). When working with panels it is recommended that a custom systematic noise file be created for each assay.

    --vc-somatic-hotspots $PATH

    DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.

    --vc-enable-liquid-tumor-mode true

    Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.

    --vc-override-tumor-pcr-params-with-normal false

    Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.

    High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports 3 main settings for improved sensitivity on low VAF variant calls.

    High Sensitivity Option
    Description

    --vc-target-vaf FLOAT

    The default is 0.03 (3%). Set to e.g. 0.01 to improve SNV sensitivity on 1% VAF variants (assuming sufficient coverage).

    --vc-enable-umi-solid true

    Optimized for 1% and higher VAFs on UMI (or read position collapsed) samples with approx 300-1000X coverage.

    --vc-enable-umi-liquid true

    Optimized for 0.1% and higher VAFs on UMI samples with 1000X or higher coverage as expected in liquid biopsies.
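    The "assuming sufficient coverage" caveat can be made concrete with simple arithmetic: the expected number of reads supporting a variant is roughly coverage multiplied by VAF. The sketch below is illustrative only; the helper name expected_alt_reads is hypothetical, and the calculation ignores sequencing noise, UMI collapsing, and the caller's statistical model.

```shell
# Hypothetical helper: expected number of variant-supporting reads,
# computed as coverage * VAF (a back-of-the-envelope estimate only).
expected_alt_reads() {
  local coverage=$1   # mean depth at the site, e.g. 300
  local vaf=$2        # variant allele frequency as a fraction, e.g. 0.01
  awk -v c="$coverage" -v v="$vaf" 'BEGIN { printf "%.1f\n", c * v }'
}

expected_alt_reads 100 0.03    # default target VAF at 100X
expected_alt_reads 300 0.01    # 1% VAF at 300X
expected_alt_reads 1000 0.001  # 0.1% VAF at 1000X
```

    Under this estimate, 1% VAF at 300X yields about three supporting reads, while 0.1% VAF at 1000X yields about one, which is why the lowest VAF setting is paired with the deepest coverage.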

    For more detail on the small variant caller in somatic mode please refer to Somatic Mode

    Annotation

    For instructions on how to download the Nirvana annotation database, please refer to Nirvana

    Resource Files

    DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.

    SNV Systematic Noise

    Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.

    DRAGEN has prebuilt systematic noise files for WES/WGS. These files should also be used in 5-Base workflows. The 5-Base workflows have not been tested with custom noise files.

    Prebuilt

    Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.

    Prebuilt WES/WGS noise files
    Description

    WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FF

    FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FFPE (only hg38)

    WES_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WES FF and FFPE

    DRAGEN Reference Support

    DRAGEN supports the construction of reference hash tables for both human and non-human reference genomes. The reference autodetect feature of DRAGEN recognizes reference hash tables built on the four human reference genomes: hg19 (hg19), GRCh37/hs37d5 (hs37d5), GRCh38/hs38d1 (hg38), and T2T-CHM13v2.0 (chm13).

    DRAGEN supports pangenome reference hash tables which extend the reference genomes with alternative variant paths from a sample cohort used to construct the pangenome reference. A pangenome-based reference improves the mapping accuracy of Illumina reads in the “Difficult-to-Map Regions” of the genome and the downstream variant calling.

    Pre-built human references are available for download on the DRAGEN Software Support Site page.

    The pangenome reference is recommended for germline human analyses. The accuracy achieved with pangenome references is highlighted in the plot below.

    In the following tables we summarize the reference support for each DRAGEN component and the recommended reference type for each component.

    Germline

    Recommended reference type per component

    Component
    Human
    Non-Human

    Reference support

    Component
    Human hg19
    Human hs37d5
    Human hg38
    Human chm13
    Non-Human

    Somatic

    Recommended reference type per component

    Component
    Human
    Non-Human

    Reference support

    Component
    Human hg19
    Human hs37d5
    Human hg38
    Human chm13
    Non-Human

    Methylation

    Recommended reference type per component

    Sample Prep
    Pipeline
    Human
    Non-Human

    Reference support

    Sample Prep
    Pipeline
    Human hg19
    Human hs37d5
    Human hg38
    Human chm13
    Non-Human

    Annotation

    Recommended reference type per component

    Species
    Human
    Non-Human

    Component support

    Component
    Human hg19
    Human hs37d5
    Human hg38
    Human chm13
    Non-Human

    * DRAGEN supports the component execution, however the component's accuracy has not been established.

    By default, DRAGEN errors out if a linear reference is provided when running a component for which a pangenome reference is recommended, as listed in the tables above. If a linear reference is intentionally desired, the error can be suppressed by setting --validate-pangenome-reference=false.
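    A minimal sketch of how a wrapper script might add the suppression flag only when it is needed. The function name ref_options and its arguments are hypothetical; only --ref-dir and --validate-pangenome-reference=false come from this guide. The helper prints options and does not run DRAGEN.

```shell
# Hypothetical helper: print the reference-related options for a run.
# ref_type is "linear" or "pangenome"; pangenome_recommended is "yes" when
# the component being run lists a pangenome reference as recommended.
ref_options() {
  local ref_dir=$1
  local ref_type=$2
  local pangenome_recommended=$3
  echo "--ref-dir $ref_dir"
  if [ "$ref_type" = "linear" ] && [ "$pangenome_recommended" = "yes" ]; then
    # Deliberate use of a linear reference: suppress the validation error.
    echo "--validate-pangenome-reference=false"
  fi
}

ref_options /path/to/linear_hashtable linear yes
```

    Keeping the flag conditional means the default validation stays active whenever a pangenome hash table is supplied.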

    See Prepare a Reference Genome for how to build a custom reference genome.

    5 Base DNA Somatic Tumor-Only Solid WGS UMI

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.

    Notes and additional options

    Hashtable

    For DRAGEN somatic runs it is recommended to use the linear hashtable.

    See:

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using .

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    Fractional (Raw Reads) Downsampling

    DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).

    Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses, it is also recommended to use a normal sample with lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler can be used to reduce coverage on the normal sample.

    Option
    Description

    UMI

    Option
    Description

    For more information see: .

    5-Base Methylation

    Option
    Description

    For more information see: 5-Base Pipeline.

    SNV

    Option
    Description

    For more detail on the small variant caller in somatic mode please refer to Somatic Mode

    CNV

    Option
    Description

    For more information, see .

    Annotation

    For instructions on how to download the Nirvana annotation database, please refer to Nirvana

    Resource Files

    DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.

    SNV Systematic Noise

    Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.

    DRAGEN has prebuilt systematic noise files for WES/WGS. These files should also be used in 5-Base workflows. The 5-Base workflows have not been tested with custom noise files.

    Prebuilt

    Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.

    Prebuilt WES/WGS noise files
    Description

    5 Base DNA Somatic Tumor-Only Solid Panel UMI

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.

    Notes and additional options

    Hashtable

    For DRAGEN somatic runs it is recommended to use the linear hashtable.

    See:

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using .

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    Fractional (Raw Reads) Downsampling

    DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).

    Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses, it is also recommended to use a normal sample with lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler can be used to reduce coverage on the normal sample.

    Option
    Description

    UMI

    Option
    Description

    For more information see: .

    5-Base Methylation

    Option
    Description

    For more information see: 5-Base Pipeline.

    SNV

    Option
    Description

    High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports 3 main settings for improved sensitivity on low VAF variant calls.

    High Sensitivity Option
    Description

    For more detail on the small variant caller in somatic mode please refer to Somatic Mode

    Annotation

    For instructions on how to download the Nirvana annotation database, please refer to Nirvana

    Resource Files

    DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.

    SNV Systematic Noise

    Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.

    DRAGEN has prebuilt systematic noise files for WES/WGS. These files should also be used in 5-Base workflows. The 5-Base workflows have not been tested with custom noise files.

    Prebuilt

    Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.

    Prebuilt WES/WGS noise files
    Description
    --fastq-list $PATH 
    --fastq-list-sample-id $STRING 
    --fastq-file1 $PATH 
    --fastq-file2 $PATH 
    --RGSM $STRING 
    --RGID $STRING 
    --bam-input $PATH 
    --cram-input $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN pangenome hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
    --fastq-list-sample-id $STRING 
    # CNV 
    --enable-cnv true 
    --cnv-target-bed $PATH 
    # Targeted caller (only if using the Illumina CS/PGx Custom Enrichment Research Panel) 
    --targeted-generate-exome-counts true 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN pangenome hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --enable-cnv true 
    --cnv-generate-combined-counts true 
    --cnv-normals-list $CNV_NORMALS_LIST 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN pangenome hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --targeted-pon-counts-list $TARGETED_PON_COUNTS_LIST 
    --tumor-fastq-list $PATH 
    --tumor-fastq-list-sample-id $STRING 
    --tumor-fastq1 $PATH 
    --tumor-fastq2 $PATH 
    --RGSM-tumor $STRING 
    --RGID-tumor $STRING 
    --tumor-bam-input $PATH 
    --tumor-cram-input $PATH 
    --tumor-fastq-list $PATH 
    --tumor-fastq-list-sample-id $STRING 
    --fastq-list $PATH 
    --fastq-list-sample-id $STRING 
    --tumor-fastq1 $PATH 
    --tumor-fastq2 $PATH 
    --RGSM-tumor $STRING 
    --RGID-tumor $STRING 
    --fastq-file1 $PATH 
    --fastq-file2 $PATH 
    --RGSM $STRING 
    --RGID $STRING 
    --tumor-bam-input $PATH 
    --bam-input $PATH 
    --tumor-cram-input $PATH 
    --cram-input $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    # UMI 
    --umi-enable true 
    # 5-Base 
    --methylation-conversion illumina 
    --methylation-generate-cytosine-report true 
    --methylation-compress-cx-report true 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-target-bed $VC_TARGET_BED 
    --vc-systematic-noise $PATH             #Required 
    --vc-enable-umi-liquid true             #>= 0.1% VAF 
    # Annotation 
    --variant-annotation-data $PATH
    --vc-enable-germline-tagging true 
    --tumor-fastq-list $PATH 
    --tumor-fastq-list-sample-id $STRING 
    --tumor-fastq1 $PATH 
    --tumor-fastq2 $PATH 
    --RGSM-tumor $STRING 
    --RGID-tumor $STRING 
    --tumor-bam-input $PATH 
    --tumor-cram-input $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    --enable-duplicate-marking true         #default=true 
    # 5-Base 
    --methylation-conversion illumina 
    --methylation-generate-cytosine-report true 
    --methylation-compress-cx-report true 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-systematic-noise $PATH             #Required 
    --vc-excluded-regions-bed $BED          #FFPE: optionally mask ALUs 
    # CNV 
    --enable-cnv true 
    --cnv-population-b-allele-vcf $POP_VCF 
    --cnv-enable-self-normalization true 
    # Annotation 
    --variant-annotation-data $PATH
    --vc-enable-germline-tagging true 
    --tumor-fastq-list $PATH 
    --tumor-fastq-list-sample-id $STRING 
    --tumor-fastq1 $PATH 
    --tumor-fastq2 $PATH 
    --RGSM-tumor $STRING 
    --RGID-tumor $STRING 
    --tumor-bam-input $PATH 
    --tumor-cram-input $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
    --fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    --enable-duplicate-marking true         #default=true 
    # 5-Base 
    --methylation-conversion illumina 
    --methylation-generate-cytosine-report true 
    --methylation-compress-cx-report true 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-target-bed $VC_TARGET_BED 
    --vc-systematic-noise $PATH             #Recommended 
    --vc-excluded-regions-bed $BED          #FFPE: optionally mask ALUs 
    # Annotation 
    --variant-annotation-data $PATH
    --enable-variant-annotation true 
    --tumor-fastq-list $PATH 
    --tumor-fastq-list-sample-id $STRING 
    --fastq-list $PATH 
    --fastq-list-sample-id $STRING 
    --tumor-fastq1 $PATH 
    --tumor-fastq2 $PATH 
    --RGSM-tumor $STRING 
    --RGID-tumor $STRING 
    --fastq-file1 $PATH 
    --fastq-file2 $PATH 
    --RGSM $STRING 
    --RGID $STRING 
    --tumor-bam-input $PATH 
    --bam-input $PATH 
    --tumor-cram-input $PATH 
    --cram-input $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    # UMI 
    --umi-enable true 
    --umi-min-supporting-reads 1            #Default=2 
    # 5-Base 
    --methylation-conversion illumina 
    --methylation-generate-cytosine-report true 
    --methylation-compress-cx-report true 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-systematic-noise $PATH             #Required 
    --vc-excluded-regions-bed $BED          #FFPE: optionally mask ALUs 
    # CNV 
    --enable-cnv true 
    --cnv-population-b-allele-vcf $POP_VCF 
    --cnv-enable-self-normalization true 
    # Annotation 
    --variant-annotation-data $PATH
    --vc-enable-germline-tagging true 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    # UMI 
    --umi-enable true 
    --umi-min-supporting-reads 1            #Default=2 
    # 5-Base 
    --methylation-conversion illumina 
    --methylation-generate-cytosine-report true 
    --methylation-compress-cx-report true 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-target-bed $VC_TARGET_BED 
    --vc-systematic-noise $PATH             #Required 
    --vc-excluded-regions-bed $BED          #FFPE: optionally mask ALUs 
    --vc-enable-umi-solid true              #>= 1% VAF 
    # Annotation 
    --variant-annotation-data $PATH
    --vc-enable-germline-tagging true 

    --methylation-compress-cx-report true

    Set to true to enable compression of the CX_report (default=true).

    --methylation-keep-ref-cytosine true

    Set to true to keep all reference cytosines in the CX_report file, even if they don't appear in the input reads (default=false).

    --enable-cpg-methylated-mapping true

    Enable methylated mapping with base conversions restricted to CpG context (default=true). When false, runs DRAGEN Methylation 3-base map/align instead.

    --methylation-report-to-vcf

    Specify methylation type (none, cg, or c) which is reported in VCF files (default=c).

    --methylation-report-to-gvcf

    Specify methylation type (none, cg, or c) which is reported in gVCF files (default=cg).

    --vc-sq-filter-threshold $NUM

    Threshold for the sensitivity-specificity tradeoff using the SQ score. The pipeline-specific default threshold is 3. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold $INT

    Threshold for the sensitivity-specificity tradeoff using the AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 60. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold-in-hotspot $INT

    Threshold for the sensitivity-specificity tradeoff using the AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 20. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-excluded-regions-bed $BED

    Hard filter variants that overlap with these regions. ALU regions comprise approximately 11% of the genome and are often exceptionally noisy in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU BED files can be downloaded as part of the Bed File Collection.

    --methylation-compress-cx-report true

    Set to true to enable compression of the CX_report (default=true).

    --methylation-keep-ref-cytosine true

    Set to true to keep all reference cytosines in the CX_report file, even if they don't appear in the input reads (default=false).

    --enable-cpg-methylated-mapping true

    Enable methylated mapping with base conversions restricted to CpG context (default=true). When false, runs DRAGEN Methylation 3-base map/align instead.

    --methylation-report-to-vcf

    Specify methylation type (none, cg, or c) which is reported in VCF files (default=c).

    --methylation-report-to-gvcf

    Specify methylation type (none, cg, or c) which is reported in gVCF files (default=cg).

    --vc-sq-filter-threshold $NUM

    Threshold for the sensitivity-specificity tradeoff using the SQ score. The pipeline-specific default threshold is 17.5. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold $INT

    Threshold for the sensitivity-specificity tradeoff using the AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold-in-hotspot $INT

    Threshold for the sensitivity-specificity tradeoff using the AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-excluded-regions-bed $BED

    Hard filter variants that overlap with these regions. ALU regions comprise approximately 11% of the genome and are often exceptionally noisy in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU BED files can be downloaded as part of the Bed File Collection.

    Merge Duplex UMIs
    Bed File Collection
    Bed File Collection

    --umi-end-mask-length INT

    Number of additional bases to ignore from the end of the read. The default is 0. To reduce false positives, optionally set to 3.

    --methylation-compress-cx-report true

    Set to true to enable compression of the CX_report (default=true).

    --methylation-keep-ref-cytosine true

    Set to true to keep all reference cytosines in the CX_report file, even if they don't appear in the input reads (default=false).

    --enable-cpg-methylated-mapping true

    Enable methylated mapping with base conversions restricted to CpG context (default=true). When false, runs DRAGEN Methylation 3-base map/align instead.

    --methylation-report-to-vcf

    Specify methylation type (none, cg, or c) which is reported in VCF files (default=c).

    --methylation-report-to-gvcf

    Specify methylation type (none, cg, or c) which is reported in gVCF files (default=cg).

    --vc-sq-filter-threshold $NUM

    Threshold for the sensitivity-specificity tradeoff using the SQ score. The pipeline-specific default threshold is 2. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold $INT

    Threshold for the sensitivity-specificity tradeoff using the AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 60. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold-in-hotspot $INT

    Threshold for the sensitivity-specificity tradeoff using the AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 20. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-excluded-regions-bed $BED

    Hard filter variants that overlap with these regions. ALU regions comprise approximately 11% of the genome and are often exceptionally noisy in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU BED files can be downloaded as part of the Bed File Collection.

    Merge Duplex UMIs

    Targeted Callers

    Pangenome

    Linear

    RNA

    Linear

    Linear

    De Novo

    Pangenome

    Linear

    Joint Genotyping

    Pangenome

    Linear

    Biomarkers (HLA)

    Pangenome

    Linear

    gVCF genotyper

    Pangenome

    Linear

    Yes*

    Yes

    SV

    Yes

    Yes

    Yes

    Yes*

    Yes

    Expansion Hunter

    Yes

    Yes

    Yes

    No

    No

    Targeted Callers

    Yes

    Yes

    Yes

    No

    No

    RNA

    Yes

    Yes

    Yes

    Yes*

    Yes

    De Novo

    Yes

    Yes

    Yes

    Yes*

    Yes

    Joint Genotyping

    Yes

    Yes

    Yes

    Yes*

    Yes

    Biomarkers (HLA)

    Yes

    Yes

    Yes

    Yes*

    No

    gVCF genotyper

    Yes

    Yes

    Yes

    Yes*

    Yes

    Yes*

    No

    CNV

    Yes

    Yes

    Yes

    Yes*

    No

    SV

    Yes

    Yes

    Yes

    Yes*

    No

    TruSeq Methyl Capture

    Methylation

    Linear

    Linear

    Yes

    Yes

    Yes

    No

    No

    TruSeq DNA Methyl

    Methylation

    Yes

    Yes

    Yes

    No

    No

    TruSeq Methyl Capture

    Methylation

    Yes

    Yes

    Yes

    No

    No

    SNV

    Pangenome

    Linear

    CNV

    Pangenome

    Linear

    SV

    Pangenome

    Linear

    Expansion Hunter

    Pangenome

    Linear

    SNV

    Yes

    Yes

    Yes

    Yes

    Yes

    CNV

    Yes

    Yes

    SNV

    Linear

    Linear

    UMI SNV

    Linear

    Linear

    CNV

    Linear

    Linear

    SV

    Linear

    Linear

    SNV

    Yes

    Yes

    Yes

    Yes*

    No

    UMI SNV

    Yes

    Yes

    5-base

    Germline

    Pangenome

    Linear

    5-base

    Somatic

    Linear

    Linear

    TruSeq DNA Methyl

    Methylation

    Linear

    5-base Germline

    Germline

    Yes

    Yes

    Yes

    No

    No

    5-base Somatic

    Nirvana

    Pangenome

    Linear

    Nirvana

    Yes

    Yes

    Yes

    No

    Yes

    Prepare a Reference Genome

    Yes

    Yes

    Linear

    Somatic

    --umi-end-mask-length INT

    Number of additional bases to ignore from the end of the read. The default is 0. To reduce false positives, optionally set to 3.

    --methylation-compress-cx-report true

    Set to true to enable compression of the CX_report (default=true).

    --methylation-keep-ref-cytosine true

    Set to true to keep all reference cytosines in the CX_report file, even if they don't appear in the input reads (default=false).

    --enable-cpg-methylated-mapping true

    Enable methylated mapping with base conversions restricted to CpG context (default=true). When false, runs DRAGEN Methylation 3-base map/align instead.

    --methylation-report-to-vcf

    Specify methylation type (none, cg, or c) which is reported in VCF files (default=c).

    --methylation-report-to-gvcf

    Specify methylation type (none, cg, or c) which is reported in gVCF files (default=cg).

    --vc-sq-filter-threshold $NUM

    Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 4. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold $INT

    Threshold for sensitivity-specificty tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 60. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold-in-hotspot $INT

    Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 20. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-excluded-regions-bed $BED

    Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome, and are often exceptionally noisy regions in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU bed files can be downloaded as part of the Bed File Collection:

    --enable-map-align true

    Optionally disable map & align (default=true).

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    --enable-fractional-down-sampler

    Set to true to enable fractional downsampling. The default value is false.

    --down-sampler-normal-subsample

    Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).

    --down-sampler-tumor-subsample

    Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).

    --down-sampler-random-seed

    Specify the random seed for different runs of the same input data. The default value is 42.

    --umi-nonrandom-whitelist $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.

    --umi-correction-table $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.

    --umi-min-supporting-reads INT

    Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).

    --umi-metrics-interval-file $BED

    Target region in BED format.

    --umi-emit-multiplicity both

    Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see Merge Duplex UMIs.

    --umi-start-mask-length INT

    Number of additional bases to ignore from start of read. The default is 0. To reduce false positives, optionally set to 1.

    --methylation-conversion STRING

    Library conversion for methylation analysis. Options: none, c_t, mc_t, illumina (default=none).

    --methylation-protocol STRING

    Library protocol for methylation analysis. Options: none, directional, non-directional, directional-complement, pbat. The default value for methylation-conversion=illumina is directional, otherwise it is none.

    --methylation-mapq-threshold INT

    Only reads with MAPQ greater than or equal to the threshold will be included in methyl-seq analysis (default=0).

    --methylation-generate-mbias-report true

    Whether to generate a per-sequencer-cycle methylation bias report (default=true).

    --mbias-report-include-overlaps

    Calculate methylation stats for overlapping bases between mates (default=false).

    --methylation-generate-cytosine-report true

    Whether to generate a genome-wide cytosine methylation CX_report file (default=false).

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    --vc-systematic-noise $PATH

    Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.).

    --vc-somatic-hotspots $PATH

    DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.

    --vc-enable-liquid-tumor-mode true

    Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.

    --vc-override-tumor-pcr-params-with-normal false

    Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.

    --cnv-enable-gcbias-correction true

    Enable or disable GC bias correction when generating target counts.

    --cnv-segmentation-mode $SEG_MODE

    Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.

    WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FF

    FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FFPE (only hg38)

    WES_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WES FF and FFPE

    Product Files
    BCL conversion
    UMI Options
    5-Base Pipeline
    Somatic Mode
    CNV Calling
    Nirvana
    Product Files

    --umi-end-mask-length INT

    Number of additional bases to ignore from end of read. The default is 0. To reduce false positives, optionally set to 3.

    --methylation-compress-cx-report true

    Set to true to enable compression of the CX_report (default=true).

    --methylation-keep-ref-cytosine true

    Set to true to keep all reference cytosines in the CX_report file, even if they don't appear in the input reads (default=false).

    --enable-cpg-methylated-mapping true

    Enable methylated mapping with base conversions restricted to CpG context (default=true). When false, runs DRAGEN Methylation 3-base map/align instead.

    --methylation-report-to-vcf

    Specify methylation type (none, cg, or c) which is reported in VCF files (default=c).

    --methylation-report-to-gvcf

    Specify methylation type (none, cg, or c) which is reported in gVCF files (default=cg).

    --vc-sq-filter-threshold $NUM

    Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 4. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold $INT

    Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 60. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold-in-hotspot $INT

    Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 20. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-excluded-regions-bed $BED

    Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome, and are often exceptionally noisy regions in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU bed files can be downloaded as part of the Bed File Collection:

    --enable-map-align true

    Optionally disable map & align (default=true).

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    --enable-fractional-down-sampler

    Set to true to enable fractional downsampling. The default value is false.

    --down-sampler-normal-subsample

    Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).

    --down-sampler-tumor-subsample

    Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).

    --down-sampler-random-seed

    Specify the random seed for different runs of the same input data. The default value is 42.

    --umi-nonrandom-whitelist $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.

    --umi-correction-table $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.

    --umi-min-supporting-reads INT

    Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).

    --umi-metrics-interval-file $BED

    Target region in BED format.

    --umi-emit-multiplicity both

    Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see Merge Duplex UMIs.

    --umi-start-mask-length INT

    Number of additional bases to ignore from start of read. The default is 0. To reduce false positives, optionally set to 1.

    --methylation-conversion STRING

    Library conversion for methylation analysis. Options: none, c_t, mc_t, illumina (default=none).

    --methylation-protocol STRING

    Library protocol for methylation analysis. Options: none, directional, non-directional, directional-complement, pbat. The default value for methylation-conversion=illumina is directional, otherwise it is none.

    --methylation-mapq-threshold INT

    Only reads with MAPQ greater than or equal to the threshold will be included in methyl-seq analysis (default=0).

    --methylation-generate-mbias-report true

    Whether to generate a per-sequencer-cycle methylation bias report (default=true).

    --mbias-report-include-overlaps

    Calculate methylation stats for overlapping bases between mates (default=false).

    --methylation-generate-cytosine-report true

    Whether to generate a genome-wide cytosine methylation CX_report file (default=false).

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    --vc-systematic-noise $PATH

    Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.). When working with panels it is recommended that a custom systematic noise file be created for each assay.

    --vc-somatic-hotspots $PATH

    DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.

    --vc-enable-liquid-tumor-mode true

    Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.

    --vc-override-tumor-pcr-params-with-normal false

    Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.

    --vc-target-vaf FLOAT

    The default is 0.03 (3%). Set to e.g. 0.01 to improve SNV sensitivity on 1% VAF variants (assuming sufficient coverage).

    --vc-enable-umi-solid true

    Optimized for 1% and higher VAFs on UMI (or read position collapsed) samples with approx 300-1000X coverage.

    --vc-enable-umi-liquid true

    Optimized for 0.1% and higher VAFs on UMI samples with 1000X or higher coverage as expected in liquid biopsies.

    WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FF

    FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FFPE (only hg38)

    WES_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WES FF and FFPE

    Product Files
    BCL conversion
    UMI Options
    5-Base Pipeline
    Somatic Mode
    Nirvana
    Product Files

    DNA Amplicon

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for DNA amplicon samples.

    DRAGEN Standard DNA Amplicon settings

    --tumor-fastq-list $PATH 
    --tumor-fastq-list-sample-id $STRING 
    --tumor-fastq1 $PATH 
    --tumor-fastq2 $PATH 
    --RGSM-tumor $STRING 
    --RGID-tumor $STRING 
    --tumor-bam-input $PATH 
    --tumor-cram-input $PATH 
    Bed File Collection
    Bed File Collection
    DRAGEN Amplicon Pillar Panel Specific Settings

    To support the varied designs of amplicon panels and the specific requirements of different analysis types (e.g., SNV, CNV, SV, MSI, RNA fusion, RNA splice variants, and RNA 3'/5' imbalance ratio), panel-specific parameter settings have been integrated into the command-line options. Each supported Pillar panel has a dedicated option, and the details for these DNA panels are listed in the table below:

    Panel Name

    Short Name

    Panel Code

    Sample Type

    Default variant caller enabled

    Command Line Options

    oncoReveal BRCA1 & BRCA2 plus CNV

    BRCA CNV

    BR283

    DNA

    SNV, CNV

    --amplicon-enable-dna-brca

    oncoReveal Lymphoid

    Lymphoid

    P-LYM-01

    DNA

    For more detail on the amplicon pipeline, please refer to DRAGEN Amplicon Pipeline

    Notes and additional options

    Hashtable

    For DRAGEN amplicon runs it is recommended to use the linear hashtable.

    See: Product Files

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    --enable-map-align true

    Enables map-align.

    --enable-map-align-output true

    Optionally save the output BAM (Default=false).

    Amplicon post-alignment processing

    Option
    Description

    --amplicon-primer-length INT

    If an alignment starts inside the primer region of the amplicon target, the alignment is assigned to the amplicon.

    --amplicon-allow-partial-target true

    To detect deletion events that are close to the target boundaries, only one of the reads is required to start in the primer region (Default=true).

    For more detail on the amplicon post-alignment processing, please refer to DRAGEN Amplicon Pipeline

    Duplicate Marking

    Option
    Description

    --enable-duplicate-marking false

    The Amplicon Pipeline disables duplicate marking. In amplicon assays, fragments originate from a limited number of unique start and end positions, making conventional duplicate detection inappropriate. (Default=false)

    SNV

    Option
    Description

    --vc-target-bed

    Limit variant calling to region of interest. Default is amplicon target bed.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2).

    --vc-systematic-noise $PATH

    Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.).

    --vc-target-vaf FLOAT

    The default is 0.03 (3%). For ctDNA, the default is 0.001 (0.1%).

    --vc-af-call-threshold FLOAT

    If the AF filter is enabled using --vc-enable-af-filter=true, the option sets the allele frequency call threshold for nuclear chromosomes to emit a call in the VCF. The default value is 0.01. For ctDNA, the default is 0.001.

    --vc-af-filter-threshold FLOAT

    If the AF filter is enabled using --vc-enable-af-filter=true, the option sets the allele frequency filter threshold for nuclear chromosomes to mark emitted VCF calls as filtered. The default value is 0.05. For ctDNA, the default is 0.003.
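    The interplay of the two AF thresholds can be sketched as follows. This is an illustration only, not DRAGEN code: the function and class names are hypothetical, the defaults are the values quoted in the table above, and the boundary handling (strict "less than") is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AfThresholds:
    """Hypothetical container for the two AF thresholds described above."""
    call_threshold: float = 0.01    # --vc-af-call-threshold default
    filter_threshold: float = 0.05  # --vc-af-filter-threshold default


# ctDNA defaults quoted in the option descriptions.
CTDNA = AfThresholds(call_threshold=0.001, filter_threshold=0.003)


def classify(af: float, thresholds: AfThresholds = AfThresholds()) -> str:
    """Return how a variant with allele frequency `af` would be treated.

    Assumption: a value exactly on a threshold is treated as meeting it.
    """
    if af < thresholds.call_threshold:
        return "not emitted"
    if af < thresholds.filter_threshold:
        return "emitted, filtered"
    return "emitted, not AF-filtered"


if __name__ == "__main__":
    for af in (0.005, 0.02, 0.10):
        print(af, classify(af))
    print(0.002, classify(0.002, CTDNA))
```

    A call that passes the AF filter can still be removed by other filters, such as the systematic noise filter.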

    For more detail on the small variant caller in somatic mode please refer to Somatic Mode

    CNV

    Option
    Description

    --cnv-segmentation-mode $SEG_MODE

    Option to override the default segmentation algorithm. By default, bed is used for standard panels and hslm for Pillar panels with a pre-built PON.

    --amplicon-cnv-use-default-pon false

    We recommend including in-run normal samples, matched in sample type and library preparation, in the same sequencing run to serve as the PON. If generating a custom PON is not feasible, the pre-packaged panel-specific PON can be used as a fallback for Pillar panels. To enable this, set the option to true.

    --cnv-segmentation-bed $PATH

    You can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed. If bed segmentation mode is used, the segmentation bed is auto-generated from the amplicon target bed by default.

    For more information, see CNV Calling.

    Annotation

    For instructions on how to download the Nirvana annotation database, please refer to Nirvana

    MSI

    Option
    Description and recommended setting

    --msi-coverage-threshold INT

    Minimum coverage for a microsatellite: 500 (default)

    --msi-distance-threshold FLOAT

    Minimum Jensen-Shannon distance between tumor and PON for a microsatellite: 0.3 (default)
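    To make the distance threshold concrete, here is a minimal sketch of a Jensen-Shannon distance between two repeat-length histograms. It assumes the distance is the square root of the base-2 JS divergence (so values fall in [0, 1]) and that a site is compared only when its coverage reaches the coverage threshold. Both points, as well as the function names and histogram representation, are assumptions made for illustration and may not match the DRAGEN implementation.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance between two histograms of equal length."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("histograms must contain counts")
    # Normalize counts to probabilities and build the mixture distribution.
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence in bits; empty bins contribute zero.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))


def site_is_unstable(tumor_hist, pon_hist, coverage,
                     coverage_threshold=500, distance_threshold=0.3):
    """Return None if the site lacks coverage, else whether it looks unstable."""
    if coverage < coverage_threshold:
        return None  # site not assessed
    return js_distance(tumor_hist, pon_hist) >= distance_threshold


if __name__ == "__main__":
    print(site_is_unstable([10, 480, 10], [12, 476, 12], coverage=500))
    print(site_is_unstable([200, 250, 50], [12, 476, 12], coverage=500))
```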

    SV

    Option
    Description

    --sv-call-regions-bed

    Specifies a BED file containing the set of regions to call. Defaults to the amplicon target bed.

    --enable-variant-deduplication true

    Relevant when both SV and SNV callers are enabled in amplicon workflows. Can increase sensitivity and prevent replicated variants within genes such as FLT3 and KMT2A. Filters out all small indels in the structural variant VCF that also appear as passing calls in the small variant VCF. DRAGEN creates a new VCF containing the variants in the SV VCF that do not match any variant in the SNV VCF file. The new deduplicated SV VCF file has the prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.

    --sv-systematic-noise $BEDPE

    Optional systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed).
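    The trimming and left shifting mentioned for --enable-variant-deduplication can be illustrated with the standard indel normalization procedure. The sketch below is a simplified, single-contig version with 0-based positions. It is not DRAGEN's implementation, and the function names and the tuple representation of a variant are hypothetical.

```python
def normalize(ref_seq, pos, ref, alt, max_shift=500):
    """Trim and left-shift one variant; `pos` is a 0-based index into ref_seq."""
    if not ref or not alt or ref == alt:
        raise ValueError("ref and alt must be non-empty and different")
    start = pos
    # Right-trim shared bases, borrowing a base from the left when an allele
    # would otherwise become empty. Each borrowed base shifts the variant left.
    while ref[-1] == alt[-1]:
        if min(len(ref), len(alt)) == 1:
            if pos == 0 or start - pos >= max_shift:
                break  # cannot shift any further
            pos -= 1
            base = ref_seq[pos]
            ref, alt = base + ref, base + alt
        ref, alt = ref[:-1], alt[:-1]
    # Left-trim shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def deduplicate(sv_indels, passing_small_variants, ref_seq):
    """Keep SV indels whose normalized form is absent from the small variant set."""
    seen = {normalize(ref_seq, *v) for v in passing_small_variants}
    return [v for v in sv_indels if normalize(ref_seq, *v) not in seen]


if __name__ == "__main__":
    reference = "GGCACACT"
    # The same 2-base deletion written at two different positions in a repeat.
    print(normalize(reference, 4, "CAC", "C"))
    print(deduplicate([(4, "CAC", "C")], [(1, "GCA", "G")], reference))
```

    Both representations in the example normalize to the same leftmost form, so the SV record is treated as a duplicate of the small variant call.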

    For more information, see Structural Variant Calling.

    SNV Systematic Noise

    Systematic noise files are considered essential in Tumor-Only workflows.

    DRAGEN has pre-built systematic noise files for Pillar panels. To achieve high sensitivity, we recommend generating a custom systematic noise file as described in the Custom section.

    Custom

    This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-50 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-50 normal samples.

    Gather the full paths to the small variant hard-filtered VCFs (not gVCFs) from step 1 and create a list file ${VCF_LIST} with one path per line.
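    One way to assemble ${VCF_LIST} is sketched below. It assumes each normal sample's output directory contains a file named <prefix>.hard-filtered.vcf.gz, as in the output listings elsewhere in this guide. The function name and directory layout are hypothetical.

```python
from pathlib import Path


def write_vcf_list(run_dirs, list_path):
    """Write one absolute VCF path per line and return the paths written."""
    vcfs = []
    for run_dir in run_dirs:
        # The pattern matches *.hard-filtered.vcf.gz only. A file ending in
        # .hard-filtered.gvcf.gz does not match, so gVCFs are skipped.
        vcfs.extend(sorted(Path(run_dir).resolve().glob("*.hard-filtered.vcf.gz")))
    if not vcfs:
        raise FileNotFoundError("no hard-filtered VCFs found in the given directories")
    Path(list_path).write_text("".join(f"{path}\n" for path in vcfs))
    return vcfs
```

    The resulting text file can then be supplied wherever step 2 expects ${VCF_LIST}.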

    Step 2. Generate the final noise file.

    This step generates a bed file containing mean and max noise estimates per position. This can be used directly during variant calling (argument --vc-systematic-noise).

    The SNV systematic noise files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.

    SV Systematic Noise

    SV systematic noise files are considered experimental in amplicon.

    Custom

    Custom systematic noise files can be generated for amplicon panels. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-50 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.

    Step 2. Build the BEDPE file using input VCFs from previous step.

    Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.

    CNV Panel of Normals (PON)

    For CNV PON requirements and generation options see CNV Preprocessing | Panel of Normals.

    If a matched normal is available it is recommended to include it in the PON.

    Step 1. Generate CNV target counts of individual normal samples.

    Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation should be matched when processing the case samples.

    Step 2. CNV combined counts file generation.

    $CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1. The file name is <output-file-prefix>.target.counts.gz, because cnv-enable-gcbias-correction defaults to false in amplicon. Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.
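    A small helper for producing $CNV_NORMALS_LIST might look like the following. It assumes all step 1 outputs were written to one directory and uses the .target.counts.gz suffix named above. The function name and the exclude parameter (a set of output prefixes to leave out of the PON) are hypothetical.

```python
from pathlib import Path

COUNTS_SUFFIX = ".target.counts.gz"


def write_cnv_normals_list(counts_dir, list_path, exclude=()):
    """List target counts files, one absolute path per line, skipping excluded prefixes."""
    excluded = set(exclude)
    kept = []
    for path in sorted(Path(counts_dir).resolve().glob(f"*{COUNTS_SUFFIX}")):
        # Recover the output prefix by stripping the known suffix.
        prefix = path.name[: -len(COUNTS_SUFFIX)]
        if prefix not in excluded:
            kept.append(path)
    if not kept:
        raise ValueError("no target counts files left for the panel of normals")
    Path(list_path).write_text("".join(f"{path}\n" for path in kept))
    return kept
```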

    DNA Somatic Tumor-Normal Solid WGS

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.

    Notes and additional options

    Hashtable

    For DRAGEN somatic runs it is recommended to use the linear hashtable.

    See: Product Files

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    Duplicate Marking

    Option
    Description

    Fractional (Raw Reads) Downsampling

    DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).

    Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses, it is also recommended to use a normal sample with lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler can be used to reduce coverage on the normal sample.

    Option
    Description
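    As a rough sketch of the coverage advice above, the helper below derives a normal-sample fraction from estimated mean coverages and returns the matching downsampler options. The option names are the ones listed in this guide. The function names, the target_ratio parameter, and the idea of deriving the fraction from mean coverage are assumptions made for illustration.

```python
def normal_subsample_fraction(tumor_cov, normal_cov, target_ratio=1.0):
    """Fraction of normal reads to keep so normal coverage <= target_ratio * tumor coverage."""
    if tumor_cov <= 0 or normal_cov <= 0 or target_ratio <= 0:
        raise ValueError("coverages and target_ratio must be positive")
    return min(1.0, target_ratio * tumor_cov / normal_cov)


def downsampler_args(tumor_cov, normal_cov, target_ratio=1.0, seed=42):
    """Return command-line arguments, or an empty list if no downsampling is needed."""
    fraction = normal_subsample_fraction(tumor_cov, normal_cov, target_ratio)
    if fraction >= 1.0:
        return []
    return [
        "--enable-fractional-down-sampler", "true",
        "--down-sampler-normal-subsample", f"{fraction:.3f}",
        "--down-sampler-random-seed", str(seed),
    ]


if __name__ == "__main__":
    # A 120x normal paired with a 60x tumor keeps half of the normal reads.
    print(" ".join(downsampler_args(tumor_cov=60, normal_cov=120)))
```

    Setting target_ratio below 1.0 keeps the normal strictly shallower than the tumor. The seed default mirrors the documented default of 42.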

    SNV

    Option
    Description

    For more detail on the small variant caller in somatic mode please refer to Somatic Mode

    HLA

    Option
    Description

    CNV

    Option
    Description

    For more information, see CNV Calling.

    Annotation

    For instructions on how to download the Nirvana annotation database, please refer to Nirvana

    TMB

    Option
    Description

    See the user guide: .

    MSI

    Microsatellite sites file can be downloaded here: .

    Option
    Description and recommended setting

    SV

    Option
    Description
    Option
    Recommended Value for Liquid Tumors (e.g. AML/MLL)

    For more information, see Structural Variant Calling.

    Resource Files

    DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.

    SNV Systematic Noise

    Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.

    DRAGEN has prebuilt systematic noise files for WES/WGS. For high sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.

    Prebuilt

    Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: .

    Prebuilt WES/WGS noise files
    Description

    Custom

    This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.

    For WES and WGS pipelines, gather the full paths to the small variant hard-filtered VCFs (not gVCFs) from step 1 and create a list file ${VCF_LIST} with one path per line.

    Step 2. Generate the final noise file.

    This step generates a bed file containing mean and max noise estimates per position. This can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions that could be troubleshot (e.g. by modifying assay settings or DRAGEN settings) or blocklisted.

    The SNV systematic noise files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.

    SV Systematic Noise

    SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available. SV systematic noise files are considered experimental in WES and Panels.

    Prebuilt

    Prebuilt WGS SV systematic noise files can be downloaded here: .

    Prebuilt WGS noise files
    Description

    Custom

    Custom systematic noise files can be generated for WES or Panels. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-100 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.

    Step 2. Build the BEDPE file using input VCFs from previous step.

    Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.

    Analysis Output

    When the analysis run completes, the software generates an analysis output in a specified location with the folder name /staging/DRAGEN_Solid_WGS_Tumor_Normal_Pipeline_{version}_Analysis_{datetimestamp}. In ICA, analysis output is listed in the Output section of the analysis, where the folder name is a combination of user reference, pipeline name, and a UUID.

    Within the analysis folder, each analysis step generates a subfolder within the Logs_Intermediates folder.

    Output Folders

    📂 Results - Contains the final result files from the pipeline.

    📄 MetricsOutput.tsv - Contains summary metrics for all samples.

    📂 Case1

    📄 Case1_MetricsOutput.tsv - Contains summary metrics for tumor and normal samples for Case1.

    📂 TumorSample1

    📄 TumorSample1.hard-filtered.vcf.gz - Contains somatic small variant calls.

    📄 TumorSample1.cnv.vcf.gz - Contains somatic copy number variant calls.

    📄 TumorSample1.sv.vcf.gz - Contains somatic structural variant calls.

    📄 TumorSample1_SNV_Tumor_Annotated.json.gz - Contains somatic small variant annotations.

    📄 TumorSample1_CNV_Tumor_Annotated.json.gz - Contains somatic copy number variant annotations.

    📄 TumorSample1_SV_Tumor_Annotated.json.gz - Contains somatic structural variant annotations.

    📄 TumorSample1.tmb.metrics.csv - Contains the TMB result and metrics.

    📄 TumorSample1.microsat_output.json - Contains the MSI result and metrics.

    📄 TumorSample1.hrdscore.csv - Contains the HRD result and metrics.

    📄 TumorSample1.tn.bw - Contains tangent normalized somatic coverage in BigWig format.

    📄 TumorSample1.tumor.baf.bedgraph.gz - Contains somatic b-allele frequency in BedGraph format.

    📄 TumorSample1.bam - Contains aligned somatic reads in BAM format.

    📄 TumorSample1.bam.bai - Contains index of aligned somatic reads in BAI format.

    📂 NormalSample1

    📄 NormalSample1.hard-filtered.vcf.gz - Contains germline small variant calls.

    📄 NormalSample1.cnv.vcf.gz - Contains germline copy number variant calls.

    📄 NormalSample1.sv.vcf.gz - Contains germline structural variant calls.

    📄 NormalSample1.repeats.vcf.gz - Contains germline short tandem repeat calls.

    📄 NormalSample1.vntr.vcf.gz - Contains germline variable number tandem repeat calls.

    📄 NormalSample1.targeted.vcf.gz - Contains germline targeted (star allele) calls.

    📄 NormalSample1.targeted.json - Contains germline targeted (star allele) data in JSON format.

    📄 NormalSample1_SNV_Normal_Annotated.json.gz - Contains germline small variant annotations.

    📄 NormalSample1_CNV_Normal_Annotated.json.gz - Contains germline copy number variant annotations.

    📄 NormalSample1_SV_Normal_Annotated.json.gz - Contains germline structural variant annotations.

    📄 NormalSample1.hla.tsv - Contains germline HLA typing calls.

    📄 NormalSample1.bam - Contains aligned germline reads in BAM format.

    📄 NormalSample1.bam.bai - Contains index of aligned germline reads in BAI format.

    📂 Logs_Intermediates - Contains all intermediate files for each step of the pipeline (BAMs moved to the Results folder).

    📂 ResourceVerification

    📂 SampleSheetValidation

    📂 NormalFastqValidation

    📂 TumorFastqValidation

    📂 DragenCaller

    📂 TumorNormalVariantCaller

    📂 Tmb

    📂 Annotation

    📂 SampleAnalysisResults

    📂 AdditionalSarjMetrics

    📂 MetricsOutput

    📂 Work - (DRAGEN server only) Contains information and files related to Nextflow execution.

    File Overview

    This section describes the summary output files generated during analysis.

    Metrics Output

    The metrics output file is a final combined metrics report that provides sample status, key analysis metrics, and metadata in a tab-separated values (TSV) file. Sample metrics within the report indicate guideline-suggested lower specification limits (LSL) and upper specification limits (USL) for each sample in the run. One metrics output file is generated for the entire run. An additional file is generated for each case.

    Normal DNA Input QC Metrics

    Normal DNA Dedup/UMI QC Metrics

    Normal DNA Coverage QC Metrics

    Tumor DNA Input QC Metrics

    Tumor DNA Dedup/UMI QC Metrics

    Tumor DNA Coverage QC Metrics

    Tumor DNA T/N Sample Match QC Metrics

    Tumor DNA Purity QC Metrics

    DNA Somatic Tumor-Only Heme WGS

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.

    Notes and additional options

    Hashtable

    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
    --fastq-list-sample-id $STRING 
    # DNA amplicon 
    --enable-dna-amplicon true 
    --amplicon-target-bed $PATH 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-target-bed $VC_TARGET_BED 
    --vc-systematic-noise $PATH             #optional for SNV systematic noise
    --vc-target-vaf 0.03                    #Default = 0.03 (>= 3% VAF) except for ctDNA (which is 0.1%)
    # SV 
    --enable-sv true 
    --sv-systematic-noise $PATH             #optional for SV systematic noise
    # CNV 
    --enable-cnv true 
    --cnv-combined-counts $PATH             #CNV PON 
    # Annotation 
    --enable-variant-annotation true 
    --variant-annotation-data $PATH 
    # Microsatellite Instability (MSI)      #optional for panels that support MSI
    --amplicon-enable-msi=true
    --msi-microsatellites-file $PATH        #MSI site file
    --msi-ref-normal-input $PATH            #MSI PON file
    # FQ list input
    --tumor-fastq-list $PATH 
    --tumor-fastq-list-sample-id $STRING 
    --fastq-list $PATH 
    --fastq-list-sample-id $STRING 
    # FQ input
    --tumor-fastq1 $PATH 
    --tumor-fastq2 $PATH 
    --RGSM-tumor $STRING 
    --RGID-tumor $STRING 
    --fastq-file1 $PATH 
    --fastq-file2 $PATH 
    --RGSM $STRING 
    --RGID $STRING 
    # BAM input
    --tumor-bam-input $PATH 
    --bam-input $PATH 
    # CRAM input
    --tumor-cram-input $PATH 
    --cram-input $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --vc-detect-systematic-noise=true 
    --enable-dna-amplicon true 
    --amplicon-target-bed $PATH 
    --vc-enable-germline-tagging=true 
    --variant-annotation-data $PATH 
    --intermediate-results-dir $PATH 
    --output-directory $PATH 
    --output-file-prefix $STRING 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --build-sys-noise-vcfs-list ${VCF_LIST} 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --enable-dna-amplicon true 
    --amplicon-target-bed $PATH 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --sv-detect-systematic-noise true 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --sv-build-systematic-noise-vcfs-list $VCF_LIST   #one VCF per line 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --enable-dna-amplicon true 
    --amplicon-target-bed $PATH 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    # CNV 
    --enable-cnv true 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --enable-cnv true 
    --cnv-generate-combined-counts true 
    --cnv-normals-list $CNV_NORMALS_LIST 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
    --fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    --enable-duplicate-marking true         #default=true 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-systematic-noise $PATH             #Recommended 
    --vc-excluded-regions-bed $BED          #FFPE: optionally mask ALUs 
    --vc-target-vaf 0.03                    #Default = 0.03 (>= 3% VAF) 
    # SV 
    --enable-sv true 
    --sv-systematic-noise $PATH             #Optional 
    # CNV 
    --enable-cnv true 
    --cnv-use-somatic-vc-baf true 
    --cnv-enable-self-normalization true 
    # HRD Scoring 
    --enable-hrd true                       #requires CNV 
    # Annotation 
    --variant-annotation-data $PATH 
    --enable-variant-annotation true 
    # TMB 
    --enable-tmb true 
    # HLA genotyper 
    --enable-hla true 
    # Microsatellite Instability (MSI) 
    --msi-command tumor-normal 
    --msi-microsatellites-file $PATH 
    --msi-coverage-threshold 40 
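The recipe listings above put one option per line with trailing comments and no line continuations, so they cannot be pasted into a shell unchanged. One possible way to turn such a listing into a single command line is sketched below: comments and blank lines are stripped, and the remaining lines are joined. The paths and prefix are placeholders, only a few options are shown, and the command is printed rather than executed.

```shell
# Placeholder values; replace with real locations.
REF_DIR="/path/to/hashtable"
OUTPUT="/path/to/output"
PREFIX="sample1"

# The recipe text, kept in the same one-option-per-line style as the listing.
recipe=$(cat <<EOF
dragen                                  #DRAGEN executable (use the full install path in practice)
--ref-dir $REF_DIR                      #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--output-file-prefix $PREFIX
# Small variant caller
--enable-variant-caller true
# SV
--enable-sv true
EOF
)

# Strip trailing comments and comment-only lines, then join into one line.
cmd=$(printf '%s\n' "$recipe" \
  | sed -e 's/[[:space:]]*#.*$//' -e '/^[[:space:]]*$/d' \
  | tr '\n' ' ')
cmd=${cmd% }   # drop the trailing space left by tr

echo "$cmd"    # review the command before running it
```

This simple approach assumes that no option value contains a `#` character or whitespace; values like that would need proper quoting instead of plain text joining.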

    SNV, SV

    --amplicon-enable-dna-lymphoid

    oncoReveal Core LBx | Core LBx | P-LBX-01 | cfDNA | SNV, CNV, MSI | --amplicon-enable-cfdna-core

    oncoReveal Essential LBx | Essential LBx | P-LBX-04 | cfDNA | SNV, CNV, MSI | --amplicon-enable-cfdna-essential

    oncoReveal Essential MPN | MPN | MY7 | DNA | SNV | --amplicon-enable-dna-mpn

    oncoReveal Multi-Cancer v4 with CNV | Multi-Cancer with CNV | HS341 | DNA | SNV, CNV | --amplicon-enable-dna-multicancer

    oncoReveal Myeloid | Myeloid | MY766 | DNA | SNV, SV | --amplicon-enable-dna-myeloid

    oncoReveal Nexus 21 Gene | Nexus | P-CMC-01 | DNA | SNV, SV | --amplicon-enable-dna-nexus

    oncoReveal Solid Tumor v2 | Solid Tumor v2 | P-ST-02 | DNA | SNV | --amplicon-enable-dna-solidtumor

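    | Metric (UOM) | LSL Guideline | USL Guideline |
    | --- | --- | --- |
    | PCT_SOFT_CLIPPED_BASES_R1 (%) | NA | 10.0 |
    | PCT_SOFT_CLIPPED_BASES_R2 (%) | NA | 10.0 |
    | PCT_SUPPLEMENTARY_(CHIMERIC)_ALIGNMENTS (%) | NA | 15.0 |
    | ESTIMATED_READ_LENGTH (bp) | NA | NA |
    | MEAN_INSERT_LENGTH (bp) | NA | NA |
    | MEDIAN_INSERT_LENGTH (bp) | NA | NA |
    | INPUT_BASES_OVER_REFERENCE_GENOME_SIZE (Count) | NA | NA |
    | ESTIMATED_SAMPLE_CONTAMINATION (%) | NA | 2.00 |

    | Metric (UOM) | LSL Guideline | USL Guideline |
    | --- | --- | --- |
    | PCT_READS_WITH_UNCORRECTABLE_UMIS (%) | NA | NA |
    | TOTAL_NUMBER_OF_FAMILIES (Count) | NA | NA |
    | FAMILIES_DISCARDED (Count) | NA | NA |
    | DUPLEX_FAMILIES (Count) | NA | NA |
    | MEAN_FAMILY_DEPTH (Count) | NA | NA |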

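    | Metric (UOM) | LSL Guideline | USL Guideline |
    | --- | --- | --- |
    | PCT_SOFT_CLIPPED_BASES_R1 (%) | NA | 10.0 |
    | PCT_SOFT_CLIPPED_BASES_R2 (%) | NA | 10.0 |
    | PCT_SUPPLEMENTARY_(CHIMERIC)_ALIGNMENTS (%) | NA | 15.0 |
    | ESTIMATED_READ_LENGTH (bp) | NA | NA |
    | MEAN_INSERT_LENGTH (bp) | NA | NA |
    | MEDIAN_INSERT_LENGTH (bp) | NA | NA |
    | INPUT_BASES_OVER_REFERENCE_GENOME_SIZE (Count) | NA | NA |
    | ESTIMATED_SAMPLE_CONTAMINATION (%) | NA | 2.00 |

    | Metric (UOM) | LSL Guideline | USL Guideline |
    | --- | --- | --- |
    | PCT_READS_WITH_UNCORRECTABLE_UMIS (%) | NA | NA |
    | TOTAL_NUMBER_OF_FAMILIES (Count) | NA | NA |
    | FAMILIES_DISCARDED (Count) | NA | NA |
    | DUPLEX_FAMILIES (Count) | NA | NA |
    | MEAN_FAMILY_DEPTH (Count) | NA | NA |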

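    Normal DNA Input QC Metrics

    | Metric (UOM) | LSL Guideline | USL Guideline |
    | --- | --- | --- |
    | TOTAL_INPUT_READS (Count) | NA | NA |
    | PCT_MAPPED_READS (%) | 90.00 | NA |
    | PCT_PROPERLY_PAIRED_READS (%) | 90.00 | NA |
    | PCT_Q30_BASES (%) | 80.00 | NA |

    Normal DNA Dedup/UMI QC Metrics

    | Metric (UOM) | LSL Guideline | USL Guideline |
    | --- | --- | --- |
    | PCT_DUPLICATE_MARKED_READS (%) | NA | 20.00 |
    | PCT_READS_WITH_VALID_OR_CORRECTABLE_UMIS (%) | NA | NA |
    | PCT_READS_IN_DISCARDED_FAMILIES (%) | NA | NA |
    | PCT_READS_FILTERED_OUT (%) | NA | NA |

    Normal DNA Coverage QC Metrics

    | Metric (UOM) | LSL Guideline | USL Guideline |
    | --- | --- | --- |
    | AVERAGE_GENOME_COVERAGE (Count) | 20.00 | NA |
    | PCT_UNIFORMITY_OF_COVERAGE_OVER_20PCT_OF_MEAN (%) | 50.00 | NA |
    | PCT_GENOME_20X (%) | 80.00 | NA |

    Tumor DNA Input QC Metrics

    | Metric (UOM) | LSL Guideline | USL Guideline |
    | --- | --- | --- |
    | TOTAL_INPUT_READS (Count) | NA | NA |
    | PCT_MAPPED_READS (%) | 90.00 | NA |
    | PCT_PROPERLY_PAIRED_READS (%) | 90.00 | NA |
    | PCT_Q30_BASES (%) | 80.00 | NA |

    Tumor DNA Dedup/UMI QC Metrics

    | Metric (UOM) | LSL Guideline | USL Guideline |
    | --- | --- | --- |
    | PCT_DUPLICATE_MARKED_READS (%) | NA | 20.00 |
    | PCT_READS_WITH_VALID_OR_CORRECTABLE_UMIS (%) | NA | NA |
    | PCT_READS_IN_DISCARDED_FAMILIES (%) | NA | NA |
    | PCT_READS_FILTERED_OUT (%) | NA | NA |

    Tumor DNA Coverage QC Metrics

    | Metric (UOM) | LSL Guideline | USL Guideline |
    | --- | --- | --- |
    | AVERAGE_GENOME_COVERAGE (Count) | 20.00 | NA |
    | PCT_UNIFORMITY_OF_COVERAGE_OVER_20PCT_OF_MEAN (%) | 50.00 | NA |
    | PCT_GENOME_20X (%) | 80.00 | NA |

    Tumor DNA T/N Sample Match QC Metrics

    | Metric (UOM) | LSL Guideline | USL Guideline |
    | --- | --- | --- |
    | OUTLIER_BAF_FRACTION (NA) | NA | 0.90 |

    Tumor DNA Purity QC Metrics

    | Metric (UOM) | LSL Guideline | USL Guideline |
    | --- | --- | --- |
    | ESTIMATED_PURITY (%) | 20.00 | NA |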

    PCT_SOFT_CLIPPED_BASES_R1 (%)

    PCT_READS_WITH_UNCORRECTABLE_UMIS (%)

    PCT_SOFT_CLIPPED_BASES_R1 (%)

    PCT_READS_WITH_UNCORRECTABLE_UMIS (%)

    --vc-sq-filter-threshold $NUM

    Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 17.5. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold $INT

    Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold-in-hotspot $INT

    Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-excluded-regions-bed $BED

    Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome, and are often exceptionally noisy regions in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU bed files can be downloaded as part of the Bed File Collection:

    --sv-min-scored-variant-size

    After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.

    --enable-map-align true

    In the TN pipeline this must be set to false for BAM/CRAM input.

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    --enable-duplicate-marking true

    By default, DRAGEN marks duplicate reads and excludes them from variant calling.

    --enable-fractional-down-sampler

    Set to true to enable fractional downsampling. The default value is false.

    --down-sampler-normal-subsample

    Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).

    --down-sampler-tumor-subsample

    Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).

    --down-sampler-random-seed

    Specify the random seed for different runs of the same input data. The default value is 42.

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    --vc-systematic-noise $PATH

    Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (e.g. systematic alignment errors or sequencing errors).

    --vc-somatic-hotspots $PATH

    DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.

    --vc-enable-liquid-tumor-mode true

    Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.

    --vc-override-tumor-pcr-params-with-normal false

    Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.

    --enable-hla

    Enable HLA typer (this setting by default will only genotype class 1 genes)

    --hla-as-filter-min-threshold

    Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.

    --hla-as-filter-ratio-threshold

    Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.

    --hla-enable-class-2

    Extend genotyping to HLA class 2 genes (default=true).

    --cnv-enable-gcbias-correction true

    Enable or disable GC bias correction when generating target counts.

    --cnv-segmentation-mode $SEG_MODE

    Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.

    --cnv-normal-cnv-vcf $CNV_NORMAL_VCF

    Specify germline CNVs from the matched normal sample. Germline-aware Mode.

    --tmb-vaf-threshold FLOAT

    Variant minimum allele frequency for usable variants (default=0.05).

    --vc-callability-tumor-thresh INT

    Required read coverage to use a site (default=50).

    --tmb-enable-proxi-filter BOOL

    Use variant VAF information to increase germline filtering. Recommended for TO, but not for TN. May be overly aggressive at tagging variants as germline (default=false).

    --msi-coverage-threshold INT

    Minimum coverage for a microsatellite: 60 (default)

    --msi-distance-threshold FLOAT

    Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default)

    --sv-call-regions-bed

    Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format.

    --sv-exclusion-bed

    Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.

    --enable-variant-deduplication true

    Relevant when both SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent the occurrence of replicated variants within genes such as FLT3 and KMT2A. Filter all small indels in the structural variant VCF that appear and are passing in the small variant VCF. DRAGEN will create a new VCF that contains variants in SV VCF that are not matching a variant from SNV VCF file. The new deduplicated SV VCF file will have the same prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.

    --sv-systematic-noise $BEDPE

    Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels.

    --sv-somatic-ins-tandup-hotspot-regions-bed $BED

    Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bps)

    --sv-min-candidate-variant-size

    Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.

    --sv-enable-liquid-tumor-mode true

    DRAGEN can account for Tumor-in-Normal (TiN) contamination by running liquid tumor mode.

    --sv-tin-contam-tolerance $TIN_CONTAM_TOLERANCE

    Set the Tumor-in-Normal (TiN) contamination tolerance level.

    WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FF

    FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FFPE (only hg38)

    WES_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WES FF and FFPE

    WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

    For WGS, FF/FFPE

    IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

    For WGS, HEME

    For DRAGEN somatic runs it is recommended to use the linear hashtable.

    See: Product Files

    Input options

    DRAGEN input sources include FASTQ list, FASTQ, BAM, or CRAM. For BCL input, first create FASTQs using BCL conversion.

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input
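As a small illustration of the input alternatives, the sketch below picks the tumor input option from a file's extension. The function name `tumor_input_flag` is hypothetical, and treating a `.csv` file as a FASTQ list is an assumption about file naming. Paired FASTQ input is not covered because it needs two files plus read-group options.

```shell
# tumor_input_flag FILE -> prints the tumor input option for that file type.
# Hypothetical helper; the extension-to-option mapping is an assumption.
tumor_input_flag() {
  case "$1" in
    *.bam)  echo "--tumor-bam-input $1" ;;
    *.cram) echo "--tumor-cram-input $1" ;;
    *.csv)  echo "--tumor-fastq-list $1" ;;   # FASTQ list assumed to be a CSV file
    *)      echo "unsupported input: $1" >&2; return 1 ;;
  esac
}

tumor_input_flag tumor.bam
tumor_input_flag fastq_list.csv
```

A FASTQ list input additionally needs the matching `--tumor-fastq-list-sample-id` option, as shown in the recipe.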

    Mapping and Aligning

    | Option | Description |
    | --- | --- |
    | --enable-map-align true | Optionally disable map & align (default=true). |
    | --enable-map-align-output true | Optionally save the output BAM (default=false). |
    | --Aligner.clip-pe-overhang 2 | Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run. |

    Duplicate Marking

    | Option | Description |
    | --- | --- |
    | --enable-duplicate-marking true | By default, DRAGEN marks duplicate reads and excludes them from variant calling. |

    Fractional (Raw Reads) Downsampling

    DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).

    Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses, it is also recommended to use a normal sample with lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler can be used to reduce the coverage of the normal sample.

    | Option | Description |
    | --- | --- |
    | --enable-fractional-down-sampler | Set to true to enable fractional downsampling. The default value is false. |
    | --down-sampler-normal-subsample | Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%). |
    | --down-sampler-tumor-subsample | Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%). |
    | --down-sampler-random-seed | Specify the random seed for different runs of the same input data. The default value is 42. |
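To make the downsampling options concrete, the sketch below derives a subsample fraction for the normal sample from two coverage figures. The coverage values are made up for illustration, and the calculation assumes that coverage scales roughly linearly with the fraction of reads kept.

```shell
# Hypothetical coverages; replace with values measured for your own samples.
normal_cov=80    # mean coverage of the matched normal (x)
target_cov=40    # desired normal coverage after downsampling (x)

# Fraction of normal reads to keep, capped at 1.0 (no downsampling).
fraction=$(awk -v n="$normal_cov" -v t="$target_cov" \
  'BEGIN { f = t / n; if (f > 1) f = 1; printf "%.2f", f }')

# Options that would be added to the DRAGEN command line.
echo "--enable-fractional-down-sampler true"
echo "--down-sampler-normal-subsample $fraction"
```

Keeping `--down-sampler-random-seed` at a fixed value should make repeated runs on the same input select the same subset of reads.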

    SNV

    | Option | Description |
    | --- | --- |
    | --vc-target-bed | Limit variant calling to region of interest. |
    | --vc-combine-phased-variants-distance INT | Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2). |
    | --vc-emit-ref-confidence GVCF | Enables gVCF output. |
    | --vc-enable-vcf-output | To enable VCF file output during a gVCF run, set to true. The default value is false. |
    | --vc-systematic-noise $PATH | Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (e.g. systematic alignment errors or sequencing errors). |
    | --vc-somatic-hotspots $PATH | DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file. |

    For more detail on the small variant caller in somatic mode, please refer to Somatic Mode.

    HLA

    | Option | Description |
    | --- | --- |
    | --enable-hla | Enable HLA typer (this setting by default will only genotype class 1 genes). |
    | --hla-as-filter-min-threshold | Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels. |
    | --hla-as-filter-ratio-threshold | Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels. |
    | --hla-enable-class-2 | Extend genotyping to HLA class 2 genes (default=true). |

    CNV

    | Option | Description |
    | --- | --- |
    | --cnv-enable-gcbias-correction true | Enable or disable GC bias correction when generating target counts. |
    | --cnv-segmentation-mode $SEG_MODE | Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis. |
    | --cnv-population-b-allele-vcf $POP_VCF | Specify a population SNP VCF. This option is available for both the germline and the somatic workflows. In germline it is only supported for WGS. In somatic, it can be used when a matched normal sample is not available and analysis must be performed in tumor-only mode. |
    | --heme-cnv true | Configures DRAGEN to use CNV settings for HEME. |

    For more information, see CNV Calling.

    Annotation

    For instructions on how to download the Nirvana annotation database, please refer to Nirvana.

    TMB

    The Tumor-Normal pipeline is more effective than the Tumor-Only pipeline at removing or tagging germline variants, so the Tumor-Only pipeline may report somewhat elevated TMB values. The TMB proxi filter is an optional setting applied on top of the regular database germline filter. It aggressively filters additional germline variants based on allele frequencies.

    | Option | Description |
    | --- | --- |
    | --tmb-vaf-threshold FLOAT | Variant minimum allele frequency for usable variants (default=0.05). |
    | --vc-callability-tumor-thresh INT | Required read coverage to use a site (default=50). |
    | --tmb-enable-proxi-filter BOOL | Use variant VAF information to increase germline filtering. Recommended for TO, but not for TN. May be overly aggressive at tagging variants as germline (default=false). |

    See the user guide: TMB Germline Variants.

    MSI

    Microsatellite sites file can be downloaded here: Product Files.

    | Option | Description and recommended setting |
    | --- | --- |
    | --msi-coverage-threshold INT | Minimum coverage for a microsatellite: 60 (default) |
    | --msi-distance-threshold FLOAT | Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default) |

    SV

    | Option | Description |
    | --- | --- |
    | --sv-call-regions-bed | Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format. |
    | --sv-exclusion-bed | Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format. |
    | --enable-variant-deduplication true | Relevant when both SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent the occurrence of replicated variants within genes such as FLT3 and KMT2A. Filters all small indels in the structural variant VCF that also appear, and pass, in the small variant VCF. DRAGEN creates a new VCF containing the variants in the SV VCF that do not match a variant in the SNV VCF. The deduplicated SV VCF file uses the prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases. |
    | --sv-systematic-noise $BEDPE | Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels. |
    | --sv-somatic-ins-tandup-hotspot-regions-bed $BED | Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bp). |
    | --sv-min-candidate-variant-size | Run SV caller and report all SVs/indels at or above this size. The default value is set to 10. |

    | Option | Recommended Value for Liquid Tumors (e.g. AML/MLL) |
    | --- | --- |
    | --heme-sv true | Configures DRAGEN to use SV settings for Liquid Tumors (e.g., AML/MLL). |
    | --sv-min-scored-variant-size $INT | 100000 |

    For more information, see Structural Variant Calling.

    DUX4

    | Option | Description |
    | --- | --- |
    | --dux4-skip-santiy-check true | Bypass the requirements checks if the input datasets don't comply with parameters listed in prerequisites. |

    For more information, see DUX4-rearrangement Calling.

    Resource Files

    DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.

    SNV Systematic Noise

    Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.

    DRAGEN has prebuilt systematic noise files for WES/WGS. For high-sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.

    Prebuilt

    Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.

    | Prebuilt WES/WGS noise files | Description |
    | --- | --- |
    | WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz | For WGS FF |
    | FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz | For WGS FFPE (only hg38) |
    | WES_hg38_v2.0.0_systematic_noise.snv.bed.gz | For WES FF and FFPE |

    Custom

    This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.

    For WES and WGS pipelines, gather the full paths to the small variant hard-filtered VCFs (not gVCFs) from step 1 and create a list file ${VCF_LIST} with one file path per line.
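The list file described above could be produced along the following lines. This is only a sketch: the directory layout (one output folder per normal sample) is hypothetical, the `.hard-filtered.vcf.gz` suffix is an assumption about how the step 1 outputs are named, and empty placeholder files stand in for real VCFs so the example is self-contained.

```shell
# Hypothetical layout: one output directory per normal sample under RUNS_DIR.
RUNS_DIR=$(mktemp -d)
mkdir -p "$RUNS_DIR/normal1" "$RUNS_DIR/normal2"
: > "$RUNS_DIR/normal1/normal1.hard-filtered.vcf.gz"
: > "$RUNS_DIR/normal1/normal1.hard-filtered.gvcf.gz"   # gVCF, must be skipped
: > "$RUNS_DIR/normal2/normal2.hard-filtered.vcf.gz"

# Collect full paths to the hard-filtered VCFs, one per line.
VCF_LIST="$RUNS_DIR/vcf_list.txt"
find "$RUNS_DIR" -type f -name '*.hard-filtered.vcf.gz' | sort > "$VCF_LIST"

cat "$VCF_LIST"
```

Because `RUNS_DIR` is an absolute path, `find` writes absolute paths, which matches the requirement to list full paths. The name pattern does not match files ending in `.gvcf.gz`, so gVCFs are left out.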

    Step 2. Generate the final noise file.

    This step generates a BED file containing mean and maximum noise estimates per position. The file can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions, which can then be investigated (e.g. by modifying assay settings or DRAGEN settings) or blocklisted.

    The SNV systematic noise files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.

    SV Systematic Noise

    SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available. SV systematic noise files are considered experimental in WES and Panels.

    Prebuilt

    Prebuilt WGS SV systematic noise files can be downloaded here: Product Files.

    | Prebuilt WGS noise files | Description |
    | --- | --- |
    | WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz | For WGS, FF/FFPE |
    | IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz | For WGS, HEME |

    Custom

    Custom systematic noise files can be generated for WES or Panels. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-100 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.

    Step 2. Build the BEDPE file using input VCFs from previous step.

    Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.
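The two steps for a custom SV systematic noise file can be sketched as a dry run that only prints the commands. The sample IDs, directory names, and list file name below are hypothetical, and only the options named in this section are shown; the remaining options should follow the recipe for the assay.

```shell
# Hypothetical normal sample IDs.
normals="normal1 normal2 normal3"

# Step 1 (dry run): print one tumor-only command per normal sample.
step1_cmds=$(for id in $normals; do
  echo "dragen --ref-dir \$REF_DIR --output-directory out/$id --output-file-prefix $id" \
       "--tumor-fastq-list fastq/$id/fastq_list.csv --tumor-fastq-list-sample-id $id" \
       "--sv-detect-systematic-noise true"
done)
printf '%s\n' "$step1_cmds"

# Step 2 (dry run): one build command over the list of per-sample SV VCFs.
step2_cmd="dragen --ref-dir \$REF_DIR --output-directory out/noise --output-file-prefix sv_noise --sv-build-systematic-noise-vcfs-list sv_vcf_list.txt"
echo "$step2_cmd"
```

Printing the commands first makes it easy to review them, or to hand them to a job scheduler, before anything is executed.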

    DNA Somatic Tumor-Only Solid WGS

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.

    Notes and additional options

    Hashtable

    For DRAGEN somatic runs it is recommended to use the linear hashtable.

    See: Product Files

    Input options

    DRAGEN input sources include FASTQ list, FASTQ, BAM, or CRAM. For BCL input, first create FASTQs using BCL conversion.

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    Duplicate Marking

    Option
    Description

    Fractional (Raw Reads) Downsampling

    DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).

    Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses, it is also recommended to use a normal sample with lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler can be used to reduce the coverage of the normal sample.

    Option
    Description

    SNV

    Option
    Description

    For more detail on the small variant caller in somatic mode, please refer to Somatic Mode.

    HLA

    Option
    Description

    CNV

    Option
    Description

    For more information, see CNV Calling.

    Annotation

    For instructions on how to download the Nirvana annotation database, please refer to Nirvana.

    TMB

    The Tumor-Normal pipeline is more effective than the Tumor-Only pipeline at removing or tagging germline variants, so the Tumor-Only pipeline may report somewhat elevated TMB values. The TMB proxi filter is an optional setting applied on top of the regular database germline filter. It aggressively filters additional germline variants based on allele frequencies.

    Option
    Description

    See the user guide: .

    MSI

    Microsatellite sites file can be downloaded here: .

    Option
    Description and recommended setting

    SV

    Option
    Description
    Option
    Recommended Value for Liquid Tumors (e.g. AML/MLL)

    For more information, see .

    Resource Files

    DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.

    SNV Systematic Noise

    Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.

    DRAGEN has prebuilt systematic noise files for WES/WGS. For high-sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.

    Prebuilt

    Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: .

    Prebuilt WES/WGS noise files
    Description

    Custom

    This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.

    For WES and WGS pipelines, gather the full paths to the small variant hard-filtered VCFs (not gVCFs) from step 1 and create a list file ${VCF_LIST} with one file path per line.
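The list file can be assembled with standard shell tools. The sketch below assumes that each normal sample's step 1 output sits somewhere under one root directory and that the hard-filtered VCFs end in `.hard-filtered.vcf.gz`. Both are assumptions about your layout, and `build_vcf_list` is a hypothetical helper.

```shell
# Sketch: write absolute paths of hard-filtered small variant VCFs, one per line.
# Assumption: files are named *.hard-filtered.vcf.gz; gVCFs (*.gvcf.gz) do not match.
build_vcf_list() {
    local run_root list_file
    run_root=$(cd "$1" && pwd) || return 1   # resolve to an absolute path
    list_file="$2"
    find "$run_root" -type f -name '*.hard-filtered.vcf.gz' | sort > "$list_file"
}

# Example with placeholder paths:
# VCF_LIST=/staging/noise/vcf_list.txt
# build_vcf_list /staging/normals "$VCF_LIST"
# wc -l < "$VCF_LIST"    # number of normals that will feed the noise file
```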

    Step 2. Generate the final noise file.

    This step generates a BED file containing mean and max noise estimates per position. This file can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions that could be troubleshot (e.g. by modifying assay settings or DRAGEN settings) or blocklisted.
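Before plotting, it can help to list the noisiest positions directly. The sketch below sorts a tab-separated BED stream by one numeric column and keeps the top rows. The column layout of the noise file is not described here, so the column index is a required argument rather than a default. `top_noisy_sites` is a hypothetical helper, and `sort -g` assumes GNU or BSD `sort`.

```shell
# Sketch: print the N rows with the largest value in a chosen column.
# Usage: top_noisy_sites COLUMN [N] < noise.bed
# The column holding the noise estimate is an assumption you must supply.
top_noisy_sites() {
    local col="$1" n="${2:-10}"
    grep -v '^#' | sort -t "$(printf '\t')" -k "${col},${col}gr" | head -n "$n"
}

# Example with a placeholder file name; decompress first if the file is gzipped:
# gzip -dc noise.bed.gz | top_noisy_sites 4 20
```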

    The SNV systematic noise files can also be built in the cloud using the or the DRAGEN Systematic Noise File Builder Pipeline on .

    SV Systematic Noise

    SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available. SV systematic noise files are considered experimental in WES and Panels.

    Prebuilt

    Prebuilt WGS SV systematic noise files can be downloaded here: .

    Prebuilt WGS noise files
    Description

    Custom

    Custom systematic noise files can be generated for WES or Panels. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-100 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.

    Step 2. Build the BEDPE file using input VCFs from previous step.

    Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.

    DNA Somatic Tumor-Only Solid WGS UMI

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.

    Notes and additional options

    Hashtable

    For DRAGEN somatic runs it is recommended to use the linear hashtable.

    See:

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using .

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    Fractional (Raw Reads) Downsampling

    DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).

    Downsampling can reduce runtime on very deep samples. For Tumor-Normal analyses, it is also recommended that the normal sample have lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor, the fractional downsampler can be used to reduce coverage of the normal sample.

    Option
    Description

    UMI

    Option
    Description

    For more information see: .

    SNV

    Option
    Description

    For more detail on the small variant caller in somatic mode please refer to

    HLA

    Option
    Description

    CNV

    Option
    Description

    For more information, see .

    Annotation

    For instructions on how to download the Nirvana annotation database, please refer to

    TMB

    The Tumor-Normal pipeline is more effective than the Tumor-Only pipeline at removing or tagging germline variants, so the Tumor-Only pipeline may report somewhat elevated TMB values. The TMB proxi filter is an optional setting applied on top of the regular database germline filter. It aggressively filters additional germline variants based on allele frequencies.

    Option
    Description

    See the user guide: .

    MSI

    Microsatellite sites file can be downloaded here: .

    Option
    Description and recommended setting

    SV

    Option
    Description
    Option
    Recommended Value for Liquid Tumors (e.g. AML/MLL)

    For more information, see .

    Resource Files

    DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.

    SNV Systematic Noise

    Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.

    DRAGEN has prebuilt systematic noise files for WES/WGS. For high-sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.

    Prebuilt

    Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: .

    Prebuilt WES/WGS noise files
    Description

    Custom

    This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.

    For WES and WGS pipelines, gather the full paths to the small variant hard-filtered VCFs (not gVCFs) from step 1 and create a list file ${VCF_LIST} with one file path per line.
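A quick sanity check of the list file before step 2 can catch the most common mistakes: relative paths, gVCFs, and files that no longer exist. This is a minimal sketch. `check_vcf_list` is a hypothetical helper, and recognising gVCFs by a `.gvcf` or `.gvcf.gz` suffix is an assumption about file naming.

```shell
# Sketch: report problems in a VCF list file; returns non-zero if any are found.
check_vcf_list() {
    local list_file="$1" rc=0 vcf
    while IFS= read -r vcf; do
        [ -n "$vcf" ] || continue                      # skip blank lines
        case "$vcf" in
            /*) ;;
            *)  echo "not an absolute path: $vcf"; rc=1 ;;
        esac
        case "$vcf" in
            # Assumption: gVCFs can be recognised by their file suffix.
            *.gvcf|*.gvcf.gz) echo "gVCF not allowed: $vcf"; rc=1 ;;
        esac
        [ -f "$vcf" ] || { echo "missing file: $vcf"; rc=1; }
    done < "$list_file"
    return "$rc"
}
```

Running `check_vcf_list "$VCF_LIST"` prints one line per problem and exits silently with status 0 when the list is clean.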

    Step 2. Generate the final noise file.

    This step generates a BED file containing mean and max noise estimates per position. This file can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions that could be troubleshot (e.g. by modifying assay settings or DRAGEN settings) or blocklisted.

    The SNV systematic noise files can also be built in the cloud using the or the DRAGEN Systematic Noise File Builder Pipeline on .

    SV Systematic Noise

    SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available. SV systematic noise files are considered experimental in WES and Panels.

    Prebuilt

    Prebuilt WGS SV systematic noise files can be downloaded here: .

    Prebuilt WGS noise files
    Description

    Custom

    Custom systematic noise files can be generated for WES or Panels. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-100 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.

    Step 2. Build the BEDPE file using input VCFs from previous step.
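Step 1 has to be repeated once per normal sample, so it is convenient to generate the commands from a sample sheet. The sketch below only prints the commands (a dry run) and does not run DRAGEN. The two-column tab-separated input (sample ID, FASTQ list path), the per-sample output directory layout, and the helper name `print_sv_noise_cmds` are all assumptions. Only the flags themselves come from the recipe.

```shell
# Sketch: print one step 1 command per normal sample (dry run, nothing is executed).
# stdin: SAMPLE_ID<TAB>FASTQ_LIST, one normal per line (assumed layout).
print_sv_noise_cmds() {
    local ref_dir="$1" out_root="$2" sample fastq_list
    while IFS="$(printf '\t')" read -r sample fastq_list; do
        [ -n "$sample" ] || continue
        printf '%s\n' "dragen --ref-dir $ref_dir --output-directory $out_root/$sample --output-file-prefix $sample --tumor-fastq-list $fastq_list --tumor-fastq-list-sample-id $sample --sv-detect-systematic-noise true"
    done
}

# Example with placeholder paths:
# print_sv_noise_cmds /ref/hashtable /staging/sv_noise < normals.tsv > step1_commands.sh
```

Printing rather than executing keeps the sketch safe to try, and the generated file can be reviewed before it is run on a DRAGEN server.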

    Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.

    DNA Somatic Tumor-Normal Solid WGS UMI

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.

    Notes and additional options

    Hashtable

    For DRAGEN somatic runs it is recommended to use the linear hashtable.

    See:

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using .

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    Fractional (Raw Reads) Downsampling

    DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).

    Downsampling can reduce runtime on very deep samples. For Tumor-Normal analyses, it is also recommended that the normal sample have lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor, the fractional downsampler can be used to reduce coverage of the normal sample.
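The fraction to keep follows from the current and desired coverage: fraction = target / current, capped at 1.0. The sketch below computes that value and prints the matching downsampler options. The coverage numbers are illustrative and `subsample_fraction` is a hypothetical helper. The option names are the ones listed for the fractional downsampler.

```shell
# Sketch: fraction of reads to keep so that CURRENT coverage drops to TARGET.
# Usage: subsample_fraction CURRENT_COVERAGE TARGET_COVERAGE
subsample_fraction() {
    awk -v cur="$1" -v tgt="$2" 'BEGIN {
        if (cur <= 0 || tgt <= 0) exit 1   # reject non-positive coverage
        f = tgt / cur
        if (f > 1) f = 1                   # never "upsample"
        printf "%.2f\n", f
    }'
}

# Illustrative numbers: normal at 120x, desired 60x.
frac=$(subsample_fraction 120 60)
echo "--enable-fractional-down-sampler true --down-sampler-normal-subsample $frac"
```

Because the downsampler is random, the achieved coverage only approximates the target; check the coverage metrics of the run to confirm.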

    Option
    Description

    UMI

    Option
    Description

    For more information see: .

    SNV

    Option
    Description

    For more detail on the small variant caller in somatic mode please refer to

    HLA

    Option
    Description

    CNV

    Option
    Description

    For more information, see .

    Annotation

    For instructions on how to download the Nirvana annotation database, please refer to

    TMB

    Option
    Description

    See the user guide: .

    MSI

    Microsatellite sites file can be downloaded here: .

    Option
    Description and recommended setting

    SV

    Option
    Description
    Option
    Recommended Value for Liquid Tumors (e.g. AML/MLL)

    For more information, see .

    Resource Files

    DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.

    SNV Systematic Noise

    Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.

    DRAGEN has prebuilt systematic noise files for WES/WGS. For high-sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.

    Prebuilt

    Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: .

    Prebuilt WES/WGS noise files
    Description

    Custom

    This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.

    For WES and WGS pipelines, gather the full paths to the small variant hard-filtered VCFs (not gVCFs) from step 1 and create a list file ${VCF_LIST} with one file path per line.

    Step 2. Generate the final noise file.

    This step generates a BED file containing mean and max noise estimates per position. This file can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions that could be troubleshot (e.g. by modifying assay settings or DRAGEN settings) or blocklisted.

    The SNV systematic noise files can also be built in the cloud using the or the DRAGEN Systematic Noise File Builder Pipeline on .

    SV Systematic Noise

    SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available. SV systematic noise files are considered experimental in WES and Panels.

    Prebuilt

    Prebuilt WGS SV systematic noise files can be downloaded here: .

    Prebuilt WGS noise files
    Description

    Custom

    Custom systematic noise files can be generated for WES or Panels. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-100 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.

    Step 2. Build the BEDPE file using input VCFs from previous step.

    Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.

    DNA Somatic Tumor-Only Solid WES

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.

    Notes and additional options

    # FQ list input
    --tumor-fastq-list $PATH
    --tumor-fastq-list-sample-id $STRING
    --fastq-list $PATH
    --fastq-list-sample-id $STRING
    # FQ input
    --tumor-fastq1 $PATH
    --tumor-fastq2 $PATH
    --RGSM-tumor $STRING
    --RGID-tumor $STRING
    --fastq-file1 $PATH
    --fastq-file2 $PATH
    --RGSM $STRING
    --RGID $STRING
    # BAM input
    --tumor-bam-input $PATH
    --bam-input $PATH
    # CRAM input
    --tumor-cram-input $PATH
    --cram-input $PATH
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --vc-detect-systematic-noise=true
    --vc-enable-germline-tagging=true
    --variant-annotation-data $PATH
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --build-sys-noise-vcfs-list ${VCF_LIST} 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --sv-detect-systematic-noise true 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --sv-build-systematic-noise-vcfs-list $VCF_LIST   #one VCF per line
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    --enable-duplicate-marking true         #default=true 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-systematic-noise $PATH             #Required 
    --vc-target-vaf 0.03                    #Default = 0.03 (>= 3% VAF) 
    # SV 
    --heme-sv true 
    --sv-systematic-noise $PATH             #Recommended 
    # DUX4 
    --enable-dux4-caller true 
    # CNV 
    --heme-cnv true 
    --cnv-population-b-allele-vcf $POP_VCF 
    --cnv-enable-self-normalization true 
    # Annotation 
    --variant-annotation-data $PATH
    --vc-enable-germline-tagging true 
    # TMB 
    --enable-tmb true 
    --tmb-enable-proxi-filter true          #Optional for Tumor-Only 
    # HLA genotyper 
    --enable-hla true 
    # Microsatellite Instability (MSI) 
    --msi-command tumor-only 
    --msi-ref-normal-input $PATH 
    --msi-microsatellites-file $PATH 
    --msi-coverage-threshold 40 
    # FQ list input
    --tumor-fastq-list $PATH
    --tumor-fastq-list-sample-id $STRING
    # FQ input
    --tumor-fastq1 $PATH
    --tumor-fastq2 $PATH
    --RGSM-tumor $STRING
    --RGID-tumor $STRING
    # BAM input
    --tumor-bam-input $PATH
    # CRAM input
    --tumor-cram-input $PATH
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --vc-detect-systematic-noise=true
    --vc-enable-germline-tagging=true
    --variant-annotation-data $PATH
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --build-sys-noise-vcfs-list ${VCF_LIST} 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --sv-detect-systematic-noise true 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --sv-build-systematic-noise-vcfs-list $VCF_LIST   #one VCF per line
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    --enable-duplicate-marking true         #default=true 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-systematic-noise $PATH             #Required 
    --vc-excluded-regions-bed $BED          #FFPE: optionally mask ALUs 
    --vc-target-vaf 0.03                    #Default = 0.03 (>= 3% VAF) 
    # SV 
    --enable-sv true 
    --sv-systematic-noise $PATH             #Recommended 
    # CNV 
    --enable-cnv true 
    --cnv-population-b-allele-vcf $POP_VCF 
    --cnv-enable-self-normalization true 
    # HRD Scoring 
    --enable-hrd true                       #requires CNV 
    # Annotation 
    --variant-annotation-data $PATH
    --vc-enable-germline-tagging true 
    # TMB 
    --enable-tmb true 
    --tmb-enable-proxi-filter true          #Optional for Tumor-Only 
    # HLA genotyper 
    --enable-hla true 
    # Microsatellite Instability (MSI) 
    --msi-command tumor-only 
    --msi-ref-normal-input $PATH 
    --msi-microsatellites-file $PATH 
    --msi-coverage-threshold 40 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    # UMI 
    --umi-enable true 
    --umi-source $STRING                    #Default='qname'
    --umi-library-type $STRING              #e.g. random-duplex
    --umi-min-supporting-reads 1            #Default=2 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-systematic-noise $PATH             #Required 
    --vc-excluded-regions-bed $BED          #FFPE: optionally mask ALUs 
    # SV 
    --enable-sv true 
    --sv-systematic-noise $PATH             #Recommended 
    # CNV 
    --enable-cnv true 
    --cnv-population-b-allele-vcf $POP_VCF 
    --cnv-enable-self-normalization true 
    # HRD Scoring 
    --enable-hrd true                       #requires CNV 
    # Annotation 
    --variant-annotation-data $PATH
    --vc-enable-germline-tagging true 
    # TMB 
    --enable-tmb true 
    --tmb-enable-proxi-filter true          #Optional for Tumor-Only 
    # HLA genotyper 
    --enable-hla true 
    # Microsatellite Instability (MSI) 
    --msi-command tumor-only 
    --msi-ref-normal-input $PATH 
    --msi-microsatellites-file $PATH 
    --msi-coverage-threshold 40 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
    --fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    # UMI 
    --umi-enable true 
    --umi-source $STRING                    #Default='qname'
    --umi-library-type $STRING              #e.g. random-duplex
    --tumor-normal-has-umi $STRING          #Sample(s) containing UMI ['tumor', 'both'].
    --umi-min-supporting-reads 1            #Default=2 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-systematic-noise $PATH             #Recommended 
    --vc-excluded-regions-bed $BED          #FFPE: optionally mask ALUs 
    --vc-sq-filter-threshold 17.5           #recommended in tumor-normal UMI mode 
    # SV 
    --enable-sv true 
    --sv-systematic-noise $PATH             #Optional 
    # CNV 
    --enable-cnv true 
    --cnv-use-somatic-vc-baf true 
    --cnv-enable-self-normalization true 
    # HRD Scoring 
    --enable-hrd true                       #requires CNV 
    # Annotation 
    --variant-annotation-data $PATH
    --enable-variant-annotation true 
    # TMB 
    --enable-tmb true 
    # HLA genotyper 
    --enable-hla true 
    # Microsatellite Instability (MSI) 
    --msi-command tumor-normal 
    --msi-microsatellites-file $PATH 
    --msi-coverage-threshold 40 

    --vc-enable-liquid-tumor-mode true

    Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.

    --vc-override-tumor-pcr-params-with-normal false

    Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.

    --vc-sq-filter-threshold $NUM

    Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 3. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold $INT

    Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 60. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold-in-hotspot $INT

    Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 20. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-excluded-regions-bed $BED

    Hard-filter variants that overlap these regions. ALU regions comprise approximately 11% of the genome and are often exceptionally noisy in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU BED files can be downloaded as part of the Bed File Collection.

    --sv-min-scored-variant-size

    After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.

    --enable-map-align true

    Optionally disable map & align (default=true).

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    --enable-duplicate-marking true

    By default, DRAGEN marks duplicate reads and excludes them from variant calling.

    --enable-fractional-down-sampler

    Set to true to enable fractional downsampling. The default value is false.

    --down-sampler-normal-subsample

    Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).

    --down-sampler-tumor-subsample

    Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).

    --down-sampler-random-seed

    Specify the random seed for different runs of the same input data. The default value is 42.

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    --vc-emit-ref-confidence GVCF

    To enable gVCF output.

    --vc-enable-vcf-output

    To enable VCF file output during a gVCF run, set to true. The default value is false.

    --vc-systematic-noise $PATH

    Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.).

    --vc-somatic-hotspots $PATH

    DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.

    --enable-hla

    Enable HLA typer (this setting by default will only genotype class 1 genes)

    --hla-as-filter-min-threshold

    Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.

    --hla-as-filter-ratio-threshold

    Minimum Alignment score of a read mate to be considered. The default is 0.67 and works for WES and WES. Set to 0.85 for panels.

    --hla-enable-class-2

    Extend genotyping to HLA class 2 genes (default=true).

    --cnv-enable-gcbias-correction true

    Enable or disable GC bias correction when generating target counts.

    --cnv-segmentation-mode $SEG_MODE

    Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.

    --cnv-population-b-allele-vcf $POP_VCF

    Specify a population SNP VCF. This option is available for both the germline and the somatic workflows. In germline it is only supported for WGS. In somatic, it can be used when a matched normal sample is not available and analysis must be performed in tumor-only mode.

    --tmb-vaf-threshold FLOAT

    Variant mininum allele frequency for usable variants (default=0.05)

    --vc-callability-tumor-thresh INT

    Required read coverage to use a site (default=50).

    --tmb-enable-proxi-filter BOOL

    Use variant vaf information to increase germline filtering. Recommended for TO, but not for TN. May be overly aggressive at tagging variants as germline (default=false).

    --msi-coverage-threshold INT

    Minimum coverage for a microsatellite: 60 (default)

    --msi-distance-threshold FLOAT

    Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default)

    --sv-call-regions-bed

    Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format.

    --sv-exclusion-bed

    Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.

    --enable-variant-deduplication true

    Relevant when both SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent the occurrence of replicated variants within genes such as FLT3 and KMT2A. Filter all small indels in the structural variant VCF that appear and are passing in the small variant VCF. DRAGEN will create a new VCF that contains variants in SV VCF that are not matching a variant from SNV VCF file. The new deduplicated SV VCF file will have the same prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.

    --sv-systematic-noise $BEDPE

    Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels.

    --sv-somatic-ins-tandup-hotspot-regions-bed $BED

    Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bps)

    --sv-min-candidate-variant-size

    Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.

    --sv-min-scored-variant-size $INT

    100000

    WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FF

    FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FFPE (only hg38)

    WES_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WES FF and FFPE

    WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

    For WGS, FF/FFPE

    IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

    For WGS, HEME

    Product Files
    BCL conversion
    Somatic Mode
    CNV Calling
    Nirvana
    TMB Germline Variants
    Product Files
    Structural Variant Calling
    Product Files
    DRAGEN Baseline Builder App on BaseSpace
    ICA
    Product Files
    DRAGEN Baseline Builder App on BaseSpace
    ICA

    --umi-emit-multiplicity both

    Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see UMI Options.

    --umi-start-mask-length INT

    Number of additional bases to ignore from the start of the read. The default is 0. To reduce false positives, optionally set to 1.

    --umi-end-mask-length INT

    Number of additional bases to ignore from the end of the read. The default is 0. To reduce false positives, optionally set to 3.

    --vc-enable-liquid-tumor-mode true

    Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.

    --vc-override-tumor-pcr-params-with-normal false

    Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.

    --vc-sq-filter-threshold $NUM

    Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 4. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold $INT

    Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 60. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold-in-hotspot $INT

    Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 20. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-excluded-regions-bed $BED

    Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome, and are often exceptionally noisy regions in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU bed files can be downloaded as part of the Bed File Collection:

    --sv-min-scored-variant-size

    After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.

    --enable-map-align true

    Optionally disable map & align (default=true).

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    --enable-fractional-down-sampler

    Set to true to enable fractional downsampling. The default value is false.

    --down-sampler-normal-subsample

    Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).

    --down-sampler-tumor-subsample

    Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).

    --down-sampler-random-seed

    Specify the random seed for different runs of the same input data. The default value is 42.

    --umi-source STRING

    Specify the input type for the UMI sequence. Options: qname, fastq, bamtag.

    --umi-library-type STRING

    Set the batch option for the different UMI correction modes. Options: random-duplex, random-simplex, nonrandom-duplex.

    --umi-nonrandom-whitelist $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.

    --umi-correction-table $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.

    --umi-min-supporting-reads INT

    Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).

    --umi-metrics-interval-file $BED

    Target region in BED format.

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    --vc-emit-ref-confidence GVCF

    To enable gVCF output.

    --vc-enable-vcf-output

    To enable VCF file output during a gVCF run, set to true. The default value is false.

    --vc-systematic-noise $PATH

    Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.).

    --vc-somatic-hotspots $PATH

    DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.

    --enable-hla

    Enable HLA typer (this setting by default will only genotype class 1 genes)

    --hla-as-filter-min-threshold

    Internal option to set the minimum alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.

    --hla-as-filter-ratio-threshold

    Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.

    --hla-enable-class-2

    Extend genotyping to HLA class 2 genes (default=true).

    --cnv-enable-gcbias-correction true

    Enable or disable GC bias correction when generating target counts.

    --cnv-segmentation-mode $SEG_MODE

    Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.

    --cnv-population-b-allele-vcf $POP_VCF

    Specify a population SNP VCF. This option is available for both the germline and the somatic workflows. In germline it is only supported for WGS. In somatic, it can be used when a matched normal sample is not available and analysis must be performed in tumor-only mode.

    --tmb-vaf-threshold FLOAT

    Variant minimum allele frequency for usable variants (default=0.05).

    --vc-callability-tumor-thresh INT

    Required read coverage to use a site (default=50).

    --tmb-enable-proxi-filter BOOL

    Use variant VAF information to increase germline filtering. Recommended for TO, but not for TN. May be overly aggressive at tagging variants as germline (default=false).

    --msi-coverage-threshold INT

    Minimum coverage for a microsatellite: 60 (default)

    --msi-distance-threshold FLOAT

    Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default)

    --sv-call-regions-bed

    Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format.

    --sv-exclusion-bed

    Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.

    --enable-variant-deduplication true

    Relevant when both SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent the occurrence of replicated variants within genes such as FLT3 and KMT2A. Filters all small indels in the structural variant VCF that appear and are passing in the small variant VCF. DRAGEN creates a new VCF that contains the variants in the SV VCF that do not match a variant in the SNV VCF. The new deduplicated SV VCF file has the prefix passed by --output-file-prefix, followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.

    --sv-systematic-noise $BEDPE

    Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels.

    --sv-somatic-ins-tandup-hotspot-regions-bed $BED

    Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bp).

    --sv-min-candidate-variant-size

    Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.

    --sv-min-scored-variant-size $INT

    100000

    WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FF

    FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FFPE (only hg38)

    WES_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WES FF and FFPE

    WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

    For WGS, FF/FFPE

    IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

    For WGS, HEME


    --umi-emit-multiplicity both

    Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see UMI Options.

    --umi-start-mask-length INT

    Number of additional bases to ignore from the start of the read. The default is 0. To reduce false positives, optionally set to 1.

    --umi-end-mask-length INT

    Number of additional bases to ignore from the end of the read. The default is 0. To reduce false positives, optionally set to 3.

    --tumor-normal-has-umi STRING

    Specify if only the tumor, or if both the tumor and normal have UMIs. Options: 'both','tumor'.

    --vc-sq-filter-threshold $NUM

    Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 4. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold $INT

    Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold-in-hotspot $INT

    Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-excluded-regions-bed $BED

    Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome, and are often exceptionally noisy regions in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU bed files can be downloaded as part of the Bed File Collection:

    --sv-min-scored-variant-size

    After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.

    --enable-map-align true

    In the TN pipeline this must be set to false for BAM/CRAM input.

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    --enable-fractional-down-sampler

    Set to true to enable fractional downsampling. The default value is false.

    --down-sampler-normal-subsample

    Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).

    --down-sampler-tumor-subsample

    Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).

    --down-sampler-random-seed

    Specify the random seed for different runs of the same input data. The default value is 42.

    --umi-source STRING

    Specify the input type for the UMI sequence. Options: qname, fastq, bamtag.

    --umi-library-type STRING

    Set the batch option for the different UMI correction modes. Options: random-duplex, random-simplex, nonrandom-duplex.

    --umi-nonrandom-whitelist $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.

    --umi-correction-table $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.

    --umi-min-supporting-reads INT

    Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).

    --umi-metrics-interval-file $BED

    Target region in BED format.

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    --vc-systematic-noise $PATH

    Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.).

    --vc-somatic-hotspots $PATH

    DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.

    --vc-enable-liquid-tumor-mode true

    Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.

    --vc-override-tumor-pcr-params-with-normal false

    Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.

    --enable-hla

    Enable HLA typer (this setting by default will only genotype class 1 genes)

    --hla-as-filter-min-threshold

    Internal option to set the minimum alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.

    --hla-as-filter-ratio-threshold

    Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.

    --hla-enable-class-2

    Extend genotyping to HLA class 2 genes (default=true).

    --cnv-enable-gcbias-correction true

    Enable or disable GC bias correction when generating target counts.

    --cnv-segmentation-mode $SEG_MODE

    Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.

    --cnv-normal-cnv-vcf $CNV_NORMAL_VCF

    Specify germline CNVs from the matched normal sample. Germline-aware Mode.

    --tmb-vaf-threshold FLOAT

    Variant minimum allele frequency for usable variants (default=0.05).

    --vc-callability-tumor-thresh INT

    Required read coverage to use a site (default=50).

    --tmb-enable-proxi-filter BOOL

    Use variant VAF information to increase germline filtering. Recommended for TO, but not for TN. May be overly aggressive at tagging variants as germline (default=false).

    --msi-coverage-threshold INT

    Minimum coverage for a microsatellite: 60 (default)

    --msi-distance-threshold FLOAT

    Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default)

    --sv-call-regions-bed

    Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format.

    --sv-exclusion-bed

    Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.

    --enable-variant-deduplication true

    Relevant when both SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent the occurrence of replicated variants within genes such as FLT3 and KMT2A. Filters all small indels in the structural variant VCF that appear and are passing in the small variant VCF. DRAGEN creates a new VCF that contains the variants in the SV VCF that do not match a variant in the SNV VCF. The new deduplicated SV VCF file has the prefix passed by --output-file-prefix, followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.

    --sv-systematic-noise $BEDPE

    Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels.

    --sv-somatic-ins-tandup-hotspot-regions-bed $BED

    Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bp).

    --sv-min-candidate-variant-size

    Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.

    --sv-enable-liquid-tumor-mode true

    DRAGEN can account for Tumor-in-Normal (TiN) contamination by running liquid tumor mode.

    --sv-tin-contam-tolerance $TIN_CONTAM_TOLERANCE

    Set the Tumor-in-Normal (TiN) contamination tolerance level.

    WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FF

    FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FFPE (only hg38)

    WES_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WES FF and FFPE

    WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

    For WGS, FF/FFPE

    IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

    For WGS, HEME

    Hashtable

    For DRAGEN somatic runs it is recommended to use the linear hashtable.

    See: Product Files

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    | Option | Description |
    | --- | --- |
    | --enable-map-align true | Optionally disable map & align (default=true). |
    | --enable-map-align-output true | Optionally save the output BAM (default=false). |
    | --Aligner.clip-pe-overhang 2 | Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run. |

    Duplicate Marking

    | Option | Description |
    | --- | --- |
    | --enable-duplicate-marking true | By default, DRAGEN marks duplicate reads and excludes them from variant calling. |

    Fractional (Raw Reads) Downsampling

    DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).

    Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses, it is also recommended to use a normal sample with lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler can be used to reduce coverage on the normal sample.

    | Option | Description |
    | --- | --- |
    | --enable-fractional-down-sampler | Set to true to enable fractional downsampling. The default value is false. |
    | --down-sampler-normal-subsample | Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%). |
    | --down-sampler-tumor-subsample | Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%). |
    | --down-sampler-random-seed | Specify the random seed for different runs of the same input data. The default value is 42. |
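As an illustration of the coverage recommendation above, the following Python sketch derives a normal subsample fraction from estimated coverages and assembles the downsampling options from the table. The helper names and the coverage estimates are hypothetical, and capping the normal at the tumor coverage is only one possible policy, not a documented DRAGEN rule.

```python
def normal_subsample_fraction(tumor_cov, normal_cov):
    """Fraction of normal reads to keep so the normal does not exceed the tumor coverage."""
    if tumor_cov <= 0 or normal_cov <= 0:
        raise ValueError("coverage estimates must be positive")
    # Never upsample: a normal that is already shallower than the tumor is kept whole.
    return min(1.0, tumor_cov / normal_cov)


def downsampler_args(tumor_cov, normal_cov, seed=42):
    """Build the option list; only options named in the table above are used."""
    fraction = normal_subsample_fraction(tumor_cov, normal_cov)
    return [
        "--enable-fractional-down-sampler", "true",
        "--down-sampler-normal-subsample", f"{fraction:.3f}",
        "--down-sampler-tumor-subsample", "1.0",
        "--down-sampler-random-seed", str(seed),
    ]


if __name__ == "__main__":
    # Hypothetical estimates: 60x tumor, 120x matched normal.
    print(" ".join(downsampler_args(60, 120)))
```

For a 60x tumor and a 120x normal this yields a fraction of 0.5. Choose a smaller target if the normal should end up strictly below the tumor coverage.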

    SNV

    | Option | Description |
    | --- | --- |
    | --vc-target-bed | Limit variant calling to the region of interest. |
    | --vc-combine-phased-variants-distance INT | Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (default=2). |
    | --vc-emit-ref-confidence GVCF | Enables gVCF output. |
    | --vc-enable-vcf-output | To enable VCF file output during a gVCF run, set to true. The default value is false. |
    | --vc-systematic-noise $PATH | Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (e.g., systematic alignment errors and sequencing errors). |
    | --vc-somatic-hotspots $PATH | DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file. |

    High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports 3 main settings for improved sensitivity on low VAF variant calls.

    | High Sensitivity Option | Description |
    | --- | --- |
    | --vc-target-vaf FLOAT | The default is 0.03 (3%). Set to e.g. 0.01 to improve SNV sensitivity on 1% VAF variants (assuming sufficient coverage). |
    | --vc-enable-umi-solid true | Optimized for 1% and higher VAFs on UMI (or read position collapsed) samples with approx 300-1000X coverage. |
    | --vc-enable-umi-liquid true | Optimized for 0.1% and higher VAFs on UMI samples with 1000X or higher coverage as expected in liquid biopsies. |

    For more detail on the small variant caller in somatic mode, refer to Somatic Mode.

    HLA

    | Option | Description |
    | --- | --- |
    | --enable-hla | Enable the HLA typer (by default, this setting only genotypes class 1 genes). |
    | --hla-as-filter-min-threshold | Internal option to set the minimum alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels. |
    | --hla-as-filter-ratio-threshold | Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels. |
    | --hla-enable-class-2 | Extend genotyping to HLA class 2 genes (default=true). |

    CNV

    | Option | Description |
    | --- | --- |
    | --cnv-enable-gcbias-correction true | Enable or disable GC bias correction when generating target counts. |
    | --cnv-segmentation-mode $SEG_MODE | Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis. |
    | --cnv-segmentation-bed $PATH | If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed. |
    | --cnv-population-b-allele-vcf $POP_VCF | Specify a population SNP VCF. This option is available for both the germline and the somatic workflows. In germline it is only supported for WGS. In somatic, it can be used when a matched normal sample is not available and analysis must be performed in tumor-only mode. |

    For more information, see CNV Calling.

    Annotation

    For instructions on how to download the Nirvana annotation database, refer to Nirvana.

    TMB

    The Tumor-Normal pipeline is more effective than the Tumor-Only pipeline at removing or tagging germline variants. The Tumor-Only pipeline may consequently report somewhat elevated TMB values. The TMB proxi filter is an optional setting on top of the regular database germline filter. It aggressively filters additional germline variants based on allele frequencies.

    | Option | Description |
    | --- | --- |
    | --tmb-vaf-threshold FLOAT | Variant minimum allele frequency for usable variants (default=0.05). |
    | --vc-callability-tumor-thresh INT | Required read coverage to use a site (default=50). |
    | --tmb-enable-proxi-filter BOOL | Use variant VAF information to increase germline filtering. Recommended for Tumor-Only (TO), but not for Tumor-Normal (TN). May be overly aggressive at tagging variants as germline (default=false). |

    See the user guide: TMB Germline Variants.
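To make the two numeric thresholds above concrete, here is a minimal Python sketch that applies them to a list of (VAF, depth) pairs and reports usable variants per megabase. The function names and the input layout are hypothetical. Treating TMB as usable variants divided by callable megabases is the general definition of the metric; it is an assumption that this matches DRAGEN's calculation, which also applies the germline filtering described above.

```python
def usable_for_tmb(vaf, depth, vaf_threshold=0.05, coverage_threshold=50):
    """Apply the --tmb-vaf-threshold and --vc-callability-tumor-thresh defaults to one variant."""
    return vaf >= vaf_threshold and depth >= coverage_threshold


def tmb_per_mb(variants, callable_bases):
    """Usable variants per megabase of callable sequence (assumed definition, see lead-in)."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    usable = sum(1 for vaf, depth in variants if usable_for_tmb(vaf, depth))
    return usable / (callable_bases / 1e6)


if __name__ == "__main__":
    # Hypothetical (VAF, depth) pairs over a 1 Mb callable region.
    calls = [(0.10, 80), (0.03, 200), (0.20, 30), (0.06, 50)]
    print(tmb_per_mb(calls, 1_000_000))
```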

    MSI

    The microsatellite sites file can be downloaded here: Product Files.

    | Option | Description and recommended setting |
    | --- | --- |
    | --msi-coverage-threshold INT | Minimum coverage for a microsatellite: 60 (default) |
    | --msi-distance-threshold FLOAT | Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default) |
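The Jensen-Shannon distance named above can be illustrated with a short Python sketch that compares tumor and normal repeat-length read counts at one microsatellite site. This is a textbook formulation (square root of the Jensen-Shannon divergence, log base 2, so the distance lies in [0, 1]). The function names, the input layout, and the rule that both samples must reach the coverage threshold are assumptions, not a description of DRAGEN internals.

```python
import math


def js_distance(p_counts, q_counts):
    """Jensen-Shannon distance between two count vectors over the same repeat lengths."""
    def normalize(counts):
        total = sum(counts)
        return [c / total for c in counts]

    def kl(a, b):
        # Kullback-Leibler divergence in bits; zero-probability terms contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    p, q = normalize(p_counts), normalize(q_counts)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    divergence = (kl(p, m) + kl(q, m)) / 2
    return math.sqrt(max(0.0, divergence))


def site_is_unstable(tumor_counts, normal_counts, min_coverage=60, min_distance=0.1):
    """Return None when coverage is insufficient, otherwise whether the site passes the distance threshold."""
    if sum(tumor_counts) < min_coverage or sum(normal_counts) < min_coverage:
        return None
    return js_distance(tumor_counts, normal_counts) >= min_distance
```

The default arguments mirror the defaults of --msi-coverage-threshold and --msi-distance-threshold listed in the table.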

    SV

    | Option | Description |
    | --- | --- |
    | --sv-call-regions-bed | Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format. |
    | --sv-exclusion-bed | Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format. |
    | --enable-variant-deduplication true | Relevant when both SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent the occurrence of replicated variants within genes such as FLT3 and KMT2A. Filters all small indels in the structural variant VCF that appear and are passing in the small variant VCF. DRAGEN creates a new VCF that contains the variants in the SV VCF that do not match a variant in the SNV VCF. The new deduplicated SV VCF file has the prefix passed by --output-file-prefix, followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases. |
    | --sv-systematic-noise $BEDPE | Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels. |
    | --sv-somatic-ins-tandup-hotspot-regions-bed $BED | Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bp). |
    | --sv-min-candidate-variant-size | Run SV caller and report all SVs/indels at or above this size. The default value is set to 10. |

    | Option | Recommended Value for Liquid Tumors (e.g. AML/MLL) |
    | --- | --- |
    | --sv-min-scored-variant-size $INT | 100000 |

    For more information, see Structural Variant Calling.
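The variant deduplication option relies on normalizing variants by trimming and left shifting before comparing the SV and small variant calls. The Python sketch below shows the standard left-alignment procedure on a single contig, with the 500-base shift limit as a parameter. It illustrates the general technique only: the function names and the (pos, ref, alt) tuple layout are hypothetical, and DRAGEN's actual implementation is not described in this guide.

```python
def left_normalize(ref_seq, pos, ref, alt, max_shift=500):
    """Trim and left-shift one variant; pos is the 1-based position of ref[0] in ref_seq."""
    shifts = 0
    # Trim matching trailing bases, extending to the left when an allele would become empty.
    while ref[-1] == alt[-1]:
        if len(ref) == 1 or len(alt) == 1:
            if pos == 1 or shifts == max_shift:
                break
            pos -= 1
            base = ref_seq[pos - 1]
            ref, alt = base + ref, base + alt
            shifts += 1
        ref, alt = ref[:-1], alt[:-1]
    # Trim matching leading bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def deduplicate(sv_indels, passing_small_variants, ref_seq):
    """Keep only SV-caller indels that do not match a passing small variant after normalization."""
    seen = {left_normalize(ref_seq, *v) for v in passing_small_variants}
    return [v for v in sv_indels if left_normalize(ref_seq, *v) not in seen]
```

In this sketch, two representations of the same deletion in a CA repeat normalize to the same key, so the SV-caller copy is dropped while unrelated indels are kept.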

    Resource Files

    DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.

    SNV Systematic Noise

    Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.

    DRAGEN has prebuilt systematic noise files for WES/WGS. For high sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.

    Prebuilt

    Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.

    Prebuilt WES/WGS noise files
    Description

    WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FF

    FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FFPE (only hg38)

    WES_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WES FF and FFPE

    Custom

    This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.

    For WES and WGS pipelines, gather the full paths to the small variant hard filtered VCFs (not GVCFs) from step 1 and create a list file ${VCF_LIST} with one file path per line.
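    The list file can be assembled by hand, but a short script avoids path typos when there are dozens of normals. The following is a minimal sketch, not part of DRAGEN. It assumes each normal run wrote its hard filtered VCF somewhere under one root directory with the suffix .hard-filtered.vcf.gz; the function name and arguments are hypothetical.

```python
from pathlib import Path


def write_vcf_list(run_root: str, list_path: str) -> list[str]:
    """Write one absolute hard-filtered VCF path per line and return the paths.

    Assumption: output files end in '.hard-filtered.vcf.gz'. GVCFs end in
    '.hard-filtered.gvcf.gz', so they do not match this pattern.
    """
    vcfs = sorted(
        str(p.resolve())
        for p in Path(run_root).rglob("*.hard-filtered.vcf.gz")
    )
    Path(list_path).write_text("".join(f"{v}\n" for v in vcfs))
    return vcfs
```

    The resulting file plays the role of ${VCF_LIST} in step 2. Check the number of returned paths against the number of normals you intended to include.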

    Step 2. Generate the final noise file.

    This step generates a BED file containing mean and max noise estimates per position. The file can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions that can be investigated (e.g. by modifying assay settings or DRAGEN settings) or blocklisted.

    The SNV systematic noise files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.

    SV Systematic Noise

    SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available. SV systematic noise files are considered experimental in WES and Panels.

    Prebuilt

    Prebuilt WGS SV systematic noise files can be downloaded here: Product Files.

    Prebuilt WGS noise files
    Description

    WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

    For WGS, FF/FFPE

    IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

    For WGS, HEME

    Custom

    Custom systematic noise files can be generated for WES or Panels. For best accuracy the normal samples should ideally closely match the sequencer, sample type, library prep and coverage of the tumor samples of interest. It is typically recommended to use 30 - 100 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.

    Step 2. Build the BEDPE file using input VCFs from previous step.

    Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.

    CNV Panel of Normals (PON)

    For CNV PON requirements and generation options see CNV Preprocessing | Panel of Normals.

    If a matched normal is available it is recommended to include it in the PON.

    Step 1. Generate CNV target counts of individual normal samples.

    Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.

    Step 2. CNV combined counts file generation.

    $CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.
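    The normals list described above can be generated with a few lines of scripting. This is a hedged sketch rather than a DRAGEN utility: the function name, the exclude argument, and the directory layout are assumptions, while the two file suffixes come from the text above. It selects either the raw or the GC-corrected counts consistently and skips any samples that should be left out of the PON.

```python
from pathlib import Path


def write_cnv_normals_list(counts_dir, list_path, gc_corrected=False, exclude=()):
    """Write one target counts path per line; return the list of paths.

    gc_corrected selects '<prefix>.target.counts.gc-corrected.gz' instead of
    '<prefix>.target.counts.gz'. exclude holds output prefixes to skip.
    """
    suffix = ".target.counts.gc-corrected.gz" if gc_corrected else ".target.counts.gz"
    paths = []
    for p in sorted(Path(counts_dir).rglob(f"*{suffix}")):
        prefix = p.name[: -len(suffix)]  # the run's output file prefix
        if prefix not in exclude:
            paths.append(str(p.resolve()))
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return paths
```

    Whichever variant is chosen, use the same choice for the case samples, because the counts generation options are expected to match between the PON and the case runs.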

    DNA Somatic Tumor-Only Solid Panel

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.

    Notes and additional options

    Hashtable

    For DRAGEN somatic runs it is recommended to use the linear hashtable.

    See:

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using .

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    Duplicate Marking

    Option
    Description

    Fractional (Raw Reads) Downsampling

    DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).

    Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses it is also recommended to use a normal sample with lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, fractional downsampling may be used to reduce coverage on the normal sample.
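    The fraction needed to bring a normal sample below the tumor coverage is simple arithmetic. The sketch below is illustrative only: the function name and the target_ratio parameter (desired normal coverage as a fraction of tumor coverage) are assumptions, not DRAGEN settings.

```python
def normal_downsample_fraction(tumor_cov, normal_cov, target_ratio=1.0):
    """Return the fraction of normal reads to keep.

    The target normal coverage is tumor_cov * target_ratio. A result of 1.0
    means the normal is already at or below the target and needs no
    downsampling.
    """
    if tumor_cov <= 0 or normal_cov <= 0:
        raise ValueError("coverage values must be positive")
    if target_ratio <= 0:
        raise ValueError("target_ratio must be positive")
    target_cov = tumor_cov * target_ratio
    return min(1.0, target_cov / normal_cov)


# Example: a 100x tumor with a 200x normal, aiming for a 50x normal.
fraction = normal_downsample_fraction(100, 200, target_ratio=0.5)
print(fraction)
```

    Because the downsampler picks reads at random, the achieved coverage is approximate.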

    Option
    Description

    SNV

    Option
    Description

    High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports 3 main settings for improved sensitivity on low VAF variant calls.

    High Sensitivity Option
    Description

    For more detail on the small variant caller in somatic mode please refer to

    HLA

    Option
    Description

    CNV

    Option
    Description

    For more information, see .

    Annotation

    For instructions on how to download the Nirvana annotation database, please refer to

    TMB

    The Tumor-Normal pipeline is more effective than the Tumor-Only pipeline at removing or tagging germline variants. As a result, the Tumor-Only pipeline may report somewhat elevated TMB values. The TMB proxi filter is an optional setting applied on top of the regular database germline filter. It aggressively filters additional germline variants based on allele frequencies.

    Option
    Description

    See the user guide: .

    MSI

    Microsatellite sites file can be downloaded here: .

    For panels it is recommended to post-process the file by intersecting the WES or WGS sites with the manifest. This will avoid using any off-target reads in the MSI analysis. For small panels it may be required to generate custom site files to ensure the panel covers at least 2000 sites. To generate custom MSI site files refer to the MSI Biomarker section in the user guide.
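    The post-processing step above amounts to an interval intersection followed by a count check. The following sketch shows the idea on in-memory intervals. It assumes BED-style half-open coordinates and keeps only sites that lie entirely inside a manifest region, which is one reasonable reading of "intersecting"; the function names are hypothetical and the real site file format is not modeled here.

```python
from collections import defaultdict

MIN_MSI_SITES = 2000  # minimum site count mentioned in the text above


def sites_on_target(sites, targets):
    """Keep (chrom, start, end) sites fully contained in some target region."""
    by_chrom = defaultdict(list)
    for chrom, start, end in targets:
        by_chrom[chrom].append((start, end))
    kept = []
    for chrom, start, end in sites:
        regions = by_chrom.get(chrom, [])
        if any(t_start <= start and end <= t_end for t_start, t_end in regions):
            kept.append((chrom, start, end))
    return kept


def has_enough_sites(kept_sites, minimum=MIN_MSI_SITES):
    """True when the panel covers at least the minimum number of sites."""
    return len(kept_sites) >= minimum
```

    The nested scan is quadratic in the worst case, which is acceptable for a sketch; a sorted sweep or an interval tool is a better fit for whole-exome site lists. If has_enough_sites returns False for a small panel, a custom site file is needed, as noted above.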

    Option
    Description and recommended setting

    SV

    Option
    Description
    Option
    Recommended Value for Liquid Tumors (e.g. AML/MLL)

    For more information, see Structural Variant Calling.

    Resource Files

    DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.

    SNV Systematic Noise

    Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.

    DRAGEN has prebuilt systematic noise files for WES/WGS. For high sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.

    Prebuilt

    Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.

    Prebuilt WES/WGS noise files
    Description

    Custom

    This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.

    For panels we create GVCF files. Gather the full paths to the small variant caller hard filtered GVCFs (not VCFs) from step 1 and create an input file ${GVCF_LIST} by specifying 1 file per line.

    Step 2. Generate the final noise file.

    This step generates a BED file containing mean and max noise estimates per position. The file can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions that can be investigated (e.g. by modifying assay settings or DRAGEN settings) or blocklisted.

    The SNV systematic noise files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.

    SV Systematic Noise

    SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available. SV systematic noise files are considered experimental in WES and Panels.

    Prebuilt

    Prebuilt WGS SV systematic noise files can be downloaded here: Product Files.

    Prebuilt WGS noise files
    Description

    Custom

    Custom systematic noise files can be generated for WES or Panels. For best accuracy the normal samples should ideally closely match the sequencer, sample type, library prep and coverage of the tumor samples of interest. It is typically recommended to use 30 - 100 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.

    Step 2. Build the BEDPE file using input VCFs from previous step.

    Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.

    CNV Panel of Normals (PON)

    For CNV PON requirements and generation options see CNV Preprocessing | Panel of Normals.

    If a matched normal is available it is recommended to include it in the PON.

    Step 1. Generate CNV target counts of individual normal samples.

    Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.

    Step 2. CNV combined counts file generation.

    $CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.

    DNA Somatic Tumor-Normal Solid WES

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.

    Notes and additional options

    Hashtable

    For DRAGEN somatic runs it is recommended to use the linear hashtable.

    See:

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using .

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    Duplicate Marking

    Option
    Description

    Fractional (Raw Reads) Downsampling

    DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).

    Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses it is also recommended to use a normal sample with lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, fractional downsampling may be used to reduce coverage on the normal sample.

    Option
    Description

    SNV

    Option
    Description

    High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports 3 main settings for improved sensitivity on low VAF variant calls.

    High Sensitivity Option
    Description

    For more detail on the small variant caller in somatic mode please refer to

    HLA

    Option
    Description

    CNV

    Option
    Description

    For more information, see .

    Annotation

    For instructions on how to download the Nirvana annotation database, please refer to

    TMB

    Option
    Description

    See the user guide: .

    MSI

    Microsatellite sites file can be downloaded here: .

    Option
    Description and recommended setting

    SV

    Option
    Description
    Option
    Recommended Value for Liquid Tumors (e.g. AML/MLL)

    For more information, see Structural Variant Calling.

    Resource Files

    DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.

    SNV Systematic Noise

    Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.

    DRAGEN has prebuilt systematic noise files for WES/WGS. For high sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.

    Prebuilt

    Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.

    Prebuilt WES/WGS noise files
    Description

    Custom

    This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.

    For WES and WGS pipelines, gather the full paths to the small variant hard filtered VCFs (not GVCFs) from step 1 and create a list file ${VCF_LIST} with one file path per line.

    Step 2. Generate the final noise file.

    This step generates a BED file containing mean and max noise estimates per position. The file can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions that can be investigated (e.g. by modifying assay settings or DRAGEN settings) or blocklisted.

    The SNV systematic noise files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.

    SV Systematic Noise

    SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available. SV systematic noise files are considered experimental in WES and Panels.

    Prebuilt

    Prebuilt WGS SV systematic noise files can be downloaded here: Product Files.

    Prebuilt WGS noise files
    Description

    Custom

    Custom systematic noise files can be generated for WES or Panels. For best accuracy the normal samples should ideally closely match the sequencer, sample type, library prep and coverage of the tumor samples of interest. It is typically recommended to use 30 - 100 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.

    Step 2. Build the BEDPE file using input VCFs from previous step.

    Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.

    CNV Panel of Normals (PON)

    For CNV PON requirements and generation options see CNV Preprocessing | Panel of Normals.

    If a matched normal is available it is recommended to include it in the PON.

    Step 1. Generate CNV target counts of individual normal samples.

    Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.

    Step 2. CNV combined counts file generation.

    $CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.

    DNA Somatic Tumor-Only Solid Panel UMI

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.

    Notes and additional options

    Hashtable

    For DRAGEN somatic runs it is recommended to use the linear hashtable.

    See:

    Input options

    DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using .

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    Fractional (Raw Reads) Downsampling

    DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).

    Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses it is also recommended to use a normal sample with lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, fractional downsampling may be used to reduce coverage on the normal sample.

    Option
    Description

    UMI

    Option
    Description

    For more information see: .

    SNV

    Option
    Description

    High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports 3 main settings for improved sensitivity on low VAF variant calls.

    High Sensitivity Option
    Description

    For more detail on the small variant caller in somatic mode please refer to

    HLA

    Option
    Description

    CNV

    Option
    Description

    For more information, see .

    Annotation

    For instructions on how to download the Nirvana annotation database, please refer to

    TMB

    The Tumor-Normal pipeline is more effective than the Tumor-Only pipeline at removing or tagging germline variants. As a result, the Tumor-Only pipeline may report somewhat elevated TMB values. The TMB proxi filter is an optional setting applied on top of the regular database germline filter. It aggressively filters additional germline variants based on allele frequencies.

    Option
    Description

    See the user guide: .

    MSI

    Microsatellite sites file can be downloaded here: .

    For panels it is recommended to post-process the file by intersecting the WES or WGS sites with the manifest. This will avoid using any off-target reads in the MSI analysis. For small panels it may be required to generate custom site files to ensure the panel covers at least 2000 sites. To generate custom MSI site files refer to the MSI Biomarker section in the user guide.

    Option
    Description and recommended setting

    SV

    Option
    Description
    Option
    Recommended Value for Liquid Tumors (e.g. AML/MLL)

    For more information, see Structural Variant Calling.

    Resource Files

    DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.

    SNV Systematic Noise

    Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.

    DRAGEN has prebuilt systematic noise files for WES/WGS. For high sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.

    Prebuilt

    Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.

    Prebuilt WES/WGS noise files
    Description

    Custom

    This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.

    For panels we create GVCF files. Gather the full paths to the small variant caller hard filtered GVCFs (not VCFs) from step 1 and create an input file ${GVCF_LIST} by specifying 1 file per line.

    Step 2. Generate the final noise file.

    This step generates a BED file containing mean and max noise estimates per position. The file can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions that can be investigated (e.g. by modifying assay settings or DRAGEN settings) or blocklisted.

    The SNV systematic noise files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.

    SV Systematic Noise

    SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available. SV systematic noise files are considered experimental in WES and Panels.

    Prebuilt

    Prebuilt WGS SV systematic noise files can be downloaded here: Product Files.

    Prebuilt WGS noise files
    Description

    Custom

    Custom systematic noise files can be generated for WES or Panels. For best accuracy the normal samples should ideally closely match the sequencer, sample type, library prep and coverage of the tumor samples of interest. It is typically recommended to use 30 - 100 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.

    Step 2. Build the BEDPE file using input VCFs from previous step.

    Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.

    CNV Panel of Normals (PON)

    For CNV PON requirements and generation options see CNV Preprocessing | Panel of Normals.

    If a matched normal is available it is recommended to include it in the PON.

    Step 1. Generate CNV target counts of individual normal samples.

    Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.

    Step 2. CNV combined counts file generation.

    $CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.

    DNA Mapping

    Seed Density Option

    The seed-density option controls how many (normally overlapping) primary seeds from each read the mapper looks up in its hash table for exact matches. The maximum density value of 1.0 generates a seed starting at every position in the read, that is, (L-K+1) K-base seeds from an L-base read.

    Seed density must be between 0.0 and 1.0. Internally, an available seed pattern equal or close to the requested density is selected. The sparsest pattern is one seed per 32 positions, or density 0.03125.
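    The seed counts implied by these numbers can be estimated with a short calculation. This is a rough sketch under stated assumptions: the set of seed patterns DRAGEN actually offers is not listed here, so the code only clamps the requested density to the sparsest pattern (1/32) and scales the maximum seed count. The function names are hypothetical.

```python
SPARSEST_DENSITY = 1 / 32  # one seed per 32 positions (0.03125)


def max_seed_count(read_len, seed_len):
    """Seeds at density 1.0: one K-base seed per start position, L-K+1."""
    return max(0, read_len - seed_len + 1)


def approx_seed_lookups(read_len, seed_len, density):
    """Estimate seeds looked up per read for a requested density."""
    if not 0.0 <= density <= 1.0:
        raise ValueError("seed density must be between 0.0 and 1.0")
    effective = max(density, SPARSEST_DENSITY)  # assumed clamping behavior
    return round(max_seed_count(read_len, seed_len) * effective)


# Example: a 150-base read with 21-base seeds has 130 possible seeds,
# about half of which are looked up at the default 50% density.
print(max_seed_count(150, 21), approx_seed_lookups(150, 21, 0.5))
```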

    • Accuracy Considerations: Generally, denser seed lookup patterns improve mapping accuracy. However, for modestly long reads (e.g., 50 bp+) and low sequencer error rates, there is little to be gained beyond the default 50% seed lookup density.

    • Speed Considerations: Denser seed lookup patterns generally slow down mapping, and sparser seed patterns speed it up. However, when the seed mapping stage can run faster than the aligning stage, a sparser seed pattern does not make the mapper much faster.

    Relationship to Reference Seed Interval

    Functionally, a denser or sparser seed lookup pattern has an impact very similar to a shorter or longer reference seed interval (build hash table option --ht-ref-seed-interval). Populating 100% of reference seed positions and looking up 50% of read seed positions has the same effect as populating 50% of reference seed positions and looking up 100% of read seed positions. Either way, the expected density of seed hits is 50%.

    More generally, the expected density of seed hits is the product of the reference seed density (the inverse of the reference seed interval) and the seed lookup density. For example, if 50% of reference seeds are populated and 33.3% (1/3) of read seed positions are looked up, then the expected seed hit density should be 16.7% (1/6).
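    The product rule above can be checked with exact fractions. The sketch below is only a restatement of that arithmetic; the function name is hypothetical.

```python
from fractions import Fraction


def expected_seed_hit_density(ref_seed_interval, lookup_density):
    """Reference seed density (1 / interval) times the seed lookup density."""
    if ref_seed_interval < 1:
        raise ValueError("reference seed interval must be at least 1")
    return Fraction(1, ref_seed_interval) * Fraction(lookup_density)


# 100% of reference seeds, 50% of read seeds looked up
print(expected_seed_hit_density(1, Fraction(1, 2)))
# 50% of reference seeds, 100% of read seeds looked up
print(expected_seed_hit_density(2, Fraction(1, 1)))
# 50% of reference seeds, 1/3 of read seeds looked up
print(expected_seed_hit_density(2, Fraction(1, 3)))
```

    The first two calls give the same density, which is the equivalence described in the preceding paragraph.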

    DRAGEN automatically adjusts its precise seed lookup pattern to ensure it does not systematically miss the seed positions populated from the reference. For example, the mapper does not look up seeds matching only odd positions in the reference when only even positions are populated in the hash table, even if the reference seed interval is 2 and seed-density is 0.5.

    Map Orientations Option

    The --Mapper.map-orientations option is used in mapping reads for bisulfite methylation analysis. It is set automatically based on the value set for --methylation-protocol.

    The --Mapper.map-orientations option can restrict the orientation of read mapping to only forward in the reference genome, or only reverse-complemented. The valid values for --Mapper.map-orientations are as follows.

    • 0: Either orientation (default)

    • 1: Only forward mapping

    • 2: Only reverse-complemented mapping

    If mapping orientations are restricted and paired end reads are used, the expected pair orientation can only be FR, not FF or RF.

    Seed-Editing Options

    Although DRAGEN primarily maps reads by finding exact reference matches to short seeds, it can also map seeds differing from the reference by one nucleotide by also looking up single-SNP edited seeds. Seed editing is usually not necessary with longer reads (100 bp+), because longer reads have a high probability of containing at least one exact seed match. This is especially true when paired ends are used, because a seed match from either mate can successfully align the pair. But seed editing can, for example, be useful to increase mapping accuracy for short single-ended reads, with some cost in increased mapping time. The following options control seed editing:

    Seed Editing Options

    Command-Line Option Name
    Configuration File Option Name

    edit-mode and edit-chain-limit

    The edit-mode and edit-chain-limit options control when seed editing is used. The following four edit-mode values are available:

    Mode
    Description

    Edit mode 0 requires all seeds to match exactly. Mode 3 is the most expensive because every seed that fails to match the reference exactly is edited. Modes 1 and 2 employ heuristics to look up edited seeds only for reads most likely to be salvaged to accurate mapping.

    The main heuristic in edit modes 1 and 2 is a seed chain length test. Exact seeds are mapped to the reference in a first pass over a given read, and the matching seeds are grouped into chains of similarly aligning seeds. If the longest seed chain (in the read) exceeds a threshold edit-chain-limit, the read is judged not to require seed editing, because there is already a promising mapping position.

    Edit mode 1 triggers seed editing for a given read using the seed chain length test. If no seed chain exceeds edit-chain-limit (including if no exact seeds match), then a second seed mapping pass is attempted using edited seeds. Edit mode 2 further optimizes the heuristic for paired-end reads. If either mate has an exact seed chain longer than edit-chain-limit, then seed editing is disabled for the pair, because a rescue scan is likely to recover the mate alignment based on seed matches from one read. Edit mode 2 is the same as mode 1 for single-ended reads.
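    The decision logic described for the four edit modes can be summarized in a small function. This is a sketch of the prose above, not DRAGEN's implementation. The function name is hypothetical, chain lengths are assumed to be in the same units as edit-chain-limit, and a read with no exact seed matches is represented by a longest chain of 0.

```python
def seed_editing_triggered(edit_mode, edit_chain_limit, longest_chain,
                           mate_longest_chain=None):
    """Return True if edited seeds are looked up for this read.

    mate_longest_chain is None for single-ended reads.
    """
    if edit_mode == 0:
        return False  # all seeds must match exactly
    if edit_mode == 3:
        return True   # any seed that fails to match exactly is edited
    if edit_mode not in (1, 2):
        raise ValueError("edit-mode must be 0, 1, 2, or 3")

    read_has_chain = longest_chain > edit_chain_limit
    if edit_mode == 2 and mate_longest_chain is not None:
        # Paired-end heuristic: a good chain in either mate disables editing
        # for the pair, since a rescue scan can likely recover the other mate.
        mate_has_chain = mate_longest_chain > edit_chain_limit
        return not (read_has_chain or mate_has_chain)
    # Mode 1, or mode 2 on single-ended reads: test this read only.
    return not read_has_chain
```

    With a hypothetical limit of 29, a read whose longest chain is 10 triggers editing in mode 1, but not in mode 2 if its mate has a chain of 40.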

    edit-seed-num and edit-read-len

    For edit modes 1 and 2, when the heuristic triggers seed editing, these options control how many seed positions are edited in the second pass over the read. Although exact seed mapping can use a densely overlapping seed pattern, such as seeds starting at 50% or 100% of read positions, most of the value of seed editing can be obtained by editing a much sparser pattern of seeds, even a nonoverlapping pattern. Generally, if a user application can afford to spend some additional amount of mapping time on seed editing, a greater increase in mapping accuracy can be obtained for the same time cost by editing seeds in sparse patterns for a large number of reads, than by editing seeds in dense patterns for a small number of reads.

    Whenever seed editing is triggered, these two options request edit-seed-num seed editing positions, distributed evenly over the first edit-read-len bases of the read. For example, with 21-base seeds, edit-seed-num=6 and edit-read-len=100, edited seeds can begin at offsets {0, 16, 32, 48, 64, 80} from the 5' end, consecutive seeds overlapping by 5 bases. Because sequencing technologies often yield better base qualities nearer the (5') beginning of each read, this can focus seed editing where it is most likely to succeed. When a particular read is shorter than edit-read-len, fewer seeds are edited.
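    The worked example above can be reproduced with a few lines of code. The exact placement rule DRAGEN uses is not stated, so this sketch assumes a spacing of edit-read-len divided by edit-seed-num (integer division), which matches the offsets in the example, and it drops any seed that would run past the end of a short read. The function name is hypothetical.

```python
def edit_seed_offsets(seed_len, edit_seed_num, edit_read_len, read_len):
    """Return 5'-end offsets of edited seeds under the assumed spacing rule."""
    if edit_seed_num <= 0:
        return []
    spacing = edit_read_len // edit_seed_num  # assumed even distribution
    offsets = [i * spacing for i in range(edit_seed_num)]
    # Shorter reads get fewer edited seeds: keep only seeds that fit.
    return [o for o in offsets if o + seed_len <= read_len]


# 21-base seeds, edit-seed-num=6, edit-read-len=100, on a 150-base read
print(edit_seed_offsets(21, 6, 100, 150))
# The same settings on a 60-base read yield fewer edited seeds
print(edit_seed_offsets(21, 6, 100, 60))
```

    With a spacing of 16 and 21-base seeds, consecutive seeds overlap by 21 - 16 = 5 bases, as in the example.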

    Seed editing is more expensive when the reference seed interval (build hash table option --ht-ref-seed-interval) is greater than 1. For edit modes 1 and 2, additional seed editing positions are automatically generated to avoid missing the populated reference seed positions. For edit mode 3, the time cost can increase dramatically because query seeds matching unpopulated reference positions typically miss and trigger editing.

    DNA Aligning

    Smith-Waterman Alignment Scoring Settings

    The first stage of mapping is to generate seeds from the read and look for exact matches in the reference genome. These results are then refined by running full Smith-Waterman alignments on the locations with the highest density of seed matches. This well-documented algorithm works by comparing each position of the read against all the candidate positions of the reference. These comparisons correspond to a matrix of potential alignments between read and reference. For each of these candidate alignment positions, Smith-Waterman generates scores that are used to evaluate whether the best alignment passing through that matrix cell reaches it by a nucleotide match or mismatch (diagonal movement), a deletion (horizontal movement), or an insertion (vertical movement). A match between read and reference adds a bonus to the score, and a mismatch or indel imposes a penalty. The highest scoring path through the matrix is the chosen alignment.

    The specific values chosen for scores in this algorithm indicate how to balance, for an alignment with multiple possible interpretations, the possibility of an indel as opposed to one or more SNPs, or the preference for an alignment without clipping. The default DRAGEN scoring values are reasonable for aligning moderate length reads to a whole human reference genome for variant calling applications. But any set of Smith-Waterman scoring parameters represents an imprecise model of genomic mutation and sequencing errors, and differently tuned alignment scoring values can be more appropriate for some applications.
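
    The matrix scoring described above can be illustrated with a minimal sketch. It is not DRAGEN's implementation: the function name, the scoring values, and the single linear gap penalty are assumptions chosen for readability. Rows follow the read and columns follow the reference, so the three moves match the description above.

```python
def smith_waterman_score(read, ref, match=1, mismatch=-4, gap=-6):
    """Return the best local alignment score of read against ref.

    Illustrative scoring values only. A horizontal move consumes a
    reference base (deletion); a vertical move consumes a read base
    (insertion); a diagonal move is a match or mismatch.
    """
    prev = [0] * (len(ref) + 1)
    best = 0
    for i in range(1, len(read) + 1):
        cur = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            step = match if read[i - 1] == ref[j - 1] else mismatch
            diagonal = prev[j - 1] + step
            deletion = cur[j - 1] + gap     # horizontal move
            insertion = prev[j] + gap       # vertical move
            # Flooring at 0 lets an alignment start anywhere (clipping).
            cur[j] = max(0, diagonal, deletion, insertion)
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "TTACGTTT"))
    print(smith_waterman_score("AAAA", "CCCC"))
```

    Raising the gap penalty relative to the mismatch penalty makes the scorer prefer several SNPs over one indel, which is the balance the paragraph above refers to.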

    The following alignment options control Smith-Waterman Alignment:

    • global The global option (value can be 0 or 1) controls whether alignment is forced to be end-to-end in the read. When set to 1, alignments are always end-to-end, as in the Needleman-Wunsch global alignment algorithm (although not end-to-end in the reference), and alignment scores can be positive or negative. When set to 0, alignments can be clipped at either or both ends of the read, as in the Smith-Waterman local alignment algorithm, and alignment scores are nonnegative. Generally, global=0 is preferred for longer reads, so significant read segments after a break of some kind (large indel, structural variant, chimeric read, and so forth) can be clipped without severely decreasing the alignment score. Setting global=1 might not have the desired effect with longer reads because insertions at or near the ends of a read can function as pseudoclipping. Also, with global=0, multiple (chimeric) alignments can be reported when various portions of a read match widely separated reference positions. Using global=1 is sometimes preferable with short reads, which are unlikely to overlap structural breaks, unable to support chimeric alignments, and are suspected of incorrect mapping if they cannot align well end-to-end. Consider using the unclip-score option, or increasing it, instead of setting global=1, to make a soft preference for unclipped alignments.
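
    To illustrate why end-to-end scores can be negative, the following sketch scores a read that must align in full but may start and end anywhere in the reference. The function name, scoring values, and linear gap penalty are assumptions for illustration, not DRAGEN's scoring.

```python
def end_to_end_score(read, ref, match=1, mismatch=-4, gap=-6):
    """Best score with the whole read aligned and free ends in ref.

    Illustrative values only. Unlike a local alignment, cells are not
    floored at 0, so a poorly matching read gets a negative score.
    """
    # Row 0 is all zeros: skipping leading reference bases is free.
    prev = [0] * (len(ref) + 1)
    for i in range(1, len(read) + 1):
        # Column 0: read bases placed before the reference are insertions.
        cur = [prev[0] + gap] + [0] * len(ref)
        for j in range(1, len(ref) + 1):
            step = match if read[i - 1] == ref[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + step, cur[j - 1] + gap, prev[j] + gap)
        prev = cur
    # The read is fully consumed; the end position in ref is free.
    return max(prev)


if __name__ == "__main__":
    print(end_to_end_score("ACGT", "TTACGTTT"))
    print(end_to_end_score("AAAA", "CCCC"))
```

    A read with no good placement scores below zero here, whereas a local scorer would clip it down to a score of 0.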

    Paired-End Options

    DRAGEN can process paired-end data passed via a pair of FASTQ files or in a single interleaved FASTQ file. The hardware maps the two ends separately, and then determines a set of alignments that seem most likely to form a pair in the expected orientation and having roughly the expected insert size. The alignments for the two ends are evaluated for the quality of their pairing, with larger penalties for insert sizes far from the expected size. The following options control processing of paired-end data:

    • Reorientation The pe-orientation option specifies the expected paired-end orientation. Only pairs with this orientation can be flagged as proper pairs. Valid values are as follows:

      • 0--FR (default)

      • 1--RF

    The pe-max-penalty option limits how much the estimated MAPQ for one read can increase because its mate aligned nearby. A paired alignment is never assigned MAPQ higher than the MAPQ that it would have received mapping single-ended, plus this value. By default, pe-max-penalty = mapq-max = 255, effectively disabling this limit. The key difference between unpaired-pen and pe-max-penalty is that unpaired-pen affects calculated pair scores and thus which alignments are selected and pe-max-penalty affects only reported MAPQ for paired alignments.
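
    The cap described above amounts to a single comparison. This is a sketch of the stated rule only; the function and argument names are hypothetical, and it does not model how DRAGEN estimates either MAPQ value.

```python
def cap_paired_mapq(paired_mapq, single_ended_mapq, pe_max_penalty):
    """Limit the MAPQ boost a read receives from its mate aligning nearby.

    The paired MAPQ may exceed the single-ended MAPQ by at most
    pe_max_penalty.
    """
    return min(paired_mapq, single_ended_mapq + pe_max_penalty)


if __name__ == "__main__":
    # With a small limit the pairing boost is truncated.
    print(cap_paired_mapq(paired_mapq=60, single_ended_mapq=10, pe_max_penalty=20))
    # With a limit of 255 the cap never binds for this pair.
    print(cap_paired_mapq(paired_mapq=60, single_ended_mapq=10, pe_max_penalty=255))
```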

    Mean Insert Size Detection

    When working with paired-end data, DRAGEN must choose among the highest-quality alignments for the two ends to try to choose likely pairs. To make this choice, DRAGEN uses a skew normal insert model to evaluate the likelihood that a pair of alignments constitutes a pair. This model is based on the observation that common library preparation methods have insert-size distributions that are sometimes close to normal, but also sometimes clearly asymmetric, often skewing toward longer insert sizes. The skew normal insert model is used only for the DNA mode.

    If you know the statistics of your library prep for an input file (and the file consists of a single read group), you can specify the characteristics of the insert-length distribution: mean, standard deviation, shape (or skewness) and three quartiles. These characteristics can be specified with the Aligner.pe-stat-mean-insert, Aligner.pe-stat-stddev-insert, Aligner.pe-stat-shape-insert, Aligner.pe-stat-quartiles-insert, and Aligner.pe-stat-mean-read-len options. However, it is typically preferable to allow DRAGEN to detect these characteristics automatically.

    DRAGEN automatically samples the insert-length distribution. When the software starts execution, it runs a sample of up to 2,000,000 pairs through the aligner, calculates the distribution, and then uses the resulting statistics for evaluating all pairs in the input set.

    The DRAGEN host software reports the statistics in its stdout log in a report, as follows:

    Note that the Mean, Standard deviation and Quartiles reported above are the sample mean, standard deviation and quartiles calculated from the initial sample of up to 2,000,000 pairs, assuming a normal distribution. The sample mean and standard deviation are used to fit the parameters of a skew-normal distribution. A skew-normal distribution is defined by starting with an underlying normal distribution (whose mean we call position or xi and standard deviation we call scale or omega) and folding a varying portion of the probability mass from one side of the mean (e.g., left side) to the other (e.g., right) side. The portion folded varies smoothly, from 0% at the original mean, approaching 100% from the left tail to the right tail. A shape parameter which we call alpha controls how rapidly the folded fraction increases, and at alpha=0 there is no folding and the distribution remains normal.

    In the standard output, we also include the command line options needed to reproduce the DRAGEN run with the same insert stat settings. Note that when specifying stats on the command line, the skew-normal xi value should be used for Aligner.pe-stat-mean-insert. The omega value should be used for Aligner.pe-stat-stddev-insert, and the alpha value should be used for Aligner.pe-stat-shape-insert. If Aligner.pe-stat-shape-insert is not specified on the command line, a default value of 0 is assumed.
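
    The relationship between the skew-normal parameters (xi, omega, alpha) and the distribution's mean and standard deviation can be sketched with the textbook skew-normal formulas. This assumes DRAGEN's model matches the standard definition; the function names are hypothetical.

```python
import math


def skew_normal_pdf(x, xi, omega, alpha):
    """Density of the standard skew-normal at x (position xi, scale omega)."""
    z = (x - xi) / omega
    normal_density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    # The skewing factor is the normal CDF evaluated at alpha * z.
    skew_factor = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))
    return 2.0 / omega * normal_density * skew_factor


def skew_normal_mean_std(xi, omega, alpha):
    """Mean and standard deviation implied by (xi, omega, alpha)."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    mean = xi + omega * delta * math.sqrt(2.0 / math.pi)
    std = omega * math.sqrt(1.0 - 2.0 * delta * delta / math.pi)
    return mean, std


if __name__ == "__main__":
    # alpha = 0 leaves the underlying normal distribution unchanged.
    print(skew_normal_mean_std(300.0, 50.0, 0.0))
    # A positive alpha shifts mass toward longer inserts.
    print(skew_normal_mean_std(300.0, 50.0, 3.0))
```

    With a nonzero alpha, the mean differs from xi and the standard deviation differs from omega, which is why the xi and omega values, not the sample statistics, are the ones to pass on the command line.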

    The insert length distribution for each sample is written to fragment_length_hist.csv. Each sample starts with the following lines:

    These lines are followed by the histogram for the first ~2M read pairs for DNA (~100K read pairs for RNA). The histogram counts are aggregated across all read groups sharing the same sample id (RGSM field).

    When the number of sample pairs is very small, there is not enough information to characterize the distribution with high confidence. In this case, DRAGEN applies default statistics that specify a very wide insert distribution, which tends to admit pairs of alignments as proper pairs, even if they may lie tens of thousands of bases apart. In this situation, DRAGEN outputs a message, as follows:

    The small samples formula calculates standard deviation as follows:

    The default model is "standard deviation = 10000". If the first 2M reads are unmapped or if all pairs are improper pairs, then the standard deviation is set to 10000 and the mean and quartiles are set to 0. Note that the minimum value for standard deviation is 12, which is independent of the number of samples. Also, in DNA mode, when there are fewer than 1000 high-quality alignments, DRAGEN reverts to the normal-distribution-based insert model, because there are too few samples to accurately estimate the parameters of the skew normal distribution.

    For RNA-Seq data, the insert size distribution is not normal due to pairs containing introns. The DRAGEN software estimates the distribution using a kernel density estimator to fit a long tail to the samples. This estimate leads to a more accurate mean and standard deviation for RNA-Seq data and proper pairing.

    DRAGEN writes detected paired-end stats into a tab-delimited log file in the output directory called .insert-stats.tab. This file contains the statistical distribution of detected insert sizes for each read group, including quartiles, mean, standard deviation, shape, minimum, and maximum. The information matches the standard-out report above. Additionally, the log file includes the minimum and maximum insert limits that DRAGEN applied for rescue scans. Note that the reported mean and standard deviation in this tab-delimited log file are the xi and omega parameters of the skew-normal distribution.

    Rescue Scans

    For paired-end reads, where a seed hit is found for one mate but not the other, rescue scans hunt for missing mate alignments within a rescue radius of the mean insert length. Normally, the DRAGEN host software sets the rescue radius to 2.5 standard deviations of the empirical insert distribution. But in cases where the insert standard deviation is large compared to the read length, the rescue radius is restricted to limit mapping slowdowns. In this case, a warning message is displayed, as follows:

    Although the user can ignore this warning, or specify an intermediate rescue radius to maintain mapping speed, it is recommended to use 2.5 sigmas for the rescue radius to maintain mapping sensitivity. To disable rescue scanning, set max-rescues to 0.
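
    As a rough illustration of the default radius, the sketch below computes the insert-length window that a 2.5-sigma rescue radius implies. The function name is hypothetical, and the restriction DRAGEN applies when the standard deviation is large relative to the read length is not modeled.

```python
def rescue_window(mean_insert, stddev_insert, sigmas=2.5):
    """Return (min_insert, max_insert) searched around the mean insert length."""
    radius = sigmas * stddev_insert
    # Insert lengths cannot be negative, so clamp the lower bound at 0.
    return max(0.0, mean_insert - radius), mean_insert + radius


if __name__ == "__main__":
    print(rescue_window(mean_insert=400, stddev_insert=80))
```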

    Output Options

    DRAGEN can track multiple independent alignments for each read. These alignments include the optimal (primary) one, as well as those mapping different subsegments of the read (chimeric/supplementary), and suboptimal (secondary) mappings of the read to different areas of the reference.

    For DNA alignment by default, DRAGEN can emit one primary alignment for each read, up to three chimeric alignments (Aligner.supp-aligns=3), and no secondary alignments (Aligner.sec-aligns=0). The maximum user-specified value for supp-aligns or sec-aligns is 4095.

    You can use the following configuration options to control how many of each type of alignment to include in DRAGEN output.

    • mapq-max The mapq-max option specifies a ceiling on the estimated MAPQ that can be reported for any alignment, from 0 to 255. If the calculated MAPQ is higher, this value is reported instead. The default is 60.

    • supp-aligns, sec-aligns The supp-aligns and sec-aligns options restrict the maximum number of supplementary (ie, chimeric and SAM FLAG 0x800) alignments and secondary (ie, suboptimal and SAM FLAG 0x100) alignments, respectively, that can be reported for each read. A maximum of 4095 supplementary alignments and 4095 secondary alignments can be reported for any read, in addition to a primary alignment. High settings for these two options impact speed so it is advisable to increase only as needed.
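
    Downstream tools can tell the three alignment types apart from the SAM FLAG bits named above. The following is a minimal sketch; the function name is hypothetical, and the flag values are those defined by the SAM specification.

```python
SECONDARY = 0x100        # suboptimal alignment of the read
SUPPLEMENTARY = 0x800    # chimeric segment of the read


def alignment_type(flag):
    """Classify a SAM record as primary, secondary, or supplementary."""
    if flag & SUPPLEMENTARY:
        return "supplementary"
    if flag & SECONDARY:
        return "secondary"
    return "primary"


if __name__ == "__main__":
    for flag in (99, 0x100 | 0x10, 0x800):
        print(flag, alignment_type(flag))
```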

    Each bit determines whether local alignments of that type are reported with hard clipping (1) or soft clipping (0). The default is 6, meaning primary alignments use soft clipping and supplementary and secondary alignments use hard clipping.
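
    The clipping bit field can be decoded as in the sketch below. The bit order is an assumption: bit 0 for primary, bit 1 for supplementary, and bit 2 for secondary alignments. The text above only confirms that a value of 6 leaves primary alignments soft-clipped and the other two types hard-clipped. The function name is hypothetical.

```python
def clipping_by_type(mask):
    """Map each alignment type to 'hard' or 'soft' clipping for a 3-bit mask."""
    # Assumed bit order: bit 0 primary, bit 1 supplementary, bit 2 secondary.
    names = ("primary", "supplementary", "secondary")
    return {
        name: "hard" if (mask >> bit) & 1 else "soft"
        for bit, name in enumerate(names)
    }


if __name__ == "__main__":
    print(clipping_by_type(6))
```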

    Mapping with ALT-contigs

    The GRCh38 human reference contains many more alternate haplotypes (ALT contigs) than previous versions of the reference. Generally, including ALT contigs in the mapping reference improves mapping and variant calling specificity, because misalignments are eliminated for reads matching an ALT contig but scoring poorly against the primary assembly. However, mapping with GRCh38's ALT contigs without special treatment can substantially degrade variant calling sensitivity in corresponding regions, because many reads align equally well to an ALT contig and to the corresponding position in the primary assembly.

    Masked Based ALT-awareness

    The recommended and default approach for dealing with ALT-contigs in DRAGEN is masking regions of ALT contigs of high similarity to their corresponding primary contig. This approach is more accurate than liftover based ALT-awareness because there are many places where the "correct" or most useful liftover between a long ALT haplotype and the primary assembly is ambiguous. Incorrect liftover can produce dense clusters of mismapped reads and false variant calls. The base masking approach has the benefits of using ALT contigs without the negative consequences.

    Masked hash tables are built from a standard hg19 or hg38 FASTA that contains ALT contigs. The hash table builder will automatically mask regions of the ALT contigs with Ns.

    Liftover Based ALT-awareness

    With liftover based ALT-awareness, the mapper and aligner are aware of the liftover relationship between ALT contig positions and corresponding primary assembly positions. Seed matches within ALT contigs are used to obtain corresponding primary assembly alignments, even if the latter score poorly. Liftover groups are formed, each containing a primary assembly alignment candidate, and zero or more ALT alignment candidates that lift to the same location. Each liftover group is scored according to its best-matching alignments, taking properly paired alignments into account. The winning liftover group provides its primary assembly representative as the primary output alignment, with MAPQ calculated based on the score difference to the second-best liftover group. Emitting primary alignments within the primary assembly maintains normal aligned coverage and facilitates variant calling there. If the --Aligner.en-alt-hap-aln option is set to 1 and --Aligner.supp-aligns is greater than 0, then corresponding alternate haplotype alignments can also be output, flagged as supplementary alignments.
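
    The grouping and selection logic described above can be sketched as follows. This is a simplified illustration, not DRAGEN's implementation: it ignores pairing, returns the raw score margin instead of a MAPQ, assumes at least one candidate, and all names are hypothetical.

```python
from collections import defaultdict


def pick_liftover_group(candidates):
    """Choose the winning liftover group.

    candidates: iterable of (lifted_pos, is_alt, score), where lifted_pos
    is the primary-assembly position the candidate lifts to.
    """
    groups = defaultdict(list)
    for lifted_pos, is_alt, score in candidates:
        groups[lifted_pos].append((is_alt, score))

    # A group is scored by its best member, whether primary or ALT.
    ranked = sorted(
        ((max(score for _, score in members), pos) for pos, members in groups.items()),
        reverse=True,
    )
    best_score, best_pos = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else None
    return {
        "primary_pos": best_pos,  # output is reported in the primary assembly
        "group_score": best_score,
        "margin": None if runner_up is None else best_score - runner_up,
    }


if __name__ == "__main__":
    hits = [(1000, False, 80), (1000, True, 120), (5000, False, 100)]
    print(pick_liftover_group(hits))
```

    In this example the group at position 1000 wins because of its ALT candidate, even though its primary-assembly candidate scores lower than the competing alignment at position 5000.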

    DRAGEN requires ALT-Aware hash tables for any hg19 or GRCh38 reference where ALT contigs are detected. To disable this requirement in DRAGEN, set the --ht-alt-aware-validate option to false.

    The following is a comparison of alternative options for dealing with alternate haplotypes.

    • Mapping without ALT contigs in the reference:

      • False-positive variant calls result when reads matching an alternate haplotype misalign somewhere else.

      • Poor mapping and variant calling sensitivity where reads matching an ALT contig differ greatly from the primary assembly.

    • Mapping with ALT contigs but no ALT awareness:

    DRAGEN Multigenome Mapper

    The Multigenome Mapper in DRAGEN significantly improves the accuracy of mapping Illumina reads, particularly in challenging regions such as segmental duplications and other difficult-to-map regions. This advanced method leverages population haplotypes from pangenome references to incorporate additional variant information, constructing alternative haplotype paths that improve read mapping. By offering these alternate paths, the Multigenome Mapper enables reads containing population-specific variants to align directly to their most likely genomic locations, reducing mapping ambiguity. This improved mapping also results in improved variant calling accuracy.

    When given a set of population variants (VCF) or haplotypes, the pangenome reference modifications fall into the following types:

    • Alternate contigs represent population haplotypes. Alt-contigs can have a single variant or a combination of nearby phased variants.

    • Ambiguous codes (IUPAC codes) represent SNPs. To improve alignment, the reference FASTA is edited with isolated population SNPs.

    • Haplotype database. An additional haplotype database is built and used to augment the reference FASTA with population variants. A multigenome based mapper algorithm is used to score read alignment according to the variants in this database.
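
    To illustrate the ambiguous-code idea from the list above, the sketch below counts mismatches between a read and a reference in which some positions carry IUPAC codes. The IUPAC table is standard, but the function names are hypothetical and the rule that any listed base counts as a match is an assumption about scoring, not DRAGEN's documented behavior. N is deliberately left out of the table, so it never matches.

```python
# Standard IUPAC nucleotide codes for two- and three-base ambiguity.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}


def base_matches(read_base, ref_code):
    """True if read_base is one of the bases allowed by the reference code."""
    return read_base in IUPAC.get(ref_code, "")


def count_mismatches(read, ref):
    """Count positions where the read base is not allowed by the reference."""
    return sum(not base_matches(r, c) for r, c in zip(read, ref))


if __name__ == "__main__":
    # R encodes an A/G SNP, so a read carrying either allele matches.
    print(count_mismatches("ACGT", "ACRT"))
    print(count_mismatches("ACAT", "ACRT"))
    print(count_mismatches("ACTT", "ACRT"))
```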

    The DRAGEN pangenome hashtables are available to download from the .

    DNA Somatic Tumor-Normal Solid WES UMI

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.

    Notes and additional options

    DNA Somatic Tumor-Normal Solid Panel UMI

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.

    Notes and additional options

    DNA Somatic Tumor-Only ctDNA Panel UMI

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.

    Notes and additional options

    Hashtable

    DNA Somatic Tumor-Normal Solid Panel

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.

    Notes and additional options

    DNA Somatic Tumor-Only Solid WES UMI

    A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.

    Notes and additional options

    --tumor-fastq-list $PATH 
    --tumor-fastq-list-sample-id $STRING 
    --tumor-fastq1 $PATH 
    --tumor-fastq2 $PATH 
    --RGSM-tumor $STRING 
    --RGID-tumor $STRING 
    --tumor-bam-input $PATH 
    --tumor-cram-input $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --vc-detect-systematic-noise=true 
    --vc-enable-germline-tagging=true 
    --variant-annotation-data $PATH 
    --intermediate-results-dir $PATH 
    --output-directory $PATH 
    --output-file-prefix $STRING 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --build-sys-noise-vcfs-list ${VCF_LIST} 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --sv-detect-systematic-noise true 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --sv-build-systematic-noise-vcfs-list $VCF_LIST   #one VCF per line
    --tumor-fastq-list $PATH 
    --tumor-fastq-list-sample-id $STRING 
    --tumor-fastq1 $PATH 
    --tumor-fastq2 $PATH 
    --RGSM-tumor $STRING 
    --RGID-tumor $STRING 
    --tumor-bam-input $PATH 
    --tumor-cram-input $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --umi-enable true 
    --umi-source $STRING                    #default='qname'
    --umi-library-type $STRING              #see 'UMI'
    --vc-detect-systematic-noise=true 
    --vc-enable-germline-tagging=true 
    --variant-annotation-data $PATH 
    --intermediate-results-dir $PATH 
    --output-directory $PATH 
    --output-file-prefix $STRING 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --build-sys-noise-vcfs-list ${VCF_LIST} 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --umi-enable true 
    --umi-source $STRING                    #default='qname'
    --umi-library-type $STRING              #see 'UMI'
    --sv-detect-systematic-noise true 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --sv-build-systematic-noise-vcfs-list $VCF_LIST   #one VCF per line
    --tumor-fastq-list $PATH 
    --tumor-fastq-list-sample-id $STRING 
    --fastq-list $PATH 
    --fastq-list-sample-id $STRING 
    --tumor-fastq1 $PATH 
    --tumor-fastq2 $PATH 
    --RGSM-tumor $STRING 
    --RGID-tumor $STRING 
    --fastq-file1 $PATH 
    --fastq-file2 $PATH 
    --RGSM $STRING 
    --RGID $STRING 
    --tumor-bam-input $PATH 
    --bam-input $PATH 
    --tumor-cram-input $PATH 
    --cram-input $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --umi-enable true 
    --umi-source $STRING                    #default='qname'
    --umi-library-type $STRING              #see 'UMI'
    --vc-detect-systematic-noise=true 
    --vc-enable-germline-tagging=true 
    --variant-annotation-data $PATH 
    --intermediate-results-dir $PATH 
    --output-directory $PATH 
    --output-file-prefix $STRING 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --build-sys-noise-vcfs-list ${VCF_LIST} 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --umi-enable true 
    --umi-source $STRING                    #default='qname'
    --umi-library-type $STRING              #see 'UMI'
    --sv-detect-systematic-noise true 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --sv-build-systematic-noise-vcfs-list $VCF_LIST   #one VCF per line
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    --enable-duplicate-marking true         #default=true 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-target-bed $VC_TARGET_BED 
    --vc-systematic-noise $PATH             #Required 
    --vc-excluded-regions-bed $BED          #FFPE: optionally mask ALUs 
    --vc-target-vaf 0.03                    #Default = 0.03 (>= 3% VAF) 
    # SV 
    --enable-sv true 
    --sv-exome true 
    --sv-call-regions-bed $SV_TARGET_BED 
    # CNV 
    --enable-cnv true 
    --cnv-population-b-allele-vcf $POP_VCF 
    --cnv-target-bed $PATH 
    --cnv-combined-counts $PATH             #CNV PON 
    # HRD Scoring 
    --enable-hrd true                       #requires CNV 
    # Annotation 
    --variant-annotation-data $PATH
    --vc-enable-germline-tagging true 
    # TMB 
    --enable-tmb true 
    --tmb-enable-proxi-filter true          #Optional for Tumor-Only 
    # HLA genotyper 
    --enable-hla true 
    # Microsatellite Instability (MSI) 
    --msi-command tumor-only 
    --msi-ref-normal-input $PATH 
    --msi-microsatellites-file $PATH 
    --msi-coverage-threshold 40 
    --tumor-fastq-list $PATH 
    --tumor-fastq-list-sample-id $STRING 
    --tumor-fastq1 $PATH 
    --tumor-fastq2 $PATH 
    --RGSM-tumor $STRING 
    --RGID-tumor $STRING 
    --tumor-bam-input $PATH 
    --tumor-cram-input $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --vc-detect-systematic-noise=true 
    --vc-target-bed $VC_TARGET_BED          #Region assessed in assay 
    --vc-target-bed-padding 500 
    --vc-enable-germline-tagging=true 
    --variant-annotation-data $PATH 
    --intermediate-results-dir $PATH 
    --output-directory $PATH 
    --output-file-prefix $STRING 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --build-sys-noise-vcfs-list ${VCF_LIST} 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --sv-detect-systematic-noise true 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --sv-build-systematic-noise-vcfs-list $VCF_LIST   #one VCF per line
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    # CNV 
    --enable-cnv true 
    --cnv-target-bed $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --enable-cnv true 
    --cnv-generate-combined-counts true 
    --cnv-normals-list $CNV_NORMALS_LIST 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    --enable-duplicate-marking true         #default=true 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-target-bed $VC_TARGET_BED 
    --vc-systematic-noise $PATH             #Required 
    --vc-excluded-regions-bed $BED          #FFPE: optionally mask ALUs 
    --vc-target-vaf 0.03                    #Default = 0.03 (>= 3% VAF) 
    # SV 
    --enable-sv true 
    --sv-exome true 
    --sv-call-regions-bed $SV_TARGET_BED 
    # CNV 
    --enable-cnv true 
    --cnv-population-b-allele-vcf $POP_VCF 
    --cnv-target-bed $PATH 
    --cnv-combined-counts $PATH             #CNV PON 
    # Annotation 
    --variant-annotation-data $PATH
    --vc-enable-germline-tagging true 
    # TMB 
    --enable-tmb true 
    --tmb-enable-proxi-filter true          #Optional for Tumor-Only 
    # HLA genotyper 
    --enable-hla true 
    --hla-as-filter-min-threshold 29.0      #panel specific setting 
    --hla-as-filter-ratio-threshold 0.85    #panel specific setting 
    # Microsatellite Instability (MSI) 
    --msi-command tumor-only 
    --msi-ref-normal-input $PATH 
    --msi-microsatellites-file $PATH 
    --msi-coverage-threshold 40 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
    --fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    --enable-duplicate-marking true         #default=true 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-target-bed $VC_TARGET_BED 
    --vc-systematic-noise $PATH             #Recommended 
    --vc-excluded-regions-bed $BED          #FFPE: optionally mask ALUs 
    --vc-target-vaf 0.03                    #Default = 0.03 (>= 3% VAF) 
    # SV 
    --enable-sv true 
    --sv-exome true 
    --sv-call-regions-bed $SV_TARGET_BED 
    # CNV 
    --enable-cnv true 
    --cnv-use-somatic-vc-baf true 
    --cnv-target-bed $PATH 
    --cnv-combined-counts $PATH             #CNV PON 
    # HRD Scoring 
    --enable-hrd true                       #requires CNV 
    # Annotation 
    --variant-annotation-data $PATH 
    --enable-variant-annotation true 
    # TMB 
    --enable-tmb true 
    # HLA genotyper 
    --enable-hla true 
    # Microsatellite Instability (MSI) 
    --msi-command tumor-normal 
    --msi-microsatellites-file $PATH 
    --msi-coverage-threshold 40 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    # UMI 
    --umi-enable true 
    --umi-source $STRING                    #Default='qname' 
    --umi-library-type $STRING              #e.g. random-duplex 
    --umi-min-supporting-reads 1            #Default=2 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-target-bed $VC_TARGET_BED 
    --vc-systematic-noise $PATH             #Required 
    --vc-excluded-regions-bed $BED          #FFPE: optionally mask ALUs 
    --vc-enable-umi-solid true              #>= 1% VAF 
    # SV 
    --enable-sv true 
    --sv-exome true 
    --sv-call-regions-bed $SV_TARGET_BED 
    # CNV 
    --enable-cnv true 
    --cnv-population-b-allele-vcf $POP_VCF 
    --cnv-target-bed $PATH 
    --cnv-combined-counts $PATH             #CNV PON 
    # Annotation 
    --variant-annotation-data $PATH 
    --vc-enable-germline-tagging true 
    # TMB 
    --enable-tmb true 
    --tmb-enable-proxi-filter true          #Optional for Tumor-Only 
    # HLA genotyper 
    --enable-hla true 
    --hla-as-filter-min-threshold 29.0      #panel specific setting 
    --hla-as-filter-ratio-threshold 0.85    #panel specific setting 
    # Microsatellite Instability (MSI) 
    --msi-command tumor-only 
    --msi-ref-normal-input $PATH 
    --msi-microsatellites-file $PATH 
    --msi-coverage-threshold 40 
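
The templates above list one option per line without shell line continuations, so they cannot be pasted into a shell as-is. A minimal sketch, assuming Python 3.8 or later is available on the host, of assembling such an option list into a single quoted command line. The helper name `build_dragen_command`, the binary name, and all paths are hypothetical placeholders; only option names that appear in the templates above are used.

```python
import shlex


def build_dragen_command(binary, options):
    """Turn an ordered mapping of options into an argv list.

    `options` maps option names (without the leading dashes) to values.
    Booleans are rendered as the lowercase strings used in the templates.
    """
    argv = [binary]
    for name, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        argv.extend([f"--{name}", str(value)])
    return argv


# Hypothetical placeholder values; replace with real locations and IDs.
tumor_only = {
    "ref-dir": "/path/to/ref_dir",
    "output-directory": "/path/to/output",
    "output-file-prefix": "sample1",
    "tumor-fastq-list": "/path/to/fastq_list.csv",
    "tumor-fastq-list-sample-id": "sample1",
    "enable-map-align": True,
    "enable-variant-caller": True,
    "vc-target-vaf": 0.03,
}

argv = build_dragen_command("dragen", tumor_only)
command_line = shlex.join(argv)  # quoted string that can be pasted into a shell
print(command_line)
```

Keeping the options in a mapping makes it easier to diff two runs or to switch a single component on or off without editing a long command by hand.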

    --vc-enable-liquid-tumor-mode true

    Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.

    --vc-override-tumor-pcr-params-with-normal false

    Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.

    --vc-sq-filter-threshold $NUM

    Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 3. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold $INT

    Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 60. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold-in-hotspot $INT

    Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 20. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-excluded-regions-bed $BED

    Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome, and are often exceptionally noisy regions in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU BED files can be downloaded as part of the Bed File Collection.

    --sv-min-scored-variant-size

    After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.
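
The SQ and AQ thresholds above act as score floors. A minimal sketch of how they combine, under the assumption that a call is kept when its score is at or above the threshold (the text only states that raising a threshold improves specificity; the exact comparison is not given). The function name is hypothetical, and the defaults mirror the values quoted above.

```python
def passes_score_filters(sq, aq, in_hotspot,
                         sq_threshold=3.0,
                         aq_threshold=60,
                         aq_hotspot_threshold=20):
    """Return True if a somatic call clears the SQ and AQ thresholds.

    Assumption: scores at or above a threshold pass.
    Pass aq=None when no systematic noise file is supplied, since the
    AQ thresholds are only used together with such a file.
    """
    if sq < sq_threshold:
        return False
    if aq is None:
        return True
    aq_floor = aq_hotspot_threshold if in_hotspot else aq_threshold
    return aq >= aq_floor


# The same call can pass inside a hotspot and fail outside it,
# because the hotspot AQ threshold is lower.
print(passes_score_filters(sq=5.0, aq=30, in_hotspot=True))
print(passes_score_filters(sq=5.0, aq=30, in_hotspot=False))
```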


    --Aligner.unclip-score

    unclip-score

    --Aligner.no-unclip-score

    no-unclip-score

    --Aligner.aln-min-score

    aln-min-score

    --Aligner.min-score-coeff

    min-score-coeff

  • match-score The match-score option specifies the score for a read nucleotide matching a reference nucleotide (A, C, G, or T), or matching a reference 2–3 nucleotide IUPAC-IUB code. Its value is an unsigned integer, from 0 to 15. match-score=0 can only be used when global=1. A higher match score results in longer alignments, and fewer long insertions.
  • match-n-score The match-n-score option specifies the score for an aligned position where the read position and/or the reference position is an N code. This option is a signed integer, from -16 to 15.

  • mismatch-pen The mismatch-pen option is the penalty (negative score) for a read nucleotide mismatching any reference nucleotide or IUPAC-IUB code, except N. This option is an unsigned integer, from 0 to 63. A higher mismatch penalty results in alignments with more insertions, deletions, and clipping to avoid SNPs.

  • gap-open-pen The gap-open-pen option is the penalty (negative score) for opening a gap (i.e., an insertion or deletion). This value is only for a 0-base gap. It is always added to the gap length times gap-ext-pen. This option is an unsigned integer, from 0 to 127. A higher gap open penalty causes fewer insertions and deletions of any length in alignment CIGARs, with clipping or alignment through SNPs used instead.

  • gap-ext-pen The gap-ext-pen option is the penalty (negative score) for extending a gap (i.e., an insertion or deletion) by one base. This option is an unsigned integer, from 0 to 15. A higher gap extension penalty causes fewer long insertions and deletions in alignment CIGARs, with short indels, clipping, or alignment through SNPs used instead.

  • unclip-score The unclip-score option is the score bonus for an alignment reaching the beginning or end of the read. An end-to-end alignment receives twice this bonus. This option is an unsigned integer, from 0 to 127. A higher unclipped bonus causes alignment to reach the beginning and/or end of a read more often, where this can be done without too many SNPs or indels. A nonzero unclip-score is useful when global=0 to make a soft preference for unclipped alignments. Unclipped bonuses have little effect on alignments when global=1, because end-to-end alignments are forced anyway (although 2 × unclip-score does add to every alignment score unless no-unclip-score = 1). Note that, especially with longer reads, setting unclip-score much higher than gap-open-pen can have the undesirable effect of insertions at or near one end of a read being utilized as pseudoclipping, as happens with global=1.

  • no-unclip-score The no-unclip-score option can be 0 or 1. The default is 1. When no-unclip-score is set to 1, any unclipped bonus (unclip-score) contributing to an alignment is removed from the alignment score before further processing, such as comparison with aln-min-score, comparison with other alignment scores, and reporting in AS or XS tags. However, the unclipped bonus still affects the best-scoring alignment found by Smith-Waterman alignment to a given reference segment, biasing toward unclipped alignments. When unclip-score > 0 causes a Smith-Waterman local alignment to extend out to one or both ends of the read, the alignment score stays the same or increases if no-unclip-score=0, whereas it stays the same or decreases if no-unclip-score=1. The default, no-unclip-score=1, is recommended when global=1, because every alignment is end-to-end, and there is no need to add the same bonus to every alignment. When changing no-unclip-score, consider whether aln-min-score should be adjusted. When no-unclip-score=0, unclipped bonuses are included in alignment scores compared to the aln-min-score floor, so the subset of alignments filtered out by aln-min-score can change significantly with no-unclip-score.

  • aln-min-score The aln-min-score option specifies a minimum acceptable alignment score. Any alignment results scoring lower are discarded. Increasing or decreasing aln-min-score can reduce or increase the percentage of reads mapped. This option is a signed integer (negative alignment scores are possible with global=0). aln-min-score also affects MAPQ estimates. The primary contributor to MAPQ calculation is the difference between the best and second-best alignment scores. A read's best alignment score is saved in the AS SAM tag, and the second-best score (if available) is saved in the XS tag. aln-min-score serves as the suboptimal alignment score if nothing higher was found except the best score. Therefore, increasing aln-min-score can decrease reported MAPQ for some low-scoring alignments. You can use the min-score-coeff option to adjust aln-min-score as a function of read length.

  • min-score-coeff The min-score-coeff option makes adjustments to aln-min-score per read base. When using the min-score-coeff and aln-min-score options together, you can define the minimum alignment score for each read as an affine function of read length. The minimum score for an N-base read is calculated as follows: (min-score-coeff) * N + (aln-min-score). The min-score-coeff option is a signed number ranging from -64 to 63.999. If the value is 0, then the minimum alignment score is fixed at aln-min-score for all read lengths. You can use positive values for min-score-coeff to allow shorter reads to match with lower alignment scores, but require longer reads to achieve higher scores.

  • 2--FF
  • unpaired-pen For paired end reads, best mapping positions are determined jointly for each pair, according to the largest pair score found, considering the various combinations of alignments for each mate. A pair score is the sum of the two alignment scores minus a pairing penalty, which estimates the unlikelihood of insert lengths further from the mean insert than this aligned pair. The unpaired-pen option specifies how much alignment pair scores should be penalized when the two alignments are not in properly paired position or orientation. This option also serves as the maximum pairing penalty for properly paired alignments with extreme insert lengths. The unpaired-pen option is specified in Phred scale, according to its potential impact on MAPQ. Internally, it is scaled into alignment score space based on Smith-Waterman scoring parameters.

  • pe-max-penalty

  • sec-phred-delta The sec-phred-delta option controls which secondary alignments are emitted based on the alignment score relative to the primary reported alignment. Only secondary alignments with likelihood within this Phred value of the primary are reported.

  • sec-aligns-hard The sec-aligns-hard option suppresses the output of all secondary alignments if there are more secondary alignments than can be emitted. Set sec-aligns-hard to 1 to force the read to be unmapped when not all secondary alignments can be output.

  • supp-as-sec When the supp-as-sec option is set to 1, then supplementary (chimeric) alignments are reported with SAM FLAG 0x100 instead of 0x800. The default is 0. The supp-as-sec option provides compatibility with tools that do not support FLAG 0x800.

  • hard-clips The hard-clips option is used as a field of 3 bits, with values ranging from 0 to 7. The bits specify alignments, as follows:

    • Bit 0--primary alignments

    • Bit 1--supplementary alignments

    • Bit 2--secondary alignments
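
Several of the aligner options above are defined by small formulas: the gap penalty (gap-open-pen plus gap length times gap-ext-pen), the per-read minimum score ((min-score-coeff) * N + (aln-min-score)), and the 3-bit hard-clips field. The sketch below restates them in Python. The function names and the example values are illustrative only and are not DRAGEN defaults.

```python
def gap_penalty(gap_length, gap_open_pen, gap_ext_pen):
    """Total penalty for one gap: gap-open-pen + gap length * gap-ext-pen."""
    return gap_open_pen + gap_length * gap_ext_pen


def min_alignment_score(read_length, aln_min_score, min_score_coeff=0.0):
    """Minimum acceptable score for an N-base read (affine in N)."""
    return min_score_coeff * read_length + aln_min_score


def decode_hard_clips(value):
    """Report which alignment types each bit of hard-clips selects."""
    if not 0 <= value <= 7:
        raise ValueError("hard-clips is a 3-bit field with values 0 to 7")
    return {
        "primary": bool(value & 0b001),        # bit 0
        "supplementary": bool(value & 0b010),  # bit 1
        "secondary": bool(value & 0b100),      # bit 2
    }


# Illustrative values only.
print(gap_penalty(gap_length=3, gap_open_pen=6, gap_ext_pen=1))
print(min_alignment_score(read_length=150, aln_min_score=22, min_score_coeff=0.5))
print(decode_hard_clips(6))
```

With min-score-coeff left at 0, `min_alignment_score` returns aln-min-score regardless of read length, which matches the fixed-threshold behavior described above.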

  • False-positive variant calls from misaligned reads matching ALT contigs are eliminated.

  • Low or zero aligned coverage in primary assembly regions covered by alternate haplotypes, due to some reads mapping to ALT contigs.

  • Low or zero MAPQ in regions covered by alternate haplotypes, where they are similar or identical to the primary assembly.

  • Variant calling sensitivity is dramatically reduced throughout regions covered by alternate haplotypes.

  • Mapping with ALT contigs and ALT awareness:

    • False-positive variant calls from misaligned reads matching ALT contigs are eliminated.

    • Normal aligned coverage in regions covered by alternate haplotypes because primary alignments are to the primary assembly.

    • Normal MAPQs are assigned because alignment candidates in alternative haplotypes are not considered in competition.

    • Good mapping and variant calling sensitivity where reads matching an ALT contig differ greatly from the primary assembly.

  • --Mapper.seed-density

    seed-density

    --Mapper.edit-mode

    edit-mode

    --Mapper.edit-seed-num

    edit-seed-num

    --Mapper.edit-read-len

    edit-read-len

    --Mapper.edit-chain-limit

    edit-chain-limit

    0: No editing (default)

    1: Chain length test

    2: Paired chain length test

    3: Full seed editing

    --Aligner.global

    global

    --Aligner.match-score

    match-score

    --Aligner.match-n-score

    match-n-score

    --Aligner.mismatch-pen

    mismatch-pen

    --Aligner.gap-open-pen

    gap-open-pen

    --Aligner.gap-ext-pen

    gap-ext-pen

    Initial paired-end statistics detected for read group RGID, based on 39042 high quality pairs for FR orientation
            Quartiles (25 50 75) = 398 409 420
            Mean = 410.192
            Standard deviation = 14.1254
            NOTE: DRAGEN's insert estimates include corrections for clipping (so they are not identical to TLEN)
    
            Skew-normal insert distribution applied:
              Position (xi) = 424.084
              Scale (omega) = 19.8719
              Shape (alpha) = -1.88125
    
            To rerun with identical insert stats, specify:
              --Aligner.pe-stat-mean-insert=424.084
              --Aligner.pe-stat-stddev-insert=19.8719
              --Aligner.pe-stat-shape-insert=-1.88125
              --Aligner.pe-stat-quartiles-insert="398 409 420"
              --Aligner.pe-stat-mean-read-len=101
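
The log above reports both summary statistics (mean, standard deviation) and the parameters of a skew-normal distribution (position xi, scale omega, shape alpha). A minimal sketch, using the standard skew-normal moment formulas rather than anything DRAGEN-specific, of how those parameters relate to a mean and standard deviation. The function name is illustrative.

```python
import math


def skew_normal_moments(xi, omega, alpha):
    """Return (mean, standard deviation) of a skew-normal distribution."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    mean = xi + omega * delta * math.sqrt(2.0 / math.pi)
    variance = omega * omega * (1.0 - 2.0 * delta * delta / math.pi)
    return mean, math.sqrt(variance)


# Parameters copied from the log excerpt above.
mean, stddev = skew_normal_moments(xi=424.084, omega=19.8719, alpha=-1.88125)
print(f"mean={mean:.3f} stddev={stddev:.3f}")
```

With the logged parameters, the computed values land close to the reported mean of 410.192 and standard deviation of 14.1254. This also shows why the position parameter (424.084) is larger than the mean: a negative shape value skews the distribution toward shorter inserts.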
     #Sample: sample name
     FragmentLength,Count
    WARNING: Less than 28 high quality pairs found - standard deviation is
    calculated from the small samples formula
     if samples < 3 then
         standard deviation = 10000
     else if samples < 28 then
         standard deviation = 25 * (standard deviation + 1) / (samples - 2)
     end if

     if standard deviation < 12 then
         standard deviation = 12
     end if
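
The small-samples formula above can be restated as a short Python function. This is a direct transcription of the pseudocode, with a hypothetical function name, and is meant only to make the three cases easy to check.

```python
def adjusted_insert_stddev(samples, stddev):
    """Apply the small-samples adjustment and the floor of 12.

    samples: number of high quality pairs found
    stddev:  standard deviation computed from those pairs
    """
    if samples < 3:
        stddev = 10000.0
    elif samples < 28:
        stddev = 25.0 * (stddev + 1.0) / (samples - 2)
    # The floor is applied after the small-samples adjustment.
    return max(stddev, 12.0)


print(adjusted_insert_stddev(2, 5.0))       # fewer than 3 pairs
print(adjusted_insert_stddev(10, 15.0))     # small-samples formula
print(adjusted_insert_stddev(100, 5.0))     # floor of 12 applies
```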
    Rescue radius = 220
         Effective rescue sigmas = 0.5
                WARNING: Default rescue sigmas value of 2.5 was overridden by host software!
                The user may wish to set rescue sigmas value explicitly with --Aligner.rescue-sigmas

    --vc-enable-liquid-tumor-mode true

    Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.

    --vc-override-tumor-pcr-params-with-normal false

    Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.

    --vc-sq-filter-threshold $NUM

    Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 3. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold $INT

    Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 60. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold-in-hotspot $INT

    Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 20. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-excluded-regions-bed $BED

    Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome, and are often exceptionally noisy regions in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU BED files can be downloaded as part of the Bed File Collection.

    --sv-min-scored-variant-size

    After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.

    --enable-map-align true

    Optionally disable map & align (default=true).

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    --enable-duplicate-marking true

    By default, DRAGEN marks duplicate reads and excludes them from variant calling.

    --enable-positional-collapsing true

    Alternative to --enable-duplicate-marking=true. Instead of discarding duplicate reads, DRAGEN can optionally perform positional collapsing, merging them into higher-quality consensus reads. This is beneficial for small panels without UMIs and coverage between 300X and 1000X. However, it's slower than standard duplicate marking and less effective on samples with coverage lower than 300X. For very high coverage (1000X+), avoid it due to potential read collisions. For high-sensitivity panels with 1000X+ coverage, consider using UMIs.
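
The coverage guidance above can be summarized as a small decision helper. This is only a sketch of the stated rules of thumb: the function name and the returned labels are hypothetical, and treating exactly 1000X as "very high coverage" is an assumption, since the text does not say which side of the boundary it falls on.

```python
def suggest_duplicate_handling(mean_coverage, has_umis):
    """Suggest a duplicate-handling approach from the guidance above."""
    if has_umis:
        # Reads carry UMIs, so UMI collapsing is the natural choice.
        return "umi-collapsing"
    if 300 <= mean_coverage < 1000:
        # Range where positional collapsing is described as beneficial.
        return "positional-collapsing"
    # Below 300X it is less effective; at 1000X+ read collisions are a risk.
    return "duplicate-marking"


for coverage in (150, 500, 2000):
    print(coverage, suggest_duplicate_handling(coverage, has_umis=False))
```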

    --enable-fractional-down-sampler

    Set to true to enable fractional downsampling. The default value is false.

    --down-sampler-normal-subsample

    Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).

    --down-sampler-tumor-subsample

    Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).

    --down-sampler-random-seed

    Specify the random seed for different runs of the same input data. The default value is 42.
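
To illustrate what a subsample fraction and a fixed random seed mean in practice, the sketch below keeps each read independently with the given probability. It is a conceptual illustration only, not DRAGEN's downsampling algorithm. The function name is hypothetical, and the default seed of 42 is taken from the option description above.

```python
import random


def fractional_downsample(reads, fraction, seed=42):
    """Keep each read with probability `fraction`, reproducibly."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    rng = random.Random(seed)  # a fixed seed gives the same subset on every run
    return [read for read in reads if rng.random() < fraction]


reads = [f"read{i}" for i in range(1000)]
half = fractional_downsample(reads, 0.5)
print(len(half), "of", len(reads), "reads kept")
```

Running the function twice with the same seed returns the same subset, which is the property a fixed seed provides when rerunning the same input data. A fraction of 1.0 keeps every read.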

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    --vc-emit-ref-confidence GVCF

    To enable gVCF output.

    --vc-enable-vcf-output

    To enable VCF file output during a gVCF run, set to true. The default value is false.

    --vc-systematic-noise $PATH

    Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.). When working with panels it is recommended that a custom systematic noise file be created for each assay.

    --vc-somatic-hotspots $PATH

    DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.

    --vc-target-vaf FLOAT

    The default is 0.03 (3%). Set to e.g. 0.01 to improve SNV sensitivity on 1% VAF variants (assuming sufficient coverage).

    --vc-enable-umi-solid true

    Optimized for 1% and higher VAFs on UMI (or read position collapsed) samples with approx 300-1000X coverage.

    --vc-enable-umi-liquid true

    Optimized for 0.1% and higher VAFs on UMI samples with 1000X or higher coverage as expected in liquid biopsies.

    --enable-hla

    Enable HLA typer (this setting by default will only genotype class 1 genes)

    --hla-as-filter-min-threshold

    Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.

    --hla-as-filter-ratio-threshold

    Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.

    --hla-enable-class-2

    Extend genotyping to HLA class 2 genes (default=true).

    --cnv-enable-gcbias-correction true

    Enable or disable GC bias correction when generating target counts.

    --cnv-segmentation-mode $SEG_MODE

    Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.

    --cnv-segmentation-bed $PATH

    If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.

    --cnv-population-b-allele-vcf $POP_VCF

    Specify a population SNP VCF. This option is available for both the germline and the somatic workflows. In germline it is only supported for WGS. In somatic, it can be used when a matched normal sample is not available and analysis must be performed in tumor-only mode.

    --tmb-vaf-threshold FLOAT

    Variant minimum allele frequency for usable variants (default=0.05)

    --vc-callability-tumor-thresh INT

    Required read coverage to use a site (default=50).

    --tmb-enable-proxi-filter BOOL

    Use variant VAF information to increase germline filtering. Recommended for TO, but not for TN. May be overly aggressive at tagging variants as germline (default=false).

    --msi-coverage-threshold INT

    Minimum coverage for a microsatellite: 60 (default)

    --msi-distance-threshold FLOAT

    Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default)
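
The distance threshold above refers to the Jensen-Shannon distance, which is the square root of the Jensen-Shannon divergence between two distributions. A minimal sketch of the standard definition applied to two repeat-length histograms follows. Using a base-2 logarithm (which bounds the distance between 0 and 1) is an assumption, as are the function name and the example counts; the text does not specify how DRAGEN computes the value internally.

```python
import math


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance between two histograms of equal length."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("histograms must contain at least one observation")
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    m = [(a + b) / 2.0 for a, b in zip(p, q)]

    def kl(a, b):
        # Bins with zero probability in `a` contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))


# Hypothetical read counts per repeat length at one microsatellite site.
normal = [0, 5, 90, 5, 0]
tumor = [10, 30, 40, 15, 5]
distance = jensen_shannon_distance(tumor, normal)
print(f"distance={distance:.3f}", "above 0.1:", distance >= 0.1)
```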

    --sv-call-regions-bed

    Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format.

    --sv-exclusion-bed

    Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.

    --enable-variant-deduplication true

    Relevant when both SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent the occurrence of replicated variants within genes such as FLT3 and KMT2A. Filter all small indels in the structural variant VCF that appear and are passing in the small variant VCF. DRAGEN will create a new VCF that contains variants in SV VCF that are not matching a variant from SNV VCF file. The new deduplicated SV VCF file will have the same prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.
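
Deduplication relies on both callers describing the same indel identically, which is why variants are trimmed and left-shifted before comparison. The sketch below shows the commonly used trim-and-left-align procedure on a toy reference. It is not DRAGEN's implementation. The function name and the 0-based coordinates are choices made for this example, and the 500-base limit comes from the description above.

```python
def normalize_variant(reference, pos, ref, alt, max_shift=500):
    """Trim and left-shift a variant.

    reference: reference sequence as a string
    pos:       0-based index of ref[0] within `reference`
    Returns (pos, ref, alt) in normalized form.
    """
    if not ref or not alt or ref == alt:
        raise ValueError("ref and alt must be non-empty and different")
    if reference[pos:pos + len(ref)] != ref:
        raise ValueError("ref allele does not match the reference")
    shifted = 0
    while ref[-1] == alt[-1]:
        if len(ref) > 1 and len(alt) > 1:
            # Plain right-trim of a shared trailing base.
            ref, alt = ref[:-1], alt[:-1]
        elif pos > 0 and shifted < max_shift:
            # Drop the shared trailing base and extend one base to the left.
            base = reference[pos - 1]
            ref, alt = base + ref[:-1], base + alt[:-1]
            pos -= 1
            shifted += 1
        else:
            break
    # Left-trim shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


toy_reference = "TTCACACAGG"
# A "CA" deletion written at the right end of the CA repeat.
print(normalize_variant(toy_reference, 5, "ACA", "A"))
```

After normalization, the same deletion reported at different positions inside the repeat collapses to one representation, so an exact comparison of position and alleles is enough to detect a duplicate.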

    --sv-systematic-noise $BEDPE

    Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels.

    --sv-somatic-ins-tandup-hotspot-regions-bed $BED

    Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bps)

    --sv-min-candidate-variant-size

    Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.

    --sv-min-scored-variant-size $INT

    100000

    WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FF

    FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FFPE (only hg38)

    WES_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WES FF and FFPE

    WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

    For WGS, FF/FFPE

    IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

    For WGS, HEME


    --vc-sq-filter-threshold $NUM

    Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 17.5. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold $INT

    Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold-in-hotspot $INT

    Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-excluded-regions-bed $BED

    Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome, and are often exceptionally noisy regions in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU BED files can be downloaded as part of the Bed File Collection.

    --sv-min-scored-variant-size

    After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.

    --enable-map-align true

    In the TN pipeline this must be set to false for BAM/CRAM input.

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    --enable-duplicate-marking true

    By default, DRAGEN marks duplicate reads and excludes them from variant calling.

    --enable-fractional-down-sampler

    Set to true to enable fractional downsampling. The default value is false.

    --down-sampler-normal-subsample

    Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).

    --down-sampler-tumor-subsample

    Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).

    --down-sampler-random-seed

    Specify the random seed for different runs of the same input data. The default value is 42.

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    --vc-systematic-noise $PATH

    Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.).

    --vc-somatic-hotspots $PATH

    DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.

    --vc-enable-liquid-tumor-mode true

    Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.

    --vc-override-tumor-pcr-params-with-normal false

    Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.

    --vc-target-vaf FLOAT

    The default is 0.03 (3%). Set to e.g. 0.01 to improve SNV sensitivity on 1% VAF variants (assuming sufficient coverage).

    --vc-enable-umi-solid true

    Optimized for 1% and higher VAFs on UMI (or read position collapsed) samples with approx 300-1000X coverage.

    --vc-enable-umi-liquid true

    Optimized for 0.1% and higher VAFs on UMI samples with 1000X or higher coverage as expected in liquid biopsies.

    --enable-hla

    Enable HLA typer (this setting by default will only genotype class 1 genes)

    --hla-as-filter-min-threshold

    Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.

    --hla-as-filter-ratio-threshold

    Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.

    --hla-enable-class-2

    Extend genotyping to HLA class 2 genes (default=true).

    --cnv-enable-gcbias-correction true

    Enable or disable GC bias correction when generating target counts.

    --cnv-segmentation-mode $SEG_MODE

    Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.

    --cnv-segmentation-bed $PATH

    If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.

    --cnv-normal-cnv-vcf $CNV_NORMAL_VCF

    Specify germline CNVs from the matched normal sample. Germline-aware Mode.

    --tmb-vaf-threshold FLOAT

    Variant minimum allele frequency for usable variants (default=0.05)

    --vc-callability-tumor-thresh INT

    Required read coverage to use a site (default=50).

    --tmb-enable-proxi-filter BOOL

    Use variant VAF information to increase germline filtering. Recommended for TO, but not for TN. May be overly aggressive at tagging variants as germline (default=false).

    --msi-coverage-threshold INT

    Minimum coverage for a microsatellite: 60 (default)

    --msi-distance-threshold FLOAT

    Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default)

    --sv-call-regions-bed

    Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format.

    --sv-exclusion-bed

    Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.

    --enable-variant-deduplication true

    Relevant when both SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent the occurrence of replicated variants within genes such as FLT3 and KMT2A. Filter all small indels in the structural variant VCF that appear and are passing in the small variant VCF. DRAGEN will create a new VCF that contains variants in SV VCF that are not matching a variant from SNV VCF file. The new deduplicated SV VCF file will have the same prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.

    --sv-systematic-noise $BEDPE

    Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels.

    --sv-somatic-ins-tandup-hotspot-regions-bed $BED

    Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bps)

    --sv-min-candidate-variant-size

    Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.

    --sv-enable-liquid-tumor-mode true

    DRAGEN can account for Tumor-in-Normal (TiN) contamination by running liquid tumor mode.

    --sv-tin-contam-tolerance $TIN_CONTAM_TOLERANCE

    Set the Tumor-in-Normal (TiN) contamination tolerance level.

    WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FF

    FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FFPE (only hg38)

    WES_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WES FF and FFPE

    WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

    For WGS, FF/FFPE

    IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

    For WGS, HEME


    --umi-emit-multiplicity both

    Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see .

    --umi-start-mask-length INT

    Number of additional bases to ignore from start of read. The default is 0. To reduce FP optionally set to 1.

    --umi-end-mask-length INT

    Number of additional bases to ignore from end of read. The default is 0. To reduce FP optionally set to 3.

    --vc-enable-liquid-tumor-mode true

    Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.

    --vc-override-tumor-pcr-params-with-normal false

    Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.

    --vc-sq-filter-threshold $NUM

    Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 4. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold $INT

    Threshold for the sensitivity-specificity tradeoff using the AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 60. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold-in-hotspot $INT

    Threshold for the sensitivity-specificity tradeoff using the AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 20. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-excluded-regions-bed $BED

    Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome, and are often exceptionally noisy regions in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU bed files can be downloaded as part of the Bed File Collection:

    --sv-min-scored-variant-size

    After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.

    --enable-map-align true

    Optionally disable map & align (default=true).

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    --enable-fractional-down-sampler

    Set to true to enable fractional downsampling. The default value is false.

    --down-sampler-normal-subsample

    Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).

    --down-sampler-tumor-subsample

    Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).

    --down-sampler-random-seed

    Specify the random seed for different runs of the same input data. The default value is 42.

    --umi-source STRING

    Specify the input type for the UMI sequence. Options: qname, fastq, bamtag.

    --umi-library-type STRING

    Set the batch option for different UMIs correction. Options: random-duplex, random-simplex, nonrandom-duplex.

    --umi-nonrandom-whitelist $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.

    --umi-correction-table $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.

    --umi-min-supporting-reads INT

    Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).

    --umi-metrics-interval-file $BED

    Target region in BED format.

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    --vc-emit-ref-confidence GVCF

    To enable gVCF output.

    --vc-enable-vcf-output

    To enable VCF file output during a gVCF run, set to true. The default value is false.

    --vc-systematic-noise $PATH

    Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.). When working with panels it is recommended that a custom systematic noise file be created for each assay.

    --vc-somatic-hotspots $PATH

    DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.

    --vc-target-vaf FLOAT

    The default is 0.03 (3%). Set to e.g. 0.01 to improve SNV sensitivity on 1% VAF variants (assuming sufficient coverage).

    --vc-enable-umi-solid true

    Optimized for 1% and higher VAFs on UMI (or read position collapsed) samples with approx 300-1000X coverage.

    --vc-enable-umi-liquid true

    Optimized for 0.1% and higher VAFs on UMI samples with 1000X or higher coverage as expected in liquid biopsies.

    --enable-hla

    Enable HLA typer (this setting by default will only genotype class 1 genes)

    --hla-as-filter-min-threshold

    Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.

    --hla-as-filter-ratio-threshold

    Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.

    --hla-enable-class-2

    Extend genotyping to HLA class 2 genes (default=true).

    --cnv-enable-gcbias-correction true

    Enable or disable GC bias correction when generating target counts.

    --cnv-segmentation-mode $SEG_MODE

    Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.

    --cnv-segmentation-bed $PATH

    If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.

    --cnv-population-b-allele-vcf $POP_VCF

    Specify a population SNP VCF. This option is available for both the germline and the somatic workflows. In germline it is only supported for WGS. In somatic, it can be used when a matched normal sample is not available and analysis must be performed in tumor-only mode.

    --tmb-vaf-threshold FLOAT

    Minimum variant allele frequency for usable variants (default=0.05).

    --vc-callability-tumor-thresh INT

    Required read coverage to use a site (default=50).

    --tmb-enable-proxi-filter BOOL

    Use VAF information to increase germline filtering. Recommended for TO, but not for TN. May be overly aggressive at tagging variants as germline (default=false).

    --msi-coverage-threshold INT

    Minimum coverage for a microsatellite: 60 (default)

    --msi-distance-threshold FLOAT

    Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default)

    --sv-call-regions-bed

    Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format.

    --sv-exclusion-bed

    Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.

    --enable-variant-deduplication true

    Relevant when both SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent the occurrence of replicated variants within genes such as FLT3 and KMT2A. Filter all small indels in the structural variant VCF that appear and are passing in the small variant VCF. DRAGEN will create a new VCF that contains variants in SV VCF that are not matching a variant from SNV VCF file. The new deduplicated SV VCF file will have the same prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.

    --sv-systematic-noise $BEDPE

    Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels.

    --sv-somatic-ins-tandup-hotspot-regions-bed $BED

    Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bps)

    --sv-min-candidate-variant-size

    Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.

    --sv-min-scored-variant-size $INT

    100000

    WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FF

    FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FFPE (only hg38)

    WES_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WES FF and FFPE

    WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

    For WGS, FF/FFPE

    IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

    For WGS, HEME

    Hashtable

    For DRAGEN somatic runs it is recommended to use the linear hashtable.

    See: Product Files

    Input options

    DRAGEN input sources include FASTQ list, FASTQ, BAM, or CRAM. For BCL input, first create FASTQs using BCL conversion.

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    --enable-map-align true

    In the TN pipeline this must be set to false for BAM/CRAM input.

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    Fractional (Raw Reads) Downsampling

    DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).

    Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses it is also recommended to use a normal sample with lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler may be used to reduce coverage on the normal sample.

    Option
    Description

    --enable-fractional-down-sampler

    Set to true to enable fractional downsampling. The default value is false.

    --down-sampler-normal-subsample

    Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).

    --down-sampler-tumor-subsample

    Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).

    --down-sampler-random-seed

    Specify the random seed for different runs of the same input data. The default value is 42.
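
    The coverage guidance above can be turned into a fraction with simple arithmetic. The following Python sketch is illustrative only: the function name, the coverage inputs, and the choice to cap the normal at the tumor coverage are assumptions, and only the option names are taken from the table above.

```python
def downsampling_args(tumor_cov, normal_cov, max_normal_to_tumor=1.0, seed=42):
    """Return downsampling options that keep the normal at or below the tumor coverage.

    tumor_cov / normal_cov are mean coverages estimated beforehand (hypothetical inputs).
    max_normal_to_tumor is an illustrative knob, not a DRAGEN setting.
    """
    if tumor_cov <= 0 or normal_cov <= 0:
        raise ValueError("coverage values must be positive")
    if not 0 < max_normal_to_tumor <= 1:
        raise ValueError("max_normal_to_tumor must be in (0, 1]")

    # Fraction of normal reads to keep so that normal coverage <= target.
    target_cov = tumor_cov * max_normal_to_tumor
    fraction = min(1.0, target_cov / normal_cov)
    if fraction >= 1.0:
        return []  # normal is already shallow enough; leave downsampling off

    return [
        "--enable-fractional-down-sampler", "true",
        "--down-sampler-normal-subsample", f"{fraction:.3f}",
        "--down-sampler-random-seed", str(seed),
    ]


if __name__ == "__main__":
    # Example: 80x tumor with a 120x matched normal.
    print(" ".join(downsampling_args(tumor_cov=80, normal_cov=120)))
```

    The returned list is meant to be appended to an existing DRAGEN command line; when the normal is already shallower than the tumor, the sketch returns no options.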

    UMI

    Option
    Description

    --umi-source STRING

    Specify the input type for the UMI sequence. Options: qname, fastq, bamtag.

    --umi-library-type STRING

    Set the batch option for different UMIs correction. Options: random-duplex, random-simplex, nonrandom-duplex.

    --umi-nonrandom-whitelist $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.

    --umi-correction-table $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.

    --umi-min-supporting-reads INT

    Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).

    --umi-metrics-interval-file $BED

    Target region in BED format.

    For more information see: UMI Options.
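
    To illustrate how --umi-min-supporting-reads affects which families yield a consensus read, here is a minimal Python sketch. It assumes reads have already been reduced to (chromosome, position, UMI) keys and ignores UMI error correction and duplex collapsing, so it is not a description of DRAGEN's actual grouping logic.

```python
from collections import Counter


def surviving_families(reads, min_supporting_reads=2):
    """Return {family_key: read_count} for families large enough to form a consensus.

    reads: iterable of (chrom, pos, umi) tuples (a simplified, hypothetical family key).
    """
    sizes = Counter(reads)
    return {family: n for family, n in sizes.items() if n >= min_supporting_reads}


if __name__ == "__main__":
    reads = [("chr1", 100, "ACG")] * 3 + [("chr1", 100, "TTT")]
    print(len(surviving_families(reads, min_supporting_reads=2)))
    print(len(surviving_families(reads, min_supporting_reads=1)))
```

    With a threshold of 2 the singleton family is discarded; with a threshold of 1 every family is kept, which is why the lower setting retains more reads on shallower samples.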

    SNV

    Option
    Description

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    --vc-systematic-noise $PATH

    Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.).

    --vc-somatic-hotspots $PATH

    DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.

    --vc-enable-liquid-tumor-mode true

    Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.

    --vc-override-tumor-pcr-params-with-normal false

    Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.

    High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports 3 main settings for improved sensitivity on low VAF variant calls.

    High Sensitivity Option
    Description

    --vc-target-vaf FLOAT

    The default is 0.03 (3%). Set to e.g. 0.01 to improve SNV sensitivity on 1% VAF variants (assuming sufficient coverage).

    --vc-enable-umi-solid true

    Optimized for 1% and higher VAFs on UMI (or read position collapsed) samples with approx 300-1000X coverage.

    --vc-enable-umi-liquid true

    Optimized for 0.1% and higher VAFs on UMI samples with 1000X or higher coverage as expected in liquid biopsies.

    For more detail on the small variant caller in somatic mode, refer to Somatic Mode.

    HLA

    Option
    Description

    --enable-hla

    Enable HLA typer (this setting by default will only genotype class 1 genes)

    --hla-as-filter-min-threshold

    Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.

    --hla-as-filter-ratio-threshold

    Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.

    --hla-enable-class-2

    Extend genotyping to HLA class 2 genes (default=true).

    CNV

    Option
    Description

    --cnv-enable-gcbias-correction true

    Enable or disable GC bias correction when generating target counts.

    --cnv-segmentation-mode $SEG_MODE

    Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.

    --cnv-segmentation-bed $PATH

    If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.

    --cnv-normal-cnv-vcf $CNV_NORMAL_VCF

    Specify germline CNVs from the matched normal sample. See Germline-aware Mode.

    For more information, see CNV Calling.

    Annotation

    For instructions on how to download the Nirvana annotation database, refer to Nirvana.

    TMB

    Option
    Description

    --tmb-vaf-threshold FLOAT

    Minimum variant allele frequency for usable variants (default=0.05).

    --vc-callability-tumor-thresh INT

    Required read coverage to use a site (default=50).

    --tmb-enable-proxi-filter BOOL

    Use VAF information to increase germline filtering. Recommended for TO, but not for TN. May be overly aggressive at tagging variants as germline (default=false).

    See the user guide: TMB Germline Variants.
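
    As a rough picture of how the two numeric thresholds interact, the Python sketch below counts usable variants per megabase of callable sites. It uses the conventional mutations-per-megabase definition, treats both thresholds as inclusive, and ignores germline and other filters; all of these are assumptions, and the data structures are hypothetical.

```python
def tmb_sketch(site_depths, variants, min_vaf=0.05, min_depth=50):
    """Usable variants per megabase of callable sites (illustrative only).

    site_depths: {position: tumor read depth}
    variants:    iterable of (position, vaf)
    """
    # A site is callable when its tumor depth reaches the coverage threshold.
    callable_sites = {pos for pos, depth in site_depths.items() if depth >= min_depth}
    if not callable_sites:
        raise ValueError("no callable sites at this coverage threshold")

    # A variant is usable when it sits on a callable site and reaches the VAF threshold.
    usable = [v for v in variants if v[0] in callable_sites and v[1] >= min_vaf]
    return len(usable) * 1_000_000 / len(callable_sites)


if __name__ == "__main__":
    depths = {1: 60, 2: 40, 3: 100}
    calls = [(1, 0.10), (2, 0.50), (3, 0.01)]
    print(tmb_sketch(depths, calls))
```

    Raising either threshold shrinks the numerator, and raising the coverage threshold also shrinks the denominator, so the two settings are best tuned together.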

    MSI

    Microsatellite sites file can be downloaded here: Product Files.

    Option
    Description and recommended setting

    --msi-coverage-threshold INT

    Minimum coverage for a microsatellite: 60 (default)

    --msi-distance-threshold FLOAT

    Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default)
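
    For intuition about the distance threshold, the Python sketch below computes a Jensen-Shannon distance between tumor and normal repeat-length histograms for one microsatellite. It assumes base-2 logarithms (so the distance lies between 0 and 1), applies the coverage threshold to both samples, and treats both thresholds as inclusive; DRAGEN's exact computation may differ.

```python
import math


def js_distance(p_counts, q_counts):
    """Jensen-Shannon distance between two {repeat_length: read_count} histograms."""
    p_total, q_total = sum(p_counts.values()), sum(q_counts.values())
    if p_total == 0 or q_total == 0:
        raise ValueError("both histograms need at least one read")

    keys = sorted(set(p_counts) | set(q_counts))
    p = [p_counts.get(k, 0) / p_total for k in keys]
    q = [q_counts.get(k, 0) / q_total for k in keys]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence in bits; zero-probability terms contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = (kl(p, m) + kl(q, m)) / 2
    return math.sqrt(max(0.0, divergence))


def site_is_unstable(tumor, normal, min_coverage=60, min_distance=0.1):
    """Apply the coverage and distance thresholds from the table to one site."""
    if sum(tumor.values()) < min_coverage or sum(normal.values()) < min_coverage:
        return False  # not enough reads to assess this site
    return js_distance(tumor, normal) >= min_distance


if __name__ == "__main__":
    tumor = {10: 30, 12: 40}
    normal = {10: 70}
    print(site_is_unstable(tumor, normal))
```

    Identical distributions give a distance of 0 and completely disjoint distributions give 1, so the default of 0.1 flags fairly modest shifts in repeat length.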

    SV

    Option
    Description

    --sv-call-regions-bed

    Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format.

    --sv-exclusion-bed

    Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.

    --enable-variant-deduplication true

    Relevant when both SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent the occurrence of replicated variants within genes such as FLT3 and KMT2A. Filter all small indels in the structural variant VCF that appear and are passing in the small variant VCF. DRAGEN will create a new VCF that contains variants in SV VCF that are not matching a variant from SNV VCF file. The new deduplicated SV VCF file will have the same prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.

    --sv-systematic-noise $BEDPE

    Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels.

    --sv-somatic-ins-tandup-hotspot-regions-bed $BED

    Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bps)

    --sv-min-candidate-variant-size

    Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.

    Option
    Recommended Value for Liquid Tumors (e.g. AML/MLL)

    --sv-enable-liquid-tumor-mode true

    DRAGEN can account for Tumor-in-Normal (TiN) contamination by running liquid tumor mode.

    --sv-tin-contam-tolerance $TIN_CONTAM_TOLERANCE

    Set the Tumor-in-Normal (TiN) contamination tolerance level.

    For more information, see Structural Variant Calling.
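
    The trimming and left shifting mentioned for --enable-variant-deduplication follows the general idea of VCF indel normalization. The Python sketch below shows that common procedure, not DRAGEN's implementation: the in-memory reference string, the function name, and the way the 500-base cap is applied are assumptions.

```python
def normalize_indel(ref_seq, pos, ref, alt, max_shift=500):
    """Trim shared bases and left-shift a variant; pos is 1-based as in VCF.

    ref_seq is the reference sequence of one contig held in memory (hypothetical input).
    Returns (pos, ref, alt) in normalized form.
    """
    if not ref or not alt or ref == alt:
        raise ValueError("ref and alt must be non-empty and different")

    shifted = 0
    # Drop shared trailing bases. When trimming would empty an allele,
    # first extend both alleles by one reference base to the left.
    while ref[-1] == alt[-1]:
        if len(ref) == 1 or len(alt) == 1:
            if pos == 1 or shifted >= max_shift:
                break
            pos -= 1
            base = ref_seq[pos - 1]
            ref, alt = base + ref, base + alt
            shifted += 1
        ref, alt = ref[:-1], alt[:-1]

    # Drop shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


if __name__ == "__main__":
    contig = "GGGCACACACAGGG"
    # Two representations of the same CA deletion inside the CA repeat.
    print(normalize_indel(contig, 9, "ACA", "A"))
    print(normalize_indel(contig, 5, "ACA", "A"))
```

    Once both callers' records are normalized this way, two records describe the same indel when their (position, REF, ALT) tuples are equal, which is the comparison a deduplication step needs.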

    Resource Files

    DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.

    SNV Systematic Noise

    Systematic noise files are considered essential in Tumor-Only workflows and are also recommended for Tumor-Normal workflows.

    DRAGEN has prebuilt systematic noise files for WES/WGS. For high sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.

    Prebuilt

    Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.

    Prebuilt WES/WGS noise files
    Description

    WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FF

    FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FFPE (only hg38)

    WES_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WES FF and FFPE

    Custom

    This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.

    For WES and WGS pipelines, gather the full paths to the small variant hard-filtered VCFs (not gVCFs) from step 1 and create a list file ${VCF_LIST} with one path per line.
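
    One way to assemble the list file is sketched below in Python. The directory layout, the function name, and the file name pattern *.hard-filtered.vcf.gz are assumptions for illustration; adjust the pattern to match the actual output names of the step 1 runs.

```python
from pathlib import Path


def write_vcf_list(run_root, list_path, pattern="*.hard-filtered.vcf.gz"):
    """Write one absolute VCF path per line to list_path and return the paths.

    run_root is a hypothetical directory holding the per-sample output folders.
    The default pattern does not match gVCF names ending in .gvcf.gz.
    """
    vcfs = sorted(p.resolve() for p in Path(run_root).rglob(pattern))
    if not vcfs:
        raise FileNotFoundError(f"no files matching {pattern} under {run_root}")
    Path(list_path).write_text("".join(f"{p}\n" for p in vcfs))
    return vcfs


if __name__ == "__main__":
    import sys

    # Usage: python make_vcf_list.py <run_root> <vcf_list.txt>
    if len(sys.argv) == 3:
        for path in write_vcf_list(sys.argv[1], sys.argv[2]):
            print(path)
```

    Sorting the paths keeps the list stable between runs, which makes it easier to track which normals went into a given noise file.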

    Step 2. Generate the final noise file.

    This step generates a BED file containing mean and max noise estimates per position, which can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions, which can then be investigated (e.g. by modifying assay settings or DRAGEN settings) or blocklisted.

    The SNV systematic noise files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.

    SV Systematic Noise

    SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available. SV systematic noise files are considered experimental in WES and Panels.

    Prebuilt

    Prebuilt WGS SV systematic noise files can be downloaded here: Product Files.

    Prebuilt WGS noise files
    Description

    WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

    For WGS, FF/FFPE

    IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

    For WGS, HEME

    Custom

    Custom systematic noise files can be generated for WES or Panels. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-100 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.

    Step 2. Build the BEDPE file using input VCFs from previous step.

    Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.

    CNV Panel of Normals (PON)

    For CNV PON requirements and generation options see CNV Preprocessing | Panel of Normals.

    If a matched normal is available it is recommended to include it in the PON.

    Step 1. Generate CNV target counts of individual normal samples.

    Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.

    Step 2. CNV combined counts file generation.

    $CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.

    Hashtable

    For DRAGEN somatic runs it is recommended to use the linear hashtable.

    See: Product Files

    Input options

    DRAGEN input sources include FASTQ list, FASTQ, BAM, or CRAM. For BCL input, first create FASTQs using BCL conversion.

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    --enable-map-align true

    In the TN pipeline this must be set to false for BAM/CRAM input.

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    Fractional (Raw Reads) Downsampling

    DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).

    Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses it is also recommended to use a normal sample with lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler may be used to reduce coverage on the normal sample.

    Option
    Description

    --enable-fractional-down-sampler

    Set to true to enable fractional downsampling. The default value is false.

    --down-sampler-normal-subsample

    Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).

    --down-sampler-tumor-subsample

    Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).

    --down-sampler-random-seed

    Specify the random seed for different runs of the same input data. The default value is 42.

    UMI

    Option
    Description

    --umi-source STRING

    Specify the input type for the UMI sequence. Options: qname, fastq, bamtag.

    --umi-library-type STRING

    Set the batch option for different UMIs correction. Options: random-duplex, random-simplex, nonrandom-duplex.

    --umi-nonrandom-whitelist $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.

    --umi-correction-table $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.

    --umi-min-supporting-reads INT

    Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).

    --umi-metrics-interval-file $BED

    Target region in BED format.

    For more information see: UMI Options.

    SNV

    Option
    Description

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    --vc-systematic-noise $PATH

    Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.). When working with panels it is recommended that a custom systematic noise file be created for each assay.

    --vc-somatic-hotspots $PATH

    DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.

    --vc-enable-liquid-tumor-mode true

    Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.

    --vc-override-tumor-pcr-params-with-normal false

    Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.

    High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports 3 main settings for improved sensitivity on low VAF variant calls.

    High Sensitivity Option
    Description

    --vc-target-vaf FLOAT

    The default is 0.03 (3%). Set to e.g. 0.01 to improve SNV sensitivity on 1% VAF variants (assuming sufficient coverage).

    --vc-enable-umi-solid true

    Optimized for 1% and higher VAFs on UMI (or read position collapsed) samples with approx 300-1000X coverage.

    --vc-enable-umi-liquid true

    Optimized for 0.1% and higher VAFs on UMI samples with 1000X or higher coverage as expected in liquid biopsies.

    For more detail on the small variant caller in somatic mode, refer to Somatic Mode.

    HLA

    Option
    Description

    --enable-hla

    Enable HLA typer (this setting by default will only genotype class 1 genes)

    --hla-as-filter-min-threshold

    Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.

    --hla-as-filter-ratio-threshold

    Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.

    --hla-enable-class-2

    Extend genotyping to HLA class 2 genes (default=true).

    CNV

    Option
    Description

    --cnv-enable-gcbias-correction true

    Enable or disable GC bias correction when generating target counts.

    --cnv-segmentation-mode $SEG_MODE

    Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.

    --cnv-segmentation-bed $PATH

    If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.

    --cnv-normal-cnv-vcf $CNV_NORMAL_VCF

    Specify germline CNVs from the matched normal sample. See Germline-aware Mode.

    For more information, see CNV Calling.

    Annotation

    For instructions on how to download the Nirvana annotation database, please refer to Nirvana

    TMB

    Option
    Description

    --tmb-vaf-threshold FLOAT

    Variant minimum allele frequency for usable variants (default=0.05).

    --vc-callability-tumor-thresh INT

    Required read coverage to use a site (default=50).

    --tmb-enable-proxi-filter BOOL

    Use variant VAF information to increase germline filtering. Recommended for tumor-only (TO), but not for tumor-normal (TN). May be overly aggressive at tagging variants as germline (default=false).

    See the user guide: TMB Germline Variants.

    MSI

    Microsatellite sites file can be downloaded here: Product Files.

    For panels it is recommended to post-process the file by intersecting the WES or WGS sites with the manifest. This will avoid using any off-target reads in the MSI analysis. For small panels it may be required to generate custom site files to ensure the panel covers at least 2000 sites. To generate custom MSI site files refer to the MSI Biomarker section in the user guide.

    Option
    Description and recommended setting

    --msi-coverage-threshold INT

    Minimum coverage for a microsatellite: 60 (default)

    --msi-distance-threshold FLOAT

    Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default)
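    The distance threshold above compares the repeat-length distribution of a microsatellite in the tumor against the normal. As a rough illustration only, the sketch below computes the textbook Jensen-Shannon distance (square root of the base-2 divergence, bounded between 0 and 1) for two read-count histograms. The function name, the histogram inputs, and the choice of log base are assumptions made for this example; this guide does not state how DRAGEN computes the value internally.

```python
import math


def js_distance(tumor_counts, normal_counts):
    """Textbook Jensen-Shannon distance between two count histograms.

    Each argument maps a repeat length to a read count (hypothetical input
    format). Returns a value in [0, 1]: 0 for identical distributions,
    1 for distributions with no overlap.
    """
    tumor_total = sum(tumor_counts.values())
    normal_total = sum(normal_counts.values())
    if tumor_total == 0 or normal_total == 0:
        raise ValueError("both histograms need at least one read")

    divergence = 0.0
    for length in set(tumor_counts) | set(normal_counts):
        p = tumor_counts.get(length, 0) / tumor_total
        q = normal_counts.get(length, 0) / normal_total
        m = 0.5 * (p + q)  # mixture distribution
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


# Example: a normal sample dominated by one repeat length versus a tumor
# sample whose reads are spread over shorter and longer alleles.
normal = {10: 90, 11: 10}
tumor = {9: 30, 10: 50, 11: 20}
site_is_unstable = js_distance(tumor, normal) >= 0.1  # 0.1 is the default above
```

    Under this definition, a site whose tumor histogram matches the normal scores 0, so raising the threshold makes the unstable call stricter.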

    SV

    Option
    Description

    --sv-call-regions-bed

    Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format.

    --sv-exclusion-bed

    Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.

    --enable-variant-deduplication true

    Relevant when both the SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent duplicate reporting of variants within genes such as FLT3 and KMT2A. Filters out all small indels in the structural variant VCF that also appear, and are passing, in the small variant VCF. DRAGEN creates a new VCF containing the SV VCF variants that do not match any variant in the SNV VCF. The new deduplicated SV VCF has the prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left-shifting by up to 500 bases.

    --sv-systematic-noise $BEDPE

    Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bgzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels.

    --sv-somatic-ins-tandup-hotspot-regions-bed $BED

    Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bps)

    --sv-min-candidate-variant-size

    Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.

    Option
    Recommended Value for Liquid Tumors (e.g. AML/MLL)

    --sv-enable-liquid-tumor-mode true

    DRAGEN can account for Tumor-in-Normal (TiN) contamination by running liquid tumor mode.

    --sv-tin-contam-tolerance $TIN_CONTAM_TOLERANCE

    Set the Tumor-in-Normal (TiN) contamination tolerance level.

    For more information, see Structural Variant Calling.

    Resource Files

    DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.

    SNV Systematic Noise

    Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.

    DRAGEN has prebuilt systematic noise files for WES/WGS. For high-sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.

    Prebuilt

    Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.

    Prebuilt WES/WGS noise files
    Description

    WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FF

    FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FFPE (only hg38)

    WES_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WES FF and FFPE

    Custom

    This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.

    For panels we create GVCF files. Gather the full paths to the small variant caller hard filtered GVCFs (not VCFs) from step 1 and create an input file ${GVCF_LIST} by specifying 1 file per line.

    Step 2. Generate the final noise file.

    This step generates a BED file containing mean and max noise estimates per position. The file can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions, which can then be investigated (e.g. by modifying assay or DRAGEN settings) or blocklisted.

    The SNV systematic noise files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.
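    The ${GVCF_LIST} input described in step 1 is a plain text file with one full gVCF path per line. A minimal sketch of how it could be assembled is shown below. The helper name, the one-output-directory-per-normal layout, and the *.hard-filtered.gvcf.gz file-name pattern are assumptions for illustration; adjust them to match your own run outputs.

```python
from pathlib import Path


def write_gvcf_list(run_dirs, list_path, pattern="*.hard-filtered.gvcf.gz"):
    """Write one absolute gVCF path per line; return the number of paths.

    `pattern` is an assumed naming convention, not taken from this guide.
    """
    paths = []
    for run_dir in run_dirs:
        matches = sorted(Path(run_dir).glob(pattern))
        # Fail loudly rather than silently building a list with gaps or
        # with several files from the same normal sample.
        if len(matches) != 1:
            raise ValueError(
                f"expected exactly one gVCF in {run_dir}, found {len(matches)}"
            )
        paths.append(matches[0].resolve())
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return len(paths)
```

    Because the pattern requires the gvcf suffix, a hard-filtered VCF sitting in the same directory is not picked up, in line with the "GVCFs (not VCFs)" note in step 1.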

    SV Systematic Noise

    SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available. SV systematic noise files are considered experimental in WES and Panels.

    Prebuilt

    Prebuilt WGS SV systematic noise files can be downloaded here: Product Files.

    Prebuilt WGS noise files
    Description

    WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

    For WGS, FF/FFPE

    IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

    For WGS, HEME

    Custom

    Custom systematic noise files can be generated for WES or Panels. For best accuracy the normal samples should ideally closely match the sequencer, sample type, library prep and coverage of the tumor samples of interest. It is typically recommended to use 30 - 100 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.

    Step 2. Build the BEDPE file using input VCFs from previous step.

    Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.

    CNV Panel of Normals (PON)

    For CNV PON requirements and generation options see CNV Preprocessing | Panel of Normals.

    If a matched normal is available it is recommended to include it in the PON.

    Step 1. Generate CNV target counts of individual normal samples.

    Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.

    Step 2. CNV combined counts file generation.

    $CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.

    Hashtable

    For DRAGEN somatic runs it is recommended to use the linear hashtable.

    See: Product Files

    Input options

    DRAGEN input sources include FASTQ list, FASTQ, BAM, and CRAM. For BCL input, first create FASTQs using BCL conversion.

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    --enable-map-align true

    Optionally disable map & align (default=true).

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    Fractional (Raw Reads) Downsampling

    DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).

    Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses it is also recommended to use a normal sample with lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler can be used to reduce coverage on the normal sample.

    Option
    Description

    --enable-fractional-down-sampler

    Set to true to enable fractional downsampling. The default value is false.

    --down-sampler-normal-subsample

    Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).

    --down-sampler-tumor-subsample

    Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).

    --down-sampler-random-seed

    Specify the random seed for different runs of the same input data. The default value is 42.
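    The fraction passed to --down-sampler-normal-subsample follows from simple arithmetic on the two coverages. The sketch below is one way to derive it, assuming the mean coverages are already known from earlier QC. The function names and the target_ratio parameter (how deep the normal may be relative to the tumor) are hypothetical and not DRAGEN settings.

```python
def normal_subsample_fraction(tumor_cov, normal_cov, target_ratio=1.0):
    """Fraction of normal reads to keep so that the normal coverage does
    not exceed target_ratio * tumor coverage (capped at 1.0)."""
    if tumor_cov <= 0 or normal_cov <= 0:
        raise ValueError("coverages must be positive")
    if not 0 < target_ratio <= 1:
        raise ValueError("target_ratio must be in (0, 1]")
    return min(1.0, target_ratio * tumor_cov / normal_cov)


def downsampler_args(tumor_cov, normal_cov, target_ratio=1.0, seed=42):
    """Return the downsampling options as an argument list, or an empty
    list when the normal is already shallow enough."""
    fraction = normal_subsample_fraction(tumor_cov, normal_cov, target_ratio)
    if fraction >= 1.0:
        return []
    return [
        "--enable-fractional-down-sampler", "true",
        "--down-sampler-normal-subsample", f"{fraction:.3f}",
        "--down-sampler-random-seed", str(seed),
    ]


# Example: a 100X tumor with a 200X matched normal keeps half of the
# normal reads when the normal should end up no deeper than the tumor.
args = downsampler_args(tumor_cov=100, normal_cov=200)
```

    Keeping the seed explicit makes it easy to confirm that a rerun draws the same subsample.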

    UMI

    Option
    Description

    --umi-source STRING

    Specify the input type for the UMI sequence. Options: qname, fastq, bamtag.

    --umi-library-type STRING

    Set the batch option for the type of UMI correction. Options: random-duplex, random-simplex, nonrandom-duplex.

    --umi-nonrandom-whitelist $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.

    --umi-correction-table $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.

    --umi-min-supporting-reads INT

    Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).

    --umi-metrics-interval-file $BED

    Target region in BED format.

    For more information see: UMI Options.

    SNV

    Option
    Description

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    --vc-emit-ref-confidence GVCF

    To enable gVCF output.

    --vc-enable-vcf-output

    To enable VCF file output during a gVCF run, set to true. The default value is false.

    --vc-systematic-noise $PATH

    Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.). When working with panels it is recommended that a custom systematic noise file be created for each assay.

    --vc-somatic-hotspots $PATH

    DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.

    High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports 3 main settings for improved sensitivity on low VAF variant calls.

    High Sensitivity Option
    Description

    --vc-target-vaf FLOAT

    The default is 0.03 (3%). Set to e.g. 0.01 to improve SNV sensitivity on 1% VAF variants (assuming sufficient coverage).

    --vc-enable-umi-solid true

    Optimized for 1% and higher VAFs on UMI (or read position collapsed) samples with approx 300-1000X coverage.

    --vc-enable-umi-liquid true

    Optimized for 0.1% and higher VAFs on UMI samples with 1000X or higher coverage as expected in liquid biopsies.
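    The "assuming sufficient coverage" caveat for --vc-target-vaf can be made concrete with binomial arithmetic: at depth N and true allele fraction f, the number of alt-supporting reads is roughly Binomial(N, f). The sketch below estimates the chance of observing at least k such reads. It ignores sequencing error, UMI collapsing, and the caller's own statistical model, and the minimum read count k is a hypothetical parameter chosen for illustration.

```python
from math import comb


def prob_at_least_k_alt_reads(depth, vaf, k):
    """P(X >= k) for X ~ Binomial(depth, vaf)."""
    if depth < 0 or k < 0:
        raise ValueError("depth and k must be non-negative")
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must be between 0 and 1")
    # Sum the probabilities of seeing fewer than k alt reads, then
    # take the complement.
    below = sum(
        comb(depth, i) * vaf**i * (1.0 - vaf) ** (depth - i)
        for i in range(min(k, depth + 1))
    )
    return 1.0 - below


# Example: compare the chance of sampling at least 5 alt reads for a
# 1% VAF variant at two coverage levels.
p_300x = prob_at_least_k_alt_reads(depth=300, vaf=0.01, k=5)
p_1000x = prob_at_least_k_alt_reads(depth=1000, vaf=0.01, k=5)
```

    The higher-coverage probability is the larger of the two, which is why lowering the target VAF only pays off when depth is sufficient.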

    For more detail on the small variant caller in somatic mode please refer to Somatic Mode

    HLA

    Option
    Description

    --enable-hla

    Enable HLA typer (this setting by default will only genotype class 1 genes)

    --hla-as-filter-min-threshold

    Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.

    --hla-as-filter-ratio-threshold

    Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.

    --hla-enable-class-2

    Extend genotyping to HLA class 2 genes (default=true).

    CNV

    Option
    Description

    --cnv-enable-gcbias-correction true

    Enable or disable GC bias correction when generating target counts.

    --cnv-segmentation-mode $SEG_MODE

    Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.

    --cnv-segmentation-bed $PATH

    If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.

    --cnv-population-b-allele-vcf $POP_VCF

    Specify a population SNP VCF. This option is available for both the germline and the somatic workflows. In germline it is only supported for WGS. In somatic, it can be used when a matched normal sample is not available and analysis must be performed in tumor-only mode.

    For more information, see CNV Calling.

    Annotation

    For instructions on how to download the Nirvana annotation database, please refer to Nirvana

    TMB

    The Tumor-Normal pipeline is more effective than the Tumor-Only pipeline at removing or tagging germline variants. The Tumor-Only pipeline may consequently report somewhat elevated TMB values. The TMB proxi filter is an optional setting applied on top of the regular database germline filter. It aggressively filters additional germline variants based on allele frequencies.

    Option
    Description

    --tmb-vaf-threshold FLOAT

    Variant minimum allele frequency for usable variants. Default=0.05. Set to 0.002 for ctDNA.

    --vc-callability-tumor-thresh

    Required read coverage to use a site. Default=50. Set to 1000 for ctDNA.

    See the user guide: TMB Germline Variants.

    MSI

    Microsatellite sites file can be downloaded here: Product Files.

    For panels it is recommended to post-process the file by intersecting the WES or WGS sites with the manifest. This will avoid using any off-target reads in the MSI analysis. For small panels it may be required to generate custom site files to ensure the panel covers at least 2000 sites. To generate custom MSI site files refer to the MSI Biomarker section in the user guide.
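    The intersection step described above amounts to keeping only the microsatellite sites that fall inside the panel's target regions and then checking the site count. The sketch below works on already-parsed (chrom, start, end) tuples with half-open coordinates, because the on-disk layout of the sites file and the manifest is not described in this section. The function names and the in-memory representation are assumptions for illustration.

```python
from bisect import bisect_right
from collections import defaultdict

MIN_SITES = 2000  # minimum number of covered sites mentioned in this guide


def merge_targets(targets):
    """Merge overlapping or touching target intervals per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in targets:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = out
    return merged


def on_target_sites(sites, targets):
    """Return the sites that lie entirely within a merged target region."""
    merged = merge_targets(targets)
    starts = {c: [iv[0] for iv in ivs] for c, ivs in merged.items()}
    kept = []
    for chrom, start, end in sites:
        if chrom not in merged:
            continue
        # Last target whose start is <= the site start.
        i = bisect_right(starts[chrom], start) - 1
        if i >= 0 and end <= merged[chrom][i][1]:
            kept.append((chrom, start, end))
    return kept


def meets_site_minimum(kept_sites, minimum=MIN_SITES):
    """True when enough on-target sites remain for MSI analysis."""
    return len(kept_sites) >= minimum
```

    When meets_site_minimum returns False for a small panel, that is the situation in which this guide suggests generating a custom site file.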

    Option
    Description and recommended setting for Liquid (cfDNA)

    --msi-coverage-threshold INT

    Minimum coverage for a microsatellite: 500

    --msi-distance-threshold FLOAT

    Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.02

    SV

    Option
    Description

    --sv-call-regions-bed

    Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format.

    --sv-exclusion-bed

    Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.

    --enable-variant-deduplication true

    Relevant when both the SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent duplicate reporting of variants within genes such as FLT3 and KMT2A. Filters out all small indels in the structural variant VCF that also appear, and are passing, in the small variant VCF. DRAGEN creates a new VCF containing the SV VCF variants that do not match any variant in the SNV VCF. The new deduplicated SV VCF has the prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left-shifting by up to 500 bases.

    --sv-systematic-noise $BEDPE

    Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bgzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels.

    --sv-somatic-ins-tandup-hotspot-regions-bed $BED

    Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bps)

    --sv-min-candidate-variant-size

    Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.

    Option
    Recommended Value for Liquid Tumors (e.g. AML/MLL)

    --sv-min-scored-variant-size $INT

    100000

    For more information, see Structural Variant Calling.

    Resource Files

    DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.

    SNV Systematic Noise

    Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.

    DRAGEN has prebuilt systematic noise files for WES/WGS. For high-sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.

    Prebuilt

    Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.

    Prebuilt WES/WGS noise files
    Description

    WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FF

    FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FFPE (only hg38)

    WES_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WES FF and FFPE

    Custom

    This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.

    For panels we create GVCF files. Gather the full paths to the small variant caller hard filtered GVCFs (not VCFs) from step 1 and create an input file ${GVCF_LIST} by specifying 1 file per line.

    Step 2. Generate the final noise file.

    This step generates a BED file containing mean and max noise estimates per position. The file can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions, which can then be investigated (e.g. by modifying assay or DRAGEN settings) or blocklisted.

    The SNV systematic noise files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.
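    Before plotting the noise distribution, it can help to list the positions with the highest noise estimates from the file produced in step 2. The sketch below does this for a tab-separated BED file. The column layout (chrom, start, end, then a noise value at index value_col) is an assumption, since this section does not document the columns; check your own file and adjust value_col accordingly.

```python
import gzip


def noisiest_positions(path, threshold, value_col=4, top_n=None):
    """Return (chrom, start, end, value) tuples whose noise value is at
    least `threshold`, sorted from noisiest to least noisy.

    `value_col` is the zero-based column holding the noise estimate and
    is an assumed layout, not one documented in this guide.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    hits = []
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip blank lines and common BED header lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            fields = line.rstrip("\n").split("\t")
            value = float(fields[value_col])
            if value >= threshold:
                hits.append((fields[0], int(fields[1]), int(fields[2]), value))
    hits.sort(key=lambda hit: hit[3], reverse=True)
    return hits if top_n is None else hits[:top_n]
```

    The returned tuples can be written out as a candidate blocklist or passed to a plotting library of your choice.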

    SV Systematic Noise

    SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available. SV systematic noise files are considered experimental in WES and Panels.

    Prebuilt

    Prebuilt WGS SV systematic noise files can be downloaded here: Product Files.

    Prebuilt WGS noise files
    Description

    WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

    For WGS, FF/FFPE

    IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

    For WGS, HEME

    Custom

    Custom systematic noise files can be generated for WES or Panels. For best accuracy the normal samples should ideally closely match the sequencer, sample type, library prep and coverage of the tumor samples of interest. It is typically recommended to use 30 - 100 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.

    Step 2. Build the BEDPE file using input VCFs from previous step.

    Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.

    CNV Panel of Normals (PON)

    For CNV PON requirements and generation options see CNV Preprocessing | Panel of Normals.

    If a matched normal is available it is recommended to include it in the PON.

    Step 1. Generate CNV target counts of individual normal samples.

    Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.

    Step 2. CNV combined counts file generation.

    $CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.
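    As a minimal sketch, $CNV_NORMALS_LIST could be assembled as follows. The helper name and the exclude parameter are hypothetical. The check that all files share one suffix (all GC-corrected or all uncorrected) is a precaution added here on the assumption that a PON should not mix the two; it is not a rule stated in this guide.

```python
from pathlib import Path

COUNT_SUFFIXES = (".target.counts.gc-corrected.gz", ".target.counts.gz")


def write_cnv_normals_list(count_files, list_path, exclude=()):
    """Write one absolute target-counts path per line; return the count.

    `exclude` holds file names of normals to leave out of the PON.
    """
    excluded = set(exclude)
    chosen = [Path(p) for p in count_files if Path(p).name not in excluded]
    if not chosen:
        raise ValueError("no target counts files left after exclusion")

    kinds = set()
    for path in chosen:
        for suffix in COUNT_SUFFIXES:
            if path.name.endswith(suffix):
                kinds.add(suffix)
                break
        else:
            raise ValueError(f"not a target counts file: {path.name}")
    if len(kinds) > 1:
        raise ValueError("mix of GC-corrected and uncorrected counts files")

    Path(list_path).write_text("".join(f"{p.resolve()}\n" for p in chosen))
    return len(chosen)
```

    The resulting list file is the input for the combined counts generation in step 2.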

    Hashtable

    For DRAGEN somatic runs it is recommended to use the linear hashtable.

    See: Product Files

    Input options

    DRAGEN input sources include FASTQ list, FASTQ, BAM, and CRAM. For BCL input, first create FASTQs using BCL conversion.

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    --enable-map-align true

    In the TN pipeline this must be set to false for BAM/CRAM input.

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    Duplicate Marking

    Option
    Description

    --enable-duplicate-marking true

    By default, DRAGEN marks duplicate reads and excludes them from variant calling.

    --enable-positional-collapsing true

    Alternative to --enable-duplicate-marking=true. Instead of discarding duplicate reads, DRAGEN can optionally perform positional collapsing, merging them into higher-quality consensus reads. This is beneficial for small panels without UMIs and coverage between 300X and 1000X. However, it's slower than standard duplicate marking and less effective on samples with coverage lower than 300X. For very high coverage (1000X+), avoid it due to potential read collisions. For high-sensitivity panels with 1000X+ coverage, consider using UMIs.
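    The coverage guidance above can be summarized as a small decision helper for samples without UMIs. This is only a sketch of the stated recommendations. The function name is hypothetical, and treating exactly 1000X as "very high coverage" is an assumption, since the text gives the ranges only approximately.

```python
def duplicate_handling(mean_coverage):
    """Suggest duplicate handling options for a sample without UMIs.

    Returns (args, note), where args is a list of command-line tokens.
    """
    if mean_coverage < 0:
        raise ValueError("coverage must be non-negative")
    if mean_coverage < 300:
        return (
            ["--enable-duplicate-marking", "true"],
            "below 300X: positional collapsing is less effective",
        )
    if mean_coverage < 1000:
        return (
            ["--enable-positional-collapsing", "true"],
            "300X to 1000X: collapse duplicates into consensus reads",
        )
    return (
        ["--enable-duplicate-marking", "true"],
        "1000X or more: read collisions are likely; consider a UMI assay",
    )


# Example: a 500X small panel without UMIs.
args, note = duplicate_handling(500)
```

    The note string is meant for logging, so a pipeline can record why a given mode was chosen.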

    Fractional (Raw Reads) Downsampling

    DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).

    Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses it is also recommended to use a normal sample with lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler can be used to reduce coverage on the normal sample.

    Option
    Description

    --enable-fractional-down-sampler

    Set to true to enable fractional downsampling. The default value is false.

    --down-sampler-normal-subsample

    Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).

    --down-sampler-tumor-subsample

    Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).

    --down-sampler-random-seed

    Specify the random seed for different runs of the same input data. The default value is 42.

    SNV

    Option
    Description

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    --vc-systematic-noise $PATH

    Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.). When working with panels it is recommended that a custom systematic noise file be created for each assay.

    --vc-somatic-hotspots $PATH

    DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.

    --vc-enable-liquid-tumor-mode true

    Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.

    --vc-override-tumor-pcr-params-with-normal false

    Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.

    High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports 3 main settings for improved sensitivity on low VAF variant calls.

    High Sensitivity Option
    Description

    --vc-target-vaf FLOAT

    The default is 0.03 (3%). Set to e.g. 0.01 to improve SNV sensitivity on 1% VAF variants (assuming sufficient coverage).

    --vc-enable-umi-solid true

    Optimized for 1% and higher VAFs on UMI (or read position collapsed) samples with approx 300-1000X coverage.

    --vc-enable-umi-liquid true

    Optimized for 0.1% and higher VAFs on UMI samples with 1000X or higher coverage as expected in liquid biopsies.

    For more detail on the small variant caller in somatic mode please refer to Somatic Mode

    HLA

    Option
    Description

    --enable-hla

    Enable HLA typer (this setting by default will only genotype class 1 genes)

    --hla-as-filter-min-threshold

    Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.

    --hla-as-filter-ratio-threshold

    Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.

    --hla-enable-class-2

    Extend genotyping to HLA class 2 genes (default=true).

    CNV

    Option
    Description

    --cnv-enable-gcbias-correction true

    Enable or disable GC bias correction when generating target counts.

    --cnv-segmentation-mode $SEG_MODE

    Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.

    --cnv-segmentation-bed $PATH

    If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.

    --cnv-normal-cnv-vcf $CNV_NORMAL_VCF

    Specify germline CNVs from the matched normal sample.

    For more information, see CNV Calling.

    Annotation

    For instructions on how to download the Nirvana annotation database, please refer to Nirvana

    TMB

    Option
    Description

    --tmb-vaf-threshold FLOAT

    Variant minimum allele frequency for usable variants (default=0.05).

    --vc-callability-tumor-thresh INT

    Required read coverage to use a site (default=50).

    --tmb-enable-proxi-filter BOOL

    Use variant VAF information to increase germline filtering. Recommended for tumor-only (TO), but not for tumor-normal (TN). May be overly aggressive at tagging variants as germline (default=false).

    See the user guide: TMB Germline Variants.

    MSI

    Microsatellite sites file can be downloaded here: Product Files.

    For panels it is recommended to post-process the file by intersecting the WES or WGS sites with the manifest. This will avoid using any off-target reads in the MSI analysis. For small panels it may be required to generate custom site files to ensure the panel covers at least 2000 sites. To generate custom MSI site files refer to the MSI Biomarker section in the user guide.

    Option
    Description and recommended setting

    --msi-coverage-threshold INT

    Minimum coverage for a microsatellite: 60 (default)

    --msi-distance-threshold FLOAT

    Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default)

    SV

    Option
    Description

    --sv-call-regions-bed

    Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format.

    --sv-exclusion-bed

    Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.

    --enable-variant-deduplication true

    Relevant when both the SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent duplicate reporting of variants within genes such as FLT3 and KMT2A. Filters out all small indels in the structural variant VCF that also appear, and are passing, in the small variant VCF. DRAGEN creates a new VCF containing the SV VCF variants that do not match any variant in the SNV VCF. The new deduplicated SV VCF has the prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left-shifting by up to 500 bases.

    --sv-systematic-noise $BEDPE

    Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bgzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels.

    --sv-somatic-ins-tandup-hotspot-regions-bed $BED

    Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bps)

    --sv-min-candidate-variant-size

    Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.

    Option
    Recommended Value for Liquid Tumors (e.g. AML/MLL)

    --sv-enable-liquid-tumor-mode true

    DRAGEN can account for Tumor-in-Normal (TiN) contamination by running liquid tumor mode.

    --sv-tin-contam-tolerance $TIN_CONTAM_TOLERANCE

    Set the Tumor-in-Normal (TiN) contamination tolerance level.

    For more information, see Structural Variant Calling.

    Resource Files

    DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.

    SNV Systematic Noise

    Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.

    DRAGEN has prebuilt systematic noise files for WES/WGS. For high-sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.

    Prebuilt

    Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.

    Prebuilt WES/WGS noise files
    Description

    WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FF

    FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FFPE (only hg38)

    WES_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WES FF and FFPE

    Custom

    This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.

    For panels, step 1 produces gVCF files. Gather the full paths to the small variant caller hard-filtered gVCFs (not VCFs) from step 1 and create an input file ${GVCF_LIST} with one path per line.
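The list file can be assembled with standard shell tools. The sketch below is only an illustration: the helper name `make_gvcf_list` is hypothetical, and it assumes that all step 1 output directories sit under one parent directory and that the hard-filtered gVCFs end in `.hard-filtered.gvcf.gz`. Adjust the pattern if your outputs are named differently.

```shell
# Hypothetical helper (not part of DRAGEN): write one absolute gVCF path per line.
# Assumption: step 1 outputs live under $1 and end in ".hard-filtered.gvcf.gz".
make_gvcf_list() {
    runs_dir="$1"
    list_file="$2"
    # Resolve to an absolute path so the list holds full paths.
    abs_dir="$(cd "$runs_dir" && pwd)" || return 1
    find "$abs_dir" -type f -name '*.hard-filtered.gvcf.gz' | sort > "$list_file"
    # Fail when nothing matched, so an empty list is never passed on.
    [ -s "$list_file" ]
}
```

For example, `make_gvcf_list /path/to/normal_runs gvcf_list.txt` writes the list, and `GVCF_LIST=gvcf_list.txt` can then be used in step 2.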

    Step 2. Generate the final noise file.

    This step generates a BED file containing mean and max noise estimates per position. The file can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions that could be troubleshot (e.g. by modifying assay settings or DRAGEN settings) or blocklisted.

    The SNV systematic noise files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.

    SV Systematic Noise

    SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available. SV systematic noise files are considered experimental in WES and Panels.

    Prebuilt

    Prebuilt WGS SV systematic noise files can be downloaded here: Product Files.

    Prebuilt WGS noise files
    Description

    WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

    For WGS, FF/FFPE

    IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

    For WGS, HEME

    Custom

    Custom systematic noise files can be generated for WES or Panels. For best accuracy the normal samples should ideally closely match the sequencer, sample type, library prep and coverage of the tumor samples of interest. It is typically recommended to use 30 - 100 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.

    Step 2. Build the BEDPE file using input VCFs from previous step.

    Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.

    CNV Panel of Normals (PON)

    For CNV PON requirements and generation options see CNV Preprocessing | Panel of Normals.

    If a matched normal is available it is recommended to include it in the PON.

    Step 1. Generate CNV target counts of individual normal samples.

    Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.

    Step 2. CNV combined counts file generation.

    $CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.
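A sketch of how the `$CNV_NORMALS_LIST` file could be assembled follows. The helper name `make_cnv_normals_list` and the idea of keeping all target counts files under one directory are assumptions for illustration only. The helper lists either the raw or the GC-corrected target counts files, never a mix, so that the PON is built with consistent settings.

```shell
# Hypothetical helper (not part of DRAGEN): list CNV target counts files, one per line.
# kind = "raw" -> *.target.counts.gz
# kind = "gc"  -> *.target.counts.gc-corrected.gz
make_cnv_normals_list() {
    counts_dir="$1"
    kind="$2"
    list_file="$3"
    case "$kind" in
        raw) pattern='*.target.counts.gz' ;;
        gc)  pattern='*.target.counts.gc-corrected.gz' ;;
        *)   echo "kind must be 'raw' or 'gc'" >&2; return 2 ;;
    esac
    find "$counts_dir" -type f -name "$pattern" | sort > "$list_file"
    # Fail when nothing matched.
    [ -s "$list_file" ]
}
```

For example, `make_cnv_normals_list /path/to/counts gc cnv_normals.txt` produces a list that can be passed as `--cnv-normals-list cnv_normals.txt`.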

    Hashtable

    For DRAGEN somatic runs it is recommended to use the linear hashtable.

    See: Product Files

    Input options

    DRAGEN input sources include FASTQ list, FASTQ, BAM, or CRAM. For BCL input, first create FASTQs using BCL conversion.

    FQ list Input

    FQ Input

    BAM Input

    CRAM Input

    Mapping and Aligning

    Option
    Description

    --enable-map-align true

    Enable map & align (default=true). Set to false to skip it, which is an option with BAM/CRAM input.

    --enable-map-align-output true

    Optionally save the output BAM (default=false).

    --Aligner.clip-pe-overhang 2

    Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.

    Fractional (Raw Reads) Downsampling

    DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).

    Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses it is also recommended to use a normal sample with coverage that is less than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler can be used to reduce coverage on the normal sample.

    Option
    Description

    --enable-fractional-down-sampler

    Set to true to enable fractional downsampling. The default value is false.

    --down-sampler-normal-subsample

    Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).

    --down-sampler-tumor-subsample

    Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).

    --down-sampler-random-seed

    Specify the random seed for different runs of the same input data. The default value is 42.
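As a rough illustration of choosing a subsample fraction, the sketch below computes target coverage divided by current coverage, capped at 1.0. The helper name `subsample_fraction` and the coverage values are hypothetical, and the calculation assumes coverage scales linearly with the fraction of reads kept.

```shell
# Hypothetical helper: fraction of reads to keep so that a sample with
# coverage $1 is reduced to roughly coverage $2 (never more than 1.0).
subsample_fraction() {
    LC_ALL=C awk -v have="$1" -v want="$2" 'BEGIN {
        f = (have > want) ? want / have : 1.0
        printf "%.2f\n", f
    }'
}
```

For example, for a normal sample at 100X that should be reduced to 50X, `subsample_fraction 100 50` gives the value to pass to `--down-sampler-normal-subsample`, together with `--enable-fractional-down-sampler true`.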

    UMI

    Option
    Description

    --umi-source STRING

    Specify the input type for the UMI sequence. Options: qname, fastq, bamtag.

    --umi-library-type STRING

    Select the UMI library type, which sets the batch option for UMI correction. Options: random-duplex, random-simplex, nonrandom-duplex.

    --umi-nonrandom-whitelist $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.

    --umi-correction-table $PATH

    If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.

    --umi-min-supporting-reads INT

    Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).

    --umi-metrics-interval-file $BED

    Target region in BED format.

    For more information see: UMI Options.

    SNV

    Option
    Description

    --vc-target-bed

    Limit variant calling to region of interest.

    --vc-combine-phased-variants-distance INT

    Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)

    --vc-emit-ref-confidence GVCF

    To enable gVCF output.

    --vc-enable-vcf-output

    To enable VCF file output during a gVCF run, set to true. The default value is false.

    --vc-systematic-noise $PATH

    Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.).

    --vc-somatic-hotspots $PATH

    DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.

    High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports 3 main settings for improved sensitivity on low VAF variant calls.

    High Sensitivity Option
    Description

    --vc-target-vaf FLOAT

    The default is 0.03 (3%). Set to e.g. 0.01 to improve SNV sensitivity on 1% VAF variants (assuming sufficient coverage).

    --vc-enable-umi-solid true

    Optimized for 1% and higher VAFs on UMI (or read position collapsed) samples with approx 300-1000X coverage.

    --vc-enable-umi-liquid true

    Optimized for 0.1% and higher VAFs on UMI samples with 1000X or higher coverage as expected in liquid biopsies.
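The coverage ranges above can be put in perspective with simple arithmetic: the expected number of reads supporting a variant is approximately coverage multiplied by VAF. The helper below is a hypothetical sketch of that calculation only. It does not reflect how DRAGEN scores variants.

```shell
# Hypothetical helper: expected number of alt-supporting reads
# for a given coverage ($1) and variant allele frequency ($2).
expected_alt_reads() {
    LC_ALL=C awk -v depth="$1" -v vaf="$2" 'BEGIN { printf "%.1f\n", depth * vaf }'
}
```

For example, `expected_alt_reads 300 0.01` and `expected_alt_reads 1000 0.001` show that a 1% variant at 300X and a 0.1% variant at 1000X are each supported by only a handful of reads, which is why sufficient coverage is assumed for the high sensitivity settings.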

    For more detail on the small variant caller in somatic mode, see Somatic Mode.

    HLA

    Option
    Description

    --enable-hla

    Enable HLA typer (this setting by default will only genotype class 1 genes)

    --hla-as-filter-min-threshold

    Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.

    --hla-as-filter-ratio-threshold

    Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.

    --hla-enable-class-2

    Extend genotyping to HLA class 2 genes (default=true).

    CNV

    Option
    Description

    --cnv-enable-gcbias-correction true

    Enable or disable GC bias correction when generating target counts.

    --cnv-segmentation-mode $SEG_MODE

    Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.

    --cnv-segmentation-bed $PATH

    If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.

    --cnv-population-b-allele-vcf $POP_VCF

    Specify a population SNP VCF. This option is available for both the germline and the somatic workflows. In germline it is only supported for WGS. In somatic, it can be used when a matched normal sample is not available and analysis must be performed in tumor-only mode.

    For more information, see CNV Calling.

    Annotation

    For instructions on how to download the Nirvana annotation database, please refer to Nirvana

    TMB

    The Tumor-Normal pipeline is more effective than the Tumor-Only pipeline at removing or tagging germline variants, so the Tumor-Only pipeline may report somewhat elevated TMB values. The TMB proxi filter is an optional setting applied on top of the regular database germline filter. It aggressively filters additional germline variants based on allele frequencies.

    Option
    Description

    --tmb-vaf-threshold FLOAT

    Variant minimum allele frequency for usable variants (default=0.05).

    --vc-callability-tumor-thresh INT

    Required read coverage to use a site (default=50).

    --tmb-enable-proxi-filter BOOL

    Use variant VAF information to increase germline filtering. Recommended for Tumor-Only (TO), but not for Tumor-Normal (TN). May be overly aggressive at tagging variants as germline (default=false).

    See the user guide: TMB Germline Variants.

    MSI

    Microsatellite sites file can be downloaded here: Product Files.

    Option
    Description and recommended setting

    --msi-coverage-threshold INT

    Minimum coverage for a microsatellite: 60 (default)

    --msi-distance-threshold FLOAT

    Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default)
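To give a feel for the distance threshold, the sketch below computes a Jensen-Shannon distance between two count distributions, for example repeat-length histograms for one microsatellite in tumor and normal. It uses the common definition (square root of the Jensen-Shannon divergence with base-2 logarithms, giving values between 0 and 1). How DRAGEN builds the distributions and whether it uses exactly this definition is not stated here, so treat the helper `js_distance` as a hypothetical illustration.

```shell
# Hypothetical helper: Jensen-Shannon distance between two histograms,
# each given as a space-separated string of non-negative counts.
js_distance() {
    LC_ALL=C awk -v p="$1" -v q="$2" 'BEGIN {
        n = split(p, P, " "); m = split(q, Q, " ")
        if (n != m || n == 0) exit 1          # need equal-length, non-empty inputs
        for (i = 1; i <= n; i++) { sp += P[i]; sq += Q[i] }
        if (sp <= 0 || sq <= 0) exit 1        # cannot normalize an empty histogram
        for (i = 1; i <= n; i++) {
            a = P[i] / sp; b = Q[i] / sq; mid = (a + b) / 2
            if (a > 0) d += 0.5 * a * log(a / mid) / log(2)
            if (b > 0) d += 0.5 * b * log(b / mid) / log(2)
        }
        if (d < 0) d = 0                      # guard against rounding below zero
        printf "%.4f\n", sqrt(d)
    }'
}
```

Identical histograms give a distance of 0 and completely non-overlapping histograms give 1, so a threshold of 0.1 corresponds to a modest shift between the tumor and normal distributions.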

    SV

    Option
    Description

    --sv-call-regions-bed

    Specifies a BED file containing the set of regions to call. The file can optionally be compressed in gzip or bgzip format.

    --sv-exclusion-bed

    Specifies a BED file containing the set of regions to exclude from SV calling. The file can optionally be compressed in gzip or bgzip format.

    --enable-variant-deduplication true

    Relevant when both the SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent duplicate variant calls in genes such as FLT3 and KMT2A. Filters out all small indels in the structural variant VCF that also appear, and pass, in the small variant VCF. DRAGEN creates a new VCF containing the SV VCF variants that do not match any variant in the SNV VCF. The deduplicated SV VCF file name is the prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.

    --sv-systematic-noise $BEDPE

    Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bgzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels.

    --sv-somatic-ins-tandup-hotspot-regions-bed $BED

    Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with 300 bp of padding on both sides.

    --sv-min-candidate-variant-size

    Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.

    Option
    Recommended Value for Liquid Tumors (e.g. AML/MLL)

    --sv-min-scored-variant-size $INT

    100000

    For more information, see Structural Variant Calling.

    Resource Files

    DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.

    SNV Systematic Noise

    Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.

    DRAGEN has prebuilt systematic noise files for WES/WGS. For high sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.

    Prebuilt

    Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.

    Prebuilt WES/WGS noise files
    Description

    WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FF

    FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WGS FFPE (only hg38)

    WES_hg38_v2.0.0_systematic_noise.snv.bed.gz

    For WES FF and FFPE

    Custom

    This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.

    For WES and WGS pipelines, gather the full paths to the small variant hard-filtered VCFs (not gVCFs) from step 1 and create a list file ${VCF_LIST} with one path per line.
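Before passing the list to step 2, it can help to confirm that every entry points to a readable file. The check below is a hypothetical pre-flight helper, not a DRAGEN feature. It reports blank lines and missing paths and returns a non-zero status if it finds any.

```shell
# Hypothetical pre-flight check: returns 0 only if every line of the
# list file names a readable file and there are no blank lines.
check_list_file() {
    list_file="$1"
    status=0
    # The "|| [ -n ... ]" part also handles a last line without a trailing newline.
    while IFS= read -r path || [ -n "$path" ]; do
        if [ -z "$path" ]; then
            echo "blank line in $list_file" >&2
            status=1
        elif [ ! -r "$path" ]; then
            echo "missing or unreadable: $path" >&2
            status=1
        fi
    done < "$list_file"
    return "$status"
}
```

For example, `check_list_file "$VCF_LIST" && echo "list OK"` can be run before launching the noise file build.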

    Step 2. Generate the final noise file.

    This step generates a BED file containing mean and max noise estimates per position. The file can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions that could be troubleshot (e.g. by modifying assay settings or DRAGEN settings) or blocklisted.

    The SNV systematic noise files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.

    SV Systematic Noise

    SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available. SV systematic noise files are considered experimental in WES and Panels.

    Prebuilt

    Prebuilt WGS SV systematic noise files can be downloaded here: Product Files.

    Prebuilt WGS noise files
    Description

    WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

    For WGS, FF/FFPE

    IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz

    For WGS, HEME

    Custom

    Custom systematic noise files can be generated for WES or Panels. For best accuracy the normal samples should ideally closely match the sequencer, sample type, library prep and coverage of the tumor samples of interest. It is typically recommended to use 30 - 100 normals when building a noise file, but fewer can be used.

    Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.

    Step 2. Build the BEDPE file using input VCFs from previous step.

    Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.

    CNV Panel of Normals (PON)

    For CNV PON requirements and generation options see CNV Preprocessing | Panel of Normals.

    If a matched normal is available it is recommended to include it in the PON.

    Step 1. Generate CNV target counts of individual normal samples.

    Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.

    Step 2. CNV combined counts file generation.

    $CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.

    --tumor-fastq-list $PATH 
    --tumor-fastq-list-sample-id $STRING 
    --tumor-fastq1 $PATH 
    --tumor-fastq2 $PATH 
    --RGSM-tumor $STRING 
    --RGID-tumor $STRING 
    --tumor-bam-input $PATH 
    --tumor-cram-input $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --vc-detect-systematic-noise=true 
    --vc-target-bed $VC_TARGET_BED          #Region assessed in assay 
    --vc-target-bed-padding 500 
    --vc-emit-ref-confidence BP_RESOLUTION 
    --vc-enable-vcf-output true 
    --vc-enable-germline-tagging=true 
    --variant-annotation-data $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --build-sys-noise-vcfs-list ${GVCF_LIST} 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --sv-detect-systematic-noise true 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --sv-build-systematic-noise-vcfs-list $VCF_LIST    #one VCF per line
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    # CNV 
    --enable-cnv true 
    --cnv-target-bed $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --enable-cnv true 
    --cnv-generate-combined-counts true 
    --cnv-normals-list $CNV_NORMALS_LIST 
    --tumor-fastq-list $PATH 
    --tumor-fastq-list-sample-id $STRING 
    --fastq-list $PATH 
    --fastq-list-sample-id $STRING 
    --tumor-fastq1 $PATH 
    --tumor-fastq2 $PATH 
    --RGSM-tumor $STRING 
    --RGID-tumor $STRING 
    --fastq-file1 $PATH 
    --fastq-file2 $PATH 
    --RGSM $STRING 
    --RGID $STRING 
    --tumor-bam-input $PATH 
    --bam-input $PATH 
    --tumor-cram-input $PATH 
    --cram-input $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --vc-detect-systematic-noise=true 
    --vc-target-bed $VC_TARGET_BED          #Region assessed in assay 
    --vc-target-bed-padding 500 
    --vc-enable-germline-tagging=true 
    --variant-annotation-data $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --build-sys-noise-vcfs-list ${VCF_LIST} 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --sv-detect-systematic-noise true 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --sv-build-systematic-noise-vcfs-list $VCF_LIST    #one VCF per line
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    # CNV 
    --enable-cnv true 
    --cnv-target-bed $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --enable-cnv true 
    --cnv-generate-combined-counts true 
    --cnv-normals-list $CNV_NORMALS_LIST 
    --tumor-fastq-list $PATH 
    --tumor-fastq-list-sample-id $STRING 
    --tumor-fastq1 $PATH 
    --tumor-fastq2 $PATH 
    --RGSM-tumor $STRING 
    --RGID-tumor $STRING 
    --tumor-bam-input $PATH 
    --tumor-cram-input $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --umi-enable true 
    --umi-source STRING                     #default='qname' 
    --umi-library-type STRING               #see 'UMI' 
    --vc-detect-systematic-noise=true 
    --vc-target-bed $VC_TARGET_BED          #Region assessed in assay 
    --vc-target-bed-padding 500 
    --vc-emit-ref-confidence BP_RESOLUTION 
    --vc-enable-vcf-output true 
    --vc-enable-umi-solid true 
    --vc-enable-germline-tagging=true 
    --variant-annotation-data $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --build-sys-noise-vcfs-list ${GVCF_LIST} 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --umi-enable true 
    --umi-source STRING                     #default='qname' 
    --umi-library-type STRING               #see 'UMI' 
    --sv-detect-systematic-noise true 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --sv-build-systematic-noise-vcfs-list $VCF_LIST    #one VCF per line
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    # CNV 
    --enable-cnv true 
    --cnv-target-bed $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --enable-cnv true 
    --cnv-generate-combined-counts true 
    --cnv-normals-list $CNV_NORMALS_LIST 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
    --fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    # UMI 
    --umi-enable true 
    --umi-source STRING                     #Default='qname' 
    --umi-library-type STRING               #e.g. random-duplex 
    --tumor-normal-has-umi STRING           #Sample(s) containing UMI ['tumor', 'both']. 
    --umi-min-supporting-reads 1            #Default=2 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-target-bed $VC_TARGET_BED 
    --vc-systematic-noise $PATH             #Recommended 
    --vc-excluded-regions-bed $BED          #FFPE: optionally mask ALUs 
    --vc-sq-filter-threshold 17.5           #recommended in tumor-normal UMI mode 
    # SV 
    --enable-sv true 
    --sv-exome true 
    --sv-call-regions-bed $SV_TARGET_BED 
    # CNV 
    --enable-cnv true 
    --cnv-use-somatic-vc-baf true 
    --cnv-target-bed $PATH 
    --cnv-combined-counts $PATH             #CNV PON 
    # HRD Scoring 
    --enable-hrd true                       #requires CNV 
    # Annotation 
    --variant-annotation-data $PATH
    --enable-variant-annotation true 
    # TMB 
    --enable-tmb true 
    # HLA genotyper 
    --enable-hla true 
    # Microsatellite Instability (MSI) 
    --msi-command tumor-normal 
    --msi-microsatellites-file $PATH 
    --msi-coverage-threshold 40 
    --tumor-fastq-list $PATH 
    --tumor-fastq-list-sample-id $STRING 
    --fastq-list $PATH 
    --fastq-list-sample-id $STRING 
    --tumor-fastq1 $PATH 
    --tumor-fastq2 $PATH 
    --RGSM-tumor $STRING 
    --RGID-tumor $STRING 
    --fastq-file1 $PATH 
    --fastq-file2 $PATH 
    --RGSM $STRING 
    --RGID $STRING 
    --tumor-bam-input $PATH 
    --bam-input $PATH 
    --tumor-cram-input $PATH 
    --cram-input $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --umi-enable true 
    --umi-source STRING                     #default='qname' 
    --umi-library-type STRING               #see 'UMI' 
    --vc-detect-systematic-noise=true 
    --vc-target-bed $VC_TARGET_BED          #Region assessed in assay 
    --vc-target-bed-padding 500 
    --vc-enable-germline-tagging=true 
    --variant-annotation-data $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --build-sys-noise-vcfs-list ${VCF_LIST} 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --umi-enable true 
    --umi-source STRING                     #default='qname' 
    --umi-library-type STRING               #see 'UMI' 
    --sv-detect-systematic-noise true 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --sv-build-systematic-noise-vcfs-list $VCF_LIST  #one VCF per line
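As an illustration only, the list file expected by --sv-build-systematic-noise-vcfs-list (one VCF per line) could be assembled with a small shell helper. The function name make_vcf_list, the directory layout, and the .vcf.gz suffix are assumptions for this sketch, not part of DRAGEN.

```shell
# Hypothetical helper: write one VCF path per line into a list file.
# Assumes the per-sample VCFs sit in a single directory and end in .vcf.gz.
make_vcf_list() {
    vcf_dir=$1
    list_file=$2
    : > "$list_file"                      # create or truncate the list file
    for vcf in "$vcf_dir"/*.vcf.gz; do
        [ -e "$vcf" ] || continue         # skip the literal pattern if nothing matched
        printf '%s\n' "$vcf" >> "$list_file"
    done
}
```

For example, make_vcf_list /path/to/normal_vcfs noise_vcfs.txt writes the list, and setting VCF_LIST=noise_vcfs.txt then supplies it to the option shown above. Both paths are placeholders.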
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    # CNV 
    --enable-cnv true 
    --cnv-target-bed $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --enable-cnv true 
    --cnv-generate-combined-counts true 
    --cnv-normals-list $CNV_NORMALS_LIST 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
    --fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    # UMI 
    --umi-enable true 
    --umi-source $STRING                    #Default='qname'
    --umi-library-type $STRING              #e.g. random-duplex
    --tumor-normal-has-umi $STRING          #Sample(s) containing UMI ['tumor', 'both'].
    --umi-min-supporting-reads 1            #Default=2 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-target-bed $VC_TARGET_BED 
    --vc-systematic-noise $PATH             #Recommended 
    --vc-excluded-regions-bed $BED          #FFPE: optionally mask ALUs 
    --vc-enable-umi-solid true              #>= 1% VAF 
    --vc-sq-filter-threshold 17.5           #recommended in tumor-normal UMI mode 
    # SV 
    --enable-sv true 
    --sv-exome true 
    --sv-call-regions-bed $SV_TARGET_BED 
    # CNV 
    --enable-cnv true 
    --cnv-use-somatic-vc-baf true 
    --cnv-target-bed $PATH 
    --cnv-combined-counts $PATH             #CNV PON 
    # Annotation 
    --variant-annotation-data $PATH
    --enable-variant-annotation true 
    # TMB 
    --enable-tmb true 
    # HLA genotyper 
    --enable-hla true 
    --hla-as-filter-min-threshold 29.0      #panel specific setting 
    --hla-as-filter-ratio-threshold 0.85    #panel specific setting 
    # Microsatellite Instability (MSI) 
    --msi-command tumor-normal 
    --msi-microsatellites-file $PATH 
    --msi-coverage-threshold 40 
    --tumor-fastq-list $PATH 
    --tumor-fastq-list-sample-id $STRING 
    --fastq-list $PATH 
    --fastq-list-sample-id $STRING 
    --tumor-fastq1 $PATH 
    --tumor-fastq2 $PATH 
    --RGSM-tumor $STRING 
    --RGID-tumor $STRING 
    --fastq-file1 $PATH 
    --fastq-file2 $PATH 
    --RGSM $STRING 
    --RGID $STRING 
    --tumor-bam-input $PATH 
    --bam-input $PATH 
    --tumor-cram-input $PATH 
    --cram-input $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --umi-enable true 
    --umi-source $STRING                    #default='qname'
    --umi-library-type $STRING              #see 'UMI'
    --vc-detect-systematic-noise=true 
    --vc-target-bed $VC_TARGET_BED          #Region assessed in assay 
    --vc-target-bed-padding 500 
    --vc-emit-ref-confidence BP_RESOLUTION 
    --vc-enable-vcf-output true 
    --vc-enable-umi-solid true 
    --vc-enable-germline-tagging=true 
    --variant-annotation-data $PATH 
    --intermediate-results-dir $PATH 
    --output-directory $PATH 
    --output-file-prefix $STRING 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --build-sys-noise-vcfs-list ${GVCF_LIST} 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --umi-enable true 
    --umi-source $STRING                    #default='qname'
    --umi-library-type $STRING              #see 'UMI'
    --sv-detect-systematic-noise true 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --sv-build-systematic-noise-vcfs-list $VCF_LIST  #one VCF per line
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    # CNV 
    --enable-cnv true 
    --cnv-target-bed $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --enable-cnv true 
    --cnv-generate-combined-counts true 
    --cnv-normals-list $CNV_NORMALS_LIST 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    # UMI 
    --umi-enable true 
    --umi-source $STRING                    #Default='qname'
    --umi-library-type $STRING              #e.g. random-duplex
    # Small variant caller 
    --enable-variant-caller true 
    --vc-target-bed $VC_TARGET_BED 
    --vc-systematic-noise $PATH             #Required 
    --vc-enable-umi-liquid true             #>= 0.1% VAF 
    # SV 
    --enable-sv true 
    --sv-exome true 
    --sv-call-regions-bed $SV_TARGET_BED 
    # CNV 
    --enable-cnv true 
    --cnv-population-b-allele-vcf $POP_VCF 
    --cnv-target-bed $PATH 
    --cnv-combined-counts $PATH             #CNV PON 
    # Annotation 
    --variant-annotation-data $PATH
    --vc-enable-germline-tagging true 
    # TMB 
    --enable-tmb true 
    --vc-callability-tumor-thresh 1000 
    --tmb-vaf-threshold 0.002 
    --tmb-enable-proxi-filter true          #Optional for Tumor-Only 
    # HLA genotyper 
    --enable-hla true 
    --hla-as-filter-min-threshold 29.0      #panel specific setting 
    --hla-as-filter-ratio-threshold 0.85    #panel specific setting 
    # Microsatellite Instability (MSI) 
    --msi-command tumor-only 
    --msi-ref-normal-input $PATH 
    --msi-microsatellites-file $PATH 
    --msi-coverage-threshold 40 
    --tumor-fastq-list $PATH 
    --tumor-fastq-list-sample-id $STRING 
    --tumor-fastq1 $PATH 
    --tumor-fastq2 $PATH 
    --RGSM-tumor $STRING 
    --RGID-tumor $STRING 
    --tumor-bam-input $PATH 
    --tumor-cram-input $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --umi-enable true 
    --umi-source $STRING                    #default='qname'
    --umi-library-type $STRING              #see 'UMI'
    --vc-detect-systematic-noise=true 
    --vc-target-bed $VC_TARGET_BED          #Region assessed in assay 
    --vc-target-bed-padding 500 
    --vc-emit-ref-confidence BP_RESOLUTION 
    --vc-enable-vcf-output true 
    --vc-enable-umi-liquid true 
    --vc-enable-germline-tagging=true 
    --variant-annotation-data $PATH 
    --intermediate-results-dir $PATH 
    --output-directory $PATH 
    --output-file-prefix $STRING 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --build-sys-noise-vcfs-list ${GVCF_LIST} 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --umi-enable true 
    --umi-source $STRING                    #default='qname'
    --umi-library-type $STRING              #see 'UMI'
    --sv-detect-systematic-noise true 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --sv-build-systematic-noise-vcfs-list $VCF_LIST  #one VCF per line
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    # CNV 
    --enable-cnv true 
    --cnv-target-bed $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --enable-cnv true 
    --cnv-generate-combined-counts true 
    --cnv-normals-list $CNV_NORMALS_LIST 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
    --fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    --enable-duplicate-marking true         #default=true 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-target-bed $VC_TARGET_BED 
    --vc-systematic-noise $PATH             #Recommended 
    --vc-excluded-regions-bed $BED          #FFPE: optionally mask ALUs 
    --vc-target-vaf 0.03                    #Default = 0.03 (>= 3% VAF) 
    # SV 
    --enable-sv true 
    --sv-exome true 
    --sv-call-regions-bed $SV_TARGET_BED 
    # CNV 
    --enable-cnv true 
    --cnv-use-somatic-vc-baf true 
    --cnv-target-bed $PATH 
    --cnv-combined-counts $PATH             #CNV PON 
    # Annotation 
    --variant-annotation-data $PATH
    --enable-variant-annotation true 
    # TMB 
    --enable-tmb true 
    # HLA genotyper 
    --enable-hla true 
    --hla-as-filter-min-threshold 29.0      #panel specific setting 
    --hla-as-filter-ratio-threshold 0.85    #panel specific setting 
    # Microsatellite Instability (MSI) 
    --msi-command tumor-normal 
    --msi-microsatellites-file $PATH 
    --msi-coverage-threshold 40 
    --tumor-fastq-list $PATH 
    --tumor-fastq-list-sample-id $STRING 
    --fastq-list $PATH 
    --fastq-list-sample-id $STRING 
    --tumor-fastq1 $PATH 
    --tumor-fastq2 $PATH 
    --RGSM-tumor $STRING 
    --RGID-tumor $STRING 
    --fastq-file1 $PATH 
    --fastq-file2 $PATH 
    --RGSM $STRING 
    --RGID $STRING 
    --tumor-bam-input $PATH 
    --bam-input $PATH 
    --tumor-cram-input $PATH 
    --cram-input $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --vc-detect-systematic-noise=true 
    --vc-target-bed $VC_TARGET_BED          #Region assessed in assay 
    --vc-target-bed-padding 500 
    --vc-emit-ref-confidence BP_RESOLUTION 
    --vc-enable-vcf-output true 
    --vc-enable-germline-tagging=true 
    --variant-annotation-data $PATH 
    --intermediate-results-dir $PATH 
    --output-directory $PATH 
    --output-file-prefix $STRING 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --build-sys-noise-vcfs-list ${GVCF_LIST} 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --sv-detect-systematic-noise true 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --sv-build-systematic-noise-vcfs-list $VCF_LIST  #one VCF per line
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    # CNV 
    --enable-cnv true 
    --cnv-target-bed $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --enable-cnv true 
    --cnv-generate-combined-counts true 
    --cnv-normals-list $CNV_NORMALS_LIST 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    # Inputs 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    # Mapper 
    --enable-map-align true                 #optional with BAM/CRAM input 
    --enable-map-align-output true          #optionally save the output BAM 
    --enable-sort true                      #default=true 
    # UMI 
    --umi-enable true 
    --umi-source $STRING                    #Default='qname'
    --umi-library-type $STRING              #e.g. random-duplex
    --umi-min-supporting-reads 1            #Default=2 
    # Small variant caller 
    --enable-variant-caller true 
    --vc-target-bed $VC_TARGET_BED 
    --vc-systematic-noise $PATH             #Required 
    --vc-excluded-regions-bed $BED          #FFPE: optionally mask ALUs 
    # SV 
    --enable-sv true 
    --sv-exome true 
    --sv-call-regions-bed $SV_TARGET_BED 
    # CNV 
    --enable-cnv true 
    --cnv-population-b-allele-vcf $POP_VCF 
    --cnv-target-bed $PATH 
    --cnv-combined-counts $PATH             #CNV PON 
    # HRD Scoring 
    --enable-hrd true                       #requires CNV 
    # Annotation 
    --variant-annotation-data $PATH
    --vc-enable-germline-tagging true 
    # TMB 
    --enable-tmb true 
    --tmb-enable-proxi-filter true          #Optional for Tumor-Only 
    # HLA genotyper 
    --enable-hla true 
    # Microsatellite Instability (MSI) 
    --msi-command tumor-only 
    --msi-ref-normal-input $PATH 
    --msi-microsatellites-file $PATH 
    --msi-coverage-threshold 40 
    --tumor-fastq-list $PATH 
    --tumor-fastq-list-sample-id $STRING 
    --tumor-fastq1 $PATH 
    --tumor-fastq2 $PATH 
    --RGSM-tumor $STRING 
    --RGID-tumor $STRING 
    --tumor-bam-input $PATH 
    --tumor-cram-input $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --umi-enable true 
    --umi-source $STRING                    #default='qname'
    --umi-library-type $STRING              #see 'UMI'
    --vc-detect-systematic-noise=true 
    --vc-target-bed $VC_TARGET_BED          #Region assessed in assay 
    --vc-target-bed-padding 500 
    --vc-enable-germline-tagging=true 
    --variant-annotation-data $PATH 
    --intermediate-results-dir $PATH 
    --output-directory $PATH 
    --output-file-prefix $STRING 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --build-sys-noise-vcfs-list ${VCF_LIST} 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    --umi-enable true 
    --umi-source $STRING                    #default='qname'
    --umi-library-type $STRING              #see 'UMI'
    --sv-detect-systematic-noise true 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --sv-build-systematic-noise-vcfs-list $VCF_LIST  #one VCF per line
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
    --tumor-fastq-list-sample-id $STRING 
    # CNV 
    --enable-cnv true 
    --cnv-target-bed $PATH 
      
    /opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
    --ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
    --output-directory $OUTPUT 
    --intermediate-results-dir $PATH        #e.g. SSD /staging 
    --output-file-prefix $PREFIX 
    --enable-cnv true 
    --cnv-generate-combined-counts true 
    --cnv-normals-list $CNV_NORMALS_LIST 

    --umi-emit-multiplicity both

    Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see Merge Duplex UMIs.

    --umi-start-mask-length INT

    Number of additional bases to ignore from the start of the read. The default is 0. To reduce false positives, optionally set to 1.

    --umi-end-mask-length INT

    Number of additional bases to ignore from the end of the read. The default is 0. To reduce false positives, optionally set to 3.

    --vc-enable-liquid-tumor-mode true

    Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.

    --vc-override-tumor-pcr-params-with-normal false

    Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.

    --vc-sq-filter-threshold $NUM

    Threshold for the sensitivity-specificity tradeoff using the SQ score. The pipeline-specific default threshold is 2. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold $INT

    Threshold for the sensitivity-specificity tradeoff using the AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 60. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold-in-hotspot $INT

    Threshold for the sensitivity-specificity tradeoff using the AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 20. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-excluded-regions-bed $BED

    Hard filter variants that overlap the regions in this BED file. ALU regions comprise approximately 11% of the genome and are often exceptionally noisy in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU BED files can be downloaded as part of the Bed File Collection.

    --sv-min-scored-variant-size

    After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.
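The SQ and AQ thresholds above are cutoffs on a per-variant score. As a rough sketch of the tradeoff, assuming a call is kept when its score is greater than or equal to the threshold, the hypothetical helper count_passing below counts how many scores in a made-up two-column table (id, score) survive a given cutoff. It does not read DRAGEN output.

```shell
# Hypothetical helper: count rows of an "id score" table whose score
# meets or exceeds the threshold given as the first argument.
# The keep-if-greater-or-equal rule is an assumption made for illustration.
count_passing() {
    threshold=$1
    awk -v t="$threshold" '$2 + 0 >= t + 0 { n++ } END { print n + 0 }'
}

# Made-up scores, one variant per line.
example_scores='v1 1.5
v2 3.0
v3 17.5
v4 20'

printf '%s\n' "$example_scores" | count_passing 2    # looser cutoff
printf '%s\n' "$example_scores" | count_passing 18   # stricter cutoff
```

Running the same table through a higher threshold never yields a larger count, which is the sensitivity cost of a stricter cutoff.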

    --umi-emit-multiplicity both

    Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see Merge Duplex UMIs.

    --umi-start-mask-length INT

    Number of additional bases to ignore from the start of the read. The default is 0. To reduce false positives, optionally set to 1.

    --umi-end-mask-length INT

    Number of additional bases to ignore from the end of the read. The default is 0. To reduce false positives, optionally set to 3.

    --vc-enable-liquid-tumor-mode true

    Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.

    --vc-override-tumor-pcr-params-with-normal false

    Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.

    --vc-sq-filter-threshold $NUM

    Threshold for the sensitivity-specificity tradeoff using the SQ score. The pipeline-specific default threshold is 4. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold $INT

    Threshold for the sensitivity-specificity tradeoff using the AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 60. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold-in-hotspot $INT

    Threshold for the sensitivity-specificity tradeoff using the AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 20. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-excluded-regions-bed $BED

    Hard filter variants that overlap the regions in this BED file. ALU regions comprise approximately 11% of the genome and are often exceptionally noisy in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU BED files can be downloaded as part of the Bed File Collection.

    --sv-min-scored-variant-size

    After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.

    --umi-emit-multiplicity both

    Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see Merge Duplex UMIs.

    --umi-start-mask-length INT

    Number of additional bases to ignore from the start of the read. The default is 0. To reduce false positives, optionally set to 1.

    --umi-end-mask-length INT

    Number of additional bases to ignore from the end of the read. The default is 0. To reduce false positives, optionally set to 3.

    --tumor-normal-has-umi STRING

    Specify if only the tumor, or if both the tumor and normal have UMIs. Options: 'both','tumor'.

    --vc-sq-filter-threshold $NUM

    Threshold for the sensitivity-specificity tradeoff using the SQ score. The pipeline-specific default threshold is 4. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold $INT

    Threshold for the sensitivity-specificity tradeoff using the AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold-in-hotspot $INT

    Threshold for the sensitivity-specificity tradeoff using the AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-excluded-regions-bed $BED

    Hard filter variants that overlap the regions in this BED file. ALU regions comprise approximately 11% of the genome and are often exceptionally noisy in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU BED files can be downloaded as part of the Bed File Collection.

    --sv-min-scored-variant-size

    After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.

    --umi-emit-multiplicity both

    Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see Merge Duplex UMIs.

    --umi-start-mask-length INT

    Number of additional bases to ignore from the start of the read. The default is 0. To reduce false positives, optionally set to 1.

    --umi-end-mask-length INT

    Number of additional bases to ignore from the end of the read. The default is 0. To reduce false positives, optionally set to 3.

    --tumor-normal-has-umi STRING

    Specify if only the tumor, or if both the tumor and normal have UMIs. Options: 'both','tumor'.

    --vc-sq-filter-threshold $NUM

    Threshold for the sensitivity-specificity tradeoff using the SQ score. The pipeline-specific default threshold is 4. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold $INT

    Threshold for the sensitivity-specificity tradeoff using the AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold-in-hotspot $INT

    Threshold for the sensitivity-specificity tradeoff using the AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-excluded-regions-bed $BED

    Hard filter variants that overlap the regions in this BED file. ALU regions comprise approximately 11% of the genome and are often exceptionally noisy in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU BED files can be downloaded as part of the Bed File Collection.

    --sv-min-scored-variant-size

    After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.

    --vc-sq-filter-threshold $NUM

    Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 17.5. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold $INT

    Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-systematic-noise-filter-threshold-in-hotspot $INT

    Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.

    --vc-excluded-regions-bed $BED

    Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome, and are often exceptionally noisy regions in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU bed files can be downloaded as part of the Bed File Collection.

    --sv-min-scored-variant-size

    After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.

    Germline-aware Mode

    Prepare a Reference Genome

    Before a reference genome can be used with DRAGEN, it must be converted from FASTA format into a custom binary format for use with the DRAGEN hardware. The options used in this preprocessing step offer tradeoffs between performance and mapping quality.

    Pre-built DRAGEN reference genomes are available for download in the Illumina customer portal. If you find that performance and mapping quality with these are adequate, there is a good chance that you can simply work with these supplied reference genomes. Depending on your read lengths and other particular aspects of your application, you may be able to improve mapping quality and/or performance by tuning the reference preprocessing options.

    Hash Table Background

    The DRAGEN mapper extracts many overlapping seeds (subsequences or K-mers) from each read, and looks up those seeds in a hash table residing in memory on its PCIe card, to identify locations in the reference genome where the seeds match. Hash tables are ideal for extremely fast lookups of exact matches. The DRAGEN hash table must be constructed from a chosen reference genome using the --build-hash-table option, which extracts many overlapping seeds from the reference genome, populates them into records in the hash table, and saves the hash table as a binary file.

    Automatic Reference Detection

    DRAGEN will attempt to detect the provided reference in order to automatically apply recommended resources and settings. There are four human references that DRAGEN can detect: hg38, hg19, hs37d5, and chm13v2. DRAGEN is able to detect references that contain a subset of the primary contigs from one of these references, as long as the names and lengths of the detected contigs are consistent with the names and lengths from the standard assemblies of these references.

    In detail, automatic reference detection operates as follows:

    We define a primary contig of a human genome to be an autosome (1-22) or sex chromosome (X,Y). Let F be the input fasta. For each reference genome R in hg38, hg19, hs37d5, and chm13v2, DRAGEN checks if there are any contigs in F that have the same name and length as a primary contig in R, and that there are no contigs in F that have the same name as a contig in R, but with different length. If these conditions hold for exactly one of hg38, hg19, hs37d5, and chm13v2, then that reference is detected and resources may be applied automatically.
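    The detection rule above can be sketched in a few lines of Python. This is only an illustration of the stated conditions, not DRAGEN's implementation: the assembly names and contig lengths in TOY_ASSEMBLIES are made-up placeholders, the function name is hypothetical, and for brevity the conflict check only looks at primary contigs rather than every contig of the reference.

```python
from typing import Dict, Optional

# Hypothetical toy tables (assembly -> primary contig name -> length).
# Real assemblies have 24 primary contigs; these names and numbers are made up.
TOY_ASSEMBLIES: Dict[str, Dict[str, int]] = {
    "asmA": {"chr1": 1000, "chr2": 900},
    "asmB": {"chr1": 1100, "chr2": 950},
    "asmC": {"1": 1100, "2": 950},
}


def detect_reference(
    fasta_contigs: Dict[str, int],
    assemblies: Dict[str, Dict[str, int]] = TOY_ASSEMBLIES,
) -> Optional[str]:
    """Return the single assembly consistent with the FASTA, or None."""
    consistent = []
    for assembly, primary in assemblies.items():
        # Condition 1: at least one FASTA contig matches a primary contig
        # in both name and length.
        has_match = any(
            primary.get(name) == length for name, length in fasta_contigs.items()
        )
        # Condition 2: no FASTA contig shares a name but differs in length.
        has_conflict = any(
            name in primary and primary[name] != length
            for name, length in fasta_contigs.items()
        )
        if has_match and not has_conflict:
            consistent.append(assembly)
    # Detection succeeds only when exactly one assembly satisfies both conditions.
    return consistent[0] if len(consistent) == 1 else None
```

    With these toy tables, a FASTA containing only chr1 of length 1000 is detected as asmA, while a FASTA mixing chr1 from asmA with chr2 from asmB is not detected at all, because every candidate has a length conflict.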

    The DRAGEN hash table builder will automatically apply decoy contigs and mask bed files to a detected reference. Other pipelines may also apply automatic resources. For example, variant callers may apply machine learning models and target bed files.

    Naming Conventions

    In order for DRAGEN to correctly detect the provided reference, it is important to use the standard naming conventions for each of the four human assemblies that DRAGEN detects:

    Assembly
    Autosome and Sex Chromosome Names

    Reference Seed Interval

    The size of the DRAGEN hash table is proportionate to the number of seeds populated from the reference genome. The default is to populate a seed starting at every position in the reference genome, ie, roughly 3 billion seeds from a human genome. This default requires at least 32 GB of memory on the DRAGEN PCIe board.

    To operate on larger, nonhuman genomes or to reduce hash table congestion, it is possible to populate fewer than all reference seeds using the --ht-ref-seed-interval option to specify an average reference interval. The default interval for 100% population is --ht-ref-seed-interval 1, and 50% population is specified with --ht-ref-seed-interval 2. The population interval does not need to be an integer. For example, --ht-ref-seed-interval 1.2 indicates 83.3% population, with mostly 1-base and some 2-base intervals to achieve a 1.2 base interval on average.
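    The relationship between the seed interval and the populated fraction is simply the reciprocal. The sketch below reproduces the 83.3% figure and shows one way a fractional interval can be realized with 1-base and 2-base steps. The function names are hypothetical, and the spacing scheme is an assumption chosen for illustration; how DRAGEN actually spaces the seeds is not stated here.

```python
from fractions import Fraction
from typing import List


def population_fraction(interval: float) -> float:
    """Fraction of reference positions that receive a seed."""
    if interval < 1:
        # Assumption: interval 1 (every position) is the densest setting.
        raise ValueError("seed interval is assumed to be >= 1")
    return 1.0 / interval


def seed_positions(count: int, interval: float) -> List[int]:
    """Illustrative spacing: the i-th seed starts at floor(i * interval)."""
    step = Fraction(str(interval))  # exact arithmetic, e.g. 1.2 -> 6/5
    return [int(i * step) for i in range(count)]
```

    For example, population_fraction(1.2) is about 0.833, and seed_positions(6, 1.2) gives starts 0, 1, 2, 3, 4, 6: four 1-base steps followed by one 2-base step, for an average interval of 1.2.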

    Hash Table Occupancy

    It is characteristic of hash tables that they are allocated a certain size, but always retain some empty records, so they are less than 100% occupied. A healthy amount of empty space is important for quick access to the DRAGEN hash table. Approximately 90% occupancy is a good upper bound. Empty space is important because records are pseudo-randomly placed in the hash table, resulting in an abnormally high number of records in some places. These congested regions can get quite large as the percentage of empty space approaches zero, and queries by the DRAGEN mapper for some seeds can become increasingly slow.

    Hash Table / Seed Length

    The hash table is populated with reference seeds of a single common length. This primary seed length is controlled with the --ht-seed-len option, which defaults to 21.

    The longest primary seed supported is 27 bases when the table is 8 GB to 31.5 GB in size. Generally, longer seeds are better for run time performance, and shorter seeds are better for mapping quality (success rate and accuracy). A longer seed is more likely to be unique in the reference genome, facilitating fast mapping without needing to check many alternative locations. But a longer seed is also more likely to overlap a deviation from the reference (variant or sequencing error), which prevents successful mapping by an exact match of that seed (although another seed from the read may still map), and there are fewer long seed positions available in each read.

    Longer seeds are more appropriate for longer reads, because there are more seed positions available to avoid deviations.

    Seed Length Recommendations

    Hash Table / Seed Extensions

    Due to repetitive sequences, some seeds of any given length match many locations in the reference genome. DRAGEN uses a unique mechanism called seed extension to successfully map such high-frequency seeds. When the software determines that a primary seed occurs at many reference locations, it extends the seed by some number of bases at both ends, to some greater length that is more unique in the reference.

    For example, a 21-base primary seed may be extended by 7 bases at each end to a 35-base extended seed. A 21-base primary seed may match 100 places in the reference. But 35-base extensions of these 100 seed positions may divide into 40 groups of 1-3 identical 35-base seeds. Iterative seed extensions are also supported, and are automatically generated when a large set of identical primary seeds contains various subsets that are best resolved by different extension lengths.
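    As a toy illustration of how extension splits a high-frequency seed into smaller groups, the sketch below extends every occurrence of a primary seed symmetrically and groups the occurrences by their extended sequence. The function names and the tiny reference string are hypothetical, and the real hash table builder chooses extension lengths with cost heuristics that are not modeled here.

```python
from collections import defaultdict
from typing import Dict, List


def extended_seed_length(primary_len: int, ext_per_side: int) -> int:
    """Length of a seed after extending by ext_per_side bases at both ends."""
    return primary_len + 2 * ext_per_side


def group_by_extended_seed(ref: str, seed: str, ext_per_side: int) -> Dict[str, List[int]]:
    """Group the positions of `seed` in `ref` by their extended sequence."""
    groups: Dict[str, List[int]] = defaultdict(list)
    start = ref.find(seed)
    while start != -1:
        lo = start - ext_per_side
        hi = start + len(seed) + ext_per_side
        # Occurrences too close to the ends of the reference cannot be extended.
        if lo >= 0 and hi <= len(ref):
            groups[ref[lo:hi]].append(start)
        start = ref.find(seed, start + 1)
    return dict(groups)
```

    With the toy reference "TTACGAATACGAATACGCC", the 3-base seed "ACG" occurs at positions 2, 8, and 14. Extending by one base per side yields two groups, "TACGA" at positions 2 and 8, and "TACGC" at position 14. Likewise, extended_seed_length(21, 7) gives the 35-base extended seed from the example above.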

    The maximum extended seed length, by default equal to the primary seed length plus 128, can be controlled with the --ht-max-ext-seed-len option. For example, for short reads, it is advisable to set the maximum extended seed shorter than the read length, because extensions longer than the whole read can never match.

    It is also possible to tune how aggressively seeds are extended using the following options (advanced usage):

    --ht-cost-coeff-seed-len

    --ht-cost-coeff-seed-freq

    --ht-cost-penalty

    --ht-cost-penalty-incr

    There is a tradeoff between extension length and hit frequency. Faster mapping can be achieved using longer seed extensions to reduce seed hit frequencies, or more accurate mapping can be achieved by avoiding seed extensions or keeping extensions short, while tolerating the higher hit frequencies that result. Shorter extensions can benefit mapping quality both by fitting seeds better between SNPs, and by finding more candidate mapping locations at which to score alignments. The default extension settings, along with the default seed frequency settings, lean aggressively toward mapping accuracy, with relatively short seed extensions and high hit frequencies.

    The defaults for the seed frequency options are as follows:

    Option
    Default

    Seed Frequency Limit and Target

    One primary or extended seed can match multiple places in the reference genome. All such matches are populated into the hash table, and retrieved when the DRAGEN mapper looks up a corresponding seed extracted from a read. The multiple reference positions are then considered and compared to generate aligned mapper output. However, the DRAGEN software enforces a limit on the number of matches, or frequency, of each seed, which is controlled with the --ht-max-seed-freq option. By default, the frequency limit is 16. In practice, when the software encounters a seed with higher frequency, it extends it to a sufficiently long secondary seed that the frequency of any particular extended seed pattern falls within the limit. However, if a maximum seed extension would still exceed the limit, the seed is rejected, and not populated into the hash table. Instead, a single High Frequency record is populated.

    This seed frequency limit does not tend to impact DRAGEN mapping quality notably, for two reasons. First, because seeds are rejected only when extension fails, only extremely high-frequency primary seeds, typically with many thousands of matches, are rejected. Such seeds are not very useful for mapping. Second, there are other seed positions to check in a given read. If another seed position is unique enough to return one or more matches, the read can still be properly mapped. However, if all seed positions were rejected as high frequency, often this means that the entire read matches similarly well in many reference positions, so even if the read were mapped it would be an arbitrary choice, with very low or zero MAPQ.

    Thus, the default frequency limit of 16 for --ht-max-seed-freq works well. However, it may be decreased or increased, up to a maximum of 256. A higher frequency limit tends to marginally increase the number of reads mapped (especially for short reads), but commonly the additional mapped reads have very low or zero MAPQ. This also tends to slow down DRAGEN mapping, because correspondingly large numbers of possible mappings are occasionally considered.

    In addition to a frequency limit, a target seed frequency can be specified with the --ht-target-seed-freq option. This target frequency is used when extensions are generated for high frequency primary seeds. Extension lengths are chosen with a preference toward extended seed frequencies near the target. The default of 4 for --ht-target-seed-freq means that the software is biased toward generating shorter seed extensions than necessary to map seeds uniquely.

    References with ALT contigs

    When building a reference hash table from a FASTA with ALT contigs, it may be desirable to mask certain regions of high similarity, or to establish liftover relationships between primary and alternate contigs. The recommended approach is masking, as described in the Map-Align section. When hg19 or hg38 ALT contigs are detected, the hash table builder requires a liftover file or a bed file to mask the ALT contigs. If none is provided, a mask bed file from <INSTALL_PATH>/resources/ht_builder/ is used automatically.

    Masked References

    DRAGEN has adopted a masked approach to handle native reference ALT contigs, where strategic regions are masked to increase accuracy. The hash table builder will build the mapper hash table as if the regions that were specified in the argument for ht-mask-bed were masked with N's. The hash table builder will only allow setting one of ht-mask-bed or ht-alt-liftover. Each line in the bed file is expected to contain a contig name, start position (0-based), and end position (1-based), separated by a single tab or space. Lines that start with # are ignored by the hash table builder to allow commenting. Any line with a contig name that is not found in the input FASTA is skipped and logged to the DRAGEN log file. Likewise, lines that describe empty intervals are skipped. If all lines are skipped this way, the hash table builder will issue an error and abort, unless the mask bed file was automatically applied (see Automatic masking). The hash table builder will always issue an error and abort if an interval described in the BED file is outside of the range of the corresponding contig in the FASTA. Lines that are not skipped are written to a file called mask.bed that will be present in the hash table output directory, and whose digest will appear in hash_table.cfg. This file is used when a reference is loaded to the FPGA card to dynamically mask reference.bin.
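    The line-handling rules above can be summarized with a small pre-check that you might run on a custom mask bed file before building. This is a sketch of the documented behavior only, not the hash table builder's code. The function name is hypothetical, and two details are assumptions: blank lines are ignored, and the out-of-range check is applied before the empty-interval check.

```python
from typing import Dict, Iterable, List, Tuple


def filter_mask_bed(
    lines: Iterable[str],
    contig_lengths: Dict[str, int],
    auto_applied: bool = False,
) -> List[Tuple[str, int, int]]:
    """Return the mask intervals that would be kept, following the rules above."""
    kept: List[Tuple[str, int, int]] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # comment lines are ignored (blank lines: assumption)
        fields = line.split()
        contig, start, end = fields[0], int(fields[1]), int(fields[2])
        if contig not in contig_lengths:
            continue  # unknown contig: skipped (DRAGEN also logs these)
        if start < 0 or end > contig_lengths[contig]:
            raise ValueError(f"interval outside contig range: {line}")
        if end <= start:
            continue  # empty interval: skipped
        kept.append((contig, start, end))
    if not kept and not auto_applied:
        raise ValueError("all mask bed lines were skipped")
    return kept
```

    The auto_applied flag mirrors the exception described above: an automatically applied mask file that ends up with no usable lines does not abort the build.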

    Automatic masking

    When running from a FASTA for which hg38 or hg19 is detected (see Automatic Reference Detection), and no argument for ht-mask-bed or ht-alt-liftover was provided, the hash table builder will automatically apply the corresponding bed file for the detected reference from <INSTALL_PATH>/resources/ht_builder/. Note that the hash table builder identifies ALT contigs by name. So when running from an input FASTA that contains ALT contigs with standard names but modified base content, it is recommended to suppress automatic masking by setting ht-suppress-mask=true or by passing a custom mask bed file to ht-mask-bed.

    Handling Decoy Contigs

    The behavior of DRAGEN with respect to the handling of decoy contigs in the reference has changed since version 2.6.

    Starting with DRAGEN 3.x, DRAGEN's hash table builder automatically detects the absence of the decoy contigs from the reference and adds them to the FASTA file prior to building the hash table. The decoys file is found at <INSTALL_PATH>/resources/ht_builder/hs_decoys.fa.gz. If the reference is missing the decoy contigs, then the reads which map to the decoy contigs are artificially marked as unmapped in the output BAM (because the original reference does not have the decoy contigs). This results in an artificially lower mapping rate; however, the accuracy of variant calling is improved by removing false positives caused by decoy reads.

    Illumina recommends using this feature by default. However, you can set the --ht-suppress-decoys option to true to suppress adding these decoys to the hash table.

    The table below describes the difference in behavior between older DRAGEN versions (2.6 and earlier) and DRAGEN 3.x versions with respect to the handling of decoy contigs in the hash table builder:

    DRAGEN Behavior
    DRAGEN 2.6 and earlier versions
    DRAGEN 3.0 and later versions

    Prepare a Pangenome Reference

    DRAGEN analysis is capable of mapping on a pangenome hash table. The pangenome hash table introduces alternate graph paths to the linear reference hash table to represent more broadly the allelic diversity of the population over the whole genome or in specific regions defined in a bed file. Accuracy gains from this methodology have been described in scientific blogs available on the . Multigenome hash tables for CHM13_v2, hg38, hg19 and hs37d5 assemblies are available on the .

    See for information on the multigenome mapping method.

    It is possible to build a custom pangenome reference in order to:

    • customize the released pangenome hash table with custom bed files or hash table builder options. A set of bed files are available in the resource files on the .

    • generate a population-specific-pangenome hash table from pangenome msVCF generated from the BSSH app.

    • generate a human or non-human pangenome hash table from customer-provided msVCF.

    The input files required are a single multi-sample VCF file containing the set of population variants, and optionally bed files restricting graph to some region. The generated files, including hash_table.cmp and associated files in the specified output directory, can then be used as the reference hash table for the DRAGEN mapper. DRAGEN software supports the tool on human reference with files available on the . For non-human, the user provides the required resource files.

    Usage

    To enable the pangenome hash table builder, use a command of the following form:

    dragen --build-hash-table true (required)
    --ht-graph-msvcf-file <path to a multi-sample VCF file> (required for pangenome reference)
    --ht-reference <reference.fasta> (required)
    --ht-graph-extra-kmer-bed <graph.bed> (optional)
    --ht-mask-bed <mask.bed> (optional)
    --ht-graph-exclusion-bed <exclusion bed> (optional)
    --output-directory <DIR> (required)
    [options]

    Inputs

    Set of population variants, in a multi-sample VCF (msVCF)

    The custom pangenome hash table builder tool uses a set of population variants provided by the user to generate a pangenome hash table. The variants must be specified in VCF format, in a single multi-sample VCF (msVCF) file containing the variants for a set of individuals. This multi-sample VCF file must have specific formatting described below.

    Specific msVCF input formatting

    The custom pangenome hash table builder tool only supports msVCF file input respecting the format described below:

    • msVCF compliant with 4.2 VCF format specification

    • with variants positionally sorted in the same contig order as the main FASTA reference genome provided in --ht-reference

    • records shall include diploid or haploid GT calls

    • supports multi-allelic variants merged in a single line or separated in multiple lines

    Note: INFO/FORMAT subfields must be defined in the header. Events with undefined subfields are ignored.

    To build a high-performance custom genome it is highly recommended to use long read sequencing data. We recommend using external tools such as Whatshap (https://github.com/whatshap/whatshap) to generate phased input. DRAGEN analysis leverages the phasing information to reconstruct population haplotypes.

    Reference genome

    A reference genome in FASTA format must be provided. Reference genomes are available to download from the .

    Note: the reference genome provided as input must be the same as the one used to generate the input phased msVCF. If the msVCF contains variants from regions not present in the fasta file, the pangenome reference builder will stop with an error.

    Exclusion bed file (optional)

    This bed file is used to filter out regions of the msVCF file. Variants that fall within intervals defined in the "Graph exclusion bed" file will be ignored and not used in any part of the pangenome reference builder. The result will be the same as if the input msVCF did not contain any variants in the regions defined in the exclusion bed. The file is optional; by default, every variant in the msVCF file is used. Exclusion bed files are available to download from .

    A custom exclusion bed file can also be provided given the following format: tab delimited with first three columns being: contig name, start position, end position. Any line with a contig name that is not found in the input FASTA is skipped. Any lines that describe empty intervals are skipped.

    Note: records of the exclusion bed file provided must be from the same build as the reference genome used to build the pangenome reference.

    Extra kmer bed file (optional)

    This file is used to define regions in the genome where extra seeds will be indexed in the hash table. By default, only seeds extracted from the primary reference are saved in the reference hash table for mapping. This option additionally generates seeds from population variants in the defined regions. It is recommended to include the expected difficult regions in this bed file. Extra-kmer-bed files are available to download from for the human hg38, hg19, hs37d5, and chm13 references.

    A custom Extra-kmer-bed file can also be provided given the following format: tab delimited with first three columns being: contig name, start position, end position. Any line with a contig name that is not found in the input FASTA is skipped. Any lines that describe empty intervals are skipped.

    Note: records of the Extra-kmer-bed file provided must be from the same build as the reference genome used to build the graph reference.

    Mask bed file (recommended)

    A mask bed file should be provided in order to mask certain regions of high similarity between primary and alternate contigs present in the main genome FASTA. Mask bed files are available to download from the .

    A custom mask bed file can also be provided given the following format: tab delimited with first three columns being: contig name, start position, end position. Any line with a contig name that is not found in the input FASTA is skipped. Any lines that describe empty intervals are skipped.

    Note: records of the mask bed file provided must be from the same build as the reference genome used to build the graph reference.

    Command line options

    Option
    Required
    Description

    Note: The custom graph reference hash table end to end pipeline will return an error if options --ht-alt-liftover or --ht-allow-mask-and-liftover are specified.

    Output

    The hash table builder generates the following outputs:

    File
    Description

    Prepare a linear Reference

    Usage

    Use the --build-hash-table option to transform a reference FASTA into the hash table for DRAGEN mapping. It takes as input a FASTA file (with multiple reference sequences concatenated) and a preexisting output directory. Build command usage is as follows:

    dragen --build-hash-table true --ht-reference <reference.fasta> --output-directory <DIR> [options]

    Input

    The --ht-reference and --output-directory options are required for building a hash table. The --ht-reference option specifies the path to the reference FASTA file, while --output-directory specifies a preexisting directory where the hash table output files are written. Illumina recommends organizing various hash table builds into different folders. As a best practice, folder names should include any nondefault parameter settings used to generate the contained hash table. The sequence names in the reference FASTA file must be unique.

    Command line options

    Option
    Required
    Description

    Liftover Based ALT-Aware Hash Tables

    While masking is the recommended approach to dealing with ALT contigs, DRAGEN also supports a liftover based method. To enable liftover based ALT-aware mapping in DRAGEN, build the hash table with a liftover file by using the --ht-alt-liftover option. The hash table builder classifies each reference sequence as primary or alternate based on the liftover file, and packs primaries before alternates in reference.bin. SAM liftover files for hg38DH and hg19 are in the <INSTALL_PATH>/resources/ht_builder folder.

    Custom Liftover Files

    Custom liftover files can be used in place of those provided with DRAGEN. Liftover files must be SAM format, but no SAM header is required. SEQ and QUAL fields can be omitted ('*'). Each alignment record should have an alternate haplotype reference sequence name as QNAME, indicating the RNAME and POS of its liftover alignment in a destination (normally primary assembly) reference sequence.

    Reverse-complemented alignments are indicated by bit 0x10 in FLAG. Records flagged unmapped (0x4) or secondary (0x100) are ignored. The CIGAR may include hard or soft clipping, leaving parts of the ALT contig unaligned.

    A single reference sequence cannot serve as both an ALT contig (appearing in QNAME) and a liftover destination (appearing in RNAME). Multiple ALT contigs can align to the same primary assembly location. Multiple alignments can also be provided for a single ALT contig (extras can optionally be flagged 0x800 supplementary), such as to align one portion forward and another portion reverse-complemented. However, each base of the ALT contig only receives one liftover image, according to the first alignment record with an M CIGAR operation covering that base.

    SAM records with QNAME missing from the reference genome are ignored, so that the same liftover file may be used for various reference subsets, but an error occurs if any alignment has its QNAME present but its RNAME absent.
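    The record-level rules for custom liftover files can be checked with a short script before building the hash table. The following is a sketch of the rules as described in this section, not the builder's own parser. The function name and the returned dictionary keys are hypothetical, and CIGAR handling (which bases receive a liftover image) is deliberately left out.

```python
from typing import Dict, Iterable, List, Set


def usable_liftover_records(sam_lines: Iterable[str], reference_names: Set[str]) -> List[Dict]:
    """Return the liftover alignments that would be used, per the rules above."""
    records: List[Dict] = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # a SAM header is optional and carries no alignments
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname, pos, cigar = (
            fields[0], int(fields[1]), fields[2], int(fields[3]), fields[5]
        )
        if flag & 0x4 or flag & 0x100:
            continue  # unmapped or secondary records are ignored
        if qname not in reference_names:
            continue  # ALT contig absent from this reference subset: ignored
        if rname not in reference_names:
            raise ValueError(f"{qname}: liftover destination {rname} not in reference")
        records.append({
            "alt": qname,
            "dest": rname,
            "pos": pos,
            "cigar": cigar,
            "reverse": bool(flag & 0x10),       # reverse-complemented alignment
            "supplementary": bool(flag & 0x800),
        })
    # A sequence cannot be both an ALT contig and a liftover destination.
    both = {r["alt"] for r in records} & {r["dest"] for r in records}
    if both:
        raise ValueError(f"sequences used as both ALT and destination: {sorted(both)}")
    return records
```

    Running this against the FASTA contig names gives a quick view of which ALT contigs will actually receive liftover alignments for a given reference subset.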

    Options for advanced users

    Primary Seed Length

    The --ht-seed-len option specifies the initial length in nucleotides of seeds from the reference genome to populate into the hash table. At run time, the mapper extracts seeds of this same length from each read, and looks for exact matches (unless seed editing is enabled) in the hash table.

    The maximum primary seed length is a function of hash table size. The limit is k=27 for table sizes from 16 GB to 64 GB, covering typical sizes for whole human genome, or k=26 for sizes from 4 GB to 16 GB.

    The minimum primary seed length depends mainly on the reference genome size and complexity. It needs to be long enough to resolve most reference positions uniquely. For whole human genome references, hash table construction typically fails with k < 16. The lower bound may be smaller for shorter genomes, or higher for less complex (more repetitive) genomes. The uniqueness threshold of --ht-seed-len 16 for the 3.1Gbp human genome can be understood intuitively because log4(3.1 G) ≈ 16, so it requires at least 16 choices from 4 nucleotides to distinguish 3.1 G reference positions.
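    The log4 argument can be checked numerically. The sketch below finds the smallest k for which 4^k distinct k-mers are at least as numerous as the reference positions. It is only the counting bound described above, with a hypothetical function name; real genomes are repetitive, so the practical minimum can be higher.

```python
import math


def min_unique_seed_len(genome_bases: int) -> int:
    """Smallest k such that 4**k >= genome_bases (counting bound only)."""
    k = 0
    while 4 ** k < genome_bases:
        k += 1
    return k


# log4 of a 3.1 Gbp genome lies between 15 and 16, so the bound rounds up to 16.
human_log4 = math.log(3_100_000_000, 4)
```

    For a 3.1 Gbp genome, 4^15 is about 1.07 billion, which is too few distinct seeds, while 4^16 is about 4.29 billion, so min_unique_seed_len(3_100_000_000) is 16.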

    Accuracy Considerations

    For read mapping to succeed, at least one primary seed must match exactly (or with a single SNP when edited seeds are used). Shorter seeds are more likely to map successfully to the reference, because they are less likely to overlap variants or sequencing errors, and because more of them fit in each read. So for mapping accuracy, shorter seeds are mainly better.

    However, very short seeds can sometimes reduce mapping accuracy. Very short seeds often map to multiple reference positions, and lead the mapper to consider more false mapping locations. Due to imperfect modeling of mutations and errors by Smith-Waterman alignment scoring and other heuristics, occasionally these noise matches may be reported. Run time quality filters such as --Aligner.aln_min_score can control the accuracy issues with very short seeds.

    Speed Considerations

    Shorter seeds tend to slow down mapping, because they map to more reference locations, resulting in more work such as Smith-Waterman alignments to determine the best result. This effect is most pronounced when primary seed length approaches the reference genome's uniqueness threshold, eg, k=16 for whole human genome.

    Application Considerations

    Read Length: Generally, shorter seeds are appropriate for shorter reads, and longer seeds for longer reads. Within a short read, a few mismatch positions (variants or sequencing errors) can chop the read into only short segments matching the reference, so that only a short seed can fit between the differences and match the reference exactly. For example, in a 36 bp read, just one SNP in the middle can block seeds longer than 18 bp from matching the reference. By contrast, in a 250 bp read, it takes 15 SNPs to exceed a 0.01% chance of blocking even 27 bp seeds.

    Paired Ends: The use of paired end reads can make longer seeds yield good mapping accuracy. DRAGEN uses paired end information to improve mapping accuracy, including with rescue scans that search the expected reference window when only one mate has seeds mapping to a given reference region. Thus, paired end reads have essentially twice the opportunity for an exact matching seed to find their correct alignments.

    Variant or Error Rate: When read differences from the reference are more frequent, shorter seeds may be required to fit between the difference positions in a given read and match the reference exactly.

    Mapping Percentage Requirement: If the application requires a high percentage of reads to be mapped somewhere (even at low MAPQ), short seeds may be helpful. Some reads that do not match the reference well anywhere are more likely to map using short seeds to find partial matches to the reference.
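    The read-length argument comes down to the longest stretch of a read that is free of mismatches, since an exact-match seed must fit entirely inside such a stretch. The sketch below computes that stretch for given mismatch positions. The function name is hypothetical, positions are 0-based, and seed editing is ignored.

```python
from typing import Iterable


def longest_exact_seed(read_len: int, mismatch_positions: Iterable[int]) -> int:
    """Length of the longest mismatch-free segment of the read."""
    longest = 0
    prev = -1  # position just before the start of the read
    # Treat the end of the read as a final boundary.
    for pos in sorted(set(mismatch_positions)) + [read_len]:
        longest = max(longest, pos - prev - 1)
        prev = pos
    return longest
```

    For a 36 bp read with a single SNP at position 18, the mismatch-free segments are 18 and 17 bases long, so longest_exact_seed(36, [18]) is 18 and no seed longer than 18 bp can match exactly.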

    Maximum Seed Length

    The --ht-max-ext-seed-len option limits the length of extended seeds populated into the hash table. Primary seeds (length specified by --ht-seed-len) that match many reference positions can be extended to achieve more unique matching, which may be required to map seeds within the maximum hit frequency (--ht-max-seed-freq).

    Given a primary seed length k, the maximum seed length can be configured between k and k+128. The default is the upper bound, k+128.

    When to Limit Seed Extension

    The --ht-max-ext-seed-len option is recommended for short reads, eg, less than 50 bp. In such cases, it is helpful to limit seed extension to the read length minus a small margin, such as 1-4 bp. For example, with 36 bp reads, setting --ht-max-ext-seed-len to 35 might be appropriate. This ensures that the hash table builder does not plan a seed extension longer than the read, which would cause seed extension and mapping to fail at run time for seeds that could have fit within the read with shorter extensions.

    While seed extension can be similarly limited for longer reads, eg, setting --ht-max-ext-seed-len to 99 for 100 bp reads, there is little utility in this because seeds are extended conservatively in any event. Even with the default k+128 limit, individual seeds are only extended to the lengths required to fit under the maximum hit frequency (--ht-max-seed-freq), and at most a few bases longer to approach the target hit frequency (--ht-target-seed-freq), or to avoid taking too many incremental extension steps.
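    The guidance above amounts to picking the read length minus a small margin, kept within the allowed range of k to k+128. A minimal helper for that arithmetic is sketched below. The function name and its defaults (k=21, a 1 bp margin) are assumptions for illustration and are not part of DRAGEN.

```python
def suggest_max_ext_seed_len(read_len: int, seed_len: int = 21, margin: int = 1) -> int:
    """Read length minus a margin, clamped to the range [k, k + 128]."""
    upper = seed_len + 128        # documented default and upper bound
    candidate = read_len - margin
    return max(seed_len, min(candidate, upper))
```

    With 36 bp reads and the defaults this returns 35, matching the example above. For long reads the result simply saturates at k+128, which is the default behavior anyway.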

    Maximum Hit Frequency

    The --ht-max-seed-freq option sets a firm limit on the number of seed hits (reference genome locations) that can be populated for any primary or extended seed. If a given primary seed maps to more reference positions than this limit, it must be extended long enough that the extended seeds subdivide into smaller groups of identical seeds under the limit. If, even at the maximum extended seed length (--ht-max-ext-seed-len), a group of identical reference seeds is larger than this limit, their reference positions are not populated into the hash table. Instead, a single High Frequency record is populated.

    The maximum hit frequency can be configured from 1 to 256. However, if this value is too low, hash table construction can fail because too many seed extensions are needed. The practical minimum for a whole human genome reference, other options being default, is 8.

    Accuracy Considerations

    Generally, a higher maximum hit frequency leads to more successful mapping. There are two reasons for this. First, a higher limit rejects fewer reference positions that cannot map under it. Second, a higher limit allows seed extensions to be shorter, improving the odds of exact seed matching without overlapping variants or sequencing errors.

    However, as with very short seeds, allowing high hit counts can sometimes hurt mapping accuracy. Most of the seed hits in a large group are not to the true mapping location, and occasionally one of these noise hits may be reported due to imperfect scoring models. Also, the mapper limits the total number of reference positions it considers, and allowing very high hit counts can potentially crowd out the actual best match from consideration.

    Speed Considerations

    Higher maximum hit frequencies slow down read mapping, because seed mapping finds more reference locations, resulting in more work, such as Smith-Waterman alignments, to determine the best result.

    Pangenome Reference

    The DRAGEN Software enables the user to build a custom pangenome hash table from a set of population variants. The population variants are specified in a single multi-sample VCF file.

    • --ht-graph-msvcf-file: Input file containing list of population variants, in multi-sample VCF format.

    This option replaces the following options, which were previously used to build a graph reference and are now deprecated:

    • --ht-pop-alt-contigs: Population-based alternate contigs FASTA.

    • --ht-pop-alt-liftover: Liftover SAM file of population alternate contigs.

    • --ht-pop-snps: Population-based SNPs VCF.

    ALT-Contigs

    The following options control building hash tables from references with ALT-contigs. See References with ALT contigs for more information.

    • --ht-mask-bed: Set a custom BED file that defines which regions to mask. If not provided, the DRAGEN software automatically applies BED files for hg38 and hg19 from <INSTALL_PATH>/resources/ht_builder.

    • --ht-alt-liftover: Set a liftover file to build a liftover based ALT-aware hash table. SAM liftover files for hg38DH and hg19 are provided in <INSTALL_PATH>/resources/ht_builder.

    • --ht-allow-mask-and-liftover

    Decoy Contigs

    • --ht-decoys The DRAGEN software automatically detects the use of hg19 and hg38 references and adds decoys to the hash table when they are not found in the FASTA file. Use the --ht-decoys option to specify the path to a decoys file. The default is <INSTALL_PATH>/resources/ht_builder/hs_decoys.fa.gz.

    • --ht-suppress-decoys: Suppress automatic detection of the default decoys file when building the hash table.

    Processing Options

    • --ht-num-threads: The --ht-num-threads option determines the maximum number of worker CPU threads that are used to speed up hash table construction. The default for this option is 8, with a maximum of 32 threads allowed. If your server supports execution of more threads, it is recommended that you use the maximum. For example, the DRAGEN servers contain 24 cores that have hyperthreading enabled, so a value of 32 should be used. When using a higher value, --ht-max-table-chunks needs to be adjusted as well. The servers have 128 GB of memory available.

    • --ht-max-table-chunks The --ht-max-table-chunks option controls the memory footprint during hash table construction by limiting the number of ~1 GB hash table chunks that reside in memory simultaneously. Each additional chunk consumes roughly twice its size (~2 GB) in system memory during construction. The hash table is divided into power-of-two independent chunks, of a fixed chunk size, X, which depends on the hash table size, in the range 0.5 GB < X ≤ 1 GB. For example, a 24 GB hash table contains 32 independent 0.75 GB chunks that can be constructed by parallel threads with enough memory and a 16 GB hash table contains 16 independent 1 GB chunks. The default is

    Size Options

    • --ht-mem-limit Memory Limit. The --ht-mem-limit option controls the generated hash table size by specifying the DRAGEN card memory available for both the hash table and the encoded reference genome. The --ht-mem-limit option defaults to 32 GB when the reference genome approaches WHG size, or to a generous size for smaller references. Normally there is little reason to override these defaults.

    • --ht-size Hash Table Size. This option specifies the hash table size to generate, rather than calculating an appropriate table size from the reference genome size and the available memory (option --ht-mem-limit). Using default table sizing is recommended and using --ht-mem-limit

    Seed Population Options

    • --ht-ref-seed-interval Seed Interval. The --ht-ref-seed-interval option defines the step size between positions of seeds in the reference genome populated into the hash table. An interval of 1 (default) means that every seed position is populated, 2 means 50% of positions are populated, etc. Noninteger values are supported, eg, 2.5 yields 40% populated. Seeds from a whole human reference are easily 100% populated with 32 GB memory on DRAGEN boards. If a substantially larger reference genome is used, change this option.

    • --ht-soft-seed-freq-cap and --ht-max-dec-factor Soft Frequency Cap and Maximum Decimation Factor for Seed Thinning. Seed thinning is an experimental technique to improve mapping performance in high-frequency regions. When primary seeds have higher frequency than the cap indicated by the --ht-soft-seed-freq-cap option

    Seed Extension Control

    DRAGEN seed extension is dynamic, applied as needed for particular K-mers that map to too many reference locations. Seeds are incrementally extended in steps of 2 to 14 bases (always even) from a primary seed length to a fully extended length. The bases are appended symmetrically in each extension step, determining the next extension increment if any.

    There is a potentially complex seed extension tree associated with each high frequency primary seed. Each full tree is generated during hash table construction and a path from the root is traced by iterative extension steps during seed mapping. The hash table builder employs a dynamic programming algorithm to search the space of all possible seed extension trees for an optimal one, using a cost function that balances mapping accuracy and speed. The following options define that cost function:

    • --ht-target-seed-freq Target Hit Frequency. The --ht-target-seed-freq option defines the ideal number of hits per seed for which seed extension should aim. Higher values lead to fewer and shorter final seed extensions, because shorter seeds tend to match more reference positions.

    • --ht-cost-coeff-seed-len Cost Coefficient for Seed Length The --ht-cost-coeff-seed-len option assigns the cost component for each base by which a seed is extended. Additional bases are considered a cost because longer seeds risk overlapping variants or sequencing errors and losing their correct mappings. Higher values lead to shorter final seed extensions.

    Pipeline Specific Hash Tables

    RNA-Seq

    When building a hash table, DRAGEN configures the options for DNA analysis by default. To run RNA-Seq data, you must build an RNA-Seq hash table by setting --ht-build-rna-hashtable to true. If running RNA-Seq alignment, use the original --output-directory instead of the automatically generated subdirectory.

    CNV

    If using the CNV pipeline, set --ht-build-cnv-hashtable to true. The command generates an additional Kmer hash map that is used in the CNV algorithm. Illumina recommends always using the --ht-build-cnv-hashtable option, so that you can perform CNV calling with the same hash table used for mapping and aligning.

    Methylation

    To run the methylation pipeline, you must build a methylation-specific hash table. DRAGEN can build a single-pass or a legacy multi-pass methylation hash table. Methylation runs that use a single-pass hash table complete faster than runs that use legacy multi-pass hash tables, so single-pass hash tables are recommended for both building methylation tables and running analyses.

    Hash Table Type
    Hash Table Commands

    Single-pass

    The following is an example of a single-pass hash table build. The example generates a combined hash table in your reference index folder under the methyl_converted subdirectory.

    dragen --build-hash-table true \
      --output-directory $REFDIR \
      --ht-reference $FASTA \
      --ht-num-threads 40 \
      --ht-methylated-combined=true \
      --ht-seed-len 27

    Multipass

    Multi-pass methylation mapping requires building two special hash tables with reference bases converted from C to T in one table and G to A in the other table. The conversions are performed automatically when using the --ht-methylated command line option. The converted hash tables are generated in two subdirectories under the folder specified using the --output-directory command line option. The subdirectories are named CT_converted and GA_converted, corresponding with the base conversions. When using the hash tables for methylated alignment runs, make sure to refer to the --output-directory folder, not the subdirectories.

    The base conversions remove a significant amount of information from the hash tables. You might need to use different hash table parameters than in a conventional hash table build. The following options are recommended for building hash tables for mammalian species.

    dragen --build-hash-table=true --output-directory $REFDIR --ht-reference $FASTA --ht-max-seed-freq 16 --ht-seed-len 27 --ht-num-threads 40 --ht-methylated=true
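    The two base conversions described above can be sketched in Python to show why they remove information from the reference. This is an illustration of the conversion only, not of how DRAGEN builds the tables; the function name convert_reference is hypothetical.

```python
# Translation tables for the two converted references.
C_TO_T = str.maketrans("Cc", "Tt")
G_TO_A = str.maketrans("Gg", "Aa")

def convert_reference(seq):
    """Return the (C-to-T, G-to-A) converted copies of a reference sequence."""
    return seq.translate(C_TO_T), seq.translate(G_TO_A)

ct, ga = convert_reference("ACGTCG")
print(ct, ga)

# Two sequences that differ only at a C/T position become identical after
# C-to-T conversion, which is why converted tables contain more repeated seeds.
print(convert_reference("ACGT")[0] == convert_reference("ATGT")[0])
```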

    HLA

    To run the HLA caller, an HLA-specific anchored reference hash table must be built. Set --ht-build-hla-hashtable to true. The command creates an anchored_hla subdirectory inside the --output-directory. The HLA-specific reference subdirectory can be built at the same time as the primary reference.

    An HLA resource file is packaged with DRAGEN and located at the following path after installation: <INSTALL_PATH>/resources/hla/HLA_resource.v1.fasta.gz. This file is used by default when building the HLA-specific anchored hash table. A custom file can be specified with --ht-hla-reference. See the HLA section for more information.

    with the following FILTER codes (non-PASS records are ignored):

    • ##FILTER=<ID=PASS,Description="All filters passed">

  • with the following FORMAT field:

    • ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">

  • for better results, variants should be left-aligned.

  • the recommended maximum number of samples in the msVCF is 256. A higher number may lead to very high memory usage during hash table creation.

    Option | Required | Description
    --ht-mask-bed | No (but recommended) | Path to the mask bed file
    --ht-graph-exclusion-bed | No | Path to the exclusion bed file
    --output-directory | Yes | Specify the directory where all related hash table files will be written

  • --ht-allow-mask-and-liftover: Allow the use of both --ht-mask-bed and --ht-alt-liftover together.

  • --ht-suppress-mask: Suppress automatic detection of the default mask bed files when building the hash table.

  • --ht-max-table-chunks: The default is for --ht-max-table-chunks to equal --ht-num-threads, but with a minimum default --ht-max-table-chunks of 8. It makes sense to have these two options match, because building one hash table chunk requires one chunk space in memory and one thread to work on it. Nevertheless, there are build-speed advantages to raising --ht-max-table-chunks higher than --ht-num-threads, or to raising --ht-num-threads higher than --ht-max-table-chunks.

  • --ht-size: Using default table sizing is recommended, and using --ht-mem-limit is the next best choice.

  • --ht-soft-seed-freq-cap and --ht-max-dec-factor: When primary seeds have a higher frequency than the cap indicated by the --ht-soft-seed-freq-cap option, only a fraction of seed positions are populated to stay under the cap. The --ht-max-dec-factor option specifies a maximum factor by which seeds can be thinned. For example, --ht-max-dec-factor 3 retains at least 1/3 of the original seeds, and --ht-max-dec-factor 1 disables any thinning. Seeds are decimated in careful patterns to prevent leaving any long gaps unpopulated. The idea is that seed thinning can achieve mapped seed coverage in high frequency reference regions where the maximum hit frequency would otherwise have been exceeded. Seed thinning can also keep seed extensions shorter, which is also good for successful mapping. Based on testing to date, seed thinning has not proven to be superior to other accuracy optimization methods.
  • --ht-rand-hit-hifreq and --ht-rand-hit-extend Random Sample Hit with HIFREQ Record and EXTEND Record. Whenever a HIFREQ or EXTEND record is populated into the hash table, it stands in place of a large set of reference hits for a certain seed. Optionally, the hash table builder can choose a random representative of that set, and populate that HIT record alongside the HIFREQ or EXTEND record. Random sample hits provide alternative alignments that are very useful in estimating MAPQ accurately for the alignments that are reported. They are never used outside of this context for reporting alignment positions, because that would result in biased coverage of locations that happened to be selected during hash table construction. To include a sample hit, set --ht-rand-hit-hifreq to 1. The --ht-rand-hit-extend option is a minimum pre-extension hit count to include a sample hit, or zero to disable. Modifying these options is not recommended.

  • --ht-cost-coeff-seed-freq Cost Coefficient for Hit Frequency. The --ht-cost-coeff-seed-freq option assigns the cost component for the difference between the target hit frequency and the number of hits populated for a single seed. Higher values result primarily in high-frequency seeds being extended further to bring their frequencies down toward the target.

  • --ht-cost-penalty Cost Penalty for Seed Extension. The --ht-cost-penalty option assigns a flat cost for extending beyond the primary seed length. A higher value results in fewer seeds being extended at all. Current testing shows that zero (0) is appropriate for this parameter.

  • --ht-cost-penalty-incr Cost Increment for Extension Step. The --ht-cost-penalty-incr option assigns a recurring cost for each incremental seed extension step taken from primary to final extended seed length. More steps are considered a higher cost because extending in many small steps requires more hash table space for intermediate EXTEND records, and takes substantially more run time to execute the extensions. A higher value results in seed extension trees with fewer nodes, reaching from the root primary seed length to leaf extended seed lengths in fewer, larger steps.
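    To make the four cost components concrete, the following Python sketch combines them into a single score for one candidate extension path. The additive form, the use of an absolute difference from the target, and the function name extension_cost are assumptions; the source describes what each option penalizes but does not give the formula the hash table builder actually uses. The numeric defaults in the sketch are the values listed for these options in this section.

```python
def extension_cost(extension_bases, steps, hits,
                   target_seed_freq=4.0,
                   coeff_seed_len=1.0,
                   coeff_seed_freq=0.5,
                   penalty=0.0,
                   penalty_incr=0.7):
    """Assumed additive cost of extending one seed (illustrative only)."""
    cost = coeff_seed_len * extension_bases                 # per-base extension cost
    cost += coeff_seed_freq * abs(hits - target_seed_freq)  # distance from target hits
    if extension_bases > 0:
        cost += penalty                                     # flat cost for extending at all
    cost += penalty_incr * steps                            # cost per incremental step
    return cost

# Same final length and hit count, reached in one large step versus three small steps.
print(extension_cost(extension_bases=6, steps=1, hits=4))
print(extension_cost(extension_bases=6, steps=3, hits=4))
```

    Under these assumptions, the path with fewer, larger steps scores lower, which matches the stated effect of raising --ht-cost-penalty-incr.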

    hg38, hg19, chm13v2 | chr1-chr22, chrX, chrY
    hs37d5 | 1-22, X, Y

    Value for --ht-seed-len | Read Length
    21 | 100 bp to 150 bp
    17 to 19 | shorter reads (36 bp)
    27 | 250+ bp

    --ht-cost-coeff-seed-len | 1
    --ht-cost-coeff-seed-freq | 0.5
    --ht-cost-penalty | 0
    --ht-cost-penalty-incr | 0.7
    --ht-max-seed-freq | 16
    --ht-target-seed-freq | 4

    Reference does not include the decoy contigs (eg, hg19) | Decoy reads mismap elsewhere in the genome due to the lack of contigs in the reference. Artificially higher mapping rate. False positive calls in noisy regions to which the decoy contigs are mismapped. | DRAGEN automatically detects the absence of the decoy contig from the reference and adds it to the FASTA file. Artificially lower mapping rate because decoy reads which map to the decoy contigs are artificially marked as unmapped in the output BAM (because the original reference does not have the decoy contig). False positive calls are avoided thanks to adding the decoy contigs under the hood. Therefore this helps variant calling.
    Reference includes the decoy contigs (eg, hs37d5) | Decoy reads map to the decoy contigs. High mapping rate. No false positive calls caused by decoy reads because decoy reads map to the right place. | Decoy reads map to the decoy contigs. High mapping rate. No false positive calls caused by decoy reads because decoy reads map to the right place.

    Option | Required | Description
    --build-hash-table | Yes | Set to true
    --ht-graph-msvcf-file | Yes | Path to the multi-sample VCF file containing population variants
    --ht-reference | Yes | Path to the reference genome FASTA file.
    --ht-graph-extra-kmer-bed | No | Path to the extra kmer bed file

    reference.bin | The reference sequences, encoded in 4 bits per base. Four-bit codes are used, so the size in bytes is roughly half the reference genome size. In between reference sequences, Ns are trimmed and padding is automatically inserted. For example, hg19 has 3,137,161,264 bases in 93 sequences. This is encoded in 1,526,285,312 bytes = 1.46 GB, where 1 GB means 1 GiB or 2^30 bytes.
    hash_table.cmp | Compressed hash table. The hash table is decompressed and used by the DRAGEN mapper to look up primary seeds with length specified by the --ht-seed-len option and extended seeds of various lengths.
    hash_table.cfg | A list of parameters and attributes for the generated hash table, in a text format. This file provides key information about the reference genome and hash table.
    hash_table.cfg.bin | A binary version of hash_table.cfg used to configure the DRAGEN hardware.
    hash_table_stats.txt | A text file listing extensive internal statistics on the constructed hash table, including the hash table occupancy percentages. This file is for information purposes. It is not used by other tools.
    mask.bed | Present only for masked hash tables. A tab-delimited bed file that describes the masked regions. Contains all lines from the input bed file that are not comment lines, lines that describe empty intervals, or lines with contig names that were not found in the input FASTA.

    Option | Required | Description
    --build-hash-table | Yes | Set to true
    --ht-reference | Yes | Path to the reference genome FASTA file.
    --ht-mask-bed | No (but recommended) | Path to the mask bed file. If not provided, the DRAGEN software automatically applies BED files for hg38 and hg19 from <INSTALL_PATH>/resources/ht_builder.
    --output-directory | Yes | Specify the directory where all related hash table files will be written

    single-pass | --ht-methylated-combined=true --ht-seed-len 27
    multi-pass | --ht-methylated=true --ht-seed-len 27 --ht-max-seed-freq 16

    dragen --build-hash-table true [options] --ht-reference <reference.fasta> --output-directory <outdir>

    DRAGEN Host Software

    You use the DRAGEN host software program dragen to build and load reference genomes, and then to analyze sequencing data by decompressing the data, mapping, aligning, sorting, duplicate marking with optional removal, and variant calling.

    Invoke the software using the dragen command. The command line options are described in the following sections.

    Command line options can also be set in a configuration file. For more information on configuration files, see Configuration Files. If an option is set in the configuration file and is also specified on the command line, the command line option overrides the configuration file.
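    The precedence rule can be expressed as a simple merge. The sketch below models both sources as Python dictionaries and does not parse the DRAGEN configuration file format; the function name effective_options is hypothetical.

```python
def effective_options(config_file_options, command_line_options):
    """Merge two option dicts; command line values win on conflict."""
    merged = dict(config_file_options)
    merged.update(command_line_options)
    return merged

config = {"output-format": "BAM", "enable-duplicate-marking": "true"}
cli = {"output-format": "CRAM"}
print(effective_options(config, cli))
```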

    Command-line Options

    The following are examples of frequently used command lines:

    • Build Reference/Hash Table

    • Run Map/Align and Variant Caller (*.fastq to *.vcf)

    • Run Map/Align (*.fastq to *.bam)

    • Run Variant Caller Only (*.bam to *.vcf)

    For recommended command lines in typical use cases, see .

    Reference Genome Options

    Before you can use the DRAGEN system for aligning reads, you must load a reference genome and its associated hash tables onto the PCIe card. For information on preprocessing a reference genome's FASTA files into the native DRAGEN binary reference and hash table formats, see . You must also specify the directory containing the preprocessed binary reference and hash tables with the -r [or --ref-dir] option. This argument is always required.

    Use the following command to load the reference genome and hash tables to DRAGEN card memory separately from processing reads.

    dragen -r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149

    Use the -l (--force-load-reference) option to force the reference genome to load even if it is already loaded.

    dragen -l -r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149

    The time needed to load the reference genome depends on the size of the reference, but for typical recommended settings, it takes approximately 30 to 60 seconds.

    Operating Modes

    DRAGEN has two primary modes of operation, as follows:

    • Mapper/aligner

    • Variant caller

    DRAGEN is capable of performing each mode independently or as an end-to-end solution. DRAGEN also allows you to enable and disable decompression, sorting, duplicate marking, and compression along the DRAGEN pipeline.

    • Full pipeline mode: To execute full pipeline mode, set --enable-variant-caller to true and provide input as unmapped reads in *.fastq, *.bam, or *.cram formats. DRAGEN performs decompression, mapping, aligning, sorting, and optional duplicate marking and feeds directly into the variant caller to produce a VCF file. In this mode, DRAGEN uses parallel stages throughout the pipeline to drastically reduce the overall run time.

    • Map/align mode: Map/align mode is enabled by default. Input is unmapped reads in *.fastq, *.bam, or *.cram format. DRAGEN produces an aligned and sorted BAM or CRAM file. To mark duplicate reads at the same time, set --enable-duplicate-marking to true.

    Output Options

    The following command line options for output are mandatory:

    • --output-directory <out_dir>: Specifies the output directory for generated files.

    • --output-file-prefix <out_prefix>: Specifies the output file prefix. DRAGEN appends the appropriate file extension onto this prefix for each generated file.

    • -r [--ref-dir]: Specifies the reference hash table.

    The following examples do not include these mandatory options.

    For mapping and aligning, the output is sorted and compressed into BAM format by default before saving to disk. The user can control the output format from the map/align stage with the --output-format <SAM|BAM|CRAM> option. If the output file exists, the software issues a warning and exits. To force overwrite if the output file already exists, use the -f [ --force ] option.

    For example, the following commands output to a compressed BAM file and force overwrite:

    dragen ... -f

    dragen ... -f --output-format bam

    To generate a BAI-format BAM index file (*.bai), set --enable-bam-indexing to true.

    The following example outputs to a SAM file, and then forces overwrite:

    dragen ... -f --output-format sam

    The following example outputs to a CRAM file, and then forces overwrite:

    dragen ... -f --output-format cram

    DRAGEN only outputs lossless CRAM files. All QNAMEs and BAM tags are preserved in the CRAM.
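    The mandatory options and the output format choice can be assembled programmatically before launching a run. The following Python sketch only builds and prints an argument list; it does not execute dragen. The helper name build_map_align_argv and its parameters are hypothetical, and only options that appear in this guide are used.

```python
import shlex

def build_map_align_argv(ref_dir, out_dir, prefix, fastq1, fastq2=None,
                         output_format="BAM", force=False):
    """Build a dragen map/align argument list (not executed here)."""
    if output_format.upper() not in ("SAM", "BAM", "CRAM"):
        raise ValueError("output format must be SAM, BAM, or CRAM")
    argv = ["dragen",
            "-r", ref_dir,                    # reference hash table directory
            "--output-directory", out_dir,
            "--output-file-prefix", prefix,
            "-1", fastq1]
    if fastq2 is not None:
        argv += ["-2", fastq2]                # second file of a paired-end sample
    argv += ["--output-format", output_format]
    if force:
        argv.append("-f")                     # overwrite existing output files
    return argv

argv = build_map_align_argv("/staging/ref", "/staging/out", "sample1",
                            "reads_R1.fastq.gz", "reads_R2.fastq.gz",
                            output_format="cram", force=True)
print(shlex.join(argv))
```

    Keeping the arguments as a list avoids shell quoting problems if the list is later passed to a process launcher.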

    Alignment tags

    DRAGEN can generate mismatch difference (MD) tags, as described in the BAM standard. The feature is turned off by default because there is a small performance cost to generate these strings. To generate MD tags, set --generate-md-tags to true.

    DRAGEN can also annotate additional information about alignments in a ZS:Z tag. The following are valid tag values:

    Tag
    Tag meaning

    By default, DRAGEN writes a ZS:Z:PAI tag in the output BAM for alignments that map completely inside insertions encoded in population based alternate contigs. To write ZS:Z alignment status tags for all other types described above, set --generate-zs-tags to true (false by default). These tags are only generated in the primary alignment and when a read has suboptimal alignments qualifying for secondary output (even if none were output because --Aligner.sec-aligns was set to 0).

    To generate SA:Z tags, set --generate-sa-tags to true (the default). These tags provide alignment information (position, cigar, orientation) of groups of supplementary alignments, which are useful in structural variant calling.

    To generate pair score in a ps:i tag, set --generate-ps-tags to true (false by default for DNA, true for RNA). The pair score is used in DRAGEN for computing MAPQ and can be used to check how well alignment candidate pairs score against each other.

    DRAGEN can also output mate alignment tags. To generate the mate cigar (in the MC:Z tag), set --generate-mc-tags to true (this is the default). To generate the mate mapping quality (in the MQ:i tag), set --generate-mq-tags to true (this is the default). To generate the mate sequence (in the R2:Z tag) and mate base qualities (in the Q2:Z tag), set --generate-r2-tags to true (default is false) and --generate-q2-tags to true (default is false), respectively. Note that when enabled, R2:Z and Q2:Z tags are emitted only for improperly paired read alignments with a fragment length of at least 1000 bp. Also, the methylation pipelines currently do not support the output of mate alignment tags.

    DRAGEN also outputs a graph alignment tag (ga:Z) when applicable. The tag is controlled by --generate-ga-tags (true by default for DNA, false for RNA). This tag describes the best alt contig alignment that improved the score of a primary-contig alignment at its liftover position. It can also describe read alignments to alt contigs for which there is no liftover and the primary alignment is unmapped, for example, when the read maps best to an alt contig describing a novel long insertion that is not present in the reference. In addition, read alignments that have been marked as unmapped because they map to auto-detected decoy contigs not present in the original user-provided FASTA also have their alignments described in the ga tag.

    The ga tag uses the same format as the SA tag used to describe supplementary alignments.
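    Because the ga tag shares the SA layout, a parser written for SA values can read both. The sketch below follows the SA field order given in the SAM optional fields specification (rname, pos, strand, CIGAR, mapQ, NM, with entries separated by semicolons). The names AltAlignment and parse_sa_style are hypothetical, and the example value is made up for illustration.

```python
from typing import NamedTuple

class AltAlignment(NamedTuple):
    rname: str
    pos: int
    strand: str
    cigar: str
    mapq: int
    nm: int

def parse_sa_style(value):
    """Parse the value part of an SA-style tag (text after 'SA:Z:' or 'ga:Z:')."""
    records = []
    for entry in value.split(";"):
        if not entry:
            continue  # skip the empty piece after the trailing semicolon
        rname, pos, strand, cigar, mapq, nm = entry.split(",")
        records.append(AltAlignment(rname, int(pos), strand, cigar, int(mapq), int(nm)))
    return records

for rec in parse_sa_style("chr1,1000,+,100M,60,0;chr2,500,-,50M50S,30,2;"):
    print(rec.rname, rec.pos, rec.strand, rec.cigar)
```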

    CRAM Output

    When CRAM is selected as output, DRAGEN generates a CRAM file with the following features:

    • CRAM format V3.0 is produced by default. V3.1 can be enabled by using the option --cram-version 3.1.

    • The CRAM is lossless. Lossy compression is never employed and cannot be enabled.

    • Quality score compression is lossless. Read names are preserved.

    • Only the GZIP compression algorithm is employed, for maximum compatibility; bgzip and lzma are not employed. rANS is used for quality scores.

    The following default settings are used for the CRAM output:

    CRAM option
    Value
    Description

    Input Options

    DRAGEN can process reads in FASTQ format or BAM/CRAM format. DRAGEN supports the following compression options for FASTQ input files.

    • Uncompressed

    • gzip or bgzip compression

    • ORA compression. To use ORA compression, you must provide an ORA reference and reference directory. See ORA Compression and Decompression.

    If your input FASTQ files are gzipped, DRAGEN automatically decompresses the files using hardware-accelerated decompression, and then streams the reads into the mapper. If your files end in *.ora, DRAGEN automatically decompresses the files using ORA decompression, and then streams the reads into the mapper. The same FASTQ command-line options apply for all compression formats.
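    As a small illustration of the three input cases, the following Python sketch classifies a FASTQ path by its file extension. The source states the *.ora extension rule; treating a .gz suffix as the signal for gzip or bgzip input is an assumption of this sketch, and the function name fastq_compression is hypothetical.

```python
def fastq_compression(path):
    """Classify a FASTQ input path as 'ora', 'gzip', or 'uncompressed'."""
    name = path.lower()
    if name.endswith(".ora"):
        return "ora"
    if name.endswith(".gz"):  # assumed to cover both gzip and bgzip files
        return "gzip"
    return "uncompressed"

for p in ("s1_R1.fastq", "s1_R1.fastq.gz", "s1_R1.fastq.ora"):
    print(p, fastq_compression(p))
```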

    FASTQ Input Files

    FASTQ input files can be single-ended or paired-end, as shown in the following examples.

    • Single-ended in one FASTQ file (-1 option)

    • Paired-end in two matched FASTQ files (-1 and -2 options)

    • Paired-end in a single interleaved FASTQ file (--interleaved (-i) option)

    Both bcl2fastq and the DRAGEN BCL command use a common file naming convention, as follows:

    <SampleID>_S<#>_<Lane>_<Read>_<segment#>.fastq.gz

    Older versions of bcl2fastq and DRAGEN could segment FASTQ samples into multiple files to limit file size or to decrease the time to generate them.

    For Example:

    These files do not need to be concatenated to be processed together by DRAGEN. To map/align any sample, provide the first file in the series (-1 <FileName>_001.fastq). DRAGEN reads all segment files in the sample consecutively for both of the FASTQ file sequences specified using the -1 and -2 options for paired-end input and for compressed fastq.gz files. To turn the behavior off, set --enable-auto-multifile to false on the command line.

    DRAGEN can also optionally read multiple files by the sample name given in the file name, which can be used to combine samples that have been distributed across multiple BCL lanes or flow cells. To enable this feature, set the --combine-samples-by-name option to true.

    If the FASTQ files specified on the command-line use the Casava 1.8 file naming convention shown above and additional files in the same directory share that sample name, those files and all their segments are processed automatically. Note that sample name, read number, and file extension must match. Index barcode and lane number can differ.
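    The matching rule can be sketched in Python by parsing the naming convention and grouping files by sample name and read number. The regular expression assumes the common bcl2fastq form of the fields (a lane such as L001 and a read such as R1), which this section does not spell out; the function name group_by_sample_and_read and the file names are hypothetical.

```python
import re
from collections import defaultdict

# <SampleID>_S<#>_<Lane>_<Read>_<segment#>.fastq.gz
PATTERN = re.compile(
    r"^(?P<sample>.+)_S(?P<snum>\d+)_(?P<lane>L\d+)_(?P<read>R\d)_(?P<segment>\d+)\.fastq\.gz$"
)

def group_by_sample_and_read(filenames):
    """Group file names that share a sample name and read number."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        match = PATTERN.match(name)
        if match:  # names that do not follow the convention are ignored
            groups[(match["sample"], match["read"])].append(name)
    return dict(groups)

files = [
    "NA_S1_L001_R1_001.fastq.gz",
    "NA_S1_L002_R1_001.fastq.gz",
    "NA_S1_L001_R2_001.fastq.gz",
    "notes.txt",
]
for key, members in group_by_sample_and_read(files).items():
    print(key, members)
```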

    To avoid impacting system performance, input files must be located on a fast file system.

    Multiple FASTQ Input Files

    To process multiple FASTQ input files as one sample, it is recommended that you use the --fastq-list <csv file name> option to specify the name of a CSV file containing the list of FASTQ files, instead of using the --combine-samples-by-name option.

    For example:

    Using a CSV file avoids having to concatenate the FASTQ files, for cases where there are multiple FASTQ files for a sample such as top-up scenarios or where FASTQ files are split across lanes. It also allows you to name the FASTQ input files, input from multiple subdirectories, and add BAM tags specified explicitly for each read group. DRAGEN automatically generates a CSV file of the correct format during BCL conversion to FASTQ. The CSV file is named fastq_list.csv and contains an entry for each FASTQ file or paired-end file pair produced during the run.

    FASTQ CSV File Format

    The first line of the CSV file specifies the title of each column, and is followed by one or more data lines. All lines in the CSV file must contain the same number of comma-separated values and should not contain white space or other extraneous characters.

    Column titles are case-sensitive. The following column titles are required:

    • RGID: Read Group

    • RGSM: Sample ID

    • RGLB: Library

    • Lane: Flow cell lane

    Each FASTQ file referenced in the CSV list can be referenced only once. All values in the Read2File column must be either nonempty and reference valid files, or they must all be empty.

    When generating a BAM file using fastq-list input, one read group is generated per unique RGID value. The BAM header contains RG tags for the following read groups:

    • ID (from RGID)

    • SM (from RGSM)

    • LB (from RGLB)

    You can specify additional tags for each read group by adding a column title. The column title must be only four upper-case characters and begin with RG. For example, to add a PU (platform unit) tag, add a column named RGPU and specify the value for each read group in this column. All column titles must be unique.

    A fastq-list file can contain files for more than one sample. If a fastq-list file contains only one unique RGSM entry, then no additional options need to be specified, and DRAGEN processes all files listed in the fastq-list file. If there is more than one unique RGSM entry in a fastq-list file, --fastq-list-sample-id <SampleID> must be used in addition to --fastq-list <filename> to process only a specific sample from the CSV file. Only the entries in the fastq-list file with an RGSM value that match the specified SampleID are processed.

    • Independent processing and output for multiple individual samples in one run is not supported.

    • To process all listed files together as one sample, regardless of the RGSM value, the option --fastq-list-all-samples=true can be used instead of --fastq-list-sample-id.

    Note

    For a single run, only one BAM and VCF output file are produced because all input read groups are expected to belong to the same sample. To process multiple samples independently from one BCL conversion run, DRAGEN must be run multiple times using different values for the --fastq-list-sample-id option.

    There is no option to specify groupings or subsets of RGSM values for more complex filtering, but the fastq-list file can be modified to achieve the same effect.
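    As one way to modify a fastq-list file for this purpose, the sketch below keeps only the rows whose RGSM value belongs to a chosen set. It is an illustration in Python, not a DRAGEN utility; the function name filter_fastq_list is hypothetical.

```python
import csv
import io


def filter_fastq_list(csv_text, keep_samples):
    """Return fastq-list CSV text with only the rows whose RGSM is in keep_samples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row["RGSM"] in keep_samples:
            writer.writerow(row)
    return out.getvalue()
```

    The filtered file still contains more than one RGSM value if several samples are kept, so it would be combined with --fastq-list-all-samples=true to process the remaining rows together as one sample.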

    The following is an example FASTQ list CSV file with the required columns:

    If you use the --tumor-fastq-list option for somatic input, use the --tumor-fastq-list-sample-id <SampleID> option to specify the sample ID for the corresponding FASTQ list, as shown in the following example:

    Tumor-Normal Pairs Input

    If using fastq_lists or tumor_fastq_lists comprising multiple samples (RGSMs) in somatic mode, you can use a loop to iterate through the two lists to create tumor-normal pairs for testing. Create a *.txt file with the RGSM of each normal sample to be tested (one per line), and then create a separate *.txt file with the RGSM of each tumor sample to be tested. Make sure that the tumor sample RGSMs are listed in the same order as the corresponding normal samples, and include a blank line after the last sample.

    You can use the following example script to perform testing in somatic mode. Each iteration takes one entry from the tumor samples list and one entry from the normal samples list (from top to bottom) to create a tumor-normal pair as input for the DRAGEN run.

    The following are examples of the FASTQ lists and samples lists used as input for the script.

    FASTQ ORA Input Files

    You can use the same options as the other FASTQ input file types for ORA files. To use the ORA file, replace the FASTQ file name with the ORA file name and specify the ORA reference directory using --ora-reference.

    See ORA Compression and Decompression for more information on ORA reference files.

    The following command specifies paired-end input in two matched ORA files (-1 and -2 options).

    BAM Input Files

    BAM files can be used as input to the mapper/aligner. By default, --enable-map-align is true. When a BAM file is provided as input with map/align enabled, DRAGEN ignores any alignment or duplicate marking information contained in the input file. Reads are re-mapped, and the new alignments are fed downstream to the variant callers. Any existing flags in the input BAM are erased when reads are re-mapped. BAM re-mapping is supported for multiple BAM inputs at a time, such as in paired tumor-normal input to somatic variant calling. Outputting the re-mapped BAM(s) can be enabled by setting --enable-map-align-output=true.

    Alternatively, existing alignments in the BAM file can be used as input to the variant callers by setting the --enable-map-align option to false.

    If the input file contains paired-end reads, it is important to specify that the input data should be sorted so that pairs can be processed together. Other pipelines would require you to re-sort the input data set by read name. DRAGEN vastly increases the speed of this operation by pairing the input reads, and sending them on to the mapper/aligner when pairs are identified. Use the --pair-by-name option to enable or disable this feature (the default is true).

    Specify single-ended input in one BAM file with the -b and --pair-by-name=false options, as follows:

    Specify paired-end input in one BAM file with the -b and --pair-by-name=true options, as follows:

    CRAM Input

    You can use CRAM files as input to the DRAGEN mapper/aligner and variant caller. The DRAGEN functionality available when using CRAM input is the same as when using BAM input. Supported CRAM input file formats are v3.0 and v3.1.

    By default, the CRAM compressor and decompressor use the DRAGEN reference specified with the --ref-dir option. CRAM compression is reference based, and the reference used for compression is not part of the CRAM file. Therefore, the CRAM input file must have been created with the same reference as the one provided to DRAGEN for the analysis.

    DRAGEN supports the re-alignment of a CRAM input that was created with a different reference in one step. Re-aligning a CRAM file that was created with a different reference requires use of the --cram-reference option. This option will make the CRAM decompressor use the specified reference.

    • --cram-reference can be either a FASTA file or a DRAGEN hash table folder.

    • If pointing to a FASTA file, the FASTA .fai index file must be present next to the FASTA file.

    • CRAM output will always be compressed using the --ref-dir reference.

    Example: CRAM was created with hg19, re-analysis with hg38

    The following options are used for providing a CRAM input to either mapper/aligner or variant caller:

    • --cram-input--The name and path for the CRAM file. For paired-end input in a single CRAM file, also set the --pair-by-name option to true.

    Multiple BAM or CRAM Input Files

    To provide multiple BAM input files, you can use the --bam-list <csv file name> option to specify the name of a CSV file containing the list of BAM files. For example:

    To provide multiple CRAM input files, you can use the --cram-list <csv file name> option.

    BAM or CRAM CSV Input File Format

    The first line of the CSV file specifies the header containing the title for each column and each subsequent line is a data line. All lines in the CSV file must contain the same number of comma-separated values and should not contain white space or any other extraneous characters.

    An example BAM CSV file:

    Column titles are case sensitive. The following column titles are required:

    • BamFile -- path to BAM file

    Please note that only the "BamFile" column is supported at this time. Extra fields may be specified in the CSV file but they will not be processed by DRAGEN.

    CRAM CSV input follows the same format above, with "CramFile" as the column title instead.
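    A list file in this format can be generated from a set of paths. The following is a minimal sketch in Python; the function name make_input_list_csv is hypothetical. It writes only the single supported column and rejects paths that would break the rule against white space and extra separators.

```python
def make_input_list_csv(paths, column="BamFile"):
    """Build the text of a bam-list or cram-list CSV from a list of file paths."""
    if column not in ("BamFile", "CramFile"):
        raise ValueError("column must be 'BamFile' or 'CramFile'")
    for path in paths:
        # The CSV must not contain white space or stray separators.
        if "," in path or any(ch.isspace() for ch in path):
            raise ValueError(f"path contains a comma or white space: {path!r}")
    return "\n".join([column, *paths]) + "\n"
```

    The returned text would be saved to a file and passed with --bam-list or --cram-list.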

    Restrictions and Limitations:

    DRAGEN bam-list and cram-list are intended to mirror manually merging BAM or CRAM files via a utility such as samtools or MergeSamFiles (Picard). As a result, using bam-list or cram-list is analogous to having a single merged BAM or CRAM input file. Please note that some callers (for example, DRAGEN variant calling) are unable to process a bam-list or cram-list that is composed of input files containing multiple samples.

    In the case where identical read group IDs appear across multiple files and you want to treat them as distinct read groups, you can use the --prepend-filename-to-rgid=true option to distinguish between read groups.

    If enabled, the resulting output BAM or CRAM file will contain all read groups from the input BAM or CRAM files passed in the CSV list file.

    Tumor-Normal Pairs Input

    You can also use --tumor-bam-list <csv file name> or --tumor-cram-list <csv file name> when running with tumor-only or tumor-normal inputs to DRAGEN. The CSV file has the same format as the options described above.

    BCL Input Files

    BCL is the output format of Illumina sequencing systems. Under limited circumstances, DRAGEN can read directly from BCL for map-align operations, saving the time needed for conversion to FASTQ.

    DRAGEN can read directly from BCL in the following circumstances:

    • Only one lane is input as part of a run (specified on the command-line).

    • The lane has only a single sample specified in the SampleSheet.csv file.

    When converting BCL to FASTQ is required, DRAGEN provides a BCL to FASTQ converter (see DRAGEN BCL Data Conversion).

    The following example command is for BCL input with only one lane of input:

    For additional BCL conversion options, see Input File Types.

    Handling of N bases

    One of the techniques that DRAGEN uses to optimize the handling of sequences can lead to overwriting the base quality score assigned to N base calls.

    When you use the --fastq-n-quality and --fastq-offset options, the base quality scores of N base calls are overwritten with a fixed base quality. The default values for these options are 2 and 33, respectively, which together give ASCII code 35 (character '#'), matching the encoding of the Illumina minimum quality.
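    The relationship between the two defaults and the '#' character is simple arithmetic: the quality value plus the offset gives the ASCII code written to the FASTQ quality string. A minimal Python illustration follows; the function name is hypothetical, and the parameters simply mirror the two options.

```python
def n_base_quality_char(fastq_n_quality=2, fastq_offset=33):
    """Return the FASTQ quality character that encodes the given quality and offset."""
    return chr(fastq_n_quality + fastq_offset)
```

    With the defaults, the function returns '#' (ASCII 35).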

    Read Names for Paired-End Reads

    By a common convention, read names can include suffixes (such as /1 or /2) that indicate which end of a pair the read represents. For BAM input using the --pair-by-name option, DRAGEN ignores these suffixes to find matching pair names. By default, DRAGEN uses the forward slash character as the delimiter for these suffixes and ignores the /1 and /2 when comparing names. By default, DRAGEN also strips these suffixes from the original read names.

    DRAGEN has the following options to control how suffixes are used:

    • To change the delimiter character for suffixes, use the --pair-suffix-delimiter option. Valid values for this option include forward-slash (/), dot (.), and colon (:).

    • To preserve the entire name, including the suffixes, set --strip-input-qname-suffixes to false.

    • To append a new set of suffixes to all read names, set --append-read-index-to-name to true. The delimiter is determined by the --pair-suffix-delimiter option. By default, the delimiter is a slash, so /1 and /2 are added to the names.
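    The name-matching behavior described above can be illustrated as follows. This Python sketch only mimics the documented effect of ignoring a /1 or /2 suffix; it is not DRAGEN's implementation, and both function names are hypothetical.

```python
def strip_pair_suffix(read_name, delimiter="/"):
    """Drop a trailing <delimiter>1 or <delimiter>2 from a read name, if present."""
    for end in ("1", "2"):
        suffix = delimiter + end
        if read_name.endswith(suffix) and len(read_name) > len(suffix):
            return read_name[: -len(suffix)]
    return read_name


def same_pair(name_a, name_b, delimiter="/"):
    """Return True if two read names match once pair suffixes are ignored."""
    return strip_pair_suffix(name_a, delimiter) == strip_pair_suffix(name_b, delimiter)
```

    For example, same_pair("read7/1", "read7/2") is true, while names that use a dot as the delimiter match only when delimiter="." is passed.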

    Gene Annotation Input Files

    When processing RNA-Seq data, you can supply a gene annotations file by using the --annotation-file option. Providing this file improves the accuracy of the mapping and aligning stage (see Input Files). The file should conform to the GTF/GFF format specification and should list annotated transcripts that match the reference genome being mapped against. The similar GFF3 format is currently not supported, due to inconsistent contig naming between GENCODE and Ensembl. See the RNA user guide section for more details on potential issues and workarounds.

    DRAGEN can take the SJ.out.tab file (see SJ.out.tab) as an annotations file to help guide the aligner in a two-pass mode of operation.

    Networked Streaming

    AWS S3, Azure Blob Storage, and AWS Presigned URL Input Streaming

    DRAGEN can stream input files directly from an AWS S3 bucket, Azure Blob storage account, or by using AWS presigned URLs (presigned URLs are not supported for Azure Blob storage at this time). With streaming, input files are not required to be downloaded locally prior to being processed. The files are streamed over the network directly into the DRAGEN processor.

    Input streaming is most beneficial for large input files. DRAGEN supports input streaming for BAMs and compressed FASTQ files. For FASTQ files, input streaming can be used in all the configurations, including single-end FASTQs, paired-end FASTQs, and FASTQ lists.

    Input streaming is supported for the following use cases:

    • Mapping/aligning of FASTQ and BAM.

    • Germline and somatic small variant calling from BAM (without remapping).

    For other file types that are significantly smaller in size, download them locally before running the analysis.

    Streaming FASTQ Input Using AWS S3

    Streaming FASTQ Input Using Azure Blob Storage Account

    Streaming FASTQ Input Using Presigned URLs (for AWS only)

    Streaming BAM Input Using AWS S3

    Streaming BAM Input Using Azure Blob Storage Account

    Streaming BAM Input Using Presigned URLs (for AWS only)

    AWS S3 and Azure Blob Storage Output Streaming

    DRAGEN can stream its output to an AWS S3 Bucket or an Azure Blob Storage Account Container. Output streaming is beneficial for large output files and for sharing results.

    Streaming output to AWS S3

    Streaming output to Azure Blob Storage Account

    Security and Permissions

    To stream input files or write to a cloud provider's storage, you must have permission to access the remote files.

    AWS S3

    S3 requires AWS authentication and credentials. The authentication should already be set up on the instance you are running, for example, via IAM policies.

    Azure Blob Storage Account

    Azure requires authentication and environment variables. DRAGEN supports two cases: (1) Using managed identities and (2) Storage account access keys.

    To use managed identities you must run DRAGEN on an Azure instance. The instance must have Contributor permissions (read/write) on the Storage Account it wants to read and write to. If the instance has a single managed identity, only the AZ_ACCOUNT_NAME=<azure-storage-account-name> environment variable is required. For multiple managed identities, you must also provide the AZR_IDENT_CLIENT_ID=<client-id> environment variable, with the client id of the identity that can access your storage bucket. This can be found on the Azure Portal.

    With storage account access keys, DRAGEN can write to an Azure bucket both on and off Azure instances. For this use case, find the Storage Account Access Key and set the environment variables AZ_ACCOUNT_NAME=<azure-storage-account-name> and AZ_ACCOUNT_KEY=<account-key>.

    Presigned URL (AWS only)

    An AWS presigned URL most likely has a query string attached to it, which provides the authentication credentials or necessary tokens to grant permission to the S3 bucket (e.g., https://bucket-name.amazonaws.com/path/to/folder?querystring). Currently, streaming input to DRAGEN from Azure presigned URLs is not supported.
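    The three kinds of remote input location used in this section differ in their URL shape. The sketch below is an illustration in Python that classifies a location string by that shape; it is not how DRAGEN decides, and the function name and returned labels are hypothetical.

```python
from urllib.parse import urlsplit


def describe_input_location(location):
    """Classify an input location by URL shape (illustrative labels only)."""
    parts = urlsplit(location)
    if parts.scheme == "s3":
        return "s3"
    if parts.scheme == "https" and parts.netloc.endswith(".blob.core.windows.net"):
        return "azure-blob"
    if parts.scheme == "https" and parts.query:
        # A query string on an HTTPS URL is taken here as the sign of a presigned URL.
        return "presigned-url"
    if parts.scheme == "":
        return "local"
    return "unknown"
```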

    Sample Sex

    Use the --sample-sex command line option to control the sex karyotype input used in downstream components, such as variant callers. If a sample sex karyotype input is not specified using the command line, the sex karyotype is automatically determined. The sex karyotype input is converted to a reference sex karyotype for use in variant calling. Other components might support sex karyotype input. Refer to the corresponding section for the component you are using.

    The --sample-sex option supports the following values. Values are not case-sensitive.

    • none: No sex karyotype input. Components use a default reference sex karyotype.

    • auto: The sex karyotype is estimated by the Ploidy Estimator. If using CNV calling, sex karyotype is determined using a separate sex estimation module. If DRAGEN cannot estimate the sex karyotype, then components do not have a sex karyotype input. This behavior is then the same as none. auto is the default value.

    • female: Sex karyotype input is XX.

    • male: Sex karyotype input is XY.

    The following example command lines use --sample-sex to specify the sex karyotype.

    If the value is none, female, or male, the Ploidy Estimator could still run and produce output, but variant callers will not use any estimated sex karyotype that is different than the sex karyotype provided via the command-line.

    The sex karyotype input is converted to the reference sex karyotype for the different components as follows. See the relevant component section for more information on how --sample-sex is used.

    Reference Sex Karyotype

    Sex Karyotype Input
    CNV Caller
    DRAGEN-STR
    Ploidy Caller
    Small Variant Caller
    SV Caller
    • For sex karyotype input of None, CNV/Ploidy Caller independently check the coverage ratio of X and Y to determine the reference sex karyotype. Detection of minimal Y coverage will yield XY, otherwise XX.

    Preservation or Stripping of BQSR Tags

    The Picard Base Quality Score Recalibration (BQSR) tool produces output BAM files that include tags BI and BD. BQSR calculates these tags relative to the exact sequence for a read. If a BAM file with BI and BD tags is used as input to mapper/aligner with hard clipping enabled, the BI and/or BD tags can become invalid.

    The recommendation is to strip these tags when using BAM files as input. To remove the BI and BD tags, set the --preserve-bqsr-tags option to false. If you preserve the tags, DRAGEN warns you to disable hard clipping.

    Read Group Options

    DRAGEN assumes that all the reads in a given FASTQ belong to the same read group. DRAGEN creates a single @RG read group descriptor in the header of the output BAM file, with the ability to specify the following standard BAM attributes:

    Attribute
    Argument
    Description

    If any of these arguments are present, DRAGEN adds an RG tag to all the output records to indicate that they are members of a read group. The following example shows a command line that includes read group parameters:

    When using the --fastq-list option to input multiple read groups, BAM tags (and others) are specified for each read group by adding columns to the fastq_list.csv file. Each column heading consists of four capital letters and each begins with 'RG'. For each column, each read group's values for that column are propagated to the output BAM file in an identically named tag.
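    The column-to-tag mapping can be pictured as follows. This Python sketch builds an @RG header line from one fastq-list row; it approximates the mapping described above and is not DRAGEN's code. The function name is hypothetical, and the field order in a real DRAGEN header may differ.

```python
import re

# Column titles of the form RGxx map to the two-letter tag xx.
RG_COLUMN = re.compile(r"^RG([A-Z]{2})$")


def rg_header_line(row):
    """Build an @RG header line from one fastq-list row given as a dict."""
    fields = []
    for column, value in row.items():
        match = RG_COLUMN.match(column)
        if match and value:
            fields.append(f"{match.group(1)}:{value}")
    # List the ID field first; the sort is stable, so other fields keep their order.
    fields.sort(key=lambda field: not field.startswith("ID:"))
    return "\t".join(["@RG", *fields])
```

    Columns that do not follow the RG naming rule, such as Lane, Read1File, and Read2File, are skipped.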

    License Options

    To suppress the license status message at the end of the run, use the --lic-no-print option. The following shows an example of the license status message:

    Autogenerated MD5SUM for BAM and CRAM Output Files

    An MD5SUM file is generated automatically for BAM and CRAM output files. The MD5SUM file has the same name as the output file, with an .md5sum extension appended (eg, whole_genome_run_123.bam.md5sum). The MD5SUM file is a single-line text file that contains the md5sum of the output file, which exactly matches the output of the Linux md5sum command.

    The MD5SUM calculation is performed as the output file is written, so there is no measurable performance impact (compared to the Linux md5sum command, which can take several minutes for a 30x BAM).

    Configuration Files

    Command line options can be stored in a configuration file. The location of the default configuration file is <INSTALL_PATH>/config/dragen-user-defaults.cfg. You can override this file by using the --config-file (-c) option to specify a different file. The configuration file used for a given run supplies the default settings for that run, any of which can be overridden by command line options.

    The recommended approach is to use the dragen-user-defaults.cfg file as a template to create default settings for different use cases. Copy dragen-user-defaults.cfg, rename the copy, then modify the new file for the specific use-case. Best practice is to put options that rarely change into the configuration file and to specify options that vary from run to run on the command line.

    Licensing

    DRAGEN uses quota-based licensing for a majority of features. More information can be found in the Licensing Reference Section.

    Re-map and Run Variant Caller (*.bam to *.vcf)
  • Run BCL Converter (BCL to *.fastq)

  • Run RNA Map/Align (*.fastq to *.bam)

  • Variant caller mode: To execute variant caller mode, set the --enable-variant-caller option to true, and set the --enable-map-align option to false. The input must be a mapped and aligned BAM/CRAM file. DRAGEN produces a VCF file. DRAGEN force-enables re-sorting of the BAM, because a number of read statistics and estimates are required for the Variant Caller to operate effectively. Setting --enable-sort to false is overridden. BAM files cannot be duplicate marked in the DRAGEN pipeline prior to variant calling if they have not already been marked. Use the end-to-end mode of operation to take advantage of the mark-duplicates feature.

  • RNA-Seq data: To enable processing of RNA-Seq-based data, set --enable-rna to true. DRAGEN uses the RNA spliced aligner during the mapper/aligner stage. DRAGEN dynamically switches between the required modes of operation.

  • Bisulfite MethylSeq data: To enable processing of Bisulfite MethylSeq data, set the --enable-methylation-calling option to true. DRAGEN automates the processing of data for Lister (directional) and Cokus (nondirectional) protocols to generate a single BAM with bismark-compatible tags. Alternatively, you can run DRAGEN in a mode that produces a separate BAM file for each combination of the C->T and G->A converted reads and references. To enable this mode of processing, you need to build a set of reference hash tables with --ht-methylated enabled, and run DRAGEN with the appropriate --methylation-protocol setting.

  • All input BAM tags are preserved

  • The reference used to compress the CRAM file is the DRAGEN Hash Table provided during the map/align run. When decompressing the CRAM with a FASTA file and third-party tools, the FASTA that was used to generate the Hash Table must be used.

  • A CRAM index is produced in .crai format

  • CRAM output is only possible when sort is enabled. CRAM alignments will always be positionally sorted

    SEQS_PER_SLICE | 2000 | Max sequences per slice
    BASES_PER_SLICE | SEQS_PER_SLICE*500 | Max bases per slice
    SLICE_PER_CNT | 1 | Max slices per container
    embed_ref | 0 | Do not embed reference sequence
    noref | 0 | Do not use non-referenced based encoding
    multiseq | -1 | Do not use multiple references per slice
    unsorted | 0 | Do not use unsorted mode
    use_bz2 | 0 | Do not compress using bzip2
    use_lzma | 0 | Do not compress using lzma
    use_rans | 1 | Use rANS for quality score compression
    binning | NONE | Qual score binning not used
    preserve_aux_order | 1 | Preserve all aux tags and order (incl RG,NM,MD)
    preserve_aux_size | 0 | Aux tag sizes not preserved ('i', 's', 'c')
    lossy_read_names | 0 | Preserve read names
    lossy | 0 | Do not enable Illumina 8 quality-binning system
    ignore_md5 | 0 | Enable all checking of checksums
    decode_md | 0 | Do not (re)generate MD and NM tags
    cram_version | 3.0 | Default is CRAM v3.0.

  • Read1File--Full path to a valid FASTQ input file.

  • Read2File--Full path to a valid FASTQ input file. Required for paired-end input. If not using paired-end input, leave empty.

  • --append-read-index-to-name: The delimiter is determined by the --pair-suffix-delimiter option. By default, the delimiter is a slash, so /1 and /2 are added to the names.

  • female: Sex karyotype input is XX.

  • male: Sex karyotype input is XY.

    Sex Karyotype Input | CNV Caller | DRAGEN-STR | Ploidy Caller | Small Variant Caller | SV Caller
    X0 | XX | XY | XX | XXYY | XXYY
    XX | XX | XX | XX | XX | XXYY
    XXX | XX | XX | XX | XXYY | XXYY
    XXXX | XX | XX | XX | XXYY | XXYY
    XXXXX | XX | XX | XX | XXYY | XXYY
    XY | XY | XY | XY | XY | XXYY
    XXY | XY | XX | XY | XXYY | XXYY
    XXXY | XY | XX | XY | XXYY | XXYY
    XXXXY | XY | XX | XY | XXYY | XXYY
    XYY | XY | XY | XY | XXYY | XXYY
    XXYY | XY | XX | XY | XXYY | XXYY
    XXXYY | XY | XX | XY | XXYY | XXYY
    XYYY | XY | XY | XY | XXYY | XXYY
    XXYYY | XY | XX | XY | XXYY | XXYY
    XYYYY | XY | XY | XY | XXYY | XXYY
    None | XX/XY | XX | XX/XY | XXYY | XXYY

    Attribute | Argument | Description
    ID | --RGID | Read group identifier. If you include any of the read group parameters, RGID is required. It is the value written into each output BAM record.
    LB | --RGLB | Library.
    PL | --RGPL | Platform/technology used to produce the reads. The BAM standard allows for values CAPILLARY, LS454, ILLUMINA, SOLID, HELICOS, IONTORRENT and PACBIO.
    PU | --RGPU | Platform unit, eg, flowcell-barcode.lane.
    SM | --RGSM | Sample.
    CN | --RGCN | Name of the sequencing center that produced the read.
    DS | --RGDS | Description.
    DT | --RGDT | Date the run was produced.
    PI | --RGPI | Predicted mean insert size.

    ZS:Z:R | Multiple alignments with similar score were found.
    ZS:Z:NM | No alignment was found.
    ZS:Z:QL | An alignment was found but it was below the quality threshold.
    ZS:Z:NRD | Alignment is to an auto-added decoy contig (not present in input FASTA).
    ZS:Z:PAI | Alignment is to an insertion encoded in a population based alternate contig (not present in input FASTA).

    DRAGEN Recipes
    Prepare a Reference Genome
    Storage Account Access Key
    Licensing Reference Section

    dragen --bcl-conversion-only true --bcl-input-directory <BCL_DIRECTORY> \
    --output-directory <OUT_DIRECTORY>
    dragen -r <REF_DIRECTORY> --output-directory <OUT_DIRECTORY> \
    --output-file-prefix <FILE_PREFIX> [options] -1 <FASTQ1> \
    [-2 <FASTQ2>] --enable-rna true
    dragen --build-hash-table true --ht-reference <REF_FASTA> \
    --output-directory <REF_DIRECTORY>  [options]
    dragen -r <REF_DIRECTORY> --output-directory <OUT_DIRECTORY> \
    --output-file-prefix <FILE_PREFIX> [options] -1 <FASTQ1> \
    [-2 <FASTQ2>] --RGID <RG0> --RGSM <SM0> --enable-variant-caller true
    dragen -r <REF_DIRECTORY> --output-directory <OUT_DIRECTORY> \
    --output-file-prefix <FILE_PREFIX> [options] \
    -1 <FASTQ1> [-2 <FASTQ2>]  \
    --RGID <RG0> --RGSM
    dragen -r <REF_DIRECTORY> --output-directory <OUT_DIRECTORY> \
    --output-file-prefix <FILE_PREFIX> [options] -b <BAM> \
    --enable-map-align false \
    --enable-variant-caller true
    dragen -r <REF_DIR> -1 <fastq> \
    --output-directory <OUT_DIR> --output-file-prefix <OUTPUT_PREFIX> \
    --RGID <RGID> --RGSM <RGSM>
    dragen -r <REF_DIR> -1 <fastq1> -2 <fastq2> \
    --output-directory <OUT_DIR> --output-file-prefix <OUT_PREFIX> \
    --RGID <RGID> --RGSM <RGSM>
    dragen -r <REF_DIR> -1 <INTERLEAVED_FASTQ> -i \
    --RGID <RGID> --RGSM <RGSM>
    RDRS182520_S1_L001_R1_001.fastq.gz
    
    RDRS182520_S1_L001_R1_002.fastq.gz
    
    ...
    
    RDRS182520_S1_L001_R1_008.fastq.gz
    dragen -r <ref_dir> --fastq-list <CSV_FILE> \
    --fastq-list-sample-id <Sample_ID> --output-directory <OUT_DIR> \
    --output-file-prefix <OUT_PREFIX>
    RGID,RGSM,RGLB,Lane,Read1File,Read2File
    CACACTGA.1,RDSR181520,UnknownLibrary,1,/staging/RDSR181520_S1_L001_R1_001.fastq,/staging/RDSR181520_S1_L001_R2_001.fastq
    AGAACGGA.1,RDSR181521,UnknownLibrary,1,/staging/RDSR181521_S2_L001_R1_001.fastq,/staging/RDSR181521_S2_L001_R2_001.fastq
    TAAGTGCC.1,RDSR181522,UnknownLibrary,1,/staging/RDSR181522_S3_L001_R1_001.fastq,/staging/RDSR181522_S3_L001_R2_001.fastq
    AGACTGAG.1,RDSR181523,UnknownLibrary,1,/staging/RDSR181523_S4_L001_R1_001.fastq,/staging/RDSR181523_S4_L001_R2_001.fastq
    dragen -r <ref_dir> --tumor-fastq-list <csv_file> \
    --tumor-fastq-list-sample-id <Sample_ID> \
    --output-directory <out_dir> \
    --output-file-prefix <out_prefix> --fastq-list <csv_file_2> \
    --fastq-list-sample-id <Sample_ID_2>
    #!/bin/bash
    
    HT="/staging/HT/"
    tumor_fastq_list="/staging/inputs/tumor_fastq_list.csv"
    normal_fastq_list="/staging/inputs/normal_fastq_list.csv"
    
    tumor_samples_list="/staging/inputs/tumor_samples_list.txt"
    normal_samples_list="/staging/inputs/normal_samples_list.txt"
    
    while read -u 3 -r tumor_RGSM && read -u 4 -r normal_RGSM; do
    output_dir="/staging/results/${tumor_RGSM}_${normal_RGSM}"
    mkdir -p ${output_dir}
    
    dragen \
    -r ${HT} \
    --tumor-fastq-list ${tumor_fastq_list} \
    --tumor-fastq-list-sample-id ${tumor_RGSM} \
    --fastq-list ${normal_fastq_list} \
    --fastq-list-sample-id ${normal_RGSM} \
    --output-directory ${output_dir} \
    --output-file-prefix ${tumor_RGSM}_${normal_RGSM}
    done 3<${tumor_samples_list} 4<${normal_samples_list}
    
    
    Sample normal_fastq_list.csv content:
    
    RGPL,RGID,RGSM,RGLB,Lane,Read1File,Read2File
    DRAGEN_RGPL,DRAGEN_RGID_N1.1,normal-1,ILLUMINA,1,/staging/inputs/normal-1_S1_L001_R1_001.fastq.gz,/staging/inputs/normal-1_S1_L001_R2_001.fastq.gz
    DRAGEN_RGPL,DRAGEN_RGID_N1.2,normal-1,ILLUMINA,2,/staging/inputs/normal-1_S1_L002_R1_001.fastq.gz,/staging/inputs/normal-1_S1_L002_R2_001.fastq.gz
    DRAGEN_RGPL,DRAGEN_RGID_N2.1,normal-2,ILLUMINA,1,/staging/inputs/normal-2_S1_L001_R1_001.fastq.gz,/staging/inputs/normal-2_S1_L001_R2_001.fastq.gz
    DRAGEN_RGPL,DRAGEN_RGID_N2.2,normal-2,ILLUMINA,2,/staging/inputs/normal-2_S1_L002_R1_001.fastq.gz,/staging/inputs/normal-2_S1_L002_R2_001.fastq.gz
    DRAGEN_RGPL,DRAGEN_RGID_N3.1,normal-3,ILLUMINA,1,/staging/inputs/normal-3_S1_L001_R1_001.fastq.gz,/staging/inputs/normal-3_S1_L001_R2_001.fastq.gz
    DRAGEN_RGPL,DRAGEN_RGID_N3.2,normal-3,ILLUMINA,2,/staging/inputs/normal-3_S1_L002_R1_001.fastq.gz,/staging/inputs/normal-3_S1_L002_R2_001.fastq.gz
    
    Sample tumor_fastq_list.csv content:
    
    RGPL,RGID,RGSM,RGLB,Lane,Read1File,Read2File
    DRAGEN_RGPL,DRAGEN_RGID_T1.1,tumor-1,ILLUMINA,1,/staging/inputs/tumor-1_S1_L001_R1_001.fastq.gz,/staging/inputs/tumor-1_S1_L001_R2_001.fastq.gz
    DRAGEN_RGPL,DRAGEN_RGID_T1.2,tumor-1,ILLUMINA,2,/staging/inputs/tumor-1_S1_L002_R1_001.fastq.gz,/staging/inputs/tumor-1_S1_L002_R2_001.fastq.gz
    DRAGEN_RGPL,DRAGEN_RGID_T2.1,tumor-2,ILLUMINA,1,/staging/inputs/tumor-2_S1_L001_R1_001.fastq.gz,/staging/inputs/tumor-2_S1_L001_R2_001.fastq.gz
    DRAGEN_RGPL,DRAGEN_RGID_T2.2,tumor-2,ILLUMINA,2,/staging/inputs/tumor-2_S1_L002_R1_001.fastq.gz,/staging/inputs/tumor-2_S1_L002_R2_001.fastq.gz
    DRAGEN_RGPL,DRAGEN_RGID_T3.1,tumor-3,ILLUMINA,1,/staging/inputs/tumor-3_S1_L001_R1_001.fastq.gz,/staging/inputs/tumor-3_S1_L001_R2_001.fastq.gz
    DRAGEN_RGPL,DRAGEN_RGID_T3.2,tumor-3,ILLUMINA,2,/staging/inputs/tumor-3_S1_L002_R1_001.fastq.gz,/staging/inputs/tumor-3_S1_L002_R2_001.fastq.gz
    
    Sample normal_samples_list content
    
    normal-1
    normal-2
    normal-3
    
    Sample tumor_samples_list content
    
    tumor-1
    tumor-2
    tumor-3
    
    dragen -r <REF_DIR> -1 <fastq.ora1> -2 <fastq.ora2> \
    --ora-reference <ORADATA_DIR> \
    --output-directory <OUT_DIR> --output-file-prefix <OUT_PREFIX> \
    --RGID <RGID> --RGSM <RGSM>
    dragen -r <ref_dir> -b <bam> --output-directory <out_dir> \
    --output-file-prefix <out_prefix> --pair-by-name false
    dragen -r <ref_dir> -b <bam> --output-directory <out_dir> \
    --output-file-prefix <out_prefix> --pair-by-name true
    dragen -r <ref_dir HG38> --cram-input <cram> --output-directory <out_dir> \
    --output-file-prefix <out_prefix> --cram-reference <ref_dir HG19>
    dragen -r <ref_dir HG38> --cram-input <cram> --output-directory <out_dir> \
    --output-file-prefix <out_prefix> --cram-reference <hg19.fa>
    dragen -r <ref_dir> --cram-input <cram> --output-directory <out_dir> \
    --output-file-prefix <out_prefix> --pair-by-name true
    dragen -r <ref_dir> --bam-list <CSV_FILE> \
    --output-directory <OUT_DIR> --output-file-prefix <OUT_PREFIX>
    BamFile
    /path/to/bam/one
    /path/to/bam/two
    dragen --bcl-input-dir <BCL_ROOT> --bcl-only-lane <num> -r <ref_dir> \
    --output-directory <out_dir> --output-file-prefix <out_prefix>
    dragen -f \
      -r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
      -1 s3://s3-bucket-name/path/to/object_1.fastq.gz \
      -2 s3://s3-bucket-name/path/to/object_2.fastq.gz \
      --RGID object_ID \
      --RGSM sample_name \
      --output-directory /staging/examples/ \
      --output-file-prefix streaming
    AZ_ACCOUNT_NAME="storage-account-name" AZ_ACCOUNT_KEY="<account-key>" dragen -f \
      -r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
      -1 https://storage-account-name.blob.core.windows.net/path/to/object_1.fastq.gz \
      -2 https://storage-account-name.blob.core.windows.net/path/to/object_2.fastq.gz \
      --RGID object_ID \
      --RGSM sample_name \
      --output-directory /staging/examples/ \
      --output-file-prefix streaming
    dragen -f \
      -r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
      -1 https://bucket-name.amazonaws.com/path/to/object_1.fastq.gz?querystring \
      -2 https://bucket-name.amazonaws.com/path/to/object_2.fastq.gz?querystring \
      --RGID object_ID \
      --RGSM sample_name \
      --output-directory /staging/examples/ \
      --output-file-prefix streaming
    dragen -f \
      -r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
      -b s3://s3-bucket-name/path/to/object_1.bam \
      --output-directory /staging/examples/ \
      --output-file-prefix streaming
    AZ_ACCOUNT_NAME="storage-account-name" AZ_ACCOUNT_KEY="<account-key>" dragen -f \
      -r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
      -b https://storage-account-name.blob.core.windows.net/path/to/object_1.bam \
      --output-directory /staging/examples/ \
      --output-file-prefix streaming
    dragen -f \
      -r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
      -b "https://bucket-name.amazonaws.com/path/to/object_1.bam?querystring" \
      --output-directory /staging/examples/ \
      --output-file-prefix streaming
    dragen -f \
      -r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
      -1 SRA056922.fastq \
      --RGID object_ID \
      --RGSM sample_name \
      --output-directory s3://s3-bucket-name/path/to/output \
      --intermediate-results-dir /staging/examples \
      --output-file-prefix streaming
    AZ_ACCOUNT_NAME="storage-account-name" AZ_ACCOUNT_KEY="<account-key>" dragen -f \
      -r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
      -1 SRA056922.fastq \
      --RGID object_ID \
      --RGSM sample_name \
      --output-directory https://storage-account-name.blob.core.windows.net/path/to/output \
      --intermediate-results-dir /staging/examples \
      --output-file-prefix streaming
    --sample-sex FEMALE
    
    --sample-sex MALE
    
    --sample-sex NONE
    dragen --RGID 1 --RGCN Broad --RGLB Solexa-135852 \
    --RGPL Illumina --RGPU 1 --RGSM NA12878 \
    -r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
    -1 SRA056922.fastq --output-directory /staging/tmp/ \
    --output-file-prefix rg_example
    LICENSE_MSG| =====================================================
    LICENSE_MSG| License report
    LICENSE_MSG|   Genome status [ACxxxxxxxxxxx] : used 1263.9 Gbases since 2018-Feb-15 (1263886160894 bases, unlimited)
    LICENSE_MSG|   Genome  bases [ACxxxxxxxxxxx] : 202000000
    LICENSE_MSG|   Genome  bases [total]         : 202000000
    dragen -r <REF_DIRECTORY> --output-directory <OUT_DIRECTORY> \
    --output-file-prefix <FILE_PREFIX> [options] -b <BAM> \
    --enable-map-align true \
    --enable-variant-caller true