DRAGEN analysis is available on multiple platforms.
DRAGEN on-premises server
DRAGEN on-premises server offers highly accurate secondary analysis in a fraction of the time required by a traditional CPU-based system.
- Analyze and store data locally
- Supports varying levels of command line interface
- Replace up to 30 traditional compute instances
- Fully process a 34× whole human genome in ~30 minutes (1)
- One unit supports two NovaSeq 6000 Systems running at full capacity
DRAGEN analysis on Illumina Connected Analytics
Couples the accuracy and speed of DRAGEN with the ability to customize analysis pipelines and operationalize informatics on a secure platform.
DRAGEN on BaseSpace Sequence Hub (BSSH)
Push-button analysis in an intuitive, easy-to-use interface, with the compliance and storage features of BaseSpace Sequence Hub and Amazon Web Services (AWS).
DRAGEN onboard NovaSeq X Series
- Flexibly runs multiple secondary analysis pipelines in parallel
- Performs up to four simultaneous applications per flow cell in a single run
- Brings up to 5x lossless data compression and analysis with supported applications
- Provides savings on analysis, which over five years can exceed the price of the sequencer
DRAGEN onboard NextSeq 1000 and NextSeq 2000 Systems
(1) HG002 from PrecisionFDA truth challenge V2 run with DRAGEN analysis v4.0 on DRAGEN server v4, all callers
(2) When run according to sample recommendations
Illumina DRAGEN (Dynamic Read Analysis for GENomics) secondary analysis was developed to address important challenges associated with analyzing NGS (Next Generation Sequencing) data for a range of applications, including genome, exome, transcriptome, and methylome studies. DRAGEN secondary analysis processes NGS data and enables tertiary analysis to drive insights. The available tools make up a highly accurate, comprehensive, and efficient solution that enables labs of all sizes and disciplines to do more with their genomic data.
Product highlights
Accurate results:
Pangenome reference genome and machine learning drive unprecedented accuracy
99.89% accuracy score with the Precision FDA Truth Challenge V2 benchmark data (2,3)
Comprehensive platform:
Analyze NGS data from whole genomes, exomes, methylomes, and transcriptomes
Available on platform of choice and scalable based on needs
Efficient analysis:
Process a 34x genome in ~ 30 minutes, with all supported callers with DRAGEN server v4 (1)
Reduce FASTQ file sizes up to 5x with DRAGEN ORA Compression
References:
Illumina data on file, 2022.
Illumina DRAGEN Secondary Analysis is the first single platform to achieve 99.89% accuracy based on . Details here . Accessed March 22, 2023
PrecisionFDA Truth Challenge V2: Calling Variants from Short and Long Reads in Difficult-to-Map Regions. . Accessed November 3, 2020.
- Provides access to select DRAGEN analysis informatics pipelines
- Enables users to generate results in as little as two hours
- Uses intuitive pipeline algorithms to reduce reliance on external informatics experts
DRAGEN onboard MiSeq i100 Series
Intuitive, ultra-rapid analysis including DRAGEN BCL convert, DRAGEN Library QC, DRAGEN small WGS and DRAGEN Microbial Enrichment Plus.
- Rapid results with comprehensive secondary analysis generated in two hours or less (2)
- Highly efficient workflow with a single user touchpoint to VCF and/or HTML report and no intermediate file transfers
- Exceptionally easy with an intuitive interface for non-expert users
DRAGEN on AWS and Azure
DRAGEN supports the FPGA-enabled instance types of AWS and Azure. The RPM installers and the kernel driver can be installed on user-managed images, and DRAGEN can be run by purchasing a license.
DRAGEN on AWS and Azure Marketplace
Pre-configured Amazon Machine Images (AMI) and Azure Virtual Machines with DRAGEN installed can be accessed from the respective marketplace offerings in a Pay-As-You-Use model.
DRAGEN on GCP
DRAGEN is made available on the Google Cloud Platform. Pre-configured instances with DRAGEN installed can be accessed through the GCP application interface. Limited availability. Please reach out to your Illumina representative for access.
N/A
N/A
DRAGEN ORA Compression
DRAGEN ORA compression is optimized for high compression ratios of FASTQ files, as well as rapid compression and decompression, all while preserving data integrity.
N/A
Compression Ratio Run Time
DRAGEN Map + Align
DRAGEN Map + Align can be run standalone or as part of DRAGEN's suite of pipelines.
N/A
Mapping metrics Duration Metrics Coverage Metrics
DRAGEN Germline
The DRAGEN Germline Pipeline provides end-to-end NGS analysis, including advanced error model calibration for increased accuracy, and repeat expansion detection and genotyping through Illumina Expansion Hunter.
SNV/Indel CNV SV Repeat Expansions
Mapping metrics Duration Metrics Coverage Metrics Variant Metrics Callability Report
DRAGEN Somatic
The DRAGEN Somatic Pipeline includes tumor-only and tumor–normal modes, designed for detecting somatic variants in tumor samples. Both modes make no ploidy assumptions, enabling detection of low-frequency alleles.
SNV/Indel CNV SV TMB MSI HLA
Mapping metrics Duration Metrics Coverage Metrics Variant Metrics Callability Report
DRAGEN Enrichment
The DRAGEN Enrichment Pipeline combines DRAGEN’s germline and somatic callers into a pipeline designed specifically for analyzing enrichment samples. Includes a full suite of enrichment metrics and reporting.
SNV/Indel CNV SV
Mapping metrics Duration Metrics Coverage Metrics Variant Metrics Callability Report
DRAGEN RNA
The DRAGEN RNA Pipeline performs transcriptome analysis starting with splice junction discovery and alignment, followed by rapid alignment and splice junction mapping and quantification. For differential expression, Illumina recommends the DRAGEN Differential Expression app on BaseSpace Sequence Hub.
Gene fusion SNV/Indel
Mapping metrics Duration Metrics Coverage Metrics Variant Metrics Callability Report
DRAGEN Single Cell RNA
The DRAGEN Single Cell RNA pipeline performs demultiplexing, cell-barcode and UMI error correction, sequence alignment, and quantification of gene expression.
N/A
Mapping Metrics Duration Metrics Coverage Metrics Callability Report Cell Metrics
DRAGEN Joint Genotyping
The DRAGEN Joint Genotyping/Population Pipeline calls variants jointly across multiple genomes and scales to large cohorts of samples at expedited speeds with uncompromising accuracy.
SNV/Indel CNV SV Repeat Expansions
Mapping metrics Duration Metrics Coverage Metrics Variant Metrics Callability Report
DRAGEN Methylation
The DRAGEN Methylation Pipeline performs alignment, methyl calling, and calculates alignment and methylation metrics.
N/A
Mapping metrics Duration Metrics Coverage Metrics Variant Metrics Callability Report
DRAGEN Reference Builder
Accepts FASTA files, and builds the proprietary reference used by the DRAGEN apps.
N/A
N/A
DRAGEN TruSight Oncology 500 ctDNA Analysis Software
Secondary analysis support for Illumina’s TruSight Oncology 500 ctDNA. Available on the local DRAGEN Server version 3 and later.
SNV/Indel CNV DNA fusions MSI TMB
Mapping metrics Duration Metrics Coverage Metrics Variant Metrics Callability Report
DRAGEN Imputation
The DRAGEN Imputation pipeline is an end-to-end, user-friendly tool that enables scalable low-pass whole genome sequencing analysis.
N/A
Impute ≤100 samples simultaneously 1.7x faster compared to original GLIMPSE code
DRAGEN analysis can be used in numerous fields in the biological sciences.
Genetic Diseases
Reduce time required for genomic analysis, with high accuracy and comprehensiveness
Oncology
Analyze tumor-only and tumor/normal samples with accuracy, comprehensiveness, and efficiency
Cell and Molecular Biology
Advance understanding of cellular mechanisms with rapid analysis pipelines for bulk and single cell samples
Population Genomics
Accurately and efficiently analyze sequenced genomes at scale. Accelerate re-analysis as computational tools improve over time
Infectious Disease
Detect and characterize infectious diseases with a comprehensive solution
Agrigenomics
Efficiently analyze animals and plants of varying genomic complexities with custom reference
DRAGEN Demultiplexing
Rapid demultiplexing of NGS data
DRAGEN v4.4 introduces support for DRAGEN server apps. These apps consist of Docker images, Nextflow workflows, a CLI shell script, and packaged resource bundles, and can be downloaded and installed on the on-premises server. The packaged resource bundles include all the resource files required to run the application, such as hash tables, noise baseline files, and BED files.
Server apps make it easy to run complex workflows such as Tumor Normal somatic analysis by simplifying the management of external resources and applying the correct command line parameters for the selected analysis type. The DRAGEN server can support multiple installed server apps and DRAGEN on-prem for command line use at the same time.
We recommend using the BSSH Run Planner tool to minimize errors in creating the sample sheet. For instruments such as NovaSeq X, sample sheets created in the BSSH Run Planner tool are automatically downloaded into the run folder on the instrument.
Common output files for cloud and local pipelines are described in the Analysis Output.
On DRAGEN server, Nextflow logs are contained in the Work folder in a hierarchical folder structure organized by the tasks in the pipeline_trace.txt. These files are prefixed with "." and hidden from normal view.
📂 Work — (DRAGEN server only) - Contains information and files related to Nextflow execution
📄 .command.log - Contains Nextflow pipeline step execution log.
📄 .command.out - Contains Nextflow pipeline step standard output log.
📄 .command.err - Contains Nextflow pipeline step standard error log.
📄 .exit.code - Contains Nextflow pipeline step execution exit code.
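Because these log files are hidden and spread over many task folders, a small script can help locate the steps that failed. The following is a minimal sketch, not part of the DRAGEN software: the function name `failed_steps` is hypothetical, and it assumes each task folder under Work holds an `.exit.code` file containing a single integer, as listed above.

```python
from pathlib import Path


def failed_steps(work_dir, exit_code_name=".exit.code"):
    """Return (task_folder, exit_code) pairs for steps with a non-zero exit code."""
    failures = []
    # Path.rglob also matches dot-prefixed (hidden) files and folders.
    for code_file in sorted(Path(work_dir).rglob(exit_code_name)):
        text = code_file.read_text().strip()
        try:
            code = int(text)
        except ValueError:
            code = None  # empty or unreadable exit code: report it as well
        if code != 0:
            failures.append((code_file.parent, code))
    return failures
```

For each folder returned, the neighbouring `.command.err` and `.command.log` files are the natural next place to look.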
The pipeline only supports starting from FASTQ, BAM or CRAM in the current release. The sample sheet below only contains the minimally required sections for starting the analysis. It is not a valid sample sheet for other purposes.
Users can visit the Sample Sheet guidelines section to learn additional details on required fields and values as they fill in their sample information, or download a template from Sample Sheet Template.
[Header],,,,,,,,,,
FileFormatVersion,2,,,,,,,,,
RunName,DRAGEN TN Start From FASTQ Only,,,,,,,,,
InstrumentType,NovaSeq,,,,,,,,,
InstrumentPlatform,NovaSeq,,,,,,,,,
[TN_Data],,,,,,,,,,
Sample_ID,Specimen_Type,Sample_Type,Case_ID,Sample_Description,Sample_Classification
tumorSample,FFPE,DNA,SampleA,Description1,Tumor
normalSample,FFPE,DNA,SampleA,Description2,Normal
DRAGEN Heme WGS Tumor Only Pipeline, henceforth referred to as the Heme pipeline, is a comprehensive and unbiased whole genome sequencing solution to replace conventional cytogenetic and panel sequencing approaches for detecting all types of mutations using a limited amount of DNA. It can be applied to detect clinically actionable cancer mutations spanning a wide range of genomic events in Heme samples, e.g., structural variants (SV), copy number alterations (CNA), small variants (SNV/insertion/deletion/delins), internal tandem duplications (ITD), and DUX4 variants.
The Heme pipeline includes a DNA-only workflow designed to analyze whole genome sequencing data generated on supported instruments. It may be run as a local off-instrument solution installable on a DRAGEN server or accessible through the Illumina Connected Analytics (ICA) cloud environment. The Heme pipeline is for Research Use Only (RUO).
The Illumina Connected Insights (ICI) Local platform can be used to interpret and visualize analysis results from a clinical research workflow pipeline on a local DRAGEN server. See .
The DRAGEN DNA Pipeline accelerates the secondary analysis of NGS data by harnessing the tremendous power available on the DRAGEN Platform. The pipeline includes highly optimized algorithms for mapping, aligning, sorting, duplicate marking, and haplotype variant calling. In addition to haplotype variant calling, the pipeline supports calling of copy number and structural variants as well as detection of repeat expansions and targeted calls.
Pressing Ctrl+C during a DRAGEN step stops the currently running analysis and might cause an FPGA error. To recover from an FPGA error, shut down and restart the server.
CAUTION: Do not run analyses as the root user as it can lead to permissions issues when managing data generated by the software.
When running the analysis software using SSH, Illumina recommends using additional software to prevent unexpected termination of analysis. Illumina recommends screen and tmux.
The Heme pipeline depends on the DRAGEN Application Manager (DAM). For issues related to the DRAGEN Application Manager installation, refer to the DRAGEN Application Manager Installation Guide.
Ensure DRAGEN App Manager is running properly.
Ensure Docker is running properly. For docker configuration help, please check the DRAGEN Application Manager installation guide and docker.org documentation.
In CIFS (SMB 1.0), the mounted volume may have a permission check issue and cause the Nextflow workflow to exit prematurely when a non-root user account is used for analysis, unless the filesystem permission check is disabled. The workaround is to use newer SMB protocols and configure Windows Active Directory for analysis with non-root users.
To increase the coverage of a sample using multiple FASTQ files, the FASTQ files must follow the Illumina naming convention. The current limit is up to 16 FASTQ files from 8 lanes based on available flow cell types.
If there are more than 16 FASTQ files, use cat or another command line utility to concatenate them into a single FASTQ file and stay within the file number limit.
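The concatenation step can also be scripted. The sketch below is an illustration under stated assumptions rather than an Illumina tool: `concatenate_fastq_gz` is a hypothetical helper, it applies to gzip-compressed (.fastq.gz) inputs only, and it relies on the fact that gzip files appended back to back still form a valid gzip stream.

```python
import shutil
from pathlib import Path


def concatenate_fastq_gz(parts, merged_path):
    """Append gzip-compressed FASTQ files, in the given order, into one file."""
    with open(merged_path, "wb") as merged:
        for part in parts:
            with open(part, "rb") as chunk:
                # Raw byte copy: consecutive gzip members remain a valid gzip stream.
                shutil.copyfileobj(chunk, merged)
    return Path(merged_path)
```

For paired-end data, the Read 1 and Read 2 files need to be concatenated in the same order so that read pairs stay aligned, and the merged file name still has to follow the Illumina naming convention.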
Basic ICA Subscription
Basic ICI Subscription (if desired)
Additional information is available from the ICA support site.
The pipeline supports mixed flow cells, where different assays are sequenced in the same flow cell.
📂 Heme_Nextflow_logs—(ICA only) - Contains information related to the execution of the pipeline as a whole and for specific nodes (when an analysis is split across multiple nodes). It contains files used to execute parts of the workflow on different nodes as well as records of the Nextflow execution on those nodes.
Nextflow output folders differ across platforms.
Superb performance based on the DRAGEN BioIT platform Release 4.4.4
Supports starting the analysis from BCL, FASTQ, BAM or CRAM as inputs.
Flexible custom configurable options on top of well established DRAGEN recipes for Heme WGS analysis.
Available on local DRAGEN servers and Illumina Connected Analytics (ICA)
Seamless integration with Illumina Connected Insights (ICI) for tertiary interpretation
Illumina DNA PCR Free Prep Kit
Illumina DNA Prep Kit
Custom LPKs
NovaSeq 6000 or 6000Dx in RUO mode
NovaSeq X or NovaSeq X Plus
Note: Data from unsupported instruments can still be analyzed, but a warning is generated.
NovaSeq 6000 or 6000Dx S4
NovaSeq X or NovaSeq X Plus 10B, 25B


No
Only Header and TN_Data sections required in SampleSheet.csv
All clinical research workflow pipelines support only the v2 sample sheet and require index2 to be in forward orientation, with bcl-convert SoftwareVersion >= 4.4. The pipelines are still compatible with legacy sample sheets where the BCLConvert_Settings section has SoftwareVersion < 4.4. Sample sheet v1 is no longer supported.
The clinical research workflow pipelines in the ICA cloud support post-processing scripts to be executed after the completion of the pipeline analysis.
The clinical research workflow pipelines support automatic data ingestion or manual upload into ICI for variant interpretation after the analysis is completed in ICA, or using a DRAGEN on-premises server locally.
Yes
Mixed flow cell, auto-launch
Copy the run or FASTQ folder to the DRAGEN server into the staging folder with the following recommended organization: /staging/runs/{RunID}. You can copy the run folder onto the DRAGEN server using Linux commands such as rsync. The sample sheet within the run folder is used unless otherwise specified through the command line.
Run folder must be intact.
If the analysis output folder path is different from the default, provide the analysis output folder path.
Before running the analysis, confirm that the output directory for the software to write to is empty and does not include results of previous analyses.
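The empty-directory check can be automated before launch. This is a minimal sketch with a hypothetical helper name, `output_dir_is_clean`; treating a folder that does not exist yet as acceptable is an assumption, not documented behavior.

```python
from pathlib import Path


def output_dir_is_clean(path):
    """True if the output folder is absent (assumed acceptable) or empty."""
    folder = Path(path)
    if not folder.exists():
        return True
    if not folder.is_dir():
        raise NotADirectoryError(f"{folder} exists but is not a folder")
    # any() stops at the first entry, so large folders are not fully listed.
    return not any(folder.iterdir())
```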
The DRAGEN server provides an NVMe SSD in the /staging directory to use as the software output directory. Network-attached storage is required for long-term storage.
When running the Heme pipeline, use the default settings or set the --analysisFolder command line option to a directory in /staging to make sure the DRAGEN server processes read and write data on the NVMe SSD.
Before beginning analysis, develop a strategy to copy data from the DRAGEN server to a network‑attached storage. Delete output data on the DRAGEN server as soon as possible.
The following are the run folder output size estimates and the minimum free space requirements for fastq.gz or fastq.ora output format.
When launching the analysis, the software checks that the minimum disk space required is available. If the minimum disk space is not available, the software shows an error message and prevents analysis from starting. If disk space is exhausted during a run, the run shows an error and stops analyzing.
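The software performs its own check at launch; an equivalent check can be run earlier, for example before a long copy to /staging. The sketch below is illustrative only: `has_free_space` is a hypothetical helper, and the threshold is passed in by the caller in gigabytes (decimal GB is an assumption about the unit).

```python
import shutil


def has_free_space(path, required_gb):
    """True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1000**3  # decimal gigabytes (assumed unit)
```

A call such as `has_free_space("/staging", required_gb)` would use the minimum free space value that applies to the flow cell and FASTQ output format in use.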
Moving or modifying files during an analysis may cause the analysis to fail or provide incorrect results.
Analysis of data stored on a network file system may be slow when multiple DRAGEN servers read from and write to it simultaneously. However, streaming large datasets directly from NFS is advisable when transferring data to local /staging takes a significant amount of time, especially for NovaSeq X 25B flow cells. Discuss with your system administrator for of the DRAGEN server.
This feature allows users to customize a specific set of DRAGEN command-line options to override the default values pre-defined in the pipeline.
On the ICA (Illumina Connected Analytics) user interface (UI) to the software, you can specify the Custom Parameters Config File and Custom Resources Directory directly. Supported customizable options are described below.
heme_custom_param.config Content
custom_resources_Heme_dir Folder Structure on ICA
ℹ️ Note: Custom resource files and the custom configuration file must be uploaded to the same ICA project where the run is created. You can use the icav2 client or other supported methods. See for details.
This feature allows users to customize a specific set of DRAGEN command-line options to override the default values pre-defined in the pipeline.
On the ICA (Illumina Connected Analytics) user interface (UI) to the software, you can specify the Custom Parameters Config File and Custom Resources Directory directly. Supported customizable options are described below.
solid_custom_param.config Content
custom_resources_Heme_dir Folder Structure on ICA
ℹ️ Note: Custom resource files and the custom configuration file must be uploaded to the same ICA project where the run is created. You can use the icav2 client or other supported methods. See for details.
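Before uploading, it can help to confirm that every resource file named in the custom parameter file exists in the custom resources folder. The following is a sketch under assumptions: it expects the `key = 'value'` layout with `#` comment lines used by the example config files in this document, it treats values that start with `/` as paths relative to the root of the custom resources folder, and both function names are hypothetical.

```python
from pathlib import Path


def parse_custom_config(text):
    """Parse `key = value` lines; blank lines and '#' comment lines are skipped."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key = value line: {raw!r}")
        key = key.strip()
        if key in options:
            raise ValueError(f"duplicate option: {key}")
        options[key] = value.strip().strip("'\"")
    return options


def missing_resources(options, resources_dir):
    """List options whose path-like value is absent under resources_dir."""
    root = Path(resources_dir)
    missing = []
    for key, value in options.items():
        # Assumption: a leading '/' means "relative to the resources folder root".
        if value.startswith("/") and not (root / value.lstrip("/")).is_file():
            missing.append(key)
    return missing
```

Rejecting duplicate keys is a deliberate choice: a repeated option name would otherwise silently hide one of the two values.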
The DRAGEN Heme WGS Tumor Only Pipeline is launched with the bash script called run_Heme_WGS_TO_{version}.sh, which is installed in the /usr/local/bin directory. The bash script is executed on the command line and runs the software using DRAGEN Application Manager. For a full list of command-line options, refer to .
To launch an analysis, you must provide the --inputType and --inputFolder arguments. The --inputType argument can be bcl, fastq, bam, or cram. When starting from a sequencing system run folder containing BCL files, --inputType must be bcl and --inputFolder is the absolute path to the full run folder. When starting from FASTQ, BAM, or CRAM files, --inputFolder may also be a comma-separated list of folders. If more than one input folder is specified, the --sampleSheet argument must also be provided with the absolute path to a valid Sample Sheet (refer to ).
Analysis output is written to /staging/DRAGEN_Heme_WGS_Tumor_Only_Pipeline_{version}_Analysis_{datetimestamp} by default. To write to a different output directory, run the bash script with --analysisFolder <FULL_PATH_TO_ANALYSIS_FOLDER>.
The --demultiplexOnly flag runs the pipeline through FASTQ Generation only, and these outputs can be used for splitting a run into smaller batch analyses with --inputType fastq and the --sampleIDs argument.
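The argument rules above can be captured in a small helper that assembles the command line without running it. This is a hedged sketch: `build_launch_command` is a hypothetical name, only the options named in this section are used, the script path is supplied by the caller, and the --sampleIDs value is passed through unchanged because its format is not specified here.

```python
def build_launch_command(script, input_type, input_folders,
                         sample_sheet=None, analysis_folder=None,
                         demultiplex_only=False, sample_ids=None):
    """Return the argument list for one pipeline launch (nothing is executed)."""
    if input_type not in ("bcl", "fastq", "bam", "cram"):
        raise ValueError(f"unsupported --inputType: {input_type}")
    if not input_folders:
        raise ValueError("at least one input folder is required")
    if input_type == "bcl" and len(input_folders) != 1:
        raise ValueError("BCL input takes exactly one run folder")
    if len(input_folders) > 1 and sample_sheet is None:
        raise ValueError("--sampleSheet is required with several input folders")

    argv = [script, "--inputType", input_type,
            "--inputFolder", ",".join(input_folders)]
    if sample_sheet is not None:
        argv += ["--sampleSheet", sample_sheet]
    if analysis_folder is not None:
        argv += ["--analysisFolder", analysis_folder]
    if demultiplex_only:
        argv.append("--demultiplexOnly")
    if sample_ids is not None:
        argv += ["--sampleIDs", sample_ids]  # format assumed, passed as given
    return argv
```

On a DRAGEN server the resulting list could be handed to `subprocess.run`, preferably from inside a screen or tmux session so that a dropped SSH connection does not end the analysis.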
The Illumina Connected Insights (ICI) platform can be used to interpret and visualize analysis results from the Heme pipeline. Analysis results can be provided to ICI via a manual upload for local analyses and via auto-ingestion for Illumina Connected Analytics (ICA) analyses.
Access to Illumina Connected Analytics
Access to Illumina Connected Insights
Refer to the ICI support site page for information on
The DRAGEN Solid WGS Tumor Normal Pipeline is launched with the bash script called run_Solid_WGS_TN_{version}.sh, which is installed in the /usr/local/bin directory. The bash script is executed on the command line and runs the software using DRAGEN Application Manager. For a full list of command-line options, refer to .
To launch an analysis, you must provide the --inputType and --inputFolder arguments. The --inputType argument can be fastq, bam, or cram. The --inputFolder argument may be the absolute path to the input folder or a comma-separated list of paths. If more than one input folder is specified, the --sampleSheet argument must also be provided with the absolute path to a valid Sample Sheet (refer to ). If the --sampleSheet argument is not provided, the software checks for a file named SampleSheet.csv in the input folder.
Analysis output is written to /staging/DRAGEN_Solid_WGS_Tumor_Normal_Pipeline_{version}_Analysis_{datetimestamp} by default. To write to a different output directory, run the bash script with --analysisFolder <FULL_PATH_TO_ANALYSIS_FOLDER>.
Sample Sheet templates for the Heme pipeline for standalone DRAGEN server and ICA manual launch analysis can be found in the table below. For auto-launch compatible sample sheets, use BaseSpace Run Planner.
The Heme pipeline is compatible with several instruments and assay workflows (standard, XP), each of which has implications for the sample sheet.
DRAGEN Solid WGS Tumor Normal Pipeline, henceforth referred to as the Solid WGS TN pipeline, is a comprehensive and unbiased whole genome sequencing solution for detecting all types of mutations in matched tumor and normal samples. It can be applied to detect clinically actionable cancer mutations spanning a wide range of genomic events, e.g., structural variants (SV), copy number alterations (CNA), and small variants (SNV/insertion/deletion/delins).
The Solid WGS TN pipeline includes a DNA-only workflow designed to analyze whole genome sequencing data generated on supported instruments. It may be run as a local off-instrument solution installable on a DRAGEN server or accessible through the Illumina Connected Analytics (ICA) cloud environment. The Solid WGS TN pipeline is for Research Use Only (RUO).
The TN pipeline may contain additional user defined fields such as Sex, Tumor Type or Case ID for use with variant interpretation in ICI.
The following sample sheet requirements describe the required and optional fields for the TN pipeline. The sample sheet must contain the following sections.
The analysis fails if the sample sheet requirements are not met.
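Since a failed requirement stops the analysis, a quick pre-check of the sample sheet can save a launch attempt. The sketch below is not the pipeline's own validation: the function names are hypothetical, it only checks that the [Header] and [TN_Data] sections are present, and the rule that each Case_ID pairs exactly one Tumor with one Normal sample is an assumption drawn from the minimal Tumor Normal example sheet in this document.

```python
import csv
import io
from collections import defaultdict


def read_sections(text):
    """Split a v2 sample sheet into {section_name: [rows]}."""
    sections = {}
    current = None
    for row in csv.reader(io.StringIO(text)):
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue
        if cells[0].startswith("[") and cells[0].endswith("]"):
            current = cells[0][1:-1]
            sections[current] = []
        elif current is not None:
            while cells and not cells[-1]:
                cells.pop()  # drop padding left by trailing commas
            sections[current].append(cells)
    return sections


def check_tn_sheet(text):
    """Return a list of problems found; an empty list means none were found."""
    sections = read_sections(text)
    problems = [f"missing [{name}] section"
                for name in ("Header", "TN_Data") if name not in sections]
    rows = sections.get("TN_Data", [])
    if "TN_Data" in sections and len(rows) < 2:
        problems.append("[TN_Data] has no sample rows")
    if len(rows) >= 2:
        header, *samples = rows
        by_case = defaultdict(list)
        for sample in samples:
            record = dict(zip(header, sample))
            by_case[record.get("Case_ID")].append(record.get("Sample_Classification"))
        for case, classes in by_case.items():
            # Assumed pairing rule: one Tumor and one Normal per Case_ID.
            if sorted(str(c) for c in classes) != ["Normal", "Tumor"]:
                problems.append(f"case {case}: expected one Tumor and one Normal")
    return problems
```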
The pipeline may be downloaded and installed on a local DRAGEN server. A download utility may be obtained from the Illumina download site, and the download utility will manage all the dependencies. Once the required installers are downloaded, the software may be installed by running the installers.
With the NovaSeq X 25B flow cells, the amount of data is on the order of terabytes, which may take a few hours or more to copy to the /staging folder on the local DRAGEN server. Using NFS storage directly for input and output is recommended in this case.
On the BSSH Run Planner, custom parameters and custom resource files can also be specified during Run Planning.
Custom resource files must be uploaded to BaseSpace under the same project to be selectable during run planning. Supported customizable options are described in the Custom Configuration Support section of each application.
See for additional details.
ICI supports variant interpretation with advanced visualization capabilities. It is available in the cloud or on a local DRAGEN server.
Superb performance based on the DRAGEN BioIT platform Release 4.4.4
Supports starting the analysis from FASTQ (.gz or .ora format), BAM or CRAM as inputs
Flexible custom configurable options on top of well established DRAGEN recipes for Solid WGS TN analysis.
Available on local DRAGEN servers and Illumina Connected Analytics (ICA)
Seamless integration with Illumina Connected Insights (ICI) for tertiary interpretation
No specific requirements on LPKs, since the pipeline does not support starting from BCL in the current release.
No specific requirements on instruments, since the pipeline does not support starting from BCL in the current release.

5300
Other Instruments
~2000
4000
2500
NovaSeq 6000/6000Dx (RUO) S4 Flow Cell
~2000
4000
2500
NovaSeq X 10B
~2000
4000
2500
NovaSeq X 25B
~4250
8500
If the --sampleSheet argument is not provided, the software checks for a file named SampleSheet.csv in the input folder.

## custom parameters
vc_output_evidence_bam = false
qc_detect_contamination = true
aligner_clip_pe_overhang = 0
## custom reference files
vc_systematic_noise = '/snv/WGS_hg38_v1.0_systematic_noise.snv.bed.gz'
sv_systematic_noise = '/sv/WGS_FF_Heme_hg38_v1.0_systematic_noise.sv.bedpe.gz'
vc_somatic_hotspots = '/snv/somatic_hotspots_GRCh38.vcf.gz'

custom_resources_Solid/
├── snv
│   ├── WGS_Solid_hg38_v1.0_systematic_noise.snv.bed.gz
│   └── somatic_hotspots_GRCh38.vcf.gz
└── sv
    └── WGS_Solid_FF_solid_hg38_v1.0_systematic_noise.sv.bedpe.gz

## custom parameters
somatic_vc_output_evidence_bam = false
germline_qc_detect_contamination = true
germline_aligner_clip_pe_overhang = 0
## custom reference files
somatic_vc_systematic_noise = '/snv/WGS_hg38_v1.0_systematic_noise.snv.bed.gz'
somatic_sv_systematic_noise = '/sv/WGS_FF_solid_hg38_v1.0_systematic_noise.sv.bedpe.gz'
somatic_vc_somatic_hotspots = '/snv/somatic_hotspots_GRCh38.vcf.gz'

custom_resources_Solid/
├── snv
│   ├── WGS_Solid_hg38_v1.0_systematic_noise.snv.bed.gz
│   └── somatic_hotspots_GRCh38.vcf.gz
└── sv
    └── WGS_Solid_FF_solid_hg38_v1.0_systematic_noise.sv.bedpe.gz

DRAGEN Phase 3 or 4 server
DRAGEN License
Network storage server
A DRAGEN Phase 4 server is recommended, especially for datasets from NovaSeq X instruments. The server has 12 TB of intermediate data storage space for full processing of a NovaSeq X 25B flow cell.
The DRAGEN Phase 3 server has 6 TB of intermediate data storage space, which can accommodate flow cells from the NovaSeq 6000 or 6000Dx instruments.
The Heme pipeline uses the standard DRAGEN license without requiring any special licenses.
The Heme pipeline is designed to stream data from a network file server onto the DRAGEN server, complete the analysis using the /staging area of the high performance SSD and then stream the analysis output back to the network file server.
The network file server may be mounted to the DRAGEN server using the NFS or CIFS protocol (SMB 1.0). SMB 2.0 or higher is recommended with Active Directory support if the SMB protocol is used.
If starting from BCL (*.bcl) files, the Heme pipeline requires the run folder to contain certain files and folders.
The run folder contains data from the sequencing run. Make sure that the folder contains the following files and folders:
Config folder
Configuration files
Data folder
*.bcl files
Images folder
[Optional] Raw sequencing image files.
Interop folder
Interop metric files.
Logs folder
[Optional] Sequencing system log files.
RTALogs folder
Real-Time Analysis (RTA) log files.
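The run folder contents listed above can be verified before launch. This is a minimal sketch with a hypothetical helper name; it checks only the folders not marked [Optional], takes the folder names from the list above, and compares them case-insensitively because the exact capitalization on disk is an assumption.

```python
from pathlib import Path

# Folders from the list above that are not marked [Optional].
REQUIRED_FOLDERS = ("Config", "Data", "Interop", "RTALogs")


def missing_run_folders(run_folder, required=REQUIRED_FOLDERS):
    """Return the required folder names that are absent from run_folder."""
    present = {entry.name.lower()
               for entry in Path(run_folder).iterdir() if entry.is_dir()}
    return [name for name in required if name.lower() not in present]
```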
The following inputs are required for running the pipeline using FASTQ (*.fastq) files.
Full path to an existing FASTQ folder.
The FASTQ folder structure conforms to the folder structure in FASTQ File Organization.
The sample sheet is in the FASTQ folder path, or you can set the path to the sample sheet with the --sampleSheet override command line option.
Make sure there is sufficient disk space for the analysis to complete. Refer to the --help command line argument details for disk space requirements.
Use BCL Convert to produce FASTQ files for the Heme pipeline. Using bcl2fastq does not produce the same results and is discouraged.
Store FASTQ files in individual subfolders that correspond to a specific Sample_ID. Keep file pairs together in the same folder. Alternatively, store all FASTQ files together in a single flat folder.
The Heme pipeline requires separate FASTQ files per sample. Do not merge FASTQ files.
The instrument generates two FASTQ files per flow cell lane, so that there are eight FASTQ files per sample.
Sample1_S1_L001_R1_001.fastq.gz
Sample1 represents the Sample ID.
The S in S1 means sample, and the 1 in S1 is based on the order of samples in the sample sheet, so S1 is the first sample.
L001 represents the flow cell lane number.
The R in R1 means Read, so R1 refers to Read 1.
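The naming convention described above can be expressed as a regular expression, which is useful for checking file names before launch. The sketch below is illustrative: `parse_fastq_name` is a hypothetical helper, the trailing `001` segment is matched literally as in the example, and accepting both `.gz` and `.ora` extensions is an assumption based on the FASTQ formats named elsewhere in this document.

```python
import re

FASTQ_NAME = re.compile(
    r"^(?P<sample_id>.+)_S(?P<sample_number>\d+)_L(?P<lane>\d{3})"
    r"_R(?P<read>\d)_001\.fastq\.(?P<compression>gz|ora)$"
)


def parse_fastq_name(filename):
    """Split an Illumina-style FASTQ file name into its fields."""
    match = FASTQ_NAME.match(filename)
    if match is None:
        raise ValueError(f"not an Illumina-style FASTQ name: {filename}")
    fields = match.groupdict()
    for key in ("sample_number", "lane", "read"):
        fields[key] = int(fields[key])
    return fields
```

Grouping the parsed names by `sample_id` and counting the files per sample gives a quick way to compare a folder against the file number limit of the pipeline.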
Local DRAGEN Server
4.4.4.62
run_Heme_WGS_TO_{version}.sh
/usr/local/bin
See
ICA
a11697ba-1144-4dc6-9e22-f21dff29f747
icav2
ICA Pipelines
See
ICA
urn:ilmn:ica:pipeline:a11697ba-1144-4dc6-9e22-f21dff29f747#Heme_WGS_TO_v4_4_4_62
supported browser
ICA UI
See
{version} is used to represent the software version number in Table 1 above. Similarly, <pipeline_run_script> is used to indicate the client program name in this document.
The software may be downloaded and installed by following the installation guide. It may be executed using a local DRAGEN server or on a local computer which launches the analysis in the ICA cloud environment.
The command line program may be used to launch an analysis by using the <pipeline_run_script> with the appropriate options.
start from bcl
start from one or more input folders when using FASTQ, BAM or CRAM files
Multiple folders may be specified as input folders in comma separated values when using FASTQ, BAM or CRAM files as input.
Pressing Ctrl+C during a DRAGEN step stops the currently running analysis and might cause an FPGA error. To recover from an FPGA error, shut down and restart the server.
Here is an example of starting an analysis using the ICA client by providing the necessary command parameters and specifying a particular storage size for the analysis in ICA.
The same analysis example above may be completed using the ICA UI by logging into the appropriate domain of your company and project where the Heme pipeline is set up.
Find more information in the ICA Cloud App Launch Guide.
Coming soon.
Sample sheet templates contain all required fields, including index sequences in the proper orientation for all indexes from a given library prep kit. The templates are provided as a starting point for creating a sample sheet manually when launching analysis on a standalone DRAGEN server or on ICA using manual launch.
For interactive run planning or to create a sample sheet for ICA Autolaunch, use BaseSpace Run Planner to create valid sample sheets for either local or cloud analysis. To set up a run in BaseSpace run planner, refer to Sample Sheet Creation in BaseSpace Run Planner.
Refer to the Sample Sheet guidelines section for details on required fields and values when filling in sample information. Use the lookup table below to select and download the sample sheet template that matches your instrument, assay, and workflow configuration:
NovaSeq 6000Dx (RUO)
Standard or XP
-
NovaSeq 6000
-
NovaSeq X
Standard or XP
-
-
*Lane numbers cannot exceed what is supported by the flow cell in use.
Local DRAGEN Server
4.4.4.53
run_Solid_WGS_TN_{version}.sh
/usr/local/bin
See
ICA
c18e9e69-0a74-4c43-a419-a62cb7c6abc0
icav2
ICA Pipelines
See
ICA
urn:ilmn:ica:pipeline:c18e9e69-0a74-4c43-a419-a62cb7c6abc0#Solid_WGS_TN_v4_4_4_53
supported browser
ICA UI
See
{version} is used to represent the software version number in Table 1 above. Similarly, <pipeline_run_script> is used to indicate the client program name in this document.
The software may be downloaded and installed by following the installation guide. It may be executed using a local DRAGEN server or on a local computer which launches the analysis in the ICA cloud environment.
The command line program may be used to launch an analysis by using the <pipeline_run_script> with the appropriate options.
start from one or more input folders when using FASTQ, BAM or CRAM files
Multiple folders may be specified as input folders in comma separated values when using FASTQ, BAM or CRAM files as input.
Pressing Ctrl+C during a Solid_WGS_TN_DRAGEN step stops the currently running analysis and might cause an FPGA error. To recover from an FPGA error, shut down and restart the server.
Here is an example of starting an analysis using the ICA client by providing the necessary command parameters and specifying a particular storage size for the analysis in ICA.
The same analysis example above may be completed using the ICA UI by logging into the appropriate domain of your company and project where the pipeline is set up.
Find more information in the ICA Cloud App Launch Guide.
FileFormatVersion
2
v2 sample sheet format
Sample_ID
Required
The unique ID to identify a sample. The sample ID is included in the output file names. Sample IDs are not case sensitive. Sample IDs must have the following characteristics:
- Unique for the run.
- 1–70 characters.
- No spaces.
- Alphanumeric characters with underscores and dashes. If you use an underscore or dash, enter an alphanumeric character before and after the underscore or dash. For example, Sample1-T5B1_022515.
- Cannot be called all, default, none, unknown, undetermined, stats, or reports.
- Must match a Sample_ID listed in the [BCLConvert_Data] section. Each sample must have a unique combination of Lane (if applicable), sample ID, and index ID or the analysis will fail.
Case_ID
Required
A unique ID that links the biological samples from the same individual. It is used for variant interpretation in downstream software such as Illumina Connected Insights.
Sample_Type
Required
Possible value is DNA.
Sample_Classification
Required
To ensure a successful analysis, follow these guidelines:
Avoid any blank lines at the end of the sample sheet; these can cause the analysis to fail.
When running local analysis using the command line, save the sample sheet in the sequencing run folder with the default name SampleSheet.csv, or choose a different name and specify the path in the command-line options.
For command-line options, refer to Table 1: Shell Script Command-Line Options for details.
CAUTION: Do not run analyses as the root user as it can lead to permissions issues when managing data generated by the software.
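The Sample_ID rules listed above lend themselves to a quick pre-flight check. The following is a minimal Python sketch under stated assumptions: the function name `sample_id_problems` is hypothetical, the input is assumed to be the list of Sample_ID values from the pipeline's data section (one entry per sample), and uniqueness is compared case-insensitively because sample IDs are not case sensitive. It does not check the match against the [BCLConvert_Data] section.

```python
import re
from typing import Iterable, List

# Names that a Sample_ID cannot take (compared case-insensitively).
RESERVED_IDS = {"all", "default", "none", "unknown", "undetermined", "stats", "reports"}

# Alphanumeric runs joined by single underscores or dashes, so every
# underscore or dash has an alphanumeric character before and after it.
SAMPLE_ID_PATTERN = re.compile(r"[A-Za-z0-9]+(?:[_-][A-Za-z0-9]+)*")


def sample_id_problems(sample_ids: Iterable[str]) -> List[str]:
    """Return human-readable problems; an empty list means all IDs pass."""
    problems: List[str] = []
    seen = set()
    for sample_id in sample_ids:
        if not 1 <= len(sample_id) <= 70:
            problems.append(f"{sample_id!r}: length must be 1-70 characters")
        elif SAMPLE_ID_PATTERN.fullmatch(sample_id) is None:
            problems.append(f"{sample_id!r}: invalid characters or separator placement")
        key = sample_id.lower()
        if key in RESERVED_IDS:
            problems.append(f"{sample_id!r}: reserved name")
        if key in seen:
            problems.append(f"{sample_id!r}: not unique for the run")
        seen.add(key)
    return problems
```

Running the check on the documented example, `sample_id_problems(["Sample1-T5B1_022515"])`, returns an empty list.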
The DRAGEN secondary analysis software utilizes a highly reconfigurable Field Programmable Gate Array (FPGA) card and is available on a preconfigured DRAGEN server that can be seamlessly integrated into bioinformatics workflows. The platform can be loaded with highly optimized algorithms for many different NGS secondary analysis pipelines, including the following:
Whole genome
Exome
RNA-Seq
Methylome
Cancer
All user interaction is accomplished via DRAGEN software that runs on the host server and manages all communication with the FPGA card. This user guide summarizes the technical aspects of the system and provides detailed information for all DRAGEN command line options. If you are working with DRAGEN for the first time, Illumina recommends that you first read the Getting Started section, which provides a short introduction to DRAGEN, including running a test of the server, generating a reference genome, and running example commands.
DRAGEN DNA Pipeline
The DRAGEN DNA Pipeline massively accelerates the secondary analysis of NGS data. For example, the time taken to process an entire human genome at 30x coverage is reduced from approximately 10 hours (using the current industry standard, BWA-MEM+GATK-HC software) to approximately 20 minutes. Time scales linearly with coverage depth.
These pipelines harness the tremendous power of the DRAGEN server and include highly optimized algorithms for mapping, aligning, sorting, duplicate marking, and haplotype variant calling. They also use platform features such as hardware-accelerated compression and optimized BCL conversion, together with the full set of platform tools.
Unlike all other secondary analysis methods, DRAGEN DNA Applications do not reduce accuracy to achieve speed improvements. Accuracy for both SNPs and INDELs is improved over that of BWA-MEM+GATK-HC in side-by-side comparisons.
In addition to haplotype variant calling, the pipeline supports calling of copy number and structural variants as well as detection of repeat expansions.
DRAGEN secondary analysis includes an RNA-Seq (splicing-aware) aligner, as well as RNA-specific analysis components for gene expression quantification and gene fusion detection.
The DRAGEN RNA Pipeline shares many components with the DNA Pipeline. Mapping of short seed sequences from RNA-Seq reads is performed similarly to mapping DNA reads. In addition, splice junctions (the joining of noncontiguous exons in RNA transcripts) near the mapped seeds are detected and incorporated into the full read alignments.
DRAGEN secondary analysis uses hardware-accelerated algorithms to map and align RNA-Seq-based reads faster and more accurately than popular software tools. For instance, it can align 100 million paired-end RNA-Seq-based reads in about three minutes. With simulated benchmark RNA-Seq data sets, its splice junction sensitivity and specificity are unsurpassed.
The DRAGEN Methylation Pipeline provides support for automating the processing of bisulfite sequencing data to generate a BAM with the tags required for methylation analysis and reports detailing the locations with methylated cytosines.
<pipeline_run_script> --help # list all supported parameters

<pipeline_run_script> --inputType bcl \
  --inputFolder /staging/input-folder \
  --analysisFolder /staging/output-folder

<pipeline_run_script> --inputType <fastq|bam|cram> \
  --inputFolder /staging/input-folder-1,/staging/input-folder-2 \
  --analysisFolder /staging/output-folder

icav2 projectpipelines start nextflow ${PIPELINE_ID} \
  --project-id ${ANY_PROJECT_ID} \
  --storage-size Large \
  -o json \
  --input ${ANY_SAMPLE_SHEET} \
  --input ${ANY_INPUT_DIR} \
  --parameters inputType:'bcl' \
  --parameters referenceGenome:'hg38' \
  --parameters oraCompressionEnabled:'true' \
  --parameters sampleIds:'1267-Prostate-Del-R1,741-Lung-SNV-R1' \
  --user-reference ${ANY_USER_REFERENCE}

<pipeline_run_script> --inputType <fastq|bam|cram> \
  --inputFolder /staging/input-folder-1,/staging/input-folder-2 \
  --analysisFolder /staging/output-folder

icav2 projectpipelines start nextflow ${PIPELINE_ID} \
  --project-id ${ANY_PROJECT_ID} \
  --storage-size Large \
  -o json \
  --input ${ANY_SAMPLE_SHEET} \
  --input ${ANY_INPUT_DIR} \
  --parameters inputType:'fastq' \
  --parameters referenceGenome:'hg38' \
  --parameters sampleIds:'Sample1,Sample2' \
  --user-reference ${ANY_USER_REFERENCE}

RunInfo.xml file
Run information.
RunParameters.xml file
Run parameters.
SampleSheet.csv file
Sample information. If you want to use a sample sheet that is not in the run folder or a sample sheet named something other than SampleSheet.csv, provide the full path.
Possible values are Tumor or Normal.
Specimen_Type
Required
Possible values are FFPE (formalin-fixed, paraffin-embedded) or FF (fresh frozen) for a sample classification of Tumor. There are no restrictions for a sample classification of Normal.
Sex
Optional
Possible values are Male, Female or Unknown
Tumor_Type
Optional
Support tumor type code based on the SNOMED ontology
Sample_Description
Optional
Free text description for the sample
Option | Required | Description
--sampleOrCaseIDs | No | The comma-delimited sample IDs (or CaseID) that are processed by the run. For example, Sample_1,Sample_2.
--referenceGenome | No | Specify the reference genome to use for alignment. Possible values: hg38 or hs37d5_chr. Default is hg38.
--disableOraCompression | No | Specify to disable Ora compression.
--customResourceDir | No | Provide custom resource directory path.
--customConfig | No | Provide custom config file path.
--keepFullWorkDir | No | Copy entire work dir to analysis output folder. Default behavior is to copy only nextflow logs.
--version | No | Displays the version of the software, and then exits.
--help | No | Displays the help text.
--inputType | Yes | Possible values include fastq, bam, cram.
--inputFolder | Yes | Input folder containing {input type} files. Multiple folders can be specified as a comma separated list.
--sampleSheet | No | Full path to the sample sheet file. If the sample sheet is named SampleSheet.csv and is located in the single input folder (depending on how the analysis is initiated), this command is not required.
--analysisFolder | No | Full path to the alternative analysis folder. Default is /staging/DRAGEN_Solid_WGS_Tumor_Normal_Pipeline_{version}_Analysis_{datetimestamp} if not specified. This folder must have enough available free space for the analysis and be on an NVMe SSD partition to achieve high performance.
Option | Required | Description
--sampleIDs | No | The comma-delimited sample IDs that are processed by the run. For example, Sample_1,Sample_2.
--referenceGenome | No | Specify the reference genome to use for alignment. Possible values: hg38 or hs37d5_chr. Default is hg38.
--disableOraCompression | No | Specify to disable Ora compression.
--demultiplexOnly | No | Demultiplex to generate FASTQ files only without further analysis.
--customResourceDir | No | Provide custom resource directory path.
--customConfig | No | Provide custom config file path.
--keepFullWorkDir | No | Copy entire work dir to analysis output folder. Default behavior is to copy only nextflow logs.
--version | No | Displays the version of the software, and then exits.
--help | No | Displays the help text.
--inputType | Yes | Possible values include bcl, fastq, bam, cram.
--inputFolder | Yes | Input folder containing {input type} files. Multiple {input type, except bcl} folders can be specified as a comma separated list.
--sampleSheet | No | Full path to the sample sheet file. If the sample sheet is named SampleSheet.csv and is located in the run or fastq folder (depending on how the analysis is initiated), this command is not required.
--analysisFolder | No | Full path to the alternative analysis folder. Default is /staging/DRAGEN_Heme_WGS_Tumor_Only_Pipeline_{version}_Analysis_{datetimestamp} if not specified. This folder must have enough available free space for the analysis and be on an NVMe SSD partition to achieve high performance.
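The options above can be assembled into a launch command by a small wrapper. The following is a minimal Python sketch, not an Illumina tool: the function name `build_launch_command` is hypothetical, the script path is supplied by the caller, and only options documented in the table are used. It enforces two documented constraints: multiple input folders are only allowed for non-BCL input, and the reference genome must be hg38 or hs37d5_chr.

```python
from typing import List, Optional, Sequence

INPUT_TYPES = {"bcl", "fastq", "bam", "cram"}
REFERENCE_GENOMES = {"hg38", "hs37d5_chr"}


def build_launch_command(
    script: str,
    input_type: str,
    input_folders: Sequence[str],
    analysis_folder: Optional[str] = None,
    sample_sheet: Optional[str] = None,
    sample_ids: Optional[Sequence[str]] = None,
    reference_genome: Optional[str] = None,
) -> List[str]:
    """Return an argument list suitable for subprocess.run()."""
    if input_type not in INPUT_TYPES:
        raise ValueError(f"Unsupported input type: {input_type}")
    if not input_folders:
        raise ValueError("At least one input folder is required")
    if input_type == "bcl" and len(input_folders) > 1:
        raise ValueError("Only one input folder can be given for bcl input")
    if reference_genome is not None and reference_genome not in REFERENCE_GENOMES:
        raise ValueError(f"Unsupported reference genome: {reference_genome}")

    # Required options first; multiple folders are comma separated.
    command = [script, "--inputType", input_type,
               "--inputFolder", ",".join(input_folders)]
    # Optional options are only added when the caller sets them.
    if sample_sheet is not None:
        command += ["--sampleSheet", sample_sheet]
    if analysis_folder is not None:
        command += ["--analysisFolder", analysis_folder]
    if sample_ids:
        command += ["--sampleIDs", ",".join(sample_ids)]
    if reference_genome is not None:
        command += ["--referenceGenome", reference_genome]
    return command
```

Returning an argument list rather than a single string avoids shell quoting issues when folder names contain spaces.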
See: Product Files
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.
FQ list Input
FQ Input
BAM Input
CRAM Input
--enable-map-align true
Optionally disable map & align (default=true).
--enable-map-align-output true
Optionally save the output BAM (default=false).
--enable-duplicate-marking true
By default, DRAGEN marks duplicate reads and excludes them from variant calling.
--vc-target-bed $PATH
Restrict the variants called to a target bed. For WTS, a bed file specifying the gene-coding regions should be provided to avoid calling erroneous variants in non-coding regions due to noisy reads.
--rna-library-type
Set the library according to the read orientations. Set to 'A' to auto detect the correct read orientation. Alternatively select 'IU', 'ISR', 'ISF', 'U', 'SR', or 'SF'.
--rna-splice-variant-normals $PATH
Optional list of normal splice variants that is used to filter false positive calls. The file should be a tab-separated file with the following first four columns: (1) contig name, (2) first base of the splice junction (1-based), (3) last base of the splice junction (1-based), (4) strand (0: undefined, 1: +, 2: -).
--rna-splice-variant-regions $PATH
Target region bed file. Required for panels. The name of the region must be specified in the fourth column.
--rna-gf-enriched-regions $PATH
For panels, the list of enriched genes should be set, either as a list of genes or a list of regions in BED format.
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN pangenome hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# Mapper
--enable-rna true
--annotation-file $GTF #GTF or GFF3 format
--enable-map-align true #required for RNA/scRNA
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
--enable-duplicate-marking true #default=true
# Small variant caller
--enable-variant-caller true
--vc-target-bed $VC_TARGET_BED
# RNA Quantification
--enable-rna-quantification true
--rna-library-type A #see 'RNA Quant'
--rna-quantification-gc-bias true
# RNA Splice Variants
--enable-rna-splice-variant true
# RNA Gene Fusions
--enable-rna-gene-fusion true

FQ list Input:
--fastq-list $PATH
--fastq-list-sample-id $STRING

FQ Input:
--fastq-file1 $PATH
--fastq-file2 $PATH
--RGSM $STRING
--RGID $STRING

BAM Input:
--bam-input $PATH

CRAM Input:
--cram-input $PATH

See: Product Files
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.
FQ list Input
FQ Input
BAM Input
CRAM Input
--enable-map-align true
Optionally disable map & align (default=true).
--enable-map-align-output true
Optionally save the output BAM (default=false).
To change the barcode or binning index positions, use --scrna-barcode-position and --scrna-umi-position. These settings should be provided in the form <startPos>_<endPos> for each barcode. Connect multiple barcode sequence positions with a '+'.
For example, a library with the cell-barcode split into three blocks of 9 bp separated by fixed linker sequences and an 8 bp UMI would be set to: --scrna-barcode-position 0_8+21_29+43_51, and --scrna-umi-position 52_59.
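The position syntax in the example above can be sanity-checked by computing the length of each block. The following is a minimal Python sketch; the function names are hypothetical, and the interpretation of positions as zero-based and inclusive is inferred from the example (0_8 describing a 9 bp block), not stated by the source.

```python
from typing import List, Tuple


def parse_positions(spec: str) -> List[Tuple[int, int]]:
    """Parse '<startPos>_<endPos>' blocks joined with '+' into (start, end) pairs."""
    blocks = []
    for block in spec.split("+"):
        start_text, end_text = block.split("_")
        start, end = int(start_text), int(end_text)
        if start < 0 or end < start:
            raise ValueError(f"Invalid position block: {block}")
        blocks.append((start, end))
    return blocks


def block_lengths(spec: str) -> List[int]:
    """Length in bp of each block, assuming inclusive end positions."""
    return [end - start + 1 for start, end in parse_positions(spec)]


# Values taken from the example: three 9 bp barcode blocks and an 8 bp UMI.
barcode_lengths = block_lengths("0_8+21_29+43_51")
umi_lengths = block_lengths("52_59")
```

Under the inclusive interpretation, the barcode setting describes three 9 bp blocks and the UMI setting describes one 8 bp block, which agrees with the library described in the example.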
The following table lists some optional settings:
--enable-single-cell-rna true
Option to enable single-cell RNA mode.
--scrna-barcode-position
See example above or refer to
--scrna-umi-position
See example above or refer to
--single-cell-threshold
Cell filtering can be set to ['fixed', 'ratio', or 'inflection'].
--scrna-barcode-sequence-list
A known barcode sequence list can be optionally provided.
--umi-source
Optionally override the default barcode/BI source. Valid options include ['read1', 'read2', 'qname', 'fastq'].
For more details on single-cell RNA options, refer to the DRAGEN Single-Cell RNA User Guide.
Create a Project: Project can be specific for the DRAGEN Heme WGS Tumor Only v4.4.4 Pipeline or it can contain multiple Pipelines and/or Tools. For information on creating Projects, refer to the Projects section in Illumina Connected Analytics help. ICA standard storage is used by default as soon as the Project is saved. To connect a different storage source, set it up before creating your Project. For details and options, refer to the Storage section in Illumina Connected Analytics help.
Edit Project and Add Bundle: Edit the Project and add the bundle titled, "Heme WGS TO v4.4.4 (XX)." XX is a 2-letter code designating the region from which you are launching the analysis. Adding the Bundle automatically adds the pipeline and associated resource files and datasets to the Project. For information on Bundles, refer to the Bundles section in Illumina Connected Analytics help. After adding the Bundle to the Project, an example dataset becomes available in the Demo_Data folder for the Project.
Upload the sequencing data: For information on viewing and uploading data, refer to the Data section in .
Start Analysis: In the Project, navigate to Pipelines, select the Heme WGS TO v4_4_4_x Pipeline, and then select "Start New Analysis". Set up the new analysis by configuring the parameters listed in the . When the required files are completed, start analysis.
Download Results: After analysis is complete, navigate to results in the configured output location.
Please see the Illumina Support Shorts for guidance on how to set up and run DRAGEN Heme WGS Tumor Only analysis on ICA.
To launch an analysis via the ICA user interface, configure a DRAGEN Heme WGS Tumor Only pipeline analysis with the following parameters.
User Reference
The analysis run name
User Tags
Text labels to help index the analysis.
Notify me when task is completed
Option to receive an email notification when analysis is complete.
Output Folder
The path to the analysis output folder. The default path is the project output folder.
Entitlement Bundle
Automatically populated from the project details.
Samplesheet
Select a sample sheet in CSV format for the analysis. Note: Sample sheet selection is optional if starting from a run folder, and required when submitting a FASTQ folder.
For information about using pipelines, refer to Illumina Connected Analytics support site page.
Create a Project: The Project can be specific to the DRAGEN Solid WGS Tumor Normal v4.4.4 Pipeline, or it can contain multiple Pipelines and/or Tools. For information on creating Projects, refer to the Projects section in Illumina Connected Analytics help.
ICA standard storage is used by default as soon as the Project is saved. To connect a different storage source, set it up before creating your Project. For details and options, refer to the Storage section in Illumina Connected Analytics help.
Edit Project and Add Bundle: Edit the Project and add the bundle titled, "Solid WGS TN v4.4.4 (XX)." XX is a 2-letter code designating the region from which you are launching the analysis. Adding the Bundle automatically adds the pipeline and associated resource files and datasets to the Project. For information on Bundles, refer to the Bundles section in Illumina Connected Analytics help.
After adding the Bundle to the Project, an example dataset becomes available in the Demo_Data folder for the Project.
Upload the sequencing data: For information on viewing and uploading data, refer to the Data section in Illumina Connected Analytics help.
Start Analysis: In the Project, navigate to Pipelines, select the Solid WGS TN v4_4_4_x Pipeline, and then select "Start New Analysis". Set up the new analysis by configuring the parameters listed in the table below. When the required files are completed, start analysis.
Download Results: After analysis is complete, navigate to results in the configured output location.
Please see the Illumina Support Shorts for guidance on how to set up and run DRAGEN Solid WGS Tumor Normal analysis on ICA.
To launch an analysis via the ICA user interface, configure a DRAGEN Solid WGS Tumor Normal pipeline analysis with the following parameters.
User Reference
The analysis run name
User Tags
Text labels to help index the analysis.
Notify me when task is completed
Option to receive an email notification when analysis is complete.
Output Folder
The path to the analysis output folder. The default path is the project output folder.
Entitlement Bundle
Automatically populated from the project details.
Samplesheet
Select a sample sheet in CSV format for the analysis. Note: Sample sheet selection is optional if starting from a run folder, and required when submitting a FASTQ folder.
For information about using pipelines, refer to Illumina Connected Analytics support site page.
For more information about using ICA and BaseSpace Sequence Hub, or about running a pipeline analysis on ICA, refer to the relevant support pages on the Illumina support site.
DRAGEN provides tests you can run to make sure that your DRAGEN system is properly installed and configured. Before running the tests, make sure that the DRAGEN server has adequate power and cooling, and is connected to a network that is fast enough to move your data to and from the machine with adequate performance.
Please refer to the Server Site Prep & Installation Guide when installing a new system.
The software can be installed on an on-premises server by executing the .run installer for the desired version. Installers are made available for all releases at the DRAGEN Software Support Site page.
Installation procedure:
Download the desired installer from the support website and unzip the package
The archive integrity can be checked using: ./<dragen .run file> --check
Install the appropriate release based on your Linux OS with the command: sudo sh <dragen .run file>
The .run file includes a script that handles uninstallation of existing software, integrity checking of the package and files, and installation of the new DRAGEN software version. The DRAGEN software is installed in part by using the Linux RPM Package Manager (rpm). Several rpm packages make up the installation of a single DRAGEN software version. The RPM packages also configure the system for DRAGEN, for example by raising user ulimits, and the .run script starts the services needed for functionality, such as the licensing daemon, dragen_licd, and the hugepages daemon, dragend_hp.
NOTE: Root privileges are required for the installation.
Up to DRAGEN Software v4.2, only one version of the DRAGEN software can be installed at a time. Executing the .run file will remove any existing installed version and (re)install the new version.
After installation, the application and associated files are available at /opt/edico.
The single version installer will add /opt/edico to the Linux $PATH, so that the user can just call dragen without specifying the full path.
Starting with DRAGEN Software v4.3 and later, multiple compatible versions of the DRAGEN software can be installed at a time. Executing the .run file will add the new version to the system.
After installation, the application files are available at /opt/dragen/{version}/bin and FPGA files are located at /opt/bitstream/{bitstream version}.
The multi-version installer does NOT add /opt/dragen/{version}/bin to the Linux $PATH, because multiple versions can be present at a given time. Users should manage the paths to the specific version they want to run. Command line examples in this guide assume that the Linux $PATH is set to the correct DRAGEN version and refer to the command simply as dragen <options>.
Notes on multi-version installation:
Installers released for DRAGEN v4.2 and earlier are single version packages
Single version packages and multi-version packages cannot be mixed
Installation of a prior single version package will remove all the multi-version packages
Installation of a multi-version package will remove any installed single version package
Example:
dragen and resource files
Throughout this guide, <INSTALL_PATH> refers to either of the locations above.
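The two installation layouts described above can be captured in a small helper. The following is a minimal Python sketch; the function name `dragen_install_path` is hypothetical, and it assumes that the {version} directory name is the full version string passed in, which the source does not specify.

```python
def dragen_install_path(version: str) -> str:
    """Return the application path for a DRAGEN version string such as '4.3.6'."""
    major, minor = (int(part) for part in version.split(".")[:2])
    if (major, minor) >= (4, 3):
        # Multi-version installers (v4.3 and later): one directory per version.
        return f"/opt/dragen/{version}/bin"
    # Single version installers (v4.2 and earlier): one fixed location.
    return "/opt/edico"
```

Comparing `(major, minor)` as a tuple keeps versions such as 4.10 ordered after 4.3, which a plain string comparison would get wrong.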
DRAGEN requires license(s) for most functionality, please refer to the for guidance on how to install and/or review your current licenses.
After turning on the server, you can make sure that your DRAGEN server is functioning properly by running <INSTALL_PATH>/self_test/self_test.sh, which does the following:
Automatically indexes chromosome M from the hg19 reference genome
Loads the reference genome and index
Maps and aligns a set of reads
Saves the aligned reads in a BAM file
Each server ships with the test input FASTQ data for this script, which is located in <INSTALL_PATH>/self_test. The system check takes approximately 25–30 minutes.
The following example shows how to run the script and shows the output from a successful test.
If the output BAM file does not match expected results, then the last line of the above text is as follows:
SELF TEST RESULT : FAIL
If you experience a FAIL result after running this test script immediately after turning on your DRAGEN server, contact Illumina Technical Support.
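When the self-test is run from an automated health check, the FAIL line described above can be detected in captured output. The following is a minimal Python sketch; the function name `self_test_failed` is hypothetical, and it assumes that the script's standard output has already been captured as text. Only the FAIL string quoted in this guide is checked.

```python
FAIL_LINE = "SELF TEST RESULT : FAIL"


def self_test_failed(output: str) -> bool:
    """Return True if the last non-empty line of the output reports a failure."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return bool(lines) and lines[-1] == FAIL_LINE
```

The output could be captured, for example, with `subprocess.run([...], capture_output=True, text=True).stdout`, where the script path depends on the installation.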
When you are satisfied that your DRAGEN system is performing as expected, you are ready to run some of your own data through the machine, as follows:
Load the reference table for the reference genome
Determine location of input and output files
Process input data
Before a reference genome can be used with DRAGEN, it must be converted from FASTA format into a custom binary format for use with the DRAGEN hardware. For more information, see .
The reference hash table specified on the command line is automatically loaded onto the board the first time you process data with a pipeline. You can manually load the hash table for your reference genome by using the following command:
dragen -r <reference_hash-table_directory>
Make sure that the reference hash table directory is on the fast file IO drive.
The default location for the hash table for hg19 is as follows.
/staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149
The command to load reference genome hg19 from the default location is as follows.
dragen -r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149
This command loads the binary reference genome into memory on the DRAGEN board, where it is used for processing any number of input data sets. You do not need to reload the reference genome unless you restart the system or need to switch to a different reference genome. It can take up to a minute to load a reference genome.
DRAGEN checks whether the specified reference genome is already resident on the board. If it is, the upload of the reference genome is automatically skipped. You can force reloading of the same reference genome using the --force-load-reference (-l) command line option.
The command to load the reference genome prints the software and hardware versions to standard output. For example:
After the reference genome has been loaded, the following message is printed to standard output:
The DRAGEN Pipeline is very fast, which requires careful planning for the locations of the input and output files. If the input or output files are on a slow file system, then the overall performance of the system is limited by the throughput of that file system. It is recommended that inputs and outputs are streamed directly from/to a mounted external storage system.
The DRAGEN system is preconfigured with at least one fast file system consisting of a set of fast SSD disks grouped with RAID-0 for performance. This file system is mounted at /staging. This name was chosen to emphasize the fact that this area was built to be large and fast, but is not redundant. Failure of any of the file system's constituent disks leads to the loss of all data stored there.
During processing, DRAGEN generates and reads back temporary files. It is highly recommended to always direct temporary files to the fast SSD (or /staging) by using the --intermediate-results-dir option. If the --intermediate-results-dir option is not provided, temporary files are written to the --output-directory. Streaming inputs and outputs using a mounted external storage system is recommended.
To analyze FASTQ data, use the dragen command. For example, the following command can be used to analyze a single-ended FASTQ file:
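As a hedged illustration, a single-ended FASTQ command line can be assembled from options that appear elsewhere in this guide (--ref-dir, --fastq-file1, --RGID, --RGSM, --output-directory, --output-file-prefix, --intermediate-results-dir, --enable-map-align). The following Python sketch is an assumption about how those options combine, not the guide's own example; the function name `build_single_end_command` is hypothetical and all paths are supplied by the caller.

```python
from typing import List


def build_single_end_command(
    ref_dir: str,
    fastq: str,
    output_dir: str,
    prefix: str,
    rgid: str,
    rgsm: str,
    intermediate_dir: str = "/staging",
) -> List[str]:
    """Return an argument list for a single-ended map/align run."""
    return [
        "dragen",
        "--ref-dir", ref_dir,            # reference hash table directory
        "--fastq-file1", fastq,          # single-ended: no --fastq-file2
        "--RGID", rgid,
        "--RGSM", rgsm,
        "--output-directory", output_dir,
        "--output-file-prefix", prefix,
        # Keep temporary files on the fast SSD, as recommended above.
        "--intermediate-results-dir", intermediate_dir,
        "--enable-map-align", "true",
    ]
```

The resulting list can be passed to `subprocess.run()` on a DRAGEN server once the paths have been checked against the actual installation.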
For detailed information on the command line options, see .
For recommended command lines in typical use cases, see .
Autolaunch requires additional BaseSpace Sequence Hub and sample sheet settings.
Autolaunch uses the BaseSpace Sequence Hub (BSSH) run planning tool to create and export a v2 format sample sheet to enable streaming of sequencing run data to the project and requires the following additional settings. See Figure 1 below.
Access to BaseSpace Sequence Hub.
ICA Run Storage is enabled under BaseSpace Sequence Hub settings.
Refer to the BaseSpace Sequence Hub support site page for information on .
Autolaunch requires a v2 format sample sheet with specific parameters that instruct the BSSH project to automatically initiate a Heme pipeline analysis in ICA. Use the run planning option in BaseSpace Sequence Hub to generate the sample sheet. The exported sample sheet is automatically populated with the required fields. Using an invalid sample sheet can result in failed runs and analyses.
Refer to Table 1 below for descriptions of the added fields. Enter the following required run parameters in BaseSpace Sequence Hub Run Planning:
For more information on run planning, refer to the
The BaseSpace Sequence Hub setting for run monitoring and storage must be selected on the instrument to use Heme pipeline Autolaunch. For information on preparing your instrument for DRAGEN Heme App for Whole-Genome Sequencing Autolaunch, refer to the documentation for your instrument.
Use Run Planning in BaseSpace Sequence Hub to create and export a sample sheet.
Import the sample sheet to the instrument and start the sequencing run. Data is uploaded to BaseSpace Sequence Hub and then pushed to ICA. You can monitor the run in BaseSpace Sequence Hub.
When sequencing and the upload are complete, analysis autolaunches in ICA. You can monitor the status of the analysis in BaseSpace Sequence Hub or ICA.
If necessary, requeue the analysis via the run's Summary page in BaseSpace Sequence Hub. Refer to the BaseSpace Sequence Hub support site page for more information on requeuing an analysis.
Table 1. Additional Sample Sheet Fields for Autolaunch
Autolaunch-compatible sample sheets contain the following fields specific to autolaunch configuration.
The Heme pipeline is DNA-only analysis software based on the DRAGEN Secondary Analysis Software. Although it includes some of the default settings from the DNA Somatic Tumor-Only Heme WGS DRAGEN recipe, it uses a distinct recipe with different options. Users can override specific parameters via a custom configuration file.
An example command is provided that highlights the inputs and outputs used in the DragenCaller step of the Heme pipeline; the command can be found in the log file. Any parameter not displayed on the command line uses the default value for the DRAGEN variant caller module. The detailed parameters and default arguments for the individual modules within the DragenCaller step can be found in the replay.json output. See DRAGEN Command Line Options for detailed explanations of the parameters.
The Heme pipeline supports two reference genomes for the DRAGEN Map/Aligner - hg38 and hs37d5_chr.
The hs37d5_chr genome is the hg19 reference genome with the Chromosome Y PAR masked. It includes the NC_012920 mitochondrial genome. The contigs have the chr prefix added, but without the native alternate loci names.
DNA mapping and alignment involves aligning sequencing reads derived from DNA libraries to a reference genome prior to variant calling.
The pipeline currently does not support UMI libraries by default. Please use the to generate the collapsed BAM as input, if so desired.
DRAGEN continues to use these final alignments as input for various variant calls such as gene amplification (copy number) calling, small variant calling (SNV, indel, MNV, delin), and DNA library quality control.
DRAGEN supports calling SNVs, indels, MNVs, and delins in tumor-only samples by using mapped and aligned DNA reads from a tumor sample as input. Variants are detected via both column-wise pileup analysis and local de novo assembly of haplotypes. The de novo haplotypes allow the detection of much larger insertions and deletions than is possible through column-wise pileup analysis alone. DRAGEN insertions and deletions are validated for lengths of 0–25 bp; lengths greater than 25 bp can also be supported. DRAGEN also uses the de novo assembly to detect SNVs, insertions, and deletions that are co-phased and part of the same haplotype. Co-phased variants that fall within a window of 15 bp can then be reassembled into complex variants (MNVs and delins). The tumor-only pipeline produces a VCF file containing both germline and somatic variants that can be further analyzed to identify tumor mutations. The pipeline makes no ploidy assumptions, enabling detection of low-frequency alleles.
DRAGEN small variant calling includes the following steps:
Detects regions with sufficient read coverage (callable regions).
Detects regions where the reads deviate from the reference and there is a possibility of a germline or somatic call (active regions).
Assembles de novo graph haplotypes from reads (haplotype assembly).
Extracts possible somatic or germline calls (events) from column wise pileup analysis.
Additional information is available at .
The DRAGEN copy number variant caller performs amplification, reference, and deletion calling for CNV targets within the assay. It counts the coverage of each target interval on the panel, uses a preprocessed panel of normal samples to normalize target counts, corrects for GC coverage bias, and calculates scores of a CNV event from observed coverage and makes copy number calls.
Additional information is available at .
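The normalization step described for the copy number caller can be illustrated with a minimal sketch. This is not the DRAGEN implementation: the function name, the use of a per-target median across the normal samples, and the log2 ratio as the score are assumptions chosen for illustration, GC bias correction is omitted, and targets with zero coverage are not handled.

```python
import math
from statistics import median


def normalized_log2_ratios(sample_counts, normal_panel_counts):
    """Return one log2 ratio per target interval (illustrative only).

    sample_counts: read counts per target for the tumor sample.
    normal_panel_counts: one list of per-target counts per normal sample.
    """
    def to_fractions(counts):
        # Scale by total coverage so samples of different depth are comparable.
        total = sum(counts)
        return [count / total for count in counts]

    sample = to_fractions(sample_counts)
    normals = [to_fractions(counts) for counts in normal_panel_counts]
    # Baseline per target: median fraction across the normal samples.
    baseline = [median(column) for column in zip(*normals)]
    return [math.log2(s / b) for s, b in zip(sample, baseline)]


ratios = normalized_log2_ratios(
    [100, 200, 100],                      # tumor sample, three targets
    [[100, 100, 100], [200, 200, 200]],   # two hypothetical normal samples
)
print([round(r, 3) for r in ratios])  # [-0.415, 0.585, -0.415]
```

In this toy input the middle target has relatively more coverage than the baseline (positive ratio), which is the kind of signal an amplification call would be based on.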
The DRAGEN Structural Variant (SV) Caller is described . The DUX4 rearrangement caller is described .
The Variant Deduplication is described
The contamination analysis step detects foreign human DNA contamination using the SNP error file and pileup file that are generated during the small variant calling and the TMB trace file. The software determines whether a sample has foreign DNA using the contamination score. In contaminated samples, the variant allele frequencies in SNPs shift from the expected values of 0%, 50%, or 100%. The algorithm collects all positions that overlap with common SNPs and have variant allele frequencies of < 25% or > 75%. Then, the algorithm computes the likelihood that each position is an error or a real mutation. The contamination score is the sum of the log likelihood scores across the predefined SNP positions that have a minor allele frequency < 25% in the sample and are not likely due to CNV events.
The larger the contamination score, the more likely it is that the sample contains foreign DNA. A sample is considered contaminated if the contamination score is above a predefined quality threshold. The contamination score was found to be high in samples with highly rearranged genomes and in HRD samples: 1% of HRD samples were found to be above the threshold with no evidence of actual contamination.
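The way the score is accumulated and compared against a threshold can be sketched as follows. This is only an illustration of the selection and summation described above: the `SnpObservation` type, the per-position log likelihood values, and the threshold are hypothetical inputs, and the likelihood model itself is not specified in this document.

```python
from dataclasses import dataclass


@dataclass
class SnpObservation:
    vaf: float             # variant allele frequency at a common SNP (0.0-1.0)
    log_likelihood: float  # per-position score (model not described here)
    in_cnv_event: bool     # True if the shift is likely due to a CNV event


def contamination_score(observations):
    """Sum per-position scores where the minor allele frequency is < 25%."""
    total = 0.0
    for obs in observations:
        minor_allele_frequency = min(obs.vaf, 1.0 - obs.vaf)
        if minor_allele_frequency < 0.25 and not obs.in_cnv_event:
            total += obs.log_likelihood
    return total


def is_contaminated(score, threshold):
    """A sample is flagged when the score is above the threshold."""
    return score > threshold


observations = [
    SnpObservation(vaf=0.03, log_likelihood=2.0, in_cnv_event=False),  # counted
    SnpObservation(vaf=0.50, log_likelihood=9.0, in_cnv_event=False),  # expected het, skipped
    SnpObservation(vaf=0.90, log_likelihood=1.5, in_cnv_event=False),  # counted
    SnpObservation(vaf=0.10, log_likelihood=4.0, in_cnv_event=True),   # CNV, skipped
]
score = contamination_score(observations)
print(score)  # 3.5
```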
The Illumina Annotation Engine performs annotation of small variants, CNVs, and exon-level CNVs. The inputs are gVCF files and the outputs are annotated JSON files.
The Heme pipeline currently does not support annotation of gVCF files. Please use the to perform tertiary analysis.
Not Supported in the current release. Please use the .
Not supported in the current release. Please use the .
A reusable Nextflow component designed for executing various post-processing tasks at the end of pipeline execution. It can be used to enhance, transform, or modify outputs in the Logs_Intermediates and Results folders, which makes it adaptable to specific requirements.
This component is highly configurable, supporting fine-tuned control of computational resources (CPU, memory), containerization, and output management. Users can integrate custom containers and scripts to implement their own logic for post-processing, all configured through parameters. Externalized process scripts allow for seamless execution of containerized processes.
Customizability: Easily adaptable to different post-processing requirements.
Reusability: Can be used in multiple pipelines, reducing development effort.
Data transformation: Can be used to transform or modify output data in various ways.
A config file that contains the Post-Processing parameters and values
A bash script that implements the desired functionality
Any other custom resources or files required by the bash script
A Docker container with the dependencies needed to run the bash script
Upload and configure
Modify the config file: set postProcessing_container to the uploaded container
Upload all the required files (config, script, resources) to a project directory, e.g., custom-resources, in ICA using the icav2 client.
Configure the ICA Web UI on the 'Start Analysis' page:
A Post-Processing bash script has access to the paths and variables defined in the parent Nextflow process. In this case, directories such as {params.analysisDir}/Results and {params.analysisDir}/Logs_Intermediates, and their subdirectories, can be accessed from the bash script. Output files should be stored in the {params.postProcessing.stepName} directory. Note: For BAM to CRAM conversion, the genome.fa and .fai files must be uploaded to the custom resources directory.
Autolaunch requires additional BaseSpace Sequence Hub and sample sheet settings.
Autolaunch uses the BaseSpace Sequence Hub (BSSH) run planning tool to create and export a v2 format sample sheet to enable streaming of sequencing run data to the project and requires the following additional settings. See Figure 1 below.
Access to BaseSpace Sequence Hub.
ICA Run Storage is enabled under BaseSpace Sequence Hub settings.
Refer to the BaseSpace Sequence Hub support site page for information on .
Autolaunch requires a v2 format sample sheet with specific parameters that instruct the BSSH project to automatically initiate a pipeline analysis in ICA. Use the run planning option in BaseSpace Sequence Hub to generate the sample sheet. The exported sample sheet is automatically populated with the required fields. Using an invalid sample sheet can result in failed runs and analyses.
Refer to Table 1 below for descriptions of the added fields. Enter the following required run parameters in BaseSpace Sequence Hub Run Planning:
For more information on run planning, refer to the
To use pipeline Autolaunch, the BaseSpace Sequence Hub setting for run monitoring and storage must be selected on the instrument. For information on preparing your instrument for DRAGEN App for Whole-Genome Sequencing Autolaunch, refer to the documentation for your instrument.
Use Run Planning in BaseSpace Sequence Hub to create and export a sample sheet.
Import the sample sheet to the instrument and start the sequencing run. Data is uploaded to BaseSpace Sequence Hub and then pushed to ICA. You can monitor the run in BaseSpace Sequence Hub.
When sequencing and the upload are complete, analysis autolaunches in ICA. You can monitor the status of the analysis in BaseSpace Sequence Hub or ICA.
If necessary, requeue the analysis via the run's Summary page in BaseSpace Sequence Hub. Refer to the BaseSpace Sequence Hub support site page for more information on requeuing an analysis.
Table 1. Additional Sample Sheet Fields for Autolaunch
Autolaunch-compatible sample sheets contain the following fields specific to autolaunch configuration.
A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.
For DRAGEN RNA/scRNA runs, it is recommended to use the linear hashtable.
See: Product Files
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.
FQ list Input
FQ Input
BAM Input
CRAM Input
PIPseq mode is a batch option that automatically sets the barcode/BI source, the barcode and binning index positions, and the barcode sequence list options.
By default the barcode/BI is read from read 1 and the transcript is obtained from read 2.
To change the barcode or binning index positions, use --scrna-barcode-position and --scrna-umi-position. These settings should be provided in the form <startPos>_<endPos> for each barcode. Connect multiple barcode sequence positions with a '+'.
For example, a library with the cell-barcode split into three blocks of 9 bp separated by fixed linker sequences and an 8 bp UMI would be set to: --scrna-barcode-position 0_8+21_29+43_51, and --scrna-umi-position 52_59.
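The position syntax can be checked with a short helper before launching a run. The sketch below is not part of DRAGEN: `parse_positions` and `extract` are hypothetical names, and it assumes 0-based, end-inclusive coordinates, which is consistent with the example above (0_8 spans 9 bp and 52_59 spans 8 bp).

```python
def parse_positions(spec: str) -> list[tuple[int, int]]:
    """Parse '<startPos>_<endPos>' blocks joined by '+' into (start, end) pairs."""
    blocks = []
    for block in spec.split("+"):
        start_text, end_text = block.split("_")
        start, end = int(start_text), int(end_text)
        if start > end:
            raise ValueError(f"start is after end in block {block!r}")
        blocks.append((start, end))
    return blocks


def extract(read: str, blocks: list[tuple[int, int]]) -> str:
    """Concatenate the 0-based, end-inclusive slices named by blocks."""
    return "".join(read[start:end + 1] for start, end in blocks)


barcode_blocks = parse_positions("0_8+21_29+43_51")
umi_blocks = parse_positions("52_59")
print([end - start + 1 for start, end in barcode_blocks])  # [9, 9, 9]
print([end - start + 1 for start, end in umi_blocks])      # [8]
```

Applied to a read, `extract(read, barcode_blocks)` would return the 27 bp cell barcode with the linker sequences removed.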
The following table lists some optional settings:
For more details on PIPseq pipeline options, refer to the
When the analysis run completes, the software generates an analysis output in a folder named /staging/DRAGEN_Heme_WGS_Tumor_Only_Pipeline_{version}_Analysis_{datetimestamp}, unless a specific location is specified on the command line. In ICA, analysis output is listed in the Output section of the analysis, where the folder name is a combination of user reference, pipeline name, and a UUID. Within the analysis folder, each analysis step generates a subfolder within the Logs_Intermediates folder.
A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.
To support the varied designs of amplicon panels and the specific requirements of different analysis types (e.g., SNV, CNV, SV, MSI, RNA fusion, RNA splice variants, and RNA 3'/5' imbalance ratio), panel-specific parameter settings have been integrated into the command-line options. Each supported Pillar panel has a dedicated option, and the details for these RNA panels are listed in the table below:
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
# Mapper
--enable-rna true
--annotation-file $GTF #GTF or GFF3 format
--enable-map-align true #required for RNA/scRNA
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
# Single Cell
--enable-single-cell-rna true
--umi-source qname #default='qname'
--scrna-barcode-position $BARCODE_POS
--scrna-umi-position $UMI_POS #see notes
--scrna-barcode-sequence-list $PATH #optional
--single-cell-threshold ratio #['fixed', 'ratio', 'inflection']
--single-cell-threshold-filterby umi #['umi', 'read']
# FQ list input
--fastq-list $PATH
--fastq-list-sample-id $STRING
# FQ input
--fastq-file1 $PATH
--fastq-file2 $PATH
--RGSM $STRING
--RGID $STRING
# BAM input
--bam-input $PATH
# CRAM input
--cram-input $PATH
/opt/edico/bin/dragen \
--ref-dir /staging/dragen-app-manager/resources/Illumina_hg38-alt_masked.cnv.hla.methyl_cg.methylated_combined.rna-11_r5.0-1 \
--output-directory DragenCaller/Sample-001 \
--output-file-prefix Sample-001 \
--events-log-file DragenCaller/Sample-001/events.csv \
--vc-systematic-noise=/staging/dragen-app-manager/resources/Illumina_heme-wgs-to-resources_4.4.4.2/snv/IDPF_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz \
--vc-enable-germline-tagging=true \
--variant-annotation-data=/staging/dragen-app-manager/resources/Illumina_variant_annotation_data-tmb_annotations_4.4.4-1/tmb_annotations \
--vc-germline-tag-hotspots=false \
--logging-to-output-dir=true \
--gc-metrics-enable=true \
--enable-metrics-json=true \
--enable-map-align=true \
--enable-sort=true \
--enable-duplicate-marking=true \
--enable-variant-caller=true \
--heme-sv=true \
--sv-systematic-noise=/staging/dragen-app-manager/resources/Illumina_heme-wgs-to-resources_4.4.4.2/sv/WGS_FF_Heme_hg38_v3.1.0_systematic_noise.sv.bedpe.gz \
--heme-cnv=true \
--cnv-population-b-allele-vcf=/staging/dragen-app-manager/resources/Illumina_heme-wgs-to-resources_4.4.4.2/cnv/hg38_1000G_phase1.snps.high_confidence.vcf.gz \
--enable-variant-deduplication=true \
--vc-output-evidence-bam=false \
--qc-detect-contamination=true \
--enable-dux4-caller=true \
--max-base-quality=63 \
--tumor-fastq-list Sample-001.fastq_list.csv \
--tumor-fastq-list-sample-id Sample-001 \
--force
Note - Post-Processing feature is available only for ICA Environment.
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
# Mapper
--enable-rna true
--annotation-file $GTF #GTF or GFF3 format
--enable-map-align true #required for RNA/scRNA
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
# Single Cell PIPseq
--scrna-enable-pipseq-mode true
--single-cell-threshold ratio #['fixed', 'ratio', 'inflection']
| Parameter | Description |
| --- | --- |
| Input Directory | The run folder or FASTQ folder that contains the files to analyze. |
| Input Type | The type of input the analysis runs on. Options are bcl, fastq, bam, and cram. |
| Sample or Pair IDs | Optional subset of Sample IDs or Pair IDs to analyze. |
| Reference Genome | Select the reference genome. hs37d5_chr is the hg19 reference genome with the Chromosome Y PAR masked. It includes the NC_012920 mitochondrial genome. The contigs have the chr prefix added, but without the native alternate loci names. |
| Enable Ora Compression | Enable Ora Compression (True or False). Only applicable when Input Type is bcl. |
| Enable Post Processing | Enable Post Processing (True or False) to run custom scripts at the end of the pipeline. |
| Storage Size | The storage size to allocate for the analysis. The default and recommended value is Large. |
| Custom Parameters Config File | Optional. Select a Custom Parameters Config File that overrides the default config. |
| Custom Resources Directory | Optional. Select a Custom Resources Directory to use with the Custom Parameters Config File. |
| CAUTION - This parameter ... | Optional. A configuration with this comment applies only to auto-launched DRAGEN Solid WGS Tumor Normal analysis from FASTQs after BCL. Do not set it when starting the analysis from the ICA UI. |
| Parameter | Description |
| --- | --- |
| Input Directory | The run folder or FASTQ folder that contains the files to analyze. |
| Input Type | The type of input the analysis runs on. Options are bcl, fastq, bam, and cram. |
| Sample or Pair IDs | Optional subset of Sample IDs or Pair IDs to analyze. |
| Reference Genome | Select the reference genome. hs37d5_chr is the hg19 reference genome with the Chromosome Y PAR masked. It includes the NC_012920 mitochondrial genome. The contigs have the chr prefix added, but without the native alternate loci names. |
| Enable Ora Compression | Enable Ora Compression (True or False). Only applicable when Input Type is bcl. |
| Enable Post Processing | Enable Post Processing (True or False) to run custom scripts at the end of the pipeline. |
| Storage Size | The storage size to allocate for the analysis. The default and recommended value is Large. |
| Custom Parameters Config File | Optional. Select a Custom Parameters Config File that overrides the default config. |
| Custom Resources Directory | Optional. Select a Custom Resources Directory to use with the Custom Parameters Config File. |
| CAUTION - This parameter ... | Optional. A configuration with this comment applies only to auto-launched DRAGEN Heme WGS Tumor Only analysis from FASTQs after BCL. Do not set it when starting the analysis from the ICA UI. |
Calibrates read base qualities to account for background noise.
Computes read likelihoods for each read/haplotype pair.
Performs mutation calling by summing the genotype probabilities across all reads/haplotype pairs.
Performs additional filtering to improve variant calling accuracy, including the use of a systematic noise file. The systematic noise file indicates the statistical probability of noise at specific positions in the genome and is constructed from clean (normal) samples. Regions where noise is common (e.g., difficult-to-map regions) have higher noise values. The small variant caller penalizes those regions to reduce the probability of making false positive calls.

After installing a multi-version package, you can list the installed versions at any time by running /usr/bin/dragen_versions
To remove a multi-version package, call yum remove on its Path
Adding PATH="/opt/dragen/{version}/bin:$PATH" as the last line of the .bashrc file avoids the need to set the path at each server login
Asserts that the alignments exactly match the expected results
4.3 and later
/opt/dragen/{version}
/opt/edico/
4.2 and earlier
/opt/edico/
/opt/edico/
Enable Post Processing: set it to 'true'.
Add 'Custom Parameters Config File' and set it to the name of the config file uploaded to the custom-resources directory above.
Add 'Custom Resources Directory' and set it to the custom-resources directory above.
| Parameter | Description |
| --- | --- |
| postProcessing_container | Docker container URI. The container must be present in (uploaded to) ICA. |
| postProcessing_cpusMemoryConfig | Compute option to use. Allowed values are listed below. |
| postProcessing_shellScript | File name of the shell script. |

| Compute option | CPUs | Memory (GB) |
| --- | --- | --- |
| single_threaded_low_mem (default) | 2 | 8 |
| single_threaded_medium_mem | 4 | 16 |
| single_threaded_high_mem | 8 | 32 |
| multi_threaded_low_mem | 16 | 64 |
| multi_threaded_medium_mem | 32 | 128 |
| multi_threaded_high_mem | 64 | 128 |
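When preparing the Post-Processing config file, it can help to pick the smallest compute option that fits the script. The sketch below is a hypothetical helper, not an Illumina tool: `smallest_option` and `render_config` are illustrative names, the CPU and memory figures are copied from the options listed above, and the output mirrors the three `postProcessing_*` config lines used in this document.

```python
# Compute options as listed above: name -> (CPUs, memory in GB).
COMPUTE_OPTIONS = {
    "single_threaded_low_mem": (2, 8),      # default
    "single_threaded_medium_mem": (4, 16),
    "single_threaded_high_mem": (8, 32),
    "multi_threaded_low_mem": (16, 64),
    "multi_threaded_medium_mem": (32, 128),
    "multi_threaded_high_mem": (64, 128),
}


def smallest_option(cpus_needed: int, mem_gb_needed: int) -> str:
    """Return the smallest listed option that meets both requirements."""
    fitting = [
        (cpus, mem_gb, name)
        for name, (cpus, mem_gb) in COMPUTE_OPTIONS.items()
        if cpus >= cpus_needed and mem_gb >= mem_gb_needed
    ]
    if not fitting:
        raise ValueError("no listed compute option is large enough")
    return min(fitting)[2]


def render_config(container_uri: str, script_name: str, option: str) -> str:
    """Render the three Post-Processing config lines for a chosen option."""
    if option not in COMPUTE_OPTIONS:
        raise ValueError(f"unknown compute option: {option}")
    return "\n".join([
        f"postProcessing_container = '{container_uri}'",
        f"postProcessing_cpusMemoryConfig = '{option}'",
        f"postProcessing_shellScript = '{script_name}'",
    ])


option = smallest_option(cpus_needed=4, mem_gb_needed=10)
print(option)  # single_threaded_medium_mem
print(render_config("<your-container-uri>", "bam2cram.sh", option))
```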
| Option | Description |
| --- | --- |
| --enable-map-align true | Optionally disable map & align (default=true). |
| --enable-map-align-output true | Optionally save the output BAM (default=false). |
| --scrna-enable-pipseq-mode | Option to enable PIPseq mode. |
| --scrna-barcode-position | See example above or refer to scRNA PIPseq. |
| --scrna-umi-position | See example above or refer to scRNA PIPseq. |
| --single-cell-threshold | Cell filtering can be set to ['fixed', 'ratio', or 'inflection']. |
| --scrna-barcode-sequence-list | A known barcode sequence list can be optionally provided. |
| --umi-source | Optionally override the default barcode/BI source. Valid options include ['read1', 'read2', 'qname', 'fastq']. |
| Panel Name | Short Name | Panel Code | Sample Type | Default variant caller enabled | Command Line Options |
| --- | --- | --- | --- | --- | --- |
| oncoReveal Heme | Heme | P-HFU-01 | RNA | RNA fusion | --amplicon-enable-rna-heme |
| oncoReveal Fusion LBx | Fusion LBx | P-LBX-03 | cfRNA | RNA fusion, RNA splice-variant | --amplicon-enable-cfrna-lbxfusion |
| oncoReveal Multi-Cancer RNA Fusion v2 | Multi-Cancer with Fusion | SF-V2 | RNA | RNA fusion, RNA splice-variant, RNA 3'/5' imbalance-ratio | --amplicon-enable-rna-multicancer |
For more detail on the amplicon pipeline, please refer to DRAGEN Amplicon Pipeline
For DRAGEN RNA amplicon runs, it is recommended to use the linear hashtable.
See: Product Files
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.
FQ list Input
FQ Input
BAM Input
CRAM Input
| Option | Description |
| --- | --- |
| --enable-map-align true | Optionally disable map & align (default=true). |
| --enable-map-align-output true | Optionally save the output BAM (default=false). |
| --enable-duplicate-marking false | The Amplicon Pipeline disables duplicate marking. In amplicon assays, fragments originate from a limited number of unique start and end positions, making conventional duplicate detection inappropriate. |
| --rna-gf-enriched-regions $PATH | Fusion calling parameters are automatically set in RNA amplicon mode but can be overridden on the command line. If fusion targets are not listed in the amplicon BED file, users can explicitly set this option to a file containing fusion gene IDs or symbols. |
Panel Name
Short Name
$ dragen_versions
The output format of this command may change. Use --json for machine readable output.
Dragen Version Size (MB) Install Date Path
4.3.2 1378.03 2024-03-10 18:26:17 /opt/dragen/4.3.2
4.4.3 1381.41 2024-03-18 20:56:39 /opt/dragen/4.4.3
4.3.5 1379.25 2024-03-11 15:20:24 /opt/dragen/4.3.5
Bitstream Version Size (MB) Install Date Path
07.031.732 (0x18101306) 598.95 2024-03-10 18:26:03 /opt/bitstream/07.031.732
07.031.745 (0x18101306) 598.95 2024-03-18 20:56:18 /opt/bitstream/07.031.745
To remove a dragen version, call `yum remove` on its Path.
$ /opt/dragen/4.3.4/self_test/self_test.sh
#############################################################
Logging to /var/log/dragen/self_test.1714627157_160164.0.details.log
Using dragen executables in /opt/dragen/4.3.4/bin
Using board(s): 0
#############################################################
Running tests for board 0 (u200)
Using scratch directory /tmp/self_test.4BO0pfPST9/0
-------------------------------------------------------------
Board 0 test 1, FPGA MEMORY TEST
Loading DIAG bitstream
Running fpga memory test, this will take ~13 minutes
Board 0 test 1, FPGA MEMORY TEST: PASS
-------------------------------------------------------------
Board 0 test 2, BAR REGISTER ACCESS
Board 0 test 2, BAR REGISTER ACCESS: PASS
-------------------------------------------------------------
Board 0 test 3, FPGA TEMP REG ACCESS
FPGA Temperature: 27C (Max Temp: 36C, Min Temp: 22C)
Board 0 test 3, FPGA TEMP REG ACCESS: PASS
-------------------------------------------------------------
Board 0 test 4, BOARD SERIAL # REG ACCESS
Serial Number: 2130069BM05V
Board 0 test 4, BOARD SERIAL # REG ACCESS: PASS
-------------------------------------------------------------
Board 0 test 5, DRAGEN GENOME LICENSE
Board 0 test 5, DRAGEN GENOME LICENSE: PASS
-------------------------------------------------------------
Board 0 test 6, CPLD DATE TEST
cpld date is n/a
Board 0 test 6, CPLD DATE TEST: PASS
-------------------------------------------------------------
Board 0 test 7, ENCRYPTION KEY EXISTENCE TEST
Board 0 test 7, ENCRYPTION KEY EXISTENCE TEST: PASS
-------------------------------------------------------------
Board 0 test 8, PARTIAL RECONFIGURATION
DNA-MAPPER: ok
RNA-MAPPER: ok
HMM: ok
ZIP: ok
UNZIP: ok
DIAG: ok
Board 0 test 8, PARTIAL RECONFIGURATION: PASS
-------------------------------------------------------------
Board 0 test 9, HASH TABLE GENERATION
Board 0 test 9, HASH TABLE GENERATION: PASS
-------------------------------------------------------------
Board 0 test 10, MAP AND ALIGNER
running mapper aligner: ok
unmapped input records percentages: ok
md5sum check dbam sorted: pass
Board 0 test 10, MAP AND ALIGNER: PASS
-------------------------------------------------------------
Board 0 test 11, VARIANT CALLER E2E
running variant caller: ok
md5sum check dbam sorted: ok
md5sum check VCF: ok
Board 0 test 11, VARIANT CALLER E2E: PASS
#############################################################
SELF TEST COMPLETED
SELF TEST RESULT : PASS
#############################################################
Log file at /var/log/dragen/self_test.1714627157_160164.0.details.log
DRAGEN Host Software Version 01.001.035.01.00.30.6682 and
Bio-IT Processor Version 0x1001036
DRAGEN finished normally
dragen \
-r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
-1 /staging/test/data/SRA056922.fastq \
--output-directory /staging/test/output \
--output-file-prefix SRA056922_dragen \
--RGID DRAGEN_RGID \
--RGSM DRAGEN_RGSM
postProcessing_container = '079623148045.dkr.ecr.us-east-1.amazonaws.com/cp-prod/0f7f12a0-a6c8-4289-86c3-3e5310b97275:latest'
postProcessing_cpusMemoryConfig = 'single_threaded_low_mem'
postProcessing_shellScript = 'bam2cram.sh'
#========================================================#
# This is a SAMPLE Script only for illustration purpose #
# Modify it, according to your specific Use Case #
#========================================================#
#must create this folder to save output files
mkdir -p "${params.postProcessing.stepName}"
cd "${params.postProcessing.stepName}"
#BAMs are located in 'analysis/results' folder
resultsdir="${params.analysisDir}/Results"
#this file must be uploaded to custom-resources-dir
genomefa="${params.customResourceDir}/genome.fa"
sleep_interval=30 # seconds
max_attempts=3
#set sample ids
sample_ids=("Mariner_1_Feasibility_Biosample_45-smoke" "sample_id_2")
for sample_id in "\${sample_ids[@]}"; do
    counter=0
    while : ; do
        #give up after max_attempts lookups
        if [ "\$counter" -eq "\$max_attempts" ]; then
            echo "WARNING! \${sample_id}.bam was NOT found!"
            break
        fi
        counter=\$((counter + 1))
        bam_file=\$(find "\$resultsdir" -type f -name "\${sample_id}.bam")
        if [ -z "\$bam_file" ]; then
            echo "Attempt \$counter : Waiting for \${sample_id}.bam"
            sleep \$sleep_interval
        else
            #process and break
            filename=\$(basename -s .bam "\$bam_file")
            samtools view -C -T "\$genomefa" -o "./\$filename.cram" "\$bam_file"
            break
        fi
    done
done
exit 0
# FQ list input
--fastq-list $PATH
--fastq-list-sample-id $STRING
# FQ input
--fastq-file1 $PATH
--fastq-file2 $PATH
--RGSM $STRING
--RGID $STRING
# BAM input
--bam-input $PATH
# CRAM input
--cram-input $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# RNA amplicon
--enable-rna-amplicon true
--amplicon-target-bed $PATH
# Mapper
--enable-rna true
--annotation-file $GTF #GTF or GFF3 format
--enable-map-align true #required for RNA/scRNA
--enable-map-align-output true #optionally save the output BAM
# RNA Splice Variants
--enable-rna-splice-variant true
# RNA Gene Fusions
--enable-rna-gene-fusion true
--rna-gf-enriched-regions $PATH #see 'RNA Fusion' auto-generated from amplicon target bed
# RNA 3'/5' imbalance-ratio #optional for panels that support 3'/5' imbalance-ratio
--amplicon-enable-imbalance-ratio true
# FQ list input
--fastq-list $PATH
--fastq-list-sample-id $STRING
# FQ input
--fastq-file1 $PATH
--fastq-file2 $PATH
--RGSM $STRING
--RGID $STRING
# BAM input
--bam-input $PATH
# CRAM input
--cram-input $PATH
View the analysis output results in either BaseSpace Sequence Hub or ICA.
| Section | Field | Description | Required |
| --- | --- | --- | --- |
| Cloud_Heme_Settings | SoftwareVersion | The Heme software version. | No |
| Cloud_Heme_Settings | StartsFromFastq | Set the value to TRUE or FALSE. If autolaunching from BCL files, this must be set to FALSE. | Yes |
| Cloud_Data | Sample_ID | The same sample ID used in the Cloud_Heme_Data section. | No |
| Cloud_Data | ProjectName | The BaseSpace Sequence Hub project name. | No |
| Cloud_Data | LibraryName | Combination of sample ID and index values in the following format: sampleID_Index_Index2. | No |
| Cloud_Data | LibraryPrepKitName | The Library Prep Kit used. | No |
| Cloud_Data | IndexAdapterKitName | The Index Adapter Kit used. | No |
| Cloud_Settings | GeneratedVersion | The cloud GSS version used to create the sample sheet. Optional if manually updating a sample sheet. | No |
| Cloud_Settings | CloudWorkflow | ica_workflow_1 | Yes |
| Cloud_Settings | Cloud_Heme_Pipeline | This value is a universal record number (URN). The valid value is defined in | Yes |
| Cloud_Heme_Data | Sample_ID | The unique ID to identify a sample. Must match a Sample_ID used in the Heme_Data section. | Yes |
| Cloud_Heme_Data | Sample_Type | Sample type. | No |
| Cloud_Heme_Data | Sample_Description | Must meet the following requirements: 1–70 characters; alphanumeric characters with underscores, dashes, and spaces. If you enter an underscore, dash, or space, enter an alphanumeric character before and after. | No |

| Run parameter | Value |
| --- | --- |
| Secondary Analysis | BaseSpace Sequence Hub / Illumina Connected Analytics |
| Application | DRAGEN Heme App for Whole-genome Sequencing |
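The Sample_Description requirements can be checked before importing a sample sheet. The following is a minimal sketch, not an Illumina validator: `is_valid_sample_description` is a hypothetical name, and it assumes ASCII alphanumerics and that underscores, dashes, and spaces are all permitted as long as each one has an alphanumeric character directly before and after it.

```python
import re

# One or more alphanumeric runs, joined by single '_', '-', or ' ' separators.
_DESCRIPTION_PATTERN = re.compile(r"[A-Za-z0-9]+(?:[_\- ][A-Za-z0-9]+)*")


def is_valid_sample_description(text: str) -> bool:
    """Check length (1-70) and the separator rule described above."""
    if not 1 <= len(text) <= 70:
        return False
    return _DESCRIPTION_PATTERN.fullmatch(text) is not None


print(is_valid_sample_description("Tumor_01 rep-2"))  # True
print(is_valid_sample_description("_leading"))        # False
print(is_valid_sample_description("double__sep"))     # False
```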
View the analysis output results in either BaseSpace Sequence Hub or ICA.
| Section | Field | Description | Required |
| --- | --- | --- | --- |
| Cloud_TN_Settings | SoftwareVersion | The software version. | No |
| Cloud_TN_Settings | StartsFromFastq | Set the value to TRUE or FALSE. If autolaunching from BCL files, this must be set to FALSE. | Yes |
| Cloud_Data | Sample_ID | The same sample ID used in the Cloud_TN_Data section. | No |
| Cloud_Data | ProjectName | The BaseSpace Sequence Hub project name. | No |
| Cloud_Data | LibraryName | Combination of sample ID and index values in the following format: sampleID_Index_Index2. | No |
| Cloud_Data | LibraryPrepKitName | The Library Prep Kit used. | No |
| Cloud_Data | IndexAdapterKitName | The Index Adapter Kit used. | No |
| Cloud_Settings | GeneratedVersion | The cloud GSS version used to create the sample sheet. Optional if manually updating a sample sheet. | No |
| Cloud_Settings | CloudWorkflow | ica_workflow_1 | Yes |
| Cloud_Settings | Cloud_TN_Pipeline | This value is a universal record number (URN). The valid value is defined in | Yes |
| Cloud_TN_Data | Sample_ID | The unique ID to identify a sample. Must match a Sample_ID used in the TN_Data section. | Yes |
| Cloud_TN_Data | Sample_Type | Sample type. | No |
| Cloud_TN_Data | Sample_Description | Must meet the following requirements: 1–70 characters; alphanumeric characters with underscores, dashes, and spaces. If you enter an underscore, dash, or space, enter an alphanumeric character before and after. | No |

| Run parameter | Value |
| --- | --- |
| Secondary Analysis | BaseSpace Sequence Hub / Illumina Connected Analytics |
| Application | DRAGEN App for Whole-genome Sequencing |
This section describes each output folder generated during analysis and where to find metric and analytic files when the pipeline is executed.
📂 Results - Contains the final result files from the pipeline.
📄 MetricsOutput.tsv - Contains summary metrics for all samples.
📂 Sample1
📄 Sample1_MetricsOutput.tsv - Contains summary metrics for the specific sample.
📄 Sample1.tumor.baf.bedgraph.gz - Contains the BED graph representation of the B-allele frequency (if available).
📄 Sample1.sv.small_indel_dedup.filtered.vcf.gz - Contains DNA structural variants, excluding the indels already present in the hard-filtered.vcf file, after applying the DragenSvExtraFilters.
📄 Sample1.hard-filtered.vcf.gz - Contains the small variants VCF.
📄 Sample1.cnv.vcf.gz - Contains the copy number variants VCF.
📂 Logs_Intermediates - Contains all intermediate files for each step of the pipeline.
📂 SampleSheetValidation
📂 ResourceVerification
📂 RunQc (only when started from BCLs)
📂 work - Contains Nextflow execution details for debugging purposes.
📂 errors - Contains an Errors.tsv file if any pipeline analysis step failed.
📄 SampleSheet.csv - User input sample sheet as provided.
📄 pipeline_trace.txt - Contains Nextflow pipeline step execution status.
📄 timeline_${timestamp}.html - Contains Nextflow pipeline task timeline information.
📄 report_${timestamp}.html - Contains Nextflow pipeline task execution details.
📄 receipt - Contains pipeline analysis CLI parameters and execution environment information.
📄 payload.json - Contains pipeline analysis setup parameters and execution environment information.
📄 nextflow.log - Contains Nextflow pipeline execution log.
📄 analysis.log - Contains Nextflow pipeline standard output.
This section describes the summary output files generated during analysis.
File name: MetricsOutput.tsv
The metrics output file is a final combined metrics report that provides sample status, key analysis metrics, and metadata in a tab-separated values (TSV) file. Sample metrics within the report indicate guideline-suggested lower specification limits (LSL) and upper specification limits (USL) for each sample in the run. One metrics output file is generated for the entire run. An additional file is generated for each sample.
Run metrics from the analysis module indicate the quality of the sequencing run. Review the following metrics to assess run data quality:
| Metric | Description | Guideline |
| --- | --- | --- |
| PCT_Q30_R1 | Percentage of bases with a quality score ≥ 30 from Read 1. | ≥ 80.0 (≥ 85.0 for NovaSeq X Plus) |
| PCT_Q30_R2 | Percentage of bases with a quality score ≥ 30 from Read 2. | ≥ 80.0 (≥ 85.0 for NovaSeq X Plus) |
The values in the Run Metrics section are listed as NA in the following situations:
The analysis was started from FASTQ files.
The analysis was started from BCL files, and the InterOp files are missing or corrupt.
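A run-level check of the two Q30 metrics against the guideline values can be sketched as follows. This is an illustration only: `check_run_metrics` is a hypothetical helper, the metrics are passed in as an already-parsed dictionary because the exact layout of MetricsOutput.tsv is not described here, and the instrument name string is an assumption.

```python
def q30_guideline(instrument: str) -> float:
    """Guideline lower limit: 85.0 for NovaSeq X Plus, otherwise 80.0."""
    return 85.0 if instrument == "NovaSeq X Plus" else 80.0


def check_run_metrics(metrics: dict[str, str], instrument: str) -> dict[str, str]:
    """Return PASS, FAIL, or NA for PCT_Q30_R1 and PCT_Q30_R2."""
    limit = q30_guideline(instrument)
    results = {}
    for name in ("PCT_Q30_R1", "PCT_Q30_R2"):
        raw = metrics.get(name, "NA")
        if raw == "NA":
            # Reported as NA when starting from FASTQ or when InterOp files
            # are missing or corrupt.
            results[name] = "NA"
        else:
            results[name] = "PASS" if float(raw) >= limit else "FAIL"
    return results


print(check_run_metrics({"PCT_Q30_R1": "91.2", "PCT_Q30_R2": "83.0"}, "NovaSeq X Plus"))
# {'PCT_Q30_R1': 'PASS', 'PCT_Q30_R2': 'FAIL'}
```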
Review the following metrics to assess sample data quality:
TUMOR_ESTIMATED_SAMPLE_CONTAMINATION (NA)
NA
The estimated fraction of reads in a sample that may be from another human source
TUMOR_MAPPED_READS_PCT (%)
NA
Percent of mapped reads in the tumor sample
TUMOR_INSERT_LENGTH_MEDIAN (count)
NA
Median insert length of tumor sample
TUMOR_Q30_BASES_EXCL_DUPS_AND_CLIPPED_BASES (bp)
NA
User Reference
The custom name of the analysis for later identification.
Yes
Empty
User Tags
Tags for the analysis that help with categorization and identification, improving organization and searchability.
No
Empty
Notification
Add a user to be notified when the analysis completes.
No
Samplesheet
The SampleSheet.csv for the analysis
Yes
SampleSheet.csv in Input Folder
Input Directory
The input folder that contains [bcl, fastq, bam, cram] files to analyze. Multiple input [fastq, bam, cram] folders can be specified.
Yes
No folder selected
Custom Parameters Config File
The custom parameters config file for the analysis.
No
Input Type
The type of files in the Input Folder(s): bcl, fastq, bam, cram.
Yes
bcl
Reference Genome
The reference genome used for the analysis: [hs37d5_chr, hg38].
Yes
hg38
Enable Ora Compression
Compress fastq files using ora compression. [Only applies when Input Type is bcl].
No
Storage Size
The storage size to allocate for the analysis. The minimum required value is Large.
Yes
Large
Other available options for the storage size:
"1.2TB" if option selected is Small
"2.4TB" if option selected is Medium
"7.2TB" if option selected is Large
"16TB" if option selected is XLarge
"32TB" if option selected is 2XLarge
"64TB" if option selected is 3XLarge
Note: It is recommended to reserve storage equal to twice the size of the BCL run folder or of the input fastq.gz or bam files, four times the size of the cram files (a cram file is 30-70% of the size of the bam file), and eight times the size of the fastq.ora files (a fastq.ora file is about 25% of the size of the fastq.gz file).
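The sizing rule in the note can be turned into a quick estimate. The sketch below is illustrative only: both function names are hypothetical, sizes are whole gigabytes, and 1 TB is assumed to equal 1000 GB when mapping the estimate onto the storage options listed above (never below Large, the stated minimum).

```shell
#!/usr/bin/env bash
# Hypothetical helper: recommended storage in GB for a given input type and size.
#   $1 = input type: bcl, fastq.gz, bam, cram, or fastq.ora
#   $2 = input size in whole GB
recommended_storage_gb() {
  local input_type="$1" size_gb="$2" factor
  case "$input_type" in
    bcl|fastq.gz|bam) factor=2 ;;   # twice the input size
    cram)             factor=4 ;;   # four times the input size
    fastq.ora)        factor=8 ;;   # eight times the input size
    *) echo "unknown input type: $input_type" >&2; return 1 ;;
  esac
  echo $(( size_gb * factor ))
}

# Hypothetical helper: smallest listed storage option that covers the estimate.
# Assumes 1 TB = 1000 GB.
pick_storage_size() {
  local need_gb="$1"
  if   [ "$need_gb" -le 7200 ];  then echo "Large"
  elif [ "$need_gb" -le 16000 ]; then echo "XLarge"
  elif [ "$need_gb" -le 32000 ]; then echo "2XLarge"
  elif [ "$need_gb" -le 64000 ]; then echo "3XLarge"
  else
    echo "estimate exceeds the largest listed option" >&2
    return 1
  fi
}

# Usage:
#   pick_storage_size "$(recommended_storage_gb cram 3000)"   # prints XLarge
```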
Customers can use the icav2 client to launch an analysis from the CLI. The supported parameters are listed in the Project Pipeline details under the XML configuration tab.
For information about using pipelines, refer to the Illumina Connected Analytics documentation.
A sample sheet is required for each analysis with the pipeline. A sample sheet is a comma-separated value (*.csv) file format used by Illumina instruments, platforms, and analysis pipelines to store settings and data for sequencing and analysis. The pipeline is compatible with the sample sheet v2. For general information on the sample sheet v2, refer to Illumina Connected Software Sample Sheet.
A full sample sheet contains multiple sections, including a [BCLConvert_Settings] section with a list of samples and their index sequences, along with additional information required to run the pipeline in the [{app}_Data] section. For example, the Library Prep Kit is a required field in the sample sheet for the DRAGEN Heme WGS Tumor Only Pipeline. Both Illumina library prep kits and custom library prep kits are supported.
In contrast, the DRAGEN Solid WTS Tumor Normal Pipeline may require only a minimal sample sheet, containing just a [Header] section and a [TN_Data] section, when starting the analysis from FASTQ. This partial sample sheet is not valid when starting the analysis from a run folder.
When running analysis on a standalone DRAGEN server or ICA, a valid sample sheet can be created by:
Using BaseSpace Run Planner (preferred).
Downloading and modifying a sample sheet template according to the requirements.
When running analysis on a standalone DRAGEN server or on ICA, a minimal sample sheet for starting from FASTQ, BAM or CRAM can be created by:
Modifying a sample sheet template according to the requirements. See the product-specific templates for more information.
Note: A minimal sample sheet may be invalid for other purposes. It is always advisable to use a valid sample sheet generated from the BaseSpace Run Planner.
The Run Planning section of this guide is available for specific instructions to plan a run and set up a valid sample sheet for the pipeline when supported.
With the v2 sample sheet and DRAGEN 4.4 or later, index2 must be specified in the forward orientation only.
As indicated in the following table, the index2 orientation is always Forward for simplicity. The two new flags are especially useful when custom library prep kits (LPKs) are used and when a consistent index2 orientation is desired for all run folders. The IndexOrientation field is present in sample sheets generated by BaseSpace Run Planner and indicates that the index2/i5 sequences in the sample sheet are in Forward orientation.
Bcl-convert SoftwareVersion must be >=4.4.
* indicates the situation where the IsReverseComplement flag in the RunInfo.xml is overridden by the RunInfoIndex2ReverseComplement value. NA means that the IsReverseComplement flag for index2 is not present in the RunInfo.xml file.
** indicates that legacy run folders may use the two paired flags to ensure that index2 Forward orientation is consistently applied.
For backward compatibility, when the specified bcl-convert version is lower than 4.4, the index2 orientation may vary depending on the instrument. In sample sheets generated by BaseSpace Run Planner, IndexOrientation may still indicate Forward, but it is ignored in this situation.
Bcl-convert SoftwareVersion must be <4.4.
* indicates the situation where the IsReverseComplement flag in the RunInfo.xml differs depending on the control software version.
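When a legacy sample sheet lists index2 in reverse-complement orientation, the sequences have to be converted before they can be entered in Forward orientation. A minimal sketch of that conversion in pure bash follows; the function name `revcomp` is hypothetical, and the helper only handles uppercase A, C, G, T, and N.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the reverse complement of an index sequence.
#   $1 = index sequence using uppercase A, C, G, T, or N
revcomp() {
  local seq="$1" out="" i base
  # Walk the sequence from the last base to the first, complementing each base.
  for (( i = ${#seq} - 1; i >= 0; i-- )); do
    base="${seq:i:1}"
    case "$base" in
      A) out+="T" ;;
      C) out+="G" ;;
      G) out+="C" ;;
      T) out+="A" ;;
      N) out+="N" ;;
      *) echo "unexpected base: $base" >&2; return 1 ;;
    esac
  done
  echo "$out"
}

# Usage:
#   revcomp ATTACTCG   # prints CGAGTAAT
```

Applying the function twice returns the original sequence, which is a quick sanity check before editing a sample sheet.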
A separate lightweight downloader for Windows, macOS, and Linux operating systems is available at the DRAGEN Installer Download Site.
Choose the downloader appropriate for your platform. When executed, it prompts you for a path to download the assets to. The required software packages are downloaded into the dragen_pipelines directory under that path. If the path was used for a previous execution of the downloader, any incomplete downloads are resumed, existing files are checksummed, and any files with invalid checksums are re-downloaded.
The downloaded directory content may be moved to the installation target DRAGEN server using a USB key with at least 128 GB of free space or by copying to Network Storage which is reachable from the target DRAGEN Server.
Additional download information is available at the download site.
📂 dragen_pipelines
dragen-app-manager-1.0.14-1.x86_64-el8-offline.run
README
📂 Solid_WGS_TN_4.4.4.53
DRAGEN and DRAGEN Application Manager
The pipeline requires DRAGEN v4.4.4 or higher. If this version of DRAGEN (or higher) is not present when the app is installed, the installer installs this version of DRAGEN.
The pipeline also requires DRAGEN Application Manager, and an installer for it is included. DRAGEN Application Manager configuration is controlled by the config.toml file located in the /etc/dragen-app-manager directory.
Minimum System Operating Requirements
Hardware
v3 DRAGEN server or v4 DRAGEN server
mkfifo is enabled on the network-attached storage (NAS).
Software
The software installed by default on the DRAGEN server includes the following items:
DRAGEN server software. Refer to sample sheet settings for the DRAGEN version number.
Oracle Linux 8
Storage
DRAGEN server v3 provides a 6.4 TB NVMe SSD. This SSD is located at the /staging directory and is suitable for storing only one or two runs of the analysis pipeline.
DRAGEN server v4 provides 12.8 TB via a 2 x 6.4 TB NVMe U.2 SSD configuration.
Consider the following when making data storage decisions.
A NovaSeq 6000 sequencing run that uses an S4 flow cell can produce up to 3 TB of output.
The pipeline can produce an additional 4-6 TB of analysis output.
For optimal performance when writing to a non-default directory, specify an analysis folder location on /staging. This ensures that DRAGEN-related processes read and write data on the DRAGEN server's high-speed NVMe SSD.
Installing the pipeline requires root privileges.
Contact Illumina Customer Care to request a link to the Downloader or visit and confirm that the Genome DRAGEN license is enabled for your server.
Follow the instructions for DRAGEN license installation provided by Illumina Customer Care or refer to the DRAGEN server documentation.
Copy the directory structure from the downloader directory to the target DRAGEN server (or a path accessible with sudo privileges)
The self-test script, present after app installation, checks the following functions:
All required services are running.
All resources are in place.
The analysis workflow image can be launched.
The pipeline can run successfully on a test dataset.
To run the self-test script, execute:
If the self-test prints a failure message, contact Illumina Technical Support, and provide the output file found in /staging/check_Solid_WGS_TN_{version}_{datetimestamp}.tgz.
When running an analysis on the DRAGEN server via SSH, Illumina recommends that you use a terminal multiplexer utility, which allows you to resume analysis in the event of a disconnection from the DRAGEN server.
To uninstall the pipeline, run the following command as the root user (or with sudo privileges):
Executing the uninstall script removes the following assets:
All scripts, including:
run_Solid_WGS_TN_{version}.sh
check_Solid_WGS_TN_{version}.sh
uninstall_Solid_WGS_TN_{version}.sh
If the uninstall script is executed with the -r or --removeResources flag, dependencies of the application being uninstalled will be removed if no other applications depend on them.
You are not required to uninstall DRAGEN Application Manager, Docker, or the DRAGEN server software.
To remove Docker, review the install instructions for your operating system in the Docker documentation
This document describes how to use the Custom Configuration Support feature for the pipeline software. This feature allows users to customize a specific set of DRAGEN command-line options to override the default values pre-defined in the pipeline.
customConfig and customResourceDir
Users can customize pipeline behavior and file inputs using:
--customConfig : path to a custom configuration file listing customized parameter values.
--customResourceDir : path to a directory containing custom resource files.
Both options should be used together if file-based overrides are required.
For file parameters (parameters that require a file), users must specify relative paths in the customConfig file. The software will join customResourceDir and the relative path to form the full file path.
Additionally, the value assigned to a file parameter must be enclosed in single quotes ('').
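The path handling described above can be pictured as a simple join of the two values. The sketch below is an assumption about that behavior, not the pipeline's actual code: the function name `resolve_custom_resource` is hypothetical, and stripping one leading slash from the configured value is an assumption made so that values written with or without it resolve to the same file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the full path a file parameter would resolve to.
#   $1 = value passed to --customResourceDir
#   $2 = relative path assigned to a file parameter in the customConfig file
resolve_custom_resource() {
  local resource_dir="$1" rel_path="$2"
  # Drop one trailing slash from the directory and one leading slash from the
  # relative path (assumed behavior), then join them with a single slash.
  echo "${resource_dir%/}/${rel_path#/}"
}

# Usage (the directory below is an example path, not a required location):
#   resolve_custom_resource /data/custom_resources_Heme_dir snv/somatic_hotspots_GRCh38.vcf.gz
#   prints /data/custom_resources_Heme_dir/snv/somatic_hotspots_GRCh38.vcf.gz
```

Checking that each resolved path exists (for example with `test -f`) before launching the analysis can catch a mistyped relative path early.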
heme_custom_param.config Content
custom_resources_Heme_dir Folder Structure
customConfig Template (with default value)
ℹ️ Note: For CRAM Input Reference Genome, a list of commonly-used human reference FASTA files can be downloaded from the Illumina support site.
A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.
For DRAGEN germline runs, it is recommended to use the pangenome hashtable.
See: Product Files
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.
FQ list Input
FQ Input
BAM Input
CRAM Input
For more information see: .
For more detail on the small variant caller in somatic mode, refer to Somatic Mode.
For instructions on how to download the Nirvana annotation database, refer to Nirvana.
To manually launch an analysis, configure a pipeline analysis run in ICA with the following parameters.
Other available options for the storage size:
"1.2TB" if option selected is Small
"2.4TB" if option selected is Medium
"7.2TB" if option selected is Large
"16TB" if option selected is XLarge
Note: It is recommended to reserve storage size twice the size of the BCL run folder, or the input fastq.gz or bam files, four times the size of the cram file (cram is 30-70% of the bam), and 8 times the size of the fastq.ora (fastq.ora is about 25% of fastq.gz).
For information about using pipelines, refer to the Illumina Connected Analytics documentation.
A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.
For DRAGEN RNA/scRNA runs, it is recommended to use the linear hashtable.
See: Product Files
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.
FQ list Input
FQ Input
BAM Input
CRAM Input
To enable RNA amplicon, set:
--enable-rna-amplicon true, and
--amplicon-target-bed $PATH.
If RNA amplicon mode is enabled and the amplicon BED file already includes the gene name, it is not required to set the ENRICH option, because DRAGEN reads the enriched gene names from the fifth column of the amplicon BED file.
The BaseSpace Sequence Hub Run Planning tool generates a valid sample sheet in v2 format for use on a supported sequencer, for both the ICA and standalone DRAGEN server analysis options. Filling out the form in the user interface produces an exportable sample sheet with the required fields filled in.
The sections below represent each step in the BaseSpace Run Planning tool.
Note that NovaSeq X Series has a different run setup configuration screen than other instrument platforms. The software supports multi-analysis; to complete run setup on NovaSeq X Series, enter the appropriate Read 1, Read 2, Index 1, and Index 2 values described in the instructions below.
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN pangenome hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
--enable-duplicate-marking true #default=true
# 5-Base
--methylation-conversion illumina
--methylation-generate-cytosine-report true
--methylation-compress-cx-report true
# Small variant caller
--enable-variant-caller true
--vc-target-bed $VC_TARGET_BED
# Annotation
--variant-annotation-data PATH
--enable-variant-annotation true
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN pangenome hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# Mapper
--enable-rna true
--annotation-file $GTF #GTF or GFF3 format
--enable-map-align true #required for RNA/scRNA
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
--enable-duplicate-marking true #default=true
# Small variant caller
--enable-variant-caller true
--vc-target-bed $VC_TARGET_BED
# RNA Quantification
--enable-rna-quantification true
--rna-library-type A #see 'RNA Quant'
--rna-quantification-gc-bias true
# RNA Splice Variants
--enable-rna-splice-variant true
--rna-splice-variant-regions $PATH
# RNA Gene Fusions
--enable-rna-gene-fusion true
--rna-gf-enriched-regions $PATH #see 'RNA Fusion'
📂 FastqGeneration (only when started from BCLs)
📂 FastqValidation
📂 DragenCaller
📂 AdditionalSarjMetrics
📂 SampleAnalysisResults
📂 MetricsOutput
📂 DragenSvExtraFilters
📄 passing_sample_steps.json
Bases with a Phred quality score of 30 or higher, excluding duplicated reads and clipped bases
AVERAGE_AUTOSOMAL_COVERAGE_OVER_GENOME (count)
NA
Average coverage or sequencing depth across the autosomes (chromosomes 1-22)
GC_NORMALIZED_COVERAGE_AT_GCS_20_39 (count)
NA
Normalized sequencing coverage in genomic regions with GC content between 20% and 39%
GC_NORMALIZED_COVERAGE_AT_GCS_60_79 (count)
NA
Normalized sequencing coverage in genomic regions with GC content between 60% and 79%
No user selected
Output Folder
The path to the analysis output folder.
No
Project output folder
No file selected
Custom Resources Directory
The custom resources directory used for the analysis.
No
No folder selected
true
Enable Post Processing
Use the post-processing scripts at the end of the pipeline analysis.
No
false
Sample IDs
Optional subset of Sample IDs or Pair IDs to analyze. A comma-separated list.
No
Empty

Forward
Y**
N**
NA
Forward
When SbsConsumableVersion >=3
NovaSeq 6000Dx
Forward
Y
N
Y
Forward
When non-SP flow cell is used
Forward
Y
N
N*
Forward
When SP flow cell is used and control software is <2.4
NovaSeq X
Forward
Y
N
N
Forward
When non-SP flow cell is used
Y*
Forward
When SP flow cell is used and control software is >2.4
N*
Reverse
When SP flow cell is used and control software is <2.4
NovaSeq X
Y
Forward
SoftwareVersion
Required
If SoftwareVersion >= 4.4, the index2 orientation must be Forward; otherwise, legacy behavior is supported.
RunInfoIndex2ReverseComplement
Optional
Allowed values: Y/N. If SoftwareVersion >= 4.4 and this field is used, Index2ColumnReverseComplement must also be present. In case of conflict, this value overrides the IsReverseComplement = Y/N flag for index2 in RunInfo.xml.
Index2ColumnReverseComplement
Optional
Allowed values: Y/N. If SoftwareVersion >= 4.4 and this field is used, RunInfoIndex2ReverseComplement must also be present. This value indicates whether the index2 column sequence is reverse complemented.
NovaSeq 6000
Forward
N**
N**
NA
Forward
When SbsConsumableVersion <3
NovaSeq 6000
NA
Forward
When SbsConsumableVersion <3
NA
Reverse
When SbsConsumableVersion >=3
NovaSeq 6000Dx
Y
Forward
Output Folder
The path to the analysis output folder.
No
Project output folder
Custom Resources Directory
The custom resources directory used for the analysis.
No
No folder selected
Enable Post Processing
Use the post-processing scripts at the end of the pipeline analysis.
No
false
Sample IDs
Optional subset of Sample IDs or Pair IDs to analyze. A comma-separated list.
No
Empty
"64TB" if option selected is 3XLarge
User Reference
The custom name of the analysis for later identification.
Yes
Empty
User Tags
Tags for the analysis that help with categorization and identification, improving organization and searchability.
No
Empty
Notification
Add a user to be notified when the analysis completes.
No
Samplesheet
The SampleSheet.csv for the analysis
Yes
SampleSheet.csv in Input Folder
Input Directory
The input folder that contains [bcl, fastq, bam, cram] files to analyze. Multiple input [fastq, bam, cram] folders can be specified.
Yes
No folder selected
Custom Parameters Config File
The custom parameters config file for the analysis.
No
Input Type
The type of files in the Input Folder(s): bcl, fastq, bam, cram.
Yes
bcl
Reference Genome
The reference genome used for the analysis: [hs37d5_chr, hg38].
Yes
hg38
Enable Ora Compression
Compress fastq files using ora compression. [Only applies when Input Type is bcl].
No
Storage Size
The storage size to allocate for the analysis. The minimum required value is Large.
Yes
Large
No user selected
No file selected
true
install_Solid_WGS_TN_v4.4.4.53.run
Solid_WGS_TN_4.4.4.53.iapp
README
📂 common
solid-wgs-tn-resources_4.4.4.2.ires
dpf-core_1.0.0.36.ires
dpf-templates_4.4.4.52.ires
dpf-docker-images_4.4.4.52.ires
dragen-4.4.4-12.multi.el8.x86_64.run
hg38-alt_masked.cnv.graph.hla.methyl_cg.rna-11-r5.0-1.ires
hg38-alt_masked.cnv.hla.methyl_cg.methylated_combined.rna-11-r5.0-1.ires
hs37d5_chr-cnv.graph.hla.methyl_cg.rna-11-r5.0-1.ires
hs37d5_chr-cnv.hla.methyl_cg.methylated_combined.rna-11-r5.0-1.ires
variant_annotation_data-tmb_annotations-4.4.4-1.ires
Network-attached storage is required for long-term storage of sequencing runs and pipeline output.
Managing data storage is your responsibility.
Illumina recommends developing a strategy to copy data from the DRAGEN server to network-attached storage.
Delete output data on the DRAGEN server as soon as possible. For additional information on data output and storage, refer to Illumina Instrument Control Computer Security and Networking.
Ensure the installer has the correct privileges by running chmod +x install_Solid_WGS_TN_v{version}.run
Launch the installer with root privileges sudo /path/to/install_Solid_WGS_TN_v{version}.run
If DRAGEN Application Manager is not already installed, the installer will exit and direct you to the path to the DRAGEN Application Manager installer
The application is installed under DRAGEN Application Manager.
Solid_WGS_TN_{version}_Downloader_unix
x86_64 platform with glibc 2.25+
Solid_WGS_TN_{version}_Downloader_mac
arm64 macOS
Solid_WGS_TN_{version}_Downloader_windows.exe
64-bit Windows 10+
included
Yes
CRAM Input Reference Genome
cram_reference
Mapper
file
included
Yes
Aligner Clip Paired End Reads Overhang
aligner_clip_pe_overhang
Mapper
0,1,2
0
Yes
Enable Map Align
enable_map_align
Mapper
true / false
true
Yes
SV Somatic Hotspot BED File
sv_somatic_ins_tandup_hotspot_regions_bed
Structural VC
file
included
Yes
SV Systematic Noise File
sv_systematic_noise
Structural VC
file
included
Yes
Output SNV Evidence BAM
vc_output_evidence_bam
Debug
true / false
false
Yes
QC Detect Contamination
qc_detect_contamination
QC
true / false
true
Yes
VC Systematic Noise File
vc_systematic_noise
Variant Caller
file
included
Yes
VC Somatic Hotspots File
vc_somatic_hotspots
Variant Caller
file
--methylation-compress-cx-report true
Set to true to enable compression of the CX_report (default=true).
--methylation-keep-ref-cytosine true
Set to true to keep all reference cytosines in the CX_report file, even if they don't appear in the input reads (default=false).
--enable-cpg-methylated-mapping true
Enable methylated mapping with base conversions restricted to CpG context (default=true). When false, runs DRAGEN Methylation 3-base map/align instead.
--methylation-report-to-vcf
Specify methylation type (none, cg, or c) which is reported in VCF files (default=c).
--methylation-report-to-gvcf
Specify methylation type (none, cg, or c) which is reported in gVCF files (default=cg).
--enable-map-align true
Optionally disable map & align (default=true).
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
--enable-duplicate-marking true
By default, DRAGEN marks duplicate reads and excludes them from variant calling.
--enable-positional-collapsing true
Alternative to --enable-duplicate-marking=true. Instead of discarding duplicate reads, DRAGEN can optionally perform positional collapsing, merging them into higher-quality consensus reads. This is beneficial for small panels without UMIs and coverage between 300X and 1000X. However, it's slower than standard duplicate marking and less effective on samples with coverage lower than 300X. For very high coverage (1000X+), avoid it due to potential read collisions. For high-sensitivity panels with 1000X+ coverage, consider using UMIs.
--methylation-conversion STRING
Library conversion for methylation analysis. Options: none, c_t, mc_t, illumina (default=none).
--methylation-protocol STRING
Library protocol for methylation analysis. Options: none, directional, non-directional, directional-complement, pbat. The default value for methylation-conversion=illumina is directional, otherwise it is none.
--methylation-mapq-threshold INT
Only reads with MAPQ greater than or equal to the threshold are included in methyl-seq analysis (default=0).
--methylation-generate-mbias-report true
Whether to generate a per-sequencer-cycle methylation bias report (default=true).
--mbias-report-include-overlaps
Calculate methylation stats for overlapping bases between mates (default=false).
--methylation-generate-cytosine-report true
Whether to generate a genome-wide cytosine methylation CX_report file (default=false).
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--enable-map-align true
Optionally disable map & align (default=true).
--enable-map-align-output true
Optionally save the output BAM (default=false).
--enable-duplicate-marking true
By default, DRAGEN marks duplicate reads and excludes them from variant calling.
--enable-positional-collapsing true
Alternative to --enable-duplicate-marking=true. Instead of discarding duplicate reads, DRAGEN can optionally perform positional collapsing, merging them into higher-quality consensus reads. This is beneficial for small panels without UMIs and coverage between 300X and 1000X. However, it's slower than standard duplicate marking and less effective on samples with coverage lower than 300X. For very high coverage (1000X+), avoid it due to potential read collisions. For high-sensitivity panels with 1000X+ coverage, consider using UMIs.
--vc-target-bed $PATH
Restrict the variants called to a target bed. For WTS, a bed file specifying the gene-coding regions should be provided to avoid calling erroneous variants in non-coding regions due to noisy reads.
--rna-library-type
Set the library according to the read orientations. Set to 'A' to auto detect the correct read orientation. Alternatively select 'IU', 'ISR', 'ISF', 'U', 'SR', or 'SF'.
--rna-splice-variant-normals $PATH
Optional list of normal splice variants that is used to filter false positive calls. The file should be tab-separated, with the following first four columns: (1) contig name, (2) first base of the splice junction (1-based), (3) last base of the splice junction (1-based), (4) strand (0: undefined, 1: +, 2: -).
--rna-splice-variant-regions $PATH
Target region bed file. Required for panels. The name of the region must be specified in the fourth column.
--rna-gf-enriched-regions $PATH
For panels, the list of enriched genes should be set, either as a list of genes or a list of regions in BED format.
See: Product Files
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.
FQ list Input
FQ Input
BAM Input
CRAM Input
--enable-map-align true
Optionally disable map & align (default=true).
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
--enable-duplicate-marking true
By default, DRAGEN marks duplicate reads and excludes them from variant calling.
--enable-positional-collapsing true
Alternative to --enable-duplicate-marking=true. Instead of discarding duplicate reads, DRAGEN can optionally perform positional collapsing, merging them into higher-quality consensus reads. This is beneficial for small panels without UMIs and coverage between 300X and 1000X. However, it's slower than standard duplicate marking and less effective on samples with coverage lower than 300X. For very high coverage (1000X+), avoid it due to potential read collisions. For high-sensitivity panels with 1000X+ coverage, consider using UMIs.
DRAGEN SNV VC employs machine learning based variant recalibration (DRAGEN-ML). It processes read and other contextual evidence to remove false positives, recover false negatives and reduce zygosity errors. No additional setup is required. DRAGEN-ML is enabled by default as needed, when running the germline SNV VC on hg19 or hg38.
Note that we do not recommend changing the default QUAL thresholds of 3 for DRAGEN-ML and 10 for DRAGEN without ML. These values differ from each other because DRAGEN-ML improves the calibration of QUAL scores, leading to a change in the scoring range.
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--vc-emit-ref-confidence GVCF
To enable gVCF output.
--vc-enable-vcf-output
To enable VCF file output during a gVCF run, set to true. The default value is false.
For more detail on the small variant caller in somatic mode please refer to Somatic Mode
For instructions on how to download the Nirvana annotation database, please refer to Nirvana
--enable-hla
Enable HLA typer (this setting by default will only genotype class 1 genes)
--hla-as-filter-min-threshold
Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.
--hla-as-filter-ratio-threshold
Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.
--hla-enable-class-2
Extend genotyping to HLA class 2 genes (default=true).
--cnv-enable-gcbias-correction true
Enable or disable GC bias correction when generating target counts.
--cnv-segmentation-mode $SEG_MODE
Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.
--cnv-segmentation-bed $PATH
If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.
For more information, see CNV Calling.
For CNV PON requirements and generation options see CNV Preprocessing | Panel of Normals.
Step 1. Generate CNV target counts of individual samples from the sequencing run.
Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.
Step 2. CNV combined counts file generation.
$CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.
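The list file described above can be assembled with standard tools. This is a sketch under stated assumptions: the function name `make_cnv_normals_list` is hypothetical, and all step 1 outputs are assumed to sit under one directory. GC-corrected counts are preferred when present so that the list does not mix the two file types.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one target counts path per line to a list file.
#   $1 = directory containing the step 1 outputs (assumed layout)
#   $2 = list file to create, for use as $CNV_NORMALS_LIST
make_cnv_normals_list() {
  local counts_dir="$1" list_file="$2"
  # Prefer GC-corrected counts when any exist.
  find "$counts_dir" -type f -name '*.target.counts.gc-corrected.gz' \
    | sort > "$list_file"
  # Otherwise fall back to the uncorrected counts.
  if [ ! -s "$list_file" ]; then
    find "$counts_dir" -type f -name '*.target.counts.gz' | sort > "$list_file"
  fi
  # Succeed only if at least one path was written.
  [ -s "$list_file" ]
}

# Usage (example paths):
#   make_cnv_normals_list /staging/pon_counts /staging/pon_counts/normals.txt
```

Samples that should be excluded from the PON can be removed from the list file by hand before step 2.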
/usr/local/bin/check_Solid_WGS_TN_{version}.sh

/usr/local/bin/uninstall_Solid_WGS_TN_{version}.sh

run_Heme_WGS_TO_{version}.sh \
--inputType bcl \
--inputFolder /heme_input_bcl \
--customConfig /path/heme_custom_param.config \
--customResourceDir custom_resources_Heme_dir

# custom parameters
vc_output_evidence_bam = false
qc_detect_contamination = true
aligner_clip_pe_overhang = 0
# custom reference files
vc_systematic_noise = '/snv/WGS_hg38_v1.0_systematic_noise.snv.bed.gz'
sv_systematic_noise = '/sv/WGS_FF_Heme_hg38_v1.0_systematic_noise.sv.bedpe.gz'
vc_somatic_hotspots = '/snv/somatic_hotspots_GRCh38.vcf.gz'

custom_resources_Heme/
├── snv
│   ├── WGS_hg38_v1.0_systematic_noise.snv.bed.gz
│   └── somatic_hotspots_GRCh38.vcf.gz
└── sv
    └── WGS_FF_Heme_hg38_v1.0_systematic_noise.sv.bedpe.gz

#vc_systematic_noise = ''
#enable_map_align = true
#sv_systematic_noise = ''
#vc_output_evidence_bam = false
#qc_detect_contamination = true
#vc_somatic_hotspots = ''
#sv_somatic_ins_tandup_hotspot_regions_bed = ''
#cram_reference = ''
#aligner_clip_pe_overhang = 0

# FQ list Input
--fastq-list $PATH
--fastq-list-sample-id $STRING

# FQ Input
--fastq-file1 $PATH
--fastq-file2 $PATH
--RGSM $STRING
--RGID $STRING

# BAM Input
--bam-input $PATH

# CRAM Input
--cram-input $PATH

# FQ list Input
--fastq-list $PATH
--fastq-list-sample-id $STRING

# FQ Input
--fastq-file1 $PATH
--fastq-file2 $PATH
--RGSM $STRING
--RGID $STRING

# BAM Input
--bam-input $PATH

# CRAM Input
--cram-input $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN pangenome hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
--enable-duplicate-marking true #default=true
# Small variant caller
--enable-variant-caller true
--vc-target-bed $VC_TARGET_BED
# Annotation
--variant-annotation-data PATH
--enable-variant-annotation true
# SV
--enable-sv true
--sv-exome true
--sv-call-regions-bed $SV_TARGET_BED
# CNV
--enable-cnv true
--cnv-target-bed $PATH
--cnv-combined-counts $PATH #CNV PON. See 'In-run PON' section below.
# HLA genotyper
--enable-hla true
--hla-as-filter-min-threshold 29.0 #panel specific setting
--hla-as-filter-ratio-threshold 0.85 #panel specific setting
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN pangenome hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# CNV
--enable-cnv true
--cnv-target-bed $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN pangenome hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--enable-cnv true
--cnv-generate-combined-counts true
--cnv-normals-list $CNV_NORMALS_LIST
Run Name
Required
Run Name can contain up to 255 characters. Allowed characters are alphanumerics, dashes, underscores, periods, and spaces, and the name must start with an alphanumeric character, a dash, or an underscore.
Run Description
Optional
Run Description can contain up to 255 characters, excluding square brackets, asterisks, and commas.
Instrument Platform
Required
Choose from supported instruments:
NovaSeq 6000/6000Dx
NovaSeq X Series
Secondary Analysis
Required
Note: On NovaSeq X Series, this page is called "Configuration 1". The right hand corner of the UI displays the Read 1, Read 2, Index 1 and Index 2 entered on the previous run settings screen.
Application*
Required
the pipeline name
Description
Optional
Optional text field
Library Prep Kit
Required
- Illumina DNA Prep Kit (IDP)
Required
Users can manually enter sample information, or download a template file to bulk upload sample information. Users can import the completed template or a compatible sample sheet.
Read Lengths: Read 1 and Read 2
Required. Not applicable on NovaSeq X Series.
Auto filled with the standard values, but can be optionally overwritten.
Override Cycles
Required on NovaSeq X Series
Entered based on Run Settings read lengths & index 1 / index 2
Lane Usage
Not applicable on NovaSeq X Series or NextSeq 1000 / 2000
Checkbox allows users to apply the same lane across samples.
Lane
Required if Lane Usage is unchecked. Not applicable on NextSeq 1000 / 2000.
Once all details are captured and pass validation, the user can review the details on the Run Review screen. From here they can choose to edit details in previous screens or export the sample sheet. Once completed, press the Cancel button to finish run planning.
Note: once leaving this screen, the run and sample sheet will not be accessible.
For NovaSeq X Plus users, the run can be saved as a draft or as a planned run (via the “Save as Draft” and “Save as Planned” buttons respectively). Either selection saves the run to the Planned Runs screen on BaseSpace. There is no option to export the sample sheet on this screen.
The Planned Runs screen lists all planned or drafted runs. Users can set drafted runs to planned, export the sample sheet, and edit or delete a run on this screen.
Once the run is saved as Planned, it will appear on the NovaSeq X Series instrument where it can be selected for sequencing.
For more information on run planning, refer to the BaseSpace Sequence Hub support site page.
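The Run Name rule given earlier in this section can be checked before a run is planned. The following is a minimal sketch and is not part of BaseSpace: the function name is_valid_run_name is hypothetical, and the check assumes that "alphanumeric" means ASCII letters and digits and that 255 is the maximum total length.

```shell
# Hypothetical helper: returns 0 if the name satisfies the documented Run Name rule.
# Assumption: ASCII letters and digits only, total length 1 to 255.
is_valid_run_name() {
  # Reject names containing a newline, because grep matches line by line.
  case $1 in
    *"
"*) return 1 ;;
  esac
  # First character: alphanumeric, dash, or underscore.
  # Remaining characters: alphanumeric, period, underscore, space, or dash.
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[A-Za-z0-9_-][A-Za-z0-9._ -]{0,254}$'
}

# Example: is_valid_run_name "Heme_run 01" && echo "name accepted"
```

A check like this only mirrors the wording of the rule; the Run Planning tool remains the authority on what it accepts.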
Please review these guided examples of TSO 500 analysis workflows that include a step of setting up a run in BaseSpace Run Planning tool:
See: Product Files
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.
FQ list Input
FQ Input
BAM Input
CRAM Input
--enable-map-align true
Optionally disable map & align (default=true).
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
--enable-duplicate-marking true
By default, DRAGEN marks duplicate reads and excludes them from variant calling.
--methylation-conversion STRING
Library conversion for methylation analysis. Options: none, c_t, mc_t, illumina (default=none).
--methylation-protocol STRING
Library protocol for methylation analysis. Options: none, directional, non-directional, directional-complement, pbat. The default value for methylation-conversion=illumina is directional, otherwise it is none.
--methylation-mapq-threshold INT
Only reads with MAPQ greater than or equal to the threshold are included in methyl-seq analysis (default=0).
--methylation-generate-mbias-report true
Whether to generate a per-sequencer-cycle methylation bias report (default=true).
--mbias-report-include-overlaps
Calculate methylation stats for overlapping bases between mates (default=false).
--methylation-generate-cytosine-report true
Whether to generate a genome-wide cytosine methylation CX_report file (default=false).
For more information see: 5-Base Pipeline.
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
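When the distance is set from a wrapper script, the documented range can be checked before DRAGEN is launched. A minimal sketch, assuming only the range stated above; is_valid_phased_distance is a hypothetical helper, not a DRAGEN command.

```shell
# Hypothetical helper: returns 0 if the value is an integer in the documented range [0; 15].
is_valid_phased_distance() {
  # Reject empty strings and anything containing a non-digit (including a minus sign).
  case $1 in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -le 15 ]
}

# Example: is_valid_phased_distance 2 && echo "--vc-combine-phased-variants-distance 2"
```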
For more detail on the small variant caller in somatic mode please refer to Somatic Mode
For instructions on how to download the Nirvana annotation database, please refer to Nirvana
--cnv-enable-gcbias-correction true
Enable or disable GC bias correction when generating target counts.
--cnv-segmentation-mode $SEG_MODE
Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.
For more information, see CNV Calling.
A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.
For DRAGEN germline runs, it is recommended to use the pangenome hashtable.
See: Product Files
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.
FQ list Input
FQ Input
BAM Input
CRAM Input
DRAGEN SNV VC employs machine learning based variant recalibration (DRAGEN-ML). It processes read and other contextual evidence to remove false positives, recover false negatives and reduce zygosity errors. No additional setup is required. DRAGEN-ML is enabled by default as needed, when running the germline SNV VC on hg19 or hg38.
Note that we do not recommend changing the default QUAL thresholds of 3 for DRAGEN-ML and 10 for DRAGEN without ML. These values differ from each other because DRAGEN-ML improves the calibration of QUAL scores, leading to a change in the scoring range.
For more detail on the small variant caller in somatic mode, refer to Somatic Mode.
For instructions on how to download the Nirvana annotation database, refer to Nirvana.
For more information, see CNV Calling.
For CNV PON requirements and generation options see .
Step 1. Generate CNV target counts of individual samples from the sequencing run.
Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.
Step 2. CNV combined counts file generation.
$CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.
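The list file described in step 2 can be assembled from the step 1 output directory. The following is a minimal sketch: build_cnv_normals_list is a hypothetical helper, and it assumes that all step 1 outputs for the PON sit under one directory and that samples to be excluded have already been moved out of it.

```shell
# Hypothetical helper: write one counts-file path per line to LIST_FILE.
# Usage: build_cnv_normals_list COUNTS_DIR LIST_FILE [SUFFIX]
# SUFFIX defaults to the GC-corrected counts; pass .target.counts.gz for uncorrected counts.
build_cnv_normals_list() {
  counts_dir=$1
  list_file=$2
  suffix=${3:-.target.counts.gc-corrected.gz}
  # Sort so that the list is stable between invocations.
  find "$counts_dir" -type f -name "*$suffix" | sort > "$list_file"
  # Fail when nothing matched, so that an empty list is never passed on.
  [ -s "$list_file" ]
}

# Example: build_cnv_normals_list /staging/pon_counts /staging/cnv_normals.txt
```

The resulting file is what the step 2 command takes through --cnv-normals-list. Passing an absolute COUNTS_DIR yields absolute paths in the list, which avoids surprises if DRAGEN is started from a different working directory.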
A separate lightweight downloader for Windows, macOS, and Linux operating systems is available at the DRAGEN Installer Download Site.
Choose the downloader appropriate for your platform. When executed, it prompts for a path to download the assets to. The required software packages are downloaded into the dragen_pipelines directory under that path. If the path was used for a previous execution of the downloader, any incomplete downloads are resumed, existing files are checksummed, and any files with invalid checksums are re-downloaded.
The downloaded directory content may be moved to the installation target DRAGEN server using a USB key with at least 128 GB of free space or by copying to Network Storage which is reachable from the target DRAGEN Server.
Additional download information is available at the download site.
📂 dragen_pipelines
dragen-app-manager-1.0.14-1.x86_64-el8-offline.run
README
📂 Heme_WGS_TO_4.4.4.62
DRAGEN and DRAGEN Application Manager
The pipeline requires DRAGEN v4.4.4 or higher. If upon installation of the app this version of DRAGEN (or higher) is not installed, the software shall install this version of DRAGEN.
The pipeline also requires DRAGEN Application Manager to be installed, and an installer is included. DRAGEN Application Manager configuration is controlled by the config.toml file located in /etc/dragen-app-manager directory. See for additional information.
Minimum System Operating Requirements
Hardware
v3 DRAGEN server or v4 DRAGEN server
mkfifo is enabled on the network-attached storage (NAS).
Software
The software installed by default on the DRAGEN server includes the following items:
DRAGEN server software. Refer to sample sheet settings for the DRAGEN version number.
Oracle Linux 8
Storage
DRAGEN server v3 provides a 6.4 TB NVMe SSD. This SSD is located at the /staging directory and is suitable for storing only one or two runs of the analysis pipeline.
DRAGEN server v4 provides 12.8 TB via a 2 x 6.4 TB NVMe U.2 SSD configuration.
Consider the following when making data storage decisions.
A NovaSeq 6000 sequencing run that uses an S4 flow cell can produce up to 3 TB of output.
The Heme pipeline can produce an additional 4-6 TB of analysis output.
For optimal performance when writing to a non-default directory, specify an analysis folder location on /staging. This ensures that the DRAGEN-related processes read and write data to the DRAGEN server's high-speed NVMe SSD.
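Before starting an analysis, it can help to confirm that the target location has room for output of the sizes quoted above. A minimal sketch, assuming a POSIX df; has_free_gib is a hypothetical helper, and the threshold in the usage line is an illustrative value rather than an Illumina recommendation.

```shell
# Hypothetical helper: returns 0 if the filesystem holding DIR has at least NEED_GIB GiB free.
# Usage: has_free_gib DIR NEED_GIB
has_free_gib() {
  dir=$1
  need_gib=$2
  # df -P prints one header line and one data line; column 4 is available space in KiB.
  avail_kib=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  [ "$avail_kib" -ge $((need_gib * 1024 * 1024)) ]
}

# Example (threshold is an assumption):
# has_free_gib /staging 6000 || echo "Free up space on /staging before starting the run" >&2
```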
Installing the Heme pipeline requires root privileges.
Contact Illumina Customer Care to request a link to the Downloader or visit and confirm that the Genome DRAGEN license is enabled for your server.
Follow the instructions for DRAGEN license installation provided by Illumina Customer Care or refer to the DRAGEN server documentation.
Copy the directory structure from the downloader directory to the target DRAGEN server (or a path accessible with sudo privileges)
The self-test script, present after app installation, checks the following functions:
All required services are running.
All resources are in place.
The analysis workflow image can be launched.
The Heme pipeline can run successfully on a test dataset.
To run the self-test script, execute:
The following output will show if installation is completed successfully.
If the self-test prints a failure message, contact Illumina Technical Support, and provide the output file found in /staging/check_Heme_WGS_TO_{timestamp}.tgz.
When running an analysis on the DRAGEN server via SSH, Illumina recommends that you use a terminal multiplexer utility, which allows you to resume analysis in the event of a disconnection from the DRAGEN server.
To uninstall the Heme pipeline, run the following command:
Executing the uninstall script removes the following assets:
All scripts, including:
run_Heme_WGS_TO_{version}.sh
check_Heme_WGS_TO_{version}.sh
uninstall_Heme_WGS_TO_{version}.sh
If the uninstall script is executed with the -r or --removeResources flag, dependencies of the application being uninstalled will be removed if no other applications depend on them.
You are not required to uninstall DRAGEN Application Manager, Docker, or the DRAGEN server software.
To remove Docker, review the install instructions for your operating system in the Docker documentation
For Targeted Caller PON requirements and generation options see .
CNV and Targeted Caller require separate PON files, but the intermediate counts files can be generated in the same DRAGEN command line invocation. Follow the steps below to generate the CNV and Targeted Caller PON files. Note that Targeted Caller is only supported with the Illumina CS/PGx Custom Enrichment Research Panel.
Step 1. Generate CNV target counts and Targeted exome counts of individual samples from the sequencing run.
Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.
Step 2. CNV combined counts file generation.
$CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.
Step 3. Targeted Caller PON file generation.
$TARGETED_PON_COUNTS_LIST is a text file with one line for each path to a Targeted Caller exome counts file generated in step 1 (<output-file-prefix>.targeted.exome.counts.json.gz). Individual exome counts files are merged into a single <output-file-prefix>.targeted.pon.json.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN Targeted Caller using the --targeted-pon option.
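Because both list files are plain text with one path per line, a quick check before PON generation can catch a mistyped or missing path. A minimal sketch: check_counts_list is a hypothetical helper, and treating a wrong file suffix as an error is an assumption of this example, not a documented DRAGEN requirement.

```shell
# Hypothetical helper: verify that every non-empty line of LIST_FILE names an
# existing file whose name ends in SUFFIX. Problems are reported on stderr.
# Usage: check_counts_list LIST_FILE SUFFIX
check_counts_list() {
  list_file=$1
  suffix=$2
  status=0
  # An empty or missing list is an error.
  [ -s "$list_file" ] || return 1
  while IFS= read -r path || [ -n "$path" ]; do
    [ -n "$path" ] || continue
    case $path in
      *"$suffix") [ -f "$path" ] || { echo "missing: $path" >&2; status=1; } ;;
      *) echo "unexpected name: $path" >&2; status=1 ;;
    esac
  done < "$list_file"
  return "$status"
}

# Example: check_counts_list "$TARGETED_PON_COUNTS_LIST" .targeted.exome.counts.json.gz
```

The same function can be pointed at $CNV_NORMALS_LIST by passing the CNV counts suffix instead.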
A systematic noise file corresponding to one of the pre-built pangenome references can be downloaded from the DRAGEN Software Support Site page (https://support.illumina.com/sequencing/sequencing_software/dragen-bio-it-platform/product_files.html).
The pipeline has fields that are required in addition to the general sample sheet requirements.
The following sample sheet requirements describe required and optional fields for the pipeline. Depending on the deployment (standalone DRAGEN server, ICA with auto-launch, ICA with manual launch), certain sections and required values can deviate from the standard requirements. These deviations are noted in the information below.
The analysis fails if the sample sheet requirements are not met.
Use the following steps to create a valid sample sheet.
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN pangenome hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
--enable-duplicate-marking true #default=true
# 5-Base
--methylation-conversion illumina
--methylation-generate-cytosine-report true
--methylation-compress-cx-report true
# Small variant caller
--enable-variant-caller true
# Annotation
--variant-annotation-data PATH
--enable-variant-annotation true
# SV
--enable-sv true
# CNV
--enable-cnv true
--cnv-enable-self-normalization true
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN pangenome hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
# UMI
--umi-enable true
--umi-source STRING #Default='qname'
--umi-library-type STRING #e.g. random-duplex
--umi-min-supporting-reads 1 #Default=2
# Small variant caller
--enable-variant-caller true
--vc-target-bed $VC_TARGET_BED
# Annotation
--variant-annotation-data PATH
--enable-variant-annotation true
# SV
--enable-sv true
--sv-exome true
--sv-call-regions-bed $SV_TARGET_BED
# CNV
--enable-cnv true
--cnv-target-bed $PATH
--cnv-combined-counts $PATH #CNV PON. See 'In-run PON' section below.
# HLA genotyper
--enable-hla true
--hla-as-filter-min-threshold 29.0 #panel specific setting
--hla-as-filter-ratio-threshold 0.85 #panel specific setting
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN pangenome hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
# UMI
--umi-enable true
--umi-source STRING #Default='qname'
--umi-library-type STRING #e.g. random-duplex
--umi-min-supporting-reads 1 #Default=2
# Small variant caller
--enable-variant-caller true
# Annotation
--variant-annotation-data PATH
--enable-variant-annotation true
# SV
--enable-sv true
# CNV
--enable-cnv true
--cnv-population-b-allele-vcf $POP_VCF #optional to enable germline ASCN
--cnv-enable-self-normalization true
# HLA genotyper
--enable-hla true
# Targeted caller
--enable-targeted true
# Star allele
--enable-star-allele true
# PGX
--enable-pgx true #PGX
# Short tandem repeats
--repeat-genotype-enable true
# Multi-Region Joint Detection (MRJD)
--enable-mrjd true
--mrjd-enable-high-sensitivity-mode true
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN pangenome hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
--enable-duplicate-marking true #default=true
# Small variant caller
--enable-variant-caller true
--vc-target-bed $VC_TARGET_BED
# Annotation
--variant-annotation-data PATH
--enable-variant-annotation true
# SV
--enable-sv true
--sv-exome true
--sv-call-regions-bed $SV_TARGET_BED
# CNV
--enable-cnv true
--cnv-target-bed $PATH
--cnv-combined-counts $PATH #CNV PON. See 'In-run PON' section below.
# HLA genotyper
--enable-hla true
# Targeted caller (only if using the Illumina CS/PGx Custom Enrichment Research Panel)
--enable-targeted true
--targeted-pon $PATH #Targeted PON. See 'In-run PON' section below.
--targeted-systematic-noise $PATH #Targeted systematic noise file
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN pangenome hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
# UMI
--umi-enable true
--umi-min-supporting-reads 1 #Default=2
# 5-Base
--methylation-conversion illumina
--methylation-generate-cytosine-report true
--methylation-compress-cx-report true
# Small variant caller
--enable-variant-caller true
# Annotation
--variant-annotation-data PATH
--enable-variant-annotation true
# SV
--enable-sv true
# CNV
--enable-cnv true
--cnv-enable-self-normalization true
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN pangenome hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
--enable-duplicate-marking true #default=true
# Small variant caller
--enable-variant-caller true
# Annotation
--variant-annotation-data PATH
--enable-variant-annotation true
# SV
--enable-sv true
# CNV
--enable-cnv true
--cnv-population-b-allele-vcf $POP_VCF #optional to enable germline ASCN
--cnv-enable-self-normalization true
# HLA genotyper
--enable-hla true
# Targeted caller
--enable-targeted true
# Star allele
--enable-star-allele true
# PGX
--enable-pgx true #PGX
# Short tandem repeats
--repeat-genotype-enable true
# Multi-Region Joint Detection (MRJD)
--enable-mrjd true
--mrjd-enable-high-sensitivity-mode true
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN pangenome hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
# UMI
--umi-enable true
--umi-min-supporting-reads 1 #Default=2
# 5-Base
--methylation-conversion illumina
--methylation-generate-cytosine-report true
--methylation-compress-cx-report true
# Small variant caller
--enable-variant-caller true
--vc-target-bed $VC_TARGET_BED
# Annotation
--variant-annotation-data PATH
--enable-variant-annotation true
BaseSpace/Illumina Connected Analytics (to generate sample sheet for cloud analysis)
Local
Read 1
Required on Instrument Platform NovaSeq X Series
Fill with value 151 or custom values
Index 1
Required on Instrument Platform NovaSeq X Series
Fill the value depending on the Library Prep Kit used: 10
Index 2
Required on Instrument Platform NovaSeq X Series
Fill the value depending on the Library Prep Kit used: 10
Read 2
Required on Instrument Platform NovaSeq X Series
Fill with value 151 or custom values
Sample Container ID
Optional
Unique Identifier for the container that holds the sample
- Illumina DNA PCR Free Prep Kit (IDPFP)
Index Adapter Kit
Optional
- IDT for Illumina DNA/RNA UD Indexes Set A B C D, Tagmentation (both IDP and IDPFP)
Optional
- Illumina DNA/RNA UD Indexes Set A B C D, Tagmentation (IDP)
Specify lanes for each sample. The unmarked checkbox at the top of the dropdown selects all lanes.
Case ID
Optional
The identifier used to pair DNA and RNA samples in a run. The field is mandatory whether a sample is part of a pair, or not.
Note: The Sample ID field in the generated sample sheet is auto-filled based on the Pair ID values captured. “_dna” and “_rna” (for DNA and RNA samples respectively) are appended to the Pair ID value to create the Sample ID.
Index ID
Required
Index set ID options are based on selected Index Adapter Kit
Project
Optional
Optional field to describe the associated project
Starts from Fastq
Required
True or False
If auto-launching the pipeline from BCL files, set the value to False. If auto-launching the pipeline from FASTQ after auto-launching BCL Convert, set the value to True.
DNA Barcode Mismatches Index 1**
DNA Barcode Mismatches Index 2**
RNA Barcode Mismatches Index 1**
RNA Barcode Mismatches Index 2**
Required on NovaSeq X
Default value is set to 1.
These fields are required by NovaSeq X and represent BCL Convert settings for index diversity checks when demultiplexing. These values are not used in the pipeline analysis.
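The Sample ID naming rule described earlier for paired DNA and RNA samples can be reproduced when cross-checking a generated sample sheet. A minimal sketch: sample_id_for is a hypothetical helper that only mirrors the documented suffixes and is not part of the Run Planning tool.

```shell
# Hypothetical helper: print the Sample ID that the documented rule derives from a Pair ID.
# Usage: sample_id_for PAIR_ID dna|rna
sample_id_for() {
  pair_id=$1
  case $2 in
    dna|DNA) printf '%s_dna\n' "$pair_id" ;;
    rna|RNA) printf '%s_rna\n' "$pair_id" ;;
    *) echo "sample type must be dna or rna" >&2; return 1 ;;
  esac
}

# Example: sample_id_for P001 dna
```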
--methylation-compress-cx-report true
Set to true to enable compression of the CX_report (default=true).
--methylation-keep-ref-cytosine true
Set to true to keep all reference cytosines in the CX_report file, even if they don't appear in the input reads (default=false).
--enable-cpg-methylated-mapping true
Enable methylated mapping with base conversions restricted to CpG context (default=true). When false, runs DRAGEN Methylation 3-base map/align instead.
--methylation-report-to-vcf
Specify methylation type (none, cg, or c) which is reported in VCF files (default=c).
--methylation-report-to-gvcf
Specify methylation type (none, cg, or c) which is reported in gVCF files (default=cg).
install_Heme_WGS_TO_v4.4.4.62.run
Heme_WGS_TO_4.4.4.62.iapp
README
📂 common
dpf-core_1.0.0.36.ires
dpf-templates_4.4.4.52.ires
dpf-docker-images_4.4.4.52.ires
dragen-4.4.4-12.multi.el8.x86_64.run
heme_wgs_to_resources_4.4.4.2.ires
hg38-alt_masked.cnv.hla.methyl_cg.methylated_combined.rna-11-r5.0-1.ires
hs37d5_chr-cnv.hla.methyl_cg.methylated_combined.rna-11-r5.0-1.ires
variant_annotation_data-tmb_annotations-4.4.4-1.ires
Network-attached storage is required for long-term storage of sequencing runs and Heme pipeline output.
Managing data storage is your responsibility.
Illumina recommends developing a strategy to copy data from the DRAGEN server to network-attached storage.
Delete output data on the DRAGEN server as soon as possible. For additional information on data output and storage, refer to Illumina Instrument Control Computer Security and Networking.
Ensure the installer has execute permission by running: chmod +x install_Heme_WGS_TO_v{version}.run
Launch the installer with root privileges: sudo /path/to/install_Heme_WGS_TO_v{version}.run
If DRAGEN Application Manager is not already installed, the installer exits and directs you to the path of the DRAGEN Application Manager installer.
The application is installed under DRAGEN Application Manager.
Heme_WGS_TO_{version}_Downloader_unix
x86_64 platform with glibc 2.25+
Heme_WGS_TO_{version}_Downloader_mac
arm64 macOS
Heme_WGS_TO_{version}_Downloader_windows.exe
64-bit Windows 10+
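The downloader table above can be read as a simple platform lookup. The following is a minimal sketch, not part of the product: the function name `downloader_name` and the mapping keys are assumptions, and the sketch does not verify the glibc 2.25+ or Windows 10+ requirements.

```python
import platform

# File name suffixes come from the downloader table above.
# The (system, machine) keys are assumptions based on the values that
# Python's platform module typically reports on each platform.
DOWNLOADERS = {
    ("Linux", "x86_64"): "unix",
    ("Darwin", "arm64"): "mac",
    ("Windows", "AMD64"): "windows.exe",
}


def downloader_name(version, system=None, machine=None):
    """Return the downloader file name for the given (or current) platform."""
    system = system or platform.system()
    machine = machine or platform.machine()
    try:
        suffix = DOWNLOADERS[(system, machine)]
    except KeyError:
        raise ValueError(f"No downloader listed for {system}/{machine}") from None
    return f"Heme_WGS_TO_{version}_Downloader_{suffix}"
```

Calling `downloader_name("{version}")` with no other arguments picks the entry for the machine the script runs on; platforms not listed in the table raise a `ValueError`.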
--enable-map-align true
Optionally disable map & align (default=true).
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
--enable-duplicate-marking true
By default, DRAGEN marks duplicate reads and excludes them from variant calling.
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--vc-emit-ref-confidence GVCF
To enable gVCF output.
--vc-enable-vcf-output
To enable VCF file output during a gVCF run, set to true. The default value is false.
--enable-hla
Enable HLA typer (this setting by default will only genotype class 1 genes)
--hla-as-filter-min-threshold
Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.
--hla-as-filter-ratio-threshold
Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.
--hla-enable-class-2
Extend genotyping to HLA class 2 genes (default=true).
--cnv-enable-gcbias-correction true
Enable or disable GC bias correction when generating target counts.
--cnv-segmentation-mode $SEG_MODE
Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.
--cnv-segmentation-bed $PATH
If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.
--umi-end-mask-length INT
Number of additional bases to ignore from the end of the read. The default is 0. To reduce false positives (FP), optionally set to 3.
--methylation-compress-cx-report true
Set to true to enable compression of the CX_report (default=true).
--methylation-keep-ref-cytosine true
Set to true to keep all reference cytosines in the CX_report file, even if they don't appear in the input reads (default=false).
--enable-cpg-methylated-mapping true
Enable methylated mapping with base conversions restricted to CpG context (default=true). When false, runs DRAGEN Methylation 3-base map/align instead.
--methylation-report-to-vcf
Specify methylation type (none, cg, or c) which is reported in VCF files (default=c).
--methylation-report-to-gvcf
Specify methylation type (none, cg, or c) which is reported in gVCF files (default=cg).
--enable-map-align true
Optionally disable map & align (default=true).
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
--umi-nonrandom-whitelist $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.
--umi-correction-table $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.
--umi-min-supporting-reads INT
Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).
--umi-metrics-interval-file $BED
Target region in BED format.
--umi-emit-multiplicity both
Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see Merge Duplex UMIs.
--umi-start-mask-length INT
Number of additional bases to ignore from the start of the read. The default is 0. To reduce false positives (FP), optionally set to 1.
--methylation-conversion STRING
Library conversion for methylation analysis. Options: none, c_t, mc_t, illumina (default=none).
--methylation-protocol STRING
Library protocol for methylation analysis. Options: none, directional, non-directional, directional-complement, pbat. The default value for methylation-conversion=illumina is directional, otherwise it is none.
--methylation-mapq-threshold INT
Only reads with MAPQ greater than or equal to the threshold are included in methyl-seq analysis (default=0).
--methylation-generate-mbias-report true
Whether to generate a per-sequencer-cycle methylation bias report (default=true).
--mbias-report-include-overlaps
Calculate methylation stats for overlapping bases between mates (default=false).
--methylation-generate-cytosine-report true
Whether to generate a genome-wide cytosine methylation CX_report file (default=false).
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--cnv-enable-gcbias-correction true
Enable or disable GC bias correction when generating target counts.
--cnv-segmentation-mode $SEG_MODE
Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.
--cnv-population-b-allele-vcf $POP_VCF
Specify a population SNP VCF. This option is available for both the germline and the somatic workflows. In germline it is only supported for WGS. In somatic, it can be used when a matched normal sample is not available and analysis must be performed in tumor-only mode.
--cnv-enable-cyto-output true
Enable Cytogenetics-compatible output (default true), see Cytogenetics Modality. Only available with the Germline ASCN caller.
--cnv-enable-mosaic-calling true
Enable MOSAIC-calling mode (default true). Only available with the Germline ASCN caller.
--enable-mrjd
If set to true, MRJD is enabled for the DRAGEN pipeline.
--mrjd-enable-high-sensitivity-mode
If set to true, MRJD high sensitivity mode is enabled for the DRAGEN pipeline. See the MRJD section in the user guide for information on variant types reported in MRJD default mode and high-sensitivity mode (default=false).
/usr/local/bin/check_Heme_WGS_TO_{version}.sh
Checking system configuration...OK!
Now running a test execution of the pipeline.
This could take up to 15 minutes...
Verifying analysis output.
Successfully validated test analysis results.
SUCCESS!
DRAGEN Heme WGS Tumor Only Pipeline is correctly configured and ready for use.
/usr/local/bin/uninstall_Heme_WGS_TO_{version}.sh
--fastq-list $PATH
--fastq-list-sample-id $STRING
--fastq-file1 $PATH
--fastq-file2 $PATH
--RGSM $STRING
--RGID $STRING
--bam-input $PATH
--cram-input $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN pangenome hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# CNV
--enable-cnv true
--cnv-target-bed $PATH
# Targeted caller (only if using the Illumina CS/PGx Custom Enrichment Research Panel)
--targeted-generate-exome-counts true
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN pangenome hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--enable-cnv true
--cnv-generate-combined-counts true
--cnv-normals-list $CNV_NORMALS_LIST
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN pangenome hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--targeted-pon-counts-list $TARGETED_PON_COUNTS_LIST
--umi-emit-multiplicity both
Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see Merge Duplex UMIs.
--umi-start-mask-length INT
Number of additional bases to ignore from the start of the read. The default is 0. To reduce false positives (FP), optionally set to 1.
--umi-end-mask-length INT
Number of additional bases to ignore from the end of the read. The default is 0. To reduce false positives (FP), optionally set to 3.
--enable-map-align true
Optionally disable map & align (default=true).
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
--umi-source STRING
Specify the input type for the UMI sequence. Options: qname, fastq, bamtag.
--umi-library-type STRING
Set the batch option for different UMIs correction. Options: random-duplex, random-simplex, nonrandom-duplex.
--umi-nonrandom-whitelist $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.
--umi-correction-table $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.
--umi-min-supporting-reads INT
Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).
--umi-metrics-interval-file $BED
Target region in BED format.
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--vc-emit-ref-confidence GVCF
To enable gVCF output.
--vc-enable-vcf-output
To enable VCF file output during a gVCF run, set to true. The default value is false.
--enable-hla
Enable HLA typer (this setting by default will only genotype class 1 genes)
--hla-as-filter-min-threshold
Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.
--hla-as-filter-ratio-threshold
Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.
--hla-enable-class-2
Extend genotyping to HLA class 2 genes (default=true).
--cnv-enable-gcbias-correction true
Enable or disable GC bias correction when generating target counts.
--cnv-segmentation-mode $SEG_MODE
Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.
--cnv-segmentation-bed $PATH
If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.
Download the sample sheet v2 template that matches the instrument & assay run.
In the Sequencing Settings section, enter the following required parameters:
LibraryPrepKits
Required
Accepted values are: IlluminaDNAPrep or IlluminaDNAPCRFree
In the BCL Convert Settings section, enter the following required parameters:
SoftwareVersion
Required
The DRAGEN component software version. The pipeline requires 4.4.4 or higher. To ensure you are using the latest compatible version, refer to the software release notes.
AdapterRead1
Required
If using 10 bp indexes with UDP: CTGTCTCTTATACACATCTCCGAGCCCACGAGAC. Analysis fails if incorrect adapter sequences are used.
AdapterRead2
Required
If using 10 bp indexes with UDP: CTGTCTCTTATACACATCTGACGCTGCCGACGA. Analysis fails if incorrect adapter sequences are used.
AdapterBehavior
Optional
In the BCL Convert Data section, enter the following parameters for each sample.
Sample_ID
Required
Must match a Sample_ID listed in the [Heme_Data] section.
Index
Required
Index 1 sequence valid for Index_ID assigned to matching Sample_ID in the [Heme_Data] section.
Index2
Required
Index 2 sequence valid for Index_ID assigned to matching Sample_ID in the [Heme_Data] section.
Lane
Only for NovaSeq 6000 XP, NovaSeq 6000Dx, or NovaSeq X workflows
In the [Heme_Data] section, enter the following parameters:
[Heme_Data] Section header changes depending on the deployment:
Standalone DRAGEN Server and ICA with Manual Launch: Heme_Data
ICA with Auto-launch: Cloud_Heme_Data
Sample_ID
Required
The unique ID to identify a sample. The sample ID is included in the output file names. Sample IDs are not case sensitive. Sample IDs must have the following characteristics:
- Unique for the run.
- 1–70 characters.
- No spaces.
- Alphanumeric characters with underscores and dashes. If you use an underscore or dash, enter an alphanumeric character before and after the underscore or dash, for example Sample1-T5B1_022515.
- Cannot be called all, default, none, unknown, undetermined, stats, or reports.
- Must match a Sample_ID listed in the [BCLConvert_Data] section. Each sample must have a unique combination of Lane (if applicable), sample ID, and index ID or the analysis will fail.
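The Sample_ID rules listed above can be checked before a run is started. The sketch below is illustrative only: the function name `sample_id_problems` is hypothetical, it treats reserved names and uniqueness as case-insensitive (an assumption based on the statement that sample IDs are not case sensitive), and it does not cross-check the [BCLConvert_Data] section or the Lane and index combination.

```python
import re

# Names that the rules above say a Sample_ID cannot be called.
RESERVED = {"all", "default", "none", "unknown", "undetermined", "stats", "reports"}

# Alphanumeric runs joined by single underscores or dashes, so every
# underscore or dash has an alphanumeric character before and after it.
PATTERN = re.compile(r"[A-Za-z0-9]+(?:[_-][A-Za-z0-9]+)*")


def sample_id_problems(sample_ids):
    """Return a list of (sample_id, reason) tuples for IDs that break a rule."""
    problems = []
    seen = set()
    for sid in sample_ids:
        key = sid.lower()  # assumption: compare IDs case-insensitively
        if not 1 <= len(sid) <= 70:
            problems.append((sid, "length must be 1-70 characters"))
        elif not PATTERN.fullmatch(sid):
            problems.append((sid, "invalid characters or separator placement"))
        if key in RESERVED:
            problems.append((sid, "reserved name"))
        if key in seen:
            problems.append((sid, "not unique for the run"))
        seen.add(key)
    return problems
```

An empty list means that every ID passed the checks covered here.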
Sample_Type
Optional
Enter DNA
Case_ID
Optional
A unique ID that links biological samples from the same individual. It is used for variant interpretation in downstream software such as Illumina Connected Insights.
Sample_Description
Optional
To ensure a successful analysis, follow these guidelines:
Avoid any blank lines at the end of the sample sheet; these can cause the analysis to fail.
When running local analysis using the command line, save the sample sheet in the sequencing run folder with the default name SampleSheet.csv, or choose a different name and specify the path in the command-line options.
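Trailing blank lines are easy to introduce when a sample sheet is edited by hand. A minimal sketch of a cleanup step follows; the function name `strip_trailing_blank_lines` is hypothetical, and the sketch treats only empty or whitespace-only lines as blank.

```python
def strip_trailing_blank_lines(text):
    """Remove blank lines at the end of sample sheet text, keeping one final newline."""
    lines = text.splitlines()
    # Drop empty or whitespace-only lines from the end only;
    # blank lines between sections are left untouched.
    while lines and not lines[-1].strip():
        lines.pop()
    if not lines:
        return ""
    return "\n".join(lines) + "\n"
```

The function works on the file contents as a string, so it can be applied after reading SampleSheet.csv and before writing it back.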
Refer to the following requirements to create sample sheets for running the analysis on ICA with Auto-launch. For sample sheet requirements common between deployments, see Standard Sample Sheet Requirements. Sample sheets can be created using the BaseSpace Run Planning Tool or manually by downloading and editing a sample sheet template.
To auto-launch analysis from the sequencer run folder, ensure the StartsFromFastq and SampleSheetRequested fields are set to FALSE. To auto-launch analysis from FASTQs after BCL Convert auto-launch, the StartsFromFastq and SampleSheetRequested fields must be set to TRUE.
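The two fields must agree with each other, which can be checked with a small helper. This is a sketch under the assumption that mixed values are a configuration error (the text above only describes the FALSE/FALSE and TRUE/TRUE combinations); the function name `autolaunch_mode` and the returned labels are hypothetical.

```python
def autolaunch_mode(starts_from_fastq, sample_sheet_requested):
    """Map the two sample sheet fields to the auto-launch input type."""
    flags = (
        starts_from_fastq.strip().upper(),
        sample_sheet_requested.strip().upper(),
    )
    if flags == ("FALSE", "FALSE"):
        return "BCL"    # auto-launch from the sequencer run folder
    if flags == ("TRUE", "TRUE"):
        return "FASTQ"  # auto-launch after BCL Convert auto-launch
    raise ValueError(
        "StartsFromFastq and SampleSheetRequested must both be TRUE or both be FALSE"
    )
```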
Refer to [Heme_Data] Section for this section's requirements.
SoftwareVersion
Not Required
The Heme pipeline software version
StartsFromFastq
Required
Set the value to TRUE or FALSE. To auto-launch from BCL files, set to FALSE. To auto-launch from FASTQ files after auto-launch of BCL Convert, set to TRUE.
SampleSheetRequested
Required
Set the value to TRUE or FALSE. To auto-launch from BCL files, set to FALSE. To auto-launch from FASTQ files after auto-launch of BCL Convert, set to TRUE.
Sample_ID
Not Required
The same sample ID used in the Cloud_Heme_Data section.
ProjectName
Not Required
The BaseSpace project name.
LibraryName
Not Required
Combination of sample ID and index values in the following format: sampleID_Index_Index2
LibraryPrepKitName
Required
The Library Prep Kit used.
GeneratedVersion
Not Required
The cloud GSS version used to create the sample sheet. Optional if manually updating a sample sheet.
CloudWorkflow
Not Required
Ica_workflow_1
Cloud_Heme_Pipeline
Required
This value is a Uniform Resource Name (URN). The valid values are described in the
BCLConvert_Pipeline
Required
The value is a URN in the following format: urn:ilmn:ica:pipeline: <pipeline-ID>#<pipeline-name>
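A quick format check can catch a malformed URN before the sample sheet is uploaded. The sketch below assumes that the space after `pipeline:` in the format line is a rendering artifact and that neither part contains whitespace; the function name `parse_pipeline_urn` and the example value in the usage note are hypothetical.

```python
import re

# Pattern for urn:ilmn:ica:pipeline:<pipeline-ID>#<pipeline-name>.
# Assumption: the ID contains no '#' and neither part contains whitespace.
URN_RE = re.compile(
    r"urn:ilmn:ica:pipeline:(?P<pipeline_id>[^#\s]+)#(?P<pipeline_name>\S+)"
)


def parse_pipeline_urn(value):
    """Split a pipeline URN into (pipeline_id, pipeline_name)."""
    match = URN_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"Not a pipeline URN: {value!r}")
    return match["pipeline_id"], match["pipeline_name"]
```

For example, `parse_pipeline_urn("urn:ilmn:ica:pipeline:1234-abcd#Example_Pipeline")` returns the ID and the name as a tuple. The check only validates the shape of the value, not whether the pipeline exists.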
A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments.
For DRAGEN germline runs, it is recommended to use the pangenome hashtable.
See:
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using .
FQ list Input
FQ Input
BAM Input
CRAM Input
For more information see: .
DRAGEN SNV VC employs machine learning based variant recalibration (DRAGEN-ML). It processes read and other contextual evidence to remove false positives, recover false negatives and reduce zygosity errors. No additional setup is required. DRAGEN-ML is enabled by default as needed, when running the germline SNV VC on hg19 or hg38.
Note that we do not recommend changing the default QUAL thresholds of 3 for DRAGEN-ML and 10 for DRAGEN without ML. These values differ from each other because DRAGEN-ML improves the calibration of QUAL scores, leading to a change in the scoring range.
For more detail on the small variant caller in somatic mode please refer to
For instructions on how to download the Nirvana annotation database, please refer to
For more information, see .
For CNV PON requirements and generation options see .
For Targeted Caller PON requirements and generation options see .
CNV and Targeted Caller require separate PON files, but the intermediate counts files can be generated in the same DRAGEN command line invocation. Follow the steps below to generate the CNV and Targeted Caller PON files. Note that Targeted Caller is only supported with the Illumina CS/PGx Custom Enrichment Research Panel.
Step 1. Generate CNV target counts and Targeted exome counts of individual samples from the sequencing run.
Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.
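When step 1 is repeated for many normal samples, it can help to assemble the command programmatically. The following sketch builds the argument list from the options shown in this recipe; the function name `step1_command` and all paths in the usage note are hypothetical, and the sketch does not run DRAGEN itself.

```python
import shlex


def step1_command(dragen, ref_dir, out_dir, staging_dir, prefix,
                  fastq_list, sample_id, cnv_target_bed, targeted=False):
    """Build the step 1 argument list (CNV target counts, optional exome counts)."""
    cmd = [
        dragen,
        "--ref-dir", ref_dir,
        "--output-directory", out_dir,
        "--intermediate-results-dir", staging_dir,
        "--output-file-prefix", prefix,
        "--fastq-list", fastq_list,
        "--fastq-list-sample-id", sample_id,
        "--enable-cnv", "true",
        "--cnv-target-bed", cnv_target_bed,
    ]
    if targeted:
        # Only for the Illumina CS/PGx Custom Enrichment Research Panel.
        cmd += ["--targeted-generate-exome-counts", "true"]
    return cmd


def as_shell(cmd):
    """Render the argument list as a single shell-quoted line for logging."""
    return shlex.join(cmd)
```

A list such as `step1_command("/opt/dragen/$VERSION/bin/dragen", "/ref", "/out", "/staging", "normal1", "/run/fastq_list.csv", "normal1", "/panel.bed")` can be passed to `subprocess.run(cmd, check=True)` once the placeholder paths are replaced with real ones. Keeping the CNV options identical across samples matches the requirement that case samples use the same settings.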
Step 2. CNV combined counts file generation.
$CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.
Step 3. Targeted Caller PON file generation.
$TARGETED_PON_COUNTS_LIST is a text file with one line for each path to a Targeted Caller exome counts file generated in step 1 (<output-file-prefix>.targeted.exome.counts.json.gz). Individual exome counts files are merged into a single <output-file-prefix>.targeted.pon.json.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN Targeted Caller using the --targeted-pon option.
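Both list files in steps 2 and 3 have the same shape: one path per line. A minimal sketch of how they might be generated from the step 1 outputs follows. The function names `select_counts` and `write_list`, and the idea of excluding samples by output file prefix, are assumptions for illustration; the file name suffixes are the ones given above.

```python
from pathlib import Path

# Suffixes of the step 1 outputs, as described in steps 2 and 3.
CNV_COUNTS = ".target.counts.gz"
CNV_COUNTS_GC = ".target.counts.gc-corrected.gz"
TARGETED_COUNTS = ".targeted.exome.counts.json.gz"


def select_counts(paths, suffix, exclude_prefixes=()):
    """Return sorted paths whose file name is <output-file-prefix> + suffix."""
    chosen = []
    for path in sorted(str(p) for p in paths):
        name = Path(path).name
        if not name.endswith(suffix):
            continue
        prefix = name[: -len(suffix)]
        if prefix in exclude_prefixes:
            continue  # sample deliberately left out of the PON
        chosen.append(path)
    return chosen


def write_list(paths, list_file):
    """Write one path per line, the format expected for the list files."""
    Path(list_file).write_text("".join(f"{p}\n" for p in paths))
```

For example, `write_list(select_counts(Path("/out").glob("*.gz"), CNV_COUNTS_GC), "cnv_normals.txt")` would produce a file suitable for `--cnv-normals-list`, and the same call with `TARGETED_COUNTS` would produce one for `--targeted-pon-counts-list`. Selecting by a single suffix avoids listing both the raw and the GC-corrected counts of the same sample.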
A systematic noise file corresponding to one of the pre-built pangenome references can be downloaded from the [DRAGEN Software Support Site page](https://support.illumina.com/sequencing/sequencing_software/dragen-bio-it-platform/product_files.html).
The software is a DNA-only analysis pipeline based on the DRAGEN Secondary Analysis Software. Although it includes some of the default settings from the DNA Somatic Tumor-Normal Solid WGS DRAGEN recipe, it uses a distinct recipe with different options. Users can override specific parameters via a custom configuration file.
The software performs germline variant calling on the normal sample, and reports the following variants:
SNV (annotated)
CNV (annotated)
SV (annotated)
Targeted callers (cyp2b6, cyp2d6, cyp21a2, gba, hba, lpa, rh and smn)
ExpansionHunter
VNTR
The software performs somatic variant calling on the tumor sample and reports the following variants:
SNV (annotated)
MNV
CNV (annotated, requires germline SNV and CNV VCF)
SV (annotated, with variant deduplication)
An example command is provided that highlights the inputs and outputs used in the DragenCaller step of the software; the command can be found in the DRAGEN run log file. Any parameter not displayed on the command line uses the default value for the DRAGEN variant caller module. The detailed parameters and default arguments for the individual modules within the DragenCaller step can be found in the replay.json output. See for detailed explanations of the parameters.
The pipeline supports two reference genomes for the DRAGEN Map/Aligner - hg38 and hs37d5_chr.
The hs37d5_chr genome is the hg19 reference genome with the Chromosome Y PAR masked. It includes the NC_012920 mitochondrial genome. The contigs have the chr prefix added, but without the native alternate loci names.
involves aligning sequencing reads derived from DNA libraries to a reference genome prior to variant calling.
The software currently supports both tumor and normal samples with UMI. Please use the to get details on the options.
DRAGEN uses these final alignments as input for downstream steps such as gene amplification (copy number) calling, small variant calling (SNV, indel, MNV, delins), and DNA library quality control.
DRAGEN supports calling SNVs, indels, MNVs, and delins in tumor-only samples by using mapped and aligned DNA reads from a tumor sample as input. Variants are detected via both column-wise pileup analysis and local de novo assembly of haplotypes. The de novo haplotypes allow the detection of much larger insertions and deletions than is possible through column-wise pileup analysis alone. Insertions and deletions are validated for lengths of 0–25 bp, and lengths of more than 25 bp can be supported. In addition, DRAGEN uses the de novo assembly to detect SNVs, insertions, and deletions that are co-phased and part of the same haplotype. Any such co-phased variants within a window of 15 bp can then be reassembled into complex variants (MNVs and delins). The tumor-only pipeline produces a VCF file containing both germline and somatic variants that can be further analyzed to identify tumor mutations. The pipeline makes no ploidy assumptions, enabling detection of low-frequency alleles.
DRAGEN small variant calling includes the following steps:
Detects regions with sufficient read coverage (callable regions).
Detects regions where the reads deviate from the reference and there is a possibility of a germline or somatic call (active regions).
Assembles de novo graph haplotypes from reads (haplotype assembly).
Extracts possible somatic or germline calls (events) from column-wise pileup analysis.
Additional information is available at .
The supports both matched tumor-normal pairs and tumor only samples. The germline mode of the small variant caller is used to analyze the normal sample in the matched pair.
The DRAGEN copy number variant caller performs amplification, reference, and deletion calling for CNV targets within the assay. It counts the coverage of each target interval on the panel, uses a preprocessed panel of normal samples to normalize target counts, corrects for GC coverage bias, calculates scores for each CNV event from the observed coverage, and makes copy number calls.
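The normalization step described above can be sketched as follows. This is a generic panel-of-normals normalization and not DRAGEN's algorithm: the function name `log2_ratios`, the median scaling, and the log2 ratio output are assumptions, and GC bias correction is omitted. All counts are assumed to be positive.

```python
import math
from statistics import median
from typing import List

def log2_ratios(sample_counts: List[float], normal_panel: List[List[float]]) -> List[float]:
    """Return a log2 coverage ratio per target, relative to a panel of normals.

    Each sample is first scaled by its own median count so that differences
    in sequencing depth cancel out (an assumed, simplified scheme).
    """
    def scale(counts: List[float]) -> List[float]:
        m = median(counts)
        return [c / m for c in counts]

    sample = scale(sample_counts)
    normals = [scale(n) for n in normal_panel]
    ratios = []
    for i, value in enumerate(sample):
        baseline = median(n[i] for n in normals)  # typical normal coverage for target i
        ratios.append(math.log2(value / baseline))
    return ratios

ratios = log2_ratios(
    [100, 200, 100, 100],
    [[100, 100, 100, 100], [110, 110, 110, 110]],
)
print(ratios)
```

In this toy input the second target has twice the expected coverage, so its log2 ratio is positive while the others sit at zero.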
Additional information is available at .
Absolute copy numbers are calculated by the CNV ASCN Caller. See .
See more information available at .
The DRAGEN Structural Variant (SV) Caller is described .
The DUX4 rearrangement caller is described .
The Variant Deduplication is described
The contamination analysis step detects foreign human DNA contamination using the SNP error file and pileup file that are generated during small variant calling, together with the TMB trace file. The software determines whether a sample contains foreign DNA using the contamination score. In contaminated samples, the variant allele frequencies of SNPs shift away from the expected values of 0%, 50%, or 100%. The algorithm collects all positions that overlap common SNPs and have variant allele frequencies of < 25% or > 75%. Then, the algorithm computes the likelihood that each position is an error or a real mutation. The contamination score is the sum of the log likelihood scores across the predefined SNP positions that have a minor allele frequency < 25% in the sample and are not likely due to CNV events.
The larger the contamination score, the more likely it is that the sample contains foreign DNA. A sample is considered to be contaminated if the contamination score is above the predefined quality threshold. The contamination score was found to be high in samples with highly rearranged genomes or HRD samples. 1% of HRD samples were found to be above the threshold with no evidence of actual contamination.
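A minimal sketch of the position selection and score summation is shown below. It only mirrors the description above; the `SnpObservation` record, the precomputed `log_likelihood` field, and the `in_cnv` flag are hypothetical stand-ins, and the likelihood model itself is not reproduced here.

```python
from typing import Dict, NamedTuple, Tuple

class SnpObservation(NamedTuple):
    vaf: float             # observed variant allele frequency at a common SNP
    log_likelihood: float  # hypothetical precomputed log likelihood score
    in_cnv: bool           # True if the shift is likely explained by a CNV event

def contamination_score(
    observations: Dict[str, SnpObservation],
    low: float = 0.25,
    high: float = 0.75,
) -> Tuple[float, int]:
    """Sum log likelihood scores over SNPs whose VAF is < low or > high,
    skipping positions attributed to CNV events."""
    score = 0.0
    used = 0
    for obs in observations.values():
        shifted = obs.vaf < low or obs.vaf > high
        if shifted and not obs.in_cnv:
            score += obs.log_likelihood
            used += 1
    return score, used

observations = {
    "chr1:1000": SnpObservation(vaf=0.05, log_likelihood=2.0, in_cnv=False),
    "chr2:2000": SnpObservation(vaf=0.50, log_likelihood=9.0, in_cnv=False),  # expected heterozygous
    "chr3:3000": SnpObservation(vaf=0.92, log_likelihood=1.5, in_cnv=False),
    "chr4:4000": SnpObservation(vaf=0.10, log_likelihood=4.0, in_cnv=True),   # explained by CNV
}
score, used = contamination_score(observations)
print(score, used)
```

The resulting score would then be compared against the predefined quality threshold, whose value is not given in this text.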
The Illumina Annotation Engine performs annotation of small variants and CNVs. The inputs are gVCF files and the outputs are annotated JSON files.
The Illumina Annotation Engine processes each variant entry and annotates with available information from databases such as dbSNP, gnomAD genome and exome, 1000 genomes, ClinVar, COSMIC, RefSeq, and Ensembl. The header includes version information and general details. Each annotated variant is included as a nested dictionary structure in separate lines following the header.
The database content included with Nirvana database is available at the .
The pipeline currently does not support annotation of gVCF files. Please use the to perform tertiary analysis.
DRAGEN is used to compute tumor mutational burden (TMB) in coding regions where there is sufficient coverage.
The following variants are excluded from the TMB calculation:
Non-PASS variants.
Mitochondrial variants.
MNVs.
Variants that do not meet a minimum depth threshold.
Variants with a population allele count ≥ 10 that are observed in either the 1000 Genomes or gnomAD databases are marked as germline. MNVs, which do not count towards TMB, may be marked as germline when all their component small variants are marked as germline. The proxy filter scans the variants surrounding a specific variant and identifies those variants with similar variant allele frequencies (VAF). If the majority of surrounding variants of similar VAF are germline, then the variant is also marked as germline.
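The database filter and the proxy filter can be sketched together. This is an illustration only: the `Call` record, the neighbourhood size `window` (counted in calls, not base pairs), and the `vaf_tolerance` used to decide that two VAFs are "similar" are assumptions that do not come from the source. Only the allele count cutoff of 10 is taken from the text.

```python
from typing import List, NamedTuple

class Call(NamedTuple):
    pos: int
    vaf: float
    population_count: int  # allele count seen in 1000 Genomes or gnomAD

def tag_germline(
    calls: List[Call],
    min_count: int = 10,
    window: int = 2,              # assumed: neighbours considered on each side
    vaf_tolerance: float = 0.05,  # assumed: what counts as a similar VAF
) -> List[bool]:
    """Return one germline flag per call (database filter, then proxy filter)."""
    by_database = [c.population_count >= min_count for c in calls]
    tags = list(by_database)
    for i, call in enumerate(calls):
        if by_database[i]:
            continue
        lo, hi = max(0, i - window), min(len(calls), i + window + 1)
        similar = [
            j for j in range(lo, hi)
            if j != i and abs(calls[j].vaf - call.vaf) <= vaf_tolerance
        ]
        # Proxy rule: a majority of similar-VAF neighbours are database germline.
        if similar and 2 * sum(by_database[j] for j in similar) > len(similar):
            tags[i] = True
    return tags

calls = [
    Call(100, 0.48, 120),
    Call(150, 0.50, 0),
    Call(200, 0.51, 300),
    Call(250, 0.12, 0),
]
tags = tag_germline(calls)
print(tags)
```

The second call is absent from the databases but sits among germline calls with a matching VAF, so the proxy rule marks it as germline; the low-VAF call at position 250 has no similar neighbours and stays unmarked.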
The formula for TMB calculation is:
Outputs are captured in a .tmb.trace.tsv file that contains information on variants used in the TMB calculation and a .tmb.metrics.json file that contains the TMB score calculation and configuration details.
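The formula itself is not reproduced in this text. As a rough illustration, the sketch below assumes the common definition of TMB as the number of eligible variants per megabase of eligible coding region; the function name and parameter names are hypothetical, and the values in the .tmb.metrics.json file remain the authoritative result.

```python
def tmb_per_mb(eligible_variants: int, eligible_region_bp: int) -> float:
    """Eligible variants per megabase of eligible region (assumed definition)."""
    if eligible_region_bp <= 0:
        raise ValueError("eligible region size must be positive")
    return eligible_variants / (eligible_region_bp / 1_000_000)

# Example: 12 eligible variants over 1.5 Mb of sufficiently covered coding region.
example_tmb = tmb_per_mb(12, 1_500_000)
print(example_tmb)
```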
Please see the for details about the TMB biomarker analysis.
DRAGEN can determine the MSI status of a sample. It uses a normal reference file, which was created from a set of normal samples. During sequencing, normal reference files are generated by tabulating read counts for each microsatellite site. The normal file contains the read count distribution for each microsatellite.
MSI calling for a tumor-only sample is performed by first tabulating tumor counts from the read alignments for each microsatellite site. Then, the Jensen-Shannon distance (JSD) is calculated between each pair of tumor and normal baseline samples. DRAGEN determines unstable sites by performing Chi-square testing of the tumor JSD and normal JSD distributions. A site is called unstable if the mean distance difference of the two JSD distributions is ≥ the distance threshold and the Chi-square p-value is ≤ the p-value threshold. Lastly, DRAGEN produces an MSI status given the assessed site count, the unstable site count, the percentage of unstable sites among all assessed sites, and the sum of the Jensen-Shannon distances of all the unstable sites.
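The Jensen-Shannon distance used above is the square root of the Jensen-Shannon divergence between two distributions. The sketch below computes it for two read count histograms of one microsatellite site. The function names are illustrative, the base-2 logarithm (which bounds the distance to the range 0 to 1) is an assumption, and the Chi-square testing step is not shown.

```python
import math
from typing import List

def _normalise(counts: List[float]) -> List[float]:
    total = sum(counts)
    return [c / total for c in counts]

def _kl(p: List[float], q: List[float]) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_distance(counts_a: List[float], counts_b: List[float]) -> float:
    """Jensen-Shannon distance between two count histograms with the same bins."""
    p, q = _normalise(counts_a), _normalise(counts_b)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * _kl(p, m) + 0.5 * _kl(q, m)
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negative rounding

# Read counts per repeat length at one site (toy values).
normal_counts = [2, 40, 50, 8]
tumor_counts = [20, 30, 30, 20]
print(js_distance(tumor_counts, normal_counts))
```

Identical histograms give a distance of 0 and histograms with no overlapping bins give 1, so larger values indicate a tumor repeat-length profile that departs further from the normal baseline.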
Please see the for details about the MSI biomarker analysis.
The Homologous Recombination Deficiency (HRD) score is a whole-genome signature measurement of genomic instability. The HRD score is the sum of three components: loss of heterozygosity (LOH), telomeric allele imbalance (TAI), and large-scale state transition (LST). A panel of normal samples is used for both bias reduction and normalization prior to HRD score estimation. Final HRD results can be found in the *.hrdscore.csv file.
Please see the for details about the HRD biomarker analysis.
A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.
For DRAGEN somatic runs it is recommended to use the linear hashtable.
See:
DRAGEN input sources include: FASTQ list, FASTQ, BAM, or CRAM. For BCL input, first create FASTQs using BCL conversion.
FQ list Input
FQ Input
BAM Input
CRAM Input
DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).
Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses, it is also recommended to use a normal sample with lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler may be used to reduce coverage on the normal sample.
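The idea of fractional downsampling can be sketched in a few lines. This is not DRAGEN's sampler and will not reproduce its selections: the `fractional_downsample` helper is hypothetical, each list item is assumed to stand for one read pair (so that mates are kept or dropped together), and the default seed of 42 simply mirrors the documented default of the random seed option.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def fractional_downsample(reads: Sequence[T], fraction: float, seed: int = 42) -> List[T]:
    """Keep each read (pair) independently with probability `fraction`.

    Reads are returned unmodified and in their original order.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    return [read for read in reads if rng.random() < fraction]

reads = [f"read_pair_{i}" for i in range(1000)]
half = fractional_downsample(reads, 0.5)
print(len(half), "of", len(reads), "read pairs kept")
```

Running the function twice with the same seed returns the same subset, while changing the seed gives a different subsample of roughly the same size.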
For more information see: .
High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports three main settings for improved sensitivity on low VAF variant calls.
For more detail on the small variant caller in somatic mode please refer to
For instructions on how to download the Nirvana annotation database, please refer to
DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.
Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.
DRAGEN has prebuilt systematic noise files for WES/WGS. These files should also be used in 5-Base workflows. The 5-Base workflows have not been tested with custom noise files.
Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: .
FQ list Input
--fastq-list $PATH
--fastq-list-sample-id $STRING
FQ Input
--fastq-file1 $PATH
--fastq-file2 $PATH
--RGSM $STRING
--RGID $STRING
BAM Input
--bam-input $PATH
CRAM Input
--cram-input $PATH
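When DRAGEN is launched from a script, the choice between the four input types can be captured in a small helper. The sketch below is a convenience wrapper and not part of DRAGEN: the `INPUT_FLAGS` table and `input_args` function are hypothetical, and the grouping of options per input type is inferred from the option list above.

```python
from typing import Dict, List

# Assumed grouping of the input options listed above, one entry per input type.
INPUT_FLAGS: Dict[str, List[str]] = {
    "fastq-list": ["fastq-list", "fastq-list-sample-id"],
    "fastq": ["fastq-file1", "fastq-file2", "RGSM", "RGID"],
    "bam": ["bam-input"],
    "cram": ["cram-input"],
}

def input_args(kind: str, values: Dict[str, str]) -> List[str]:
    """Build the command line arguments for one input type."""
    if kind not in INPUT_FLAGS:
        raise ValueError(f"unknown input type: {kind}")
    args: List[str] = []
    for flag in INPUT_FLAGS[kind]:
        if flag not in values:
            raise KeyError(f"missing value for --{flag}")
        args += [f"--{flag}", values[flag]]
    return args

bam_args = input_args("bam", {"bam-input": "Sample-001.bam"})
fastq_args = input_args("fastq", {
    "fastq-file1": "S1_R1.fastq.gz",
    "fastq-file2": "S1_R2.fastq.gz",
    "RGSM": "S1",
    "RGID": "S1_L001",
})
print(bam_args)
print(fastq_args)
```

The returned list can be appended to the rest of a recipe's arguments before the command is executed, which keeps the input selection separate from the caller settings.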
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN pangenome hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# CNV
--enable-cnv true
--cnv-target-bed $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN pangenome hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--enable-cnv true
--cnv-generate-combined-counts true
--cnv-normals-list $CNV_NORMALS_LIST
FQ list Input
--fastq-list $PATH
--fastq-list-sample-id $STRING
FQ Input
--fastq-file1 $PATH
--fastq-file2 $PATH
--RGSM $STRING
--RGID $STRING
BAM Input
--bam-input $PATH
CRAM Input
--cram-input $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN pangenome hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
# UMI
--umi-enable true
--umi-source STRING #Default='qname'
--umi-library-type STRING #e.g. random-duplex
--umi-min-supporting-reads 1 #Default=2
# Small variant caller
--enable-variant-caller true
--vc-target-bed $VC_TARGET_BED
# Annotation
--variant-annotation-data PATH
--enable-variant-annotation true
# SV
--enable-sv true
--sv-exome true
--sv-call-regions-bed $SV_TARGET_BED
# CNV
--enable-cnv true
--cnv-target-bed $PATH
--cnv-combined-counts $PATH #CNV PON. See 'In-run PON' section below.
# HLA genotyper
--enable-hla true
# Targeted caller (only if using the Illumina CS/PGx Custom Enrichment Research Panel)
--enable-targeted true
--targeted-pon $PATH #Targeted PON. See 'In-run PON' section below.
--targeted-systematic-noise $PATH #Targeted systematic noise file
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
--enable-duplicate-marking true #default=true
# 5-Base
--methylation-conversion illumina
--methylation-generate-cytosine-report true
--methylation-compress-cx-report true
# Small variant caller
--enable-variant-caller true
--vc-target-bed $VC_TARGET_BED
--vc-systematic-noise $PATH #Required
--vc-excluded-regions-bed $BED #FFPE: optionally mask ALUs
# Annotation
--variant-annotation-data PATH
--vc-enable-germline-tagging true
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
--enable-duplicate-marking true #default=true
# 5-Base
--methylation-conversion illumina
--methylation-generate-cytosine-report true
--methylation-compress-cx-report true
# Small variant caller
--enable-variant-caller true
--vc-systematic-noise $PATH #Recommended
--vc-excluded-regions-bed $BED #FFPE: optionally mask ALUs
# CNV
--enable-cnv true
--cnv-use-somatic-vc-baf true
--cnv-enable-self-normalization true
# Annotation
--variant-annotation-data PATH
--enable-variant-annotation true
Enter trim. This indicates that the BCL Convert software trims the specified adapter sequences from each read.
MinimumTrimmedReadLength
Optional
Enter 35. Reads with a length trimmed below this point are masked.
MaskShortReads
Optional
Enter 35. Reads with a length trimmed below this point are masked.
Indicates which lane corresponds to a given sample. Enter a single numeric value per row. Cannot be empty, i.e., the analysis fails if the Lane column is present without a value in each row.
Sample description must meet the following requirements: - 1–50 characters. - Alphanumeric characters with underscores, dashes, and spaces. If you enter an underscore, dash, or space, enter an alphanumeric character before and after, e.g., heme-WGS_213.
IndexAdapterKitName
Not Required
The Index Adapter Kit used.
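The sample description requirements listed above can be checked before a sample sheet is submitted. The following sketch is one possible reading of those rules and not an Illumina tool: the function name is hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and consecutive separators are rejected because each separator needs an alphanumeric character on both sides.

```python
import re

# One or more alphanumeric runs joined by single separators (space, underscore, dash).
_DESCRIPTION = re.compile(r"[A-Za-z0-9]+(?:[ _-][A-Za-z0-9]+)*")

def is_valid_description(text: str) -> bool:
    """Return True if text is 1-50 characters and matches the separator rules."""
    return 1 <= len(text) <= 50 and _DESCRIPTION.fullmatch(text) is not None

for candidate in ["heme-WGS_213", "-heme", "heme--WGS", "heme WGS"]:
    print(repr(candidate), is_valid_description(candidate))
```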
MSI
HRD
ASCN
LOH
DUX4
HLA
Calibrates read base qualities to account for background noise.
Computes read likelihoods for each read/haplotype pair.
Performs mutation calling by summing the genotype probabilities across all reads/haplotype pairs.
Performs additional filtering to improve variant calling accuracy, including using a systematic noise file. The systematic noise file indicates the statistical probability of noise at specific positions in the genome. This noise file is constructed using clean (normal) samples. Regions where noise is common (e.g., difficult-to-map regions) have higher noise values. The small variant caller penalizes those regions to reduce the probability of making false positive calls.
Variants that fall outside the eligible regions.
Tumor driver mutations. Variants with a population allele count ≥ 50 are treated as tumor driver mutations.
Germline variants are not counted towards TMB. Variants are determined as germline based on a database and a proxy filter.

/opt/edico/bin/dragen \
--ref-dir /staging/dragen-app-manager/resources/Illumina_hg38-alt_masked.cnv.graph.hla.methyl_cg.rna-11_r5.0-1 \
--output-directory DragenCaller/Sample-001 \
--output-file-prefix Sample-001 \
--events-log-file DragenCaller/Sample-001/events.csv \
--enable-map-align=true \
--enable-map-align-output=true \
--enable-variant-caller=true \
--vc-emit-ref-confidence=GVCF \
--vc-enable-vcf-output=true \
--enable-targeted=true \
--targeted-merge-vc=true \
--enable-star-allele=true \
--enable-cnv=true \
--cnv-enable-self-normalization=true \
--repeat-genotype-enable=true \
--enable-sv=true \
--enable-vntr=true \
--sv-vntr-merge=false \
--enable-hla=true \
--hla-enable-class-2=true \
--vc-output-evidence-bam=false \
--qc-detect-contamination=true \
--qc-coverage-ignore-overlaps=false \
--logging-to-output-dir=true \
--max-base-quality=63 \
--enable-duplicate-marking false \
--tumor-normal-has-umi both \
--umi-source qname \
--umi-library-type nonrandom-duplex \
--umi-min-supporting-reads 1 \
--umi-correction-table /staging/dragen-app-manager/resources/Illumina_solid-wgs-tn-resources_4.4.4.2/umi/umi_correction_table.txt.gz \
--bam-input Sample-001.bam \
--force
--umi-emit-multiplicity both
Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see .
--umi-start-mask-length INT
Number of additional bases to ignore from the start of the read. The default is 0. To reduce false positives, optionally set to 1.
--umi-end-mask-length INT
Number of additional bases to ignore from the end of the read. The default is 0. To reduce false positives, optionally set to 3.
--enable-map-align true
Optionally disable map & align (default=true).
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
--umi-source STRING
Specify the input type for the UMI sequence. Options: qname, fastq, bamtag.
--umi-library-type STRING
Set the batch option for different UMIs correction. Options: random-duplex, random-simplex, nonrandom-duplex.
--umi-nonrandom-whitelist $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.
--umi-correction-table $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.
--umi-min-supporting-reads INT
Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).
--umi-metrics-interval-file $BED
Target region in BED format.
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--vc-emit-ref-confidence GVCF
To enable gVCF output.
--vc-enable-vcf-output
To enable VCF file output during a gVCF run, set to true. The default value is false.
--enable-hla
Enable HLA typer (this setting by default will only genotype class 1 genes)
--hla-as-filter-min-threshold
Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.
--hla-as-filter-ratio-threshold
Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.
--hla-enable-class-2
Extend genotyping to HLA class 2 genes (default=true).
--cnv-enable-gcbias-correction true
Enable or disable GC bias correction when generating target counts.
--cnv-segmentation-mode $SEG_MODE
Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.
--cnv-segmentation-bed $PATH
If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.
--methylation-compress-cx-report true
Set to true to enable compression of the CX_report (default=true).
--methylation-keep-ref-cytosine true
Set to true to keep all reference cytosines in the CX_report file, even if they don't appear in the input reads (default=false).
--enable-cpg-methylated-mapping true
Enable methylated mapping with base conversions restricted to CpG context (default=true). When false, runs DRAGEN Methylation 3-base map/align instead.
--methylation-report-to-vcf
Specify methylation type (none, cg, or c) which is reported in VCF files (default=c).
--methylation-report-to-gvcf
Specify methylation type (none, cg, or c) which is reported in gVCF files (default=cg).
--vc-sq-filter-threshold $NUM
Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 3. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold $INT
Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 60. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold-in-hotspot $INT
Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 20. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-excluded-regions-bed $BED
Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome, and are often exceptionally noisy regions in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU bed files can be downloaded as part of the Bed File Collection:
--enable-map-align true
Optionally disable map & align (default=true).
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
--enable-duplicate-marking true
By default, DRAGEN marks duplicate reads and excludes them from variant calling.
--enable-positional-collapsing true
Alternative to --enable-duplicate-marking=true. Instead of discarding duplicate reads, DRAGEN can optionally perform positional collapsing, merging them into higher-quality consensus reads. This is beneficial for small panels without UMIs and coverage between 300X and 1000X. However, it's slower than standard duplicate marking and less effective on samples with coverage lower than 300X. For very high coverage (1000X+), avoid it due to potential read collisions. For high-sensitivity panels with 1000X+ coverage, consider using UMIs.
--enable-fractional-down-sampler
Set to true to enable fractional downsampling. The default value is false.
--down-sampler-normal-subsample
Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).
--down-sampler-tumor-subsample
Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).
--down-sampler-random-seed
Specify the random seed for different runs of the same input data. The default value is 42.
--methylation-conversion STRING
Library conversion for methylation analysis. Options: none, c_t, mc_t, illumina (default=none).
--methylation-protocol STRING
Library protocol for methylation analysis. Options: none, directional, non-directional, directional-complement, pbat. The default value for methylation-conversion=illumina is directional, otherwise it is none.
--methylation-mapq-threshold INT
Only reads with MAPQ greater than or equal to the threshold will be included in methyl-seq analysis (default=0).
--methylation-generate-mbias-report true
Whether to generate a per-sequencer-cycle methylation bias report (default=true).
--mbias-report-include-overlaps
Calculate methylation stats for overlapping bases between mates (default=false).
--methylation-generate-cytosine-report true
Whether to generate a genome-wide cytosine methylation CX_report file (default=false).
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--vc-systematic-noise $PATH
Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.). When working with panels it is recommended that a custom systematic noise file be created for each assay.
--vc-somatic-hotspots $PATH
DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.
--vc-enable-liquid-tumor-mode true
Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.
--vc-override-tumor-pcr-params-with-normal false
Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.
--vc-target-vaf FLOAT
The default is 0.03 (3%). Set to e.g. 0.01 to improve SNV sensitivity on 1% VAF variants (assuming sufficient coverage).
--vc-enable-umi-solid true
Optimized for 1% and higher VAFs on UMI (or read position collapsed) samples with approx 300-1000X coverage.
--vc-enable-umi-liquid true
Optimized for 0.1% and higher VAFs on UMI samples with 1000X or higher coverage as expected in liquid biopsies.
WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FF
FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FFPE (only hg38)
WES_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WES FF and FFPE
--methylation-compress-cx-report true
Set to true to enable compression of the CX_report (default=true).
--methylation-keep-ref-cytosine true
Set to true to keep all reference cytosines in the CX_report file, even if they don't appear in the input reads (default=false).
--enable-cpg-methylated-mapping true
Enable methylated mapping with base conversions restricted to CpG context (default=true). When false, runs DRAGEN Methylation 3-base map/align instead.
--methylation-report-to-vcf
Specify methylation type (none, cg, or c) which is reported in VCF files (default=c).
--methylation-report-to-gvcf
Specify methylation type (none, cg, or c) which is reported in gVCF files (default=cg).
--vc-sq-filter-threshold $NUM
Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 17.5. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold $INT
Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold-in-hotspot $INT
Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-excluded-regions-bed $BED
Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome, and are often exceptionally noisy regions in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU bed files can be downloaded as part of the Bed File Collection:
--enable-map-align true
In the TN pipeline this must be set to false for BAM/CRAM input.
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
--enable-duplicate-marking true
By default, DRAGEN marks duplicate reads and excludes them from variant calling.
--enable-fractional-down-sampler
Set to true to enable fractional downsampling. The default value is false.
--down-sampler-normal-subsample
Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).
--down-sampler-tumor-subsample
Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).
--down-sampler-random-seed
Specify the random seed for different runs of the same input data. The default value is 42.
--methylation-conversion STRING
Library conversion for methylation analysis. Options: none, c_t, mc_t, illumina (default=none).
--methylation-protocol STRING
Library protocol for methylation analysis. Options: none, directional, non-directional, directional-complement, pbat. The default value for methylation-conversion=illumina is directional, otherwise it is none.
--methylation-mapq-threshold INT
Only reads with MAPQ greater than or equal to the threshold will be included in methyl-seq analysis (default=0).
--methylation-generate-mbias-report true
Whether to generate a per-sequencer-cycle methylation bias report (default=true).
--mbias-report-include-overlaps
Calculate methylation stats for overlapping bases between mates (default=false).
--methylation-generate-cytosine-report true
Whether to generate a genome-wide cytosine methylation CX_report file (default=false).
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--vc-systematic-noise $PATH
Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.).
--vc-somatic-hotspots $PATH
DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.
--vc-enable-liquid-tumor-mode true
Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.
--vc-override-tumor-pcr-params-with-normal false
Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.
--cnv-enable-gcbias-correction true
Enable or disable GC bias correction when generating target counts.
--cnv-segmentation-mode $SEG_MODE
Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.
--cnv-normal-cnv-vcf $CNV_NORMAL_VCF
Specify germline CNVs from the matched normal sample. Germline-aware Mode.
WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FF
FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FFPE (only hg38)
WES_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WES FF and FFPE
See: Product Files
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.
FQ list Input
FQ Input
BAM Input
CRAM Input
--enable-map-align true
Optionally disable map & align (default=true).
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).
Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses, it is also recommended that the normal sample have lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler can be used to reduce coverage on the normal sample.
--enable-fractional-down-sampler
Set to true to enable fractional downsampling. The default value is false.
--down-sampler-normal-subsample
Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).
--down-sampler-tumor-subsample
Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).
--down-sampler-random-seed
Specify the random seed for different runs of the same input data. The default value is 42.
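As a rough illustration of how a subsample fraction might be chosen, the sketch below divides a target coverage by the current coverage and prints the corresponding downsampler options. The helper name `subsample_fraction` and the coverage values (120x normal, 40x target) are assumptions made for this example and are not part of DRAGEN.

```shell
#!/bin/sh
# Hypothetical helper (not part of DRAGEN): fraction of reads to keep so that
# a sample at CURRENT coverage is reduced to roughly TARGET coverage.
# $1 = current mean coverage (must be > 0), $2 = target mean coverage
subsample_fraction() {
    awk -v cur="$1" -v tgt="$2" 'BEGIN {
        f = tgt / cur
        if (f > 1) f = 1          # never request more than 100% of the reads
        printf "%.2f\n", f
    }'
}

# Assumed example: the normal sample is at ~120x and should be reduced to ~40x.
normal_fraction=$(subsample_fraction 120 40)

# Print the downsampler options to append to a DRAGEN command line.
echo "--enable-fractional-down-sampler true" \
     "--down-sampler-normal-subsample $normal_fraction" \
     "--down-sampler-random-seed 42"
```

Keeping the seed fixed should make repeated runs on the same input select the same reads, which is useful when comparing downstream settings.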
--umi-nonrandom-whitelist $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.
--umi-correction-table $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.
--umi-min-supporting-reads INT
Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).
--umi-metrics-interval-file $BED
Target region in BED format.
--umi-emit-multiplicity both
Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see .
--umi-start-mask-length INT
Number of additional bases to ignore from start of read. The default is 0. To reduce FP optionally set to 1.
For more information see: UMI Options.
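To make the effect of --umi-min-supporting-reads concrete, the following sketch counts how many UMI families would remain at a given cutoff. It is only an illustration of the thresholding idea, not DRAGEN's implementation; the function name `families_kept` and the family sizes are invented for the example.

```shell
#!/bin/sh
# Illustration only: count UMI families that meet a minimum supporting-read cutoff.
# Input on stdin: one family size (reads supporting the same consensus) per line.
# $1 = minimum number of supporting reads
families_kept() {
    awk -v min="$1" '$1 + 0 >= min + 0 { kept++ } END { print kept + 0 }'
}

# Made-up family sizes for six families.
sizes="1
1
2
3
1
5"

echo "min=1 keeps $(printf '%s\n' "$sizes" | families_kept 1) of 6 families"
echo "min=2 keeps $(printf '%s\n' "$sizes" | families_kept 2) of 6 families"
```

In this toy input, half of the families are singletons, so raising the cutoff from 1 to 2 discards half of the consensus reads.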
--methylation-conversion STRING
Library conversion for methylation analysis. Options: none, c_t, mc_t, illumina (default=none).
--methylation-protocol STRING
Library protocol for methylation analysis. Options: none, directional, non-directional, directional-complement, pbat. The default value for methylation-conversion=illumina is directional, otherwise it is none.
--methylation-mapq-threshold INT
Only reads with MAPQ greater than or equal to the threshold are included in methyl-seq analysis (default=0).
--methylation-generate-mbias-report true
Whether to generate a per-sequencer-cycle methylation bias report (default=true).
--mbias-report-include-overlaps
Calculate methylation stats for overlapping bases between mates (default=false).
--methylation-generate-cytosine-report true
Whether to generate a genome-wide cytosine methylation CX_report file (default=false).
For more information see: 5-Base Pipeline.
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--vc-systematic-noise $PATH
Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.). When working with panels it is recommended that a custom systematic noise file be created for each assay.
--vc-somatic-hotspots $PATH
DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.
--vc-enable-liquid-tumor-mode true
Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.
--vc-override-tumor-pcr-params-with-normal false
Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.
High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports 3 main settings for improved sensitivity on low VAF variant calls.
--vc-target-vaf FLOAT
The default is 0.03 (3%). Set to e.g. 0.01 to improve SNV sensitivity on 1% VAF variants (assuming sufficient coverage).
--vc-enable-umi-solid true
Optimized for 1% and higher VAFs on UMI (or read position collapsed) samples with approx 300-1000X coverage.
--vc-enable-umi-liquid true
Optimized for 0.1% and higher VAFs on UMI samples with 1000X or higher coverage as expected in liquid biopsies.
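The "assuming sufficient coverage" caveat can be checked with simple arithmetic: the expected number of reads supporting a variant is roughly coverage multiplied by VAF. The sketch below is a back-of-the-envelope aid under that assumption; the helper name `expected_alt_reads` is hypothetical and the calculation ignores sampling noise and sequencing error.

```shell
#!/bin/sh
# Back-of-the-envelope helper (not a DRAGEN tool): expected number of
# variant-supporting reads at a site.
# $1 = coverage at the site, $2 = variant allele frequency as a fraction
expected_alt_reads() {
    awk -v cov="$1" -v vaf="$2" 'BEGIN { printf "%.1f\n", cov * vaf }'
}

echo "500x  at 1% VAF:   $(expected_alt_reads 500 0.01) reads"
echo "1000x at 0.1% VAF: $(expected_alt_reads 1000 0.001) reads"
echo "100x  at 1% VAF:   $(expected_alt_reads 100 0.01) reads"
```

When the expected count is close to one read, lowering --vc-target-vaf is unlikely to help on its own, because there is too little evidence at the site to begin with.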
For more detail on the small variant caller in somatic mode please refer to Somatic Mode
For instructions on how to download the Nirvana annotation database, please refer to Nirvana
DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.
Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.
DRAGEN has prebuilt systematic noise files for WES/WGS. These files should also be used in 5-Base workflows. The 5-Base workflows have not been tested with custom noise files.
Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.
WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FF
FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FFPE (only hg38)
WES_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WES FF and FFPE
For DRAGEN somatic runs it is recommended to use the linear hashtable.
See: Product Files
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.
FQ list Input
FQ Input
BAM Input
CRAM Input
--enable-map-align true
Optionally disable map & align (default=true).
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
--enable-duplicate-marking true
By default, DRAGEN marks duplicate reads and excludes them from variant calling.
DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).
Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses, it is also recommended that the normal sample have lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler can be used to reduce coverage on the normal sample.
--enable-fractional-down-sampler
Set to true to enable fractional downsampling. The default value is false.
--down-sampler-normal-subsample
Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).
--down-sampler-tumor-subsample
Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).
--down-sampler-random-seed
Specify the random seed for different runs of the same input data. The default value is 42.
--methylation-conversion STRING
Library conversion for methylation analysis. Options: none, c_t, mc_t, illumina (default=none).
--methylation-protocol STRING
Library protocol for methylation analysis. Options: none, directional, non-directional, directional-complement, pbat. The default value for methylation-conversion=illumina is directional, otherwise it is none.
--methylation-mapq-threshold INT
Only reads with MAPQ greater than or equal to the threshold are included in methyl-seq analysis (default=0).
--methylation-generate-mbias-report true
Whether to generate a per-sequencer-cycle methylation bias report (default=true).
--mbias-report-include-overlaps
Calculate methylation stats for overlapping bases between mates (default=false).
--methylation-generate-cytosine-report true
Whether to generate a genome-wide cytosine methylation CX_report file (default=false).
For more information see: 5-Base Pipeline.
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--vc-systematic-noise $PATH
Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.).
--vc-somatic-hotspots $PATH
DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.
--vc-enable-liquid-tumor-mode true
Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.
--vc-override-tumor-pcr-params-with-normal false
Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.
For more detail on the small variant caller in somatic mode please refer to Somatic Mode
--cnv-enable-gcbias-correction true
Enable or disable GC bias correction when generating target counts.
--cnv-segmentation-mode $SEG_MODE
Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.
For more information, see CNV Calling.
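The default segmentation modes listed above can be written as a small lookup. This is a sketch for documentation purposes; the analysis-type labels (`germline-wgs`, `somatic-wgs`, `targeted`) and the function name are assumptions for the example and are not DRAGEN option values.

```shell
#!/bin/sh
# Sketch: map an analysis type to the default segmentation mode named in the text.
# The labels accepted in $1 are hypothetical names chosen for this example.
default_segmentation_mode() {
    case "$1" in
        germline-wgs) echo "slm" ;;
        somatic-wgs)  echo "aslm" ;;
        targeted)     echo "hslm" ;;
        *)            echo "unknown analysis type: $1" >&2; return 1 ;;
    esac
}

# Print the option as it would appear when stating the default explicitly.
echo "--cnv-segmentation-mode $(default_segmentation_mode somatic-wgs)"
```

Passing --cnv-segmentation-mode is only needed to override these defaults.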
For instructions on how to download the Nirvana annotation database, please refer to Nirvana
DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.
Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.
DRAGEN has prebuilt systematic noise files for WES/WGS. These files should also be used in 5-Base workflows. The 5-Base workflows have not been tested with custom noise files.
Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.
WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FF
FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FFPE (only hg38)
WES_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WES FF and FFPE
For DRAGEN somatic runs it is recommended to use the linear hashtable.
See: Product Files
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.
FQ list Input
FQ Input
BAM Input
CRAM Input
--enable-map-align true
In the TN pipeline this must be set to false for BAM/CRAM input.
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
--enable-duplicate-marking true
By default, DRAGEN marks duplicate reads and excludes them from variant calling.
--enable-positional-collapsing true
Alternative to --enable-duplicate-marking=true. Instead of discarding duplicate reads, DRAGEN can optionally perform positional collapsing, merging them into higher-quality consensus reads. This is beneficial for small panels without UMIs and coverage between 300X and 1000X. However, it's slower than standard duplicate marking and less effective on samples with coverage lower than 300X. For very high coverage (1000X+), avoid it due to potential read collisions. For high-sensitivity panels with 1000X+ coverage, consider using UMIs.
DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).
Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses, it is also recommended that the normal sample have lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler can be used to reduce coverage on the normal sample.
--enable-fractional-down-sampler
Set to true to enable fractional downsampling. The default value is false.
--down-sampler-normal-subsample
Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).
--down-sampler-tumor-subsample
Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).
--down-sampler-random-seed
Specify the random seed for different runs of the same input data. The default value is 42.
--methylation-conversion STRING
Library conversion for methylation analysis. Options: none, c_t, mc_t, illumina (default=none).
--methylation-protocol STRING
Library protocol for methylation analysis. Options: none, directional, non-directional, directional-complement, pbat. The default value for methylation-conversion=illumina is directional, otherwise it is none.
--methylation-mapq-threshold INT
Only reads with MAPQ greater than or equal to the threshold are included in methyl-seq analysis (default=0).
--methylation-generate-mbias-report true
Whether to generate a per-sequencer-cycle methylation bias report (default=true).
--mbias-report-include-overlaps
Calculate methylation stats for overlapping bases between mates (default=false).
--methylation-generate-cytosine-report true
Whether to generate a genome-wide cytosine methylation CX_report file (default=false).
For more information see: 5-Base Pipeline.
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--vc-systematic-noise $PATH
Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.). When working with panels it is recommended that a custom systematic noise file be created for each assay.
--vc-somatic-hotspots $PATH
DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.
--vc-enable-liquid-tumor-mode true
Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.
--vc-override-tumor-pcr-params-with-normal false
Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.
High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports 3 main settings for improved sensitivity on low VAF variant calls.
--vc-target-vaf FLOAT
The default is 0.03 (3%). Set to e.g. 0.01 to improve SNV sensitivity on 1% VAF variants (assuming sufficient coverage).
--vc-enable-umi-solid true
Optimized for 1% and higher VAFs on UMI (or read position collapsed) samples with approx 300-1000X coverage.
--vc-enable-umi-liquid true
Optimized for 0.1% and higher VAFs on UMI samples with 1000X or higher coverage as expected in liquid biopsies.
For more detail on the small variant caller in somatic mode please refer to Somatic Mode
For instructions on how to download the Nirvana annotation database, please refer to Nirvana
DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.
Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.
DRAGEN has prebuilt systematic noise files for WES/WGS. These files should also be used in 5-Base workflows. The 5-Base workflows have not been tested with custom noise files.
Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.
WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FF
FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FFPE (only hg38)
WES_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WES FF and FFPE
DRAGEN supports the construction of reference hash tables for both human and non-human reference genomes. The reference autodetect feature of DRAGEN recognizes reference hash tables built on the four human reference genomes: hg19 (hg19), GRCh37/hs37d5 (hs37d5), GRCh38/hs38d1 (hg38), and T2T-CHM13v2.0 (chm13).
DRAGEN supports pangenome reference hash tables which extend the reference genomes with alternative variant paths from a sample cohort used to construct the pangenome reference. A pangenome-based reference improves the mapping accuracy of Illumina reads in the “Difficult-to-Map Regions” of the genome and the downstream variant calling.
Prebuilt human references are available for download from the DRAGEN Software Support Site page.
The pangenome reference is recommended for germline human analyses. The accuracy achieved with pangenome references is highlighted in the plot below.
In the following tables we summarize the reference support for each DRAGEN component and the recommended reference type for each component.
* DRAGEN supports the component execution, however the component's accuracy has not been established.
By default, DRAGEN will error out if a linear reference is provided when running a component for which a pangenome reference is recommended, as listed in the above table. If you are sure that a linear reference is desired, the error can be suppressed by setting --validate-pangenome-reference=false.
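A minimal sketch of where the override would sit on a command line is shown below. It only assembles and prints the arguments, so it runs without a DRAGEN installation; the default paths and prefix are placeholders for this example, and a real command needs input and component options as well.

```shell
#!/bin/sh
# Placeholder values (assumptions for this sketch); replace before real use.
REF_DIR=${REF_DIR:-/path/to/linear_hashtable}
OUTPUT=${OUTPUT:-/path/to/output}
PREFIX=${PREFIX:-sample1}

# Assemble the arguments; the last flag suppresses the pangenome reference check.
dragen_args="--ref-dir $REF_DIR \
--output-directory $OUTPUT \
--output-file-prefix $PREFIX \
--validate-pangenome-reference=false"

# Print instead of executing, so the fragment can be inspected safely.
echo "dragen $dragen_args"
```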
See for how to build a custom reference genome.
A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.
For DRAGEN somatic runs it is recommended to use the linear hashtable.
See:
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using .
FQ list Input
FQ Input
BAM Input
CRAM Input
DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).
Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses, it is also recommended that the normal sample have lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler can be used to reduce coverage on the normal sample.
For more information see: .
For more information see: .
For more detail on the small variant caller in somatic mode please refer to
For more information, see .
For instructions on how to download the Nirvana annotation database, please refer to
DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.
Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.
DRAGEN has prebuilt systematic noise files for WES/WGS. These files should also be used in 5-Base workflows. The 5-Base workflows have not been tested with custom noise files.
Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: .
A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.
For DRAGEN somatic runs it is recommended to use the linear hashtable.
See:
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using .
FQ list Input
FQ Input
BAM Input
CRAM Input
DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).
Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses, it is also recommended that the normal sample have lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler can be used to reduce coverage on the normal sample.
For more information see: .
For more information see: .
High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports 3 main settings for improved sensitivity on low VAF variant calls.
For more detail on the small variant caller in somatic mode please refer to
For instructions on how to download the Nirvana annotation database, please refer to
DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.
Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.
DRAGEN has prebuilt systematic noise files for WES/WGS. These files should also be used in 5-Base workflows. The 5-Base workflows have not been tested with custom noise files.
Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: .
# FQ list Input
--fastq-list $PATH
--fastq-list-sample-id $STRING
# FQ Input
--fastq-file1 $PATH
--fastq-file2 $PATH
--RGSM $STRING
--RGID $STRING
# BAM Input
--bam-input $PATH
# CRAM Input
--cram-input $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN pangenome hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# CNV
--enable-cnv true
--cnv-target-bed $PATH
# Targeted caller (only if using the Illumina CS/PGx Custom Enrichment Research Panel)
--targeted-generate-exome-counts true
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN pangenome hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--enable-cnv true
--cnv-generate-combined-counts true
--cnv-normals-list $CNV_NORMALS_LIST
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN pangenome hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--targeted-pon-counts-list $TARGETED_PON_COUNTS_LIST
# FQ list Input
--tumor-fastq-list $PATH
--tumor-fastq-list-sample-id $STRING
# FQ Input
--tumor-fastq1 $PATH
--tumor-fastq2 $PATH
--RGSM-tumor $STRING
--RGID-tumor $STRING
# BAM Input
--tumor-bam-input $PATH
# CRAM Input
--tumor-cram-input $PATH
# FQ list Input
--tumor-fastq-list $PATH
--tumor-fastq-list-sample-id $STRING
--fastq-list $PATH
--fastq-list-sample-id $STRING
# FQ Input
--tumor-fastq1 $PATH
--tumor-fastq2 $PATH
--RGSM-tumor $STRING
--RGID-tumor $STRING
--fastq-file1 $PATH
--fastq-file2 $PATH
--RGSM $STRING
--RGID $STRING
# BAM Input
--tumor-bam-input $PATH
--bam-input $PATH
# CRAM Input
--tumor-cram-input $PATH
--cram-input $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
# UMI
--umi-enable true
# 5-Base
--methylation-conversion illumina
--methylation-generate-cytosine-report true
--methylation-compress-cx-report true
# Small variant caller
--enable-variant-caller true
--vc-target-bed $VC_TARGET_BED
--vc-systematic-noise $PATH #Required
--vc-enable-umi-liquid true #>= 0.1% VAF
# Annotation
--variant-annotation-data PATH
--vc-enable-germline-tagging true
# FQ list Input
--tumor-fastq-list $PATH
--tumor-fastq-list-sample-id $STRING
# FQ Input
--tumor-fastq1 $PATH
--tumor-fastq2 $PATH
--RGSM-tumor $STRING
--RGID-tumor $STRING
# BAM Input
--tumor-bam-input $PATH
# CRAM Input
--tumor-cram-input $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
--enable-duplicate-marking true #default=true
# 5-Base
--methylation-conversion illumina
--methylation-generate-cytosine-report true
--methylation-compress-cx-report true
# Small variant caller
--enable-variant-caller true
--vc-systematic-noise $PATH #Required
--vc-excluded-regions-bed $BED #FFPE: optionally mask ALUs
# CNV
--enable-cnv true
--cnv-population-b-allele-vcf $POP_VCF
--cnv-enable-self-normalization true
# Annotation
--variant-annotation-data PATH
--vc-enable-germline-tagging true
# FQ list Input
--tumor-fastq-list $PATH
--tumor-fastq-list-sample-id $STRING
# FQ Input
--tumor-fastq1 $PATH
--tumor-fastq2 $PATH
--RGSM-tumor $STRING
--RGID-tumor $STRING
# BAM Input
--tumor-bam-input $PATH
# CRAM Input
--tumor-cram-input $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
--enable-duplicate-marking true #default=true
# 5-Base
--methylation-conversion illumina
--methylation-generate-cytosine-report true
--methylation-compress-cx-report true
# Small variant caller
--enable-variant-caller true
--vc-target-bed $VC_TARGET_BED
--vc-systematic-noise $PATH #Recommended
--vc-excluded-regions-bed $BED #FFPE: optionally mask ALUs
# Annotation
--variant-annotation-data PATH
--enable-variant-annotation true
# FQ list Input
--tumor-fastq-list $PATH
--tumor-fastq-list-sample-id $STRING
--fastq-list $PATH
--fastq-list-sample-id $STRING
# FQ Input
--tumor-fastq1 $PATH
--tumor-fastq2 $PATH
--RGSM-tumor $STRING
--RGID-tumor $STRING
--fastq-file1 $PATH
--fastq-file2 $PATH
--RGSM $STRING
--RGID $STRING
# BAM Input
--tumor-bam-input $PATH
--bam-input $PATH
# CRAM Input
--tumor-cram-input $PATH
--cram-input $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
# UMI
--umi-enable true
--umi-min-supporting-reads 1 #Default=2
# 5-Base
--methylation-conversion illumina
--methylation-generate-cytosine-report true
--methylation-compress-cx-report true
# Small variant caller
--enable-variant-caller true
--vc-systematic-noise $PATH #Required
--vc-excluded-regions-bed $BED #FFPE: optionally mask ALUs
# CNV
--enable-cnv true
--cnv-population-b-allele-vcf $POP_VCF
--cnv-enable-self-normalization true
# Annotation
--variant-annotation-data PATH
--vc-enable-germline-tagging true
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
# UMI
--umi-enable true
--umi-min-supporting-reads 1 #Default=2
# 5-Base
--methylation-conversion illumina
--methylation-generate-cytosine-report true
--methylation-compress-cx-report true
# Small variant caller
--enable-variant-caller true
--vc-target-bed $VC_TARGET_BED
--vc-systematic-noise $PATH #Required
--vc-excluded-regions-bed $BED #FFPE: optionally mask ALUs
--vc-enable-umi-solid true #>= 1% VAF
# Annotation
--variant-annotation-data PATH
--vc-enable-germline-tagging true
--methylation-compress-cx-report true
Set to true to enable compression of the CX_report (default=true).
--methylation-keep-ref-cytosine true
Set to true to keep all reference cytosines in the CX_report file, even if they don't appear in the input reads (default=false).
--enable-cpg-methylated-mapping true
Enable methylated mapping with base conversions restricted to CpG context (default=true). When false, runs DRAGEN Methylation 3-base map/align instead.
--methylation-report-to-vcf
Specify methylation type (none, cg, or c) which is reported in VCF files (default=c).
--methylation-report-to-gvcf
Specify methylation type (none, cg, or c) which is reported in gVCF files (default=cg).
--vc-sq-filter-threshold $NUM
Threshold for the sensitivity-specificity tradeoff using the SQ score. The pipeline-specific default threshold is 3. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold $INT
Threshold for the sensitivity-specificity tradeoff using the AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 60. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold-in-hotspot $INT
Threshold for the sensitivity-specificity tradeoff using the AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 20. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-excluded-regions-bed $BED
Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome and are often exceptionally noisy in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU BED files can be downloaded as part of the Bed File Collection.
--methylation-compress-cx-report true
Set to true to enable compression of the CX_report (default=true).
--methylation-keep-ref-cytosine true
Set to true to keep all reference cytosines in the CX_report file, even if they don't appear in the input reads (default=false).
--enable-cpg-methylated-mapping true
Enable methylated mapping with base conversions restricted to CpG context (default=true). When false, runs DRAGEN Methylation 3-base map/align instead.
--methylation-report-to-vcf
Specify methylation type (none, cg, or c) which is reported in VCF files (default=c).
--methylation-report-to-gvcf
Specify methylation type (none, cg, or c) which is reported in gVCF files (default=cg).
--vc-sq-filter-threshold $NUM
Threshold for the sensitivity-specificity tradeoff using the SQ score. The pipeline-specific default threshold is 17.5. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold $INT
Threshold for the sensitivity-specificity tradeoff using the AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold-in-hotspot $INT
Threshold for the sensitivity-specificity tradeoff using the AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-excluded-regions-bed $BED
Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome and are often exceptionally noisy in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU BED files can be downloaded as part of the Bed File Collection.
--umi-end-mask-length INT
Number of additional bases to ignore from end of read. The default is 0. To reduce FP optionally set to 3.
--methylation-compress-cx-report true
Set to true to enable compression of the CX_report (default=true).
--methylation-keep-ref-cytosine true
Set to true to keep all reference cytosines in the CX_report file, even if they don't appear in the input reads (default=false).
--enable-cpg-methylated-mapping true
Enable methylated mapping with base conversions restricted to CpG context (default=true). When false, runs DRAGEN Methylation 3-base map/align instead.
--methylation-report-to-vcf
Specify methylation type (none, cg, or c) which is reported in VCF files (default=c).
--methylation-report-to-gvcf
Specify methylation type (none, cg, or c) which is reported in gVCF files (default=cg).
--vc-sq-filter-threshold $NUM
Threshold for the sensitivity-specificity tradeoff using the SQ score. The pipeline-specific default threshold is 2. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold $INT
Threshold for the sensitivity-specificity tradeoff using the AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 60. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold-in-hotspot $INT
Threshold for the sensitivity-specificity tradeoff using the AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 20. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-excluded-regions-bed $BED
Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome and are often exceptionally noisy in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU BED files can be downloaded as part of the Bed File Collection.
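As a rough illustration of how the two systematic-noise thresholds interact, the sketch below applies a hotspot and a non-hotspot AQ cutoff to a few made-up variant records. It assumes a variant is kept when its AQ score is at or above the applicable threshold, which matches the direction described above (raising a threshold improves specificity) but is not taken from DRAGEN itself. The `Variant` record and the `passes_aq_filter` helper are hypothetical names used only for this example.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical, minimal variant record used only for this illustration."""
    name: str
    aq: float          # AQ score (only meaningful when a systematic noise file is supplied)
    in_hotspot: bool   # True if the variant sits at a hotspot position


def passes_aq_filter(variant, threshold=60, threshold_in_hotspot=20):
    """Return True if the variant's AQ meets the cutoff for its category.

    Assumption: a variant is kept when AQ >= cutoff, and the hotspot cutoff
    replaces the non-hotspot cutoff for hotspot variants.
    """
    cutoff = threshold_in_hotspot if variant.in_hotspot else threshold
    return variant.aq >= cutoff


variants = [
    Variant("hotspot_low_aq", aq=25, in_hotspot=True),
    Variant("non_hotspot_low_aq", aq=25, in_hotspot=False),
    Variant("non_hotspot_high_aq", aq=75, in_hotspot=False),
]

# Thresholds of 60 (non-hotspot) and 20 (hotspot), as listed above.
kept = [v.name for v in variants if passes_aq_filter(v)]
print(kept)

# Raising both thresholds trades sensitivity for specificity.
stricter = [
    v.name
    for v in variants
    if passes_aq_filter(v, threshold=80, threshold_in_hotspot=30)
]
print(stricter)
```

The same AQ score of 25 is kept at a hotspot but removed elsewhere, which is the practical effect of keeping the hotspot threshold lower than the non-hotspot one.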
Targeted Callers
Pangenome
Linear
RNA
Linear
Linear
De Novo
Pangenome
Linear
Joint Genotyping
Pangenome
Linear
Biomarkers (HLA)
Pangenome
Linear
gVCF genotyper
Pangenome
Linear
Yes*
Yes
SV
Yes
Yes
Yes
Yes*
Yes
Expansion Hunter
Yes
Yes
Yes
No
No
Targeted Callers
Yes
Yes
Yes
No
No
RNA
Yes
Yes
Yes
Yes*
Yes
De Novo
Yes
Yes
Yes
Yes*
Yes
Joint Genotyping
Yes
Yes
Yes
Yes*
Yes
Biomarkers (HLA)
Yes
Yes
Yes
Yes*
No
gVCF genotyper
Yes
Yes
Yes
Yes*
Yes
Yes*
No
CNV
Yes
Yes
Yes
Yes*
No
SV
Yes
Yes
Yes
Yes*
No
TruSeq Methyl Capture
Methylation
Linear
Linear
Yes
Yes
Yes
No
No
TruSeq DNA Methyl
Methylation
Yes
Yes
Yes
No
No
TruSeq Methyl Capture
Methylation
Yes
Yes
Yes
No
No
SNV
Pangenome
Linear
CNV
Pangenome
Linear
SV
Pangenome
Linear
Expansion Hunter
Pangenome
Linear
SNV
Yes
Yes
Yes
Yes
Yes
CNV
Yes
Yes
SNV
Linear
Linear
UMI SNV
Linear
Linear
CNV
Linear
Linear
SV
Linear
Linear
SNV
Yes
Yes
Yes
Yes*
No
UMI SNV
Yes
Yes
5-base
Germline
Pangenome
Linear
5-base
Somatic
Linear
Linear
TruSeq DNA Methyl
Methylation
Linear
5-base Germline
Germline
Yes
Yes
Yes
No
No
5-base Somatic
Nirvana
Pangenome
Linear
Nirvana
Yes
Yes
Yes
No
Yes

Yes
Yes
Linear
Somatic
--umi-end-mask-length INT
Number of additional bases to ignore from end of read. The default is 0. To reduce FP optionally set to 3.
--methylation-compress-cx-report true
Set to true to enable compression of the CX_report (default=true).
--methylation-keep-ref-cytosine true
Set to true to keep all reference cytosines in the CX_report file, even if they don't appear in the input reads (default=false).
--enable-cpg-methylated-mapping true
Enable methylated mapping with base conversions restricted to CpG context (default=true). When false, runs DRAGEN Methylation 3-base map/align instead.
--methylation-report-to-vcf
Specify methylation type (none, cg, or c) which is reported in VCF files (default=c).
--methylation-report-to-gvcf
Specify methylation type (none, cg, or c) which is reported in gVCF files (default=cg).
--vc-sq-filter-threshold $NUM
Threshold for the sensitivity-specificity tradeoff using the SQ score. The pipeline-specific default threshold is 4. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold $INT
Threshold for the sensitivity-specificity tradeoff using the AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 60. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold-in-hotspot $INT
Threshold for the sensitivity-specificity tradeoff using the AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 20. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-excluded-regions-bed $BED
Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome and are often exceptionally noisy in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU BED files can be downloaded as part of the Bed File Collection.
--enable-map-align true
Optionally disable map & align (default=true).
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
--enable-fractional-down-sampler
Set to true to enable fractional downsampling. The default value is false.
--down-sampler-normal-subsample
Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).
--down-sampler-tumor-subsample
Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).
--down-sampler-random-seed
Specify the random seed for different runs of the same input data. The default value is 42.
--umi-nonrandom-whitelist $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.
--umi-correction-table $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.
--umi-min-supporting-reads INT
Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).
--umi-metrics-interval-file $BED
Target region in BED format.
--umi-emit-multiplicity both
Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see Merge Duplex UMIs.
--umi-start-mask-length INT
Number of additional bases to ignore from start of read. The default is 0. To reduce FP optionally set to 1.
--methylation-conversion STRING
Library conversion for methylation analysis. Options: none, c_t, mc_t, illumina (default=none).
--methylation-protocol STRING
Library protocol for methylation analysis. Options: none, directional, non-directional, directional-complement, pbat. The default value for methylation-conversion=illumina is directional, otherwise it is none.
--methylation-mapq-threshold INT
Only reads with MAPQ greater than or equal to the threshold are included in methyl-seq analysis (default=0).
--methylation-generate-mbias-report true
Whether to generate a per-sequencer-cycle methylation bias report (default=true).
--mbias-report-include-overlaps
Calculate methylation stats for overlapping bases between mates (default=false).
--methylation-generate-cytosine-report true
Whether to generate a genome-wide cytosine methylation CX_report file (default=false).
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--vc-systematic-noise $PATH
Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.).
--vc-somatic-hotspots $PATH
DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.
--vc-enable-liquid-tumor-mode true
Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.
--vc-override-tumor-pcr-params-with-normal false
Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.
--cnv-enable-gcbias-correction true
Enable or disable GC bias correction when generating target counts.
--cnv-segmentation-mode $SEG_MODE
Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.
WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz - For WGS FF
FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz - For WGS FFPE (only hg38)
WES_hg38_v2.0.0_systematic_noise.snv.bed.gz - For WES FF and FFPE
--umi-end-mask-length INT
Number of additional bases to ignore from end of read. The default is 0. To reduce FP optionally set to 3.
--methylation-compress-cx-report true
Set to true to enable compression of the CX_report (default=true).
--methylation-keep-ref-cytosine true
Set to true to keep all reference cytosines in the CX_report file, even if they don't appear in the input reads (default=false).
--enable-cpg-methylated-mapping true
Enable methylated mapping with base conversions restricted to CpG context (default=true). When false, runs DRAGEN Methylation 3-base map/align instead.
--methylation-report-to-vcf
Specify methylation type (none, cg, or c) which is reported in VCF files (default=c).
--methylation-report-to-gvcf
Specify methylation type (none, cg, or c) which is reported in gVCF files (default=cg).
--vc-sq-filter-threshold $NUM
Threshold for the sensitivity-specificity tradeoff using the SQ score. The pipeline-specific default threshold is 4. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold $INT
Threshold for the sensitivity-specificity tradeoff using the AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 60. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold-in-hotspot $INT
Threshold for the sensitivity-specificity tradeoff using the AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 20. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-excluded-regions-bed $BED
Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome and are often exceptionally noisy in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU BED files can be downloaded as part of the Bed File Collection.
--enable-map-align true
Optionally disable map & align (default=true).
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
--enable-fractional-down-sampler
Set to true to enable fractional downsampling. The default value is false.
--down-sampler-normal-subsample
Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).
--down-sampler-tumor-subsample
Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).
--down-sampler-random-seed
Specify the random seed for different runs of the same input data. The default value is 42.
--umi-nonrandom-whitelist $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.
--umi-correction-table $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.
--umi-min-supporting-reads INT
Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).
--umi-metrics-interval-file $BED
Target region in BED format.
--umi-emit-multiplicity both
Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see Merge Duplex UMIs.
--umi-start-mask-length INT
Number of additional bases to ignore from start of read. The default is 0. To reduce FP optionally set to 1.
--methylation-conversion STRING
Library conversion for methylation analysis. Options: none, c_t, mc_t, illumina (default=none).
--methylation-protocol STRING
Library protocol for methylation analysis. Options: none, directional, non-directional, directional-complement, pbat. The default value for methylation-conversion=illumina is directional, otherwise it is none.
--methylation-mapq-threshold INT
Only reads with MAPQ greater than or equal to the threshold are included in methyl-seq analysis (default=0).
--methylation-generate-mbias-report true
Whether to generate a per-sequencer-cycle methylation bias report (default=true).
--mbias-report-include-overlaps
Calculate methylation stats for overlapping bases between mates (default=false).
--methylation-generate-cytosine-report true
Whether to generate a genome-wide cytosine methylation CX_report file (default=false).
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--vc-systematic-noise $PATH
Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.). When working with panels it is recommended that a custom systematic noise file be created for each assay.
--vc-somatic-hotspots $PATH
DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.
--vc-enable-liquid-tumor-mode true
Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.
--vc-override-tumor-pcr-params-with-normal false
Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.
--vc-target-vaf FLOAT
The default is 0.03 (3%). Set to e.g. 0.01 to improve SNV sensitivity on 1% VAF variants (assuming sufficient coverage).
--vc-enable-umi-solid true
Optimized for 1% and higher VAFs on UMI (or read position collapsed) samples with approx 300-1000X coverage.
--vc-enable-umi-liquid true
Optimized for 0.1% and higher VAFs on UMI samples with 1000X or higher coverage as expected in liquid biopsies.
WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz - For WGS FF
FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz - For WGS FFPE (only hg38)
WES_hg38_v2.0.0_systematic_noise.snv.bed.gz - For WES FF and FFPE
A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for DNA amplicon samples.
--tumor-fastq-list $PATH
--tumor-fastq-list-sample-id $STRING
--tumor-fastq1 $PATH
--tumor-fastq2 $PATH
--RGSM-tumor $STRING
--RGID-tumor $STRING
--tumor-bam-input $PATH
--tumor-cram-input $PATH
To support the varied designs of amplicon panels and the specific requirements of different analysis types (e.g., SNV, CNV, SV, MSI, RNA fusion, RNA splice variants, and RNA 3'/5' imbalance ratio), panel-specific parameter settings have been integrated into the command-line options. Each supported Pillar panel has a dedicated option, and the details for these DNA panels are listed in the table below:
Panel Name | Short Name | Panel Code | Sample Type | Default variant caller enabled | Command Line Options
oncoReveal BRCA1 & BRCA2 plus CNV | BRCA CNV | BR283 | DNA | SNV, CNV | --amplicon-enable-dna-brca
oncoReveal Lymphoid | Lymphoid | P-LYM-01 | DNA
For more detail on the amplicon pipeline, please refer to DRAGEN Amplicon Pipeline
For DRAGEN amplicon runs it is recommended to use the linear hashtable.
See: Product Files
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.
FQ list Input
FQ Input
BAM Input
CRAM Input
--enable-map-align true
Enables map-align.
--enable-map-align-output true
Optionally save the output BAM (Default=false).
--amplicon-primer-length INT
If an alignment starts inside the primer region of the amplicon target, the alignment is assigned to the amplicon.
--amplicon-allow-partial-target true
To detect deletion events that are close to the target boundaries, only one of the reads is required to start in the primer region (default=true).
For more detail on the amplicon post-alignment processing, please refer to DRAGEN Amplicon Pipeline
--enable-duplicate-marking false
The Amplicon Pipeline disables duplicate marking. In amplicon assays, fragments originate from a limited number of unique start and end positions, making conventional duplicate detection inappropriate. (Default=false)
--vc-target-bed
Limit variant calling to region of interest. Default is amplicon target bed.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2).
--vc-systematic-noise $PATH
Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.).
--vc-target-vaf FLOAT
The default is 0.03 (3%). For ctDNA, the default is 0.001 (0.1%).
--vc-af-call-threshold FLOAT
If the AF filter is enabled using --vc-enable-af-filter=true, the option sets the allele frequency call threshold for nuclear chromosomes to emit a call in the VCF. The default value is 0.01. For ctDNA, the default is 0.001.
--vc-af-filter-threshold FLOAT
If the AF filter is enabled using --vc-enable-af-filter=true, the option sets the allele frequency filter threshold for nuclear chromosomes to mark emitted VCF calls as filtered. The default value is 0.05. For ctDNA, the default is 0.003.
For more detail on the small variant caller in somatic mode please refer to Somatic Mode
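The relationship between the call threshold and the filter threshold can be pictured as three allele-frequency bands. The sketch below is only an illustration of that reading of the two option descriptions, using the default values quoted above. The `classify_by_af` helper, its return labels, and the choice to treat a value exactly at a threshold as meeting it are assumptions, not DRAGEN behavior confirmed by this page.

```python
def classify_by_af(af, call_threshold=0.01, filter_threshold=0.05):
    """Place an allele frequency into one of three illustrative bands.

    Assumptions: values below the call threshold are not written to the VCF,
    values between the two thresholds are written but marked as filtered,
    and values at or above the filter threshold are not touched by the AF filter.
    """
    if not 0.0 <= af <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    if af < call_threshold:
        return "not emitted"
    if af < filter_threshold:
        return "emitted, filtered"
    return "emitted, not AF-filtered"


# Thresholds quoted above for non-ctDNA samples: call 0.01, filter 0.05.
for af in (0.005, 0.02, 0.08):
    print(af, classify_by_af(af))

# Thresholds quoted above for ctDNA: call 0.001, filter 0.003.
print(classify_by_af(0.002, call_threshold=0.001, filter_threshold=0.003))
```

With the ctDNA values, a 0.2% variant that would not be emitted under the standard thresholds lands in the emitted-but-filtered band.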
--cnv-segmentation-mode $SEG_MODE
Option to override the default segmentation algorithm. By default, bed is used for standard panels and hslm for Pillar panels with a pre-built PON.
--amplicon-cnv-use-default-pon false
We recommend including in-run normal samples (matched in sample type and library preparation) in the same sequencing run to serve as the PON. If generating a custom PON is not feasible, for Pillar panels the pre-packaged panel-specific PON can be used as a fallback. To enable this, set the option to true.
--cnv-segmentation-bed $PATH
You can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed. If bed segmentation mode is used, the segmentation bed is auto-generated from amplicon target bed by default
For more information, see CNV Calling.
For instructions on how to download the Nirvana annotation database, please refer to Nirvana
--msi-coverage-threshold INT
Minimum coverage for a microsatellite: 500 (default)
--msi-distance-threshold FLOAT
Minimum Jensen-Shannon distance between tumor and PON for a microsatellite: 0.3 (default)
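For readers unfamiliar with the Jensen-Shannon distance, the sketch below computes it for two repeat-length histograms and then applies the two MSI thresholds listed above. It uses the textbook definition (square root of the Jensen-Shannon divergence, here with base-2 logarithms so the result lies between 0 and 1). Whether DRAGEN uses the same logarithm base, how it measures coverage, and the `site_passes_thresholds` helper itself are assumptions made for illustration.

```python
import math


def normalize(counts):
    """Turn raw counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(p_counts, q_counts):
    """Square root of the Jensen-Shannon divergence between two histograms."""
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must have the same number of bins")
    p, q = normalize(p_counts), normalize(q_counts)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(divergence, 0.0))


def site_passes_thresholds(tumor_counts, pon_counts, min_coverage=500, min_distance=0.3):
    """Hypothetical check: enough coverage AND far enough from the PON.

    Assumption: coverage is approximated by the total tumor read count at the site.
    """
    if sum(tumor_counts) < min_coverage:
        return False
    return jensen_shannon_distance(tumor_counts, pon_counts) >= min_distance


# Bins are repeat lengths; the counts are made up for illustration.
print(jensen_shannon_distance([5, 5], [50, 50]))    # same shape
print(jensen_shannon_distance([10, 0], [0, 10]))    # no overlap
print(site_passes_thresholds([300, 300], [600, 0]))
```

Identical distributions give a distance of 0 and non-overlapping ones give 1 under this definition, so a threshold of 0.3 sits toward the "similar" end of the scale.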
--sv-call-regions-bed
Specifies a BED file containing the set of regions to call. Default as amplicon target bed.
--enable-variant-deduplication true
Relevant when both the SV and SNV callers are enabled in amplicon workflows. Can increase sensitivity and prevent replicated variants within genes such as FLT3 and KMT2A. Filters out all small indels in the structural variant VCF that also appear, and are passing, in the small variant VCF. DRAGEN creates a new VCF containing the variants in the SV VCF that do not match a variant in the SNV VCF. The deduplicated SV VCF file uses the prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.
--sv-systematic-noise $BEDPE
Optional systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed).
For more information, see Structural Variant Calling.
Systematic noise files are considered essential in Tumor-Only workflows.
DRAGEN has prebuilt systematic noise files for Pillar panels. To achieve high sensitivity, we recommend generating a custom systematic noise file as described in the Custom section.
This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-50 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-50 normal samples.
Gather the full paths to the small variant hard filtered VCFs (not GVCFs) from step 1 and create a lines file ${VCF_LIST} by specifying 1 file per line.
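The list file described above can be assembled with a short script. The following is a minimal sketch, assuming the per-sample outputs follow the `<prefix>.hard-filtered.vcf.gz` naming and that gVCFs, if present, use a `.hard-filtered.gvcf.gz` suffix. The `write_vcf_list` helper, the directory layout, and the `vcf_list.txt` file name are hypothetical.

```python
import tempfile
from pathlib import Path


def write_vcf_list(results_dir, list_path):
    """Write one absolute VCF path per line and return the paths found.

    The glob pattern matches only names ending in .hard-filtered.vcf.gz,
    so files ending in .hard-filtered.gvcf.gz are skipped.
    """
    vcfs = sorted(
        p.resolve() for p in Path(results_dir).rglob("*.hard-filtered.vcf.gz")
    )
    Path(list_path).write_text("".join(f"{p}\n" for p in vcfs))
    return vcfs


# Demonstration on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in (
        "normal1.hard-filtered.vcf.gz",
        "normal1.hard-filtered.gvcf.gz",
        "normal2.hard-filtered.vcf.gz",
    ):
        (root / name).touch()

    listed = write_vcf_list(root, root / "vcf_list.txt")
    listed_names = [p.name for p in listed]
    list_lines = (root / "vcf_list.txt").read_text().splitlines()

print(listed_names)
```

The resulting file path is what would be supplied as `${VCF_LIST}` in step 2.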
Step 2. Generate the final noise file.
This step generates a bed file containing mean and max noise estimates per position. This can be used directly during variant calling (argument --vc-systematic-noise).
The SNV systematic noise files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.
SV systematic noise files are considered experimental in amplicon.
Custom systematic noise files can be generated for amplicon Panels. For best accuracy the normal samples should ideally closely match the sequencer, sample type, library prep and coverage of the tumor samples of interest. It is typically recommended to use 30 - 50 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.
Step 2. Build the BEDPE file using input VCFs from previous step.
Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.
For CNV PON requirements and generation options see CNV Preprocessing | Panel of Normals.
If a matched normal is available it is recommended to include it in the PON.
Step 1. Generate CNV target counts of individual normal samples.
Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation should be matched when processing the case samples.
Step 2. CNV combined counts file generation.
$CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (<output-file-prefix>.target.counts.gz as cnv-enable-gcbias-correction is by default false in amplicon). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.
A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.
For DRAGEN somatic runs it is recommended to use the linear hashtable.
See: Product Files
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.
FQ list Input
FQ Input
BAM Input
CRAM Input
DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).
Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses it is also recommended to use a normal sample with lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler can be used to reduce coverage on the normal sample.
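Choosing the subsample fraction is simple arithmetic. The sketch below derives a fraction for the normal sample from its current coverage and a desired coverage, then formats it for the downsampler options listed in this recipe. It assumes coverage scales linearly with the fraction of reads kept. The `subsample_fraction` helper and the example coverages are made up for illustration.

```python
def subsample_fraction(current_coverage, target_coverage):
    """Fraction of reads to keep so coverage drops to roughly the target.

    Assumption: coverage is proportional to the number of reads kept.
    The result is capped at 1.0 because downsampling cannot add reads.
    """
    if current_coverage <= 0 or target_coverage <= 0:
        raise ValueError("coverages must be positive")
    return min(1.0, target_coverage / current_coverage)


# Example: a 120x matched normal paired with an 80x tumor, reduced to about 40x.
fraction = subsample_fraction(current_coverage=120, target_coverage=40)

options = [
    "--enable-fractional-down-sampler", "true",
    "--down-sampler-normal-subsample", f"{fraction:.3f}",
]
print(" ".join(options))
```

Keeping `--down-sampler-random-seed` at a fixed value makes repeated runs on the same input select the same reads.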
For more detail on the small variant caller in somatic mode please refer to Somatic Mode
For more information, see .
For instructions on how to download the Nirvana annotation database, please refer to Nirvana
See the user guide: .
Microsatellite sites file can be downloaded here: .
For more information, see .
DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.
Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.
DRAGEN has prebuilt systematic noise files for WES/WGS. For high-sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.
Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: .
This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.
For WES and WGS pipelines gather the full paths to the small variant hard filtered VCFs (not GVCFs) from step 1 and create a lines file ${VCF_LIST} by specifying 1 file per line.
Step 2. Generate the final noise file.
This step generates a BED file containing mean and max noise estimates per position. This can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions that could be troubleshot (e.g. by modifying assay settings or DRAGEN settings) or blocklisted.
The SNV systematic noise files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.
SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available. SV systematic noise files are considered experimental in WES and Panels.
Prebuilt WGS SV systematic noise files can be downloaded here: .
Custom systematic noise files can be generated for WES or Panels. For best accuracy the normal samples should ideally closely match the sequencer, sample type, library prep and coverage of the tumor samples of interest. It is typically recommended to use 30 - 100 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.
Step 2. Build the BEDPE file using input VCFs from previous step.
Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.
When the analysis run completes, the software generates an analysis output in a specified location with the folder name /staging/DRAGEN_Solid_WGS_Tumor_Normal_Pipeline_{version}_Analysis_{datetimestamp}. In ICA, analysis output is listed in the Output section of the analysis, where the folder name is a combination of user reference, pipeline name, and a UUID.
Within the analysis folder, each analysis step generates a subfolder within the Logs_Intermediates folder.
📂 Results - Contains the final result files from the pipeline.
    📄 MetricsOutput.tsv - Contains summary metrics for all samples.
    📂 Case1
        📄 Case1_MetricsOutput.tsv - Contains summary metrics for tumor and normal samples for Case1.
        📂 TumorSample1
            📄 TumorSample1.hard-filtered.vcf.gz - Contains somatic small variant calls.
            📄 TumorSample1.cnv.vcf.gz - Contains somatic copy number variant calls.
            📄 TumorSample1.sv.vcf.gz - Contains somatic structural variant calls.
            📄 TumorSample1_SNV_Tumor_Annotated.json.gz - Contains somatic small variant annotations.
            📄 TumorSample1_CNV_Tumor_Annotated.json.gz - Contains somatic copy number variant annotations.
            📄 TumorSample1_SV_Tumor_Annotated.json.gz - Contains somatic structural variant annotations.
            📄 TumorSample1.tmb.metrics.csv - Contains the TMB result and metrics.
            📄 TumorSample1.microsat_output.json - Contains the MSI result and metrics.
            📄 TumorSample1.hrdscore.csv - Contains the HRD result and metrics.
            📄 TumorSample1.tn.bw - Contains tangent normalized somatic coverage in BigWig format.
            📄 TumorSample1.tumor.baf.bedgraph.gz - Contains somatic b-allele frequency in BedGraph format.
            📄 TumorSample1.bam - Contains aligned somatic reads in BAM format.
            📄 TumorSample1.bam.bai - Contains index of aligned somatic reads in BAI format.
        📂 NormalSample1
            📄 NormalSample1.hard-filtered.vcf.gz - Contains germline small variant calls.
            📄 NormalSample1.cnv.vcf.gz - Contains germline copy number variant calls.
            📄 NormalSample1.sv.vcf.gz - Contains germline structural variant calls.
            📄 NormalSample1.repeats.vcf.gz - Contains germline short tandem repeat calls.
            📄 NormalSample1.vntr.vcf.gz - Contains germline variable number tandem repeat calls.
            📄 NormalSample1.targeted.vcf.gz - Contains germline targeted (star allele) calls.
            📄 NormalSample1.targeted.json - Contains germline targeted (star allele) data in JSON format.
            📄 NormalSample1_SNV_Normal_Annotated.json.gz - Contains germline small variant annotations.
            📄 NormalSample1_CNV_Normal_Annotated.json.gz - Contains germline copy number variant annotations.
            📄 NormalSample1_SV_Normal_Annotated.json.gz - Contains germline structural variant annotations.
            📄 NormalSample1.hla.tsv - Contains germline HLA typing calls.
            📄 NormalSample1.bam - Contains aligned germline reads in BAM format.
            📄 NormalSample1.bam.bai - Contains index of aligned germline reads in BAI format.
📂 Logs_Intermediates - Contains all intermediate files for each step of the pipeline (BAMs moved to the Results folder).
    📂 ResourceVerification
    📂 SampleSheetValidation
    📂 NormalFastqValidation
    📂 TumorFastqValidation
    📂 DragenCaller
    📂 TumorNormalVariantCaller
    📂 Tmb
    📂 Annotation
    📂 SampleAnalysisResults
    📂 AdditionalSarjMetrics
    📂 MetricsOutput
📂 Work - (DRAGEN server only) Contains information and files related to Nextflow execution.
This section describes the summary output files generated during analysis.
The metrics output file is a final combined metrics report that provides sample status, key analysis metrics, and metadata in a tab-separated values (TSV) file. Sample metrics within the report indicate guideline-suggested lower specification limits (LSL) and upper specification limits (USL) for each sample in the run. One metrics output file is generated for the entire run. An additional file is generated for each case.
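To illustrate how the LSL and USL guidelines can be applied, the sketch below checks a few sample metrics against limits taken from the guideline tables in this document. It is not the pipeline's own logic: the function names are illustrative, the limits are assumed to be inclusive, and no assumption is made about the layout of the TSV file itself.

```python
# Guideline limits copied from the metric tables in this document.
# "NA" means that no limit is defined on that side.
GUIDELINES = {
    "PCT_MAPPED_READS": ("90.00", "NA"),
    "PCT_Q30_BASES": ("80.00", "NA"),
    "PCT_DUPLICATE_MARKED_READS": ("NA", "20.00"),
    "ESTIMATED_SAMPLE_CONTAMINATION": ("NA", "2.00"),
}


def parse_limit(text):
    """Convert a guideline cell to float, or None when the cell is 'NA'."""
    text = text.strip()
    return None if text.upper() == "NA" else float(text)


def within_guidelines(value, lsl, usl):
    """Return True when value lies inside [lsl, usl]; a None limit is ignored."""
    if lsl is not None and value < lsl:
        return False
    if usl is not None and value > usl:
        return False
    return True


def failing_metrics(sample_metrics):
    """Return the names of metrics that fall outside their guideline limits."""
    failures = []
    for name, value in sample_metrics.items():
        if name not in GUIDELINES:
            continue  # no guideline listed for this metric in the sketch
        lsl, usl = (parse_limit(cell) for cell in GUIDELINES[name])
        if not within_guidelines(value, lsl, usl):
            failures.append(name)
    return failures
```

A sample with 78% Q30 bases and 25% duplicate-marked reads would be reported as failing both of those metrics, while metrics without a listed guideline are skipped.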
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# DNA amplicon
--enable-dna-amplicon true
--amplicon-target-bed $PATH
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
# Small variant caller
--enable-variant-caller true
--vc-target-bed $VC_TARGET_BED
--vc-systematic-noise $PATH #optional for SNV systematic noise
--vc-target-vaf 0.03 #Default = 0.03 (>= 3% VAF) except for ctDNA (which is 0.1%)
# SV
--enable-sv true
--sv-systematic-noise $PATH #optional for SV systematic noise
# CNV
--enable-cnv true
--cnv-combined-counts $PATH #CNV PON
# Annotation
--enable-variant-annotation true
--variant-annotation-data $PATH
# Microsatellite Instability (MSI) #optional for panels that support MSI
--amplicon-enable-msi=true
--msi-microsatellites-file $PATH #MSI site file
--msi-ref-normal-input $PATH #MSI PON file

# FQ list Input
--tumor-fastq-list $PATH
--tumor-fastq-list-sample-id $STRING
--fastq-list $PATH
--fastq-list-sample-id $STRING

# FQ Input
--tumor-fastq1 $PATH
--tumor-fastq2 $PATH
--RGSM-tumor $STRING
--RGID-tumor $STRING
--fastq-file1 $PATH
--fastq-file2 $PATH
--RGSM $STRING
--RGID $STRING

# BAM Input
--tumor-bam-input $PATH
--bam-input $PATH

# CRAM Input
--tumor-cram-input $PATH
--cram-input $PATH

/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--vc-detect-systematic-noise=true
--enable-dna-amplicon true
--amplicon-target-bed $PATH
--vc-enable-germline-tagging=true
--variant-annotation-data $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--build-sys-noise-vcfs-list ${VCF_LIST}
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--enable-dna-amplicon true
--amplicon-target-bed $PATH
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--sv-detect-systematic-noise true
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--sv-build-systematic-noise-vcfs-list $VCF_LIST #one VCF per line
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--enable-dna-amplicon true
--amplicon-target-bed $PATH
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
# CNV
--enable-cnv true
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--enable-cnv true
--cnv-generate-combined-counts true
--cnv-normals-list $CNV_NORMALS_LIST
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
--enable-duplicate-marking true #default=true
# Small variant caller
--enable-variant-caller true
--vc-systematic-noise $PATH #Recommended
--vc-excluded-regions-bed $BED #FFPE: optionally mask ALUs
--vc-target-vaf 0.03 #Default = 0.03 (>= 3% VAF)
# SV
--enable-sv true
--sv-systematic-noise $PATH #Optional
# CNV
--enable-cnv true
--cnv-use-somatic-vc-baf true
--cnv-enable-self-normalization true
# HRD Scoring
--enable-hrd true #requires CNV
# Annotation
--variant-annotation-data $PATH
--enable-variant-annotation true
# TMB
--enable-tmb true
# HLA genotyper
--enable-hla true
# Microsatellite Instability (MSI)
--msi-command tumor-normal
--msi-microsatellites-file $PATH
--msi-coverage-threshold 40

SNV, SV | --amplicon-enable-dna-lymphoid
oncoReveal Core LBx | Core LBx | P-LBX-01 | cfDNA | SNV, CNV, MSI | --amplicon-enable-cfdna-core
oncoReveal Essential LBx | Essential LBx | P-LBX-04 | cfDNA | SNV, CNV, MSI | --amplicon-enable-cfdna-essential
oncoReveal Essential MPN | MPN | MY7 | DNA | SNV | --amplicon-enable-dna-mpn
oncoReveal Multi-Cancer v4 with CNV | Multi-Cancer with CNV | HS341 | DNA | SNV, CNV | --amplicon-enable-dna-multicancer
oncoReveal Myeloid | Myeloid | MY766 | DNA | SNV, SV | --amplicon-enable-dna-myeloid
oncoReveal Nexus 21 Gene | Nexus | P-CMC-01 | DNA | SNV, SV | --amplicon-enable-dna-nexus
oncoReveal Solid Tumor v2 | Solid Tumor v2 | P-ST-02 | DNA | SNV | --amplicon-enable-dna-solidtumor

| Metric (UOM) | LSL Guideline | USL Guideline |
| --- | --- | --- |
| PCT_SOFT_CLIPPED_BASES_R1 (%) | NA | 10.0 |
| PCT_SOFT_CLIPPED_BASES_R2 (%) | NA | 10.0 |
| PCT_SUPPLEMENTARY_(CHIMERIC)_ALIGNMENTS (%) | NA | 15.0 |
| ESTIMATED_READ_LENGTH (bp) | NA | NA |
| MEAN_INSERT_LENGTH (bp) | NA | NA |
| MEDIAN_INSERT_LENGTH (bp) | NA | NA |
| INPUT_BASES_OVER_REFERENCE_GENOME_SIZE (Count) | NA | NA |
| ESTIMATED_SAMPLE_CONTAMINATION (%) | NA | 2.00 |
| PCT_READS_WITH_UNCORRECTABLE_UMIS (%) | NA | NA |
| TOTAL_NUMBER_OF_FAMILIES (Count) | NA | NA |
| FAMILIES_DISCARDED (Count) | NA | NA |
| DUPLEX_FAMILIES (Count) | NA | NA |
| MEAN_FAMILY_DEPTH (Count) | NA | NA |

| Metric (UOM) | LSL Guideline | USL Guideline |
| --- | --- | --- |
| TOTAL_INPUT_READS (Count) | NA | NA |
| PCT_MAPPED_READS (%) | 90.00 | NA |
| PCT_PROPERLY_PAIRED_READS (%) | 90.00 | NA |
| PCT_Q30_BASES (%) | 80.00 | NA |

| Metric (UOM) | LSL Guideline | USL Guideline |
| --- | --- | --- |
| PCT_DUPLICATE_MARKED_READS (%) | NA | 20.00 |
| PCT_READS_WITH_VALID_OR_CORRECTABLE_UMIS (%) | NA | NA |
| PCT_READS_IN_DISCARDED_FAMILIES (%) | NA | NA |
| PCT_READS_FILTERED_OUT (%) | NA | NA |

| Metric (UOM) | LSL Guideline | USL Guideline |
| --- | --- | --- |
| AVERAGE_GENOME_COVERAGE (Count) | 20.00 | NA |
| PCT_UNIFORMITY_OF_COVERAGE_OVER_20PCT_OF_MEAN (%) | 50.00 | NA |
| PCT_GENOME_20X (%) | 80.00 | NA |

| Metric (UOM) | LSL Guideline | USL Guideline |
| --- | --- | --- |
| OUTLIER_BAF_FRACTION (NA) | NA | 0.90 |

| Metric (UOM) | LSL Guideline | USL Guideline |
| --- | --- | --- |
| ESTIMATED_PURITY (%) | 20.00 | NA |

--vc-sq-filter-threshold $NUM
Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 17.5. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold $INT
Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold-in-hotspot $INT
Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-excluded-regions-bed $BED
Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome, and are often exceptionally noisy regions in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU bed files can be downloaded as part of the Bed File Collection:
--sv-min-scored-variant-size
After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.
--enable-map-align true
In the TN pipeline this must be set to false for BAM/CRAM input.
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
--enable-duplicate-marking true
By default, DRAGEN marks duplicate reads and excludes them from variant calling.
--enable-fractional-down-sampler
Set to true to enable fractional downsampling. The default value is false.
--down-sampler-normal-subsample
Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).
--down-sampler-tumor-subsample
Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).
--down-sampler-random-seed
Specify the random seed for different runs of the same input data. The default value is 42.
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--vc-systematic-noise $PATH
Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.).
--vc-somatic-hotspots $PATH
DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.
--vc-enable-liquid-tumor-mode true
Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.
--vc-override-tumor-pcr-params-with-normal false
Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.
--enable-hla
Enable HLA typer (this setting by default will only genotype class 1 genes)
--hla-as-filter-min-threshold
Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.
--hla-as-filter-ratio-threshold
Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.
--hla-enable-class-2
Extend genotyping to HLA class 2 genes (default=true).
--cnv-enable-gcbias-correction true
Enable or disable GC bias correction when generating target counts.
--cnv-segmentation-mode $SEG_MODE
Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.
--cnv-normal-cnv-vcf $CNV_NORMAL_VCF
Specify germline CNVs from the matched normal sample. Germline-aware Mode.
--tmb-vaf-threshold FLOAT
Variant minimum allele frequency for usable variants (default=0.05).
--vc-callability-tumor-thresh INT
Required read coverage to use a site (default=50).
--tmb-enable-proxi-filter BOOL
Use variant VAF information to increase germline filtering. Recommended for TO, but not for TN. May be overly aggressive at tagging variants as germline (default=false).
--msi-coverage-threshold INT
Minimum coverage for a microsatellite: 60 (default)
--msi-distance-threshold FLOAT
Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default)
--sv-call-regions-bed
Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format.
--sv-exclusion-bed
Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.
--enable-variant-deduplication true
Relevant when both SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent the occurrence of replicated variants within genes such as FLT3 and KMT2A. Filter all small indels in the structural variant VCF that appear and are passing in the small variant VCF. DRAGEN will create a new VCF that contains variants in SV VCF that are not matching a variant from SNV VCF file. The new deduplicated SV VCF file will have the same prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.
--sv-systematic-noise $BEDPE
Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels.
--sv-somatic-ins-tandup-hotspot-regions-bed $BED
Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bps)
--sv-min-candidate-variant-size
Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.
--sv-enable-liquid-tumor-mode true
DRAGEN can account for Tumor-in-Normal (TiN) contamination by running liquid tumor mode.
--sv-tin-contam-tolerance $TIN_CONTAM_TOLERANCE
Set the Tumor-in-Normal (TiN) contamination tolerance level.
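The sensitivity-specificity tradeoff described for --vc-sq-filter-threshold (and, in the same way, for the AQ-based systematic noise thresholds) can be pictured with a small sketch. This is an illustration only, not DRAGEN's filtering code: it assumes that calls scoring below the threshold are the ones filtered, which is consistent with the statement that raising the value improves specificity. The scores are made-up example values.

```python
DEFAULT_SQ_THRESHOLD = 17.5  # pipeline-specific default quoted in the option list


def passes_threshold(score, threshold=DEFAULT_SQ_THRESHOLD):
    """Assumption: a call is kept when its score is at or above the threshold."""
    return score >= threshold


def count_passing(scores, threshold=DEFAULT_SQ_THRESHOLD):
    """Count calls that survive the score filter at the given threshold."""
    return sum(1 for s in scores if passes_threshold(s, threshold))


# Made-up scores for four candidate calls.
example_scores = [5.0, 12.0, 17.5, 30.0]

# Lowering the threshold keeps more calls (higher sensitivity);
# raising it keeps fewer (higher specificity).
kept_by_threshold = {t: count_passing(example_scores, t) for t in (10.0, 17.5, 25.0)}
```

With these example scores, three calls pass at a threshold of 10.0, two at the default of 17.5, and one at 25.0.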
See: Product Files
DRAGEN input sources include FASTQ list, FASTQ, BAM, or CRAM. For BCL input, first create FASTQs using BCL conversion.
FQ list Input
FQ Input
BAM Input
CRAM Input
--enable-map-align true
Optionally disable map & align (default=true).
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
--enable-duplicate-marking true
By default, DRAGEN marks duplicate reads and excludes them from variant calling.
DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).
Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses, it is also recommended to use a normal sample with lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler can be used to reduce the coverage of the normal sample.
--enable-fractional-down-sampler
Set to true to enable fractional downsampling. The default value is false.
--down-sampler-normal-subsample
Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).
--down-sampler-tumor-subsample
Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).
--down-sampler-random-seed
Specify the random seed for different runs of the same input data. The default value is 42.
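As a worked example of choosing a subsample fraction, the sketch below computes the value to pass to --down-sampler-normal-subsample so that a deep normal sample is reduced to a chosen target coverage. The function name and the coverage figures are illustrative, and the target coverage itself is a decision for the user; this document does not prescribe one.

```python
def normal_subsample_fraction(normal_coverage, target_coverage):
    """Fraction of normal reads to keep so that coverage drops to the target.

    Returns 1.0 (keep everything) when the normal is already at or below
    the target coverage.
    """
    if normal_coverage <= 0 or target_coverage <= 0:
        raise ValueError("coverage values must be positive")
    return min(1.0, target_coverage / normal_coverage)


def downsampler_args(normal_coverage, target_coverage):
    """Format the downsampler options named in this section as argument strings."""
    fraction = normal_subsample_fraction(normal_coverage, target_coverage)
    return [
        "--enable-fractional-down-sampler true",
        f"--down-sampler-normal-subsample {fraction:.3f}",
    ]
```

For a normal sample at 120x and a target of 40x, the fraction is 40 / 120, so about one third of the normal reads are kept.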
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--vc-emit-ref-confidence GVCF
To enable gVCF output.
--vc-enable-vcf-output
To enable VCF file output during a gVCF run, set to true. The default value is false.
--vc-systematic-noise $PATH
Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.).
--vc-somatic-hotspots $PATH
DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.
For more detail on the small variant caller in somatic mode please refer to Somatic Mode
--enable-hla
Enable HLA typer (this setting by default will only genotype class 1 genes)
--hla-as-filter-min-threshold
Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.
--hla-as-filter-ratio-threshold
Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.
--hla-enable-class-2
Extend genotyping to HLA class 2 genes (default=true).
--cnv-enable-gcbias-correction true
Enable or disable GC bias correction when generating target counts.
--cnv-segmentation-mode $SEG_MODE
Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.
--cnv-population-b-allele-vcf $POP_VCF
Specify a population SNP VCF. This option is available for both the germline and the somatic workflows. In germline it is only supported for WGS. In somatic, it can be used when a matched normal sample is not available and analysis must be performed in tumor-only mode.
--heme-cnv true
Configures DRAGEN to use CNV settings for HEME.
For more information, see CNV Calling.
For instructions on how to download the Nirvana annotation database, please refer to Nirvana
The Tumor-Normal pipeline is more effective than the Tumor-Only pipeline at removing or tagging germline variants. The Tumor-Only pipeline may therefore report somewhat elevated TMB values. The TMB proxi filter is an optional setting applied on top of the regular database germline filter. It aggressively filters additional germline variants based on allele frequencies.
--tmb-vaf-threshold FLOAT
Variant minimum allele frequency for usable variants (default=0.05).
--vc-callability-tumor-thresh INT
Required read coverage to use a site (default=50).
--tmb-enable-proxi-filter BOOL
Use variant VAF information to increase germline filtering. Recommended for TO, but not for TN. May be overly aggressive at tagging variants as germline (default=false).
See the user guide: TMB Germline Variants.
Microsatellite sites file can be downloaded here: Product Files.
--msi-coverage-threshold INT
Minimum coverage for a microsatellite: 60 (default)
--msi-distance-threshold FLOAT
Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default)
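For readers unfamiliar with the Jensen-Shannon distance used by --msi-distance-threshold, the sketch below computes it for two count histograms. The details are assumptions made for illustration, not a description of DRAGEN's implementation: the inputs are taken to be repeat-length counts for one microsatellite in tumor and normal over the same bins, logarithms are base 2, and the distance is the square root of the Jensen-Shannon divergence.

```python
import math


def normalize(counts):
    """Turn non-negative counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(tumor_counts, normal_counts):
    """Jensen-Shannon distance between two histograms over the same bins."""
    if len(tumor_counts) != len(normal_counts):
        raise ValueError("histograms must use the same bins")
    p = normalize(tumor_counts)
    q = normalize(normal_counts)
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture distribution
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(divergence, 0.0))
```

Under these assumptions the distance is 0 for identical distributions and 1 for distributions with no overlap, so a default threshold of 0.1 corresponds to a modest shift in the repeat-length profile.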
--sv-call-regions-bed
Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format.
--sv-exclusion-bed
Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.
--enable-variant-deduplication true
Relevant when both SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent the occurrence of replicated variants within genes such as FLT3 and KMT2A. Filter all small indels in the structural variant VCF that appear and are passing in the small variant VCF. DRAGEN will create a new VCF that contains variants in SV VCF that are not matching a variant from SNV VCF file. The new deduplicated SV VCF file will have the same prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.
--sv-systematic-noise $BEDPE
Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels.
--sv-somatic-ins-tandup-hotspot-regions-bed $BED
Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bps)
--sv-min-candidate-variant-size
Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.
--heme-sv true
Configures DRAGEN to use SV settings for Liquid Tumors (e.g., AML/MLL).
--sv-min-scored-variant-size $INT
100000
For more information, see Structural Variant Calling.
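The matching idea behind --enable-variant-deduplication can be pictured with a simplified sketch. It is not DRAGEN's implementation: it only trims shared bases before comparing alleles and omits the left-shifting step, which would require the reference sequence. Variants are represented as hypothetical (chrom, pos, ref, alt) tuples, and the small variant input is assumed to contain passing calls only.

```python
def trim_alleles(pos, ref, alt):
    """Trim shared trailing, then shared leading bases, keeping one base each."""
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1  # dropping a leading base moves the start position
    return pos, ref, alt


def variant_key(variant):
    """Build a comparable key from a (chrom, pos, ref, alt) tuple."""
    chrom, pos, ref, alt = variant
    return (chrom,) + trim_alleles(pos, ref, alt)


def dedup_sv_calls(sv_calls, passing_small_variant_calls):
    """Keep SV-caller indels that do not match a passing small variant call."""
    seen = {variant_key(v) for v in passing_small_variant_calls}
    return [v for v in sv_calls if variant_key(v) not in seen]
```

In this sketch, an SV-caller record `("chr13", 100, "ATTG", "ATG")` trims to the same key as a small variant record `("chr13", 100, "AT", "A")` and would therefore be dropped from the deduplicated SV output.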
--dux4-skip-santiy-check true
Bypass the requirements checks if the input datasets don't comply with parameters listed in prerequisites
For more information, see DUX4-rearrangement Calling.
DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.
Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.
DRAGEN provides prebuilt systematic noise files for WES/WGS. For high-sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.
Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.
WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz - For WGS FF
FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz - For WGS FFPE (only hg38)
WES_hg38_v2.0.0_systematic_noise.snv.bed.gz - For WES FF and FFPE
This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.
For WES and WGS pipelines, gather the full paths to the small variant hard-filtered VCFs (not gVCFs) from step 1 and create a list file ${VCF_LIST} with one file path per line.
Step 2. Generate the final noise file.
This step generates a BED file containing mean and max noise estimates per position. The file can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions, which can then be troubleshot (e.g. by modifying assay settings or DRAGEN settings) or blocklisted.
The SNV systematic noise files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.
SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available. SV systematic noise files are considered experimental in WES and Panels.
Prebuilt WGS SV systematic noise files can be downloaded here: Product Files.
WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz - For WGS, FF/FFPE
IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz - For WGS, HEME
Custom systematic noise files can be generated for WES or Panels. For best accuracy, the normal samples should closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-100 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.
Step 2. Build the BEDPE file using input VCFs from previous step.
Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.
A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.
For DRAGEN somatic runs it is recommended to use the linear hashtable.
See:
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using .
FQ list Input
FQ Input
BAM Input
CRAM Input
DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).
Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses it is also recommended to use a normal sample with coverage that is less than the tumor sample. If the matched normal has deeper coverage than the tumor sample, then the fractional samples may be used to reduce coverage on the normal sample.
For more detail on the small variant caller in somatic mode please refer to
For more information, see .
For instructions on how to download the Nirvana annotation database, please refer to
The Tumor-Normal pipeline is more effective than the Tumor-Only pipeline at removing or tagging germline variants. The Tumor-Only may subsequently report somewhat elevated TMB values. The TMB proxi filter is an optional setting on top of the regular database germline filter. It will aggressively filter additional germline variants based on allele frequencies.
See the user guide: .
Microsatellite sites file can be downloaded here: .
For more information, see .
DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.
Systematic noise files are considered essential in Tumor-Only workflows. It is also recommended for Tumor-Normals workflows.
DRAGEN has pre-build systematic noise files for WES/WGS. For high sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.
Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: .
This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.
For WES and WGS pipelines, gather the full paths to the small variant hard-filtered VCFs (not gVCFs) from step 1 and create a list file ${VCF_LIST} with one file path per line.
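The list file can be assembled by hand, or with a small script such as the Python sketch below. It assumes that each step 1 run wrote its output under a common root directory and that the hard-filtered VCFs end in `.hard-filtered.vcf.gz`; both the directory layout and the helper name are assumptions, so adjust the pattern to match your output files.

```python
from pathlib import Path


def write_vcf_list(run_root, list_path, pattern="*.hard-filtered.vcf.gz"):
    """Write one absolute VCF path per line and return the paths found.

    The default pattern is an assumed naming convention. It does not match
    files ending in .hard-filtered.gvcf.gz, so gVCFs are left out.
    """
    vcfs = sorted(p.resolve() for p in Path(run_root).rglob(pattern))
    Path(list_path).write_text("".join(f"{p}\n" for p in vcfs))
    return vcfs


if __name__ == "__main__":
    found = write_vcf_list("normals_output", "vcf_list.txt")
    print(f"listed {len(found)} VCFs")
```

The resulting file path is what ${VCF_LIST} refers to in step 2.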
Step 2. Generate the final noise file.
This step generates a BED file containing mean and max noise estimates per position. The file can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions, which can then be investigated (e.g. by modifying assay or DRAGEN settings) or blocklisted.
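To review the noisiest positions before plotting, a short script can rank the entries of the noise file. The Python sketch below assumes a tab-separated BED file with chromosome, start, and end in the first three columns and a numeric noise value in a column selected by `value_column`. That column layout is an assumption and should be checked against the actual file before use.

```python
import gzip


def noisiest_positions(bed_path, value_column=3, top_n=10):
    """Return the top_n (chrom, start, end, value) rows ranked by noise value.

    value_column is a zero-based column index; the default (fourth column)
    is an assumed layout, not a documented one.
    """
    opener = gzip.open if str(bed_path).endswith(".gz") else open
    rows = []
    with opener(bed_path, "rt") as handle:
        for line in handle:
            # Skip blank lines and common BED header/comment lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            fields = line.rstrip("\n").split("\t")
            rows.append((float(fields[value_column]), fields[0],
                         int(fields[1]), int(fields[2])))
    rows.sort(reverse=True)  # highest noise value first
    return [(chrom, start, end, value) for value, chrom, start, end in rows[:top_n]]
```

Positions returned by such a ranking are candidates for closer inspection, not an automatic blocklist.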
The SNV systematic noise files can also be built in the cloud using the or the DRAGEN Systematic Noise File Builder Pipeline on .
SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available. SV systematic noise files are considered experimental in WES and Panels.
Prebuilt WGS SV systematic noise files can be downloaded here: .
Custom systematic noise files can be generated for WES or Panels. For best accuracy, the normal samples should closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-100 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.
Step 2. Build the BEDPE file using input VCFs from previous step.
Systematic noise BEDPE files can also be built in the cloud using the or the DRAGEN Systematic Noise File Builder Pipeline on .
A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.
For DRAGEN somatic runs it is recommended to use the linear hashtable.
See:
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using .
FQ list Input
FQ Input
BAM Input
CRAM Input
DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).
For more information see: .
# FQ list input
--tumor-fastq-list $PATH
--tumor-fastq-list-sample-id $STRING
--fastq-list $PATH
--fastq-list-sample-id $STRING
# FQ input
--tumor-fastq1 $PATH
--tumor-fastq2 $PATH
--RGSM-tumor $STRING
--RGID-tumor $STRING
--fastq-file1 $PATH
--fastq-file2 $PATH
--RGSM $STRING
--RGID $STRING
# BAM input
--tumor-bam-input $PATH
--bam-input $PATH
# CRAM input
--tumor-cram-input $PATH
--cram-input $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--vc-detect-systematic-noise=true
--vc-enable-germline-tagging=true
--variant-annotation-data $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--build-sys-noise-vcfs-list ${VCF_LIST}
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--sv-detect-systematic-noise true
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--sv-build-systematic-noise-vcfs-list $VCF_LIST #one VCF per line
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
--enable-duplicate-marking true #default=true
# Small variant caller
--enable-variant-caller true
--vc-systematic-noise $PATH #Required
--vc-target-vaf 0.03 #Default = 0.03 (>= 3% VAF)
# SV
--heme-sv true
--sv-systematic-noise $PATH #Recommended
# DUX4
--enable-dux4-caller true
# CNV
--heme-cnv true
--cnv-population-b-allele-vcf $POP_VCF
--cnv-enable-self-normalization true
# Annotation
--variant-annotation-data $PATH
--vc-enable-germline-tagging true
# TMB
--enable-tmb true
--tmb-enable-proxi-filter true #Optional for Tumor-Only
# HLA genotyper
--enable-hla true
# Microsatellite Instability (MSI)
--msi-command tumor-only
--msi-ref-normal-input $PATH
--msi-microsatellites-file $PATH
--msi-coverage-threshold 40
# FQ list input
--tumor-fastq-list $PATH
--tumor-fastq-list-sample-id $STRING
# FQ input
--tumor-fastq1 $PATH
--tumor-fastq2 $PATH
--RGSM-tumor $STRING
--RGID-tumor $STRING
# BAM input
--tumor-bam-input $PATH
# CRAM input
--tumor-cram-input $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
--enable-duplicate-marking true #default=true
# Small variant caller
--enable-variant-caller true
--vc-systematic-noise $PATH #Required
--vc-excluded-regions-bed $BED #FFPE: optionally mask ALUs
--vc-target-vaf 0.03 #Default = 0.03 (>= 3% VAF)
# SV
--enable-sv true
--sv-systematic-noise $PATH #Recommended
# CNV
--enable-cnv true
--cnv-population-b-allele-vcf $POP_VCF
--cnv-enable-self-normalization true
# HRD Scoring
--enable-hrd true #requires CNV
# Annotation
--variant-annotation-data $PATH
--vc-enable-germline-tagging true
# TMB
--enable-tmb true
--tmb-enable-proxi-filter true #Optional for Tumor-Only
# HLA genotyper
--enable-hla true
# Microsatellite Instability (MSI)
--msi-command tumor-only
--msi-ref-normal-input $PATH
--msi-microsatellites-file $PATH
--msi-coverage-threshold 40
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
# UMI
--umi-enable true
--umi-source STRING #Default='qname'
--umi-library-type STRING #e.g. random-duplex
--umi-min-supporting-reads 1 #Default=2
# Small variant caller
--enable-variant-caller true
--vc-systematic-noise $PATH #Required
--vc-excluded-regions-bed $BED #FFPE: optionally mask ALUs
# SV
--enable-sv true
--sv-systematic-noise $PATH #Recommended
# CNV
--enable-cnv true
--cnv-population-b-allele-vcf $POP_VCF
--cnv-enable-self-normalization true
# HRD Scoring
--enable-hrd true #requires CNV
# Annotation
--variant-annotation-data $PATH
--vc-enable-germline-tagging true
# TMB
--enable-tmb true
--tmb-enable-proxi-filter true #Optional for Tumor-Only
# HLA genotyper
--enable-hla true
# Microsatellite Instability (MSI)
--msi-command tumor-only
--msi-ref-normal-input $PATH
--msi-microsatellites-file $PATH
--msi-coverage-threshold 40
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
# UMI
--umi-enable true
--umi-source STRING #Default='qname'
--umi-library-type STRING #e.g. random-duplex
--tumor-normal-has-umi STRING #Sample(s) containing UMI ['tumor', 'both'].
--umi-min-supporting-reads 1 #Default=2
# Small variant caller
--enable-variant-caller true
--vc-systematic-noise $PATH #Recommended
--vc-excluded-regions-bed $BED #FFPE: optionally mask ALUs
--vc-sq-filter-threshold 17.5 #recommended in tumor-normal UMI mode
# SV
--enable-sv true
--sv-systematic-noise $PATH #Optional
# CNV
--enable-cnv true
--cnv-use-somatic-vc-baf true
--cnv-enable-self-normalization true
# HRD Scoring
--enable-hrd true #requires CNV
# Annotation
--variant-annotation-data $PATH
--enable-variant-annotation true
# TMB
--enable-tmb true
# HLA genotyper
--enable-hla true
# Microsatellite Instability (MSI)
--msi-command tumor-normal
--msi-microsatellites-file $PATH
--msi-coverage-threshold 40
--vc-enable-liquid-tumor-mode true
Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.
--vc-override-tumor-pcr-params-with-normal false
Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.
--vc-sq-filter-threshold $NUM
Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 3. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold $INT
Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 60. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold-in-hotspot $INT
Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 20. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-excluded-regions-bed $BED
Hard filter variants that overlap with these regions. ALU regions comprise approximately 11% of the genome and are often exceptionally noisy in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU BED files can be downloaded as part of the Bed File Collection.
--sv-min-scored-variant-size
After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.
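The SQ and AQ thresholds described above act as score cutoffs, which is why raising them favors specificity and lowering them favors sensitivity. The Python sketch below illustrates that behavior with the default values quoted above. The function name, the strict "less than" comparison, and the way the two checks are combined are simplifying assumptions, not the DRAGEN implementation.

```python
def passes_score_filters(sq, aq=None, in_hotspot=False,
                         sq_threshold=3.0,
                         aq_threshold=60, aq_threshold_hotspot=20):
    """Return True if a variant survives the SQ and (optional) AQ cutoffs.

    aq is None when no systematic noise file was supplied, in which case
    only the SQ cutoff is applied. Hypothetical helper for illustration.
    """
    if sq < sq_threshold:
        return False  # too little evidence for a somatic call
    if aq is not None:
        limit = aq_threshold_hotspot if in_hotspot else aq_threshold
        if aq < limit:
            return False  # looks like systematic noise
    return True


if __name__ == "__main__":
    # The same AQ score passes inside a hotspot but not outside it.
    print(passes_score_filters(sq=10, aq=30, in_hotspot=True))
    print(passes_score_filters(sq=10, aq=30, in_hotspot=False))
```

The lower hotspot cutoff reflects the text above: hotspot variants are held to a more permissive AQ threshold than non-hotspot variants.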
--enable-map-align true
Optionally disable map & align (default=true).
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
--enable-duplicate-marking true
By default, DRAGEN marks duplicate reads and excludes them from variant calling.
--enable-fractional-down-sampler
Set to true to enable fractional downsampling. The default value is false.
--down-sampler-normal-subsample
Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).
--down-sampler-tumor-subsample
Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).
--down-sampler-random-seed
Specify the random seed for different runs of the same input data. The default value is 42.
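The role of the subsample fraction and the random seed can be illustrated with a small Python sketch. It keeps each read with a probability equal to the fraction and uses a seeded generator so that repeated runs on the same input give the same subset. This is a conceptual model only: how DRAGEN selects reads internally is not described here, and the function name is hypothetical.

```python
import random


def subsample_reads(read_names, fraction=1.0, seed=42):
    """Keep each read (or read pair, keyed by name) with probability `fraction`.

    A fixed seed makes the selection reproducible across runs.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    rng = random.Random(seed)
    return [name for name in read_names if rng.random() < fraction]


if __name__ == "__main__":
    reads = [f"read{i}" for i in range(10000)]
    kept = subsample_reads(reads, fraction=0.25, seed=42)
    print(f"kept {len(kept)} of {len(reads)} reads")
```

Changing the seed gives a different but equally sized (on average) subset, which is useful when simulating several replicates at the same depth.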
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--vc-emit-ref-confidence GVCF
To enable gVCF output.
--vc-enable-vcf-output
To enable VCF file output during a gVCF run, set to true. The default value is false.
--vc-systematic-noise $PATH
Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.).
--vc-somatic-hotspots $PATH
DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.
--enable-hla
Enable HLA typer (this setting by default will only genotype class 1 genes)
--hla-as-filter-min-threshold
Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.
--hla-as-filter-ratio-threshold
Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.
--hla-enable-class-2
Extend genotyping to HLA class 2 genes (default=true).
--cnv-enable-gcbias-correction true
Enable or disable GC bias correction when generating target counts.
--cnv-segmentation-mode $SEG_MODE
Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.
--cnv-population-b-allele-vcf $POP_VCF
Specify a population SNP VCF. This option is available for both the germline and the somatic workflows. In germline it is only supported for WGS. In somatic, it can be used when a matched normal sample is not available and analysis must be performed in tumor-only mode.
--tmb-vaf-threshold FLOAT
Variant minimum allele frequency for usable variants (default=0.05)
--vc-callability-tumor-thresh INT
Required read coverage to use a site (default=50).
--tmb-enable-proxi-filter BOOL
Use variant vaf information to increase germline filtering. Recommended for TO, but not for TN. May be overly aggressive at tagging variants as germline (default=false).
--msi-coverage-threshold INT
Minimum coverage for a microsatellite: 60 (default)
--msi-distance-threshold FLOAT
Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default)
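For readers unfamiliar with the Jensen-Shannon distance used by the MSI thresholds above, the Python sketch below computes it for two repeat-length count distributions and applies the default coverage and distance cutoffs. The use of log base 2, the treatment of coverage as the total count per sample, and the helper names are assumptions for illustration; they are not taken from the DRAGEN implementation.

```python
import math


def js_distance(tumor_counts, normal_counts):
    """Jensen-Shannon distance (square root of the divergence, log base 2).

    Inputs map repeat length -> read count; both must have at least one read.
    The result is 0.0 for identical distributions and 1.0 for disjoint ones.
    """
    keys = sorted(set(tumor_counts) | set(normal_counts))
    t_total = sum(tumor_counts.values())
    n_total = sum(normal_counts.values())
    p = [tumor_counts.get(k, 0) / t_total for k in keys]
    q = [normal_counts.get(k, 0) / n_total for k in keys]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(dist, mix):
        return sum(x * math.log2(x / y) for x, y in zip(dist, mix) if x > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(0.0, divergence))  # guard against tiny negative rounding


def site_is_unstable(tumor_counts, normal_counts,
                     coverage_threshold=60, distance_threshold=0.1):
    """Apply the coverage and distance cutoffs to one microsatellite site."""
    if min(sum(tumor_counts.values()), sum(normal_counts.values())) < coverage_threshold:
        return False  # not enough coverage to assess this site
    return js_distance(tumor_counts, normal_counts) >= distance_threshold
```

A site whose tumor repeat-length distribution has shifted away from the normal yields a larger distance and is more likely to exceed the threshold.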
--sv-call-regions-bed
Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format.
--sv-exclusion-bed
Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.
--enable-variant-deduplication true
Relevant when both SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent the occurrence of replicated variants within genes such as FLT3 and KMT2A. Filter all small indels in the structural variant VCF that appear and are passing in the small variant VCF. DRAGEN will create a new VCF that contains variants in SV VCF that are not matching a variant from SNV VCF file. The new deduplicated SV VCF file will have the same prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.
--sv-systematic-noise $BEDPE
Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels.
--sv-somatic-ins-tandup-hotspot-regions-bed $BED
Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bps)
--sv-min-candidate-variant-size
Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.
--sv-min-scored-variant-size $INT
100000
WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FF
FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FFPE (only hg38)
WES_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WES FF and FFPE
WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz
For WGS, FF/FFPE
IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz
For WGS, HEME
--umi-emit-multiplicity both
Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see .
--umi-start-mask-length INT
Number of additional bases to ignore from start of read. The default is 0. To reduce FP optionally set to 1.
--umi-end-mask-length INT
Number of additional bases to ignore from end of read. The default is 0. To reduce FP optionally set to 3.
--vc-enable-liquid-tumor-mode true
Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.
--vc-override-tumor-pcr-params-with-normal false
Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.
--vc-sq-filter-threshold $NUM
Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 4. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold $INT
Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 60. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold-in-hotspot $INT
Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 20. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-excluded-regions-bed $BED
Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome, and are often exceptionally noisy regions in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU bed files can be downloaded as part of the Bed File Collection:
--sv-min-scored-variant-size
After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.
--umi-source STRING
Specify the input type for the UMI sequence. Options: qname, fastq, bamtag.
--umi-library-type STRING
Set the batch option for different UMIs correction. Options: random-duplex, random-simplex, nonrandom-duplex.
--umi-nonrandom-whitelist $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.
--umi-correction-table $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.
--umi-min-supporting-reads INT
Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).
--umi-metrics-interval-file $BED
Target region in BED format.
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--vc-emit-ref-confidence GVCF
To enable gVCF output.
--vc-enable-vcf-output
To enable VCF file output during a gVCF run, set to true. The default value is false.
--vc-systematic-noise $PATH
Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.).
--vc-somatic-hotspots $PATH
DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.
--enable-hla
Enable HLA typer (this setting by default will only genotype class 1 genes)
--hla-as-filter-min-threshold
Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.
--hla-as-filter-ratio-threshold
Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.
--hla-enable-class-2
Extend genotyping to HLA class 2 genes (default=true).
--cnv-enable-gcbias-correction true
Enable or disable GC bias correction when generating target counts.
--cnv-segmentation-mode $SEG_MODE
Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.
--cnv-population-b-allele-vcf $POP_VCF
Specify a population SNP VCF. This option is available for both the germline and the somatic workflows. In germline it is only supported for WGS. In somatic, it can be used when a matched normal sample is not available and analysis must be performed in tumor-only mode.
--tmb-vaf-threshold FLOAT
Minimum variant allele frequency for usable variants (default=0.05).
--vc-callability-tumor-thresh INT
Required read coverage to use a site (default=50).
--tmb-enable-proxi-filter BOOL
Use variant vaf information to increase germline filtering. Recommended for TO, but not for TN. May be overly aggressive at tagging variants as germline (default=false).
--msi-coverage-threshold INT
Minimum coverage for a microsatellite: 60 (default)
--msi-distance-threshold FLOAT
Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default)
--sv-call-regions-bed
Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format.
--sv-exclusion-bed
Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.
--enable-variant-deduplication true
Relevant when both SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent the occurrence of replicated variants within genes such as FLT3 and KMT2A. Filter all small indels in the structural variant VCF that appear and are passing in the small variant VCF. DRAGEN will create a new VCF that contains variants in SV VCF that are not matching a variant from SNV VCF file. The new deduplicated SV VCF file will have the same prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.
--sv-systematic-noise $BEDPE
Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels.
--sv-somatic-ins-tandup-hotspot-regions-bed $BED
Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bps)
--sv-min-candidate-variant-size
Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.
--sv-min-scored-variant-size $INT
100000
WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FF
FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FFPE (only hg38)
WES_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WES FF and FFPE
WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz
For WGS, FF/FFPE
IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz
For WGS, HEME
--umi-emit-multiplicity both
Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see .
--umi-start-mask-length INT
Number of additional bases to ignore from start of read. The default is 0. To reduce FP optionally set to 1.
--umi-end-mask-length INT
Number of additional bases to ignore from end of read. The default is 0. To reduce FP optionally set to 3.
--tumor-normal-has-umi STRING
Specify if only the tumor, or if both the tumor and normal have UMIs. Options: 'both','tumor'.
--vc-sq-filter-threshold $NUM
Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 4. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold $INT
Threshold for the sensitivity-specificity tradeoff using the AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold-in-hotspot $INT
Threshold for the sensitivity-specificity tradeoff using the AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-excluded-regions-bed $BED
Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome, and are often exceptionally noisy regions in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU bed files can be downloaded as part of the Bed File Collection:
--sv-min-scored-variant-size
After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.
--enable-map-align true
In the TN pipeline this must be set to false for BAM/CRAM input.
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
--enable-fractional-down-sampler
Set to true to enable fractional downsampling. The default value is false.
--down-sampler-normal-subsample
Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).
--down-sampler-tumor-subsample
Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).
--down-sampler-random-seed
Specify the random seed for different runs of the same input data. The default value is 42.
--umi-source STRING
Specify the input type for the UMI sequence. Options: qname, fastq, bamtag.
--umi-library-type STRING
Set the batch option for different UMIs correction. Options: random-duplex, random-simplex, nonrandom-duplex.
--umi-nonrandom-whitelist $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.
--umi-correction-table $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.
--umi-min-supporting-reads INT
Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).
--umi-metrics-interval-file $BED
Target region in BED format.
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--vc-systematic-noise $PATH
Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.).
--vc-somatic-hotspots $PATH
DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.
--vc-enable-liquid-tumor-mode true
Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.
--vc-override-tumor-pcr-params-with-normal false
Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.
--enable-hla
Enable HLA typer (this setting by default will only genotype class 1 genes)
--hla-as-filter-min-threshold
Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.
--hla-as-filter-ratio-threshold
Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.
--hla-enable-class-2
Extend genotyping to HLA class 2 genes (default=true).
--cnv-enable-gcbias-correction true
Enable or disable GC bias correction when generating target counts.
--cnv-segmentation-mode $SEG_MODE
Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.
--cnv-normal-cnv-vcf $CNV_NORMAL_VCF
Specify germline CNVs from the matched normal sample. Germline-aware Mode.
--tmb-vaf-threshold FLOAT
Minimum variant allele frequency for usable variants (default=0.05).
--vc-callability-tumor-thresh INT
Required read coverage to use a site (default=50).
--tmb-enable-proxi-filter BOOL
Use variant vaf information to increase germline filtering. Recommended for TO, but not for TN. May be overly aggressive at tagging variants as germline (default=false).
--msi-coverage-threshold INT
Minimum coverage for a microsatellite: 60 (default)
--msi-distance-threshold FLOAT
Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default)
--sv-call-regions-bed
Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format.
--sv-exclusion-bed
Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.
--enable-variant-deduplication true
Relevant when both SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent the occurrence of replicated variants within genes such as FLT3 and KMT2A. Filter all small indels in the structural variant VCF that appear and are passing in the small variant VCF. DRAGEN will create a new VCF that contains variants in SV VCF that are not matching a variant from SNV VCF file. The new deduplicated SV VCF file will have the same prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.
--sv-systematic-noise $BEDPE
Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels.
--sv-somatic-ins-tandup-hotspot-regions-bed $BED
Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bps)
--sv-min-candidate-variant-size
Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.
--sv-enable-liquid-tumor-mode true
DRAGEN can account for Tumor-in-Normal (TiN) contamination by running liquid tumor mode.
--sv-tin-contam-tolerance $TIN_CONTAM_TOLERANCE
Set the Tumor-in-Normal (TiN) contamination tolerance level.
WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FF
FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FFPE (only hg38)
WES_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WES FF and FFPE
WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz
For WGS, FF/FFPE
IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz
For WGS, HEME
For DRAGEN somatic runs it is recommended to use the linear hashtable.
See: Product Files
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.
FQ list Input
FQ Input
BAM Input
CRAM Input
--enable-map-align true
Optionally disable map & align (default=true).
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
--enable-duplicate-marking true
By default, DRAGEN marks duplicate reads and excludes them from variant calling.
DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).
Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses it is also recommended to use a normal sample with coverage that is less than the tumor sample. If the matched normal has deeper coverage than the tumor sample, then the fractional downsampler may be used to reduce coverage on the normal sample.
--enable-fractional-down-sampler
Set to true to enable fractional downsampling. The default value is false.
--down-sampler-normal-subsample
Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).
--down-sampler-tumor-subsample
Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).
--down-sampler-random-seed
Specify the random seed for different runs of the same input data. The default value is 42.
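The recommendation above (keep normal coverage at or below tumor coverage) can be turned into a downsampling fraction. The following is a minimal sketch, not part of DRAGEN: the function names and the `target_ratio` parameter are hypothetical, and the mean coverage values are assumed to come from a previous run or an external estimate. Only the option names listed above are taken from this page.

```python
from typing import List


def normal_subsample_fraction(tumor_cov: float, normal_cov: float,
                              target_ratio: float = 1.0) -> float:
    """Fraction of normal reads to keep so that normal coverage is at most
    target_ratio * tumor coverage (hypothetical helper)."""
    if tumor_cov <= 0 or normal_cov <= 0:
        raise ValueError("coverage values must be positive")
    if not 0 < target_ratio <= 1:
        raise ValueError("target_ratio must be in (0, 1]")
    # Never upsample: cap the fraction at 1.0 (the documented default).
    return min(1.0, (tumor_cov * target_ratio) / normal_cov)


def downsampler_args(tumor_cov: float, normal_cov: float,
                     target_ratio: float = 1.0, seed: int = 42) -> List[str]:
    """Build the downsampler options described above as an argument list."""
    fraction = normal_subsample_fraction(tumor_cov, normal_cov, target_ratio)
    return [
        "--enable-fractional-down-sampler", "true",
        "--down-sampler-normal-subsample", f"{fraction:.4f}",
        "--down-sampler-random-seed", str(seed),
    ]
```

For example, a 120x normal paired with a 60x tumor gives a fraction of 0.5, while a normal that is already shallower than the tumor keeps the default of 1.0.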
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--vc-emit-ref-confidence GVCF
To enable gVCF output.
--vc-enable-vcf-output
To enable VCF file output during a gVCF run, set to true. The default value is false.
--vc-systematic-noise $PATH
Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.).
--vc-somatic-hotspots $PATH
DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.
High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports 3 main settings for improved sensitivity on low VAF variant calls.
--vc-target-vaf FLOAT
The default is 0.03 (3%). Set to e.g. 0.01 to improve SNV sensitivity on 1% VAF variants (assuming sufficient coverage).
--vc-enable-umi-solid true
Optimized for 1% and higher VAFs on UMI (or read position collapsed) samples with approx 300-1000X coverage.
--vc-enable-umi-liquid true
Optimized for 0.1% and higher VAFs on UMI samples with 1000X or higher coverage as expected in liquid biopsies.
For more detail on the small variant caller in somatic mode please refer to Somatic Mode
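As a rough illustration of how the three low-VAF settings relate to each other, the sketch below encodes the descriptions above as a rule of thumb. It is not an official decision rule: the function name and the coverage cut-offs (300 and 1000, read directly from the option descriptions) are assumptions, and any real choice should follow the Somatic Mode documentation.

```python
from typing import List, Optional


def low_vaf_args(has_umi: bool, mean_coverage: float,
                 target_vaf: Optional[float] = None) -> List[str]:
    """Suggest one low-VAF option based on the descriptions above (hypothetical)."""
    if mean_coverage <= 0:
        raise ValueError("mean_coverage must be positive")
    # UMI samples at 1000X or higher: described as the liquid biopsy setting.
    if has_umi and mean_coverage >= 1000:
        return ["--vc-enable-umi-liquid", "true"]
    # UMI (or read-position collapsed) samples at roughly 300-1000X.
    if has_umi and mean_coverage >= 300:
        return ["--vc-enable-umi-solid", "true"]
    # Otherwise only adjust the target VAF if the caller asked for it.
    if target_vaf is not None:
        if not 0 < target_vaf < 1:
            raise ValueError("target_vaf must be in (0, 1)")
        return ["--vc-target-vaf", str(target_vaf)]
    return []
```

The sketch returns at most one option; whether the settings can be combined is not stated on this page and is deliberately left out.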
--enable-hla
Enable HLA typer (this setting by default will only genotype class 1 genes)
--hla-as-filter-min-threshold
Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.
--hla-as-filter-ratio-threshold
Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.
--hla-enable-class-2
Extend genotyping to HLA class 2 genes (default=true).
--cnv-enable-gcbias-correction true
Enable or disable GC bias correction when generating target counts.
--cnv-segmentation-mode $SEG_MODE
Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.
--cnv-segmentation-bed $PATH
If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.
--cnv-population-b-allele-vcf $POP_VCF
Specify a population SNP VCF. This option is available for both the germline and the somatic workflows. In germline it is only supported for WGS. In somatic, it can be used when a matched normal sample is not available and analysis must be performed in tumor-only mode.
For more information, see CNV Calling.
For instructions on how to download the Nirvana annotation database, please refer to Nirvana
The Tumor-Normal pipeline is more effective than the Tumor-Only pipeline at removing or tagging germline variants. The Tumor-Only pipeline may consequently report somewhat elevated TMB values. The TMB proxi filter is an optional setting on top of the regular database germline filter. It will aggressively filter additional germline variants based on allele frequencies.
--tmb-vaf-threshold FLOAT
Minimum variant allele frequency for usable variants (default=0.05).
--vc-callability-tumor-thresh INT
Required read coverage to use a site (default=50).
--tmb-enable-proxi-filter BOOL
Use variant vaf information to increase germline filtering. Recommended for TO, but not for TN. May be overly aggressive at tagging variants as germline (default=false).
See the user guide: TMB Germline Variants.
Microsatellite sites file can be downloaded here: Product Files.
--msi-coverage-threshold INT
Minimum coverage for a microsatellite: 60 (default)
--msi-distance-threshold FLOAT
Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default)
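To make the two MSI thresholds above concrete, the sketch below computes a Jensen-Shannon distance between tumor and normal repeat-length histograms for one microsatellite site and applies both cut-offs. This is an illustration only, not DRAGEN's implementation: the function names are hypothetical, the log base (2, giving a distance between 0 and 1) is an assumption, and coverage is taken to be the number of reads in each histogram.

```python
import math
from typing import Dict


def js_distance(p_counts: Dict[int, int], q_counts: Dict[int, int]) -> float:
    """Jensen-Shannon distance (square root of the base-2 JS divergence)
    between two repeat-length count histograms."""
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    if p_total == 0 or q_total == 0:
        raise ValueError("both histograms need at least one read")
    divergence = 0.0
    for length in set(p_counts) | set(q_counts):
        p = p_counts.get(length, 0) / p_total
        q = q_counts.get(length, 0) / q_total
        m = 0.5 * (p + q)
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return math.sqrt(max(divergence, 0.0))


def site_is_unstable(tumor: Dict[int, int], normal: Dict[int, int],
                     min_coverage: int = 60, min_distance: float = 0.1) -> bool:
    """Apply the coverage and distance thresholds described above to one site."""
    if sum(tumor.values()) < min_coverage or sum(normal.values()) < min_coverage:
        return False  # site does not have enough coverage to be assessed
    return js_distance(tumor, normal) >= min_distance
```

Identical histograms give a distance of 0 and fully disjoint histograms give 1, so the default of 0.1 flags sites whose tumor length distribution has shifted noticeably away from the normal.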
--sv-call-regions-bed
Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format.
--sv-exclusion-bed
Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.
--enable-variant-deduplication true
Relevant when both SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent the occurrence of replicated variants within genes such as FLT3 and KMT2A. Filter all small indels in the structural variant VCF that appear and are passing in the small variant VCF. DRAGEN will create a new VCF that contains variants in SV VCF that are not matching a variant from SNV VCF file. The new deduplicated SV VCF file will have the same prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.
--sv-systematic-noise $BEDPE
Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels.
--sv-somatic-ins-tandup-hotspot-regions-bed $BED
Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bps)
--sv-min-candidate-variant-size
Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.
--sv-min-scored-variant-size $INT
100000
For more information, see Structural Variant Calling.
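The variant deduplication step described above can be pictured as a set difference between the two call sets. The sketch below is a simplified illustration under stated assumptions, not DRAGEN's implementation: records are plain dictionaries with hypothetical keys (`chrom`, `pos`, `ref`, `alt`, `filter`), and both inputs are assumed to be already normalized (trimmed and left-shifted), which DRAGEN performs internally.

```python
from typing import Dict, Iterable, List


def deduplicate_sv_indels(sv_records: Iterable[Dict],
                          snv_records: Iterable[Dict]) -> List[Dict]:
    """Drop SV-caller records that match a passing small-variant record."""
    # Only passing small variants are used to remove SV-caller duplicates.
    passing = {
        (r["chrom"], r["pos"], r["ref"], r["alt"])
        for r in snv_records
        if r["filter"] == "PASS"
    }
    return [
        r for r in sv_records
        if (r["chrom"], r["pos"], r["ref"], r["alt"]) not in passing
    ]
```

An SV-caller indel is kept when its small-variant counterpart is absent or filtered, which matches the description that only small indels that "appear and are passing" in the small variant VCF are removed.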
DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.
Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.
DRAGEN has prebuilt systematic noise files for WES/WGS. For high sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.
Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.
WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FF
FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FFPE (only hg38)
WES_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WES FF and FFPE
This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.
For WES and WGS pipelines, gather the full paths to the small variant hard-filtered VCFs (not GVCFs) from step 1 and create a list file ${VCF_LIST} with one path per line.
Step 2. Generate the final noise file.
This step generates a BED file containing mean and max noise estimates per position. This can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions that could be troubleshot (e.g. by modifying assay settings or DRAGEN settings) or blocklisted.
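The list file needed between step 1 and step 2 can be assembled with a short script. The following is a minimal sketch: the function name is hypothetical, and the file name pattern `*.hard-filtered.vcf.gz` is an assumption about how the step 1 outputs are named, so adjust it to match the actual output files.

```python
from pathlib import Path


def write_vcf_list(results_root, list_path, pattern="*.hard-filtered.vcf.gz"):
    """Write the full path of every matching VCF under results_root,
    one per line, and return the number of files listed."""
    root = Path(results_root)
    # The default pattern matches hard-filtered VCFs but not *.gvcf.gz files.
    vcfs = sorted(p.resolve() for p in root.rglob(pattern) if p.is_file())
    if not vcfs:
        raise FileNotFoundError(f"no files matching {pattern} under {root}")
    Path(list_path).write_text("".join(f"{p}\n" for p in vcfs))
    return len(vcfs)
```

The returned count is a quick check that the number of listed VCFs matches the number of normal samples run in step 1.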
The SNV systematic noise files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.
SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available. SV systematic noise files are considered experimental in WES and Panels.
Prebuilt WGS SV systematic noise files can be downloaded here: Product Files.
WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz
For WGS, FF/FFPE
IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz
For WGS, HEME
Custom systematic noise files can be generated for WES or Panels. For best accuracy the normal samples should ideally closely match the sequencer, sample type, library prep and coverage of the tumor samples of interest. It is typically recommended to use 30 - 100 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.
Step 2. Build the BEDPE file using input VCFs from previous step.
Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.
For CNV PON requirements and generation options see CNV Preprocessing | Panel of Normals.
If a matched normal is available it is recommended to include it in the PON.
Step 1. Generate CNV target counts of individual normal samples.
Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.
Step 2. CNV combined counts file generation.
$CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.
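The $CNV_NORMALS_LIST file described above can be generated from the step 1 output directory. The sketch below is a hypothetical helper, not a DRAGEN tool; it relies only on the two file suffixes named above and selects one type consistently, since the target counts options should match those used for the case samples.

```python
from pathlib import Path


def write_cnv_normals_list(counts_dir, list_path, gc_corrected=True):
    """List target counts files (one full path per line) for PON generation
    and return the number of files listed."""
    suffix = (".target.counts.gc-corrected.gz" if gc_corrected
              else ".target.counts.gz")
    files = sorted(
        p.resolve() for p in Path(counts_dir).rglob("*" + suffix) if p.is_file()
    )
    if not files:
        raise FileNotFoundError(f"no *{suffix} files under {counts_dir}")
    Path(list_path).write_text("".join(f"{p}\n" for p in files))
    return len(files)
```

Samples that should be excluded from the PON can simply be removed from the written list before step 2.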
A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.
For DRAGEN somatic runs it is recommended to use the linear hashtable.
See: Product Files
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.
FQ list Input
FQ Input
BAM Input
CRAM Input
DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).
Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses it is also recommended to use a normal sample with coverage that is less than the tumor sample. If the matched normal has deeper coverage than the tumor sample, then the fractional downsampler may be used to reduce coverage on the normal sample.
High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports 3 main settings for improved sensitivity on low VAF variant calls.
For more detail on the small variant caller in somatic mode, please refer to Somatic Mode.
For more information, see CNV Calling.
For instructions on how to download the Nirvana annotation database, please refer to Nirvana.
The Tumor-Normal pipeline is more effective than the Tumor-Only pipeline at removing or tagging germline variants. The Tumor-Only pipeline may consequently report somewhat elevated TMB values. The TMB proxi filter is an optional setting on top of the regular database germline filter. It will aggressively filter additional germline variants based on allele frequencies.
See the user guide: TMB Germline Variants.
Microsatellite sites file can be downloaded here: Product Files.
For panels it is recommended to post-process the file by intersecting the WES or WGS sites with the manifest. This will avoid using any off-target reads in the MSI analysis. For small panels it may be required to generate custom site files to ensure the panel covers at least 2000 sites. To generate custom MSI site files refer to the MSI Biomarker section in the user guide.
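The post-processing described above amounts to keeping only the microsatellite sites that fall inside the panel manifest and then checking the remaining site count. The sketch below shows one way to do this in plain Python. It makes several assumptions that are not stated on this page: both files are tab-separated with chromosome, start, and end in the first three columns, a site is kept only when it lies entirely within a manifest region, and all function names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

MIN_MSI_SITES = 2000  # minimum number of covered sites mentioned above


def parse_bed3(lines):
    """Yield (chrom, start, end) from BED-like lines, skipping headers."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        yield chrom, int(start), int(end)


def merge_regions(regions):
    """Group regions by chromosome and merge overlapping intervals."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def sites_on_target(sites, manifest):
    """Keep sites fully contained in a (merged) manifest region."""
    merged = merge_regions(manifest)
    kept = []
    for chrom, start, end in sites:
        intervals = merged.get(chrom, [])
        # Last merged interval whose start is <= the site start.
        i = bisect_right(intervals, (start, float("inf"))) - 1
        if i >= 0 and intervals[i][0] <= start and end <= intervals[i][1]:
            kept.append((chrom, start, end))
    return kept


def has_enough_sites(kept, minimum=MIN_MSI_SITES):
    """True when the panel retains at least the recommended number of sites."""
    return len(kept) >= minimum
```

If `has_enough_sites` returns False for a small panel, a custom site file is likely needed, as described in the MSI Biomarker section of the user guide.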
For more information, see Structural Variant Calling.
DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.
Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.
DRAGEN has prebuilt systematic noise files for WES/WGS. For high sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.
Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.
This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.
For panels we create GVCF files. Gather the full paths to the small variant caller hard filtered GVCFs (not VCFs) from step 1 and create an input file ${GVCF_LIST} by specifying 1 file per line.
Step 2. Generate the final noise file.
This step generates a BED file containing mean and max noise estimates per position. This can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions that could be troubleshot (e.g. by modifying assay settings or DRAGEN settings) or blocklisted.
The SNV systematic noise files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.
SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available. SV systematic noise files are considered experimental in WES and Panels.
Prebuilt WGS SV systematic noise files can be downloaded here: Product Files.
Custom systematic noise files can be generated for WES or Panels. For best accuracy the normal samples should ideally closely match the sequencer, sample type, library prep and coverage of the tumor samples of interest. It is typically recommended to use 30 - 100 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.
Step 2. Build the BEDPE file using input VCFs from previous step.
Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.
For CNV PON requirements and generation options see CNV Preprocessing | Panel of Normals.
If a matched normal is available it is recommended to include it in the PON.
Step 1. Generate CNV target counts of individual normal samples.
Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.
Step 2. CNV combined counts file generation.
$CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.
A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.
For DRAGEN somatic runs it is recommended to use the linear hashtable.
See:
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using .
FQ list Input
FQ Input
BAM Input
CRAM Input
DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).
Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses it is also recommended to use a normal sample with coverage that is less than the tumor sample. If the matched normal has deeper coverage than the tumor sample, then the fractional downsampler may be used to reduce coverage on the normal sample.
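The fraction to keep follows from simple arithmetic on the two coverages. The sketch below is illustrative only: the function name and the `target_ratio` parameter are hypothetical, and the result still has to be passed to the downsampler through whatever option your DRAGEN version documents.

```python
def normal_subsample_fraction(tumor_coverage, normal_coverage, target_ratio=1.0):
    """Fraction of normal reads to keep so that the downsampled normal
    coverage is at most target_ratio times the tumor coverage.

    Returns 1.0 (keep everything) when the normal is already shallow enough.
    """
    if tumor_coverage <= 0 or normal_coverage <= 0:
        raise ValueError("coverages must be positive")
    return min(1.0, target_ratio * tumor_coverage / normal_coverage)


# A 120x normal paired with a 60x tumor: keep half of the normal reads.
fraction = normal_subsample_fraction(tumor_coverage=60, normal_coverage=120)
```

Choosing a `target_ratio` below 1.0 leaves the normal strictly shallower than the tumor, which matches the recommendation above.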
High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports 3 main settings for improved sensitivity on low VAF variant calls.
For more detail on the small variant caller in somatic mode please refer to
For more information, see .
For instructions on how to download the Nirvana annotation database, please refer to
See the user guide: .
Microsatellite sites file can be downloaded here: .
For more information, see .
DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.
Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.
DRAGEN has prebuilt systematic noise files for WES/WGS. For high sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.
Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: .
This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.
For WES and WGS pipelines, gather the full paths to the small variant hard filtered VCFs (not GVCFs) from step 1 and create a list file ${VCF_LIST} by specifying 1 file per line.
Step 2. Generate the final noise file.
This step generates a BED file containing mean and max noise estimates per position. This can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions that could be investigated (e.g. by modifying assay settings or DRAGEN settings) or blocklisted.
The SNV systematic noise files can also be built in the cloud using the or the DRAGEN Systematic Noise File Builder Pipeline on .
SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available. SV systematic noise files are considered experimental in WES and Panels.
Prebuilt WGS SV systematic noise files can be downloaded here: .
Custom systematic noise files can be generated for WES or Panels. For best accuracy the normal samples should ideally closely match the sequencer, sample type, library prep and coverage of the tumor samples of interest. It is typically recommended to use 30 - 100 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.
Step 2. Build the BEDPE file using input VCFs from previous step.
Systematic noise BEDPE files can also be built in the cloud using the or the DRAGEN Systematic Noise File Builder Pipeline on .
For CNV PON requirements and generation options see .
If a matched normal is available it is recommended to include it in the PON.
Step 1. Generate CNV target counts of individual normal samples.
Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.
Step 2. CNV combined counts file generation.
$CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.
A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.
For DRAGEN somatic runs it is recommended to use the linear hashtable.
See:
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using .
FQ list Input
FQ Input
BAM Input
CRAM Input
DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).
Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses it is also recommended to use a normal sample with coverage that is less than the tumor sample. If the matched normal has deeper coverage than the tumor sample, then the fractional downsampler may be used to reduce coverage on the normal sample.
For more information see: .
High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports 3 main settings for improved sensitivity on low VAF variant calls.
For more detail on the small variant caller in somatic mode please refer to
For more information, see .
For instructions on how to download the Nirvana annotation database, please refer to
The Tumor-Normal pipeline is more effective than the Tumor-Only pipeline at removing or tagging germline variants. The Tumor-Only may subsequently report somewhat elevated TMB values. The TMB proxi filter is an optional setting on top of the regular database germline filter. It will aggressively filter additional germline variants based on allele frequencies.
See the user guide: .
Microsatellite sites file can be downloaded here: .
For panels it is recommended to post-process the file by intersecting the WES or WGS sites with the manifest. This will avoid using any off-target reads in the MSI analysis. For small panels it may be required to generate custom site files to ensure the panel covers at least 2000 sites. To generate custom MSI site files refer to the MSI Biomarker section in the user guide.
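The post-processing step above amounts to an interval intersection followed by a site count. The following sketch assumes both the microsatellite sites and the panel manifest have already been parsed into `(chrom, start, end)` tuples with half-open coordinates; the real file formats, the function names, and the rule of keeping only fully contained sites are assumptions for illustration.

```python
from collections import defaultdict

MIN_SITES = 2000  # minimum number of covered sites quoted in the text


def sites_on_target(sites, targets):
    """Keep only sites that lie entirely inside at least one target interval."""
    by_chrom = defaultdict(list)
    for chrom, start, end in targets:
        by_chrom[chrom].append((start, end))
    kept = []
    for chrom, start, end in sites:
        intervals = by_chrom.get(chrom, [])
        if any(t_start <= start and end <= t_end for t_start, t_end in intervals):
            kept.append((chrom, start, end))
    return kept


def has_enough_sites(kept_sites, minimum=MIN_SITES):
    """True when the panel covers at least the minimum number of sites."""
    return len(kept_sites) >= minimum
```

If `has_enough_sites` returns False for a small panel, that is the case in which a custom site file would be needed.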
For more information, see .
DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.
Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.
DRAGEN has prebuilt systematic noise files for WES/WGS. For high sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.
Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: .
This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.
For panels we create GVCF files. Gather the full paths to the small variant caller hard filtered GVCFs (not VCFs) from step 1 and create an input file ${GVCF_LIST} by specifying 1 file per line.
Step 2. Generate the final noise file.
This step generates a BED file containing mean and max noise estimates per position. This can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions that could be investigated (e.g. by modifying assay settings or DRAGEN settings) or blocklisted.
The SNV systematic noise files can also be built in the cloud using the or the DRAGEN Systematic Noise File Builder Pipeline on .
SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available. SV systematic noise files are considered experimental in WES and Panels.
Prebuilt WGS SV systematic noise files can be downloaded here: .
Custom systematic noise files can be generated for WES or Panels. For best accuracy the normal samples should ideally closely match the sequencer, sample type, library prep and coverage of the tumor samples of interest. It is typically recommended to use 30 - 100 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.
Step 2. Build the BEDPE file using input VCFs from previous step.
Systematic noise BEDPE files can also be built in the cloud using the or the DRAGEN Systematic Noise File Builder Pipeline on .
For CNV PON requirements and generation options see .
If a matched normal is available it is recommended to include it in the PON.
Step 1. Generate CNV target counts of individual normal samples.
Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.
Step 2. CNV combined counts file generation.
$CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.
The seed-density option controls how many (normally overlapping) primary seeds from each read the mapper looks up in its hash table for exact matches. The maximum density value of 1.0 generates a seed starting at every position in the read, ie, (L-K+1) K-base seeds from an L-base read.
Seed density must be between 0.0 and 1.0. Internally, an available seed pattern equal or close to the requested density is selected. The sparsest pattern is one seed per 32 positions, or density 0.03125.
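To make the seed counts concrete, the sketch below computes the maximum number of seeds for a read and a rough estimate of how many are looked up at a given density. The exact set of seed patterns DRAGEN supports is not listed here, so the rounding used in `approx_seed_lookups` is an assumption, as are the function names.

```python
MIN_DENSITY = 1 / 32  # sparsest pattern: one seed per 32 positions (0.03125)


def max_seed_count(read_len, seed_len):
    """Seeds generated at density 1.0: (L - K + 1) K-base seeds from an L-base read."""
    return max(0, read_len - seed_len + 1)


def approx_seed_lookups(read_len, seed_len, density):
    """Rough number of seed lookups at the requested density (illustrative only)."""
    if not 0.0 < density <= 1.0:
        raise ValueError("seed density must be greater than 0.0 and at most 1.0")
    effective = max(density, MIN_DENSITY)
    total = max_seed_count(read_len, seed_len)
    return 0 if total == 0 else max(1, round(total * effective))


# A 150-base read with 21-base seeds has 130 possible seed positions;
# the default 50% density looks up roughly 65 of them.
lookups = approx_seed_lookups(150, 21, 0.5)
```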
Accuracy Considerations--Generally, denser seed lookup patterns improve mapping accuracy. However, for modestly long reads (eg, 50 bp+) and low sequencer error rates, there is little to be gained beyond the default 50% seed lookup density.
Speed Considerations--Denser seed lookup patterns generally slow down mapping, and sparser seed patterns speed it up. However, when the seed mapping stage can run faster than the aligning stage, a sparser seed pattern does not make the mapper much faster.
Relationship to Reference Seed Interval
Functionally, a denser or sparser seed lookup pattern has an impact very similar to a shorter or longer reference seed interval (build hash table option --ht-ref-seed-interval). Populating 100% of reference seed positions and looking up 50% of read seed positions has the same effect as populating 50% of reference seed positions and looking up 100% of read seed positions. Either way, the expected density of seed hits is 50%.
More generally, the expected density of seed hits is the product of the reference seed density (the inverse of the reference seed interval) and the seed lookup density. For example, if 50% of reference seeds are populated and 33.3% (1/3) of read seed positions are looked up, then the expected seed hit density should be 16.7% (1/6).
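The product rule above can be checked with exact fractions. This is a small sketch; the function name is illustrative.

```python
from fractions import Fraction


def expected_seed_hit_density(ref_seed_interval, lookup_density):
    """Reference seed density (1 / interval) multiplied by the read seed lookup density."""
    if ref_seed_interval < 1:
        raise ValueError("reference seed interval must be at least 1")
    return Fraction(1, ref_seed_interval) * Fraction(lookup_density)


# 50% of reference seeds populated (interval 2), 1/3 of read positions looked up.
density = expected_seed_hit_density(2, Fraction(1, 3))  # Fraction(1, 6)
```

The same function shows the symmetry described earlier: an interval of 1 with lookup density 1/2 and an interval of 2 with lookup density 1 both give 1/2.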
DRAGEN automatically adjusts its precise seed lookup pattern to ensure it does not systematically miss the seed positions populated from the reference. For example, the mapper does not look up seeds matching only odd positions in the reference when only even positions are populated in the hash table, even if the reference seed interval is 2 and seed-density is 0.5.
The --Mapper.map-orientations option is used in mapping reads for bisulfite methylation analysis. It is set automatically based on the value set for --methylation-protocol.
The --Mapper.map-orientations option can restrict the orientation of read mapping to only forward in the reference genome, or only reverse-complemented. The valid values for --Mapper.map-orientations are as follows.
0--Either orientation (default)
1--Only forward mapping
2--Only reverse-complemented mapping
If mapping orientations are restricted and paired end reads are used, the expected pair orientation can only be FR, not FF or RF.
Although DRAGEN primarily maps reads by finding exact reference matches to short seeds, it can also map seeds differing from the reference by one nucleotide by also looking up single-SNP edited seeds. Seed editing is usually not necessary with longer reads (100 bp+), because longer reads have a high probability of containing at least one exact seed match. This is especially true when paired ends are used, because a seed match from either mate can successfully align the pair. But seed editing can, for example, be useful to increase mapping accuracy for short single-ended reads, with some cost in increased mapping time. The following options control seed editing:
Seed Editing Options
edit-mode and edit-chain-limit
The edit-mode and edit-chain-limit options control when seed editing is used. The following four edit-mode values are available:
Edit mode 0 requires all seeds to match exactly. Mode 3 is the most expensive because every seed that fails to match the reference exactly is edited. Modes 1 and 2 employ heuristics to look up edited seeds only for reads most likely to be salvaged to accurate mapping.
The main heuristic in edit modes 1 and 2 is a seed chain length test. Exact seeds are mapped to the reference in a first pass over a given read, and the matching seeds are grouped into chains of similarly aligning seeds. If the longest seed chain (in the read) exceeds a threshold edit-chain-limit, the read is judged not to require seed editing, because there is already a promising mapping position.
Edit mode 1 triggers seed editing for a given read using the seed chain length test. If no seed chain exceeds edit-chain-limit (including if no exact seeds match), then a second seed mapping pass is attempted using edited seeds. Edit mode 2 further optimizes the heuristic for paired-end reads. If either mate has an exact seed chain longer than edit-chain-limit, then seed editing is disabled for the pair, because a rescue scan is likely to recover the mate alignment based on seed matches from one read. Edit mode 2 is the same as mode 1 for single-ended reads.
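The decision logic for the four edit modes can be summarized as a small predicate. This is a sketch of the heuristics as described in the text, not DRAGEN's implementation; the function and parameter names are hypothetical.

```python
def seed_editing_allowed(edit_mode, longest_chain, edit_chain_limit,
                         mate_longest_chain=None):
    """Return True when edited seeds would be looked up for this read.

    longest_chain is the longest exact-seed chain found in the first pass
    (0 if no exact seed matched). mate_longest_chain is the same value for
    the mate, or None for single-ended reads.
    """
    if edit_mode == 0:
        return False  # all seeds must match exactly
    if edit_mode == 3:
        return True   # every seed that fails to match exactly is edited
    if edit_mode not in (1, 2):
        raise ValueError("edit-mode must be 0, 1, 2, or 3")
    if longest_chain > edit_chain_limit:
        return False  # a promising mapping position already exists
    if edit_mode == 2 and mate_longest_chain is not None:
        # Paired-end optimization: a long chain on the mate is enough,
        # because a rescue scan is likely to recover this read.
        if mate_longest_chain > edit_chain_limit:
            return False
    return True
```

For single-ended reads (`mate_longest_chain=None`), modes 1 and 2 give the same answer, as stated above.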
edit-seed-num and edit-read-len
For edit modes 1 and 2, when the heuristic triggers seed editing, these options control how many seed positions are edited in the second pass over the read. Although exact seed mapping can use a densely overlapping seed pattern, such as seeds starting at 50% or 100% of read positions, most of the value of seed editing can be obtained by editing a much sparser pattern of seeds, even a nonoverlapping pattern. Generally, if a user application can afford to spend some additional amount of mapping time on seed editing, a greater increase in mapping accuracy can be obtained for the same time cost by editing seeds in sparse patterns for a large number of reads, than by editing seeds in dense patterns for a small number of reads.
Whenever seed editing is triggered, these two options request edit-seed-num seed editing positions, distributed evenly over the first edit-read-len bases of the read. For example, with 21-base seeds, edit-seed-num=6 and edit-read-len=100, edited seeds can begin at offsets {0, 16, 32, 48, 64, 80} from the 5' end, consecutive seeds overlapping by 5 bases. Because sequencing technologies often yield better base qualities nearer the (5') beginning of each read, this can focus seed editing where it is most likely to succeed. When a particular read is shorter than edit-read-len, fewer seeds are edited.
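The offsets in the example above can be reproduced with integer arithmetic. The spacing rule used here (`edit_read_len // edit_seed_num`) is an assumption chosen because it matches the documented example; DRAGEN's exact placement rule is not stated in the text.

```python
def edit_seed_offsets(edit_seed_num, edit_read_len):
    """Evenly spaced seed start offsets from the 5' end of the read (assumed rule)."""
    step = edit_read_len // edit_seed_num
    return [i * step for i in range(edit_seed_num)]


def seed_overlap(seed_len, edit_seed_num, edit_read_len):
    """Bases shared by consecutive edited seeds (negative means a gap between them)."""
    return seed_len - edit_read_len // edit_seed_num


offsets = edit_seed_offsets(edit_seed_num=6, edit_read_len=100)
overlap = seed_overlap(seed_len=21, edit_seed_num=6, edit_read_len=100)
```

With these inputs the offsets are 0, 16, 32, 48, 64, and 80, and consecutive 21-base seeds overlap by 5 bases.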
Seed editing is more expensive when the reference seed interval (build hash table option --ht-ref-seed-interval) is greater than 1. For edit modes 1 and 2, additional seed editing positions are automatically generated to avoid missing the populated reference seed positions. For edit mode 3, the time cost can increase dramatically because query seeds matching unpopulated reference positions typically miss and trigger editing.
The first stage of mapping is to generate seeds from the read and look for exact matches in the reference genome. These results are then refined by running full Smith-Waterman alignments on the locations with the highest density of seed matches. This well-documented algorithm works by comparing each position of the read against all the candidate positions of the reference. These comparisons correspond to a matrix of potential alignments between read and reference. For each of these candidate alignment positions, Smith-Waterman generates scores that are used to evaluate whether the best alignment passing through that matrix cell reaches it by a nucleotide match or mismatch (diagonal movement), a deletion (horizontal movement), or an insertion (vertical movement). A match between read and reference adds a bonus to the score, and a mismatch or indel imposes a penalty. The overall highest scoring path through the matrix is the alignment chosen.
The specific values chosen for scores in this algorithm indicate how to balance, for an alignment with multiple possible interpretations, the possibility of an indel as opposed to one or more SNPs, or the preference for an alignment without clipping. The default DRAGEN scoring values are reasonable for aligning moderate length reads to a whole human reference genome for variant calling applications. But any set of Smith-Waterman scoring parameters represents an imprecise model of genomic mutation and sequencing errors, and differently tuned alignment scoring values can be more appropriate for some applications.
The following alignment options control Smith-Waterman Alignment:
global
The global option (value can be 0 or 1) controls whether alignment is forced to be end-to-end in the read. When set to 1, alignments are always end-to-end, as in the Needleman-Wunsch global alignment algorithm (although not end-to-end in the reference), and alignment scores can be positive or negative. When set to 0, alignments can be clipped at either or both ends of the read, as in the Smith-Waterman local alignment algorithm, and alignment scores are nonnegative. Generally, global=0 is preferred for longer reads, so significant read segments after a break of some kind (large indel, structural variant, chimeric read, and so forth) can be clipped without severely decreasing the alignment score. Setting global=1 might not have the desired effect with longer reads because insertions at or near the ends of a read can function as pseudoclipping. Also, with global=0, multiple (chimeric) alignments can be reported when various portions of a read match widely separated reference positions. Using global=1 is sometimes preferable with short reads, which are unlikely to overlap structural breaks, unable to support chimeric alignments, and are suspected of incorrect mapping if they cannot align well end-to-end. Consider using the unclip-score option, or increasing it, instead of setting global=1, to make a soft preference for unclipped alignments.
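The difference between the two settings can be seen in a compact dynamic-programming sketch. This is a textbook formulation with a single linear gap penalty, not DRAGEN's hardware implementation, and the scoring values are illustrative rather than DRAGEN defaults. Rows follow the read and columns follow the reference, so a vertical move is an insertion and a horizontal move is a deletion, as described above.

```python
def best_alignment_score(read, ref, match=1, mismatch=-4, gap=-6, global_read=False):
    """Best score of read against ref.

    global_read=False: local alignment, read ends may be clipped, score >= 0.
    global_read=True: the whole read must align (end-to-end in the read only),
    so the score can be negative. The reference is never forced end-to-end.
    """
    prev = [0] * (len(ref) + 1)  # alignment may start anywhere in the reference
    best_local = 0
    for i, read_base in enumerate(read, start=1):
        # First column: read bases consumed with no reference base (insertions).
        cur = [i * gap if global_read else 0]
        for j, ref_base in enumerate(ref, start=1):
            diagonal = prev[j - 1] + (match if read_base == ref_base else mismatch)
            vertical = prev[j] + gap       # insertion: read base absent from reference
            horizontal = cur[j - 1] + gap  # deletion: reference base absent from read
            score = max(diagonal, vertical, horizontal)
            if not global_read:
                score = max(score, 0)      # clipping resets a negative prefix
            cur.append(score)
        best_local = max(best_local, max(cur))
        prev = cur
    # Global-in-read: alignment may end anywhere in the reference.
    return max(prev) if global_read else best_local
```

For a read whose tail does not match the reference, the local score stays at the score of the matching segment, while the end-to-end score is dragged down by the penalties for the unmatched tail.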
DRAGEN can process paired-end data passed via a pair of FASTQ files or in a single interleaved FASTQ file. The hardware maps the two ends separately, and then determines a set of alignments that seem most likely to form a pair in the expected orientation and having roughly the expected insert size. The alignments for the two ends are evaluated for the quality of their pairing, with larger penalties for insert sizes far from the expected size. The following options control processing of paired-end data:
Reorientation
The pe-orientation option specifies the expected paired-end orientation. Only pairs with this orientation can be flagged as proper pairs. Valid values are as follows:
0--FR (default)
1--RF
The pe-max-penalty option limits how much the estimated MAPQ for one read can increase because its mate aligned nearby. A paired alignment is never assigned MAPQ higher than the MAPQ that it would have received mapping single-ended, plus this value. By default, pe-max-penalty = mapq-max = 255, effectively disabling this limit. The key difference between unpaired-pen and pe-max-penalty is that unpaired-pen affects calculated pair scores and thus which alignments are selected and pe-max-penalty affects only reported MAPQ for paired alignments.
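The cap described above can be expressed in one line. In this sketch the function name is hypothetical and the option values are passed explicitly rather than relying on defaults.

```python
def reported_pair_mapq(paired_mapq, single_end_mapq, pe_max_penalty, mapq_max):
    """MAPQ reported for a paired alignment.

    The pairing bonus is limited to pe_max_penalty above the MAPQ the read
    would have received single-ended, and the result never exceeds mapq_max.
    """
    return min(paired_mapq, single_end_mapq + pe_max_penalty, mapq_max)


# A read that would score MAPQ 20 alone, but 60 thanks to its mate,
# is reported at 30 when pe-max-penalty is 10.
capped = reported_pair_mapq(paired_mapq=60, single_end_mapq=20,
                            pe_max_penalty=10, mapq_max=60)
```

With pe-max-penalty set to 255 the middle term never binds for MAPQ values in the 0 to 255 range, which is why that setting effectively disables the limit.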
When working with paired-end data, DRAGEN must choose among the highest-quality alignments for the two ends to try to choose likely pairs. To make this choice, DRAGEN uses a skew normal insert model to evaluate the likelihood that a pair of alignments constitutes a pair. This model is based on the observation that common library preparation methods have insert-size distributions that are sometimes close to normal, but also sometimes clearly asymmetric, often skewing toward longer insert sizes. The skew normal insert model is used only for the DNA mode.
If you know the statistics of your library prep for an input file (and the file consists of a single read group), you can specify the characteristics of the insert-length distribution: mean, standard deviation, shape (or skewness) and three quartiles. These characteristics can be specified with the Aligner.pe-stat-mean-insert, Aligner.pe-stat-stddev-insert, Aligner.pe-stat-shape-insert, Aligner.pe-stat-quartiles-insert, and Aligner.pe-stat-mean-read-len options. However, it is typically preferable to allow DRAGEN to detect these characteristics automatically.
DRAGEN automatically samples the insert-length distribution. When the software starts execution, it runs a sample of up to 2,000,000 pairs through the aligner, calculates the distribution, and then uses the resulting statistics for evaluating all pairs in the input set.
The DRAGEN host software reports the statistics in its stdout log in a report, as follows:
Note that the Mean, Standard deviation and Quartiles reported above are the sample mean, standard deviation and quartiles calculated from the initial sample of up to 2,000,000 pairs, assuming a normal distribution. The sample mean and standard deviation are used to fit the parameters of a skew-normal distribution. A skew-normal distribution is defined by starting with an underlying normal distribution (whose mean we call position or xi and standard deviation we call scale or omega) and folding a varying portion of the probability mass from one side of the mean (e.g., left side) to the other (e.g., right) side. The portion folded varies smoothly, from 0% at the original mean, approaching 100% from the left tail to the right tail. A shape parameter which we call alpha controls how rapidly the folded fraction increases, and at alpha=0 there is no folding and the distribution remains normal.
The standard output also includes the command line options needed to reproduce the DRAGEN run with the same insert stat settings. Note that when specifying stats on the command line, the skew-normal xi value should be used for Aligner.pe-stat-mean-insert, the omega value should be used for Aligner.pe-stat-stddev-insert, and the alpha value should be used for Aligner.pe-stat-shape-insert. If Aligner.pe-stat-shape-insert is not specified on the command line, a default value of 0 is assumed.
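For reference, the standard skew-normal density with position xi, scale omega, and shape alpha can be evaluated as follows. This is the textbook formula, offered as a sketch of the model described above; how DRAGEN fits or evaluates the distribution internally is not specified here.

```python
import math


def skew_normal_pdf(x, xi, omega, alpha):
    """Skew-normal density: (2 / omega) * phi(z) * Phi(alpha * z), z = (x - xi) / omega."""
    if omega <= 0:
        raise ValueError("omega (scale) must be positive")
    z = (x - xi) / omega
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)      # standard normal pdf
    big_phi = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))  # standard normal cdf
    return 2.0 / omega * phi * big_phi


def normal_pdf(x, mean, stddev):
    """Plain normal density, used to check the alpha = 0 case."""
    z = (x - mean) / stddev
    return math.exp(-0.5 * z * z) / (stddev * math.sqrt(2.0 * math.pi))
```

With alpha=0 the second factor is always 0.5 and the density reduces to the normal distribution. A positive alpha shifts probability mass toward longer insert sizes.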
The insert length distribution for each sample is written to fragment_length_hist.csv. Each sample starts with the following lines
These lines are followed by the histogram for the first ~2M read pairs for DNA (~100K read pairs for RNA). The histogram counts are aggregated across all read groups sharing the same sample id (RGSM field).
When the number of sample pairs is very small, there is not enough information to characterize the distribution with high confidence. In this case, DRAGEN applies default statistics that specify a very wide insert distribution, which tends to admit pairs of alignments as proper pairs, even if they may lie tens of thousands of bases apart. In this situation, DRAGEN outputs a message, as follows:
The small samples formula calculates standard deviation as follows:
The default model is "standard deviation = 10000". If the first 2M reads are unmapped or if all pairs are improper pairs, then the standard deviation is set to 10000 and the mean and quartiles are set to 0. Note that the minimum value for standard deviation is 12, which is independent of the number of samples. Also, in the DNA mode when we have fewer than 1000 high quality alignments we revert to the normal distribution based insert model, because of insufficient number of samples to accurately estimate the parameters of the skew normal distribution.
For RNA-Seq data, the insert size distribution is not normal due to pairs containing introns. The DRAGEN software estimates the distribution using a kernel density estimator to fit a long tail to the samples. This estimate leads to a more accurate mean and standard deviation for RNA-Seq data and proper pairing.
DRAGEN writes detected paired-end stats into a tab-delimited log file in the output directory called .insert-stats.tab. This file contains the statistical distribution of detected insert sizes for each read group, including quartiles, mean, standard deviation, shape, minimum, and maximum. The information matches the standard-out report above. Additionally, the log file includes the minimum and maximum insert limits that DRAGEN applied for rescue scans. Note that the reported mean and standard deviation in this tab-delimited log file are the xi and omega parameters of the skew-normal distribution.
For paired-end reads, where a seed hit is found for one mate but not the other, rescue scans hunt for missing mate alignments within a rescue radius of the mean insert length. Normally, the DRAGEN host software sets the rescue radius to 2.5 standard deviations of the empirical insert distribution. But in cases where the insert standard deviation is large compared to the read length, the rescue radius is restricted to limit mapping slowdowns. In this case, a warning message is displayed, as follows:
Although the user can ignore this warning, or specify an intermediate rescue radius to maintain mapping speed, it is recommended to use 2.5 sigmas for the rescue radius to maintain mapping sensitivity. To disable rescue scanning, set max-rescues to 0.
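The unrestricted rescue window is straightforward arithmetic on the insert statistics. The sketch below covers only the normal case of 2.5 standard deviations; the rule DRAGEN uses to restrict the radius for wide distributions is not given in the text and is not modeled. The function name is illustrative.

```python
def rescue_window(mean_insert, stddev_insert, sigmas=2.5):
    """Insert-length range searched for a missing mate: mean +/- sigmas * stddev."""
    radius = sigmas * stddev_insert
    return max(0.0, mean_insert - radius), mean_insert + radius


# Mean insert 400 with standard deviation 80 gives a radius of 200 bases.
low, high = rescue_window(400, 80)
```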
DRAGEN can track multiple independent alignments for each read. These alignments include the optimal (primary) one, as well as those mapping different subsegments of the read (chimeric/supplementary), and sub-optimal (secondary) mappings of the read to different areas of the reference.
For DNA alignment by default, DRAGEN can emit one primary alignment for each read, up to three chimeric alignments (Aligner.supp-aligns=3), and no secondary alignments (Aligner.sec-aligns=0). The maximum user-specified value for supp-aligns or sec-aligns is 4095.
You can use the following configuration options to control how many of each type of alignment to include in DRAGEN output.
mapq-max
The mapq-max option specifies a ceiling on the estimated MAPQ that can be reported for any alignment, from 0 to 255. If the calculated MAPQ is higher, this value is reported instead. The default is 60.
supp-aligns, sec-aligns
The supp-aligns and sec-aligns options restrict the maximum number of supplementary (ie, chimeric and SAM FLAG 0x800) alignments and secondary (ie, suboptimal and SAM FLAG 0x100) alignments, respectively, that can be reported for each read. A maximum of 4095 supplementary alignments and 4095 secondary alignments can be reported for any read, in addition to a primary alignment. High settings for these two options reduce speed, so increase them only as needed.
Each bit determines whether local alignments of that type are reported with hard clipping (1) or soft clipping (0). The default is 6, meaning primary alignments use soft clipping and supplementary and secondary alignments use hard clipping.
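The bit field described above can be decoded as follows. The bit positions used here (bit 0 for primary, bit 1 for supplementary, bit 2 for secondary) are an assumption; they are consistent with the stated default of 6, but the bit assignments are not listed in this section.

```python
# Assumed bit positions; only the default value 6 is confirmed by the text.
ALIGNMENT_BITS = {"primary": 0, "supplementary": 1, "secondary": 2}


def clipping_by_alignment_type(value):
    """Map a 3-bit clipping setting to 'hard' or 'soft' per alignment type."""
    if not 0 <= value <= 7:
        raise ValueError("value must be between 0 and 7")
    return {
        name: "hard" if (value >> bit) & 1 else "soft"
        for name, bit in ALIGNMENT_BITS.items()
    }


# Default 6 is binary 110: primary soft, supplementary and secondary hard.
default_clipping = clipping_by_alignment_type(6)
```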
The GRCh38 human reference contains many more alternate haplotypes (ALT contigs) than previous versions of the reference. Generally, including ALT contigs in the mapping reference improves mapping and variant calling specificity, because misalignments are eliminated for reads matching an ALT contig but scoring poorly against the primary assembly. However, mapping with GRCh38's ALT contigs without special treatment can substantially degrade variant calling sensitivity in corresponding regions, because many reads align equally well to an ALT contig and to the corresponding position in the primary assembly.
The recommended and default approach for dealing with ALT contigs in DRAGEN is masking regions of ALT contigs that are highly similar to their corresponding primary contig. This approach is more accurate than liftover based ALT-awareness because there are many places where the "correct" or most useful liftover between a long ALT haplotype and the primary assembly is ambiguous. Incorrect liftover can produce dense clusters of mismapped reads and false variant calls. The base masking approach has the benefits of using ALT contigs without the negative consequences.
Masked hash tables are built from a standard hg19 or hg38 FASTA that contains ALT contigs. The hash table builder automatically masks regions of the ALT contigs with Ns.
With liftover based ALT-awareness, the mapper and aligner are aware of the liftover relationship between ALT contig positions and corresponding primary assembly positions. Seed matches within ALT contigs are used to obtain corresponding primary assembly alignments, even if the latter score poorly. Liftover groups are formed, each containing a primary assembly alignment candidate, and zero or more ALT alignment candidates that lift to the same location. Each liftover group is scored according to its best-matching alignments, taking properly paired alignments into account. The winning liftover group provides its primary assembly representative as the primary output alignment, with MAPQ calculated based on the score difference to the second-best liftover group. Emitting primary alignments within the primary assembly maintains normal aligned coverage and facilitates variant calling there. If the --Aligner.en-alt-hap-aln option is set to 1 and --Aligner.supp-aligns is greater than 0, then corresponding alternate haplotype alignments can also be output, flagged as supplementary alignments.
DRAGEN requires ALT-Aware hash tables for any hg19 or GRCh38 reference where ALT contigs are detected. To disable this requirement in DRAGEN, set the --ht-alt-aware-validate option to false.
The following is a comparison of alternative options for dealing with alternate haplotypes.
Mapping without ALT contigs in the reference:
False-positive variant calls result when reads matching an alternate haplotype misalign somewhere else.
Poor mapping and variant calling sensitivity where reads matching an ALT contig differ greatly from the primary assembly.
Mapping with ALT contigs but no ALT awareness:
The Multigenome Mapper in DRAGEN significantly improves the accuracy of mapping Illumina reads, particularly in challenging regions such as segmental duplications and other difficult-to-map regions. This method uses population haplotypes from pangenome references to incorporate additional variant information, constructing alternative haplotype paths that improve read mapping. With these alternate paths, reads containing population-specific variants can align directly to their most likely genomic locations, which reduces mapping ambiguity. The improved mapping also improves variant calling accuracy.
When given a set of population variants (VCF) or haplotypes, the pangenome reference is modified in the following ways:
Alternate contigs represent population haplotypes. ALT contigs can have a single variant or a combination of nearby phased variants.
Ambiguity codes (IUPAC codes) represent SNPs. To improve alignment, the reference FASTA is edited to encode isolated population SNPs.
Haplotype database. An additional haplotype database is built and used to augment the reference FASTA with population variants. A multigenome based mapper algorithm is used to score read alignment according to the variants in this database.
The DRAGEN pangenome hashtables are available to download from the .
A DRAGEN recipe, like this one, is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. For clarity, some default parameters are explicitly included and annotated with comments. This recipe includes the recommended commands for solid samples. These settings support fresh frozen samples, as well as some optional settings for FFPE samples.
--tumor-fastq-list $PATH
--tumor-fastq-list-sample-id $STRING
--tumor-fastq1 $PATH
--tumor-fastq2 $PATH
--RGSM-tumor $STRING
--RGID-tumor $STRING
--tumor-bam-input $PATH
--tumor-cram-input $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--vc-detect-systematic-noise=true
--vc-enable-germline-tagging=true
--variant-annotation-data $PATH
--intermediate-results-dir $PATH
--output-directory $PATH
--output-file-prefix $STRING
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--build-sys-noise-vcfs-list ${VCF_LIST}
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--sv-detect-systematic-noise true
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--sv-build-systematic-noise-vcfs-list $VCF_LIST #one VCF per line.
--tumor-fastq-list $PATH
--tumor-fastq-list-sample-id $STRING
--tumor-fastq1 $PATH
--tumor-fastq2 $PATH
--RGSM-tumor $STRING
--RGID-tumor $STRING
--tumor-bam-input $PATH
--tumor-cram-input $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--umi-enable true
--umi-source STRING #default='qname'
--umi-library-type STRING #see 'UMI'
--vc-detect-systematic-noise=true
--vc-enable-germline-tagging=true
--variant-annotation-data $PATH
--intermediate-results-dir $PATH
--output-directory $PATH
--output-file-prefix $STRING
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--build-sys-noise-vcfs-list ${VCF_LIST}
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--umi-enable true
--umi-source STRING #default='qname'
--umi-library-type STRING #see 'UMI'
--sv-detect-systematic-noise true
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--sv-build-systematic-noise-vcfs-list $VCF_LIST #one VCF per line.
--tumor-fastq-list $PATH
--tumor-fastq-list-sample-id $STRING
--fastq-list $PATH
--fastq-list-sample-id $STRING
--tumor-fastq1 $PATH
--tumor-fastq2 $PATH
--RGSM-tumor $STRING
--RGID-tumor $STRING
--fastq-file1 $PATH
--fastq-file2 $PATH
--RGSM $STRING
--RGID $STRING
--tumor-bam-input $PATH
--bam-input $PATH
--tumor-cram-input $PATH
--cram-input $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--umi-enable true
--umi-source STRING #default='qname'
--umi-library-type STRING #see 'UMI'
--vc-detect-systematic-noise=true
--vc-enable-germline-tagging=true
--variant-annotation-data $PATH
--intermediate-results-dir $PATH
--output-directory $PATH
--output-file-prefix $STRING
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--build-sys-noise-vcfs-list ${VCF_LIST}
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--umi-enable true
--umi-source STRING #default='qname'
--umi-library-type STRING #see 'UMI'
--sv-detect-systematic-noise true
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--sv-build-systematic-noise-vcfs-list $VCF_LIST #one VCF per line.
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
--enable-duplicate-marking true #default=true
# Small variant caller
--enable-variant-caller true
--vc-target-bed $VC_TARGET_BED
--vc-systematic-noise $PATH #Required
--vc-excluded-regions-bed $BED #FFPE: optionally mask ALUs
--vc-target-vaf 0.03 #Default = 0.03 (>= 3% VAF)
# SV
--enable-sv true
--sv-exome true
--sv-call-regions-bed $SV_TARGET_BED
# CNV
--enable-cnv true
--cnv-population-b-allele-vcf $POP_VCF
--cnv-target-bed $PATH
--cnv-combined-counts $PATH #CNV PON
# HRD Scoring
--enable-hrd true #requires CNV
# Annotation
--variant-annotation-data PATH
--vc-enable-germline-tagging true
# TMB
--enable-tmb true
--tmb-enable-proxi-filter true #Optional for Tumor-Only
# HLA genotyper
--enable-hla true
# Microsatellite Instability (MSI)
--msi-command tumor-only
--msi-ref-normal-input $PATH
--msi-microsatellites-file $PATH
--msi-coverage-threshold 40
--tumor-fastq-list $PATH
--tumor-fastq-list-sample-id $STRING
--tumor-fastq1 $PATH
--tumor-fastq2 $PATH
--RGSM-tumor $STRING
--RGID-tumor $STRING
--tumor-bam-input $PATH
--tumor-cram-input $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--vc-detect-systematic-noise=true
--vc-target-bed $VC_TARGET_BED #Region assessed in assay
--vc-target-bed-padding 500
--vc-enable-germline-tagging=true
--variant-annotation-data $PATH
--intermediate-results-dir $PATH
--output-directory $PATH
--output-file-prefix $STRING
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--build-sys-noise-vcfs-list ${VCF_LIST}
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--sv-detect-systematic-noise true
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--sv-build-systematic-noise-vcfs-list $VCF_LIST #one VCF per line.
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
# CNV
--enable-cnv true
--cnv-target-bed $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--enable-cnv true
--cnv-generate-combined-counts true
--cnv-normals-list $CNV_NORMALS_LIST
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
--enable-duplicate-marking true #default=true
# Small variant caller
--enable-variant-caller true
--vc-target-bed $VC_TARGET_BED
--vc-systematic-noise $PATH #Required
--vc-excluded-regions-bed $BED #FFPE: optionally mask ALUs
--vc-target-vaf 0.03 #Default = 0.03 (>= 3% VAF)
# SV
--enable-sv true
--sv-exome true
--sv-call-regions-bed $SV_TARGET_BED
# CNV
--enable-cnv true
--cnv-population-b-allele-vcf $POP_VCF
--cnv-target-bed $PATH
--cnv-combined-counts $PATH #CNV PON
# Annotation
--variant-annotation-data PATH
--vc-enable-germline-tagging true
# TMB
--enable-tmb true
--tmb-enable-proxi-filter true #Optional for Tumor-Only
# HLA genotyper
--enable-hla true
--hla-as-filter-min-threshold 29.0 #panel specific setting
--hla-as-filter-ratio-threshold 0.85 #panel specific setting
# Microsatellite Instability (MSI)
--msi-command tumor-only
--msi-ref-normal-input $PATH
--msi-microsatellites-file $PATH
--msi-coverage-threshold 40
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
--enable-duplicate-marking true #default=true
# Small variant caller
--enable-variant-caller true
--vc-target-bed $VC_TARGET_BED
--vc-systematic-noise $PATH #Recommended
--vc-excluded-regions-bed $BED #FFPE: optionally mask ALUs
--vc-target-vaf 0.03 #Default = 0.03 (>= 3% VAF)
# SV
--enable-sv true
--sv-exome true
--sv-call-regions-bed $SV_TARGET_BED
# CNV
--enable-cnv true
--cnv-use-somatic-vc-baf true
--cnv-target-bed $PATH
--cnv-combined-counts $PATH #CNV PON
# HRD Scoring
--enable-hrd true #requires CNV
# Annotation
--variant-annotation-data PATH
--enable-variant-annotation true
# TMB
--enable-tmb true
# HLA genotyper
--enable-hla true
# Microsatellite Instability (MSI)
--msi-command tumor-normal
--msi-microsatellites-file $PATH
--msi-coverage-threshold 40
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
# UMI
--umi-enable true
--umi-source STRING #Default='qname'
--umi-library-type STRING #e.g. random-duplex
--umi-min-supporting-reads 1 #Default=2
# Small variant caller
--enable-variant-caller true
--vc-target-bed $VC_TARGET_BED
--vc-systematic-noise $PATH #Required
--vc-excluded-regions-bed $BED #FFPE: optionally mask ALUs
--vc-enable-umi-solid true #>= 1% VAF
# SV
--enable-sv true
--sv-exome true
--sv-call-regions-bed $SV_TARGET_BED
# CNV
--enable-cnv true
--cnv-population-b-allele-vcf $POP_VCF
--cnv-target-bed $PATH
--cnv-combined-counts $PATH #CNV PON
# Annotation
--variant-annotation-data PATH
--vc-enable-germline-tagging true
# TMB
--enable-tmb true
--tmb-enable-proxi-filter true #Optional for Tumor-Only
# HLA genotyper
--enable-hla true
--hla-as-filter-min-threshold 29.0 #panel specific setting
--hla-as-filter-ratio-threshold 0.85 #panel specific setting
# Microsatellite Instability (MSI)
--msi-command tumor-only
--msi-ref-normal-input $PATH
--msi-microsatellites-file $PATH
--msi-coverage-threshold 40
--vc-enable-liquid-tumor-mode true
Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.
--vc-override-tumor-pcr-params-with-normal false
Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.
--vc-sq-filter-threshold $NUM
Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 3. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold $INT
Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 60. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold-in-hotspot $INT
Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 20. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-excluded-regions-bed $BED
Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome, and are often exceptionally noisy regions in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU bed files can be downloaded as part of the Bed File Collection.
--sv-min-scored-variant-size
After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.
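The interaction of the SQ and AQ thresholds described above can be sketched as a simple filter. This is an illustration only: it assumes that a variant is filtered when its score falls below the applicable threshold (consistent with "raise this value to improve specificity"), and the function and argument names are hypothetical. The default values are the ones quoted above.

```python
def passes_score_filters(sq, aq, in_hotspot,
                         sq_threshold=3.0,
                         aq_threshold=60,
                         aq_hotspot_threshold=20,
                         has_noise_file=True):
    """Return True if a variant survives the SQ and AQ threshold checks."""
    # --vc-sq-filter-threshold: applied to every variant.
    if sq < sq_threshold:
        return False
    # AQ thresholds are only used when a systematic noise file is supplied.
    if has_noise_file:
        # Hotspot variants use --vc-systematic-noise-filter-threshold-in-hotspot,
        # all others use --vc-systematic-noise-filter-threshold.
        limit = aq_hotspot_threshold if in_hotspot else aq_threshold
        if aq < limit:
            return False
    return True
```

With these defaults, a variant with AQ = 30 is kept inside a hotspot but filtered elsewhere, which illustrates why the hotspot threshold is more permissive.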
--Aligner.unclip-score      unclip-score
--Aligner.no-unclip-score   no-unclip-score
--Aligner.aln-min-score     aln-min-score
--Aligner.min-score-coeff   min-score-coeff
match-score The match-score option specifies the score for a read nucleotide matching a reference nucleotide (A, C, G, or T), or matching a reference 2–3 nucleotide IUPAC-IUB code. Its value is an unsigned integer, from 0 to 15. match-score=0 can only be used when global=1. A higher match score results in longer alignments, and fewer long insertions.
match-n-score The match-n-score option specifies the score for an aligned position where the read position and/or the reference position is an N code. This option is a signed integer, from -16 to 15.
mismatch-pen The mismatch-pen option is the penalty (negative score) for a read nucleotide mismatching any reference nucleotide or IUPAC-IUB code, except N. This option is an unsigned integer, from 0 to 63. A higher mismatch penalty results in alignments with more insertions, deletions, and clipping to avoid SNPs.
gap-open-pen The gap-open-pen option is the penalty (negative score) for opening a gap (i.e., an insertion or deletion). This value is the penalty for a 0-base gap only. It is always added to the gap length times gap-ext-pen. This option is an unsigned integer, from 0 to 127. A higher gap open penalty causes fewer insertions and deletions of any length in alignment CIGARs, with clipping or alignment through SNPs used instead.
gap-ext-pen The gap-ext-pen option is the penalty (negative score) for extending a gap (i.e., an insertion or deletion) by one base. This option is an unsigned integer, from 0 to 15. A higher gap extension penalty causes fewer long insertions and deletions in alignment CIGARs, with short indels, clipping, or alignment through SNPs used instead.
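The relationship between gap-open-pen and gap-ext-pen described above is an affine gap penalty. The following sketch shows the arithmetic only. The function name is hypothetical and the penalty values in the usage note are illustrative, not DRAGEN defaults.

```python
def gap_penalty(gap_length, gap_open_pen, gap_ext_pen):
    """Total penalty for one insertion or deletion of gap_length bases."""
    if gap_length < 1:
        raise ValueError("gap_length must be at least 1")
    if not 0 <= gap_open_pen <= 127:
        raise ValueError("gap-open-pen is an unsigned integer from 0 to 127")
    if not 0 <= gap_ext_pen <= 15:
        raise ValueError("gap-ext-pen is an unsigned integer from 0 to 15")
    # The open penalty covers a 0-base gap; each gap base adds gap-ext-pen.
    return gap_open_pen + gap_length * gap_ext_pen
```

For example, with an open penalty of 11 and an extension penalty of 1, a 1-base indel costs 12 and a 5-base indel costs 16, so raising gap-ext-pen penalizes long gaps much more than short ones.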
unclip-score The unclip-score option is the score bonus for an alignment reaching the beginning or end of the read. An end-to-end alignment receives twice this bonus. This option is an unsigned integer, from 0 to 127. A higher unclipped bonus causes alignment to reach the beginning and/or end of a read more often, where this can be done without too many SNPs or indels. A nonzero unclip-score is useful when global=0 to make a soft preference for unclipped alignments. Unclipped bonuses have little effect on alignments when global=1, because end-to-end alignments are forced anyway (although 2 × unclip-score does add to every alignment score unless no-unclip-score=1). Note that, especially with longer reads, setting unclip-score much higher than gap-open-pen can have the undesirable effect of insertions at or near one end of a read being used as pseudoclipping, as happens with global=1.
no-unclip-score The no-unclip-score option can be 0 or 1. The default is 1. When no-unclip-score is set to 1, any unclipped bonus (unclip-score) contributing to an alignment is removed from the alignment score before further processing, such as comparison with aln-min-score, comparison with other alignment scores, and reporting in AS or XS tags. However, the unclipped bonus still affects the best-scoring alignment found by Smith-Waterman alignment to a given reference segment, biasing toward unclipped alignments. When unclip-score > 0 causes a Smith-Waterman local alignment to extend out to one or both ends of the read, the alignment score stays the same or increases if no-unclip-score=0, whereas it stays the same or decreases if no-unclip-score=1. The default, no-unclip-score=1, is recommended when global=1, because every alignment is end-to-end, and there is no need to add the same bonus to every alignment. When changing no-unclip-score, consider whether aln-min-score should be adjusted. When no-unclip-score=0, unclipped bonuses are included in alignment scores compared to the aln-min-score floor, so the subset of alignments filtered out by aln-min-score can change significantly with no-unclip-score.
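The effect of no-unclip-score on the score used for filtering and reporting can be sketched as below. The function and argument names are hypothetical, and the sketch assumes the Smith-Waterman score passed in already includes one unclip-score bonus per unclipped read end.

```python
def score_for_reporting(sw_score_with_bonus, unclip_score,
                        unclipped_ends, no_unclip_score=1):
    """Score compared with aln-min-score and reported in AS/XS.

    unclipped_ends is 0, 1 or 2 (2 for an end-to-end alignment).
    """
    if unclipped_ends not in (0, 1, 2):
        raise ValueError("unclipped_ends must be 0, 1 or 2")
    bonus = unclip_score * unclipped_ends
    # no-unclip-score=1 removes the bonus before further processing;
    # no-unclip-score=0 leaves it in the score.
    return sw_score_with_bonus - bonus if no_unclip_score else sw_score_with_bonus
```

With unclip-score = 5, an end-to-end alignment whose internal score is 150 is handled as 140 under the default and as 150 when no-unclip-score=0, which is why aln-min-score may need to be revisited when the setting changes.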
aln-min-score The aln-min-score option specifies a minimum acceptable alignment score. Any alignment results scoring lower are discarded. Increasing or decreasing aln-min-score can reduce or increase the percentage of reads mapped. This option is a signed integer (negative alignment scores are possible with global=1). aln-min-score also affects MAPQ estimates. The primary contributor to MAPQ calculation is the difference between the best and second-best alignment scores. A read's best alignment score is saved in the AS SAM tag, and the second-best score (if available) is saved in the XS tag. aln-min-score serves as the suboptimal alignment score if nothing higher was found except the best score. Therefore, increasing aln-min-score can decrease reported MAPQ for some low-scoring alignments. You can use the min-score-coeff option to adjust aln-min-score as a function of read length.
min-score-coeff The min-score-coeff option adjusts aln-min-score per read base. When the min-score-coeff and aln-min-score options are used together, the minimum alignment score for each read is an affine function of read length. The minimum score for an N-base read is calculated as follows: (min-score-coeff) * N + (aln-min-score). The min-score-coeff option is a signed number ranging from -64 to 63.999. If the value is 0, then the minimum alignment score is fixed at aln-min-score for all read lengths. You can use positive values for min-score-coeff to allow shorter reads to match with lower alignment scores, but require longer reads to achieve higher scores.
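The affine formula above can be written out directly. The function name is hypothetical and the values in the usage note are illustrative rather than DRAGEN defaults.

```python
def min_alignment_score(read_length, aln_min_score, min_score_coeff=0.0):
    """Minimum acceptable alignment score for a read of read_length bases."""
    if not -64 <= min_score_coeff <= 63.999:
        raise ValueError("min-score-coeff must be between -64 and 63.999")
    # (min-score-coeff) * N + (aln-min-score)
    return min_score_coeff * read_length + aln_min_score
```

With aln-min-score = 22 and min-score-coeff = 0.5, a 100-base read needs a score of at least 72 and a 150-base read at least 97. With min-score-coeff = 0, both need 22.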
unpaired-pen For paired end reads, best mapping positions are determined jointly for each pair, according to the largest pair score found, considering the various combinations of alignments for each mate. A pair score is the sum of the two alignment scores minus a pairing penalty, which estimates the unlikelihood of insert lengths further from the mean insert than this aligned pair. The unpaired-pen option specifies how much alignment pair scores should be penalized when the two alignments are not in properly paired position or orientation. This option also serves as the maximum pairing penalty for properly paired alignments with extreme insert lengths. The unpaired-pen option is specified in Phred scale, according to its potential impact on MAPQ. Internally, it is scaled into alignment score space based on Smith-Waterman scoring parameters.
pe-max-penalty
sec-phred-delta The sec-phred-delta option controls which secondary alignments are emitted based on the alignment score relative to the primary reported alignment. Only secondary alignments with likelihood within this Phred value of the primary are reported.
sec-aligns-hard The sec-aligns-hard option suppresses the output of all secondary alignments if there are more secondary alignments than can be emitted. Set sec-aligns-hard to 1 to force the read to be unmapped when not all secondary alignments can be output.
supp-as-sec When the supp-as-sec option is set to 1, then supplementary (chimeric) alignments are reported with SAM FLAG 0x100 instead of 0x800. The default is 0. The supp-as-sec option provides compatibility with tools that do not support FLAG 0x800.
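The FLAG change made by supp-as-sec can be illustrated with the standard SAM flag bits. The helper name is hypothetical. Only the two bit values, 0x100 and 0x800, come from the text above.

```python
SAM_SECONDARY = 0x100      # "secondary alignment" bit
SAM_SUPPLEMENTARY = 0x800  # "supplementary alignment" bit


def flag_for_supplementary(base_flag, supp_as_sec=0):
    """Add the bit used to mark a supplementary (chimeric) alignment."""
    marker = SAM_SECONDARY if supp_as_sec else SAM_SUPPLEMENTARY
    return base_flag | marker
```

For a reverse-strand supplementary alignment (base flag 0x10), the result is 0x810 by default and 0x110 when supp-as-sec is set to 1.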
hard-clips The hard-clips option is used as a field of 3 bits, with values ranging from 0 to 7. The bits specify alignments, as follows:
Bit 0--primary alignments
Bit 1--supplementary alignments
Bit 2--secondary alignments
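A hard-clips value can be decoded into the alignment categories its bits select, as in the sketch below. The function name is hypothetical, and the sketch only reports which bits are set.

```python
HARD_CLIP_BITS = {
    0: "primary",
    1: "supplementary",
    2: "secondary",
}


def decode_hard_clips(value):
    """List the alignment categories whose bit is set in a hard-clips value."""
    if not 0 <= value <= 7:
        raise ValueError("hard-clips is a 3-bit field with values from 0 to 7")
    return [name for bit, name in HARD_CLIP_BITS.items() if value & (1 << bit)]
```

For example, a value of 6 (binary 110) selects supplementary and secondary alignments, while 1 selects primary alignments only.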
False-positive variant calls from misaligned reads matching ALT contigs are eliminated.
Low or zero aligned coverage in primary assembly regions covered by alternate haplotypes, due to some reads mapping to ALT contigs.
Low or zero MAPQ in regions covered by alternate haplotypes, where they are similar or identical to the primary assembly.
Variant calling sensitivity is dramatically reduced throughout regions covered by alternate haplotypes.
Mapping with ALT contigs and ALT awareness:
False-positive variant calls from misaligned reads matching ALT contigs are eliminated.
Normal aligned coverage in regions covered by alternate haplotypes because primary alignments are to the primary assembly.
Normal MAPQs are assigned because alignment candidates in alternative haplotypes are not considered in competition.
Good mapping and variant calling sensitivity where reads matching an ALT contig differ greatly from the primary assembly.
--Mapper.seed-density       seed-density
--Mapper.edit-mode          edit-mode
--Mapper.edit-seed-num      edit-seed-num
--Mapper.edit-read-len      edit-read-len
--Mapper.edit-chain-limit   edit-chain-limit
0   No editing (default)
1   Chain length test
2   Paired chain length test
3   Full seed editing
--Aligner.global          global
--Aligner.match-score     match-score
--Aligner.match-n-score   match-n-score
--Aligner.mismatch-pen    mismatch-pen
--Aligner.gap-open-pen    gap-open-pen
--Aligner.gap-ext-pen     gap-ext-pen
Initial paired-end statistics detected for read group RGID, based on 39042 high quality pairs for FR orientation
Quartiles (25 50 75) = 398 409 420
Mean = 410.192
Standard deviation = 14.1254
NOTE: DRAGEN's insert estimates include corrections for clipping (so they are not identical to TLEN)
Skew-normal insert distribution applied:
Position (xi) = 424.084
Scale (omega) = 19.8719
Shape (alpha) = -1.88125
To rerun with identical insert stats, specify:
--Aligner.pe-stat-mean-insert=424.084
--Aligner.pe-stat-stddev-insert=19.8719
--Aligner.pe-stat-shape-insert=-1.88125
--Aligner.pe-stat-quartiles-insert="398 409 420"
--Aligner.pe-stat-mean-read-len=101
#Sample: sample name
FragmentLength,Count
WARNING: Less than 28 high quality pairs found - standard deviation is calculated from the small samples formula
if samples < 3 then
  standard deviation = 10000
else if samples < 28 then
  standard deviation = 25 * (standard deviation + 1) / (samples - 2)
end if
if standard deviation < 12 then
  standard deviation = 12
end if
Rescue radius = 220
Effective rescue sigmas = 0.5
WARNING: Default rescue sigmas value of 2.5 was overridden by host software!
The user may wish to set rescue sigmas value explicitly with --Aligner.rescue-sigmas
--vc-enable-liquid-tumor-mode true
Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.
--vc-override-tumor-pcr-params-with-normal false
Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.
--vc-sq-filter-threshold $NUM
Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 3. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold $INT
Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 60. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold-in-hotspot $INT
Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 20. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-excluded-regions-bed $BED
Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome, and are often exceptionally noisy regions in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU bed files can be downloaded as part of the Bed File Collection:
--sv-min-scored-variant-size
After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.
--enable-map-align true
Optionally disable map & align (default=true).
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
--enable-duplicate-marking true
By default, DRAGEN marks duplicate reads and excludes them from variant calling.
--enable-positional-collapsing true
Alternative to --enable-duplicate-marking=true. Instead of discarding duplicate reads, DRAGEN can optionally perform positional collapsing, merging them into higher-quality consensus reads. This is beneficial for small panels without UMIs and coverage between 300X and 1000X. However, it's slower than standard duplicate marking and less effective on samples with coverage lower than 300X. For very high coverage (1000X+), avoid it due to potential read collisions. For high-sensitivity panels with 1000X+ coverage, consider using UMIs.
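The coverage guidance above can be condensed into a small decision helper. This is only a reading of the text, not a DRAGEN feature: the function name and the exact boundary handling (300X and 1000X inclusive) are assumptions, and the returned strings reuse options that appear in this guide.

```python
def suggest_duplicate_handling(mean_coverage, has_umis):
    """Suggest a duplicate-handling option from coverage and UMI availability."""
    if has_umis:
        # UMI collapsing is the suggested route for high-sensitivity panels.
        return "--umi-enable true"
    if 300 <= mean_coverage <= 1000:
        # Range where positional collapsing is described as beneficial.
        return "--enable-positional-collapsing true"
    # Below 300X collapsing is less effective; above 1000X read
    # collisions become a concern, so fall back to the default.
    return "--enable-duplicate-marking true"
```

For a 500X panel without UMIs the helper suggests positional collapsing. At 100X or 2000X it falls back to standard duplicate marking.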
--enable-fractional-down-sampler
Set to true to enable fractional downsampling. The default value is false.
--down-sampler-normal-subsample
Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).
--down-sampler-tumor-subsample
Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).
--down-sampler-random-seed
Specify the random seed for different runs of the same input data. The default value is 42.
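The idea behind fractional downsampling with a fixed seed can be sketched as follows. This is a conceptual illustration, not DRAGEN's sampling algorithm: the function name is hypothetical, and reads are kept independently with probability equal to the requested fraction. The defaults mirror the values quoted above (fraction 1.0, seed 42).

```python
import random


def downsample(reads, fraction=1.0, seed=42):
    """Keep approximately `fraction` of the reads, reproducibly for a given seed."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0.0 and 1.0")
    rng = random.Random(seed)  # private generator, so runs are repeatable
    return [read for read in reads if rng.random() < fraction]
```

Running the sketch twice with the same seed returns the same subset, while changing the seed gives a different subset of roughly the same size.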
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--vc-emit-ref-confidence GVCF
To enable gVCF output.
--vc-enable-vcf-output
To enable VCF file output during a gVCF run, set to true. The default value is false.
--vc-systematic-noise $PATH
Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.). When working with panels it is recommended that a custom systematic noise file be created for each assay.
--vc-somatic-hotspots $PATH
DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.
--vc-target-vaf FLOAT
The default is 0.03 (3%). Set to e.g. 0.01 to improve SNV sensitivity on 1% VAF variants (assuming sufficient coverage).
--vc-enable-umi-solid true
Optimized for 1% and higher VAFs on UMI (or read position collapsed) samples with approx 300-1000X coverage.
--vc-enable-umi-liquid true
Optimized for 0.1% and higher VAFs on UMI samples with 1000X or higher coverage as expected in liquid biopsies.
--enable-hla
Enable HLA typer (this setting by default will only genotype class 1 genes)
--hla-as-filter-min-threshold
Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.
--hla-as-filter-ratio-threshold
Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.
--hla-enable-class-2
Extend genotyping to HLA class 2 genes (default=true).
--cnv-enable-gcbias-correction true
Enable or disable GC bias correction when generating target counts.
--cnv-segmentation-mode $SEG_MODE
Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.
--cnv-segmentation-bed $PATH
If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.
--cnv-population-b-allele-vcf $POP_VCF
Specify a population SNP VCF. This option is available for both the germline and the somatic workflows. In germline it is only supported for WGS. In somatic, it can be used when a matched normal sample is not available and analysis must be performed in tumor-only mode.
--tmb-vaf-threshold FLOAT
Variant minimum allele frequency for usable variants (default=0.05)
--vc-callability-tumor-thresh INT
Required read coverage to use a site (default=50).
--tmb-enable-proxi-filter BOOL
Use variant vaf information to increase germline filtering. Recommended for TO, but not for TN. May be overly aggressive at tagging variants as germline (default=false).
--msi-coverage-threshold INT
Minimum coverage for a microsatellite: 60 (default)
--msi-distance-threshold FLOAT
Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default)
--sv-call-regions-bed
Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format.
--sv-exclusion-bed
Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.
--enable-variant-deduplication true
Relevant when both SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent the occurrence of replicated variants within genes such as FLT3 and KMT2A. Filter all small indels in the structural variant VCF that appear and are passing in the small variant VCF. DRAGEN will create a new VCF that contains variants in SV VCF that are not matching a variant from SNV VCF file. The new deduplicated SV VCF file will have the same prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.
--sv-systematic-noise $BEDPE
Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels.
--sv-somatic-ins-tandup-hotspot-regions-bed $BED
Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bps)
--sv-min-candidate-variant-size
Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.
--sv-min-scored-variant-size $INT
100000
WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FF
FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FFPE (only hg38)
WES_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WES FF and FFPE
WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz
For WGS, FF/FFPE
IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz
For WGS, HEME
--vc-sq-filter-threshold $NUM
Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 17.5. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold $INT
Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold-in-hotspot $INT
Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-excluded-regions-bed $BED
Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome, and are often exceptionally noisy regions in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU bed files can be downloaded as part of the Bed File Collection:
--sv-min-scored-variant-size
After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.
--enable-map-align true
In the TN pipeline this must be set to false for BAM/CRAM input.
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
--enable-duplicate-marking true
By default, DRAGEN marks duplicate reads and excludes them from variant calling.
--enable-fractional-down-sampler
Set to true to enable fractional downsampling. The default value is false.
--down-sampler-normal-subsample
Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).
--down-sampler-tumor-subsample
Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).
--down-sampler-random-seed
Specify the random seed for different runs of the same input data. The default value is 42.
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--vc-systematic-noise $PATH
Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.).
--vc-somatic-hotspots $PATH
DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.
--vc-enable-liquid-tumor-mode true
Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.
--vc-override-tumor-pcr-params-with-normal false
Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.
--vc-target-vaf FLOAT
The default is 0.03 (3%). Set to e.g. 0.01 to improve SNV sensitivity on 1% VAF variants (assuming sufficient coverage).
--vc-enable-umi-solid true
Optimized for 1% and higher VAFs on UMI (or read position collapsed) samples with approx 300-1000X coverage.
--vc-enable-umi-liquid true
Optimized for 0.1% and higher VAFs on UMI samples with 1000X or higher coverage as expected in liquid biopsies.
--enable-hla
Enable HLA typer (this setting by default will only genotype class 1 genes)
--hla-as-filter-min-threshold
Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.
--hla-as-filter-ratio-threshold
Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.
--hla-enable-class-2
Extend genotyping to HLA class 2 genes (default=true).
--cnv-enable-gcbias-correction true
Enable or disable GC bias correction when generating target counts.
--cnv-segmentation-mode $SEG_MODE
Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.
--cnv-segmentation-bed $PATH
If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.
--cnv-normal-cnv-vcf $CNV_NORMAL_VCF
Specify germline CNVs from the matched normal sample. Germline-aware Mode.
--tmb-vaf-threshold FLOAT
Variant minimum allele frequency for usable variants (default=0.05)
--vc-callability-tumor-thresh INT
Required read coverage to use a site (default=50).
--tmb-enable-proxi-filter BOOL
Use variant vaf information to increase germline filtering. Recommended for TO, but not for TN. May be overly aggressive at tagging variants as germline (default=false).
--msi-coverage-threshold INT
Minimum coverage for a microsatellite: 60 (default)
--msi-distance-threshold FLOAT
Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default)
--sv-call-regions-bed
Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format.
--sv-exclusion-bed
Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.
--enable-variant-deduplication true
Relevant when both SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent the occurrence of replicated variants within genes such as FLT3 and KMT2A. Filter all small indels in the structural variant VCF that appear and are passing in the small variant VCF. DRAGEN will create a new VCF that contains variants in SV VCF that are not matching a variant from SNV VCF file. The new deduplicated SV VCF file will have the same prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.
--sv-systematic-noise $BEDPE
Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels.
--sv-somatic-ins-tandup-hotspot-regions-bed $BED
Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bps)
--sv-min-candidate-variant-size
Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.
--sv-enable-liquid-tumor-mode true
DRAGEN can account for Tumor-in-Normal (TiN) contamination by running liquid tumor mode.
--sv-tin-contam-tolerance $TIN_CONTAM_TOLERANCE
Set the Tumor-in-Normal (TiN) contamination tolerance level.
WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FF
FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FFPE (only hg38)
WES_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WES FF and FFPE
WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz
For WGS, FF/FFPE
IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz
For WGS, HEME
--umi-emit-multiplicity both
Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules.
--umi-start-mask-length INT
Number of additional bases to ignore from start of read. The default is 0. To reduce FP optionally set to 1.
--umi-end-mask-length INT
Number of additional bases to ignore from end of read. The default is 0. To reduce FP optionally set to 3.
--vc-enable-liquid-tumor-mode true
Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.
--vc-override-tumor-pcr-params-with-normal false
Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.
--vc-sq-filter-threshold $NUM
Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 4. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold $INT
Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 60. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold-in-hotspot $INT
Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 20. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-excluded-regions-bed $BED
Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome, and are often exceptionally noisy regions in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU bed files can be downloaded as part of the Bed File Collection:
--sv-min-scored-variant-size
After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.
--enable-map-align true
Optionally disable map & align (default=true).
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
--enable-fractional-down-sampler
Set to true to enable fractional downsampling. The default value is false.
--down-sampler-normal-subsample
Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).
--down-sampler-tumor-subsample
Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).
--down-sampler-random-seed
Specify the random seed for different runs of the same input data. The default value is 42.
--umi-source STRING
Specify the input type for the UMI sequence. Options: qname, fastq, bamtag.
--umi-library-type STRING
Set the batch option for different UMIs correction. Options: random-duplex, random-simplex, nonrandom-duplex.
--umi-nonrandom-whitelist $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.
--umi-correction-table $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.
--umi-min-supporting-reads INT
Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).
--umi-metrics-interval-file $BED
Target region in BED format.
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--vc-emit-ref-confidence GVCF
To enable gVCF output.
--vc-enable-vcf-output
To enable VCF file output during a gVCF run, set to true. The default value is false.
--vc-systematic-noise $PATH
Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.). When working with panels it is recommended that a custom systematic noise file be created for each assay.
--vc-somatic-hotspots $PATH
DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.
--vc-target-vaf FLOAT
The default is 0.03 (3%). Set to e.g. 0.01 to improve SNV sensitivity on 1% VAF variants (assuming sufficient coverage).
--vc-enable-umi-solid true
Optimized for 1% and higher VAFs on UMI (or read position collapsed) samples with approx 300-1000X coverage.
--vc-enable-umi-liquid true
Optimized for 0.1% and higher VAFs on UMI samples with 1000X or higher coverage as expected in liquid biopsies.
--enable-hla
Enable HLA typer (this setting by default will only genotype class 1 genes)
--hla-as-filter-min-threshold
Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.
--hla-as-filter-ratio-threshold
Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.
--hla-enable-class-2
Extend genotyping to HLA class 2 genes (default=true).
--cnv-enable-gcbias-correction true
Enable or disable GC bias correction when generating target counts.
--cnv-segmentation-mode $SEG_MODE
Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.
--cnv-segmentation-bed $PATH
If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.
--cnv-population-b-allele-vcf $POP_VCF
Specify a population SNP VCF. This option is available for both the germline and the somatic workflows. In germline it is only supported for WGS. In somatic, it can be used when a matched normal sample is not available and analysis must be performed in tumor-only mode.
--tmb-vaf-threshold FLOAT
Variant minimum allele frequency for usable variants (default=0.05)
--vc-callability-tumor-thresh INT
Required read coverage to use a site (default=50).
--tmb-enable-proxi-filter BOOL
Use variant vaf information to increase germline filtering. Recommended for TO, but not for TN. May be overly aggressive at tagging variants as germline (default=false).
--msi-coverage-threshold INT
Minimum coverage for a microsatellite: 60 (default)
--msi-distance-threshold FLOAT
Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default)
--sv-call-regions-bed
Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format.
--sv-exclusion-bed
Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.
--enable-variant-deduplication true
Relevant when both SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent the occurrence of replicated variants within genes such as FLT3 and KMT2A. Filter all small indels in the structural variant VCF that appear and are passing in the small variant VCF. DRAGEN will create a new VCF that contains variants in SV VCF that are not matching a variant from SNV VCF file. The new deduplicated SV VCF file will have the same prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.
--sv-systematic-noise $BEDPE
Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels.
--sv-somatic-ins-tandup-hotspot-regions-bed $BED
Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bps)
--sv-min-candidate-variant-size
Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.
--sv-min-scored-variant-size $INT
100000
WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FF
FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FFPE (only hg38)
WES_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WES FF and FFPE
WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz
For WGS, FF/FFPE
IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz
For WGS, HEME
For DRAGEN somatic runs it is recommended to use the linear hashtable.
See: Product Files
DRAGEN input sources include: FASTQ list, FASTQ, BAM, or CRAM. For BCL input, first create FASTQs using BCL conversion.
FQ list Input
FQ Input
BAM Input
CRAM Input
--enable-map-align true
In the TN pipeline this must be set to false for BAM/CRAM input.
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).
Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses it is also recommended to use a normal sample with lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler can be used to reduce coverage on the normal sample.
--enable-fractional-down-sampler
Set to true to enable fractional downsampling. The default value is false.
--down-sampler-normal-subsample
Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).
--down-sampler-tumor-subsample
Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).
--down-sampler-random-seed
Specify the random seed for different runs of the same input data. The default value is 42.
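As a rough illustration of the Tumor-Normal recommendation above, the following sketch derives a subsample fraction for the normal from the two coverages and assembles the matching options. The helper names and the coverage figures are hypothetical, and capping the normal at exactly the tumor coverage is an assumption; choose a smaller fraction if the normal should end up shallower than the tumor.

```python
from typing import List


def normal_subsample_fraction(tumor_cov: float, normal_cov: float) -> float:
    """Fraction of normal reads to keep so the normal is no deeper than the tumor."""
    if tumor_cov <= 0 or normal_cov <= 0:
        raise ValueError("coverage values must be positive")
    # Never upsample: a fraction of 1.0 keeps every read.
    return min(1.0, tumor_cov / normal_cov)


def downsampler_args(tumor_cov: float, normal_cov: float, seed: int = 42) -> List[str]:
    """Build the downsampler options; returns an empty list if nothing needs trimming."""
    fraction = normal_subsample_fraction(tumor_cov, normal_cov)
    if fraction >= 1.0:
        return []
    return [
        "--enable-fractional-down-sampler", "true",
        "--down-sampler-normal-subsample", f"{fraction:.3f}",
        "--down-sampler-random-seed", str(seed),
    ]


if __name__ == "__main__":
    # Hypothetical run: 80X tumor with a 160X matched normal.
    print(" ".join(downsampler_args(tumor_cov=80, normal_cov=160)))
```

Keeping the random seed fixed makes repeated runs on the same input select the same reads.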
--umi-source STRING
Specify the input type for the UMI sequence. Options: qname, fastq, bamtag.
--umi-library-type STRING
Set the batch option for different UMIs correction. Options: random-duplex, random-simplex, nonrandom-duplex.
--umi-nonrandom-whitelist $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.
--umi-correction-table $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.
--umi-min-supporting-reads INT
Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).
--umi-metrics-interval-file $BED
Target region in BED format.
For more information see: UMI Options.
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--vc-systematic-noise $PATH
Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.).
--vc-somatic-hotspots $PATH
DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.
--vc-enable-liquid-tumor-mode true
Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.
--vc-override-tumor-pcr-params-with-normal false
Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.
High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports 3 main settings for improved sensitivity on low VAF variant calls.
--vc-target-vaf FLOAT
The default is 0.03 (3%). Set to e.g. 0.01 to improve SNV sensitivity on 1% VAF variants (assuming sufficient coverage).
--vc-enable-umi-solid true
Optimized for 1% and higher VAFs on UMI (or read position collapsed) samples with approx 300-1000X coverage.
--vc-enable-umi-liquid true
Optimized for 0.1% and higher VAFs on UMI samples with 1000X or higher coverage as expected in liquid biopsies.
For more detail on the small variant caller in somatic mode please refer to Somatic Mode
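The three low-VAF settings can be summarized as a simple decision sketch. The function below is hypothetical and only encodes the coverage and VAF ranges quoted above; whether the options may be combined, and how borderline coverages should be treated, is not covered here and is left as an assumption.

```python
from typing import List


def low_vaf_args(coverage: float, has_umis: bool, lowest_vaf: float) -> List[str]:
    """Pick one low-VAF sensitivity setting (hypothetical helper)."""
    if has_umis and coverage >= 1000:
        # UMI samples at 1000X or higher, as expected in liquid biopsies.
        return ["--vc-enable-umi-liquid", "true"]
    if has_umis and coverage >= 300:
        # UMI samples at roughly 300X-1000X, VAFs of 1% and higher.
        return ["--vc-enable-umi-solid", "true"]
    if lowest_vaf < 0.03:
        # Lower the target VAF below the 3% default.
        return ["--vc-target-vaf", str(lowest_vaf)]
    # The defaults already cover this case.
    return []


if __name__ == "__main__":
    print(low_vaf_args(coverage=500, has_umis=True, lowest_vaf=0.01))
    print(low_vaf_args(coverage=400, has_umis=False, lowest_vaf=0.01))
```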
--enable-hla
Enable HLA typer (this setting by default will only genotype class 1 genes)
--hla-as-filter-min-threshold
Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.
--hla-as-filter-ratio-threshold
Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.
--hla-enable-class-2
Extend genotyping to HLA class 2 genes (default=true).
--cnv-enable-gcbias-correction true
Enable or disable GC bias correction when generating target counts.
--cnv-segmentation-mode $SEG_MODE
Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.
--cnv-segmentation-bed $PATH
If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.
--cnv-normal-cnv-vcf $CNV_NORMAL_VCF
Specify germline CNVs from the matched normal sample (germline-aware mode).
For more information, see CNV Calling.
For instructions on how to download the Nirvana annotation database, please refer to Nirvana
--tmb-vaf-threshold FLOAT
Variant minimum allele frequency for usable variants (default=0.05)
--vc-callability-tumor-thresh INT
Required read coverage to use a site (default=50).
--tmb-enable-proxi-filter BOOL
Use variant vaf information to increase germline filtering. Recommended for TO, but not for TN. May be overly aggressive at tagging variants as germline (default=false).
See the user guide: TMB Germline Variants.
Microsatellite sites file can be downloaded here: Product Files.
--msi-coverage-threshold INT
Minimum coverage for a microsatellite: 60 (default)
--msi-distance-threshold FLOAT
Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default)
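To give a feel for the scale of the distance threshold, here is a minimal sketch of a Jensen-Shannon distance between two repeat-length distributions at one microsatellite site. It uses the textbook definition (square root of the base-2 Jensen-Shannon divergence, which ranges from 0 to 1). The exact computation inside DRAGEN, including log base and normalization, is not described here, so treat this as an assumption. The read counts are made up.

```python
import math
from typing import Sequence


def js_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon distance between two count vectors over the same bins."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("each distribution needs at least one observation")
    # Normalize counts to probabilities.
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    # Mixture distribution M = (P + Q) / 2.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence in bits; empty bins contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(0.0, divergence))


if __name__ == "__main__":
    # Hypothetical read counts per repeat length at one site.
    normal = [0, 2, 90, 8, 0]
    tumor = [5, 30, 55, 10, 0]
    distance = js_distance(tumor, normal)
    print(f"distance={distance:.3f}, exceeds 0.1: {distance >= 0.1}")
```

Identical distributions give a distance of 0, and distributions with no overlap give 1.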
--sv-call-regions-bed
Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format.
--sv-exclusion-bed
Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.
--enable-variant-deduplication true
Relevant when both SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent the occurrence of replicated variants within genes such as FLT3 and KMT2A. Filter all small indels in the structural variant VCF that appear and are passing in the small variant VCF. DRAGEN will create a new VCF that contains variants in SV VCF that are not matching a variant from SNV VCF file. The new deduplicated SV VCF file will have the same prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.
--sv-systematic-noise $BEDPE
Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels.
--sv-somatic-ins-tandup-hotspot-regions-bed $BED
Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bps)
--sv-min-candidate-variant-size
Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.
--sv-enable-liquid-tumor-mode true
DRAGEN can account for Tumor-in-Normal (TiN) contamination by running liquid tumor mode.
--sv-tin-contam-tolerance $TIN_CONTAM_TOLERANCE
Set the Tumor-in-Normal (TiN) contamination tolerance level.
For more information, see Structural Variant Calling.
DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.
Systematic noise files are considered essential in Tumor-Only workflows and are also recommended for Tumor-Normal workflows.
DRAGEN has prebuilt systematic noise files for WES/WGS. For high-sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.
Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.
WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FF
FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FFPE (only hg38)
WES_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WES FF and FFPE
This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.
For WES and WGS pipelines, gather the full paths to the small variant hard-filtered VCFs (not gVCFs) from step 1 and create a list file ${VCF_LIST} with one path per line.
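One way to assemble ${VCF_LIST} is sketched below. The function name and directory layout are hypothetical, and the `*.hard-filtered.vcf.gz` file-name pattern is an assumption about how the step 1 outputs are named; adjust it to match your output prefix. The pattern deliberately does not match gVCF files.

```python
from pathlib import Path
from typing import List


def write_vcf_list(run_root, list_path, pattern: str = "*.hard-filtered.vcf.gz") -> List[Path]:
    """Write one absolute VCF path per line and return the paths that were listed."""
    # Search all per-sample output folders below run_root.
    vcfs = sorted(p.resolve() for p in Path(run_root).rglob(pattern))
    if not vcfs:
        raise FileNotFoundError(f"no files matching {pattern} under {run_root}")
    Path(list_path).write_text("".join(f"{p}\n" for p in vcfs))
    return vcfs


if __name__ == "__main__":
    # Hypothetical locations for the normal-sample runs and the list file.
    listed = write_vcf_list("normals_output", "vcf_list.txt")
    print(f"listed {len(listed)} VCFs")
```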
Step 2. Generate the final noise file.
This step generates a BED file containing mean and max noise estimates per position. The file can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions, which can then be investigated (e.g. by modifying assay settings or DRAGEN settings) or blocklisted.
The SNV systematic noise files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.
SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available. SV systematic noise files are considered experimental in WES and Panels.
Prebuilt WGS SV systematic noise files can be downloaded here: Product Files.
WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz
For WGS, FF/FFPE
IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz
For WGS, HEME
Custom systematic noise files can be generated for WES or Panels. For best accuracy the normal samples should ideally closely match the sequencer, sample type, library prep and coverage of the tumor samples of interest. It is typically recommended to use 30 - 100 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.
Step 2. Build the BEDPE file using input VCFs from previous step.
Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.
For CNV PON requirements and generation options see CNV Preprocessing | Panel of Normals.
If a matched normal is available it is recommended to include it in the PON.
Step 1. Generate CNV target counts of individual normal samples.
Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.
Step 2. CNV combined counts file generation.
$CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.
For DRAGEN somatic runs it is recommended to use the linear hashtable.
See: Product Files
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.
FQ list Input
FQ Input
BAM Input
CRAM Input
--enable-map-align true
In the TN pipeline this must be set to false for BAM/CRAM input.
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).
Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses it is also recommended to use a normal sample with lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler can be used to reduce coverage on the normal sample.
--enable-fractional-down-sampler
Set to true to enable fractional downsampling. The default value is false.
--down-sampler-normal-subsample
Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).
--down-sampler-tumor-subsample
Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).
--down-sampler-random-seed
Specify the random seed for different runs of the same input data. The default value is 42.
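As an illustration of how the subsample fraction might be chosen, the sketch below scales the normal sample down to the tumor coverage when the normal is deeper. The helper names and the target_ratio parameter are hypothetical; only the option names come from the list above, and the coverage values are assumed to be known from an earlier QC step.

```python
def normal_subsample_fraction(tumor_coverage, normal_coverage, target_ratio=1.0):
    """Fraction of normal reads to keep so that the normal coverage is about
    target_ratio * tumor_coverage. Never returns more than 1.0."""
    if tumor_coverage <= 0 or normal_coverage <= 0:
        raise ValueError("coverage values must be positive")
    return min(1.0, target_ratio * tumor_coverage / normal_coverage)


def downsampler_args(tumor_coverage, normal_coverage, target_ratio=1.0):
    """Return command-line arguments for the fractional downsampler,
    or an empty list when the normal is not deeper than the target."""
    fraction = normal_subsample_fraction(tumor_coverage, normal_coverage, target_ratio)
    if fraction >= 1.0:
        return []
    return [
        "--enable-fractional-down-sampler", "true",
        "--down-sampler-normal-subsample", f"{fraction:.3f}",
    ]


if __name__ == "__main__":
    # Example: 100X tumor with a 200X matched normal keeps half of the normal reads.
    print(downsampler_args(100, 200))
```

A target_ratio below 1.0 would bring the normal below the tumor coverage rather than level with it.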
--umi-source STRING
Specify the input type for the UMI sequence. Options: qname, fastq, bamtag.
--umi-library-type STRING
Set the batch option for the type of UMI correction. Options: random-duplex, random-simplex, nonrandom-duplex.
--umi-nonrandom-whitelist $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.
--umi-correction-table $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.
--umi-min-supporting-reads INT
Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).
--umi-metrics-interval-file $BED
Target region in BED format.
For more information see: UMI Options.
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--vc-systematic-noise $PATH
Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.). When working with panels it is recommended that a custom systematic noise file be created for each assay.
--vc-somatic-hotspots $PATH
DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.
--vc-enable-liquid-tumor-mode true
Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.
--vc-override-tumor-pcr-params-with-normal false
Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.
High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports 3 main settings for improved sensitivity on low VAF variant calls.
--vc-target-vaf FLOAT
The default is 0.03 (3%). Set to e.g. 0.01 to improve SNV sensitivity on 1% VAF variants (assuming sufficient coverage).
--vc-enable-umi-solid true
Optimized for 1% and higher VAFs on UMI (or read position collapsed) samples with approx 300-1000X coverage.
--vc-enable-umi-liquid true
Optimized for 0.1% and higher VAFs on UMI samples with 1000X or higher coverage as expected in liquid biopsies.
For more detail on the small variant caller in somatic mode please refer to Somatic Mode
--enable-hla
Enable HLA typer (this setting by default will only genotype class 1 genes)
--hla-as-filter-min-threshold
Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.
--hla-as-filter-ratio-threshold
Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.
--hla-enable-class-2
Extend genotyping to HLA class 2 genes (default=true).
--cnv-enable-gcbias-correction true
Enable or disable GC bias correction when generating target counts.
--cnv-segmentation-mode $SEG_MODE
Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.
--cnv-segmentation-bed $PATH
If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.
--cnv-normal-cnv-vcf $CNV_NORMAL_VCF
Specify germline CNVs from the matched normal sample.
For more information, see CNV Calling.
For instructions on how to download the Nirvana annotation database, please refer to Nirvana
--tmb-vaf-threshold FLOAT
Variant minimum allele frequency for usable variants (default=0.05)
--vc-callability-tumor-thresh INT
Required read coverage to use a site (default=50).
--tmb-enable-proxi-filter BOOL
Use variant VAF information to increase germline filtering. Recommended for TO, but not for TN. May be overly aggressive at tagging variants as germline (default=false).
See the user guide: TMB Germline Variants.
Microsatellite sites file can be downloaded here: Product Files.
For panels it is recommended to post-process the file by intersecting the WES or WGS sites with the manifest. This will avoid using any off-target reads in the MSI analysis. For small panels it may be required to generate custom site files to ensure the panel covers at least 2000 sites. To generate custom MSI site files refer to the MSI Biomarker section in the user guide.
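The intersection step can be sketched as follows. This is a minimal illustration, assuming the microsatellite sites and the manifest regions have already been parsed into (chrom, start, end) tuples with the same coordinate convention, and assuming a site is kept only when it lies entirely inside a target region. The function names are hypothetical, and parsing of the actual file formats is left out.

```python
from bisect import bisect_right
from collections import defaultdict


def merge_targets(targets):
    """Group target intervals by chromosome and merge overlapping ones."""
    by_chrom = defaultdict(list)
    for chrom, start, end in targets:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = []
        for start, end in intervals:
            if out and start <= out[-1][1]:
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def sites_on_target(sites, targets):
    """Keep the sites that fall completely inside a (merged) target region."""
    merged = merge_targets(targets)
    kept = []
    for chrom, start, end in sites:
        intervals = merged.get(chrom, [])
        # Index of the last merged interval starting at or before the site start.
        i = bisect_right(intervals, (start, float("inf"))) - 1
        if i >= 0 and end <= intervals[i][1]:
            kept.append((chrom, start, end))
    return kept


def has_enough_sites(kept, minimum=2000):
    """Check the site count against the minimum of 2000 mentioned above."""
    return len(kept) >= minimum
```

If has_enough_sites returns False for a small panel, a custom site file is the route described above.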
--msi-coverage-threshold INT
Minimum coverage for a microsatellite: 60 (default)
--msi-distance-threshold FLOAT
Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default)
--sv-call-regions-bed
Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format.
--sv-exclusion-bed
Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.
--enable-variant-deduplication true
Relevant when both SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent duplicate reporting of variants within genes such as FLT3 and KMT2A. Filters all small indels in the structural variant VCF that also appear, and are passing, in the small variant VCF. DRAGEN creates a new VCF containing the variants in the SV VCF that do not match a variant in the SNV VCF. The new deduplicated SV VCF file has the prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.
--sv-systematic-noise $BEDPE
Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels.
--sv-somatic-ins-tandup-hotspot-regions-bed $BED
Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bps)
--sv-min-candidate-variant-size
Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.
--sv-enable-liquid-tumor-mode true
DRAGEN can account for Tumor-in-Normal (TiN) contamination by running liquid tumor mode.
--sv-tin-contam-tolerance $TIN_CONTAM_TOLERANCE
Set the Tumor-in-Normal (TiN) contamination tolerance level.
For more information, see Structural Variant Calling.
DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.
Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.
DRAGEN has prebuilt systematic noise files for WES/WGS. For high-sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.
Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.
WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FF
FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FFPE (only hg38)
WES_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WES FF and FFPE
This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.
For panels, GVCF files are created. Gather the full paths to the small variant caller hard-filtered GVCFs (not VCFs) from step 1 and create an input file ${GVCF_LIST} with one file path per line.
Step 2. Generate the final noise file.
This step generates a BED file containing mean and max noise estimates per position. The file can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions, which can then be investigated (e.g. by modifying assay settings or DRAGEN settings) or blocklisted.
The SNV systematic noise files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.
SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available. SV systematic noise files are considered experimental in WES and Panels.
Prebuilt WGS SV systematic noise files can be downloaded here: Product Files.
WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz
For WGS, FF/FFPE
IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz
For WGS, HEME
Custom systematic noise files can be generated for WES or Panels. For best accuracy the normal samples should ideally closely match the sequencer, sample type, library prep and coverage of the tumor samples of interest. It is typically recommended to use 30 - 100 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.
Step 2. Build the BEDPE file using input VCFs from previous step.
Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.
For CNV PON requirements and generation options see CNV Preprocessing | Panel of Normals.
If a matched normal is available it is recommended to include it in the PON.
Step 1. Generate CNV target counts of individual normal samples.
Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.
Step 2. CNV combined counts file generation.
$CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.
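The normals list can be written with a small helper. The sketch below is an illustration only: it assumes the two file suffixes named above and additionally assumes that a single PON should not mix GC-corrected and uncorrected counts, since the options used for target counts are expected to match. The function names are hypothetical.

```python
from pathlib import Path

GC_SUFFIX = ".target.counts.gc-corrected.gz"
RAW_SUFFIX = ".target.counts.gz"


def counts_kind(path):
    """Classify a target counts file by its suffix."""
    name = Path(path).name
    if name.endswith(GC_SUFFIX):
        return "gc-corrected"
    if name.endswith(RAW_SUFFIX):
        return "raw"
    raise ValueError(f"not a target counts file: {path}")


def write_cnv_normals_list(paths, list_path):
    """Write one counts path per line; refuse a mix of corrected and raw files.

    Returns the kind of counts written, or None for an empty list.
    """
    kinds = {counts_kind(p) for p in paths}
    if len(kinds) > 1:
        raise ValueError("list mixes GC-corrected and uncorrected counts")
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return kinds.pop() if kinds else None
```

The written file plays the role of $CNV_NORMALS_LIST in step 2.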
See: Product Files
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.
FQ list Input
FQ Input
BAM Input
CRAM Input
--enable-map-align true
Optionally disable map & align (default=true).
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).
Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses it is also recommended to use a normal sample with lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler can be used to reduce coverage on the normal sample.
--enable-fractional-down-sampler
Set to true to enable fractional downsampling. The default value is false.
--down-sampler-normal-subsample
Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).
--down-sampler-tumor-subsample
Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).
--down-sampler-random-seed
Specify the random seed for different runs of the same input data. The default value is 42.
--umi-source STRING
Specify the input type for the UMI sequence. Options: qname, fastq, bamtag.
--umi-library-type STRING
Set the batch option for the type of UMI correction. Options: random-duplex, random-simplex, nonrandom-duplex.
--umi-nonrandom-whitelist $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.
--umi-correction-table $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.
--umi-min-supporting-reads INT
Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).
--umi-metrics-interval-file $BED
Target region in BED format.
For more information see: UMI Options.
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--vc-emit-ref-confidence GVCF
To enable gVCF output.
--vc-enable-vcf-output
To enable VCF file output during a gVCF run, set to true. The default value is false.
--vc-systematic-noise $PATH
Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.). When working with panels it is recommended that a custom systematic noise file be created for each assay.
--vc-somatic-hotspots $PATH
DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.
High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports 3 main settings for improved sensitivity on low VAF variant calls.
--vc-target-vaf FLOAT
The default is 0.03 (3%). Set to e.g. 0.01 to improve SNV sensitivity on 1% VAF variants (assuming sufficient coverage).
--vc-enable-umi-solid true
Optimized for 1% and higher VAFs on UMI (or read position collapsed) samples with approx 300-1000X coverage.
--vc-enable-umi-liquid true
Optimized for 0.1% and higher VAFs on UMI samples with 1000X or higher coverage as expected in liquid biopsies.
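The choice between the two UMI modes can be expressed as a small rule of thumb. The sketch below is only an illustration of the guidance above: the coverage ranges in the text are approximate, while the code treats 300X and 1000X as hard cut-offs, and the function name and the collapsing labels are hypothetical.

```python
def low_vaf_args(mean_coverage, collapsing):
    """Suggest a low-VAF mode from coverage and read collapsing type.

    collapsing is "umi", "positional", or "none" (labels invented here).
    """
    if collapsing not in ("umi", "positional", "none"):
        raise ValueError(f"unknown collapsing type: {collapsing}")
    # Liquid mode: UMI samples at 1000X or higher.
    if collapsing == "umi" and mean_coverage >= 1000:
        return ["--vc-enable-umi-liquid", "true"]
    # Solid mode: UMI or position-collapsed samples from roughly 300X.
    if collapsing in ("umi", "positional") and mean_coverage >= 300:
        return ["--vc-enable-umi-solid", "true"]
    # Otherwise leave the defaults in place.
    return []


if __name__ == "__main__":
    print(low_vaf_args(1500, "umi"))
    print(low_vaf_args(500, "positional"))
```

The --vc-target-vaf option is left out of the sketch because it can be adjusted independently of these two modes.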
For more detail on the small variant caller in somatic mode please refer to Somatic Mode
--enable-hla
Enable HLA typer (this setting by default will only genotype class 1 genes)
--hla-as-filter-min-threshold
Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.
--hla-as-filter-ratio-threshold
Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.
--hla-enable-class-2
Extend genotyping to HLA class 2 genes (default=true).
--cnv-enable-gcbias-correction true
Enable or disable GC bias correction when generating target counts.
--cnv-segmentation-mode $SEG_MODE
Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.
--cnv-segmentation-bed $PATH
If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.
--cnv-population-b-allele-vcf $POP_VCF
Specify a population SNP VCF. This option is available for both the germline and the somatic workflows. In germline it is only supported for WGS. In somatic, it can be used when a matched normal sample is not available and analysis must be performed in tumor-only mode.
For more information, see CNV Calling.
For instructions on how to download the Nirvana annotation database, please refer to Nirvana
The Tumor-Normal pipeline is more effective than the Tumor-Only pipeline at removing or tagging germline variants. As a result, the Tumor-Only pipeline may report somewhat elevated TMB values. The TMB proxi filter is an optional setting applied on top of the regular database germline filter. It aggressively filters additional germline variants based on allele frequencies.
--tmb-vaf-threshold FLOAT
Variant minimum allele frequency for usable variants. Default=0.05. Set to 0.002 for ctDNA.
--vc-callability-tumor-thresh
Required read coverage to use a site. Default=50. Set to 1000 for ctDNA.
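The two TMB thresholds can be kept together as a small profile table so that they are always changed as a pair. This is a sketch only; the profile names and the helper are hypothetical, and the values are the ones quoted above.

```python
# Threshold pairs taken from the option descriptions above.
TMB_PROFILES = {
    "default": {"vaf": "0.05", "coverage": "50"},
    "ctdna": {"vaf": "0.002", "coverage": "1000"},
}


def tmb_args(profile="default"):
    """Return the TMB threshold arguments for a named profile."""
    try:
        values = TMB_PROFILES[profile]
    except KeyError:
        raise ValueError(f"unknown TMB profile: {profile}") from None
    return [
        "--tmb-vaf-threshold", values["vaf"],
        "--vc-callability-tumor-thresh", values["coverage"],
    ]


if __name__ == "__main__":
    print(" ".join(tmb_args("ctdna")))
```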
See the user guide: TMB Germline Variants.
Microsatellite sites file can be downloaded here: Product Files.
For panels it is recommended to post-process the file by intersecting the WES or WGS sites with the manifest. This will avoid using any off-target reads in the MSI analysis. For small panels it may be required to generate custom site files to ensure the panel covers at least 2000 sites. To generate custom MSI site files refer to the MSI Biomarker section in the user guide.
--msi-coverage-threshold INT
Minimum coverage for a microsatellite: 500
--msi-distance-threshold FLOAT
Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.02
--sv-call-regions-bed
Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format.
--sv-exclusion-bed
Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.
--enable-variant-deduplication true
Relevant when both SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent duplicate reporting of variants within genes such as FLT3 and KMT2A. Filters all small indels in the structural variant VCF that also appear, and are passing, in the small variant VCF. DRAGEN creates a new VCF containing the variants in the SV VCF that do not match a variant in the SNV VCF. The new deduplicated SV VCF file has the prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.
--sv-systematic-noise $BEDPE
Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels.
--sv-somatic-ins-tandup-hotspot-regions-bed $BED
Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bps)
--sv-min-candidate-variant-size
Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.
--sv-min-scored-variant-size $INT
100000
For more information, see Structural Variant Calling.
DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.
Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.
DRAGEN has prebuilt systematic noise files for WES/WGS. For high-sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.
Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.
WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FF
FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FFPE (only hg38)
WES_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WES FF and FFPE
This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.
For panels, GVCF files are created. Gather the full paths to the small variant caller hard-filtered GVCFs (not VCFs) from step 1 and create an input file ${GVCF_LIST} with one file path per line.
Step 2. Generate the final noise file.
This step generates a BED file containing mean and max noise estimates per position. The file can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions, which can then be investigated (e.g. by modifying assay settings or DRAGEN settings) or blocklisted.
The SNV systematic noise files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.
SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available. SV systematic noise files are considered experimental in WES and Panels.
Prebuilt WGS SV systematic noise files can be downloaded here: Product Files.
WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz
For WGS, FF/FFPE
IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz
For WGS, HEME
Custom systematic noise files can be generated for WES or Panels. For best accuracy the normal samples should ideally closely match the sequencer, sample type, library prep and coverage of the tumor samples of interest. It is typically recommended to use 30 - 100 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.
Step 2. Build the BEDPE file using input VCFs from previous step.
Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.
For CNV PON requirements and generation options see CNV Preprocessing | Panel of Normals.
If a matched normal is available it is recommended to include it in the PON.
Step 1. Generate CNV target counts of individual normal samples.
Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.
Step 2. CNV combined counts file generation.
$CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.
For DRAGEN somatic runs it is recommended to use the linear hashtable.
See: Product Files
DRAGEN input sources include: fastq list, fastq, bam, or cram. For BCL input, first create FASTQs using BCL conversion.
FQ list Input
FQ Input
BAM Input
CRAM Input
--enable-map-align true
In the TN pipeline this must be set to false for BAM/CRAM input.
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
--enable-duplicate-marking true
By default, DRAGEN marks duplicate reads and excludes them from variant calling.
--enable-positional-collapsing true
Alternative to --enable-duplicate-marking=true. Instead of discarding duplicate reads, DRAGEN can optionally perform positional collapsing, merging them into higher-quality consensus reads. This is beneficial for small panels without UMIs and coverage between 300X and 1000X. However, it's slower than standard duplicate marking and less effective on samples with coverage lower than 300X. For very high coverage (1000X+), avoid it due to potential read collisions. For high-sensitivity panels with 1000X+ coverage, consider using UMIs.
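The guidance on duplicate handling can be summarized as a simple decision helper. This is a sketch under stated assumptions: the 300X and 1000X boundaries are taken from the text above but applied here as hard cut-offs, UMI collapsing is assumed to be configured elsewhere when UMIs are present, and the function name is hypothetical.

```python
def duplicate_handling_args(mean_coverage, has_umis):
    """Pick between duplicate marking and positional collapsing."""
    if has_umis:
        # UMI samples are handled by UMI collapsing, configured separately.
        return []
    if 300 <= mean_coverage < 1000:
        # Non-UMI samples in the range where positional collapsing helps.
        return ["--enable-positional-collapsing", "true"]
    # Below 300X collapsing is less effective; at 1000X+ read collisions
    # become a concern, so fall back to standard duplicate marking.
    return ["--enable-duplicate-marking", "true"]


if __name__ == "__main__":
    print(duplicate_handling_args(500, has_umis=False))
    print(duplicate_handling_args(1500, has_umis=False))
```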
DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).
Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses it is also recommended to use a normal sample with lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler can be used to reduce coverage on the normal sample.
--enable-fractional-down-sampler
Set to true to enable fractional downsampling. The default value is false.
--down-sampler-normal-subsample
Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).
--down-sampler-tumor-subsample
Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).
--down-sampler-random-seed
Specify the random seed for different runs of the same input data. The default value is 42.
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--vc-systematic-noise $PATH
Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.). When working with panels it is recommended that a custom systematic noise file be created for each assay.
--vc-somatic-hotspots $PATH
DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.
--vc-enable-liquid-tumor-mode true
Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.
--vc-override-tumor-pcr-params-with-normal false
Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.
High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports 3 main settings for improved sensitivity on low VAF variant calls.
--vc-target-vaf FLOAT
The default is 0.03 (3%). Set to e.g. 0.01 to improve SNV sensitivity on 1% VAF variants (assuming sufficient coverage).
--vc-enable-umi-solid true
Optimized for 1% and higher VAFs on UMI (or read position collapsed) samples with approx 300-1000X coverage.
--vc-enable-umi-liquid true
Optimized for 0.1% and higher VAFs on UMI samples with 1000X or higher coverage as expected in liquid biopsies.
For more detail on the small variant caller in somatic mode please refer to Somatic Mode
--enable-hla
Enable HLA typer (this setting by default will only genotype class 1 genes)
--hla-as-filter-min-threshold
Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.
--hla-as-filter-ratio-threshold
Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.
--hla-enable-class-2
Extend genotyping to HLA class 2 genes (default=true).
--cnv-enable-gcbias-correction true
Enable or disable GC bias correction when generating target counts.
--cnv-segmentation-mode $SEG_MODE
Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.
--cnv-segmentation-bed $PATH
If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.
--cnv-normal-cnv-vcf $CNV_NORMAL_VCF
Specify germline CNVs from the matched normal sample.
For more information, see CNV Calling.
For instructions on how to download the Nirvana annotation database, please refer to Nirvana
--tmb-vaf-threshold FLOAT
Variant minimum allele frequency for usable variants (default=0.05)
--vc-callability-tumor-thresh INT
Required read coverage to use a site (default=50).
--tmb-enable-proxi-filter BOOL
Use variant VAF information to increase germline filtering. Recommended for TO, but not for TN. May be overly aggressive at tagging variants as germline (default=false).
See the user guide: TMB Germline Variants.
Microsatellite sites file can be downloaded here: Product Files.
For panels it is recommended to post-process the file by intersecting the WES or WGS sites with the manifest. This will avoid using any off-target reads in the MSI analysis. For small panels it may be required to generate custom site files to ensure the panel covers at least 2000 sites. To generate custom MSI site files refer to the MSI Biomarker section in the user guide.
--msi-coverage-threshold INT
Minimum coverage for a microsatellite: 60 (default)
--msi-distance-threshold FLOAT
Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default)
--sv-call-regions-bed
Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format.
--sv-exclusion-bed
Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.
--enable-variant-deduplication true
Relevant when both SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent the occurrence of replicated variants within genes such as FLT3 and KMT2A. Filter all small indels in the structural variant VCF that appear and are passing in the small variant VCF. DRAGEN will create a new VCF that contains variants in SV VCF that are not matching a variant from SNV VCF file. The new deduplicated SV VCF file will have the same prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.
--sv-systematic-noise $BEDPE
Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels.
--sv-somatic-ins-tandup-hotspot-regions-bed $BED
Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bps)
--sv-min-candidate-variant-size
Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.
--sv-enable-liquid-tumor-mode true
DRAGEN can account for Tumor-in-Normal (TiN) contamination by running liquid tumor mode.
--sv-tin-contam-tolerance $TIN_CONTAM_TOLERANCE
Set the Tumor-in-Normal (TiN) contamination tolerance level.
For more information, see Structural Variant Calling.
DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.
Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.
DRAGEN has prebuilt systematic noise files for WES/WGS. For high sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.
Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.
WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FF
FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FFPE (only hg38)
WES_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WES FF and FFPE
This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.
For panels we create GVCF files. Gather the full paths to the small variant caller hard filtered GVCFs (not VCFs) from step 1 and create an input file ${GVCF_LIST} by specifying 1 file per line.
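As a minimal sketch of building the list file, the helper below collects the gVCFs from a directory of step 1 outputs and writes one full path per line. The function name build_gvcf_list and the directory layout are hypothetical, and the file name pattern *.hard-filtered.gvcf.gz is an assumption about the output naming; adjust it to match the files your runs produced.

```shell
#!/usr/bin/env bash
# Sketch: write one full gVCF path per line for use as ${GVCF_LIST}.
# build_gvcf_list is a hypothetical helper, not part of DRAGEN.
build_gvcf_list() {
    local runs_dir="$1"   # directory holding the step 1 outputs (assumed layout)
    local list_file="$2"  # list file to create

    # Resolve to an absolute path so the list contains full paths.
    runs_dir="$(cd "$runs_dir" && pwd)" || return 1

    # Assumed naming pattern; plain VCFs are deliberately not matched.
    find "$runs_dir" -type f -name '*.hard-filtered.gvcf.gz' | sort > "$list_file"

    # Return non-zero if nothing matched, so an empty list is never used.
    [ -s "$list_file" ]
}
```

For example, build_gvcf_list /path/to/normal_runs gvcf_list.txt produces a file that can then be supplied as ${GVCF_LIST} in step 2.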
Step 2. Generate the final noise file.
This step generates a BED file containing mean and max noise estimates per position. The file can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions that could be investigated (e.g. by modifying assay settings or DRAGEN settings) or blocklisted.
The SNV systematic noise files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.
SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available. SV systematic noise files are considered experimental in WES and Panels.
Prebuilt WGS SV systematic noise files can be downloaded here: Product Files.
WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz
For WGS, FF/FFPE
IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz
For WGS, HEME
Custom systematic noise files can be generated for WES or Panels. For best accuracy the normal samples should ideally closely match the sequencer, sample type, library prep and coverage of the tumor samples of interest. It is typically recommended to use 30 - 100 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.
Step 2. Build the BEDPE file using input VCFs from previous step.
Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.
For CNV PON requirements and generation options see CNV Preprocessing | Panel of Normals.
If a matched normal is available it is recommended to include it in the PON.
Step 1. Generate CNV target counts of individual normal samples.
Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.
Step 2. CNV combined counts file generation.
$CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.
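Before running step 2 it can help to sanity-check the list file. The sketch below verifies that every listed path exists and carries one of the two suffixes named above, and it rejects a list that mixes GC-corrected and uncorrected counts. The function name check_cnv_normals_list is hypothetical, and treating a mixed list as an error is an assumption based on the advice to keep target counts options matched.

```shell
#!/usr/bin/env bash
# Sketch: validate a $CNV_NORMALS_LIST file before building the PON.
# check_cnv_normals_list is a hypothetical helper, not part of DRAGEN.
check_cnv_normals_list() {
    local list_file="$1" path n_gc=0 n_raw=0

    # Read one path per line; also handle a final line without a newline.
    while IFS= read -r path || [ -n "$path" ]; do
        [ -n "$path" ] || continue          # skip blank lines
        if [ ! -f "$path" ]; then
            echo "missing: $path" >&2
            return 1
        fi
        case "$path" in
            *.target.counts.gc-corrected.gz) n_gc=$((n_gc + 1)) ;;
            *.target.counts.gz)              n_raw=$((n_raw + 1)) ;;
            *) echo "unexpected file name: $path" >&2; return 1 ;;
        esac
    done < "$list_file"

    # Assumption: all normals should use the same GC bias correction setting.
    if [ "$n_gc" -gt 0 ] && [ "$n_raw" -gt 0 ]; then
        echo "list mixes GC-corrected and uncorrected counts" >&2
        return 1
    fi

    # Succeed only if at least one counts file was listed.
    [ $((n_gc + n_raw)) -gt 0 ]
}
```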
For DRAGEN somatic runs it is recommended to use the linear hashtable.
See: Product Files
DRAGEN input sources include: FASTQ list, FASTQ, BAM, or CRAM. For BCL input, first create FASTQs using BCL conversion.
FQ list Input
FQ Input
BAM Input
CRAM Input
--enable-map-align true
Optionally disable map & align (default=true).
--enable-map-align-output true
Optionally save the output BAM (default=false).
--Aligner.clip-pe-overhang 2
Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run.
DRAGEN can subsample a random, fractional percentage of reads from an input file using the fractional downsampler. You can use downsampling to subsample data sets in order to simulate different amounts of sequencing. DRAGEN randomly subsamples reads from primary analysis without any modification (e.g. no trimming, no filtering, etc.).
Downsampling may be useful to reduce runtime on very deep samples. For Tumor-Normal analyses it is also recommended to use a normal sample with coverage that is less than the tumor sample. If the matched normal has deeper coverage than the tumor sample, then the fractional downsampler may be used to reduce coverage on the normal sample.
--enable-fractional-down-sampler
Set to true to enable fractional downsampling. The default value is false.
--down-sampler-normal-subsample
Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%).
--down-sampler-tumor-subsample
Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).
--down-sampler-random-seed
Specify the random seed for different runs of the same input data. The default value is 42.
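One way to pick a value for --down-sampler-normal-subsample is to divide the coverage you want the normal to have by its current coverage, capped at 1.0. The sketch below does that arithmetic and prints the matching options; the function name normal_subsample_fraction and the example coverages are hypothetical, and the desired normal coverage is a choice you make for your assay rather than a DRAGEN recommendation.

```shell
#!/usr/bin/env bash
# Sketch: fraction of normal reads to keep = desired coverage / current coverage.
# normal_subsample_fraction is a hypothetical helper, not part of DRAGEN.
normal_subsample_fraction() {
    local desired_cov="$1"  # coverage the normal should end up with
    local normal_cov="$2"   # current coverage of the normal sample
    awk -v d="$desired_cov" -v n="$normal_cov" 'BEGIN {
        if (n <= 0) exit 1        # reject a non-positive coverage
        f = d / n
        if (f > 1) f = 1          # never ask for more than 100% of the reads
        printf "%.2f\n", f
    }'
}

# Example: a 120X normal reduced to 60X, below a deeper tumor sample.
frac="$(normal_subsample_fraction 60 120)"
echo "--enable-fractional-down-sampler true --down-sampler-normal-subsample ${frac}"
```

The printed options can be appended to a Tumor-Normal command line; the tumor fraction is left at its default of 1.0.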
--umi-source STRING
Specify the input type for the UMI sequence. Options: qname, fastq, bamtag.
--umi-library-type STRING
Set the batch option for the type of UMI correction. Options: random-duplex, random-simplex, nonrandom-duplex.
--umi-nonrandom-whitelist $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.
--umi-correction-table $PATH
If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: <INSTALL_PATH>/resources/umi/umi_correction_table.txt.gz.
--umi-min-supporting-reads INT
Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with this setting set to 1. A setting of 2 may potentially be relevant for samples with ultra deep coverage (e.g. ctDNA).
--umi-metrics-interval-file $BED
Target region in BED format.
For more information see: UMI Options.
--vc-target-bed
Limit variant calling to region of interest.
--vc-combine-phased-variants-distance INT
Maximum distance in base pairs (BP) over which phased variants will be combined. Set to 0 to disable. Valid range is [0; 15] BP (Default=2)
--vc-emit-ref-confidence GVCF
To enable gVCF output.
--vc-enable-vcf-output
To enable VCF file output during a gVCF run, set to true. The default value is false.
--vc-systematic-noise $PATH
Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (e.g. systematic alignment errors and sequencing errors).
--vc-somatic-hotspots $PATH
DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.
High-coverage sequencing panels allow for the detection of low-frequency alleles. DRAGEN supports 3 main settings for improved sensitivity on low VAF variant calls.
--vc-target-vaf FLOAT
The default is 0.03 (3%). Set to e.g. 0.01 to improve SNV sensitivity on 1% VAF variants (assuming sufficient coverage).
--vc-enable-umi-solid true
Optimized for 1% and higher VAFs on UMI (or read position collapsed) samples with approx 300-1000X coverage.
--vc-enable-umi-liquid true
Optimized for 0.1% and higher VAFs on UMI samples with 1000X or higher coverage as expected in liquid biopsies.
For more detail on the small variant caller in somatic mode please refer to Somatic Mode
--enable-hla
Enable HLA typer (this setting by default will only genotype class 1 genes)
--hla-as-filter-min-threshold
Internal option to set min alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.
--hla-as-filter-ratio-threshold
Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels.
--hla-enable-class-2
Extend genotyping to HLA class 2 genes (default=true).
--cnv-enable-gcbias-correction true
Enable or disable GC bias correction when generating target counts.
--cnv-segmentation-mode $SEG_MODE
Option to override the default segmentation algorithm. Defaults include slm for germline WGS, aslm for somatic WGS, and hslm for targeted analysis.
--cnv-segmentation-bed $PATH
If you are using somatic targeted panels with a set of genes supplied with the capture kit, then you can bypass segmentation by specifying a cnv-segmentation-bed and using cnv-segmentation-mode=bed.
--cnv-population-b-allele-vcf $POP_VCF
Specify a population SNP VCF. This option is available for both the germline and the somatic workflows. In germline it is only supported for WGS. In somatic, it can be used when a matched normal sample is not available and analysis must be performed in tumor-only mode.
For more information, see CNV Calling.
For instructions on how to download the Nirvana annotation database, please refer to Nirvana
The Tumor-Normal pipeline is more effective than the Tumor-Only pipeline at removing or tagging germline variants. The Tumor-Only may subsequently report somewhat elevated TMB values. The TMB proxi filter is an optional setting on top of the regular database germline filter. It will aggressively filter additional germline variants based on allele frequencies.
--tmb-vaf-threshold FLOAT
Variant minimum allele frequency for usable variants (default=0.05)
--vc-callability-tumor-thresh INT
Required read coverage to use a site (default=50).
--tmb-enable-proxi-filter BOOL
Use variant VAF information to increase germline filtering. Recommended for TO, but not for TN. May be overly aggressive at tagging variants as germline (default=false).
See the user guide: TMB Germline Variants.
Microsatellite sites file can be downloaded here: Product Files.
--msi-coverage-threshold INT
Minimum coverage for a microsatellite: 60 (default)
--msi-distance-threshold FLOAT
Minimum Jensen-Shannon distance between tumor and normal for a microsatellite: 0.1 (default)
--sv-call-regions-bed
Specifies a BED file containing the set of regions to call. Optionally gzip or bgzip format.
--sv-exclusion-bed
Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.
--enable-variant-deduplication true
Relevant when both SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent the occurrence of replicated variants within genes such as FLT3 and KMT2A. Filter all small indels in the structural variant VCF that appear and are passing in the small variant VCF. DRAGEN will create a new VCF that contains variants in SV VCF that are not matching a variant from SNV VCF file. The new deduplicated SV VCF file will have the same prefix passed by --output-file-prefix followed by sv.small_indel_dedup. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases.
--sv-systematic-noise $BEDPE
Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES/Panels.
--sv-somatic-ins-tandup-hotspot-regions-bed $BED
Specify a custom BED of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with some padding on both sides (300 bps)
--sv-min-candidate-variant-size
Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.
--sv-min-scored-variant-size $INT
100000
For more information, see Structural Variant Calling.
DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.
Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.
DRAGEN has prebuilt systematic noise files for WES/WGS. For high sensitivity applications, including panels or clinical WES/WGS assays, it is recommended to create your own systematic noise file as described under Custom.
Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: Product Files.
WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FF
FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WGS FFPE (only hg38)
WES_hg38_v2.0.0_systematic_noise.snv.bed.gz
For WES FF and FFPE
This section describes how to generate systematic noise files from phenotypically normal (non-tumor) samples to optimize the performance of a specific assay. For best accuracy, the normal samples should ideally closely match the sequencer, sample type, library prep, and coverage of the tumor samples of interest. It is typically recommended to use 30-70 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on each of approximately 30-70 normal samples.
For WES and WGS pipelines, gather the full paths to the small variant hard filtered VCFs (not GVCFs) from step 1 and create an input file ${VCF_LIST} by specifying 1 file per line.
Step 2. Generate the final noise file.
This step generates a BED file containing mean and max noise estimates per position. The file can be used directly during variant calling (argument --vc-systematic-noise). The distribution of noise per position can also be plotted to identify particularly noisy positions that could be investigated (e.g. by modifying assay settings or DRAGEN settings) or blocklisted.
The SNV systematic noise files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.
SV systematic noise files are recommended for WGS workflows. Prebuilt WGS noise files are available. SV systematic noise files are considered experimental in WES and Panels.
Prebuilt WGS SV systematic noise files can be downloaded here: Product Files.
WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz
For WGS, FF/FFPE
IDPF_WGS_hg38_v3.0.0_systematic_noise.sv.bedpe.gz
For WGS, HEME
Custom systematic noise files can be generated for WES or Panels. For best accuracy the normal samples should ideally closely match the sequencer, sample type, library prep and coverage of the tumor samples of interest. It is typically recommended to use 30 - 100 normals when building a noise file, but fewer can be used.
Step 1. Run DRAGEN somatic tumor-only on normal samples with --sv-detect-systematic-noise set to true to generate VCF output per normal sample.
Step 2. Build the BEDPE file using input VCFs from previous step.
Systematic noise BEDPE files can also be built in the cloud using the DRAGEN Baseline Builder App on BaseSpace or the DRAGEN Systematic Noise File Builder Pipeline on ICA.
For CNV PON requirements and generation options see CNV Preprocessing | Panel of Normals.
If a matched normal is available it is recommended to include it in the PON.
Step 1. Generate CNV target counts of individual normal samples.
Any samples that should not be included in the final PON file can be excluded from this step. Any options used for CNV target counts generation (BED file, GC Bias Correction, etc.) should be matched when processing the case samples.
Step 2. CNV combined counts file generation.
$CNV_NORMALS_LIST is a text file with one line for each path to a CNV target counts file generated in step 1 (either <output-file-prefix>.target.counts.gz or <output-file-prefix>.target.counts.gc-corrected.gz). Individual target counts files are merged into a single <output-file-prefix>.combined.counts.txt.gz PON file in the output directory. The PON file is used for each case sample run of DRAGEN CNV using the --cnv-combined-counts option.
--tumor-fastq-list $PATH
--tumor-fastq-list-sample-id $STRING
--tumor-fastq1 $PATH
--tumor-fastq2 $PATH
--RGSM-tumor $STRING
--RGID-tumor $STRING
--tumor-bam-input $PATH
--tumor-cram-input $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--vc-detect-systematic-noise=true
--vc-target-bed $VC_TARGET_BED #Region assessed in assay
--vc-target-bed-padding 500
--vc-emit-ref-confidence BP_RESOLUTION
--vc-enable-vcf-output true
--vc-enable-germline-tagging=true
--variant-annotation-data $PATH
--intermediate-results-dir $PATH
--output-directory $PATH
--output-file-prefix $STRING
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--build-sys-noise-vcfs-list ${GVCF_LIST}
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--sv-detect-systematic-noise true
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--sv-build-systematic-noise-vcfs-list $VCF_LIST #one VCF per line.
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
# CNV
--enable-cnv true
--cnv-target-bed $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--enable-cnv true
--cnv-generate-combined-counts true
--cnv-normals-list $CNV_NORMALS_LIST
--tumor-fastq-list $PATH
--tumor-fastq-list-sample-id $STRING
--fastq-list $PATH
--fastq-list-sample-id $STRING
--tumor-fastq1 $PATH
--tumor-fastq2 $PATH
--RGSM-tumor $STRING
--RGID-tumor $STRING
--fastq-file1 $PATH
--fastq-file2 $PATH
--RGSM $STRING
--RGID $STRING
--tumor-bam-input $PATH
--bam-input $PATH
--tumor-cram-input $PATH
--cram-input $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--vc-detect-systematic-noise=true
--vc-target-bed $VC_TARGET_BED #Region assessed in assay
--vc-target-bed-padding 500
--vc-enable-germline-tagging=true
--variant-annotation-data $PATH
--intermediate-results-dir $PATH
--output-directory $PATH
--output-file-prefix $STRING
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--build-sys-noise-vcfs-list ${VCF_LIST}
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--sv-detect-systematic-noise true
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--sv-build-systematic-noise-vcfs-list $VCF_LIST #one VCF per line.
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
# CNV
--enable-cnv true
--cnv-target-bed $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--enable-cnv true
--cnv-generate-combined-counts true
--cnv-normals-list $CNV_NORMALS_LIST
--tumor-fastq-list $PATH
--tumor-fastq-list-sample-id $STRING
--tumor-fastq1 $PATH
--tumor-fastq2 $PATH
--RGSM-tumor $STRING
--RGID-tumor $STRING
--tumor-bam-input $PATH
--tumor-cram-input $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--umi-enable true
--umi-source STRING #default='qname'
--umi-library-type STRING #see 'UMI'
--vc-detect-systematic-noise=true
--vc-target-bed $VC_TARGET_BED #Region assessed in assay
--vc-target-bed-padding 500
--vc-emit-ref-confidence BP_RESOLUTION
--vc-enable-vcf-output true
--vc-enable-umi-solid true
--vc-enable-germline-tagging=true
--variant-annotation-data $PATH
--intermediate-results-dir $PATH
--output-directory $PATH
--output-file-prefix $STRING
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--build-sys-noise-vcfs-list ${GVCF_LIST}
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--umi-enable true
--umi-source STRING #default='qname'
--umi-library-type STRING #see 'UMI'
--sv-detect-systematic-noise true
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--sv-build-systematic-noise-vcfs-list $VCF_LIST #one VCF per line.
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
# CNV
--enable-cnv true
--cnv-target-bed $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--enable-cnv true
--cnv-generate-combined-counts true
--cnv-normals-list $CNV_NORMALS_LIST
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
# UMI
--umi-enable true
--umi-source STRING #Default='qname'
--umi-library-type STRING #e.g. random-duplex
--tumor-normal-has-umi STRING #Sample(s) containing UMI ['tumor', 'both'].
--umi-min-supporting-reads 1 #Default=2
# Small variant caller
--enable-variant-caller true
--vc-target-bed $VC_TARGET_BED
--vc-systematic-noise $PATH #Recommended
--vc-excluded-regions-bed $BED #FFPE: optionally mask ALUs
--vc-sq-filter-threshold 17.5 #recommended in tumor-normal UMI mode
# SV
--enable-sv true
--sv-exome true
--sv-call-regions-bed $SV_TARGET_BED
# CNV
--enable-cnv true
--cnv-use-somatic-vc-baf true
--cnv-target-bed $PATH
--cnv-combined-counts $PATH #CNV PON
# HRD Scoring
--enable-hrd true #requires CNV
# Annotation
--variant-annotation-data $PATH
--enable-variant-annotation true
# TMB
--enable-tmb true
# HLA genotyper
--enable-hla true
# Microsatellite Instability (MSI)
--msi-command tumor-normal
--msi-microsatellites-file $PATH
--msi-coverage-threshold 40
--tumor-fastq-list $PATH
--tumor-fastq-list-sample-id $STRING
--fastq-list $PATH
--fastq-list-sample-id $STRING
--tumor-fastq1 $PATH
--tumor-fastq2 $PATH
--RGSM-tumor $STRING
--RGID-tumor $STRING
--fastq-file1 $PATH
--fastq-file2 $PATH
--RGSM $STRING
--RGID $STRING
--tumor-bam-input $PATH
--bam-input $PATH
--tumor-cram-input $PATH
--cram-input $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--umi-enable true
--umi-source STRING #default='qname'
--umi-library-type STRING #see 'UMI'
--vc-detect-systematic-noise=true
--vc-target-bed $VC_TARGET_BED #Region assessed in assay
--vc-target-bed-padding 500
--vc-enable-germline-tagging=true
--variant-annotation-data $PATH
--intermediate-results-dir $PATH
--output-directory $PATH
--output-file-prefix $STRING
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--build-sys-noise-vcfs-list ${VCF_LIST}
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--umi-enable true
--umi-source STRING #default='qname'
--umi-library-type STRING #see 'UMI'
--sv-detect-systematic-noise true
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--sv-build-systematic-noise-vcfs-list $VCF_LIST #one VCF per line.
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
# CNV
--enable-cnv true
--cnv-target-bed $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--enable-cnv true
--cnv-generate-combined-counts true
--cnv-normals-list $CNV_NORMALS_LIST
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
# UMI
--umi-enable true
--umi-source STRING #Default='qname'
--umi-library-type STRING #e.g. random-duplex
--tumor-normal-has-umi STRING #Sample(s) containing UMI ['tumor', 'both'].
--umi-min-supporting-reads 1 #Default=2
# Small variant caller
--enable-variant-caller true
--vc-target-bed $VC_TARGET_BED
--vc-systematic-noise $PATH #Recommended
--vc-excluded-regions-bed $BED #FFPE: optionally mask ALUs
--vc-enable-umi-solid true #>= 1% VAF
--vc-sq-filter-threshold 17.5 #recommended in tumor-normal UMI mode
# SV
--enable-sv true
--sv-exome true
--sv-call-regions-bed $SV_TARGET_BED
# CNV
--enable-cnv true
--cnv-use-somatic-vc-baf true
--cnv-target-bed $PATH
--cnv-combined-counts $PATH #CNV PON
# Annotation
--variant-annotation-data PATH
--enable-variant-annotation true
# TMB
--enable-tmb true
# HLA genotyper
--enable-hla true
--hla-as-filter-min-threshold 29.0 #panel specific setting
--hla-as-filter-ratio-threshold 0.85 #panel specific setting
# Microsatellite Instability (MSI)
--msi-command tumor-normal
--msi-microsatellites-file $PATH
--msi-coverage-threshold 40
--tumor-fastq-list $PATH
--tumor-fastq-list-sample-id $STRING
--fastq-list $PATH
--fastq-list-sample-id $STRING
--tumor-fastq1 $PATH
--tumor-fastq2 $PATH
--RGSM-tumor $STRING
--RGID-tumor $STRING
--fastq-file1 $PATH
--fastq-file2 $PATH
--RGSM $STRING
--RGID $STRING
--tumor-bam-input $PATH
--bam-input $PATH
--tumor-cram-input $PATH
--cram-input $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--umi-enable true
--umi-source STRING #default='qname'
--umi-library-type STRING #see 'UMI'
--vc-detect-systematic-noise=true
--vc-target-bed $VC_TARGET_BED #Region assessed in assay
--vc-target-bed-padding 500
--vc-emit-ref-confidence BP_RESOLUTION
--vc-enable-vcf-output true
--vc-enable-umi-solid true
--vc-enable-germline-tagging=true
--variant-annotation-data $PATH
--intermediate-results-dir $PATH
--output-directory $PATH
--output-file-prefix $STRING
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--build-sys-noise-vcfs-list ${GVCF_LIST}
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--umi-enable true
--umi-source STRING #default='qname'
--umi-library-type STRING #see 'UMI'
--sv-detect-systematic-noise true
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--sv-build-systematic-noise-vcfs-list $VCF_LIST #one VCF per line
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
# CNV
--enable-cnv true
--cnv-target-bed $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--enable-cnv true
--cnv-generate-combined-counts true
--cnv-normals-list $CNV_NORMALS_LIST
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
# UMI
--umi-enable true
--umi-source STRING #Default='qname'
--umi-library-type STRING #e.g. random-duplex
# Small variant caller
--enable-variant-caller true
--vc-target-bed $VC_TARGET_BED
--vc-systematic-noise $PATH #Required
--vc-enable-umi-liquid true #>= 0.1% VAF
# SV
--enable-sv true
--sv-exome true
--sv-call-regions-bed $SV_TARGET_BED
# CNV
--enable-cnv true
--cnv-population-b-allele-vcf $POP_VCF
--cnv-target-bed $PATH
--cnv-combined-counts $PATH #CNV PON
# Annotation
--variant-annotation-data PATH
--vc-enable-germline-tagging true
# TMB
--enable-tmb true
--vc-callability-tumor-thresh 1000
--tmb-vaf-threshold 0.002
--tmb-enable-proxi-filter true #Optional for Tumor-Only
# HLA genotyper
--enable-hla true
--hla-as-filter-min-threshold 29.0 #panel specific setting
--hla-as-filter-ratio-threshold 0.85 #panel specific setting
# Microsatellite Instability (MSI)
--msi-command tumor-only
--msi-ref-normal-input $PATH
--msi-microsatellites-file $PATH
--msi-coverage-threshold 40
--tumor-fastq-list $PATH
--tumor-fastq-list-sample-id $STRING
--tumor-fastq1 $PATH
--tumor-fastq2 $PATH
--RGSM-tumor $STRING
--RGID-tumor $STRING
--tumor-bam-input $PATH
--tumor-cram-input $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--umi-enable true
--umi-source STRING #default='qname'
--umi-library-type STRING #see 'UMI'
--vc-detect-systematic-noise=true
--vc-target-bed $VC_TARGET_BED #Region assessed in assay
--vc-target-bed-padding 500
--vc-emit-ref-confidence BP_RESOLUTION
--vc-enable-vcf-output true
--vc-enable-umi-liquid true
--vc-enable-germline-tagging=true
--variant-annotation-data $PATH
--intermediate-results-dir $PATH
--output-directory $PATH
--output-file-prefix $STRING
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--build-sys-noise-vcfs-list ${GVCF_LIST}
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--umi-enable true
--umi-source STRING #default='qname'
--umi-library-type STRING #see 'UMI'
--sv-detect-systematic-noise true
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--sv-build-systematic-noise-vcfs-list $VCF_LIST #one VCF per line
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
# CNV
--enable-cnv true
--cnv-target-bed $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--enable-cnv true
--cnv-generate-combined-counts true
--cnv-normals-list $CNV_NORMALS_LIST
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
--enable-duplicate-marking true #default=true
# Small variant caller
--enable-variant-caller true
--vc-target-bed $VC_TARGET_BED
--vc-systematic-noise $PATH #Recommended
--vc-excluded-regions-bed $BED #FFPE: optionally mask ALUs
--vc-target-vaf 0.03 #Default = 0.03 (>= 3% VAF)
# SV
--enable-sv true
--sv-exome true
--sv-call-regions-bed $SV_TARGET_BED
# CNV
--enable-cnv true
--cnv-use-somatic-vc-baf true
--cnv-target-bed $PATH
--cnv-combined-counts $PATH #CNV PON
# Annotation
--variant-annotation-data PATH
--enable-variant-annotation true
# TMB
--enable-tmb true
# HLA genotyper
--enable-hla true
--hla-as-filter-min-threshold 29.0 #panel specific setting
--hla-as-filter-ratio-threshold 0.85 #panel specific setting
# Microsatellite Instability (MSI)
--msi-command tumor-normal
--msi-microsatellites-file $PATH
--msi-coverage-threshold 40
--tumor-fastq-list $PATH
--tumor-fastq-list-sample-id $STRING
--fastq-list $PATH
--fastq-list-sample-id $STRING
--tumor-fastq1 $PATH
--tumor-fastq2 $PATH
--RGSM-tumor $STRING
--RGID-tumor $STRING
--fastq-file1 $PATH
--fastq-file2 $PATH
--RGSM $STRING
--RGID $STRING
--tumor-bam-input $PATH
--bam-input $PATH
--tumor-cram-input $PATH
--cram-input $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--vc-detect-systematic-noise=true
--vc-target-bed $VC_TARGET_BED #Region assessed in assay
--vc-target-bed-padding 500
--vc-emit-ref-confidence BP_RESOLUTION
--vc-enable-vcf-output true
--vc-enable-germline-tagging=true
--variant-annotation-data $PATH
--intermediate-results-dir $PATH
--output-directory $PATH
--output-file-prefix $STRING
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--build-sys-noise-vcfs-list ${GVCF_LIST}
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--sv-detect-systematic-noise true
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--sv-build-systematic-noise-vcfs-list $VCF_LIST #one VCF per line
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
# CNV
--enable-cnv true
--cnv-target-bed $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--enable-cnv true
--cnv-generate-combined-counts true
--cnv-normals-list $CNV_NORMALS_LIST
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
# Inputs
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
# Mapper
--enable-map-align true #optional with BAM/CRAM input
--enable-map-align-output true #optionally save the output BAM
--enable-sort true #default=true
# UMI
--umi-enable true
--umi-source STRING #Default='qname'
--umi-library-type STRING #e.g. random-duplex
--umi-min-supporting-reads 1 #Default=2
# Small variant caller
--enable-variant-caller true
--vc-target-bed $VC_TARGET_BED
--vc-systematic-noise $PATH #Required
--vc-excluded-regions-bed $BED #FFPE: optionally mask ALUs
# SV
--enable-sv true
--sv-exome true
--sv-call-regions-bed $SV_TARGET_BED
# CNV
--enable-cnv true
--cnv-population-b-allele-vcf $POP_VCF
--cnv-target-bed $PATH
--cnv-combined-counts $PATH #CNV PON
# HRD Scoring
--enable-hrd true #requires CNV
# Annotation
--variant-annotation-data PATH
--vc-enable-germline-tagging true
# TMB
--enable-tmb true
--tmb-enable-proxi-filter true #Optional for Tumor-Only
# HLA genotyper
--enable-hla true
# Microsatellite Instability (MSI)
--msi-command tumor-only
--msi-ref-normal-input $PATH
--msi-microsatellites-file $PATH
--msi-coverage-threshold 40
--tumor-fastq-list $PATH
--tumor-fastq-list-sample-id $STRING
--tumor-fastq1 $PATH
--tumor-fastq2 $PATH
--RGSM-tumor $STRING
--RGID-tumor $STRING
--tumor-bam-input $PATH
--tumor-cram-input $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--umi-enable true
--umi-source STRING #default='qname'
--umi-library-type STRING #see 'UMI'
--vc-detect-systematic-noise=true
--vc-target-bed $VC_TARGET_BED #Region assessed in assay
--vc-target-bed-padding 500
--vc-enable-germline-tagging=true
--variant-annotation-data $PATH
--intermediate-results-dir $PATH
--output-directory $PATH
--output-file-prefix $STRING
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--build-sys-noise-vcfs-list ${VCF_LIST}
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
--umi-enable true
--umi-source STRING #default='qname'
--umi-library-type STRING #see 'UMI'
--sv-detect-systematic-noise true
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--sv-build-systematic-noise-vcfs-list $VCF_LIST #one VCF per line
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--tumor-fastq-list $PATH #see 'Input Options' for FQ, BAM or CRAM
--tumor-fastq-list-sample-id $STRING
# CNV
--enable-cnv true
--cnv-target-bed $PATH
/opt/dragen/$VERSION/bin/dragen #DRAGEN install path
--ref-dir $REF_DIR #path to DRAGEN linear hashtable
--output-directory $OUTPUT
--intermediate-results-dir $PATH #e.g. SSD /staging
--output-file-prefix $PREFIX
--enable-cnv true
--cnv-generate-combined-counts true
--cnv-normals-list $CNV_NORMALS_LIST
--umi-emit-multiplicity both
Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see Merge Duplex UMIs.
--umi-start-mask-length INT
Number of additional bases to ignore from start of read. The default is 0. To reduce FP optionally set to 1.
--umi-end-mask-length INT
Number of additional bases to ignore from end of read. The default is 0. To reduce FP optionally set to 3.
--vc-enable-liquid-tumor-mode true
Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.
--vc-override-tumor-pcr-params-with-normal false
Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.
--vc-sq-filter-threshold $NUM
Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 2. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold $INT
Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 60. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold-in-hotspot $INT
Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 20. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-excluded-regions-bed $BED
Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome and are often exceptionally noisy in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU BED files can be downloaded as part of the Bed File Collection.
--sv-min-scored-variant-size
After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.
--umi-emit-multiplicity both
Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see Merge Duplex UMIs.
--umi-start-mask-length INT
Number of additional bases to ignore from start of read. The default is 0. To reduce FP optionally set to 1.
--umi-end-mask-length INT
Number of additional bases to ignore from end of read. The default is 0. To reduce FP optionally set to 3.
--vc-enable-liquid-tumor-mode true
Tumor-in-normal contamination. Only use if there is some tumor leakage in the normal control.
--vc-override-tumor-pcr-params-with-normal false
Mixed sample preparation. Only use if the tumor and normal samples exhibit different PCR (indel) noise patterns, e.g., due to using different sample preparation.
--vc-sq-filter-threshold $NUM
Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 4. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold $INT
Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 60. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold-in-hotspot $INT
Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 20. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-excluded-regions-bed $BED
Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome and are often exceptionally noisy in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU BED files can be downloaded as part of the Bed File Collection.
--sv-min-scored-variant-size
After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.
--umi-emit-multiplicity both
Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see Merge Duplex UMIs.
--umi-start-mask-length INT
Number of additional bases to ignore from start of read. The default is 0. To reduce FP optionally set to 1.
--umi-end-mask-length INT
Number of additional bases to ignore from end of read. The default is 0. To reduce FP optionally set to 3.
--tumor-normal-has-umi STRING
Specify if only the tumor, or if both the tumor and normal have UMIs. Options: 'both','tumor'.
--vc-sq-filter-threshold $NUM
Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 4. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold $INT
Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold-in-hotspot $INT
Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-excluded-regions-bed $BED
Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome and are often exceptionally noisy in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU BED files can be downloaded as part of the Bed File Collection.
--sv-min-scored-variant-size
After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.
--umi-emit-multiplicity both
Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see Merge Duplex UMIs.
--umi-start-mask-length INT
Number of additional bases to ignore from start of read. The default is 0. To reduce FP optionally set to 1.
--umi-end-mask-length INT
Number of additional bases to ignore from end of read. The default is 0. To reduce FP optionally set to 3.
--tumor-normal-has-umi STRING
Specify if only the tumor, or if both the tumor and normal have UMIs. Options: 'both','tumor'.
--vc-sq-filter-threshold $NUM
Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 4. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold $INT
Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold-in-hotspot $INT
Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-excluded-regions-bed $BED
Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome and are often exceptionally noisy in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU BED files can be downloaded as part of the Bed File Collection.
--sv-min-scored-variant-size
After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.
--vc-sq-filter-threshold $NUM
Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 17.5. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold $INT
Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-systematic-noise-filter-threshold-in-hotspot $INT
Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Default value = 10. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.
--vc-excluded-regions-bed $BED
Hard filter variants that overlap with this region. ALU regions comprise approximately 11% of the genome and are often exceptionally noisy in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU BED files can be downloaded as part of the Bed File Collection.
--sv-min-scored-variant-size
After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.
Before a reference genome can be used with DRAGEN, it must be converted from FASTA format into a custom binary format for use with the DRAGEN hardware. The options used in this preprocessing step offer tradeoffs between performance and mapping quality.
Pre-built DRAGEN reference genomes are available for download in the Illumina customer portal. If you find that performance and mapping quality with these are adequate, there is a good chance that you can simply work with these supplied reference genomes. Depending on your read lengths and other particular aspects of your application, you may be able to improve mapping quality and/or performance by tuning the reference preprocessing options.
The DRAGEN mapper extracts many overlapping seeds (subsequences or K-mers) from each read, and looks up those seeds in a hash table residing in memory on its PCIe card, to identify locations in the reference genome where the seeds match. Hash tables are ideal for extremely fast lookups of exact matches. The DRAGEN hash table must be constructed from a chosen reference genome using the --build-hash-table option, which extracts many overlapping seeds from the reference genome, populates them into records in the hash table, and saves the hash table as a binary file.
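The seed lookup described above can be illustrated with a minimal Python sketch. It is not the DRAGEN implementation: an ordinary dictionary stands in for the hardware hash table, and the function names `build_seed_table` and `candidate_positions`, the toy reference, and the seed length of 5 are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def build_seed_table(reference: str, seed_len: int) -> Dict[str, List[int]]:
    """Map every seed_len-base seed in the reference to its start positions."""
    table = defaultdict(list)
    for pos in range(len(reference) - seed_len + 1):
        table[reference[pos:pos + seed_len]].append(pos)
    return dict(table)


def candidate_positions(read: str, table: Dict[str, List[int]],
                        seed_len: int) -> Dict[int, int]:
    """Look up every overlapping seed of the read by exact match.

    Each hit votes for the reference position where the read would start
    (hit position minus the seed's offset within the read).
    """
    votes = defaultdict(int)
    for offset in range(len(read) - seed_len + 1):
        seed = read[offset:offset + seed_len]
        for ref_pos in table.get(seed, []):
            votes[ref_pos - offset] += 1
    return dict(votes)


# Toy data (hypothetical): a 29-base reference and a 12-base read cut from it.
reference = "ACGTTGCATGCCGATAGCTAGGCTTAACG"
table = build_seed_table(reference, seed_len=5)
read = reference[8:20]
hits = candidate_positions(read, table, seed_len=5)
best_start = max(hits, key=hits.get)
```

In this toy case every seed of the read agrees on the same start position, which is why exact-match lookups are enough to locate an error-free read.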
DRAGEN will attempt to detect the provided reference in order to automatically apply recommended resources and settings. There are four human references that DRAGEN can detect: hg38, hg19, hs37d5, and chm13v2. DRAGEN is able to detect references that contain a subset of the primary contigs from one of these references, as long as the names and lengths of the detected contigs are consistent with the names and lengths from the standard assemblies of these references.
In detail, automatic reference detection operates as follows:
We define a primary contig of a human genome to be an autosome (1-22) or sex chromosome (X, Y). Let F be the input FASTA. For each reference genome R in hg38, hg19, hs37d5, and chm13v2, DRAGEN checks that at least one contig in F has the same name and length as a primary contig in R, and that no contig in F has the same name as a contig in R but a different length. If these conditions hold for exactly one of hg38, hg19, hs37d5, and chm13v2, then that reference is detected and resources may be applied automatically.
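The detection rule above can be expressed as a short Python sketch. The assembly names `toyA`, `toyB`, and `toyC` and their contig lengths are invented placeholders, not real assembly data, and the helper names are hypothetical; only the two conditions and the "exactly one match" rule come from the description.

```python
from typing import Dict, Optional

PRIMARY = {str(n) for n in range(1, 23)} | {"X", "Y"}


def is_primary(name: str) -> bool:
    """Treat 'chr1' and '1' style names alike (an assumption of this sketch)."""
    return (name[3:] if name.startswith("chr") else name) in PRIMARY


def matches(fasta: Dict[str, int], ref: Dict[str, int]) -> bool:
    """Apply both conditions from the text to one candidate reference."""
    has_primary_match = any(
        is_primary(name) and fasta.get(name) == length
        for name, length in ref.items()
    )
    has_conflict = any(
        name in ref and ref[name] != length for name, length in fasta.items()
    )
    return has_primary_match and not has_conflict


def detect_reference(fasta: Dict[str, int],
                     known: Dict[str, Dict[str, int]]) -> Optional[str]:
    """Return the single matching reference name, or None if zero or several match."""
    found = [name for name, ref in known.items() if matches(fasta, ref)]
    return found[0] if len(found) == 1 else None


# Placeholder assemblies with made-up lengths, for illustration only.
KNOWN = {
    "toyA": {"chr1": 1000, "chr2": 800, "chrX": 500},
    "toyB": {"chr1": 1010, "chr2": 790, "chrX": 505},
    "toyC": {"1": 1010, "2": 790, "X": 505},
}

subset_of_a = detect_reference({"chr1": 1000, "chrX": 500}, KNOWN)
mixed = detect_reference({"chr1": 1000, "chr2": 790}, KNOWN)
```

The `subset_of_a` input contains only two contigs yet is still detected, mirroring the statement that a subset of primary contigs is sufficient. The `mixed` input conflicts with every candidate and is not detected.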
The DRAGEN hash table builder automatically applies decoy contigs and mask BED files to a detected reference. Other pipelines may also apply resources automatically. For example, variant callers may apply machine learning models and target BED files.
In order for DRAGEN to correctly detect the provided reference, it is important to use the standard naming conventions for each of the four human assemblies that DRAGEN detects:
The size of the DRAGEN hash table is proportional to the number of seeds populated from the reference genome. The default is to populate a seed starting at every position in the reference genome, i.e., roughly 3 billion seeds from a human genome. This default requires at least 32 GB of memory on the DRAGEN PCIe board.
To operate on larger, nonhuman genomes or to reduce hash table congestion, it is possible to populate fewer than all reference seeds by using the --ht-ref-seed-interval option to specify an average reference interval. The default interval for 100% population is --ht-ref-seed-interval 1, and 50% population is specified with --ht-ref-seed-interval 2. The population interval does not need to be an integer. For example, --ht-ref-seed-interval 1.2 indicates 83.3% population, with mostly 1-base and some 2-base intervals to achieve a 1.2 base interval on average.
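The arithmetic behind the seed interval can be checked with a small Python sketch. The population fraction is the reciprocal of the interval. How DRAGEN actually chooses which positions to skip is not stated here, so the spacing rule below (flooring multiples of the interval) is only one assumed way to produce mostly 1-base and some 2-base steps. The function names are hypothetical.

```python
import math
from fractions import Fraction
from typing import List, Union

Number = Union[int, str, Fraction]


def population_fraction(interval: Number) -> float:
    """Fraction of reference positions that receive a seed: 1 / interval."""
    return float(1 / Fraction(interval))


def seed_positions(ref_len: int, interval: Number) -> List[int]:
    """Assumed spacing rule: the i-th seed starts at floor(i * interval)."""
    step = Fraction(interval)
    positions = []
    i = 0
    while math.floor(i * step) < ref_len:
        positions.append(math.floor(i * step))
        i += 1
    return positions


positions = seed_positions(12, "1.2")
steps = {b - a for a, b in zip(positions, positions[1:])}
percent = round(100 * population_fraction("1.2"), 1)
```

Passing the interval as a string keeps the arithmetic exact through `Fraction`, which avoids floating-point rounding at positions such as 1.2 × 5.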
It is characteristic of hash tables that they are allocated a certain size, but always retain some empty records, so they are less than 100% occupied. A healthy amount of empty space is important for quick access to the DRAGEN hash table. Approximately 90% occupancy is a good upper bound. Empty space is important because records are pseudo-randomly placed in the hash table, resulting in an abnormally high number of records in some places. These congested regions can get quite large as the percentage of empty space approaches zero, and queries by the DRAGEN mapper for some seeds can become increasingly slow.
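The effect of occupancy on lookup cost can be illustrated with a generic simulation in Python. It uses a plain linear-probing table as a stand-in; the actual DRAGEN hash table layout is not described here, so the probing scheme, the function name `mean_probes`, and the table size are assumptions chosen only to show how congestion grows as empty space shrinks.

```python
import random


def mean_probes(table_size: int, occupancy: float, seed: int = 0) -> float:
    """Fill a linear-probing table to the given occupancy.

    Returns the average number of slots inspected per stored record,
    which is also the average cost of finding that record later.
    """
    rng = random.Random(seed)          # fixed seed for repeatable results
    slots = [False] * table_size
    n_records = int(table_size * occupancy)
    total_probes = 0
    for _ in range(n_records):
        idx = rng.randrange(table_size)  # pseudo-random home slot
        probes = 1
        while slots[idx]:                # slot taken: move to the next one
            idx = (idx + 1) % table_size
            probes += 1
        slots[idx] = True
        total_probes += probes
    return total_probes / n_records


half_full = mean_probes(10_000, 0.50)
nearly_full = mean_probes(10_000, 0.95)
```

Comparing `half_full` with `nearly_full` shows the average probe count rising sharply as the table approaches full occupancy, consistent with the guidance to leave a healthy amount of empty space.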
The hash table is populated with reference seeds of a single common length. This primary seed length is controlled with the --ht-seed-len option, which defaults to 21.
The longest primary seed supported is 27 bases when the table is 8 GB to 31.5 GB in size. Generally, longer seeds are better for run time performance, and shorter seeds are better for mapping quality (success rate and accuracy). A longer seed is more likely to be unique in the reference genome, facilitating fast mapping without needing to check many alternative locations. But a longer seed is also more likely to overlap a deviation from the reference (variant or sequencing error), which prevents successful mapping by an exact match of that seed (although another seed from the read may still map), and there are fewer long seed positions available in each read.
Longer seeds are more appropriate for longer reads, because there are more seed positions available to avoid deviations.
Seed Length Recommendations
Due to repetitive sequences, some seeds of any given length match many locations in the reference genome. DRAGEN uses a unique mechanism called seed extension to successfully map such high-frequency seeds. When the software determines that a primary seed occurs at many reference locations, it extends the seed by some number of bases at both ends, to some greater length that is more unique in the reference.
For example, a 21-base primary seed may be extended by 7 bases at each end to a 35-base extended seed. A 21-base primary seed may match 100 places in the reference. But 35-base extensions of these 100 seed positions may divide into 40 groups of 1-3 identical 35-base seeds. Iterative seed extensions are also supported, and are automatically generated when a large set of identical primary seeds contains various subsets that are best resolved by different extension lengths.
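The grouping effect of seed extension can be illustrated on a toy reference. The sketch below is not the DRAGEN algorithm: it uses a 4-base primary seed and a fixed 2-base extension at each end, and the function names are hypothetical. It only shows how identical primary seeds divide into smaller groups once flanking bases are included.

```python
from collections import defaultdict


def seed_positions(reference: str, seed: str) -> list:
    """Return every start position where the primary seed occurs."""
    k = len(seed)
    return [i for i in range(len(reference) - k + 1) if reference[i:i + k] == seed]


def group_by_extension(reference: str, positions: list, k: int, flank: int) -> dict:
    """Group primary seed hits by the extended seed (flank bases on each end)."""
    groups = defaultdict(list)
    for pos in positions:
        start, end = pos - flank, pos + k + flank
        if start < 0 or end > len(reference):
            continue  # extension runs off the sequence; skipped in this toy
        groups[reference[start:end]].append(pos)
    return dict(groups)


reference = "GGACGTCCTTGGACGTCCAATTACGTAA"
hits = seed_positions(reference, "ACGT")
groups = group_by_extension(reference, hits, k=4, flank=2)
for extended_seed, positions in groups.items():
    print(extended_seed, positions)
```

Here the primary seed `ACGT` has three hits, which split into one group of two identical 8-base extended seeds and one unique extended seed.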
The maximum extended seed length, by default equal to the primary seed length plus 128, can be controlled with the --ht-max-ext-seed-len option. For example, for short reads, it is advisable to set the maximum extended seed shorter than the read length, because extensions longer than the whole read can never match.
It is also possible to tune how aggressively seeds are extended using the following options (advanced usage):
--ht-cost-coeff-seed-len
--ht-cost-coeff-seed-freq
--ht-cost-penalty
--ht-cost-penalty-incr
There is a tradeoff between extension length and hit frequency. Faster mapping can be achieved using longer seed extensions to reduce seed hit frequencies, or more accurate mapping can be achieved by avoiding seed extensions or keeping extensions short, while tolerating the higher hit frequencies that result. Shorter extensions can benefit mapping quality both by fitting seeds better between SNPs, and by finding more candidate mapping locations at which to score alignments. The default extension settings, along with the default seed frequency settings, lean aggressively toward mapping accuracy, with relatively short seed extensions and high hit frequencies.
The defaults for the seed frequency options are as follows:
One primary or extended seed can match multiple places in the reference genome. All such matches are populated into the hash table, and retrieved when the DRAGEN mapper looks up a corresponding seed extracted from a read. The multiple reference positions are then considered and compared to generate aligned mapper output. However, the DRAGEN software enforces a limit on the number of matches, or frequency, of each seed, which is controlled with the --ht-max-seed-freq option. By default, the frequency limit is 16. In practice, when the software encounters a seed with higher frequency, it extends it to a sufficiently long secondary seed that the frequency of any particular extended seed pattern falls within the limit. However, if a maximum seed extension would still exceed the limit, the seed is rejected, and not populated into the hash table. Instead, a single High Frequency record is populated.
This seed frequency limit does not tend to impact DRAGEN mapping quality notably, for two reasons. First, because seeds are rejected only when extension fails, only extremely high-frequency primary seeds, typically with many thousands of matches, are rejected. Such seeds are not very useful for mapping. Second, there are other seed positions to check in a given read. If another seed position is unique enough to return one or more matches, the read can still be properly mapped. However, if all seed positions were rejected as high frequency, this often means that the entire read matches similarly well in many reference positions, so even if the read were mapped it would be an arbitrary choice, with very low or zero MAPQ.
Thus, the default frequency limit of 16 for --ht-max-seed-freq works well. However, it may be decreased or increased, up to a maximum of 256. A higher frequency limit tends to marginally increase the number of reads mapped (especially for short reads), but commonly the additional mapped reads have very low or zero MAPQ. This also tends to slow down DRAGEN mapping, because correspondingly large numbers of possible mappings are occasionally considered.
In addition to a frequency limit, a target seed frequency can be specified with the --ht-target-seed-freq option. This target frequency is used when extensions are generated for high frequency primary seeds. Extension lengths are chosen with a preference toward extended seed frequencies near the target. The default of 4 for --ht-target-seed-freq means that the software is biased toward generating shorter seed extensions than necessary to map seeds uniquely.
When building a reference hash table from a fasta with ALT contigs, it may be desired to mask certain regions of high similarity, or to establish liftover relationships between primary and alternate contigs. The recommended approach is masking, as described in the Map-Align section. When hg19 or hg38 alt contigs are detected, the hash table builder requires a liftover file or a bed file to mask the alt contigs. If none is provided, a mask bed file from <INSTALL_PATH>/resources/ht_builder/ is used automatically.
DRAGEN has adopted a masked approach to handle native reference ALT contigs, where strategic regions are masked to increase accuracy. The hash table builder will build the mapper hash table as if the regions that were specified in the argument for ht-mask-bed were masked with N's. The hash table builder will only allow setting one of ht-mask-bed or ht-alt-liftover. Each line in the bed file is expected to contain a contig name, start position (0-based), and end position (1-based), separated by a single tab or space. Lines that start with # are ignored by the hash table builder to allow commenting. Any line with a contig name that is not found in the input fasta is skipped and logged to the DRAGEN log file. Likewise, lines that describe empty intervals are skipped. If all lines are skipped this way, the hash table builder will issue an error and abort, unless the mask bed file was automatically applied (see Automatic masking). The hash table builder will always issue an error and abort if an interval described in the BED file is outside of the range of the corresponding contig in the fasta. Lines that are not skipped are written to a file called mask.bed that will be present in the hash table output directory, and whose digest will appear in hash_table.cfg. This file is used when a reference is loaded to the FPGA card to dynamically mask reference.bin.
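The line-handling rules for a mask bed file can be sketched in Python as a pre-check before running the builder. This is an illustration, not the hash table builder's implementation. The function name `filter_mask_bed` is hypothetical, blank lines are skipped as an assumption, and an "empty interval" is assumed to mean an end position that is not greater than the start position.

```python
def filter_mask_bed(lines, contig_lengths, auto_applied=False):
    """Split mask bed lines into kept intervals and skipped lines.

    contig_lengths maps each contig name in the input fasta to its length.
    """
    kept, skipped = [], []
    for line in lines:
        text = line.strip()
        if not text or text.startswith("#"):
            continue  # comment lines are ignored (blank lines: assumption)
        contig, start_s, end_s = text.split()[:3]
        start, end = int(start_s), int(end_s)
        if contig not in contig_lengths:
            skipped.append(text)  # contig not found in the fasta
            continue
        if end <= start:
            skipped.append(text)  # empty interval (assumed definition)
            continue
        if start < 0 or end > contig_lengths[contig]:
            raise ValueError(f"interval outside contig range: {text}")
        kept.append((contig, start, end))
    if not kept and not auto_applied:
        raise ValueError("every mask bed line was skipped")
    return kept, skipped


lengths = {"chr1": 1000}
bed = ["# custom mask", "chr1\t10\t20", "chrUn\t0\t5", "chr1 30 30"]
kept, skipped = filter_mask_bed(bed, lengths)
print(kept)
print(skipped)
```

The kept intervals correspond to what the text describes as being written to mask.bed; the skipped lines are the ones that would be logged.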
When running from a fasta for which hg38 or hg19 is detected (see Automatic Reference Detection), and no argument for ht-mask-bed or ht-alt-liftover was provided, the hash table builder will automatically apply the corresponding bed file for the detected reference from <INSTALL_PATH>/resources/ht_builder/. Note that the hash table builder identifies alt contigs by name. When running from an input fasta that contains alt contigs with standard names but modified base content, it is therefore recommended to suppress automatic masking by setting ht-suppress-mask=true or by passing a custom mask bed file to ht-mask-bed.
The behavior of DRAGEN with respect to the handling of decoy contigs in the reference has changed since version 2.6.
Starting with DRAGEN 3.x, DRAGEN's hash table builder automatically detects the absence of the decoy contigs from the reference and adds them to the FASTA file prior to building the hash table. The decoys file is found at <INSTALL_PATH>/resources/ht_builder/hs_decoys.fa.gz. If the reference is missing the decoy contigs, then the reads which map to the decoy contigs are artificially marked as unmapped in the output BAM (because the original reference does not have the decoy contigs). This results in an artificially lower mapping rate; however, the accuracy of variant calling is improved by removing false positives caused by decoy reads.
Illumina recommends using this feature by default. However, you can set the --ht-suppress-decoys option to true to suppress adding these decoys to the hash table.
The table below describes the difference in behavior between older DRAGEN versions (2.6 and earlier) and DRAGEN 3.x versions with respect to the handling of decoy contigs in the hash table builder:
DRAGEN analysis is capable of mapping on a pangenome hash table. The pangenome hash table introduces alternate graph paths to the linear reference hash table to represent more broadly the allelic diversity of the population over the whole genome or in specific regions defined in a bed file. Accuracy gains from this methodology are described in scientific blogs available on the . Multigenome hash tables for CHM13_v2, hg38, hg19 and hs37d5 assemblies are available on the .
See for information on the multigenome mapping method.
It is possible to build a custom pangenome reference in order to:
customize the released pangenome hash table with custom bed files or hash table builder options. A set of bed files is available in the resource files on the .
generate a population-specific pangenome hash table from a pangenome msVCF generated from the BSSH app.
generate a human or non-human pangenome hash table from a customer-provided msVCF.
The input files required are a single multi-sample VCF file containing the set of population variants, and optionally bed files restricting graph to some region. The generated files, including hash_table.cmp and associated files in the specified output directory, can then be used as the reference hash table for the DRAGEN mapper. DRAGEN software supports the tool on human reference with files available on the . For non-human, the user provides the required resource files.
To enable the pangenome hash table builder, use a command of the following form:
dragen
  --build-hash-table true (required)
  --ht-graph-msvcf-file <path to a multi-sample VCF file> (required for pangenome reference)
  --ht-reference <reference.fasta> (required)
  --ht-graph-extra-kmer-bed <graph.bed> (optional)
  --ht-mask-bed <mask.bed> (optional)
  --ht-graph-exclusion-bed <exclusion bed> (optional)
  --output-directory <DIR> (required)
  [options]
The custom pangenome hash table builder tool uses a set of population variants provided by the user to generate a pangenome hash table. The variants must be specified in VCF format, in a single multi-sample VCF (msVCF) file containing the variants for a set of individuals. This multi-sample VCF file must have specific formatting described below.
The custom pangenome hash table builder tool only supports msVCF file input respecting the format described below:
msVCF compliant with 4.2 VCF format specification
with variants positionally sorted in the same contig order as the main FASTA reference genome provided in --ht-reference
records shall include diploid or haploid GT calls
supports multi-allelic variants merged in multi-line or separated in multiple lines
Note: INFO/FORMAT subfields must be defined in the header. Events with undefined subfields are ignored.
To build a high-performance custom genome it is highly recommended to use long read sequencing data. We recommend using external tools such as Whatshap (https://github.com/whatshap/whatshap) to generate phased input. DRAGEN analysis leverages the phasing information to reconstruct population haplotypes.
A reference genome in FASTA format must be provided. Reference genomes are available to download from the .
Note: the reference genome provided as input must be the same as the one used to generate the input phased msVCF. If the msVCF contains variants from regions not present in the fasta file, the pangenome reference builder will stop with an error.
This bed file is used to filter out regions of the msVCF file. Variants that fall within intervals defined in the "Graph exclusion bed" file are ignored and not used in any part of the pangenome reference builder. The result is the same as if the input msVCF did not contain any variants in the regions defined in the exclusion bed. The file is optional; by default, every variant in the msVCF file is used. Exclusion bed files are available to download from .
A custom exclusion bed file can also be provided given the following format: tab delimited with first three columns being: contig name, start position, end position. Any line with a contig name that is not found in the input FASTA is skipped. Any lines that describe empty intervals are skipped.
Note: records of the exclusion bed file provided must be from the same build as the reference genome used to build the pangenome reference.
This file is used to define regions in the genome where extra seeds will be indexed in the hash table. By default, only seeds extracted from the primary reference are saved in the reference hash table for mapping. This option additionally generates seeds from population variants in the defined regions. It is recommended to include the expected difficult regions in this bed file. Extra-kmer-bed files are available to download from for the human hg38, hg19, hs37d5, and chm13 references.
A custom Extra-kmer-bed file can also be provided given the following format: tab delimited with first three columns being: contig name, start position, end position. Any line with a contig name that is not found in the input FASTA is skipped. Any lines that describe empty intervals are skipped.
Note: records of the Extra-kmer-bed file provided must be from the same build as the reference genome used to build the graph reference.
A mask bed file must be provided in order to mask certain regions of high similarity between primary and alternate contigs present in the main genome FASTA. Mask bed files are available to download from the .
A custom mask bed file can also be provided given the following format: tab delimited with first three columns being: contig name, start position, end position. Any line with a contig name that is not found in the input FASTA is skipped. Any lines that describe empty intervals are skipped.
Note: records of the mask bed file provided must be from the same build as the reference genome used to build the graph reference.
Note: The custom graph reference hash table end to end pipeline will return an error if options --ht-alt-liftover or --ht-allow-mask-and-liftover are specified.
The hash table builder generates the following outputs:
Use the --build-hash-table option to transform a reference FASTA into the hash table for DRAGEN mapping. It takes as input a FASTA file (multiple reference sequences being concatenated) and a preexisting output directory. Build command usage is as follows:
The --ht-reference and --output-directory options are required for building a hash table. The --ht‑reference option specifies the path to the reference FASTA file, while --output-directory specifies a preexisting directory where the hash table output files are written. Illumina recommends organizing various hash table builds into different folders. As a best practice, folder names should include any nondefault parameter settings used to generate the contained hash table. The sequence names in the reference FASTA file must be unique.
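As an illustration of the folder-naming practice, the sketch below assembles a build command and a matching folder name from any nondefault settings. The helper names and the naming scheme are hypothetical. Only the option names (`--build-hash-table`, `--ht-reference`, `--output-directory`, and the seed options) come from this guide. The output directory must already exist before the command is run.

```python
def build_hash_table_command(fasta, output_dir, **options):
    """Assemble an argument list for a hash table build (not executed here)."""
    argv = ["dragen", "--build-hash-table", "true",
            "--ht-reference", fasta,
            "--output-directory", output_dir]
    for name, value in sorted(options.items()):
        # Python keyword ht_seed_len becomes the option --ht-seed-len
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


def suggested_folder_name(base, **options):
    """Encode nondefault settings in the folder name (hypothetical scheme)."""
    parts = [base]
    for name, value in sorted(options.items()):
        parts.append(f"{name.replace('_', '-')}-{value}")
    return "_".join(parts)


settings = {"ht_seed_len": 27, "ht_max_seed_freq": 32}
folder = suggested_folder_name("hg38", **settings)
command = build_hash_table_command("hg38.fa", "/refs/" + folder, **settings)
print(folder)
print(" ".join(command))
```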
While masking is the recommended approach to dealing with ALT contigs, DRAGEN also supports a liftover based method. To enable liftover based ALT-aware mapping in DRAGEN, build the hash table with a liftover file by using the --ht-alt-liftover option. The hash table builder classifies each reference sequence as primary or alternate based on the liftover file, and packs primaries before alternates in reference.bin. SAM liftover files for hg38DH and hg19 are in the <INSTALL_PATH>/resources/ht_builder folder.
Custom Liftover Files
Custom liftover files can be used in place of those provided with DRAGEN. Liftover files must be SAM format, but no SAM header is required. SEQ and QUAL fields can be omitted ('*'). Each alignment record should have an alternate haplotype reference sequence name as QNAME, indicating the RNAME and POS of its liftover alignment in a destination (normally primary assembly) reference sequence.
Reverse-complemented alignments are indicated by bit 0x10 in FLAG. Records flagged unmapped (0x4) or secondary (0x100) are ignored. The CIGAR may include hard or soft clipping, leaving parts of the ALT contig unaligned.
A single reference sequence cannot serve as both an ALT contig (appearing in QNAME) and a liftover destination (appearing in RNAME). Multiple ALT contigs can align to the same primary assembly location. Multiple alignments can also be provided for a single ALT contig (extras can optionally be flagged 0x800 supplementary), such as to align one portion forward and another portion reverse-complemented. However, each base of the ALT contig only receives one liftover image, according to the first alignment record with an M CIGAR operation covering that base.
SAM records with QNAME missing from the reference genome are ignored, so that the same liftover file may be used for various reference subsets, but an error occurs if any alignment has its QNAME present but its RNAME absent.
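The record-level rules for custom liftover files can be sketched as a small pre-check. This is not the hash table builder's parser. The function name and the returned dictionary layout are hypothetical, and the check only covers flags, sequence names, and the ALT-versus-destination rule, not CIGAR coverage.

```python
def usable_liftover_records(sam_lines, reference_names):
    """Filter liftover SAM records according to the rules described above."""
    records = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank lines and optional SAM header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 0x4 or flag & 0x100:
            continue  # unmapped or secondary records are ignored
        if qname not in reference_names:
            continue  # ALT contig absent from this reference subset
        if rname not in reference_names:
            raise ValueError(f"{qname}: destination {rname} is not in the reference")
        records.append({
            "alt": qname, "dest": rname, "pos": pos, "cigar": cigar,
            "reverse": bool(flag & 0x10),         # reverse-complemented alignment
            "supplementary": bool(flag & 0x800),  # extra alignment for the same ALT
        })
    both = {r["alt"] for r in records} & {r["dest"] for r in records}
    if both:
        raise ValueError(f"sequences used as both ALT and destination: {sorted(both)}")
    return records


names = {"chr1", "alt1", "alt2"}
sam = [
    "alt1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "alt1\t2064\tchr1\t500\t60\t50H30M\t*\t0\t0\t*\t*",  # 0x800 + 0x10
    "alt2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",                # unmapped, ignored
    "altX\t0\tchr1\t5\t60\t10M\t*\t0\t0\t*\t*",          # QNAME not in reference
]
records = usable_liftover_records(sam, names)
print(len(records), [r["reverse"] for r in records])
```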
The --ht-seed-len option specifies the initial length in nucleotides of seeds from the reference genome to populate into the hash table. At run time, the mapper extracts seeds of this same length from each read, and looks for exact matches (unless seed editing is enabled) in the hash table.
The maximum primary seed length is a function of hash table size. The limit is k=27 for table sizes from 16 GB to 64 GB, covering typical sizes for whole human genome, or k=26 for sizes from 4 GB to 16 GB.
The minimum primary seed length depends mainly on the reference genome size and complexity. It needs to be long enough to resolve most reference positions uniquely. For whole human genome references, hash table construction typically fails with k < 16. The lower bound may be smaller for shorter genomes, or higher for less complex (more repetitive) genomes. The uniqueness threshold of --ht-seed-len 16 for the 3.1Gbp human genome can be understood intuitively because log4(3.1 G) ≈ 16, so it requires at least 16 choices from 4 nucleotides to distinguish 3.1 G reference positions.
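The uniqueness threshold estimate can be reproduced with a short calculation. This is a back-of-the-envelope sketch that assumes a uniformly random genome, which real genomes are not; the function name is hypothetical.

```python
import math


def uniqueness_threshold(genome_size_bp: int) -> float:
    """Seed length k at which 4**k equals the number of reference positions."""
    return math.log(genome_size_bp, 4)


k = uniqueness_threshold(3_100_000_000)
print(f"log4(3.1 G) = {k:.2f}, rounded up to k = {math.ceil(k)}")
```

Repetitive sequence pushes the practical minimum above this idealized figure, which is consistent with the note that less complex genomes need a higher lower bound.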
For read mapping to succeed, at least one primary seed must match exactly (or with a single SNP when edited seeds are used). Shorter seeds are more likely to map successfully to the reference, because they are less likely to overlap variants or sequencing errors, and because more of them fit in each read. So for mapping accuracy, shorter seeds are mainly better.
However, very short seeds can sometimes reduce mapping accuracy. Very short seeds often map to multiple reference positions, and lead the mapper to consider more false mapping locations. Due to imperfect modeling of mutations and errors by Smith-Waterman alignment scoring and other heuristics, occasionally these noise matches may be reported. Run time quality filters such as --Aligner.aln_min_score can control the accuracy issues with very short seeds.
Shorter seeds tend to slow down mapping, because they map to more reference locations, resulting in more work such as Smith-Waterman alignments to determine the best result. This effect is most pronounced when primary seed length approaches the reference genome's uniqueness threshold, eg, K=16 for whole human genome.
Read Length---Generally, shorter seeds are appropriate for shorter reads, and longer seeds for longer reads. Within a short read, a few mismatch positions (variants or sequencing errors) can chop the read into only short segments matching the reference, so that only a short seed can fit between the differences and match the reference exactly. For example, in a 36 bp read, just one SNP in the middle can block seeds longer than 18 bp from matching the reference. By contrast, in a 250 bp read, it takes 15 SNPs to exceed a 0.01% chance of blocking even 27 bp seeds.
Paired Ends---The use of paired end reads can make longer seeds yield good mapping accuracy. DRAGEN uses paired end information to improve mapping accuracy, including with rescue scans that search the expected reference window when only one mate has seeds mapping to a given reference region. Thus, paired end reads have essentially twice the opportunity for an exact matching seed to find their correct alignments.
Variant or Error Rate---When read differences from the reference are more frequent, shorter seeds may be required to fit between the difference positions in a given read and match the reference exactly.
Mapping Percentage Requirement---If the application requires a high percentage of reads to be mapped somewhere (even at low MAPQ), short seeds may be helpful. Some reads that do not match the reference well anywhere are more likely to map using short seeds to find partial matches to the reference.
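The read length example above (one SNP in the middle of a 36 bp read) can be verified with a small helper that finds the longest mismatch-free stretch of a read. This is an illustrative sketch. The function name is hypothetical and mismatch positions are assumed to be 0-based offsets within the read.

```python
def longest_exact_seed(read_length: int, mismatch_positions) -> int:
    """Length of the longest stretch of the read containing no mismatch."""
    longest = 0
    previous = -1  # position just before the start of the read
    for pos in sorted(set(mismatch_positions)) + [read_length]:
        longest = max(longest, pos - previous - 1)
        previous = pos
    return longest


print(longest_exact_seed(36, [17]))        # one SNP near the middle of a 36 bp read
print(longest_exact_seed(250, [60, 180]))  # two SNPs in a 250 bp read
```

With a single SNP at offset 17 of a 36 bp read, the longest exact stretch is 18 bases, so no seed longer than 18 bp can match exactly.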
The --ht-max-ext-seed-len option limits the length of extended seeds populated into the hash table. Primary seeds (length specified by --ht-seed-len) that match many reference positions can be extended to achieve more unique matching, which may be required to map seeds within the maximum hit frequency (--ht-max-seed-freq).
Given a primary seed length k, the maximum seed length can be configured between k and k+128. The default is the upper bound, k+128.
The --ht-max-ext-seed-len option is recommended for short reads, eg, less than 50 bp. In such cases, it is helpful to limit seed extension to the read length minus a small margin, such as 1-4 bp. For example, with 36 bp reads, setting --ht-max-ext-seed-len to 35 might be appropriate. This ensures that the hash table builder does not plan a seed extension longer than the read causing seed extension and mapping to fail at run time, for seeds that could have fit within the read with shorter extensions.
While seed extension can be similarly limited for longer reads, eg, setting --ht-max-ext-seed-len to 99 for 100 bp reads, there is little utility in this because seeds are extended conservatively in any event. Even with the default k+128 limit, individual seeds are only extended to the lengths required to fit under the maximum hit frequency (--ht-max-seed-freq), and at most a few bases longer to approach the target hit frequency (‑‑ht‑target-seed-freq), or to avoid taking too many incremental extension steps.
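The guidance for short reads can be expressed as a small calculation. This is a sketch under stated assumptions: the function name is hypothetical, the default margin of 1 bp is picked from the 1-4 bp range mentioned above, and the result is clamped to the documented range of k to k+128.

```python
def suggested_max_ext_seed_len(read_length: int, seed_len: int = 21, margin: int = 1) -> int:
    """Candidate value for --ht-max-ext-seed-len given a read length."""
    lower, upper = seed_len, seed_len + 128   # configurable range: k to k+128
    candidate = read_length - margin          # read length minus a small margin
    return max(lower, min(candidate, upper))


for read_length in (36, 100, 250):
    print(read_length, suggested_max_ext_seed_len(read_length))
```

For 36 bp reads this gives 35, and for 100 bp reads 99, matching the examples in the text. For long reads the value saturates at the default upper bound.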
The --ht-max-seed-freq option sets a firm limit on the number of seed hits (reference genome locations) that can be populated for any primary or extended seed. If a given primary seed maps to more reference positions than this limit, it must be extended long enough that the extended seeds subdivide into smaller groups of identical seeds under the limit. If, even at the maximum extended seed length (--ht-max-ext-seed-len), a group of identical reference seeds is larger than this limit, their reference positions are not populated into the hash table. Instead, a single High Frequency record is populated.
The maximum hit frequency can be configured from 1 to 256. However, if this value is too low, hash table construction can fail because too many seed extensions are needed. The practical minimum for a whole human genome reference, other options being default, is 8.
Generally, a higher maximum hit frequency leads to more successful mapping. There are two reasons for this. First, a higher limit rejects fewer reference positions that cannot map under it. Second, a higher limit allows seed extensions to be shorter, improving the odds of exact seed matching without overlapping variants or sequencing errors.
However, as with very short seeds, allowing high hit counts can sometimes hurt mapping accuracy. Most of the seed hits in a large group are not to the true mapping location, and occasionally one of these noise hits may be reported due to imperfect scoring models. Also, the mapper limits the total number of reference positions it considers, and allowing very high hit counts can potentially crowd out the actual best match from consideration.
Higher maximum hit frequencies slow down read mapping, because seed mapping finds more reference locations, resulting in more work, such as Smith-Waterman alignments, to determine the best result.
The DRAGEN Software enables the user to build a custom pangenome hash table from a set of population variants. The population variants are specified in a single multi-sample VCF file.
--ht-graph-msvcf-file: Input file containing list of population variants, in multi-sample VCF format.
This option replaces the options previously used to build a graph reference, which are now deprecated.
Deprecated options:
--ht-pop-alt-contigs: Population based alternate contigs FASTA.
--ht-pop-alt-liftover: Liftover SAM file of population alternate contigs.
--ht-pop-snps: Population based SNPs VCF
The following options control building hash tables from references with ALT-contigs. See References with ALT contigs for more information.
--ht-mask-bed: Set a custom BED file that defines which regions to mask. If not provided, the DRAGEN software automatically applies BED files for hg38 and hg19 from <INSTALL_PATH>/resources/ht_builder.
--ht-alt-liftover: Set a liftover file to build a liftover based ALT-aware hash table. SAM liftover files for hg38DH and hg19 are provided in <INSTALL_PATH>/resources/ht_builder.
--ht-allow-mask-and-liftover
--ht-decoys The DRAGEN software automatically detects the use of hg19 and hg38 references and adds decoys to the hash table when they are not found in the FASTA file. Use the --ht-decoys option to specify the path to a decoys file. The default is <INSTALL_PATH>/resources/ht_builder/hs_decoys.fa.gz.
--ht-suppress-decoys: Suppress automatic detection of the default decoys file when building the hash table.
--ht-num-threads The --ht-num-threads option determines the maximum number of worker CPU threads that are used to speed up hash table construction. The default for this option is 8, with a maximum of 32 threads allowed. If your server supports execution of more threads, it is recommended that you use the maximum. For example, the DRAGEN servers contain 24 cores that have hyperthreading enabled, so a value of 32 should be used. When using a higher value, --ht-max-table-chunks needs to be adjusted as well. The servers have 128 GB of memory available.
--ht-max-table-chunks The --ht-max-table-chunks option controls the memory footprint during hash table construction by limiting the number of ~1 GB hash table chunks that reside in memory simultaneously. Each additional chunk consumes roughly twice its size (~2 GB) in system memory during construction. The hash table is divided into power-of-two independent chunks, of a fixed chunk size, X, which depends on the hash table size, in the range 0.5 GB < X ≤ 1 GB. For example, a 24 GB hash table contains 32 independent 0.75 GB chunks that can be constructed by parallel threads with enough memory and a 16 GB hash table contains 16 independent 1 GB chunks. The default is
--ht-mem-limit Memory Limit. The --ht-mem-limit option controls the generated hash table size by specifying the DRAGEN card memory available for both the hash table and the encoded reference genome. The ‑‑ht‑mem-limit option defaults to 32 GB when the reference genome approaches WHG size, or to a generous size for smaller references. Normally there is little reason to override these defaults.
--ht-size Hash Table Size. This option specifies the hash table size to generate, rather than calculating an appropriate table size from the reference genome size and the available memory (option --ht-mem-limit). Using default table sizing is recommended and using --ht-mem-limit
--ht-ref-seed-interval Seed Interval. The --ht-ref-seed-interval option defines the step size between positions of seeds in the reference genome populated into the hash table. An interval of 1 (default) means that every seed position is populated, 2 means 50% of positions are populated, etc. Noninteger values are supported, eg, 2.5 yields 40% populated. Seeds from a whole human reference are easily 100% populated with 32 GB memory on DRAGEN boards. If a substantially larger reference genome is used, change this option.
--ht-soft-seed-freq-cap and --ht-max-dec-factor Soft Frequency Cap and Maximum Decimation Factor for Seed Thinning. Seed thinning is an experimental technique to improve mapping performance in high-frequency regions. When primary seeds have higher frequency than the cap indicated by the --ht-soft-seed-freq-cap option
DRAGEN seed extension is dynamic, applied as needed for particular K-mers that map to too many reference locations. Seeds are incrementally extended in steps of 2-14 bases (always even) from a primary seed length to a fully extended length. The bases are appended symmetrically in each extension step, determining the next extension increment if any.
There is a potentially complex seed extension tree associated with each high frequency primary seed. Each full tree is generated during hash table construction and a path from the root is traced by iterative extension steps during seed mapping. The hash table builder employs a dynamic programming algorithm to search the space of all possible seed extension trees for an optimal one, using a cost function that balances mapping accuracy and speed. The following options define that cost function:
--ht-target-seed-freq Target Hit Frequency. The --ht-target-seed-freq option defines the ideal number of hits per seed for which seed extension should aim. Higher values lead to fewer and shorter final seed extensions, because shorter seeds tend to match more reference positions.
--ht-cost-coeff-seed-len Cost Coefficient for Seed Length. The --ht-cost-coeff-seed-len option assigns the cost component for each base by which a seed is extended. Additional bases are considered a cost because longer seeds risk overlapping variants or sequencing errors and losing their correct mappings. Higher values lead to shorter final seed extensions.
When building a hash table, DRAGEN configures the options for DNA analysis by default. To run RNA-Seq data, you must build an RNA-Seq hash table by setting --ht-build-rna-hashtable to true. If running RNA-Seq alignment, use the original --output-directory instead of the automatically generated subdirectory.
If using the CNV pipeline, set --ht-build-cnv-hashtable to true. The command generates an additional Kmer hash map that is used in the CNV algorithm. Illumina recommends always using the --ht-build-cnv-hashtable option, so you can perform CNV calling with the same hash table used for mapping and aligning.
To run the methylation pipeline, you must build a methylation-specific hash table. DRAGEN can build a single-pass or legacy multi-pass methylation hash table. Methylation runs using a single-pass hash table are completed faster than the legacy multipass hash tables. Single-pass hash tables are recommended for building methylation tables and running analyses.
The following is an example of a single-pass hash table build. The example generates a combined hash table in your reference index folder under the methyl_converted subdirectory.
dragen --build-hash-table true \
  --output-directory $REFDIR \
  --ht-reference $FASTA \
  --ht-num-threads 40 \
  --ht-methylated-combined=true \
  --ht-seed-len 27
Multi-pass methylation mapping requires building two special hash tables with reference bases converted from C to T in one table and G to A in the other table. The conversions are performed automatically when using the --ht-methylated command line option. The converted hash tables are generated in two subdirectories under the folder specified using the --output-directory command line option. The subdirectories are named CT_converted and GA_converted, corresponding with the base conversions. When using the hash tables for methylated alignment runs, make sure to refer to the --output-directory folder, not the subdirectories.
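The two base conversions can be illustrated on a short sequence. This sketch only shows the string transformation; it is not how DRAGEN builds the converted tables, the function name is hypothetical, and handling of lowercase (soft-masked) bases is an assumption.

```python
# Translation tables for the two conversions; lowercase handling is an assumption.
C_TO_T = str.maketrans("Cc", "Tt")
G_TO_A = str.maketrans("Gg", "Aa")


def convert_reference(sequence: str):
    """Return the (C-to-T, G-to-A) converted copies of a reference sequence."""
    return sequence.translate(C_TO_T), sequence.translate(G_TO_A)


ct_converted, ga_converted = convert_reference("ACGTCCGG")
print(ct_converted)
print(ga_converted)
```

Each converted copy uses only three distinct bases, which is why the conversions remove information and seeds become less unique.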
The base conversions remove a significant amount of information from the hash tables. You might need to use different hash table parameters than in a conventional hash table build. The following options are recommended for building hash tables for mammalian species.
dragen --build-hash-table=true --output-directory $REFDIR --ht-reference $FASTA --ht-max-seed-freq 16 --ht-seed-len 27 --ht-num-threads 40 --ht-methylated=true
To run the HLA caller, you must build an HLA-specific anchored reference hash table by setting --ht-build-hla-hashtable to true. The command creates an anchored_hla subdirectory inside the --output-directory. The HLA-specific reference subdirectory can be built at the same time as the primary reference.
An HLA resource file is packaged with DRAGEN and located at the following path after installation: <INSTALL_PATH>/resources/hla/HLA_resource.v1.fasta.gz. This file is used by default when building the HLA-specific anchored hash table. To use a custom file, specify it with --ht-hla-reference. See the HLA section for more information.
The multi-sample VCF (msVCF) is expected to declare the following FILTER code. Non-PASS records are ignored:
##FILTER=<ID=PASS,Description="All filters passed">
It is also expected to declare the following FORMAT field:
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
For better results, left-align the variants.
The recommended maximum number of samples in the msVCF is 256. A higher number can lead to very high memory usage during hash table creation.
| Option | Required | Description |
| --- | --- | --- |
| --ht-mask-bed | No (but recommended) | Path to the mask bed file |
| --ht-graph-exclusion-bed | No | Path to the exclusion bed file |
| --output-directory | Yes | Specify the directory where all related hash table files will be written |

--ht-suppress-mask: Suppress automatic detection of the default mask bed files when building the hash table.
--ht-rand-hit-hifreq and --ht-rand-hit-extend Random Sample Hit with HIFREQ Record and EXTEND Record. Whenever a HIFREQ or EXTEND record is populated into the hash table, it stands in place of a large set of reference hits for a certain seed. Optionally, the hash table builder can choose a random representative of that set, and populate that HIT record alongside the HIFREQ or EXTEND record. Random sample hits provide alternative alignments that are very useful in estimating MAPQ accurately for the alignments that are reported. They are never used outside of this context for reporting alignment positions, because that would result in biased coverage of locations that happened to be selected during hash table construction. To include a sample hit, set --ht-rand-hit-hifreq to 1. The --ht-rand-hit-extend option is a minimum pre-extension hit count to include a sample hit, or zero to disable. Modifying these options is not recommended.
--ht-cost-coeff-seed-freq Cost Coefficient for Hit Frequency. The --ht-cost-coeff-seed-freq option assigns the cost component for the difference between the target hit frequency and the number of hits populated for a single seed. Higher values result primarily in high-frequency seeds being extended further to bring their frequencies down toward the target.
--ht-cost-penalty Cost Penalty for Seed Extension. The --ht-cost-penalty option assigns a flat cost for extending beyond the primary seed length. A higher value results in fewer seeds being extended at all. Current testing shows that zero (0) is appropriate for this parameter.
--ht-cost-penalty-incr Cost Increment for Extension Step. The --ht-cost-penalty-incr option assigns a recurring cost for each incremental seed extension step taken from primary to final extended seed length. More steps are considered a higher cost because extending in many small steps requires more hash table space for intermediate EXTEND records, and takes substantially more run time to execute the extensions. A higher value results in seed extension trees with fewer nodes, reaching from the root primary seed length to leaf extended seed lengths in fewer, larger steps.
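| Reference | Contig names |
| --- | --- |
| hg38, hg19, chm13v2 | chr1-chr22, chrX, chrY |
| hs37d5 | 1-22, X, Y |

| Value for --ht-seed-len | Read Length |
| --- | --- |
| 21 | 100 bp to 150 bp |
| 17 to 19 | shorter reads (36 bp) |
| 27 | 250+ bp |

| Option | Value |
| --- | --- |
| --ht-cost-coeff-seed-len | 1 |
| --ht-cost-coeff-seed-freq | 0.5 |
| --ht-cost-penalty | 0 |
| --ht-cost-penalty-incr | 0.7 |
| --ht-max-seed-freq | 16 |
| --ht-target-seed-freq | 4 |
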
| Reference | Without automatic decoy detection | With automatic decoy detection |
| --- | --- | --- |
| Reference does not include the decoy contigs (for example, hg19) | Decoy reads mismap elsewhere in the genome due to the lack of contigs in the reference. Artificially higher mapping rate. False positive calls in noisy regions to which the decoy reads are mismapped. | DRAGEN automatically detects the absence of the decoy contigs from the reference and adds them to the FASTA file. Artificially lower mapping rate, because reads that map to the decoy contigs are marked as unmapped in the output BAM (the original reference does not have the decoy contigs). False positive calls are avoided because the decoy contigs are added under the hood, which helps variant calling. |
| Reference includes the decoy contigs (for example, hs37d5) | Decoy reads map to the decoy contigs. High mapping rate. No false positive calls caused by decoy reads, because decoy reads map to the right place. | Decoy reads map to the decoy contigs. High mapping rate. No false positive calls caused by decoy reads, because decoy reads map to the right place. |

| Option | Required | Description |
| --- | --- | --- |
| --build-hash-table | Yes | Set to true |
| --ht-graph-msvcf-file | Yes | Path to the multi-sample VCF file containing population variants |
| --ht-reference | Yes | Path to the reference genome FASTA file |
| --ht-graph-extra-kmer-bed | No | Path to the extra kmer bed file |

| File | Description |
| --- | --- |
| reference.bin | The reference sequences, encoded in 4 bits per base, so the size in bytes is roughly half the reference genome size. Between reference sequences, N bases are trimmed and padding is automatically inserted. For example, hg19 has 3,137,161,264 bases in 93 sequences. This is encoded in 1,526,285,312 bytes = 1.42 GB, where 1 GB means 1 GiB or 2^30^ bytes. |
| hash_table.cmp | Compressed hash table. The hash table is decompressed and used by the DRAGEN mapper to look up primary seeds with the length specified by the --ht-seed-len option and extended seeds of various lengths. |
| hash_table.cfg | A list of parameters and attributes for the generated hash table, in a text format. This file provides key information about the reference genome and hash table. |
| hash_table.cfg.bin | A binary version of hash_table.cfg used to configure the DRAGEN hardware. |
| hash_table_stats.txt | A text file listing extensive internal statistics on the constructed hash table, including the hash table occupancy percentages. This file is for information purposes only. It is not used by other tools. |
| mask.bed | Present only for masked hash tables. A tab-delimited bed file that describes the masked regions. Contains all lines from the input bed file except comment lines, lines that describe empty intervals, and lines with contig names that were not found in the input FASTA. |

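As a rough check of the reference.bin size described above: at 4 bits per base, two bases fit in one byte, so the file can be no larger than about half the base count before N trimming and padding. The sketch below uses only the hg19 figures quoted above. The helper name to_gib is illustrative and not part of DRAGEN.

```shell
# Convert a byte count to GiB (2^30 bytes), rounded to two decimals.
to_gib() {
    awk -v bytes="$1" 'BEGIN { printf "%.2f\n", bytes / (1024 * 1024 * 1024) }'
}

bases=3137161264                # hg19 base count, from the text above
encoded=1526285312              # reference.bin size in bytes, from the text above
upper_bound=$(( bases / 2 ))    # 4 bits per base = 2 bases per byte

echo "upper bound before trimming: $upper_bound bytes ($(to_gib "$upper_bound") GiB)"
echo "encoded size after trimming: $encoded bytes ($(to_gib "$encoded") GiB)"
```

The difference between the two figures is the space saved by trimming N bases, minus the padding inserted between sequences.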
| Option | Required | Description |
| --- | --- | --- |
| --build-hash-table | Yes | Set to true |
| --ht-reference | Yes | Path to the reference genome FASTA file |
| --ht-mask-bed | No (but recommended) | Path to the mask bed file. If not provided, the DRAGEN software automatically applies BED files for hg38 and hg19 from <INSTALL_PATH>/resources/ht_builder. |
| --output-directory | Yes | Specify the directory where all related hash table files will be written |

| Methylation hash table | Options |
| --- | --- |
| single-pass | --ht-methylated-combined=true --ht-seed-len 27 |
| multi-pass | --ht-methylated=true --ht-seed-len 27 --ht-max-seed-freq 16 |

dragen --build-hash-table true [options] --ht-reference <reference.fasta> --output-directory <outdir>
You use the DRAGEN host software program dragen to build and load reference genomes, and then to analyze sequencing data by decompressing the data, mapping, aligning, sorting, duplicate marking with optional removal, and variant calling.
Invoke the software using the dragen command. The command line options are described in the following sections.
Command line options can also be set in a configuration file. For more information on configuration files, see Configuration Files. If an option is set in the configuration file and is also specified on the command line, the command line option overrides the configuration file.
The following are examples of frequently used command lines:
Build Reference/Hash Table
Run Map/Align and Variant Caller (*.fastq to *.vcf)
Run Map/Align (*.fastq to *.bam)
Run Variant Caller Only (*.bam to *.vcf)
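The four cases listed above can be sketched with the options described in this section. All paths, the output prefix, and the read group values are placeholders. The --RGID and --RGSM read group options are not described in this section and are included on the assumption that FASTQ input given with -1 and -2 needs them. Each function only prints its command, so the sketch can be inspected on a machine without DRAGEN. Treat these as illustrative starting points, not as recommended command lines.

```shell
# Placeholder values (assumptions; adjust to your system).
REF_DIR=/staging/ref/hg38
FASTA=/staging/ref/hg38.fa
OUT_DIR=/staging/out
PREFIX=sample1
RGID=rg1
RGSM=sample1

# Build reference/hash table.
build_ht_cmd() {
    echo dragen --build-hash-table true --ht-reference "$FASTA" \
        --output-directory "$REF_DIR"
}

# Map/align and variant caller (*.fastq to *.vcf).
fastq_to_vcf_cmd() {
    echo dragen -r "$REF_DIR" -1 "$1" -2 "$2" --RGID "$RGID" --RGSM "$RGSM" \
        --enable-variant-caller true \
        --output-directory "$OUT_DIR" --output-file-prefix "$PREFIX"
}

# Map/align only (*.fastq to *.bam); map/align is enabled by default.
fastq_to_bam_cmd() {
    echo dragen -r "$REF_DIR" -1 "$1" -2 "$2" --RGID "$RGID" --RGSM "$RGSM" \
        --output-directory "$OUT_DIR" --output-file-prefix "$PREFIX"
}

# Variant caller only (*.bam to *.vcf), reusing the existing alignments.
bam_to_vcf_cmd() {
    echo dragen -r "$REF_DIR" -b "$1" --enable-map-align false \
        --enable-variant-caller true \
        --output-directory "$OUT_DIR" --output-file-prefix "$PREFIX"
}

build_ht_cmd
fastq_to_vcf_cmd reads_R1.fastq.gz reads_R2.fastq.gz
fastq_to_bam_cmd reads_R1.fastq.gz reads_R2.fastq.gz
bam_to_vcf_cmd aligned.bam
```

To execute a printed command on a DRAGEN server, remove the leading echo from the corresponding function.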
For recommended command lines in typical use cases, see .
Before you can use the DRAGEN system for aligning reads, you must load a reference genome and its associated hash tables onto the PCIe card. For information on preprocessing a reference genome's FASTA files into the native DRAGEN binary reference and hash table formats, see . You must also specify the directory containing the preprocessed binary reference and hash tables with the -r [or --ref-dir] option. This argument is always required.
Use the following command to load the reference genome and hash tables to DRAGEN card memory separately from processing reads.
dragen -r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149
Use the -l (--force-load-reference) option to force the reference genome to load even if it is already loaded.
dragen -l -r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149
The time needed to load the reference genome depends on the size of the reference, but for typical recommended settings, it takes approximately 30 to 60 seconds.
DRAGEN has two primary modes of operation, as follows:
Mapper/aligner
Variant caller
DRAGEN is capable of performing each mode independently or as an end-to-end solution. DRAGEN also allows you to enable and disable decompression, sorting, duplicate marking, and compression along the DRAGEN pipeline.
Full pipeline mode: To execute full pipeline mode, set --enable-variant-caller to true and provide input as unmapped reads in *.fastq, *.bam, or *.cram formats. DRAGEN performs decompression, mapping, aligning, sorting, and optional duplicate marking and feeds directly into the variant caller to produce a VCF file. In this mode, DRAGEN uses parallel stages throughout the pipeline to drastically reduce the overall run time.
Map/align mode: Map/align mode is enabled by default. Input is unmapped reads in *.fastq, *.bam, or *.cram format. DRAGEN produces an aligned and sorted BAM or CRAM file. To mark duplicate reads at the same time, set --enable-duplicate-marking to true.
The following command line options for output are mandatory:
--output-directory <out_dir>: Specifies the output directory for generated files.
--output-file-prefix <out_prefix>: Specifies the output file prefix. DRAGEN appends the appropriate file extension onto this prefix for each generated file.
-r [--ref-dir]: Specifies the reference hash table.
The following examples do not include these mandatory options.
For mapping and aligning, the output is sorted and compressed into BAM format by default before saving to disk. You can control the output format from the map/align stage with the --output-format <SAM|BAM|CRAM> option. If the output file exists, the software issues a warning and exits. To force overwrite if the output file already exists, use the -f [ --force ] option.
For example, each of the following commands outputs to a compressed BAM file and forces overwrite:
dragen ... -f
dragen ... -f --output-format bam
To generate a BAI-format BAM index file (*.bai), set --enable-bam-indexing to true.
The following example outputs to a SAM file and forces overwrite:
dragen ... -f --output-format sam
The following example outputs to a CRAM file and forces overwrite:
dragen ... -f --output-format cram
DRAGEN only outputs lossless CRAM files. All QNAMEs and BAM tags are preserved in the CRAM.
DRAGEN can generate mismatch difference (MD) tags, as described in the BAM standard. The feature is turned off by default because there is a small performance cost to generate these strings. To generate MD tags, set --generate-md-tags to true.
DRAGEN can also annotate additional information about alignments in a ZS:Z tag. The following are valid tag values:
By default, DRAGEN writes a ZS:Z:PAI tag in the output BAM for alignments that map completely inside insertions encoded in population based alternate contigs. To write ZS:Z alignment status tags for all other types described above, set --generate-zs-tags to true (false by default). These tags are only generated in the primary alignment and when a read has suboptimal alignments qualifying for secondary output (even if none were output because --Aligner.sec-aligns was set to 0).
To generate SA:Z tags, set --generate-sa-tags to true (the default). These tags provide alignment information (position, cigar, orientation) of groups of supplementary alignments, which are useful in structural variant calling.
To generate pair score in a ps:i tag, set --generate-ps-tags to true (false by default for DNA, true for RNA). The pair score is used in DRAGEN for computing MAPQ and can be used to check how well alignment candidate pairs score against each other.
DRAGEN can also output mate alignment tags. To generate the mate CIGAR (in the MC:Z tag), set --generate-mc-tags to true (the default). To generate the mate mapping quality (in the MQ:i tag), set --generate-mq-tags to true (the default). To generate the mate sequence (in the R2:Z tag) and mate base qualities (in the Q2:Z tag), set --generate-r2-tags and --generate-q2-tags to true, respectively (both are false by default). When enabled, R2:Z and Q2:Z tags are emitted only for improperly paired read alignments with a fragment length of at least 1000 bp. The methylation pipelines currently do not support the output of mate alignment tags.
DRAGEN also outputs a graph alignment tag (ga:Z) when applicable. The tag is controlled by --generate-ga-tags (true by default for DNA, false for RNA). This tag describes the best alt contig alignment that improved the score of a primary-contig alignment at its liftover position. It can also describe read alignments to alt contigs for which there is no liftover and the primary alignment is unmapped. For example, this applies when the read maps best to an alt contig describing a novel long insertion that is not present in the reference. In addition, read alignments that have been marked as unmapped because they map to auto-detected decoy contigs not present in the original user-provided FASTA also have their alignments described in the ga tag.
The ga tag uses the same format as the SA tag used to describe supplementary alignments.
When CRAM is selected as output, DRAGEN generates a CRAM file with the following features:
CRAM format V3.0 is produced by default. V3.1 can be enabled with the option --cram-version 3.1.
The CRAM is lossless. Lossy compression is never used and cannot be enabled.
Quality score compression is lossless. Read names are preserved.
Only the GZIP compression algorithm is used, for maximum compatibility (bgzip and lzma are not used), except for quality scores, which use rANS.
The following default settings are used for the CRAM output:
DRAGEN can process reads in FASTQ format or BAM/CRAM format. DRAGEN supports the following compression options for FASTQ input files.
Uncompressed
gzip or bgzip compression
ORA compression. To use ORA compression, you must provide an ORA reference and reference directory. See ORA Compression and Decompression.
If your input FASTQ files are gzipped, DRAGEN automatically decompresses the files using hardware-accelerated decompression, and then streams the reads into the mapper. If your files end in *.ora, DRAGEN automatically decompresses the files using ORA decompression, and then streams the reads into the mapper. The same FASTQ command-line options apply for all compression formats.
FASTQ input files can be single-ended or paired-end, as shown in the following examples.
Single-ended in one FASTQ file (-1 option)
Paired-end in two matched FASTQ files (-1 and -2 options)
Paired-end in a single interleaved FASTQ file (--interleaved (-i) option)
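The three FASTQ input layouts listed above differ only in their input options. The sketch below prints one command per layout, with "..." standing for the reference, output, and other options, as in the output format examples earlier in this section. The file names are placeholders, and the =true form for --interleaved is an assumption modeled on other boolean options in this guide (such as --pair-by-name=true).

```shell
# Single-ended reads in one FASTQ file.
fastq_single_cmd() {
    echo "dragen ... -1 $1"
}

# Paired-end reads in two matched FASTQ files.
fastq_paired_cmd() {
    echo "dragen ... -1 $1 -2 $2"
}

# Paired-end reads in a single interleaved FASTQ file.
# Assumption: --interleaved accepts an explicit boolean value.
fastq_interleaved_cmd() {
    echo "dragen ... -1 $1 --interleaved=true"
}

fastq_single_cmd reads.fastq.gz
fastq_paired_cmd reads_R1.fastq.gz reads_R2.fastq.gz
fastq_interleaved_cmd reads_interleaved.fastq.gz
```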
Both bcl2fastq and the DRAGEN BCL command use a common file naming convention, as follows:
<SampleID>_S<#>_<Lane>_<Read>_<segment#>.fastq.gz
Older versions of bcl2fastq and DRAGEN could segment FASTQ samples into multiple files to limit file size or to decrease the time to generate them.
For Example:
These files do not need to be concatenated to be processed together by DRAGEN. To map/align any sample, provide the first file in the series (-1 <FileName>_001.fastq). DRAGEN reads all segment files in the sample consecutively for both of the FASTQ file sequences specified using the -1 and -2 options for paired-end input, including compressed fastq.gz files. To turn this behavior off, set --enable-auto-multifile to false on the command line.
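To make the naming convention above concrete, the following sketch splits a file name into its fields using shell parameter expansion. The file name is made up for illustration, and the variable names are not DRAGEN terms.

```shell
# Hypothetical name following <SampleID>_S<#>_<Lane>_<Read>_<segment#>.fastq.gz
file=NA12878_S1_L001_R1_001.fastq.gz

base=${file%.fastq.gz}    # drop the extension
segment=${base##*_}       # last field: segment number
rest=${base%_*}
read_num=${rest##*_}      # read (R1 or R2)
rest=${rest%_*}
lane=${rest##*_}          # lane
rest=${rest%_*}
sample_num=${rest##*_}    # S<#>
sample_id=${rest%_*}      # everything before S<#>; may itself contain underscores

echo "sample=$sample_id number=$sample_num lane=$lane read=$read_num segment=$segment"
```

Parsing from the right keeps the sketch working when the sample ID itself contains underscores.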
DRAGEN can also optionally read multiple files by the sample name given in the file name, which can be used to combine samples that have been distributed across multiple BCL lanes or flow cells. To enable this feature, set the --combine-samples-by-name option to true.
If the FASTQ files specified on the command-line use the Casava 1.8 file naming convention shown above and additional files in the same directory share that sample name, those files and all their segments are processed automatically. Note that sample name, read number, and file extension must match. Index barcode and lane number can differ.
To avoid impacting system performance, input files must be located on a fast file system.
To process multiple FASTQ input files as one sample, it is recommended that you use the --fastq-list <csv file name> option to specify the name of a CSV file containing the list of FASTQ files, instead of using the --combine-samples-by-name option.
For example:
Using a CSV file avoids having to concatenate the FASTQ files, for cases where there are multiple FASTQ files for a sample such as top-up scenarios or where FASTQ files are split across lanes. It also allows you to name the FASTQ input files, input from multiple subdirectories, and add BAM tags specified explicitly for each read group. DRAGEN automatically generates a CSV file of the correct format during BCL conversion to FASTQ. The CSV file is named fastq_list.csv and contains an entry for each FASTQ file or paired-end file pair produced during the run.
FASTQ CSV File Format
The first line of the CSV file specifies the title of each column, and is followed by one or more data lines. All lines in the CSV file must contain the same number of comma-separated values and should not contain white space or other extraneous characters.
Column titles are case-sensitive. The following column titles are required:
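RGID: Read group
RGSM: Sample ID
RGLB: Library
Lane: Flow cell lane
Read1File: Full path to a valid FASTQ input file
Read2File: Full path to a valid FASTQ input file (required for paired-end input; leave empty otherwise)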
Each FASTQ file referenced in the CSV list can be referenced only once. All values in the Read2File column must be either nonempty and reference valid files, or they must all be empty.
When generating a BAM file using fastq-list input, one read group is generated per unique RGID value. The BAM header contains the following RG tags for each read group:
ID (from RGID)
SM (from RGSM)
LB (from RGLB)
You can specify additional tags for each read group by adding a column title. The column title must be only four upper-case characters and begin with RG. For example, to add a PU (platform unit) tag, add a column named RGPU and specify the value for each read group in this column. All column titles must be unique.
A fastq-list file can contain files for more than one sample. If a fastq-list file contains only one unique RGSM entry, then no additional options need to be specified, and DRAGEN processes all files listed in the fastq-list file. If there is more than one unique RGSM entry in a fastq-list file, --fastq-list-sample-id <SampleID> must be used in addition to --fastq-list <filename> to process only a specific sample from the CSV file. Only the entries in the fastq-list file with an RGSM value that match the specified SampleID are processed.
Independent processing and output for multiple individual samples in one run is not supported.
To process all listed files together as one sample, regardless of the RGSM value, the option --fastq-list-all-samples=true can be used instead of --fastq-list-sample-id.
Note
For a single run, only one BAM and VCF output file are produced because all input read groups are expected to belong to the same sample. To process multiple samples independently from one BCL conversion run, DRAGEN must be run multiple times using different values for the `--fastq-list-sample-id` option.
There is no option to specify groupings or subsets of RGSM values for more complex filtering, but the fastq-list file can be modified to achieve the same effect.
The following is an example FASTQ list CSV file with the required columns:
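A minimal sketch of such a file and two checks on it follows. All IDs and paths are placeholders. Read1File is used here as the counterpart of the Read2File column, and the RGSM check assumes that RGSM is the second column, as it is in this sample file.

```shell
workdir=$(mktemp -d)
csv="$workdir/fastq_list.csv"

# Two lanes of sampleA and one lane of sampleB (placeholder values).
cat > "$csv" <<'EOF'
RGID,RGSM,RGLB,Lane,Read1File,Read2File
rg1,sampleA,lib1,1,/data/sampleA_S1_L001_R1_001.fastq.gz,/data/sampleA_S1_L001_R2_001.fastq.gz
rg2,sampleA,lib1,2,/data/sampleA_S1_L002_R1_001.fastq.gz,/data/sampleA_S1_L002_R2_001.fastq.gz
rg3,sampleB,lib2,1,/data/sampleB_S2_L001_R1_001.fastq.gz,/data/sampleB_S2_L001_R2_001.fastq.gz
EOF

# Every line must have the same number of comma-separated values.
field_counts=$(awk -F, '{ print NF }' "$csv" | sort -u)
echo "distinct field counts: $field_counts"

# More than one unique RGSM means a sample must be selected.
samples=$(tail -n +2 "$csv" | cut -d, -f2 | sort -u)
echo "samples: $(echo "$samples" | tr '\n' ' ')"

# Print (not run) a command that processes one sample from the list.
fastq_list_cmd() {
    echo "dragen ... --fastq-list $1 --fastq-list-sample-id $2"
}
fastq_list_cmd "$csv" sampleA
```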
If you use the --tumor-fastq-list option for somatic input, use the --tumor-fastq-list-sample-id <SampleID> option to specify the sample ID for the corresponding FASTQ list, as shown in the following example:
Tumor-Normal Pairs Input
If using fastq_lists or tumor_fastq_lists comprising multiple samples (RGSMs) in somatic mode, you can use a loop to iterate through the two lists to create tumor-normal pairs for testing. Create a *.txt file with the RGSM of each normal sample to be tested (one per line), and then create a separate *.txt file with the RGSM of the tumor samples to be tested. Make sure that the tumor sample RGSMs are listed in the same order as the corresponding normal samples, and include a blank line after the last sample.
You can use the following example script to perform testing in somatic mode. Each iteration takes one entry from the tumor samples list and one entry from the normal samples list (from top to bottom) to create a tumor-normal pair as input for the DRAGEN run.
The following are examples of the FASTQ lists and samples lists used as input for the script.
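One way to write such a loop is sketched below. It is not the original script. The sample names, list paths, and output layout are placeholders. The loop prints one DRAGEN command per tumor-normal pair instead of running it, so that the pairing can be checked first.

```shell
workdir=$(mktemp -d)

# Sample lists: one RGSM per line, tumor and normal in matching order.
printf 'normal_A\nnormal_B\n' > "$workdir/normal_samples.txt"
printf 'tumor_A\ntumor_B\n'   > "$workdir/tumor_samples.txt"

# Placeholder paths (assumptions).
REF_DIR=/staging/ref/hg38
NORMAL_LIST=/data/fastq_list.csv
TUMOR_LIST=/data/tumor_fastq_list.csv
OUT_DIR=/staging/out

tab=$(printf '\t')
# paste joins line N of both files, so each iteration sees one pair.
paste "$workdir/normal_samples.txt" "$workdir/tumor_samples.txt" |
while IFS=$tab read -r normal tumor; do
    # Skip blank or unpaired lines.
    if [ -z "$normal" ] || [ -z "$tumor" ]; then continue; fi
    echo dragen -r "$REF_DIR" \
        --tumor-fastq-list "$TUMOR_LIST" --tumor-fastq-list-sample-id "$tumor" \
        --fastq-list "$NORMAL_LIST" --fastq-list-sample-id "$normal" \
        --enable-variant-caller true \
        --output-directory "$OUT_DIR/${tumor}_vs_${normal}" \
        --output-file-prefix "${tumor}_vs_${normal}"
done > "$workdir/commands.txt"

cat "$workdir/commands.txt"
```

To run the pairs for real, remove the echo and create each output directory before invoking dragen.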
You can use the same options as the other FASTQ input file types for ORA files. To use the ORA file, replace the FASTQ file name with the ORA file name and specify the ORA reference directory using --ora-reference.
See ORA Compression and Decompression for more information on ORA reference files.
The following command represents paired-end in two matched ORA FASTQ files (-1 and -2 options).
BAM files can be used as input to the mapper/aligner. By default, --enable-map-align is true. When a BAM file input is provided with map/align enabled, DRAGEN ignores any alignment or duplicate marking information contained in the input file, re-maps the reads, and feeds the new alignments downstream to the variant callers. Any existing flags in the input BAM are erased when reads are re-mapped. BAM re-mapping is supported for multiple BAM inputs at a time, such as in paired tumor-normal input to somatic variant calling. To output the re-mapped BAM files, set --enable-map-align-output=true.
Alternatively, existing alignments in the BAM file can be used as input to the variant callers by setting the --enable-map-align option to false.
If the input file contains paired-end reads, it is important to specify that the input data should be sorted so that pairs can be processed together. Other pipelines would require you to re-sort the input data set by read name. DRAGEN vastly increases the speed of this operation by pairing the input reads, and sending them on to the mapper/aligner when pairs are identified. Use the --pair-by-name option to enable or disable this feature (the default is true).
Specify single-ended input in one BAM file with the -b and --pair-by-name=false options, as follows:
Specify paired-end input in one BAM file with the -b and --pair-by-name=true options, as follows:
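The two BAM input cases can be sketched as follows, with "..." standing for the reference and output options and a placeholder BAM path. The functions print the commands rather than running them.

```shell
# Single-ended reads in one BAM file.
bam_single_end_cmd() {
    echo "dragen ... -b $1 --pair-by-name=false"
}

# Paired-end reads in one BAM file; DRAGEN pairs reads by name.
bam_paired_end_cmd() {
    echo "dragen ... -b $1 --pair-by-name=true"
}

bam_single_end_cmd /data/sample.bam
bam_paired_end_cmd /data/sample.bam
```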
You can use CRAM files as input to the DRAGEN mapper/aligner and variant caller. The DRAGEN functionality available when using CRAM input is the same as when using BAM input. Supported CRAM input file formats are v3.0 and v3.1.
By default, the CRAM compressor and decompressor use the DRAGEN reference specified with the --ref-dir option. CRAM compression is reference based, and the reference used for compression is not part of the CRAM file. Therefore, the CRAM input file must have been created with the same reference as the one provided to DRAGEN for the analysis.
DRAGEN can re-align, in one step, a CRAM input that was created with a different reference. Doing so requires the --cram-reference option, which makes the CRAM decompressor use the specified reference.
--cram-reference can be either a fasta file, or a DRAGEN hash table folder.
If pointing to a fasta file, the fasta .fai index file must be present next to the fasta file
CRAM output will always be compressed using the --ref-dir reference
Example: CRAM was created with hg19, re-analysis with hg38
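For this hg19-to-hg38 case, a command along the following lines combines the options described above. The paths and prefix are placeholders, has_fai is an illustrative helper, and the function prints the command instead of running it.

```shell
# Placeholder paths (assumptions).
HG38_REF_DIR=/staging/ref/hg38          # hash table for the new analysis
HG19_FASTA=/staging/ref/hg19/hg19.fa    # reference the CRAM was created with
OUT_DIR=/staging/out

# A FASTA given to --cram-reference needs its .fai index next to it.
has_fai() {
    [ -f "$1.fai" ]
}

# Decompress with hg19, then map/align against hg38.
cram_realign_cmd() {
    echo dragen -r "$HG38_REF_DIR" --cram-input "$1" \
        --cram-reference "$HG19_FASTA" --pair-by-name=true \
        --output-directory "$OUT_DIR" --output-file-prefix sample_hg38
}

has_fai "$HG19_FASTA" || echo "warning: $HG19_FASTA.fai not found" >&2
cram_realign_cmd /data/sample.cram
```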
The following options are used for providing a CRAM input to either mapper/aligner or variant caller:
--cram-input: The name and path of the CRAM file. One usage example is paired-end input in a single CRAM file. In that case, also set the --pair-by-name option to true.
Multiple BAM or CRAM Input Files
To provide multiple BAM input files, you can use the --bam-list <csv file name> option to specify the name of a CSV file containing the list of BAM files. For example:
To provide multiple CRAM input files, you can use the --cram-list <csv file name> option.
BAM or CRAM CSV Input File Format
The first line of the CSV file specifies the header containing the title for each column and each subsequent line is a data line. All lines in the CSV file must contain the same number of comma-separated values and should not contain white space or any other extraneous characters.
An example BAM CSV file:
Column titles are case sensitive. The following column titles are required:
BamFile: Path to the BAM file
Only the BamFile column is supported at this time. Extra fields can be specified in the CSV file, but DRAGEN does not process them.
CRAM CSV input follows the same format above, with "CramFile" as the column title instead.
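A minimal sketch of a BAM list and its use follows. The BAM paths are placeholders, and the command is printed rather than run, with "..." standing for the reference and output options.

```shell
workdir=$(mktemp -d)
bam_list="$workdir/bam_list.csv"

# Header line with the single supported column, then one BAM per line.
cat > "$bam_list" <<'EOF'
BamFile
/data/sample_lane1.bam
/data/sample_lane2.bam
EOF

n_bams=$(tail -n +2 "$bam_list" | grep -c .)
echo "BAM files listed: $n_bams"

bam_list_cmd() {
    echo "dragen ... --bam-list $1"
}
bam_list_cmd "$bam_list"
```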
Restrictions and Limitations:
DRAGEN bam-list and cram-list are intended to mirror manually merging BAM or CRAM files via a utility such as samtools or MergeSamFiles (Picard). As a result, using bam-list or cram-list is analogous to having a single merged BAM or CRAM input file. Some callers (for example, DRAGEN variant calling) are unable to process a bam-list or cram-list that is composed of input files containing multiple samples.
In the case where identical read group IDs appear across multiple files and you want to treat them as distinct read groups, you can use the --prepend-filename-to-rgid=true option to distinguish between read groups.
If enabled, the resulting output BAM or CRAM file will contain all read groups from the input BAM or CRAM files passed in the CSV list file.
Tumor-Normal Pairs Input
You can also use --tumor-bam-list <csv file name> or --tumor-cram-list <csv file name> when running with tumor-only or tumor-normal inputs to DRAGEN. The CSV file has the same format as the options described above.
BCL is the output format of Illumina sequencing systems. Under limited circumstances, DRAGEN can read directly from BCL for map-align operations, saving the time needed for conversion to FASTQ.
DRAGEN can read directly from BCL in the following circumstances:
Only one lane is input as part of a run (specified on the command-line).
The lane has only a single sample specified in the SampleSheet.csv file.
When converting BCL to FASTQ is required, DRAGEN provides a BCL to FASTQ converter (see DRAGEN BCL Data Conversion).
The following example command is for BCL input with only one lane of input:
For additional BCL conversion options, see Input File Types.
One of the techniques that DRAGEN uses to optimize the handling of sequences can lead to overwriting the base quality score assigned to N base calls.
When you use the --fastq-n-quality and --fastq-offset options, the base quality scores are overwritten with a fixed base quality. The default values for these options are 2 and 33 to match the Illumina minimum quality of 35 (ASCII character ‘#’).
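The arithmetic behind these defaults can be checked directly: the encoded quality character is the one whose ASCII code is the quality value plus the offset. The variable names below are illustrative.

```shell
n_quality=2    # default value of --fastq-n-quality
offset=33      # default value of --fastq-offset

code=$(( n_quality + offset ))
# Turn the ASCII code into its character via an octal escape.
char=$(printf "\\$(printf '%03o' "$code")")

echo "ASCII code $code encodes as '$char'"
```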
By a common convention, read names can include suffixes, such as /1 or /2, which indicate which end of a pair the read represents. For BAM input using the --pair-by-name option, DRAGEN ignores these suffixes to find matching pair names. By default, DRAGEN uses the forward slash character as the delimiter for these suffixes and ignores the /1 and /2 when comparing names. By default, DRAGEN strips these suffixes from the original read names.
DRAGEN has the following options to control how suffixes are used:
To change the delimiter character, for suffixes, use the --pair-suffix-delimiter option. Valid values for this option include forward-slash (/), dot (.), and colon (:).
To preserve the entire name, including the suffixes, set --strip-input-qname-suffixes to false.
To append a new set of suffixes to all read names, set --append-read-index-to-name to true. The delimiter is determined by the --pair-suffix-delimiter option.
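To illustrate what stripping a pair suffix means for each supported delimiter, the following sketch removes a trailing delimiter followed by 1 or 2 from a read name. It only illustrates the naming convention and is not DRAGEN's implementation. The function name and read names are made up.

```shell
# $1 = read name, $2 = suffix delimiter (/, . or :)
strip_pair_suffix() {
    # Remove a trailing "<delimiter>1" or "<delimiter>2" if present.
    stripped=${1%${2}[12]}
    printf '%s\n' "$stripped"
}

strip_pair_suffix 'read42/1' /
strip_pair_suffix 'read42.2' .
strip_pair_suffix 'read42:1' :
strip_pair_suffix 'read42' /      # no suffix: name is unchanged
```

After stripping, both mates of a pair share the same name, which is what allows pairing by name.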
When processing RNA-Seq data, you can supply a gene annotations file by using the --annotation-file option. Providing this file improves the accuracy of the mapping and aligning stage (see Input Files). The file should conform to the GTF/GFF format specification and should list annotated transcripts that match the reference genome being mapped against. The similar GFF3 format is currently not supported, due to inconsistent contig naming between GENCODE and Ensembl. See the RNA user guide section for more details on potential issues and workarounds.
DRAGEN can take the SJ.out.tab file (see SJ.out.tab) as an annotations file to help guide the aligner in a two-pass mode of operation.
DRAGEN can stream input files directly from an AWS S3 bucket, Azure Blob storage account, or by using AWS presigned URLs (presigned URLs are not supported for Azure Blob storage at this time). With streaming, input files are not required to be downloaded locally prior to being processed. The files are streamed over the network directly into the DRAGEN processor.
Input streaming is most beneficial for large input files. DRAGEN supports input streaming for BAMs and compressed FASTQ files. For FASTQ files, input streaming can be used in all the configurations, including single-end FASTQs, paired-end FASTQs, and FASTQ lists.
Input streaming is supported for the following use cases:
Mapping/aligning of FASTQ and BAM.
Germline and somatic small variant calling from BAM (without remapping).
For other file types that are significantly smaller in size, download them locally before running the analysis.
Streaming FASTQ Input Using AWS S3
Streaming FASTQ Input Using Azure Blob Storage Account
Streaming FASTQ Input Using Presigned URLs (for AWS only)
Streaming BAM Input Using AWS S3
Streaming BAM Input Using Azure Blob Storage Account
Streaming BAM Input Using Presigned URLs (for AWS only)
DRAGEN can stream its output to an AWS S3 Bucket or an Azure Blob Storage Account Container. Output streaming is beneficial for large output files and for sharing results.
Streaming output to AWS S3
Streaming output to Azure Blob Storage Account
To stream input files from, or write output to, a cloud provider's storage, you must have permission to access the remote files.
AWS S3
S3 requires AWS authentication and credentials. The authentication should already be set up on the instance you are running, for example, via IAM policies.
Azure Blob Storage Account
Azure requires authentication and environment variables. DRAGEN supports two cases: (1) managed identities and (2) storage account access keys.
To use managed identities, you must run DRAGEN on an Azure instance. The instance must have Contributor permissions (read/write) on the storage account it reads from and writes to. If the instance has a single managed identity, only the AZ_ACCOUNT_NAME=<azure-storage-account-name> environment variable is required. For multiple managed identities, you must also provide the AZR_IDENT_CLIENT_ID=<client-id> environment variable, set to the client ID of the identity that can access your storage account. The client ID can be found in the Azure Portal.
With storage account access keys, DRAGEN can write to Azure storage both on and off Azure instances. For this use case, find the storage account access key and set the environment variables AZ_ACCOUNT_NAME=<azure-storage-account-name> and AZ_ACCOUNT_KEY=<account-key>.
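As a rough sketch of the two Azure cases above, the following shell function reports which case the current environment corresponds to. The function name azure_auth_mode is hypothetical and not part of DRAGEN. Only the variable names AZ_ACCOUNT_NAME, AZ_ACCOUNT_KEY, and AZR_IDENT_CLIENT_ID come from the text, and the order in which they are checked is an assumption made for illustration.

```shell
# Hypothetical pre-flight helper (not part of DRAGEN): report which Azure
# authentication case the exported environment variables correspond to.
# Assumption: an access key, when set, is checked before managed identities.
azure_auth_mode() {
    if [ -z "${AZ_ACCOUNT_NAME:-}" ]; then
        echo "missing AZ_ACCOUNT_NAME"
        return 1
    fi
    if [ -n "${AZ_ACCOUNT_KEY:-}" ]; then
        echo "access-key"
    elif [ -n "${AZR_IDENT_CLIENT_ID:-}" ]; then
        echo "managed-identity (explicit client ID)"
    else
        echo "managed-identity (single identity)"
    fi
}
```

Calling azure_auth_mode before launching a run can catch a missing AZ_ACCOUNT_NAME early instead of partway through streaming.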
Presigned URL (AWS only)
An AWS presigned URL typically has a query string attached, which provides the authentication credentials or tokens needed to grant permission to the S3 bucket (e.g., https://bucket-name.amazonaws.com/path/to/folder?querystring). Streaming input to DRAGEN from Azure presigned URLs is not currently supported.
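The three kinds of remote input described above can be told apart by the shape of the path. The sketch below is illustrative only: the function name input_location_kind is hypothetical, and the patterns are inferred from the example URLs in this guide rather than from DRAGEN's own parsing rules.

```shell
# Hypothetical helper (not part of DRAGEN): classify an input path by its form.
# Patterns are assumptions based on the example URLs shown in this guide.
input_location_kind() {
    case "$1" in
        s3://*)                            echo "aws-s3" ;;
        https://*.blob.core.windows.net/*) echo "azure-blob" ;;
        https://*\?*)                      echo "presigned-url" ;;
        *)                                 echo "local" ;;
    esac
}

input_location_kind 's3://s3-bucket-name/path/to/object_1.fastq.gz'
input_location_kind 'https://bucket-name.amazonaws.com/path/to/object_1.bam?querystring'
```

Quoting the URL matters for presigned URLs, because the query string can contain characters such as ? and & that the shell would otherwise interpret.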
Use the --sample-sex command line option to control the sex karyotype input used in downstream components, such as variant callers. If a sample sex karyotype input is not specified using the command line, the sex karyotype is automatically determined. The sex karyotype input is converted to a reference sex karyotype for use in variant calling. Other components might support sex karyotype input. Refer to the corresponding section for the component you are using.
The --sample-sex option supports the following values. Values are not case-sensitive.
none: No sex karyotype input. Components use a default reference sex karyotype.
auto: The sex karyotype is estimated by the Ploidy Estimator. If using CNV calling, sex karyotype is determined using a separate sex estimation module. If DRAGEN cannot estimate the sex karyotype, then components do not have a sex karyotype input. This behavior is then the same as none. auto is the default value.
female: Sex karyotype input is XX.
The following example command lines use --sample-sex to specify the sex karyotype.
If the value is none, female, or male, the Ploidy Estimator could still run and produce output, but variant callers will not use any estimated sex karyotype that is different than the sex karyotype provided via the command-line.
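Because --sample-sex values are not case-sensitive, a wrapper script might normalize and validate the value before passing it to DRAGEN. The following is a minimal sketch; the function name normalize_sample_sex is hypothetical and not part of DRAGEN.

```shell
# Hypothetical helper (not part of DRAGEN): lowercase a --sample-sex value and
# accept only the four values listed in this guide.
normalize_sample_sex() {
    value=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$value" in
        none|auto|female|male)
            printf '%s\n' "$value"
            ;;
        *)
            echo "unsupported --sample-sex value: $1" >&2
            return 1
            ;;
    esac
}

normalize_sample_sex FEMALE   # prints: female
```

A wrapper could then pass the result on, for example as --sample-sex "$(normalize_sample_sex "$user_value")", where user_value is a placeholder variable.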
The sex karyotype input is converted to the reference sex karyotype for the different components as follows. See the relevant component section for more information on how --sample-sex is used.
For sex karyotype input of None, CNV/Ploidy Caller independently check the coverage ratio of X and Y to determine the reference sex karyotype. Detection of minimal Y coverage will yield XY, otherwise XX.
The Picard Base Quality Score Recalibration (BQSR) tool produces output BAM files that include tags BI and BD. BQSR calculates these tags relative to the exact sequence for a read. If a BAM file with BI and BD tags is used as input to mapper/aligner with hard clipping enabled, the BI and/or BD tags can become invalid.
The recommendation is to strip these tags when using BAM files as input. To remove the BI and BD tags, set the --preserve-bqsr-tags option to false. If you preserve the tags, DRAGEN warns you to disable hard clipping.
DRAGEN assumes that all the reads in a given FASTQ belong to the same read group. DRAGEN creates a single @RG read group descriptor in the header of the output BAM file, with the ability to specify the following standard BAM attributes:
If any of these arguments are present, DRAGEN adds an RG tag to all the output records to indicate that they are members of a read group. The following example shows a command line that includes read group parameters:
When using the --fastq-list option to input multiple read groups, BAM tags (and others) are specified for each read group by adding columns to the fastq_list.csv file. Each column heading consists of four capital letters and each begins with 'RG'. For each column, each read group's values for that column are propagated to the output BAM file in an identically named tag.
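As an illustration of the column layout, the sketch below prints a small fastq_list.csv to standard output. The helper name fastq_list_row is hypothetical. The column names follow the sample CSV content shown in this guide, and the second row uses placeholder values to show single-end input, which leaves Read2File empty.

```shell
# Hypothetical helper (not part of DRAGEN): print one fastq_list.csv row.
# Arguments: RGID RGSM RGLB Lane Read1File [Read2File]
fastq_list_row() {
    printf '%s,%s,%s,%s,%s,%s\n' "$1" "$2" "$3" "$4" "$5" "${6:-}"
}

# Header: read group columns (RGID, RGSM, RGLB) plus lane and file columns.
echo "RGID,RGSM,RGLB,Lane,Read1File,Read2File"
# Paired-end read group, values taken from the sample CSV in this guide.
fastq_list_row CACACTGA.1 RDSR181520 UnknownLibrary 1 \
    /staging/RDSR181520_S1_L001_R1_001.fastq \
    /staging/RDSR181520_S1_L001_R2_001.fastq
# Single-end read group (placeholder values): Read2File is left empty.
fastq_list_row EXAMPLE.1 example_sample UnknownLibrary 1 /staging/example_R1.fastq
```

Redirecting the output to a file gives a CSV that can be passed with --fastq-list. Further RG columns such as RGPL would need a matching header entry and an extra value in every row.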
To suppress the license status message at the end of the run, use the --lic-no-print option. The following shows an example of the license status message:
An MD5SUM file is generated automatically for BAM and CRAM output files. The MD5SUM file has the same name as the output file, with an .md5sum extension appended (eg, whole_genome_run_123.bam.md5sum). The MD5SUM file is a single-line text file that contains the md5sum of the output file, which exactly matches the output of the Linux md5sum command.
The MD5SUM calculation is performed as the output file is written, so there is no measurable performance impact (compared to the Linux md5sum command, which can take several minutes for a 30x BAM).
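After copying an output file to another system, the stored checksum can be compared against a freshly computed one. The sketch below is one way to do that. The function name verify_md5sum is hypothetical, and the code compares only the first whitespace-separated field of each side, so it works whether or not the .md5sum file also carries a file name.

```shell
# Hypothetical helper (not part of DRAGEN): check a file against the digest
# stored in <file>.md5sum. Returns 0 on a match, nonzero otherwise.
verify_md5sum() {
    file="$1"
    # First field of the first line of the stored checksum file.
    expected=$(awk 'NR==1 {print $1}' "${file}.md5sum") || return 1
    # First field of the Linux md5sum output for the file itself.
    actual=$(md5sum "$file" | awk '{print $1}') || return 1
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

For example, verify_md5sum whole_genome_run_123.bam && echo "checksum OK" reports success only when the two digests agree.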
Command line options can be stored in a configuration file. The location of the default configuration file is <INSTALL_PATH>/config/dragen-user-defaults.cfg. You can override this file by using the --config-file (-c) option to specify a different file. The configuration file used for a given run supplies the default settings for that run, any of which can be overridden by command line options.
The recommended approach is to use the dragen-user-defaults.cfg file as a template to create default settings for different use cases. Copy dragen-user-defaults.cfg, rename the copy, then modify the new file for the specific use-case. Best practice is to put options that rarely change into the configuration file and to specify options that vary from run to run on the command line.
DRAGEN utilizes quota based licensing for a majority of features. More information can be found in the .
Run BCL Converter (BCL to *.fastq)
Run RNA Map/Align (*.fastq to *.bam)
Variant caller mode
To execute variant caller mode, set the --enable-variant-caller option to true and set the --enable-map-align option to false. The input must be a mapped and aligned BAM/CRAM file. DRAGEN produces a VCF file. DRAGEN force-enables re-sorting of the BAM because the variant caller requires a number of read statistics and estimates to operate effectively, so setting --enable-sort to false is overridden. In this mode, DRAGEN cannot mark duplicates in BAM files that have not already been duplicate marked. Use the end-to-end mode of operation to take advantage of the mark-duplicates feature.
RNA-Seq data
To enable processing of RNA-Seq-based data, set --enable-rna to true. DRAGEN uses the RNA spliced aligner during the mapper/aligner stage. DRAGEN dynamically switches between the required modes of operation.
Bisulfite MethylSeq data
To enable processing of Bisulfite MethylSeq data, set the --enable-methylation-calling option to true. DRAGEN automates the processing of data for Lister (directional) and Cokus (nondirectional) protocols to generate a single BAM with bismark-compatible tags. Alternatively, you can run DRAGEN in a mode that produces a separate BAM file for each combination of the C->T and G->A converted reads and references. To enable this mode of processing, build a set of reference hash tables with --ht-methylated enabled, and run DRAGEN with the appropriate --methylation-protocol setting.
All input BAM tags are preserved.
The reference used to compress the CRAM file is the DRAGEN hash table provided during the map/align run. When decompressing the CRAM with a FASTA file and third-party tools, you must use the FASTA that was used to generate the hash table.
A CRAM index is produced in .crai format.
CRAM output is only possible when sort is enabled. CRAM alignments are always positionally sorted.
| Option | Value | Description |
| --- | --- | --- |
| SEQS_PER_SLICE | 2000 | Max sequences per slice |
| BASES_PER_SLICE | SEQS_PER_SLICE*500 | Max bases per slice |
| SLICE_PER_CNT | 1 | Max slices per container |
| embed_ref | 0 | Do not embed reference sequence |
| noref | 0 | Do not use non-reference-based encoding |
| multiseq | -1 | Do not use multiple references per slice |
| unsorted | 0 | Do not use unsorted mode |
| use_bz2 | 0 | Do not compress using bzip2 |
| use_lzma | 0 | Do not compress using lzma |
| use_rans | 1 | Use rANS for quality score compression |
| binning | NONE | Quality score binning not used |
| preserve_aux_order | 1 | Preserve all aux tags and order (incl RG,NM,MD) |
| preserve_aux_size | 0 | Aux tag sizes not preserved ('i', 's', 'c') |
| lossy_read_names | 0 | Preserve read names |
| lossy | 0 | Do not enable Illumina 8 quality-binning system |
| ignore_md5 | 0 | Enable all checking of checksums |
| decode_md | 0 | Do not (re)generate MD and NM tags |
| cram_version | 3.0 | Default is CRAM v3.0. |

Read2File: Full path to a valid FASTQ input file. Required for paired-end input. If not using paired-end input, leave empty.

male: Sex karyotype input is XY.

| Sex karyotype input | | | | | |
| --- | --- | --- | --- | --- | --- |
| X0 | XX | XY | XX | XXYY | XXYY |
| XX | XX | XX | XX | XX | XXYY |
| XXX | XX | XX | XX | XXYY | XXYY |
| XXXX | XX | XX | XX | XXYY | XXYY |
| XXXXX | XX | XX | XX | XXYY | XXYY |
| XY | XY | XY | XY | XY | XXYY |
| XXY | XY | XX | XY | XXYY | XXYY |
| XXXY | XY | XX | XY | XXYY | XXYY |
| XXXXY | XY | XX | XY | XXYY | XXYY |
| XYY | XY | XY | XY | XXYY | XXYY |
| XXYY | XY | XX | XY | XXYY | XXYY |
| XXXYY | XY | XX | XY | XXYY | XXYY |
| XYYY | XY | XY | XY | XXYY | XXYY |
| XXYYY | XY | XX | XY | XXYY | XXYY |
| XYYYY | XY | XY | XY | XXYY | XXYY |
| None | XX/XY | XX | XX/XY | XXYY | XXYY |

| Attribute | Option | Description |
| --- | --- | --- |
| ID | --RGID | Read group identifier. If you include any of the read group parameters, RGID is required. It is the value written into each output BAM record. |
| LB | --RGLB | Library. |
| PL | --RGPL | Platform/technology used to produce the reads. The BAM standard allows for values CAPILLARY, LS454, ILLUMINA, SOLID, HELICOS, IONTORRENT and PACBIO. |
| PU | --RGPU | Platform unit, eg, flowcell-barcode.lane. |
| SM | --RGSM | Sample. |
| CN | --RGCN | Name of the sequencing center that produced the read. |
| DS | --RGDS | Description. |
| DT | --RGDT | Date the run was produced. |
| PI | --RGPI | Predicted mean insert size. |

| Tag | Description |
| --- | --- |
| ZS:Z:R | Multiple alignments with similar score were found. |
| ZS:Z:NM | No alignment was found. |
| ZS:Z:QL | An alignment was found but it was below the quality threshold. |
| ZS:Z:NRD | Alignment is to an auto-added decoy contig (not present in input FASTA). |
| ZS:Z:PAI | Alignment is to an insertion encoded in a population based alternate contig (not present in input FASTA). |
dragen --bcl-conversion-only true --bcl-input-directory <BCL_DIRECTORY> \
--output-directory <OUT_DIRECTORY>

dragen -r <REF_DIRECTORY> --output-directory <OUT_DIRECTORY> \
--output-file-prefix <FILE_PREFIX> [options] -1 <FASTQ1> \
[-2 <FASTQ2>] --enable-rna true

dragen --build-hash-table true --ht-reference <REF_FASTA> \
--output-directory <REF_DIRECTORY> [options]

dragen -r <REF_DIRECTORY> --output-directory <OUT_DIRECTORY> \
--output-file-prefix <FILE_PREFIX> [options] -1 <FASTQ1> \
[-2 <FASTQ2>] --RGID <RG0> --RGSM <SM0> --enable-variant-caller true

dragen -r <REF_DIRECTORY> --output-directory <OUT_DIRECTORY> \
--output-file-prefix <FILE_PREFIX> [options] \
-1 <FASTQ1> [-2 <FASTQ2>] \
--RGID <RG0> --RGSM <SM0>

dragen -r <REF_DIRECTORY> --output-directory <OUT_DIRECTORY> \
--output-file-prefix <FILE_PREFIX> [options] -b <BAM> \
--enable-map-align false \
--enable-variant-caller true

dragen -r <REF_DIR> -1 <fastq> \
--output-directory <OUT_DIR> --output-file-prefix <OUTPUT_PREFIX> \
--RGID <RGID> --RGSM <RGSM>

dragen -r <REF_DIR> -1 <fastq1> -2 <fastq2> \
--output-directory <OUT_DIR> --output-file-prefix <OUT_PREFIX> \
--RGID <RGID> --RGSM <RGSM>

dragen -r <REF_DIR> -1 <INTERLEAVED_FASTQ> -i \
--RGID <RGID> --RGSM <RGSM>

RDRS182520_S1_L001_R1_001.fastq.gz
RDRS182520_S1_L001_R1_002.fastq.gz
...
RDRS182520_S1_L001_R1_008.fastq.gz

dragen -r <ref_dir> --fastq-list <CSV_FILE> \
--fastq-list-sample-id <Sample_ID> --output-directory <OUT_DIR> \
--output-file-prefix <OUT_PREFIX>

RGID,RGSM,RGLB,Lane,Read1File,Read2File
CACACTGA.1,RDSR181520,UnknownLibrary,1,/staging/RDSR181520_S1_L001_R1_001.fastq,/staging/RDSR181520_S1_L001_R2_001.fastq
AGAACGGA.1,RDSR181521,UnknownLibrary,1,/staging/RDSR181521_S2_L001_R1_001.fastq,/staging/RDSR181521_S2_L001_R2_001.fastq
TAAGTGCC.1,RDSR181522,UnknownLibrary,1,/staging/RDSR181522_S3_L001_R1_001.fastq,/staging/RDSR181522_S3_L001_R2_001.fastq
AGACTGAG.1,RDSR181523,UnknownLibrary,1,/staging/RDSR181523_S4_L001_R1_001.fastq,/staging/RDSR181523_S4_L001_R2_001.fastq

dragen -r <ref_dir> --tumor-fastq-list <csv_file> \
--tumor-fastq-list-sample-id <Sample_ID> \
--output-directory <out_dir> \
--output-file-prefix <out_prefix> --fastq-list <csv_file_2> \
--fastq-list-sample-id <Sample_ID_2>

#!/bin/bash
# Run one tumor-normal analysis for each pair of lines in the two sample lists.
HT="/staging/HT/"
tumor_fastq_list="/staging/inputs/tumor_fastq_list.csv"
normal_fastq_list="/staging/inputs/normal_fastq_list.csv"
tumor_samples_list="/staging/inputs/tumor_samples_list.txt"
normal_samples_list="/staging/inputs/normal_samples_list.txt"
# Tumor sample IDs are read from file descriptor 3, normal sample IDs from 4.
while read -u 3 -r tumor_RGSM && read -u 4 -r normal_RGSM; do
    output_dir="/staging/results/${tumor_RGSM}_${normal_RGSM}"
    mkdir -p "${output_dir}"
    dragen \
        -r "${HT}" \
        --tumor-fastq-list "${tumor_fastq_list}" \
        --tumor-fastq-list-sample-id "${tumor_RGSM}" \
        --fastq-list "${normal_fastq_list}" \
        --fastq-list-sample-id "${normal_RGSM}" \
        --output-directory "${output_dir}" \
        --output-file-prefix "${tumor_RGSM}_${normal_RGSM}"
done 3<"${tumor_samples_list}" 4<"${normal_samples_list}"
Sample fastq_list.csv content:
RGPL,RGID,RGSM,RGLB,Lane,Read1File,Read2File
DRAGEN_RGPL,DRAGEN_RGID_N1.1,normal-1,ILLUMINA,1,/staging/inputs/normal-1_S1_L001_R1_001.fastq.gz,/staging/inputs/normal-1_S1_L001_R2_001.fastq.gz
DRAGEN_RGPL,DRAGEN_RGID_N1.2,normal-1,ILLUMINA,2,/staging/inputs/normal-1_S1_L002_R1_001.fastq.gz,/staging/inputs/normal-1_S1_L002_R2_001.fastq.gz
DRAGEN_RGPL,DRAGEN_RGID_N2.1,normal-2,ILLUMINA,1,/staging/inputs/normal-2_S1_L001_R1_001.fastq.gz,/staging/inputs/normal-2_S1_L001_R2_001.fastq.gz
DRAGEN_RGPL,DRAGEN_RGID_N2.2,normal-2,ILLUMINA,2,/staging/inputs/normal-2_S1_L002_R1_001.fastq.gz,/staging/inputs/normal-2_S1_L002_R2_001.fastq.gz
DRAGEN_RGPL,DRAGEN_RGID_N3.1,normal-3,ILLUMINA,1,/staging/inputs/normal-3_S1_L001_R1_001.fastq.gz,/staging/inputs/normal-3_S1_L001_R2_001.fastq.gz
DRAGEN_RGPL,DRAGEN_RGID_N3.2,normal-3,ILLUMINA,2,/staging/inputs/normal-3_S1_L002_R1_001.fastq.gz,/staging/inputs/normal-3_S1_L002_R2_001.fastq.gz
Sample tumor_fastq_list.csv content:
RGPL,RGID,RGSM,RGLB,Lane,Read1File,Read2File
DRAGEN_RGPL,DRAGEN_RGID_T1.1,tumor-1,ILLUMINA,1,/staging/inputs/tumor-1_S1_L001_R1_001.fastq.gz,/staging/inputs/tumor-1_S1_L001_R2_001.fastq.gz
DRAGEN_RGPL,DRAGEN_RGID_T1.2,tumor-1,ILLUMINA,2,/staging/inputs/tumor-1_S1_L002_R1_001.fastq.gz,/staging/inputs/tumor-1_S1_L002_R2_001.fastq.gz
DRAGEN_RGPL,DRAGEN_RGID_T2.1,tumor-2,ILLUMINA,1,/staging/inputs/tumor-2_S1_L001_R1_001.fastq.gz,/staging/inputs/tumor-2_S1_L001_R2_001.fastq.gz
DRAGEN_RGPL,DRAGEN_RGID_T2.2,tumor-2,ILLUMINA,2,/staging/inputs/tumor-2_S1_L002_R1_001.fastq.gz,/staging/inputs/tumor-2_S1_L002_R2_001.fastq.gz
DRAGEN_RGPL,DRAGEN_RGID_T3.1,tumor-3,ILLUMINA,1,/staging/inputs/tumor-3_S1_L001_R1_001.fastq.gz,/staging/inputs/tumor-3_S1_L001_R2_001.fastq.gz
DRAGEN_RGPL,DRAGEN_RGID_T3.2,tumor-3,ILLUMINA,2,/staging/inputs/tumor-3_S1_L002_R1_001.fastq.gz,/staging/inputs/tumor-3_S1_L002_R2_001.fastq.gz
Sample normal_samples_list content
normal-1
normal-2
normal-3
Sample tumor_samples_list content
tumor-1
tumor-2
tumor-3
dragen -r <REF_DIR> -1 <fastq.ora1> -2 <fastq.ora2> \
--ora-reference <ORADATA_DIR> \
--output-directory <OUT_DIR> --output-file-prefix <OUT_PREFIX> \
--RGID <RGID> --RGSM <RGSM>

dragen -r <ref_dir> -b <bam> --output-directory <out_dir> \
--output-file-prefix <out_prefix> --pair-by-name false

dragen -r <ref_dir> -b <bam> --output-directory <out_dir> \
--output-file-prefix <out_prefix> --pair-by-name true

dragen -r <ref_dir HG38> --cram-input <cram> --output-directory <out_dir> \
--output-file-prefix <out_prefix> --cram-reference <ref_dir HG19>

dragen -r <ref_dir HG38> --cram-input <cram> --output-directory <out_dir> \
--output-file-prefix <out_prefix> --cram-reference <hg19.fa>

dragen -r <ref_dir> --cram-input <cram> --output-directory <out_dir> \
--output-file-prefix <out_prefix> --pair-by-name true

dragen -r <ref_dir> --bam-list <CSV_FILE> \
--output-directory <OUT_DIR> --output-file-prefix <OUT_PREFIX>

BamFile
/path/to/bam/one
/path/to/bam/two

dragen --bcl-input-dir <BCL_ROOT> --bcl-only-lane <num> -r <ref_dir> \
--output-directory <out_dir> --output-file-prefix <out_prefix>

dragen -f \
-r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
-1 s3://s3-bucket-name/path/to/object_1.fastq.gz \
-2 s3://s3-bucket-name/path/to/object_2.fastq.gz \
--RGID object_ID \
--RGSM sample_name \
--output-directory /staging/examples/ \
--output-file-prefix streaming

AZ_ACCOUNT_NAME="storage-account-name" AZ_ACCOUNT_KEY="<account-key>" dragen -f \
-r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
-1 https://storage-account-name.blob.core.windows.net/path/to/object_1.fastq.gz \
-2 https://storage-account-name.blob.core.windows.net/path/to/object_2.fastq.gz \
--RGID object_ID \
--RGSM sample_name \
--output-directory /staging/examples/ \
--output-file-prefix streaming

dragen -f \
-r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
-1 "https://bucket-name.amazonaws.com/path/to/object_1.fastq.gz?querystring" \
-2 "https://bucket-name.amazonaws.com/path/to/object_2.fastq.gz?querystring" \
--RGID object_ID \
--RGSM sample_name \
--output-directory /staging/examples/ \
--output-file-prefix streaming

dragen -f \
-r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
-b s3://s3-bucket-name/path/to/object_1.bam \
--output-directory /staging/examples/ \
--output-file-prefix streaming

AZ_ACCOUNT_NAME="storage-account-name" AZ_ACCOUNT_KEY="<account-key>" dragen -f \
-r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
-b https://storage-account-name.blob.core.windows.net/path/to/object_1.bam \
--output-directory /staging/examples/ \
--output-file-prefix streaming

dragen -f \
-r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
-b "https://bucket-name.amazonaws.com/path/to/object_1.bam?querystring" \
--output-directory /staging/examples/ \
--output-file-prefix streaming

dragen -f \
-r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
-1 SRA056922.fastq \
--RGID object_ID \
--RGSM sample_name \
--output-directory s3://s3-bucket-name/path/to/output \
--intermediate-results-dir /staging/examples \
--output-file-prefix streaming

AZ_ACCOUNT_NAME="storage-account-name" AZ_ACCOUNT_KEY="<account-key>" dragen -f \
-r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
-1 SRA056922.fastq \
--RGID object_ID \
--RGSM sample_name \
--output-directory https://storage-account-name.blob.core.windows.net/path/to/output \
--intermediate-results-dir /staging/examples \
--output-file-prefix streaming

--sample-sex FEMALE
--sample-sex MALE
--sample-sex NONE

dragen --RGID 1 --RGCN Broad --RGLB Solexa-135852 \
--RGPL Illumina --RGPU 1 --RGSM NA12878 \
-r /staging/human/reference/hg19/hg19.fa.k_21.f_16.m_149 \
-1 SRA056922.fastq --output-directory /staging/tmp/ \
--output-file-prefix rg_example

LICENSE_MSG| =====================================================
LICENSE_MSG| License report
LICENSE_MSG| Genome status [ACxxxxxxxxxxx] : used 1263.9 Gbases since 2018-Feb-15 (1263886160894 bases, unlimited)
LICENSE_MSG| Genome bases [ACxxxxxxxxxxx] : 202000000
LICENSE_MSG| Genome bases [total] : 202000000

dragen -r <REF_DIRECTORY> --output-directory <OUT_DIRECTORY> \
--output-file-prefix <FILE_PREFIX> [options] -b <BAM> \
--enable-map-align true \
--enable-variant-caller true