# 5 Base DNA Somatic Tumor-Only Solid WGS

A DRAGEN recipe is a predefined set of analysis parameters and workflow settings tailored to a specific type of genomic analysis. This recipe lists the recommended options for 5-Base tumor-only WGS analysis of solid tumor samples. The settings target fresh frozen samples and include optional settings for FFPE samples. For clarity, some default parameters are included explicitly and annotated with comments.

```
  
/opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
--ref-dir $REF_DIR                      #path to DRAGEN linear hashtable 
--output-directory $OUTPUT 
--intermediate-results-dir $PATH        #e.g. SSD /staging 
--output-file-prefix $PREFIX 
# Inputs 
--tumor-fastq-list $PATH                #see 'Input Options' for FQ, BAM or CRAM 
--tumor-fastq-list-sample-id $STRING 
# Mapper 
--enable-map-align true                 #optional with BAM/CRAM input 
--enable-map-align-output true          #optionally save the output BAM 
--enable-sort true                      #default=true 
--enable-duplicate-marking true         #default=true 
# 5-Base 
--methylation-conversion illumina 
--methylation-generate-cytosine-report true 
--methylation-compress-cx-report true 
# Small variant caller 
--enable-variant-caller true 
--vc-systematic-noise $PATH             #Required 
--vc-excluded-regions-bed $BED          #FFPE: optionally mask ALUs 
# SV 
--enable-sv true 
--sv-systematic-noise $PATH             #Recommended 
--enable-oncovirus-detection true       #Optional 
--oncovirus-detection-db $PATH          #Optional 
# CNV 
--enable-cnv true 
--cnv-population-b-allele-vcf $POP_VCF 
--cnv-enable-self-normalization true 
# Annotation 
--variant-annotation-data $NIRVANA_PATH 
--vc-enable-germline-tagging true 
```
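
The listing above is an annotated option list rather than a literal shell command: the inline comments and the missing line continuations would break it if pasted as is. The following is a minimal sketch of one way to assemble it into a single invocation for a fresh frozen sample. Every path and value is a placeholder assumed for illustration, and the variable names (`STAGING`, `TUMOR_FASTQ_LIST`, `SAMPLE_ID`, `SNV_NOISE`, `SV_NOISE`, `DRY_RUN`) and the function `run_recipe` are hypothetical. The optional FFPE and oncovirus options are left out.

```shell
# Placeholder values (assumptions for illustration): edit before use.
# Note: "$PATH" in the recipe is a placeholder. Never overwrite the shell's
# own PATH variable with a data path.
VERSION="x.y.z"                               # installed DRAGEN version directory
DRAGEN_BIN="/opt/dragen/$VERSION/bin/dragen"
REF_DIR="/path/to/linear_hashtable"
OUTPUT="/path/to/output"
STAGING="/staging/tmp"                        # fast local disk, e.g. SSD
PREFIX="tumor_sample"
TUMOR_FASTQ_LIST="/path/to/fastq_list.csv"
SAMPLE_ID="tumor_sample"
SNV_NOISE="/path/to/WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz"
SV_NOISE="/path/to/sv_systematic_noise.bedpe.gz"
POP_VCF="/path/to/population_b_allele.vcf.gz"
NIRVANA_PATH="/path/to/nirvana_data"

run_recipe() {
  # Build the option list as positional parameters so each value stays one word.
  set -- \
    --ref-dir "$REF_DIR" \
    --output-directory "$OUTPUT" \
    --intermediate-results-dir "$STAGING" \
    --output-file-prefix "$PREFIX" \
    --tumor-fastq-list "$TUMOR_FASTQ_LIST" \
    --tumor-fastq-list-sample-id "$SAMPLE_ID" \
    --enable-map-align true \
    --enable-map-align-output true \
    --enable-sort true \
    --enable-duplicate-marking true \
    --methylation-conversion illumina \
    --methylation-generate-cytosine-report true \
    --methylation-compress-cx-report true \
    --enable-variant-caller true \
    --vc-systematic-noise "$SNV_NOISE" \
    --enable-sv true \
    --sv-systematic-noise "$SV_NOISE" \
    --enable-cnv true \
    --cnv-population-b-allele-vcf "$POP_VCF" \
    --cnv-enable-self-normalization true \
    --variant-annotation-data "$NIRVANA_PATH" \
    --vc-enable-germline-tagging true

  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run (the default in this sketch): print the command, do not execute it.
    printf '%s' "$DRAGEN_BIN"
    printf ' %s' "$@"
    printf '\n'
  else
    "$DRAGEN_BIN" "$@"
  fi
}

run_recipe
```

With `DRY_RUN` unset the script only prints the assembled command, which makes it easy to review before launching a run with `DRY_RUN=0`.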

## Notes and additional options

### Hashtable

For DRAGEN somatic runs it is recommended to use the linear hashtable.

See: [Product Files](https://support.illumina.com/sequencing/sequencing_software/dragen-bio-it-platform/product_files.html)

### Input options

DRAGEN accepts the following input sources: FASTQ list, FASTQ, BAM, or CRAM. For BCL input, first create FASTQ files using [BCL conversion](https://help.dragen.illumina.com/product-guides/dragen-v4.5/bcl-conversion).

FQ list Input

```
--tumor-fastq-list $PATH 
--tumor-fastq-list-sample-id $STRING 
```

FQ Input

```
--tumor-fastq1 $PATH 
--tumor-fastq2 $PATH 
--RGSM-tumor $STRING 
--RGID-tumor $STRING 
```

BAM Input

```
--tumor-bam-input $PATH 
```

CRAM Input

```
--tumor-cram-input $PATH 
```
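
When a wrapper script has to handle more than one input type, the choice between the option sets above can be made from the file name. The sketch below is one hedged way to do that. The function name `tumor_input_args` and the extension-based detection (`.csv` for a FASTQ list, `.bam`, `.cram`) are assumptions of this example, not DRAGEN behavior. Paired FASTQ input is not covered because it also needs the `--RGSM-tumor` and `--RGID-tumor` values.

```shell
# Print the tumor input options for one input file, chosen by file extension.
# $1 = input file, $2 = sample ID (used for FASTQ list input only).
# Assumes paths without whitespace, since the result is printed as one line.
tumor_input_args() {
  input=$1
  sample_id=${2:-}
  case "$input" in
    *.csv)
      # The recipe pairs the FASTQ list with a sample ID, so this sketch requires one.
      if [ -z "$sample_id" ]; then
        echo "sample ID required for FASTQ list input" >&2
        return 1
      fi
      echo "--tumor-fastq-list $input --tumor-fastq-list-sample-id $sample_id"
      ;;
    *.bam)
      echo "--tumor-bam-input $input"
      ;;
    *.cram)
      echo "--tumor-cram-input $input"
      ;;
    *)
      echo "unsupported input: $input" >&2
      return 1
      ;;
  esac
}

# Example calls with placeholder paths.
tumor_input_args /data/tumor.cram
tumor_input_args /data/fastq_list.csv tumor_sample
```

The printed options can then be appended to the rest of the command line, for example through an unquoted `$(tumor_input_args "$INPUT" "$SAMPLE_ID")` when the paths contain no spaces.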

### Mapping and Aligning

| Option                           | Description                                                                                          |
| -------------------------------- | ---------------------------------------------------------------------------------------------------- |
| `--enable-map-align true`        | Enable map & align (default=true). Optional when the input is BAM or CRAM.                           |
| `--enable-map-align-output true` | Optionally save the output BAM (default=false).                                                      |
| `--Aligner.clip-pe-overhang 2`   | Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run. |

### Duplicate Marking

| Option                            | Description                                                                     |
| --------------------------------- | ------------------------------------------------------------------------------- |
| `--enable-duplicate-marking true` | By default, DRAGEN marks duplicate reads and excludes them from variant calling. |

### Fractional (Raw Reads) Downsampling

DRAGEN can subsample a random fraction of the reads in an input file using the fractional downsampler. This is useful for simulating different amounts of sequencing from the same data set. Reads are drawn at random from the primary analysis output and are not otherwise modified (no trimming and no filtering).

Downsampling may also reduce runtime on very deep samples. For Tumor-Normal analyses, it is recommended that the normal sample have lower coverage than the tumor sample. If the matched normal has deeper coverage than the tumor sample, the fractional downsampler can be used to reduce coverage on the normal sample.

| Option                             | Description                                                                                                 |
| ---------------------------------- | ----------------------------------------------------------------------------------------------------------- |
| `--enable-fractional-down-sampler` | Set to true to enable fractional downsampling. The default value is false.                                  |
| `--down-sampler-normal-subsample`  | Specify the fraction of reads to keep as a subsample of normal input data. The default value is 1.0 (100%). |
| `--down-sampler-tumor-subsample`   | Specify the fraction of reads to keep as a subsample of tumor input data. The default value is 1.0 (100%).  |
| `--down-sampler-random-seed`       | Specify the random seed for different runs of the same input data. The default value is 42.                 |
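
As a worked example of choosing a subsample fraction: to bring a sample from roughly 120x down to roughly 60x, keep 60/120 = 0.5 of the reads. The sketch below computes that fraction with `awk` and prints the matching options from the table above. The coverage numbers and the helper name `subsample_fraction` are illustrative assumptions, and the estimate presumes that coverage scales linearly with the number of reads kept.

```shell
# Print the fraction of reads to keep so that a sample sequenced at
# current coverage ($1) ends up near the target coverage ($2).
# The result is capped at 1.000 because downsampling cannot add reads.
subsample_fraction() {
  awk -v cur="$1" -v tgt="$2" 'BEGIN {
    if (cur <= 0) exit 1          # reject a non-positive current coverage
    f = tgt / cur
    if (f > 1) f = 1
    printf "%.3f\n", f
  }'
}

# Illustrative values: tumor sequenced at about 120x, target about 60x.
frac=$(subsample_fraction 120 60)
echo "--enable-fractional-down-sampler true --down-sampler-tumor-subsample $frac"
```

For a Tumor-Normal run, the same calculation would feed `--down-sampler-normal-subsample` instead.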

### 5-Base Methylation

| Option                                        | Description                                                                                                                                                                                                                       |
| --------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `--methylation-conversion STRING`             | Library conversion for methylation analysis. Options: `none`, `c_t`, `mc_t`, `illumina` (default=none).                                                                                                                           |
| `--methylation-protocol STRING`               | Library protocol for methylation analysis. Options: `none`, `directional`, `non-directional`, `directional-complement`, `pbat`. The default value for `methylation-conversion=illumina` is `directional`, otherwise it is `none`. |
| `--methylation-mapq-threshold INT`            | Only reads with MAPQ greater than or equal to the threshold are included in methyl-seq analysis (default=0).                                                                                                                     |
| `--methylation-generate-mbias-report true`    | Whether to generate a per-sequencer-cycle methylation bias report (default=true).                                                                                                                                                 |
| `--mbias-report-include-overlaps`             | Calculate methylation stats for overlapping bases between mates (default=false).                                                                                                                                                  |
| `--methylation-generate-cytosine-report true` | Whether to generate a genome-wide cytosine methylation CX\_report file (default=false).                                                                                                                                           |
| `--methylation-compress-cx-report true`       | Set to true to enable compression of the CX\_report (default=true).                                                                                                                                                               |
| `--methylation-keep-ref-cytosine true`        | Set to true to keep all reference cytosines in the CX\_report file, even if they don't appear in the input reads (default=false).                                                                                                 |
| `--enable-cpg-methylated-mapping true`        | Enable methylated mapping with base conversions restricted to CpG context (default=true). When false, runs DRAGEN Methylation 3-base map/align instead.                                                                           |
| `--methylation-report-to-vcf`                 | Specify methylation type (none, cg, or c) which is reported in VCF files (default=c).                                                                                                                                             |
| `--methylation-report-to-gvcf`                | Specify methylation type (none, cg, or c) which is reported in gVCF files (default=cg).                                                                                                                                           |

For more information see: [5-Base Pipeline](https://help.dragen.illumina.com/product-guides/dragen-v4.5/dragen-methylation-pipeline/dragen-5base-pipeline).

### SNV

| Option                                                   | Description                                                                                                                                                                                                                                                                                                                                                                                                                                       |
| -------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `--vc-target-bed`                                        | Limit variant calling to region of interest.                                                                                                                                                                                                                                                                                                                                                                                                      |
| `--vc-combine-phased-variants-distance INT`              | Maximum distance in base pairs (bp) over which phased variants will be combined. Set to 0 to disable. Valid range is \[0, 15] bp (default=2). |
| `--vc-systematic-noise $PATH`                            | Systematic noise file. This filter is recommended for removing systematic noise observed in normal samples (i.e. systematic alignment errors, sequencing errors, etc.).                                                                                                                                                                                                                                                                           |
| `--vc-somatic-hotspots $PATH`                            | DRAGEN has a default set of hotspot variants (positions and alleles) where it will assign an increased prior probability. Use this option to override with a custom hotspots file.                                                                                                                                                                                                                                                                |
| `--vc-sq-filter-threshold $NUM`                          | Threshold for sensitivity-specificity tradeoff using SQ score. The pipeline specific default threshold is 3. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.                                                                                                                                                                                                   |
| `--vc-systematic-noise-filter-threshold $INT`            | Threshold for sensitivity-specificity tradeoff using AQ score for non-hotspot variants. This is only used when supplying a systematic noise file. Pipeline specific default value = 60. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.                                                                                                                        |
| `--vc-systematic-noise-filter-threshold-in-hotspot $INT` | Threshold for sensitivity-specificity tradeoff using AQ score for hotspot variants. This is only used when supplying a systematic noise file. Pipeline specific default value = 20. Raise this value to improve specificity at the cost of sensitivity, or lower it to improve sensitivity at the cost of specificity.                                                                                                                            |
| `--vc-excluded-regions-bed $BED`                         | Hard-filter variants that overlap the regions in this BED file. ALU regions comprise approximately 11% of the genome and are often exceptionally noisy in FFPE samples. Optionally filter out ALU regions using the DRAGEN excluded regions filter. ALU BED files can be downloaded as part of the [Bed File Collection](https://support.illumina.com/sequencing/sequencing_software/dragen-bio-it-platform/product_files.html). |

Clinical applications require maximum confidence in variant calls to prevent false positive diagnoses. High-specificity mode reduces false positives with some loss in sensitivity.

| High Specificity Option      | Description                                                                                               |
| ---------------------------- | --------------------------------------------------------------------------------------------------------- |
| `--vc-high-specificity true` | Apply aggressive filters in the small variant caller to boost specificity, with some loss in sensitivity. |

For more detail on the small variant caller in somatic mode, see [Somatic Mode](https://help.dragen.illumina.com/product-guides/dragen-v4.5/dragen-dna-pipeline/small-variant-calling/somatic-mode).
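
When adjusting `--vc-sq-filter-threshold` or the systematic noise thresholds, comparing how many records pass filtering between two runs gives a quick first look at the sensitivity-specificity tradeoff. The sketch below counts passing and filtered records in a VCF read from standard input. It relies only on the standard VCF FILTER column (column 7). The helper name `vcf_filter_summary` is hypothetical, and the file name in the usage comment is an assumption to be replaced with the small variant VCF your run produced.

```shell
# Summarize the FILTER column of an uncompressed VCF read from stdin.
vcf_filter_summary() {
  awk -F '\t' '
    /^#/ { next }                   # skip meta-information and header lines
    $7 == "PASS" { pass++; next }   # record passed all filters
    { filtered++ }                  # any other FILTER value
    END { printf "PASS=%d FILTERED=%d\n", pass, filtered }
  '
}

# Usage with a gzip-compressed VCF (file name is an assumption):
#   gzip -dc "$OUTPUT/$PREFIX.hard-filtered.vcf.gz" | vcf_filter_summary

# Small self-contained demonstration with three made-up records.
printf '%s\n' \
  '##fileformat=VCFv4.2' \
  "$(printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO')" \
  "$(printf 'chr1\t100\t.\tA\tT\t.\tPASS\t.')" \
  "$(printf 'chr1\t200\t.\tG\tC\t.\tweak_evidence\t.')" \
  "$(printf 'chr2\t300\t.\tC\tT\t.\tPASS\t.')" |
  vcf_filter_summary
```

Counts alone do not show which calls are true positives, so a comparison against a sample with known variants remains the better guide when one is available.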

### CNV

| Option                                | Description                                                                                                                                               |
| ------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `--cnv-enable-gcbias-correction true` | Enable or disable GC bias correction when generating target counts.                                                                                       |
| `--cnv-segmentation-mode $SEG_MODE`   | Option to override the default segmentation algorithm. Defaults include `slm` for germline WGS, `aslm` for somatic WGS, and `hslm` for targeted analysis. |
| `--cnv-segmentation-bed $PATH`        | Specify a segmentation bed file to add pre-defined segments to be called.                                                                                 |

### Annotation

For instructions on how to download the Nirvana annotation database, see [Nirvana](https://help.dragen.illumina.com/product-guides/dragen-v4.5/nirvana).

### SV

| Option                                             | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
| -------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `--sv-call-regions-bed`                            | Specifies a BED file containing the set of regions to call. Optionally, you can compress the file in gzip or bgzip format. |
| `--sv-exclusion-bed`                               | Specifies a BED file containing the set of regions to exclude for the SV calling. Optionally, you can compress the file in gzip or bgzip format.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| `--enable-variant-deduplication true`              | Relevant when both the SV and SNV callers are enabled in somatic workflows. Can increase sensitivity and prevent the same variant from being reported twice within genes such as FLT3 and KMT2A. Small indels in the SV VCF that also appear as passing variants in the small variant VCF are filtered. DRAGEN creates a new VCF that contains the SV VCF variants that do not match a variant in the SNV VCF. The deduplicated SV VCF file name uses the prefix passed by `--output-file-prefix` followed by `sv.small_indel_dedup`. DRAGEN normalizes variants by trimming and left shifting by up to 500 bases. |
| `--sv-systematic-noise $BEDPE`                     | Systematic noise BEDPE file containing the set of noisy paired regions (optionally gzip or bgzip compressed). Optional for WGS Tumor-Normal, but strongly recommended for WGS Tumor-Only. Has not been validated in WES, enrichment, or amplicon panels. |
| `--sv-somatic-ins-tandup-hotspot-regions-bed $BED` | Specify a custom BED file of ITD hotspot regions to increase sensitivity for calling ITDs in somatic variant analysis. The default file includes FLT3, ARHGEF7, KMT2A, and UBTF exonic regions with 300 bp of padding on both sides. |
| `--sv-min-candidate-variant-size`                  | Run SV caller and report all SVs/indels at or above this size. The default value is set to 10.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| `--sv-min-scored-variant-size`                     | After candidate identification, only score and report SVs/indels at or above this size. The default value is set to 50. This parameter doesn't affect the somatic hotspot region.                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
| `--enable-oncovirus-detection`                     | Set to enable detection of oncoviral integration. See [Oncovirus Detection](https://help.dragen.illumina.com/product-guides/dragen-v4.5/dragen-dna-pipeline/oncovirus-detection) for more information.                                                                                                                                                                                                                                                                                                                                                                                                                            |
| `--oncovirus-detection-db`                         | Specifies the directory containing oncovirus detection resource files. See [Oncovirus Detection](https://help.dragen.illumina.com/product-guides/dragen-v4.5/dragen-dna-pipeline/oncovirus-detection) for more information.                                                                                                                                                                                                                                                                                                                                                                                                       |

| Option                              | Recommended Value for Liquid Tumors (e.g. AML/MLL) |
| ----------------------------------- | -------------------------------------------------- |
| `--sv-min-scored-variant-size $INT` | 100000                                             |

For more information, see [Structural Variant Calling](https://help.dragen.illumina.com/product-guides/dragen-v4.5/dragen-dna-pipeline/sv-calling).

## Resource Files

DRAGEN requires resource files for components such as SNV, SV, and CNV. The following notes provide references for downloading these files or generating them for custom workflows or assays.

### SNV Systematic Noise

Systematic noise files are considered essential in Tumor-Only workflows. They are also recommended for Tumor-Normal workflows.

DRAGEN has prebuilt systematic noise files for WGS, WES, and Pillar Amplicons. These files should also be used in 5-Base workflows. The 5-Base workflows have not been tested with custom noise files.

#### Prebuilt

Prebuilt systematic noise BED files (WES and WGS) can be downloaded here: [Product Files](https://support.illumina.com/sequencing/sequencing_software/dragen-bio-it-platform/product_files.html).

| Prebuilt WES/WGS noise files                       | Description              |
| -------------------------------------------------- | ------------------------ |
| `WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz`      | For WGS FF               |
| `FFPE_WGS_hg38_v2.0.0_systematic_noise.snv.bed.gz` | For WGS FFPE (only hg38) |
| `WES_hg38_v2.0.0_systematic_noise.snv.bed.gz`      | For WES FF and FFPE      |
