# DNA Germline WGS UMI

This DRAGEN recipe lists the recommended pipeline-specific commands. A DRAGEN recipe is a predefined set of analysis parameters and workflow settings tailored for a specific type of genomic analysis. Some default parameters are included for clarity and are marked with comments.

```
  
/opt/dragen/$VERSION/bin/dragen         #DRAGEN install path 
--ref-dir $REF_DIR                      #path to DRAGEN graph hashtable 
--output-directory $OUTPUT 
--intermediate-results-dir $PATH        #e.g. SSD /staging 
--output-file-prefix $PREFIX 
# Inputs 
--fastq-list $PATH                      #see 'Input Options' for FQ, BAM or CRAM 
--fastq-list-sample-id $STRING 
# Mapper 
--enable-map-align true                 #optional with BAM/CRAM input 
--enable-map-align-output true          #optionally save the output BAM 
--enable-sort true                      #default=true 
# UMI 
--umi-enable true 
--umi-source $STRING                    #Default='qname' 
--umi-library-type $STRING              #e.g. random-duplex 
--umi-metrics-interval-file $BED 
--remove-duplicates false 
--umi-min-supporting-reads 1            #Default=2 
# Small variant caller 
--enable-variant-caller true 
# Annotation 
--variant-annotation-data $PATH 
--variant-annotation-assembly $STRING   #GRCh37 or GRCh38 
--enable-variant-annotation true 
# SV 
--enable-sv true 
# CNV 
--enable-cnv true 
--cnv-enable-self-normalization true 
# HLA genotyper 
--enable-hla true 
--hla-enable-class-2 true               #optional if assay covers class II HLA regions 
# Targeted caller 
--enable-targeted true                  #Targeted 
# Star allele 
--enable-star-allele true 
# PGX 
--enable-pgx true                       #PGX 
# Short tandem repeats 
--repeat-genotype-enable true 
```
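
The recipe above is a template: the comments between options prevent it from being pasted into a shell as a single command. The following is a minimal sketch of one way to assemble a subset of it in POSIX shell. The function name `dragen_germline_umi_cmd`, its positional arguments, and the choice of `random-duplex` are illustrative assumptions, and the function only prints the command line instead of running DRAGEN.

```shell
#!/bin/sh
# Hypothetical helper (not part of DRAGEN): prints a command line built from
# a subset of the recipe options. All argument values are placeholders.
dragen_germline_umi_cmd() {
  ref_dir=$1 out_dir=$2 prefix=$3 fastq_list=$4 sample_id=$5
  # Start the argument list with the DRAGEN binary; VERSION is set by the caller.
  set -- "/opt/dragen/${VERSION}/bin/dragen" \
    --ref-dir "$ref_dir" \
    --output-directory "$out_dir" \
    --output-file-prefix "$prefix"
  # Inputs
  set -- "$@" --fastq-list "$fastq_list" --fastq-list-sample-id "$sample_id"
  # Mapper
  set -- "$@" --enable-map-align true --enable-map-align-output true
  # UMI (the library type is an example value; use the one matching the assay)
  set -- "$@" --umi-enable true --umi-library-type random-duplex \
    --umi-min-supporting-reads 1
  # Small variant caller
  set -- "$@" --enable-variant-caller true
  # Dry run: print the command instead of executing it.
  printf '%s\n' "$*"
}
```

Replacing the final `printf` with `"$@"` would execute the command, and the remaining recipe options can be appended with further `set -- "$@" ...` lines.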

## Notes and additional options

### Hashtable

For DRAGEN germline runs, it is recommended to use the graph hashtable.

See: [Product Files](https://support.illumina.com/sequencing/sequencing_software/dragen-bio-it-platform/product_files.html)

### Input options

DRAGEN accepts the following input sources: FASTQ list, FASTQ, BAM, or CRAM.

FQ list Input

```
--fastq-list $PATH 
--fastq-list-sample-id $STRING 
```

FQ Input

```
--fastq-file1 $PATH 
--fastq-file2 $PATH 
--RGSM $STRING 
--RGID $STRING 
```

BAM Input

```
--bam-input $PATH 
```

CRAM Input

```
--cram-input $PATH 
```
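
As an illustration of how the four input variants relate, the sketch below maps an input file to the matching options. The helper name `dragen_input_args`, the detection by file extension, and the reuse of the sample ID as `--RGID` are assumptions made for this example, not DRAGEN behavior.

```shell
#!/bin/sh
# Hypothetical helper: map an input file to the matching DRAGEN input options.
# Detecting the input type from the file extension is an assumption of this sketch.
dragen_input_args() {
  input=$1 sample_id=$2 fastq2=${3:-}
  case "$input" in
    *.csv)
      # FASTQ list
      printf '%s\n' "--fastq-list $input --fastq-list-sample-id $sample_id" ;;
    *.fastq.gz|*.fq.gz)
      # FASTQ pair; reusing the sample ID as RGID is an assumption
      printf '%s\n' "--fastq-file1 $input --fastq-file2 $fastq2 --RGSM $sample_id --RGID $sample_id" ;;
    *.bam)
      printf '%s\n' "--bam-input $input" ;;
    *.cram)
      printf '%s\n' "--cram-input $input" ;;
    *)
      echo "unsupported input type: $input" >&2
      return 1 ;;
  esac
}
```

For example, `dragen_input_args sample.cram S1` prints `--cram-input sample.cram`.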

### Mapping and Aligning

| Option                           | Description                                                                                          |
| -------------------------------- | ---------------------------------------------------------------------------------------------------- |
| `--enable-map-align true`        | Optionally disable map & align (default=true).                                                       |
| `--enable-map-align-output true` | Optionally save the output BAM (default=false).                                                      |
| `--Aligner.clip-pe-overhang 2`   | Clean up any unwanted UMI indexes. Only use when reads contain UMIs, but UMI collapsing was not run. |

### UMI

| Option                             | Description                                                                                                                                                                                                                                                                                                                     |
| ---------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `--umi-source STRING`              | Specify the input type for the UMI sequence. Options: `qname`, `fastq`, `bamtag`.                                                                                                                                                                                                                                               |
| `--umi-library-type STRING`        | Set the batch option for the UMI correction mode. Options: `random-duplex`, `random-simplex`, `nonrandom-duplex`. |
| `--umi-nonrandom-whitelist $PATH`  | If UMI is nonrandom, either a whitelist or correction table is required. The whitelist includes a valid UMI sequence per line.                                                                                                                                                                                                  |
| `--umi-correction-table $PATH`     | If UMI is nonrandom, either a whitelist or correction table is required. The correction table defaults to the table used by TruSight Oncology: \<INSTALL\_PATH>/resources/umi/umi\_correction\_table.txt.gz.                                                                                                                    |
| `--umi-min-supporting-reads INT`   | Specify the number of matching UMI input reads required to generate a consensus read. Any family with insufficient supporting reads is discarded. The default is 2, but most pipelines perform better with a value of 1. A value of 2 may be appropriate for samples with ultra-deep coverage (e.g. ctDNA). |
| `--umi-metrics-interval-file $BED` | Target region in BED format.                                                                                                                                                                                                                                                                                                    |
| `--umi-emit-multiplicity both`     | Set the consensus sequence type to output. DRAGEN UMI allows collapsing duplex sequences from the two strands of the original molecules. For more information, see [Merge Duplex UMIs](https://help.dragen.illumina.com/dragen-v4.3/product-guide/dragen-dna-pipeline/unique-molecular-identifiers#merge-duplex-umis).          |
| `--umi-start-mask-length INT`      | Number of additional bases to ignore from the start of the read. The default is 0. To reduce false positives, optionally set to 1. |
| `--umi-end-mask-length INT`        | Number of additional bases to ignore from the end of the read. The default is 0. To reduce false positives, optionally set to 3. |

For more information see: [UMI Options](https://help.dragen.illumina.com/dragen-v4.3/product-guide/dragen-dna-pipeline/unique-molecular-identifiers#umi-options).
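
The dependency between the library type and the whitelist or correction table can be sketched as a small check. The helper name `umi_args` and its argument order are assumptions for illustration. The sketch asks for an explicit file for nonrandom UMIs and does not check whether DRAGEN would fall back to the default correction table when neither is given.

```shell
#!/bin/sh
# Hypothetical helper: build the UMI options for a given library type.
# Usage: umi_args LIBRARY_TYPE [WHITELIST] [CORRECTION_TABLE]
umi_args() {
  library_type=$1 whitelist=${2:-} correction_table=${3:-}
  args="--umi-enable true --umi-library-type $library_type"
  case "$library_type" in
    random-duplex|random-simplex)
      # Random UMIs need no whitelist or correction table.
      ;;
    nonrandom-duplex)
      # This sketch requires one of the two files to be given explicitly.
      if [ -n "$whitelist" ]; then
        args="$args --umi-nonrandom-whitelist $whitelist"
      elif [ -n "$correction_table" ]; then
        args="$args --umi-correction-table $correction_table"
      else
        echo "nonrandom UMIs: pass a whitelist or a correction table" >&2
        return 1
      fi ;;
    *)
      echo "unknown UMI library type: $library_type" >&2
      return 1 ;;
  esac
  printf '%s\n' "$args"
}
```

For example, `umi_args nonrandom-duplex whitelist.txt` prints the UMI options including `--umi-nonrandom-whitelist whitelist.txt`, while `umi_args nonrandom-duplex` returns a non-zero status.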

### SNV

DRAGEN SNV VC employs machine-learning-based variant recalibration (DRAGEN-ML). It processes read and other contextual evidence to remove false positives, recover false negatives, and reduce zygosity errors. No additional setup is required: DRAGEN-ML is enabled by default when running the germline SNV VC on hg19 or hg38.

Note that we do not recommend changing the default QUAL thresholds of 3 for DRAGEN-ML and 10 for DRAGEN without ML. These values differ from each other because DRAGEN-ML improves the calibration of QUAL scores, leading to a change in the scoring range.

| Option                                      | Description                                                                                                            |
| ------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------- |
| `--vc-target-bed`                           | Limit variant calling to region of interest.                                                                           |
| `--vc-combine-phased-variants-distance INT` | Maximum distance over which phased variants will be combined. Set to 0 to disable. Valid range is \[0; 15] (Default=2) |
| `--vc-emit-ref-confidence GVCF`             | To enable gVCF output.                                                                                                 |
| `--vc-enable-vcf-output`                    | To enable VCF file output during a gVCF run, set to true. The default value is false.                                  |

For more details on the small variant caller in somatic mode, refer to [Somatic Mode](https://help.dragen.illumina.com/dragen-v4.3/product-guide/dragen-v4.3/dragen-dna-pipeline/small-variant-calling/somatic-mode).

### Annotation

For instructions on how to download the Nirvana annotation database, refer to [Nirvana](https://help.dragen.illumina.com/dragen-v4.3/product-guide/dragen-v4.3/nirvana).

### HLA

| Option                            | Description                                                                                                                     |
| --------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- |
| `--enable-hla`                    | Enable HLA typer (this setting by default will only genotype class 1 genes)                                                     |
| `--hla-as-filter-min-threshold`   | Internal option to set minimum alignment score threshold. The default is 59 and works for WES and WGS. Set to 29 for panels.    |
| `--hla-as-filter-ratio-threshold` | Minimum alignment score of a read mate to be considered. The default is 0.67 and works for WES and WGS. Set to 0.85 for panels. |
| `--hla-enable-class-2`            | Extend genotyping to HLA class 2 genes (default=true).                                                                          |
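
The threshold guidance in the table can be summarized as a small selection step. The helper name `hla_args` and the assay labels `wgs`, `wes`, and `panel` are assumptions for illustration; the threshold values are those listed in the table.

```shell
#!/bin/sh
# Hypothetical helper: choose HLA options by assay type.
# WES/WGS rely on the default thresholds (59 and 0.67); panels override them.
hla_args() {
  assay=$1
  case "$assay" in
    wgs|wes)
      printf '%s\n' "--enable-hla true" ;;
    panel)
      printf '%s\n' "--enable-hla true --hla-as-filter-min-threshold 29 --hla-as-filter-ratio-threshold 0.85" ;;
    *)
      echo "unknown assay type: $assay" >&2
      return 1 ;;
  esac
}
```

For the WGS recipe on this page, `hla_args wgs` prints only `--enable-hla true`, since the default thresholds apply.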

### CNV

| Option                                | Description                                                                                                                                               |
| ------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `--cnv-enable-gcbias-correction true` | Enable or disable GC bias correction when generating target counts.                                                                                       |
| `--cnv-segmentation-mode $SEG_MODE`   | Option to override the default segmentation algorithm. Defaults include `slm` for germline WGS, `aslm` for somatic WGS, and `hslm` for targeted analysis. |

For more information, see [CNV Calling](https://help.dragen.illumina.com/dragen-v4.3/product-guide/dragen-v4.3/dragen-dna-pipeline/cnv-calling).
