# DRAGEN Microbial Enrichment Plus

## Description

DRAGEN Microbial Enrichment Plus (DME+), formerly known as the Explify Analysis Pipeline, offers a dedicated informatics solution with flexible analysis options for the following Illumina Infectious Disease and Microbiology target-capture enrichment panel kits: the Illumina Respiratory Pathogen ID/AMR Enrichment Panel Kit (RPIP), Illumina Urinary Pathogen ID/AMR Enrichment Panel Kit (UPIP), and Illumina Viral Surveillance Panel V2 Kit (VSP V2). The application delivers easy-to-use, powerful secondary analysis of Illumina sequencing data, with workflows for sample QC, viral WGS (whole-genome sequencing), pathogen detection and quantification, and antimicrobial resistance (AMR) marker profiling. It also supports custom reference sequence analysis.

* RPIP: Target-capture enrichment of >280 RNA and DNA respiratory pathogens, including SARS-CoV-2, Influenza viruses, Respiratory syncytial virus, Mycobacterium and Legionella species, and >4000 AMR markers.
* UPIP: Target-capture enrichment of >170 genitourinary pathogens, including fastidious, slow-growing, and anaerobic uropathogens, sexually transmitted microorganisms, and >4000 bacterial AMR markers.
* VSP V2: Target-capture enrichment for whole-genome sequencing (WGS) of 200 RNA and DNA viruses prioritized as high-risk to public health, zoonotic surveillance, and biotech, and >200 viral AMR markers.
* Custom: Analyze FASTQ/FASTA read files with a custom reference sequence database.

Note that samples enriched using the Illumina Respiratory Virus Oligo Panel/Respiratory Virus Enrichment Kit (RVOP/RVEK) and Viral Surveillance Panel Kit (VSP) can also be analyzed using DME+ and the VSP V2 database.

## Pipeline Steps

The following table describes the different steps performed by the pipeline, which steps apply to each panel, and whether the step is run when using a set of custom references.

|              Step             |                                                                               Description                                                                              |    Panels    | Custom References |
| :---------------------------: | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :----------: | :---------------: |
|            Read QC            | Can be disabled. Low-quality bases are trimmed. Short and low-quality reads are discarded. It is assumed that appropriate adapter trimming has already been performed. |      All     |        Yes        |
|    Post-QC FASTQ Generation   |                            Optionally creates a single FASTQ of the trimmed reads or a set of kingdom-specific FASTQs. Disabled by default.                            |      All     |        Yes        |
|           Dehosting           |                                                                          Removes human reads.                                                                          |      All     |        Yes        |
|           Sample QC           |                                   Sample composition analysis and enrichment factor calculation (which requires an internal control).                                  |      All     |         No        |
|  Microorganism Classification |                                                           K-mer-based analysis with configurable sensitivity.                                                          |    VSP V2    |         No        |
|    Microorganism Detection    |                                                           Alignment-based analysis and consensus generation.                                                           |      All     |        Yes        |
|  Microorganism Quantification |                                                                      Requires an internal control.                                                                     |      All     |         No        |
| Bacterial AMR Marker Analysis |                                         Nucleotide and protein alignment, consensus generation, variant calling and annotation.                                        |  RPIP, UPIP  |         No        |
|     Viral Variant Calling     |                                                                Detects variants from alignment results.                                                                | RPIP, VSP V2 |         No        |
|   Viral AMR Marker Analysis   |                                                                     Variant calling and annotation.                                                                    | RPIP, VSP V2 |         No        |
|       Report Generation       |                                                                          Creates the AP JSON.                                                                          |      All     |        Yes        |

## Command Line Settings

| Option                                      | Description                                                                                                                         |
| ------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------- |
| Required Inputs                             |                                                                                                                                     |
| `--enable-explify`                          | Enables the DME+ pipeline. (Default=false).                                                                                         |
| `--output-file-prefix`                      | Prefix for all output files.                                                                                                        |
| `--output-directory`                        | Directory for all output files.                                                                                                     |
| `--explify-sample-list`                     | Input sample list .tsv file with sample IDs, FASTQs, etc.                                                                           |
| `--explify-test-panel-name`                 | "RPIP", "UPIP", "VSP V2", "Custom".                                                                                                 |
| `--explify-test-panel-version`              | Set to test panel version (e.g. "1.0.0").                                                                                           |
| `--explify-ref-db-dir`                      | Path to root directory for database files.                                                                                          |
| Optional Inputs                             |                                                                                                                                     |
| `--intermediate-results-dir`                | Directory for temporary files. Free space must exceed three times the combined size of all FASTQ files.                             |
| `--explify-load-db-ram`                     | Option to load database into RAM if not on ramdisk. (Default=false).                                                                |
| `--explify-no-read-qc`                      | Option to turn off read QC on FASTQs before analysis. (Default=false).                                                              |
| `--explify-internal-control`                | Option to set internal control from an accepted list. (Default="Enterobacteria phage T7").                                          |
| `--explify-internal-control-concentration`  | Option to set internal control concentration. (Default=12100000).                                                                   |
| `--explify-ncpus`                           | Option to set the number of CPUs available for processing.                                                                          |
| `--explify-sensitivity-threshold`           | Option to set sensitivity threshold for considering a virus present. Range: 0 < Integer < 1000. Only valid for VSP V2. (Default=5). |
| `--explify-custom-ref-fasta`                | Reference FASTA file. Required for Custom reference DBs.                                                                            |
| `--explify-custom-ref-bed`                  | Reference BED file. Optional for Custom reference DBs.                                                                              |
| `--explify-viral-consensus-depth-threshold` | Minimum depth at position to include base in viral consensus sequence. Only relevant for RPIP and VSP V2. (Default=1).              |
| `--explify-viral-vc-depth-threshold`        | Minimum total depth at position to report viral variant. Only relevant for RPIP and VSP V2. (Default=5).                            |
| `--explify-viral-vc-af-threshold`           | Minimum allele frequency to report viral variant. Only relevant for RPIP and VSP V2. (Default=0.2).                                 |
| `--explify-post-qc-fastq-mode`              | Create a single post-QC FASTQ file or files split by kingdom. Choices='off', 'single', 'split'. (Default=off).                      |
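
Some of the options above accept only a restricted set of values. The following is a minimal sketch of checking two of them in a wrapper script before calling `dragen`. The function names `is_valid_panel` and `is_valid_sensitivity` are hypothetical and not part of DRAGEN, and the sketch assumes that panel names must match the listed spellings exactly.

```shell
# Hypothetical helper (not part of DRAGEN): accept only the panel names
# listed for --explify-test-panel-name. Exact-match is an assumption.
is_valid_panel() {
  case "$1" in
    "RPIP"|"UPIP"|"VSP V2"|"Custom") return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper (not part of DRAGEN): accept only integers with
# 0 < value < 1000, the range given for --explify-sensitivity-threshold.
is_valid_sensitivity() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # reject empty input or any non-digit
  esac
  [ "$1" -gt 0 ] && [ "$1" -lt 1000 ]
}
```

A wrapper could call `is_valid_panel "VSP V2" && is_valid_sensitivity 5` and stop with an error message if either check fails.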

### Example Command Line

```shell
dragen \
  --enable-explify=true \
  --output-file-prefix <PREFIX> \
  --explify-sample-list /path/to/sample/list/tsv \
  --explify-test-panel-name <"RPIP"/"UPIP"/"VSP V2"/"Custom"> \
  --explify-test-panel-version <VERSION> \
  --explify-ref-db-dir /path/to/root/db/dir \
  --explify-load-db-ram=true \
  --output-directory <OUTPUT_DIR> \
  --intermediate-results-dir <OUTPUT_DIR> \
  --explify-ncpus=20
```
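
Because `--intermediate-results-dir` needs more free space than three times the size of all FASTQ files, a pre-flight check can be useful before launching a run. The sketch below is one possible approach, not a DRAGEN feature: the function names are hypothetical, and it assumes that the on-disk size of each FASTQ file (as stored, compressed or not) is the relevant measure.

```shell
# Hypothetical helper: print the combined size, in bytes, of the given files.
fastq_total_bytes() {
  sum_bytes=0
  for fastq in "$@"; do
    one_size=$(wc -c < "$fastq") || return 1
    sum_bytes=$((sum_bytes + one_size))
  done
  echo "$sum_bytes"
}

# Hypothetical helper: succeed only if the directory given as the first
# argument has more free space than 3x the combined size of the files
# given as the remaining arguments.
has_enough_scratch() {
  scratch_dir=$1
  shift
  fastq_bytes=$(fastq_total_bytes "$@") || return 1
  needed_bytes=$((fastq_bytes * 3))
  # df -Pk reports available space in 1024-byte blocks in column 4.
  avail_kb=$(df -Pk "$scratch_dir" | awk 'NR == 2 { print $4 }')
  [ $((avail_kb * 1024)) -gt "$needed_bytes" ]
}
```

For example, `has_enough_scratch /path/to/scratch /path/to/fastq1.gz /path/to/fastq2.gz` returns a non-zero status when the directory is too small, so a script can exit before starting the analysis.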

## Input Details

### Sample Input List

Applies to: `--explify-sample-list`

The sample input list is a column-formatted file with *tab* separations between the columns (i.e., a `.tsv` file).

```
SampleID     BatchID     RunID     ControlFlag     FastQs
MySample     MyBatch     MyRun     POS             /path/to/fastq1.gz     /path/to/fastq2.gz
```

Notes:

* The **SampleID** values *must* be unique.
* **BatchID** and **RunID** help users track and manage sample analyses. **BatchID** is often used to track libraries that were prepared together, and **RunID** to track sequencing runs. Both can be left blank.
* The **ControlFlag** value can be *POS*, *NEG*, *BLANK*, or left empty.
  * *POS* is used to indicate a positive control sample.
  * *NEG* is used to indicate a negative control sample.
  * *BLANK* is used to indicate a blank control sample (e.g. buffer only).
* If a sample has multiple FASTQ files, separate them with tabs.
* Take care when editing `.tsv` files. Some editors replace tabs with spaces without alerting the user.
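
One way to avoid tab-versus-space problems is to generate the sample list with `printf` and check it before running the pipeline. The sketch below is illustrative only: the function names, sample names, and paths are placeholders, and it assumes that the file includes the header row shown in the example above.

```shell
# Write a small sample list; '\t' in printf guarantees real tab characters.
# Sample names and FASTQ paths are placeholders.
write_sample_list() {
  printf 'SampleID\tBatchID\tRunID\tControlFlag\tFastQs\n'
  printf 'MySample\tMyBatch\tMyRun\tPOS\t/path/to/fastq1.gz\t/path/to/fastq2.gz\n'
  printf 'MyBlank\tMyBatch\tMyRun\tBLANK\t/path/to/blank1.gz\n'
}

# Hypothetical check: each data row needs at least five tab-separated
# fields, a unique SampleID, and a ControlFlag of POS, NEG, BLANK, or empty.
check_sample_list() {
  awk -F '\t' '
    NR == 1 { next }   # skip the header row (assumed to be present)
    NF < 5 { print "line " NR ": too few tab-separated columns"; bad = 1 }
    seen[$1]++ { print "line " NR ": duplicate SampleID " $1; bad = 1 }
    $4 != "" && $4 != "POS" && $4 != "NEG" && $4 != "BLANK" {
      print "line " NR ": unexpected ControlFlag " $4; bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}
```

Running `write_sample_list > samples.tsv` followed by `check_sample_list samples.tsv` exits with status 0 for a well-formed file. A file whose tabs were silently converted to spaces fails the column-count check.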

### Internal Control

Applies to: `--explify-internal-control`, `--explify-internal-control-concentration`

The user may specify one of the internal controls listed below. If `NONE` is specified, the internal control concentration is ignored. The names are case-sensitive and must be entered exactly as shown:

* `Allobacillus halotolerans`
* `Armored RNA Quant Internal Process Control`
* `Enterobacteria phage T7` (This is the default)
* `Escherichia virus MS2`
* `Escherichia virus Qbeta`
* `Escherichia virus T4`
* `Imtechella halotolerans`
* `Phocid alphaherpesvirus 1`
* `Phocine morbillivirus`
* `Truepera radiovictrix`
* `NONE`

The internal control concentration is an integer representing the number of *copies/mL of sample* for the internal control.
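
Since the internal control name must match the list exactly, a wrapper script can reject misspelled or wrongly cased names before the run starts. The following is a minimal sketch with a hypothetical function name. It only mirrors the list above and is not part of DRAGEN.

```shell
# Hypothetical helper: exact, case-sensitive match against the accepted
# values for --explify-internal-control listed in this section.
is_valid_internal_control() {
  case "$1" in
    "Allobacillus halotolerans" | \
    "Armored RNA Quant Internal Process Control" | \
    "Enterobacteria phage T7" | \
    "Escherichia virus MS2" | \
    "Escherichia virus Qbeta" | \
    "Escherichia virus T4" | \
    "Imtechella halotolerans" | \
    "Phocid alphaherpesvirus 1" | \
    "Phocine morbillivirus" | \
    "Truepera radiovictrix" | \
    "NONE") return 0 ;;
    *) return 1 ;;
  esac
}
```

Remember to quote the value when passing it on, for example `--explify-internal-control "Escherichia virus MS2"`, because most of the names contain spaces.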

