DRAGEN Microbial Enrichment Plus

Description

DRAGEN Microbial Enrichment Plus (DME+), formerly known as the Explify Analysis Pipeline, offers a dedicated informatics solution with flexible analysis options for the following Illumina Infectious Disease and Microbiology target-capture enrichment panel kits: the Illumina Respiratory Pathogen ID/AMR Enrichment Panel Kit (RPIP), Illumina Urinary Pathogen ID/AMR Enrichment Panel Kit (UPIP), and Illumina Viral Surveillance Panel V2 Kit (VSP V2). The application delivers easy-to-use, powerful secondary analysis of Illumina sequencing data, with workflows for sample QC, viral WGS (whole-genome sequencing), pathogen detection and quantification, and antimicrobial resistance (AMR) marker profiling. It also supports custom reference sequence analysis.

  • RPIP: Target-capture enrichment of >280 RNA and DNA respiratory pathogens, including SARS-CoV-2, Influenza viruses, Respiratory syncytial virus, Mycobacterium and Legionella species, and >4000 AMR markers.

  • UPIP: Target-capture enrichment of >170 genitourinary pathogens, including fastidious, slow-growing, and anaerobic uropathogens, sexually transmitted microorganisms, and >4000 bacterial AMR markers.

  • VSP V2: Target-capture enrichment for whole-genome sequencing (WGS) of 200 RNA and DNA viruses prioritized as high-risk to public health or relevant to zoonotic surveillance and biotech, and >200 viral AMR markers.

  • Custom: Analyze FASTQ/FASTA read files with a custom reference sequence database.

Note that samples enriched using the Illumina Respiratory Virus Oligo Panel/Respiratory Virus Enrichment Kit (RVOP/RVEK) and Viral Surveillance Panel Kit (VSP) can also be analyzed using DME+ and the VSP V2 database.

Pipeline Steps

The following table lists the steps the pipeline performs, the panels each step applies to, and whether each step runs when a set of custom references is used.

Step | Description | Panels | Custom References
--- | --- | --- | ---
Read QC | Trims low-quality bases and discards short and low-quality reads. Assumes adapter trimming has already been performed. Can be disabled (--explify-no-read-qc). | All | Yes
Post-QC FASTQ Generation | Optionally writes the trimmed reads to a single FASTQ or to a set of kingdom-specific FASTQs (--explify-post-qc-fastq-mode). Disabled by default. | All | Yes
Dehosting | Removes human reads. | All | Yes
Sample QC | Analyzes sample composition and calculates the enrichment factor. The enrichment factor calculation requires an internal control. | All | No
Microorganism Classification | K-mer-based analysis with configurable sensitivity (--explify-sensitivity-threshold). | VSP V2 | No
Microorganism Detection | Alignment-based analysis and consensus generation. | All | Yes
Microorganism Quantification | Requires an internal control. | All | No
Bacterial AMR Marker Analysis | Nucleotide and protein alignment, consensus generation, and variant calling and annotation. | RPIP, UPIP | No
Viral Variant Calling | Detects variants from alignment results. | RPIP, VSP V2 | No
Viral AMR Marker Analysis | Variant calling and annotation. | RPIP, VSP V2 | No
Report Generation | Creates the AP JSON. | All | Yes
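The Read QC step is described here only at a high level. The Python sketch below illustrates the general technique: 3' quality trimming followed by length and mean-quality filtering. It assumes Phred+33 quality encoding. The thresholds (MIN_BASE_QUALITY, MIN_READ_LENGTH, MIN_MEAN_QUALITY) and the function names are hypothetical. DME+'s actual trimming algorithm and cutoffs are not documented in this section.

```python
from __future__ import annotations

# Hypothetical values for illustration only; not DME+'s documented settings.
PHRED_OFFSET = 33        # assumes Phred+33 (Illumina 1.8+) quality encoding
MIN_BASE_QUALITY = 20    # trailing bases below this quality are trimmed
MIN_READ_LENGTH = 50     # reads shorter than this after trimming are discarded
MIN_MEAN_QUALITY = 25    # reads with a lower mean quality are discarded


def phred_scores(qual: str) -> list[int]:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(c) - PHRED_OFFSET for c in qual]


def trim_3prime(seq: str, qual: str) -> tuple[str, str]:
    """Remove trailing bases whose quality is below MIN_BASE_QUALITY."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < MIN_BASE_QUALITY:
        end -= 1
    return seq[:end], qual[:end]


def read_qc(seq: str, qual: str) -> tuple[str, str] | None:
    """Return the trimmed (seq, qual) pair, or None if the read is discarded."""
    seq, qual = trim_3prime(seq, qual)
    if len(seq) < MIN_READ_LENGTH:
        return None  # too short after trimming
    scores = phred_scores(qual)
    if sum(scores) / len(scores) < MIN_MEAN_QUALITY:
        return None  # low overall quality
    return seq, qual
```

Trimming only from the 3' end reflects the typical quality decay along Illumina reads. Sliding-window trimming is a common alternative.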

Command Line Settings

Required Inputs

Option | Description
--- | ---
--enable-explify | Enables the DME+ pipeline. (Default=false)
--output-file-prefix | Prefix for all output files.
--output-directory | Directory for all output files.
--explify-sample-list | Input sample list (.tsv) with sample IDs, FASTQ files, and related fields.
--explify-test-panel-name | Test panel name: "RPIP", "UPIP", "VSP V2", or "Custom".
--explify-test-panel-version | Test panel version (e.g., "1.0.0").
--explify-ref-db-dir | Path to the root directory of the database files.

Optional Inputs

Option | Description
--- | ---
--intermediate-results-dir | Directory for temporary files. Available space must exceed three times the total size of all FASTQ files.
--explify-load-db-ram | Loads the database into RAM when it is not on a ramdisk. (Default=false)
--explify-no-read-qc | Turns off read QC on the FASTQs before analysis. (Default=false)
--explify-internal-control | Internal control, chosen from the accepted list (see Internal Control). (Default="Enterobacteria phage T7")
--explify-internal-control-concentration | Internal control concentration, in copies/mL of sample. (Default=12100000)
--explify-ncpus | Number of CPUs available for processing.
--explify-sensitivity-threshold | Sensitivity threshold for considering a virus present. Must be an integer greater than 0 and less than 1000. Valid only for VSP V2. (Default=5)
--explify-custom-ref-fasta | Reference FASTA file. Required for Custom reference databases.
--explify-custom-ref-bed | Reference BED file. Optional for Custom reference databases.
--explify-viral-consensus-depth-threshold | Minimum depth at a position for the base to be included in the viral consensus sequence. Relevant only for RPIP and VSP V2. (Default=1)
--explify-viral-vc-depth-threshold | Minimum total depth at a position to report a viral variant. Relevant only for RPIP and VSP V2. (Default=5)
--explify-viral-vc-af-threshold | Minimum allele frequency to report a viral variant. Relevant only for RPIP and VSP V2. (Default=0.2)
--explify-post-qc-fastq-mode | Writes post-QC reads to a single FASTQ file ('single') or to files split by kingdom ('split'), or writes none ('off'). Choices: 'off', 'single', 'split'. (Default=off)
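The three viral threshold options work together at each alignment position. The sketch below applies their documented defaults to a single position to show how they interact. The PileupColumn type is hypothetical. Two rules are also assumptions made for illustration, not a description of DME+ internals: the majority-base consensus rule and masking low-depth positions with "N".

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Documented defaults for the viral options (relevant to RPIP and VSP V2).
CONSENSUS_DEPTH_THRESHOLD = 1   # --explify-viral-consensus-depth-threshold
VC_DEPTH_THRESHOLD = 5          # --explify-viral-vc-depth-threshold
VC_AF_THRESHOLD = 0.2           # --explify-viral-vc-af-threshold


@dataclass
class PileupColumn:
    """Hypothetical per-position summary: total depth and observed base counts."""
    depth: int
    base_counts: dict[str, int] = field(default_factory=dict)


def consensus_base(col: PileupColumn) -> str:
    """Majority base if depth meets the consensus threshold; else 'N' (assumed masking)."""
    if col.depth < CONSENSUS_DEPTH_THRESHOLD or not col.base_counts:
        return "N"
    return max(col.base_counts, key=col.base_counts.__getitem__)


def reportable_variants(col: PileupColumn, ref_base: str) -> list[tuple[str, float]]:
    """Non-reference alleles that pass both the total-depth and allele-frequency thresholds."""
    if col.depth < VC_DEPTH_THRESHOLD:
        return []
    return [
        (base, count / col.depth)
        for base, count in sorted(col.base_counts.items())
        if base != ref_base and count / col.depth >= VC_AF_THRESHOLD
    ]
```

With these defaults, a position covered by 10 reads (7 A, 3 G) against reference A yields consensus A and reports G at AF 0.3. At only 4 reads, the same position still contributes a consensus base but reports no variants.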

Example Command Line
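The following Python sketch assembles an argument list for a DME+ run from the options documented above. Before launch, it checks a few of the documented rules: the panel name, the Custom FASTA requirement, the VSP V2-only sensitivity threshold and its range, and the post-QC FASTQ mode. It assumes the executable is named dragen and that --enable-explify takes an explicit true value, as other DRAGEN boolean options do. The helper build_dme_command and all paths are hypothetical.

```python
from __future__ import annotations

import shlex

# Values accepted by the documented options.
TEST_PANELS = {"RPIP", "UPIP", "VSP V2", "Custom"}
POST_QC_FASTQ_MODES = {"off", "single", "split"}


def build_dme_command(
    sample_list: str,
    panel: str,
    panel_version: str,
    ref_db_dir: str,
    output_dir: str,
    output_prefix: str,
    custom_ref_fasta: str | None = None,
    sensitivity_threshold: int | None = None,
    post_qc_fastq_mode: str = "off",
) -> list[str]:
    """Hypothetical helper: return argv for a DME+ run after basic validation."""
    if panel not in TEST_PANELS:
        raise ValueError(f"unknown test panel {panel!r}")
    if panel == "Custom" and custom_ref_fasta is None:
        raise ValueError("--explify-custom-ref-fasta is required for Custom panels")
    if post_qc_fastq_mode not in POST_QC_FASTQ_MODES:
        raise ValueError(f"invalid post-QC FASTQ mode {post_qc_fastq_mode!r}")

    argv = [
        "dragen",                      # assumed executable name
        "--enable-explify", "true",    # assumed true/false syntax for booleans
        "--explify-sample-list", sample_list,
        "--explify-test-panel-name", panel,
        "--explify-test-panel-version", panel_version,
        "--explify-ref-db-dir", ref_db_dir,
        "--output-directory", output_dir,
        "--output-file-prefix", output_prefix,
        "--explify-post-qc-fastq-mode", post_qc_fastq_mode,
    ]
    if custom_ref_fasta is not None:
        argv += ["--explify-custom-ref-fasta", custom_ref_fasta]
    if sensitivity_threshold is not None:
        # Documented as valid only for VSP V2, with 0 < threshold < 1000.
        if panel != "VSP V2":
            raise ValueError("--explify-sensitivity-threshold is only valid for VSP V2")
        if not 0 < sensitivity_threshold < 1000:
            raise ValueError("sensitivity threshold must satisfy 0 < t < 1000")
        argv += ["--explify-sensitivity-threshold", str(sensitivity_threshold)]
    return argv


if __name__ == "__main__":
    cmd = build_dme_command(
        sample_list="/data/run1/samples.tsv",   # hypothetical paths
        panel="VSP V2",
        panel_version="1.0.0",
        ref_db_dir="/data/dme_db",
        output_dir="/data/run1/out",
        output_prefix="run1",
        sensitivity_threshold=5,
    )
    # shlex.join quotes "VSP V2" so the embedded space survives in a shell.
    print(shlex.join(cmd))
```

Building the command as a list rather than a string avoids quoting mistakes. The panel name "VSP V2" contains a space and must reach DRAGEN as a single argument.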

Input Details

Sample Input List

Applies to: --explify-sample-list

The sample input list is a column-formatted file with columns separated by tab characters (a .tsv file).

Notes:

  • The SampleID values must be unique.

  • BatchID and RunID help users track and manage sample analyses. BatchID typically identifies libraries that were prepared together, and RunID identifies the sequencing run. Both can be left blank.

  • The ControlFlag value can be POS, NEG, BLANK, or left empty.

    • POS is used to indicate a positive control sample.

    • NEG is used to indicate a negative control sample.

    • BLANK is used to indicate a blank control sample (e.g. buffer only).

  • If a sample has multiple FASTQ files, separate them with tabs.

  • Take care when editing .tsv files: some editors replace tabs with spaces without warning the user.
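A stray space in place of a tab can silently break the sample list, so a pre-flight check may be worthwhile. The hypothetical validate_sample_list function below checks the rules in the notes above: unique SampleIDs, allowed ControlFlag values, and lines with no tabs at all. It assumes the file begins with a header row that names the SampleID and ControlFlag columns. It locates them by name rather than assuming a column order. Whether DME+ itself expects a header row is not stated here.

```python
from __future__ import annotations

import csv
import io

# Allowed ControlFlag values per the notes (empty means "not a control").
VALID_CONTROL_FLAGS = {"POS", "NEG", "BLANK", ""}


def validate_sample_list(text: str) -> list[str]:
    """Hypothetical pre-flight check; returns a list of human-readable problems."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE))
    if not rows:
        return ["sample list is empty"]
    header = rows[0]
    if "SampleID" not in header or "ControlFlag" not in header:
        return ["header must name SampleID and ControlFlag columns (tab-separated)"]
    sid_col = header.index("SampleID")
    flag_col = header.index("ControlFlag")

    errors: list[str] = []
    seen: set[str] = set()
    for lineno, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) == 1 and " " in row[0]:
            errors.append(f"line {lineno}: no tabs found; were tabs replaced with spaces?")
            continue
        sample_id = row[sid_col].strip() if sid_col < len(row) else ""
        flag = row[flag_col].strip() if flag_col < len(row) else ""
        if not sample_id:
            errors.append(f"line {lineno}: missing SampleID")
        elif sample_id in seen:
            errors.append(f"line {lineno}: duplicate SampleID {sample_id!r}")
        else:
            seen.add(sample_id)
        if flag not in VALID_CONTROL_FLAGS:  # assumed case-sensitive
            errors.append(f"line {lineno}: invalid ControlFlag {flag!r} (use POS, NEG, BLANK, or leave empty)")
    return errors
```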

Internal Control

Applies to: --explify-internal-control, --explify-internal-control-concentration

The user may specify one of the internal controls listed below. Names are case-sensitive and must be entered exactly as they appear. If NONE is specified, the internal control concentration is ignored.

  • Allobacillus halotolerans

  • Armored RNA Quant Internal Process Control

  • Enterobacteria phage T7 (This is the default)

  • Escherichia virus MS2

  • Escherichia virus Qbeta

  • Escherichia virus T4

  • Imtechella halotolerans

  • Phocid alphaherpesvirus 1

  • Phocine morbillivirus

  • Truepera radiovictrix

  • NONE

The internal control concentration is an integer giving the number of internal control copies per mL of sample.
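The hypothetical resolve_internal_control function below mirrors the documented rules for the two internal control options. It requires an exact, case-sensitive match against the accepted list and ignores the concentration when NONE is chosen. It applies the documented defaults (Enterobacteria phage T7 at 12100000 copies/mL, i.e. 1.21 × 10^7). Requiring a positive concentration is an added assumption.

```python
from __future__ import annotations

# Accepted values for --explify-internal-control, copied from the list above.
ACCEPTED_INTERNAL_CONTROLS = frozenset({
    "Allobacillus halotolerans",
    "Armored RNA Quant Internal Process Control",
    "Enterobacteria phage T7",
    "Escherichia virus MS2",
    "Escherichia virus Qbeta",
    "Escherichia virus T4",
    "Imtechella halotolerans",
    "Phocid alphaherpesvirus 1",
    "Phocine morbillivirus",
    "Truepera radiovictrix",
    "NONE",
})
DEFAULT_INTERNAL_CONTROL = "Enterobacteria phage T7"
DEFAULT_CONCENTRATION = 12_100_000  # copies/mL of sample


def resolve_internal_control(
    name: str = DEFAULT_INTERNAL_CONTROL,
    concentration: int | None = DEFAULT_CONCENTRATION,
) -> tuple[str, int | None]:
    """Hypothetical check: return (name, concentration), or (NONE, None)."""
    if name not in ACCEPTED_INTERNAL_CONTROLS:  # exact, case-sensitive match
        raise ValueError(f"unknown internal control {name!r}; names are case-sensitive")
    if name == "NONE":
        return name, None  # concentration is ignored when no internal control is used
    # Assumption: a usable concentration must be a positive integer (copies/mL).
    if not isinstance(concentration, int) or concentration <= 0:
        raise ValueError("internal control concentration must be a positive integer (copies/mL)")
    return name, concentration
```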
