Custom Database

Instead of using one of the pre-built reference databases, you can create and use a database with a custom set of reference sequences. The Custom database must still be downloaded, because the pipeline uses it to filter out host reads.

FASTA and BED Files

Custom reference sequences are supplied in a FASTA file via --explify-custom-ref-fasta. Metadata for those sequences can be supplied in an optional BED file via --explify-custom-ref-bed.

In the FASTA file, sequence names must be unique and should not contain spaces. If a FASTA header contains a space, only the part before the first space is used as the sequence name. It is recommended to use only the following characters in sequence names: letters, numbers, underscore (_), hyphen (-), parentheses ((, )), and period (.). Names that contain other characters may appear differently in the output.
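
The naming rules above can be checked before a run. The following is a minimal sketch, not part of the pipeline: the function names are hypothetical, and the character set is taken directly from the recommendation above.

```python
import re
from collections import Counter

# Characters recommended above: letters, numbers, _ - ( ) and period.
RECOMMENDED = re.compile(r"^[A-Za-z0-9_\-().]+$")

def fasta_names(lines):
    """Yield the sequence name of each FASTA header (text before the first space)."""
    for line in lines:
        if line.startswith(">"):
            yield line[1:].rstrip().split(" ", 1)[0]

def check_names(names):
    """Return (duplicates, discouraged) as sorted lists of sequence names."""
    names = list(names)
    counts = Counter(names)
    duplicates = sorted(n for n, c in counts.items() if c > 1)
    discouraged = sorted({n for n in names if not RECOMMENDED.match(n)})
    return duplicates, discouraged
```

To check a file, pass an open file handle: `check_names(fasta_names(open("/path/to/custom.fna")))`. An empty pair of lists means every name is unique and uses only the recommended characters.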

The BED file must be tab-delimited with at least 4 columns; a 5th column is optional:

  1. chrom: the sequence name as it appears in the FASTA

  2. chromStart: start position (always set to 0)

  3. chromEnd: end position (sequence length)

  4. genomeName: name of the genome, target, or microorganism the sequence belongs to (e.g. Monkeypox virus clade II)

  5. segmentName (optional): the name of the segment or gene (e.g. Segment 4 (HA)). Set to 'Full' if the sequence is the full genome
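
The columns above can be generated from the FASTA file itself, since chromStart is always 0 and chromEnd is the sequence length. The sketch below assumes a user-supplied `labels` mapping from sequence name to (genomeName, segmentName); the function names and the sequence names in the usage note are hypothetical.

```python
def fasta_lengths(lines):
    """Return {sequence name: sequence length}, in file order."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # The sequence name is the header text before the first space.
            name = line[1:].split(" ", 1)[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths

def bed_rows(lengths, labels):
    """Build tab-delimited BED rows: chrom, chromStart, chromEnd, genomeName, segmentName."""
    rows = []
    for name, length in lengths.items():
        genome, segment = labels[name]
        rows.append("\t".join([name, "0", str(length), genome, segment]))
    return rows
```

For example, with `labels = {"mpxv_full": ("Monkeypox virus clade II", "Full")}`, a FASTA record named `mpxv_full` of length 5 yields the row `mpxv_full`, `0`, `5`, `Monkeypox virus clade II`, `Full`, separated by tabs.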

When a BED file is provided, sequence names must match between the FASTA file and the BED file, and the same set of sequences must appear in both files. If the files contain multiple genomes, each genomeName should be unique. For example, multiple Influenza genomes should not be labeled with the same name in the 4th column.
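
A consistency check between the two files might look like the sketch below. It is an illustration only, with a hypothetical function name; it checks the column count, the chromStart value, and that the FASTA and BED files name the same set of sequences.

```python
def check_bed(bed_lines, fasta_seq_names):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    bed_names = []
    for n, line in enumerate(bed_lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 4:
            problems.append(
                f"line {n}: expected at least 4 tab-delimited columns, found {len(fields)}"
            )
            continue
        if fields[1] != "0":
            problems.append(f"line {n}: chromStart should be 0")
        bed_names.append(fields[0])
    fasta_set, bed_set = set(fasta_seq_names), set(bed_names)
    for name in sorted(fasta_set - bed_set):
        problems.append(f"{name}: in FASTA but missing from BED")
    for name in sorted(bed_set - fasta_set):
        problems.append(f"{name}: in BED but missing from FASTA")
    return problems
```

A row whose columns are separated by spaces instead of tabs is reported as having too few columns, which is a common cause of mismatches.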

The BED file controls how sequences are labeled in the output JSON. If the custom reference FASTA file includes sequences from multiple segments, it is recommended to provide a BED file so that the segments are grouped under the results for the microorganism they belong to.
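
To preview which segments share a genomeName in a BED file, the rows can be grouped by the 4th column. This sketch only inspects the BED file; it does not reproduce the structure of the output JSON, and the function name is hypothetical.

```python
from collections import defaultdict

def segments_by_genome(bed_lines):
    """Map each genomeName to a list of (sequence name, segmentName or None)."""
    groups = defaultdict(list)
    for line in bed_lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 4:
            continue  # not a valid BED row for this purpose
        segment = fields[4] if len(fields) > 4 else None
        groups[fields[3]].append((fields[0], segment))
    return dict(groups)
```

Segments that are meant to belong to one microorganism should appear under a single key; if they are split across several keys, the genomeName values differ between rows.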

Example Command Line

dragen \
  --enable-explify true \
  --output-file-prefix <PREFIX> \
  --explify-sample-list /path/to/sample/list/tsv \
  --explify-test-panel-name Custom \
  --explify-test-panel-version 1.0.0 \
  --explify-ref-db-dir /path/to/root/db/dir \
  --explify-custom-ref-fasta /path/to/custom.fna \
  --explify-custom-ref-bed /path/to/custom.bed \
  --explify-load-db-ram true \
  --output-directory <OUTPUT_DIR> \
  --intermediate-results-dir <OUTPUT_DIR> \
  --explify-ncpus=20
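
When the command is launched from a script, the argument list can be assembled so that the optional BED flag is included only when a BED file exists. This is a minimal sketch using only the options shown in the command above; the function name and its parameters are hypothetical, and the values must be replaced with real paths.

```python
def build_dragen_args(prefix, sample_list, db_dir, fasta, output_dir, bed=None, ncpus=20):
    """Return the dragen argument list for a Custom panel run (not executed here)."""
    args = [
        "dragen",
        "--enable-explify", "true",
        "--output-file-prefix", prefix,
        "--explify-sample-list", sample_list,
        "--explify-test-panel-name", "Custom",
        "--explify-test-panel-version", "1.0.0",
        "--explify-ref-db-dir", db_dir,
        "--explify-custom-ref-fasta", fasta,
    ]
    if bed is not None:
        # The BED file is optional, so the flag is added only when one is given.
        args += ["--explify-custom-ref-bed", bed]
    args += [
        "--explify-load-db-ram", "true",
        "--output-directory", output_dir,
        "--intermediate-results-dir", output_dir,
        f"--explify-ncpus={ncpus}",
    ]
    return args
```

The returned list can be passed to `subprocess.run(args, check=True)` on a system where dragen is installed.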
