Explify Analysis Pipeline

Description

The Explify Analysis Pipeline offers a dedicated informatics solution for the Illumina Respiratory Pathogen ID/AMR Enrichment Panel Kit (RPIP), Illumina Urinary Pathogen ID/AMR Enrichment Panel Kit (UPIP), and Illumina Viral Surveillance V2 Kit (VSP V2). The application delivers powerful, simple data analysis for simultaneous detection, quantification, and profiling of microorganisms and antimicrobial resistance (AMR) markers.

  • RPIP: Targeted enrichment of >280 RNA and DNA respiratory pathogens, including SARS-CoV-2, Influenza viruses, Respiratory syncytial virus, Mycobacterium and Legionella species, and >4000 AMR markers.

  • UPIP: Targeted enrichment of >170 genitourinary pathogens, including fastidious, slow-growing, and anaerobic uropathogens, sexually transmitted microorganisms, and >4000 bacterial AMR markers.

  • VSP V2: Targeted enrichment for whole-genome sequencing (WGS) of 200 RNA and DNA viruses prioritized as high-risk to public health, zoonotic surveillance, and biotech, and >200 viral AMR markers.

The Explify Analysis Pipeline can also be used to analyze FASTQ/FASTA read files with a set of custom reference sequences.

Command Line Settings

Option | Description

Required Inputs

--enable-explify

Enables the Explify Analysis Pipeline. (Default=false)

--output-file-prefix

Prefix for all output files.

--output-directory

Directory for all output files.

--explify-sample-list

Input sample list .tsv file with sample IDs, FASTQs, etc.

--explify-test-panel-name

Test panel name. One of "RPIP", "UPIP", "VSPv2", or "Custom".

--explify-test-panel-version

Set to test panel version (e.g. "1.0.0").

--explify-ref-db-dir

Path to root directory for Explify Database files.

Optional Inputs

--intermediate-results-dir

Directory for temporary files. Its available space must be greater than three times the combined size of all FASTQ files.

--explify-load-db-ram

Option to load database into RAM if not on ramdisk. (Default=false).

--explify-no-read-qc

Option to turn off read QC on FASTQs before analysis. (Default=false).

--explify-internal-control

Option to set internal control from an accepted list. (Default="Enterobacteria phage T7")

--explify-internal-control-concentration

Option to set internal control concentration. (Default=12100000)

--explify-ncpus

Option to set the number of CPUs available for processing.

--explify-sensitivity-threshold

Option to set sensitivity threshold. Range: 0 < Integer < 1000. Only valid for VSPv2. (Default=5).

--explify-custom-ref-fasta

Reference FASTA file. Required for Custom reference DBs.

--explify-custom-ref-bed

Reference BED file. Optional for Custom reference DBs.

Example Command Line

dragen \
  --enable-explify=true \
  --output-file-prefix <PREFIX> \
  --explify-sample-list /path/to/sample/list/tsv \
  --explify-test-panel-name <"RPIP"/"UPIP"/"VSPv2"/"Custom"> \
  --explify-test-panel-version <VERSION> \
  --explify-ref-db-dir /path/to/root/db/dir \
  --explify-load-db-ram=true \
  --output-directory <OUTPUT_DIR> \
  --intermediate-results-dir <OUTPUT_DIR> \
  --explify-ncpus=1
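
The size requirement for --intermediate-results-dir (more than three times the combined FASTQ size) can be checked before launching a run. The following is a minimal sketch, not part of DRAGEN: the function names required_intermediate_bytes and available_bytes are hypothetical, and the check assumes the requirement refers to the FASTQ files as they are stored on disk.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not DRAGEN commands) for sizing --intermediate-results-dir.

# Print three times the combined size, in bytes, of the files given as arguments.
required_intermediate_bytes() {
    local total=0 size f
    for f in "$@"; do
        size=$(wc -c < "$f") || return 1   # size of one file in bytes
        total=$(( total + size ))
    done
    echo $(( total * 3 ))
}

# Print the free space, in bytes, on the file system that holds directory $1.
available_bytes() {
    # df -P gives one line per file system; with -k, column 4 is free 1024-byte blocks.
    df -Pk "$1" | awk 'NR == 2 { printf "%.0f\n", $4 * 1024 }'
}

# Example use (paths are placeholders):
#   need=$(required_intermediate_bytes /path/to/fastq1.gz /path/to/fastq2.gz)
#   have=$(available_bytes /path/to/intermediate/dir)
#   [ "$have" -gt "$need" ] || echo "intermediate directory is too small" >&2
```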

Input Details

Sample Input List

Applies to: --explify-sample-list

The sample input list is a column-formatted file with tab separations between the columns (i.e., a .tsv file).

SampleID     BatchID     RunID     ControlFlag     FastQs
MySample     MyBatch     MyRun     POS             /path/to/fastq1.gz     /path/to/fastq2.gz

Notes:

  • The SampleID values must be unique.

  • BatchID and RunID help users track and manage sample analyses. BatchID is often used to track libraries that were prepared together, and RunID to track sequencing runs. Both can be left blank.

  • The ControlFlag value can be POS, NEG, BLANK, or left empty.

    • POS is used to indicate a positive control sample.

    • NEG is used to indicate a negative control sample.

    • BLANK is used to indicate a blank control sample (e.g. buffer only).

  • If there are multiple FASTQ files, they are tab delimited.

  • Take care when editing .tsv files. Some editors replace tabs with spaces without alerting the user.
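
One way to avoid the tab/space problem is to write the list with printf and check it with awk before running the pipeline. This is a sketch under stated assumptions: write_sample_list and check_sample_list are hypothetical helper names, the header row shown in the example above is assumed to be part of the file, and the checks cover only the rules listed in these notes.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of DRAGEN) for building and checking a sample list.

# Write a one-sample list to file $1; every \t below becomes a real tab character.
write_sample_list() {
    printf 'SampleID\tBatchID\tRunID\tControlFlag\tFastQs\n' > "$1"
    printf 'MySample\tMyBatch\tMyRun\tPOS\t/path/to/fastq1.gz\t/path/to/fastq2.gz\n' >> "$1"
}

# Return non-zero and print a message for rows that break the rules in the notes.
check_sample_list() {
    awk -F'\t' '
        BEGIN { bad = 0 }
        NR == 1 { next }    # skip the header row
        NF < 5 {            # too few columns usually means tabs became spaces
            printf "line %d: expected at least 5 tab-separated columns, found %d\n", NR, NF
            bad = 1; next
        }
        seen[$1]++ { printf "line %d: duplicate SampleID %s\n", NR, $1; bad = 1 }
        $4 != "" && $4 != "POS" && $4 != "NEG" && $4 != "BLANK" {
            printf "line %d: unexpected ControlFlag %s\n", NR, $4; bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

Calling check_sample_list samples.tsv prints nothing and returns 0 when the file passes these checks.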

Internal Control

Applies to: --explify-internal-control, --explify-internal-control-concentration

The user may specify one of the internal controls listed below. If NONE is specified, the internal control concentration is ignored. These are case-sensitive and must be input exactly as they appear:

  • Allobacillus halotolerans

  • Armored RNA Quant Internal Process Control

  • Enterobacteria phage T7 (This is the default)

  • Escherichia virus MS2

  • Escherichia virus Qbeta

  • Escherichia virus T4

  • Imtechella halotolerans

  • Phocid alphaherpesvirus 1

  • Phocine morbillivirus

  • Truepera radiovictrix

  • NONE

The internal control concentration is an integer representing the number of copies/mL of sample for the internal control.
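
Because the names are case-sensitive and several contain spaces, it can help to validate the value in a wrapper script before passing it to --explify-internal-control. The sketch below uses a hypothetical function name, is_valid_internal_control, and simply compares the value against the list above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DRAGEN): succeed only for an exact,
# case-sensitive match against the accepted internal control names.
is_valid_internal_control() {
    case "$1" in
        "Allobacillus halotolerans" | \
        "Armored RNA Quant Internal Process Control" | \
        "Enterobacteria phage T7" | \
        "Escherichia virus MS2" | \
        "Escherichia virus Qbeta" | \
        "Escherichia virus T4" | \
        "Imtechella halotolerans" | \
        "Phocid alphaherpesvirus 1" | \
        "Phocine morbillivirus" | \
        "Truepera radiovictrix" | \
        "NONE")
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Example use; the value must be quoted in the shell because it contains spaces:
#   ic="Escherichia virus MS2"
#   is_valid_internal_control "$ic" || echo "unknown internal control: $ic" >&2
```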

Reference Databases

Applies to: --explify-ref-db-dir, --explify-test-panel-name, --explify-test-panel-version, --explify-load-db-ram, --explify-custom-ref-fasta, --explify-custom-ref-bed

An Explify Reference Database is required to run the Explify Analysis Pipeline in DRAGEN. The databases are stored remotely and must be downloaded prior to running an analysis. The database download script provided to facilitate the download is described below.

Directory Setup

Before downloading the databases, create a directory dedicated to storing them. The directory should be on a disk with at least 150 GB of free space. The path to this directory is passed to the download script as the -d parameter in the steps that follow. The examples below use "explify-databases/".

Obtaining the Download Script

Download and management of Explify reference databases is handled by a shell script. The script can be downloaded with the following command:

wget -O explify-dbs.sh https://illumina-explify-databases.s3.us-east-1.amazonaws.com/explify-dbs.sh
chmod +x explify-dbs.sh

Seeing What Databases are Available for Download

The search subcommand can be used to list what databases can be downloaded:

$ ./explify-dbs.sh search -d explify-databases/
4 database(s) found meeting those criteria:
- Custom-1.0.0
- RPIP-6.3.0
- UPIP-8.3.0
- VSPv2-2.3.0

  • The -d argument is the base directory used for storage of the databases

  • Optionally, when a test panel name is specified with the -p argument, the results will be limited to that panel

  • Optionally, setting the -n argument will filter the search to databases that have not already been downloaded

Downloading a Database

The download subcommand is used to download the database files for a test panel:

$ ./explify-dbs.sh download -d explify-databases/ -p UPIP -v 8.3.0 -n 20

  • The -d argument is the base directory used for storage of the databases

  • The -p argument is the test panel name

  • The -v argument is the test panel version

  • The -n argument is the number of CPUs that can be used to download the files (defaults to 1)

Additional notes:

  • In this example, after the UPIP-8.3.0 files are downloaded, additional required files will be downloaded to a subdirectory named "common"

  • After the files are downloaded, their checksums will be automatically checked

  • Due to the size of some of the files, this command will take some time. It is best to run it via screen or nohup

Listing Downloaded Databases

The list subcommand is used to view the databases that have already been downloaded:

$ ./explify-dbs.sh list -d explify-databases/

  • The -d argument is the base directory used for storage of the databases

  • Optionally, when a test panel name is specified with the -p argument, the results will be limited to that panel

Checking Database Integrity

The download subcommand will automatically check the file checksums after download. The check subcommand can also be used on its own to check the files:

$ ./explify-dbs.sh check -d explify-databases/ -p UPIP -v 8.3.0 -n 20

  • The -d argument is the base directory used for storage of the databases

  • The -p argument is the test panel name

  • The -v argument is the test panel version

  • The -n argument is the number of CPUs that can be used to check the files (defaults to 1)

Using the Databases with the Explify Analysis Pipeline

Assume the Explify database distributable, when unpacked, has a root directory name of /explify-databases. The database files will be organized in this root directory first by test panel type, then by test panel version:

explify-databases/
    Custom/
        1.0.0/
    RPIP/
        6.3.0/
    UPIP/
        8.3.0/
    VSPv2/
        2.3.0/

To run an analysis with RPIP 6.3.0, for example, the following inputs would be needed:

--explify-ref-db-dir /explify-databases
--explify-test-panel-name RPIP
--explify-test-panel-version 6.3.0

The Explify Analysis Pipeline will use these inputs to navigate to the specified database location, namely /explify-databases/RPIP/6.3.0.
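
The path composition described above can be reproduced in a wrapper script to confirm that a database is present before a run starts. This is a sketch with hypothetical function names (resolve_db_dir, db_is_downloaded). It only mirrors the root/panel/version layout shown above and does not verify file contents, which is what the check subcommand is for.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of DRAGEN) mirroring the <root>/<panel>/<version> layout.

# Print the database directory for root $1, panel name $2, and panel version $3.
resolve_db_dir() {
    local root=${1%/}   # drop one trailing slash, if present
    printf '%s/%s/%s\n' "$root" "$2" "$3"
}

# Succeed only if that directory exists.
db_is_downloaded() {
    [ -d "$(resolve_db_dir "$1" "$2" "$3")" ]
}

# Example use:
#   db_is_downloaded /explify-databases RPIP 6.3.0 || echo "RPIP 6.3.0 not found" >&2
```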

If the databases are stored on a normal file system, it is recommended that you set --explify-load-db-ram=true. This will tell the Explify Analysis Pipeline to load the databases into memory for faster analysis. It is also allowable to store the databases on a RAM disk, which reduces load time over many Explify Analysis Pipeline runs. In this case, it is recommended to set --explify-load-db-ram=false.

Using the Custom Database Option

To use a Custom database, references are supplied through a FASTA file via --explify-custom-ref-fasta and an optional BED file via --explify-custom-ref-bed. You must also download the Custom database, set --explify-test-panel-name to "Custom", and set --explify-test-panel-version to the version you downloaded. The supplied Custom Explify Reference Database is used by the Explify Analysis Pipeline to filter out host reads.

In the FASTA file, sequence names must be unique and must not contain spaces. If a FASTA header contains a space, the part before the first space is taken as the sequence name. It is recommended to restrict sequence names to letters, digits, underscores (_), hyphens (-), parentheses, and periods (.). Other characters may cause sequence names to appear differently in the output.
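
The naming rules above can be checked with a short awk pass over the FASTA headers. This is a sketch, not an Explify tool: check_fasta_names is a hypothetical name, and the function treats the recommended character set as a hard check, which is stricter than the recommendation itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DRAGEN): report duplicate sequence names and
# names that use characters outside the recommended set. Returns non-zero if any
# problem is found in FASTA file $1.
check_fasta_names() {
    awk '
        BEGIN { bad = 0 }
        /^>/ {
            name = substr($1, 2)    # text between ">" and the first whitespace
            if (seen[name]++) {
                printf "duplicate sequence name: %s\n", name; bad = 1
            }
            if (name !~ /^[A-Za-z0-9_.()-]+$/) {
                printf "name outside the recommended character set: %s\n", name; bad = 1
            }
        }
        END { exit bad }
    ' "$1"
}
```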

The BED file must be tab-delimited with at least 4 columns:

  1. chrom: the sequence name as it appears in the FASTA

  2. chromStart: start position (always set to 0)

  3. chromEnd: end position (sequence length)

  4. genomeName: name of the genome, target, or microorganism the sequence belongs to (e.g. Monkeypox virus clade II)

  5. segmentName (optional): the name of the segment or gene (e.g. Segment 4 (HA)). Set to 'Full' if the sequence is the full genome

Sequence names must match between the FASTA and BED file, and the same set of sequences must appear in both files. If there are multiple viruses, their names should be unique. For example, if there are multiple Influenza genomes, they should not be labeled with the same virus name in the 4th column.

The BED file controls how sequences are labeled in the output JSON. If the Custom Reference FASTA includes sequences from multiple segments, it is recommended to provide this BED file so that the segments are included under the results of that microorganism.
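
As an illustration of the BED layout described above, the sketch below derives a starting BED file from a FASTA file. It rests on several assumptions: fasta_to_bed is a hypothetical helper, every sequence in the FASTA is assigned the same genomeName (passed as the second argument), and segmentName is set to Full for every row. For segmented genomes or multi-organism FASTA files, the fourth and fifth columns would need to be edited by hand afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DRAGEN).
# Usage: fasta_to_bed ref.fasta "Genome name" > ref.bed
# Emits one tab-delimited row per sequence: chrom, 0, sequence length, genomeName, Full.
fasta_to_bed() {
    awk -v genome="$2" '
        function emit() {
            if (name != "") printf "%s\t0\t%d\t%s\tFull\n", name, len, genome
        }
        /^>/ {
            emit()                  # finish the previous sequence, if any
            name = substr($1, 2)    # sequence name: header text up to the first space
            len = 0
            next
        }
        {
            gsub(/[ \t\r]/, "")     # ignore stray whitespace and carriage returns
            len += length($0)
        }
        END { emit() }
    ' "$1"
}
```

Because chrom is taken from the FASTA headers themselves, the sequence names in the generated BED file match the FASTA by construction.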

Output Details

The output of the Explify Analysis Pipeline is a single report.json file, written to the specified output directory, that contains sample QC results and targeted microorganism and AMR marker detection results.

Report.json format

Top-Level Node

The fields in the top-level node of the output JSON provide general metadata and version information.

Field | Description

.accession

Sample identifier.

.deploymentEnvironment

The environment in which the results were produced.

.batchId

Identifier used for a batch of samples prepared in the lab at the same time.

.analysisId

Identifier for the analysis.

.runId

Identifier used for a sequencing run.

.controlFlag

Indicates whether the sample is a control. It is based on the ControlFlag field in the sample .tsv and can be set to “POS”, “NEG”, “BLANK”, or “-”.

.dragenVersion

The DRAGEN release version.

.analysisPipelineVersion

The analysis pipeline version.

.testType

"RPIP", "UPIP", "VSPv2", or "Custom".

.testVersion

Version of the test.

.testName

Name of the test, e.g. "Explify® Respiratory Pathogen ID/AMR Panel (RPIP) - Data Analysis Solution".

.testUse

"For Research Use Only. Not for use in diagnostic procedures".

.reportTime

Time the report was generated.

.warnings

A list of warnings encountered during the analysis.

.errors

A list of errors encountered during the analysis.

.qcReport Node

All of the fields are relative to .qcReport. This section provides sample QC information.

Field | Description

.sampleQc

Sample QC information.

.sampleQc.totalRawBases

Number of base pairs in sample before read QC processing.

.sampleQc.totalRawReads

Number of reads in sample before read QC processing.

.sampleQc.uniqueReads

Number of unique reads in sample before read QC processing.

.sampleQc.uniqueReadsProportion

Proportion of unique reads in sample before read QC processing.

.sampleQc.preQualityMeanReadLength

Average read length before read QC processing.

.sampleQc.postQualityMeanReadLength

Average read length after read QC processing.

.sampleQc.postQualityReads

Number of reads in sample after read QC processing.

.sampleQc.postQualityReadsProportion

Proportion of post-quality reads in sample relative to total raw reads.

.sampleQc.removedInDehostingReads

Number of host reads in sample removed during dehosting.

.sampleQc.removedInDehostingReadsProportion

Proportion of host reads in sample removed relative to total raw reads.

.sampleQc.entropy

Kmer entropy of reads after read QC processing.

.sampleQc.gContent

Proportion of guanine (G) base calls in reads after read QC processing.

.sampleQc.libraryQScore

Quality score of the library after read QC processing.

.sampleQc.enrichmentFactor

Enrichment factor information (calculation requires detection of an appropriate Internal Control).

.sampleQc.enrichmentFactor.value

Enrichment factor value reflecting how well targeted regions were enriched.

.sampleQc.enrichmentFactor.category

Enrichment factor category: "poor", "fair", "good", or "not calculated".

.qcReport.sampleComposition Node

All of the fields are relative to .qcReport.sampleComposition. This section provides information about the composition of the sample.

Field | Description

.readClassification

Proportion of reads classified to the following groups:

.readClassification.targetedMicrobial

Targeted microbial (non-IC) reference sequences

.readClassification.targetedInternalControl

Targeted IC reference sequences

.readClassification.untargeted

Untargeted reference sequences

.readClassification.ambiguous

More than one pathogen class

.readClassification.unclassified

Could not be classified

.readClassification.lowComplexity

Low complexity sequence

.targetedMicrobial

Proportion of targeted reads classified to the following groups:

.targetedMicrobial.viral

Viral targeted sequences

.targetedMicrobial.bacterial

Bacterial targeted sequences

.targetedMicrobial.fungal

Fungal targeted sequences

.targetedMicrobial.parasitic

Parasitic targeted sequences

.targetedMicrobial.bacterialAmr

Bacterial AMR targeted sequences

.untargeted

Proportion of untargeted reads classified to the following groups:

.untargeted.viral

Viral untargeted sequences

.untargeted.bacterial

Bacterial untargeted sequences

.untargeted.fungal

Fungal untargeted sequences

.untargeted.parasitic

Parasitic untargeted sequences

.untargeted.bacterialAmr

Bacterial AMR untargeted sequences

.untargeted.internalControl

Internal Control (IC) untargeted sequences

.untargeted.human

Human sequences

.viral

.viral.targeted

.viral.untargeted

.viral.untargetedSubcategories

.viral.untargetedSubcategories.panel

.viral.untargetedSubcategories.phage

.viral.untargetedSubcategories.other

.bacterial

.bacterial.targeted

.bacterial.untargeted

.bacterial.untargetedSubcategories

.bacterial.untargetedSubcategories.panel

.bacterial.untargetedSubcategories.ribosomalDna

.bacterial.untargetedSubcategories.plasmid

.bacterial.untargetedSubcategories.other

.fungal

.fungal.targeted

.fungal.untargeted

.fungal.untargetedSubcategories

.fungal.untargetedSubcategories.panel

.fungal.untargetedSubcategories.ribosomalDna

.fungal.untargetedSubcategories.other

.parasitic

.parasitic.targeted

.parasitic.untargeted

.parasitic.untargetedSubcategories

.parasitic.untargetedSubcategories.panel

.parasitic.untargetedSubcategories.ribosomalDna

.parasitic.untargetedSubcategories.other

.human

.human.untargeted

.human.untargetedSubcategories

.human.untargetedSubcategories.ribosomalDna

.human.untargetedSubcategories.codingSequence

.human.untargetedSubcategories.other

.internalControl

.internalControl.targeted

.internalControl.untargeted

.microbialAndInternalControl

.microbialAndInternalControl.targeted

.microbialAndInternalControl.untargeted

.bacterialAmr

.bacterialAmr.targeted

.bacterialAmr.untargeted

.qcReport.internalControls Node

The internalControls object is a list that gives the name and RPKM for the 10 possible IC organisms. See the code block below for an example:

[
    {
        "name": "Allobacillus halotolerans",
        "rpkm": 0
    },
    {
        "name": "Armored RNA Quant Internal Process Control",
        "rpkm": 0
    },
    {
        "name": "Enterobacteria phage T7",
        "rpkm": 180323
    },
    {
        "name": "Escherichia virus MS2",
        "rpkm": 0
    },
    {
        "name": "Escherichia virus Qbeta",
        "rpkm": 0
    },
    {
        "name": "Escherichia virus T4",
        "rpkm": 0
    },
    {
        "name": "Imtechella halotolerans",
        "rpkm": 0
    },
    {
        "name": "Phocid alphaherpesvirus 1",
        "rpkm": 0
    },
    {
        "name": "Phocine morbillivirus",
        "rpkm": 0
    },
    {
        "name": "Truepera radiovictrix",
        "rpkm": 0
    }
]
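
To pull the internal controls that were actually detected out of a report, a JSON query tool can filter this list. The sketch below assumes that jq (a separate command-line tool, not part of DRAGEN) is installed and that the array sits at .qcReport.internalControls, as the node heading indicates. The function name detected_internal_controls is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DRAGEN); requires jq.
# Print "name<TAB>rpkm" for each internal control with a non-zero RPKM in report file $1.
detected_internal_controls() {
    jq -r '.qcReport.internalControls[]
           | select(.rpkm > 0)
           | "\(.name)\t\(.rpkm)"' "$1"
}

# Example use (the path is a placeholder):
#   detected_internal_controls /path/to/report.json
```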

.userOptions Node

The fields are relative to .userOptions.

Field | Description

.quantitativeInternalControlName

The quantitative Internal Control used for microorganism absolute quantification (recommendation: Enterobacteria phage T7)

.quantitativeInternalControlConcentration

The quantitative Internal Control concentration (recommendation: 1.21 x 10^7 copies/mL of sample)

.readQcEnabled

Boolean field that indicates whether read QC (trimming and filtering based on quality and read length) was enabled.

.readClassificationSensitivity

Sensitivity threshold for classifying reads. Determines whether alignment should proceed for a microorganism and/or reference sequence. Only used for VSPv2.

.targetReport.microorganisms Node

The fields are relative to .targetReport.microorganisms. The value of the microorganisms field is an array of objects describing organism detections. The following table describes one microorganisms object.

Field | Description

.class

Microorganism class (viral, bacterial, fungal, parasite)

.name

Name of detected microorganism

.coverage

Proportion of targeted microorganism sequence bases that appear in sequencing reads

.ani

Average nucleotide identity of majority consensus sequence to targeted microorganism reference sequences

.medianDepth

Median depth of reads aligned to targeted microorganism reference sequences, indicating the median number of times each targeted microorganism sequence base appears in sequencing reads

.condensedDepthVector

The depths across the microorganism's targeted reference genes, condensed (if needed) down to 256 items.

.rpkm

Normalized representation of the number of reads aligned to targeted microorganism reference sequences (aligned reads per kilobase of targeted sequence per million reads)

.alignedReadCount

The number of reads that aligned to the organism's target genes.

.kmerReadCount

The number of reads that were assigned to the microorganism's targeted genes with k-mer classification.

.absoluteQuantityRatio

Numerical absolute quantification value

.absoluteQuantityRatioFormatted

Formatted absolute quantification value and units

.phenotypicGroup

Grouping indicating general association with normal flora, colonization, or contamination from the environment or other sources, as well as general association with disease

.associatedAmrMarkers

Information about the detected and predicted AMR markers associated with this bacterium. Only present for bacteria.

.associatedAmrMarkers.applicable

A boolean field that indicates whether the bacterium has one or more AMR markers associated with it in the database.

.associatedAmrMarkers.detected

A list of the detected AMR markers associated with this bacterium. Only present for bacteria.

.associatedAmrMarkers.predicted

A list of the predicted AMR markers associated with this bacterium. Only present for bacteria.

.consensusGenomeSequences

Consensus genome information. Included for RPIP viruses only.

.consensusGenomeSequences.sequence

The consensus genome (or segment) sequence.

.consensusGenomeSequences.referenceAccession

The accession for the reference.

.consensusGenomeSequences.referenceDescription

A description of the reference.

.consensusGenomeSequences.referenceLength

The length of the reference genome.

.consensusGenomeSequences.maximumAlignmentLength

The longest contiguous alignment between consensus and reference sequences.

.consensusGenomeSequences.maximumGapLength

The longest contiguous gap (insertion or deletion) within the alignment between consensus and reference sequences.

.consensusGenomeSequences.maximumUnalignedLength

The longest section of the reference sequence not aligned to by the consensus sequence.

.consensusGenomeSequences.coverage

Proportion of reference sequence bases that appear in sequencing reads

.consensusGenomeSequences.ani

Average nucleotide identity of majority consensus sequence to genome reference sequences

.consensusGenomeSequences.alignedReadCount

The number of reads that aligned to the organism's target genes.

.consensusGenomeSequences.medianDepth

Median depth of reads aligned to genome reference sequences, indicating the median number of times each genome sequence base appears in sequencing reads

.consensusGenomeSequences.targetAnnotation

A list of target annotations for the genome. Each annotation is a JSON object with the following fields: start (int), end (int), strand (string), target_name (string), type (string).

.consensusGenomeSequences.condensedDepthVector

The depth vector for the genome, condensed to 256 items.

.consensusTargetSequences

Information about the consensus sequences for the target sequences.

.consensusTargetSequences.sequence

The consensus sequence for the target.

.consensusTargetSequences.name

The name of the target sequence.

.consensusTargetSequences.referenceAccession

The accession of the reference target.

.consensusTargetSequences.depthVector

The full depth vector for this target gene.

.predictionInformation

Information about Explify's automated interpretation results.

.predictionInformation.predictedPresent

Whether Explify interpretation predicts that the organism is present (true/false)

.predictionInformation.notes

List of notes about the interpretation result.

.predictionInformation.subpanels

A list of the subpanels that the organism belongs to.

.predictionInformation.relatedMicroorganisms

An object that gives key metrics for closely-related on- and off-panel organisms that were detected. See below for details.

.targetReport.microorganisms.relatedMicroorganisms Node

The relatedMicroorganisms object includes a list of the organisms that were considered as part of this organism's interpretation. The fields below describe an object in the relatedMicroorganisms array.

Field | Description

.name

Related microorganism's name

.onPanel

Whether the related microorganism is on the panel or not.

.kmerReadCount

The number of reads assigned to the microorganism using a k-mer based approach. This field is only present when this approach is applied. Currently, it is present for UPIP but not RPIP.

.coverage

The coverage to the microorganism resulting from alignment.

.ani

The ANI to the microorganism resulting from alignment.

.alignedReadCount

The read count to the organism resulting from alignment.

.targetReport.microorganisms.variants Node

The fields are relative to .targetReport.microorganisms.variants. The variants object is only present for select viruses.

Field | Description

.referenceAccession

NCBI accession of reference sequence used for variant calling.

.segment

(Influenza A only). Segment number of reference sequence

.ntChange

Nucleotide change associated with the variant

.referencePosition

Variant position in reference sequence

.referenceAllele

Reference allele at same position as the variant

.variantAllele

Variant allele

.depth

Variant depth, indicating the number of times the variant appears in sequencing reads.

.alleleFrequency

Frequency of the variant allele in the sequencing reads.

.targetReport.amrMarkers Node

The fields are relative to .targetReport.amrMarkers. This section provides information about the detected bacterial AMR markers.

Field | Description

.class

Microorganism class (e.g. bacterial)

.cardModelType

AMR marker detection model specified by CARD (homolog, protein variant, rRNA variant)

.cardGeneFamily

AMR marker family name in CARD

.name

AMR marker name

.cardName

Name of marker in the CARD database

.ncbiName

Name of marker in the NCBI database

.referenceAccession

NCBI or CARD accession of AMR marker reference sequence

.coverage

Proportion of reference sequence residues that appear in sequencing reads (protein alignment for homolog and protein variant model types; DNA alignment for rRNA variant model type)

.pid

Percent identity of majority consensus sequence aligned to reference sequence (protein alignment for homolog and protein variant model types; DNA alignment for rRNA variant model type)

.medianDepth

Median depth of reads aligned to AMR marker reference sequence, indicating the median number of times each AMR marker sequence residue appears in sequencing reads (protein alignment for homolog and protein variant model types; DNA alignment for rRNA variant model type)

.rpkm

Normalized representation of the number of reads aligned to AMR marker reference sequence (aligned reads per kilobase of targeted sequence per million reads)

.alignedReadCount

The read count to the marker resulting from alignment.

.nucleotideConsensusSequence

(UPIP only) The nucleotide consensus sequence.

.proteinConsensusSequence

(UPIP only) The protein consensus sequence.

.nucleotideDepthVector

The depths across the nucleotide alignment, not condensed.

.proteinDepthVector

The depths across the protein alignment, not condensed.

.associatedMicroorganisms

Lists of the detected and predicted organisms associated with this marker.

.associatedMicroorganisms.all

A list of all organisms associated with this marker.

.associatedMicroorganisms.detected

A list of the detected organisms associated with this marker.

.associatedMicroorganisms.predicted

A list of the predicted organisms associated with this marker.

.predictionInformation

Information about Explify's automated interpretation results.

.predictionInformation.predictedPresent

Whether Explify interpretation predicts that the marker is present (true/false)

.predictionInformation.confidence

Whether the AMR marker is predicted with high or medium confidence.

.predictionInformation.notes

List of notes about the interpretation result.

.targetReport.amrMarkers.variants Node

The fields are relative to .targetReport.amrMarkers.variants. This section provides information about variants detected on select bacterial AMR markers.

Field | Description

.category

"Bacterial Variant; Known AMR"

.referenceSourceMicroorganism

Microorganism that reference sequence is associated with in NCBI

.comments

Comments about the variant

.product

The protein product of the gene

.ntChange

The nucleotide change

.referencePosition

The position on the reference sequence

.referenceAllele

The reference sequence at the position of the variant

.variantAllele

The variant sequence

.depth

The depth at the variant position

.alleleFrequency

The frequency of the variant allele in the read pileup

.annotation

Type of change (e.g. "Nonsynonymous Variant")

.aaChange

Amino acid change

.epistaticGroups

List of epistatic groups the variant is associated with.

.customReferences Node

Only present and populated for custom reference analyses. When only a FASTA file is submitted (no BED file), each customReferences object is for a single reference. When a BED file is provided, each customReferences object is for a single organism/genome and can cover one or more references. The values in the Field column are relative to targetReport.customReferences.

Field | Description

.alignedReadCount

Number of reads aligned to the reference or organism.

.ani

Average nucleotide identity of majority consensus sequence to genome reference sequences.

.condensedDepthVector

The depths across the consensus sequences, condensed (if needed) down to 256 items.

.consensusSequences

Array of objects with information about each consensus sequence for this reference or organism.

.coverage

Proportion of reference sequence bases that appear in sequencing reads

.medianDepth

Median depth of reads aligned to reference sequences, indicating the median number of times each genome sequence base appears in sequencing reads

.name

Either the name (accession) of the reference or the organism

.pangoLineage

Pango lineage information for SARS-CoV-2. Only present if pangolin is run.

.rpkm

Normalized representation of the number of reads aligned to targeted microorganism reference sequences (aligned reads per kilobase of targeted sequence per million reads)

.variants

Array of objects with information about variants detected in the reference sequences

.customReferences.consensusSequences Node

consensusSequences is an array of objects, with each object describing the results for a single reference. When only a FASTA file is submitted (no BED file), there will be only one reference in the array. When a BED file is provided, there could be more than one. The values in the Field column are relative to targetReport.customReferences[].consensusSequences[].

Field | Description

.alignedReadCount

Number of reads aligned to the reference or organism.

.ani

Average nucleotide identity of majority consensus sequence to genome reference sequences.

.coverage

Proportion of reference sequence bases that appear in sequencing reads

.depthVector

Depths for each base in the sequence

.maximumAlignmentLength

The longest contiguous alignment between consensus and reference sequences.

.maximumGapLength

The longest contiguous gap between consensus and reference sequences.

.maximumUnalignedLength

Longest stretch of unaligned sequence

.medianDepth

Median depth of reads aligned to reference sequences, indicating the median number of times each genome sequence base appears in sequencing reads

.referenceAccession

Accession of the sequence

.referenceDescription

Description of the sequence

.referenceLength

Length of the reference sequence

.sequence

The consensus sequence

.customReferences.variants Node

Variants is an array of objects with each object describing a single detected variant. The values in the Field column are relative to targetReport.customReferences[].variants[].

Field | Description

.alleleFrequency

Frequency of the variant allele in the sequencing reads.

.depth

The depth at the variant position

.ntChange

The nucleotide change

.referenceAccession

Accession of the associated reference

.referenceAllele

The reference sequence at the position of the variant

.referencePosition

The position on the reference sequence

.variantAllele

The variant sequence
