Explify Analysis Pipeline
Description
The Explify Analysis Pipeline offers a dedicated informatics solution with flexible analysis options for the following Illumina Infectious Disease and Microbiology target-capture enrichment panel kits: the Illumina Respiratory Pathogen ID/AMR Enrichment Panel Kit (RPIP), Illumina Urinary Pathogen ID/AMR Enrichment Panel Kit (UPIP), and Illumina Viral Surveillance Panel V2 Kit (VSP V2). The application delivers easy-to-use, powerful secondary analysis of Illumina sequencing data, with workflows for sample QC, viral WGS (whole-genome sequencing), pathogen detection and quantification, and antimicrobial resistance (AMR) marker profiling. It also supports custom reference sequence analysis.
RPIP: Target-capture enrichment of >280 RNA and DNA respiratory pathogens, including SARS-CoV-2, Influenza viruses, Respiratory syncytial virus, Mycobacterium and Legionella species, and >4000 AMR markers.
UPIP: Target-capture enrichment of >170 genitourinary pathogens, including fastidious, slow-growing, and anaerobic uropathogens, sexually transmitted microorganisms, and >4000 bacterial AMR markers.
VSP V2: Target-capture enrichment for whole-genome sequencing (WGS) of 200 RNA and DNA viruses prioritized as high-risk to public health, zoonotic surveillance, and biotech, and >200 viral AMR markers.
Custom: Analyze FASTQ/FASTA read files with a custom reference sequence database.
Note that samples enriched using the Illumina Respiratory Virus Oligo Panel/Respiratory Virus Enrichment Kit (RVOP/RVEK) and Viral Surveillance Panel Kit (VSP) can also be analyzed using the Explify Analysis Pipeline and the VSPv2 database.
Command Line Settings
Required Inputs
--enable-explify
Enables the Explify Analysis Pipeline. (Default=false)
--output-file-prefix
Prefix for all output files.
--output-directory
Directory for all output files.
--explify-sample-list
Tab-separated (.tsv) sample list file with sample IDs, FASTQ files, and related information.
--explify-test-panel-name
Test panel name: "RPIP", "UPIP", "VSPv2", or "Custom".
--explify-test-panel-version
Test panel version (e.g. "1.0.0").
--explify-ref-db-dir
Path to the root directory of the Explify database files.
Optional Inputs
--intermediate-results-dir
Directory for temporary files. Its available space must be greater than three times the combined size of all input FASTQ files.
--explify-load-db-ram
Load the database into RAM; use this when the database is not on a RAM disk. (Default=false).
--explify-no-read-qc
Option to turn off read QC on FASTQs before analysis. (Default=false).
--explify-internal-control
Option to set internal control from an accepted list. (Default="Enterobacteria phage T7")
--explify-internal-control-concentration
Option to set internal control concentration. (Default=12100000)
--explify-ncpus
Option to set the number of CPUs available for processing.
--explify-sensitivity-threshold
Option to set sensitivity threshold. Range: 0 < Integer < 1000. Only valid for VSPv2. (Default=5).
--explify-custom-ref-fasta
Reference FASTA file. Required for Custom reference DBs.
--explify-custom-ref-bed
Reference BED file. Optional for Custom reference DBs.
Example Command Line
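As a minimal sketch, the Python helper below assembles a DRAGEN command line for an Explify run from the options documented above, using the --option=value form that appears elsewhere on this page. The executable name dragen, the helper name build_explify_command, and every path are illustrative assumptions; confirm the generated command against your DRAGEN installation before running it.

```python
import shlex
from typing import List, Optional

TEST_PANELS = {"RPIP", "UPIP", "VSPv2", "Custom"}


def build_explify_command(
    sample_list: str,
    output_dir: str,
    prefix: str,
    panel: str,
    version: str,
    db_dir: str,
    load_db_ram: bool = False,
    custom_fasta: Optional[str] = None,
    custom_bed: Optional[str] = None,
) -> List[str]:
    """Return a DRAGEN argument list for an Explify run (illustrative sketch)."""
    if panel not in TEST_PANELS:
        raise ValueError(f"unknown test panel: {panel!r}")
    if panel == "Custom" and custom_fasta is None:
        raise ValueError("the Custom panel requires a reference FASTA")
    args = [
        "dragen",  # assumed executable name
        "--enable-explify=true",
        f"--output-file-prefix={prefix}",
        f"--output-directory={output_dir}",
        f"--explify-sample-list={sample_list}",
        f"--explify-test-panel-name={panel}",
        f"--explify-test-panel-version={version}",
        f"--explify-ref-db-dir={db_dir}",
        f"--explify-load-db-ram={'true' if load_db_ram else 'false'}",
    ]
    if custom_fasta is not None:
        args.append(f"--explify-custom-ref-fasta={custom_fasta}")
    if custom_bed is not None:
        args.append(f"--explify-custom-ref-bed={custom_bed}")
    return args


# Illustrative values only: an RPIP 6.5.1 run with the database loaded into RAM.
cmd = build_explify_command(
    sample_list="samples.tsv",
    output_dir="/data/output",
    prefix="run01",
    panel="RPIP",
    version="6.5.1",
    db_dir="/explify-databases",
    load_db_ram=True,
)
print(shlex.join(cmd))
```

Optional inputs such as --intermediate-results-dir or --explify-ncpus can be appended to the returned list in the same --option=value form.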
Input Details
Sample Input List
Applies to: --explify-sample-list
The sample input list is a column-formatted file with tab-separated columns (i.e., a .tsv file).
Notes:
The SampleID values must be unique.
BatchID and RunID help users track and manage sample analyses. BatchID is often used to track libraries that were prepared together, and RunID to track sequencing runs. Both can be left blank.
The ControlFlag value can be POS, NEG, BLANK, or left empty.
POS is used to indicate a positive control sample.
NEG is used to indicate a negative control sample.
BLANK is used to indicate a blank control sample (e.g. buffer only).
If a sample has multiple FASTQ files, separate them with tabs.
Take care when editing .tsv files: some editors silently replace tabs with spaces.
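Because tabs silently replaced by spaces and duplicate IDs are easy to miss, a quick pre-flight check can help. The hypothetical check_sample_list function below is a sketch that assumes the file begins with a header row naming the SampleID and ControlFlag columns; it flags non-empty lines that contain no tab characters, duplicate SampleID values, and ControlFlag values other than POS, NEG, BLANK, or empty.

```python
import csv
import io
from typing import List

# ControlFlag values accepted by the sample list (empty means "not a control").
VALID_CONTROL_FLAGS = {"POS", "NEG", "BLANK", ""}


def check_sample_list(text: str) -> List[str]:
    """Return problems found in sample list text (empty list if none).

    Assumes the first row is a header naming SampleID and ControlFlag columns.
    """
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # A non-empty line with no tabs often means an editor swapped tabs for spaces.
        if line.strip() and "\t" not in line:
            problems.append(f"line {lineno}: no tab characters found")

    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return problems + ["file is empty"]
    header = rows[0]
    if "SampleID" not in header or "ControlFlag" not in header:
        return problems + ["header must name the SampleID and ControlFlag columns"]
    id_col = header.index("SampleID")
    flag_col = header.index("ControlFlag")

    seen = set()
    for lineno, row in enumerate(rows[1:], start=2):
        if not any(field.strip() for field in row):
            continue  # ignore blank lines
        sample_id = row[id_col] if id_col < len(row) else ""
        flag = row[flag_col] if flag_col < len(row) else ""
        if sample_id in seen:
            problems.append(f"line {lineno}: duplicate SampleID {sample_id!r}")
        seen.add(sample_id)
        if flag not in VALID_CONTROL_FLAGS:
            problems.append(f"line {lineno}: invalid ControlFlag {flag!r}")
    return problems
```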
Internal Control
Applies to: --explify-internal-control, --explify-internal-control-concentration
The user may specify one of the internal controls listed below. If NONE is specified, the internal control concentration is ignored. The names are case-sensitive and must be entered exactly as they appear:
Allobacillus halotolerans
Armored RNA Quant Internal Process Control
Enterobacteria phage T7 (This is the default)
Escherichia virus MS2
Escherichia virus Qbeta
Escherichia virus T4
Imtechella halotolerans
Phocid alphaherpesvirus 1
Phocine morbillivirus
Truepera radiovictrix
NONE
The internal control concentration is an integer representing the number of copies/mL of sample for the internal control.
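Because the names are case-sensitive, it can help to validate them before launching a run. The hypothetical resolve_internal_control function below is a sketch that checks a name against the list above, suggests the correctly capitalized name on a case-only mismatch, and drops the concentration when NONE is selected. Requiring a positive concentration is an assumption; the text above only states that it is an integer.

```python
from typing import Optional, Tuple

# Names copied from the list above; matching is exact and case-sensitive.
ACCEPTED_INTERNAL_CONTROLS = (
    "Allobacillus halotolerans",
    "Armored RNA Quant Internal Process Control",
    "Enterobacteria phage T7",
    "Escherichia virus MS2",
    "Escherichia virus Qbeta",
    "Escherichia virus T4",
    "Imtechella halotolerans",
    "Phocid alphaherpesvirus 1",
    "Phocine morbillivirus",
    "Truepera radiovictrix",
    "NONE",
)
DEFAULT_CONCENTRATION = 12_100_000  # copies/mL of sample


def resolve_internal_control(
    name: str = "Enterobacteria phage T7",
    concentration: int = DEFAULT_CONCENTRATION,
) -> Tuple[str, Optional[int]]:
    """Validate an internal control and return (name, concentration).

    The concentration is returned as None for NONE, since it is ignored then.
    """
    if name not in ACCEPTED_INTERNAL_CONTROLS:
        # Offer a hint when only the capitalization is wrong.
        matches = [c for c in ACCEPTED_INTERNAL_CONTROLS if c.lower() == name.lower()]
        hint = f" (did you mean {matches[0]!r}?)" if matches else ""
        raise ValueError(f"unsupported internal control {name!r}{hint}")
    if name == "NONE":
        return name, None
    # Assumption: a usable concentration is a positive integer.
    if not isinstance(concentration, int) or concentration <= 0:
        raise ValueError("concentration must be a positive integer (copies/mL)")
    return name, concentration
```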
Reference Databases
Applies to: --explify-ref-db-dir, --explify-test-panel-name, --explify-test-panel-version, --explify-load-db-ram, --explify-custom-ref-fasta, --explify-custom-ref-bed
An Explify Reference Database is required to run the Explify Analysis Pipeline in DRAGEN. The databases are stored remotely and must be downloaded prior to running an analysis. The database download script provided to facilitate the download is described below.
Directory Setup
Prior to downloading the databases, create a directory dedicated to storing them. It is recommended that the directory be on a disk with at least 150 GB of free space. The path to this directory is used for the -d parameter when the download script is run in subsequent steps; "explify-databases/" is used in the examples below.
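A minimal Python sketch of this setup step, assuming the 150 GB recommendation means decimal gigabytes; prepare_database_dir is a hypothetical helper name. It creates the directory if needed and reports whether the underlying disk has the recommended free space.

```python
import shutil
from pathlib import Path
from typing import Tuple

# Assumption: "at least 150 GB" is read as decimal gigabytes.
RECOMMENDED_FREE_BYTES = 150 * 10**9


def prepare_database_dir(path: str, required_bytes: int = RECOMMENDED_FREE_BYTES) -> Tuple[Path, bool]:
    """Create the database directory if needed; report whether its disk has enough free space."""
    db_dir = Path(path)
    db_dir.mkdir(parents=True, exist_ok=True)
    free_bytes = shutil.disk_usage(db_dir).free
    return db_dir, free_bytes >= required_bytes
```

For example, prepare_database_dir("explify-databases") returns the directory path together with False when the disk holding it has less than 150 GB free.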
Obtaining the Download Script
Download and management of Explify reference databases is handled by a shell script. The script can be downloaded with the following command:
Seeing What Databases are Available for Download
The search subcommand can be used to list the databases that are available for download.
The -d argument is the base directory used for storage of the databases.
Optionally, when a test panel name is specified with the -p argument, the results are limited to that panel.
Optionally, setting the -n argument limits the search to databases that have not already been downloaded.
Downloading a Database
The download subcommand is used to download the database files for a test panel.
The -d argument is the base directory used for storage of the databases.
The -p argument is the test panel name.
The -v argument is the test panel version.
The -n argument is the number of CPUs that can be used to download the files (defaults to 1).
Additional notes:
In this example, after the UPIP-8.6.0 database files are downloaded, additional required files are downloaded to a subdirectory named "common".
After the files are downloaded, their checksums are checked automatically.
Because some of the files are large, this command can take a long time to complete. It is best to run it under screen or nohup.
Listing Downloaded Databases
The list subcommand is used to view the databases that have already been downloaded.
The -d argument is the base directory used for storage of the databases.
Optionally, when a test panel name is specified with the -p argument, the results are limited to that panel.
Checking Database Integrity
The download subcommand automatically checks the file checksums after download. The check subcommand can also be used on its own to check the files.
The -d argument is the base directory used for storage of the databases.
The -p argument is the test panel name.
The -v argument is the test panel version.
The -n argument is the number of CPUs to use (defaults to 1).
Using the Databases with the Explify Analysis Pipeline
Assume the Explify database distributable, when unpacked, has a root directory named /explify-databases. The database files are organized in this root directory first by test panel type, then by test panel version.
To run an analysis with RPIP 6.5.1, for example, the following inputs would be needed: --explify-ref-db-dir=/explify-databases, --explify-test-panel-name=RPIP, and --explify-test-panel-version=6.5.1.
The Explify Analysis Pipeline uses these inputs to navigate to the specified database location, namely /explify-databases/RPIP/6.5.1.
If the databases are stored on a normal file system, it is recommended that you set --explify-load-db-ram=true. This tells the Explify Analysis Pipeline to load the databases into memory for faster analysis. The databases can also be stored on a RAM disk, which reduces load time over many Explify Analysis Pipeline runs. In this case, it is recommended to set --explify-load-db-ram=false.
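The database lookup described above can be sketched in Python as follows. The function names are hypothetical, and the sketch mirrors only the documented <ref-db-dir>/<test panel name>/<test panel version> layout and the --explify-load-db-ram recommendation; it does not reproduce how DRAGEN itself resolves the path.

```python
from pathlib import Path, PurePosixPath


def explify_database_path(ref_db_dir: str, panel_name: str, panel_version: str) -> PurePosixPath:
    """Build <ref-db-dir>/<test panel name>/<test panel version>."""
    return PurePosixPath(ref_db_dir) / panel_name / panel_version


def database_present(db_path: PurePosixPath) -> bool:
    """True if the panel/version directory exists locally."""
    return Path(db_path).is_dir()


def recommended_load_db_ram(on_ram_disk: bool) -> str:
    """Value for --explify-load-db-ram: true on a normal file system, false on a RAM disk."""
    return "false" if on_ram_disk else "true"


db_path = explify_database_path("/explify-databases", "RPIP", "6.5.1")
print(db_path)  # /explify-databases/RPIP/6.5.1
print(f"--explify-load-db-ram={recommended_load_db_ram(on_ram_disk=False)}")  # --explify-load-db-ram=true
```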
Using the Custom Database Option
To use a Custom database, references are supplied through a FASTA file via --explify-custom-ref-fasta and an optional BED file via --explify-custom-ref-bed. Note that you must have downloaded the Custom database, set --explify-test-panel-name to "Custom", and set --explify-test-panel-version to the version you have downloaded. The supplied Custom Explify Reference Database is used by the Explify Analysis Pipeline to filter out host reads.
In the FASTA file, sequence names must be unique and should not contain any spaces. If a FASTA header contains a space, the part before the first space is taken as the sequence name. It is recommended to use only the following characters in sequence names: letters, digits, underscores (_), hyphens (-), parentheses ( and ), and periods (.). Other characters may cause the sequence names to appear differently in the output.
The BED file must be tab-delimited with at least 4 columns:
chrom: the sequence name as it appears in the FASTA
chromStart: start position (always set to 0)
chromEnd: end position (sequence length)
genomeName: name of the genome, target, or microorganism the sequence belongs to (e.g. Monkeypox virus clade II)
segmentName (optional): the name of the segment or gene (e.g. Segment 4 (HA)). Set to 'Full' if the sequence is the full genome
Sequence names must match between the FASTA file and BED file, and the same set of sequences must appear in both files. If there are multiple viruses, their names should be unique. For example, if there are multiple Influenza genomes, they should not be labeled with the same virus name in the 4th column.
The BED file controls how sequences are labeled in the output JSON. If the custom reference FASTA file includes sequences from multiple segments, it is recommended to provide a BED file so that the segments are included under the results of that microorganism.
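Mismatches between the FASTA and BED files are easy to introduce by hand. The hypothetical check_custom_reference function below sketches a pre-flight check of the rules above: it takes each sequence name from the header text before the first space, warns about spaces, duplicate names, and characters outside the recommended set, and checks that every BED row has at least 4 tab-separated columns, a chromStart of 0, and a chromEnd equal to the sequence length, and that both files list the same sequences. It assumes the BED file has no header or track lines.

```python
import re
from typing import Dict, List, Tuple

# Characters recommended above: letters, digits, _ - ( ) and .
SAFE_NAME = re.compile(r"^[A-Za-z0-9_().-]+$")


def parse_fasta_lengths(fasta_text: str) -> Tuple[Dict[str, int], List[str]]:
    """Map each sequence name to its length and collect naming problems."""
    lengths: Dict[str, int] = {}
    problems: List[str] = []
    name = None
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            header = line[1:].strip()
            # The sequence name is the header text before the first space.
            name = header.split()[0] if header else ""
            if " " in header:
                problems.append(f"header {header!r} contains spaces; name taken as {name!r}")
            if name in lengths:
                problems.append(f"duplicate FASTA sequence name {name!r}")
            elif not SAFE_NAME.match(name):
                problems.append(f"name {name!r} uses characters outside the recommended set")
            lengths[name] = 0
        elif line and name is not None:
            lengths[name] += len(line)
    return lengths, problems


def check_custom_reference(fasta_text: str, bed_text: str) -> List[str]:
    """Return problems found when comparing a custom FASTA with its BED file."""
    lengths, problems = parse_fasta_lengths(fasta_text)
    bed_names = set()
    for lineno, line in enumerate(bed_text.splitlines(), start=1):
        if not line.strip():
            continue
        cols = line.split("\t")
        if len(cols) < 4:
            problems.append(f"BED line {lineno}: expected at least 4 tab-separated columns")
            continue
        chrom, start, end = cols[0], cols[1], cols[2]
        bed_names.add(chrom)
        if start != "0":
            problems.append(f"BED line {lineno}: chromStart is {start}, expected 0")
        if chrom in lengths and end != str(lengths[chrom]):
            problems.append(f"BED line {lineno}: chromEnd is {end}, sequence length is {lengths[chrom]}")
    for missing in sorted(set(lengths) - bed_names):
        problems.append(f"{missing!r} is in the FASTA file but not the BED file")
    for missing in sorted(bed_names - set(lengths)):
        problems.append(f"{missing!r} is in the BED file but not the FASTA file")
    return problems
```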
Output Details
The output of the Explify Analysis Pipeline is a single ap.json file, written to the specified output directory, containing general metadata, version information, sample QC, microorganism and AMR marker results, and detailed test information.
ap.json format
Top-Level Node
The top-level section of the output JSON contains general metadata and version information.
.accession
Identifier used for the sample
.deploymentEnvironment
Environment in which the results were produced
.batchId
Identifier used for the batch of samples processed together
.analysisId
Identifier used for the analysis
.runId
Identifier used for the sequencing run
.controlFlag
Indicates whether the sample is a control. It is based on the ControlFlag field in the sample .tsv and can be set to "POS", "NEG", "BLANK", or "-"
.dragenVersion
DRAGEN release version
.analysisPipelineVersion
Analysis Pipeline release version
.testType
Type of test panel ("RPIP", "UPIP", "VSPv2", "Custom")
.testVersion
Test panel release version
.testName
Full name of test panel
.testUse
Test use. "For Research Use Only. Not for use in diagnostic procedures"
.reportTime
Date and time the report was generated
.warnings
List of warnings encountered during the analysis
.errors
List of errors encountered during the analysis
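For downstream scripting, the top-level fields can be read with Python's standard json module. This is a sketch: this section does not state whether ap.json holds a single report object or a list of them, so load_reports (a hypothetical helper) accepts both, and missing fields are tolerated with dict.get.

```python
import json
from typing import Any, Dict, List


def load_reports(path: str) -> List[Dict[str, Any]]:
    """Load ap.json, accepting either one report object or a list of them."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    return data if isinstance(data, list) else [data]


def summarize_report(report: Dict[str, Any]) -> str:
    """Build a one-line summary from the top-level fields described above."""
    warnings = report.get("warnings") or []
    errors = report.get("errors") or []
    return (
        f"{report.get('accession')} | {report.get('testType')} {report.get('testVersion')} | "
        f"control={report.get('controlFlag')} | warnings={len(warnings)} errors={len(errors)}"
    )


# Example usage (the path is illustrative):
# for report in load_reports("/data/output/ap.json"):
#     print(summarize_report(report))
```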
.qcReport.sampleQc Node
This section contains information about sample quality control (QC). The fields are relative to .qcReport.sampleQc
.totalRawBases
Number of base pairs in sample before read QC processing
.totalRawReads
Number of reads in sample before read QC processing
.uniqueReads
Number of distinct reads in sample before read QC processing
.uniqueReadsProportion
Proportion of distinct reads in sample before read QC processing
.preQualityMeanReadLength
Average read length before read QC processing
.postQualityMeanReadLength
Average read length after read QC processing
.postQualityReads
Number of reads in sample after read QC processing, inclusive of any duplicate reads
.postQualityReadsProportion
Proportion of post-quality reads in sample relative to total raw reads
.removedInDehostingReads
Number of host reads in sample removed during dehosting (host = human)
.removedInDehostingReadsProportion
Proportion of host reads in sample removed relative to total raw reads (host = human)
.entropy
Shannon entropy of the counts of 5-mers in the reads after read QC processing, which is a measure of randomness
.gContent
Proportion of guanine (G) base calls in reads after read QC processing
.libraryQScore
Quality score of the library after read QC processing
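To make the .entropy definition concrete, the sketch below computes the Shannon entropy of 5-mer counts pooled across a set of reads. The logarithm base (bits) and the pooling of all reads into one count table are assumptions; the pipeline's exact normalization is not described here, so values from this sketch are not expected to match ap.json.

```python
import math
from collections import Counter
from typing import Iterable


def kmer_entropy(reads: Iterable[str], k: int = 5) -> float:
    """Shannon entropy, in bits, of the k-mer count distribution pooled over all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i : i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = -sum(p * log2(p)) over observed k-mers, with p = count / total.
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())
```

Reads dominated by a single repeated 5-mer, such as homopolymer runs, give a value near zero, while more diverse sequence gives higher values.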
.qcReport.enrichmentFactor Node
This section contains information about the enrichment factor calculation. Detection of an appropriate Internal Control is required. The fields are relative to .qcReport.enrichmentFactor
.value
Enrichment factor value reflecting how well targeted regions were enriched
.category
Enrichment factor category: "poor", "fair", "good", or "not calculated"
.qcReport.sampleComposition Node
This section contains information about the composition of the sample. The fields are relative to .qcReport.sampleComposition
.readClassification
Proportion of post-quality reads classified to the following categories:
.readClassification.targetedMicrobial
Targeted microbial
.readClassification.targetedInternalControl
Targeted Internal Control
.readClassification.untargeted
Untargeted
.readClassification.ambiguous
More than one category
.readClassification.unclassified
No category
.readClassification.lowComplexity
Low complexity
.targetedMicrobial
Proportion of post-quality targeted microbial reads classified to the following sub-categories:
.targetedMicrobial.viral
Viral targeted
.targetedMicrobial.bacterial
Bacterial targeted
.targetedMicrobial.fungal
Fungal targeted
.targetedMicrobial.parasitic
Parasitic targeted
.targetedMicrobial.bacterialAmr
Bacterial AMR targeted
.untargeted
Proportion of post-quality untargeted reads classified to the following sub-categories:
.untargeted.viral
Viral untargeted
.untargeted.bacterial
Bacterial untargeted
.untargeted.fungal
Fungal untargeted
.untargeted.parasitic
Parasitic untargeted
.untargeted.bacterialAmr
Bacterial AMR untargeted
.untargeted.internalControl
Internal Control untargeted
.untargeted.human
Human untargeted
.viral
Proportion of post-quality viral reads classified to the following categories:
.viral.targeted
Viral targeted
.viral.untargeted
Viral untargeted
.viral.untargetedSubcategories
Proportion of post-quality viral untargeted reads classified to the following sub-categories:
.viral.untargetedSubcategories.panel
Viral panel members
.viral.untargetedSubcategories.phage
Viral phage
.viral.untargetedSubcategories.other
Viral other (not a panel member or phage)
.bacterial
Proportion of post-quality bacterial reads classified to the following categories:
.bacterial.targeted
Bacterial targeted
.bacterial.untargeted
Bacterial untargeted
.bacterial.untargetedSubcategories
Proportion of post-quality bacterial untargeted reads classified to the following sub-categories:
.bacterial.untargetedSubcategories.panel
Bacterial panel members
.bacterial.untargetedSubcategories.ribosomalDna
Bacterial ribosomal DNA (16S)
.bacterial.untargetedSubcategories.plasmid
Bacterial plasmids
.bacterial.untargetedSubcategories.other
Bacterial other (not a panel member, ribosomal DNA, or plasmid)
.fungal
Proportion of post-quality fungal reads classified to the following categories:
.fungal.targeted
Fungal targeted
.fungal.untargeted
Fungal untargeted
.fungal.untargetedSubcategories
Proportion of post-quality fungal untargeted reads classified to the following sub-categories:
.fungal.untargetedSubcategories.panel
Fungal panel members
.fungal.untargetedSubcategories.ribosomalDna
Fungal ribosomal DNA (18S)
.fungal.untargetedSubcategories.other
Fungal other (not a panel member or ribosomal DNA)
.parasitic
Proportion of post-quality parasitic reads classified to the following categories:
.parasitic.targeted
Parasitic targeted
.parasitic.untargeted
Parasitic untargeted
.parasitic.untargetedSubcategories
Proportion of post-quality parasitic untargeted reads classified to the following sub-categories:
.parasitic.untargetedSubcategories.panel
Parasitic panel members
.parasitic.untargetedSubcategories.ribosomalDna
Parasitic ribosomal DNA (18S)
.parasitic.untargetedSubcategories.other
Parasitic other (not a panel member or ribosomal DNA)
.human
Proportion of post-quality human reads classified to the following categories:
.human.untargeted
Human untargeted
.human.untargetedSubcategories
Proportion of post-quality human untargeted reads classified to the following sub-categories:
.human.untargetedSubcategories.ribosomalDna
Human ribosomal DNA
.human.untargetedSubcategories.codingSequence
Human coding sequence
.human.untargetedSubcategories.other
Human other (not ribosomal DNA or coding sequence)
.internalControl
Proportion of post-quality Internal Control reads classified to the following categories:
.internalControl.targeted
Internal Control targeted
.internalControl.untargeted
Internal Control untargeted
.microbialAndInternalControl
Proportion of post-quality Microbial and Internal Control reads classified to the following categories:
.microbialAndInternalControl.targeted
Microbial and Internal Control targeted
.microbialAndInternalControl.untargeted
Microbial and Internal Control untargeted
.bacterialAmr
Proportion of post-quality bacterial AMR reads classified to the following categories:
.bacterialAmr.targeted
Bacterial AMR targeted
.bacterialAmr.untargeted
Bacterial AMR untargeted
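Because the composition fields are nested, it can be convenient to flatten them into the same dotted paths used in the descriptions above. The sketch below assumes each proportion is stored as a JSON number; iter_proportions is a hypothetical helper.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_proportions(node: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, float]]:
    """Yield (dotted path, value) for every numeric leaf in a nested dict."""
    for key, value in node.items():
        path = f"{prefix}.{key}"
        if isinstance(value, dict):
            yield from iter_proportions(value, path)
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            yield path, float(value)


# Example usage, given a report loaded from ap.json:
# for path, value in iter_proportions(report["qcReport"]["sampleComposition"]):
#     print(f"{path:<50} {value:.4f}")
```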
.qcReport.internalControls Node
This section contains information about internal control detection. The value of the .qcReport.internalControls field is an array of objects containing name and RPKM information for each Internal Control.
.userOptions Node
This section gives information about analysis options specified by the user. The fields are relative to .userOptions
.quantitativeInternalControlName
Quantitative Internal Control used for microorganism absolute quantification (recommendation: Enterobacteria phage T7)
.quantitativeInternalControlConcentration
Quantitative Internal Control concentration (recommendation: 1.21 x 10^7 copies/mL of sample)
.readQcEnabled
Boolean indicating if read QC (trimming and filtering based on quality and read length) is enabled
.readClassificationSensitivity
(VSP V2 only) Sensitivity threshold for classifying reads. Determines whether alignment should proceed for a microorganism and/or reference sequence. Value is an integer with a valid range of 1 to 1000, inclusive
.customPanelFastaFile
(Custom Panel only) Name of the custom reference FASTA file
.customPanelBedFile
(Custom Panel only) Name of the custom reference BED file
.targetReport.microorganisms[] Node
The value of the .targetReport.microorganisms[] field is an array of objects containing information about detected microorganisms. The following table describes one .targetReport.microorganisms[] object. The fields are relative to .targetReport.microorganisms[]
.class
Microorganism class ("viral", "bacterial", "fungal", "parasite")
.name
Name of microorganism
.coverage
Proportion of targeted microorganism reference sequence bases that appear in sample sequencing reads
.ani
Average nucleotide identity of consensus sequence to targeted microorganism reference sequences
.medianDepth
Median depth of sample sequencing reads aligned to targeted microorganism reference sequences, indicating the median number of times each targeted microorganism reference sequence base appears in sample sequencing reads
.condensedDepthVector
Read depth across the targeted microorganism reference sequences, condensed to 256 bins
.rpkm
Normalized representation of the number of sample sequencing reads aligned to targeted microorganism reference sequences (targeted Reads mapped Per Kilobase of targeted sequence per Million quality-filtered reads)
.alignedReadCount
Number of sample sequencing reads that aligned to targeted microorganism reference sequences
.kmerReadCount
(UPIP only) Number of sample sequencing reads classified to targeted microorganism reference sequences
.absoluteQuantityRatio
Numerical absolute quantification value. Quantitative internal control required for calculation
.absoluteQuantityRatioFormatted
Formatted absolute quantification value with units. Quantitative internal control required for calculation
.phenotypicGroup
(RPIP, UPIP only) Grouping indicating general association with normal flora, colonization, or contamination from the environment or other sources, as well as general association with disease
.associatedAmrMarkers
(Bacteria only) Information about the bacterial AMR markers associated with the microorganism
.associatedAmrMarkers.applicable
Boolean indicating whether one or more bacterial AMR markers are associated with the microorganism
.associatedAmrMarkers.detected
List of detected bacterial AMR markers associated with the microorganism
.associatedAmrMarkers.predicted
List of predicted bacterial AMR markers associated with the microorganism
.consensusGenomeSequences
(RPIP, VSP V2 viruses only) Information about the majority consensus genome (or segment) sequence
.consensusGenomeSequences.sequence
Consensus genome (or segment) sequence bases
.consensusGenomeSequences.referenceAccession
Accession of the reference genome (or segment) sequence
.consensusGenomeSequences.referenceDescription
Description of the reference genome (or segment) sequence
.consensusGenomeSequences.referenceLength
Length of the reference genome (or segment) sequence
.consensusGenomeSequences.maximumAlignmentLength
Longest contiguous alignment between consensus sequence and reference genome (or segment) sequence
.consensusGenomeSequences.maximumGapLength
Longest contiguous alignment gap (insertion or deletion) between consensus sequence and reference genome (or segment) sequence
.consensusGenomeSequences.maximumUnalignedLength
Longest section of the reference genome (or segment) sequence not aligned to by consensus sequence
.consensusGenomeSequences.coverage
Proportion of reference genome (or segment) sequence bases that appear in sample sequencing reads
.consensusGenomeSequences.ani
Average nucleotide identity of consensus sequence to reference genome (or segment) sequence
.consensusGenomeSequences.alignedReadCount
Number of sample sequencing reads that aligned to reference genome (or segment) sequence
.consensusGenomeSequences.medianDepth
Median depth of sample sequencing reads aligned to reference genome (or segment) sequence, indicating the median number of times each reference genome (or segment) sequence base appears in sample sequencing reads
.consensusGenomeSequences.targetAnnotation
List of targeted region annotations for the reference genome (or segment) sequence. Each annotation is a JSON object with the following fields: start (int), end (int), strand (string: "+", "-"), target_name (string), type (string)
.consensusGenomeSequences.condensedDepthVector
Read depth across the reference genome (or segment) sequence, condensed to 256 bins
.consensusTargetSequences
(RPIP viruses only) Information about the majority targeted region consensus sequences
.consensusTargetSequences.sequence
Consensus targeted region sequence bases
.consensusTargetSequences.name
Name of the targeted region
.consensusTargetSequences.referenceAccession
Accession of the targeted region reference sequence
.consensusTargetSequences.depthVector
Read depth across the targeted region reference sequence, not condensed
.predictionInformation
Information about microorganism prediction results
.predictionInformation.predictedPresent
Boolean indicating whether the microorganism passed its reporting logic algorithm
.predictionInformation.notes
List of notes about the prediction result
.predictionInformation.subpanels
List of pre-defined subpanels that the microorganism belongs to
.predictionInformation.relatedMicroorganisms
Array of objects with information about genetically related microorganisms. See below for details
.variants
(All VSP V2 viruses; for RPIP, SARS-CoV-2 and FluA/B/C only) Information about viral variants. See below for details
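A sketch that extracts the microorganisms that passed their reporting logic, using only field names documented above. predicted_microorganisms is a hypothetical helper, and it assumes .predictionInformation is a single object, as the dotted field names in this table suggest.

```python
from typing import Any, Dict, List


def predicted_microorganisms(report: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Microorganisms whose predictionInformation.predictedPresent is true."""
    hits = []
    for organism in report.get("targetReport", {}).get("microorganisms", []):
        prediction = organism.get("predictionInformation") or {}
        if prediction.get("predictedPresent"):
            hits.append(
                {
                    "class": organism.get("class"),
                    "name": organism.get("name"),
                    "coverage": organism.get("coverage"),
                    "medianDepth": organism.get("medianDepth"),
                    "rpkm": organism.get("rpkm"),
                    # Only present when a quantitative internal control was used.
                    "quantity": organism.get("absoluteQuantityRatioFormatted"),
                }
            )
    return hits
```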
.targetReport.microorganisms[].predictionInformation[].relatedMicroorganisms[] Node
The value of the .targetReport.microorganisms[].predictionInformation[].relatedMicroorganisms[] field is an array of objects containing information about genetically related microorganisms. The following table describes one .targetReport.microorganisms[].predictionInformation[].relatedMicroorganisms[] object. The fields are relative to .targetReport.microorganisms[].predictionInformation[].relatedMicroorganisms[]
.name
Name of related microorganism
.onPanel
Boolean indicating whether the related microorganism is a panel member
.kmerReadCount
(UPIP only) Number of sample sequencing reads classified to related microorganism reference sequences
.coverage
Proportion of related microorganism reference sequence bases that appear in sample sequencing reads
.ani
Average nucleotide identity of consensus sequence to related microorganism reference sequences
.alignedReadCount
Number of sample sequencing reads that aligned to related microorganism reference sequences
.targetReport.microorganisms[].variants[] Node
The value of the .targetReport.microorganisms[].variants[] field is an array of objects containing information about viral variants (all VSP V2 viruses; for RPIP, SARS-CoV-2 and FluA/B/C only). The following table describes one .targetReport.microorganisms[].variants[] object. The fields are relative to .targetReport.microorganisms[].variants[]
.referenceAccession
Accession of reference genome (or segment) sequence used for variant calling
.segment
(Segmented viruses only) Segment number of reference segment sequence
.ntChange
Nucleotide change associated with variant
.referencePosition
Variant position in viral reference genome (or segment) sequence
.referenceAllele
Reference allele at variant position
.variantAllele
Variant allele
.depth
Variant depth, indicating the number of times variant position appears in sample sequencing reads
.alleleFrequency
Frequency of variant allele in sample sequencing reads
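A sketch for pulling higher-confidence variant calls out of the report. The thresholds are arbitrary placeholders, not recommendations, and the code assumes alleleFrequency is a fraction between 0 and 1. filter_variants relies only on alleleFrequency and depth, which the custom reference variants described later also carry; both function names are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple


def filter_variants(
    variants: Iterable[Dict[str, Any]], min_frequency: float = 0.5, min_depth: int = 10
) -> List[Dict[str, Any]]:
    """Keep variants whose alleleFrequency and depth meet both thresholds."""
    return [
        v
        for v in variants
        if (v.get("alleleFrequency") or 0) >= min_frequency and (v.get("depth") or 0) >= min_depth
    ]


def high_frequency_calls(
    report: Dict[str, Any], min_frequency: float = 0.5, min_depth: int = 10
) -> List[Tuple[Any, Any, Any]]:
    """(microorganism name, ntChange, alleleFrequency) for each variant passing the filter."""
    calls = []
    for organism in report.get("targetReport", {}).get("microorganisms", []):
        for variant in filter_variants(organism.get("variants") or [], min_frequency, min_depth):
            calls.append((organism.get("name"), variant.get("ntChange"), variant.get("alleleFrequency")))
    return calls
```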
.targetReport.amrMarkers[] Node
The value of the .targetReport.amrMarkers[] field is an array of objects containing information about detected bacterial AMR markers. The following table describes one .targetReport.amrMarkers[] object. The fields are relative to .targetReport.amrMarkers[]
.class
Microorganism class ("bacterial")
.cardModelType
Bacterial AMR marker model type in the Comprehensive Antibiotic Resistance Database (CARD) ("homolog", "protein variant", "rRNA variant")
.cardGeneFamily
Bacterial AMR marker gene family in the Comprehensive Antibiotic Resistance Database (CARD)
.name
Bacterial AMR marker name
.cardName
Bacterial AMR marker name in the Comprehensive Antibiotic Resistance Database (CARD)
.ncbiName
Bacterial AMR marker name in the National Center for Biotechnology Information (NCBI) Reference Gene Catalog
.referenceAccession
Accession of the bacterial AMR marker reference sequence
.coverage
Proportion of bacterial AMR marker reference sequence residues that appear in sample sequencing reads (protein alignment for "homolog" and "protein variant" model types; nucleotide alignment for "rRNA variant" model type)
.pid
Percent identity of consensus sequence aligned to bacterial AMR marker reference sequence (protein alignment for "homolog" and "protein variant" model types; nucleotide alignment for "rRNA variant" model type)
.medianDepth
Median depth of sample sequencing reads aligned to bacterial AMR marker reference sequence, indicating the median number of times each bacterial AMR marker sequence residue appears in sample sequencing reads (protein alignment for "homolog" and "protein variant" model types; nucleotide alignment for "rRNA variant" model type)
.rpkm
Normalized representation of the number of sample sequencing reads aligned to bacterial AMR reference sequence (protein alignment for "homolog" and "protein variant" model types; nucleotide alignment for "rRNA variant" model type)
.alignedReadCount
Number of sample sequencing reads that aligned to bacterial AMR reference sequence (protein alignment for "homolog" and "protein variant" model types; nucleotide alignment for "rRNA variant" model type)
.nucleotideConsensusSequence
Nucleotide consensus sequence bases
.proteinConsensusSequence
Protein consensus sequence bases
.nucleotideDepthVector
Read depth across the bacterial AMR marker nucleotide reference sequence, not condensed
.proteinDepthVector
Read depth across the bacterial AMR marker protein reference sequence, not condensed
.associatedMicroorganisms
Information about the microorganisms associated with the bacterial AMR marker
.associatedMicroorganisms.all
List of all microorganisms associated with the bacterial AMR marker
.associatedMicroorganisms.detected
List of detected microorganisms associated with the bacterial AMR marker
.associatedMicroorganisms.predicted
List of predicted microorganisms associated with the bacterial AMR marker
.predictionInformation
Information about bacterial AMR marker prediction results
.predictionInformation.predictedPresent
Boolean indicating whether the bacterial AMR marker passed its reporting logic algorithm
.predictionInformation.confidence
Confidence level of bacterial AMR marker prediction ("high", "medium", "low")
.predictionInformation.notes
List of notes about the prediction result
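The following sketch lists bacterial AMR markers that passed their reporting logic, together with their confidence level and the detected microorganisms they are associated with. predicted_amr_markers is a hypothetical helper, and it assumes associatedMicroorganisms.detected is a plain list of names.

```python
from typing import Any, Dict, List, Sequence


def predicted_amr_markers(
    report: Dict[str, Any], confidences: Sequence[str] = ("high", "medium", "low")
) -> List[Dict[str, Any]]:
    """AMR markers that passed their reporting logic, filtered by confidence level."""
    rows = []
    for marker in report.get("targetReport", {}).get("amrMarkers", []):
        prediction = marker.get("predictionInformation") or {}
        if not prediction.get("predictedPresent"):
            continue
        if prediction.get("confidence") not in confidences:
            continue
        associated = marker.get("associatedMicroorganisms") or {}
        rows.append(
            {
                "name": marker.get("name"),
                "cardGeneFamily": marker.get("cardGeneFamily"),
                "confidence": prediction.get("confidence"),
                "detectedMicroorganisms": associated.get("detected") or [],
            }
        )
    return rows
```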
.targetReport.amrMarkers[].variants[] Node
The value of the .targetReport.amrMarkers[].variants[] field is an array of objects containing information about variants for bacterial AMR markers with "protein variant" or "rRNA variant" model types. The following table describes one .targetReport.amrMarkers[].variants[] object. The fields are relative to .targetReport.amrMarkers[].variants[]
.category
Variant category ("Bacterial Variant; Known AMR")
.referenceSourceMicroorganism
Microorganism that reference sequence is associated with in NCBI
.comments
List of additional information regarding the variant
.product
Protein product of gene
.ntChange
Nucleotide change associated with variant
.referencePosition
Variant position in reference sequence
.referenceAllele
Reference allele at variant position
.variantAllele
Variant allele
.depth
Variant depth, indicating the number of times variant position appears in sample sequencing reads
.alleleFrequency
Frequency of variant allele in sample sequencing reads
.annotation
Type of change (e.g. "Nonsynonymous Variant")
.aaChange
Amino acid change associated with variant
.epistaticGroups
List of epistatic groups variant is associated with
.targetReport.customReferences[] Node
This section contains information about custom reference detection results and is only present for custom database analyses. When only a custom reference FASTA file is provided (no BED file), each .targetReport.customReferences[] object contains information for a single reference sequence. When both a FASTA and BED file are provided, each .targetReport.customReferences[] object contains information for a single genome/microorganism, which can be a collection of one or more reference sequences. The fields are relative to .targetReport.customReferences[]
.name
Provided name of custom reference sequence, accession, genome, or microorganism
.coverage
Proportion of custom reference sequence bases that appear in sample sequencing reads
.ani
Average nucleotide identity of consensus sequence to custom reference sequence or, if specified, collection of one or more custom reference sequences
.medianDepth
Median depth of sample sequencing reads aligned to custom reference sequence or, if specified, collection of one or more custom reference sequences, indicating the median number of times each custom reference sequence base appears in sample sequencing reads
.condensedDepthVector
Read depth across custom reference sequence or, if specified, collection of one or more custom reference sequences, condensed to 256 bins
.rpkm
Normalized number of sample sequencing reads aligned to custom reference sequence or, if specified, collection of one or more custom reference sequences (targeted Reads mapped Per Kilobase of targeted sequence per Million quality-filtered reads)
.alignedReadCount
Number of sample sequencing reads that aligned to custom reference sequence or, if specified, collection of one or more custom reference sequences
.consensusSequences
Array of objects with information about each consensus sequence. See below for details
.variants
Array of objects with information about variants detected in custom reference sequence or, if specified, collection of one or more custom reference sequences. See below for details
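The RPKM definition given above can be written out as arithmetic. The sketch below follows the standard formula implied by the parenthetical (reads per kilobase of target per million quality-filtered reads); which read total the pipeline uses as the quality-filtered denominator and how target length is measured for a collection of sequences are assumptions, so it is not expected to reproduce ap.json values exactly.

```python
def rpkm(aligned_reads: int, target_length_bp: int, quality_filtered_reads: int) -> float:
    """Reads mapped per kilobase of target per million quality-filtered reads."""
    if target_length_bp <= 0 or quality_filtered_reads <= 0:
        raise ValueError("target length and read total must be positive")
    kilobases = target_length_bp / 1_000
    millions = quality_filtered_reads / 1_000_000
    return aligned_reads / (kilobases * millions)


# 500 reads on a 10 kb target in a library of 2 million quality-filtered reads:
print(rpkm(500, 10_000, 2_000_000))  # 25.0
```

Normalizing by both target length and library size is what lets RPKM values be compared across references of different lengths and samples of different sequencing depth.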
.targetReport.customReferences[].consensusSequences[] Node
The value of the .targetReport.customReferences[].consensusSequences[] field is an array of objects containing majority consensus sequence information for a single custom reference sequence. When only a FASTA file is provided (no BED file), there will be only one object in the array. When both a FASTA and BED file are provided, there may be more than one object in the array. The fields are relative to .targetReport.customReferences[].consensusSequences[]
.sequence
Majority consensus sequence bases
.referenceAccession
Accession of custom reference sequence
.referenceDescription
Description of custom reference sequence
.referenceLength
Length of custom reference sequence
.coverage
Proportion of custom reference sequence bases that appear in sample sequencing reads
.ani
Average nucleotide identity of consensus sequence to custom reference sequence
.medianDepth
Median depth of sample sequencing reads aligned to custom reference sequence, indicating the median number of times each custom reference sequence base appears in sample sequencing reads
.depthVector
Read depth across custom reference sequence, not condensed
.alignedReadCount
Number of sample sequencing reads that aligned to custom reference sequence
.maximumAlignmentLength
Longest contiguous alignment between consensus sequence and custom reference sequence
.maximumGapLength
Longest contiguous alignment gap (insertion or deletion) between consensus sequence and custom reference sequence
.maximumUnalignedLength
Longest section of custom reference sequence not aligned to by consensus sequence
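Consensus sequences are often exported to FASTA for downstream tools. The hypothetical custom_consensus_fasta function below is a sketch that writes one record per .targetReport.customReferences[].consensusSequences[] object; the header format name|referenceAccession and the 60-character line width are arbitrary choices, not part of the ap.json specification.

```python
from typing import Any, Dict, List


def custom_consensus_fasta(report: Dict[str, Any], width: int = 60) -> str:
    """Render each custom reference consensus sequence as a FASTA record."""
    lines: List[str] = []
    for reference in report.get("targetReport", {}).get("customReferences", []):
        for consensus in reference.get("consensusSequences") or []:
            sequence = consensus.get("sequence") or ""
            if not sequence:
                continue  # skip empty consensus sequences
            lines.append(f">{reference.get('name')}|{consensus.get('referenceAccession')}")
            # Wrap the sequence into fixed-width lines.
            lines.extend(sequence[i : i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + ("\n" if lines else "")
```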
.targetReport.customReferences[].variants[] Node
The value of the .targetReport.customReferences[].variants[] field is an array of objects containing information about a single detected variant. The fields are relative to .targetReport.customReferences[].variants[]
.ntChange
Nucleotide change associated with variant
.referenceAccession
Accession of custom reference sequence used for variant calling
.referencePosition
Variant position in custom reference sequence
.referenceAllele
Reference allele at variant position
.variantAllele
Variant allele
.depth
Variant depth, indicating the number of times variant position appears in sample sequencing reads
.alleleFrequency
Frequency of variant allele in sample sequencing reads