Explify Analysis Pipeline
Description
The Explify Analysis Pipeline offers a dedicated informatics solution for the Illumina Respiratory Pathogen ID/AMR Enrichment Panel Kit (RPIP), Illumina Urinary Pathogen ID/AMR Enrichment Panel Kit (UPIP), and Illumina Viral Surveillance V2 Kit (VSP V2). The application delivers powerful, simple data analysis for simultaneous detection, quantification, and profiling of microorganisms and antimicrobial resistance (AMR) markers.
RPIP: Targeted enrichment of >280 RNA and DNA respiratory pathogens, including SARS-CoV-2, Influenza viruses, Respiratory syncytial virus, Mycobacterium and Legionella species, and >4000 AMR markers.
UPIP: Targeted enrichment of >170 genitourinary pathogens, including fastidious, slow-growing, and anaerobic uropathogens, sexually transmitted microorganisms, and >4000 bacterial AMR markers.
VSP V2: Targeted enrichment for whole-genome sequencing (WGS) of 200 RNA and DNA viruses prioritized as high-risk to public health, zoonotic surveillance, and biotech, and >200 viral AMR markers.
The Explify Analysis Pipeline can also be used to analyze FASTQ/FASTA read files with a set of custom reference sequences.
Command Line Settings
Option | Description |
---|---|
Required Inputs | |
| Enables the Explify Analysis Pipeline. (Default=false) |
| Prefix for all output files. |
| Directory for all output files. |
--explify-sample-list | Input sample list .tsv file with sample IDs, FASTQs, etc. |
--explify-test-panel-name | "RPIP", "UPIP", "VSPv2", "Custom". |
--explify-test-panel-version | Set to test panel version (e.g. "1.0.0"). |
--explify-ref-db-dir | Path to root directory for Explify Database files. |
Optional Inputs | |
| Area for temporary files. Size must be greater than size of all FASTQ files multiplied by 3. |
--explify-load-db-ram | Option to load database into RAM if not on ramdisk. (Default=false). |
| Option to turn off read QC on FASTQs before analysis. (Default=false). |
--explify-internal-control | Option to set internal control from an accepted list. (Default="Enterobacteria phage T7") |
--explify-internal-control-concentration | Option to set internal control concentration. (Default=12100000) |
| Option to set the number of CPUs available for processing. |
| Option to set sensitivity threshold. Range: 0 < Integer < 1000. Only valid for VSPv2. (Default=5). |
--explify-custom-ref-fasta | Reference FASTA file. Required for Custom reference DBs. |
--explify-custom-ref-bed | Reference BED file. Optional for Custom reference DBs. |
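The temporary-area requirement in the table above (more than 3 times the combined size of all FASTQ files) can be checked before starting a run. The following is a minimal sketch, not part of the pipeline; the function names are hypothetical.

```python
import os
import shutil


def required_temp_bytes(fastq_paths):
    """Return 3x the combined size of the FASTQ files, in bytes.

    The temporary area must be strictly larger than this value.
    """
    return 3 * sum(os.path.getsize(path) for path in fastq_paths)


def temp_area_is_large_enough(temp_dir, fastq_paths):
    """Return True when the free space in temp_dir exceeds the 3x floor."""
    free_bytes = shutil.disk_usage(temp_dir).free
    return free_bytes > required_temp_bytes(fastq_paths)
```

The check uses free space at the moment it is called, so other jobs writing to the same disk can still invalidate the result.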
Example Command Line
Input Details
Sample Input List
Applies to: --explify-sample-list
The sample input list is a column-formatted file with tab separations between the columns (i.e., a .tsv file).
Notes:
The SampleID values must be unique.
BatchID and RunID are to help users track and manage sample analyses. Often the BatchID is used to track libraries that were prepared together, and the RunID is used to track sequencing runs. They can also be left blank.
The ControlFlag value can be POS, NEG, BLANK, or left empty.
POS is used to indicate a positive control sample.
NEG is used to indicate a negative control sample.
BLANK is used to indicate a blank control sample (e.g. buffer only).
If there are multiple FASTQ files, they are tab delimited.
Please be very careful when editing tsv files. Some editors replace tabs with spaces without alerting the user.
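The notes above can be turned into a quick pre-flight check. The sketch below is illustrative only: it ASSUMES the columns appear in the order SampleID, BatchID, RunID, ControlFlag, followed by one or more FASTQ paths, and that the ControlFlag comparison is case-sensitive. Neither assumption is confirmed by this page, and the function name is hypothetical.

```python
import csv
import io

ALLOWED_CONTROL_FLAGS = {"POS", "NEG", "BLANK", ""}


def check_sample_list(tsv_text, has_header=True):
    """Return a list of human-readable problems found in a sample list.

    ASSUMPTION: columns are SampleID, BatchID, RunID, ControlFlag, FASTQ...
    """
    problems = []
    seen_ids = set()
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if has_header:
        rows = rows[1:]
    for number, row in enumerate(rows, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) < 5:
            # Typical symptom of an editor replacing tabs with spaces.
            problems.append(
                f"row {number}: expected at least 5 tab-separated columns, found {len(row)}"
            )
            continue
        sample_id, control_flag = row[0], row[3]
        if sample_id in seen_ids:
            problems.append(f"row {number}: duplicate SampleID {sample_id!r}")
        seen_ids.add(sample_id)
        if control_flag not in ALLOWED_CONTROL_FLAGS:
            problems.append(f"row {number}: unexpected ControlFlag {control_flag!r}")
    return problems
```

An empty return value means none of these particular checks failed; it does not guarantee that the pipeline will accept the file.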
Internal Control
Applies to: --explify-internal-control, --explify-internal-control-concentration
The user may specify one of the internal controls listed below. If NONE is specified, the internal control concentration is ignored. These are case-sensitive and must be input exactly as they appear:
Allobacillus halotolerans
Armored RNA Quant Internal Process Control
Enterobacteria phage T7 (This is the default)
Escherichia virus MS2
Escherichia virus Qbeta
Escherichia virus T4
Imtechella halotolerans
Phocid alphaherpesvirus 1
Phocine morbillivirus
Truepera radiovictrix
NONE
The internal control concentration is an integer representing the number of copies/mL of sample for the internal control.
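Because the names are case-sensitive, it can help to validate the two settings before launching a run. This is a minimal sketch using the accepted names and defaults listed above; the function name is hypothetical, and the pipeline may apply checks beyond those shown here.

```python
ACCEPTED_INTERNAL_CONTROLS = (
    "Allobacillus halotolerans",
    "Armored RNA Quant Internal Process Control",
    "Enterobacteria phage T7",
    "Escherichia virus MS2",
    "Escherichia virus Qbeta",
    "Escherichia virus T4",
    "Imtechella halotolerans",
    "Phocid alphaherpesvirus 1",
    "Phocine morbillivirus",
    "Truepera radiovictrix",
    "NONE",
)
DEFAULT_NAME = "Enterobacteria phage T7"
DEFAULT_CONCENTRATION = 12100000  # copies/mL of sample


def resolve_internal_control(name=DEFAULT_NAME, concentration=DEFAULT_CONCENTRATION):
    """Validate the internal control settings and return (name, concentration).

    The concentration is returned as None when name is "NONE", mirroring the
    statement that it is ignored in that case.
    """
    if name not in ACCEPTED_INTERNAL_CONTROLS:
        raise ValueError(f"unknown internal control (names are case-sensitive): {name!r}")
    if name == "NONE":
        return name, None
    if isinstance(concentration, bool) or not isinstance(concentration, int):
        raise ValueError("internal control concentration must be an integer")
    return name, concentration
```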
Reference Databases
Applies to: --explify-ref-db-dir, --explify-test-panel-name, --explify-test-panel-version, --explify-load-db-ram, --explify-custom-ref-fasta, --explify-custom-ref-bed
An Explify Reference Database is required to run the Explify Analysis Pipeline in DRAGEN. The databases are stored remotely and must be downloaded prior to running an analysis. The database download script provided to facilitate the download is described below.
Directory Setup
Prior to downloading the databases, create a directory that will be dedicated to storing them. It is recommended that the directory be on a disk with at least 150 GB of free space. The path to this directory will be used for the -d parameter when the download script is run in subsequent steps. The examples below use "explify-databases/".
Obtaining the Download Script
Download and management of Explify reference databases is handled by a shell script. The script can be downloaded with the following command:
Seeing What Databases are Available for Download
The search subcommand can be used to list what databases can be downloaded:
The -d argument is the base directory used for storage of the databases.
Optionally, when a test panel name is specified with the -p argument, the results will be limited to that panel.
Optionally, setting the -n argument will filter the search to databases that have not already been downloaded.
Downloading a Database
The download subcommand is used to download the database files for a test panel:
The -d argument is the base directory used for storage of the databases.
The -p argument is the test panel name.
The -v argument is the test panel version.
The -n argument is the number of CPUs that can be used to download the files (defaults to 1).
Additional notes:
In this example, after the UPIP-8.3.0 files are downloaded, additional required files will be downloaded to a subdirectory named "common"
After the files are downloaded, their checksums will be automatically checked
Due to the size of some of the files, this command will take some time. It is best to run it via screen or nohup.
Listing Downloaded Databases
The list subcommand is used to view the databases that have already been downloaded:
The -d argument is the base directory used for storage of the databases.
Optionally, when a test panel name is specified with the -p argument, the results will be limited to that panel.
Checking Database Integrity
The download subcommand will automatically check the file checksums after download. The check subcommand can also be used on its own to check the files:
The -d argument is the base directory used for storage of the databases.
The -p argument is the test panel name.
The -v argument is the test panel version.
The -n argument is the number of CPUs that can be used to download the files (defaults to 1).
Using the Databases with the Explify Analysis Pipeline
Assume the Explify database distributable, when unpacked, has a root directory name of /explify-databases. The database files will be organized in this root directory first by test panel type, then by test panel version:
To run an analysis with RPIP 6.3.0, for example, the following inputs would be needed:
The Explify Analysis Pipeline will use these inputs to navigate to the specified database location, namely /explify-databases/RPIP/6.3.0.
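The relationship between the three database inputs and the resulting location can be sketched as follows. This is an illustration of the layout described above (root directory, then panel name, then panel version), not the pipeline's own code; the function names are hypothetical.

```python
from pathlib import PurePosixPath


def explify_db_location(ref_db_dir, panel_name, panel_version):
    """Return the directory implied by root / panel name / panel version."""
    return PurePosixPath(ref_db_dir) / panel_name / panel_version


def database_arguments(ref_db_dir, panel_name, panel_version):
    """Return the three database-related options as a list of strings."""
    return [
        f"--explify-ref-db-dir={ref_db_dir}",
        f"--explify-test-panel-name={panel_name}",
        f"--explify-test-panel-version={panel_version}",
    ]


if __name__ == "__main__":
    print(explify_db_location("/explify-databases", "RPIP", "6.3.0"))
    print(" ".join(database_arguments("/explify-databases", "RPIP", "6.3.0")))
```

Checking that the computed directory exists before launching a run is a cheap way to catch a mistyped panel name or a version that has not been downloaded.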
If the databases are stored on a normal file system, it is recommended that you set --explify-load-db-ram=true. This tells the Explify Analysis Pipeline to load the databases into memory for faster analysis. The databases can also be stored on a RAM disk, which reduces load time over many Explify Analysis Pipeline runs. In this case, it is recommended to set --explify-load-db-ram=false.
Using the Custom Database Option
To use a Custom database, references are supplied through a FASTA file via --explify-custom-ref-fasta and an optional BED file via --explify-custom-ref-bed. Note that you must have downloaded the Custom database, set --explify-test-panel-name to "Custom", and set --explify-test-panel-version to the version you have downloaded. The supplied Custom Explify Reference Database is used by the Explify Analysis Pipeline to filter out host reads.
In the FASTA file, sequence names must be unique and must not contain any spaces. If there is any space in the FASTA header, the part before the first space is assumed to be the sequence name. It is recommended to use only the following characters in sequence names: letters, numbers, underscore (_), hyphen (-), parentheses (( and )), and period (.). Otherwise, the sequence names may appear different in the output.
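The naming rules above can be checked with a short script before submitting a custom reference. This is a minimal sketch with a hypothetical function name; it reads headers only and does not validate the sequence data itself.

```python
import re

# Letters, numbers, underscore, hyphen, parentheses, and period.
RECOMMENDED_NAME = re.compile(r"^[A-Za-z0-9_\-().]+$")


def check_fasta_names(fasta_text):
    """Return (names, problems) for the headers in FASTA-formatted text.

    The sequence name is taken as the header text before the first space.
    """
    names = []
    problems = []
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue
        name = line[1:].split(" ", 1)[0]
        if name in names:
            problems.append(f"duplicate sequence name: {name!r}")
        if not RECOMMENDED_NAME.match(name):
            problems.append(f"name uses characters outside the recommended set: {name!r}")
        names.append(name)
    return names, problems
```

Two headers that differ only after the first space are reported as duplicates, because only the part before the space is used as the name.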
The BED file must be tab-delimited with at least 4 columns:
chrom: the sequence name as it appears in the FASTA
chromStart: start position (always set to 0)
chromEnd: end position (sequence length)
genomeName: name of the genome, target, or microorganism the sequence belongs to (e.g. Monkeypox virus clade II)
segmentName (optional): the name of the segment or gene (e.g. Segment 4 (HA)). Set to 'Full' if the sequence is the full genome
Sequence names must match between the FASTA and BED file, and the same set of sequences must appear in both files. If there are multiple viruses, their names should be unique. For example, if there are multiple Influenza genomes, they should not be labeled with the same virus name in the 4th column.
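A BED file can be cross-checked against the FASTA it accompanies using the rules above. The sketch below assumes the caller has already computed a mapping from sequence name to sequence length (`fasta_lengths`, a hypothetical input); it is not the pipeline's own validation.

```python
def check_bed(bed_text, fasta_lengths):
    """Return a list of problems found when comparing a BED file to a FASTA.

    fasta_lengths maps each FASTA sequence name to its length in bases.
    """
    problems = []
    seen = set()
    for number, line in enumerate(bed_text.splitlines(), start=1):
        if not line.strip():
            continue
        columns = line.split("\t")
        if len(columns) < 4:
            problems.append(f"line {number}: fewer than 4 tab-delimited columns")
            continue
        chrom, chrom_start, chrom_end = columns[0], columns[1], columns[2]
        seen.add(chrom)
        if chrom not in fasta_lengths:
            problems.append(f"line {number}: {chrom!r} is not in the FASTA")
            continue
        if chrom_start != "0":
            problems.append(f"line {number}: chromStart should be 0")
        if chrom_end != str(fasta_lengths[chrom]):
            problems.append(f"line {number}: chromEnd should equal the sequence length")
    # Every FASTA sequence must also appear in the BED file.
    for name in sorted(set(fasta_lengths) - seen):
        problems.append(f"{name!r} is in the FASTA but missing from the BED file")
    return problems
```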
The BED file controls how sequences are labeled in the output JSON. If the Custom Reference FASTA includes sequences from multiple segments, it is recommended to provide this BED file so that the segments are included under the results of that microorganism.
Output Details
The output of the Explify Analysis Pipeline is a single report.json file containing sample QC and targeted microorganism and AMR marker detection results, written to the specified output directory.
Report.json format
Top-Level Node
The fields in the top-level node of the output JSON provide general metadata and version information.
Field | Description |
---|---|
.accession | Sample identifier. |
.deploymentEnvironment | The environment in which the results were produced. |
.batchId | Identifier used for a batch of samples prepared in the lab at the same time. |
.analysisId | Identifier for the analysis. |
.runId | Identifier used for a sequencing run. |
.controlFlag | Indicates whether the sample is a control. It is based on the ControlFlag field in the sample input list. |
.dragenVersion | The DRAGEN release version. |
.analysisPipelineVersion | The analysis pipeline version. |
.testType | "RPIP", "UPIP", "VSPv2", or "Custom". |
.testVersion | Version of the test. |
.testName | Name of the test, e.g. "Explify® Respiratory Pathogen ID/AMR Panel (RPIP) - Data Analysis Solution". |
.testUse | "For Research Use Only. Not for use in diagnostic procedures". |
.reportTime | Time the report was generated. |
.warnings | A list of warnings encountered during the analysis. |
.errors | A list of errors encountered during the analysis. |
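As an illustration of how the top-level fields might be consumed, the sketch below loads a report and pulls out a few of them. It ASSUMES that the leading dot in the Field column denotes a key at the root of the JSON object; the function names are hypothetical.

```python
import json


def load_report(path):
    """Read a report.json file and return it as a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_report(report):
    """Return a small summary built from top-level fields of the report."""
    return {
        "accession": report.get("accession"),
        "testType": report.get("testType"),
        "testVersion": report.get("testVersion"),
        "warningCount": len(report.get("warnings") or []),
        "errorCount": len(report.get("errors") or []),
    }
```

Checking the warning and error counts first is a reasonable habit before reading any detection results from the same report.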
.qcReport Node
All of the fields are relative to .qcReport. This section provides information about sampleQc.
Field | Description |
---|---|
.sampleQc | Sample QC information. |
.sampleQc.totalRawBases | Number of base pairs in sample before read QC processing. |
.sampleQc.totalRawReads | Number of reads in sample before read QC processing. |
.sampleQc.uniqueReads | Number of unique reads in sample before read QC processing. |
.sampleQc.uniqueReadsProportion | Proportion of unique reads in sample before read QC processing. |
.sampleQc.preQualityMeanReadLength | Average read length before read QC processing. |
.sampleQc.postQualityMeanReadLength | Average read length after read QC processing. |
.sampleQc.postQualityReads | Number of reads in sample after read QC processing. |
.sampleQc.postQualityReadsProportion | Proportion of post-quality reads in sample relative to total raw reads. |
.sampleQc.removedInDehostingReads | Number of host reads in sample removed during dehosting. |
.sampleQc.removedInDehostingReadsProportion | Proportion of host reads in sample removed relative to total raw reads. |
.sampleQc.entropy | Kmer entropy of reads after read QC processing. |
.sampleQc.gContent | Proportion of guanine (G) base calls in reads after read QC processing. |
.sampleQc.libraryQScore | Quality score of the library after read QC processing. |
.sampleQc.enrichmentFactor | Enrichment factor information (calculation requires detection of an appropriate Internal Control). |
.sampleQc.enrichmentFactor.value | Enrichment factor value reflecting how well targeted regions were enriched. |
.sampleQc.enrichmentFactor.category | Enrichment factor category: "poor", "fair", "good", or "not calculated". |
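The two "relative to total raw reads" proportions in the table can be recomputed from the read counts as a sanity check. This sketch ASSUMES the proportions are expressed as fractions between 0 and 1 (the table does not say whether they are fractions or percentages); the function name is hypothetical.

```python
def recompute_proportions(sample_qc):
    """Recompute read proportions from a sampleQc dictionary.

    Returns fractions of totalRawReads, or None values when totalRawReads is 0.
    """
    total = sample_qc["totalRawReads"]
    if total == 0:
        return {"postQualityReadsProportion": None,
                "removedInDehostingReadsProportion": None}
    return {
        "postQualityReadsProportion": sample_qc["postQualityReads"] / total,
        "removedInDehostingReadsProportion": sample_qc["removedInDehostingReads"] / total,
    }
```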
.qcReport.sampleComposition Node
All of the fields are relative to .qcReport.sampleComposition. This section provides information about the composition of the sample.
Field | Description |
---|---|
.readClassification | Proportion of reads classified to the following groups: |
.readClassification.targetedMicrobial | Targeted microbial (non-IC) reference sequences |
.readClassification.targetedInternalControl | Targeted IC reference sequences |
.readClassification.untargeted | Untargeted reference sequences |
.readClassification.ambiguous | More than one pathogen class |
.readClassification.unclassified | Could not be classified |
.readClassification.lowComplexity | Low complexity sequence |
.targetedMicrobial | Proportion of targeted reads classified to the following groups: |
.targetedMicrobial.viral | Viral targeted sequences |
.targetedMicrobial.bacterial | Bacterial targeted sequences |
.targetedMicrobial.fungal | Fungal targeted sequences |
.targetedMicrobial.parasitic | Parasitic targeted sequences |
.targetedMicrobial.bacterialAmr | Bacterial AMR targeted sequences |
.untargeted | Proportion of untargeted reads classified to the following groups: |
.untargeted.viral | Viral untargeted sequences |
.untargeted.bacterial | Bacterial untargeted sequences |
.untargeted.fungal | Fungal untargeted sequences |
.untargeted.parasitic | Parasitic untargeted sequences |
.untargeted.bacterialAmr | Bacterial AMR untargeted sequences |
.untargeted.internalControl | Internal Control (IC) untargeted sequences |
.untargeted.human | Human sequences |
.viral | |
.viral.targeted | |
.viral.untargeted | |
.viral.untargetedSubcategories | |
.viral.untargetedSubcategories.panel | |
.viral.untargetedSubcategories.phage | |
.viral.untargetedSubcategories.other | |
.bacterial | |
.bacterial.targeted | |
.bacterial.untargeted | |
.bacterial.untargetedSubcategories | |
.bacterial.untargetedSubcategories.panel | |
.bacterial.untargetedSubcategories.ribosomalDna | |
.bacterial.untargetedSubcategories.plasmid | |
.bacterial.untargetedSubcategories.other | |
.fungal | |
.fungal.targeted | |
.fungal.untargeted | |
.fungal.untargetedSubcategories | |
.fungal.untargetedSubcategories.panel | |
.fungal.untargetedSubcategories.ribosomalDna | |
.fungal.untargetedSubcategories.other | |
.parasitic | |
.parasitic.targeted | |
.parasitic.untargeted | |
.parasitic.untargetedSubcategories | |
.parasitic.untargetedSubcategories.panel | |
.parasitic.untargetedSubcategories.ribosomalDna | |
.parasitic.untargetedSubcategories.other | |
.human | |
.human.untargeted | |
.human.untargetedSubcategories | |
.human.untargetedSubcategories.ribosomalDna | |
.human.untargetedSubcategories.codingSequence | |
.human.untargetedSubcategories.other | |
.internalControl | |
.internalControl.targeted | |
.internalControl.untargeted | |
.microbialAndInternalControl | |
.microbialAndInternalControl.targeted | |
.microbialAndInternalControl.untargeted | |
.bacterialAmr | |
.bacterialAmr.targeted | |
.bacterialAmr.untargeted | |
.qcReport.internalControls Node
The internalControls object is a list that gives the name and RPKM for the 10 possible IC organisms. See the code block below for an example:
.userOptions Node
The fields are relative to .userOptions
Field | Description |
---|---|
.quantitativeInternalControlName | The quantitative Internal Control used for microorganism absolute quantification (recommendation: Enterobacteria phage T7) |
.quantitativeInternalControlConcentration | The quantitative Internal Control concentration (recommendation: 1.21 x 10^7 copies/mL of sample) |
.readQcEnabled | Boolean field that indicates whether read QC (trimming and filtering based on quality and read length) was enabled. |
.readClassificationSensitivity | Sensitivity threshold for classifying reads. Determines whether alignment should proceed for a microorganism and/or reference sequence. Only used for VSPv2. |
.targetReport.microorganisms Node
The fields are relative to .targetReport.microorganisms. The value of the microorganisms field is an array of objects describing organism detections. The following table describes one microorganisms object.
Field | Description |
---|---|
.class | Microorganism class (viral, bacterial, fungal, parasite) |
.name | Name of detected microorganism |
.coverage | Proportion of targeted microorganism sequence bases that appear in sequencing reads |
.ani | Average nucleotide identity of majority consensus sequence to targeted microorganism reference sequences |
.medianDepth | Median depth of reads aligned to targeted microorganism reference sequences, indicating the median number of times each targeted microorganism sequence base appears in sequencing reads |
.condensedDepthVector | The depths across the microorganism's targeted reference genes, condensed (if needed) down to 256 items. |
.rpkm | Normalized representation of the number of reads aligned to targeted microorganism reference sequences (aligned reads per kilobase of targeted sequence per million reads) |
.alignedReadCount | The number of reads that aligned to the organism's target genes. |
.kmerReadCount | The number of reads that were assigned to the microorganism's targeted genes with k-mer classification. |
.absoluteQuantityRatio | Numerical absolute quantification value |
.absoluteQuantityRatioFormatted | Formatted absolute quantification value and units |
.phenotypicGroup | Grouping indicating general association with normal flora, colonization, or contamination from the environment or other sources, as well as general association with disease |
.associatedAmrMarkers | Information about the detected and predicted AMR markers associated with this bacterium. Only present for bacteria. |
.associatedAmrMarkers.applicable | A boolean field that indicates whether the bacterium has one or more AMR markers associated with it in the database. |
.associatedAmrMarkers.detected | A list of the detected AMR markers associated with this bacterium. Only present for bacteria. |
.associatedAmrMarkers.predicted | A list of the predicted AMR markers associated with this bacterium. Only present for bacteria. |
.consensusGenomeSequences | Consensus genome information. Included for RPIP viruses only. |
.consensusGenomeSequences.sequence | The consensus genome (or segment) sequence. |
.consensusGenomeSequences.referenceAccession | The accession for the reference. |
.consensusGenomeSequences.referenceDescription | A description of the reference. |
.consensusGenomeSequences.referenceLength | The length of the reference genome. |
.consensusGenomeSequences.maximumAlignmentLength | The longest contiguous alignment between consensus and reference sequences. |
.consensusGenomeSequences.maximumGapLength | The longest contiguous gap (insertion or deletion) within the alignment between consensus and reference sequences. |
.consensusGenomeSequences.maximumUnalignedLength | The longest section of the reference sequence not aligned to by the consensus sequence. |
.consensusGenomeSequences.coverage | Proportion of reference sequence bases that appear in sequencing reads |
.consensusGenomeSequences.ani | Average nucleotide identity of majority consensus sequence to genome reference sequences |
.consensusGenomeSequences.alignedReadCount | The number of reads that aligned to the organism's target genes. |
.consensusGenomeSequences.medianDepth | Median depth of reads aligned to genome reference sequences, indicating the median number of times each genome sequence base appears in sequencing reads |
.consensusGenomeSequences.targetAnnotation | A list of target annotations for the genome. Each annotation is a JSON object with the following fields: start (int), end (int), strand (string), target_name (string), type (string). |
.consensusGenomeSequences.condensedDepthVector | The depth vector for the genome, condensed to 256 items. |
.consensusTargetSequences | Information about the consensus sequences for the target sequences. |
.consensusTargetSequences.sequence | The consensus sequence for the target. |
.consensusTargetSequences.name | The name of the target sequence. |
.consensusTargetSequences.referenceAccession | The accession of the reference target. |
.consensusTargetSequences.depthVector | The full depth vector for this target gene. |
.predictionInformation | Information about Explify's automated interpretation results. |
.predictionInformation.predictedPresent | Whether Explify interpretation predicts that the organism is present (true/false) |
.predictionInformation.notes | List of notes about the interpretation result. |
.predictionInformation.subpanels | A list of the subpanels that the organism belongs to. |
.predictionInformation.relatedMicroorganisms | An object that gives key metrics for closely-related on- and off-panel organisms that were detected. See below for details. |
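The .rpkm definition in the table (aligned reads per kilobase of targeted sequence per million reads) corresponds to the standard RPKM arithmetic, sketched below. Which read total the pipeline uses as the denominator (raw or post-quality reads, for example) is not stated on this page, so `total_reads` is left to the caller; the function name is hypothetical.

```python
def rpkm(aligned_reads, target_length_bases, total_reads):
    """Return aligned reads per kilobase of target per million reads."""
    if target_length_bases <= 0 or total_reads <= 0:
        raise ValueError("target length and total reads must be positive")
    kilobases = target_length_bases / 1_000
    millions_of_reads = total_reads / 1_000_000
    return aligned_reads / kilobases / millions_of_reads
```

For example, 500 aligned reads on 2,000 bases of targeted sequence in a sample of 1,000,000 reads gives 500 / 2 / 1 = 250.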
.targetReport.microorganisms.relatedMicroorganisms Node
The relatedMicroorganisms object includes a list of the organisms that were considered as part of this organism's interpretation. The fields below describe an object in the relatedMicroorganisms array.
Field | Description |
---|---|
.name | Related microorganism's name |
.onPanel | Whether the related microorganism is on the panel or not. |
.kmerReadCount | The number of reads assigned to the microorganism using a k-mer based approach. This field is only present when this approach is applied. Currently, it is present for UPIP but not RPIP. |
.coverage | The coverage to the microorganism resulting from alignment. |
.ani | The ANI to the microorganism resulting from alignment. |
.alignedReadCount | The read count to the organism resulting from alignment. |
.targetReport.microorganisms.variants Node
The fields are relative to .targetReport.microorganisms.variants. The variants object is only present for select viruses.
Field | Description |
---|---|
.referenceAccession | NCBI accession of reference sequence used for variant calling. |
.segment | (Influenza A only). Segment number of reference sequence |
.ntChange | Nucleotide change associated with the variant |
.referencePosition | Variant position in reference sequence |
.referenceAllele | Reference allele at same position as the variant |
.variantAllele | Variant allele |
.depth | Variant depth, indicating the number of times the variant appears in sequencing reads. |
.alleleFrequency | Frequency of the variant allele in the sequencing reads. |
.targetReport.amrMarkers Node
The fields are relative to .targetReport.amrMarkers. This section provides information about the detected bacterial AMR markers.
Field | Description |
---|---|
.class | Microorganism class (e.g. bacterial) |
.cardModelType | AMR marker detection model specified by CARD (homolog, protein variant, rRNA variant) |
.cardGeneFamily | AMR marker family name in CARD |
.name | AMR marker name |
.cardName | Name of marker in the CARD database |
.ncbiName | Name of marker in the NCBI database |
.referenceAccession | NCBI or CARD accession of AMR marker reference sequence |
.coverage | Proportion of reference sequence residues that appear in sequencing reads (protein alignment for homolog and protein variant model types; DNA alignment for rRNA variant model type) |
.pid | Percent identity of majority consensus sequence aligned to reference sequence (protein alignment for homolog and protein variant model types; DNA alignment for rRNA variant model type) |
.medianDepth | Median depth of reads aligned to AMR marker reference sequence, indicating the median number of times each AMR marker sequence residue appears in sequencing reads (protein alignment for homolog and protein variant model types; DNA alignment for rRNA variant model type) |
.rpkm | Normalized representation of the number of reads aligned to AMR marker reference sequence (aligned reads per kilobase of targeted sequence per million reads) |
.alignedReadCount | The read count to the marker resulting from alignment. |
.nucleotideConsensusSequence | (UPIP only) The nucleotide consensus sequence. |
.proteinConsensusSequence | (UPIP only) The protein consensus sequence. |
.nucleotideDepthVector | The depths across the nucleotide alignment, not condensed. |
.proteinDepthVector | The depths across the protein alignment, not condensed. |
.associatedMicroorganisms | Lists of the detected and predicted organisms associated with this marker. |
.associatedMicroorganisms.all | A list of all organisms associated with this marker. |
.associatedMicroorganisms.detected | A list of the detected organisms associated with this marker. |
.associatedMicroorganisms.predicted | A list of the predicted organisms associated with this marker. |
.predictionInformation | Information about Explify's automated interpretation results. |
.predictionInformation.predictedPresent | Whether Explify interpretation predicts that the marker is present (true/false) |
.predictionInformation.confidence | Whether the AMR marker is predicted with high or medium confidence. |
.predictionInformation.notes | List of notes about the interpretation result. |
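Both the microorganisms and amrMarkers entries carry a .predictionInformation.predictedPresent flag, so one helper can filter either list. The sketch below ASSUMES that .targetReport.amrMarkers is an array of objects, as is stated for .targetReport.microorganisms; the function name is hypothetical.

```python
def predicted_present(entries):
    """Return (name, confidence) pairs for entries predicted to be present.

    confidence is None for entries that do not carry that field,
    such as microorganisms.
    """
    results = []
    for entry in entries or []:
        info = entry.get("predictionInformation") or {}
        if info.get("predictedPresent") is True:
            results.append((entry.get("name"), info.get("confidence")))
    return results


def summarize_target_report(report):
    """Collect predicted-present microorganisms and AMR markers from a report."""
    target = report.get("targetReport") or {}
    return {
        "microorganisms": predicted_present(target.get("microorganisms")),
        "amrMarkers": predicted_present(target.get("amrMarkers")),
    }
```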
.targetReport.amrMarkers.variants Node
The fields are relative to targetReport.amrMarkers.variants. This section provides information about variants detected on select bacterial AMR markers.
Field | Description |
---|---|
.category | "Bacterial Variant; Known AMR" |
.referenceSourceMicroorganism | Microorganism that reference sequence is associated with in NCBI |
.comments | Comments about the variant |
.product | The protein product of the gene |
.ntChange | The nucleotide change |
.referencePosition | The position on the reference sequence |
.referenceAllele | The reference sequence at the position of the variant |
.variantAllele | The variant sequence |
.depth | The depth at the variant position |
.alleleFrequency | The frequency of the variant allele in the read pileup |
.annotation | Type of change (e.g. "Nonsynonymous Variant") |
.aaChange | Amino acid change |
.epistaticGroups | List of epistatic groups the variant is associated with. |
.customReferences Node
Only present and populated for custom reference analyses. When only a FASTA file is submitted (no BED file), each customReferences object will be for a single reference. When a BED file is provided, each customReferences object is for a single organism/genome and can be for one or more references. The values in the Field column are relative to targetReport.customReferences.
Field | Description |
---|---|
.alignedReadCount | Number of reads aligned to the reference or organism. |
.ani | Average nucleotide identity of majority consensus sequence to genome reference sequences. |
.condensedDepthVector | The depths across the consensus sequences, condensed (if needed) down to 256 items. |
.consensusSequences | Array of objects with information about each consensus sequence for this reference or organism. |
.coverage | Proportion of reference sequence bases that appear in sequencing reads |
.medianDepth | Median depth of reads aligned to reference sequences, indicating the median number of times each genome sequence base appears in sequencing reads |
.name | Either the name (accession) of the reference or the organism |
.pangoLineage | Pango lineage information for SARS-CoV-2. Only present if pangolin is run. |
.rpkm | Normalized representation of the number of reads aligned to targeted microorganism reference sequences (aligned reads per kilobase of targeted sequence per million reads) |
.variants | Array of objects with information about variants detected in the reference sequences |
.customReferences.consensusSequences Node
consensusSequences is an array of objects with each object describing the results for a single reference. When only a FASTA file is submitted (no BED file), there will be only one reference in the array. When a BED file is provided, there could be more than one. The values in the Field column are relative to targetReport.customReferences[].consensusSequences[].
Field | Description |
---|---|
.alignedReadCount | Number of reads aligned to the reference or organism. |
.ani | Average nucleotide identity of majority consensus sequence to genome reference sequences. |
.coverage | Proportion of reference sequence bases that appear in sequencing reads |
.depthVector | Depths for each base in the sequence |
.maximumAlignmentLength | The longest contiguous alignment between consensus and reference sequences. |
.maximumGapLength | The longest contiguous gap between consensus and reference sequences. |
.maximumUnalignedLength | Longest stretch of unaligned sequence |
.medianDepth | Median depth of reads aligned to reference sequences, indicating the median number of times each genome sequence base appears in sequencing reads |
.referenceAccession | Accession of the sequence |
.referenceDescription | Description of the sequence |
.referenceLength | Length of the reference sequence |
.sequence | The consensus sequence |
.customReferences.variants Node
variants is an array of objects with each object describing a single detected variant. The values in the Field column are relative to targetReport.customReferences[].variants[].
Field | Description |
---|---|
.alleleFrequency | Frequency of the variant allele in the sequencing reads. |
.depth | The depth at the variant position |
.ntChange | The nucleotide change |
.referenceAccession | Accession of the associated reference |
.referenceAllele | The reference sequence at the position of the variant |
.referencePosition | The position on the reference sequence |
.variantAllele | The variant sequence |