Analysis Output
When the analysis run completes, the software writes the analysis output to the specified location in a folder named DRAGEN_WGS_HEME_4.4.4_Analysis_${timestamp}. In ICA, the analysis output is listed in the Output section of the analysis, where the folder name is a combination of the user reference, the pipeline name, and a UUID. Within the analysis folder, each analysis step generates a subfolder within the Logs_Intermediates folder.
This section describes each output folder generated during analysis and where to find metric and analytic files when the pipeline is executed.
Results - Contains the final result files from the pipeline.
MetricsOutput.tsv - Contains summary metrics for all samples.
Sample1
Sample1.sv.small_indel_dedup.filtered.vcf.gz - Contains DNA structural variants after the DragenSvExtraFilters step is applied, excluding indels already present in the hard-filtered VCF file.
Sample1.hard-filtered.vcf.gz - Contains the small variant VCF.
Sample1.cnv.vcf.gz - Contains the copy number variant VCF.
Sample1_MetricsOutput.tsv - Contains summary metrics for the specific sample.
Sample1.tn.bw - Contains the BigWig representation of the tangent-normalized signal.
Sample1.tumor.baf.bedgraph.gz - Contains the BED graph representation of the B-allele frequency (if available).
Logs_Intermediates - Contains all intermediate files for each step of the pipeline.
DragenSvExtraFilters
SampleSheetValidation
ResourceVerification
RunQc (only when started from BCLs)
FastqGeneration (only when started from BCLs)
FastqValidation
DragenCaller
AdditionalSarjMetrics
SampleAnalysisResults
MetricsOutput
passing_sample_steps.json
pipeline_trace.txt
nextflow_work_logs (ICA only)
On DRAGEN server, Nextflow logs are contained in the Work folder.
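The Results layout above can be turned into a quick completeness check. The following is a minimal sketch, not part of the pipeline: it assumes each sample's files sit in a Results subfolder named after the sample ID, and the function names `expected_result_files` and `missing_result_files` are hypothetical.

```python
from pathlib import Path

# File-name suffixes taken from the Results listing above.
SAMPLE_RESULT_SUFFIXES = [
    ".sv.small_indel_dedup.filtered.vcf.gz",
    ".hard-filtered.vcf.gz",
    ".cnv.vcf.gz",
    "_MetricsOutput.tsv",
    ".tn.bw",
    ".tumor.baf.bedgraph.gz",  # only present when B-allele frequency is available
]


def expected_result_files(analysis_dir, sample_id):
    """Return the per-sample result paths expected under Results/<sample_id>/.

    Assumption: each sample's files are stored in a subfolder of Results
    that is named after the sample ID.
    """
    sample_dir = Path(analysis_dir) / "Results" / sample_id
    return [sample_dir / f"{sample_id}{suffix}" for suffix in SAMPLE_RESULT_SUFFIXES]


def missing_result_files(analysis_dir, sample_id):
    """Return the expected result paths that do not exist on disk."""
    return [p for p in expected_result_files(analysis_dir, sample_id) if not p.exists()]
```

Because the B-allele frequency file is optional, a missing `.tumor.baf.bedgraph.gz` entry does not necessarily indicate a failed analysis.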
All logs in Logs_Intermediates are generated from the running analysis software. Inputs to the running Docker container (for example, the run folder, sample sheet, and FASTQ folder) are mapped from native locations on the server to the following locations in the container:
Run folder - /opt/illumina/run-folder
Sample sheet - /opt/illumina/SampleSheet.csv
FASTQ folder - /opt/illumina/fastq-folder
Resources - /opt/illumina/resources
Analysis output folder - /opt/illumina/analysis-folder
The paths in the log messages refer to paths within the running Docker container, not paths on the server.
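When reading logs, it can help to translate a container path back to its server location. The sketch below is illustrative only: the container mount points come from the mapping above, but the server-side paths in `CONTAINER_TO_HOST` are hypothetical placeholders, and `to_host_path` is not a function provided by the software.

```python
from pathlib import PurePosixPath

# Container mount points from the mapping above. The host paths on the right
# are hypothetical placeholders; replace them with the real server locations.
CONTAINER_TO_HOST = {
    "/opt/illumina/run-folder": "/staging/runs/MyRun",
    "/opt/illumina/SampleSheet.csv": "/staging/runs/MyRun/SampleSheet.csv",
    "/opt/illumina/fastq-folder": "/staging/fastq",
    "/opt/illumina/resources": "/staging/resources",
    "/opt/illumina/analysis-folder": "/staging/analysis",
}


def to_host_path(container_path, mapping=CONTAINER_TO_HOST):
    """Translate a container path from a log message to its server path.

    Paths that fall outside every known mount point are returned unchanged.
    """
    path = PurePosixPath(container_path)
    for mount, host in mapping.items():
        mount_path = PurePosixPath(mount)
        # Match the mount point itself or anything nested beneath it.
        if path == mount_path or mount_path in path.parents:
            return str(PurePosixPath(host) / path.relative_to(mount_path))
    return container_path
```

Matching on whole path components rather than string prefixes avoids treating a sibling such as `/opt/illumina/run-folder-old` as part of the run folder.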
This section describes the summary output files generated during analysis.
File name: MetricsOutput.tsv
The metrics output file is a final combined metrics report that provides sample status, key analysis metrics, and metadata in a tab-separated values (TSV) file. Sample metrics within the report indicate guideline-suggested lower specification limits (LSL) and upper specification limits (USL) for each sample in the run. One metrics output file is generated for the entire run. An additional file is generated for each sample.
Run metrics from the analysis module indicate the quality of the sequencing run. Review the following metrics to assess run data quality:
PCT_Q30_R1 - Percentage of bases with a quality score ≥ 30 from Read 1. Guideline threshold: ≥ 80.0 (≥ 85.0 for NovaSeq X Plus).
PCT_Q30_R2 - Percentage of bases with a quality score ≥ 30 from Read 2. Guideline threshold: ≥ 80.0 (≥ 85.0 for NovaSeq X Plus).
The values in the Run Metrics section are listed as NA in the following situations:
The analysis was started from FASTQ files.
The analysis was started from BCL files, and the InterOp files are missing or corrupt.
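The run metric guidelines above, including the NA case, can be expressed as a small check. This is a sketch under stated assumptions: `check_q30` is a hypothetical helper, it expects a PCT_Q30_R1 or PCT_Q30_R2 value that has already been read from MetricsOutput.tsv, and it does not parse the file itself because the file layout is not described here.

```python
def check_q30(value, novaseq_x_plus=False):
    """Classify a PCT_Q30_R1 or PCT_Q30_R2 value as 'NA', 'PASS', or 'FAIL'.

    Thresholds follow the guidelines above: >= 80.0 in general and
    >= 85.0 for NovaSeq X Plus. An NA value (for example, when the analysis
    was started from FASTQ files) is reported as 'NA' rather than a failure.
    """
    if value is None or str(value).strip().upper() == "NA":
        return "NA"
    threshold = 85.0 if novaseq_x_plus else 80.0
    return "PASS" if float(value) >= threshold else "FAIL"
```

For example, `check_q30("82.5")` returns `"PASS"`, while `check_q30("82.5", novaseq_x_plus=True)` returns `"FAIL"`.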
Review the following metrics to assess sample data quality:
TUMOR_ESTIMATED_SAMPLE_CONTAMINATION (NA)
NA
TUMOR_MAPPED_READS_PCT (%)
NA
TUMOR_INSERT_LENGTH_MEDIAN (count)
NA
TUMOR_Q30_BASES_EXCL_DUPS_AND_CLIPPED_BASES (bp)
NA
AVERAGE_AUTOSOMAL_COVERAGE_OVER_GENOME (count)
NA
GC_NORMALIZED_COVERAGE_AT_GCS_20_39 (count)
NA
GC_NORMALIZED_COVERAGE_AT_GCS_60_79 (count)
NA