Analysis Output
When the analysis run completes, the software writes the analysis output to a folder named /staging/DRAGEN_Heme_WGS_Tumor_Only_Pipeline_{version}_Analysis_{datetimestamp}, unless a different output location is specified on the command line. In ICA, the analysis output is listed in the Output section of the analysis, and the folder name combines the user reference, the pipeline name, and a UUID. Within the analysis folder, each analysis step writes its own subfolder inside the Logs_Intermediates folder.
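When several runs share the default /staging location, it can help to locate the newest analysis folder programmatically. The following is a minimal sketch that assumes the default naming shown above. Because the exact {version} and {datetimestamp} formats are not specified here, it matches them with wildcards and picks the newest folder by file-system modification time instead of parsing the name. The helper name latest_analysis_folder is hypothetical. This approach does not apply to ICA, where the folder name is built from the user reference, pipeline name, and UUID.

```python
from pathlib import Path
from typing import Optional

# Documented folder-name prefix; {version} and {datetimestamp} are matched
# with wildcards because their exact formats are not assumed here.
PREFIX = "DRAGEN_Heme_WGS_Tumor_Only_Pipeline_"


def latest_analysis_folder(staging: Path = Path("/staging")) -> Optional[Path]:
    """Return the most recently modified analysis folder under `staging`, or None."""
    candidates = [p for p in staging.glob(PREFIX + "*_Analysis_*") if p.is_dir()]
    if not candidates:
        return None
    # Use modification time rather than parsing {datetimestamp} from the name,
    # since the timestamp format is not documented on this page.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```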
This section describes each output folder generated during analysis and where to find metric and analytic files when the pipeline is executed.
Results - Contains the final result files from the pipeline.
MetricsOutput.tsv - Contains summary metrics for all samples.
Sample1
Sample1_MetricsOutput.tsv - Contains summary metrics for the specific sample.
Sample1.tumor.baf.bedgraph.gz - Contains a bedGraph representation of the B-allele frequency (if available).
Sample1.sv.small_indel_dedup.filtered.vcf.gz - Contains DNA structural variants after the DragenSvExtraFilters step is applied, excluding indels already present in the hard-filtered VCF file.
Sample1.hard-filtered.vcf.gz - Contains small variant calls in VCF format.
Sample1.cnv.vcf.gz - Contains copy number variant calls in VCF format.
Logs_Intermediates - Contains all intermediate files for each step of the pipeline.
SampleSheetValidation
ResourceVerification
RunQc (only when started from BCLs)
FastqGeneration (only when started from BCLs)
FastqValidation
DragenCaller
AdditionalSarjMetrics
SampleAnalysisResults
MetricsOutput
DragenSvExtraFilters
passing_sample_steps.json
work - Contains Nextflow execution details for debugging purposes.
errors - Contains an Errors.tsv file if any pipeline analysis step failed.
SampleSheet.csv - User input sample sheet as provided.
pipeline_trace.txt - Contains Nextflow pipeline step execution status.
timeline_${timestamp}.html - Contains Nextflow pipeline task timeline information.
report_${timestamp}.html - Contains Nextflow pipeline task execution details.
receipt - Contains pipeline analysis CLI parameters and execution environment information.
payload.json - Contains pipeline analysis setup parameters and execution environment information.
nextflow.log - Contains Nextflow pipeline execution log.
analysis.log - Contains Nextflow pipeline standard output.
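A quick completeness check on the Results folder can catch a partially written output before downstream use. The following is a minimal sketch that assumes the layout listed above: a run-level Results/MetricsOutput.tsv plus a Results/<sample ID>/ subfolder whose files are prefixed with the sample ID. The BAF bedGraph is treated as optional because it is produced only if available. The helper name missing_results is hypothetical.

```python
from pathlib import Path
from typing import List

# Per-sample result files that are always expected, per the Results listing.
# The *.tumor.baf.bedgraph.gz file is omitted because it is only written if available.
REQUIRED_SAMPLE_SUFFIXES = [
    "_MetricsOutput.tsv",
    ".sv.small_indel_dedup.filtered.vcf.gz",
    ".hard-filtered.vcf.gz",
    ".cnv.vcf.gz",
]


def missing_results(analysis_dir: Path, sample_id: str) -> List[Path]:
    """Return expected Results files for `sample_id` that do not exist."""
    results = analysis_dir / "Results"
    # Assumed layout: run-level metrics file plus one subfolder per sample.
    expected = [results / "MetricsOutput.tsv"]
    expected += [
        results / sample_id / f"{sample_id}{suffix}"
        for suffix in REQUIRED_SAMPLE_SUFFIXES
    ]
    return [p for p in expected if not p.is_file()]
```

An empty list means every required file is present. It does not check whether the files are valid or non-empty.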
This section describes the summary output files generated during analysis.
File name: MetricsOutput.tsv
The metrics output file is a final combined metrics report that provides sample status, key analysis metrics, and metadata in a tab-separated values (TSV) file. Sample metrics within the report indicate guideline-suggested lower specification limits (LSL) and upper specification limits (USL) for each sample in the run. One metrics output file is generated for the entire run. An additional file is generated for each sample.
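Because the metrics output is a tab-separated file, it can be read with standard tooling. This page does not describe the file's internal layout, so the following minimal sketch assumes one. It assumes the file is divided into sections that each start with a bracketed line such as "[Run QC Metrics]", followed by a header row (for example "Metric (UOM)", "LSL Guideline", "USL Guideline", then one column per sample) and data rows, with blank lines between sections. The section names, column names, and the helper parse_metrics_sections are all hypothetical. Check them against a real MetricsOutput.tsv before relying on this.

```python
import csv
import io
from typing import Dict, List, Optional


def parse_metrics_sections(text: str) -> Dict[str, List[Dict[str, str]]]:
    """Split a sectioned TSV into {section_name: [row_dict, ...]}.

    ASSUMED layout: "[Section Name]" line, then a tab-separated header row,
    then data rows; a blank line ends the section.
    """
    sections: Dict[str, List[Dict[str, str]]] = {}
    current: Optional[str] = None
    header: Optional[List[str]] = None
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or all(not cell.strip() for cell in row):
            current, header = None, None  # blank line closes the section
            continue
        first = row[0].strip()
        if first.startswith("[") and first.endswith("]"):
            current, header = first[1:-1], None
            sections[current] = []
        elif current is not None and header is None:
            header = row  # first row after the section name is the header
        elif current is not None:
            sections[current].append(dict(zip(header, row)))
    return sections
```

Typical use would be parse_metrics_sections(Path("Results/MetricsOutput.tsv").read_text()), then looking up a metric row and a sample column by name.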
Run metrics from the analysis module indicate the quality of the sequencing run. Review the following metrics to assess run data quality:
PCT_Q30_R1 - Percentage of bases with a quality score ≥ 30 from Read 1. Guideline: ≥ 80.0 (≥ 85.0 for NovaSeq X Plus)
PCT_Q30_R2 - Percentage of bases with a quality score ≥ 30 from Read 2. Guideline: ≥ 80.0 (≥ 85.0 for NovaSeq X Plus)
The values in the Run Metrics section are listed as NA in the following situations:
The analysis was started from FASTQ files.
The analysis was started from BCL files, and the InterOp files are missing or corrupt.
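The Q30 guidelines above, together with the NA cases, can be expressed as a small check. The following is a minimal sketch. It applies ≥ 80.0 by default and ≥ 85.0 when the instrument is NovaSeq X Plus, and it returns None for an "NA" value because the metric cannot be assessed in that case. The function name q30_passes and the plain-string instrument label are hypothetical and are not part of the pipeline's output.

```python
from typing import Optional


def q30_passes(value: str, instrument: str = "") -> Optional[bool]:
    """Check a PCT_Q30_R1 or PCT_Q30_R2 value against the run-metric guideline.

    Returns None when the value is "NA" (analysis started from FASTQ files,
    or InterOp files missing or corrupt), since no assessment is possible.
    """
    if value.strip().upper() == "NA":
        return None
    # Guideline: >= 80.0, raised to >= 85.0 for NovaSeq X Plus runs.
    threshold = 85.0 if instrument == "NovaSeq X Plus" else 80.0
    return float(value) >= threshold
```

For example, a value of 82.5 passes on most instruments but falls below the NovaSeq X Plus guideline.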
Review the following metrics to assess sample data quality:
TUMOR_ESTIMATED_SAMPLE_CONTAMINATION (NA) - The estimated fraction of reads in a sample that may be from another human source. Guideline: NA
TUMOR_MAPPED_READS_PCT (%) - Percentage of mapped reads in the tumor sample. Guideline: NA
TUMOR_INSERT_LENGTH_MEDIAN (count) - Median insert length of the tumor sample. Guideline: NA
TUMOR_Q30_BASES_EXCL_DUPS_AND_CLIPPED_BASES (bp) - Bases with a Phred quality score of 30 or higher, excluding duplicated reads and clipped bases. Guideline: NA
AVERAGE_AUTOSOMAL_COVERAGE_OVER_GENOME (count) - Average coverage (sequencing depth) across the autosomes (chromosomes 1-22). Guideline: NA
GC_NORMALIZED_COVERAGE_AT_GCS_20_39 (count) - Normalized sequencing coverage in genomic regions with GC content between 20% and 39%. Guideline: NA
GC_NORMALIZED_COVERAGE_AT_GCS_60_79 (count) - Normalized sequencing coverage in genomic regions with GC content between 60% and 79%. Guideline: NA
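Because the sample metrics above list NA in place of specification limits, any automated review has to decide how to treat a missing LSL or USL. The following is a minimal sketch of one interpretation, which is an assumption and not documented behavior. An "NA" limit is treated as "no limit on that side", and an "NA" value is reported as not assessable (None). The helper names within_guidelines and _to_float are hypothetical.

```python
from typing import Optional


def _to_float(text: str) -> Optional[float]:
    """Parse a numeric cell; empty or "NA" cells become None."""
    text = text.strip()
    return None if text.upper() in ("", "NA") else float(text)


def within_guidelines(value: str, lsl: str = "NA", usl: str = "NA") -> Optional[bool]:
    """Check a metric value against optional lower/upper specification limits.

    Assumption: an "NA" limit means no limit on that side; an "NA" value
    cannot be assessed, so None is returned.
    """
    v = _to_float(value)
    if v is None:
        return None
    low, high = _to_float(lsl), _to_float(usl)
    if low is not None and v < low:
        return False
    if high is not None and v > high:
        return False
    return True
```

With NA for both limits, as listed for every sample metric here, any numeric value is reported as within guidelines. These metrics are therefore informational unless you apply your own laboratory limits.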