Analysis Output

When the analysis run completes, the software generates an analysis output in a folder named /staging/DRAGEN_Heme_WGS_Tumor_Only_Pipeline_{version}_Analysis_{datetimestamp}, unless a specific location is specified on the command line. In ICA, analysis output is listed in the Output section of the analysis, where the folder name is a combination of user reference, pipeline name, and a UUID. Within the analysis folder, each analysis step generates a subfolder within the Logs_Intermediates folder.

Output Folders

This section describes each output folder generated during analysis and where to find metric and analytic files when the pipeline is executed.

📂 Results - Contains the final result files from the pipeline.
- 📄 MetricsOutput.tsv - Contains summary metrics for all samples.
  - 📂 Sample1
    - 📄 Sample1_MetricsOutput.tsv - Contains summary metrics for the specific sample.
    - 📄 Sample1.tumor.baf.bedgraph.gz - Contains the BED graph representation of the B-allele frequency (if available).
    - 📄 Sample1.sv.small_indel_dedup.filtered.vcf.gz - Contains DNA structural variants after the DragenSvExtraFilters step is applied, excluding indels already present in the hard-filtered.vcf file.
    - 📄 Sample1.hard-filtered.vcf.gz - Contains the small variant VCF.
    - 📄 Sample1.cnv.vcf.gz - Contains the copy number variant VCF.
📂 Logs_Intermediates - Contains all intermediate files for each step of the pipeline.
- 📂 SampleSheetValidation
- 📂 ResourceVerification
- 📂 RunQc (only when started from BCLs)
- 📂 FastqGeneration (only when started from BCLs)
- 📂 FastqValidation
- 📂 DragenCaller
- 📂 AdditionalSarjMetrics
- 📂 SampleAnalysisResults
- 📂 MetricsOutput
- 📂 DragenSvExtraFilters
- 📄 passing_sample_steps.json
📂 work - Contains Nextflow execution details for debugging purposes.
📂 errors - Contains an Errors.tsv file if any pipeline analysis step failed.
📄 SampleSheet.csv - User input sample sheet as provided.
📄 pipeline_trace.txt - Contains Nextflow pipeline step execution status.
📄 timeline_${timestamp}.html - Contains Nextflow pipeline task timeline information.
📄 report_${timestamp}.html - Contains Nextflow pipeline task execution details.
📄 receipt - Contains pipeline analysis CLI parameters and execution environment information.
📄 payload.json - Contains pipeline analysis setup parameters and execution environment information.
📄 nextflow.log - Contains Nextflow pipeline execution log.
📄 analysis.log - Contains Nextflow pipeline standard output.
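
As a quick way to confirm that a sample's result files landed where the listing above says they should, the following minimal Python sketch builds the expected paths under Results and reports which ones are absent. The function names are hypothetical, and the sketch assumes the per-sample files sit directly in a Results/{sample ID} subfolder, as the tree suggests. The BAF bedgraph file is only produced if available, so its absence is not necessarily an error.

```python
from pathlib import Path

# File name suffixes taken from the Results listing above.
SAMPLE_RESULT_SUFFIXES = [
    "_MetricsOutput.tsv",
    ".tumor.baf.bedgraph.gz",  # only present if available
    ".sv.small_indel_dedup.filtered.vcf.gz",
    ".hard-filtered.vcf.gz",
    ".cnv.vcf.gz",
]


def expected_sample_files(analysis_dir, sample_id):
    """Return the expected per-sample result paths (assumed layout: Results/<sample_id>/)."""
    sample_dir = Path(analysis_dir) / "Results" / sample_id
    return [sample_dir / f"{sample_id}{suffix}" for suffix in SAMPLE_RESULT_SUFFIXES]


def missing_sample_files(analysis_dir, sample_id):
    """Return the expected result paths that do not exist on disk."""
    return [p for p in expected_sample_files(analysis_dir, sample_id) if not p.exists()]
```

For example, `missing_sample_files(analysis_dir, "Sample1")` returns an empty list when all five files listed for Sample1 are present.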

File Overview

This section describes the summary output files generated during analysis.

Metrics Output

File name: MetricsOutput.tsv

The metrics output file is a final combined metrics report that provides sample status, key analysis metrics, and metadata in a tab-separated values (TSV) file. Sample metrics within the report indicate guideline-suggested lower specification limits (LSL) and upper specification limits (USL) for each sample in the run. One metrics output file is generated for the entire run. An additional file is generated for each sample.

Run Metrics

Run metrics from the analysis module indicate the quality of the sequencing run. Review the following metrics to assess run data quality:

| Metric | Description | Recommended Threshold |
| --- | --- | --- |
| PCT_Q30_R1 | Percentage of bases with a quality score ≥ 30 from Read 1. | ≥ 80.0 (≥ 85.0 for NovaSeq X Plus) |
| PCT_Q30_R2 | Percentage of bases with a quality score ≥ 30 from Read 2. | ≥ 80.0 (≥ 85.0 for NovaSeq X Plus) |

The values in the Run Metrics section are listed as NA in the following situations:

- The analysis was started from FASTQ files.
- The analysis was started from BCL files, and the InterOp files are missing or corrupt.
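
The threshold and NA rules above can be sketched as a small Python check. This is an illustration only: the function name, the return labels, and the `novaseq_x_plus` flag are hypothetical, and the sketch assumes the metric value has already been read from the file as a string.

```python
# Recommended thresholds from the Run Metrics table.
Q30_THRESHOLD_DEFAULT = 80.0
Q30_THRESHOLD_NOVASEQ_X_PLUS = 85.0


def assess_q30(raw_value, novaseq_x_plus=False):
    """Return 'NA', 'PASS', or 'FAIL' for a PCT_Q30_R1 or PCT_Q30_R2 value."""
    # NA is reported for FASTQ starts or missing/corrupt InterOp files.
    if raw_value.strip().upper() == "NA":
        return "NA"
    threshold = Q30_THRESHOLD_NOVASEQ_X_PLUS if novaseq_x_plus else Q30_THRESHOLD_DEFAULT
    return "PASS" if float(raw_value) >= threshold else "FAIL"
```

A value of 82.5 meets the general recommendation but falls short of the NovaSeq X Plus recommendation, so the instrument type has to be supplied by the caller.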

Sample QC Metrics

Review the following metrics to assess sample data quality:

| Metric (UOM) | Recommended Threshold | Description |
| --- | --- | --- |
| TUMOR_ESTIMATED_SAMPLE_CONTAMINATION (NA) | | The estimated fraction of reads in a sample that may be from another human source. |
| TUMOR_MAPPED_READS_PCT (%) | | Percentage of mapped reads in the tumor sample. |
| TUMOR_INSERT_LENGTH_MEDIAN (count) | | Median insert length of the tumor sample. |
| TUMOR_Q30_BASES_EXCL_DUPS_AND_CLIPPED_BASES (bp) | | Bases with a Phred quality score of 30 or higher, excluding duplicated reads and clipped bases. |
| AVERAGE_AUTOSOMAL_COVERAGE_OVER_GENOME (count) | | Average coverage (sequencing depth) across the autosomes (chromosomes 1-22). |
| GC_NORMALIZED_COVERAGE_AT_GCS_20_39 (count) | | Normalized sequencing coverage in genomic regions with GC content between 20% and 39%. |
| GC_NORMALIZED_COVERAGE_AT_GCS_60_79 (count) | | Normalized sequencing coverage in genomic regions with GC content between 60% and 79%. |


