BCL conversion

Illumina BCL Convert is a standalone, locally run software application that converts the Binary Base Call (BCL) files produced by Illumina™ sequencing systems to FASTQ files. The DRAGEN™ product includes hardware-accelerated BCL conversion on the DRAGEN™ platform, which improves run times compared with running BCL Convert purely in software.

DRAGEN BCL conversion is designed to output FASTQ files that match bcl2fastq2 v2.20 output. DRAGEN also supports direct conversion from BCL to the compressed FASTQ.ORA format, which can reduce file size by up to a factor of 5 compared with FASTQ.GZ. Refer to the section "DRAGEN ORA compression from BCL" for proper usage.

DRAGEN BCL conversion supports the following features:

  • Demultiplexing samples by barcode with optional mismatch tolerance.

  • Adapter sequence masking or trimming with adjustable matching stringency.

  • UMI sequence tagging and optional trimming.

  • [Optional] Output of FASTQ files for index reads (in gzipped or FASTQ.ORA files)

  • [Optional] Combine all lanes to the same FASTQ output files.

  • High sample count support (100,000 samples)

  • UMI sequences supported in index reads

  • Eliminate skew caused by adapter sequence trimming with 'MinimumAdapterOverlap' setting

  • Support combined (default, compatible with bcl2fastq2) or independent (strict) enforcement of demux conflict detection

  • Support mixed pools by specifying settings for each sample (OverrideCycles, Adapters, etc)

    • Convert all data in a single invocation

    • DRAGEN automatically detects barcode conflicts, even between pools

    • Allows single and dual-index kits to be mixed

    • Undetermined files correctly contain reads that do not map to any sample in any pool

  • Outputs metrics for demultiplexing, quality scores, adapter trimming, unmapped barcodes, & index-hopping detection

  • Outputs per-cycle adapter metrics and per-tile quality & demultiplex metrics

  • Convert a subset of tiles specified by regular-expressions using a white-list, a black-list, or both

  • Better support for legacy applications based upon bcl2fastq2 (all off by default):

    • Output metrics in bcl2fastq2 Stats directory format in addition to csv

    • Support FindAdaptersWithIndels setting to match bcl2fastq2 default output

    • Support fastq subdirectories named by sample project, sampleID, & sampleName

System Requirements

When not running on the DRAGEN™ platform, the following requirements should be noted:

  • Minimum 64 GB of RAM (less RAM is required for some smaller flow cell input types, such as NextSeq and iSeq)

  • Storage requirements: sufficient storage for BCL input and FASTQ output on each source and destination storage device (no intermediate output is generated during BCL conversion)

  • Linux CentOS 6 or higher

  • Root access

Installation

For DRAGEN™ products, BCL conversion functionality is included. When using the separate BCL Convert application, note the following:

BCL Convert is installed from an RPM package downloaded from the Illumina support site. Install the RPM package using one of the following commands:

  • To install the software in the default location, enter: rpm --install <rpm package-name>

  • To specify a custom install location, enter: rpm --install --prefix <user-specified directory> <rpm package-name>

The default installation places the executable at /usr/local/bin/bcl-convert.

Run Requirements

BCL Convert & DRAGEN™ require the following files to be present in the run folder to perform BCL conversion:

  • BCL files (*.bcl, *.cbcl)

  • Filter files (*.filter)

  • Position files (*.locs, *.clocs, or s.locs)

  • Aggregated files (*.bci) as applicable

  • The RunInfo.xml file

  • The config.xml file (for older systems) if applicable

  • The SampleSheet.csv file -- Supports v1 and v2. See the Sample Sheet section below for more details.

Command Line Options

The following example command contains the required BCL conversion options for DRAGEN™:

dragen --bcl-conversion-only true --bcl-input-directory <...> --output-directory <...>

For bcl-convert, here are the required options:

bcl-convert --bcl-input-directory <...> --output-directory <...>

There are many optional command line arguments as well. The following is a list of all command-line options:

  • --bcl-conversion-only true---(DRAGEN only) Required for BCL conversion to FASTQ files in the DRAGEN executable.

  • --bcl-input-directory---Indicates the path to the run folder directory (3 levels higher than the BaseCalls directory). Required.

  • --output-directory---Indicates the path to demultiplexed FASTQ output. The directory must not exist unless -f (force) is specified. Required.

  • --sample-sheet---Specifies the path to SampleSheet.csv file. --sample-sheet is optional if the SampleSheet.csv file is in the --bcl-input-directory directory.

  • --run-info---Overrides the path to the RunInfo.xml file. By default, the file is read from the --bcl-input-directory directory.

  • --strict-mode---If set to true, abort if any files are missing or corrupt. The default is false.

  • --first-tile-only---If set to true, only converts the first tile of input (for testing and debugging). The default is false. (Deprecated)

  • --bcl-only-lane <#>---Convert only the specified lane in this conversion run. By default, all lanes are converted.

  • -f---Convert to output directory even if the directory exists (force).

  • --bcl-use-hw false---(DRAGEN only) Do not use DRAGEN FPGA acceleration during BCL conversion. This allows concurrent execution of BCL conversion with DRAGEN analysis

  • --bcl-sampleproject-subdirectories true---Output FASTQ files to subdirectories based on sample sheet 'Sample_Project' column

  • --no-lane-splitting true---Output all lanes of a flow cell to the same FASTQ files consecutively. Default false.

  • --create-fastq-for-index-reads true---Output FASTQ files for index reads as well as genomic reads. Can only be enabled when an index is present and used for demultiplexing according to the RunInfo.xml file and an OverrideCycles setting. Default false.

  • --bcl-enable-tile-metrics true---Output tile-level metrics to the following files when true (default): Demultiplex_Tile_Stats.csv, Quality_Tile_Stats.csv. When false, the files are still written but contain only the header.

  • --bcl-only-matched-reads true---Disable outputting unmapped reads to FASTQ files marked as Undetermined. Default false.

  • --tiles ''---Only convert tiles matching a set of regular expressions.

  • --exclude-tiles ''---Do not convert tiles matching a set of regular expressions, even if included in --tiles

  • --no-sample-sheet true---Operate with no sample sheet (no demultiplexing or adapter trimming supported). Default false. This option is not supported for conversion to FASTQ.ORA

  • --output-legacy-stats true---Output metrics in bcl2fastq2 Stats directory format in addition to csv. Default false.

  • --sample-name-column-enabled true---Use Sample_Name SampleSheet column for fastq file names in Sample_Project subdirectories (requires 'bcl-sampleproject-subdirectories true' as well). Default false.

  • --fastq-gzip-compression-level [0-9]---Set gzip compression level for software-compressed fastq files. Default 1.

  • -h, --help---Produces a help message and exits the application.

  • -V, --version---Produces the version number of the application and exits.

  • --ora-reference---Required to output compressed FASTQ.ORA file. Specify the path to the directory that contains the compression reference and index file.

  • --fastq-compression-format---(DRAGEN only) Required for DRAGEN ORA compression to specify the type of compression: use dragen for regular DRAGEN ORA compression, dragen-interleaved for DRAGEN ORA paired compression.

  • --num-unknown-barcodes-reported---Number of top unknown barcodes to output. Default 1000.

  • --bcl-validate-sample-sheet-only---Only validate RunInfo.xml & SampleSheet files (produce no FASTQ files).

(Note that the "fastq-gzip-compression-level" setting will have no effect on blocks compressed by FPGA hardware.)
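As an illustration of how the required and optional options combine, the following minimal Python sketch assembles a bcl-convert argument list. The function name, its parameters, and the example paths are hypothetical; only option names documented in this section are used, and the sketch does not launch the conversion.

```python
import shlex
from typing import List, Optional


def build_bcl_convert_command(
    bcl_input_directory: str,
    output_directory: str,
    sample_sheet: Optional[str] = None,
    no_lane_splitting: bool = False,
    force: bool = False,
    executable: str = "bcl-convert",
) -> List[str]:
    """Return an argv list built only from options listed in this section."""
    argv = [
        executable,
        "--bcl-input-directory", bcl_input_directory,
        "--output-directory", output_directory,
    ]
    if sample_sheet is not None:
        # Optional when SampleSheet.csv is in the input directory.
        argv += ["--sample-sheet", sample_sheet]
    if no_lane_splitting:
        argv += ["--no-lane-splitting", "true"]
    if force:
        # Allows conversion into an output directory that already exists.
        argv.append("-f")
    return argv


# Hypothetical paths, for illustration only.
command = build_bcl_convert_command("/data/run_folder", "/data/fastq_out", force=True)
print(shlex.join(command))
```

Building the command as a list rather than a single string avoids quoting problems when paths contain spaces.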

The following additional options can be used to manually control performance. Use of these options might reduce performance or result in analysis failure, and it is recommended to use the default settings. Contact Illumina Technical Support if issues occur.

  • --shared-thread-odirect-output true---Switch to an alternate file output method that is optimized for sample counts greater than 100,000. This option is not recommended for lower sample counts and/or if using distributed file system output targets such as GPFS or Lustre.

  • --bcl-num-parallel-tiles <#>---Number of tiles processed in parallel. The default is determined dynamically.

  • --bcl-num-conversion-threads <#>---Number of conversion threads per tile. The default is determined dynamically.

  • --bcl-num-compression-threads <#>---Number of CPU threads for gzip-compressing FASTQ output. The default is determined dynamically.

  • --bcl-num-decompression-threads <#>---Number of CPU threads for decompressing input BCL files. The default is determined dynamically.

  • --bcl-num-ora-compression-threads-per-file <#>---Optional for DRAGEN ORA compression. Set the number of threads used per file. Maximum is 24. Default is 10.

  • --bcl-num-ora-compression-parallel-files <#>---Optional for DRAGEN ORA compression. Set the number of files processed in parallel. Maximum is 96. Default is 6.

It is recommended to only adjust CPU threads when reducing cores used on a shared machine. The total number of CPU-intensive threads used will be: --bcl-num-parallel-tiles * --bcl-num-conversion-threads + --bcl-num-compression-threads + --bcl-num-decompression-threads.
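The thread formula above can be checked with a few lines of Python before choosing values for a shared machine. This is a minimal sketch; the function name and the example values are hypothetical.

```python
def total_cpu_threads(
    parallel_tiles: int,
    conversion_threads: int,
    compression_threads: int,
    decompression_threads: int,
) -> int:
    """Total CPU-intensive threads, following the formula stated above."""
    return parallel_tiles * conversion_threads + compression_threads + decompression_threads


# Hypothetical settings: 4 tiles in parallel, 8 conversion threads per tile,
# 16 compression threads, and 8 decompression threads.
print(total_cpu_threads(4, 8, 16, 8))  # 4 * 8 + 16 + 8 = 56
```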

Tile Filtering

Two command line options control which tiles are included in the conversion process. --tiles specifies which tiles to include in analysis (a whitelist), while --exclude-tiles specifies which tiles to exclude from analysis (a blacklist). Which one to use depends on which makes the tile list more convenient to express. This feature is a replacement for tiles, ExcludeTiles, and ExcludeTilesLaneX in bcl2fastq2.

Both options use a single regular expression format, given by examples below.

  • A 4-digit tile specifier to include the first tile of every lane: --tiles 1101

  • A similar 5-digit tile specifier (NextSeq-only): --tiles 11101

  • Exclude the first tile of lane 2: --exclude-tiles s_2_1101 ('s_' prefix required if lane is specified)

  • Convert all tiles of lane 2: --tiles s_2 (lane specifier only)

Any digit in the above examples can be replaced with a single-digit range using square brackets:

  • Select the first tile of both sides: [1-2]101

  • Select all tiles ending with 5 in lanes 1 & 2: s_[1-2]_[0-9][0-9][0-9]5

Multiple terms are recognized, separated by '+':

  • Select tile 1102 in lane 1 and all the tiles in the other lanes: s_1_1102+s_[2-8]

Both tiles and exclude-tiles can be used, with tiles first filtering to include only matching terms, then exclude-tiles filtering that result set to exclude tiles matching its terms.

For safety, every term of the regular expression (as separated by '+') used for tiles must match at least one tile entry in the input RunInfo tile list. Every term for exclude-tiles must match at least one tile entry in the set produced by tiles if that option is also used, or the RunInfo tile list otherwise. This is to help ensure that the operator's intent matches the program's interpretation.
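The include-then-exclude order and the rule that every term must match at least one tile can be sketched in Python as follows. This is not the tool's implementation. It assumes tiles are named s_<lane>_<tile> and that a term matches when it is found anywhere in that name; the real matching rules may be stricter. The function names are hypothetical.

```python
import re
from typing import List, Optional


def _match_terms(expression: str, candidates: List[str], option: str) -> List[str]:
    """Return candidates matched by any '+'-separated term; every term must match."""
    matched = set()
    for term in expression.split("+"):
        pattern = re.compile(term)
        hits = [tile for tile in candidates if pattern.search(tile)]
        if not hits:
            raise ValueError(f"{option} term {term!r} matches no tile")
        matched.update(hits)
    return [tile for tile in candidates if tile in matched]


def select_tiles(
    all_tiles: List[str],
    tiles: Optional[str] = None,
    exclude_tiles: Optional[str] = None,
) -> List[str]:
    """Apply the whitelist first, then remove blacklist matches from that result."""
    selected = list(all_tiles)
    if tiles:
        selected = _match_terms(tiles, selected, "--tiles")
    if exclude_tiles:
        excluded = set(_match_terms(exclude_tiles, selected, "--exclude-tiles"))
        selected = [tile for tile in selected if tile not in excluded]
    return selected


# Hypothetical tile list for a two-lane run.
run_tiles = ["s_1_1101", "s_1_1102", "s_2_1101", "s_2_1102"]
print(select_tiles(run_tiles, tiles="s_1_1102+s_[2-8]"))
```

Because the exclude terms are checked against the set left by the include step, an exclude term that refers to a lane already filtered out raises an error instead of being silently ignored.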

DRAGEN ORA compression from BCL

BCL files can be converted into FASTQ.ORA using two different methods, which cannot be used at the same time. Choose one or the other method:

  • Method 1: Using command line without a sample sheet:

    • set the path to the directory that contains the compression reference and index file with the --ora-reference command; and

    • specify the type of DRAGEN ORA compression with the '--fastq-compression-format' command. The value can be either dragen for regular DRAGEN ORA compression or dragen-interleaved for DRAGEN ORA paired compression.

  • Method 2: Using command line with a sample sheet:

    • set the path to the directory that contains the compression reference and index file with the --ora-reference command; and

    • specify the type of DRAGEN ORA compression in the sample sheet. See Sample Sheet Settings for proper syntax.

The reference and index files for ORA compression are available as a downloadable archive on the DRAGEN Software Support Site page.

For information about how to use FASTQ.ORA files see Input File Types.

Interleaved compression

Interleaved DRAGEN ORA compression improves compression by up to 10% compared with regular DRAGEN ORA compression. To enable it, set --fastq-compression-format to dragen-interleaved. The paired-read files from the nth line of the fastq_list.csv file generated by BCL conversion are then compressed together into a single fastq.ora file named <filename before "R">-interleaved<_suffix>.fastq.ora (<_suffix> is optional). When an ORA file that contains paired data is decompressed, it is automatically decompressed into two separate files. To map an ORA file that contains paired interleaved data with the DRAGEN mapper, use the --interleaved option during map/align.

Command line examples

The following example command contains the required BCL conversion options to run regular DRAGEN ORA compression from BCL:

dragen --bcl-conversion-only true --bcl-input-directory <...> --sample-sheet <...> --ora-reference <...> --fastq-compression-format dragen --output-directory <...>

The following example command contains the required BCL conversion options to run interleaved DRAGEN ORA compression from BCL:

dragen --bcl-conversion-only true --bcl-input-directory <...> --sample-sheet <...> --ora-reference <...> --fastq-compression-format dragen-interleaved --output-directory <...>

Sample Sheet

A sample sheet (SampleSheet.csv) records information about samples and their corresponding indexes, and settings that dictate the behavior of the software. The default location of the sample sheet is the input folder. To specify an alternative file location, use the command --sample-sheet <file-path>. When a sample sheet does not exist in the default location and no sample sheet is specified in the command line, the software produces an error unless the '--no-sample-sheet true' option is specified (provided for legacy applications with no demultiplexing, adapter trimming, or other sample-sheet-specified settings supported).

Sample Sheet Versions

BCL Convert and DRAGEN support two sample sheet versions: v1 and v2. The following table displays the different supported options for v1 and v2:

Sample Sheet Settings Section

In addition to the command line options that control the behavior of BCL conversion, you can use the [Settings] section in the sample sheet configuration file to specify how the samples are processed. The following are the sample sheet settings for BCL conversion.

Note that DRAGEN does not support the following sample sheet settings from bcl2fastq:

  • ReverseComplement

OverrideCycles

The OverrideCycles mask elements are semicolon separated. The OverrideCycles setting can be specified in one of the following formats, where the two formats cannot be mixed:

Order Dependent:

OverrideCycles,U7N1Y143;I8;I8;U7N1Y143

Order Independent (examples):

OverrideCycles,R1:U7N1Y143;I1:I8;I2:I8;R2:U7N1Y143
OverrideCycles,R1:U7N1Y143;R2:U7N1Y143;I1:I8;I2:I8

DRAGEN supports flexible UMI processing during BCL conversion to support more third-party assays, including UMI sequences in index reads and multiple UMI regions per read. UMI sequences are trimmed from FASTQ read sequences and placed in the sequence identifier for each read, as normal.
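To make the mask syntax concrete, the following Python sketch splits an OverrideCycles value into its elements and counts the cycles each mask covers. It is illustrative only: it recognizes just the element letters that appear in this section's examples (Y, I, U, N), and the function names and the "positionN" keys used for the order-dependent format are hypothetical.

```python
import re
from typing import Dict, List, Tuple

Mask = List[Tuple[str, int]]
_ELEMENT = re.compile(r"([YIUN])(\d+)")


def parse_mask(mask: str) -> Mask:
    """Split one mask such as 'U7N1Y143' into (element type, cycle count) pairs."""
    elements = _ELEMENT.findall(mask)
    if not elements or "".join(kind + count for kind, count in elements) != mask:
        raise ValueError(f"unrecognised mask element in {mask!r}")
    return [(kind, int(count)) for kind, count in elements]


def parse_override_cycles(value: str) -> Dict[str, Mask]:
    """Parse either format; mixing labelled and unlabelled masks is rejected."""
    parts = value.split(";")
    labelled = [":" in part for part in parts]
    if any(labelled) and not all(labelled):
        raise ValueError("the two OverrideCycles formats cannot be mixed")
    if all(labelled):
        pairs = (part.split(":", 1) for part in parts)
        return {label: parse_mask(mask) for label, mask in pairs}
    # Order-dependent format: key each mask by its position in the list.
    return {f"position{i}": parse_mask(part) for i, part in enumerate(parts, start=1)}


def cycles(mask: Mask) -> int:
    """Total number of sequencing cycles covered by a mask."""
    return sum(count for _, count in mask)


parsed = parse_override_cycles("R1:U7N1Y143;I1:I8;I2:I8;R2:U7N1Y143")
print({label: cycles(mask) for label, mask in parsed.items()})
```

For a 2x151 run, each genomic read mask should add up to 151 cycles, as U7N1Y143 does (7 + 1 + 143).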

The following are examples of OverrideCycles settings using 2x151 reads:

No lane splitting

When using --no-lane-splitting true or the corresponding sample sheet setting NoLaneSplitting,true, DRAGEN FASTQ file name convention and FASTQ contents match bcl2fastq2 for the same feature.

DRAGEN only supports this mode when no 'Lane' column is specified in the sample sheet to make sure that all samples are present in all lanes in the same listed order. This is generally expected for flow cells with no fluidic boundaries between lanes.

IndependentIndexCollisionCheck

When this mode is enabled for a lane, each index (i7 & i5) must individually fully resolve to a single barcode within mismatch tolerances, whereas by default ambiguities can be resolved by the other index. Combinatorial (exact) matches are still allowed. See 'Demultiplexing' section below for further clarification.

Sample Sheet Data Section

The data section is required. Headers for the data section should be [Data] or [data] for sample sheet v1 and [BCLConvert_Data] for sample sheet v2. The following data section headings are supported:

Per Sample Settings

DRAGEN/bcl-convert 4.1 and later supports the following settings as columns in the [BCLConvert_Data] section, allowing them to be specified differently for each sample: OverrideCycles, BarcodeMismatchesIndex1, BarcodeMismatchesIndex2, AdapterRead1, AdapterRead2, AdapterBehavior, AdapterStringency.

These per-sample settings can be specified by omitting the setting from the [BCLConvert_Settings] section and instead adding a column to the [BCLConvert_Data] section with that setting name. Settings that do not apply to a sample (e.g. 'index2' if i5 is masked out for that sample) must be blank or 'na' in the entry for that sample.

This feature is only supported on version two (v2) sample sheets, and no setting can be specified both globally and per-sample. Specifying OverrideCycles differently per-sample allows mixing of different pools into the same lane, but must still obey barcode mismatch constraints for all cycles that are used for demultiplexing by any sample in that lane. DRAGEN software will detect all conflicts between samples at the beginning of the conversion run, even between different pools.

Different strategies such as UMI indexes and dual-index inputs can be combined, provided IndependentIndexCollisionCheck is not enabled. Below is an example sample sheet using per-sample-settings for illustration:

[Header]
FileFormatVersion,2

[BCLConvert_Settings]
AdapterRead1,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
AdapterRead2,AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

[BCLConvert_Data]
Sample_ID,index,index2,OverrideCycles
21599,ATAGAGGC,TATAGCCT,Y151;I8;I8;Y151
21600,na,ATAGAGGC,Y151;U8;I8;Y151
21601,GGCTCTG,CCTATCC,Y151;I7N1;I7N1;Y151
21602,ATTACTCG,GGCTCTGA,Y151;I8;I8;U10Y141
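As a rough illustration of the rule that no setting can be specified both globally and per sample, the following Python sketch reads a v2 sample sheet into its sections and rejects a setting that appears in both [BCLConvert_Settings] and as a [BCLConvert_Data] column. The parsing approach and function names are assumptions made for this example, not the tool's own validation logic.

```python
import csv
import io
from typing import Dict, List, Tuple

SAMPLE_SHEET = """\
[Header]
FileFormatVersion,2

[BCLConvert_Settings]
AdapterRead1,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
AdapterRead2,AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

[BCLConvert_Data]
Sample_ID,index,index2,OverrideCycles
21599,ATAGAGGC,TATAGCCT,Y151;I8;I8;Y151
21600,na,ATAGAGGC,Y151;U8;I8;Y151
21601,GGCTCTG,CCTATCC,Y151;I7N1;I7N1;Y151
21602,ATTACTCG,GGCTCTGA,Y151;I8;I8;U10Y141
"""


def split_sections(text: str) -> Dict[str, List[List[str]]]:
    """Group non-empty CSV rows under the most recent [Section] header."""
    sections: Dict[str, List[List[str]]] = {}
    current = None
    for row in csv.reader(io.StringIO(text)):
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue
        if cells[0].startswith("[") and cells[0].endswith("]"):
            current = cells[0][1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(cells)
    return sections


def load_bclconvert(text: str) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Return (global settings, per-sample rows); reject doubly specified settings."""
    sections = split_sections(text)
    settings = {row[0]: row[1] for row in sections.get("BCLConvert_Settings", []) if len(row) >= 2}
    header, *rows = sections["BCLConvert_Data"]
    clash = sorted(set(settings) & set(header))
    if clash:
        raise ValueError(f"specified both globally and per sample: {clash}")
    return settings, [dict(zip(header, row)) for row in rows]


settings, samples = load_bclconvert(SAMPLE_SHEET)
print(sorted(settings), [sample["OverrideCycles"] for sample in samples])
```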

Sample Sheet Obsolete Settings

BCL Convert does not support the following settings, and new formats must replace their corresponding old formats, when applicable. Manual changes to the sample sheet can be made to the [Settings] section, but the [Data] section must remain unchanged. If any of the obsolete settings are used in the command line or the sample sheet, the software aborts and returns an error. Also note that some obsolete settings that were previously specified on the command line are now correctly specified in the sample sheet.

Adapter Behavior and Specifications

Read Trimming

UMI Specification

Barcode Mismatches

Masking of Trimmed Reads

Run Instructions

This section provides additional instructions for running BCL Convert or DRAGEN for BCL conversion.

nohup

It is recommended to use nohup or other protection when executing BCL conversion via the command line in order to prevent a disconnection or terminal closure from terminating the process. This is done by beginning the command line with nohup before the executable you wish to run.

Ulimit Settings

BCL Convert requires high ulimit settings for both the number of open files allowed and maximum user processes. If a run fails due to maximum user processes being set too low, an error message stating "resource temporarily unavailable" occurs. By default, BCL Convert attempts to set the ulimit soft limit for the number of open files (ulimit -n) to 65535 and the maximum user processes to 32768. If those values exceed the hard limits of the system, the soft limit is set to the hard limit. If more than 10,000 samples are provided, then ulimit -n is set to 720000.
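The limit selection described above can be expressed as a short Python sketch, which may be useful for checking in advance whether a system's hard limits are high enough. The function names are hypothetical and the code only mirrors the rules stated in this section. It does not change any limits.

```python
from typing import Optional

DEFAULT_OPEN_FILES = 65535
LARGE_RUN_OPEN_FILES = 720000
DEFAULT_MAX_PROCESSES = 32768
LARGE_RUN_SAMPLE_THRESHOLD = 10000


def requested_open_files(sample_count: int) -> int:
    """Open-file limit requested for a run with the given number of samples."""
    if sample_count > LARGE_RUN_SAMPLE_THRESHOLD:
        return LARGE_RUN_OPEN_FILES
    return DEFAULT_OPEN_FILES


def effective_soft_limit(requested: int, hard_limit: Optional[int]) -> int:
    """Soft limit after capping at the hard limit (None means unlimited)."""
    if hard_limit is None:
        return requested
    return min(requested, hard_limit)


def current_hard_limit_open_files() -> Optional[int]:
    """Read the hard limit for open files on this system (Unix only)."""
    import resource

    _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return None if hard == resource.RLIM_INFINITY else hard


# Hypothetical run: 96 samples on a system whose hard limit is 4096 open files.
print(effective_soft_limit(requested_open_files(96), 4096))  # capped at 4096
```

If the capped value is well below the requested value, raising the hard limit before the run is one way to avoid resource errors.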

Missing File Handling

If --strict-mode is set to false, BCL Convert executes certain behaviors when it finds missing or corrupt files, rather than abort operation. The following are the possible behaviors according to file type and status.

Analysis Methods

Demultiplexing

BCL Convert produces one FASTQ file for each sample for each lane and read. Demultiplexing behaviors are as follows:

  • When a sample sheet contains multiplexed samples, the software:

    • Places reads without a matching index adapter sequence in Undetermined_S0.fastq.

    • Places reads with valid index adapter sequences in the sample FASTQ file.

  • When a sample sheet contains one unindexed sample, all reads are placed in the sample FASTQ files (one each for Read 1 and Read 2).

  • All reads that do not demultiplex to the samples defined in the Data section of the sample sheet are placed in Undetermined_S0.fastq per lane.

  • When the Lane column in the Data section is not used, all lanes are converted. Otherwise, only populated lanes are converted.

Reverse Complement

BCL Convert demultiplexes indices according to the orientation in which the sequencer read the index read(s). A flag in the RunInfo.xml file specifies whether each index read was sequenced in the forward or reverse orientation. The flag may or may not be present, depending on the sequencing instrument. If the flag is absent, BCL Convert interprets the index sequences as specified. If present:

  • “IsReverseComplement” flag will be specified as “Y” if the index read was sequenced in the reverse orientation,

  • “IsReverseComplement” flag will be specified as “N” if the index read was sequenced in the forward orientation.

BCL Convert will do the following when “Y” is specified for the “IsReverseComplement” flag for an index sequence:

  • The software will reverse the sequence and generate the complement base pair as the index read, where A=T, C=G, and N=N

  • The OverrideCycles value specified will be reversed for the corresponding index read before the reverse complement is taken

  • If a UMI is specified in the corresponding index read, the “r” character will be added at the beginning of the UMI sequence written in the Read Name of the FASTQ file

  • A log message will be displayed indicating that the reverse complement was used for the corresponding index read
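The two transformations in the list above (reverse complementing the index sequence and reversing the order of the OverrideCycles elements) can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch only demonstrates the operations as described here.

```python
import re

# Complement table following the pairing given above: A=T, C=G, N=N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(index_sequence: str) -> str:
    """Complement each base, then reverse the sequence."""
    return index_sequence.upper().translate(_COMPLEMENT)[::-1]


def reverse_mask(mask: str) -> str:
    """Reverse the order of the elements in one OverrideCycles mask."""
    elements = re.findall(r"[A-Z]\d+", mask)
    return "".join(reversed(elements))


print(reverse_complement("ATAGAGGC"))  # GCCTCTAT
print(reverse_mask("I7N1"))            # N1I7
```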

Combined vs Independent Index Validation

DRAGEN/bcl-convert version 3.10 introduced a stricter barcode validation system that required each index in a dual-index setup to independently resolve against other samples at the index's mismatch tolerance, rather than allowing the combination of indexes to resolve in the case of a conflict in i7 or i5 individually. This was incompatible with previous versions of DRAGEN/bcl-convert and with bcl2fastq2, but is a more strict validation that may be better suited to the accuracy requirements of unique-dual applications.

In DRAGEN/bcl-convert 4.1.0, we briefly introduced an option, specified per lane, to enable a more relaxed validation compatible with bcl2fastq2 and earlier versions of dragen/bcl-convert. The 'CombinedIndexCollisionCheck' setting enabled relaxed validation on a per-lane basis. However, for DRAGEN/bcl-convert 4.1.6, it was decided to make relaxed validation the default behavior due to its long history, and we instead now introduce a setting to enable stricter validation. The 'IndependentIndexCollisionCheck' setting enables strict validation on the given semi-colon-separated list of lanes. The example below sets lanes 1, 3, & 4 to strict validation mode:

[BCLConvert_Settings]
IndependentIndexCollisionCheck,1;3;4

Note that the short-lived 'CombinedIndexCollisionCheck' setting is not supported in DRAGEN/bcl-convert 4.1.6+ and will produce an error if used.

UMI Trimming

The software is capable of trimming unique molecular identifier (UMI) sequences from the genomic or index sequences. The cycles of the sequencing read that correspond to the UMI are specified in the OverrideCycles parameter in the Settings section of the sample sheet. See the Settings Section to set the OverrideCycles parameter.

The following are details of the behavior of reads specified as UMIs:

  • UMIs are trimmed from the sequence by default. Use the TrimUMI setting in the Sample Sheet to include UMIs.

  • UMI sequence can be specified in the index and genomic reads. More than one UMI sequence can be specified per read.

  • The specified UMI cycles are applied to all clusters. There is no mechanism to apply UMI based on lane or sample.

  • UMI sequences can only be specified at the beginning and end of sequencing and index reads. UMIs cannot be located in the middle of a read.
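As a sketch of how a mask separates UMI cycles from the rest of a genomic read, the following Python function applies one OverrideCycles mask to a read. It assumes that U cycles are UMI bases, N cycles are skipped, and Y cycles are kept; the function name and the trim_umi parameter (standing in for the TrimUMI setting) are hypothetical. How the software joins multiple UMI segments in the read name is not covered, so the segments are returned as a list.

```python
import re
from typing import List, Tuple


def split_umi(read: str, mask: str, trim_umi: bool = True) -> Tuple[List[str], str]:
    """Return (UMI segments, output sequence) for one read and its mask."""
    elements = [(kind, int(count)) for kind, count in re.findall(r"([YUN])(\d+)", mask)]
    if "".join(f"{kind}{count}" for kind, count in elements) != mask:
        raise ValueError(f"unsupported mask: {mask!r}")
    if sum(count for _, count in elements) != len(read):
        raise ValueError("mask length does not match read length")

    umis: List[str] = []
    kept: List[str] = []
    position = 0
    for kind, count in elements:
        chunk = read[position:position + count]
        position += count
        if kind == "U":
            umis.append(chunk)
            if not trim_umi:
                kept.append(chunk)  # UMI bases stay in the sequence
        elif kind == "Y":
            kept.append(chunk)
        # 'N' cycles are dropped from the output.
    return umis, "".join(kept)


# Hypothetical 10-cycle read with a UMI at each end.
print(split_umi("AAACCCCCGG", "U3Y5U2"))  # (['AAA', 'GG'], 'CCCCC')
```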

Adapter Trimming and Masking

The software can mask or trim user specified adapter sequences from read data so that those adapter sequences are not passed to any downstream analysis steps. Additional details of the adapter handling capabilities are as follows:

  • When masking, the software replaces the identified adapter sequence with N so that the overall read length is constant across all clusters in the read.

  • When trimming, the software removes the identified adapter sequence from the read. The length for each cluster may vary due to trimming.

  • The software assumes that input adapter sequences can only contain A, C, G, or T.
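The difference between masking and trimming can be shown with a simplified Python sketch. It makes two assumptions that are not taken from this section: the adapter is located by exact string match (the software uses an adjustable matching stringency instead), and everything from the start of the adapter to the end of the read is treated as adapter. The function name and the behavior labels are illustrative.

```python
def handle_adapter(read: str, adapter: str, behavior: str = "trim") -> str:
    """Mask or trim an adapter found by exact match (simplified illustration)."""
    if not adapter or set(adapter) - set("ACGT"):
        raise ValueError("adapter sequences can only contain A, C, G, or T")
    start = read.find(adapter)
    if start < 0:
        return read  # no adapter found, read is unchanged
    if behavior == "trim":
        # Read becomes shorter, so lengths can differ between clusters.
        return read[:start]
    if behavior == "mask":
        # Read length stays constant; adapter bases become N.
        return read[:start] + "N" * (len(read) - start)
    raise ValueError(f"unknown behavior: {behavior!r}")


# Hypothetical 13-base read containing the adapter prefix AGATCGG at position 4.
print(handle_adapter("ACGTAGATCGGTT", "AGATCGG", "trim"))  # ACGT
print(handle_adapter("ACGTAGATCGGTT", "AGATCGG", "mask"))  # ACGTNNNNNNNNN
```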

Output Files

FASTQ Files

As converted versions of BCL files, FASTQ files are the primary output of BCL Convert. Like BCL files, FASTQ files contain base calls with associated Q-scores. Unlike BCL files, which contain per‑cycle data, FASTQ files contain the per-read data that most analysis applications require.

The software generates one FASTQ file for every sample, read, and lane. For example, for each sample in a paired-end run, the software generates two FASTQ files: one for Read 1 and one for Read 2. In addition to these sample FASTQ files, the software generates two FASTQ files per lane containing all unknown samples. FASTQ files for Index Read 1 and Index Read 2 are not generated because the sequence is included in the header of each FASTQ entry.

  • If Sample_Name and Sample_Project are both present, and both the --sample-name-column-enabled true and --bcl-sampleproject-subdirectories true command line options are used, FASTQ files are output to subdirectories based on Sample_Project and Sample_ID, and are named by Sample_Name. The same project directory contains the files for multiple samples.

  • If the Sample_ID and Sample_Name columns are specified but do not match, the FASTQ files reside in a subdirectory where files use the Sample_Name value.

  • Reads with unidentified index adapters are recorded in one file named Undetermined_S0_. If a sample sheet includes multiple samples without specified index adapters, the software displays a missing barcode error and ends the analysis.

  • NOTE : The software allows one unindexed sample because identification is not necessary to sequence one sample. However, sequencing multiple samples requires multiplexing so the samples can be identified for analysis.

File Names

The file name format is constructed from fields specified in the sample sheet. The format is as follows.

  • <Sample_ID>_S#_L00#_R#_001.fastq.gz

  • <Sample_ID>---The ID of the sample provided in the sample sheet.

  • S#---The number of the sample based on the order that samples are listed in the sample sheet, starting with 1. For example, S1 indicates that the sample is the first sample listed for the run.

  • NOTE: Reads that cannot be assigned to any sample are written to a FASTQ file as sample number 0 and excluded from downstream analysis.

  • L00#---The lane number of the flow cell, starting with lane 1, to the number of lanes supported. For example, L001 indicates lane 1.

  • R#---The read. For example, R1 indicates Read 1 and R2 indicates Read 2 of a paired-end run.

  • 001---The last portion of the file name is always 001.
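The file name format above can be built and parsed with a small Python helper, which is sometimes convenient when collecting outputs in a script. This is a minimal sketch covering only the R# genomic-read names described here; the function names are hypothetical.

```python
import re
from typing import Dict, Union

_FASTQ_NAME = re.compile(
    r"^(?P<sample_id>.+)_S(?P<sample_number>\d+)_L(?P<lane>\d{3})_R(?P<read>\d+)_001\.fastq\.gz$"
)


def build_fastq_name(sample_id: str, sample_number: int, lane: int, read: int) -> str:
    """Assemble a name of the form <Sample_ID>_S#_L00#_R#_001.fastq.gz."""
    return f"{sample_id}_S{sample_number}_L{lane:03d}_R{read}_001.fastq.gz"


def parse_fastq_name(name: str) -> Dict[str, Union[str, int]]:
    """Split a FASTQ file name into its sample ID, sample number, lane, and read."""
    match = _FASTQ_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised FASTQ file name: {name!r}")
    return {
        "sample_id": match["sample_id"],
        "sample_number": int(match["sample_number"]),
        "lane": int(match["lane"]),
        "read": int(match["read"]),
    }


print(build_fastq_name("Undetermined", 0, 2, 1))  # Undetermined_S0_L002_R1_001.fastq.gz
print(parse_fastq_name("1_S1_L001_R1_001.fastq.gz"))
```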

File Format

FASTQ files are text-based files that contain base calls with corresponding Q-scores for each read. Each read has one 4-line entry:

  • A sequence identifier with information about the run and cluster, formatted as:

    @Instrument:RunID:FlowCellID:Lane:Tile:X:Y:UMI Read:Filter:0:IndexSequence or SampleNumber

  • Note: If a UMI is specified in an index read and the “IsReverseComplement” flag for that read is “Y” in RunInfo.xml, the “r” character is added at the beginning of the UMI sequence written in the Read Name of the FASTQ file

  • The sequence (base calls A, G, C, T, and N, for unknown bases).

  • A plus sign (+) that functions as a separator.

  • The Q-score using ASCII 33 encoding (see Quality Score Encoding).

Sequence Identifier Fields

A complete FASTQ file entry resembles the following example:

@SIM:1:FCX:1:2106:15337:1063:GATCTGTACGTC 1:N:0:ATCACG
GATCTGTACGTCTCTGCNTCACCTCCACCGTGCAACTCATCACGCAGCTCATGCCCTTCGGCTGCCTCCTGGACTA
+
CCCCCGGGGGGGGGGGG#:CFFGFGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGFGGG

This behavior can be altered with the CreateFastqForIndexReads and NoLaneSplitting options (see Sample Sheet Settings section above).
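The sequence identifier format described above can be split into named fields with a short Python sketch. The field names in the returned dictionary are chosen for this example, and the UMI field is treated as optional on the assumption that it is absent when no UMI is configured. The example line assumes the index sequence of the sample entry is ATCACG.

```python
from typing import Dict, Optional


def parse_sequence_identifier(line: str) -> Dict[str, Optional[str]]:
    """Split '@Instrument:RunID:FlowCellID:Lane:Tile:X:Y[:UMI] Read:Filter:0:Index'."""
    if not line.startswith("@"):
        raise ValueError("a sequence identifier must start with '@'")
    location, _, read_info = line[1:].strip().partition(" ")
    location_fields = location.split(":")
    if len(location_fields) not in (7, 8):
        raise ValueError("expected 7 fields, or 8 when a UMI is present")

    names = ["instrument", "run_id", "flow_cell_id", "lane", "tile", "x", "y"]
    record: Dict[str, Optional[str]] = dict(zip(names, location_fields[:7]))
    record["umi"] = location_fields[7] if len(location_fields) == 8 else None

    # The third field after the space is the literal 0 shown in the format above.
    read, filter_flag, zero_field, index = read_info.split(":", 3)
    record.update(
        read=read,
        filter=filter_flag,
        zero_field=zero_field,
        index_or_sample_number=index,
    )
    return record


header = "@SIM:1:FCX:1:2106:15337:1063:GATCTGTACGTC 1:N:0:ATCACG"
print(parse_sequence_identifier(header))
```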

Log Files

BCL conversion outputs log files to the Logs/ output subfolder. These include three separate files, Info.log, Warnings.log, and Errors.log, for three increasing levels of severity. All output to these files is also written to the terminal console: Info is written to standard-out, while Errors and Warnings are written to standard-error.

In addition, the file "FastqComplete.txt" is created in the Logs/ subfolder when conversion is complete. This can be used to trigger subsequent action if desired.
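One way to trigger a follow-up step is to poll for the completion file. The following Python sketch shows the idea; the function names, polling interval, and timeout handling are assumptions made for this example.

```python
import time
from pathlib import Path
from typing import Optional, Union


def conversion_complete(output_directory: Union[str, Path]) -> bool:
    """True once Logs/FastqComplete.txt exists in the output directory."""
    return (Path(output_directory) / "Logs" / "FastqComplete.txt").is_file()


def wait_for_conversion(
    output_directory: Union[str, Path],
    poll_seconds: float = 30.0,
    timeout_seconds: Optional[float] = None,
) -> bool:
    """Poll until the completion file appears; return False if the timeout expires."""
    deadline = None if timeout_seconds is None else time.monotonic() + timeout_seconds
    while not conversion_complete(output_directory):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True
```

A calling script could run wait_for_conversion on the output directory and start downstream analysis only when it returns True.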

BCL Metrics Output

DRAGEN BCL conversion outputs metrics in CSV format to the Reports/ output subfolder. Information provided includes metrics files for demultiplexing, quality scores, adapter sequence trimming, index-hopping detection (for unique-dual indexes only), and the top unmapped barcodes for each lane. In addition, the sample sheet and RunInfo.xml file used during conversion are copied into the Reports/ subdirectory for reference.

Demultiplex Metrics Output File

The following information is included in the Demultiplex_Stats.csv output file.

Quality Metrics Output File

The following information is included in the Quality_Metrics.csv output file.

Adapter Metrics Output File

The following information is included in the Adapter_Metrics.csv output file.

Index Hopping Metrics Output File

For unique dual index inputs, the Index_Hopping_Counts.csv file provides the number of reads mapping to every possible combination of provided index and index2 values, including via mismatch tolerance. The metrics provide visibility into any index-hopping behavior that might be occurring. The samples with both index and index2 values present in the sample sheet are present in the index hopping file for reference. The following information is included in the Index_Hopping_Counts.csv output file.

Top Unknown Barcodes Metrics Output File

The Top_Unknown_Barcodes.csv file lists the most commonly encountered barcode sequences in the flow cell input that are not listed in the sample sheet. The 1000 most common unlisted sequences are listed, along with any other sequences with a frequency equivalent to the 1000th most commonly encountered sequence. The following information is included in the Top_Unknown_Barcodes.csv output file.
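The selection rule described above, the top N sequences plus any that tie with the Nth, can be sketched in Python as follows. This illustrates the rule only; the function name and the tie-breaking order within equal counts are assumptions.

```python
from collections import Counter
from typing import List, Mapping, Tuple


def top_unknown_barcodes(counts: Mapping[str, int], n: int = 1000) -> List[Tuple[str, int]]:
    """Return the n most frequent barcodes plus any tied with the n-th entry."""
    if n <= 0:
        return []
    # Sort by descending count, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    if len(ranked) <= n:
        return ranked
    cutoff = ranked[n - 1][1]
    return [(barcode, count) for barcode, count in ranked if count >= cutoff]


# Hypothetical counts: with n=2, GGGG ties with the second entry and is kept.
observed = Counter({"AAAA": 5, "CCCC": 3, "GGGG": 3, "TTTT": 1})
print(top_unknown_barcodes(observed, n=2))
```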

Per-cycle Adapter Metrics

The following information is included in the Adapter_Cycle_Metrics.csv output file.

Per-tile Metrics:

The format of Demultiplex_Tile_Stats.csv and Quality_Tile_Metrics.csv matches that of Demultiplex_Stats.csv and Quality_Metrics.csv, respectively, save that an additional column is added:

These files provide per-tile data rather than aggregated across the lane and read.

Library Rebalancing Stats

If the 'LibraryInputVolume' setting is provided in the sample sheet, then a 'LibraryRebalancing_Stats.csv' metrics file will also be output in the Reports subdirectory. This is provided for library and pooling QC on the iSeq 100 system. For each read group entry, the following columns are provided:

Please see the following article for more information on use of this file: https://knowledge.illumina.com/instrumentation/iseq-100/instrumentation-iseq-100-reference_material-list/000002698

Sample_Name and Sample_Project Columns

For the metrics files listed above (apart from Top_Unknown_Barcodes.csv), up to two additional columns may be added to each line if 'bcl-sampleproject-subdirectories' and/or 'sample-name-column-enabled' options are enabled:

FASTQ List Output File

The "fastq_list.csv" output file is located in the output folder with the FASTQ files. The file provides the associations between the sample indexes, lane, and the output FASTQ file names. The columns of each row are documented below, along with example entries from a test run. For more information on running DRAGEN using fastq_list.csv, see FASTQ CSV File Format.

The following is an example fastq_list.csv output file.

RGID,RGSM,RGLB,RGSS,Lane,Read1File,Read2File
AACAACCA.ACTGCATA.1,1,Lib_XL_347,P5,1,/home/user/dragen_bcl_out/1_S1_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/1_S1_L001_R2_001.fastq.gz
AATCCGTC.ACTGCATA.1,2,Lib_XL_347,X9,1,/home/user/dragen_bcl_out/2_S2_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/2_S2_L001_R2_001.fastq.gz
CGAACTTA.GCGTAAGA.1,3,Lib_XL_347,Op20,1,/home/user/dragen_bcl_out/3_S3_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/3_S3_L001_R2_001.fastq.gz
GATAGACA.GCGTAAGA.1,4,Lib_IL_955,Op20,1,/home/user/dragen_bcl_out/4_S4_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/4_S4_L001_R2_001.fastq.gz

In the above example, the operator added columns in the Data section of the sample sheet labelled 'RGLB' (Library) and 'RGSS' (a custom field with no pre-existing definition), and these values were passed through and assigned to each read group in the generated fastq_list.csv file. Secondary analysis with DRAGEN™ using the fastq-list input option will further retain these assignments into generated BAM files.

It is a good idea to overload RGLB and supply valid values for this mandatory BAM tag. Custom tags can be used to add extended data to each read group.
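Because fastq_list.csv is a plain CSV file with a header row, downstream scripts can read it with a standard CSV parser. The following is a minimal sketch using Python's csv module on two rows copied from the example above; the helper names `read_fastq_list` and `fastqs_by_sample` are hypothetical and not part of DRAGEN. Custom columns such as RGSS are picked up automatically because rows are keyed by the header.

```python
import csv
import io

# Two rows copied from the example fastq_list.csv above.
EXAMPLE = """RGID,RGSM,RGLB,RGSS,Lane,Read1File,Read2File
AACAACCA.ACTGCATA.1,1,Lib_XL_347,P5,1,/home/user/dragen_bcl_out/1_S1_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/1_S1_L001_R2_001.fastq.gz
GATAGACA.GCGTAAGA.1,4,Lib_IL_955,Op20,1,/home/user/dragen_bcl_out/4_S4_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/4_S4_L001_R2_001.fastq.gz
"""


def read_fastq_list(handle):
    """Return one dict per read group, keyed by the header row."""
    return list(csv.DictReader(handle))


def fastqs_by_sample(rows):
    """Group FASTQ file paths by the RGSM (sample) column."""
    grouped = {}
    for row in rows:
        files = [row["Read1File"]]
        # Read2File may be absent or empty for single-read data.
        if row.get("Read2File"):
            files.append(row["Read2File"])
        grouped.setdefault(row["RGSM"], []).extend(files)
    return grouped


rows = read_fastq_list(io.StringIO(EXAMPLE))
for sample, files in fastqs_by_sample(rows).items():
    print(sample, files)
```

To read a real file, replace the `io.StringIO(EXAMPLE)` handle with `open(path, newline="")`.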

Legacy Stats Output Files

When the “output-legacy-stats” command line option is enabled, DRAGEN BCL Convert writes the following metrics files to the Reports/legacy output subfolder. These files are identical to the bcl2fastq2 v2.20 report files except for instances where there was decreased accuracy, non-deterministic output, or incorrect output from bcl2fastq2 v2.20.

##### ConversionStats File

The ConversionStats.xml file contains the lane number for each lane and the following information for each tile:

  • Raw Cluster Count

  • Read Number

  • YieldQ30

  • Yield

  • QualityScore Sum

##### DemultiplexingStats File

The DemultiplexingStats.xml file contains the flow cell ID and project name. For each sample, index, and lane, the file lists the BarcodeCount, PerfectBarcodeCount, and OneMismatchBarcodeCount (if applicable).

##### Adapter Trimming File

The adapter trimming file is a text-based file that contains a statistics summary of adapter trimming for a FASTQ file. The file contains the fraction of reads with untrimmed bases for each sample, lane, and read number plus the following information:

  • Lane

  • Read

  • Project

  • Sample ID

  • Sample Name

  • Sample Number

  • TrimmedBases

  • PercentageOfBases (being trimmed)

##### FastqSummaryF1L# File

A FastqSummaryF1L#.txt file contains the number of raw and passed filter reads for each sample and tile in a lane. The number sign (#) indicates the lane number.

##### DemuxSummaryF1L# File

DemuxSummaryF1L#.txt files, where # indicates the lane number, are generated when the sample sheet contains at least one indexed sample. Each file contains the percentage of each tile that each sample occupies. It also lists the 1000 most common unknown index adapter sequences and the total number of reads with each index adapter identified.

NOTE : To improve processing speed, the total for each index adapter is based on an estimate from a sampling algorithm.

##### HTML Reports

HTML reports are generated from data in DemultiplexingStats.xml and ConversionStats.xml. The reports reside in Reports\html in the output directory or in the directory specified by the --reports-dir option.

The flow cell summary contains the following information:

  • Clusters (Raw)

  • Clusters (PF)

  • Yield (MBases)

NOTE : For patterned flow cells, the number of raw clusters is equal to the number of wells on the flow cell.

The lane summary provides the following information for each project, sample, and index sequence specified in the sample sheet:

  • Lane#

  • Clusters (Raw)

  • % of the Lane

  • % Perfect Barcode

  • % One Mismatch

  • Clusters (Filtered)

  • Yield

  • % PF Clusters

  • % Q30 Bases

  • Mean Quality Score

The Top Unknown Barcodes table in the HTML report provides the count and sequence for the 10 most common unmapped index adapters in each lane.

Resources and References

The BCL Convert support pages on the Illumina support site (https://support.illumina.com/) provide additional resources. These resources include training, compatible products, and other considerations. Always check support pages for the latest versions.
