BCL conversion

The Illumina BCL Convert is a standalone local software app that converts the Binary Base Call (BCL) files produced by Illumina™ sequencing systems to FASTQ files. The DRAGEN™ product includes hardware accelerated BCL conversion on the DRAGEN™ platform, which results in improved run times compared to BCL Convert pure software execution.

The DRAGEN BCL conversion is designed to output FASTQ files that match bcl2fastq2 v2.20 output. DRAGEN also supports direct conversion from BCL to the compressed FASTQ.ORA format, which can reduce file size by up to a factor of 5 compared with FASTQ.GZ. Refer to the section "DRAGEN ORA compression from BCL" for proper usage.

DRAGEN BCL conversion supports the following features:

  • Demultiplexing samples by barcode with optional mismatch tolerance.

  • Adapter sequence masking or trimming with adjustable matching stringency.

  • UMI sequence tagging and optional trimming.

  • [Optional] Output of FASTQ files for index reads (in gzipped or FASTQ.ORA files)

  • [Optional] Combine all lanes to the same FASTQ output files.

  • High sample count support (100,000 samples)

  • UMI sequences supported in index reads

  • Eliminate skew caused by adapter sequence trimming with 'MinimumAdapterOverlap' setting

  • Support combined (default, compatible with bcl2fastq2) or independent (strict) enforcement of demux conflict detection

  • Support mixed pools by specifying settings for each sample (OverrideCycles, Adapters, etc)

    • Convert all data in a single invocation

    • DRAGEN automatically detects barcode conflicts, even between pools

    • Allows single and dual-index kits to be mixed

    • Undetermined files correctly contain reads that do not map to any sample in any pool

  • Outputs metrics for demultiplexing, quality scores, adapter trimming, unmapped barcodes, & index-hopping detection

  • Outputs per-cycle adapter metrics and per-tile quality & demultiplex metrics

  • Convert a subset of tiles specified by regular-expressions using a white-list, a black-list, or both

  • Better support for legacy applications based upon bcl2fastq2 (all off by default):

    • Output metrics in bcl2fastq2 Stats directory format in addition to csv

    • Support FindAdaptersWithIndels setting to match bcl2fastq2 default output

    • Support fastq subdirectories named by sample project, sampleID, & sampleName

System Requirements

When not running on the DRAGEN™ platform, the following requirements should be noted:

  • Minimum 64 GB of RAM (less RAM is required for some smaller flow cell input types, such as NextSeq and iSeq)

  • Storage requirements: sufficient storage for BCL input and FASTQ output on each source and destination storage device (no intermediate output is generated during BCL conversion)

  • Linux CentOS 6 or higher

  • Root access

Installation

For DRAGEN™ products, BCL conversion functionality is included. When using the separate BCL Convert application, note the following:

BCL Convert is installed from an RPM package downloaded from the Illumina support site. Install the RPM package using one of the following commands:

  • To install the software in the default location, enter: rpm --install <rpm package-name>

  • To specify a custom install location, enter: rpm --install --prefix <user-specified directory> <rpm package-name>

The default installation places the executable at /usr/local/bin/bcl-convert.

Run Requirements

BCL Convert & DRAGEN™ require the following files to be present in the run folder to perform BCL conversion:

  • BCL files (*.bcl, *.cbcl)

  • Filter files (*.filter)

  • Position files (*.locs, *.clocs, or s.locs)

  • Aggregated files (*.bci) as applicable

  • The RunInfo.xml file

  • The config.xml file (for older systems) if applicable

  • The SampleSheet.csv file -- Supports v1 and v2. See the Sample Sheet section below for more details.

Command Line Options

The following example command contains the required BCL conversion options for DRAGEN™:

dragen --bcl-conversion-only true --bcl-input-directory <...> --output-directory <...>

For bcl-convert, here are the required options:

bcl-convert --bcl-input-directory <...> --output-directory <...>
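As a minimal sketch, the required options above can be assembled programmatically before launching the converter. The helper name build_conversion_command and the example paths are hypothetical; only the flags themselves are taken from this section.

```python
from typing import List, Optional


def build_conversion_command(
    input_dir: str,
    output_dir: str,
    use_dragen: bool = False,
    sample_sheet: Optional[str] = None,
    force: bool = False,
) -> List[str]:
    """Return an argument list for bcl-convert or dragen (hypothetical helper)."""
    if use_dragen:
        # DRAGEN needs the extra switch to run conversion only.
        cmd = ["dragen", "--bcl-conversion-only", "true"]
    else:
        cmd = ["bcl-convert"]
    cmd += ["--bcl-input-directory", input_dir, "--output-directory", output_dir]
    if sample_sheet is not None:
        cmd += ["--sample-sheet", sample_sheet]
    if force:
        # -f allows writing into an output directory that already exists.
        cmd.append("-f")
    return cmd


if __name__ == "__main__":
    # Placeholder paths, not real run folders.
    print(" ".join(build_conversion_command("/data/run1", "/data/fastq")))
```

The returned list can be passed to subprocess.run, which avoids shell quoting problems with paths that contain spaces.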

There are many optional command line arguments as well. The following is a list of all command-line options:

  • --bcl-conversion-only true---(DRAGEN only) Required for BCL conversion to FASTQ files in the DRAGEN executable.

  • --bcl-input-directory---Indicates the path to the run folder directory (3 levels higher than the BaseCalls directory). Required.

  • --output-directory---Indicates the path to demultiplexed FASTQ output. The directory must not exist unless -f (force) is specified. Required.

  • --sample-sheet---Specifies the path to SampleSheet.csv file. --sample-sheet is optional if the SampleSheet.csv file is in the --bcl-input-directory directory.

  • --run-info---Overrides the path to the RunInfo.xml file. By default, the file is read from the --bcl-input-directory directory.

  • --strict-mode---If set to true, abort if any files are missing or corrupt. The default is false.

  • --first-tile-only---If set to true, only converts the first tile of input (for testing and debugging). The default is false. (Deprecated)

  • --bcl-only-lane <#>---Convert only the specified lane in this conversion run. By default, all lanes are converted.

  • -f---Convert to output directory even if the directory exists (force).

  • --bcl-use-hw false---(DRAGEN only) Do not use DRAGEN FPGA acceleration during BCL conversion. This allows concurrent execution of BCL conversion with DRAGEN analysis

  • --bcl-sampleproject-subdirectories true---Output FASTQ files to subdirectories based on sample sheet 'Sample_Project' column

  • --no-lane-splitting true---Output all lanes of a flow cell to the same FASTQ files consecutively. Default false.

  • --create-fastq-for-index-reads true---Output FASTQ files for index reads as well as genomic reads. Can only be enabled when an index is present and used for demultiplexing according to the RunInfo.xml file and an OverrideCycles setting. Default false.

  • --bcl-enable-tile-metrics true---Output tile-level metrics to the files Demultiplex_Tile_Stats.csv and Quality_Tile_Stats.csv. Default true. When set to false, the files are still written but contain only the header.

  • --bcl-only-matched-reads true---Disable outputting unmapped reads to FASTQ files marked as Undetermined. Default false.

  • --tiles ''---Only convert tiles matching a set of regular expressions.

  • --exclude-tiles ''---Do not convert tiles matching a set of regular expressions, even if included in --tiles

  • --no-sample-sheet true---Operate with no sample sheet (no demultiplexing or adapter trimming supported). Default false. This option is not supported for conversion to FASTQ.ORA

  • --output-legacy-stats true---Output metrics in bcl2fastq2 Stats directory format in addition to csv. Default false.

  • --sample-name-column-enabled true---Use Sample_Name SampleSheet column for fastq file names in Sample_Project subdirectories (requires 'bcl-sampleproject-subdirectories true' as well). Default false.

  • --fastq-gzip-compression-level [0-9]---Set gzip compression level for software-compressed fastq files. Default 1.

  • -h, --help---Produces a help message and exits the application.

  • -V, --version---Produces the version number of the application and exits.

  • --ora-reference---Required to output compressed FASTQ.ORA file. Specify the path to the directory that contains the compression reference and index file.

  • --fastq-compression-format---(DRAGEN only) Required for DRAGEN ORA compression to specify the type of compression: use dragen for regular DRAGEN ORA compression, dragen-interleaved for DRAGEN ORA paired compression.

  • --num-unknown-barcodes-reported---Number of top unknown barcodes to output. Default 1000.

  • --bcl-validate-sample-sheet-only---Only validate the RunInfo.xml & SampleSheet files (produce no FASTQ files).

(Note that the "fastq-gzip-compression-level" setting will have no effect on blocks compressed by FPGA hardware.)

The following additional options can be used to manually control performance. Use of these options might reduce performance or result in analysis failure, and it is recommended to use the default settings. Contact Illumina Technical Support if issues occur.

  • --shared-thread-odirect-output true---Switch to an alternate file output method that is optimized for sample counts greater than 100,000. This option is not recommended for lower sample counts and/or if using distributed file system output targets such as GPFS or Lustre.

  • --bcl-num-parallel-tiles <#>---Number of tiles processed in parallel. The default is determined dynamically.

  • --bcl-num-conversion-threads <#>---Number of conversion threads per tile. The default is determined dynamically.

  • --bcl-num-compression-threads <#>---Number of CPU threads for gzip-compressing FASTQ output. The default is determined dynamically.

  • --bcl-num-decompression-threads <#>---Number of CPU threads for decompressing input BCL files. The default is determined dynamically.

  • --bcl-num-ora-compression-threads-per-file <#>---Optional for DRAGEN ORA compression. Set the number of threads used per file. Maximum is 24. Default is 10.

  • --bcl-num-ora-compression-parallel-files <#>---Optional for DRAGEN ORA compression. Set the number of files processed in parallel. Maximum is 96. Default is 6.

It is recommended to only adjust CPU threads when reducing cores used on a shared machine. The total number of CPU-intensive threads used will be: --bcl-num-parallel-tiles * --bcl-num-conversion-threads + --bcl-num-compression-threads + --bcl-num-decompression-threads.
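The formula above can be checked with a few lines of Python before choosing values on a shared machine. This is only an illustrative sketch: the function name and the example thread counts are hypothetical, and the real defaults are determined dynamically by the software.

```python
def total_cpu_threads(
    parallel_tiles: int,
    conversion_threads: int,
    compression_threads: int,
    decompression_threads: int,
) -> int:
    """Total CPU-intensive threads, following the formula given in the text."""
    return (
        parallel_tiles * conversion_threads
        + compression_threads
        + decompression_threads
    )


if __name__ == "__main__":
    # Hypothetical settings: 4 tiles in parallel with 8 conversion threads each,
    # plus 16 compression and 8 decompression threads.
    print(total_cpu_threads(4, 8, 16, 8))
```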

Tile Filtering

Two command line options control which tiles are included in the conversion process. --tiles specifies which tiles to include in the analysis (a whitelist), while --exclude-tiles specifies which tiles to exclude from the analysis (a blacklist). Use whichever option expresses the tile list more conveniently. This feature is a replacement for tiles, ExcludeTiles, and ExcludeTilesLaneX in bcl2fastq2.

Both options use a single regular expression format, given by examples below.

  • A 4-digit tile specifier to include the first tile of every lane: --tiles 1101

  • A similar 5-digit tile specifier (NextSeq-only): --tiles 11101

  • Exclude the first tile of lane 2: --exclude-tiles s_2_1101 ('s_' prefix required if lane is specified)

  • Convert all tiles of lane 2: --tiles s_2 (lane specifier only)

Any digit in the above examples can be replaced with a single-digit range using square brackets:

  • Select the first tile of both sides: [1-2]101

  • Select all tiles ending with 5 in lanes 1 & 2: s_[1-2]_[0-9][0-9][0-9]5

Multiple terms are recognized, separated by '+':

  • Select tile 1102 in lane 1 and all the tiles in the other lanes: s_1_1102+s_[2-8]

Both tiles and exclude-tiles can be used, with tiles first filtering to include only matching terms, then exclude-tiles filtering that result set to exclude tiles matching its terms.

For safety, every term of the regular expression (as separated by '+') used for tiles must match at least one tile entry in the input RunInfo tile list. Every term for exclude-tiles must match at least one tile entry in the set produced by tiles if that option is also used, or the RunInfo tile list otherwise. This helps ensure that the operator's intent matches the program's interpretation.
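The include-then-exclude order and the "every term must match" safety rule can be illustrated with a short Python sketch. This is not the converter's implementation. It assumes tiles are named s_<lane>_<tile>, that terms starting with 's_' match from the start of that name, and that other terms must match the whole tile number. The function names are hypothetical.

```python
import re
from typing import List, Optional


def _term_matches(term: str, tile: str) -> bool:
    # Assumed rule: 's_' terms are prefix matches on the full tile name;
    # other terms must match the entire tile number.
    if term.startswith("s_"):
        return re.match(term, tile) is not None
    tile_number = tile.rsplit("_", 1)[-1]
    return re.fullmatch(term, tile_number) is not None


def _matching_tiles(expr: str, tiles: List[str], option: str) -> List[str]:
    matched = set()
    for term in expr.split("+"):
        hits = [t for t in tiles if _term_matches(term, t)]
        if not hits:
            # Safety rule from the text: every term must match at least one tile.
            raise ValueError(f"{option} term '{term}' matches no tile")
        matched.update(hits)
    return [t for t in tiles if t in matched]


def select_tiles(
    tiles: List[str],
    include: Optional[str] = None,
    exclude: Optional[str] = None,
) -> List[str]:
    """Apply --tiles first, then --exclude-tiles on the resulting set."""
    selected = _matching_tiles(include, tiles, "--tiles") if include else list(tiles)
    if exclude:
        dropped = set(_matching_tiles(exclude, selected, "--exclude-tiles"))
        selected = [t for t in selected if t not in dropped]
    return selected


if __name__ == "__main__":
    run_tiles = ["s_1_1101", "s_1_1102", "s_2_1101", "s_2_1102"]
    print(select_tiles(run_tiles, include="s_1_1102+s_2"))
```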

DRAGEN ORA compression from BCL

BCL files can be converted into FASTQ.ORA using two different methods, which cannot be used at the same time. Choose one or the other method:

  • Method 1: Using command line without a sample sheet:

    • set the path to the directory that contains the compression reference and index file with the --ora-reference command; and

    • specify the type of DRAGEN ORA compression with the '--fastq-compression-format' command. The value can be either dragen for regular DRAGEN ORA compression or dragen-interleaved for DRAGEN ORA paired compression.

  • Method 2: Using command line with a sample sheet:

    • set the path to the directory that contains the compression reference and index file with the --ora-reference command; and

    • specify the type of DRAGEN ORA compression in the sample sheet. See Sample Sheet Settings for proper syntax.

The reference and index files for ORA compression are available as a downloadable archive on the DRAGEN Software Support Site page.

For information about how to use FASTQ.ORA files see Input File Types.

Interleaved compression

Interleaved DRAGEN ORA compression improves compression by up to 10% compared with regular DRAGEN ORA compression. To enable it, set --fastq-compression-format to dragen-interleaved. The paired-read files listed on the nth line of the fastq-list.csv generated by the BCL Convert tool are then compressed together into a single fastq.ora file with name <filename before "R">-interleaved<_suffix>.fastq.ora (<_suffix> is optional). When an ORA file that contains paired data is decompressed, it is automatically decompressed to two separate files. To map an ORA file that contains paired interleaved data with the DRAGEN mapper, use the --interleaved option during map/align.

Command line examples

The following example command contains the required BCL conversion options to run regular DRAGEN ORA compression from BCL:

dragen --bcl-conversion-only true --bcl-input-directory <...> --sample-sheet <...> --ora-reference <...> --fastq-compression-format dragen --output-directory <...>

The following example command contains the required BCL conversion options to run interleaved DRAGEN ORA compression from BCL:

dragen --bcl-conversion-only true --bcl-input-directory <...> --sample-sheet <...> --ora-reference <...> --fastq-compression-format dragen-interleaved --output-directory <...>

Sample Sheet

A sample sheet (SampleSheet.csv) records information about samples and their corresponding indexes, and settings that dictate the behavior of the software. The default location of the sample sheet is the input folder. To specify an alternative file location, use the command --sample-sheet <file-path>. When a sample sheet does not exist in the default location and no sample sheet is specified in the command line, the software produces an error unless the '--no-sample-sheet true' option is specified (provided for legacy applications with no demultiplexing, adapter trimming, or other sample-sheet-specified settings supported).

Sample Sheet Versions

BCL Convert and DRAGEN support two sample sheet versions: v1 and v2. The following table displays the different supported options for v1 and v2:

| Sample Sheet v1 | Sample Sheet v2 |
| --- | --- |
| Supports both [Settings] and [settings]. Neither is required. | Supports only [BCLConvert_Settings]. Required. |
| Unrecognized settings trigger a warning. | Unrecognized settings produce an error and analysis aborts. |

Sample Sheet Settings Section

In addition to the command line options that control the behavior of BCL conversion, you can use the [Settings] section in the sample sheet configuration file to specify how the samples are processed. The following are the sample sheet settings for BCL conversion.

Note that DRAGEN does not support the following sample sheet settings from bcl2fastq:

  • ReverseComplement

| Option | Default | Value | Description |
| --- | --- | --- | --- |
| AdapterBehavior | trim | trim, mask | Whether adapters are trimmed or masked. |
| AdapterRead1 | None | Read 1 adapter sequence containing A, C, G, or T | The sequence to trim or mask from the end of Read 1. Can only be specified if the first genomic read is included according to the RunInfo.xml or OverrideCycles. |
| AdapterRead2 | None | Read 2 adapter sequence containing A, C, G, or T | The sequence to trim or mask from the end of Read 2. Can only be specified if the second genomic read is included according to the RunInfo.xml or OverrideCycles. |
| AdapterStringency | 0.9 | Float between 0.5 and 1.0 | The stringency for matching the read to the adapter using the sliding window algorithm. Can only be specified if AdapterRead1 or AdapterRead2 is specified. |
| BarcodeMismatchesIndex1 | 1 | 0, 1, or 2 | The number of allowed mismatches between the first Index Read and index sequence. Can only be specified when index 1 is present and used for demultiplexing for all samples according to the index column and the OverrideCycles setting. |
| BarcodeMismatchesIndex2 | 1 | 0, 1, or 2 | The number of allowed mismatches between the second Index Read and index sequence. Can only be specified when index 2 is present and used for demultiplexing for all samples according to the index column and the OverrideCycles setting. |
| MinimumTrimmedReadLength | The minimum of 35 and the shortest non-indexed read length | 0 to the shortest non-indexed read length | Reads trimmed below this point become masked at that point. |
| MinimumAdapterOverlap | 1 | 1, 2, or 3 | Do not trim detected adapter sequences shorter than this value. |
| MaskShortReads | The minimum of 22 and MinimumTrimmedReadLength | 0 to MinimumTrimmedReadLength | Reads trimmed below this point become masked out. |
| OverrideCycles | None | Y: Specifies a sequencing read; I: Specifies an indexing read; U: Specifies a UMI length to be trimmed from read; N: Specifies cycles to be ignored | String used to specify UMI cycles and mask out cycles of a read. |
| TrimUMI | true | true or false (or 1/0) | If set to 'false', UMI sequences are not trimmed from output FASTQ reads. The UMI is still placed in the sequence header. Can only be enabled (1) if a UMI is present for at least 1 sample according to the OverrideCycles setting. |
| CreateFastqForIndexReads | false | true or false (or 1/0) | If set to 'true', output FASTQ files for index reads as well as genomic reads. Can only be enabled when an index is present and used for demultiplexing according to the index/index2 columns and the OverrideCycles setting. |
| NoLaneSplitting | false | true or false | If set to true, output all lanes of a flow cell to the same FASTQ files consecutively. |
| FastqCompressionFormat | gzip | gzip, dragen, dragen-interleaved | Defines the compression format. If the value is gzip, output FASTQ.GZ. If the value is dragen, output FASTQ.ORA without interleaving paired reads. If the value is dragen-interleaved, output FASTQ.ORA with paired reads interleaved in a single FASTQ. The directory of DRAGEN ORA reference files must be specified on the command line with --ora-reference. This feature can also be controlled entirely from the command line. |
| FindAdaptersWithIndels | false | true or false | Use single-indel-detection adapter trimming (for matching default bcl2fastq2 behavior). |
| IndependentIndexCollisionCheck | empty | Integer between 1 and the number of lanes that exist according to the RunInfo.xml | Semi-colon-separated list of lanes that use stricter validation. When enabled for a given lane, a barcode collision among samples in the corresponding lane(s) is identified if at least one index (index or index2) has a collision. When disabled (default), a barcode collision among samples in the corresponding lane(s) is identified only if both indices (index and index2) have a collision. |
| LibraryInputVolume | empty | real number | Specifying an input volume enables metrics output that assists in rebalancing operations. Output is given in the same unit as the input. |

OverrideCycles

The OverrideCycles mask elements are semicolon separated. The OverrideCycles setting can be specified in one of the following formats, where the two formats cannot be mixed:

  • Order Dependent: OverrideCycles,U7N1Y143;I8;I8;U7N1Y143

  • Order Independent (examples):

    • OverrideCycles,R1:U7N1Y143;I1:I8;I2:I8;R2:U7N1Y143

    • OverrideCycles,R1:U7N1Y143;R2:U7N1Y143;I1:I8;I2:I8

DRAGEN supports flexible UMI processing during BCL conversion to support more third-party assays, including UMI sequences in index reads and multiple UMI regions per read. UMI sequences are trimmed from FASTQ read sequences and placed in the sequence identifier for each read, as normal.

The following are examples of OverrideCycles settings using 2x151 reads:

| Order Dependent Setting | Order Independent Setting | Description |
| --- | --- | --- |
| OverrideCycles,U7N1Y143;I8;I8;U7N1Y143 | OverrideCycles,R1:U7N1Y143;R2:U7N1Y143;I1:I8;I2:I8 | The UMI comprises the first 7 bp of each genomic read, linked by 1 bp of ignored sequence. This is the format for Illumina non-random UMIs, used in the following products: TruSight Oncology 170 RUO, TruSight Oncology 500 RUO, and IDT for Illumina - UMI Index Anchors. |
| OverrideCycles,Y151;I8;U10;Y151 | OverrideCycles,R1:Y151;R2:Y151;I1:I8;I2:U10 | Index Read 2 is a 10 bp UMI. This is the format for Agilent XT HS. |
| OverrideCycles,Y151;I8U9;I8;Y151 | OverrideCycles,R1:Y151;R2:Y151;I1:I8U9;I2:I8 | Index Read 1 contains both an index and a 9 bp UMI. This is the format for IDT Dual Index Adapters with UMIs. |
| OverrideCycles,U3N2Y146;I8;I8;U3N2Y146 | OverrideCycles,R1:U3N2Y146;R2:U3N2Y146;I1:I8;I2:I8 | The UMI comprises the first 3 bp of each genomic read, linked by 2 bp of ignored sequence. This is the format for UMIs in SureSelect XT HS 2 and IDT xGen Duplex Seq Adapter. |
| OverrideCycles,Y151;I8;I8;U10N12Y129 | OverrideCycles,R1:Y151;R2:U10N12Y129;I1:I8;I2:I8 | The UMI is at the beginning of Read 2, attached with a linker sequence of length 12. |
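Because the element lengths in each mask have to add up to the cycle count of the corresponding read, a quick parser is a handy way to sanity-check an OverrideCycles value before a run. The following is a minimal sketch, not the converter's own parser. The function names are hypothetical, and only the Y, I, U, and N elements used in this section are accepted.

```python
import re
from typing import List, Optional, Tuple

_ELEMENT = re.compile(r"([YIUN])(\d+)")
Mask = List[Tuple[str, int]]


def parse_mask(mask: str) -> Mask:
    """Split one mask such as 'U7N1Y143' into (letter, cycle count) pairs."""
    parts = _ELEMENT.findall(mask)
    if not parts or "".join(k + n for k, n in parts) != mask:
        raise ValueError(f"invalid mask: {mask!r}")
    return [(k, int(n)) for k, n in parts]


def parse_override_cycles(value: str) -> List[Tuple[Optional[str], Mask]]:
    """Parse either format; the label is None for the order-dependent format."""
    elements = value.split(";")
    labelled = [":" in e for e in elements]
    if any(labelled) and not all(labelled):
        raise ValueError("the two OverrideCycles formats cannot be mixed")
    result = []
    for element in elements:
        label, mask = element.split(":", 1) if ":" in element else (None, element)
        result.append((label, parse_mask(mask)))
    return result


def cycles_per_read(value: str) -> List[int]:
    """Total cycles covered by each mask, in the order given."""
    return [sum(n for _, n in mask) for _, mask in parse_override_cycles(value)]


if __name__ == "__main__":
    # For a 2x151 run with two 8-cycle index reads, each total should match.
    print(cycles_per_read("U7N1Y143;I8;I8;U7N1Y143"))
```

Comparing the totals returned by cycles_per_read against the cycle counts in RunInfo.xml catches masks that are a cycle or two short.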

No lane splitting

When using --no-lane-splitting true or the corresponding sample sheet setting NoLaneSplitting,true, DRAGEN FASTQ file name convention and FASTQ contents match bcl2fastq2 for the same feature.

DRAGEN only supports this mode when no 'Lane' column is specified in the sample sheet to make sure that all samples are present in all lanes in the same listed order. This is generally expected for flow cells with no fluidic boundaries between lanes.

IndependentIndexCollisionCheck

When this mode is enabled for a lane, each index (i7 & i5) must individually fully resolve to a single barcode within mismatch tolerances, whereas by default ambiguities can be resolved by the other index. Combinatorial (exact) matches are still allowed. See 'Demultiplexing' section below for further clarification.

Sample Sheet Data Section

The data section is required. Headers for the data section should be [Data] or [data] for sample sheet v1 and [BCLConvert_Data] for sample sheet v2. The following data section headings are supported:

| Column | Description |
| --- | --- |
| Lane | When specified, the sample and its index apply only to this lane, rather than all lanes. Only one integer allowed. |
| Sample_ID | The sample ID. |
| index | The index 1 (i7) barcode sequence. Length of string must match number of first index cycles in RunInfo.xml or number specified in OverrideCycles. Reverse-complement of listed sequence is used if RunInfo has an IsReverseComplement tag with value ‘Y’. A maximum of 27 bases is allowed. |
| index2 | The index 2 (i5) barcode sequence. Length of string must match number of second index cycles in RunInfo.xml or number specified in OverrideCycles. Reverse-complement of listed sequence is used if RunInfo has an IsReverseComplement tag with value ‘Y’. A maximum of 27 bases is allowed. |
| Sample_Project | If present, and --bcl-sampleproject-subdirectories true command line is used, then output FASTQ files to subdirectories based upon this value. |
| Sample_Name | If present, and both --sample-name-column-enabled true and --bcl-sampleproject-subdirectories true command lines are used, then output FASTQ files to subdirectories based upon Sample_Project and Sample_ID, and name fastq files by Sample_Name. |

Per Sample Settings

DRAGEN/bcl-convert 4.1 and later support the following settings as columns in the [BCLConvert_Data] section, allowing them to be specified differently for each sample: OverrideCycles, BarcodeMismatchesIndex1, BarcodeMismatchesIndex2, AdapterRead1, AdapterRead2, AdapterBehavior, AdapterStringency.

These per-sample settings can be specified by omitting the setting from the [BCLConvert_Settings] section and instead adding a column to the [BCLConvert_Data] section with that setting name. Settings that do not apply to a sample (e.g. 'index2' if i5 is masked out for that sample) must be blank or 'na' in the entry for that sample.

This feature is only supported on version two (v2) sample sheets, and no setting can be specified both globally and per-sample. Specifying OverrideCycles differently per-sample allows mixing of different pools into the same lane, but must still obey barcode mismatch constraints for all cycles that are used for demultiplexing by any sample in that lane. DRAGEN software will detect all conflicts between samples at the beginning of the conversion run, even between different pools.

Different strategies such as UMI indexes and dual-index inputs can be combined, provided IndependentIndexCollisionCheck is not enabled. Below is an example sample sheet using per-sample-settings for illustration:

[Header]
FileFormatVersion,2

[BCLConvert_Settings]
AdapterRead1,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
AdapterRead2,AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

[BCLConvert_Data]
Sample_ID,index,index2,OverrideCycles
21599,ATAGAGGC,TATAGCCT,Y151;I8;I8;Y151
21600,na,ATAGAGGC,Y151;U8;I8;Y151
21601,GGCTCTG,CCTATCC,Y151;I7N1;I7N1;Y151
21602,ATTACTCG,GGCTCTGA,Y151;I8;I8;U10Y141
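To show how the per-sample columns sit next to the global settings, the sketch below reads a v2 sample sheet like the example above into plain Python structures. It is a simplified reader written for illustration: the function names are hypothetical, and it does none of the validation that BCL Convert performs.

```python
import csv
import io
from typing import Dict, List

SAMPLE_SHEET = """\
[Header]
FileFormatVersion,2

[BCLConvert_Settings]
AdapterRead1,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
AdapterRead2,AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

[BCLConvert_Data]
Sample_ID,index,index2,OverrideCycles
21599,ATAGAGGC,TATAGCCT,Y151;I8;I8;Y151
21600,na,ATAGAGGC,Y151;U8;I8;Y151
21601,GGCTCTG,CCTATCC,Y151;I7N1;I7N1;Y151
21602,ATTACTCG,GGCTCTGA,Y151;I8;I8;U10Y141
"""


def split_sections(text: str) -> Dict[str, List[List[str]]]:
    """Group CSV rows under the most recent [Section] header."""
    sections: Dict[str, List[List[str]]] = {}
    current = None
    for row in csv.reader(io.StringIO(text)):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        first = row[0].strip()
        if first.startswith("[") and first.endswith("]"):
            current = first[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append([cell.strip() for cell in row])
    return sections


def data_rows(sections: Dict[str, List[List[str]]]) -> List[Dict[str, str]]:
    """Turn the data section into one dict per sample, keyed by column name."""
    header, *rows = sections["BCLConvert_Data"]
    return [dict(zip(header, row)) for row in rows]


if __name__ == "__main__":
    for sample in data_rows(split_sections(SAMPLE_SHEET)):
        print(sample["Sample_ID"], sample["OverrideCycles"])
```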

Sample Sheet Obsolete Settings

BCL Convert does not support the following settings, and new formats must replace their corresponding old formats, when applicable. Manual changes to the sample sheet can be made to the [Settings] section, but the [Data] section must remain unchanged. If any of the obsolete settings are used in the command line or the sample sheet, the software aborts and returns an error. Also note that some obsolete settings that were previously specified on the command line are now correctly specified in the sample sheet.

Adapter Behavior and Specifications

| Behavior | Obsolete Sample Sheet Settings | New Sample Sheet Settings |
| --- | --- | --- |
| Designate the adapter sequences for Read 1 and Read 2 and specify the behavior as trim. | Adapter,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA OR TrimAdapter,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA | AdapterRead1,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA AND AdapterRead2,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA |
| Designate the same adapter sequence for Read 1 and Read 2 and specify the behavior as mask. | MaskAdapter,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA | AdapterRead1,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA AND AdapterRead2,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA AND AdapterBehavior,mask |
| Designate the adapter sequences for Read 1 and Read 2 and specify the behavior as mask. | MaskAdapter,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA OR MaskAdapterRead2,AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT | AdapterRead1,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA AND AdapterRead2,AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT AND AdapterBehavior,mask |
| Designate the adapter sequences for Read 1 and Read 2 and specify the behavior as trim. Also specify 0.5 as the adapter stringency. | Adapter,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA OR TrimAdapter,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA (command line) --adapter-stringency 0.5 | AdapterRead1,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA AND AdapterRead2,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA AND AdapterStringency, 0.5 |

Read Trimming

| Behavior | Obsolete Sample Sheet Settings | New Sample Sheet Settings |
| --- | --- | --- |
| Trim the first 7 bases and last 6 bases of Read 1 for a 151 x 8 x 8 x 151 run. | Read1StartFromCycle,8 Read1EndWithCycle,145 | OverrideCycles,N7Y138N6;I8;I8;Y151 |

UMI Specification

| Behavior | Obsolete Sample Sheet Settings | New Sample Sheet Settings |
| --- | --- | --- |
| Designate the first 8 cycles of Read 1 and the first 6 cycles of Read 2 as UMIs and trim the bases between each UMI and the start of the read for a 151 x 8 x 8 x 151 run. | Read1UMIStartFromCycle,1 Read1UMILength,8 Read1StartFromCycle,10 Read2UMIStartFromCycle,1 Read2UMILength,6 Read2StartFromCycle,9 | OverrideCycles,U8N1Y142;I8;I8;U6N2Y143 |

Barcode Mismatches

| Behavior | Obsolete Command Line Settings | New Sample Sheet Settings |
| --- | --- | --- |
| Allow 1 mismatch in the i7 index sequence and 1 mismatch in the i5 index sequence. | --barcode-mismatches 1 OR --barcode-mismatches 1,1 | BarcodeMismatchesIndex1, 1 AND BarcodeMismatchesIndex2, 1 |
| Allow 2 mismatches in the i7 index sequence and 2 mismatches in the i5 index sequence. | --barcode-mismatches 2 OR --barcode-mismatches 2,2 | BarcodeMismatchesIndex1, 2 AND BarcodeMismatchesIndex2, 2 |

Masking of Trimmed Reads

| Behavior | Obsolete Command Line Settings | New Sample Sheet Settings |
| --- | --- | --- |
| Make sure that all trimmed reads are at least 10 base pairs long after adapter trimming by appending Ns to any read shorter than 10 base pairs. | --minimum-trimmed-read-length 10 | MinimumTrimmedReadLength, 10 |
| Make sure that all trimmed reads below 5 base pairs long are masked with Ns. | --mask-short-adapter-reads 5 | MaskShortReads, 5 |
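The combined effect of MinimumTrimmedReadLength and MaskShortReads on a single read can be sketched as follows. This is an illustrative model only: it assumes a fully masked read is written at the minimum trimmed length, which this section does not state explicitly, and the function name is hypothetical.

```python
def apply_short_read_rules(
    trimmed: str,
    minimum_trimmed_read_length: int = 10,
    mask_short_reads: int = 5,
) -> str:
    """Pad or mask a read that adapter trimming has left very short."""
    if len(trimmed) >= minimum_trimmed_read_length:
        return trimmed  # long enough, left unchanged
    if len(trimmed) < mask_short_reads:
        # Too few real bases remain: mask the whole read with Ns
        # (output length here is an assumption).
        return "N" * minimum_trimmed_read_length
    # Otherwise keep the remaining bases and append Ns up to the minimum length.
    return trimmed + "N" * (minimum_trimmed_read_length - len(trimmed))


if __name__ == "__main__":
    for read in ("ACGTACGTACGT", "ACGTACG", "ACG"):
        print(read, "->", apply_short_read_rules(read))
```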

Run Instructions

This section provides additional instructions for running BCL Convert or DRAGEN for BCL conversion.

nohup

It is recommended to use nohup or other protection when executing BCL conversion via the command line in order to prevent a disconnection or terminal closure from terminating the process. This is done by beginning the command line with nohup before the executable you wish to run.

Ulimit Settings

BCL Convert requires high ulimit settings for both the number of open files allowed and maximum user processes. If a run fails due to maximum user processes being set too low, an error message stating "resource temporarily unavailable" occurs. By default, BCL Convert attempts to set the ulimit soft limit for the number of open files (ulimit -n) to 65535 and the maximum user processes to 32768. If those values exceed the hard limits of the system, the soft limit is set to the hard limit. If more than 10,000 samples are provided, then ulimit -n is set to 720000.
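The limit selection described above can be modeled with the standard resource module, which is useful for checking in advance whether a host's hard limit will cap the requested value. This sketch only reads the current limits and does not change them. The helper names are hypothetical, and the logic is inferred from the description rather than taken from the software.

```python
import resource


def open_file_target(sample_count: int) -> int:
    """Requested 'ulimit -n' value, per the thresholds given in the text."""
    return 720000 if sample_count > 10000 else 65535


def clamp_to_hard_limit(requested: int, hard: int) -> int:
    """The soft limit cannot exceed the hard limit, so fall back to it."""
    if hard == resource.RLIM_INFINITY:
        return requested
    return min(requested, hard)


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    wanted = open_file_target(sample_count=96)
    print("current soft limit:", soft)
    print("soft limit after clamping:", clamp_to_hard_limit(wanted, hard))
```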

Missing File Handling

If --strict-mode is set to false, BCL Convert executes certain behaviors when it finds missing or corrupt files, rather than abort operation. The following are the possible behaviors according to file type and status.

| File Type | Status | Behavior |
| --- | --- | --- |
| *.bcl | Missing or corrupt | All base calls of the cycle in the corresponding lane and tile are replaced with N and a quality score of #. |
| *.cbcl | Missing or corrupt | All base calls of the cycle in the corresponding lane and surface are replaced with N and a quality score of #. |
| *.locs | Missing or corrupt | Produce FASTQ files with artificial position data for all reads in the corresponding lane and tiles. |
| *.filter | Missing or corrupt | No FASTQ entries produced for any reads in the corresponding lane and tiles. |
| *.bci lane | Missing or corrupt | No FASTQ entries produced for any reads in the corresponding lane and tiles. |

Analysis Methods

Demultiplexing

BCL Convert produces one FASTQ file for each sample for each lane and read. Demultiplexing behaviors are as follows:

  • When a sample sheet contains multiplexed samples, the software:

    • Places reads without a matching index adapter sequence in Undetermined_S0.fastq.

    • Places reads with valid index adapter sequences in the sample FASTQ file.

  • When a sample sheet contains one unindexed sample, all reads are placed in the sample FASTQ files (one each for Read 1 and Read 2).

  • All reads that do not demultiplex to the samples defined in the Data section of the sample sheet are placed in Undetermined_S0.fastq per lane.

  • When the Lane column in the Data section is not used, all lanes are converted. Otherwise, only populated lanes are converted.
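The core of the demultiplexing behavior above, matching an index read to a sample barcode within a mismatch tolerance and otherwise sending the read to Undetermined, can be sketched in a few lines. This is a simplified single-index model with hypothetical function names, not the converter's algorithm. The sample IDs and barcodes are borrowed from the sample sheet example in this document.

```python
from typing import Dict


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read: str, samples: Dict[str, str], max_mismatches: int = 1) -> str:
    """Return the sample ID whose barcode matches, or 'Undetermined'."""
    hits = [
        sample_id
        for sample_id, barcode in samples.items()
        if hamming(index_read, barcode) <= max_mismatches
    ]
    # A read is only assigned when exactly one sample matches.
    return hits[0] if len(hits) == 1 else "Undetermined"


if __name__ == "__main__":
    sheet = {"21599": "ATAGAGGC", "21602": "ATTACTCG"}
    for read in ("ATAGAGGC", "ATAGAGGA", "TTTTTTTT"):
        print(read, "->", assign_sample(read, sheet))
```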

Reverse Complement

BCL Convert demultiplexes indices according to the orientation in which the sequencer evaluated the index read(s). A flag in the RunInfo.xml file specifies whether each index read was sequenced in the forward or reverse orientation. The flag may or may not be present, depending on the sequencing instrument. If the flag is absent, BCL Convert interprets the index sequences as specified. If present:

  • The “IsReverseComplement” flag is specified as “Y” if the index read was sequenced in the reverse orientation.

  • The “IsReverseComplement” flag is specified as “N” if the index read was sequenced in the forward orientation.

BCL Convert does the following when “Y” is specified for the “IsReverseComplement” flag for an index sequence:

  • The software will reverse the sequence and complement each base to produce the index read, where A pairs with T, C pairs with G, and N stays N

  • The OverrideCycles value specified will be reversed for the corresponding index read before the reverse complement is taken

  • If a UMI is specified in the corresponding index read, the “r” character will be added at the beginning of the UMI sequence written in the Read Name of the FASTQ file

  • A log message will be displayed indicating that the reverse complement was used for the corresponding index read

Combined vs Independent Index Validation

DRAGEN/bcl-convert version 3.10 introduced a stricter barcode validation system that required each index in a dual-index setup to resolve independently against other samples at that index's mismatch tolerance, rather than allowing the combination of indexes to resolve a conflict in i7 or i5 individually. This was incompatible with previous versions of DRAGEN/bcl-convert and with bcl2fastq2, but it is a stricter validation that may be better suited to the accuracy requirements of unique-dual applications.

DRAGEN/bcl-convert 4.1.0 briefly introduced a per-lane setting, 'CombinedIndexCollisionCheck', that enabled a more relaxed validation compatible with bcl2fastq2 and earlier versions of DRAGEN/bcl-convert. As of DRAGEN/bcl-convert 4.1.6, relaxed validation is the default behavior because of its long history, and a new setting enables stricter validation instead. The 'IndependentIndexCollisionCheck' setting enables strict validation on the given semicolon-separated list of lanes. The example below sets lanes 1, 3, and 4 to strict validation mode:

[BCLConvert_Settings]
IndependentIndexCollisionCheck,1;3;4

Note that the short-lived 'CombinedIndexCollisionCheck' setting is not supported in DRAGEN/bcl-convert 4.1.6+ and will produce an error if used.
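The difference between the two validation modes can be illustrated with a short sketch. The collision rule used here (two indexes are ambiguous when their Hamming distance is at most twice the mismatch tolerance) and all function names are assumptions for illustration; the exact rule applied by the software is not stated in this section.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def collides(a: str, b: str, mismatches: int) -> bool:
    # Assumed rule: a read could lie within `mismatches` of both indexes
    # when the indexes differ in at most 2 * mismatches positions.
    return hamming(a, b) <= 2 * mismatches


def has_conflict(sample_a: tuple[str, str], sample_b: tuple[str, str],
                 mismatches: int = 1, independent: bool = False) -> bool:
    """Compare two (i7, i5) pairs under combined or independent validation."""
    i7_collides = collides(sample_a[0], sample_b[0], mismatches)
    i5_collides = collides(sample_a[1], sample_b[1], mismatches)
    if independent:
        # Strict mode: each index must resolve on its own.
        return i7_collides or i5_collides
    # Combined (relaxed) mode: the pair only conflicts if both indexes collide.
    return i7_collides and i5_collides
```

Under this sketch, two samples that share an i7 but have clearly distinct i5 sequences pass combined validation and fail independent validation.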

UMI Trimming

The software is capable of trimming unique molecular identifier (UMI) sequences from the genomic or index sequences. The cycles of the sequencing read that correspond to the UMI are specified in the OverrideCycles parameter in the Settings section of the sample sheet. See the Settings Section to set the OverrideCycles parameter.

The following are details of the behavior of reads specified as UMIs:

  • UMIs are trimmed from the sequence by default. Use the TrimUMI setting in the Sample Sheet to include UMIs.

  • UMI sequence can be specified in the index and genomic reads. More than one UMI sequence can be specified per read.

  • The specified UMI cycles are applied to all clusters. There is no mechanism to apply UMI based on lane or sample.

  • UMI sequences can only be specified at the beginning or end of sequencing and index reads. UMIs cannot be located in the middle of a read.
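As a rough illustration of how a per-read cycle mask separates UMI cycles from the rest of the read, the sketch below applies one read's mask. It assumes the OverrideCycles syntax of letter and count pairs (Y for read cycles, I for index cycles, U for UMI cycles, N for ignored cycles) described in the Settings section; the function name and the choice to concatenate multiple UMI segments are hypothetical.

```python
import re


def split_umi(read: str, mask: str) -> tuple[str, str]:
    """Apply one read's cycle mask and return (umi, remaining_sequence)."""
    umi_parts, kept_parts, pos = [], [], 0
    for kind, count in re.findall(r"([YIUN])(\d+)", mask):
        segment = read[pos:pos + int(count)]
        pos += int(count)
        if kind == "U":
            umi_parts.append(segment)   # UMI cycles are pulled out of the read
        elif kind in "YI":
            kept_parts.append(segment)  # read or index cycles are kept
        # 'N' cycles are dropped entirely
    return "".join(umi_parts), "".join(kept_parts)
```

For example, a mask of `U4N1Y9` takes the first four cycles as the UMI, skips one cycle, and keeps the remaining nine.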

Adapter Trimming and Masking

The software can mask or trim user specified adapter sequences from read data so that those adapter sequences are not passed to any downstream analysis steps. Additional details of the adapter handling capabilities are as follows:

  • When masking, the software replaces the identified adapter sequence with N so that the overall read length is constant across all clusters in the read.

  • When trimming, the software removes the identified adapter sequence from the read. The length for each cluster may vary due to trimming.

  • The software assumes that input adapter sequences can only contain A, C, G, or T.
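The contrast between masking and trimming can be sketched as below. This is a deliberately simplified illustration: it uses an exact substring match and treats everything from the adapter start to the end of the read as adapter, both of which are assumptions. The real software uses an adjustable matching stringency, and the function name and `behavior` parameter are hypothetical.

```python
def handle_adapter(read: str, adapter: str, behavior: str = "trim") -> str:
    """Mask or trim an adapter found in a read (exact match only)."""
    if set(adapter) - set("ACGT"):
        raise ValueError("adapter may only contain A, C, G, or T")
    start = read.find(adapter)
    if start == -1:
        return read  # no adapter found, read is unchanged
    if behavior == "mask":
        # Masking keeps the read length constant by writing N over the adapter.
        return read[:start] + "N" * (len(read) - start)
    # Trimming shortens the read, so read lengths can differ between clusters.
    return read[:start]
```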

Output Files

FASTQ Files

As converted versions of BCL files, FASTQ files are the primary output of BCL Convert. Like BCL files, FASTQ files contain base calls with associated Q-scores. Unlike BCL files, which contain per‑cycle data, FASTQ files contain the per-read data that most analysis applications require.

The software generates one FASTQ file for every sample, read, and lane. For example, for each sample in a paired-end run, the software generates two FASTQ files: one for Read 1 and one for Read 2. In addition to these sample FASTQ files, the software generates two FASTQ files per lane containing all unknown samples. FASTQ files for Index Read 1 and Index Read 2 are not generated because the sequence is included in the header of each FASTQ entry.

  • If Sample_Name and Sample_Project are both present, and both the --sample-name-column-enabled true and --bcl-sampleproject-subdirectories true command-line options are used, FASTQ files are output to subdirectories based on Sample_Project and Sample_ID, and are named by Sample_Name. The same project directory contains the files for multiple samples.

  • If the Sample_ID and Sample_Name columns are specified but do not match, the FASTQ files reside in a subdirectory where files use the Sample_Name value.

  • Reads with unidentified index adapters are recorded in one file named Undetermined_S0_. If a sample sheet includes multiple samples without specified index adapters, the software displays a missing barcode error and ends the analysis.

  • NOTE: The software allows one unindexed sample because identification is not necessary to sequence one sample. However, sequencing multiple samples requires multiplexing so the samples can be identified for analysis.

File Names

The file name format is constructed from fields specified in the sample sheet. The format is as follows.

  • <Sample_ID>_S#_L00#_R#_001.fastq.gz

  • <Sample_ID>: The ID of the sample provided in the sample sheet.

  • S1: The number of the sample based on the order that samples are listed in the sample sheet, starting with 1. In the example, S1 indicates that the sample is the first sample listed for the run.

  • NOTE: Reads that cannot be assigned to any sample are written to a FASTQ file as sample number 0 and excluded from downstream analysis.

  • L001: The lane number of the flow cell, starting with lane 1, to the number of lanes supported.

  • R1: The read. In the example, R1 indicates Read 1. R2 indicates Read 2 of a paired-end run.

  • 001: The last portion of the file name is always 001.
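The naming pattern above can be reproduced with a one-line formatter. The function name and parameter names are hypothetical; the pattern itself is the one given in this section.

```python
def fastq_name(sample_id: str, sample_number: int, lane: int, read: int) -> str:
    """Build <Sample_ID>_S#_L00#_R#_001.fastq.gz from its components."""
    # The lane is zero-padded to three digits; the trailing 001 is constant.
    return f"{sample_id}_S{sample_number}_L{lane:03d}_R{read}_001.fastq.gz"
```

For instance, `fastq_name("Undetermined", 0, 3, 1)` yields the name of the Read 1 file for unassigned reads in lane 3.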

File Format

FASTQ files are text-based files that contain base calls with corresponding Q-scores for each read. Each read has one 4-line entry:

  • A sequence identifier with information about the run and cluster, formatted as:

    @Instrument:RunID:FlowCellID:Lane:Tile:X:Y:UMI Read:Filter:0:IndexSequence or SampleNumber

  • Note: If a UMI is specified in an index read when “IsReverseComplement” exists in the RunInfo.xml, the “r” character will be added at the beginning of the UMI sequence written in the Read Name of the FASTQ file.

  • The sequence (base calls A, G, C, T, and N, for unknown bases).

  • A plus sign (+) that functions as a separator.

  • The Q-score using ASCII 33 encoding (see Quality Score Encoding).

Sequence Identifier Fields

| Field | Description |
| --- | --- |
| @ | Each sequence identifier line starts with @. |
| instrument | The instrument ID. |
| run ID | The run number on the system. |
| flow cell ID | The flow cell ID. |
| lane | The flow cell lane number. |
| tile | The flow cell tile number. |
| x_pos | The X coordinate of the cluster. |
| y_pos | The Y coordinate of the cluster. |
| UMI | Optional. The UMI sequence (A, G, C, T, and N). When the sample sheet specifies UMIs, a plus sign separates the Read 1 and Read 2 sequences. |
| read | 1 - Read 1, which is the first read of a paired-end run or the only read of a single-read run. 2 - Read 2, which is the second read of a paired-end run. |
| is filtered | N - Reads that failed the filter are not included. |
| control number | 0 - Control bits are not turned on. |
| index sequence or sample number | The Index Read sequence (A, G, C, T, and N). If the sample sheet indicates indexing, the index adapter sequence is appended to the end of the read identifier. If indexing is not indicated (one sample per lane), the sample number is appended to the read identifier. |

A complete FASTQ file entry resembles the following example:

    @SIM:1:FCX:1:2106:15337:1063:GATCTGTACGTC 1:N:0:ATCACG
    GATCTGTACGTCTCTGCNTCACCTCCACCGTGCAACTCATCACGCAGCTCATGCCCTTCGGCTGCCTCCTGGACTA
    +
    CCCCCGGGGGGGGGGGG#:CFFGFGFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGFGGG
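A sequence identifier in this layout can be split into its named fields with a few string operations. The sketch below is a minimal parser written for illustration; the function name and dictionary keys are assumptions, and it expects the space-separated two-part layout shown in the format line above.

```python
def parse_identifier(line: str) -> dict:
    """Split a FASTQ sequence identifier into its documented fields."""
    left, right = line.lstrip("@").split(" ", 1)
    fields = left.split(":")
    names = ["instrument", "run_id", "flow_cell_id", "lane", "tile",
             "x_pos", "y_pos"]
    record = dict(zip(names, fields[:7]))
    # The UMI is optional and, when present, is the eighth colon-separated field.
    record["umi"] = fields[7] if len(fields) > 7 else None
    read, is_filtered, control_number, index = right.split(":", 3)
    record.update(read=read, is_filtered=is_filtered,
                  control_number=control_number, index=index)
    return record
```

Applied to an identifier such as `@SIM:1:FCX:1:2106:15337:1063:GATCTGTACGTC 1:N:0:ATCACG`, the parser reports tile 2106, a UMI of GATCTGTACGTC, and an index of ATCACG.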

The default behavior of writing one FASTQ file per lane, without separate FASTQ files for index reads, can be altered with the CreateFastqForIndexReads and NoLaneSplitting options (see Sample Sheet Settings section above).

Log Files

BCL conversion outputs log files to the Logs/ output subfolder. These include three separate files, Info.log, Warnings.log, and Errors.log, for three increasing levels of severity. All output to these files is also written to the terminal console: Info is written to standard-out, while Errors and Warnings are written to standard-error.

In addition, the file "FastqComplete.txt" is created in the Logs/ subfolder when conversion is complete. This can be used to trigger subsequent action if desired.
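One way to use the completion marker as a trigger is to poll for it. The sketch below is a simple illustration; the function name, polling interval, and timeout are assumptions, and only the Logs/FastqComplete.txt location comes from the text above.

```python
import time
from pathlib import Path


def wait_for_completion(output_dir: str, poll_seconds: float = 30.0,
                        timeout: float = 3600.0) -> bool:
    """Return True once Logs/FastqComplete.txt exists, False on timeout."""
    marker = Path(output_dir) / "Logs" / "FastqComplete.txt"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if marker.exists():
            return True
        time.sleep(poll_seconds)
    # One final check in case the file appeared during the last sleep.
    return marker.exists()
```

A caller could start downstream processing only when this function returns True.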

BCL Metrics Output

DRAGEN BCL conversion outputs metrics in CSV format to the Reports/ output subfolder. Information provided includes metrics files for demultiplexing, quality scores, adapter sequence trimming, index-hopping detection (for unique-dual indexes only), and the top unmapped barcodes for each lane. In addition, the sample sheet and RunInfo.xml file used during conversion are copied into the Reports/ subdirectory for reference.

Demultiplex Metrics Output File

The following information is included in the Demultiplex_Stats.csv output file.

| Column | Description |
| --- | --- |
| Lane | The lane for each metric. |
| SampleID | The contents of Sample_ID in the sample sheet for this sample. |
| Index | The contents of index in the sample sheet for this sample. For dual-index samples, the index value is concatenated with index2. |
| # Reads | The total number of pass-filter reads mapping to this sample for the lane. |
| # Perfect Index Reads | The number of mapped reads with barcodes that match the indexes provided in the sample sheet exactly. |
| # One Mismatch Index Reads | The number of mapped reads with barcodes matched with exactly one base mismatched. |
| # Two Mismatch Index Reads | The number of mapped reads with barcodes matched with exactly two bases mismatched. |
| % Reads | The percentage of pass-filter reads mapping to this sample for the lane. |
| % Perfect Index Reads | The percentage of mapped reads with barcodes that match the indexes provided in the sample sheet exactly. |
| % One Index Reads | The percentage of mapped reads with barcodes matched with exactly one base mismatched. |
| % Two Index Reads | The percentage of mapped reads with barcodes matched with exactly two bases mismatched. |
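Because the file is plain CSV, it can be summarized with the standard library. The sketch below totals the "# Reads" column per SampleID across lanes; the function name is hypothetical, and it assumes the header row uses the column names listed above.

```python
import csv
import io
from collections import defaultdict


def reads_per_sample(csv_text: str) -> dict[str, int]:
    """Sum the '# Reads' column per SampleID across all lanes."""
    totals: dict[str, int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["SampleID"]] += int(row["# Reads"])
    return dict(totals)
```

The text of the file can be passed in directly, for example from `Path("Reports/Demultiplex_Stats.csv").read_text()`.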

Quality Metrics Output File

The following information is included in the Quality_Metrics.csv output file.

| Column | Description |
| --- | --- |
| Lane | The lane number this metric line refers to. |
| SampleID | The contents of Sample_ID in the sample sheet for this sample. |
| index | The contents of index in the sample sheet for this sample. |
| index2 | The contents of index2 in the sample sheet for this sample. |
| ReadNumber | The read number this metric line refers to. |
| Yield | The total number of bases mapping to the sample in this read. |
| YieldQ30 | The total number of bases with quality score >= 30 mapping to the sample in this read. |
| QualityScoreSum | The sum of quality scores of bases mapping to the sample in this read. |
| Mean Quality Score (PF) | The mean quality score of bases mapping to the sample in this read. |
| % Q30 | The percentage of bases with quality score >= 30 mapping to the sample in this read. |
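The two derived columns follow from the raw counts. The sketch below recomputes them from Yield, YieldQ30, and QualityScoreSum; the function name is hypothetical, and returning the Q30 value on a 0 to 100 scale is an assumption about presentation rather than a statement of how the file scales it.

```python
def quality_summary(yield_bases: int, yield_q30: int,
                    quality_score_sum: int) -> tuple[float, float]:
    """Return (mean quality score, percent of bases with Q >= 30)."""
    if yield_bases == 0:
        return 0.0, 0.0  # avoid dividing by zero for empty samples
    mean_quality = quality_score_sum / yield_bases
    percent_q30 = 100.0 * yield_q30 / yield_bases
    return mean_quality, percent_q30
```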

Adapter Metrics Output File

The following information is included in the Adapter_Metrics.csv output file.

| Column | Description |
| --- | --- |
| Lane | The lane number this metric line refers to. |
| Sample_ID | The contents of Sample_ID in the sample sheet for this sample. |
| index | The contents of index in the sample sheet for this sample. |
| index2 | The contents of index2 in the sample sheet for this sample. |
| ReadNumber | The read number this metric line refers to. |
| AdapterBases | The total number of bases trimmed as adapter from the read in the sample. |
| SampleBases | The total number of bases not trimmed from the read in the sample. |
| % Adapter Bases | The percentage of bases trimmed as adapter from the read in the sample. |

Index Hopping Metrics Output File

For unique dual index inputs, the Index_Hopping_Counts.csv file provides the number of reads mapping to every possible combination of provided index and index2 values, including via mismatch tolerance. The metrics provide visibility into any index-hopping behavior that might be occurring. The samples with both index and index2 values present in the sample sheet are present in the index hopping file for reference. The following information is included in the Index_Hopping_Counts.csv output file.

| Column | Description |
| --- | --- |
| Lane | The lane for each metric. |
| SampleID | If the index combination corresponds to a sample, the contents of Sample_ID in the sample sheet for this sample. |
| index | The contents of index in the sample sheet for the sample. |
| index2 | The contents of index2 in the sample sheet for the sample. |
| # Reads | The total number of pass-filter reads mapping to the index and index2 combination. |
| % of Hopped Reads | The percentage of hopped pass-filter reads mapping to the index and index2 combination. |
| % of All Reads | The percentage of all pass-filter reads mapping to the index and index2 combination. |

Top Unknown Barcodes Metrics Output File

The Top_Unknown_Barcodes.csv file lists the most commonly encountered barcode sequences in the flow cell input that are not listed in the sample sheet. The 1000 most common unlisted sequences are listed, along with any other sequences with a frequency equivalent to the 1000th most commonly encountered sequence. The following information is included in the Top_Unknown_Barcodes.csv output file.

| Column | Description |
| --- | --- |
| Lane | The lane for each metric. |
| index | The first index value of this unlisted sequence. |
| index2 | The second index value of this unlisted sequence. |
| # Reads | The total number of pass-filter reads mapping to the index and index2 combination. |
| % of Unknown Barcodes | The percentage of unknown pass-filter reads mapping to the index and index2 combination. |
| % of All Reads | The percentage of all pass-filter reads mapping to the index and index2 combination. |
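The "top 1000 plus ties" selection rule described for this file can be sketched with a counter. This is an illustration of the rule only; the function name is hypothetical and the way the software actually counts barcodes is not described here.

```python
from collections import Counter


def top_unknown(counts: Counter, n: int = 1000) -> list[tuple[str, int]]:
    """Return the n most common barcodes plus any tied with the n-th."""
    ranked = counts.most_common()
    if len(ranked) <= n:
        return ranked
    cutoff = ranked[n - 1][1]  # read count of the n-th most common barcode
    return [(barcode, reads) for barcode, reads in ranked if reads >= cutoff]
```

With `n=2` and counts of 5, 3, 3, and 1, three barcodes are returned because the third ties with the second.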

Per-cycle Adapter Metrics

The following information is included in the Adapter_Cycle_Metrics.csv output file.

| Column | Description |
| --- | --- |
| Lane | The lane number this metric line refers to. |
| Sample_ID | The contents of Sample_ID in the sample sheet for this sample. |
| index | The contents of index in the sample sheet for this sample. |
| index2 | The contents of index2 in the sample sheet for this sample. |
| ReadNumber | The read number this metric line refers to. |
| Cycle | The cycle number this metric line refers to. |
| NumClustersWithAdapterAtCycle | The number of clusters where the adapter was detected to begin precisely at this cycle. |
| % At Cycle | The percentage of all clusters where the adapter was detected to begin precisely at this cycle. |

Per-tile Metrics

The format of Demultiplex_Tile_Stats.csv and Quality_Tile_Metrics.csv matches that of Demultiplex_Stats.csv and Quality_Metrics.csv, respectively, save that an additional column is added:

| Column | Description |
| --- | --- |
| Tile | The tile number this metric line refers to. |

These files provide per-tile data rather than data aggregated across the lane and read.

Library Rebalancing Stats

If the 'LibraryInputVolume' setting is provided in the sample sheet, then a 'LibraryRebalancing_Stats.csv' metrics file will also be output in the Reports subdirectory. This is provided for library and pooling QC on the iSeq 100 system. For each read group entry, the following columns are provided:

| Column | Description |
| --- | --- |
| Lane | The lane for each metric. Present only if 'Lane' is a column in the sample sheet, otherwise lanes are combined. |
| SampleID | The contents of Sample_ID in the sample sheet for this sample. |
| index | The contents of index in the sample sheet for this sample. |
| index2 | The contents of index2 in the sample sheet for this sample. |
| # Reads | The total number of pass-filter reads mapping to this sample for the lane. |
| % Reads | The percentage of mapped pass-filter reads mapping to this sample for the lane (excluding Undetermined from consideration). |
| Rebalancing Factor | The ratio of the largest # of reads of any read group and the # of reads of this read group. |
| Rebalanced Input Volume | The rebalancing factor for this read group multiplied by the LibraryInputVolume setting. |

Please see the following article for more information on use of this file: https://knowledge.illumina.com/instrumentation/iseq-100/instrumentation-iseq-100-reference_material-list/000002698
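The Rebalancing Factor and Rebalanced Input Volume columns follow directly from the read counts. The sketch below reproduces that arithmetic as described in the column definitions; the function name and the dictionary layout are hypothetical, and it assumes every read group has at least one read.

```python
def rebalance(read_counts: dict[str, int],
              library_input_volume: float) -> dict[str, tuple[float, float]]:
    """Return {read_group: (rebalancing_factor, rebalanced_input_volume)}."""
    largest = max(read_counts.values())
    result = {}
    for group, reads in read_counts.items():
        # Factor: largest read count of any group divided by this group's count.
        factor = largest / reads
        result[group] = (factor, factor * library_input_volume)
    return result
```

For example, with read counts of 200 and 100 and a LibraryInputVolume of 5.0, the smaller group gets a factor of 2.0 and a rebalanced volume of 10.0.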

Sample_Name and Sample_Project Columns

For the metrics files listed above (apart from Top_Unknown_Barcodes.csv), up to two additional columns may be added to each line if 'bcl-sampleproject-subdirectories' and/or 'sample-name-column-enabled' options are enabled:

| Column | Description |
| --- | --- |
| Sample_Project | The Sample_Project value for the sample this metric line refers to. |
| Sample_Name | The Sample_Name value for the sample this metric line refers to. |

FASTQ List Output File

The "fastq_list.csv" output file is located in the output folder with the FASTQ files. The file provides the associations between the sample indexes, lane, and the output FASTQ file names. The columns of each row are documented below, along with example entries from a test run. For more information on running DRAGEN using fastq_list.csv, see FASTQ CSV File Format.

| Column | Description |
| --- | --- |
| RGID | Read Group |
| RGSM | Sample ID |
| RGLB | Library |
| Lane | Flow cell lane |
| Read1File | Full path to a valid FASTQ input file. |
| Read2File | Full path to a valid FASTQ input file. Required for paired-end input. If not using paired-end input, leave empty. |

The following is an example fastq_list.csv output file.

RGID,RGSM,RGLB,RGSS,Lane,Read1File,Read2File
AACAACCA.ACTGCATA.1,1,Lib_XL_347,P5,1,/home/user/dragen_bcl_out/1_S1_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/1_S1_L001_R2_001.fastq.gz
AATCCGTC.ACTGCATA.1,2,Lib_XL_347,X9,1,/home/user/dragen_bcl_out/2_S2_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/2_S2_L001_R2_001.fastq.gz
CGAACTTA.GCGTAAGA.1,3,Lib_XL_347,Op20,1,/home/user/dragen_bcl_out/3_S3_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/3_S3_L001_R2_001.fastq.gz
GATAGACA.GCGTAAGA.1,4,Lib_IL_955,Op20,1,/home/user/dragen_bcl_out/4_S4_L001_R1_001.fastq.gz,/home/user/dragen_bcl_out/4_S4_L001_R2_001.fastq.gz

In the above example, the operator added columns in the Data section of the sample sheet labelled 'RGLB' (Library) and 'RGSS' (a custom field with no pre-existing definition), and these values were passed through and assigned to each read group in the generated fastq_list.csv file. Secondary analysis with DRAGEN™ using the fastq-list input option will further retain these assignments into generated BAM files.
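A fastq_list.csv file in this layout can be read with the standard csv module. The sketch below groups FASTQ file pairs by RGSM; the function name is hypothetical, and it assumes the header row shown in the example above.

```python
import csv
import io


def fastq_pairs_by_sample(csv_text: str) -> dict[str, list[tuple[str, str]]]:
    """Map each RGSM value to its list of (Read1File, Read2File) pairs."""
    pairs: dict[str, list[tuple[str, str]]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Read2File is empty (or missing) for single-read input.
        read2 = row.get("Read2File") or ""
        pairs.setdefault(row["RGSM"], []).append((row["Read1File"], read2))
    return pairs
```

Any extra columns, such as RGLB or a custom field, remain available on each row if a caller needs them.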

It is a good idea to set RGLB explicitly so that this mandatory BAM tag contains valid values. Custom tags can be used to add extended data to each read group.

Legacy Stats Output Files

When the “output-legacy-stats” command line option is enabled, DRAGEN BCL Convert writes the following metrics to the Reports/legacy output subfolder. These files are identical to the bcl2fastq2 v2.20 report files except for instances where there was decreased accuracy, non-deterministic output, or incorrect output from bcl2fastq2 v2.20.

##### ConversionStats File

The ConversionStats.xml file contains the lane number for each lane and the following information for each tile:

  • Raw Cluster Count

  • Read Number

  • YieldQ30

  • Yield

  • QualityScore Sum

##### DemultiplexStats File

The DemultiplexingStats.xml contains the flow cell ID and project name. For each sample, index, and lane, the file lists the BarcodeCount, PerfectBarcodeCount, and OneMismatchBarcodeCount (if applicable).

##### Adapter Trimming File

The adapter trimming file is a text-based file that contains a statistics summary of adapter trimming for a FASTQ file. The file contains the fraction of reads with untrimmed bases for each sample, lane, and read number plus the following information:

  • Lane

  • Read

  • Project

  • Sample ID

  • Sample Name

  • Sample Number

  • TrimmedBases

  • PercentageOfBases (being trimmed)

##### FastqSummaryF1L# File

A FastqSummaryF1L#.txt file contains the number of raw and passed filter reads for each sample and tile in a lane. The number sign (#) indicates the lane number.

##### DemuxSummaryF1L# File

DemuxSummaryF1L#.txt files, where # indicates the lane number, are generated when the sample sheet contains at least one indexed sample. A file contains the percentage of each tile that each sample occupies. It also lists the 1000 most common unknown index adapter sequences and the total number of reads with each index adapter identified.

NOTE: To improve processing speed, the total for each index adapter is based on an estimate from a sampling algorithm.

##### HTML Reports

HTML reports are generated from data in DemultiplexingStats.xml and ConversionStats.xml. The reports reside in Reports/html in the output directory or in the directory specified by the --reports-dir option.

The flow cell summary contains the following information:

  • Clusters (Raw)

  • Clusters (PF)

  • Yield (MBases)

NOTE: For patterned flow cells, the number of raw clusters is equal to the number of wells on the flow cell.

The lane summary provides the following information for each project, sample, and index sequence specified in the sample sheet:

  • Lane #

  • Clusters (Raw)

  • % of the Lane

  • % Perfect Barcode

  • % One Mismatch

  • Clusters (Filtered)

  • Yield

  • % PF Clusters

  • % Q30 Bases

  • Mean Quality Score

The Top Unknown Barcodes table in the HTML report provides the count and sequence for the 10 most common unmapped index adapters in each lane.

Resources and References

The BCL Convert support pages on the Illumina support site (https://support.illumina.com/) provide additional resources. These resources include training, compatible products, and other considerations. Always check support pages for the latest versions.
