Fractional (Raw Reads) Downsampling

The fractional downsampler lets DRAGEN randomly subsample a specified fraction of the reads in an input file. Use downsampling to simulate different amounts of sequencing from an existing data set. Reads are subsampled as output by primary analysis, without any modification (for example, no trimming and no filtering).
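Conceptually, fractional downsampling keeps each read with a fixed probability, so the expected share of reads kept equals the requested fraction. The following is a minimal sketch of that idea, not DRAGEN's implementation: the function name `subsample_pairs` and the `(name, read1, read2)` pair representation are hypothetical, and the assumption that both mates of a pair share one keep/drop decision is this sketch's own choice. The default seed of 42 simply mirrors the documented default of --down-sampler-random-seed.

```python
import random
from typing import Iterable, Iterator, Tuple

# Hypothetical record type for illustration: (read name, mate 1 sequence, mate 2 sequence).
ReadPair = Tuple[str, str, str]


def subsample_pairs(pairs: Iterable[ReadPair], fraction: float, seed: int = 42) -> Iterator[ReadPair]:
    """Keep each read pair independently with probability `fraction`.

    Conceptual sketch only. Reads are yielded unmodified (no trimming or
    filtering), and both mates of a pair share one keep/drop decision
    (an assumption of this sketch).
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # dedicated seeded generator: same seed -> same subsample
    for pair in pairs:
        # rng.random() is in [0, 1), so fraction == 1.0 keeps every pair.
        if rng.random() < fraction:
            yield pair


# Toy input: 10,000 identical hypothetical read pairs.
pairs = [(f"read{i}", "ACGT", "TGCA") for i in range(10_000)]
kept = list(subsample_pairs(pairs, 0.25, seed=42))
print(len(kept))  # about 2,500; the exact count depends on the seed
```

Because the decision for each pair is independent, the number of reads kept varies slightly around the requested fraction; changing the seed changes which reads are selected.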

To enable fractional downsampling, set the --enable-fractional-down-sampler command line option to true.

Any valid sequencing data format that is compatible with the DRAGEN Host Software can be used. For more information on compatible input options, see Input Options.

Determining an Appropriate Downsampling Fraction

DRAGEN generates metrics that can be used to determine an appropriate downsampling fraction:

  • DRAGEN BCL Demux generates a 'Demultiplex_Stats.csv' file that contains the '# Reads' column, i.e., the number of pass-filter read fragments (read pairs) for each sample and lane. For paired-end runs, multiply this value by 2 to obtain the number of reads.

  • DRAGEN Mapping and Aligning generates a '.mapping_metrics.csv' file that contains the 'Total input reads' metric, i.e., the total number of reads (not pairs) present in the original input files.
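The read counts described above can be extracted with a short script. This is a sketch under stated assumptions: the 'SampleID' column name in 'Demultiplex_Stats.csv' and the `section,readgroup,metric,value[,percent]` row layout with a 'MAPPING/ALIGNING SUMMARY' section in '.mapping_metrics.csv' are assumptions to check against your own files, and the function names are hypothetical. The inline CSV text is toy data, trimmed to the columns the sketch reads.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO


def reads_per_sample_from_demux(handle: TextIO, paired_end: bool = True) -> Dict[str, int]:
    """Sum '# Reads' per sample across lanes of a Demultiplex_Stats.csv.

    '# Reads' counts read pairs, so paired-end totals are doubled to give reads.
    The 'SampleID' column name is an assumption.
    """
    totals: Dict[str, int] = defaultdict(int)
    for row in csv.DictReader(handle):
        totals[row["SampleID"]] += int(row["# Reads"])
    factor = 2 if paired_end else 1
    return {sample: count * factor for sample, count in totals.items()}


def total_input_reads_from_mapping_metrics(handle: TextIO) -> int:
    """Return 'Total input reads' (reads, not pairs) from a .mapping_metrics.csv.

    Assumes rows shaped like: section,readgroup,metric,value[,percent].
    """
    for fields in csv.reader(handle):
        if (len(fields) >= 4
                and fields[0] == "MAPPING/ALIGNING SUMMARY"
                and fields[2] == "Total input reads"):
            return int(fields[3])
    raise ValueError("'Total input reads' not found")


# Toy inputs (hypothetical values, reduced column sets).
demux = io.StringIO("Lane,SampleID,# Reads\n1,S1,1000\n2,S1,1500\n1,S2,400\n")
metrics = io.StringIO("MAPPING/ALIGNING SUMMARY,,Total input reads,5000,\n")
print(reads_per_sample_from_demux(demux))              # {'S1': 5000, 'S2': 800}
print(total_input_reads_from_mapping_metrics(metrics))  # 5000
```

In this toy example both sources agree for sample S1 (2,500 read pairs, i.e., 5,000 reads), which is a useful consistency check when both files are available.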

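The downsampling fraction (the fraction of reads to keep) can be estimated as follows:

  • estimated original coverage = (total number of reads [not pairs] * estimated read length) / size of the genome or enrichment region

  • downsampling fraction = desired coverage / estimated original coverage

Because the fraction specifies how many reads to keep, it should fall between 0 and 1; a desired coverage above the estimated original coverage cannot be reached by downsampling.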

Adjustments may be required for samples with a high fraction of duplicate-marked reads or with short fragments whose mates overlap, because these reads contribute less unique coverage than the formula assumes.
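The estimate above can be sketched as two small functions. The function names and the `usable_fraction` parameter are hypothetical: `usable_fraction` is a rough, illustrative way to account for duplicates or overlapping mates (for example, 1 minus the duplicate rate), not a DRAGEN setting, and it is only approximate because duplicate rates change as data is downsampled. The example inputs (800 million 150 bp reads, a genome of about 3.1 Gb) are made-up values for illustration.

```python
def estimated_original_coverage(total_reads: int, read_length_bp: float, target_size_bp: float) -> float:
    """(total reads [not pairs] * read length) / genome or enrichment region size."""
    if target_size_bp <= 0:
        raise ValueError("target size must be positive")
    return total_reads * read_length_bp / target_size_bp


def downsampling_fraction(original_coverage: float, desired_coverage: float,
                          usable_fraction: float = 1.0) -> float:
    """Fraction of reads to keep: desired / effective original coverage, capped at 1.0.

    usable_fraction is a hypothetical rough correction (e.g. 1 - duplicate rate);
    the default of 1.0 applies no adjustment.
    """
    effective = original_coverage * usable_fraction
    if effective <= 0:
        raise ValueError("effective original coverage must be positive")
    # Downsampling can only remove reads, so the fraction never exceeds 1.0.
    return min(1.0, desired_coverage / effective)


# Hypothetical example: 800 million 150 bp reads against a ~3.1 Gb genome.
cov = estimated_original_coverage(800_000_000, 150, 3.1e9)   # about 38.7x
frac = downsampling_fraction(cov, desired_coverage=30)        # about 0.775
print(f"{cov:.1f}x -> keep {frac:.3f} of reads")
```

The cap at 1.0 reflects that a value of 1.0 (the documented default) already keeps all reads; requesting more coverage than the input provides is not possible.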

Command Line Options

In addition to enabling fractional downsampling, you must set the fraction of reads to keep. Use --down-sampler-normal-subsample for normal input data, --down-sampler-tumor-subsample for tumor input data, or both, depending on the input files.

You can also specify a seed using --down-sampler-random-seed to generate different subsamples of the input data set.

| Option | Description |
|---|---|
| --enable-fractional-down-sampler | Set to true to enable fractional downsampling. The default value is false. |
| --down-sampler-normal-subsample | Fraction of reads to keep as a subsample of the normal input data. The default value is 1.0 (100%). |
| --down-sampler-tumor-subsample | Fraction of reads to keep as a subsample of the tumor input data. The default value is 1.0 (100%). |
| --down-sampler-random-seed | Random seed used to select the subsample. Specify a different seed to draw a different subsample from the same input data. The default value is 42. |
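When DRAGEN runs are scripted, the downsampling options can be assembled programmatically. The following sketch builds only the four options documented above; the function name `downsampler_options` is hypothetical, the requirement that fractions lie in (0, 1] is this sketch's own validation, and all other required DRAGEN options (reference, inputs, output location) are intentionally left out and must be added for a real run.

```python
from typing import List, Optional


def downsampler_options(normal_fraction: Optional[float] = None,
                        tumor_fraction: Optional[float] = None,
                        seed: Optional[int] = None) -> List[str]:
    """Return the fractional downsampler options as an argument list.

    Only the downsampler options are produced; other DRAGEN options must be
    supplied separately.
    """
    if normal_fraction is None and tumor_fraction is None:
        raise ValueError("set at least one subsample fraction")
    args = ["--enable-fractional-down-sampler", "true"]
    for flag, value in (("--down-sampler-normal-subsample", normal_fraction),
                        ("--down-sampler-tumor-subsample", tumor_fraction)):
        if value is None:
            continue  # omitted: DRAGEN keeps its default of 1.0 (all reads)
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{flag} must be in (0, 1], got {value}")
        args += [flag, str(value)]
    if seed is not None:
        args += ["--down-sampler-random-seed", str(seed)]
    return args


print(" ".join(["dragen", *downsampler_options(normal_fraction=0.775, seed=7)]))
# dragen --enable-fractional-down-sampler true --down-sampler-normal-subsample 0.775 --down-sampler-random-seed 7
```

Returning a list rather than a single string keeps each option and value as a separate argument, which suits `subprocess.run` and avoids shell quoting issues.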
