Read Trimming

DRAGEN can remove artifacts from reads using hardware-accelerated read trimming. Hardware-accelerated read trimming is available on U200 and cloud systems as part of the DRAGEN mapper, and it adds no extra run time. DRAGEN provides multiple independent trimming filters, each targeting a different type of artifact or use case. You can enable and configure each filter independently to tailor read trimming to your analysis. Read trimming supports two modes: hard-trimming and soft-trimming.

To enable hard-trimming mode, use --read-trimmers. In hard-trimming mode, potential artifacts are removed from input reads. By default, reads that are trimmed to fewer than 20 bases are filtered and replaced with a placeholder read of 10 N bases, and DRAGEN sets the 0x200 SAM flag on these placeholder reads.
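The post-trimming filter can be modeled with a short Python sketch. It uses the documented defaults (--trim-min-length 20, --trim-filter-dummy-len 10, --trim-filter-set-flag true). The OutputRead type and the filter_trimmed_read function are hypothetical names for illustration, not DRAGEN's implementation, and base qualities are ignored.

```python
from dataclasses import dataclass

# Illustrative sketch only; not DRAGEN's implementation.
SAM_FLAG_QC_FAIL = 0x200  # SAM flag: read does not pass quality controls


@dataclass
class OutputRead:
    """Hypothetical record used only for this illustration."""
    sequence: str
    flag: int


def filter_trimmed_read(trimmed_seq: str, flag: int = 0, min_length: int = 20,
                        dummy_len: int = 10, set_flag: bool = True) -> OutputRead:
    """Model the post-trimming filter: short reads become all-N placeholders."""
    if len(trimmed_seq) >= min_length:
        return OutputRead(trimmed_seq, flag)
    # Filtered read: replace it with a placeholder of dummy_len N bases and,
    # if requested, set the 0x200 flag on the placeholder.
    new_flag = (flag | SAM_FLAG_QC_FAIL) if set_flag else flag
    return OutputRead("N" * dummy_len, new_flag)
```

With these defaults, a read trimmed to 15 bases is emitted as ten N bases with flag 0x200, while a read of 20 or more bases passes through unchanged.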

DRAGEN contains a novel lossless soft-trimming mode. In soft-trimming mode, reads are mapped as though they had been trimmed, but no bases are removed. To enable the trimmer in soft mode, use --soft-read-trimmers.

Soft-trimming suppresses systematic mismapping of reads that contain trimmable artifacts without losing the trimmed bases from the aligned output. For example, soft-trimming prevents reads with Poly-G artifacts from being mapped to reference G homopolymers, and prevents adapter sequences from being mapped to matching reference loci. As a result, soft-trimmed reads might map to different reference positions than they would without soft-trimming. In soft-trimming mode, DRAGEN does not filter reads, and reads whose bases would all have been trimmed are left unmapped.

Soft-trimming for Poly-G artifacts is enabled by default on supported systems.

Read Trimming Tools

Fixed-Length Trimming

Fixed-length trimming removes a fixed number of bases from the 5' end, the 3' end, or both ends of each read. For example, if you are analyzing sequencing data from a fixed-size amplicon and expect reads to consistently extend past the high-quality sequence, you can remove the expected number of excess bases with fixed-length trimming.
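As a minimal sketch, fixed-length trimming of a single read amounts to slicing off a set number of bases at each end, analogous to --trim-r1-5prime and --trim-r1-3prime for Read 1. The trim_fixed function is a hypothetical illustration, not DRAGEN code.

```python
# Illustrative sketch only; not DRAGEN's implementation.
def trim_fixed(seq: str, five_prime: int = 0, three_prime: int = 0) -> str:
    """Remove a fixed number of bases from the 5' and 3' ends of a read."""
    if five_prime < 0 or three_prime < 0:
        raise ValueError("trim lengths must be non-negative")
    end = len(seq) - three_prime
    # If the two trims together cover the whole read, nothing remains.
    return seq[five_prime:end] if end > five_prime else ""


# Analogous to trimming 2 bases from the 5' end and 3 from the 3' end of Read 1.
print(trim_fixed("ACGTACGTAC", five_prime=2, three_prime=3))  # GTACG
```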

Poly-G Trimming

Poly-G artifacts appear on two-channel sequencing systems when the dark base G is called after synthesis has terminated. As a result, several erroneous high-confidence G bases are called at the ends of affected reads. In contaminated samples, many affected reads can map to reference regions with high G content, which can cause problems for downstream processing.
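A simplified k-mer check for a likely poly-G read end can be sketched using the defaults of --trim-polyg-kmer-len (25) and --trim-polyg-kmer-non-g (2): the last 25 bases are flagged if they contain at most 2 non-G bases. This is an assumption-based illustration. It does not model the score-based trimming controlled by the --trim-polyg-g-score-* and --trim-polyg-early-exit-threshold options, and has_polyg_kmer is a hypothetical name.

```python
# Illustrative sketch only; not DRAGEN's poly-G trimming algorithm.
def has_polyg_kmer(seq: str, kmer_len: int = 25, max_non_g: int = 2) -> bool:
    """Return True if the 3' k-mer of a read looks like a poly-G artifact."""
    if kmer_len <= 0 or len(seq) < kmer_len:
        return False
    tail = seq[-kmer_len:]  # the last kmer_len bases of the read
    non_g = sum(1 for base in tail if base != "G")
    return non_g <= max_non_g


reads = ["A" * 5 + "G" * 23 + "TT", "ACGT" * 10]
print(sum(has_polyg_kmer(r) for r in reads))  # 1
```

Counting flagged read ends this way after trimming is similar in spirit to the Remaining poly-G K-mers metrics, although the exact DRAGEN computation may differ.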

Quality Trimming

Base quality can degrade along the length of a read toward the 3' end, separately from any artifacts caused by early termination of synthesis. Lower quality bases can affect mapping and alignment results and might lead to incorrect variant or methylation calls downstream. The quality trimming tool calculates a rolling average of base quality inward from the 3' end and removes the minimum number of bases needed for the average quality to exceed the threshold specified with --trim-min-quality.
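The exact DRAGEN computation is not given here, so the sketch below substitutes a well-known alternative: the running-sum rule used by BWA's quality trimming and by cutadapt. Scanning inward from the 3' end, it accumulates (cutoff minus quality) for each base and cuts where that sum peaks. This removes the 3' tail that falls furthest below the cutoff in aggregate. The function name and Phred+33 encoding are assumptions for illustration.

```python
from typing import Tuple


# Swapped-in BWA/cutadapt-style rule; not DRAGEN's exact quality trimmer.
def quality_trim_3prime(seq: str, qual: str, cutoff: int,
                        phred_offset: int = 33) -> Tuple[str, str]:
    """Trim a low-quality 3' tail using the BWA-style running-sum rule."""
    running = 0
    best = 0
    cut = len(seq)  # first index to remove; len(seq) means keep everything
    for i in range(len(qual) - 1, -1, -1):
        # Bases below the cutoff push the sum up, good bases pull it down.
        running += cutoff - (ord(qual[i]) - phred_offset)
        if running < 0:
            break  # the good bases now outweigh the bad tail; stop scanning
        if running > best:
            best = running
            cut = i
    return seq[:cut], qual[:cut]


# '?' = Q30, '+' = Q10, '&' = Q5 with Phred+33 encoding.
print(quality_trim_3prime("ACGTA", "???+&", cutoff=20))  # ('ACG', '???')
```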

Adapter Trimming

Problems during library preparation, or libraries with short inserts, can produce high-quality reads that contain adapter sequence. If not removed before analysis, these non-insert bases can reduce mapping efficiency and downstream accuracy. The adapter trimming tool searches reads for the adapter sequences in the input FASTA file and trims hits that meet the minimum size set with --trim-adapter-stringency. Adapter trimming tolerates a mismatch rate of up to 10%. For 3' adapters, trimming is from the first matching adapter base to the end of the read. For 5' adapters, trimming is from the first (3') matching adapter base to the beginning (5') of the read.
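The 3' adapter behavior can be sketched in Python under simple assumptions. The search looks for the leftmost position where the adapter (or its prefix, if it runs off the read end) matches with at most 10% mismatches and at least 4 overlapping bases (the default --trim-adapter-stringency). Everything from that position to the 3' end is then trimmed. The function names and the example adapter sequence are hypothetical, and DRAGEN's hardware search may differ in detail.

```python
from typing import Optional


# Illustrative sketch only; not DRAGEN's adapter search.
def find_3prime_adapter(read: str, adapter: str, min_overlap: int = 4,
                        max_mismatch_rate: float = 0.1) -> Optional[int]:
    """Return the read index where a 3' adapter hit starts, or None."""
    if len(adapter) < min_overlap:
        return None
    for start in range(len(read) - min_overlap + 1):
        # The adapter may run off the end of the read, so compare only the overlap.
        overlap = min(len(adapter), len(read) - start)
        mismatches = sum(
            1 for r, a in zip(read[start:start + overlap], adapter) if r != a
        )
        if mismatches <= int(max_mismatch_rate * overlap):
            return start  # leftmost (first) matching adapter base
    return None


def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 4) -> str:
    """Remove everything from the first matching adapter base to the read end."""
    start = find_3prime_adapter(read, adapter, min_overlap)
    return read if start is None else read[:start]


print(trim_3prime_adapter("ACGTACGTTTAGATCGGAAG", "AGATCGGAAGAGC"))  # ACGTACGTTT
```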

Ambiguous Base Trimming

If quality trimming is not feasible due to reduced yield or other limitations, an alternative is to remove only explicitly ambiguous (N) bases from the ends of reads. When enabled, the ambiguous base trimmer applies a simple exact-match search to both ends of all processed reads, regardless of mate-pair status.
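An exact-match trim of N bases from both read ends, keeping base qualities aligned with the sequence, can be sketched as follows. The function name is hypothetical, and the sketch treats only uppercase N as ambiguous.

```python
from typing import Tuple


# Illustrative sketch only; not DRAGEN's implementation.
def trim_ambiguous_ends(seq: str, qual: str) -> Tuple[str, str]:
    """Strip runs of N from both ends of a read, keeping qualities in sync."""
    start, end = 0, len(seq)
    while start < end and seq[start] == "N":
        start += 1
    while end > start and seq[end - 1] == "N":
        end -= 1
    # Internal N bases are left in place; only the read ends are trimmed.
    return seq[start:end], qual[start:end]


print(trim_ambiguous_ends("NNACNGTN", "!!II#II!"))  # ('ACNGT', 'II#II')
```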

Minimum Length Trimming

You can maximize trimmer sensitivity by using the minimum length trimming tool, which removes a fixed number of bases from each read after the other trimmer tools have run. For example, if you want to remove 5 bp from the 3' end of each read and those bases are removed first, a 7 bp adapter hit could be missed, because only 2 bp of the adapter would remain (fewer than the default --trim-adapter-stringency of 4). To mitigate this issue, DRAGEN provides an optional minimum trim-length filter that is applied after the other trimmers.
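One way to read the --trim-min-* options ("minimum number of bases to trim") is that only the shortfall is removed after earlier trimmers have run. The sketch below encodes that interpretation, which is an assumption rather than documented DRAGEN behavior. If the adapter trimmer already removed 7 bases from the 3' end and the minimum is 5, nothing more is trimmed. If no adapter was found, 5 bases are removed. The function name is hypothetical.

```python
# Illustrative sketch under an assumed interpretation; not DRAGEN code.
def min_length_trim_3prime(seq: str, already_trimmed: int, min_trim: int) -> str:
    """Ensure at least min_trim bases in total are removed from the 3' end.

    Assumption: already_trimmed counts 3' bases removed by earlier trimmers
    (for example, an adapter hit), so only the shortfall is trimmed here.
    """
    extra = max(0, min_trim - already_trimmed)
    return seq[: max(0, len(seq) - extra)]


read = "ACGTACGTAC"
print(min_length_trim_3prime(read, already_trimmed=7, min_trim=5))  # ACGTACGTAC
print(min_length_trim_3prime(read, already_trimmed=0, min_trim=5))  # ACGTA
```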

Maximum Length Trimming

If you are using libraries with fixed-size inserts, such as small PCR amplicons, it can be more convenient to specify the length that all reads should be trimmed to, rather than the number of bases to remove. In this case, use the maximum length trimming tool.
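The difference between removing a fixed number of bases and trimming to a maximum length is clearest on reads of varying length. The sketch below assumes that maximum length trimming keeps the 5'-most bases; cut_to_max_length is a hypothetical name.

```python
# Illustrative sketch only; assumes bases beyond max_length are cut from the 3' end.
def cut_to_max_length(seq: str, max_length: int) -> str:
    """Keep at most max_length bases (assumed to be the 5'-most bases)."""
    if max_length < 0:
        raise ValueError("max_length must be non-negative")
    return seq[:max_length]


reads = ["A" * 150, "C" * 120, "G" * 90]
# Fixed-length trimming of 50 bases would leave 100, 70, and 40 bases;
# maximum length trimming to 100 leaves every read at no more than 100.
print([len(cut_to_max_length(r, 100)) for r in reads])  # [100, 100, 90]
```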

PolyA Tail Trimming

In RNA libraries, reads that overlap the poly-A tail of transcripts can contain long poly-A or poly-T sequences at their ends, which can cause incorrect alignment. The poly-A trimmer mitigates this by trimming the poly-A tail from the end of the read. For more information, see the RNA alignment section.
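A simplified exact-match version of poly-A tail trimming can be sketched with the default minimum run length of 20 (--trim-polya-min-trim). DRAGEN may tolerate mismatches within the tail or handle poly-T differently. The base parameter, which lets the same sketch strip a trailing poly-T run, is a hypothetical convenience.

```python
# Illustrative sketch only; not DRAGEN's poly-A trimmer.
def trim_polya_tail(seq: str, min_trim: int = 20, base: str = "A") -> str:
    """Remove a trailing run of `base` (one character) if it is at least min_trim long."""
    run = len(seq) - len(seq.rstrip(base))
    return seq[: len(seq) - run] if run >= min_trim else seq


print(trim_polya_tail("ACGT" + "A" * 25))  # ACGT
print(trim_polya_tail("ACGT" + "A" * 10))  # ACGTAAAAAAAAAA (run of 10 is below 20)
```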

Read Trimming Metrics

The trimmer generates a metrics file named <output prefix>.trimmer_metrics.csv. Metrics are reported at an aggregate level over all input data, in units of reads or bases.

  • Total input reads: Total number of reads in the input files.

  • Total input bases: Total number of bases in the input reads.

  • Total input bases R1: Total number of bases in R1 reads.

  • Total input bases R2: Total number of bases in R2 reads.

  • Average input read length: Total number of input bases divided by the number of input reads.

  • Total trimmed reads: Total number of reads trimmed by at least one base, not including soft-trimming.

  • Total trimmed bases: Total number of bases trimmed, not including soft-trimming.

  • Average bases trimmed per read: The number of trimmed bases divided by the number of input reads.

  • Average bases trimmed per trimmed read: The number of trimmed bases divided by the number of trimmed reads.

  • Remaining poly-G K-mers R1 3prime: The number of R1 3' read ends that contain likely Poly-G artifacts after trimming.

  • Remaining poly-G K-mers R2 3prime: The number of R2 3' read ends that contain likely Poly-G artifacts after trimming.

  • Total filtered reads: The number of reads that were filtered out during trimming.

  • Reads filtered for minimum read length R1: The number of R1 reads that were filtered because they were trimmed below the minimum read length.

  • Reads filtered for minimum read length R2: The number of R2 reads that were filtered because they were trimmed below the minimum read length.

  • <Trimmer tool> trimmed reads: The number of reads with at least one base trimmed by the trimmer tool. DRAGEN reports this metric separately for R1 and R2 mates and for the filtering status (unfiltered or filtered) of the trimmed read. The metric includes reads that were trimmed during soft-trimming. Each trimming tool produces this metric.

  • <Trimmer tool> trimmed bases: The number of bases trimmed by the trimmer tool. DRAGEN reports this metric separately for R1 and R2 mates and for the filtering status (unfiltered or filtered) of the trimmed read. The metric includes bases from reads that were trimmed during soft-trimming. Each trimming tool produces this metric.
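The three averaged metrics are simple ratios of the reported totals, so they can be recomputed as a consistency check. In this sketch, the function name is hypothetical, and returning 0.0 when a denominator is zero is an assumption, not documented DRAGEN behavior.

```python
# Illustrative sketch only; recomputes averages from the totals defined above.
def derived_trimmer_metrics(input_reads: int, input_bases: int,
                            trimmed_reads: int, trimmed_bases: int) -> dict:
    """Recompute the averaged trimmer metrics from the reported totals."""
    def ratio(numerator: int, denominator: int) -> float:
        # Assumption: report 0.0 rather than dividing by zero.
        return numerator / denominator if denominator else 0.0

    return {
        "Average input read length": ratio(input_bases, input_reads),
        "Average bases trimmed per read": ratio(trimmed_bases, input_reads),
        "Average bases trimmed per trimmed read": ratio(trimmed_bases, trimmed_reads),
    }
```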

Read Trimming Settings

Read trimmer

Option | Description

--read-trimmers

To enable trimming filters in hard-trimming mode, set to a comma-separated list of the trimmer tools you would like to use (in the order of execution). To disable trimming, set to none. During mapping, artifacts are removed from all reads. The following are valid trimmer names:

  • fixed-len—Fixed-length trimming

  • polyg—Poly-G trimming

  • quality—Quality trimming

  • adapter—Adapter trimming

  • n—Ambiguous base trimming

  • min-len—Minimum length trimming

  • cut-end—Maximum length trimming

  • polya—RNA Poly-A tail trimming. See additional description in RNA alignment section

  • bisulfite—Bisulfite trimming

Read trimming is disabled by default (default: "none").

--soft-read-trimmers

To enable trimming filters in soft-trimming mode, set to a comma-separated list of the trimmer tools you would like to use (in the order of execution). To disable soft-trimming, set to none. During mapping, reads are aligned as if trimmed, but no bases are removed from the reads. The following are valid trimmer names:

  • fixed-len—Fixed-length trimming

  • polyg—Poly-G trimming

  • quality—Quality trimming

  • adapter—Adapter trimming

  • n—Ambiguous base trimming

  • min-len—Minimum length trimming

  • cut-end—Maximum length trimming

  • polya—RNA Poly-A tail trimming. See additional description in RNA alignment section

  • bisulfite—Bisulfite trimming

Soft-trimming is enabled for the polyg filter by default (default: "polyg").

--trimming-only

Disables mapping and alignment so that only read trimming runs.
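When scripting DRAGEN runs, you can validate the trimmer lists against the names above before building the --read-trimmers and --soft-read-trimmers values. The helper below is a hypothetical sketch. It preserves list order, because order is the order of execution; uses none for an empty list; and defaults soft-trimming to polyg to match the documented default. All other DRAGEN options (reference, inputs, outputs) are deliberately omitted.

```python
# Hypothetical helper for scripting; not part of DRAGEN.
VALID_TRIMMERS = {
    "fixed-len", "polyg", "quality", "adapter", "n",
    "min-len", "cut-end", "polya", "bisulfite",
}


def trimmer_value(names) -> str:
    """Join trimmer names in execution order, or return 'none' if empty."""
    names = list(names)
    unknown = [name for name in names if name not in VALID_TRIMMERS]
    if unknown:
        raise ValueError(f"unknown trimmer name(s): {unknown}")
    return ",".join(names) if names else "none"


def trimmer_options(hard=(), soft=("polyg",)) -> list:
    """Build the two trimming options as argv entries (other options omitted)."""
    return [
        "--read-trimmers", trimmer_value(hard),
        "--soft-read-trimmers", trimmer_value(soft),
    ]


print(trimmer_options(hard=["adapter", "quality"]))
# ['--read-trimmers', 'adapter,quality', '--soft-read-trimmers', 'polyg']
```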

Filtering after the trimmer's execution

Option | Description

--trim-min-length

Specify the minimum read length allowed after trimming. After all read-trimming steps are complete, DRAGEN filters any read shorter than this value (default: 20).

--trim-min-len-read1

Specify the minimum read length allowed for Read 1 after trimming. After all read-trimming steps are complete, DRAGEN filters any Read 1 shorter than this value (default: 20).

--trim-min-len-read2

Specify the minimum read length allowed for Read 2 after trimming. After all read-trimming steps are complete, DRAGEN filters any Read 2 shorter than this value (default: 20).

--trim-filter-dummy-len

Specify the number of N bases in dummy reads that replace filtered reads (default: 10).

--trim-filter-set-flag

If enabled, dummy reads will have their 0x200 SAM flag set (default: true).

Fixed-length trimming

Option | Description

--trim-r1-5prime

Specify a fixed number of bases to trim from the 5' end of Read 1 (default: 0).

--trim-r1-3prime

Specify a fixed number of bases to trim from the 3' end of Read 1 (default: 0).

--trim-r2-5prime

Specify a fixed number of bases to trim from the 5' end of Read 2 (default: 0).

--trim-r2-3prime

Specify a fixed number of bases to trim from the 3' end of Read 2 (default: 0).

Quality trimming

Option | Description

--trim-min-quality

Specify the minimum base quality. DRAGEN trims bases with a quality below this value from the 3' end of reads.

--trim-quality-r1-5prime

Specify the quality cutoff below which to trim from the 5' end of read 1.

--trim-quality-r1-3prime

Specify the quality cutoff below which to trim from the 3' end of read 1.

--trim-quality-r2-5prime

Specify the quality cutoff below which to trim from the 5' end of read 2.

--trim-quality-r2-3prime

Specify the quality cutoff below which to trim from the 3' end of read 2.

Adapter trimming

Option | Description

--trim-adapter-read1

Specify the FASTA file that contains adapter sequences to trim from the 3' end of Read 1.

--trim-adapter-read2

Specify the FASTA file that contains adapter sequences to trim from the 3' end of Read 2.

--trim-adapter-r1-5prime

Specify the FASTA file that contains adapter sequences to trim from the 5' end of Read 1. Note: The sequences must be reversed relative to their appearance in the FASTQ, but not complemented.

--trim-adapter-r2-5prime

Specify the FASTA file that contains adapter sequences to trim from the 5' end of Read 2. Note: The sequences must be reversed relative to their appearance in the FASTQ, but not complemented.

--trim-adapter-stringency

Specify the minimum number of adapter bases required for trimming (default: 4).
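Because 5' adapter FASTA files need reversed, but not complemented, sequences, a small script can prepare them from sequences as they appear in the FASTQ. The function names and the record name below are hypothetical. Write the returned text to a file to pass to --trim-adapter-r1-5prime or --trim-adapter-r2-5prime.

```python
# Hypothetical helper for preparing 5' adapter FASTA files; not part of DRAGEN.
def reverse_adapter(seq: str) -> str:
    """Reverse an adapter sequence without complementing it."""
    return seq[::-1]


def reversed_adapter_fasta(records) -> str:
    """Format (name, sequence) pairs as FASTA text with reversed sequences."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.append(reverse_adapter(seq))
    return "\n".join(lines) + "\n"


print(reversed_adapter_fasta([("my_5prime_adapter", "ACGTTG")]), end="")
# >my_5prime_adapter
# GTTGCA
```

For comparison, the reverse complement of ACGTTG would be CAACGT, which is not the form these options expect.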

Bisulfite trimming

Option | Description

--trim-bisulfite-ends

Enable both 5' and 3' bisulfite trimming.

--trim-bisulfite-5prime

If a 3' adapter was trimmed, trim an additional 2 bp from the 3' end, unless the 5' end matches 'CAA' or 'CGA'.

--trim-bisulfite-3prime

If the 5' end matches 'CAA' or 'CGA', trim the first two of these 5' bases.
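The two bisulfite rules described above can be sketched as one function. The rule switches are deliberately named after the behavior rather than the option names; see the table for which option enables which behavior. This is an illustrative sketch with hypothetical names. It assumes both rules test the 5' end of the read as it was before this step.

```python
# Illustrative sketch only; not DRAGEN's bisulfite trimmer.
BISULFITE_MOTIFS = ("CAA", "CGA")


def bisulfite_trim(seq: str, adapter_was_trimmed: bool,
                   apply_adapter_rule: bool = True,
                   apply_motif_rule: bool = True) -> str:
    """Apply the two bisulfite trimming rules described in the table."""
    # Assumption: both rules test the 5' end before this step modifies the read.
    starts_with_motif = seq.startswith(BISULFITE_MOTIFS)
    # Rule A: after a 3' adapter trim, remove 2 more bases from the 3' end,
    # unless the read starts with CAA or CGA.
    if apply_adapter_rule and adapter_was_trimmed and not starts_with_motif:
        seq = seq[:-2]
    # Rule B: if the read starts with CAA or CGA, remove its first two bases.
    if apply_motif_rule and starts_with_motif:
        seq = seq[2:]
    return seq


print(bisulfite_trim("TTGACGTA", adapter_was_trimmed=True))  # TTGACG
print(bisulfite_trim("CAAGTTACG", adapter_was_trimmed=True))  # AGTTACG
```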

Minimum-length trimming

Option | Description

--trim-min-r1-5prime

Specify the minimum number of bases to trim from the 5' end of Read 1 (default: 0).

--trim-min-r1-3prime

Specify the minimum number of bases to trim from the 3' end of Read 1 (default: 0).

--trim-min-r2-5prime

Specify the minimum number of bases to trim from the 5' end of Read 2 (default: 0).

--trim-min-r2-3prime

Specify the minimum number of bases to trim from the 3' end of Read 2 (default: 0).

Maximum-length trimming

Option | Description

--trim-max-length

Specify the maximum read length for both reads. Reads longer than this value are trimmed to this length.

--trim-max-len-read1

Specify the maximum read length for Read 1. Longer reads are trimmed to this length.

--trim-max-len-read2

Specify the maximum read length for Read 2. Longer reads are trimmed to this length.

PolyA trimming

Option | Description

--trim-polya-min-trim

The minimum length of a poly-A run required for poly-A trimming (default: 20).

PolyG trimming

Option | Description

--trim-polyg-kmer-len

How many bases to check at each read end for poly-G artifact detection (default: 25).

--trim-polyg-kmer-non-g

The maximum number of non-G bases in the K-mer for poly-G artifact detection (default: 2).

--trim-polyg-g-score-r1-5prime

The score for G bases on the 5' end of read 1 (default: 0).

--trim-polyg-g-score-r1-3prime

The score for G bases on the 3' end of read 1 (default: 15).

--trim-polyg-g-score-r2-5prime

The score for G bases on the 5' end of read 2 (default: 0).

--trim-polyg-g-score-r2-3prime

The score for G bases on the 3' end of read 2 (default: 15).

--trim-polyg-min-trim-r1-5prime

The minimum number of G's to trim from the 5' end of read 1 (default: 6).

--trim-polyg-min-trim-r1-3prime

The minimum number of G's to trim from the 3' end of read 1 (default: 6).

--trim-polyg-min-trim-r2-5prime

The minimum number of G's to trim from the 5' end of read 2 (default: 6).

--trim-polyg-min-trim-r2-3prime

The minimum number of G's to trim from the 3' end of read 2 (default: 6).

--trim-polyg-early-exit-threshold

The signed score threshold for poly-G trimming to exit early (default: -500).

PolyX trimming

Option | Description

--trim-polyx-bases-r1-5prime

The bases to trim for polyX trimming from the 5' end of read 1 (default: empty string "").

--trim-polyx-bases-r1-3prime

The bases to trim for polyX trimming from the 3' end of read 1 (default: empty string "").

--trim-polyx-bases-r2-5prime

The bases to trim for polyX trimming from the 5' end of read 2 (default: empty string "").

--trim-polyx-bases-r2-3prime

The bases to trim for polyX trimming from the 3' end of read 2 (default: empty string "").

--trim-polyx-min-trim-r1-5prime

The minimum number of X's to trim from the 5' end of read 1 (default: 20).

--trim-polyx-min-trim-r1-3prime

The minimum number of X's to trim from the 3' end of read 1 (default: 20).

--trim-polyx-min-trim-r2-5prime

The minimum number of X's to trim from the 5' end of read 2 (default: 20).

--trim-polyx-min-trim-r2-3prime

The minimum number of X's to trim from the 3' end of read 2 (default: 20).
