Structural Variant IGV Tutorial

This tutorial covers how to visualize SVs from an input BAM file and shows examples of evidence reads supporting INS/DEL calls that are TP/FP. The examples and rules of thumb shared in this tutorial show one of many ways of examining structural variants in IGV. There is no one-size-fits-all method for determining why an FP was called or why a truth SV event was missed. This tutorial is meant to serve as a starting point for investigation.

For tutorials on using IGV in general, see the official guide provided by the IGV team.

To understand DRAGEN SV input/outputs, see the user guide on SV calling.

Components in IGV view

Reference genome and coordinates (top section)
- Make sure to check the reference matches the SV run.
Loaded tracks (mid section)
Reference sequence and reference gene annotations (bottom section)
- Sequence track gives a sense of repetitiveness of the genomic region. Each nucleotide is colored differently.
- RefSeq Genes track is useful when determining the clinical significance of a variant.

Visualizing reads in an SV region using input/output from the DRAGEN SV workflow

Make sure the reference genome matches the experiment.
(Assume map-align is enabled.) In the DRAGEN SV output directory, load prefix.bam and prefix.sv.vcf.gz into IGV.
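The loading steps above can also be scripted. The sketch below is a minimal illustration, not part of the DRAGEN workflow: it uses Python to write an IGV batch script built from the `new`, `genome`, `load` and `goto` batch commands. The genome ID `hg38`, the `dragen_out` directory and the locus are assumptions for illustration; substitute the values of your own run.

```python
def build_igv_batch(genome: str, bam: str, vcf: str, locus: str) -> str:
    """Return the text of an IGV batch script that loads two tracks and jumps to a locus."""
    lines = [
        "new",               # start a fresh session
        f"genome {genome}",  # must match the reference used for the SV run
        f"load {bam}",       # alignments
        f"load {vcf}",       # SV calls
        f"goto {locus}",     # region to inspect
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    script = build_igv_batch(
        genome="hg38",                      # assumption: genome ID of your run
        bam="dragen_out/prefix.bam",        # hypothetical output directory
        vcf="dragen_out/prefix.sv.vcf.gz",
        locus="chr17:41265460-41275765",    # illustrative region
    )
    print(script)
```

The printed text can be saved to a file and run from IGV's batch-script menu entry, or the same commands can be typed one at a time.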

IGV configurations on alignments

In this section, we list a few configurations that are useful for highlighting read alignments, with examples that show the effect of each configuration. We will dive into more detailed examples with SV records in the next sections.

From the navigation bar on top, configure alignment views by "View > Preferences... > Alignments". Alternatively, right-clicking the alignment track shows the configuration menu too.
To show all reads with mismatches and soft-clipping, enable the following options:
[Figure: alignment options (aln_options)]
- The soft-clipping and mismatches highlight portions of reads that contain sequences that do not belong to the reference, and highlight the start position of these novel sequences. They potentially indicate the start of the variant and are often used as evidence for a variant breakpoint.
To view the alignments in read pairs, right-click the alignment track and select "View as pairs".
- In the example above, the SV breakpoint in the middle is supported by the soft-clipped reads (reads with colorful bases).
To fit the maximum number of alignments in the window, select "squished" in the right-click menu.
We can isolate and identify reads with unpaired mates, or mates mapped to different chromosomes by grouping alignments by "chromosome of mate".
- Note: Unmapped reads, or reads mapped to a different location are not shown in IGV even if their mates are mapped in the current window. In the example below, alignments are divided by groups that are categorized by the mapping position of the mate of the read in the current view. For example, the "UNMAPPED" group shows reads mapped with mates that are not mapped. (If you also enable "view as pairs", you will see that these reads are not connected to any other read.)
To identify abnormally oriented read pairs and read pairs with abnormal lengths, we can color the alignments by "pair orientation and insert size".
- The insert size is useful when viewing duplications and deletions. Insert sizes will be abnormally large for reads supporting deletions, and abnormally small for duplications.
- The orientation is useful in viewing evidence for inversions, duplications and translocations. (Details and examples on variant types are in later sections.)
- In the example below, we also grouped the alignments by pair orientation. In IGV, LR (left-right) is the normal orientation for most SBS library preps and corresponds to FR (forward-reverse) in alternative naming schemes. "LR" means that the upstream read is mapped to the forward strand and the downstream read is mapped to the reverse strand. The "UNKNOWN" group contains reads with unmapped mates.
A close-up view of RL, LR(normal), RR and LL orientations in IGV.
In the example below, the green color highlights read pairs that are in "RL" orientation. The blue color highlights read pairs that are too close to each other. The red color highlights read pairs that are too far from each other. The threshold of "too far" or "too close" can be set in preferences (next bullet point). This example shows a large tandem duplication.
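The orientation labels described above can be reproduced with a few lines of Python. This is a sketch of the naming convention as stated in this tutorial (upstream read first, "L" for a forward-strand read, "R" for a reverse-strand read), not IGV's own implementation. It assumes both mates map to the same chromosome; the function and parameter names are hypothetical.

```python
def pair_orientation(pos1: int, is_reverse1: bool, pos2: int, is_reverse2: bool) -> str:
    """Label a pair mapped to one chromosome as LR, RL, LL or RR.

    The upstream (leftmost) read contributes the first letter.
    "L" = read on the forward strand, "R" = read on the reverse strand.
    """
    reads = sorted([(pos1, is_reverse1), (pos2, is_reverse2)], key=lambda r: r[0])
    return "".join("R" if is_reverse else "L" for _, is_reverse in reads)


if __name__ == "__main__":
    # Upstream read forward, downstream read reverse: the normal case.
    print(pair_orientation(1000, False, 1420, True))
    # Upstream read reverse, downstream read forward.
    print(pair_orientation(1000, True, 1420, False))
```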
To set the thresholds of highlighting based on read pair distance (insert size), right-click the alignment track and select "Set insert size options...".
[Figure: insert size options dialog (insert_size_option)]
- By setting either the absolute threshold distance number (upper panel), or the percentile (lower panel), we can control how lenient we are at highlighting the abnormal read pairs.
- We normally use 5% and 95% as a default setting for insert size thresholds.
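To make the percentile setting concrete, the sketch below computes 5th and 95th percentile cut-offs from a list of insert sizes and labels a pair as too close, too far or normal. It only illustrates the idea; IGV derives its thresholds from the loaded alignments itself, and the function names and the sample values here are made up for illustration.

```python
import statistics


def insert_size_thresholds(insert_sizes, lower_pct=5, upper_pct=95):
    """Return (low, high) insert-size cut-offs at the given percentiles."""
    # With n=100, quantiles() returns the 1st..99th percentile cut points.
    cuts = statistics.quantiles(insert_sizes, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def flag_pair(insert_size, low, high):
    """Label one pair relative to the two cut-offs."""
    if insert_size < low:
        return "too close"
    if insert_size > high:
        return "too far"
    return "normal"


if __name__ == "__main__":
    # Illustrative insert sizes only, not taken from a real sample.
    sizes = [300, 420, 510, 550, 560, 570, 575, 580, 600, 640, 720, 1900]
    low, high = insert_size_thresholds(sizes)
    for size in (250, 570, 11000):
        print(size, flag_pair(size, low, high))
```

Widening the percentiles (for example 1% and 99%) flags fewer pairs, which corresponds to a more lenient setting.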
We can also color the reads by read strand from the right-click menu to distinguish forward from reverse reads. By default, red means forward and blue/purple means reverse. This setting is not specific to any SV type; it is a general visualization technique for showing the strandedness of alignments.
If the two reads of a pair are far apart, we can still view them in detail by zooming into both regions in split-screen mode. To do this, right-click one alignment and select "View mate region in split screen". This view is useful when viewing translocations or gene fusions.
[Figure: split-screen view (split_screen_view)]
The selected alignment is highlighted in red in the above example.

Expected types of discordant reads supporting SV hypotheses

We generate SV hypotheses based on observations of "discordant" reads.

A discordant alignment refers to a read pair in which at least one of the following holds:

- the reads do not have the expected orientation
- the distance between the reads is too long or too short
- one or both of the reads contain a clipped segment or a large variant
- one read is unmapped.
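The criteria above can be written as a small check. The sketch below is a simplified illustration: the `Pair` record and its field names are hypothetical, the insert-size cut-offs are passed in by the caller, and only soft-clipping is tested (large variants inside a read are not modeled).

```python
from dataclasses import dataclass


@dataclass
class Pair:
    """Minimal, hypothetical summary of one read pair."""
    orientation: str      # "LR", "RL", "LL" or "RR"
    insert_size: int
    has_soft_clip: bool   # either read carries a clipped segment
    mate_unmapped: bool


def discordant_reasons(pair: Pair, low: int, high: int, expected: str = "LR") -> list:
    """Return the reasons a pair counts as discordant (empty list if none)."""
    reasons = []
    if pair.mate_unmapped:
        # Orientation and insert size are undefined without a mapped mate.
        reasons.append("mate unmapped")
    else:
        if pair.orientation != expected:
            reasons.append("unexpected orientation")
        if pair.insert_size < low:
            reasons.append("insert size too short")
        elif pair.insert_size > high:
            reasons.append("insert size too long")
    if pair.has_soft_clip:
        reasons.append("clipped segment")
    return reasons


if __name__ == "__main__":
    # Cut-offs of 300 and 900 are illustrative values only.
    print(discordant_reasons(Pair("LL", 570, True, False), low=300, high=900))
```

Returning a list of reasons rather than a single yes/no keeps the information needed to form a hypothesis about the SV type.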

Each type of SV will have its own "signature" mix of discordant read pairs. Based on the types of discordant alignments, we can generate a hypothesis on the type of SV.

Example truth SV events of different SV types

The examples in this section are from an SV run on the HG002 sample with 50x coverage, paired-end reads of length 150, and an insert size of 570 bases.

Simple Insertions

In simple insertions (where we don't have repeated sequence on the flanks), we see a mix of clipped/unmapped reads. The orange color indicates novel sequence that does not belong to the reference. Thus, any orange sequence will be unmapped.

- right mate (orange) unmapped
- left mate (orange) unmapped
- entire read pair missing (will not show in IGV)
- right mate clipped on the right
- left mate clipped on the left

When we view simple insertion loci in IGV, we can tell where the breakpoint is by the pileup of the soft-clipped reads:

Reads with unmapped mates in the region. IGV setting: group by chromosome of mate; color by read strand.

Reads with soft-clipped mates in the region. IGV setting: color by insert size and orientation; view as pairs.

Insertions with homologies on the flanks

Some insertions include duplicated flanking sequences, which results in noisier or more ambiguous breakpoints. In the illustration below (a), the first section of the inserted sequence is identical to a section of the reference sequence on the right flank (colored purple). These sequences are recorded as HOMSEQ in the VCF record.

As a result, reads mapped to this location will appear to be clipped at two positions, on the left and right sides of HOMSEQ. This results in an ambiguous breakpoint location. In IGV, the region corresponding to the HOMSEQ will appear to have higher coverage, as reads covering both the start of the inserted sequence and the HOMSEQ flank will be mapped there. There will also be a "gap" between the clipped parts of the alignments.

(In the called variant, it is arbitrary whether HOMSEQ is at the beginning or the end of the inserted sequence. It is a matter of representation since the ground truth is unknown.)

VCF record:

chr5    21991531    DRAGEN:INS:96350:0:0:0:0:0  T   TTATATATAATTGTTATATATATAACAATTATATATAACTATATATAACAATTA  999 PASS    SVTYPE=INS;SVLEN=53;CIGAR=1M53I;CONTIG=TTAGC...TGAGAGA;CIPOS=0,11;HOMLEN=11;HOMSEQ=TATATATAATT  GT:GQ:PL:PR:SR:SB:FS:VF 1/1:215:999,218,0:17,31:0,78:0,0,36,42:0:17,95
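The homology fields in the record above can be inspected programmatically. The sketch below parses the INFO column and derives the interval covered by CIPOS; in the records shown in this tutorial, the upper bound of CIPOS equals HOMLEN, which matches the ambiguity described above. The helper names are hypothetical, and the CONTIG field is left out because it is truncated in the record.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string into a dict; flag-style keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue  # tolerate a trailing semicolon
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def breakpoint_interval(pos: int, info: dict) -> tuple:
    """Return the (start, end) positions spanned by CIPOS around POS."""
    lo, hi = (int(x) for x in info["CIPOS"].split(","))
    return pos + lo, pos + hi


if __name__ == "__main__":
    info = parse_info(
        "SVTYPE=INS;SVLEN=53;CIGAR=1M53I;CIPOS=0,11;HOMLEN=11;HOMSEQ=TATATATAATT"
    )
    # HOMLEN should agree with the length of HOMSEQ.
    print(len(info["HOMSEQ"]) == int(info["HOMLEN"]))
    # Range of positions in which the breakpoint could be placed.
    print(breakpoint_interval(21991531, info))
```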

Tandem Duplication Insertions

Alignments to TANDUP events are often composed of several features: noisy alignments, mis-oriented pairs, and elevated coverage. In larger tandem duplication events, we see read pairs that are oriented abnormally, because the reads can be mapped to any unit of repeats. Similarly, we see reads with abnormal insert sizes as well. These events tend to be noisy in terms of alignments. In the example below, we also see an increase in coverage in the surrounding areas of the call.

chr1    248753016   DRAGEN:DUP:TANDEM:25794:0:0:8:0:0   C   CACACCA...ACACACCAC 999 PASS    SVTYPE=INS;SVLEN=806;DUPSVLEN=806;IMPRECISE;CIPOS=-519,519;CIEND=-496,496   GT:GQ:PL:PR:VF  0/1:999:999,0,999:117,13:117,13

Deletions

Deletions are supported by read pairs with unexpectedly large insert sizes (colored red if alignments are colored by insert size) and by low coverage in the region. From the visualization, we can also confirm that this SV is a heterozygous variant.

VCF record:

chr17   41265460    DRAGEN:DEL:271637:0:3:0:0:0 TTGTGACCACCTGCAGCAGCACACCCTGCTGCCAGCCCTCCTGCTGTGTGTCCAGC...AC   T   999 PASS    SVTYPE=DEL;SVLEN=-10305;CIGAR=1M10305D;CONTIG=TGTG...ATT;CIPOS=0,68;CIEND=0,68;HOMLEN=68;HOMSEQ=TGTGACCA...CCTT GT:GQ:PL:PR:SR:SB:FS:VF 0/1:820:999,0,817:49,17:11,3:4,7,2,1:2.688:53,17

On a side note, the number of reads supporting the ref/alt alleles can be found in the VF field of the FORMAT column. In the example above, the numbers of reads supporting ref/alt are 53 and 17, respectively. For details on VF, check out SV Variant Allele Fraction (VAF) Calculation.
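As an illustration of reading VF, the sketch below pairs the FORMAT keys with the sample values of the record above and extracts the two counts. The helper names are hypothetical, and the ratio printed at the end is a plain alt / (ref + alt) fraction shown only for orientation; it is not claimed to be the calculation described in the VAF documentation.

```python
def sample_fields(format_col: str, sample_col: str) -> dict:
    """Pair each FORMAT key with the matching value of one sample column."""
    return dict(zip(format_col.split(":"), sample_col.split(":")))


def vf_counts(format_col: str, sample_col: str) -> tuple:
    """Return (ref_count, alt_count) from the VF field."""
    ref, alt = (int(x) for x in sample_fields(format_col, sample_col)["VF"].split(","))
    return ref, alt


if __name__ == "__main__":
    fmt = "GT:GQ:PL:PR:SR:SB:FS:VF"
    sample = "0/1:820:999,0,817:49,17:11,3:4,7,2,1:2.688:53,17"
    ref, alt = vf_counts(fmt, sample)
    print(ref, alt)
    # Simple fraction for orientation only (an assumption, see lead-in).
    print(f"{alt / (ref + alt):.3f}")
```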

Below is an example of a homozygous deletion where the numbers of reads supporting ref/alt are 1 and 77, respectively.

VCF record:

chr3    116556424   DRAGEN:DEL:63184:0:0:0:0:0  TAAAAAATC...AAAAAAAAAAA T   999 PASS    SVTYPE=DEL;SVLEN=-324;CIGAR=1M324D;CONTIG=ACTAAGG...TCATCTGA;CIPOS=0,13;HOMLEN=13;HOMSEQ=AAAAAATCTTCAG; GT:GQ:PL:PR:SR:SB:FS:VF 1/1:198:999,201,0:1,45:0,34:0,0,21,13:0:1,77

Inversions

Inversions are supported by reads that are mis-oriented, especially LL and RR.

The diagram below shows intuitively how an "LR" read becomes "LL" when the reference sequence is inverted.

VCF record:

chr12   12392078    DRAGEN:INV:212800:0:1:1:0:0 C   <INV> 999 PASS    END=12393533;SVTYPE=INV;SVLEN=1455;IMPRECISE;CIPOS=-612,613;CIEND=-450,451;EVENT=DRAGEN:INV:212800:0:1:0:0:0;JUNCTION_QUAL=999;INV5 GT:GQ:PL:PR:VF  0/1:304:999,0,301:20,54:20,54

IGV options: group by orientation; color by insert size and orientation.

Inversion events often occur together with homologous insertions or deletions in the flanking sequences. In the above example, the existence of homologies is exhibited by soft-clipped reads on the right side of the variant.

Translocations

Interchromosomal translocations are supported by read pairs whose mates are aligned to separate chromosomes. In DRAGEN SV VCFs, translocations are recorded as a pair of BNDs (breakends). The mate of each BND call is recorded in the "MATEID" INFO field. The example below is a translocation between chr15 and chr17 in a somatic sample.

chr15   74033607    DRAGEN:BND:6731:0:1:1:0:0:1 G   G[chr17:40336055[   .   PASS    KNOWNSV=PML_RARA;CONTIGHITS=2;SVTYPE=BND;POS=74033607;MATEID=DRAGEN:BND:6731:0:1:1:0:0:0;CONTIG=GCGGC...TATTTTT;BND_DEPTH=120;MATE_BND_DEPTH=115;AF=.;NSAMP=0;MSQ=.;BLACKLIST_AF=0;BLACKLIST_VarID=.;CSQ=G[chr17|intron_variant|MODIFIER|PML|ENSG00000140464|Transcript|ENST00000268058|protein_coding||6/8||||||||||1||HGNC|HGNC:9113||q24.1   GT:PR:SR    ./.:62,29:77,46
chr17   40336052    DRAGEN:BND:6731:0:1:0:0:0:1 T   T[chr15:74033623[   .   PASS    KNOWNSV=PML_RARA;CONTIGHITS=2;SVTYPE=BND;POS=40336052;MATEID=DRAGEN:BND:6731:0:1:0:0:0:0;CONTIG=GGTGGGA...ATAACAG;BND_DEPTH=111;MATE_BND_DEPTH=114;AF=.;NSAMP=0;MSQ=.;BLACKLIST_AF=0;BLACKLIST_VarID=.;CSQ=T[chr15|intron_variant|MODIFIER|RARA|ENSG00000131759|Transcript|ENST00000254066|protein_coding||2/8||||||||||1||HGNC|HGNC:9864||q21.2,T[chr15|downstream_gene_variant|MODIFIER|RARA-AS1|ENSG00000265666|Transcript|ENST00000581080|antisense_RNA|||||||||||4814|-1||HGNC|HGNC:49577||q21.2   GT:PR:SR    ./.:62,31:81,38
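The mate location of a breakend can be read from its ALT column. The sketch below parses the bracketed ALT notation defined for breakends in the VCF specification and returns the mate chromosome and position. It is a simplified illustration: the function name is hypothetical, and contig names containing a colon are not handled.

```python
import re

# Breakend ALT forms in the VCF specification: t[p[  t]p]  ]p]t  [p[t
_BND_ALT = re.compile(
    r"^(?P<pre>[ACGTNacgtn.]*)"
    r"(?P<b1>[\[\]])(?P<chrom>[^:\[\]]+):(?P<pos>\d+)(?P<b2>[\[\]])"
    r"(?P<post>[ACGTNacgtn.]*)$"
)


def parse_bnd_alt(alt: str) -> tuple:
    """Return (mate_chrom, mate_pos) from a breakend ALT string."""
    m = _BND_ALT.match(alt)
    # Both brackets must point the same way, and bases sit on exactly one side.
    if m is None or m["b1"] != m["b2"] or bool(m["pre"]) == bool(m["post"]):
        raise ValueError(f"not a breakend ALT: {alt!r}")
    return m["chrom"], int(m["pos"])


if __name__ == "__main__":
    print(parse_bnd_alt("G[chr17:40336055["))
    print(parse_bnd_alt("T[chr15:74033623["))
```

The returned coordinates can be used as the second locus when opening both breakends side by side in IGV.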

The reads supporting both translocation breakends can be visualized simultaneously using the split-screen option. In the example below, the reads are grouped by "chromosome of mate", "viewed as pairs" and "colored by insert size and orientation". Note that most alignments belong to the "chr15" and "chr17" groups in each split screen. This shows a cluster of read pairs with one mate mapped to chr15 and the other to chr17, which indicates a translocation event.

Examples of False Positives

The examples shown in this section do not cover ALL false positive cases, but they show some possible causes and common patterns in alignments around FPs. These patterns are usually due to noisy signals from difficult-to-map sequences or sequencing errors.

FP deletions

Example 1

In this example, we see noisy reads with no clear breakpoints and no significant reduction in coverage. The evidence for this breakpoint may come from fragments (highlighted in red) that are shorter than most fragments, and from some mismatched bases caused by either population SNPs or mapping errors.

chr5  87287971  DRAGEN:DEL:102710:0:0:0:3:1  CACCATATATATACCATATATACACACACCATATATATACCATATATACACACACCATATATATAT  C  361  PASS  SVTYPE=DEL;SVLEN=-65;CIGAR=1M65D;CONTIG=...TTTA;CIPOS=0,75;HOMLEN=75;HOMSEQ=ACCATATATATACCATATATACACACACCATATATATACCATATATACACACACCATATATATATACCATATATA  GT:GQ:PL:PR:SR:SB:FS:VF  0/1:361:363,0,504:53,56:0,0:0,0,0,0:0:53,56

Example 2

This example shows an FP DEL that is 327 bases long. There is a 30-base T sequence that is covered by a significantly high number of reads, and some reads are aligned here with long soft-clips. This pattern indicates potential mapping error due to repetitive sequences. Additionally, there are also some shorter fragments. Together, they generate the evidence for the FP.

FP insertions

Example 1

The FP in this example coincides with a LINC gene; such regions are known to be highly repetitive and mobile. The alignments in this region are noisy.

chr14  97514289  DRAGEN:INS:246186:0:0:0:0:0  TC  TTTTTTCCTTCCTCCCTTCCTCCCTCCCTTCCTTCCTCCCCTCCTTCCTTTTTT  931  PASS  SVTYPE=INS;SVLEN=53;CIGAR=1M53I1D;CONTIG=TCAC...TCCCA  GT:GQ:PL:PR:SR:SB:FS:VF  1/1:58:936,61,0:12,25:0,23:0,0,13,10:0:12,39

Example 2

In this example, very few reads can be used as evidence compared to total coverage.

chr15   27967877    DRAGEN:INS:247853:0:0:0:0:0 A   ATCATTGGTTGGTTCTGTTCTTTCACTGACGTCCTGGAATAAGTCAGCAGCG    139 PASS    SVTYPE=INS;SVLEN=51;CIGAR=1M51I;CONTIG=TGTAA...TCTTCAATGGGGGTATTAC;CIPOS=0,129;HOMLEN=129;HOMSEQ=TCATTGG...TTTCACTG     GT:GQ:PL:PR:SR:SB:FS:VF 0/1:109:141,0,106:30,34:0,0:0,0,0,0:0:30,34

Example 3

In this example, alignments have poly-A segments (green) that are sequencing errors. The evidence was generated by noise.

chr5    161104486   DRAGEN:INS:109908:0:0:0:0:0 T   TAGAATAA...AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA    405 PASS    END=161104486;SVTYPE=INS;SVLEN=359;CIGAR=1M359I;CONTIG=TTCACATAT...TATTTATT;CIPOS=0,17;HOMLEN=17;HOMSEQ=AGAATAAAATCATTTCA   GT:GQ:PL:PR:SR:SB:FS:VF 0/1:405:407,0,486:25,9:22,10:15,7,5,5:3.583:22,11
