SMN Caller
Disruption of all copies of the SMN1 gene in an individual causes spinal muscular atrophy (SMA). SMN1 has a high-identity paralog, SMN2, which differs from SMN1 at only approximately 10 SNVs and small indels. For example, hg19 chr5:70247773 C->T affects splicing and largely disrupts the production of functional SMN protein from SMN2. Because of this high-similarity duplication combined with common copy-number variation, standard whole-genome sequencing (WGS) analysis does not produce complete variant calling results for SMN. Since 95% of SMA cases result from the absence of the functional C (SMN1) allele in any copy of SMN¹, a targeted calling solution can be effective in detecting SMA.
DRAGEN offers the following two independent components that can call the SMN1 copy number from a germline sample.
DRAGEN-STR
SMN Caller
SMA calling is implemented together with repeat expansion detection using sequence-graph realignment to align reads to a single reference that represents SMN1 and SMN2.
In addition to the standard diploid genotype call, SMA calling with DRAGEN-STR uses a direct statistical test to check for the presence of any C allele. If no C allele is detected, the sample is called affected; otherwise, it is called unaffected.
SMA calling is only supported for human whole-genome sequencing with PCR-free libraries.
To enable SMA calling along with repeat expansion detection, set the --repeat-genotype-enable option to true. For information on graph-alignment options, see .
To activate SMA calling, the variant specification catalog file must include a description of the targeted SMN1/SMN2 variant. The <INSTALL_PATH>/resources/repeat-specs/experimental folder contains example files.
The <output-file-prefix>.repeat.vcf file includes SMN output along with any targeted repeats. SMN output is represented as a single SNV call at the splice-affecting position in SMN1 with SMA status in the following custom fields.
| Field | Description |
| --- | --- |
| VARID | SMN marks the SMN call. |
| GT | Genotype call at this position using a normal (diploid) genotype model. |
| DST | SMA status call: + indicates detected, - indicates undetected, and ? indicates undetermined. |
| AD | Total read counts supporting the C and T alleles. |
| RPL | Log10 likelihood ratio between the unaffected and affected models. Positive scores indicate the unaffected model is more likely. |
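As a reading aid for the DST and RPL fields, the following minimal sketch turns the two values into plain-language labels. The function name and the label strings are hypothetical, and the sketch assumes the values have already been extracted from the VCF record; how the fields are laid out inside the record is not shown here.

```python
def interpret_smn_call(dst, rpl):
    """Translate the DST and RPL values of an SMN call into readable labels.

    dst: one of "+", "-", "?" as described in the field table.
    rpl: log10 likelihood ratio (unaffected model vs. affected model).
    The label strings are illustrative, not DRAGEN output.
    """
    status_labels = {
        "+": "SMA detected",
        "-": "SMA not detected",
        "?": "undetermined",
    }
    if dst not in status_labels:
        raise ValueError(f"unexpected DST value: {dst!r}")

    # Positive RPL favors the unaffected model; negative favors the affected model.
    if rpl > 0:
        favored_model = "unaffected"
    elif rpl < 0:
        favored_model = "affected"
    else:
        favored_model = "neither"
    return status_labels[dst], favored_model
```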
The SMN Caller calls SMN1 and SMN2 copy numbers and detects the presence of a SNP, NM_000344.4:c.*3+80T>G, that is associated with the two-copy SMN1 allele. The caller is derived from the method implemented in Spinal muscular atrophy diagnosis and carrier screening from genome sequencing data.²
The SMN Caller performs the following steps:
Determines total and intact SMN copy numbers
Calls SMN1 copy number at eight differentiating sites
Determines copy number for NM_000344.4:c.*3+80T>G
Two common copy-number variants (CNVs) in SMN1 and SMN2 include whole gene CNV and a partial gene deletion of exons 7 and 8. Reads that align to either SMN1 or SMN2 are counted. The read counts in exon 1 through exon 6 are used to determine total SMN copy number. The read counts in exon 7 and 8 are used to determine the SMN copies that do not have the exon 7 and 8 deletion (intact SMN copy number). To estimate the SMN copy number for these two regions, read counts are normalized to a diploid baseline derived from 3000 preselected 2 kb regions across the genome. The 3000 normalization regions are randomly selected from the portion of the reference genome that has stable coverage across population samples. The SMN Caller then calculates the number of SMN copies that have the exon 7 and 8 deletion by subtracting the intact SMN copy number from the total SMN copy number.
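The arithmetic in this step can be illustrated with a minimal sketch. It assumes that per-region depth values are already available, takes the median of the normalization regions as the diploid baseline, and converts normalized depth to an integer copy number by simple rounding. All names are hypothetical, and the median and rounding are simplifications chosen for illustration rather than the caller's actual statistical model.

```python
from statistics import median


def estimate_smn_copy_numbers(total_depth, intact_depth, baseline_depths):
    """Sketch of depth normalization for the two SMN regions.

    total_depth:     depth over exons 1-6, counting reads from SMN1 and SMN2
    intact_depth:    depth over exons 7-8, counting reads from SMN1 and SMN2
    baseline_depths: depths of the preselected normalization regions,
                     each assumed to be present in two copies
    """
    diploid_depth = median(baseline_depths)  # depth that corresponds to 2 copies
    if diploid_depth <= 0:
        raise ValueError("baseline depth must be positive")

    # Scale so that the diploid baseline maps to a copy number of 2.
    total_float = 2.0 * total_depth / diploid_depth
    intact_float = 2.0 * intact_depth / diploid_depth

    total_cn = round(total_float)
    intact_cn = round(intact_float)

    # Copies carrying the exon 7-8 deletion: total minus intact.
    deleted_cn = max(total_cn - intact_cn, 0)

    return {
        "total_float": total_float,
        "intact_float": intact_float,
        "total_cn": total_cn,
        "intact_cn": intact_cn,
        "deleted_cn": deleted_cn,
    }
```

For instance, with a diploid baseline depth of 30, a depth of 60 over exons 1 to 6 and 45 over exons 7 and 8 corresponds to four total copies, three intact copies, and one copy with the exon 7 and 8 deletion.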
To calculate the SMN1 copy number, the caller uses eight predefined differentiating sites in exons 7 and 8 of SMN1 and SMN2. One of these sites is the splice site variant used for SMA calling with DRAGEN-STR (see SMA Calling With DRAGEN-STR). The caller selects differentiating sites at positions that have sequence differences between SMN1 and SMN2 where calling the SMN1 copy number is most likely to be correct based on sequencing data from the 1000 Genomes Project.
For each differentiating site, the SMN1-specific and SMN2-specific alleles are counted in reads mapping to either SMN1 or the homologous region in SMN2. The caller uses a binomial model to calculate the likelihood of each possible SMN1 copy number from the two gene-specific counts given the intact SMN copy number calculated in the previous step.
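A minimal sketch of such a binomial model for a single differentiating site follows. It assumes that, for a candidate SMN1 copy number c out of n intact copies, the expected fraction of SMN1-specific reads is c/n, clamped by a small error rate so that no candidate has zero likelihood. The function name, the default error rate, and the flat normalization across candidates are assumptions made for illustration; how the caller combines evidence across the eight sites is not covered.

```python
from math import comb


def smn1_copy_number_likelihoods(smn1_reads, smn2_reads, intact_cn, error_rate=0.01):
    """Normalized binomial likelihoods of each SMN1 copy number at one site.

    Returns a list where index c holds the normalized likelihood that
    c of the intact_cn intact SMN copies are SMN1.
    """
    depth = smn1_reads + smn2_reads
    likelihoods = []
    for candidate in range(intact_cn + 1):
        # Expected fraction of SMN1-specific reads for this candidate.
        fraction = candidate / intact_cn if intact_cn > 0 else 0.0
        # Clamp so that sequencing errors never give probability 0 or 1.
        fraction = min(max(fraction, error_rate), 1.0 - error_rate)
        likelihood = (
            comb(depth, smn1_reads)
            * fraction ** smn1_reads
            * (1.0 - fraction) ** smn2_reads
        )
        likelihoods.append(likelihood)

    total = sum(likelihoods)
    return [value / total for value in likelihoods]


def most_likely_smn1_copy_number(smn1_reads, smn2_reads, intact_cn):
    """Return the candidate SMN1 copy number with the highest likelihood."""
    likelihoods = smn1_copy_number_likelihoods(smn1_reads, smn2_reads, intact_cn)
    return max(range(len(likelihoods)), key=likelihoods.__getitem__)
```

With equal SMN1-specific and SMN2-specific counts and four intact copies, the sketch favors two SMN1 copies; with no SMN1-specific reads, it favors zero.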
NM_000344.4:c.*3+80T>G
For this SNP in a high-homology region, reads mapping to either SMN1 or SMN2 are used for variant calling. The reads containing the variant allele and the reads containing the nonvariant allele are counted, and a binomial model that incorporates the sequencing error rate is then used to determine the most likely variant allele copy number (0 for nonvariant).
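The idea of a binomial model that incorporates the sequencing error rate can be sketched as follows. The sketch assumes a known number of SMN copies covering the site and a symmetric per-read error rate, so that a true variant fraction p is observed as p(1 - e) + (1 - p)e. The function name, the default error rate, and the choice of which copy number to pass in are hypothetical and do not describe the caller's internals.

```python
from math import comb


def variant_allele_copy_number(variant_reads, nonvariant_reads, smn_cn, error_rate=0.01):
    """Most likely number of SMN copies that carry the variant allele.

    smn_cn is the assumed number of SMN copies covering the site.
    Returns 0 when the nonvariant model is the most likely one.
    """
    if smn_cn <= 0:
        return 0

    depth = variant_reads + nonvariant_reads
    best_candidate = 0
    best_likelihood = -1.0
    for candidate in range(smn_cn + 1):
        true_fraction = candidate / smn_cn
        # Fraction of variant reads expected after symmetric sequencing errors.
        observed_fraction = (
            true_fraction * (1.0 - error_rate)
            + (1.0 - true_fraction) * error_rate
        )
        likelihood = (
            comb(depth, variant_reads)
            * observed_fraction ** variant_reads
            * (1.0 - observed_fraction) ** nonvariant_reads
        )
        if likelihood > best_likelihood:
            best_candidate = candidate
            best_likelihood = likelihood
    return best_candidate
```

Because the error rate keeps the expected variant fraction above zero even for a candidate of 0, a handful of variant reads caused by sequencing errors does not by itself produce a nonzero call.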
For the SMN Caller, the fields are defined as follows.
| Field | Description | Type |
| --- | --- | --- |
| fullLengthCopyNumber | Copy number of intact SMN (exons 7 & 8) | nonnegative integer |
| totalCopyNumber | Copy number of total SMN (exons 1 to 6) | nonnegative integer |
| smn1CopyNumber | Copy number of intact SMN1 | nonnegative integer or null for no-call |
| smn2CopyNumber | Copy number of intact SMN2 | nonnegative integer or null for no-call |
| smn2Delta78CopyNumber | Copy number of SMN2Δ7–8 (deletion of exons 7 and 8) | nonnegative integer |
| fullLengthCopyNumberFloat | Raw normalized depth of intact SMN (exons 7 & 8) | string representing nonnegative floating point number |
| totalCopyNumberFloat | Raw normalized depth of total SMN (exons 1 to 6) | string representing nonnegative floating point number |
| variants | JSON array containing information about specific SMN variants | JSON array |
Each variant reported in the variants array has the following fields.
| Field | Description | Type |
| --- | --- | --- |
| alleleId | HGVS identifier of the variant allele | string |
| alleleCopyNumber | Copy number of the allele in the called genotype | nonnegative integer |
| genotypeQuality | Phred-scaled quality for the called genotype | nonnegative integer |
| filter | Filter for the called genotype | string. "PASS" when not filtered |
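The field definitions above can be exercised with a short sketch that reads one SMN record. The record shown is hypothetical: its values are invented for illustration and are not DRAGEN output, and the way the SMN object is located inside the targeted caller output file is not shown. Only the field names and types come from the tables above; `summarize_smn` is a name introduced here.

```python
import json


def summarize_smn(smn):
    """Collect the main values from an SMN record (a dict with the fields above)."""
    smn1 = smn["smn1CopyNumber"]  # None when the JSON value is null (no-call)
    return {
        "smn1": smn1,
        "smn2": smn["smn2CopyNumber"],
        "smn1_no_call": smn1 is None,
        # The *Float fields are strings, so convert them explicitly.
        "total_depth": float(smn["totalCopyNumberFloat"]),
        "intact_depth": float(smn["fullLengthCopyNumberFloat"]),
        # Keep only variant calls whose genotype passed filtering.
        "passing_variants": {
            variant["alleleId"]: variant["alleleCopyNumber"]
            for variant in smn.get("variants", [])
            if variant["filter"] == "PASS"
        },
    }


# Hypothetical record with invented values, used only to show the field types.
example_record = json.loads("""
{
  "fullLengthCopyNumber": 4,
  "totalCopyNumber": 4,
  "smn1CopyNumber": 2,
  "smn2CopyNumber": 2,
  "smn2Delta78CopyNumber": 0,
  "fullLengthCopyNumberFloat": "3.97",
  "totalCopyNumberFloat": "4.02",
  "variants": [
    {
      "alleleId": "NM_000344.4:c.*3+80T>G",
      "alleleCopyNumber": 0,
      "genotypeQuality": 50,
      "filter": "PASS"
    }
  ]
}
""")

summary = summarize_smn(example_record)
```

Treating a null copy number as a distinct no-call state, rather than coercing it to 0, avoids confusing an uncalled sample with one that has no SMN1 copies.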
¹Wirth B. An update of the mutation spectrum of the survival motor neuron gene (SMN1) in autosomal recessive spinal muscular atrophy (SMA). Human Mutation. 2000;15(3):228-237. doi:10.1002/(SICI)1098-1004(200003)15:3<228::AID-HUMU3>3.0.CO;2-9
²Chen X, Sanchis-Juan A, French CE, et al. Spinal muscular atrophy diagnosis and carrier screening from genome sequencing data. Genetics in Medicine. 2020;22(5):945-953. doi:10.1038/s41436-020-0754-0
For information about enabling the SMN caller see .
The SNP (also referred to as g.27134T>G) has been reported in the literature to be associated with the two-copy SMN1 allele.
The SMN Caller prints out its calls in the targeted caller output file, <output-file-prefix>.targeted.json, that also contains calls from other targets (see ). An example of the SMN caller content in this file is shown below.
The variant NM_000344.4:c.*3+80T>G is also reported in VCF format. See for details about how these variants are reported in VCF.