In this post I will show how to solve some Smoove issues I came across. I’m using smoove v0.2.8 installed with conda.

1. Duphold error

fatal.nim(49)            sysFatal
Error: unhandled exception: index -1 not in 0 .. 14130 [IndexDefect]

Solution The duphold version in the conda smoove environment is v0.2.1, need to update it to version 0.2.3, which is not available on conda.
First download duphold from github and check the installation.

wget https://github.com/brentp/duphold/releases/download/v0.2.3/duphold
chmod +x ./duphold
./duphold -h # to check if it's installed correctly

Add duphold to your path so this version is the version to be used by smoove.

PATH=<directory/containing/duphold>:$PATH

Since I’m using smoove in a Snakemake pipeline, I add the previous line before my smoove command.

2. Svtyper error

unrecognized arguments: --max_ci_dist 0

Solution The svtyper in the smoove conda environment is quite old, you need to update it.

conda install svtyper=0.7.0

3. SR=0 for all variants

When mapping reads with bwa mem, if you use the -M flag, the split reads are maked as secondary and they will not be used by smoove to get split-read support. This will cause all the variants in your VCF file to have SR=0. If you don’t use split-read support in your analysis, you can run smoove as usual, if you want split-read support values in your structural variant file you can use the following solution.
This solution was created by Martijn Derks.

It requires python 2, and the pysam and argparse packages as well as samtools

Solution

module load samtools

sname=`samtools view -H <sample>.bam | grep '^@RG' | sed "s/.*SM:\([^\t]*\).*/\1/g" | uniq`

python bamgroupreads.py -f -M -i <sample>.bam | samblaster --ignoreUnmated -M -a -e -d <sample>.disc.sam -s <smoove call output dir>/<sample>.split.sam -o /dev/null

grep -v "SAMBLASTER" <smoove call output dir>/<sample>.split.sam > $sname.tmp.sam
mv $sname.tmp.sam <smoove call output dir>/<sample>.split.sam
grep -v "SAMBLASTER" <smoove call output dir>/<sample>.disc.sam > $sname.tmp.sam
mv $sname.tmp.sam <smoove call output dir>/<sample>.disc.sam

samtools sort -@ 12 -O bam <smoove call output dir>/<sample>.split.sam > <smoove call output dir>/$sname.split.bam
samtools sort -@ 12 -O bam <smoove call output dir>/<sample>.disc.sam > <smoove call output dir>/$sname.disc.bam

rm <smoove call output dir>/<sample>.split.sam <smoove call output dir>/<sample>.disc.sam

The final split and disc bam files should be in the same directory as the smoove call outdir. Smoove will then use these split and disc bam files for the smoove call step.

For an example on how to use this fix with smoove, see my Snakemake population level structural variant calling pipeline