Kamil S. Jaron edited this page Nov 24, 2025 · 3 revisions

This subtask takes an .sma file, which annotates k-mer pair coverages with the smudge each pair belongs to, together with the original k-mer database. The two are combined to extract the sequences of all k-mer pairs in each smudge.

Example using data from the Saccharomyces tutorial

smudgeplot extract kmer_db saccharomyces_L5.sma -o example

There will be two output files, example.2A2B.txt and example.3A1B.txt, containing the k-mer pairs from the two smudges. Each file is formatted as follows:

cg(g/a)atgcctctaataccaatc
ccgctttatta(g/a)gcgaagcgg
cg(a/g)atgcgtgaccgtatccgc
aatcataaaaaa(t/c)accggtat
aatcataaaaaaaaat(t/c)gagg
ccgctttca(g/a)tgtgctagaac
aatcataaaaatgacaac(g/a)at
ccgctttcaaagaat(t/c)gaaga
aatcataaaactcatctag(g/a)g
cag(c/t)atgatggttccaagacc
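
The lines above can be turned back into two separate k-mers with a few lines of Python. This is only a sketch: it assumes that each line holds exactly one (x/y) group marking the single position at which the two k-mers of a pair differ, which is how the example output reads. The names expand_pair and read_pairs are hypothetical helpers, not part of smudgeplot.

```python
import re

# Assumed notation: one "(x/y)" group marks the only differing base in the pair.
PAIR_RE = re.compile(r"^([acgt]*)\(([acgt])/([acgt])\)([acgt]*)$")


def expand_pair(line):
    """Return the two k-mers encoded by one output line."""
    match = PAIR_RE.match(line.strip().lower())
    if match is None:
        raise ValueError(f"unexpected k-mer pair format: {line!r}")
    prefix, first, second, suffix = match.groups()
    return prefix + first + suffix, prefix + second + suffix


def read_pairs(path):
    """Read a per-smudge file (e.g. example.2A2B.txt) into a list of k-mer tuples."""
    with open(path) as handle:
        return [expand_pair(line) for line in handle if line.strip()]
```

For instance, expand_pair("cg(g/a)atgcctctaataccaatc") yields the pair cggatgcctctaataccaatc and cgaatgcctctaataccaatc.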

We currently provide this feature as is and are still working on its utility. If you would like to start experimenting with it, you might find the wiki page about mapping k-mers using bwa useful.
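
Read mappers such as bwa take sequences in FASTA or FASTQ format, so one possible first step for experimenting is to convert the k-mer pairs to FASTA. The sketch below is an assumption about how that could be done and is not necessarily the procedure described on the wiki page mentioned above. The function name pair_lines_to_fasta and the record naming scheme (label_pairN_a / label_pairN_b) are hypothetical.

```python
import re

# Assumed line format: prefix(x/y)suffix, with x and y the two alternative bases.
PAIR_RE = re.compile(r"([acgt]*)\(([acgt])/([acgt])\)([acgt]*)")


def pair_lines_to_fasta(lines, label):
    """Convert k-mer pair lines into FASTA text, two records per pair."""
    records = []
    for index, line in enumerate(lines, start=1):
        match = PAIR_RE.fullmatch(line.strip().lower())
        if match is None:
            continue  # skip blank or unexpected lines
        prefix, first, second, suffix = match.groups()
        records.append(f">{label}_pair{index}_a\n{prefix}{first}{suffix}\n")
        records.append(f">{label}_pair{index}_b\n{prefix}{second}{suffix}\n")
    return "".join(records)
```

Calling pair_lines_to_fasta(open("example.2A2B.txt"), "2A2B") would return FASTA text that can be written to a file and passed to a mapper.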

Usage

usage: smudgeplot extract [-h] [-t T] [-o O] [-tmp TMP] [--verbose] infile sma

Extract kmer pair sequences from a FastK k-mer database.

positional arguments:
  infile      Input FastK database (.ktab) file.
  sma         Input annotated k-mer pair file (.sma).

options:
  -h, --help  show this help message and exit
  -t T        Number of threads (default 4)
  -o O        The pattern used to name the output (kmerpairs).
  -tmp TMP    Directory where all temporary files will be stored (default /tmp).
  --verbose   verbose mode
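
When the extraction is run from a script, the command line can be assembled from the options listed above. The following is a minimal sketch that uses only the flags shown in the usage text. The helper name build_extract_command is hypothetical, and "kmerpairs" is assumed to be the default output pattern because it appears in parentheses in the help for -o.

```python
def build_extract_command(infile, sma, out="kmerpairs", threads=4,
                          tmp="/tmp", verbose=False):
    """Build the argument list for `smudgeplot extract` from the documented options."""
    cmd = ["smudgeplot", "extract",
           "-t", str(threads),   # number of threads
           "-o", out,            # pattern used to name the output files
           "-tmp", tmp]          # directory for temporary files
    if verbose:
        cmd.append("--verbose")
    # Positional arguments: FastK database first, then the annotated .sma file.
    cmd += [infile, sma]
    return cmd


# To actually run it (requires smudgeplot to be installed):
#   import subprocess
#   subprocess.run(build_extract_command("kmer_db", "saccharomyces_L5.sma",
#                                        out="example"), check=True)
```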
