kam

Alignment-free variant detection for Twist duplex UMI sequencing.

kam replaces the four-tool chain HUMID + Jellyfish + km + kmtools with a single Rust binary that preserves molecule-level evidence from raw FASTQ reads through to variant calls. It supports two operating modes: somatic discovery and tumour-informed monitoring.

Features

Alignment-free: no reference alignment required at call time.
Molecule-aware: every k-mer carries a MoleculeEvidence record with distinct molecule count, duplex support, and strand breakdown.
Two modes: discovery (all alt paths) and tumour-informed monitoring (--target-variants).
Tumour-informed monitoring: precision 1.0 at every VAF level in the benchmark below. Background cfDNA variants are filtered out because they do not match the expected somatic alleles.
Fast: 16 to 22 seconds per sample on a single core at 2 M read pairs.
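To make the molecule-aware feature concrete, here is a minimal sketch of what a per-k-mer evidence record could look like. Only the name MoleculeEvidence and the three kinds of information it carries (distinct molecule count, duplex support, strand breakdown) come from this README. The field names, the Strands enum, and both methods are assumptions for illustration and may not match the types in kam-core.

```rust
/// Hypothetical per-k-mer evidence record. Field names are illustrative
/// and are not taken from the kam source.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct MoleculeEvidence {
    /// Distinct molecules that contain the k-mer.
    pub n_molecules: u32,
    /// Molecules for which both strands of the duplex were observed.
    pub n_duplex: u32,
    /// Molecules observed on the forward strand only.
    pub n_simplex_fwd: u32,
    /// Molecules observed on the reverse strand only.
    pub n_simplex_rev: u32,
}

/// Strands observed for a single molecule (assumed representation).
#[derive(Debug, Clone, Copy)]
pub enum Strands {
    Forward,
    Reverse,
    Both,
}

impl MoleculeEvidence {
    /// Record one more distinct molecule that contains the k-mer.
    pub fn add_molecule(&mut self, strands: Strands) {
        self.n_molecules += 1;
        match strands {
            Strands::Both => self.n_duplex += 1,
            Strands::Forward => self.n_simplex_fwd += 1,
            Strands::Reverse => self.n_simplex_rev += 1,
        }
    }

    /// Fraction of molecules with duplex support (0.0 when no molecules).
    pub fn duplex_fraction(&self) -> f64 {
        if self.n_molecules == 0 {
            0.0
        } else {
            f64::from(self.n_duplex) / f64::from(self.n_molecules)
        }
    }
}
```

Counting molecules rather than reads means PCR duplicates of the same molecule do not inflate the evidence for a k-mer.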

Benchmark results

Evaluated on the Twist cfDNA Pan-Cancer Reference Standard v2 (24 samples, 375 truth variants, 3 concentrations, 8 VAF levels). Configuration: --max-vaf 0.35 --min-family-size 2 --target-variants.

Concentration	VAF	Sensitivity	SNV sens	Indel sens	Precision
15 ng	2%	61.3%	80.0%	38.8%	1.0
30 ng	2%	59.2%	77.1%	37.6%	1.0
5 ng	2%	51.7%	68.8%	31.2%	1.0
15 ng	0.5%	40.0%	52.7%	24.7%	1.0
30 ng	0.5%	46.1%	61.0%	28.2%	1.0
All	0%	n/a	n/a	n/a	0 FPs

Runtime: 16 to 22 s per sample. Peak RSS: 1.8 to 2.0 GB.
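For readers who want to reproduce the sensitivity and precision columns from their own counts, the standard definitions can be sketched as below. The function names are hypothetical and this is not the benchmark script used for the table. Sensitivity is TP / (TP + FN) and precision is TP / (TP + FP); both are undefined when the denominator is zero, which is why a 0% VAF sample is better summarised by its false-positive count.

```rust
/// Sensitivity (recall): fraction of truth variants that were detected.
/// Returns None when there are no truth variants.
pub fn sensitivity(true_pos: u32, false_neg: u32) -> Option<f64> {
    let denom = true_pos + false_neg;
    if denom == 0 {
        None
    } else {
        Some(f64::from(true_pos) / f64::from(denom))
    }
}

/// Precision: fraction of reported calls that are true variants.
/// Returns None when no calls were reported.
pub fn precision(true_pos: u32, false_pos: u32) -> Option<f64> {
    let denom = true_pos + false_pos;
    if denom == 0 {
        None
    } else {
        Some(f64::from(true_pos) / f64::from(denom))
    }
}
```

As an illustration only (these counts are not reported in this README), 230 detected variants out of 375 truth variants would give a sensitivity of about 61.3%.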

Installation

From crates.io

cargo install kam-bio

This installs the kam binary.

From source

Requires Rust 1.75 or later.

cargo build --release

The binary is at target/release/kam.

Usage

For detailed documentation, guides, and examples, see the User Manual.

End-to-end run (recommended)

kam run \
  --r1 sample_R1.fastq.gz \
  --r2 sample_R2.fastq.gz \
  --targets targets_100bp.fa \
  --output-dir results/

Tumour-informed monitoring mode

Supply a VCF of expected somatic variants. Only calls matching an entry in the target set are reported; all other calls are labelled NotTargeted.

kam run \
  --r1 sample_R1.fastq.gz \
  --r2 sample_R2.fastq.gz \
  --targets targets_100bp.fa \
  --output-dir results/ \
  --max-vaf 0.35 \
  --min-family-size 2 \
  --target-variants known_variants.vcf
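The matching step described above can be sketched roughly as follows. This is an assumption about how such a filter might work, not the kam-call implementation: calls are keyed on (CHROM, POS, REF, ALT), the VCF is parsed naively with one ALT allele per record, and the Pass label is invented for the example. Only the NotTargeted label comes from this README.

```rust
use std::collections::HashSet;

/// Allele key: (chromosome, 1-based position, REF, ALT).
pub type AlleleKey = (String, u64, String, String);

/// Labels for the sketch. `NotTargeted` is named in the README;
/// `Pass` is a placeholder chosen for this example.
#[derive(Debug, PartialEq)]
pub enum Label {
    Pass,
    NotTargeted,
}

/// Parse the data lines of a VCF into a set of expected alleles.
/// Assumes one ALT allele per record; header and malformed lines are skipped.
pub fn parse_targets(vcf_text: &str) -> HashSet<AlleleKey> {
    vcf_text
        .lines()
        .filter(|l| !l.starts_with('#') && !l.trim().is_empty())
        .filter_map(|l| {
            // VCF columns: CHROM, POS, ID, REF, ALT, ...
            let f: Vec<&str> = l.split('\t').collect();
            if f.len() < 5 {
                return None;
            }
            let pos = f[1].parse::<u64>().ok()?;
            Some((f[0].to_string(), pos, f[3].to_string(), f[4].to_string()))
        })
        .collect()
}

/// Label a call by exact allele match against the target set.
pub fn label_call(call: &AlleleKey, targets: &HashSet<AlleleKey>) -> Label {
    if targets.contains(call) {
        Label::Pass
    } else {
        Label::NotTargeted
    }
}
```

Exact matching is the simplest choice. A real implementation would probably need to normalise indel representations before comparing, since the same indel can be written at different positions.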

Individual pipeline stages

The pipeline can also be run stage by stage:

# 1. Assemble molecules from raw reads.
kam assemble --r1 R1.fastq.gz --r2 R2.fastq.gz --output molecules.bin

# 2. Build a k-mer index against target sequences.
kam index --molecules molecules.bin --targets targets.fa --output index.bin

# 3. Walk de Bruijn graph paths.
kam pathfind --index index.bin --targets targets.fa --output paths.bin

# 4. Call variants from scored paths.
kam call --paths paths.bin --targets targets.fa --output calls.vcf

Key options

Flag	Default	Description
`--min-family-size N`	1	Minimum reads per UMI family. Set to 2 to remove singletons.
`--max-vaf F`	—	Discard calls above this VAF (removes germline heterozygotes).
`--min-alt-molecules N`	2	Minimum alt molecules to emit a call.
`--min-confidence F`	0.99	Minimum posterior confidence.
`--target-variants VCF`	—	Enable tumour-informed monitoring mode.
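How the call-level thresholds in the table combine can be sketched as below. The Candidate and Thresholds types are hypothetical, and VAF is assumed here to be alt molecules divided by total molecules at the site, which may differ from how kam estimates it. The defaults mirror the table above.

```rust
/// Hypothetical summary of one candidate call.
pub struct Candidate {
    pub alt_molecules: u32,
    pub total_molecules: u32,
    pub confidence: f64,
}

/// Thresholds corresponding to the flags in the table above.
pub struct Thresholds {
    pub min_alt_molecules: u32, // --min-alt-molecules
    pub min_confidence: f64,    // --min-confidence
    pub max_vaf: Option<f64>,   // --max-vaf (unset by default)
}

impl Default for Thresholds {
    fn default() -> Self {
        Thresholds {
            min_alt_molecules: 2,
            min_confidence: 0.99,
            max_vaf: None,
        }
    }
}

/// Assumed VAF estimate: alt molecules over total molecules.
pub fn vaf(c: &Candidate) -> f64 {
    if c.total_molecules == 0 {
        0.0
    } else {
        f64::from(c.alt_molecules) / f64::from(c.total_molecules)
    }
}

/// True if the candidate passes every configured threshold.
pub fn keep(c: &Candidate, t: &Thresholds) -> bool {
    if c.alt_molecules < t.min_alt_molecules {
        return false;
    }
    if c.confidence < t.min_confidence {
        return false;
    }
    match t.max_vaf {
        // Calls above the VAF ceiling are discarded (likely germline).
        Some(max) => vaf(c) <= max,
        None => true,
    }
}
```

With --max-vaf 0.35, a heterozygous germline variant near 50% VAF is dropped, while a 2% somatic call with enough supporting molecules is kept.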

Chemistry

kam supports configurable UMI chemistries via config.toml. Presets are available for common protocols:

Preset	UMI	Skip	Duplex
`twist-umi-duplex`	5 bp	2 bp	Yes
`simplex-12bp`	12 bp	0 bp	No
`simplex-9bp`	9 bp	0 bp	No
`simplex-8bp`	8 bp	0 bp	No

See examples/ for config files covering each chemistry, and the Configuration Reference for all options.
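The UMI and Skip columns can be read as a simple read layout. The sketch below assumes the UMI sits at the very start of each read, followed by the skip bases and then the template; that ordering is an assumption, and ReadLayout and split_read are hypothetical names rather than part of kam.

```rust
/// Assumed read layout: [UMI][skip][template].
#[derive(Debug, PartialEq)]
pub struct ReadLayout {
    pub umi_len: usize,
    pub skip_len: usize,
}

/// Split a read into (UMI, template), discarding the skip bases.
/// Returns None if the read is not ASCII or has no template bases left.
pub fn split_read<'a>(seq: &'a str, layout: &ReadLayout) -> Option<(&'a str, &'a str)> {
    let start = layout.umi_len + layout.skip_len;
    // ASCII check keeps byte-index slicing safe.
    if !seq.is_ascii() || seq.len() <= start {
        return None;
    }
    Some((&seq[..layout.umi_len], &seq[start..]))
}
```

For the twist-umi-duplex preset (5 bp UMI, 2 bp skip), the read ACGTAGTTTGCA would split into the UMI ACGTA and the template TTGCA, with GT skipped.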

Architecture

kam-core      — shared types: Molecule, ConsensusRead, MoleculeEvidence
kam-assemble  — molecule assembly from raw FASTQ (replaces HUMID)
kam-index     — k-mer indexing with molecule provenance (replaces Jellyfish)
kam-pathfind  — de Bruijn graph construction and path walking (replaces km)
kam-call      — statistical variant calling and tumour-informed filtering
kam           — CLI binary wiring all stages together
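The idea of k-mer indexing with molecule provenance can be illustrated with a toy index that maps each k-mer to the set of molecule IDs containing it. This is a sketch under simplifying assumptions, not the kam-index data structure: k-mers are stored as plain strings, reverse complements are not canonicalised, and the function name and input shape are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Map each k-mer to the set of molecule IDs in which it occurs.
/// `molecules` holds (molecule ID, consensus sequence) pairs.
pub fn index_kmers(molecules: &[(u32, &str)], k: usize) -> HashMap<String, HashSet<u32>> {
    let mut index: HashMap<String, HashSet<u32>> = HashMap::new();
    if k == 0 {
        return index; // `windows(0)` would panic, so bail out early.
    }
    for (id, seq) in molecules {
        for window in seq.as_bytes().windows(k) {
            // Sequences are assumed to be ASCII, so this conversion is lossless.
            let kmer = String::from_utf8_lossy(window).into_owned();
            // A set makes repeated k-mers within one molecule count once.
            index.entry(kmer).or_default().insert(*id);
        }
    }
    index
}
```

The size of each set is the distinct molecule count for that k-mer, which is the quantity a plain k-mer counter working on raw reads cannot provide.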

Development

Run tests and quality checks before committing:

cargo test
cargo clippy -- -D warnings
cargo fmt -- --check

Licence

MIT
