Alignment-free variant detection for Twist duplex UMI sequencing.
kam replaces the four-tool chain HUMID + Jellyfish + km + kmtools with a single Rust binary that preserves molecule-level evidence from raw FASTQ reads through to variant calls. It supports two operating modes: somatic discovery and tumour-informed monitoring.
- Alignment-free: no reference alignment required at call time.
- Molecule-aware: every k-mer carries a
MoleculeEvidencerecord with distinct molecule count, duplex support, and strand breakdown. - Two modes: discovery (all alt paths) and tumour-informed monitoring
(
--target-variants). - Tumour-informed monitoring: precision 1.0 at all VAF levels. Background cfDNA variants are eliminated because they do not match the expected somatic alleles.
- Fast: 16--22 seconds per sample on a single core at 2 M read pairs.
Evaluated on the Twist cfDNA Pan-Cancer Reference Standard v2 (24 samples,
375 truth variants, 3 concentrations, 8 VAF levels).
Configuration: --max-vaf 0.35 --min-family-size 2 --target-variants.
| Concentration | VAF | Sensitivity | SNV sens | Indel sens | Precision |
|---|---|---|---|---|---|
| 15 ng | 2% | 61.3% | 80.0% | 38.8% | 1.0 |
| 30 ng | 2% | 59.2% | 77.1% | 37.6% | 1.0 |
| 5 ng | 2% | 51.7% | 68.8% | 31.2% | 1.0 |
| 15 ng | 0.5% | 40.0% | 52.7% | 24.7% | 1.0 |
| 30 ng | 0.5% | 46.1% | 61.0% | 28.2% | 1.0 |
| 0% VAF (all) | — | 0 FPs | — | — | — |
Runtime: 16--22 s per sample. Peak RSS: 1.8--2.0 GB.
cargo install kam-bioThis installs the kam binary.
Requires Rust 1.75 or later.
cargo build --releaseThe binary is at target/release/kam.
For detailed documentation, guides, and examples, see the User Manual.
kam run \
--r1 sample_R1.fastq.gz \
--r2 sample_R2.fastq.gz \
--targets targets_100bp.fa \
--output-dir results/Supply a VCF of expected somatic variants. Only calls matching an entry in the
target set are reported; all other calls are labelled NotTargeted.
kam run \
--r1 sample_R1.fastq.gz \
--r2 sample_R2.fastq.gz \
--targets targets_100bp.fa \
--output-dir results/ \
--max-vaf 0.35 \
--min-family-size 2 \
--target-variants known_variants.vcfThe pipeline can also be run stage by stage:
# 1. Assemble molecules from raw reads.
kam assemble --r1 R1.fastq.gz --r2 R2.fastq.gz --output molecules.bin
# 2. Build a k-mer index against target sequences.
kam index --molecules molecules.bin --targets targets.fa --output index.bin
# 3. Walk de Bruijn graph paths.
kam pathfind --index index.bin --targets targets.fa --output paths.bin
# 4. Call variants from scored paths.
kam call --paths paths.bin --targets targets.fa --output calls.vcf| Flag | Default | Description |
|---|---|---|
--min-family-size N |
1 | Minimum reads per UMI family. Set to 2 to remove singletons. |
--max-vaf F |
— | Discard calls above this VAF (removes germline heterozygotes). |
--min-alt-molecules N |
2 | Minimum alt molecules to emit a call. |
--min-confidence F |
0.99 | Minimum posterior confidence. |
--target-variants VCF |
— | Enable tumour-informed monitoring mode. |
kam supports configurable UMI chemistries via config.toml. Presets are
available for common protocols:
| Preset | UMI | Skip | Duplex |
|---|---|---|---|
twist-umi-duplex |
5 bp | 2 bp | Yes |
simplex-12bp |
12 bp | 0 bp | No |
simplex-9bp |
9 bp | 0 bp | No |
simplex-8bp |
8 bp | 0 bp | No |
See examples/ for config files covering each chemistry, and the Configuration Reference for all options.
kam-core — shared types: Molecule, ConsensusRead, MoleculeEvidence
kam-assemble — molecule assembly from raw FASTQ (replaces HUMID)
kam-index — k-mer indexing with molecule provenance (replaces Jellyfish)
kam-pathfind — de Bruijn graph construction and path walking (replaces km)
kam-call — statistical variant calling and tumour-informed filtering
kam — CLI binary wiring all stages together
Run tests and quality checks before committing:
cargo test
cargo clippy -- -D warnings
cargo fmt -- --checkMIT