trentzz/varforge

VarForge


VarForge is a fast, single-binary Rust tool for generating synthetic cancer sequencing test data with controlled ground truth. It produces realistic FASTQ and BAM files with known mutations, tumour parameters, UMI barcodes, structural variants, and cfDNA fragment profiles for benchmarking bioinformatics pipelines.


Features

| Feature | VarForge | BAMSurgeon | ART | NEAT |
|---|---|---|---|---|
| Single binary, no runtime deps | Yes | No | No | No |
| Somatic mutations (SNV/indel/MNV) | Yes | Yes | No | Partial |
| Structural variants (DEL/INS/INV/DUP/TRA) | Yes | No | No | No |
| SV signatures (HRD, TDP, chromothripsis) | Yes | No | No | No |
| COSMIC SBS signature weighting | Yes | No | No | No |
| Tumour purity / clonal architecture | Yes | No | No | No |
| Paired tumour-normal simulation | Yes | Partial | No | No |
| Germline variant simulation | Yes | No | No | Yes |
| cfDNA fragment model | Yes | No | No | No |
| Long-read fragment model | Yes | No | No | No |
| Duplex UMI barcodes | Yes | No | No | No |
| FFPE / oxoG artefacts | Yes | No | No | No |
| Longitudinal / multi-sample series | Yes | No | No | No |
| Copy number alterations | Yes | No | No | No |
| GC bias model | Yes | No | Partial | No |
| Hybrid-capture / amplicon model | Yes | No | No | No |
| Microsatellite instability (MSI) | Yes | No | No | No |
| Truth VCF output | Yes | Partial | No | Yes |
| YAML configuration | Yes | No | No | No |

Installation

From crates.io

cargo install varforge

From source

git clone https://github.com/varforge/varforge
cd varforge
cargo build --release
./target/release/varforge --help

Requires Rust 1.74 or later. No C libraries are needed. The entire dependency stack is pure Rust.


Quickstart

The following example generates a 30x WGS tumour sample with 5000 random somatic mutations.

1. Create a config file (quickstart.yaml):

reference: /data/ref/hg38.fa

output:
  directory: out/quickstart
  fastq: true
  truth_vcf: true

sample:
  name: TUMOUR
  coverage: 30.0

tumour:
  purity: 0.70

mutations:
  random:
    count: 5000
    vaf_min: 0.05
    vaf_max: 0.60
    snv_fraction: 0.80
    indel_fraction: 0.15
    mnv_fraction: 0.05

seed: 42

2. Validate the config:

varforge validate --config quickstart.yaml

3. Run the simulation:

varforge simulate --config quickstart.yaml

4. Inspect the output:

out/quickstart/
  TUMOUR_R1.fastq.gz       # Read 1 FASTQ
  TUMOUR_R2.fastq.gz       # Read 2 FASTQ
  truth.vcf.gz             # Ground-truth VCF with all injected variants
  manifest.tsv             # Sample metadata (name, coverage, purity, paths)

CLI overrides let you change common config values at the command line without editing the file:

varforge simulate --config quickstart.yaml \
    --coverage 60 --purity 0.5 --seed 99

Variable substitution lets configs use ${key} placeholders that are resolved at runtime. In the config:

reference: ${reference}

Then supply the value with --set:

varforge simulate --config quickstart.yaml --set reference=/data/ref/hg38.fa
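
As a rough illustration of the substitution semantics, the Python sketch below resolves ${key} placeholders from KEY=VALUE pairs. It relies on the standard library's string.Template, whose placeholder syntax happens to match. The helper name resolve_placeholders is hypothetical, and VarForge's own implementation may behave differently, for example on missing keys.

```python
from string import Template

def resolve_placeholders(yaml_text: str, set_args: list[str]) -> str:
    """Replace ${key} placeholders using KEY=VALUE pairs (hypothetical helper)."""
    variables = {}
    for arg in set_args:
        key, _, value = arg.partition("=")  # split on the first '=' only
        variables[key] = value
    # substitute() raises KeyError when a placeholder has no matching value.
    return Template(yaml_text).substitute(variables)

config = "reference: ${reference}\nseed: 42\n"
resolved = resolve_placeholders(config, ["reference=/data/ref/hg38.fa"])
print(resolved)
```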

Presets supply defaults for common scenarios, so the config only needs the values that differ:

varforge simulate --config quickstart.yaml --preset wgs
varforge simulate --config quickstart.yaml --preset cancer:melanoma

CLI Reference

varforge [OPTIONS] <COMMAND>

Commands:
  simulate         Run a simulation from a YAML config
  validate         Validate a YAML config without running
  edit             Spike variants into an existing BAM file
  learn-profile    Learn an error/quality profile from a real BAM file
  benchmark-suite  Run a VAF x coverage benchmark grid

Global options:
  -t, --threads <N>          Number of threads (default: all available cores)
      --log-level <LEVEL>    error | warn | info | debug | trace (default: info)

simulate

varforge simulate --config <FILE> [OPTIONS]

Options:
  -c, --config <FILE>           Path to YAML configuration file (required)
  -o, --output-dir <DIR>        Override output directory
      --seed <N>                Override random seed
      --coverage <F>            Override coverage depth (x)
      --read-length <N>         Override read length (bp)
      --purity <F>              Override tumour purity (0.0-1.0)
      --fragment-mean <F>       Override fragment mean length (bp)
      --fragment-sd <F>         Override fragment length standard deviation (bp)
      --random-mutations <N>    Generate N random mutations (no VCF needed)
      --vaf-range <MIN-MAX>     VAF range for random mutations (e.g. 0.001-0.05)
      --preset <NAME>           Apply a named preset (see Presets section)
      --set <KEY=VALUE>         Set config variables; replaces ${key} in YAML (repeatable)
      --list-presets            List all available presets and exit
      --dry-run                 Validate config and estimate output size only

validate

varforge validate --config <FILE>

Parses the YAML config and checks all fields for consistency. Exits with status 0 if valid, non-zero otherwise with a descriptive error message.

edit

varforge edit --bam <IN.bam> --vcf <VARIANTS.vcf> --output <OUT.bam>

Spikes variants from a VCF directly into an existing BAM file without re-simulating reads. Useful for adding a handful of known mutations to a real or previously simulated dataset.

learn-profile

varforge learn-profile --bam <BAM> --output <PROFILE.json>

Learns an empirical base-quality and error profile from a real BAM file. The resulting JSON can be referenced from the quality.profile_path config field to produce reads with a realistic, data-driven quality model instead of the parametric default.

benchmark-suite

varforge benchmark-suite --config <FILE> --vafs 0.01,0.05,0.1 --coverages 100,500,1000

Runs a grid of simulations across specified VAF and coverage values. Each combination produces its own output directory. Useful for generating sensitivity curves and limit-of-detection analyses.
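
To make the grid concrete, the following Python sketch enumerates the combinations for the example command above and the expected number of variant-supporting reads at each point (VAF x coverage), which is the quantity a limit-of-detection analysis usually turns on. The directory naming is an assumption made for illustration; the README does not specify how VarForge names the per-run directories.

```python
from itertools import product

vafs = [0.01, 0.05, 0.1]
coverages = [100, 500, 1000]

# One simulation per (VAF, coverage) pair: 3 x 3 = 9 runs.
grid = list(product(vafs, coverages))

# Hypothetical directory names, for illustration only.
run_dirs = [f"vaf_{vaf}_cov_{cov}" for vaf, cov in grid]

# Expected count of reads carrying the variant at each grid point.
expected_alt_reads = {(vaf, cov): round(vaf * cov, 2) for vaf, cov in grid}

print(len(grid))
for (vaf, cov), alt in expected_alt_reads.items():
    print(f"VAF {vaf} at {cov}x: about {alt} alt reads")
```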


Configuration Reference

All simulation parameters are specified in a YAML file. Only reference and output.directory are required. Everything else has a sensible default.

Top-level fields

| Field | Type | Default | Description |
|---|---|---|---|
| reference | path | (required) | Path to FASTA reference genome |
| output | OutputConfig | (required) | Output format and directory |
| sample | SampleConfig | see below | Read generation parameters |
| fragment | FragmentConfig | see below | Insert size distribution |
| quality | QualityConfig | see below | Base quality model |
| tumour | TumourConfig | null | Tumour purity and clonal architecture |
| mutations | MutationConfig | null | Somatic mutation injection |
| umi | UmiConfig | null | UMI barcode configuration |
| artifacts | ArtifactConfig | null | Sequencing artefact simulation |
| copy_number | list of CopyNumberConfig | null | Copy number alterations |
| gc_bias | GcBiasConfig | null | GC content coverage bias |
| capture | CaptureConfig | null | Hybrid-capture or amplicon enrichment model |
| germline | GermlineConfig | null | Germline SNP/indel simulation |
| paired | PairedConfig | null | Matched tumour-normal pair mode |
| samples | list of SampleEntry | null | Multi-sample / longitudinal series |
| chromosomes | list of strings | null (all) | Restrict simulation to named chromosomes |
| regions_bed | path | null | Restrict simulation to BED file regions |
| vafs | list of floats | null | Batch mode: one run per VAF value |
| preset | string | null | Chemistry preset name (applied before YAML values) |
| performance | PerformanceConfig | see below | Streaming pipeline tuning |
| seed | integer | null (random) | Random seed for reproducibility |
| threads | integer | null (all cores) | Worker thread count |

output

output:
  directory: out/my_run   # required
  fastq: true             # write gzip-compressed FASTQ files (default: true)
  bam: false              # write coordinate-sorted BAM (default: false)
  truth_vcf: true         # write ground-truth VCF (default: true)
  germline_vcf: true      # write germline truth VCF when germline is enabled (default: true)
  manifest: true          # write manifest.tsv (default: true)
  single_read_bam: false  # single-read BAM for long-read platforms (default: false)
  mapq: 60                # mapping quality for BAM records (default: 60)

sample

sample:
  name: TUMOUR_01       # sample name used in file names and read headers (default: SAMPLE)
  read_length: 150      # read length in bp (default: 150)
  coverage: 30.0        # mean target coverage depth (default: 30.0)
  platform: illumina    # sequencing platform tag; written to BAM @RG header (optional)

fragment

Controls the insert size (fragment length) distribution.

fragment:
  model: normal    # normal | cfda (default: normal)
  mean: 300.0      # mean fragment length in bp (default: 300.0)
  sd: 50.0         # standard deviation in bp (default: 50.0)

Fragment models:

  • normal: Gaussian distribution. Suitable for standard library prep from fresh-frozen tissue or cell lines.
  • cfda: Short, nucleosome-phased distribution reflecting cell-free DNA in plasma. Typical mean ~167 bp with mononucleosomal and dinucleosomal peaks and 10 bp periodicity from nucleosome positioning.

cfDNA-specific options (only apply when model: cfda):

fragment:
  model: cfda
  ctdna_fraction: 0.05     # fraction of tumour-derived shorter fragments (default: derived from purity)
  mono_sd: 20.0            # SD of mononucleosomal peak in bp (default: 20.0)
  di_sd: 30.0              # SD of dinucleosomal peak in bp (default: 30.0)
  end_motif_model: plasma  # plasma cfDNA 4-mer end motif rejection sampling (optional)

Long-read fragment model (PacBio, Nanopore):

fragment:
  long_read:
    mean: 15000    # mean length in bp (default: 15000)
    sd: 5000       # standard deviation in bp (default: 5000)
    min_len: 1000  # minimum fragment length (default: 1000)
    max_len: 100000  # maximum fragment length (default: 100000)

When long_read is set, the log-normal sampler is used instead of the normal or cfDNA sampler. Combine with output.single_read_bam: true for realistic long-read BAM output.
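
The sketch below shows one way a log-normal sampler could honour these settings. It assumes that mean and sd are the arithmetic mean and standard deviation of the fragment lengths (the README does not state the parameterisation) and converts them to the log-space parameters with the standard moment-matching formulas. Redrawing out-of-range values is likewise an assumption; VarForge might clamp instead.

```python
import math
import random

def lognormal_params(mean: float, sd: float) -> tuple[float, float]:
    """Convert an arithmetic mean/SD into log-space (mu, sigma)."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)

def sample_long_read_length(rng, mean=15000, sd=5000, min_len=1000, max_len=100000):
    """Draw one fragment length, redrawing until it falls within the bounds."""
    mu, sigma = lognormal_params(mean, sd)
    while True:
        length = int(rng.lognormvariate(mu, sigma))
        if min_len <= length <= max_len:
            return length

rng = random.Random(42)
lengths = [sample_long_read_length(rng) for _ in range(10000)]
print(sum(lengths) / len(lengths))  # sample mean, close to the configured mean
```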


quality

quality:
  mean_quality: 36        # mean Phred quality score for the first cycle (default: 36)
  tail_decay: 0.003       # per-cycle quality decay rate (default: 0.003)
  profile_path: null      # optional path to empirical profile JSON from learn-profile

If profile_path is set, the empirical profile overrides mean_quality and tail_decay.


tumour

tumour:
  purity: 0.70    # fraction of cells that are tumour (0.0-1.0; default: 1.0)
  ploidy: 2       # tumour ploidy (default: 2)
  msi: false      # microsatellite instability mode (default: false)
  clones:         # optional list of clones for subclonal architecture
    - id: trunk
      ccf: 1.0           # cancer cell fraction (0.0-1.0)
    - id: subclone_a
      ccf: 0.40
      parent: trunk      # parent clone ID (optional; omit for founding clone)

When clones is empty, all mutations are assigned to a single clonal population at the specified purity. When clones are defined, mutations are distributed across the clone tree and their effective VAF is:

VAF = purity x CCF / ploidy
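
As a worked example of this formula, the Python snippet below plugs in the purity and CCF values from the tumour block above. With purity 0.70 and ploidy 2, the trunk clone (CCF 1.0) comes out at a VAF of 0.35 and subclone_a (CCF 0.40) at 0.14. The function name expected_vaf is illustrative only.

```python
def expected_vaf(purity: float, ccf: float, ploidy: int = 2) -> float:
    """Expected VAF per the README formula: purity x CCF / ploidy."""
    return purity * ccf / ploidy

purity = 0.70
trunk_vaf = expected_vaf(purity, ccf=1.0)      # founding clone
subclone_vaf = expected_vaf(purity, ccf=0.40)  # subclone_a

print(f"trunk: {trunk_vaf:.2f}, subclone_a: {subclone_vaf:.2f}")
```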

When msi: true, indel rates at homopolymer and dinucleotide repeat loci are elevated to simulate MSI-high tumours.


mutations

mutations:
  vcf: /path/to/variants.vcf.gz   # optional: inject specific variants from VCF
  random:                          # optional: add random somatic mutations
    count: 5000                    # number of mutations to generate
    vaf_min: 0.001                 # minimum VAF (default: 0.001)
    vaf_max: 0.50                  # maximum VAF (default: 0.5)
    snv_fraction: 0.80             # fraction that are SNVs (default: 0.80)
    indel_fraction: 0.15           # fraction that are indels (default: 0.15)
    mnv_fraction: 0.05             # fraction that are MNVs (default: 0.05)
    signature: SBS7a               # COSMIC SBS signature for weighted base selection (optional)
  sv_signature: HRD                # SV signature: HRD, TDP, or CHROMOTHRIPSIS (optional)
  sv_count: 10                     # number of SVs to generate for the signature (default: 10)
  include_driver_mutations: false  # inject driver mutations from cancer preset (default: false)

snv_fraction + indel_fraction + mnv_fraction must sum to exactly 1.0.
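
A pre-flight check for this constraint might look like the Python sketch below. The small tolerance is an assumption added to absorb floating-point rounding; VarForge's own validation (varforge validate) may be stricter, and the function name is hypothetical.

```python
import math

def check_type_fractions(snv: float, indel: float, mnv: float) -> None:
    """Raise ValueError if the three fractions do not sum to 1.0."""
    total = snv + indel + mnv
    # Tolerance is an assumption; it only absorbs floating-point rounding.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"mutation type fractions sum to {total}, expected 1.0")

check_type_fractions(0.80, 0.15, 0.05)      # valid: no exception
try:
    check_type_fractions(0.80, 0.15, 0.10)  # invalid: sums to about 1.05
except ValueError as err:
    print(err)
```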

Both vcf and random may be specified simultaneously. VCF variants are injected first, then random mutations are added at non-overlapping positions.

SV signatures generate structural variants with biologically realistic size and type distributions:

  • HRD: large deletions (100 kbp to 10 Mbp) characteristic of homologous recombination deficiency.
  • TDP: short tandem duplications (1 kbp to 10 kbp) characteristic of the tandem duplicator phenotype.
  • CHROMOTHRIPSIS: clustered rearrangements (deletions, inversions, duplications) on a single chromosome.

COSMIC SBS signatures weight the alternate base selection by trinucleotide context probabilities from the COSMIC catalogue. For example, signature: SBS7a produces UV-type C>T mutations in dipyrimidine contexts typical of melanoma.


umi

umi:
  length: 8              # UMI barcode length in bases (default: 8)
  duplex: false          # enable duplex (double-stranded) UMI mode (default: false)
  pcr_cycles: 10         # number of PCR amplification cycles (default: 10)
  family_size_mean: 3.0  # mean read family size (default: 3.0)
  family_size_sd: 1.5    # standard deviation of family size (default: 1.5)
  inline: true           # prepend UMI to read sequence (default: false)

When inline: true, the UMI is prepended to the read sequence (e.g. for fgbio ExtractUmisFromBam). When inline: false, the UMI is written into the read name (e.g. @READ:ACGTACGT).

When duplex: true, each molecule is tagged with a strand-specific UMI pair supporting duplex consensus calling tools such as fgbio CallDuplexConsensusReads.


artifacts

artifacts:
  ffpe_damage_rate: 0.02    # C>T deamination rate (0.0-1.0; null = disabled)
  oxog_rate: 0.01           # 8-oxoG C>A transversion rate (0.0-1.0; null = disabled)
  duplicate_rate: 0.15      # PCR duplicate fraction (0.0-1.0; null = disabled)
  pcr_error_rate: 0.001     # PCR substitution error rate per base (null = disabled)

All fields are optional. Omit the entire artifacts block (or set individual rates to null) to disable artefact simulation.


copy_number

copy_number:
  - region: "chr7:55000000-55200000"   # chrom:start-end (1-based, inclusive)
    tumor_cn: 4                        # tumour copy number (default: 2)
    normal_cn: 2                       # normal copy number (default: 2)
    major_cn: 3                        # major allele CN for LOH modelling (optional)
    minor_cn: 1                        # minor allele CN for LOH modelling (optional)

Multiple entries may be listed. Overlapping regions are applied in order (last wins). Read depth in each region is scaled proportionally to tumor_cn / normal_cn.
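
The "last wins" rule and the depth ratio can be sketched in Python as follows. This is an illustration of the behaviour described above, not VarForge's code: the class and function names are hypothetical, the second entry is an invented overlap, and the sketch ignores tumour purity.

```python
from dataclasses import dataclass

@dataclass
class CopyNumberEntry:
    chrom: str
    start: int          # 1-based, inclusive
    end: int            # 1-based, inclusive
    tumor_cn: int = 2
    normal_cn: int = 2

def depth_scale(entries, chrom, pos):
    """Return tumor_cn / normal_cn at a position; later entries override earlier ones."""
    scale = 1.0  # positions outside every region keep baseline depth
    for entry in entries:  # config order, so the last matching entry wins
        if entry.chrom == chrom and entry.start <= pos <= entry.end:
            scale = entry.tumor_cn / entry.normal_cn
    return scale

entries = [
    CopyNumberEntry("chr7", 55_000_000, 55_200_000, tumor_cn=4),
    CopyNumberEntry("chr7", 55_100_000, 55_150_000, tumor_cn=1),  # hypothetical overlap
]
print(depth_scale(entries, "chr7", 55_050_000))  # first entry only
print(depth_scale(entries, "chr7", 55_120_000))  # overlap: second entry wins
print(depth_scale(entries, "chr8", 1_000))       # no entry applies
```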


gc_bias

gc_bias:
  enabled: true      # apply GC bias model (default: true when block is present)
  model: default     # default | flat | custom (default: "default")
  severity: 1.0      # bias multiplier: 0 = none, 1 = realistic, 2 = extreme (default: 1.0)

The default model applies an empirical coverage reduction at GC extremes (< 30 % or > 70 % GC). Setting severity: 0 disables the effect while keeping the block present.


capture

capture:
  enabled: true                             # activate capture model (default: true)
  mode: panel                               # panel | amplicon (default: panel)
  targets_bed: /data/panels/panel.bed       # path to capture target BED (optional)
  off_target_fraction: 0.20                 # fraction of reads mapping off-target (default: 0.2)
  coverage_uniformity: 0.30                 # per-target LogNormal sigma (0 = uniform; default: 0.3)
  edge_dropoff_bases: 50                    # exponential dropoff at target edges in bp (default: 50)
  primer_trim: 0                            # bases to trim from read ends in amplicon mode (default: 0)
  coverage_cv_target: 0.25                  # warn if achieved CV exceeds this (optional)
  on_target_fraction_target: 0.95           # warn if on-target fraction falls below this (optional)

In panel mode, reads are distributed across targets with off-target spillover. In amplicon mode, fragments exactly span each target region with no off-target reads. When targets_bed is omitted, the capture model distributes reads uniformly across whichever chromosomes or regions are active.


germline

germline:
  het_snp_density: 0.6     # heterozygous SNPs per kbp (default: 0.6)
  hom_snp_density: 0.3     # homozygous SNPs per kbp (default: 0.3)
  het_indel_density: 0.05   # heterozygous indels per kbp (default: 0.05)
  vcf: /path/to/germline.vcf  # use specific germline variants instead of random (optional)

Germline variants are assigned VAF 0.5 (heterozygous) or 1.0 (homozygous). They appear in the separate germline_truth.vcf output.
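
For a sense of scale, the Python snippet below converts the default per-kbp densities into expected counts for a region. At the defaults, a 1 Mbp region carries roughly 600 heterozygous SNPs, 300 homozygous SNPs, and 50 heterozygous indels. These are expectations only; how VarForge draws the actual counts is not described here, and the function name is illustrative.

```python
def expected_germline_counts(region_bp, het_snp=0.6, hom_snp=0.3, het_indel=0.05):
    """Expected variant counts for a region, given per-kbp densities."""
    kbp = region_bp / 1000
    return {
        "het_snp": het_snp * kbp,
        "hom_snp": hom_snp * kbp,
        "het_indel": het_indel * kbp,
    }

counts = expected_germline_counts(1_000_000)  # a 1 Mbp region
for name, value in counts.items():
    print(f"{name}: about {value:.0f}")
```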


paired

paired:
  normal_coverage: 30.0                   # coverage for the normal sample (default: 30.0)
  normal_sample_name: NORMAL              # sample name for normal output (default: NORMAL)
  tumour_contamination_in_normal: 0.0     # tumour contamination in normal (0.0-1.0; default: 0.0)

When present, VarForge runs two simulations: one tumour sample (with all somatic and germline variants) and one normal sample (germline only). Outputs are written to tumour/ and normal/ sub-directories under output.directory.


samples (multi-sample / longitudinal)

When samples is present, VarForge generates one output sub-directory per entry and a combined manifest.tsv. Each entry shares the top-level reference, mutations, tumour, and fragment settings but can override coverage, tumour fraction, and fragment model independently.

samples:
  - name: timepoint_1
    coverage: 1000.0
    tumour_fraction: 0.05        # ctDNA fraction for this sample (default: 1.0)
    fragment_model: cfda         # override fragment model (optional)
    clonal_shift:                # per-clone CCF adjustments at this timepoint (optional)
      subclone_a: 0.10
  - name: timepoint_2
    coverage: 1000.0
    tumour_fraction: 0.002
    fragment_model: cfda

performance

performance:
  output_buffer_regions: 64    # max region batches buffered in streaming channel (default: 64)

Higher values use more memory but provide better overlap between compute and I/O. Lower values reduce peak memory.


Presets Reference

Presets are named configuration bundles that set sensible defaults for common scenarios. A preset is applied before the YAML config values, so explicit YAML fields always win.

varforge simulate --config base.yaml --preset <NAME>
varforge simulate --list-presets
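
The precedence rule can be pictured as a layered merge, sketched below in Python. The preset contents shown are illustrative stand-ins, and applying CLI flags as a final layer is an assumption inferred from their description as overrides; only the preset-then-YAML order is stated in this README.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied; nested dicts merge key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Illustrative values only; real preset contents live in VarForge itself.
preset = {"sample": {"coverage": 30.0, "read_length": 150},
          "mutations": {"random": {"count": 5000}}}
yaml_config = {"sample": {"coverage": 60.0}}
cli_overrides = {"seed": 99}  # assumed to apply last

effective = deep_merge(deep_merge(preset, yaml_config), cli_overrides)
print(effective)
```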

Built-in presets

| Preset | Coverage | Fragment | Mutations | UMI | Notes |
|---|---|---|---|---|---|
| small | 1x | normal | 100 random (chr22 only) | no | Smoke test; completes in ~30 s |
| panel | 500x | normal | 50 random | inline 8-mer | Targeted panel benchmarking |
| wgs | 30x | normal | 5 000 random | no | Whole-genome variant calling |
| cfdna | 200x | cfda (167 bp) | 200 random, VAF 0.1-5% | duplex | Liquid biopsy simulation |
| ffpe | 30x | normal | 500 random | no | FFPE artefacts enabled |
| umi | 1 000x | normal | 50 random | duplex 9-mer | High-depth duplex consensus |

Cancer-type presets

Cancer presets are accessed with the cancer: namespace prefix. Each preset sets biologically realistic mutation counts, VAF ranges, tumour purity, and mutation-type fractions based on published COSMIC mutational signatures.

varforge simulate --config base.yaml --preset cancer:melanoma

| Preset | Cancer | Dominant Signature | Typical TMB | Purity |
|---|---|---|---|---|
| cancer:lung_adeno | Lung adenocarcinoma | SBS4 (smoking, C>A) | ~8 mut/Mb | 60% |
| cancer:colorectal | Colorectal (MSS) | SBS1/SBS5 (aging, C>T) | ~5 mut/Mb | 65% |
| cancer:breast_tnbc | Triple-negative breast | SBS3 (HRD, flat) | ~5 mut/Mb | 55% |
| cancer:melanoma | Cutaneous melanoma | SBS7a/b (UV, C>T) | ~30 mut/Mb | 70% |
| cancer:aml | Acute myeloid leukaemia | SBS1/SBS5 (aging) | ~1 mut/Mb | 80% |
| cancer:prostate | Prostate adenocarcinoma | SBS1/SBS5 (aging) | ~2 mut/Mb | 50% |
| cancer:pancreatic | Pancreatic ductal | SBS1/SBS5 (aging) | ~3 mut/Mb | 25% |
| cancer:glioblastoma | Glioblastoma (IDH-wt) | SBS1/SBS5 (aging) | ~4 mut/Mb | 65% |

Example Configs

Ready-to-use example configs are in the examples/ directory.

| File | Use case |
|---|---|
| examples/minimal.yaml | Simplest possible simulation (defaults only) |
| examples/wgs_30x.yaml | Standard 30x WGS tumour with random mutations |
| examples/panel_umi.yaml | Targeted panel with inline 8-mer UMI |
| examples/cfdna_monitoring.yaml | cfDNA longitudinal series (4 timepoints) |
| examples/ffpe_artifacts.yaml | FFPE-damaged tumour sample |
| examples/tumor_normal.yaml | Matched tumour/normal pair |
| examples/subclonal.yaml | Four-clone tumour with copy number alterations |
| examples/high_depth.yaml | 1000x duplex UMI for low-VAF detection |
| examples/custom_mutations.yaml | Inject specific variants from a VCF file |
| examples/twist_duplex_benchmark.yaml | Twist duplex capture panel with HRD SVs |

Output Formats

FASTQ headers

Read names follow the format:

@{SAMPLE}:{CHROM}:{POS}:{READ_NUM}[:UMI={BARCODE}]

Example: @TUMOUR:chr7:55191822:1:UMI=ACGTACGT

The UMI suffix is only present when a umi block is configured and inline: false.
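
Downstream scripts can recover the simulated origin of a read from its name. The Python sketch below parses the format shown above; it assumes that sample and chromosome names contain no ':' characters and that barcodes use only A, C, G, T, N, and '-' (the hyphen is allowed as a guess at how paired barcodes might be joined). The function name is hypothetical.

```python
import re
from typing import Optional

# Assumes sample and chromosome names contain no ':' characters.
READ_NAME = re.compile(
    r"^@(?P<sample>[^:]+):(?P<chrom>[^:]+):(?P<pos>\d+):(?P<read_num>\d+)"
    r"(?::UMI=(?P<umi>[ACGTN-]+))?$"
)

def parse_read_name(header: str) -> Optional[dict]:
    """Parse a VarForge-style FASTQ header line; return None if it does not match."""
    match = READ_NAME.match(header.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["pos"] = int(fields["pos"])
    fields["read_num"] = int(fields["read_num"])
    return fields  # 'umi' is None when the suffix is absent

with_umi = parse_read_name("@TUMOUR:chr7:55191822:1:UMI=ACGTACGT")
without_umi = parse_read_name("@TUMOUR:chr7:55191822:2")
print(with_umi)
print(without_umi)
```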

BAM tags

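When BAM output is enabled (output.bam: true), the following tags are written. RX and MI are standard SAM tags; tp and cl are VarForge-specific tags in the lowercase namespace that the SAM specification reserves for local use.

| Tag | Type | Description |
|---|---|---|
| RX | Z | Raw UMI sequence |
| MI | Z | Molecule ID (read family identifier) |
| tp | i | 1 if the read carries a simulated somatic variant, 0 otherwise |
| cl | Z | Clone ID the variant was assigned to |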

Truth VCF fields

The truth VCF written to truth.vcf.gz uses the following INFO fields:

| Field | Description |
|---|---|
| VAF | Target variant allele frequency |
| CLONE | Clone ID the variant was assigned to |
| CCF | Cancer cell fraction of the assigned clone |
| TYPE | Variant type: SNV, INDEL, MNV, or SV |
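
A minimal Python sketch for reading these fields from the INFO column of a truth record follows. The INFO string is a hypothetical example built from the field names above, not output copied from VarForge, and parse_info is an illustrative helper rather than part of the tool.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string into a dict, converting VAF and CCF to floats."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # flag-style keys carry no value
    for numeric in ("VAF", "CCF"):
        if numeric in fields:
            fields[numeric] = float(fields[numeric])
    return fields

# Hypothetical INFO column from one truth VCF record.
info = parse_info("VAF=0.14;CLONE=subclone_a;CCF=0.4;TYPE=SNV")
print(info)
```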

Use Case Recipes

Benchmarking a somatic variant caller

# Generate matched tumour/normal with known variants.
reference: /data/ref/hg38.fa
output:
  directory: out/caller_bench
  bam: true
  truth_vcf: true
samples:
  - name: TUMOUR
    coverage: 60.0
    tumour_fraction: 1.0
  - name: NORMAL
    coverage: 30.0
    tumour_fraction: 0.0
tumour:
  purity: 0.65
mutations:
  random:
    count: 1000
    vaf_min: 0.05
    vaf_max: 0.60
    snv_fraction: 0.80
    indel_fraction: 0.15
    mnv_fraction: 0.05
seed: 42

Run your caller against TUMOUR/ with NORMAL/ as the matched normal. Evaluate with:

bcftools stats --apply-filters PASS \
    caller_output.vcf.gz truth.vcf.gz
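
If you want precision and recall directly, a small script that compares variant keys is often enough. The Python sketch below matches records on exact (chrom, pos, ref, alt) and assumes both files are already normalised and split into biallelic records; it performs no indel left-alignment. The toy records are invented for illustration.

```python
def variant_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF text lines, skipping headers."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys

def precision_recall(called, truth):
    """Return (precision, recall) for exact-match variant sets."""
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

# Toy records (invented), standing in for decompressed VCF lines.
truth = variant_keys([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr7\t55191822\t.\tT\tG",
    "chr12\t25245350\t.\tC\tT",
    "chr17\t7674220\t.\tC\tT",
    "chr1\t1000\t.\tA\tAT",
])
called = variant_keys([
    "chr7\t55191822\t.\tT\tG",
    "chr12\t25245350\t.\tC\tT",
    "chr2\t5000\t.\tG\tA",
])
precision, recall = precision_recall(called, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

For real files, open them with gzip.open(path, "rt") and pass the file object to variant_keys.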

Benchmarking UMI deduplication (fgbio)

Use examples/panel_umi.yaml with inline: true and BAM output enabled, then run the resulting BAM through fgbio:

fgbio ExtractUmisFromBam \
    --input out/panel_umi/PANEL_UMI.bam \
    --output extracted.bam \
    --read-structure 8M+T 8M+T

fgbio GroupReadsByUmi \
    --input extracted.bam \
    --output grouped.bam \
    --strategy paired

fgbio CallMolecularConsensusReads \
    --input grouped.bam \
    --output consensus.bam \
    --min-reads 1

Liquid biopsy sensitivity curve

Generate cfDNA samples at multiple tumour fractions and measure detection rate at each:

for TF in 0.10 0.05 0.01 0.005 0.001; do
    varforge simulate --config examples/cfdna_monitoring.yaml \
        --output-dir out/tf_${TF} \
        --purity ${TF} \
        --seed 42
done

SV signature benchmarking

Generate data with HRD-type structural variants for SV caller evaluation:

mutations:
  random:
    count: 500
    snv_fraction: 0.80
    indel_fraction: 0.15
    mnv_fraction: 0.05
  sv_signature: HRD
  sv_count: 20

FFPE artefact filter development

Use examples/ffpe_artifacts.yaml to generate data with realistic FFPE damage, then evaluate your artefact filter:

  • True positives: variants present in truth.vcf.gz
  • Artefacts: C>T calls not in the truth VCF with strand bias (use the tp BAM tag to distinguish)

Performance

VarForge uses a streaming architecture: rayon workers generate reads in parallel and send them through a bounded crossbeam channel to a dedicated writer thread. Memory scales with channel depth, not dataset size.
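
The pattern can be illustrated with a small Python analogue. VarForge itself uses rayon and crossbeam in Rust; here queue.Queue and threads stand in for them, and all names are hypothetical. The point of the sketch is that the bounded channel makes producers block when the writer falls behind, so the number of in-flight batches never exceeds the channel depth.

```python
import queue
import threading

def run_pipeline(n_regions: int, n_workers: int = 4, channel_depth: int = 64) -> int:
    """Generate one batch per region in parallel and write them on one thread."""
    channel = queue.Queue(maxsize=channel_depth)  # put() blocks when full
    work = queue.Queue()
    for region in range(n_regions):
        work.put(region)
    written = []  # stand-in for the output file; only the writer appends

    def worker():
        while True:
            try:
                region = work.get_nowait()
            except queue.Empty:
                return
            channel.put(f"reads-for-region-{region}")  # stand-in for a read batch

    def writer():
        while True:
            batch = channel.get()
            if batch is None:  # sentinel: all workers have finished
                return
            written.append(batch)

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()
    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in workers:
        thread.start()
    for thread in workers:
        thread.join()
    channel.put(None)  # tell the writer to stop once the channel drains
    writer_thread.join()
    return len(written)

print(run_pipeline(1000))
```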

Thread count

varforge simulate --config cfg.yaml --threads 16

Or set in the config:

threads: 16

Restricting scope

For development and testing, restrict to one or a few chromosomes:

chromosomes:
  - chr22

Or to a BED file of target regions:

regions_bed: /data/panels/hotspot_panel.bed

Approximate runtimes (8-core laptop, hg38)

| Scenario | Coverage | Mutations | Time |
|---|---|---|---|
| small preset | 1x, chr22 | 100 | ~30 s |
| Panel (chr7, chr12, chr17) | 500x | 50 | ~2 min |
| WGS 30x | 30x, all | 5 000 | ~25 min |
| WGS 60x | 60x, all | 5 000 | ~50 min |
| Ultra-deep panel 1000x | 1000x, 3 chroms | 50 | ~8 min |

Memory usage

Peak memory is approximately:

2 x read_length x threads x (coverage / 30) MB

For 30x WGS with 150 bp reads on 8 threads: ~600 MB. For 1000x panel on 8 threads: ~200 MB (limited region).


Comparison with Other Tools

| Scenario | Recommended tool |
|---|---|
| Realistic Illumina base-quality profiles | VarForge (parametric or learn-profile) |
| Controlled somatic variant spike-in | VarForge or BAMSurgeon |
| Spike into a real patient BAM | BAMSurgeon (preserves real read background) |
| Simple read generation, no mutations | ART or NanoSim |
| Whole-genome de novo simulation | NEAT or VarForge |
| cfDNA / liquid biopsy data | VarForge (only tool with native cfDNA model) |
| UMI-tagged duplex sequencing | VarForge (only tool with native duplex model) |
| FFPE artefact simulation | VarForge (only tool with FFPE + oxoG model) |
| Multi-sample longitudinal series | VarForge (only tool with native time-series) |
| Structural variant signatures | VarForge (only tool with HRD/TDP/chromothripsis models) |
| Long-read data | VarForge (log-normal fragment model) or NanoSim |

VarForge is the right choice when you need a complete, reproducible, ground-truth dataset for pipeline benchmarking and do not have access to real patient sequencing data. BAMSurgeon is the better choice when you need to insert a small number of variants into an existing real-data BAM while preserving the authentic read background.


Licence

MIT
