Progress Log

2026-04-01

  • Reviewed docs/superpowers/plans/2026-03-31-m7-3-m7-5-benchmark-eval-foundation.md and corrected the plan steps for:
    • deterministic evaluation accuracy calculation
    • handleRunEvaluation reply wiring so evaluationResults is returned together with evaluationJob
    • evaluation artifact persistence on a fresh jobs_root
    • touched-scope coverage commands so benchmark persistence paths are included
  • Verification summary for the M7.3-M7.5 plan update:
    • make proto: pass
    • pytest touched-scope Python suite: 50 passed
    • scratch-path Swift test for ControlPlaneServiceTests/executeHandlesOpsRunEvaluationThroughTheModelOperationsWorker: pass
  • Metrics report:
    • changed-line coverage for the touched Python scope: N/A
    • reason: this review transaction's uncommitted change set is documentation-only; scripts/python_changed_line_coverage.py therefore reported TOTAL 100.00% 0/0 and exited non-zero, since no changed Python lines were measurable
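
The deterministic accuracy calculation and the handleRunEvaluation reply wiring noted above can be sketched as follows. This is a minimal illustration: every class, field, and helper name here is a hypothetical stand-in, not the project's actual API.

```python
from dataclasses import dataclass

@dataclass
class EvaluationJob:          # hypothetical payload shape
    job_id: str
    dataset: str

@dataclass
class EvaluationResults:      # hypothetical payload shape
    job_id: str
    accuracy: float

def deterministic_accuracy(predictions, references):
    """Exact-match accuracy over aligned pairs: a pure function of its
    inputs, so repeated runs over the same data agree exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must align")
    if not references:
        return 0.0
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)

def handle_run_evaluation_reply(job, results):
    # Reply wiring: evaluationResults is always returned together
    # with its evaluationJob, never on its own.
    if results.job_id != job.job_id:
        raise ValueError("results do not belong to this job")
    return {"evaluationJob": job, "evaluationResults": results}
```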
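
The coverage N/A case above suggests a small guard: treat a 0/0 denominator (no measurable changed Python lines) as a pass rather than a failure. A sketch, assuming the script's summary line is shaped like the "TOTAL 100.00% 0/0" output quoted above; the parsing and threshold are illustrative only.

```python
import re

def coverage_gate(summary_line, threshold=80.0):
    """Decide pass/fail from a changed-line coverage summary line.

    Expects a line shaped like "TOTAL 100.00% 0/0" (percentage, then
    covered/changed line counts). A 0/0 denominator means the change
    set had no measurable changed Python lines (e.g. documentation
    only), which is treated as a pass instead of a hard failure.
    """
    m = re.search(r"TOTAL\s+([\d.]+)%\s+(\d+)/(\d+)", summary_line)
    if not m:
        raise ValueError(f"unrecognized summary line: {summary_line!r}")
    pct, changed = float(m.group(1)), int(m.group(3))
    if changed == 0:
        return True   # documentation-only: coverage is N/A, not a failure
    return pct >= threshold
```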

2026-03-31

  • Audited M6 implementation against child plans.
  • Confirmed the Python quantization benchmark, release gate, and focused test suite pass with an explicit PYTHONPATH.
  • Identified remaining work for M6 closure:
    • benchmark evidence gap for active KV and sparse prefill
    • runbook gap for sparse-prefill verification
    • lock-scope semantics gap for family or protected-scope conflicts
  • Added docs/plans/2026-03-31-m6-completion-closure.md.
  • Added docs/runbooks/m6-acceleration-benchmarks.md.
  • Added Python tests for:
    • linked quantized-artifact upload conflict locking
    • sparse-prefill metrics exposure in phase2_metrics_report.py
    • sparse-prefill probe collection in the Phase 2 direct worker report
  • Updated quantization manifests to carry protected_scope metadata.
  • Updated upload conflict locking to use linked quantization identity before falling back to raw artifact paths.
  • Extended scripts/phase2_metrics_report.py with a prefill_sparse probe and sparse-prefill counters in the output.
  • Verification summary:
    • pytest focused M6 Python suite: 39 passed
    • scripts/quantization_benchmarks.py --json: profile_count = 7, smoke_pass_rate = 100.0
    • scripts/quantization_release_gate.py --json: passed = true
    • scripts/phase5_model_ops_metrics.py: quantize job_ms=0.965, artifact_bytes=670, manifest_bytes=1923
    • live make phase2-metrics --json with MELIX_RUNTIME_DIR=.runtime/m6-phase2:
      • decode_active_kv_quantized.active_kv_quantization_ratio = 25
      • decode_active_kv_quantized.tokens_per_second = 41.22
      • prefill_sparse.sparse_prefill_accepted_skip_count = 1
      • prefill_sparse.accelerated_prefill_gain_pct = 83
  • Committed M6 closure as 2f270b9 (feat: close m6 acceleration completion gaps).
  • Began M7 with docs/plans/2026-03-31-m7-1-m7-2-benchmark-schema-foundation.md.
  • Landed initial M7 foundation changes in the working tree:
    • typed benchmark and evaluation schema messages in control-plane proto
    • Python benchmark schema helpers under worker/productization/benchmark_schemas.py
    • release-gate benchmark evidence now carries structured job and results
    • control-plane ops.run_bench now assembles typed benchmark job and result payloads
  • Verification so far for M7 foundation:
    • services/mlx-worker-python/tests/test_benchmark_schemas.py: pass
    • services/mlx-worker-python/tests/test_release_gates.py: pass
    • scratch-path Swift test for ControlPlaneServiceTests/executeHandlesOpsRunBenchThroughTheModelOperationsWorker: still compiling or pending final result at handoff time
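
The lock-key precedence from the upload conflict locking change above (linked quantization identity first, raw artifact path as fallback) can be sketched as below; the manifest field names are assumptions, not the real schema.

```python
def conflict_lock_key(manifest):
    """Derive an upload-conflict lock key from an artifact manifest.

    Prefer the linked quantization identity so every upload of the same
    quantized artifact contends on one lock, and fall back to the raw
    artifact path only when no linkage is present. The field names
    "linked_quantization_id" and "artifact_path" are hypothetical.
    """
    linked = manifest.get("linked_quantization_id")
    if linked:
        return f"quantization:{linked}"
    return f"artifact:{manifest['artifact_path']}"
```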
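
The typed benchmark job/result pairing that ops.run_bench assembles can be illustrated with a minimal sketch; these dataclasses and field names are hypothetical stand-ins for the proto-generated messages, not the actual schema.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkJob:           # hypothetical stand-in for the proto message
    job_id: str
    model: str
    profile: str

@dataclass
class BenchmarkResult:        # hypothetical stand-in for the proto message
    job_id: str
    tokens_per_second: float
    smoke_passed: bool

def assemble_run_bench_reply(job, result):
    # ops.run_bench returns the typed job and its result together,
    # so a caller never sees one without the other.
    if result.job_id != job.job_id:
        raise ValueError("result does not belong to this job")
    return {"benchmarkJob": job, "benchmarkResult": result}
```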