Releases: orenlab/codeclone
CodeClone 2.0.0b4: with first-class MCP, VS Code, Claude, and Codex surfaces
MCP server
- Add `help(topic=...)` tool for workflow guidance, baseline semantics, analysis profile, and review-state routing (tool count: 20 → 21).
- Add `analysis_profile` help topic for explicit conservative-first / deeper-review threshold guidance.
- Enrich `_SERVER_INSTRUCTIONS` with triage-first workflow, budget-aware drill-down, and conservative-first threshold guidance so MCP-capable clients receive structured behavioral context on connect.
- Optimize MCP payloads: short finding IDs (sha256-based for block clones), compact `derived` section projection, bounded `metrics_detail` with pagination.
- Fix MCP initialize metadata so `serverInfo.version` reports the CodeClone package version rather than the underlying `mcp` runtime version.
Report contract
- Bump canonical report schema to `2.3`.
- Add `metrics.overloaded_modules`: report-only module-hotspot ranking by size, complexity, and coupling pressure.
- Surface Overloaded Modules across JSON, text/markdown, HTML, and MCP without affecting findings, health, or gates.
- Normalize the canonical family name and MCP/report output to `overloaded_modules`; `god_modules` remains accepted as a read-only MCP input alias during transition.
CLI and HTML
- Align CLI and HTML scope summaries with canonical inventory totals.
- Redesign Overview tab: Executive Summary becomes 2-column (Issue Breakdown + Source Breakdown) with scan scope in
the section subtitle; Overloaded Modules section replaces the earlier stretched module-hotspot layout.
Documentation
- Add Health Score chapter: scoring inputs, report-only layers, phased expansion policy.
- Document that future releases may lower scores because the scoring model broadens, not only because the code gets worse.
IDE and client integration (preview)
- Add VS Code extension (`codeclone-mcp` client) with baseline-aware triage, source drill-down, Explorer decorations, and HTML-report bridging.
- Add conservative, deeper-review, and custom analysis profiles to the VS Code extension and pass them through to MCP.
- Add limited Restricted Mode: onboarding works in untrusted workspaces, analysis stays gated until trust is granted.
- Add Node unit tests, extension-host smoke tests, and `.vsix` packaging.
- Tighten the VS Code extension to current VS Code UX guidance: one primary editor action, titled Quick Picks, per-view icons, non-button tree details, and a hard minimum local CodeClone version gate (>= 2.0.0b4).
- Add Claude Desktop `.mcpb` bundle wrapper for the local `codeclone-mcp` launcher with pre-loaded review instructions, explicit launcher settings, platform auto-discovery (macOS, Linux, Windows), local-stdio enforcement, signal forwarding, and deterministic package build smoke.
- Add a native Codex plugin with repo-local discovery metadata, bundled `codeclone-mcp` config, pre-loaded instructions, and two skills: conservative-first full review and quick hotspot discovery.
Internal
- Extract shared `_json_io` module for deterministic JSON serialization across baseline, cache, and report paths.
- Remove low-signal structural clone noise surfaced by stricter analysis passes without touching golden fixture debt.
CodeClone 2.0.0b3: MCP, UX and Platform Tightening
2.0.0b3 is the release where CodeClone stops looking like "a strong analyzer with extras" and starts looking like a coherent platform: canonical-report-first, agent-facing, CI-native, and product-grade.
Licensing & packaging
- Re-license source code to MPL-2.0 while keeping documentation under MIT.
- Ship dual `LICENSE`/`LICENSE-docs` files and sync SPDX headers.
MCP server (new)
- Add optional `codeclone[mcp]` extra with `codeclone-mcp` launcher (`stdio` and `streamable-http`).
- Introduce a read-only MCP surface with 20 tools, fixed resources, and run-scoped URIs for analysis, changed-files review, run comparison, findings / hotspots / remediation, granular checks, and gate preview.
- Add bounded run retention (`--history-limit`), `--allow-remote` guard, and reject `cache_policy=refresh` to preserve read-only semantics.
- Optimize MCP payloads for agents with short ids, compact summaries/cards, bounded `metrics_detail`, and slim changed-files / compare-runs responses, without changing the canonical report contract.
- Make MCP explicitly triage-first and budget-aware: clients are guided toward summary/triage → hotspots / `check_*` → single-finding drill-down instead of broad early listing.
- Add `cache.freshness` marker and `get_production_triage` / `codeclone://latest/triage` for compact production-first overview.
- Improve run-comparison honesty: `compare_runs` now reports `mixed`/`incomparable`, and `clones_only` runs surface `health: unavailable` instead of placeholder values.
- Harden repository safety: MCP analysis now requires an absolute repository root and rejects relative roots like `.` to avoid analyzing the wrong directory.
- Fix hotlist key resolution for `production_hotspots` and `test_fixture_hotspots`.
- Bump cache schema to `2.3` (stale metric entries rebuilt, not reused).
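The absolute-root requirement can be illustrated with a small guard. This is a sketch only: `validate_root` is a hypothetical helper name, and the actual MCP server may validate roots differently.

```python
from pathlib import Path


def validate_root(root: str) -> Path:
    """Accept only absolute repository roots (hypothetical helper)."""
    path = Path(root)
    if not path.is_absolute():
        # A relative root such as "." depends on the server's working
        # directory, which may not be the repository the client means.
        raise ValueError(f"repository root must be absolute, got {root!r}")
    return path
```

Rejecting the input outright, instead of resolving it against the current working directory, keeps the analyzed directory explicit on the client side.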
Report contract
- Bump canonical report schema to `2.2`.
- Add canonical `meta.analysis_thresholds.design_findings` provenance and move threshold-aware design findings fully into the canonical report, so MCP and HTML read the same design-finding universe.
- Add `derived.overview.directory_hotspots` and render it in the HTML Overview tab as `Hotspots by Directory`.
CLI
- Add `--changed-only`, `--diff-against`, and `--paths-from-git-diff` for changed-scope review and gating with first-class summary output.
SARIF
- Stabilize `primaryLocationLineHash` (line numbers excluded), add run-unique `automationDetails.id`/`startTimeUtc`, set explicit `kind: "fail"`, and move ancillary fields to `properties`.
HTML report
- Add `Hotspots by Directory` to the Overview tab, surfacing directory-level concentration for `all`, `clones`, and low-cohesion findings with scope-aware badges and compact counts.
- Add IDE picker (PyCharm, IDEA, VS Code, Cursor, Fleet, Zed) with persistent selection.
- Add clickable file-path deep links across all tabs and stable `finding-{id}` anchors.
GitHub Action
- Ship Composite Action v2 with configurable quality gates, SARIF upload to Code Scanning, and PR summary comments.
CodeClone 2.0.0b2: fix UI errors and update deps
Dependencies
- Upgrade requests (dev dep) to 2.33.0 for extract_zipped_paths security fix (CVE-2026-25645)
HTML
- Fix page-level horizontal scrolling in wide table tabs by constraining overflow to local table wrappers (#14).
- Fix mobile header brand block layout on narrow viewports (#15).
- Make mobile navigation tabs sticky and horizontally scrollable with scroll-shadow affordance.
- Keep Overview KPI micro-badges inside cards at extreme browser/mobile widths.
- Restyle Report Provenance summary badges to match the card-style badge language used across the report.
CodeClone 2.0.0b1: evolves from a structural clone detector into a baseline-aware code-health and CI governance tool for Python
Major upgrade: CodeClone evolves from a structural clone detector into a baseline-aware code-health and CI governance tool for Python.
Architecture
- Stage-based pipeline (`pipeline.py`): discovery → processing → analysis → reporting → gating.
- Domain layers: `models.py`, `metrics/`, `report/`, `grouping.py`.
- Baseline schema `2.0`, report schema `2.1`, cache schema `2.2`; `fingerprint_version` remains `1`.
Code-Health Analysis
- Seven health dimensions: clones, complexity, coupling, cohesion, dead code, dependencies, coverage.
- Piecewise clone scoring curve: mild penalty below 5% density, steep 5–20%, aggressive above 20%.
- Dimension weights: clones 25%, complexity 20%, cohesion 15%, coupling 10%, dead code 10%, dependencies 10%, coverage 10%.
- Grade bands: A ≥90, B ≥75, C ≥60, D ≥40, F <40.
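A minimal sketch of how the weights and grade bands listed above could combine. It assumes each dimension has already been scored on a 0-100 scale; `health_score` and `grade` are illustrative names, not CodeClone's API, and the piecewise clone curve is omitted because its slopes are not specified here.

```python
# Weights (in percent) and grade floors come from the notes above.
# Everything else in this sketch is an assumption for illustration.
WEIGHTS = {
    "clones": 25,
    "complexity": 20,
    "cohesion": 15,
    "coupling": 10,
    "dead_code": 10,
    "dependencies": 10,
    "coverage": 10,
}

GRADE_BANDS = [(90, "A"), (75, "B"), (60, "C"), (40, "D")]


def health_score(dimension_scores: dict[str, float]) -> float:
    """Weighted average of per-dimension scores (each assumed to be 0..100)."""
    total = sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)
    return total / 100


def grade(score: float) -> str:
    """Map a score to its letter grade; anything below 40 is F."""
    for floor, letter in GRADE_BANDS:
        if score >= floor:
            return letter
    return "F"
```

Because clones carry the largest weight, a drop in the clone dimension moves the total more than an equal drop in any other single dimension.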
Detection Thresholds
- Lowered function-level `--min-loc` from 15 to 10 (configurable via CLI/`pyproject.toml`).
- Lowered block fragment gate from loc≥40/stmt≥10 to loc≥20/stmt≥8.
- Lowered segment fragment gate from loc≥30/stmt≥12 to loc≥20/stmt≥10.
- All six thresholds configurable via `[tool.codeclone]` in `pyproject.toml`.
Detection Quality
- Conservative dead-code detector: skips tests, dunders, visitors, protocol stubs.
- Module-level PEP 562 hooks (`__getattr__`, `__dir__`) are treated as non-actionable dead-code candidates.
- Exact qualname-based liveness with import-alias resolution.
- Canonical inline suppression syntax: `# codeclone: ignore[dead-code]` on declarations.
- Structural finding families: `duplicated_branches`, `clone_guard_exit_divergence`, `clone_cohort_drift`.
Configuration and CLI
- Config from `pyproject.toml` under `[tool.codeclone]`; precedence: CLI > pyproject.toml > defaults.
- Optional-value report flags: `--html`, `--json`, `--md`, `--sarif`, `--text` with deterministic default paths.
- `--open-html-report`, `--timestamped-report-paths`, `--ci` preset.
- Explicit `--no-progress`/`--progress`, `--no-color`/`--color` flag pairs.
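The precedence rule (CLI > pyproject.toml > defaults) can be sketched as a layered merge. The `min_loc` key mirrors the `--min-loc` flag but is an assumed key name, and `resolve_config` is a hypothetical function, not the tool's own loader.

```python
from typing import Any

# Only the min-loc default of 10 is stated in these notes; the key name
# itself is an assumption made for this example.
DEFAULTS: dict[str, Any] = {"min_loc": 10}


def resolve_config(cli: dict[str, Any], pyproject: dict[str, Any]) -> dict[str, Any]:
    """Merge settings so that CLI > pyproject.toml > defaults.

    A CLI value of None is treated as "flag not given" and does not
    override lower layers.
    """
    merged = dict(DEFAULTS)
    merged.update(pyproject)
    merged.update({key: value for key, value in cli.items() if value is not None})
    return merged
```

For example, `resolve_config({"min_loc": None}, {"min_loc": 12})` keeps the pyproject value, while an explicit CLI value replaces it.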
HTML Report
- Overview: KPI grid with health gauge (baseline delta arc), Executive Summary (issue breakdown + source breakdown), Health Profile radar chart.
- KPI cards show baseline-aware tone: `✓ baselined` pill when all items are accepted debt, `+N` red badge for regressions.
- Get Badge modal: grade-only and score+grade variants, shields.io preview, Markdown/HTML embeds, copy feedback.
- Report Provenance modal with section cards, SVG icons, boolean badges.
- Responsive layout with dark/light theme toggle and system theme detection.
Baseline and Contracts
- Unified baseline flow: clone keys + optional metrics in one file.
- Metrics snapshot integrity via `meta.metrics_payload_sha256`.
- Report contract: canonical `meta`/`inventory`/`findings`/`metrics` + derived `suggestions`/`overview` + `integrity`.
- SARIF: `%SRCROOT%` anchoring, `baselineState`, rich rule metadata.
- Cache compatibility now keys off the full six-threshold analysis profile (function + block + segment thresholds), not only the top-level function gate.
Performance
- Unified AST collection pass (merged 3 separate walks).
- Suppression fast-path: skip tokenization when `codeclone:` is absent.
- Cache dirty flag: skip `save()` on warm path when nothing changed.
- Adaptive multiprocessing, batch statement hashing, deferred HTML import.
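The suppression fast-path can be sketched as a substring check that guards the tokenizer. `suppression_comments` is a hypothetical name and the return shape is an assumption; only the `codeclone:` marker and the skip-when-absent idea come from the notes above.

```python
import io
import tokenize

MARKER = "codeclone:"


def suppression_comments(source: str) -> list[tuple[int, str]]:
    """Return (line, comment) pairs for comments that contain the marker."""
    # Fast path: a substring test is much cheaper than tokenizing, and most
    # files contain no suppression comments at all.
    if MARKER not in source:
        return []
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Only real comments count; the marker inside a string literal
        # reaches this loop but is not reported.
        if tok.type == tokenize.COMMENT and MARKER in tok.string:
            found.append((tok.start[0], tok.string))
    return found
```

The substring test can produce false positives (for example, the marker inside a string), which only cost a tokenization pass; it cannot produce false negatives.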
Docs and Publishing
- MkDocs site with Material theme and GitHub Pages workflow.
- Live sample reports (HTML, JSON, SARIF).
- PyPI-facing README now uses published docs URLs instead of repo-relative doc links.
Packaging
- Package metadata stays explicitly beta (`2.0.0b1`, `Development Status :: 4 - Beta`).
- `pyproject.toml` moved to SPDX-style `license = "MIT"` and `project.license-files` for modern setuptools builds without release-time deprecation warnings.
Stability
- Exit codes unchanged: `0`/`2`/`3`/`5`.
- Fingerprint contract unchanged: `BASELINE_FINGERPRINT_VERSION = "1"`.
- Coverage gate: `>=99%`.
CodeClone 1.4.4: Performance Fix
Performance
- Optimized HTML snippet rendering hot path:
  - file snippets now reuse cached full-file lines and slice ranges without repeated full-file scans
  - Pygments modules are loaded once per importer identity instead of re-importing for each snippet
- Optimized block explainability range stats:
  - replaced repeated full `ast.walk()` scans per range with a per-file statement index + `bisect` window lookup
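The statement-index idea can be sketched as follows: walk the tree once per file, keep the sorted statement start lines, and answer each range query with two binary searches. Function names and the choice of "start line" as the indexed key are assumptions for illustration.

```python
import ast
from bisect import bisect_left, bisect_right


def build_statement_index(tree: ast.AST) -> list[int]:
    """One full walk per file: sorted start lines of every statement node."""
    return sorted(
        node.lineno for node in ast.walk(tree) if isinstance(node, ast.stmt)
    )


def statements_in_range(index: list[int], start: int, end: int) -> int:
    """Count statements whose first line falls within [start, end]."""
    return bisect_right(index, end) - bisect_left(index, start)
```

With the index built once, each range lookup costs O(log n) instead of a full `ast.walk()` over the file.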
Tests
- Preserved existing golden/contract behavior for `1.4.x` and kept report output semantics unchanged while reducing runtime overhead.
Contract Notes
- No baseline/cache/report schema changes.
- No clone detection or fingerprint semantic changes.
CodeClone 1.4.3: Cache compatibility now respects --min-loc/--min-stmt
Cache Contract
- Cache schema bumped from `v1.2` to `v1.3`.
- Added signed analysis profile to cache payload:
  - `payload.ap.min_loc`
  - `payload.ap.min_stmt`
- Cache compatibility now requires `payload.ap` to match current CLI analysis thresholds. On mismatch, cache is ignored with `cache_status=analysis_profile_mismatch` and analysis continues without cache.
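A rough sketch of the compatibility check described above. The `ap` keys and the `analysis_profile_mismatch` status come from these notes; the function name, the `"ok"` status value, and the omission of signature verification are simplifications for illustration.

```python
from typing import Any


def profile_cache_status(payload: dict[str, Any], min_loc: int, min_stmt: int) -> str:
    """Compare the cached analysis profile with the active thresholds."""
    expected = {"min_loc": min_loc, "min_stmt": min_stmt}
    # A missing or different profile means cached results were produced
    # under other thresholds and must not be reused.
    if payload.get("ap") != expected:
        return "analysis_profile_mismatch"
    return "ok"  # hypothetical status name for the matching case
```

On a mismatch the caller would ignore the cache and analyze from source, as the notes describe.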
CLI
- CLI now constructs cache context with effective `--min-loc` and `--min-stmt` values, so cache reuse is consistent with active analysis thresholds.
Tests
- Added regression coverage for analysis-profile cache mismatch/match behavior in:
  - `tests/test_cache.py`
  - `tests/test_cli_inprocess.py`
Contract Notes
- Baseline contract is unchanged (`schema v1.0`, `fingerprint version 1`).
- Report schema is unchanged (`v1.1`); cache metadata adds a new `cache_status` enum value.
CodeClone 1.4.2: maintenance update
Overview
This patch release is a maintenance update. Determinism remains guaranteed: reports are stable and ordering is
unchanged.
Performance & Implementation Cleanup
- `process_file()` now uses a single `os.stat()` call to obtain both size (size guard) and `st_mtime_ns`/`st_size` (file stat signature), removing a redundant `os.path.getsize()` call.
- Discovery logic was deduplicated by extracting `_discover_files()`; quiet/non-quiet behavior differs only by UI status wrapper, not by semantics or filtering.
- Cache path wiring now precomputes `wire_map` so `_wire_filepath_from_runtime()` is evaluated once per key.
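The single-`os.stat()` change can be illustrated with a small helper that feeds both the size guard and the stat signature from one call. `stat_signature` and its return shape are assumptions for this sketch, not the actual `process_file()` internals.

```python
from __future__ import annotations

import os


def stat_signature(path: str, max_size: int) -> tuple[int, int] | None:
    """Return (mtime_ns, size), or None when the file exceeds the size guard."""
    st = os.stat(path)  # one system call serves both checks below
    if st.st_size > max_size:
        return None
    return (st.st_mtime_ns, st.st_size)
```

Besides saving a system call, reading both values from the same `stat` result guarantees the size guard and the signature describe the same file state.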
Hash Reuse for Block/Segment Analysis
- `extract_blocks()` and `extract_segments()` accept optional `precomputed_hashes`. When provided, they reuse hashes instead of recomputing.
- The extractor computes function body hashes once and passes them to both block and segment extraction when both analyses run for the same function.
Scanner Efficiency (No Semantic Change)
- `iter_py_files()` now filters candidates before sorting, so only valid candidates are sorted. The final order remains deterministic and equivalent to previous behavior.
Contract Tightening
- `precomputed_hashes` type strengthened: `list[str] | None` → `Sequence[str] | None` (read-only intent in the type contract).
- Added `assert len(precomputed_hashes) == len(body)` in both `extract_blocks()` and `extract_segments()` to catch mismatched inputs early (development-time invariant).
Testing & Determinism
- Byte-identical JSON reports verified across repeated runs; differences, when present, are limited to volatile/provenance meta fields (e.g., cache status/path, timestamps), while semantic payload remains stable.
- Unit tests updated to mock `os.stat` instead of `os.path.getsize` where applicable (`test_process_file_stat_error`, `test_process_file_size_limit`).
Notes
- No changes to:
  - detection semantics / fingerprints
  - baseline hash inputs (`payload_sha256` semantic payload)
  - exit code contract and precedence
  - schema versions (baseline v1.0, cache v1.2, report v1.1)
CodeClone 1.4.1: fix UI
CLI
- Semantic summary colors: clone counts → `bold yellow`, file metrics → neutral `bold`
- Phase separator, bold report paths, "Done in X.Xs" timing line
HTML Report
- HiDPI chart canvas, hit-line markers with Pygments, cross-browser `<select>`
- Platform-aware shortcut labels (`⌘`/`Ctrl+`), color-coded section borders
- Compact code lines, proper tab-bar for novelty filter, polished transitions
- Rounded-rect badges (`6px`), tighter card radii (`10px`), cleaner empty states
CodeClone 1.4.0: stabilizes the baseline contract for long-term CI reuse without changing clone-detection semantics
Overview
This release stabilizes the baseline contract for long-term CI reuse without changing clone-detection semantics. Key
improvements include baseline schema standardization, enhanced cache efficiency, and hardened IO/contract behavior for
CI environments.
Baseline Schema & Compatibility
Stable v1 Schema
- Baseline now uses stable v1 schema with strict top-level `meta` + `clones` objects
- Compatibility gated by `schema_version`, `fingerprint_version`, and `python_tag` (independent of package patch/minor version)
- Trust validation requires `meta.generator.name` to be `codeclone`
- Legacy 1.3 baseline layouts treated as untrusted with explicit regeneration guidance
Integrity & Hash Calculation
- Baseline integrity uses canonical `payload_sha256` over semantic payload (`functions`, `blocks`, `fingerprint_version`, `python_tag`)
- Intentionally excluded from `payload_sha256`:
  - `schema_version` (compatibility gate only)
  - `meta.generator.name` (trust gate only)
  - `meta.generator.version` and `meta.created_at` (informational only)
- Hash inputs remain stable across future 1.x patch/minor releases
- Baseline regeneration required only when `fingerprint_version` or `python_tag` changes
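A sketch of hashing only the semantic payload. The four hashed fields and the excluded fields come from the list above; where those fields sit inside `meta` and `clones`, the sorting of clone keys, and the JSON canonicalization are assumptions, so this will not reproduce CodeClone's actual digest.

```python
import hashlib
import json
from typing import Any


def payload_sha256(baseline: dict[str, Any]) -> str:
    """Hash the semantic payload; generator info and timestamps are left out."""
    meta = baseline["meta"]
    clones = baseline["clones"]
    semantic = {
        "functions": sorted(clones["functions"]),
        "blocks": sorted(clones["blocks"]),
        "fingerprint_version": meta["fingerprint_version"],
        "python_tag": meta["python_tag"],
    }
    # Sorted keys and fixed separators give one byte sequence per payload
    # (assumed canonical form for this sketch).
    canonical = json.dumps(semantic, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because `meta.created_at` and `meta.generator.version` never enter the digest, regenerating the same baseline with a newer patch release leaves the hash unchanged.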
Migration Notes
- Early 1.4.0 development snapshots (before integrity canonicalization fix) may require one-time `codeclone . --update-baseline`
- After this one-time update, baselines are stable for long-term CI use
File System & Storage
Atomic Operations
- Baseline writes use atomic `*.tmp` + `os.replace` pattern (same filesystem requirement)
- Configurable size guards:
  - `--max-baseline-size-mb`
  - `--max-cache-size-mb`
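The `*.tmp` + `os.replace` pattern can be sketched like this. `write_atomic` is a hypothetical helper; the temp-file naming and the `fsync` call are assumptions about details these notes do not spell out.

```python
import os
from pathlib import Path


def write_atomic(path: Path, data: str) -> None:
    """Write to a sibling temp file, then swap it into place."""
    # The temp file lives next to the target so both are on the same
    # filesystem, which os.replace needs for an atomic rename.
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "w", encoding="utf-8") as fh:
        fh.write(data)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, path)
```

Readers see either the old file or the complete new one, never a partially written baseline.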
Baseline Trust Model
- Normal mode: Untrusted baseline triggers warning and comparison against empty baseline
- CI preset (`--ci`): Untrusted baseline causes fast-fail with exit code `2`
- Deterministic behavior ensures predictable CI outcomes
CLI & Exit Codes
Exit Code Contract (explicit and stable)
- `0` - Success
- `2` - Contract error (unreadable files, untrusted baseline, integrity failures)
- `3` - Gating failure (new clones, threshold violations)
- `5` - Internal error
Exit Code Priority
- Contract errors (exit `2`) override gating failures (exit `3`) when both conditions are present
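The priority rule can be written as a short decision function. The exit code values come from the contract above; `resolve_exit_code` is an illustrative name, and internal errors (exit `5`) are assumed to be handled separately, for example at the top-level exception handler.

```python
def resolve_exit_code(contract_error: bool, gating_failure: bool) -> int:
    """Pick the process exit code; contract errors take priority."""
    if contract_error:
        return 2  # checked first, so it wins when both conditions hold
    if gating_failure:
        return 3
    return 0
```

Checking the contract error first means an untrusted or incomplete analysis is never reported as a plain gating result.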
CI/Gating Modes
- In CI/gating modes (`--ci`, `--fail-on-new`, `--fail-threshold`):
  - Unreadable or decode-failed source files treated as contract errors (exit `2`)
  - Prevents incomplete analysis from passing CI checks
Error Handling
- Standardized internal error UX: `INTERNAL ERROR` with reason and actionable next steps
- New `--debug` flag (also `CODECLONE_DEBUG=1`) includes traceback + runtime environment details
- CLI help now includes canonical exit-code descriptions plus `Repository`/`Issues`/`Docs` links
Reporting Enhancements
JSON Report (v1.1 Schema)
- Compact deterministic layout with top-level `meta` + `files` + `groups`
- Explicit `group_item_layout` for array-based group records
- New `groups_split` structure with `new`/`known` keys per section
- Deterministic `meta.groups_counts` aggregates
- Legacy alias sections removed (`function_clones`, `block_clones`, `segment_clones`)
TXT Report (aligned to report meta v1.1)
- Normalized metadata/order as stable contract
- Explicit section metrics: `loc` for functions, `size` for blocks/segments
- Sections split into `(NEW)` and `(KNOWN)` for functions/blocks/segments
- With untrusted baseline: `(KNOWN)` sections empty, all groups in `(NEW)`
HTML Report (aligned to report meta v1.1)
- New baseline split controls: `New duplicates`/`Known duplicates`
- Consistent filtering behavior across report types
- Block explainability now core-owned (`block_group_facts`)
- Expanded `Report Provenance` section displays full meta information block
Cross-Format Metadata
- All formats (HTML/TXT/JSON) now include:
  - `baseline_payload_sha256` and `baseline_payload_sha256_verified` for audit traceability
  - Cache contract fields: `cache_schema_version`, `cache_status`, `cache_used`
  - Baseline audit fields and trust status
Documentation
- Added the contract documentation book `docs/book/`.
CodeClone 1.3.0: improves detection precision, determinism, and auditability, adds segment-level reporting, refreshes the HTML report UI, and hardens baseline/cache contracts for CI usage.
Overview
This release improves detection precision, determinism, and auditability, adds segment-level reporting, refreshes the HTML report UI, and hardens baseline/cache contracts for CI usage.
Breaking (CI): baseline contract checks are stricter. Legacy or mismatched baselines must be regenerated.
Detection Engine
- Safe normalization upgrades: local logical equivalence, proven-domain commutative canonicalization, and preserved symbolic call targets.
- Internal CFG metadata markers were moved to the `__CC_META__::...` namespace and emitted as synthetic AST names to prevent collisions with user string literals.
- CFG precision upgrades: short-circuit micro-CFG, selective `try/except` raise-linking, loop `break`/`continue` jump semantics, `for/while ... else`, and ordered `match`/`except`.
- Deterministic traversal and ordering improvements for stable clone grouping/report output.
- Segment-level internal detection added with strict candidate->hash confirmation; remains report-only (not part of baseline/CI fail criteria).
- Segment report noise reduction: overlapping windows are merged and boilerplate-only groups are suppressed using deterministic AST criteria.
Baseline & CI
- Baseline format is versioned (`baseline_version`, `schema_version`) and legacy baselines fail fast with regeneration guidance.
- Added tamper-evident baseline integrity for v1.3+ (`generator`, `payload_sha256`).
- Added configurable size guards: `--max-baseline-size-mb`, `--max-cache-size-mb`.
- Behavioral hardening: in normal mode, untrusted baseline states are ignored with warning and compared as empty; in `--fail-on-new`/`--ci`, they fail fast with deterministic exit codes.
Update baseline after upgrade:
    codeclone . --update-baseline
CLI & Reports
- Added `--version`, `--cache-path` (legacy alias: `--cache-dir`), and `--ci` preset.
- Added strict output extension validation for `--html/.html`, `--json/.json`, `--text/.txt`.
- Summary output was redesigned for deterministic, cache-aware metrics across standard and CI modes.
- User-facing CLI messages were centralized in `codeclone/ui_messages.py`.
- HTML/TXT/JSON reports now include consistent provenance metadata (baseline/cache status fields).
- Clone group/report ordering is deterministic and aligned across HTML/TXT/JSON outputs.
HTML UI
- Refreshed layout with improved navigation and dashboard widgets.
- Added command palette and keyboard shortcuts.
- Replaced emoji icons with inline SVG icons.
- Hardened escaping (text + attribute context) and snippet fallback behavior.
Cache & Security
- Cache default moved to `<root>/.cache/codeclone/cache.json` with legacy path warning.
- Cache schema was extended to include segment data (`CACHE_VERSION=1.1`).
- Cache integrity uses constant-time signature checks and deep schema validation.
- Invalid/oversized cache is ignored deterministically and rebuilt from source.
- Added security regression tests for traversal safety, report escaping, baseline/cache integrity, and deterministic report ordering across formats.
- Fixed POSIX parser CPU guard to avoid lowering `RLIMIT_CPU` hard limit.
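A constant-time signature check can be sketched with the standard library. HMAC-SHA256 and the `sign`/`verify` names are assumptions for illustration; these notes only state that the comparison is constant-time, not which signature scheme the cache uses.

```python
import hashlib
import hmac


def sign(payload: bytes, key: bytes) -> str:
    """Hex HMAC-SHA256 of the payload (assumed scheme for this sketch)."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(payload: bytes, key: bytes, signature: str) -> bool:
    """Compare signatures without leaking the mismatch position via timing."""
    expected = sign(payload, key).encode("ascii")
    # Comparing bytes lets compare_digest accept any stored string,
    # including non-ASCII garbage from a corrupted cache file.
    return hmac.compare_digest(expected, signature.encode("utf-8"))
```

A plain `==` comparison can return early at the first differing character, which is the timing side channel `hmac.compare_digest` is designed to avoid.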
Documentation & Packaging
- Updated README and docs (`architecture`, `cfg`, `SECURITY`, `CONTRIBUTING`) to reflect current contracts and behaviors.
- Removed an invalid PyPI classifier from package metadata.