Conversation
…payloads with u8 tag sidecar arrays Split 16-byte (payload, tag) slots into separate u64 payload arrays and u8 tag arrays for lists, tuples, dicts, value stack (r13=payloads, r15=tags), and frame locals. Fix numerous caller-saved register clobber bugs exposed by the refactor: eval_exception_unwind r8 (root cause of 8 test failures), frame_free rdx reload, op_make_cell rcx recovery, op_send tag width, func_call rsi clobber, list_ass_subscript r10/r11 across realloc/XDECREF, GC traverse/clear rdx→r15 and 16B→split stride, sort error path double-free and pointer restoration. Also convert exc_group from stale 16-byte list access to split arrays. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…CLAUDE.md) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove all SmallStr (TAG_SMALLSTR) code paths across the codebase. All strings are now heap-allocated PyStrObject* with TAG_PTR. This eliminates ~500 lines of SmallStr spill/extract logic from builtins, repr, val, marshal, methods, opcodes, and type implementations. Delete src/pyo/smallstr.asm entirely. Remove TAG_SMALLSTR constant and SMALLSTR_LEN macro from macros.inc. Bug fixes found during code review: 1. set_repr: skip-empty check tested key payload==0 instead of key_tag, causing SmallInt(0), None, and False entries to be silently omitted from set repr output. Fix: check key_tag for TAG_NULL/TOMBSTONE. 2. str_mod .sm_arg_none: returned none_singleton pointer with garbage edx tag. Callers (.sm_str, .sm_int, .sm_repr) read edx, producing undefined behavior. Fix: return canonical (0, TAG_NONE). 3. builtin_print: checked payload before tag after obj_str, violating the tag-first convention. Fix: check tag only. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…bstr tuple_type_call used r14 for ob_item_tags without saving — caused nqueens SIGFPE crash when tuple() destroyed eval loop's co_consts pointer. Also save r10/r11 across INCREF_VAL in copy loop. Save/restore r14 in sre_match_get_group_str and sre_substr_from_state unicode paths. Remove dead ob_item_tags load in str_method_join. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Six performance optimizations for the Value64 split representation: 1. DISPATCH macro: remove trace_opcodes check (2 fewer instr/dispatch) 2. op_call: walking-pointer arg copy replaces index-math loop (~5 fewer instr/arg) 3. tuple_getslice: step=1 bulk memcpy fast path 4. UNPACK_SEQUENCE: pre-advance r13/r15, unified tuple/list fill loop 5. list_getslice: step=-1 reversed slice fast path (memcpy + in-place reverse) 6. r14 repurpose: locals_tag_base register (-1 load per LOAD_FAST/STORE_FAST, +1 load per LOAD_CONST via new eval_co_consts global) Also: list_ass_subscript insert loop uses bulk memcpy instead of per-element. Benchmark results vs baseline (A/B, same system): spectral_norm: 0.93x → 1.01x (+8%) nqueens: 0.77x → 0.79x (+3%) nbody: 1.04x → 1.06x (+2%) fannkuch: 0.50x → 0.47x (-6%, LOAD_CONST-heavy inner loop) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR migrates the VM/runtime from 128-bit “fat values” (payload+qword tag in 16-byte slots) to a 64-bit payload representation with sidecar u8 tag arrays, and removes the SmallStr inline-string path (strings become heap-only).
Changes:
- Refactors VM value stack + locals to use
u64[]payloads andu8[]tag sidecars (r13payload SP,r15tag SP), including new push/pop macros and frame fields. - Updates list/tuple/dict storage to split payloads and tags (
ob_item+ob_item_tags) and shrinksDictEntry(byte tags,DICT_TOMBSTONE=0xFF). - Removes SmallStr support across string creation, repr/str/hash/truthiness, and deletes
src/pyo/smallstr.asm.
Reviewed changes
Copilot reviewed 47 out of 47 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/frame.asm | Allocates/frees the locals+stack tag sidecar and updates frame layout to payload+tag pointers. |
| src/eval.asm | Threads new stack/tag pointers through interpreter entry/resume/unwind and globalizes co_names/co_consts payload+tag pointers. |
| include/macros.inc | Introduces VPUSH_VAL/VPOP_VAL, frame push helpers, and updates RC macros/tag constants for u8 tags. |
| include/object.inc | Updates list/tuple layout to payload+tag arrays; shrinks dict entries to byte tags + padding and introduces DICT_TOMBSTONE. |
| src/pyo/tuple.asm | Converts tuple storage/iteration/slicing/concat/repeat/etc. to payload+tag arrays with explicit heap alloc/free for arrays. |
| src/repr.asm | Converts list/tuple/dict/set repr to new container layouts and removes SmallStr spill paths. |
| src/pyo/str.asm | Makes str_from_cstr/str_new heap-only and removes SmallStr-aware compare/copy paths. |
| src/object.asm | Removes SmallStr handling from obj_repr, obj_str, obj_hash, obj_is_true, and print paths. |
| src/pyo/set.asm | Removes SmallStr equality handling; set still uses a distinct SetEntry layout (24B, qword key_tag, tombstone 0xDEAD). |
| src/gc.asm | Updates list/tuple/dict traversal/clear for sidecar tags; keeps set traversal/clear as dict passthrough. |
| src/opcodes_* / src/pyo/* | Updates many opcode handlers and runtime helpers to use sidecar tags and new dict entry/tag access patterns. |
| nofat.md | Adds a design briefing documenting the Value64+sidecar-tag migration. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
DISPATCH macro no longer references trace_opcodes (trace check was removed for performance). Clean up the unused extern declarations. Tracing still works via the centralized eval_dispatch path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
set entries are 24-byte (hash+key+key_tag_qword) with SET_ENTRY_SIZE=24, while dict entries are 32-byte (hash+key+value+byte_tags+padding). The old code tail-jumped to dict_traverse/dict_clear_gc which iterate with DICT_ENTRY_SIZE=32 stride and read byte key_tag at offset 24 — completely wrong offsets for set entries after the first slot. Implement dedicated set_traverse and set_clear_gc that use the correct 24-byte stride, qword key_tag checks (0 for empty, 0xDEAD for tombstone), and only visit/decref the key (no value field in sets). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use SET_ENTRY_SIZE, SET_ENTRY_KEY, SET_ENTRY_KEY_TAG, SET_TOMBSTONE constants instead of hardcoded 24, 8, 16, 0xDEAD in set_repr loop. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.