Skip to content

Updates#3

Merged
jgarzik merged 8 commits intomainfrom
updates
Feb 22, 2026
Merged

Updates#3
jgarzik merged 8 commits intomainfrom
updates

Conversation

@jgarzik
Copy link
Copy Markdown
Owner

@jgarzik jgarzik commented Feb 22, 2026

No description provided.

jgarzik and others added 5 commits February 22, 2026 17:09
…payloads with u8 tag sidecar arrays

Split 16-byte (payload, tag) slots into separate u64 payload arrays and u8 tag
arrays for lists, tuples, dicts, value stack (r13=payloads, r15=tags), and
frame locals. Fix numerous caller-saved register clobber bugs exposed by the
refactor: eval_exception_unwind r8 (root cause of 8 test failures), frame_free
rdx reload, op_make_cell rcx recovery, op_send tag width, func_call rsi
clobber, list_ass_subscript r10/r11 across realloc/XDECREF, GC traverse/clear
rdx→r15 and 16B→split stride, sort error path double-free and pointer
restoration. Also convert exc_group from stale 16-byte list access to split
arrays.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…CLAUDE.md)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove all SmallStr (TAG_SMALLSTR) code paths across the codebase. All
strings are now heap-allocated PyStrObject* with TAG_PTR. This eliminates
~500 lines of SmallStr spill/extract logic from builtins, repr, val,
marshal, methods, opcodes, and type implementations. Delete
src/pyo/smallstr.asm entirely. Remove TAG_SMALLSTR constant and
SMALLSTR_LEN macro from macros.inc.

Bug fixes found during code review:

1. set_repr: skip-empty check tested key payload==0 instead of key_tag,
   causing SmallInt(0), None, and False entries to be silently omitted
   from set repr output. Fix: check key_tag for TAG_NULL/TOMBSTONE.

2. str_mod .sm_arg_none: returned none_singleton pointer with garbage
   edx tag. Callers (.sm_str, .sm_int, .sm_repr) read edx, producing
   undefined behavior. Fix: return canonical (0, TAG_NONE).

3. builtin_print: checked payload before tag after obj_str, violating
   the tag-first convention. Fix: check tag only.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…bstr

tuple_type_call used r14 for ob_item_tags without saving — caused nqueens
SIGFPE crash when tuple() destroyed eval loop's co_consts pointer.
Also save r10/r11 across INCREF_VAL in copy loop. Save/restore r14 in
sre_match_get_group_str and sre_substr_from_state unicode paths.
Remove dead ob_item_tags load in str_method_join.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Six performance optimizations for the Value64 split representation:

1. DISPATCH macro: remove trace_opcodes check (2 fewer instr/dispatch)
2. op_call: walking-pointer arg copy replaces index-math loop (~5 fewer instr/arg)
3. tuple_getslice: step=1 bulk memcpy fast path
4. UNPACK_SEQUENCE: pre-advance r13/r15, unified tuple/list fill loop
5. list_getslice: step=-1 reversed slice fast path (memcpy + in-place reverse)
6. r14 repurpose: locals_tag_base register (-1 load per LOAD_FAST/STORE_FAST,
   +1 load per LOAD_CONST via new eval_co_consts global)

Also: list_ass_subscript insert loop uses bulk memcpy instead of per-element.

Benchmark results vs baseline (A/B, same system):
  spectral_norm: 0.93x → 1.01x (+8%)
  nqueens:       0.77x → 0.79x (+3%)
  nbody:         1.04x → 1.06x (+2%)
  fannkuch:      0.50x → 0.47x (-6%, LOAD_CONST-heavy inner loop)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR migrates the VM/runtime from 128-bit “fat values” (payload+qword tag in 16-byte slots) to a 64-bit payload representation with sidecar u8 tag arrays, and removes the SmallStr inline-string path (strings become heap-only).

Changes:

  • Refactors VM value stack + locals to use u64[] payloads and u8[] tag sidecars (r13 payload SP, r15 tag SP), including new push/pop macros and frame fields.
  • Updates list/tuple/dict storage to split payloads and tags (ob_item + ob_item_tags) and shrinks DictEntry (byte tags, DICT_TOMBSTONE=0xFF).
  • Removes SmallStr support across string creation, repr/str/hash/truthiness, and deletes src/pyo/smallstr.asm.

Reviewed changes

Copilot reviewed 47 out of 47 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/frame.asm Allocates/frees the locals+stack tag sidecar and updates frame layout to payload+tag pointers.
src/eval.asm Threads new stack/tag pointers through interpreter entry/resume/unwind and globalizes co_names/co_consts payload+tag pointers.
include/macros.inc Introduces VPUSH_VAL/VPOP_VAL, frame push helpers, and updates RC macros/tag constants for u8 tags.
include/object.inc Updates list/tuple layout to payload+tag arrays; shrinks dict entries to byte tags + padding and introduces DICT_TOMBSTONE.
src/pyo/tuple.asm Converts tuple storage/iteration/slicing/concat/repeat/etc. to payload+tag arrays with explicit heap alloc/free for arrays.
src/repr.asm Converts list/tuple/dict/set repr to new container layouts and removes SmallStr spill paths.
src/pyo/str.asm Makes str_from_cstr/str_new heap-only and removes SmallStr-aware compare/copy paths.
src/object.asm Removes SmallStr handling from obj_repr, obj_str, obj_hash, obj_is_true, and print paths.
src/pyo/set.asm Removes SmallStr equality handling; set still uses a distinct SetEntry layout (24B, qword key_tag, tombstone 0xDEAD).
src/gc.asm Updates list/tuple/dict traversal/clear for sidecar tags; keeps set traversal/clear as dict passthrough.
src/opcodes_* / src/pyo/* Updates many opcode handlers and runtime helpers to use sidecar tags and new dict entry/tag access patterns.
nofat.md Adds a design briefing documenting the Value64+sidecar-tag migration.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

jgarzik and others added 3 commits February 22, 2026 21:46
DISPATCH macro no longer references trace_opcodes (trace check
was removed for performance). Clean up the unused extern declarations.
Tracing still works via the centralized eval_dispatch path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
set entries are 24-byte (hash+key+key_tag_qword) with SET_ENTRY_SIZE=24,
while dict entries are 32-byte (hash+key+value+byte_tags+padding). The old
code tail-jumped to dict_traverse/dict_clear_gc which iterate with
DICT_ENTRY_SIZE=32 stride and read byte key_tag at offset 24 — completely
wrong offsets for set entries after the first slot.

Implement dedicated set_traverse and set_clear_gc that use the correct
24-byte stride, qword key_tag checks (0 for empty, 0xDEAD for tombstone),
and only visit/decref the key (no value field in sets).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use SET_ENTRY_SIZE, SET_ENTRY_KEY, SET_ENTRY_KEY_TAG, SET_TOMBSTONE
constants instead of hardcoded 24, 8, 16, 0xDEAD in set_repr loop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jgarzik jgarzik merged commit 5d7e936 into main Feb 22, 2026
1 check passed
@jgarzik jgarzik deleted the updates branch February 22, 2026 22:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants