Merged
16 changes: 10 additions & 6 deletions CLAUDE.md
@@ -35,16 +35,20 @@ Callee-saved registers hold global interpreter state:
 |----------|------|
 | `rbx` | Bytecode IP (into co_code[]) |
 | `r12` | Current frame (PyFrame*) |
-| `r13` | Value stack top |
+| `r13` | Value stack top (payload array, u64[]) |
 | `r14` | co_consts data ptr (&tuple.ob_item[0]) |
-| `r15` | co_names data ptr (&tuple.ob_item[0]) |
+| `r15` | Tag stack top (sidecar tag array, u8[]) |
 | `ecx` | Opcode arg on handler entry |
 
+co_names is accessed via `LOAD_CO_NAMES reg` / `LOAD_CO_NAMES_TAGS reg` macros (reads from `eval_co_names` / `eval_co_names_tags` globals), not a dedicated register.
+
 **Critical rule:** Never hold live values in caller-saved regs (rax, rcx, rdx, rsi, rdi, r8-r11) across `call` or `DECREF`/`DECREF_REG`. Use push/pop or callee-saved regs instead. `DECREF_REG` calls `obj_dealloc` which clobbers all caller-saved regs.
 
-## 128-bit Fat Values
+## Value64 Representation
 
+Values are split into 64-bit payloads stored in `u64[]` arrays and 8-bit tags stored in separate `u8[]` sidecar arrays. The value stack uses `r13` (payload top) and `r15` (tag top). Containers (list, tuple, dict) store `ob_item` (u64[]) and `ob_item_tags` (u8[]) separately. Frame locals use `localsplus` (u64[]) and `locals_tag_base` (u8[]).
+
-All values are 128-bit (payload, tag) pairs in 16-byte slots. Tags: `TAG_NULL=0`, `TAG_SMALLINT=1`, `TAG_FLOAT=2`, `TAG_NONE=3`, `TAG_BOOL=4`, `TAG_PTR=0x105`. SmallInts store raw signed i64 in payload (full 64-bit range), zero heap alloc/refcount. `INCREF_VAL`/`DECREF_VAL` check `TAG_RC_BIT` (bit 8) to decide refcounting. Functions return `(rax=payload, edx=tag)`.
+Tags (u8): `TAG_NULL=0`, `TAG_SMALLINT=1`, `TAG_FLOAT=2`, `TAG_NONE=3`, `TAG_BOOL=4`, `TAG_PTR=0x85`. Bit 7 (`TAG_RC_BIT=0x80`) means payload is a refcounted heap pointer. SmallInts store raw signed i64 in payload (full 64-bit range), zero heap alloc/refcount. `INCREF_VAL`/`DECREF_VAL` check `TAG_RC_BIT` to decide refcounting. Functions return `(rax=payload, edx=tag)`.
 
 ## Source Layout
 
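The new tag rules can be checked in a few lines of C. This is a minimal sketch, not code from the repository. The enum values are copied from the diff. `tag_is_refcounted`, `smallint_payload` and `smallint_value` are hypothetical helpers:

- `tag_is_refcounted` mirrors the `TAG_RC_BIT` test that `INCREF_VAL`/`DECREF_VAL` perform.
- `smallint_payload` and `smallint_value` mirror the raw-i64 SmallInt encoding.

The static assertions also show why the old `TAG_PTR=0x105` could not be kept once tags became `u8`: it needs 9 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Tag values from the new CLAUDE.md text; tags live in u8 sidecar arrays. */
enum {
    TAG_NULL     = 0x00,
    TAG_SMALLINT = 0x01,
    TAG_FLOAT    = 0x02,
    TAG_NONE     = 0x03,
    TAG_BOOL     = 0x04,
    TAG_RC_BIT   = 0x80,              /* bit 7: payload is a refcounted heap pointer */
    TAG_PTR      = TAG_RC_BIT | 0x05  /* == 0x85 */
};

static_assert(TAG_PTR == 0x85, "TAG_PTR should be 0x85");
static_assert(TAG_PTR <= UINT8_MAX, "new TAG_PTR fits in a u8 tag");
static_assert(0x105 > UINT8_MAX, "old TAG_PTR (bit 8 set) needs 9 bits");

/* Hypothetical helper: the bit test INCREF_VAL / DECREF_VAL make before refcounting. */
static bool tag_is_refcounted(uint8_t tag) {
    return (tag & TAG_RC_BIT) != 0;
}

/* Hypothetical helpers: a SmallInt payload is the raw signed i64 bit pattern.
   memcpy keeps the round trip well-defined for negative values. */
static uint64_t smallint_payload(int64_t v) {
    uint64_t p;
    memcpy(&p, &v, sizeof p);
    return p;
}

static int64_t smallint_value(uint64_t payload) {
    int64_t v;
    memcpy(&v, &payload, sizeof v);
    return v;
}
```

Because the refcount decision is a single bit test on the tag, only values whose tag has bit 7 set are ever dereferenced for refcounting.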
@@ -64,7 +68,7 @@ All values are 128-bit (payload, tag) pairs in 16-byte slots. Tags: `TAG_NULL=0`
 Defined in `include/*.inc`. All objects start with `PyObject` (ob_refcnt +0, ob_type +8).
 
 - **PyTypeObject** (types.inc, 192 bytes): tp_call +64, tp_getattr +72, tp_setattr +80, tp_as_number +128, tp_as_sequence +136, tp_as_mapping +144
-- **PyFrame** (frame.inc): code +8, globals +16, locals +32, localsplus +72 (variable-size)
+- **PyFrame** (frame.inc): code +8, globals +16, locals +32, stack_tag_ptr +64, locals_tag_base +96, localsplus +104 (variable-size u64[])
 - **PyCodeObject** (object.inc): co_consts, co_names, co_code starts at +112
 
 ## Opcode Handler Pattern
@@ -77,7 +81,7 @@ op_example:
 DISPATCH ; jmp eval_dispatch
 ```
 
-Stack macros: `VPUSH reg`, `VPOP reg`, `VPEEK reg`, `VPEEK_AT reg, offset`
+Stack macros: `VPUSH_PTR reg`, `VPUSH_INT reg`, `VPUSH_FLOAT reg`, `VPUSH_NONE`, `VPUSH_BOOL reg`, `VPUSH_VAL pay, tag`, `VPOP reg` (payload only), `VPOP_VAL pay, tag`, `VPEEK reg`
 
 ## Named Frame-Layout Constants
 
25 changes: 13 additions & 12 deletions STYLE.md
@@ -45,10 +45,11 @@ Repeat the register convention comment block at the top of every
 ; Register convention (callee-saved, preserved across handlers):
 ; rbx = bytecode instruction pointer (current position in co_code[])
 ; r12 = current frame pointer (PyFrame*)
-; r13 = value stack top pointer
+; r13 = value stack top pointer (payload array, u64[])
 ; r14 = co_consts tuple data pointer (&tuple.ob_item[0])
-; r15 = co_names tuple data pointer (&tuple.ob_item[0])
+; r15 = tag stack top pointer (sidecar tag array, u8[])
 ;
+; co_names accessed via LOAD_CO_NAMES / LOAD_CO_NAMES_TAGS macros (globals).
 ; ecx = opcode argument on entry (set by eval_dispatch)
 ; rbx has already been advanced past the 2-byte instruction word.
 ```
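`r13` and `r15` now track two parallel stacks, so every push and pop has to move both pointers in lockstep. The C model below is a sketch under stated assumptions, not the interpreter's code:

- Both stacks grow upward from their bases.
- Each top pointer addresses the next free slot.
- Payload-only `VPOP` still retreats the tag pointer; otherwise the two stacks would drift apart.
- The payload stored for None is 0.

`SplitStack`, `vpush_val`, `vpush_int`, `vpush_none`, `vpop_val`, `vpop` and `split_depth` are hypothetical stand-ins for the `VPUSH_VAL`/`VPOP_VAL` macro family.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TAG_NULL = 0x00, TAG_SMALLINT = 0x01, TAG_NONE = 0x03, TAG_PTR = 0x85 };

/* Hypothetical model: `pay` plays the role of r13, `tag` the role of r15. */
typedef struct {
    uint64_t *pay;       /* payload stack top (next free u64 slot) */
    uint8_t  *tag;       /* tag stack top (next free u8 slot) */
    uint64_t *pay_base;  /* bottom of payload stack */
    uint8_t  *tag_base;  /* bottom of tag stack */
} SplitStack;

/* VPUSH_VAL pay, tag: store both halves, then advance both pointers. */
static void vpush_val(SplitStack *s, uint64_t pay, uint8_t tag) {
    *s->pay++ = pay;
    *s->tag++ = tag;
}

/* Typed pushes know the tag statically, so no branch is needed to pick it. */
static void vpush_int(SplitStack *s, int64_t v) { vpush_val(s, (uint64_t)v, TAG_SMALLINT); }
static void vpush_none(SplitStack *s)           { vpush_val(s, 0, TAG_NONE); } /* payload 0 assumed */

/* VPOP_VAL pay, tag: retreat both pointers and read the pair back. */
static void vpop_val(SplitStack *s, uint64_t *pay, uint8_t *tag) {
    assert(s->pay > s->pay_base && s->tag > s->tag_base); /* underflow check */
    *pay = *--s->pay;
    *tag = *--s->tag;
}

/* VPOP reg (payload only): the tag pointer is assumed to move as well. */
static uint64_t vpop(SplitStack *s) {
    uint64_t pay;
    uint8_t tag;
    vpop_val(s, &pay, &tag);
    (void)tag;
    return pay;
}

/* Invariant: both stacks always hold the same number of entries. */
static ptrdiff_t split_depth(const SplitStack *s) {
    ptrdiff_t n = s->pay - s->pay_base;
    assert(n == s->tag - s->tag_base);
    return n;
}
```

Keeping the payload stride at 8 bytes and the tag stride at 1 byte is what lets `r13` and `r15` advance by different amounts for the same logical push.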
@@ -230,16 +231,16 @@ with raw arithmetic unless implementing a new stack macro.
 Prefer typed pushes (`VPUSH_PTR`, `VPUSH_INT`) over `VPUSH` when the
 type is statically known — they avoid branching.
 
-## Fat Value Return/Push Macros
+## Value Return/Push Macros
 
-Always use these macros for fat value return patterns. Never inline the
+Always use these macros for value return patterns. Never inline the
 equivalent instructions — inlining is a source of bugs.
 
 | Macro | Expansion | Use when |
 |-------|-----------|----------|
 | `RET_NULL` | `xor eax, eax` / `xor edx, edx` | Error return: (0, TAG_NULL) |
 | `RET_TAG_SMALLINT` | `mov edx, TAG_SMALLINT` | Return SmallInt (caller sets rax) |
-| `SPUSH_PTR reg` | `sub rsp, 16` / `mov [rsp], reg` / `mov qword [rsp+8], TAG_PTR` | Build 16-byte fat arg on stack for tp_call |
+| `SPUSH_PTR reg` | `sub rsp, 16` / `mov [rsp], reg` / `mov qword [rsp+8], TAG_PTR` | Build fat arg on stack for tp_call |
 
 ## Refcounting Macros
 
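The `(rax=payload, edx=tag)` convention behind `RET_NULL` and `RET_TAG_SMALLINT` lines up with the System V x86-64 ABI. There, a 16-byte struct with two integer fields is returned with its first eightbyte in `rax` and its second in `rdx`.

The sketch below uses that fact to give C equivalents of those macros. It also describes the 16-byte slot `SPUSH_PTR` builds: payload at `[rsp]`, qword tag at `[rsp+8]`. `ValPair`, `ret_null`, `ret_smallint` and `spush_ptr` are hypothetical names; only the macro expansions come from the STYLE.md table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TAG_NULL = 0x00, TAG_SMALLINT = 0x01, TAG_PTR = 0x85 };

/* Hypothetical pair type. Under the System V x86-64 ABI this struct is
   returned in rax (payload) and rdx (tag); writing edx zero-extends into rdx. */
typedef struct {
    uint64_t payload;
    uint64_t tag;   /* a full qword here, unlike the u8 sidecar arrays */
} ValPair;

/* Same 16-byte shape SPUSH_PTR builds on the machine stack. */
static_assert(sizeof(ValPair) == 16, "SPUSH_PTR reserves 16 bytes (sub rsp, 16)");
static_assert(offsetof(ValPair, tag) == 8, "tag is written at [rsp+8]");

/* RET_NULL: xor eax, eax / xor edx, edx  ->  (0, TAG_NULL). */
static ValPair ret_null(void) {
    return (ValPair){ 0, TAG_NULL };
}

/* Caller sets rax, then RET_TAG_SMALLINT: mov edx, TAG_SMALLINT. */
static ValPair ret_smallint(int64_t v) {
    return (ValPair){ (uint64_t)v, TAG_SMALLINT };
}

/* SPUSH_PTR reg: wrap a known heap pointer as a (payload, TAG_PTR) slot. */
static ValPair spush_ptr(const void *obj) {
    assert(obj != NULL);
    return (ValPair){ (uint64_t)(uintptr_t)obj, TAG_PTR };
}
```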
@@ -249,21 +250,21 @@ equivalent instructions — inlining is a source of bugs.
 | `DECREF reg` | Known heap pointer (saves/restores rdi) |
 | `DECREF_REG reg` | Known heap pointer (does NOT save rdi) |
 | `XDECREF reg` | Possibly NULL heap pointer |
-| `INCREF_VAL pay, tag` | 128-bit fat value |
-| `DECREF_VAL pay, tag` | 128-bit fat value (clobbers rdi + caller-saved) |
-| `XDECREF_VAL pay, tag` | 128-bit fat value, NULL-safe |
+| `INCREF_VAL pay, tag` | Value64 (payload + u8 tag) |
+| `DECREF_VAL pay, tag` | Value64 (clobbers rdi + caller-saved) |
+| `XDECREF_VAL pay, tag` | Value64, NULL-safe |
 
 `DECREF_REG` and `DECREF_VAL` contain `call obj_dealloc` which **clobbers
 all caller-saved registers** when the refcount reaches zero.
 
 ## Addressing Idioms
 
-**Localsplus indexing** (16 bytes/slot = ×8 × ×2 via LEA):
+**Localsplus indexing** (8 bytes/payload slot + separate u8 tag array):
 
 ```nasm
-lea rdx, [rcx*8] ; slot * 8
-mov rdi, [r12 + rdx*2 + PyFrame.localsplus] ; payload
-mov r9, [r12 + rdx*2 + PyFrame.localsplus + 8] ; tag
+mov rdi, [r12 + rcx*8 + PyFrame.localsplus] ; payload from u64[]
+mov rdx, [r12 + PyFrame.locals_tag_base] ; tag array base (u8[])
+movzx esi, byte [rdx + rcx] ; tag from u8[]
 ```
 
 **Forward bytecode jumps** (instruction words → bytes = ×2):
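The new localsplus idiom reads the payload with an 8-byte stride directly from the frame, and the tag with a 1-byte stride through the `locals_tag_base` pointer. The C sketch below mirrors those three instructions on a hypothetical, trimmed-down `MiniFrame`. It is not the real `PyFrame`; only the fields the idiom touches are kept.

Assumptions:

- The tag array is allocated separately from the frame.
- `load_local`, `store_local`, `mini_frame_new` and `mini_frame_free` are illustrative helpers, not repository functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down frame; the real PyFrame has more header fields. */
typedef struct {
    uint32_t  nlocalsplus;       /* number of local slots */
    uint8_t  *locals_tag_base;   /* u8[] tag array (assumed separately allocated) */
    uint64_t  localsplus[];      /* u64[] payload array, 8 bytes per slot */
} MiniFrame;

/* Mirrors:
     mov   rdi, [r12 + rcx*8 + PyFrame.localsplus]   ; payload
     mov   rdx, [r12 + PyFrame.locals_tag_base]      ; tag array base
     movzx esi, byte [rdx + rcx]                     ; tag              */
static void load_local(const MiniFrame *f, uint32_t slot,
                       uint64_t *pay, uint8_t *tag) {
    assert(slot < f->nlocalsplus);
    *pay = f->localsplus[slot];        /* index scaled by 8 */
    *tag = f->locals_tag_base[slot];   /* index scaled by 1 */
}

static void store_local(MiniFrame *f, uint32_t slot, uint64_t pay, uint8_t tag) {
    assert(slot < f->nlocalsplus);
    f->localsplus[slot] = pay;
    f->locals_tag_base[slot] = tag;
}

/* Every slot starts as (0, TAG_NULL): calloc zero-fills and TAG_NULL == 0. */
static MiniFrame *mini_frame_new(uint32_t n) {
    MiniFrame *f = calloc(1, sizeof(MiniFrame) + (size_t)n * sizeof(uint64_t));
    if (f == NULL) return NULL;
    f->locals_tag_base = calloc(n > 0 ? n : 1, 1);
    if (f->locals_tag_base == NULL) { free(f); return NULL; }
    f->nlocalsplus = n;
    return f;
}

static void mini_frame_free(MiniFrame *f) {
    if (f != NULL) {
        free(f->locals_tag_base);
        free(f);
    }
}
```

Compared with the old 16-byte slots, the payload address no longer needs the `lea` pre-scale: a single `rcx*8` scaled index reaches it.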
15 changes: 9 additions & 6 deletions include/frame.inc
@@ -11,12 +11,15 @@ struc PyFrame
 .builtins: resq 1 ; +24: ptr to builtins dict
 .locals: resq 1 ; +32: ptr to locals dict (or NULL for fast locals)
 .instr_ptr: resq 1 ; +40: saved bytecode pointer
-.stack_ptr: resq 1 ; +48: saved value stack pointer
-.stack_base: resq 1 ; +56: ptr to bottom of value stack
-.return_offset: resd 1 ; +64: return offset
-.nlocalsplus: resd 1 ; +68: number of locals + cells + frees
-.func_obj: resq 1 ; +72: ptr to function object (for closures)
-.localsplus: ; +80: PyObject*[] array (variable size)
+.stack_ptr: resq 1 ; +48: saved payload stack pointer
+.stack_base: resq 1 ; +56: ptr to bottom of payload stack
+.stack_tag_ptr: resq 1 ; +64: saved tag stack pointer
+.stack_tag_base: resq 1 ; +72: ptr to bottom of tag stack
+.return_offset: resd 1 ; +80: return offset
+.nlocalsplus: resd 1 ; +84: number of locals + cells + frees
+.func_obj: resq 1 ; +88: ptr to function object (for closures)
+.locals_tag_base: resq 1 ; +96: ptr to locals tag array
+.localsplus: ; +104: Value64 payload array (variable size)
 endstruc
 
 FRAME_HEADER_SIZE equ PyFrame.localsplus
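The offsets in the new `struc PyFrame` can be cross-checked at compile time with a C mirror of the layout. The old docs and code disagreed: the old CLAUDE.md line gave `localsplus +72`, while the old frame.inc had it at `+80`. The new CLAUDE.md value (`+104`) matches frame.inc.

This sketch assumes 8-byte pointers (x86-64). The field at `+0` is not shown in this diff, so `field0` is a placeholder name. `frame_payload_bytes` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C mirror of the new `struc PyFrame`, assuming 8-byte pointers. */
typedef struct {
    void     *field0;           /* +0   not shown in this diff (placeholder) */
    void     *code;             /* +8   */
    void     *globals;          /* +16  */
    void     *builtins;         /* +24  */
    void     *locals;           /* +32  */
    void     *instr_ptr;        /* +40  saved bytecode pointer */
    uint64_t *stack_ptr;        /* +48  saved payload stack pointer */
    uint64_t *stack_base;       /* +56  bottom of payload stack */
    uint8_t  *stack_tag_ptr;    /* +64  saved tag stack pointer */
    uint8_t  *stack_tag_base;   /* +72  bottom of tag stack */
    uint32_t  return_offset;    /* +80  resd */
    uint32_t  nlocalsplus;      /* +84  resd */
    void     *func_obj;         /* +88  */
    uint8_t  *locals_tag_base;  /* +96  locals tag array */
    uint64_t  localsplus[];     /* +104 payload array, variable size */
} PyFrame;

static_assert(offsetof(PyFrame, stack_tag_ptr)   == 64,  "stack_tag_ptr");
static_assert(offsetof(PyFrame, stack_tag_base)  == 72,  "stack_tag_base");
static_assert(offsetof(PyFrame, return_offset)   == 80,  "return_offset");
static_assert(offsetof(PyFrame, nlocalsplus)     == 84,  "nlocalsplus");
static_assert(offsetof(PyFrame, func_obj)        == 88,  "func_obj");
static_assert(offsetof(PyFrame, locals_tag_base) == 96,  "locals_tag_base");
static_assert(offsetof(PyFrame, localsplus)      == 104, "FRAME_HEADER_SIZE");

/* Bytes for the frame header plus n payload slots (8 bytes each). The n tag
   bytes sit behind locals_tag_base; where they are allocated is not shown. */
static size_t frame_payload_bytes(uint32_t n) {
    return offsetof(PyFrame, localsplus) + (size_t)n * sizeof(uint64_t);
}
```

Splitting `return_offset` and `nlocalsplus` into a 4+4 byte pair keeps `func_obj` 8-byte aligned at +88 without padding.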
4 changes: 1 addition & 3 deletions include/gc.inc
@@ -31,12 +31,10 @@ GC_PREV_MASK equ ~3 ; mask to extract prev pointer (clear low 2 bi
 lea %1, [%2 + GC_HEAD_SIZE]
 %endmacro
 
-; VISIT_FAT — call visit callback on a fat value slot if it's a heap pointer
+; VISIT_FAT — call visit callback on a value if it's a heap pointer
 ; r14 must be loaded with the visit callback function pointer before use
 ; rdi is set to the payload (object pointer) for the callback
 %macro VISIT_FAT 2 ; %1 = payload_reg, %2 = tag_reg (64-bit)
-bt %2, 63
-jc %%skip ; SmallStr — skip
 test %2, TAG_RC_BIT
 jz %%skip ; no RC bit — not a heap pointer
 test %1, %1
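The checks left in `VISIT_FAT` can be modelled in C. A `u8` tag loaded with zero extension (as `movzx` does in the STYLE.md idiom) can never have bit 63 set, which is consistent with the diff dropping the `bt %2, 63` SmallStr test. In this sketch:

- A clear `TAG_RC_BIT` skips the value.
- A zero payload then skips it too.
- Otherwise the callback receives the payload, as `rdi` does in the macro.

The hunk is cut off after `test %1, %1`, so the NULL skip and the callback call are assumptions. `visit_fn`, `visit_value`, `visit_array` and `count_visit` are hypothetical. The array walk assumes a container's split `ob_item`/`ob_item_tags` storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TAG_SMALLINT = 0x01, TAG_RC_BIT = 0x80, TAG_PTR = 0x85 };

/* Hypothetical callback type: in the macro, r14 holds the function pointer
   and rdi carries the object pointer. */
typedef void (*visit_fn)(void *obj);

/* Returns 1 if the callback was invoked, 0 if the value was skipped. */
static int visit_value(uint64_t payload, uint8_t tag, visit_fn visit) {
    assert(visit != NULL);
    if ((tag & TAG_RC_BIT) == 0) return 0;   /* SmallInt, float, None, bool */
    if (payload == 0) return 0;              /* NULL heap pointer (assumed skip) */
    visit((void *)(uintptr_t)payload);
    return 1;
}

/* Walk split container storage: ob_item (u64[]) alongside ob_item_tags (u8[]). */
static size_t visit_array(const uint64_t *items, const uint8_t *tags,
                          size_t n, visit_fn visit) {
    size_t visited = 0;
    for (size_t i = 0; i < n; i++)
        visited += (size_t)visit_value(items[i], tags[i], visit);
    return visited;
}

/* Example callback for the usage test: counts how often it is called. */
static size_t visit_count = 0;
static void count_visit(void *obj) {
    (void)obj;
    visit_count++;
}
```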