Neomacs Architecture: Redesigning Emacs in Rust

What's Wrong with Official Emacs — Redesign Opportunities

The official Emacs was designed in 1984 for single-core CPUs, text terminals, and kilobyte-sized files. Many of its core design decisions are fundamentally mismatched with modern hardware. This section identifies what to redesign and why.

1. Single-Threaded Lisp Machine

The problem: Modern CPUs have 8-32+ cores. Emacs uses one. Everything — Lisp execution, redisplay, process I/O, file I/O, garbage collection — runs sequentially on a single thread.

What breaks:

lsp-mode receives JSON from language server → parses on main thread → blocks redisplay
magit runs git process → reads output → blocks typing
Font-lock fontifies large file → blocks everything
GC runs → everything freezes (50-200ms pauses)

What neomacs can do: True multi-threaded Elisp:

Render thread (already done in neomacs)
Process I/O thread — read subprocess output without blocking Lisp
Parallel fontification — font-lock on background thread
Concurrent GC — no stop-the-world pauses
Parallel Lisp execution — multiple Lisp threads with proper synchronization

2. CPU-Only Rendering

The problem: Emacs draws every glyph via CPU → X11/Cairo. The GPU sits completely idle. A modern GPU has ~10,000 shader cores designed for exactly this workload (rendering thousands of small textured quads).

What neomacs already does well: wgpu rendering, glyph atlas caching, batched instanced drawing, shader effects. This is largely solved.

What could be better:

GPU-side glyph rasterization (currently cosmic-text rasterizes on CPU, uploads to GPU)
Compute shader for glyph positioning (minor — CPU layout is fast enough)

3. Persistent Pixel Buffers vs Immediate Mode

The problem: Official Emacs backends (X11, GTK) use retained mode — draw once to a pixel buffer, only redraw changed regions. This was optimal for 1990s X11 (network-transparent, slow connections). But modern GPUs prefer immediate mode — clear and redraw everything each frame. GPU rendering is so fast that the complexity of tracking dirty regions isn't worth it.

Official Emacs (retained):
  Frame 1: Draw everything to pixmap
  Frame 2: Diff desired vs current → redraw 3 changed rows
  Frame 3: Diff → redraw 1 changed row
  Cost: O(changed) per frame, but complex bookkeeping (~7k LOC in dispnew.c)

Neomacs (immediate):
  Frame 1: Clear → draw everything
  Frame 2: Clear → draw everything
  Frame 3: Clear → draw everything
  Cost: O(total) per frame, but trivial code and GPU handles it in <1ms

What neomacs does well: Already uses immediate mode. The entire dispnew.c matrix diffing system (~7.6k LOC) becomes unnecessary.

4. Character-Cell Thinking

The problem: Emacs was designed for fixed-width character terminals. Internally, many things are measured in "columns" not pixels:

current-column returns column number (character count)
move-to-column moves to column N
window-width returns width in characters
Tab stops are character-aligned
Truncation/wrapping decisions use column counts
hscroll is measured in columns

This breaks with:

Variable-width fonts — column N isn't at a fixed pixel position
Ligatures — "fi" is one glyph, not two columns
Emoji — may be wider than 2 columns
CJK — 2-column wide but pixel width varies with font

What neomacs can do: Pixel-accurate layout from the start. cosmic-text handles variable-width, ligatures, and complex scripts natively. Column-based Lisp functions (current-column, etc.) become thin wrappers that convert pixel positions back to approximate column numbers for compatibility.

5. No Native Text Shaping

The problem: Emacs bolted on HarfBuzz support (Emacs 27+), but it's not deeply integrated:

Shaping runs per-glyph-string (a run of same-face characters), not per-paragraph
No native ligature support across face boundaries
Composition is handled by a separate, older system (composite.c)
Font fallback is per-character, not per-cluster
No support for OpenType features (stylistic alternates, contextual forms, etc.) without manual configuration

What neomacs can do: cosmic-text + rustybuzz provide a modern text shaping pipeline:

Full OpenType shaping (ligatures, kerning, mark positioning)
Per-paragraph shaping (correct for Arabic/Devanagari)
Automatic font fallback per-cluster
All OpenType features accessible

6. Synchronous Process I/O

The problem: When a subprocess (LSP server, git, grep) produces output, Emacs reads it in the main thread via process-filter. This blocks everything:

User types key → command runs → spawns process → reads output →
  BLOCKS (waiting for data) → process filter runs → parses JSON →
  BLOCKS (parsing) → updates buffer → redisplay → finally responds to user

With LSP, this is catastrophic. A 100ms language server response means 100ms of frozen UI.

What neomacs can do:

Process I/O on dedicated threads
Non-blocking process filters that accumulate output
Parse JSON/protocol data on background thread
Only deliver parsed results to Lisp thread via message queue

7. Stop-the-World Garbage Collection

The problem: Emacs uses a mark-and-sweep GC that freezes everything:

GC trigger → mark all live objects (traverse entire heap) →
  sweep dead objects → compact → resume

Duration: 50-200ms for large sessions (thousands of buffers, overlays)

During GC, no input processing, no redisplay, no process I/O. Users experience visible stuttering.

What neomacs can do:

Incremental GC — mark a few objects per Lisp instruction, spread cost over time
Generational GC — young objects die fast, only scan nursery most of the time
Concurrent GC — mark phase runs on background thread
Arena allocation — temporary objects in per-command arenas, freed in bulk

8. Internal String Encoding (Not UTF-8)

The problem: Emacs uses its own multibyte encoding ("emacs-mule" / internal encoding), not UTF-8. Every interaction with the outside world requires conversion:

File on disk (UTF-8) → decode to Emacs internal → process → encode to UTF-8 → write
Subprocess output (UTF-8) → decode → process → encode → send

This is slow (constant encoding/decoding overhead), complex (FETCH_MULTIBYTE_CHAR is not a simple byte read), and incompatible (every Rust/C library expects UTF-8).

What neomacs can do: Use UTF-8 internally. Eliminates all encoding/decoding overhead and makes FFI with Rust trivial (Rust strings are UTF-8 natively).

9. Gap Buffer

The problem: Emacs uses a gap buffer — a contiguous array with a "gap" at the editing point:

Buffer: [H][e][l][l][o][___GAP___][w][o][r][l][d]
                        ↑ cursor
Insert 'X': just put it in the gap, O(1)
Move cursor far away: move the gap = O(gap_distance), memmove

Good for sequential editing at one point. Bad for:

Multiple cursors — need to move gap for each cursor, O(n) per cursor
Large files — gap move = memmove of potentially megabytes
Cache locality — gap fragments the text, cache misses when reading across gap
Concurrent access — can't read text while gap is moving (thread-unsafe)

What neomacs can do:

Piece table (used by VS Code) — O(log n) insert/delete, immutable original text
Rope (used by Xi editor, Zed) — O(log n) everything, cache-friendly, naturally supports multiple cursors and concurrent reads
Hybrid — keep gap buffer for compatibility, add parallel rope index for layout engine reads

10. Redisplay Blocks Lisp

The problem: redisplay_internal() runs synchronously. While redisplay computes layout, no Lisp can execute. For complex buffers (org-mode with many overlays, magit with thousands of diff lines), redisplay can take 50-100ms. The UI feels sluggish.

What neomacs can do: With the Rust display engine:

Layout the visible region first (above-the-fold rendering)
Send partial results to GPU immediately
Continue layout for offscreen regions asynchronously
Lisp can resume execution while offscreen layout completes

11. No SIMD / Vectorization

The problem: Text processing (searching, encoding, column counting, whitespace detection) is done byte-by-byte in scalar C code. Modern CPUs have SIMD instructions (AVX2, NEON) that process 32-64 bytes simultaneously.

Scalar (Emacs):     for each byte: if byte == '\n' count++    → ~1 byte/cycle
SIMD (possible):    load 32 bytes → compare all → popcount    → ~32 bytes/cycle

What neomacs can do: Rust has excellent SIMD support via memchr crate (10-30x faster byte search), SIMD-accelerated UTF-8 validation (simdutf), SIMD line counting and column calculation.

12. Lisp Object Memory Layout

The problem: Emacs Lisp objects are tagged pointers scattered across the heap. A cons cell, a string, and a symbol may be far apart in memory. Iterating over a list causes cache misses on every CDR:

Memory:
  [cons1] ... 4KB gap ... [cons2] ... 8KB gap ... [cons3]

  Each CDR → cache miss → ~100ns stall

Modern CPUs can do 10 billion operations/second but only 10 million cache misses/second. Memory layout matters more than instruction count.

What neomacs can do:

Arena-allocated lists — cons cells for the same list in contiguous memory
Struct-of-arrays for frequently iterated data (face attributes, glyph properties)
Small-string optimization — short strings inline in the Lisp_Object, no pointer chase
Bump allocation for temporary objects (per-command arena)

Redesign Priority Order

Priority	Area	Neomacs Status	Impact
1	Multi-threaded Lisp	Planned	Eliminates root cause of all UI freezing
2	GPU rendering	Done	120fps, shader effects, animations
3	Rust display engine	In progress (this doc)	Unified layout+render, TUI backend
4	Async process I/O	Planned	LSP/subprocess won't block
5	Concurrent/incremental GC	Planned	Eliminates GC pauses
6	UTF-8 internal encoding	Planned	Zero conversion overhead, trivial Rust FFI
7	SIMD text processing	Easy wins available	10-30x faster search/scan
8	Rope/piece table	Future	Multi-cursor, large files
9	Cache-friendly memory	Future	Requires Lisp runtime redesign

Rust Display Engine (Strategy 4)

Replace Emacs's C display engine (xdisp.c, dispnew.c, ~40k LOC) with a Rust layout engine that reads buffer data directly and produces GPU-ready glyph batches.

How the Official Emacs Display Engine Works

Understanding what we're replacing. The Emacs display engine is a two-phase system:

Phase 1: BUILD desired_matrix    (xdisp.c — "what should be on screen")
Phase 2: UPDATE current_matrix   (dispnew.c — "make the screen match")

Each window has TWO glyph matrices:

desired_matrix — what redisplay WANTS to show (rebuilt each cycle)
current_matrix — what's ACTUALLY on screen (updated incrementally)

The update phase diffs them and only redraws what changed — like a virtual DOM diff.

Entry Point: `redisplay_internal()` (xdisp.c:17360)

Called after every command, timer, process output, or explicit (redisplay):

redisplay_internal()
  ├─ Check if redisplay is needed (windows_or_buffers_changed?)
  ├─ Run pre-redisplay-function (Lisp hook)
  ├─ prepare_menu_bars()
  ├─ For each visible frame:
  │    └─ For each window in frame's window tree:
  │         └─ redisplay_window(w)
  ├─ Run post-redisplay hooks
  └─ Set windows_or_buffers_changed = 0

Per-Window: `redisplay_window()` (~2000 lines)

Decides what to display. Tries optimizations first, falls back to full layout:

redisplay_window(w)
  ├─ Check if window-start is still valid
  ├─ Try optimization: try_window_reusing_current_matrix()
  │    └─ If buffer unchanged, try to reuse existing rows (scroll optimization)
  ├─ Try optimization: try_window_id()
  │    └─ Find minimal diff between current and desired (incremental update)
  ├─ Fallback: try_window()
  │    └─ Full layout from window-start to bottom of window
  ├─ If point not visible → adjust window-start → retry (2-5 passes)
  ├─ Display mode-line, header-line, tab-line
  └─ Set w->window_end_pos, w->window_end_vpos, w->cursor

The Core: `display_line()` (~500 lines)

Generates ONE visual row (glyph row) from buffer content:

display_line(it)
  ├─ prepare_desired_row()  — clear row
  ├─ LOOP:
  │    ├─ get_next_display_element(it)  — fetch next thing to display
  │    ├─ PRODUCE_GLYPHS(it)           — compute metrics, append glyph
  │    ├─ Check: line full? (wrapping/truncation decision)
  │    ├─ Check: reached end of window?
  │    └─ set_iterator_to_next(it)     — advance position
  ├─ Handle end-of-line: padding, continuation/truncation glyphs
  └─ Compute row metrics: ascent, descent, height

The Display Iterator (`struct it`)

The central abstraction. A state machine with ~550 fields that walks through display content:

struct it {
    // Where we are
    struct display_pos current;     // Buffer/string position
    ptrdiff_t stop_charpos;         // Next position to check for property changes

    // What we're iterating over
    enum it_method method;          // GET_FROM_BUFFER, GET_FROM_STRING,
                                    // GET_FROM_IMAGE, GET_FROM_STRETCH, etc.

    // What we found
    enum display_element_type what; // IT_CHARACTER, IT_IMAGE, IT_COMPOSITION,
                                    // IT_STRETCH, IT_GLYPHLESS, etc.
    int c;                          // Character code (if IT_CHARACTER)
    int face_id;                    // Resolved face for current element

    // Metrics of current element
    int pixel_width;
    int ascent, descent;
    int current_x, current_y;

    // Nested context stack (for display properties in overlay strings, etc.)
    struct iterator_stack_entry stack[5];  // IT_STACK_SIZE = 5
    int sp;                               // Stack pointer

    // Bidi state
    struct bidi_it bidi_it;

    // Display state
    int hscroll;
    enum line_wrap_method line_wrap;  // TRUNCATE, WORD_WRAP, WINDOW_WRAP

    // ... ~500 more fields
};

Iterator Methods (9 types)

The iterator is polymorphic — method determines how to get the next element:

Method	Source	Used for
`GET_FROM_BUFFER`	Buffer text	Normal text editing
`GET_FROM_STRING`	Lisp string	Overlay strings, display strings
`GET_FROM_C_STRING`	C string	Mode-line format results
`GET_FROM_IMAGE`	Image spec	`(image ...)` display property
`GET_FROM_STRETCH`	Space spec	`(space ...)` display property
`GET_FROM_XWIDGET`	Xwidget	Embedded widgets
`GET_FROM_DISPLAY_VECTOR`	Display table	Character display overrides
`GET_FROM_VIDEO`	Video spec	`(video ...)` (neomacs extension)
`GET_FROM_WEBKIT`	WebKit spec	`(webkit ...)` (neomacs extension)

The Property Check Pipeline

At each stop_charpos, the iterator runs property handlers in sequence:

handle_stop(it)
  ├─ handle_fontified_prop()     — trigger jit-lock if text not fontified
  │    └─ Call fontification-functions (Lisp callback!)
  ├─ handle_face_prop()          — resolve face at position
  │    └─ face_at_buffer_position()
  │         ├─ Get text property 'face
  │         ├─ Get ALL overlay 'face properties (sorted by priority)
  │         └─ Merge into single realized face ID
  ├─ handle_display_prop()       — check for display overrides
  │    └─ handle_single_display_spec()  (634 lines, 14 spec types)
  │         ├─ String → push_it(), switch to GET_FROM_STRING
  │         ├─ Image → switch to GET_FROM_IMAGE
  │         ├─ Space → switch to GET_FROM_STRETCH
  │         ├─ Fringe bitmap → record for fringe drawing
  │         ├─ Margin spec → redirect to margin area
  │         └─ ... 14 types total, composable in lists/vectors
  ├─ handle_invisible_prop()     — skip invisible text
  │    └─ advance past invisible region, handle ellipsis
  └─ handle_overlay_change()     — load overlay strings
       └─ load_overlay_strings()
            ├─ Scan overlays at position (itree)
            ├─ Collect before-strings and after-strings
            ├─ Sort by priority
            └─ push_it(), switch to GET_FROM_STRING

The Iterator Stack (5 levels)

When a display property replaces text with a string, the iterator pushes its current state and switches to iterating the string. When the string is exhausted, it pops back:

Buffer text: "Hello [OVERLAY] world"
                     ↓
            overlay has before-string: ">>>"
            ">>>" has display property: (image :file "arrow.png")

Iterator state stack:
  Level 0: GET_FROM_BUFFER  (buffer text)
  Level 1: GET_FROM_STRING  (overlay before-string ">>>")
  Level 2: GET_FROM_IMAGE   (image in display property)

  Pop level 2 → back to ">>>" string
  Pop level 1 → back to buffer text at "world"

Maximum nesting: 5 levels (IT_STACK_SIZE).

Glyph Production

PRODUCE_GLYPHS(it) → gui_produce_glyphs(it) computes metrics and appends a glyph:

gui_produce_glyphs(it)
  ├─ IT_CHARACTER:
  │    ├─ Get font metrics (ascent, descent, width) from face->font
  │    ├─ Handle special cases: tab, newline, control chars
  │    └─ append_glyph() → add CHAR_GLYPH to row
  ├─ IT_COMPOSITION:
  │    ├─ Get composed glyph metrics from composition table
  │    └─ append_composite_glyph() → add COMPOSITE_GLYPH
  ├─ IT_IMAGE:
  │    ├─ Compute image dimensions (scaling, slicing)
  │    └─ append_glyph() → add IMAGE_GLYPH
  ├─ IT_STRETCH:
  │    ├─ Compute space width (calc_pixel_width_or_height)
  │    └─ append_stretch_glyph() → add STRETCH_GLYPH
  └─ IT_GLYPHLESS:
       ├─ Missing glyph → show as hex code or empty box
       └─ append_glyphless_glyph()

Glyph Structure

Each glyph in the matrix:

struct glyph {
    // What to display
    unsigned type : 3;          // CHAR_GLYPH, IMAGE_GLYPH, STRETCH_GLYPH, etc.
    union {
        unsigned ch;            // Character code (CHAR_GLYPH)
        struct { int id; } cmp; // Composition ID
        int img_id;             // Image ID
    } u;

    // How to display it
    unsigned face_id : 20;      // Index into face cache

    // Layout metrics
    short pixel_width;
    short ascent, descent;
    short voffset;              // Vertical offset (for 'raise' property)

    // Buffer position (for cursor, mouse, etc.)
    ptrdiff_t charpos;
    Lisp_Object object;         // Source: buffer, string, or nil

    // Bidi
    unsigned resolved_level : 7;
    unsigned bidi_type : 3;

    // Flags
    bool padding_p;
    bool left_box_line_p, right_box_line_p;
    bool overlaps_vertically_p;
};

Glyph Row

One visual line:

struct glyph_row {
    struct glyph *glyphs[3];    // LEFT_MARGIN_AREA, TEXT_AREA, RIGHT_MARGIN_AREA
    short used[3];              // Glyph count per area

    int x, y;                   // Position in window
    int pixel_width;
    int ascent, descent, height;

    // Buffer range this row displays
    struct text_pos start, end;
    struct text_pos minpos, maxpos;  // Bidi-aware min/max

    // Flags
    bool enabled_p;
    bool displays_text_p;
    bool mode_line_p, header_line_p, tab_line_p;
    bool truncated_on_left_p, truncated_on_right_p;
    bool reversed_p;            // RTL paragraph

    // Fringe
    int left_fringe_bitmap, right_fringe_bitmap;
};

Phase 2: Updating the Screen (dispnew.c)

After Phase 1 fills desired_matrix, Phase 2 makes the screen match:

update_window(w)
  ├─ gui_update_window_begin()      — save cursor, setup
  ├─ Optimization: scrolling_window()
  │    └─ Try to reuse rows by scrolling (copy rows up/down)
  ├─ For each row in desired_matrix:
  │    ├─ Compare with current_matrix row
  │    ├─ If different: update_window_line()
  │    │    ├─ update_text_area()   — draw changed glyphs
  │    │    │    └─ Calls rif->write_glyphs() → backend draws
  │    │    └─ Update marginal areas
  │    └─ If same: skip (no redraw needed)
  ├─ Copy desired_matrix → current_matrix
  ├─ set_window_cursor_after_update()
  ├─ gui_update_window_end()
  │    ├─ Draw cursor via rif->draw_window_cursor()
  │    └─ Draw borders via rif->draw_vertical_window_border()
  └─ rif->flush_display()          — commit to screen

The Backend Interface (`struct redisplay_interface`)

The display engine talks to backends (X11, GTK, neomacs, etc.) through function pointers:

struct redisplay_interface {
    void (*produce_glyphs)(struct it *);
    void (*write_glyphs)(struct window *, struct glyph_row *, ...);
    void (*update_window_begin_hook)(struct window *);
    void (*update_window_end_hook)(struct window *, ...);
    void (*draw_glyph_string)(struct glyph_string *);
    void (*draw_window_cursor)(struct window *, struct glyph_row *, ...);
    void (*draw_vertical_window_border)(struct window *, int, int, int);
    void (*draw_fringe_bitmap)(struct window *, struct glyph_row *, ...);
    void (*clear_frame_area)(struct frame *, int, int, int, int);
    void (*scroll_run_hook)(struct window *, struct run *);
    void (*flush_display)(struct frame *);
};

In neomacs, this is neomacs_redisplay_interface (neomacsterm.c:160). Neomacs ignores most of these hooks and instead walks current_matrix directly in neomacs_extract_full_frame().

Face Resolution Pipeline

Text at position P with overlays O1 (priority=10), O2 (priority=20):

1. Start with DEFAULT_FACE_ID
2. Merge text property 'face at P       → attrs[]
3. Merge O1's 'face (lower priority)    → attrs[]
4. Merge O2's 'face (higher priority)   → attrs[]
5. Lookup/create realized face from attrs[] → face_id

face_at_buffer_position(w, P, ...)
  → Returns single face_id with all attributes merged

Face attributes (from struct face):

foreground, background (pixel colors)
font (struct font — metrics, shaping)
underline (style: single/wave/double/dots/dashes, color)
overline, strike_through (bool + color)
box (type: none/line/3D, width, corner_radius, color)
bold, italic (via font selection)

Window-Start Computation

The hardest part of redisplay — deciding which part of the buffer to show:

redisplay_window(w)
  ├─ Is current w->start still good?
  │    └─ try_window(w->start)
  │         ├─ Layout from w->start to bottom
  │         ├─ Is point visible? → YES: done
  │         └─ Is point in scroll margin? → return -1 (need adjustment)
  │
  ├─ Point not visible → compute new window-start:
  │    ├─ scroll_conservatively > 0?
  │    │    └─ Try small adjustments to show point
  │    ├─ scroll_step > 0?
  │    │    └─ Scroll by scroll_step lines
  │    └─ Fallback: center point in window
  │         └─ move_it_vertically_backward() to find start
  │
  └─ Set w->start = new_start, retry layout

This can take 2-5 layout passes per frame.

Mode-Line Rendering

display_mode_line(w)
  ├─ format_mode_line(mode_line_format)  — evaluate Lisp format
  │    └─ Recursive: handles %b (buffer name), %l (line), %c (column),
  │       conditionals, Lisp expressions, etc.
  │    └─ Returns: Lisp string with text properties (faces)
  ├─ init_iterator() for mode-line row
  ├─ display_string() — layout the formatted string
  └─ Set w->mode_line_height from row height

Key Design Principles of the Official Engine

Incremental update: Only redraw what changed (desired vs current matrix diff).
Lazy computation: Don't compute until needed. jit-lock fontifies on demand. window-end only computed when queried.
Property-change-driven scanning: The iterator jumps between stop_charpos positions where properties change, skipping unchanged regions.
Stack-based nesting: Display properties, overlay strings, and display strings nest via a 5-level stack, avoiding recursion.
Backend abstraction: Drawing is separated from layout via redisplay_interface function pointers.
Persistent pixel buffers: Official backends (X11, GTK) draw once and persist. Only changed regions are redrawn. (Neomacs differs — it clears and rebuilds every frame.)

Why It's Complex (~40k LOC)

xdisp.c alone is 39,660 lines because it handles every edge case: bidi, 14 display spec types, 5-level nesting, overlay string interleaving, incremental matrix optimization, scroll heuristics, 3-tier window-start computation, mode-line format evaluation, fringe bitmaps, margin areas, tab/header lines, minibuffer resize, mouse highlighting, and glyphless character rendering.
Much of this complexity is optimization (reusing matrices, scrolling rows, diffing) that neomacs doesn't need since GPU clears and rebuilds every frame.
Another chunk is backend-specific code (X11/GTK drawing) that a Rust engine replaces entirely.

Current Architecture

Emacs redisplay (C, xdisp.c ~30k LOC)
  → glyph matrices (current_matrix)
    → neomacsterm.c extracts glyphs (FFI boundary)
      → FrameGlyphBuffer sent to Rust via crossbeam
        → wgpu renders

Pain Points

Double work: Emacs builds a full CPU-side glyph matrix, then we serialize it again into FrameGlyphBuffer.
FFI friction: Every new feature (cursors, borders, images, animations) requires C-side extraction code + Rust-side handling.
CPU-centric design: Emacs redisplay was designed for XDrawString / terminal cells, not GPU batched rendering.
No GPU awareness: Layout doesn't know about GPU capabilities (instancing, atlas caching, compute shaders).

Proposed Architecture

IMPORTANT: Layout must run on the Emacs thread, not the render thread. See Critical Design Constraint: Synchronization for why.

Emacs buffer/overlay/face data
  → Rust layout engine (called on Emacs thread during redisplay)
    → LayoutOutput (backend-agnostic glyph positions)
      → sent to render thread via crossbeam
        → wgpu renders (GPU) or TuiRenderer (terminal)

Layout results (window-end, cursor position, visibility) are also fed back to Emacs so that Lisp query functions (pos-visible-in-window-p, window-end, etc.) continue to work.

Critical Design Constraint: Synchronization

Many Emacs Lisp functions require synchronous access to layout results. These aren't obscure — they're called on every keystroke by popular packages:

Function	Who calls it	What it does
`pos-visible-in-window-p`	posframe, company, corfu, vertico	Runs the display iterator from scratch
`window-end`	helm, ivy, magit, every scroll package	Runs `start_display()` + `move_it_vertically()`
`vertical-motion`	`forward-line`, `recenter`, all movement	Runs the full iterator with wrapping
`posn-at-point` / `posn-at-x-y`	Mouse tracking, popup positioning, tooltips	Queries glyph matrix
`move-to-column`	Indentation, `kill-rectangle`, column movement	Scans line with tab/display prop handling
`compute-motion`	Low-level motion, `goto-line`	Full layout computation

Additionally, fontification runs DURING layout. jit-lock calls fontification-functions as the iterator advances through unfontified text. This is Lisp code that must execute on the Emacs thread. Layout cannot be purely on the render thread — it calls back into Lisp.

Consequence: The Rust layout engine must run on the Emacs thread (called during redisplay_internal()). Layout results are then sent to the render thread for rendering. Parallel layout (layout on render thread while Emacs executes Lisp) is a future optimization (Phase 8+), not a starting point.

Synchronous Lisp Function Categories

Category A — Read cached state (safe to defer): window-start, window-hscroll, window-vscroll, window-body-width, window-body-height, window-text-width, window-text-height, coordinates-in-window-p

Category B — Read current matrix (needs up-to-date layout): window-line-height, window-lines-pixel-dimensions, posn-at-x-y

Category C — Trigger layout computation (CRITICAL — blocks Lisp): window-end(update=t), pos-visible-in-window-p, vertical-motion, move-to-column, compute-motion, move-to-window-line, recenter

Category D — Modify state and trigger redisplay: set-window-start, set-window-hscroll, set-window-vscroll, scroll-up/scroll-down, scroll-left/scroll-right

Category E — Hooks that expect synchronous state: pre-redisplay-function (runs INSIDE redisplay_internal()), fontification-functions (runs DURING iterator scanning), window-scroll-functions, post-command-hook

Display Engine API Contract

The display engine sits between the Emacs C core and the rendering backend. Understanding exactly what flows in each direction is critical — the Rust replacement must satisfy this contract.

            EMACS C CORE PROVIDES                    DISPLAY ENGINE PROVIDES BACK
            ═══════════════════                      ═══════════════════════════

  Buffer text (gap buffer) ──────┐         ┌──── w->window_end_pos/vpos/valid
  Text properties (face,         │         │     w->cursor (hpos, vpos, x, y)
    display, invisible,          │         │     w->phys_cursor (+ width/height)
    fontified, composition) ─────┤         │     w->mode_line_height
  Overlays (face, before/after   │         │       /header_line_height
    string, invisible,           │    ┌────┴──┐    /tab_line_height
    priority, window) ───────────┼───►│DISPLAY│───►w->current_matrix (glyph rows)
  Face cache (colors, fonts,     │    │ENGINE │   windows_or_buffers_changed
    decorations) ────────────────┤    └────┬──┘   f->cursor_type_changed
  Window state (start, hscroll,  │         │
    pointm, dimensions) ─────────┤         ├──── pos-visible-in-window-p()
  Buffer-local vars (truncate,   │         │     window-end()
    word-wrap, tab-width,        │         │     vertical-motion()
    bidi, line-spacing) ─────────┤         │     move-to-column()
  Global vars (scroll-margin,    │         │     compute-motion()
    scroll-step, display-        │         │     posn-at-point/x-y()
    line-numbers) ───────────────┘         │     current-column()
                                           │     line-pixel-height()
  fontification-functions ◄────────────────┘     format-mode-line()
  (callback DURING layout)                       (Lisp query functions)

Inputs: What Emacs C Core Provides to the Display Engine

Buffer Data

Input	How accessed	Purpose
Buffer text	`BYTE_POS_ADDR`, `FETCH_MULTIBYTE_CHAR`	Raw characters to display
Narrowing bounds	`BEGV`, `ZV`	Visible region of buffer
Point position	`PT`	Where cursor goes
Multibyte flag	`BVAR(buf, enable_multibyte_characters)`	Character encoding mode

Text Properties (at each position)

Property	Purpose
`face`	Text styling (colors, bold, italic, underline, etc.)
`display`	Override rendering (images, strings, spaces, margins, fringes)
`invisible`	Hide text or show ellipsis
`fontified`	Triggers lazy fontification via `fontification-functions`
`composition`	Ligatures, combining characters, emoji sequences
`mouse-face`	Hover highlighting
`line-height`	Override line height
`raise`	Vertical offset

Overlays (at each position)

Property	Purpose
`face`	Overlay face (merged with text property face, priority-ordered)
`display`	Display overrides
`before-string` / `after-string`	Inserted strings with their own properties
`invisible`	Hide overlay region
`priority`	Stacking order for face merging and string ordering
`window`	Per-window overlay filtering

Face Data

Input	Purpose
`face_at_buffer_position()`	Merged face at position (text prop + all overlays)
`FACE_FROM_ID(f, id)`	Resolve face ID to colors, font, decorations
`face_for_char()`	Font fallback for specific character (fontset system)
`merge_faces()`	Combine control-char face with base face

Window State

Field	Purpose
`w->start`	First visible buffer position (marker)
`w->pointm`	Window's copy of point (for non-selected windows)
`w->hscroll`, `w->min_hscroll`	Horizontal scroll offset
`WINDOW_PIXEL_WIDTH/HEIGHT(w)`	Window dimensions in pixels
`w->contents`	Which buffer is displayed
Display table	`window_display_table(w)` — character display overrides

Buffer-Local Variables

Variable	Purpose
`truncate-lines`	Truncate long lines vs wrap
`word-wrap`	Word wrapping mode
`tab-width`	Tab character width in columns
`line-spacing` / `extra-line-spacing`	Extra vertical space between lines
`selective-display`	Hide lines by indentation level
`ctl-arrow`	Display control chars as `^X` vs `\NNN`
`bidi-display-reordering`	Enable bidirectional text reordering
`bidi-paragraph-direction`	Force paragraph direction (left-to-right / right-to-left)

Global Variables

Variable	Purpose
`scroll-margin`	Lines to keep visible around point
`scroll-conservatively`	How aggressively to scroll
`scroll-step`	Lines to scroll at a time
`hscroll-margin` / `hscroll-step`	Horizontal scroll parameters
`truncate-partial-width-windows`	Truncate in narrow windows
`display-line-numbers`	Line number display mode (absolute/relative/visual)
`display-line-numbers-width`	Width of line number column
`nobreak-char-display`	Display non-breaking spaces/hyphens
`auto-composition-mode`	Enable automatic character composition
`maximum-scroll-margin`	Cap on scroll margin

Outputs: What the Display Engine Must Provide Back

This is the contract the Rust replacement must satisfy.

Window Fields (set during redisplay, read by Emacs)

Field	Type	Who reads it	Purpose
`w->window_end_pos`	`ptrdiff_t`	`window-end` (window.c)	Buffer position of last visible char (as `Z - end_charpos`)
`w->window_end_vpos`	`int`	window.c	Matrix row number of last visible char
`w->window_end_valid`	`bool`	`window-end`, `window-line-height`	Guard: are end fields valid? Many functions check this first.
`w->cursor`	`{x, y, hpos, vpos}`	Cursor drawing, movement commands	Intended cursor position in matrix coordinates
`w->phys_cursor`	`{x, y, hpos, vpos}`	neomacsterm.c, window.c	Actual cursor pixel position (window-relative)
`w->phys_cursor_type`	`enum`	window.c	Current cursor style: box(0), bar(1), hbar(2), hollow(3)
`w->phys_cursor_width`	`int`	neomacsterm.c, window.c	Cursor width in pixels
`w->phys_cursor_height`	`int`	neomacsterm.c, window.c	Cursor height in pixels
`w->phys_cursor_ascent`	`int`	window.c	Cursor ascent in pixels
`w->phys_cursor_on_p`	`bool`	window.c, dispnew.c	Is cursor currently being displayed?
`w->mode_line_height`	`int`	`window-mode-line-height`, layout	Mode-line pixel height (-1 if unknown)
`w->header_line_height`	`int`	`window-header-line-height`	Header-line pixel height (-1 if unknown)
`w->tab_line_height`	`int`	`window-tab-line-height`	Tab-line pixel height (-1 if unknown)
`w->last_cursor_vpos`	`int`	xdisp.c	Previous frame's cursor vpos (detects cursor movement)

Frame Fields

Field	Purpose
`f->cursor_type_changed`	Signals cursor needs redraw (set true on change, cleared after draw)
`f->garbaged`	Cleared after full redraw
`f->updated_p`	Frame was updated this redisplay cycle

Global Variables

Variable	Purpose
`windows_or_buffers_changed`	Dirty flag: 0 = all cached display data is fresh. Non-zero = needs redisplay.
`update_mode_lines`	Mode-line dirty flag
`redisplaying_p`	Re-entrancy guard: true while redisplay is running

Current Matrix (`w->current_matrix`)

The glyph matrix is read by code OUTSIDE the display engine:

Consumer	What it reads	Why
`window-line-height` (window.c)	`MATRIX_ROW()`, row height/ascent	Return line dimensions to Lisp
`window-cursor-info` (window.c)	Row at `phys_cursor.vpos` → glyph	Get glyph under cursor for width/height
`pos-visible-in-window-p` (window.c)	Calls `pos_visible_p()` which runs iterator	Check if position is visible
neomacsterm.c	Entire matrix: all rows, all glyphs, all areas	Extract for GPU rendering

Layout Query Functions (must be implemented by the display engine)

Lisp Function	What it computes	How it works
`pos-visible-in-window-p`	Is buffer position visible?	Runs display iterator from `w->start`
`window-end`	Last visible buffer position	Reads `window_end_pos` or recomputes via iterator
`vertical-motion`	Move N visual lines	Runs iterator with wrapping/truncation
`move-to-column`	Move to column N	Scans line handling tabs, display props, wide chars
`compute-motion`	Position after hypothetical motion	Full layout computation (7 parameters)
`posn-at-point`	Pixel position of point	Calls `pos-visible-in-window-p` internally
`posn-at-x-y`	Buffer position at pixel coords	Queries glyph matrix rows and glyphs
`current-column`	Column number at point	Scans from line start
`line-pixel-height`	Pixel height of current line	Runs iterator for one line
`format-mode-line`	Rendered mode-line text	Evaluates Lisp format specs, returns text with properties

Callbacks During Layout

Callback	When	Direction
`fontification-functions`	Iterator reaches unfontified text	Display engine → Lisp (pauses layout, calls Lisp, resumes)
`pre-redisplay-function`	Start of `redisplay_internal()`	Display engine → Lisp
`window-scroll-functions`	After window scroll	Display engine → Lisp
`redisplay_interface` hooks	After layout, during drawing	Display engine → rendering backend

What We Keep vs Replace

Keep (Emacs C/Lisp)

Buffer management (gap buffer, undo, markers)
Text properties and overlays (Lisp-level API)
Window tree management (splits, sizing)
Face definitions and merging (Lisp-level defface)
Fontset/font selection logic
Mode-line format evaluation (format-mode-line)

Replace (Rust)

The display iterator (struct it — 550+ fields state machine in xdisp.c)
Line layout (display_line() — wrapping, truncation, alignment)
Glyph production (produce_glyphs() — metrics, positioning)
Matrix management (dispnew.c — desired/current diff)
The extraction layer (neomacsterm.c glyph walking)

Reading Lisp Data from Rust

Rust needs access to:

Data	Where it lives	Access pattern
Buffer text	Gap buffer (`BUF_BYTE_ADDRESS`)	Contiguous reads around gap
Text properties	Interval tree on buffer	Walk intervals for face/display/invisible
Overlays	Sorted linked lists on buffer	Scan overlays at each position
Faces	`face_cache->faces_by_id[]` on frame	Lookup by ID, merge multiple
Window geometry	`struct window` fields	Read pixel_left/top/width/height
Window-start	Marker on window	Read marker position
Font metrics	`struct font` on face	Ascent, descent, average width

Approach A: Snapshot (recommended, start here)

At update_end, C serializes a layout snapshot to Rust:

struct LayoutSnapshot {
    windows: Vec<WindowSnapshot>,
}

struct WindowSnapshot {
    id: i64,
    bounds: Rect,
    buffer: BufferSnapshot,
    window_start: usize,
    hscroll: i32,
    selected: bool,
}

struct BufferSnapshot {
    text: Vec<u8>,                      // Full buffer text (no gap)
    intervals: Vec<PropertyInterval>,   // Text property spans
    overlays: Vec<OverlaySpan>,         // Active overlays
}

struct PropertyInterval {
    start: usize,
    end: usize,
    face_id: Option<u32>,
    display: Option<DisplaySpec>,
    invisible: bool,
}

Pros: Clean Rust ownership, no unsafe, thread-safe. Cons: Copies all buffer text every frame (~0.1ms for 1MB buffer).

Approach B: Shared-memory / FFI read (optimize later)

Rust reads Emacs data structures directly via unsafe FFI pointers:

unsafe fn read_buffer_char(buf: *const EmacsBuffer, pos: usize) -> char {
    // Handle gap buffer directly
}

unsafe fn get_face(frame: *const EmacsFrame, face_id: u32) -> &Face {
    // Read from face_cache->faces_by_id[face_id]
}

Pros: Zero-copy, instant access. Cons: Extremely unsafe, must synchronize with Emacs thread, Emacs struct layout changes break everything.

Recommendation: Start with Approach A, optimize to B later for large buffers. The snapshot cost is negligible for typical buffer sizes (<100KB).

Phased Implementation

Phase -1: Direct Glyph Hook (Low-Risk Stepping Stone)

Before the full rewrite, eliminate the matrix intermediary without changing the layout engine. Keep xdisp.c but hook into produce_glyphs / append_glyph so it calls into Rust directly as glyphs are generated, instead of extracting from current_matrix after the fact.

CURRENT:   xdisp.c → glyph matrix → neomacsterm.c extracts → FFI → Rust renders
PHASE -1:  xdisp.c → calls Rust directly during produce_glyphs → Rust renders

Benefits:

Keeps 100% compatibility (same layout algorithm, all Lisp functions work)
Eliminates the matrix → extraction → FFI overhead
Still gets Rust rendering, cosmic-text, GPU batching
Doesn't break any packages
Much lower risk (~2000 LOC changes)
Proves the architecture before committing to the full rewrite

Does not enable: Pixel-level scrolling, parallel layout, TUI backend (those require the full rewrite).

Scope: ~2000 LOC. Difficulty: Medium. Risk: Low.

Phase 0: Layout Snapshot Infrastructure

Add C function neomacs_build_layout_snapshot() that serializes buffer text + property intervals + overlays into a flat buffer. Send snapshot to Rust alongside (or replacing) FrameGlyphBuffer. Rust ignores snapshot initially, still uses old glyph path.

Note: Display property serialization is complex — 14 display spec types with composability (lists/vectors of specs), overlay strings with their own properties, 5-level nesting.

Scope: ~1000 C, ~500 Rust. Difficulty: Medium-Hard.

Phase 1: Monospace ASCII Layout Engine

Build RustLayoutEngine that handles the simplest case:

Fixed-width font, single face, no overlays, no display properties
Line breaking at window width (or truncate-lines)
Cursor positioning
Window-start / point tracking
Tab stops (tab-width, tab-stop-list)
Control characters (^X = 2 columns, \NNN = 4 columns)
Wide characters (CJK = 2 columns)
Continuation glyphs (line wrapping indicators)

This alone covers ~70% of what you see in a typical coding buffer.

Must also implement layout result feedback to Emacs: window-end, cursor (x,y), and visibility info so that pos-visible-in-window-p, vertical-motion, move-to-column etc. return correct results.

Scope: ~2500 Rust. Difficulty: Medium.

Phase 1.5: TUI Renderer

See TUI Rendering Backend section.

Scope: ~1200 Rust. Difficulty: Medium.

Phase 2: Face Resolution

Read face intervals from snapshot
Apply face attributes (fg, bg, bold, italic, underline)
Handle face merging (text property face + all overlay faces, priority-ordered)
Face realization: compute pixel values, select font from fontset
Use existing cosmic-text for font selection per face
Unlimited overlays can contribute faces at one position (face_at_buffer_position merges all)

Scope: ~1500 Rust. Difficulty: Medium-Hard.

Phase 3: Display Properties

This is the most complex phase. Emacs has 14 display spec types with composability:

Display spec types:

(when FORM . VALUE) — conditional display
(height HEIGHT) — font height adjustment
(space-width WIDTH) — width scaling
(min-width (WIDTH)) — minimum width padding
(slice X Y WIDTH HEIGHT) — image cropping
(raise FACTOR) — vertical offset (fraction of line height)
(left-fringe BITMAP [FACE]) — left fringe bitmap
(right-fringe BITMAP [FACE]) — right fringe bitmap
((margin LOCATION) SPEC) — margin display (left/right/nil)
(space ...) — space/stretch specs (:width, :align-to, :relative-width, :height)
(image ...) — image display
(video ...) — video display (neomacs extension)
(webkit ...) — WebKit display (neomacs extension)
string — replacement string with its own properties

Composability: Specs can be nested in lists/vectors. The display iterator uses a 5-level push/pop stack (IT_STACK_SIZE = 5) to handle nested contexts: display strings inside overlay strings inside display properties.

Overlay strings (before-string / after-string):

Priority-based sorting (complex ordering: after-strings before before-strings, decreasing/increasing priority)
Chunked processing (16 at a time via OVERLAY_STRING_CHUNK_SIZE)
Overlay strings can have their own text properties (including face and display properties) — creating recursive layout
Window-specific overlay filtering

Additional features in this phase:

invisible property — skip text ranges (with ellipsis option)
line-prefix / wrap-prefix — continuation line indentation
Fringe bitmaps — 25 standard types + custom (arrows, brackets, indicators)
Margin areas — display-line-numbers-mode, margin display specs
format-mode-line result integration (text with properties from Lisp)

Scope: ~5000 Rust. Difficulty: Very hard.

Phase 4: Mode-line, Header-line & Tab-line

Evaluate mode-line format (stays in Lisp via format-mode-line)
C sends pre-formatted mode-line/header-line/tab-line strings + faces to Rust
Rust lays out each special row
Handle format-mode-line which evaluates arbitrary Lisp (conditionals, %e, %@, etc.)

Scope: ~1000 Rust. Difficulty: Medium.

Phase 5: Variable-width & Compositions

Variable-width font support (cosmic-text already handles this)
Emoji/composition support (already working in current system)
Ligatures (cosmic-text + rustybuzz)
Font fallback chains, metric computation

Scope: ~1200 Rust. Difficulty: Medium.

Phase 6: Bidi

Integrate unicode-bidi crate for reordering
Handle mixed LTR/RTL paragraphs
Integration with ALL of the above: line breaking, cursor movement, overlay strings, display properties
Every other phase's complexity increases with bidi
This is the hardest single feature

Scope: ~3000 Rust. Difficulty: Very hard.

Phase 7: Images & Media

Inline images (already rendered by wgpu, just need layout positioning)
Video/WebKit (same — just need position from layout)

Scope: ~400 Rust. Difficulty: Easy.

Phase 8+: Parallel Layout (Future)

Once Phases 0-7 are complete and stable:

Move layout to render thread for parallel execution
Implement synchronous RPC for Category C Lisp functions
Cache layout results for fast Category B queries
Enable pixel-level smooth scrolling with async window-start feedback

This phase depends on the entire layout engine being correct and stable first.

Summary

Phase	LOC (Rust)	Difficulty	Enables
-1: Direct glyph hook	~2000	Medium	Proves architecture, low risk
0: Snapshot infra	~1000 C, ~500 Rust	Medium-Hard	Foundation
1: Monospace ASCII	~2500 Rust	Medium	Basic editing
1.5: TUI renderer	~1200 Rust	Medium	Terminal Emacs
2: Faces	~1500 Rust	Medium-Hard	Syntax highlighting
3: Display props	~5000 Rust	Very hard	Packages (company, which-key, org)
4: Mode-line	~1000 Rust	Medium	Status display
5: Variable-width	~1200 Rust	Medium	Proportional fonts, ligatures
6: Bidi	~3000 Rust	Very hard	International text
7: Images/media	~400 Rust	Easy	Already working

Total: ~17-18k LOC Rust replacing ~40k LOC C. The reduction comes from:

No terminal backend code needed
No X11/GTK drawing code needed
No incremental matrix diffing (GPU redraws everything)
Modern Rust text crates handle Unicode/shaping complexity

Previously Missing Components (Now Included)

These were absent from the original design and are now incorporated into the phases above:

Component	Phase	Notes
Fringe bitmaps (25 standard + custom)	3	Arrows, brackets, debugging indicators
Margin areas / `display-line-numbers-mode`	3	Very commonly used
`vertical-motion` / `move-to-column` feedback	1	Cursor movement depends on layout
`window-end` feedback to Emacs	1	Many packages query this
jit-lock / fontification integration	1	Rust layout must trigger Lisp fontification
Minibuffer resize (`resize-mini-windows`)	3	Dynamic height based on content
Tab-line	4	Window tab bar
Iterative window-start computation (2-5 passes)	1	Scroll policies: `scroll-margin`, `scroll-step`, etc.
Child frames	3	posframe, etc. — independent layout per frame

What This Unlocks

Immediate benefits (Phases 0-7):

Unified Rust codebase — layout and rendering in same language. No more C extraction layer, no FFI serialization per frame.
Memory safety — no more buffer overflows in display code. Emacs has had CVEs in xdisp.c.
Modern text stack — cosmic-text + rustybuzz = proper ligatures, OpenType features, font fallback.
Simpler architecture — no glyph matrix intermediary. Layout produces render-ready data directly.
TUI backend for free — same layout engine outputs to terminal.
Testability — Rust unit tests for layout logic. Currently impossible to test xdisp.c in isolation.
Incremental layout — Rust can diff against previous layout and only re-layout changed regions.
Ligatures everywhere — cosmic-text + rustybuzz handle OpenType ligatures natively.

Future benefits (Phase 8+):

Pixel-level smooth scrolling — sub-line viewport with async position feedback to Emacs.
Sub-frame cursor movement — cursor position computed in render thread, animated instantly.
Parallel layout — layout on render thread while Emacs executes Lisp.
Custom rendering effects — animated text insertion, per-character fade-in, elastic overscroll.

Difficulty Analysis

What's Actually Easy

Monospace ASCII without properties — count chars, break at width, position on grid. This is what Alacritty does.
Basic face application — once positioned, applying fg/bg/bold/italic is straightforward table lookup.
Cursor rendering — already done in current Rust renderer.
Images/video/webkit — already rendered by wgpu. Layout just reserves space.
TUI output — quantize pixel coordinates to cell grid, diff, emit ANSI. Well-understood.
Glyph rasterization — cosmic-text handles this well.

What's Hard (Ranked)

1. Display properties + overlay strings (Phase 3)

The single hardest phase. 14 display spec types with composability, 5-level nesting via push/pop stack, recursive overlay string properties. This is where most Emacs redisplay bugs live. handle_single_display_spec alone is 634 lines of C with deeply nested conditionals. The overlay string machinery adds another ~400 lines. All interactions between these systems must be faithfully replicated.

2. Synchronization with Emacs Lisp

Layout results must flow back to Emacs for pos-visible-in-window-p, window-end, vertical-motion, etc. These are called synchronously from Lisp and must return correct results immediately. The Rust layout engine must maintain queryable state that matches what these functions expect.

3. Iterative window-start computation

Emacs does 2-5 layout passes per frame to find the correct window-start:

Try displaying from current start → check if point is visible
If not, adjust window-start → retry
Handle scroll-margin, scroll-conservatively, scroll-step
Three optimization tiers: try_window_reusing_current_matrix → try_window_id → try_window

This iterative process is deeply intertwined with layout and must work correctly.

4. jit-lock / fontification integration

Fontification runs DURING layout iteration, calling into Lisp. The Rust layout engine must pause layout, call into C/Lisp for fontification via FFI callback, then resume with the newly-applied faces. This is architecturally messy.

5. Face realization + merging

Not just reading a face_id. Involves: merge text property face + ALL overlay faces (unlimited, priority-ordered), then realize (compute pixel values, select font from fontset). face_at_buffer_position is ~130 lines, but font selection from the fontset system adds significant complexity.

6. Bidi (Phase 6)

Not just "use unicode-bidi crate." The crate handles the algorithm, but integrating bidi with line breaking, cursor movement, overlay strings, and display properties creates multiplicative complexity. Every other phase gets harder with bidi.

What's Medium

Margin areas / line numbers — display-line-numbers-mode is margin-based, needs its own layout per row.
Fringe bitmaps — 25 standard types + custom. Need bitmap rendering pipeline.
Mode-line — format-mode-line evaluates arbitrary Lisp. Serialization of results is the hard part.
Minibuffer resize — full buffer scan for height, dynamic window resizing during redisplay.

Pros vs Cons

Pros

Unified Rust codebase — layout and rendering in same language. No C extraction layer, no FFI serialization per frame.
Memory safety — no buffer overflows in display code.
Modern text stack — cosmic-text, rustybuzz for ligatures, proper Unicode support.
Simpler architecture — no glyph matrix intermediary. Layout produces render-ready data.
TUI backend — same layout engine for terminal rendering. Currently impossible.
Testability — Rust unit tests for layout. Currently impossible to test xdisp.c in isolation.
Future GPU optimizations — layout aware of GPU capabilities.
Incremental layout — diff against previous layout, only re-layout changed regions.

Cons

Layout must run on Emacs thread — the "parallel layout" benefit is deferred to Phase 8+. Many Lisp functions need synchronous layout access. This limits the performance gain vs current architecture.
Scope is ~2x original estimate — ~17k LOC Rust, not ~8-10k. Display properties and missing components account for the increase.
Compatibility risk — every Emacs package is a test case. Subtle layout differences = visual bugs. Packages like posframe, company, corfu, org-mode, magit depend on exact redisplay behavior.
Display property complexity — 14 types with composability, 5-level nesting, recursive overlay string properties. The long tail of edge cases will take months.
jit-lock callback — Rust calling back into Lisp during layout is architecturally messy. FFI boundary goes both directions.
Regression risk — xdisp.c is battle-tested over 40 years. A rewrite will have bugs the original doesn't.
Two-way data flow — layout results must flow back to Emacs for Lisp functions. Not just Emacs→Rust, also Rust→Emacs.
Parallel development burden — must maintain both old and new display engines during multi-phase transition.
Iterative window-start — the 2-5 pass window-start computation is deeply tied to layout and scroll policies. Getting this wrong breaks scrolling.

Compatibility Risk

The biggest risk is Emacs packages that depend on redisplay behavior:

posframe — creates child frames at specific pixel positions via posn-at-point
company-mode / corfu — popup overlays positioned relative to point via pos-visible-in-window-p
which-key — positioned popups
org-mode — heavy use of display properties, invisible text, overlays, line-prefix
magit — thousands of overlays for diff coloring, section folding via invisible
helm / ivy / vertico — rapid overlay creation/deletion, face changes on every keystroke
lsp-mode / eglot — diagnostic overlays with after-string, inline hints

Functions that MUST return correct results:

pos-visible-in-window-p — "is buffer position visible?"
posn-at-point / posn-at-x-y — pixel-to-position conversion
window-end — last visible buffer position
vertical-motion — move by visual lines
move-to-column — move to specific column
compute-motion — full motion computation
current-column — get current column

Mitigation: Phase -1 (direct glyph hook) proves the rendering architecture works without any compatibility risk. Full layout rewrite proceeds incrementally with continuous testing against real packages.

Alternative Considered: GPU Compute Layout (Strategy 5)

Rejected. GPU compute shaders for text layout would push line breaking, glyph positioning, and wrapping to the GPU. However:

Line breaking is inherently sequential — can't know where line N+1 starts until line N finishes. GPU parallelism doesn't help.
Bidi is impossible on GPU — the Unicode Bidi Algorithm has deeply sequential state (embedding levels, bracket matching). Not expressible in WGSL.
Text shaping must stay on CPU — HarfBuzz/rustybuzz needs CPU access to font tables for ligatures, kerning, mark attachment.
Branching kills GPU perf — overlays, display properties, invisible text, variable-width fonts all require per-glyph branching. GPUs hate divergent branches.
No ecosystem — no existing references for GPU text layout in editors.
The bottleneck doesn't exist — a typical frame has ~3000-8000 visible glyphs. CPU layout for that is <1ms.

Strategy 4 (CPU Rust layout + GPU render) gives 95% of the performance benefit with 10% of the complexity. Every modern editor (VS Code, Zed, Lapce, Alacritty) uses this architecture.

TUI Rendering Backend

A major benefit of owning layout in Rust: one layout engine, multiple renderers. The same RustLayoutEngine that produces glyph batches for wgpu can also output to a terminal grid, giving us a true TUI Emacs.

Architecture

                              ┌─→ WgpuRenderer (GPU)
LayoutSnapshot → RustLayout ──┤
                              └─→ TuiRenderer (terminal)

The layout engine produces a backend-agnostic intermediate representation:

struct LayoutOutput {
    rows: Vec<LayoutRow>,
}

struct LayoutRow {
    glyphs: Vec<LayoutGlyph>,
    y: f32,
    height: f32,
}

struct LayoutGlyph {
    char: char,
    x: f32,
    width: f32,
    face_id: u32,
    is_cursor: bool,
    // ... other attributes
}

The WgpuRenderer consumes this as pixel-positioned glyph batches (current path). The TuiRenderer maps this to a cell grid:

struct TuiRenderer {
    terminal: crossterm::Terminal,   // or termwiz / ratatui backend
    grid: Vec<Vec<Cell>>,           // rows x cols cell grid
    prev_grid: Vec<Vec<Cell>>,      // previous frame for diffing
}

struct Cell {
    char: char,
    fg: Color,
    bg: Color,
    attrs: CellAttrs,  // bold, italic, underline, strikethrough
}

How TUI Rendering Works

Layout: RustLayoutEngine produces LayoutOutput with pixel coordinates
Quantize: TuiRenderer maps pixel positions to cell grid (divide by cell width/height)
Diff: Compare current grid against previous grid
Emit: Output only changed cells via ANSI escape sequences

Terminal Features

Feature	GPU (wgpu)	TUI (terminal)
Text rendering	Glyph atlas + shader	ANSI escape sequences
Colors	32-bit RGBA linear	256-color / 24-bit truecolor
Bold/italic	Font variant selection	SGR attributes
Underline	Custom pixel drawing	SGR underline (wavy if supported)
Images	GPU texture	Sixel / Kitty graphics protocol
Cursor	Animated, blinking	Terminal cursor escape
Smooth scroll	Pixel-level	Line-level (or pixel with Kitty)
Ligatures	Full OpenType	Not possible (cell grid)
Variable-width	Full support	Monospace only
Box drawing	SDF rounded rects	Unicode box characters
Video/WebKit	Inline rendering	Not supported
Mouse	Full pixel tracking	Cell-level tracking
DPI scaling	Automatic	Terminal handles it
Performance	120fps GPU	60fps terminal refresh

Crate Choices

crossterm — Cross-platform terminal manipulation (input, output, raw mode). Mature, widely used.
ratatui — TUI framework built on crossterm. Provides widget abstractions, but we may only need the backend layer since we have our own layout.
termwiz — Alternative from wezterm project. Better Kitty graphics protocol support.

TUI-Specific Considerations

Cell grid quantization: The layout engine works in pixel coordinates. For TUI, we quantize:

let col = (glyph.x / cell_width).floor() as usize;
let row = (glyph.y / cell_height).floor() as usize;

Wide characters (CJK) occupy 2 cells. The layout engine already knows character widths from cosmic-text; TUI renderer uses unicode-width crate to determine cell count.

Color mapping: Layout faces use 32-bit sRGB colors. TUI renderer maps to:

24-bit truecolor (most modern terminals)
256-color palette (fallback)
16-color ANSI (minimal fallback)

Detection via COLORTERM=truecolor environment variable or terminfo capabilities.

Inline images: Modern terminals support image protocols:

Kitty graphics protocol — pixel-perfect, widely supported
Sixel — older but broadly compatible
iTerm2 inline images — macOS terminals

The TUI renderer can optionally support these for IMAGE_GLYPH layout items.

Diffing for performance: Unlike GPU (clear-and-rebuild each frame), terminals are slow to redraw. The TUI renderer must diff current vs previous grid and only emit changes. This is standard practice (ncurses, crossterm, ratatui all do this).

Implementation Phase

TUI backend fits as an additional phase after Phase 1 (monospace ASCII):

Phase 1.5: TUI Renderer

Implement TuiRenderer that consumes LayoutOutput
Cell grid quantization from pixel coordinates
ANSI escape sequence output via crossterm
Grid diffing for incremental updates
Basic face -> SGR attribute mapping
Cursor display via terminal cursor

Scope: ~1200 Rust. Difficulty: Medium.

This phase can proceed in parallel with Phases 2-7 since TUI and GPU renderers consume the same layout output. Each layout feature (faces, display props, bidi) automatically works in both renderers once the layout engine supports it.

Use Cases

SSH sessions — Full Emacs over SSH without X11 forwarding or GPU
Containers / CI — Emacs in Docker, headless servers
Low-resource machines — No GPU required
Terminal multiplexers — Works inside tmux, screen, zellij
Accessibility — Screen readers work with terminal output
Testing — Deterministic text output for layout regression tests

Uh oh!

FilesExpand file tree

rust-display-engine.md

Latest commit

History

rust-display-engine.md

File metadata and controls

Neomacs Architecture: Redesigning Emacs in Rust

What's Wrong with Official Emacs — Redesign Opportunities

1. Single-Threaded Lisp Machine

2. CPU-Only Rendering

3. Persistent Pixel Buffers vs Immediate Mode

4. Character-Cell Thinking

5. No Native Text Shaping

6. Synchronous Process I/O

7. Stop-the-World Garbage Collection

8. Internal String Encoding (Not UTF-8)

9. Gap Buffer

10. Redisplay Blocks Lisp

11. No SIMD / Vectorization

12. Lisp Object Memory Layout

Redesign Priority Order

Rust Display Engine (Strategy 4)

How the Official Emacs Display Engine Works

Entry Point: redisplay_internal() (xdisp.c:17360)

Per-Window: redisplay_window() (~2000 lines)

The Core: display_line() (~500 lines)

The Display Iterator (struct it)

Iterator Methods (9 types)

The Property Check Pipeline

The Iterator Stack (5 levels)

Glyph Production

Glyph Structure

Glyph Row

Phase 2: Updating the Screen (dispnew.c)

The Backend Interface (struct redisplay_interface)

Face Resolution Pipeline

Window-Start Computation

Mode-Line Rendering

Key Design Principles of the Official Engine

Why It's Complex (~40k LOC)

Current Architecture

Pain Points

Proposed Architecture

Critical Design Constraint: Synchronization

Synchronous Lisp Function Categories

Display Engine API Contract

Inputs: What Emacs C Core Provides to the Display Engine

Buffer Data

Text Properties (at each position)

Overlays (at each position)

Face Data

Window State

Buffer-Local Variables

Global Variables

Outputs: What the Display Engine Must Provide Back

Window Fields (set during redisplay, read by Emacs)

Frame Fields

Global Variables

Current Matrix (w->current_matrix)

Layout Query Functions (must be implemented by the display engine)

Callbacks During Layout

What We Keep vs Replace

Keep (Emacs C/Lisp)

Replace (Rust)

Reading Lisp Data from Rust

Approach A: Snapshot (recommended, start here)

Approach B: Shared-memory / FFI read (optimize later)

Phased Implementation

Phase -1: Direct Glyph Hook (Low-Risk Stepping Stone)

Phase 0: Layout Snapshot Infrastructure

Phase 1: Monospace ASCII Layout Engine

Phase 1.5: TUI Renderer

Phase 2: Face Resolution

Phase 3: Display Properties

Phase 4: Mode-line, Header-line & Tab-line

Phase 5: Variable-width & Compositions

Phase 6: Bidi

Phase 7: Images & Media

Phase 8+: Parallel Layout (Future)

Entry Point: `redisplay_internal()` (xdisp.c:17360)

Per-Window: `redisplay_window()` (~2000 lines)

The Core: `display_line()` (~500 lines)

The Display Iterator (`struct it`)

The Backend Interface (`struct redisplay_interface`)

Current Matrix (`w->current_matrix`)