
Conversation

@hero78119
Collaborator

@hero78119 hero78119 commented Oct 13, 2025

To handle more than one shard and satisfy the offline memory constraint
related to #1061, #1063, #699, #698, #697, #696, #700

change scope

  • separate the mem bus into two maps, one for reads and one for writes
  • add a new public input
  • integrate the local init chip
  • integrate the local final chip
  • shift opcode & mem records to shard ts
  • integrate the mem bus chip
  • refactor mock prover / mock proving
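A minimal sketch of the first item, splitting the mem bus into two separate maps for reads and writes. The `MemRecord` and `MemBus` names and fields are illustrative assumptions, not the actual types in this PR:

```rust
use std::collections::HashMap;

/// Hypothetical memory-bus record: address, shard-local timestamp, value.
#[derive(Clone, Copy, Debug, PartialEq)]
struct MemRecord {
    addr: u32,
    shard_ts: u64,
    value: u32,
}

/// Sketch of the split bus: reads and writes are tracked in two
/// separate maps keyed by address, instead of one combined map.
#[derive(Default)]
struct MemBus {
    reads: HashMap<u32, Vec<MemRecord>>,
    writes: HashMap<u32, Vec<MemRecord>>,
}

impl MemBus {
    fn record_read(&mut self, rec: MemRecord) {
        self.reads.entry(rec.addr).or_default().push(rec);
    }
    fn record_write(&mut self, rec: MemRecord) {
        self.writes.entry(rec.addr).or_default().push(rec);
    }
}
```

Keeping reads and writes in separate maps lets the init/final chips and the mem bus chip consume each side independently.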

benchmarks

bench on a single chunk
with fibonacci on CPU: 5900XT, 32 cores, 64 GB RAM

| Benchmark | Median Time (s) | Median Change (%) |
|------------------------------|-----------------|----------------------------------------|
| fibonacci_max_steps_1048576 | 2.8142 | +1.54% (Change within noise threshold) |
| fibonacci_max_steps_2097152 | 4.9337 | +1.05% (Change within noise threshold) |
| fibonacci_max_steps_4194304 | 9.1971 | -3.21% (Change within noise threshold) |

which shows no measurable performance impact

@hero78119 hero78119 marked this pull request as draft October 13, 2025 09:35
@hero78119 hero78119 changed the title [WIP] continuation of multi-shard + ram bus continuation of multi-shard + ram bus Oct 20, 2025
@hero78119
Collaborator Author

multi-shard and continuation support

This PR adds Shard (meta info collected BEFORE emulation) and ShardContext (meta info collected AFTER emulation). ShardContext is passed into the opcode & table circuits, and each memory access (read/write) is categorized via ShardContext to differentiate whether it falls within or outside the chunk.
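A hedged sketch of that categorization, assuming a shard covers a half-open cycle range; the `Shard`/`ShardContext` fields and the `AccessKind` enum here are illustrative, not the PR's actual definitions:

```rust
/// Meta info known before emulation (illustrative fields).
#[derive(Clone, Copy)]
struct Shard {
    /// Half-open cycle range [start_cycle, end_cycle) covered by this shard.
    start_cycle: u64,
    end_cycle: u64,
}

/// Meta info derived after emulation, passed into opcode & table circuits.
#[derive(Clone, Copy)]
struct ShardContext {
    shard: Shard,
}

#[derive(Debug, PartialEq)]
enum AccessKind {
    /// The previous access to this address also happened in this shard.
    Local,
    /// The previous access happened in an earlier shard (or at init).
    CrossShard,
}

impl ShardContext {
    /// Categorize a memory access by the cycle of its previous access.
    fn categorize(&self, prev_cycle: u64) -> AccessKind {
        if prev_cycle >= self.shard.start_cycle && prev_cycle < self.shard.end_cycle {
            AccessKind::Local
        } else {
            AccessKind::CrossShard
        }
    }
}
```

Local accesses can be balanced inside the shard, while cross-shard accesses are the ones that need the init/final chips and the mem bus.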

@hero78119
Collaborator Author

shard mem record tracking

refer to ceno_emul/src/chunked_vec.rs
To track whether a memory write record will be accessed again in the future, a data structure ChunkedVec was added to the emulator to record future accesses to each record. Since something is usually written on every cycle, the cycle pattern is quite dense, so we implemented a dynamically extended vector that keeps O(1) index access at the cost of slightly wasted space.
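A minimal sketch of such a dynamically extended, chunk-backed vector. It is inspired by the description above, not copied from ceno_emul/src/chunked_vec.rs; the names and the fixed chunk size are assumptions:

```rust
/// Chunk-backed vector with dense index access. Setting an index past
/// the current end allocates whole chunks of defaults, trading a little
/// space at the tail for O(1) reads and writes by index.
struct ChunkedVec<T> {
    chunks: Vec<Vec<T>>,
    chunk_size: usize,
    len: usize,
}

impl<T: Default + Clone> ChunkedVec<T> {
    fn new(chunk_size: usize) -> Self {
        Self { chunks: Vec::new(), chunk_size, len: 0 }
    }

    /// Set index `i`, extending with default values as needed.
    fn set(&mut self, i: usize, value: T) {
        while self.chunks.len() <= i / self.chunk_size {
            self.chunks.push(vec![T::default(); self.chunk_size]);
        }
        self.chunks[i / self.chunk_size][i % self.chunk_size] = value;
        self.len = self.len.max(i + 1);
    }

    fn get(&self, i: usize) -> Option<&T> {
        if i < self.len {
            Some(&self.chunks[i / self.chunk_size][i % self.chunk_size])
        } else {
            None
        }
    }
}
```

Because cycles are dense, almost every slot ends up populated, so the wasted space is bounded by one partially filled chunk.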

kunxian-xia and others added 2 commits November 3, 2025 12:46
## Summary

### Sub tasks
- [x] $N = 2^n$ septic curve points accumulation (in **one layer**)
using [Quark](https://eprint.iacr.org/2020/1275)
cmd: `cargo test --release --lib test_ecc_quark_prover --features "sanity-check" -- --nocapture`
- [x] `Global` chip @kunxian-xia
    - [x] constraints
    - [x] debugging `sum != p[0] + p[1]`
    - [x] enable poseidon2
    - [x] #1093
    - [x] enable non power-of-two #1081

---------

Co-authored-by: Ming <hero78119@gmail.com>
@kunxian-xia kunxian-xia mentioned this pull request Nov 3, 2025
2 tasks
hero78119 and others added 2 commits November 5, 2025 17:36
build on top of #1084

### changes
- [x] (cpu) migrate all tables to gkr-iop
- [x] fixing verifier logic and e2e
- [x] refactor code under gpu module
- [x] code cleanup & benchmark 

### benchmark


With CPU

Performance regressed by 3-4%.

| Benchmark | Median Time (s) | Median Change (%) |
|------------------------------|------------------|------------------------------------|
| fibonacci_max_steps_1048576 | 2.8744 | +4.06% (Performance has regressed) |
| fibonacci_max_steps_2097152 | 5.0004 | +3.31% (Performance has regressed) |
| fibonacci_max_steps_4194304 | 9.3081 | +3.08% (Performance has regressed) |
With GPU

On a 5070 Ti GPU (5900XT CPU), a slight regression on the smaller sizes is expected, while the largest workload actually improves. We think the reason is that we issue fewer CUDA kernel invocations during e2e.

| Benchmark | GPU (New) | Δ Change (%) |
|------------------------------|-----------|------------------------------------|
| fibonacci_max_steps_1048576 | 0.979 s | +3.63% (Performance has regressed) |
| fibonacci_max_steps_2097152 | 1.344 s | +3.92% (Performance has regressed) |
| fibonacci_max_steps_4194304 | 2.222 s | -14.87% (Performance has improved) |

---------

Co-authored-by: kunxian xia <xiakunxian130@gmail.com>
# Summary

A few cleanups to #1084 

- [x] remove `RamBusCircuit`
- [x] rename `global` circuit to `ShardRamCircuit`.

@kunxian-xia kunxian-xia left a comment


LGTM 🎉

@hero78119 hero78119 added this pull request to the merge queue Nov 6, 2025
Merged via the queue into master with commit 1e3c940 Nov 6, 2025
4 checks passed
@hero78119 hero78119 deleted the feat/multi_shard branch November 6, 2025 07:53