@hero78119 hero78119 commented Oct 22, 2025

Builds on top of #1084.

changes

  • (cpu) migrate all tables to gkr-iop
  • fix verifier logic and e2e
  • refactor code under gpu module
  • code cleanup & benchmark

benchmark

With CPU

Performance regressed by 3-4%.

| Benchmark | Median Time (s) | Median Change (%) |
|---|---|---|
| fibonacci_max_steps_1048576 | 2.8744 | +4.06% (Performance has regressed) |
| fibonacci_max_steps_2097152 | 5.0004 | +3.31% (Performance has regressed) |
| fibonacci_max_steps_4194304 | 9.3081 | +3.08% (Performance has regressed) |

With GPU

Measured on GPU 5070 Ti, CPU 5900XT.
A slight regression on the smaller sizes is expected. On the largest workload performance even improves.
I think the reason is that we get fewer CUDA kernel invocations during e2e.

| Benchmark | GPU (New) | Δ Change (%) |
|---|---|---|
| fibonacci_max_steps_1048576 | 0.979 s | +3.63% (Performance has regressed) |
| fibonacci_max_steps_2097152 | 1.344 s | +3.92% (Performance has regressed) |
| fibonacci_max_steps_4194304 | 2.222 s | -14.87% (Performance has improved) |

@hero78119 hero78119 marked this pull request as draft October 22, 2025 14:24
@hero78119 hero78119 marked this pull request as ready for review October 26, 2025 07:14
@hero78119 hero78119 changed the title from "(WIP) unify all circuit to gkr-iop" to "unify all circuit to gkr-iop" Oct 26, 2025
@kunxian-xia kunxian-xia left a comment

1st pass of code review

descending: false,
},
);
let selector = cb.create_placeholder_structural_witin(|| "selector");

We can remove this function, as it's already covered by the `TableCircuit` trait.

kunxian-xia and others added 5 commits November 4, 2025 23:13
Build on top of #1092 

### change scopes
- [x] verify api for continuation proofs
- [x] exclude init tables on shard > 0
- [x] ceno-cli with multiple proofs
- [x] separate the fixed commitment into 2 sets: one for the first
shard, the other for non-first shards
- [x] multi-shards ci integration test 

### design rationales
`prover_id` and `num_provers` are exposed as CLI arguments to specify
the number of physical provers in a cluster, each identified by a
`prover_id`. The overall trace data is divided into `shards`, which are
distributed evenly among the provers. The number of shards is in general
independent of the number of provers. Each prover is assigned `n`
shards, where `n` may even be zero.

Shard distribution follows a balanced allocation strategy. For example,
if there are 10 shards and 3 provers, the shard counts will be
distributed as 4, 3, and 3, ensuring an even workload across all
provers.
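
The balanced allocation described above could be sketched as follows. This is a minimal illustration, not the code from this PR: the function names `shard_counts` and `shard_range` are hypothetical, and it assumes that the leftover shards go to the provers with the lowest `prover_id` and that each prover receives a contiguous range of shard indices. The first assumption matches the 4, 3, 3 example; the second is a guess.

```rust
use std::ops::Range;

/// Number of shards per prover when `num_shards` shards are spread as
/// evenly as possible over `num_provers` provers.
/// Assumption: the remainder goes to the lowest prover ids.
pub fn shard_counts(num_shards: usize, num_provers: usize) -> Vec<usize> {
    assert!(num_provers > 0, "need at least one prover");
    let base = num_shards / num_provers;
    let extra = num_shards % num_provers;
    (0..num_provers)
        .map(|prover_id| base + usize::from(prover_id < extra))
        .collect()
}

/// Shard indices handled by one prover.
/// Assumption: each prover gets a contiguous range; the range is empty
/// when there are fewer shards than provers and this prover gets none.
pub fn shard_range(num_shards: usize, num_provers: usize, prover_id: usize) -> Range<usize> {
    assert!(prover_id < num_provers, "prover_id out of range");
    let base = num_shards / num_provers;
    let extra = num_shards % num_provers;
    // Every earlier prover holds `base` shards, plus one more if it is
    // among the first `extra` provers.
    let start = prover_id * base + prover_id.min(extra);
    let len = base + usize::from(prover_id < extra);
    start..start + len
}
```

With 10 shards and 3 provers, `shard_counts(10, 3)` gives 4, 3, and 3, and the three provers cover shard indices `0..4`, `4..7`, and `7..10`. With 2 shards and 3 provers, the last prover gets an empty range, which corresponds to the `n` being zero case.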

---------

Co-authored-by: xkx <xiakunxian130@gmail.com>
@kunxian-xia kunxian-xia merged commit 1d147c4 into feat/multi_shard Nov 5, 2025
4 checks passed
@kunxian-xia kunxian-xia deleted the feat/all_gkr_iop branch November 5, 2025 09:36