@vermorel

Perf (Benchmark20_1_simd_intrinsics, IterationCount=200, WarmupCount=1, InProcess):

  • net10 mean: 67.31 ms
  • net10-perf mean: 40.20 ms (~1.67x faster, -27.11 ms)
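For anyone re-checking the ratio and delta quoted above, here is a minimal sketch of the arithmetic. The helper name `speedup` is hypothetical and not part of the benchmark project; it only reproduces the numbers from the two reported means.

```python
def speedup(baseline_ms: float, optimized_ms: float) -> tuple[float, float]:
    """Return (ratio, delta_ms) for two mean timings in milliseconds.

    ratio    > 1 means the optimized run is faster than the baseline.
    delta_ms < 0 means the optimized run takes less time.
    """
    if baseline_ms <= 0 or optimized_ms <= 0:
        raise ValueError("timings must be positive")
    ratio = baseline_ms / optimized_ms
    delta_ms = optimized_ms - baseline_ms
    return ratio, delta_ms


# Means taken from the benchmark figures in this comment.
ratio, delta_ms = speedup(67.31, 40.20)
print(f"~{ratio:.2f}x faster, {delta_ms:+.2f} ms")
```

The ratio is baseline over optimized, so it reads as "times faster"; the delta is optimized minus baseline, which is why it is negative.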

Env: Intel Xeon W-2223, .NET 10.0.1, AVX-512F+CD+BW+DQ+VL.

Output check: last_hidden_state payload matches between net10 and net10-perf for 'Hello world this is some text' (timestamp stripped, --disable-simd).

Not included (regressed or proved unstable): packed B^T cache kernel, odd-M 2x4 tail, MatMul broadcast fast path, linear batch offsets, 4x4 kernel, AVX load/store, C-span reuse, AggressiveOptimization attribute, Broadcast alloc avoidance, shape-dedupe pass.