@Ka-zam
Contributor

@Ka-zam Ka-zam commented Dec 4, 2025

The index_??? kernels should return the first index that meets the criterion.

@marcusmueller
Member

Hi @Ka-zam!

Oh! Wasn't aware that's a bug; thanks for submitting a fix! Um, are we perhaps missing a unit test there, or is the existing test borked?

@Ka-zam force-pushed the fix_rvv_index_max branch 4 times, most recently from 1e6670a to 4095edb, December 4, 2025 23:24
@Ka-zam
Contributor Author

Ka-zam commented Dec 4, 2025

Hi!
The existing test is not effective, since it relies on floats being equal and in the same lane, which is unlikely to happen with random input. Adding a test:

  // Test data with duplicate max values at indices 1 and 4
  float test_data[] = {0x1.0p0f, 0x1.8p1f, 0x1.0p1f, 0x1.4p1f, 0x1.8p1f, 0x1.2p1f};
  // Expected result: index 1 (first occurrence of max)
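
As a sanity check on that expected result, here is a minimal sketch (not part of the PR) that runs the same values through `std::max_element`, which the C++ standard specifies to return the first of several equal maxima. The values are written in decimal, with the hex-float form from the snippet above in comments; the helper names `duplicate_max_test_data` and `first_index_of_max` are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Same values as the proposed test data, written in decimal.
// The maximum (3.0) occurs at indices 1 and 4.
std::vector<float> duplicate_max_test_data()
{
    return {
        1.0f,  // 0x1.0p0f
        3.0f,  // 0x1.8p1f
        2.0f,  // 0x1.0p1f
        2.5f,  // 0x1.4p1f
        3.0f,  // 0x1.8p1f
        2.25f, // 0x1.2p1f
    };
}

// Hypothetical helper: index of the first occurrence of the maximum.
// std::max_element returns the first element that compares greatest,
// so ties resolve to the lowest index. Returns 0 for empty input.
std::size_t first_index_of_max(const std::vector<float>& vec)
{
    const auto it = std::max_element(vec.begin(), vec.end());
    return static_cast<std::size_t>(std::distance(vec.begin(), it));
}
```

Calling `first_index_of_max(duplicate_max_test_data())` should yield index 1, matching the expectation stated in the test comment.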

(edit)
Hmm, and removing the test again, since it exposed similar bugs in the x64 implementations. We should decide how to handle this.

Ka-zam added a commit to Ka-zam/volk that referenced this pull request Dec 7, 2025
…mprovements, asin/acos optimization

PR gnuradio#801 - NEON/NEONv8 kernel implementations:
- Add NEON/NEONv8 implementations for many kernels with 2-20x speedups
- Fix NEON asin/acos sqrt approximation for large values

PR gnuradio#802 - RVV bug fix:
- Fix RVV index_max/min kernels returning wrong index

PR gnuradio#803 - Test output improvements:
- Print tolerance, max_err, fail variables
- Tablify test output for better readability

PR gnuradio#804 - Optimize asin/acos kernels:
- Use Sollya-generated polynomial with two-range algorithm
- Improve accuracy from ~5.5e-4 to ~1.5e-6 (67x better)
- Achieve ~20-27x speedup on x86, ~4.5x on ARM
- Tighten test tolerance from 1e-2 to 1e-5 with edge cases
- Add ARMv7 NEON sqrt intrinsic with Newton-Raphson iteration
@jdemel
Contributor

jdemel commented Dec 7, 2025

Thanks for having a look at this!

So, are we consistently failing here across implementations?

size_t maxIndex = 0;
float maximum = -INFINITY; // from <cmath>
for (size_t index = 0; index < vec.size(); ++index){
  if (vec[index] > maximum){
    maximum = vec[index];
    maxIndex = index;
  }
}

I assume this would be the simple and correct algorithm. Thus the result for 1.0, 1.0, 1.0 is index 0 and not 1 or 2.

This kind of bug and test behavior is why I introduced googletest. I'd like to add tests like this there to encode the expected behavior and make it possible to test against it.

If you want to add tests for this case, please go ahead. Apart from that, we can merge this PR.

@Ka-zam
Contributor Author

Ka-zam commented Dec 8, 2025

Yes, your implementation is how I understand it should work.

We are consistently failing across all implementations when injecting floats that are bit-identical, since none of the implementations keeps track of which lane holds the first occurrence of the minimum (or maximum) value. The only reason RVV failed in this run was that the test happened to generate identical input values spaced closely enough. So, should we fix the kernels and inject known problematic patterns into the tests?
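
To illustrate the lane-tracking problem, here is a scalar sketch that mimics a SIMD kernel with a hypothetical lane count of 4. It is not the VOLK code; the names `kLanes` and `lanewise_index_max` are assumptions for illustration. Each lane keeps its own best value and index, and the final horizontal reduction breaks ties between lanes by choosing the smaller index. NaN handling is deliberately left out.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

constexpr std::size_t kLanes = 4; // hypothetical vector width in floats

std::uint32_t lanewise_index_max(const std::vector<float>& in)
{
    const std::size_t n = in.size();
    const std::size_t blocks = n / kLanes;

    float best = -std::numeric_limits<float>::infinity();
    std::uint32_t best_index = 0;

    if (blocks > 0) {
        std::array<float, kLanes> lane_val{};
        std::array<std::uint32_t, kLanes> lane_idx{};

        // Seed every lane with the first block.
        for (std::size_t l = 0; l < kLanes; ++l) {
            lane_val[l] = in[l];
            lane_idx[l] = static_cast<std::uint32_t>(l);
        }
        // Per-lane update: strict '>' keeps the earliest index within a lane.
        for (std::size_t b = 1; b < blocks; ++b) {
            for (std::size_t l = 0; l < kLanes; ++l) {
                const std::size_t i = b * kLanes + l;
                if (in[i] > lane_val[l]) {
                    lane_val[l] = in[i];
                    lane_idx[l] = static_cast<std::uint32_t>(i);
                }
            }
        }
        // Horizontal reduction: on equal values, the smaller index wins.
        // Comparing values alone here is what loses the first occurrence.
        best = lane_val[0];
        best_index = lane_idx[0];
        for (std::size_t l = 1; l < kLanes; ++l) {
            if (lane_val[l] > best ||
                (lane_val[l] == best && lane_idx[l] < best_index)) {
                best = lane_val[l];
                best_index = lane_idx[l];
            }
        }
    }
    // Scalar tail: these indices are all larger, so strict '>' is enough.
    for (std::size_t i = blocks * kLanes; i < n; ++i) {
        if (in[i] > best) {
            best = in[i];
            best_index = static_cast<std::uint32_t>(i);
        }
    }
    return best_index;
}
```

An input such as `{0, 0, 0, 5, 5, 0, 0, 0}` exercises the tie-break: lane 0 ends up holding the 5 at index 4 and lane 3 holds the 5 at index 3, so a reduction that compares values only could report index 4 instead of 3.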

@Ka-zam force-pushed the fix_rvv_index_max branch 2 times, most recently from 7d42ef6 to 87a19b7, December 8, 2025 22:26
@Ka-zam
Contributor Author

Ka-zam commented Dec 8, 2025

Added an edge case to the index_* tests:

    // Index kernels need identical values to test tie-breaking (first index wins)
    volk_test_params_t test_params_index(test_params.make_tol(0));
    test_params_index.add_float_edge_cases({
        1.0f,
        1.0f,
        1.0f,
        1.0f, // 4 identical (SSE lane width)
        1.0f,
        1.0f,
        1.0f,
        1.0f, // 8 total (AVX lane width)
        1.0f,
        1.0f,
        1.0f,
        1.0f, // 12
        1.0f,
        1.0f,
        1.0f,
        1.0f, // 16 total (AVX512 lane width)
    });
    QA(VOLK_INIT_TEST(volk_32f_index_max_16u, test_params_index))
    QA(VOLK_INIT_TEST(volk_32f_index_max_32u, test_params_index))
    QA(VOLK_INIT_TEST(volk_32f_index_min_16u, test_params_index))
    QA(VOLK_INIT_TEST(volk_32f_index_min_32u, test_params_index))

Fixed kernels:

RUN_VOLK_TESTS: volk_32fc_index_max_32u(131071,1987)
generic                          129.8275 ms ( 24073.3 MB/s)
a_avx2_variant_0                  38.0663 ms ( 82103.6 MB/s)
a_avx2_variant_1                  18.2670 ms (171094.2 MB/s)
a_sse3                            53.8067 ms ( 58085.2 MB/s)
a_avx512f                         13.9901 ms (223399.2 MB/s)
u_avx2_variant_0                  38.5189 ms ( 81138.8 MB/s)
u_avx2_variant_1                  23.3854 ms (133646.4 MB/s)
u_avx512f                         13.6445 ms (229057.8 MB/s) *
Best aligned arch:                 u_avx512f         (9.52x)
Best unaligned arch:               u_avx512f         (9.52x)
--------------------------------------------------------------------------------
.
.
.

RUN_VOLK_TESTS: volk_32f_index_min_32u(131071,997)
generic                          109.0696 ms (  9585.3 MB/s)
neon                              47.8347 ms ( 21855.7 MB/s)
neonv8                            36.2350 ms ( 28852.3 MB/s) *
Best aligned arch:                    neonv8         (3.01x)
Best unaligned arch:                  neonv8         (3.01x)
--------------------------------------------------------------------------------

@Ka-zam Ka-zam mentioned this pull request Dec 9, 2025
Contributor

@jdemel jdemel left a comment

Thanks for noticing and fixing this bug. LGTM.

@jdemel jdemel linked an issue Dec 16, 2025 that may be closed by this pull request
@jdemel
Contributor

jdemel commented Dec 16, 2025

I linked #700 against this PR since it discusses the same root cause. Thanks again!

@jdemel
Contributor

jdemel commented Dec 16, 2025

Merging the other PR #801 created merge conflicts. That's very unfortunate. Can you rebase?

@jdemel
Contributor

jdemel commented Dec 16, 2025

FYI: I'd like to get all your PRs #802, #803, and #804 merged. Afterwards, it's time to do a release.

Signed-off-by: Magnus Lundmark <magnuslundmark@gmail.com>
@Ka-zam
Contributor Author

Ka-zam commented Dec 18, 2025

This can be merged cleanly now.

@jdemel jdemel merged commit 043337f into gnuradio:main Dec 19, 2025
35 checks passed

Development

Successfully merging this pull request may close these issues.

qa_volk_32fc_index_* are flaky

3 participants