Bug: Multi-versioning CUDA kernels #241

@ashvardanian

Describe the bug

Our current compilation method, which targets only sm_90a, is too constraining. NVCC's built-in multi-versioning is tricky to use and results in bloated binaries, so we should find a better way to pre-compile and pre-package PTX and/or SASS for a small set of GPU generations, likely Volta, Ampere, Hopper, and Blackwell.
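
One possible shape for the host-side selection logic, as a minimal sketch rather than a proposed implementation. It assumes one packaged image per generation listed above, built for the baseline compute capabilities 7.0, 8.0, 9.0, and 10.0, each shipped with PTX next to its SASS. The names `kernel_variant_t`, `packaged_variants`, and `pick_variant` are hypothetical and do not exist in StringZilla. The rule it encodes is the usual CUDA compatibility rule: SASS runs on devices of the same major version with an equal or higher minor version, while PTX can be JIT-compiled for any newer device. Architecture-specific targets such as sm_90a are not forward compatible, so they would need separate handling that this sketch omits.

```cpp
#include <array>
#include <cassert>
#include <string_view>

// Hypothetical descriptor of one pre-packaged kernel image (not a StringZilla type).
struct kernel_variant_t {
    int major;             // compute capability major the image was built for
    int minor;             // compute capability minor the image was built for
    bool has_ptx;          // PTX shipped next to SASS, so newer majors can JIT it
    std::string_view name; // label used to locate the packaged image
};

// Assumed baseline capabilities for the generations named in the issue.
inline constexpr std::array<kernel_variant_t, 4> packaged_variants = {{
    {7, 0, true, "volta"},
    {8, 0, true, "ampere"},
    {9, 0, true, "hopper"},
    {10, 0, true, "blackwell"},
}};

// Returns the newest packaged variant usable on the device, or nullptr if none fits.
inline kernel_variant_t const *pick_variant(int device_major, int device_minor) {
    assert(device_major > 0 && device_minor >= 0);
    kernel_variant_t const *best = nullptr;
    for (auto const &variant : packaged_variants) {
        // Skip images built for a capability newer than the device.
        bool const not_newer = variant.major < device_major ||
                               (variant.major == device_major && variant.minor <= device_minor);
        if (!not_newer) continue;
        // SASS is only binary-compatible within the same major version;
        // across majors the image is usable only through its PTX.
        bool const sass_compatible = variant.major == device_major;
        if (!sass_compatible && !variant.has_ptx) continue;
        bool const newer_than_best = !best || variant.major > best->major ||
                                     (variant.major == best->major && variant.minor > best->minor);
        if (newer_than_best) best = &variant;
    }
    return best;
}
```

Under these assumptions, a 4060 (compute capability 8.9) would resolve to the Ampere image through same-major SASS compatibility, and a device older than Volta would get no match and need a CPU fallback. The device capability itself would come from the CUDA runtime, for example via `cudaGetDeviceProperties`.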

Steps to reproduce

Try to run the StringWa.rs benchmarks on a machine with an RTX 4060 GPU.

Expected behavior

Compilation passes.

StringZilla version

v4

Operating System

22.04

Hardware architecture

x86

Which interface are you using?

C implementation

Contact Details

No response

Are you open to being tagged as a contributor?

  • I am open to being mentioned in the project .git history as a contributor

Is there an existing issue for this?

  • I have searched the existing issues

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working), core (Work on the algorithm design and implementation)
