Enable builds without direct torch.cuda availability and support sm89 / sm90 #5
Open
wanderingai wants to merge 2 commits into HazyResearch:main from
Conversation
Contributor
It looks like this PR is introducing some race conditions - when I install using this branch, some tests fail:
@wanderingai Love this PR, it's an improvement over the previous setup.py. Can this be merged, worst case with no sm_90 flags by default?
This PR allows the monarch_cuda kernel to be built for computes sm80, sm89, and sm90, which includes the following GPUs:

Additionally, the setup.py is updated to enable builds based on nvcc availability, but without direct torch.cuda availability, for flexible builds.

Update: The compiler flags have been abstracted to support both PTX and SASS builds while defaulting to the original Ampere-based PTX-only build, i.e. -gencode=arch=compute_80,code=compute_80. Successfully tested by building a docker image and running the tests under tests/:
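The flag abstraction described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual setup.py: the helper names (`nvcc_available`, `gencode_flags`) and the `MONARCH_CUDA_ARCHS` environment variable are hypothetical, but the `-gencode` flag syntax and the PTX-only compute_80 default match what the PR describes.

```python
# Sketch: build nvcc -gencode flags for a setup.py without querying
# torch.cuda, so the extension can be compiled on machines where nvcc
# is installed but no GPU (or torch CUDA runtime) is visible.
# Helper names and the MONARCH_CUDA_ARCHS env var are hypothetical.
import os
import shutil

def nvcc_available():
    """Detect nvcc on PATH instead of calling torch.cuda.is_available()."""
    return shutil.which("nvcc") is not None

def gencode_flags(archs=("80",), ptx_only=True):
    """Construct -gencode flags for the requested compute capabilities.

    ptx_only=True emits code=compute_XX only (PTX, JIT-compiled at load
    time); ptx_only=False additionally emits code=sm_XX (SASS) per arch.
    """
    flags = []
    for arch in archs:
        if not ptx_only:
            flags.append(f"-gencode=arch=compute_{arch},code=sm_{arch}")
        flags.append(f"-gencode=arch=compute_{arch},code=compute_{arch}")
    return flags

# Default matches the original Ampere-based PTX-only build:
#   -gencode=arch=compute_80,code=compute_80
default_flags = gencode_flags()

# Opt in to SASS + PTX for sm80/sm89/sm90 via a (hypothetical) env var,
# e.g. MONARCH_CUDA_ARCHS="80;89;90":
if os.environ.get("MONARCH_CUDA_ARCHS"):
    archs = tuple(os.environ["MONARCH_CUDA_ARCHS"].split(";"))
    extra_flags = gencode_flags(archs, ptx_only=False)
```

Keeping the PTX-only compute_80 default preserves the previous behavior: PTX for compute_80 is JIT-compiled by the driver for newer architectures such as sm_89 and sm_90, while the SASS path gives native binaries for each listed arch.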