Skip to content

Pytorch not working with CUDA 11.2 and CUDA 11.7 #32

@TC-MCZ

Description

@TC-MCZ
          Hi ,I have some problems when running cricket in pytorch. I have pulled the latest code,and build pytorch locally with modify change the doces mentioned.

my CUDA is 11.2 and cudnn is 8.9.2 in ths Tesla P4,but get this problem:

server:
+08:01:00.423212 WARNING: duplicate resource! The first resource will be overwritten in resource-mg.c:145 +08:01:00.445168 WARNING: duplicate resource! The first resource will be overwritten in resource-mg.c:145 +08:01:00.445403 WARNING: duplicate resource! The first resource will be overwritten in resource-mg.c:145 +08:01:00.447247 WARNING: duplicate resource! The first resource will be overwritten in resource-mg.c:145 +08:01:00.448076 WARNING: duplicate resource! The first resource will be overwritten in resource-mg.c:145 +08:01:07.164339 ERROR: cuda_device_prop_result size mismatch in cpu-server-runtime.c:367 +08:02:22.370950 INFO: RPC deinit requested. +08:08:54.324012 INFO: have a nice day!
client:
`+08:00:36.417392 WARNING: could not find .nv.info section. This means this binary does not contain any kernels. in cpu-elf2.c:922
+08:00:36.418684 WARNING: could not find .nv.info section. This means this binary does not contain any kernels. in cpu-elf2.c:922
+08:00:36.420058 WARNING: could not find .nv.info section. This means this binary does not contain any kernels. in cpu-elf2.c:922
call failed: RPC: Timed out
call failed: RPC: Timed out
call failed: RPC: Timed out
+08:02:01.851255 ERROR: something went wrong in cpu-client-runtime.c:444
Traceback (most recent call last):
File "/root/anaconda3/envs/py3.8/lib/python3.8/site-packages/torch/cuda/init.py", line 242, in _lazy_init
queued_call()
File "/root/anaconda3/envs/py3.8/lib/python3.8/site-packages/torch/cuda/init.py", line 125, in _check_capability
capability = get_device_capability(d)
File "/root/anaconda3/envs/py3.8/lib/python3.8/site-packages/torch/cuda/init.py", line 357, in get_device_capability
prop = get_device_properties(device)
File "/root/anaconda3/envs/py3.8/lib/python3.8/site-packages/torch/cuda/init.py", line 375, in get_device_properties
return _get_device_properties(device) # type: ignore[name-defined]
RuntimeError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/lwh/cricket/tests/test_apps/pytorch_minimal.py", line 39, in
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
File "/root/anaconda3/envs/py3.8/lib/python3.8/site-packages/torch/cuda/init.py", line 246, in _lazy_init
raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error:

CUDA call was originally invoked at:

[' File "/home/lwh/cricket/tests/test_apps/pytorch_minimal.py", line 31, in \n import torch\n', ' File "", line 991, in _find_and_load\n', ' File "", line 975, in _find_and_load_unlocked\n', ' File "", line 671, in _load_unlocked\n', ' File "", line 843, in exec_module\n', ' File "", line 219, in _call_with_frames_removed\n', ' File "/root/anaconda3/envs/py3.8/lib/python3.8/site-packages/torch/init.py", line 798, in \n _C._initExtension(manager_path())\n', ' File "", line 991, in _find_and_load\n', ' File "", line 975, in _find_and_load_unlocked\n', ' File "", line 671, in _load_unlocked\n', ' File "", line 843, in exec_module\n', ' File "", line 219, in _call_with_frames_removed\n', ' File "/root/anaconda3/envs/py3.8/lib/python3.8/site-packages/torch/cuda/init.py", line 179, in \n _lazy_call(_check_capability)\n', ' File "/root/anaconda3/envs/py3.8/lib/python3.8/site-packages/torch/cuda/init.py", line 177, in _lazy_call\n _queued_calls.append((callable, traceback.format_stack()))\n']
+08:02:27.007890 ERROR: call failed. in cpu-client.c:213
+08:02:27.012036 INFO: api-call-cnt: 14
+08:02:27.012051 INFO: memcpy-cnt: 0`

Is my CUDA version wrong? or other reasons?

Originally posted by @Tlhaoge in #6 (comment)

Metadata

Metadata

Assignees

No one assigned

    Labels

    pytorchIssues related to pytorch

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions