Does cuda_ipc support GPU cards without NVLink? #11046

@Jeffwan

Description

Configuration

parser.c:2368 UCX  INFO  UCX_* env variables: UCX_PROTO_INFO=y UCX_LOG_LEVEL=debug UCX_TLS=cuda_ipc,cuda_copy,tcp

Logs

[1765333153.178129] [g340-cd51-4900-11aa-cea2-3aba-b2e8:1586361:2]      ucp_worker.c:1912 UCX  INFO    ucp_context_0 intra-node cfg#2 rma_am(tcp/eth1)  amo_am(tcp/eth1)  am(tcp/eth1 tcp/eth5 tcp/eth7 tcp/eth6 tcp/eth8 cuda_ipc/cuda)  ka(tcp/eth1)

It seems cuda_ipc is not selected for the RMA data-transfer lanes; it only appears in the am lane.

For RMA lanes (not RMA_BW), the wireup selection requires these flags (from select.c:1176-1180):
  UCT_IFACE_FLAG_PUT_SHORT |
  UCT_IFACE_FLAG_PUT_BCOPY |
  UCT_IFACE_FLAG_GET_BCOPY |
  UCT_IFACE_FLAG_PENDING

But cuda_ipc only advertises these (from lines 275-280):
  UCT_IFACE_FLAG_GET_ZCOPY |
  UCT_IFACE_FLAG_PUT_ZCOPY |
  UCT_IFACE_FLAG_PENDING

cuda_ipc is missing PUT_SHORT, PUT_BCOPY, and GET_BCOPY; it only supports zcopy operations. Does this mean cuda_ipc is not selected for the rma_am lane because it lacks the short and bcopy capabilities that RMA lane selection requires?
