@Binyang2014 (Contributor) commented Jul 26, 2023

Provide a tile-based API:

void put2D(uint64_t dstOffset, uint64_t srcOffset, uint32_t width, uint32_t height)
void put2DWithSignal(uint64_t dstOffset, uint64_t srcOffset, uint32_t width, uint32_t height)

To support this, add a new structure, fields2D, to ChannelTrigger. In this structure, the 64-bit size field is replaced by two 32-bit fields (a 32-bit width and a 32-bit height). A new flag, multiDimensionFlag, is also added to fields2D to distinguish it from the existing fields structure.
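
To illustrate the idea of reusing the size bits, here is a minimal sketch. It is not the actual ChannelTrigger layout: the `Trigger128` type, the two-word split, the helper names, and the position of the flag bit are all assumptions made for illustration. Only the 64-bit size, the 32-bit width and height, the flag's purpose, and the 128-bit total come from this PR.

```cpp
#include <cstdint>

// Illustrative 128-bit trigger made of two 64-bit words.
// NOT the real mscclpp layout; field positions are assumptions.
struct Trigger128 {
  uint64_t sizeWord;   // 1D: 64-bit size. 2D: width (low 32 bits) | height (high 32 bits).
  uint64_t otherWord;  // Everything else, plus the multi-dimension flag.
};
static_assert(sizeof(Trigger128) == 16, "trigger must stay 128 bits");

// Hypothetical position of multiDimensionFlag inside otherWord.
constexpr uint64_t kMultiDimensionFlag = 1ULL << 63;

// Build a 1D trigger: the whole first word is the size, the flag is cleared.
inline Trigger128 makeTrigger1D(uint64_t size, uint64_t other) {
  return Trigger128{size, other & ~kMultiDimensionFlag};
}

// Build a 2D trigger: width and height share the bits that held the size.
inline Trigger128 makeTrigger2D(uint32_t width, uint32_t height, uint64_t other) {
  uint64_t packed = (static_cast<uint64_t>(height) << 32) | width;
  return Trigger128{packed, other | kMultiDimensionFlag};
}

// The consumer checks the flag first to decide how to read sizeWord.
inline bool isMultiDimension(const Trigger128& t) {
  return (t.otherWord & kMultiDimensionFlag) != 0;
}

inline uint32_t widthOf(const Trigger128& t) {
  return static_cast<uint32_t>(t.sizeWord & 0xFFFFFFFFULL);
}

inline uint32_t heightOf(const Trigger128& t) {
  return static_cast<uint32_t>(t.sizeWord >> 32);
}
```

The point of the sketch is that the 2D variant costs no extra bits for the dimensions themselves; only the flag has to be carved out of whatever space remains.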

Example of using the tile-based API:
When setting up the connections, call channelService->addPitch first:

  for (int r = 0; r < worldSize; r++) {
    if (r == rank) {
      continue;
    }
    std::shared_ptr<mscclpp::Connection> conn;
    if ((rankToNode(r) == rankToNode(gEnv->rank)) && !useIbOnly) {
      conn = communicator->connectOnSetup(r, 0, mscclpp::Transport::CudaIpc);
    } else {
      conn = communicator->connectOnSetup(r, 0, ibTransport);
    }
    connections[r] = conn;
    communicator->sendMemoryOnSetup(recvBufRegMem, r, 0);
    auto remoteMemory = communicator->recvMemoryOnSetup(r, 0);
    communicator->setup();

    mscclpp::SemaphoreId cid = channelService->add2DChannel(conn, std::pair<size_t, size_t>(dstPitch, srcPitch));
    communicator->setup();
  }

Then call the put2D API from the kernel:

if (threadIdx.x == 0) proxyChan.put2DWithSignal(offset, width * sizeof(int), height);
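
For readers unfamiliar with pitched copies, the following host-only sketch shows what a tile copy of `width` bytes by `height` rows means when the source and destination each have their own pitch. It is an illustration using plain `memcpy`, not the code path this PR uses; `copyTile2D` and its parameter names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy a tile of `height` rows, each `widthBytes` wide, between two pitched
// buffers. The pitch is the distance in bytes between the starts of two
// consecutive rows and may be larger than the tile width.
inline void copyTile2D(void* dst, size_t dstPitch, const void* src, size_t srcPitch,
                       size_t widthBytes, size_t height) {
  auto* d = static_cast<unsigned char*>(dst);
  const auto* s = static_cast<const unsigned char*>(src);
  for (size_t row = 0; row < height; ++row) {
    // Each row is contiguous, but consecutive rows are `pitch` bytes apart.
    std::memcpy(d + row * dstPitch, s + row * srcPitch, widthBytes);
  }
}
```

For example, copying a 2-column by 3-row tile out of a 4x4 `int` matrix would use `widthBytes = 2 * sizeof(int)`, `height = 3`, and a pitch of `4 * sizeof(int)` on both sides.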

@chhwang chhwang marked this pull request as ready for review July 26, 2023 12:47
@Binyang2014 (Contributor Author) commented:

cudaMemcpy2DAsync seems to be slower than cudaMemcpyAsync for 1D data. This needs investigation.

return semaphores_.size() - 1;
}

MSCCLPP_API_CPP SemaphoreId ProxyService::buildAndAddSemaphore(Communicator& communicator,
Contributor

This doesn't make much sense to me. Why do we need an extra way of building a semaphore? We only need to provide a 2D write over 1D arrays, so just a 2D write is enough, right?

Contributor Author

Because we need to set the pitch/stride for the channel (the name "semaphore" is not accurate). The reason we don't set the stride in the put2D API is that our trigger is only 128 bits, so we don't have extra bits for it.
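
The trade-off in this reply can be sketched as follows: the pitches are registered once per channel on the host, and the trigger only has to carry an id that the proxy uses to look them up. This is a rough sketch under assumptions, not the PR's implementation; `PitchRegistry` and its methods are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical host-side registry: the (dstPitch, srcPitch) pair is stored
// once per channel at setup time, so it never travels inside the trigger.
class PitchRegistry {
 public:
  // Store a pitch pair and return the id under which it can be found later.
  uint32_t add(std::pair<size_t, size_t> pitch) {
    pitches_.push_back(pitch);
    return static_cast<uint32_t>(pitches_.size() - 1);
  }

  // Look up the pitch pair for a channel id; throws std::out_of_range if unknown.
  const std::pair<size_t, size_t>& get(uint32_t id) const { return pitches_.at(id); }

 private:
  std::vector<std::pair<size_t, size_t>> pitches_;
};
```

When the proxy handles a 2D trigger, it would read the width and height from the trigger and fetch the pitches with `get(id)`, which keeps the trigger itself within 128 bits.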
