
THREAD PANIC and Segmentation fault when passing data parallelized model between threads #483

@farleylai

The use case is to create multi-GPU model variants in multiple threads, and later to train them in multiple threads as well. The THREAD PANIC and segmentation fault below occur only when the model has been made data parallel with DataParallelTable and is then passed between the main thread and the worker threads.

FATAL THREAD PANIC: (read) ...Downloads/pkg/torch/install/share/lua/5.1/torch/File.lua:370: table index is nil
THCudaCheck FAIL file=../Downloads/pkg/torch/extra/cutorch/torch/generic/Storage.c line=238 error=29 : driver shutting down
FATAL THREAD PANIC: (write) ...Downloads/pkg/torch/install/share/lua/5.1/torch/File.lua:210: cuda runtime error (29) : driver shutting down at /home/ml/farleylai/Downloads/pkg/torch/extra/cutorch/torch/generic/Storage.c:238	
Segmentation fault (core dumped)

The model is made data parallel using the multi-GPU example code:
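function Models.parallelize(model)
   -- Wrap the model only when more than one GPU is requested.
   if opt.nGPU > 1 then
      local gpus = torch.range(1, opt.nGPU):totable()
      -- Split the batch along dimension 1, with flattenParams and usenccl both set to true.
      local dpt = nn.DataParallelTable(1, true, true)
         :add(model, gpus)
         :threads(function()
            -- Runs in each of the DataParallelTable's own worker threads.
            require 'cudnn'
            cudnn.benchmark = true
         end)
      dpt.gradInput = nil
      model = dpt:cuda()
   end
   return model
end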

Any ideas on the cause, or an explanation for this behavior?
