
softhdcuvid – Segfault on NVIDIA Blackwell (RTX 5060) / Driver 580 [solved] #77

@nullptrdesaster


Environment:
  • GPU: NVIDIA GeForce RTX 5060 (Blackwell GB206, SM 12.0)
  • NVIDIA driver: 580.126.16 / 580.126.20 (Open)
  • Vulkan API: 1.4.312
  • vdr-plugin-softhdcuvid: 3.35.3+git20260315-355-b475c17-0yavdr0noble
  • libplacebo349: 7.349.0+git20241013-18-9e16c86f-1yavdr0noble
  • libplacebo338: 6.338.2-2build1
  • OS: Ubuntu 24.04.4 LTS (Noble)
  • Platform: Proxmox VM with GPU passthrough

VDR crashes on shutdown with a segfault, reported by the Linux kernel in dmesg:
    video display[7125]: segfault at 7d5456124760 ip 00007d5456124760 sp 00007d5688fdc818 error 15 likely on CPU 0

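  • ip == faulting address → the CPU jumped to the faulting address itself, i.e. a call through an invalid function pointer
  • error 15 (the kernel prints this code in hex, so 0x15) → page present, user mode, instruction fetch: execution landed on a mapped but non-executable address
  • the crash always occurs during teardown of the video display thread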

Sequence in VDR log immediately before the crash:
vdr: CuvidExit
vdr: video: video thread canceled
vdr: decoder thread exit
vdr: delete placebo ← last log entry before crash
kernel: video display[...]: segfault error 15

Root Cause
softhdcuvid initializes the Vulkan instance in InitPlacebo() in video.c with only two extensions:
    const char *ext[2] = {"VK_KHR_xcb_surface", "VK_KHR_surface"};
    iparams.num_extensions = 2;
    iparams.extensions = ext;

VK_EXT_debug_utils is requested neither at the instance level nor at the device level.
libplacebo and libnvidia-glcore internally load vkCmdBeginDebugUtilsLabelEXT and vkCmdEndDebugUtilsLabelEXT via vkGetDeviceProcAddr, even though the extension has not been enabled.

Behavior depending on driver/GPU:

  • Older GPUs/drivers: vkGetDeviceProcAddr returns NULL → null checks work → no crash
  • RTX 5060 + Driver 580 (Blackwell SM 12.0): vkGetDeviceProcAddr returns a non-NULL but invalid pointer → null checks do not fire → SEGFAULT

Confirmed by the Vulkan validation layer:
    vkCmdBeginDebugUtilsLabelEXT(): function required extension VK_EXT_debug_utils which has not been enabled.

Fix
Two changes in video.c inside InitPlacebo():

  1. Add the instance extension:
    // Before:
    const char *ext[2] = {"VK_KHR_xcb_surface", "VK_KHR_surface"};
    iparams.num_extensions = 2;

    // After:
    const char *ext[3] = {"VK_KHR_xcb_surface", "VK_KHR_surface",
                          "VK_EXT_debug_utils"};
    iparams.num_extensions = 3;

  2. Add the device extension (after params.allow_software = false;):
    static const char *dev_opt_ext[] = {"VK_EXT_debug_utils"};
    params.opt_extensions = dev_opt_ext;
    params.num_opt_extensions = 1;

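opt_extensions is used so that device creation does not fail if the extension is unavailable on other systems. Note that VK_EXT_debug_utils is registered as an instance extension, so the instance-level change is the one that actually enables it.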

Relation to libplacebo Bug
A related bug was found and fixed at the same time in libplacebo: CmdBeginDebugUtilsLabelEXT and CmdEndDebugUtilsLabelEXT were loaded as mandatory functions in vk_dev_funs[] (src/vulkan/context.c) even though the extension was never enabled. Both fixes together completely resolve the segfault.
libplacebo fix (src/vulkan/context.c):
    # Remove both entries from the mandatory function list vk_dev_funs[]:
    sed -i '/PL_VK_DEV_FUN(CmdBeginDebugUtilsLabelEXT)/d' src/vulkan/context.c
    sed -i '/PL_VK_DEV_FUN(CmdEndDebugUtilsLabelEXT)/d' src/vulkan/context.c

Tested
Ubuntu 24.04.4 LTS (Noble)
RTX 5060 (Blackwell GB206, SM 12.0)
NVIDIA Open Driver 580.126.16 / 580.126.20
After the fix, VDR starts and stops without a segfault
