Question regarding ROCm and removable GPUs
WalterCool
waltercool at slash.cl
Tue Dec 31 05:48:01 UTC 2024
Hi everyone,
I have a Framework 16 laptop with the dGPU extension, which means it has both a Radeon 780M iGPU and a Radeon RX 7700S dGPU.
I noticed that after I detach the RX 7700S from the host for use in a VM (vfio-pci), all ROCm functionality breaks: ROCm apps no longer detect any device, not even the iGPU. Re-attaching the 7700S does not resolve the problem.
> $ rocminfo
> ROCk module is loaded
> Unable to open /dev/kfd read-write: Invalid argument
> waltercool is member of render group
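To narrow this down, it may help to compare which kernel driver each GPU function is bound to before and after the detach. The following is only a minimal sketch: `SYSFS_ROOT` and `bound_driver` are hypothetical names made up for illustration, and the PCI address in the comment is a placeholder, not the real address on this laptop.

```shell
#!/bin/sh
# Sketch only: report which kernel driver a PCI device is currently bound to.
# SYSFS_ROOT and bound_driver are hypothetical names used for illustration.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

bound_driver() {
    # $1 is a PCI address such as 0000:03:00.0 (placeholder; check `lspci -D`).
    link="$SYSFS_ROOT/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The "driver" entry is a symlink whose last component is the driver name.
        basename "$(readlink "$link")"
    else
        # No symlink means no driver is bound to this device.
        echo "none"
    fi
}
```

Calling `bound_driver` with the address of each GPU function should print something like `amdgpu`, `vfio-pci`, or `none`; running `ls -l /dev/kfd` alongside it shows whether the KFD device node is still present.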
With LMStudio, for example, both the ROCm and OpenCL backends work fine by default. After I remove the dGPU (and later re-attach it to the host), only OpenCL works.
Applications like ollama or ComfyUI also fail after the dGPU is detached, and using "amdgpu_gpu_recover" does not resolve the issue.
Any ideas on how to recover KFD/ROCm functionality after I detach my GPU?
Q: How do you detach your GPU?
A: /sys/bus/pci/devices/${GPU_VIDEO/GPU_AUDIO}/driver/unbind or /sys/module/amdgpu/drivers/pci\:amdgpu/unbind
Q: How do you reattach your GPU?
A: Remove device (/sys/bus/pci/devices/${GPU_VIDEO/GPU_AUDIO}/remove), then PCI rescan.
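In script form, the detach and reattach steps above look roughly like this. It is a minimal sketch rather than the exact commands used: `SYSFS_ROOT`, `detach_gpu`, and `reattach_gpu` are hypothetical names, the PCI addresses passed in are placeholders for the video and audio functions, and the vfio-pci binding itself is not shown. On a real system it needs root.

```shell
#!/bin/sh
# Sketch of the detach/reattach sequence; names below are illustrative only.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

detach_gpu() {
    # Unbind each given PCI function (e.g. video, then audio) from its driver.
    for bdf in "$@"; do
        unbind="$SYSFS_ROOT/bus/pci/devices/$bdf/driver/unbind"
        # Skip functions that have no driver bound.
        if [ -e "$unbind" ]; then
            echo "$bdf" > "$unbind"
        fi
    done
}

reattach_gpu() {
    # Remove each PCI function from the device tree...
    for bdf in "$@"; do
        remove="$SYSFS_ROOT/bus/pci/devices/$bdf/remove"
        if [ -e "$remove" ]; then
            echo 1 > "$remove"
        fi
    done
    # ...then ask the kernel to rescan the PCI bus so they are rediscovered.
    echo 1 > "$SYSFS_ROOT/bus/pci/rescan"
}
```

A call would look like `detach_gpu 0000:03:00.0 0000:03:00.1` followed later by `reattach_gpu 0000:03:00.0 0000:03:00.1`, with the addresses replaced by the ones `lspci -D` reports for the dGPU.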
Kind regards.
--
WalterCool