After Vega 56/64 GPU hang I am unable to reboot the system

StDenis, Tom Tom.StDenis at amd.com
Thu Dec 20 11:17:56 UTC 2018


On 2018-12-19 10:29 p.m., Mikhail Gavrilov wrote:
> On Thu, 20 Dec 2018 at 03:41, StDenis, Tom <Tom.StDenis at amd.com> wrote:
> 
>> sudo strace umr -R gfx[.] 2>&1 | tee strace.log
>>
>> will capture everything.
>>
>> In the meantime I can at least fix the segfault.
>>
>> The issue is why it can't open "amdgpu_ring_gfx".
>>
>> Tom
>>
> 
> The strace file is attached here.

Well, yup, the kernel is not letting you open the files:

openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_gca_config", O_RDONLY) = -1 EPERM (Operation not permitted)
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_regs", O_RDWR) = -1 EPERM (Operation not permitted)
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_regs_didt", O_RDWR) = -1 EPERM (Operation not permitted)
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_regs_pcie", O_RDWR) = -1 EPERM (Operation not permitted)
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_regs_smc", O_RDWR) = -1 EPERM (Operation not permitted)
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_sensors", O_RDWR) = -1 EPERM (Operation not permitted)
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_wave", O_RDWR) = -1 EPERM (Operation not permitted)
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_vram", O_RDWR) = -1 EPERM (Operation not permitted)
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_gpr", O_RDWR) = -1 EPERM (Operation not permitted)
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_iova", O_RDWR) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_iomem", O_RDWR) = -1 EPERM (Operation not permitted)
openat(AT_FDCWD, "/sys/bus/pci/devices/0000:0b:00.0/vbios_version", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
read(3, "xxx-xxx-xxx\n", 4096)          = 12
close(3)                                = 0
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_firmware_info", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(3, "VCE feature version: 0, firmware"..., 4096) = 1059
read(3, "", 4096)                       = 0
close(3)                                = 0
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_gca_config", O_RDONLY) = -1 EPERM (Operation not permitted)
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_ring_gfx", O_RDWR) = -1 EPERM (Operation not permitted)

As sudo/root, you should be able to open these files with umr. What
happens if you just open a shell as root and run it?
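For reference, the failing opens in the trace can be reproduced without strace or umr. The following is a minimal sketch, assuming Python 3 on Linux and DRM minor 0 (the `dri/0` directory in the trace). `probe` and `FILES` are hypothetical names, not part of umr. The script prints the effective UID, opens a few of the debugfs files from the trace with the same flags umr used, and prints `OK` or the errno name (`EPERM`, `ENOENT`, ...) for each one.

```python
import errno
import os

# Assumption: DRM minor 0, as in the strace output.
DEBUGFS_DIR = "/sys/kernel/debug/dri/0"

# Subset of the files umr opened, with the same open flags it used.
FILES = {
    "amdgpu_gca_config": os.O_RDONLY,
    "amdgpu_regs": os.O_RDWR,
    "amdgpu_firmware_info": os.O_RDONLY,
    "amdgpu_ring_gfx": os.O_RDWR,
}


def probe(path, flags):
    """Try to open `path` with `flags`; return "OK" or the errno name (e.g. "EPERM")."""
    try:
        fd = os.open(path, flags)
    except OSError as exc:
        # Map the numeric errno to its symbolic name, falling back to the number.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)  # Only opening is tested; nothing is read or written.
    return "OK"


if __name__ == "__main__":
    print("effective UID:", os.geteuid())
    for name, flags in FILES.items():
        print(f"{name:24} {probe(os.path.join(DEBUGFS_DIR, name), flags)}")
```

Running it once as a normal user and once from a root shell shows whether the `EPERM` results change with privilege. If they stay `EPERM` even with an effective UID of 0, ordinary file permissions are not what is blocking umr.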



Tom


More information about the amd-gfx mailing list