After Vega 56/64 GPU hang I unable reboot system
StDenis, Tom
Tom.StDenis at amd.com
Thu Dec 20 11:17:56 UTC 2018
On 2018-12-19 10:29 p.m., Mikhail Gavrilov wrote:
> On Thu, 20 Dec 2018 at 03:41, StDenis, Tom <Tom.StDenis at amd.com> wrote:
>
>> sudo strace umr -R gfx[.] 2>&1 | tee strace.log
>>
>> will capture everything.
>>
>> In the mean time I can fix at least the segfault.
>>
>> The issue is why can't it open "amdgpu_ring_gfx".
>>
>> Tom
>>
>
> strace file is attached here.
Well, yup, the kernel is not letting you open the files:
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_gca_config", O_RDONLY) = -1 EPERM (Operation not permitted)
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_regs", O_RDWR) = -1 EPERM (Operation not permitted)
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_regs_didt", O_RDWR) = -1 EPERM (Operation not permitted)
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_regs_pcie", O_RDWR) = -1 EPERM (Operation not permitted)
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_regs_smc", O_RDWR) = -1 EPERM (Operation not permitted)
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_sensors", O_RDWR) = -1 EPERM (Operation not permitted)
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_wave", O_RDWR) = -1 EPERM (Operation not permitted)
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_vram", O_RDWR) = -1 EPERM (Operation not permitted)
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_gpr", O_RDWR) = -1 EPERM (Operation not permitted)
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_iova", O_RDWR) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_iomem", O_RDWR) = -1 EPERM (Operation not permitted)
openat(AT_FDCWD, "/sys/bus/pci/devices/0000:0b:00.0/vbios_version", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
read(3, "xxx-xxx-xxx\n", 4096) = 12
close(3) = 0
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_firmware_info", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(3, "VCE feature version: 0, firmware"..., 4096) = 1059
read(3, "", 4096) = 0
close(3) = 0
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_gca_config", O_RDONLY) = -1 EPERM (Operation not permitted)
openat(AT_FDCWD, "/sys/kernel/debug/dri/0/amdgpu_ring_gfx", O_RDWR) = -1 EPERM (Operation not permitted)
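For a longer strace.log it may be easier to summarize the failures than to read them one by one. The following is only a rough sketch, not part of umr: the function name failed_opens and the regular expression are my own assumptions about how strace formats a failed openat() line, and it tolerates the result being wrapped onto the next line.

```python
import re
from collections import defaultdict

# Assumed strace format for a failed call, e.g.
#   openat(AT_FDCWD, "/path", O_RDWR) = -1 EPERM (Operation not permitted)
# \s* also matches a newline, so mail-wrapped lines are handled too.
FAILED_OPENAT = re.compile(
    r'openat\(AT_FDCWD,\s*"([^"]+)",\s*[^)]*\)\s*=\s*-1\s+(E[A-Z0-9]+)'
)

def failed_opens(strace_text):
    """Group the paths of failed openat() calls by errno name."""
    by_errno = defaultdict(list)
    for path, errno_name in FAILED_OPENAT.findall(strace_text):
        by_errno[errno_name].append(path)
    return dict(by_errno)

if __name__ == "__main__":
    import sys
    # Usage (hypothetical script name): python3 failed_opens.py strace.log
    with open(sys.argv[1], errors="replace") as log:
        for errno_name, paths in failed_opens(log.read()).items():
            print(errno_name)
            for path in paths:
                print("   ", path)
```

Grouping by errno separates the EPERM cases from the single ENOENT on amdgpu_iova, which is a different kind of failure (the file does not exist at all).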
As sudo/root you should be able to open these files with umr. What
happens if you just open a shell as root and run it?
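To take umr out of the picture entirely, the same open can be tried directly from that root shell. This is a minimal sketch under a few assumptions: the helper try_open is hypothetical, the directory is DRI instance 0 as in the trace, and the files are opened read-only here even though umr opened most of them O_RDWR.

```python
import errno
import os

def try_open(path, flags=os.O_RDONLY):
    """Return None if the path opens, else the errno name (e.g. 'EPERM')."""
    try:
        fd = os.open(path, flags)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return None

if __name__ == "__main__":
    # An effective UID of 0 confirms the script really runs as root.
    print("euid:", os.geteuid())
    debugfs_dir = "/sys/kernel/debug/dri/0"  # assumed: same instance as in the trace
    for name in ("amdgpu_gca_config", "amdgpu_regs", "amdgpu_ring_gfx"):
        status = try_open(os.path.join(debugfs_dir, name))
        print(name, status or "ok")
```

If this also reports EPERM with euid 0, the refusal comes from the kernel side rather than from anything umr does.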
Tom