After Vega 56/64 GPU hang I am unable to reboot the system

StDenis, Tom Tom.StDenis at amd.com
Thu Dec 20 14:08:38 UTC 2018


On 2018-12-20 9:06 a.m., Tom St Denis wrote:
> On 2018-12-20 6:45 a.m., Mikhail Gavrilov wrote:
>> On Thu, 20 Dec 2018 at 16:17, StDenis, Tom <Tom.StDenis at amd.com> wrote:
>>>
>>> Well yup the kernel is not letting you open the files:
>>>
>>>
>>> As sudo/root you should be able to open these files with umr.  What
>>> happens if you just open a shell as root and run it?
>>>
>>
>> [root@localhost ~]# touch /sys/kernel/debug/dri/0/amdgpu_ring_gfx
>> [root@localhost ~]# cat /sys/kernel/debug/dri/0/amdgpu_ring_gfx
>> cat: /sys/kernel/debug/dri/0/amdgpu_ring_gfx: Operation not permitted
>> [root@localhost ~]# ls -laZ /sys/kernel/debug/dri/0/amdgpu_ring_gfx
>> -r--r--r--. 1 root root system_u:object_r:debugfs_t:s0 8204 Dec 20 16:31 /sys/kernel/debug/dri/0/amdgpu_ring_gfx
>> [root@localhost ~]# getenforce
>> Permissive
>> [root@localhost ~]# /home/mikhail/packaging-work/umr/build/src/app/umr -O verbose,halt_waves -wa
>> Cannot seek to MMIO address: Bad file descriptor
>> [ERROR]: Could not open ring debugfs fileSegmentation fault (core dumped)
>>
>> I already tried launching `umr` as the root user, but the kernel again
>> doesn't let it open `amdgpu_ring_gfx`.
>>
>> What other kernel options should I check?
>>
>> I have also attached the current kernel config to this message.
> 
> I can replicate this by doing
> 
> chmod u+s umr
> sudo ./umr -R gfx[.]
> 
> You need to remove the u+s bit; you are literally not running umr as root!

Actually disregard that.  I'm confused at this point.

I run umr 100s of times a day on my devel box just fine as root.

Let me fiddle and see if I can sort this out.

Tom

