After Vega 56/64 GPU hang I am unable to reboot the system

Wentland, Harry Harry.Wentland at amd.com
Mon Dec 17 18:51:51 UTC 2018


On 2018-12-15 4:42 a.m., Mikhail Gavrilov wrote:
> On Sat, 15 Dec 2018 at 00:36, Wentland, Harry <Harry.Wentland at amd.com> wrote:
>>
>> Looks like there's an error before this happens that might get us into this mess:
>>
>> [  229.741741] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=28686, emitted seq=28688
>> [  229.741806] [drm] GPU recovery disabled.
>>
>> Harry
> 
> Harry, will this ever be fixed?
> That annoying `ring gfx timeout` has followed me on every machine with
> a Vega GPU for more than a year.
> Just yesterday I locked the computer and went to sleep; in the
> morning I found that I could not unlock the machine.
> After connecting via ssh I again saw in the kernel log
> `[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled
> seq=32778472, emitted seq=32778474`
> This means the bug can happen even when I am doing nothing on my machine.
> 
> Should we wait for any improvement in localizing this bug?
> Because I suppose the message `[drm:amdgpu_job_timedout [amdgpu]] *ERROR*
> ring gfx timeout, signaled seq=32778472, emitted seq=32778474` does not
> contain any useful info for fixing this bug.
> 

I don't know much about ring gfx timeouts as my area of expertise revolves around the display side of things, not gfx.

Alex, Christian, any ideas?

Harry

> Thanks.
> 
> --
> Best Regards,
> Mike Gavrilov.
> 

