The "ring gfx timeout" problem is experienced by yet another AMD GPU Vega 8 user

Mikhail Gavrilov mikhail.v.gavrilov at gmail.com
Thu Jul 18 20:38:54 UTC 2019


On Wed, 3 Jul 2019 at 23:57, Marek Olšák <maraeo at gmail.com> wrote:
>
> It looks like memory corruption. You can try to disable IOMMU in the BIOS.
>

We disabled IOMMU in the BIOS [1].
We also ran a memory check with MemTest86.
MemTest86 did not find any memory problems [2].

Unfortunately, the previously reported GPU hang still happens.

[17571.578988] amdgpu 0000:08:00.0: [gfxhub] no-retry page fault (src_id:0 ring:158 vmid:7 pasid:32776, for process hoi4 pid 9225 thread hoi4:cs0 pid 9226)
[17571.578992] amdgpu 0000:08:00.0:   in page starting at address 0x0000000044160000 from 27
[17571.578994] amdgpu 0000:08:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0070153C
[17576.635622] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out.
[17581.755948] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out.
[17581.765672] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1520345, emitted seq=1520347
[17581.765765] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process hoi4 pid 9225 thread hoi4:cs0 pid 9226
[17581.765766] [drm] GPU recovery disabled.
[17586.875783] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out.
[17592.005836] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1520345, emitted seq=1520347
[17592.005921] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process hoi4 pid 9225 thread hoi4:cs0 pid 9226
[17592.005923] [drm] GPU recovery disabled.
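
For anyone comparing several logs like this one, the timeout messages can be picked out with a small script. This is only a sketch: the function name `find_ring_timeouts` and the regular expression are assumptions made for illustration, based solely on the message format shown in the log above, and are not part of any amdgpu tooling. It collapses whitespace first so that messages wrapped by a mail client still match.

```python
import re

# Assumed pattern, derived only from the "ring gfx timeout" message in this log.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def find_ring_timeouts(log_text):
    """Return (ring, signaled, emitted, emitted - signaled) for each timeout."""
    # Collapse all whitespace so a message split across lines becomes one string.
    flat = " ".join(log_text.split())
    results = []
    for match in TIMEOUT_RE.finditer(flat):
        signaled = int(match.group("signaled"))
        emitted = int(match.group("emitted"))
        results.append((match.group("ring"), signaled, emitted, emitted - signaled))
    return results


if __name__ == "__main__":
    # Sample copied from the log above, including a mail-style line wrap.
    sample = (
        "[17581.765672] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx\n"
        "timeout, signaled seq=1520345, emitted seq=1520347\n"
    )
    for ring, signaled, emitted, gap in find_ring_timeouts(sample):
        print(f"ring={ring} signaled={signaled} emitted={emitted} gap={gap}")
```

In the log above both timeout reports show the same pair of sequence numbers, so the gap between emitted and signaled does not change between the two reports.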


Are there any other ideas about how the memory might be getting corrupted?

Fresh logs are uploaded here [3].

Thanks.

[1] https://postimg.cc/RJLYWgH7
[2] https://postimg.cc/Fk4qFM7F
[3] https://mega.nz/#F!8xphjAJL!7HVUz-NyRaICjCSu_x-fFA

--
Best Regards,
Mike Gavrilov.

