<html> <head> <base href="https://bugs.freedesktop.org/"> </head> <body> <div> <a class="bz_bug_link bz_status_NEW " title="NEW - Hard lockup with radeonsi driver on FirePro W600, W9000 and W9100" href="https://bugs.freedesktop.org/show_bug.cgi?id=100465#c9">Comment # 9</a> on <a class="bz_bug_link bz_status_NEW " title="NEW - Hard lockup with radeonsi driver on FirePro W600, W9000 and W9100" href="https://bugs.freedesktop.org/show_bug.cgi?id=100465">bug 100465</a> from <a class="email" href="mailto:julien.isorce@gmail.com" title="Julien Isorce <julien.isorce@gmail.com>"> Julien Isorce</a> <pre>When using R600_DEBUG=check_vm on both Xorg and the GL app, I can get some output in kern.log. It looks like a "ring 0 stalled" is detected, followed by a GPU soft reset that succeeds ("GPU reset succeeded, trying to resume") but then fails to resume because:

[drm:atom_execute_table_locked [radeon]] [kworker/0:1H, 434] *ERROR* atombios stuck executing C483 (len 254, WS 0, PS 4) @ 0xC4AD
[drm:atom_execute_table_locked [radeon]] [kworker/0:1H, 434] *ERROR* atombios stuck executing BC59 (len 74, WS 0, PS 8) @ 0xBC8E

Then there are two radeon_mc_wait_for_idle failures, "Wait for MC idle timedout", from si_mc_program.

Finally, si_startup fails because si_cp_resume fails because r600_ring_test fails with:
"radeon: ring 0 test failed (scratch(0x850C)=0xCAFEDEAD)"

But it seems to keep looping, trying to do a GPU soft reset, and at some point it freezes. I still need to confirm this ending scenario, but these atombios failures are worrying in the first place.

At the same time I get some "radeon_ttm_bo_destroy" warnings triggered by "WARN_ON(!list_empty(&bo->va));" in the kernel radeon driver. So it seems to leak some buffers.

I will attach the full log tomorrow; it is mixed up with my traces at the moment, but I hope the essentials are above.

So I have 4 questions:

1: Can an application cause a "ring 0 stalled"? Or is it a driver bug (kernel side, mesa/drm or xserver)?
2: About these atombios failures, does it mean that it fails to load the GPU microcode/firmware?
3: Does it try to do a GPU soft reset because I added R600_DEBUG=check_vm? Or does that option just help to flush the traces on a VM fault (as mentioned in a commit message related to that env var in mesa)?
4: For the deallocation failure / leak above (radeon_ttm_bo_destroy warning), does it mean the memory is lost until the next reboot, or does a GPU soft reset allow these leaks to be recovered?

Thx!</pre> </div> <hr> You are receiving this mail because: <ul> <li>You are the assignee for the bug.</li> </ul> </body> </html>