<div dir="ltr">Hi Alex,<div><br></div><div>Thanks for your reply. To confirm: in all of the cases you mentioned, would the user still</div><div>be able to reboot cleanly (i.e. by typing reboot or using a magic SysRq key),</div><div>or at least get a trace if a kernel panic happens?</div><div><div class="gmail_extra"><br></div><div class="gmail_extra">Thanks</div><div class="gmail_extra">Julien</div><div class="gmail_extra"><br><div class="gmail_quote">On 9 November 2017 at 18:08, Alex Deucher <span dir="ltr">&lt;<a href="mailto:alexdeucher@gmail.com" target="_blank">alexdeucher@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>On Thu, Nov 9, 2017 at 4:35 AM, Julien Isorce &lt;<a href="mailto:julien.isorce@gmail.com" target="_blank">julien.isorce@gmail.com</a>&gt; wrote:<br> > Hi Monk,<br> ><br> > I am interested in this. Currently, when a "ring X stalled for more than N<br> > sec" error happens, the driver usually goes into the GPU reset routine.<br> > Does it always cause VRAM to be lost? Could you explain what happens if<br> > VRAM remains lost?<br> <br> </span>It means the contents of VRAM are gone or unreliable. In that case,<br> applications need to re-initialize all of their buffers before<br> submitting any work. You really need to add GL_robustness support to<br> any applications you care about. Whether VRAM is lost or not depends<br> on the reset method and the ASIC. 
E.g., a soft reset of a specific<br> engine won't cause a loss of VRAM, but a full adapter reset or an FLR<br> (function level reset) may.<br> <span><br> ><br> > I am asking because I have experienced recurring GPU resets that are<br> > marked as succeeded in the log but fail in the "resume" step.<br> > I would not be concerned about this if a reset always left the<br> > user a chance to cleanly reboot the machine.<br> ><br> > The issue is that it can require a hard reboot, with no kernel panic and<br> > with the keyboard no longer responding to magic SysRq keys.<br> > Are these patches trying to address this issue?<br> ><br> > Note that here "issue" refers neither to the root cause of the ring X<br> > stall nor to why the "resume" step fails.<br> <br> </span>There were a few issues that caused problems with GPU reset. The<br> biggest was that the GPU scheduler deadlocked in certain cases, so if<br> you got a GPU hang, the driver locked up. That should mostly be<br> straightened out at this point. I think there may still be some<br> deadlocks in the modesetting code after a reset. Once those are sorted out,<br> it will come down to fine-tuning the actual reset sequences. 
Full<br> adapter resets are the easiest to get working reliably (and are<br> already implemented in the driver), but also the most destructive.<br> <span class="m_-6113423385651225305HOEnZb"><font color="#888888"><br> Alex<br> </font></span><div class="m_-6113423385651225305HOEnZb"><div class="m_-6113423385651225305h5"><br> ><br> > Thanks a lot<br> > Julien<br> ><br> ><br> > On 30 October 2017 at 04:15, Monk Liu &lt;<a href="mailto:Monk.Liu@amd.com" target="_blank">Monk.Liu@amd.com</a>&gt; wrote:<br> >><br> >> *** job skipping logic in scheduler part is re-implemented ***<br> >><br> >> Monk Liu (7):<br> >> amd/scheduler:imple job skip feature(v3)<br> >> drm/amdgpu:implement new GPU recover(v3)<br> >> drm/amdgpu:cleanup in_sriov_reset and lock_reset<br> >> drm/amdgpu:cleanup ucode_init_bo<br> >> drm/amdgpu:block kms open during gpu_reset<br> >> drm/amdgpu/sriov:fix memory leak in psp_load_fw<br> >> drm/amdgpu:fix random missing of FLR NOTIFY<br> >><br> >> drivers/gpu/drm/amd/amdgpu/amd<wbr>gpu.h | 9 +-<br> >> drivers/gpu/drm/amd/amdgpu/amd<wbr>gpu_device.c | 311 ++++++++++++--------------<br> >> drivers/gpu/drm/amd/amdgpu/amd<wbr>gpu_fence.c | 10 +-<br> >> drivers/gpu/drm/amd/amdgpu/amd<wbr>gpu_irq.c | 2 +-<br> >> drivers/gpu/drm/amd/amdgpu/amd<wbr>gpu_job.c | 18 +-<br> >> drivers/gpu/drm/amd/amdgpu/amd<wbr>gpu_kms.c | 3 +<br> >> drivers/gpu/drm/amd/amdgpu/amd<wbr>gpu_psp.c | 22 +-<br> >> drivers/gpu/drm/amd/amdgpu/amd<wbr>gpu_ucode.c | 4 +-<br> >> drivers/gpu/drm/amd/amdgpu/amd<wbr>gpu_virt.c | 2 -<br> >> drivers/gpu/drm/amd/amdgpu/amd<wbr>gpu_virt.h | 2 -<br> >> drivers/gpu/drm/amd/amdgpu/gfx<wbr>_v8_0.c | 6 +-<br> >> drivers/gpu/drm/amd/amdgpu/gfx<wbr>_v9_0.c | 6 +-<br> >> drivers/gpu/drm/amd/amdgpu/mxg<wbr>pu_ai.c | 16 +-<br> >> drivers/gpu/drm/amd/amdgpu/mxg<wbr>pu_vi.c | 2 +-<br> >> drivers/gpu/drm/amd/scheduler/<wbr>gpu_scheduler.c | 39 ++--<br> >> 15 files changed, 220 insertions(+), 232 deletions(-)<br> >><br> >> --<br> >> 2.7.4<br> >><br> >> 
______________________________<wbr>_________________<br> >> amd-gfx mailing list<br> >> <a href="mailto:amd-gfx@lists.freedesktop.org" target="_blank">amd-gfx@lists.freedesktop.org</a><br> >> <a href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/amd-gfx</a><br> ><br> </div></div></blockquote></div><br></div></div></div>
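Alex's advice that applications re-initialize all of their buffers after a reset, and add GL_robustness support, can be illustrated with a minimal Python model. This is not real GL code: it assumes the context was created with the lose-context-on-reset notification strategy (for example via GLX_ARB_create_context_robustness or EGL_EXT_create_context_robustness) and that the application polls glGetGraphicsResetStatus() (GL 4.5 / GL_KHR_robustness) and passes the result to the hypothetical classify_reset() helper. The constant values are the ones in Khronos' glext.h; ResetAction, App, and the assumption that the application keeps CPU-side copies of its buffer data are illustrative only.

```python
from enum import Enum

# Reset-status values as defined in Khronos glext.h (GL 4.5 / KHR_robustness).
# A real application would obtain the status from glGetGraphicsResetStatus().
GL_NO_ERROR = 0x0000
GL_GUILTY_CONTEXT_RESET = 0x8253
GL_INNOCENT_CONTEXT_RESET = 0x8254
GL_UNKNOWN_CONTEXT_RESET = 0x8255


class ResetAction(Enum):
    """Hypothetical application-side actions; not part of any GL API."""
    CONTINUE = "continue"                # no reset: keep submitting work
    RECREATE_ALL = "recreate_all"        # new context, re-upload every buffer
    RECREATE_AND_DROP = "recreate_drop"  # same, and don't resubmit the work that hung


def classify_reset(status: int) -> ResetAction:
    """Map a reset status to an action.

    Any value other than GL_NO_ERROR means the context is lost. Because VRAM
    contents may be gone after a reset, every buffer has to be re-initialized
    before any new work is submitted.
    """
    if status == GL_NO_ERROR:
        return ResetAction.CONTINUE
    if status == GL_GUILTY_CONTEXT_RESET:
        # This context caused the hang; replaying its last work may hang again.
        return ResetAction.RECREATE_AND_DROP
    # Innocent, unknown, or unrecognized status: still rebuild everything.
    return ResetAction.RECREATE_ALL


class App:
    """Toy stand-in for a renderer that owns GPU buffers (hypothetical)."""

    def __init__(self, buffers):
        self.buffers = dict(buffers)  # name -> CPU-side source data (assumed kept)
        self.uploaded = {}            # name -> data currently "in VRAM"
        self.context_generation = 0

    def upload_all(self):
        # Re-initialize every buffer from its CPU-side copy.
        self.uploaded = dict(self.buffers)

    def poll_reset(self, status: int) -> ResetAction:
        action = classify_reset(status)
        if action is not ResetAction.CONTINUE:
            self.context_generation += 1  # pretend a new context was created
            # Old VRAM contents are unreliable, so replace them wholesale.
            self.upload_all()
        return action
```

Keeping CPU-side copies is only one way to make re-initialization possible; an application that can reload its data from disk or recompute it would not need them.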
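The trade-off Alex describes between reset methods (a soft reset of one engine keeps VRAM, while a full adapter reset or an FLR may lose it) can also be sketched as a small model. The escalation policy here, trying a soft reset of the hung engine first and falling back to a full adapter reset only if that fails, is an assumption for illustration and not a description of the amdgpu recovery code in this patch series; ResetMethod, may_lose_vram(), and recover() are hypothetical names, and try_reset stands in for whatever actually performs the reset. Because VRAM loss also depends on the ASIC, may_lose_vram() answers "must assume lost" rather than "definitely lost".

```python
from enum import Enum


class ResetMethod(Enum):
    """Hypothetical model of the reset methods named in the thread."""
    SOFT_ENGINE = 1   # soft reset of one specific engine
    FULL_ADAPTER = 2  # full adapter (whole-GPU) reset
    FLR = 3           # PCIe function level reset; ranking vs FULL_ADAPTER is assumed


def may_lose_vram(method: ResetMethod) -> bool:
    """Return True if VRAM must be treated as lost after `method`.

    A soft reset of one engine keeps VRAM; a full adapter reset or an FLR
    may lose it depending on the ASIC, so the safe answer for those is True.
    """
    return method is not ResetMethod.SOFT_ENGINE


def recover(try_reset, methods=(ResetMethod.SOFT_ENGINE, ResetMethod.FULL_ADAPTER)):
    """Try reset methods from least to most destructive (assumed policy).

    try_reset(method) -> bool is a caller-supplied stand-in for the real reset.
    Returns (method_used, vram_lost), or (None, True) if every method failed.
    """
    for method in sorted(methods, key=lambda m: m.value):
        if try_reset(method):
            return method, may_lose_vram(method)
    # Nothing worked: GPU state, including VRAM, is unknown.
    return None, True


if __name__ == "__main__":
    # Simulate a hang that only a full adapter reset can clear.
    print(recover(lambda m: m is ResetMethod.FULL_ADAPTER))
```

Whenever the second element of the result is True, applications have to re-initialize their buffers before submitting more work, which is exactly the case GL_robustness support is meant to handle.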