<div dir="ltr">Hi Alex,<div><br></div><div>Thanks for your reply. In all of the cases you mentioned, though, the user would still</div><div>be able to reboot properly (i.e. by typing reboot or using a magic keyboard key)</div><div>or at least get a trace of a kernel panic if one happens, is that correct?</div><div><div class="gmail_extra"><br></div><div class="gmail_extra">Thanks</div><div class="gmail_extra">Julien</div><div class="gmail_extra"><br><div class="gmail_quote">On 9 November 2017 at 18:08, Alex Deucher <span dir="ltr"><<a href="mailto:alexdeucher@gmail.com" target="_blank">alexdeucher@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>On Thu, Nov 9, 2017 at 4:35 AM, Julien Isorce <<a href="mailto:julien.isorce@gmail.com" target="_blank">julien.isorce@gmail.com</a>> wrote:<br>
> Hi Monk.<br>
><br>
> I am interested in this. Currently, when a "ring X stalled for more than N<br>
> sec" error happens, it usually goes into the GPU reset routine.<br>
> Does that always cause the vram to be lost? Could you explain what happens if<br>
> the vram remains lost?<br>
<br>
</span>It means the contents of vram are gone or unreliable. In that case<br>
applications need to re-initialize all of their buffers before<br>
submitting any work. You really need to add GL_robustness support to<br>
any applications you care about. Whether vram is lost or not depends<br>
on the reset method and the asic. E.g., soft reset of a specific<br>
engine won't cause a loss of vram, but a full adapter reset or an FLR<br>
may.<br>
<span><br>
><br>
> I am asking because I have experienced some recurrent GPU resets that are<br>
> marked as succeeded in the log but fail in the "resume" step.<br>
> I would not be concerned about this if it always left the user a chance to<br>
> cleanly reboot the machine.<br>
><br>
> The issue is that it can require a hard reboot, with no kernel panic and<br>
> with the keyboard no longer responding to magic keys.<br>
> Are these patches trying to address this issue?<br>
><br>
> Note that "issue" here refers neither to the root cause of the "ring X<br>
> stalled" message nor to why the "resume" step fails.<br>
<br>
</span>There were a few issues that caused problems with GPU reset. The<br>
biggest was that the GPU scheduler deadlocked in certain cases so if<br>
you got a GPU hang, the driver locked up. That should mostly be<br>
straightened out at this point. I think there may still be some<br>
deadlocks in the modesetting code after a reset. Once that is sorted,<br>
it will come down to fine tuning the actual reset sequences. Full<br>
adapter resets are the easiest to get working reliably (and are<br>
already implemented in the driver), but also the most destructive.<br>
<span class="m_-6113423385651225305HOEnZb"><font color="#888888"><br>
Alex<br>
</font></span><div class="m_-6113423385651225305HOEnZb"><div class="m_-6113423385651225305h5"><br>
><br>
> Thx a lot<br>
> Julien<br>
><br>
><br>
> On 30 October 2017 at 04:15, Monk Liu <<a href="mailto:Monk.Liu@amd.com" target="_blank">Monk.Liu@amd.com</a>> wrote:<br>
>><br>
>> *** job skipping logic in scheduler part is re-implemented ***<br>
>><br>
>> Monk Liu (7):<br>
>> amd/scheduler:imple job skip feature(v3)<br>
>> drm/amdgpu:implement new GPU recover(v3)<br>
>> drm/amdgpu:cleanup in_sriov_reset and lock_reset<br>
>> drm/amdgpu:cleanup ucode_init_bo<br>
>> drm/amdgpu:block kms open during gpu_reset<br>
>> drm/amdgpu/sriov:fix memory leak in psp_load_fw<br>
>> drm/amdgpu:fix random missing of FLR NOTIFY<br>
>><br>
>> drivers/gpu/drm/amd/amdgpu/amd<wbr>gpu.h | 9 +-<br>
>> drivers/gpu/drm/amd/amdgpu/amd<wbr>gpu_device.c | 311 ++++++++++++--------------<br>
>> drivers/gpu/drm/amd/amdgpu/amd<wbr>gpu_fence.c | 10 +-<br>
>> drivers/gpu/drm/amd/amdgpu/amd<wbr>gpu_irq.c | 2 +-<br>
>> drivers/gpu/drm/amd/amdgpu/amd<wbr>gpu_job.c | 18 +-<br>
>> drivers/gpu/drm/amd/amdgpu/amd<wbr>gpu_kms.c | 3 +<br>
>> drivers/gpu/drm/amd/amdgpu/amd<wbr>gpu_psp.c | 22 +-<br>
>> drivers/gpu/drm/amd/amdgpu/amd<wbr>gpu_ucode.c | 4 +-<br>
>> drivers/gpu/drm/amd/amdgpu/amd<wbr>gpu_virt.c | 2 -<br>
>> drivers/gpu/drm/amd/amdgpu/amd<wbr>gpu_virt.h | 2 -<br>
>> drivers/gpu/drm/amd/amdgpu/gfx<wbr>_v8_0.c | 6 +-<br>
>> drivers/gpu/drm/amd/amdgpu/gfx<wbr>_v9_0.c | 6 +-<br>
>> drivers/gpu/drm/amd/amdgpu/mxg<wbr>pu_ai.c | 16 +-<br>
>> drivers/gpu/drm/amd/amdgpu/mxg<wbr>pu_vi.c | 2 +-<br>
>> drivers/gpu/drm/amd/scheduler/<wbr>gpu_scheduler.c | 39 ++--<br>
>> 15 files changed, 220 insertions(+), 232 deletions(-)<br>
>><br>
>> --<br>
>> 2.7.4<br>
>><br>
>> ______________________________<wbr>_________________<br>
>> amd-gfx mailing list<br>
>> <a href="mailto:amd-gfx@lists.freedesktop.org" target="_blank">amd-gfx@lists.freedesktop.org</a><br>
>> <a href="https://lists.freedesktop.org/mailman/listinfo/amd-gfx" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/amd-gfx</a><br>
><br>
><br>
</div></div></blockquote></div><br></div></div></div>