[PATCH 0/7] *** GPU recover V3 ***
Alex Deucher
alexdeucher at gmail.com
Mon Nov 13 15:23:00 UTC 2017
On Mon, Nov 13, 2017 at 6:24 AM, Julien Isorce <julien.isorce at gmail.com> wrote:
> Hi Alex,
>
> Thx for your reply, but in all of the cases you mentioned, the user would
> still be able to reboot properly (i.e. by typing reboot or using a magic
> keyboard key) or to get a trace of a kernel panic if one happens, is that
> correct?
Yes, the deadlock in the GPU scheduler was the issue preventing that
from working properly.
Alex
>
> Thx
> Julien
>
> On 9 November 2017 at 18:08, Alex Deucher <alexdeucher at gmail.com> wrote:
>>
>> On Thu, Nov 9, 2017 at 4:35 AM, Julien Isorce <julien.isorce at gmail.com> wrote:
>> > Hi Monk.
>> >
>> > I am interested in this. Currently, when a "ring X stalled for more than
>> > N sec" happens, it usually goes into the GPU reset routine.
>> > Does it always cause the vram to be lost? Could you explain what happens
>> > if the vram remains lost?
>>
>> It means the contents of vram are gone or unreliable. In that case
>> applications need to re-initialize all of their buffers before
>> submitting any work. You really need to add GL_robustness support to
>> any applications you care about. Whether vram is lost or not depends
>> on the reset method and the asic. E.g., soft reset of a specific
>> engine won't cause a loss of vram, but a full adapter reset or an FLR
>> may.
>>
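To illustrate the re-initialization described above, here is a minimal
sketch in C of the per-frame check a robustness-aware application might
do. It assumes a context created with robust access and the
lose-context-on-reset notification strategy. The reset-status values
match the GL_KHR_robustness / OpenGL 4.5 headers; in a real application
the status would come from glGetGraphicsResetStatus(). Everything else
(struct app, frame_may_submit, recreate_context_and_buffers, the fake_*
helpers) is hypothetical and only stands in for application code, so
that the sketch builds without a GL context.

```c
#include <stdbool.h>

/* Values as defined by GL_KHR_robustness / OpenGL 4.5, repeated here
 * so the sketch compiles without the GL headers. */
#define GL_NO_ERROR               0
#define GL_GUILTY_CONTEXT_RESET   0x8253
#define GL_INNOCENT_CONTEXT_RESET 0x8254
#define GL_UNKNOWN_CONTEXT_RESET  0x8255

/* Stand-in for glGetGraphicsResetStatus(); injected as a pointer so the
 * logic can be exercised without a GPU. */
typedef unsigned int (*reset_status_fn)(void);

/* Hypothetical application state. */
struct app {
    reset_status_fn get_reset_status;
    int buffers_valid;     /* 1 while buffer contents can be trusted */
    int contexts_created;  /* how many times a context was (re)built */
};

/* Hypothetical helper: a real application would destroy the lost
 * context, create a new one and re-upload every buffer and texture. */
static void recreate_context_and_buffers(struct app *a)
{
    a->contexts_created++;
    a->buffers_valid = 1;
}

/* Call once per frame before submitting work. Returns true when it is
 * safe to submit, false when a reset was seen and state was rebuilt. */
static bool frame_may_submit(struct app *a)
{
    unsigned int status = a->get_reset_status();

    if (status == GL_NO_ERROR)
        return true;

    /* Guilty, innocent or unknown: treat all contents as lost. */
    a->buffers_valid = 0;
    recreate_context_and_buffers(a);
    return false;
}

/* Fake status sources, for illustration and testing only. */
static unsigned int fake_no_reset(void)       { return GL_NO_ERROR; }
static unsigned int fake_innocent_reset(void) { return GL_INNOCENT_CONTEXT_RESET; }
```

The sketch treats every non-zero status the same way, because after a
reset that loses vram the application cannot tell which buffers
survived. A real application may also want to wait until the status
query returns GL_NO_ERROR again before recreating the context.
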
>> >
>> > I am asking this because I experienced some recurrent GPU resets that are
>> > marked as succeeded in the log but fail in the "resume" step.
>> > I would not be interested in this if it always left the user a chance to
>> > cleanly reboot the machine.
>> >
>> > The issue is that it can require a hard reboot, with no kernel panic and
>> > with the keyboard no longer responding to magic keys.
>> > Are those patches trying to address this issue?
>> >
>> > Note that here "issue" does not refer to the root cause of a ring X
>> > stall, nor to why the "resume" step fails.
>>
>> There were a few issues that caused problems with GPU reset. The
>> biggest was that the GPU scheduler deadlocked in certain cases so if
>> you got a GPU hang, the driver locked up. That should mostly be
>> straightened out at this point. I think there may still be some
>> deadlocks in the modesetting code after a reset. Once that is sorted,
>> it will come down to fine tuning the actual reset sequences. Full
>> adapter resets are the easiest to get working reliably (and are
>> already implemented in the driver), but also the most destructive.
>>
>> Alex
>>
>> >
>> > Thx a lot
>> > Julien
>> >
>> >
>> > On 30 October 2017 at 04:15, Monk Liu <Monk.Liu at amd.com> wrote:
>> >>
>> >> *** job skipping logic in scheduler part is re-implemented ***
>> >>
>> >> Monk Liu (7):
>> >> amd/scheduler:imple job skip feature(v3)
>> >> drm/amdgpu:implement new GPU recover(v3)
>> >> drm/amdgpu:cleanup in_sriov_reset and lock_reset
>> >> drm/amdgpu:cleanup ucode_init_bo
>> >> drm/amdgpu:block kms open during gpu_reset
>> >> drm/amdgpu/sriov:fix memory leak in psp_load_fw
>> >> drm/amdgpu:fix random missing of FLR NOTIFY
>> >>
>> >>  drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   9 +-
>> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 311 ++++++++++++--------------
>> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c     |  10 +-
>> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c       |   2 +-
>> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c       |  18 +-
>> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   3 +
>> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c       |  22 +-
>> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c     |   4 +-
>> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c      |   2 -
>> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h      |   2 -
>> >>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c         |   6 +-
>> >>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c         |   6 +-
>> >>  drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c         |  16 +-
>> >>  drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c         |   2 +-
>> >>  drivers/gpu/drm/amd/scheduler/gpu_scheduler.c |  39 ++--
>> >>  15 files changed, 220 insertions(+), 232 deletions(-)
>> >>
>> >> --
>> >> 2.7.4
>> >>
>> >> _______________________________________________
>> >> amd-gfx mailing list
>> >> amd-gfx at lists.freedesktop.org
>> >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>> >
>
>