[PATCH 0/7] *** GPU recover V3 ***

Julien Isorce julien.isorce at gmail.com
Mon Nov 13 11:24:58 UTC 2017


Hi Alex,

Thx for your reply, but in all of the cases you mentioned, would the user
still be able to reboot properly (i.e. by typing reboot or using a magic
keyboard key), or at least get a trace of a kernel panic if one happens?
Is that correct?

Thx
Julien

On 9 November 2017 at 18:08, Alex Deucher <alexdeucher at gmail.com> wrote:

> On Thu, Nov 9, 2017 at 4:35 AM, Julien Isorce <julien.isorce at gmail.com>
> wrote:
> > Hi Monk.
> >
> > I am interested in this. Currently, when a "ring X stalled for more than N
> > sec" happens, it usually goes into the gpu reset routine.
> > Does it always cause the vram to be lost? Could you explain what happens
> > if the vram remains lost?
>
> It means the contents of vram are gone or unreliable.  In that case
> applications need to re-initialize all of their buffers before
> submitting any work.  You really need to add GL_robustness support to
> any applications you care about.  Whether vram is lost or not depends
> on the reset method and the asic.  E.g., soft reset of a specific
> engine won't cause a loss of vram, but a full adapter reset or an FLR
> may.
>
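To make the "re-initialize all of their buffers" point concrete, here is a minimal sketch of the application-side pattern, under stated assumptions. The status values are the ones defined by GL_KHR_robustness for glGetGraphicsResetStatus; everything else (the Renderer class, the poll_reset_status callable, the buffer names) is hypothetical and only stands in for a real GL context and real uploads.

```python
from enum import IntEnum
from typing import Callable, Dict


class ResetStatus(IntEnum):
    """Status codes reported by glGetGraphicsResetStatus (GL_KHR_robustness)."""
    NO_ERROR = 0x0000
    GUILTY_CONTEXT_RESET = 0x8253
    INNOCENT_CONTEXT_RESET = 0x8254
    UNKNOWN_CONTEXT_RESET = 0x8255


class Renderer:
    """Hypothetical application object that owns GPU-side buffers."""

    def __init__(self, poll_reset_status: Callable[[], int]) -> None:
        # poll_reset_status stands in for a call to glGetGraphicsResetStatus
        # on a context created with reset notification enabled (assumption).
        self._poll = poll_reset_status
        self.buffers: Dict[str, bytes] = {}
        self.context_generation = 0
        self._upload_all()

    def _upload_all(self) -> None:
        # Placeholder for re-creating every buffer and texture from CPU data.
        self.buffers = {"vertices": b"\x00" * 16, "texture": b"\xff" * 16}

    def submit_frame(self) -> bool:
        """Return True if the frame was submitted, False if a reset was handled."""
        status = ResetStatus(self._poll())
        if status is not ResetStatus.NO_ERROR:
            # After a reset the vram contents cannot be trusted: drop
            # everything, start a fresh context and upload again before
            # submitting any further work.
            self.buffers.clear()
            self.context_generation += 1
            self._upload_all()
            return False
        return True
```

The sketch treats all three reset statuses the same way, which is the conservative choice when the application cannot tell whether the reset method preserved vram.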
> >
> > I am asking this because I experienced some recurrent gpu resets that are
> > marked as succeeded in the log but fail in the "resume" step.
> > I would not be interested in this if it always left the user a chance to
> > cleanly reboot the machine.
> >
> > The issue is that it can require a hard reboot, without a kernel panic
> > and with the keyboard no longer responding to magic keys.
> > Are those patches trying to address this issue?
> >
> > Note that "issue" here refers neither to the root cause of the "ring X
> > stalled" message nor to why the "resume" step fails.
>
> There were a few issues that caused problems with GPU reset.  The
> biggest was that the GPU scheduler deadlocked in certain cases so if
> you got a GPU hang, the driver locked up.  That should mostly be
> straightened out at this point.  I think there may still be some
> deadlocks in the modesetting code after a reset.  Once that is sorted,
> it will come down to fine tuning the actual reset sequences.  Full
> adapter resets are the easiest to get working reliably (and are
> already implemented in the driver), but also the most destructive.
>
> Alex
>
> >
> > Thx a lot
> > Julien
> >
> >
> > On 30 October 2017 at 04:15, Monk Liu <Monk.Liu at amd.com> wrote:
> >>
> >> *** job skipping logic in scheduler part is re-implemented  ***
> >>
> >> Monk Liu (7):
> >>   amd/scheduler:imple job skip feature(v3)
> >>   drm/amdgpu:implement new GPU recover(v3)
> >>   drm/amdgpu:cleanup in_sriov_reset and lock_reset
> >>   drm/amdgpu:cleanup ucode_init_bo
> >>   drm/amdgpu:block kms open during gpu_reset
> >>   drm/amdgpu/sriov:fix memory leak in psp_load_fw
> >>   drm/amdgpu:fix random missing of FLR NOTIFY
> >>
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   9 +-
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 311 ++++++++++++--------------
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c     |  10 +-
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c       |   2 +-
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c       |  18 +-
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   3 +
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c       |  22 +-
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c     |   4 +-
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c      |   2 -
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h      |   2 -
> >>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c         |   6 +-
> >>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c         |   6 +-
> >>  drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c         |  16 +-
> >>  drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c         |   2 +-
> >>  drivers/gpu/drm/amd/scheduler/gpu_scheduler.c |  39 ++--
> >>  15 files changed, 220 insertions(+), 232 deletions(-)
> >>
> >> --
> >> 2.7.4
> >>
> >> _______________________________________________
> >> amd-gfx mailing list
> >> amd-gfx at lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx