[Intel-gfx] Regression in i915 for 4.11-rc1 - bisected to commit 69df05e11ab8
Larry Finger
Larry.Finger at lwfinger.net
Thu Mar 23 21:23:49 UTC 2017
On 03/23/2017 03:44 PM, Chris Wilson wrote:
> On Thu, Mar 23, 2017 at 01:19:43PM -0500, Larry Finger wrote:
>> Since kernel 4.11-rc1, my desktop (Plasma5/KDE) has encountered
>> intermittent hangs with the following information in the logs:
>>
>> linux-4v1g.suse kernel: [drm] GPU HANG: ecode 7:0:0xf3cffffe, in
>> plasmashell [1283], reason: Hang on render ring, action: reset
>> linux-4v1g.suse kernel: [drm] GPU hangs can indicate a bug anywhere
>> in the entire gfx stack, including userspace.
>> linux-4v1g.suse kernel: [drm] Please file a _new_ bug report on
>> bugs.freedesktop.org against DRI -> DRM/Intel
>> linux-4v1g.suse kernel: [drm] drm/i915 developers can then reassign
>> to the right component if it's not a kernel issue.
>> linux-4v1g.suse kernel: [drm] The gpu crash dump is required to
>> analyze gpu hangs, so please always attach it.
>> linux-4v1g.suse kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
>> linux-4v1g.suse kernel: drm/i915: Resetting chip after gpu hang
>>
>> This problem was added to
>> https://bugs.freedesktop.org/show_bug.cgi?id=99380, but it probably
>> is a different bug, as the OP in that report has problems with
>> kernel 4.10.x, whereas my problem did not appear until 4.11.
>
> Close. Actually that patch touches code you are not using (oa-perf and
> gvt), the real culprit was e8a9c58fcd9a ("drm/i915: Unify active context
> tracking between legacy/execlists/guc").
>
> The fix
>
> commit 5d4bac5503fcc67dd7999571e243cee49371aef7
> Author: Chris Wilson <chris at chris-wilson.co.uk>
> Date: Wed Mar 22 20:59:30 2017 +0000
>
> drm/i915: Restore marking context objects as dirty on pinning
>
> Commit e8a9c58fcd9a ("drm/i915: Unify active context tracking between
> legacy/execlists/guc") converted the legacy intel_ringbuffer submission
> to the same context pinning mechanism as execlists - that is to pin the
> context until the subsequent request is retired. Previously it used the
> vma retirement of the context object to keep itself pinned until the
> next request (after i915_vma_move_to_active()). In the conversion, I
> missed that the vma retirement was also responsible for marking the
> object as dirty. Mark the context object as dirty when pinning
> (equivalent to execlists) which ensures that if the context is swapped
> out due to mempressure or suspend/hibernation, when it is loaded back in
> it does so with the previous state (and not all zero).
>
> Fixes: e8a9c58fcd9a ("drm/i915: Unify active context tracking between legacy/execlists/guc")
> Reported-by: Dennis Gilmore <dennis at ausil.us>
> Reported-by: Mathieu Marquer <mathieu.marquer at gmail.com>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99993
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100181
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: <drm-intel-fixes at lists.freedesktop.org> # v4.11-rc1
> Link: http://patchwork.freedesktop.org/patch/msgid/20170322205930.12762-1-chris@chris-wilson.co.uk
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>
> went in this morning and so will be upstreamed ~next week.
> -Chris
Thanks. With a bug that is difficult to trigger, bisection is difficult. I am
surprised that the only step I got wrong was the last one. BTW, my reversion
failed after 20 hours. I was ready to write again when I got your fix. Good timing.
If your patch does not fix my problem, I will let you know.
Larry
More information about the Intel-gfx
mailing list