[Intel-gfx] Regression in i915 for 4.11-rc1 - bisected to commit 69df05e11ab8

Larry Finger Larry.Finger at lwfinger.net
Thu Mar 23 21:23:49 UTC 2017


On 03/23/2017 03:44 PM, Chris Wilson wrote:
> On Thu, Mar 23, 2017 at 01:19:43PM -0500, Larry Finger wrote:
>> Since kernel 4.11-rc1, my desktop (Plasma5/KDE) has encountered
>> intermittent hangs with the following information in the logs:
>>
>> linux-4v1g.suse kernel: [drm] GPU HANG: ecode 7:0:0xf3cffffe, in
>> plasmashell [1283], reason: Hang on render ring, action: reset
>> linux-4v1g.suse kernel: [drm] GPU hangs can indicate a bug anywhere
>> in the entire gfx stack, including userspace.
>> linux-4v1g.suse kernel: [drm] Please file a _new_ bug report on
>> bugs.freedesktop.org against DRI -> DRM/Intel
>> linux-4v1g.suse kernel: [drm] drm/i915 developers can then reassign
>> to the right component if it's not a kernel issue.
>> linux-4v1g.suse kernel: [drm] The gpu crash dump is required to
>> analyze gpu hangs, so please always attach it.
>> linux-4v1g.suse kernel: [drm] GPU crash dump saved to /sys/class/drm/card0/error
>> linux-4v1g.suse kernel: drm/i915: Resetting chip after gpu hang
>>
>> This problem was added to
>> https://bugs.freedesktop.org/show_bug.cgi?id=99380, but it probably
>> is a different bug, as the OP in that report has problems with
>> kernel 4.10.x, whereas my problem did not appear until 4.11.
>
> Close. Actually that patch touches code you are not using (oa-perf and
> gvt), the real culprit was e8a9c58fcd9a ("drm/i915: Unify active context
> tracking between legacy/execlists/guc").
>
> The fix
>
> commit 5d4bac5503fcc67dd7999571e243cee49371aef7
> Author: Chris Wilson <chris at chris-wilson.co.uk>
> Date:   Wed Mar 22 20:59:30 2017 +0000
>
>     drm/i915: Restore marking context objects as dirty on pinning
>
>     Commit e8a9c58fcd9a ("drm/i915: Unify active context tracking between
>     legacy/execlists/guc") converted the legacy intel_ringbuffer submission
>     to the same context pinning mechanism as execlists - that is to pin the
>     context until the subsequent request is retired. Previously it used the
>     vma retirement of the context object to keep itself pinned until the
>     next request (after i915_vma_move_to_active()). In the conversion, I
>     missed that the vma retirement was also responsible for marking the
>     object as dirty. Mark the context object as dirty when pinning
>     (equivalent to execlists) which ensures that if the context is swapped
>     out due to mempressure or suspend/hibernation, when it is loaded back in
>     it does so with the previous state (and not all zero).
>
>     Fixes: e8a9c58fcd9a ("drm/i915: Unify active context tracking between legacy/execlists/guc")
>     Reported-by: Dennis Gilmore <dennis at ausil.us>
>     Reported-by: Mathieu Marquer <mathieu.marquer at gmail.com>
>     Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99993
>     Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100181
>     Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>     Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>     Cc: <drm-intel-fixes at lists.freedesktop.org> # v4.11-rc1
>     Link: http://patchwork.freedesktop.org/patch/msgid/20170322205930.12762-1-chris@chris-wilson.co.uk
>     Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>
> went in this morning and so will be upstreamed ~next week.
> -Chris

Thanks. Bisecting a bug that is this hard to trigger is error-prone, so I am
surprised that the only step I got wrong was the last one. BTW, my test kernel
with 69df05e11ab8 reverted also failed, after 20 hours. I was about to write
again when your fix arrived. Good timing.

If your patch does not fix my problem, I will let you know.
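
As a rough illustration of the mechanism the commit message describes, here is a
toy model in Python. It is not i915 code: the class, its methods, and the page
size are all hypothetical names invented for this sketch. The only idea taken
from the commit message is that an object which is not marked dirty can have its
pages dropped on eviction without being written back, so it comes back as zeros.

```python
PAGE_SIZE = 4096  # illustrative size only, not taken from the driver


class ToyContextObject:
    """Hypothetical stand-in for a context object whose pages can be evicted."""

    def __init__(self):
        self.pages = bytearray(PAGE_SIZE)  # resident copy of the state
        self.saved = None                  # copy written back on eviction, if any
        self.dirty = False

    def pin(self, mark_dirty):
        # The fix described above marks the object dirty at pin time.
        if mark_dirty:
            self.dirty = True

    def hw_write(self, data):
        # Models the GPU saving context state; note that nothing here sets
        # the dirty flag, so the CPU side never learns the pages changed.
        self.pages[:len(data)] = data

    def swap_out(self):
        # Only dirty pages are written back; clean pages are simply dropped.
        if self.dirty:
            self.saved = bytes(self.pages)
        self.pages = None
        self.dirty = False

    def swap_in(self):
        # Restore the saved copy if one exists, otherwise start from zeros.
        if self.saved is not None:
            self.pages = bytearray(self.saved)
        else:
            self.pages = bytearray(PAGE_SIZE)


def state_after_eviction(mark_dirty):
    """Pin, let the 'hardware' write state, evict, reload, and read back."""
    obj = ToyContextObject()
    obj.pin(mark_dirty)
    obj.hw_write(b"\x01\x02\x03")
    obj.swap_out()
    obj.swap_in()
    return bytes(obj.pages[:3])


if __name__ == "__main__":
    print("marked dirty on pin:", state_after_eviction(True))
    print("not marked dirty:   ", state_after_eviction(False))
```

In this model the state survives eviction only when pin() marks the object
dirty, which matches the "previous state (and not all zero)" wording in the
commit message. Whether the real driver behaves this way in every detail is an
assumption of the sketch.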

Larry



