[Intel-gfx] [PATCH] HAX drm/i915: Disable CSR (DMC) for Kabylake
Chris Wilson
chris at chris-wilson.co.uk
Thu Oct 12 21:07:16 UTC 2017
Quoting Rodrigo Vivi (2017-10-12 21:55:24)
> On Thu, Oct 12, 2017 at 07:43:04PM +0000, Chris Wilson wrote:
> > Quoting Rodrigo Vivi (2017-10-12 19:04:45)
> > > On Thu, Oct 12, 2017 at 10:18:13AM +0000, Chris Wilson wrote:
> > > > ---
> > >
> > > Why?
> >
> > Have you looked at the random but frequent mmio death on Kabylake?
> > Seems rather reminiscent of earlier DMC bugs.
>
> hm... Could you please give us an example?
> a link?
There's usually one per shard run; you can see one in the baseline CI
results for this patch:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3222/shard-kbl4/dmesg14.log
The pattern I'm looking at starts with
<3>[ 160.736276] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:56:pipe C] flip_done timed out
usually from kms_flip, but it's probably anything that cycles the
powerwell at just the right frequency, leading to a GPU hang and
<3>[ 188.832189] [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle
and then repeats until the shard is rebooted.
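For anyone wanting to grep the CI logs for this sequence, here is a rough sketch of a scanner that pairs a flip_done timeout with the next engine idle timeout. The line format is assumed from the two dmesg lines quoted above, and the names (find_hang_pattern, LINE_RE and so on) are made up for illustration; this is not part of any CI tooling.

```python
import re

# Assumed dmesg line shape, taken from the two lines quoted above:
#   "<3>[ 160.736276] message"
LINE_RE = re.compile(r"^<(\d)>\[\s*(\d+\.\d+)\]\s*(.*)$")

FLIP_TIMEOUT = "flip_done timed out"
IDLE_TIMEOUT = "Timeout waiting for engines to idle"


def find_hang_pattern(lines):
    """Return (flip_ts, idle_ts) pairs where a flip_done timeout is
    later followed by an engine idle timeout."""
    pairs = []
    pending_flip = None  # timestamp of the first unmatched flip timeout
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts = float(m.group(2))
        msg = m.group(3)
        if FLIP_TIMEOUT in msg:
            if pending_flip is None:
                pending_flip = ts
        elif IDLE_TIMEOUT in msg and pending_flip is not None:
            pairs.append((pending_flip, ts))
            pending_flip = None
    return pairs


sample = [
    "<3>[ 160.736276] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:56:pipe C] flip_done timed out",
    "<3>[ 188.832189] [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle",
]
print(find_hang_pattern(sample))
```

It only matches on the message text, so it says nothing about whether the powerwell or the DMC is actually involved; it just helps count how often the sequence shows up.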
> and why do you think that would be dmc?
That it always seems to be triggered from interaction with the display
powerwells.
> I've seen many bugs there on kbl, but mostly associated to LSPCON
> and link trainings, and gpu hangs... and the ones I looked now
> seemed that DC state was blocked.
Those DC hangs are suspected to be DMC issues, as I am sure you already know,
such as the one where it causes lost interrupts.
> But well... yeap... it is a black box right?! so the
> experiment is valid for sure.
It's a known broken blackbox that we are always waiting on for
bugfixes. Sounds familiar. :(
-Chris