[Intel-gfx] Possible i915 regression with 4.4-rc

Takashi Iwai tiwai at suse.de
Fri Dec 4 08:12:26 PST 2015


On Fri, 04 Dec 2015 17:02:52 +0100,
Daniel Vetter wrote:
> 
> On Fri, Dec 04, 2015 at 11:40:59AM +0200, Ville Syrjälä wrote:
> > On Fri, Dec 04, 2015 at 10:49:48AM +0200, Jani Nikula wrote:
> > > On Thu, 03 Dec 2015, Ville Syrjälä <ville.syrjala at linux.intel.com> wrote:
> > > > On Thu, Dec 03, 2015 at 09:00:55PM +0100, Takashi Iwai wrote:
> > > >> Hi,
> > > >> 
> > > >> I've experienced a few graphics issues recently, and I tend to believe
> > > >> that it has happened since 4.4-rc.  Namely, after some long time usage
> > > >> on my HSW laptop (two or three days), the mouse cursor vanished
> > > >> suddenly.  It kept pointing but just became invisible.  Also, after
> > > >> some S3 cycles, some glyphs on a console or on Firefox became
> > > >> invisible, too.  The windows and graphics were shown well, and X core
> > > >> fonts were still shown properly, too.  Switching to VT1 and back
> > > >> didn't change the situation.
> > > >
> > > > > I think I have a fix for this *very* annoying problem. I've been cursing
> > > > on irc for weeks about it, until I finally got off my arse and debugged
> > > > it.
> > > >
> > > > > I pushed out my cursor branch:
> > > > git://github.com/vsyrjala/linux.git disappearing_cursor_fix
> > > >
> > > > > It has lots of other junk too, but it should be just these two that fix it:
> > > > 59f65fa270fb ("drm/i915: Kill intel_crtc->cursor_bo")
> > > > 25651a198d17 ("drm/i915: Drop the broken curcor base==0 special casing")
> > > >
> > > > > Unfortunately I've managed to keep myself busy on other stuff, so didn't
> > > > send them out yet. Maybe tomorrow...
> > > 
> > > So I've hit this too, albeit very rarely, on a Haswell running Debian
> > > stable with the stock v3.16 kernel. Haven't seen it on any other
> > > machine. It's really too rare to even debug or verify a fix. Is it
> > > possible we just happened to make an old bug occur more frequently now?
> > 
> > The potential for it has definitely been there for a long time.
> 
> Oh dear, let's have fun and look at some awful history.
> 
> commit e568af1c626031925465a5caaab7cca1303d55c7
> Author: Daniel Vetter <daniel.vetter at ffwll.ch>
> Date:   Wed Mar 26 20:08:20 2014 +0100
> 
>     drm/i915: Undo gtt scratch pte unmapping again
> 
> Which essentially reverted
> 
> commit 828c79087cec61eaf4c76bb32c222fbe35ac3930
> Author: Ben Widawsky <benjamin.widawsky at intel.com>
> Date:   Wed Oct 16 09:21:30 2013 -0700
> 
>     drm/i915: Disable GGTT PTEs on GEN6+ suspend
>     
>     Once the machine gets to a certain point in the suspend process, we
>     expect the GPU to be idle. If it is not, we might corrupt memory.
>     Empirically (with an early version of this patch) we have seen this is
>     not the case. We cannot currently explain why the latent GPU writes
>     occur.
>     
>     In the technical sense, this patch is a workaround in that we have an
>     issue we can't explain, and the patch indirectly solves the issue.
>     However, it's really better than a workaround because we understand why
>     it works, and it really should be a safe thing to do in all cases.
>     
>     The noticeable effect other than the debug messages would be an increase
>     in the suspend time. I have not measure how expensive it actually is.
>     
>     I think it would be good to spend further time to root cause why we're
>     seeing these latent writes, but it shouldn't preclude preventing the
>     fallout.
>     
>     NOTE: It should be safe (and makes some sense IMO) to also keep the
>     VALID bit unset on resume when we clear_range(). I've opted not to do
>     this as properly clearing those bits at some later point would be extra
>     work.
>     
>     v2: Fix bugzilla link
>     
>     Bugzilla: http://bugs.freedesktop.org/show_bug.cgi?id=65496
>     Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=59321
>     Tested-by: Takashi Iwai <tiwai at suse.de>
>     Tested-by: Paulo Zanoni <paulo.r.zanoni at intel.com>
>     Signed-off-by: Ben Widawsky <ben at bwidawsk.net>
>     Tested-By: Todd Previte <tprevite at gmail.com>
>     Cc: stable at vger.kernel.org
>     Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch>
> 
> This was a regression in a regression right before I ragequit the entire
> bug handling deal because no one cared any more and management was all
> "why is this important".
> 
> Would be interesting if these issues magically disappear when changing that
> back again. Doesn't mean that we're any closer to figuring out what's
> corrupting what exactly here, but at least we'd have a reason to dig out
> this old sob story of mine.

Hm, but this revert was also fairly long ago, and I don't remember any
similar breakage until 4.4-rc.  Might be just (bad) luck, though.

(And no surprise, I was already in the party above!  Everyone must
 have smoked badly there.)


thanks,

Takashi

