[Intel-gfx] [PATCH v2] drm/i915: Avoid GPU hang when coming out of S3 or S4

Antoine, Peter peter.antoine at intel.com
Tue May 5 00:05:07 PDT 2015


On Mon, 2015-05-04 at 16:55 +0200, Daniel Vetter wrote:
> On Wed, Apr 29, 2015 at 12:39:17PM +0100, Chris Wilson wrote:
> > 
> > On Wed, Apr 29, 2015 at 02:07:19PM +0300, David Weinehall wrote:
> > > On Tue, Apr 28, 2015 at 03:46:46PM +0100, Chris Wilson wrote:
> > > > On Tue, Apr 28, 2015 at 02:38:25PM +0000, Antoine, Peter wrote:
> > > > > So is the plan to push these patches and have follow-on work to cover the other paths?
> > > > > As this fixes the Bugzilla issue that has been raised.
> > > > 
> > > > You've identified an issue, but I think your patch is incomplete.
> > > 
> > > I've tried my best to go through the remaining similar-looking code,
> > > but the rest seems fine (I might've missed something though).
> > > 
> > > The only thing I reacted on was that in intel_runtime_resume() the call
> > > to intel_init_pch_refclk() is conditional on IS_GEN6(), but none of the
> > > other invocations of intel_init_pch_refclk() are.  The commit message
> > > doesn't seem to provide a sufficient explanation for why this is so.
> > 
> > The explanation for moving init_hw() was that it setups clock_gating. If
> > any in that path are required for enabling the rings, those should be move to
> > init_render_ring() or the init_context.
> 
> Yeah we've had this fun before. init_clock_gating is _not_ for GT
> workarounds, only for modeset workarounds. We might need to rename that
> hook to avoid getting bitten for eternity, but moving init_hw because of
> clock_gating to get the rings up and running is bogus.
> 
> As Chris said we need to identify which bits need to get moved to the ring
> init or w/a bb code and do that (and in a separate patch from enabling the
> interrupts early enough like this one here does).
> -Daniel

Ok. More work is required.

But, we have two issues here. The open and accessible (to any user)
ioctl that causes the driver (and in testing the kernel) to misbehave
and cause the system to become unresponsive need or to reboot.This is
covered by a simple de-reference check that protects the driver and the
system.

Secondly, a nice to have, which is the hw-lock/context code that seems
to have more issues with it and should be turned off when it is not
required. This code has subtle connections that will need more work. The
un-niceness of this code has been known about for a while and it would
be good to turn off.

The first I would suggest is kind of important as is simply exploitable
security hole, the second is just probably full of security holes.

Can we split this into two jobs, fix the actionable security hole, and
get back to the nice to have later.

Peter.


More information about the Intel-gfx mailing list