[Intel-gfx] [PATCH 1/7] drm/i915: add struct i915_ctx_hang_stats

Daniel Vetter daniel at ffwll.ch
Thu Jun 13 22:59:18 CEST 2013


On Thu, Jun 13, 2013 at 11:53:13AM +0200, Daniel Vetter wrote:
> On Wed, Jun 12, 2013 at 11:44:51PM +0100, Chris Wilson wrote:
> > On Wed, Jun 12, 2013 at 02:13:26PM -0700, Ben Widawsky wrote:
> > > On Wed, Jun 12, 2013 at 12:35:28PM +0300, Mika Kuoppala wrote:
> > > > To count context losses, add struct i915_ctx_hang_stats for
> > > > both i915_hw_context and drm_i915_file_private.
> > > > drm_i915_file_private is used when there is no context.
> > > > 
> > > > v2: renamed and cleaned up the struct (Chris Wilson, Ian Romanick)
> > > > 
> > > > Signed-off-by: Mika Kuoppala <mika.kuoppala at intel.com>
> > > >
> > > I don't have time to do a proper review before Daniel wants to merge
> > > these, and Chris has already reviewed it.
> > > 
> > > 1-6 are:
> > > Acked-by: Ben Widawsky <ben at bwidawsk.net>
> > > 
> > > I don't really like the behavior of 7. At least, I'd like to make it
> > > something that can be disabled via debugfs, sysfs, or module parameter.
> > > (I'd very much prefer it to be opt-in also).  TBH , I only read it very
> > > fast, and I'm not horribly opposed to it, just a bunch of complexity for
> > > IMO little gain. Presumably the problem it's trying to solve should be
> > > fixed with a fix to ddx, mesa, libva, client, whatever.  In the embedded
> > > case, the same thing applies.  Banning the guilty doesn't make the user
> > > experience any better. So the only thing I see is DoS, but we've never
> > > *really* made that our priority anyway, so, meh.
> > 
> > Right, it is policy. But it is existing policy. Ultimately we want to
> > get as much of that decision out of the kernel.
> 
> Merged the entire series with a little note added to this patch explaining
> the justification for it. Thanks for patches and review.

I think a careful review of the locking would be good here. I see a few
cases:
- dev_priv->hangcheck stats which are only touched by the hangcheck timer
  ever. Safe since the hangcheck timer is non-reentrant, but would be good
  to add a comment to those things.
- Data shared between hangcheck and reset work. Probably needs spinlock
  protection, I'd add a new one. I know that it's rather unlikely that
  we'll race, but if we e.g. increase the hangcheck rate a lot we might
  fire the next hangcheck before the reset work has completed. Or the
  scheduler could be really annoying and not give the work cpu time for a
  while.
- Guilty context stats in the context structs. Atm this is protected by
  dev->struct_mutex I think, but I'd like to not proliferate the use of
  that lock if possible (there's _way_ too much legacy bagadge attached to
  this thing). So I'd vote for the introduction of a new mutex.
- Anyting else I've missed in my quick review.

I think a patch for each class of data explaining how it's accessed and
how it's protected exactly (both in the commit message and with a short
comment in the headers) would be really fine.

Volunteered?

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch



More information about the Intel-gfx mailing list