[Intel-gfx] [PATCH 01/12] drm/i915: Assert mutex_is_locked on context lookup

Mon May 6 11:40:06 CEST 2013

On Thu, May 02, 2013 at 01:27:32PM -0700, Jesse Barnes wrote:
> On Tue, 23 Apr 2013 23:15:29 -0700
> Ben Widawsky <ben at bwidawsk.net> wrote:
> 
> > Because our context refcounting doesn't grab a ref at lookup time, it is
> > unsafe to do so without the lock.
> > 
> > NOTE: We don't have an easy way to put the assertion in the lookup
> > function which is where this really belongs. Context switching is good
> > enough because it actually asserts even more correctness by protecting
> > the default_context.
> > 
> > Signed-off-by: Ben Widawsky <ben at bwidawsk.net>
> > ---
> >  drivers/gpu/drm/i915/i915_gem_context.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> > index a1e8ecb..411ace0 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -444,6 +444,8 @@ int i915_switch_context(struct intel_ring_buffer *ring,
> >  	if (dev_priv->hw_contexts_disabled)
> >  		return 0;
> >  
> > +	BUG_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
> > +
> >  	if (ring != &dev_priv->ring[RCS])
> >  		return 0;
> >  
> 
> Simple enough.
> 
> Reviewed-by: Jesse Barnes <jbarnes at virtuousgeek.org>
> 
> We usually do WARN_ONs for this stuff though, in case a user actually
> does hit it, it may not be fatal so why crash the machine?
> 
> But that's a minor distinction since we shouldn't hit this except in
> development anyway.

Well, since this is a patch for upstream the focus should very much be on
supporting bug reporters and not developers. And for bug reporters a BUG
is much more annoying than a WARN and greatly reduces the chances that
we'll get a bug report.

There are imo only very few cases where a BUG instead of a WARN is
justified:
- The kernel is _guaranteed_ to oops in the next few lines anyway, so a
  BUG_ON will help in readability of the backtrace. Note that checking 3
  different things for non-NULL in the same BUG actually reduces OOPS
  readability (with an oops you can at least reconstruct the faulting
  address and so probably the pointer). Also, this means the BUG should
  have a neat description of what exactly blew up.

  The "a few lines" part is just a guideline with some big exceptions. A
  prime example is refcount over/underflows since those will blow up, but
  only sometimes later (and usually no one will have a clue why).

- BUGs are justified if there's a potential security hole awaiting, e.g.
  when something in the userspace input validation has gone wrong.

- I'm wary of special error handling for WARNs, but if an early return
  (with an error code if possible) transforms a BUG into a WARN I'm in.
  But trying to fix up e.g. modeset state is usually futile, since we'll
  end up with a black/fuzzy/corrupted screen most likely anyway.

But the most important rule is: In case of doubt, just WARN, don't BUG.

[I know, I violate it sometimes, too.]

Patch applied with the s/BUG/WARN bikeshed.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch