[Intel-gfx] BUG_ON vs WARN_ON (was: Re: [PATCH] drm/i915: Localise the fbdev console lock frobbing)

Thu Aug 14 16:18:46 CEST 2014

On Thu, Aug 14, 2014 at 01:07:06PM +0300, Jani Nikula wrote:
> On Thu, 14 Aug 2014, Daniel Vetter <daniel at ffwll.ch> wrote:
> > On Thu, Aug 14, 2014 at 8:54 AM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> >> I disagree with the conversion of the BUG_ON though, a WARN there is
> >> going to screw up unpredictably (well, a hard hang without any output
> >> is the predictable outcome). I'd like to have asserts for things that
> >> could and should be statically analyzed...
> >
> > Well I've put a zero-tolerance rule for BUG_ON into place with the
> > only exception if the kernel will die anyway in the next few lines.
> > Which means I trade in a limping (and potentially dangerous) kernel
> > for the ability to be able to read the backtrace somewhere. I agree
> > that any such extreme policy will end up looking stupid in some cases,
> > but I've just decided that I wasted too much time on chasing lockups
> > which would have been trivial to debug with a WARN_ON instead of a
> > BUG_ON.
> >
> > Until I've wasted too much time with WARN_ON instead of BUG_ON I'll
> > let it stick.  And it's supported by my patch scripts, so small chance
> > I'll miss one. Ofc I'll never change it without a notice in the commit
> > message, so people can always blame me for it.
> 
> In other words, WARN_ON is the new BUG_ON. But what's the new WARN_ON?
> Now we're conflating two things (limp home mode and crashing) into
> one. When I see WARN_ON in code, it's no longer clear to me whether this
> is a condition that we're supposed to survive or not. For example, does
> the code below a WARN_ON need to properly handle errors due to the
> condition? To me, BUG_ON is a code reading aid that sets the absolute
> precondition for the following code. Something that absolutely must be
> fixed if someone hits it, while WARN_ON can sometimes be ignored, or
> even replaced with DRM_DEBUG.

WARN_ON is what userspace hackers usually put into asserts: pre- and
postconditions, invariants and similar things that are worth checking but
are not part of the main logic. Occasionally it makes sense to have special
recovery logic behind a WARN_ON (e.g. when a refcount overflows it's better
to leak the object), but usually that's not worth it.
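
To make the refcount case concrete, here is a rough userspace sketch. The
WARN_ON below is a stand-in I made up for illustration (not the kernel's
definition), and struct obj with obj_get()/obj_put() is hypothetical too.
The idea is only that an overflowing count saturates and the object gets
leaked instead of freed early:

```c
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for WARN_ON (assumption): log, count, return the condition. */
static int warn_count;

static bool warn_on_impl(bool cond, const char *expr, const char *file, int line)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	}
	return cond;
}
#define WARN_ON(cond) warn_on_impl(!!(cond), #cond, __FILE__, __LINE__)

/* Hypothetical refcounted object. */
struct obj {
	unsigned int refcount;
};

/* Take a reference. On overflow, warn and leave the count saturated. */
static void obj_get(struct obj *o)
{
	if (WARN_ON(o->refcount == UINT_MAX))
		return;
	o->refcount++;
}

/* Drop a reference. Returns true if the caller should free the object. */
static bool obj_put(struct obj *o)
{
	if (o->refcount == UINT_MAX)
		return false;	/* saturated: leak rather than risk a use-after-free */
	if (WARN_ON(o->refcount == 0))
		return false;	/* underflow: nothing sane to do but complain */
	return --o->refcount == 0;
}
```

Here the code behind the WARN_ON really does handle the condition, which is
the exception rather than the rule.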

A WARN_ON should never be on a level with DRM_DEBUG, but perhaps a
DRM_ERROR is more appropriate if the backtrace is useless.

> If we have zero-tolerance for BUG_ON, and replace all of those with
> WARN_ON, we'll need to start being *very* selective about adding WARN_ON
> as well.

My rule of thumb for letting a BUG_ON survive: if I can see the kernel
oopsing in the diff context, it stays. I don't see why that suddenly means
we have to sprinkle fewer of them, or fewer WARN_ONs.
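
As a rough illustration of that rule, again in userspace C with made-up
stand-ins for BUG_ON and WARN_ON and a hypothetical struct ring (none of
this is actual i915 code):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace stand-ins (assumptions), not the kernel's definitions. */
static int warn_count;

static bool warn_on_impl(bool cond, const char *expr, const char *file, int line)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	}
	return cond;
}
#define WARN_ON(cond) warn_on_impl(!!(cond), #cond, __FILE__, __LINE__)
#define BUG_ON(cond) do { \
	if (cond) { \
		fprintf(stderr, "BUG: %s at %s:%d\n", #cond, __FILE__, __LINE__); \
		abort(); \
	} \
} while (0)

/* Hypothetical ring buffer bookkeeping. */
struct ring {
	int head;
	int size;
};

/*
 * The BUG_ON may stay: the very next line dereferences ring, so a NULL
 * pointer would oops right here in the diff context anyway.
 */
static int ring_head(struct ring *ring)
{
	BUG_ON(ring == NULL);
	return ring->head;
}

/*
 * WARN_ON instead: a bad argument is a driver bug, but we can log the
 * backtrace, refuse the request and keep limping along.
 */
static int ring_advance(struct ring *ring, int n)
{
	if (WARN_ON(ring->size <= 0 || n < 0 || n > ring->size))
		return -1;
	ring->head = (ring->head + n) % ring->size;
	return 0;
}
```

Neither case says anything about how many checks you may add; it only
decides which macro a given check gets.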

So maybe you need to elaborate on your concern here a bit?

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch