[Intel-xe] [RFC 4/5] drm/xe: Remove useless XE_BUG_ON.

Wed Mar 29 19:25:36 UTC 2023

On Wed, Mar 29, 2023 at 12:31:01PM +0300, Jani Nikula wrote:
> On Tue, 28 Mar 2023, Michal Wajdeczko <michal.wajdeczko at intel.com> wrote:
> > On 28.03.2023 22:27, Vivi, Rodrigo wrote:
> >> On Tue, 2023-03-28 at 13:24 -0700, Matt Roper wrote:
> >>> On Tue, Mar 28, 2023 at 12:10:20PM -0400, Rodrigo Vivi wrote:
> >>>> If that becomes needed for some reason we bring it
> >>>> back with some written reasoning.
> >>>
> >>> From a quick skim through this patch, most/all of these shouldn't be
> >>> BUG_ON either.  These are assertions that we don't expect to get
> >>> triggered, but if we do screw up somewhere we shouldn't be bringing down
> >>> the entire machine; a WARN (and possibly an early exit) would be more
> >>> appropriate for most of these.
> >> 
> >> yeap! I fully agree on that. I get frustrated when I hit one of these
> >> BUG_ONs that should be a graceful exit with a warn without a panic...
> >
> > Recently there was another discussion with proposal to introduce
> > XE_ASSERT as a replacement of XE_BUG_ON - is this still considered?
> >
> > We likely don't want to pollute production driver with too many
> > redundant BUG_ON/WARN_ON, but still want to be paranoid on debug builds
> > (with just WARNs and continuing until the unavoidable crash).
> 
> There are a number of related factors here. From least subjective to
> most subjective:
> 
> First, the trend in kernel is to pretty much never use BUG_ON. The idea
> is that you WARN_ON, and it's the userspace policy to set panic_on_warn
> to oops. This includes the CI.
> 
> Second, each of the macros could use a comment describing what it does,
> what it does not, what it should be used for, and what not. Currently
> there is zero, neither in xe nor in i915. Everyone just figures it out for
> themselves or cargo-cults.
> 
> Third, I think having *BUG_ON/*WARN_ON in the name of a local macro that
> behaves differently from the originals is misleading. To this end I
> suggested naming it ASSERT something or other to model it after C
> standard library assert(3) that generates no code for NDEBUG. IMO it
> implies debug build behaviour better than *BUG_ON. I think the current
> *BUG_ON/*WARN_ON give a false sense of security regarding input
> validation.
> 
> (I understand the need for asserts that generate no code for non-debug
> builds when the asserts have a performance impact.)

But is this a problem only for i915 and xe? How are other drivers dealing
with this?

> 
> Fourth, I do think the current *BUG_ONs are being used too
> liberally. They're everywhere, so more is added everywhere. That's the
> example being followed. Shouldn't happen so no harm in adding a check,
> right? Well, I'm not so sure about that. There are 1300+ GEM_BUG_ON's
> and GEM_WARN_ON's in i915. (Of which only 4 under display, but that's
> probably due to the "GEM" naming as well as my opinion of them.)

Should we already scrutinize all the XE_BUG_ONs and move most of them
to XE_WARN_ON? Then do the renaming, and probably create the assert?
Or the other way around?
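
To make the split discussed above concrete (an always-on warn with a
graceful exit versus an assert that generates no code outside debug
builds), here is a minimal userspace C sketch. It is only an
illustration: every SKETCH_* and sketch_* name is hypothetical, not an
existing xe, i915, or kernel macro, and the warn helper merely imitates
the "report and return the condition" shape of WARN_ON.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical counter so callers can observe how many warnings fired. */
static int sketch_warn_count;

/* Hypothetical stand-in for a WARN_ON-style helper: report, then return
 * the condition so the caller can bail out gracefully. */
static inline bool sketch_warn_on(bool cond, const char *expr,
				  const char *file, int line)
{
	if (cond) {
		sketch_warn_count++;
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	}
	return cond;
}

#define SKETCH_WARN_ON(cond) \
	sketch_warn_on(!!(cond), #cond, __FILE__, __LINE__)

/* Assumption: SKETCH_DEBUG plays the role of a debug Kconfig option.
 * Without it the assert only type-checks its argument; sizeof does not
 * evaluate the expression, so no code is generated. */
#ifdef SKETCH_DEBUG
#define SKETCH_ASSERT(cond) ((void)SKETCH_WARN_ON(!(cond)))
#else
#define SKETCH_ASSERT(cond) ((void)sizeof(!(cond)))
#endif

/* Hypothetical driver entry point showing both kinds of check. */
int sketch_submit(const int *job, int len)
{
	/* Input validation: always compiled in, warns and exits early. */
	if (SKETCH_WARN_ON(!job))
		return -EINVAL;

	/* Internal invariant: only checked when SKETCH_DEBUG is defined. */
	SKETCH_ASSERT(len >= 0);

	return job[0] + len;
}
```

Built without -DSKETCH_DEBUG, a NULL job still warns and returns
-EINVAL, while the len check costs nothing. The ASSERT name signals
that it must not be relied on for input validation.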

> 
> 
> BR,
> Jani.
> 
> 
> 
> -- 
> Jani Nikula, Intel Open Source Graphics Center