[Intel-xe] [RFC 4/5] drm/xe: Remove useless XE_BUG_ON.

Wed Mar 29 09:31:01 UTC 2023

On Tue, 28 Mar 2023, Michal Wajdeczko <michal.wajdeczko at intel.com> wrote:
> On 28.03.2023 22:27, Vivi, Rodrigo wrote:
>> On Tue, 2023-03-28 at 13:24 -0700, Matt Roper wrote:
>>> On Tue, Mar 28, 2023 at 12:10:20PM -0400, Rodrigo Vivi wrote:
>>>> If that becomes needed for some reason we bring it
>>>> back with some written reasoning.
>>>
>>> From a quick skim through this patch, most/all of these shouldn't be
>>> BUG_ON either.  These are assertions that we don't expect to get
>>> triggered, but if we do screw up somewhere we shouldn't be bringing
>>> down
>>> the entire machine; a WARN (and possibly an early exit) would be more
>>> appropriate for most of these.
>> 
>> yeap! I fully agree on that. I get frustrated when I hit one of these
>> BUG_ONs that should be a graceful exit with a warn without a panic...
>
> Recently there was another discussion with proposal to introduce
> XE_ASSERT as a replacement of XE_BUG_ON - is this still considered ?
>
> We likely don't want to pollute production driver with too many
> redundant BUG_ON/WARN_ON, but still want be paranoid on debug builds
> (with just WARNs and continuing until the unavoidable crash).

There are a number of related factors here. From least subjective to
most subjective:

First, the trend in kernel is to pretty much never use BUG_ON. The idea
is that you WARN_ON, and it's the userspace policy to set panic_on_warn
to oops. This includes the CI.

Second, each of the macros could use a comment describing what it does,
what it does not, what it should be used for, and what not. Currently
there is zero, neither in xe or i915. Everyone just figures it out for
themselves or cargo-cults.

Third, I think having *BUG_ON/*WARN_ON in the name of a local macro that
behaves differently from the originals is misleading. To this end I
suggested naming it ASSERT something or other to model it after C
standard library assert(3) that generates no code for NDEBUG. IMO it
implies debug build behaviour better than *BUG_ON. I think the current
*BUG_ON/*WARN_ON give a false sense of security regarding input
validation.

(I understand the need for asserts that generate no code for non-debug
builds when the asserts have a performance impact.)

Fourth, I do think the current *BUG_ONs are being used too
liberally. They're everywhere, so more is added everywhere. That's the
example being followed. Shouldn't happen so no harm in adding a check,
right? Well, I'm not so sure about that. There are 1300+ GEM_BUG_ON's
and GEM_WARN_ON's in i915. (Of which only 4 under display, but that's
probably due to the "GEM" naming as well as my opinion of them.)

BR,
Jani.

-- 
Jani Nikula, Intel Open Source Graphics Center