[Intel-xe] [PATCH 0/1] Remove uses of BUG_ON

Rodrigo Vivi rodrigo.vivi at intel.com
Mon Jul 24 18:24:34 UTC 2023


On Fri, Jul 21, 2023 at 03:50:42PM +0000, Matthew Brost wrote:
> On Fri, Jul 21, 2023 at 01:20:29PM +0000, Francois Dugast wrote:
> > This is a first pass on removing BUG_ON. It is replaced with a call to
> > drm_err() and a return. Feedback on this is welcome before removing
> > remaining uses of BUG_ON, which will require more manual and specific
> > work.
> > 
> 
> Personally I don't like this at all, agree the BUG_ON should be removed
> but IMO we just replace these with a XE_WARN_ON that gets compiled out
> of non-debug builds.

I disagree here. The BUG_ON cases are in general cases that were marked
as possibly catastrophic, hence the execution should be stopped. But
instead, we need to avoid the code after and exit gracefully with some
error message.

With this in mind I like the initial proposal of this patch here.
I just believe we need to go and change message by message.

>
I like WARN/BUG as they act more like a self
> documenting assert of the codes usage plus if they get trigger we get

The if conditions here are pretty clear in the code as well.
no reason to be some macro in capital letters.

> the call stack which usuaully help diagnosis the issue and disappear in
> non-debug builds.

well, the bug cases should be the catastrophic cases, so in general
I don't see why we should avoid in production mode the cases where
we simply print error and return gracefully and avoiding the
catastrophic scenarios,.

For warns, we should use that in cases where the information of the
stack of the callers is a helpful information for debugging or
triage. And this is usually helpful on production cases as well
to receive bug reports with helpful information.

Also, in general we shouldn't need to disable any kind of message
in production because of the bug report case mentioned above. What
we need to do instead is to carefully mark the debug level for the
drm debug to take care of the messages. So a bug reporter could
enable the right level of debug and collect all the useful info
for us.

That said, I need to say that I hate that we have XE_ variants of
BUG_ON() and WARN_ON(). We need to get rid of this things and use
only the drm and kernel standards.

> 
> Let's see if the team agrees.
> 
> Matt
> 
> > For this pass, most of the changes were automated with coccinelle using:
> > 
> > 	@notpossible@
> > 	@@
> > 
> > 	- XE_BUG_ON("NOT POSSIBLE");
> > 	+ drm_err(&vm->xe->drm, "NOT POSSIBLE");
> > 	+ return -EINVAL;
> > 
> > 	@e@
> > 	identifier macro =~ "^XE_BUG_ON$";
> > 	expression cond;
> > 	@@
> > 	macro(cond)
> > 
> > 	@script : python q@
> > 	cond << e.cond;
> > 	cond_expr;
> > 	@@
> > 	coccinelle.cond_expr = cocci.make_expr("\""+cond.replace(" ", "")+"\"");
> > 
> > 	@replace_in_func_return_struct@
> > 	identifier e.macro;
> > 	expression e.cond;
> > 	expression q.cond_expr;
> > 	identifier func;
> > 	identifier a;
> > 	@@
> > 
> > 	struct a *func(...) {
> > 	...
> > 	- macro(cond);
> > 	+ if (cond) {
> > 	+ drm_err(&xe->drm, cond_expr);
> > 	+ return NULL;
> > 	+ }
> > 	...
> > 	}
> > 
> > 	@replace_in_func_return_void@
> > 	identifier e.macro;
> > 	expression e.cond;
> > 	expression q.cond_expr;
> > 	identifier func;
> > 	@@
> > 
> > 	void func(...) {
> > 	...
> > 	- macro(cond);
> > 	+ if (cond) {
> > 	+ drm_err(&xe->drm, cond_expr);
> > 	+ return;
> > 	+ }
> > 	...
> > 	}
> > 
> > 	@replace_in_func_return_other@
> > 	identifier e.macro;
> > 	expression e.cond;
> > 	expression q.cond_expr;
> > 	@@
> > 
> > 	- macro(cond);
> > 	+ if (cond) {
> > 	+ drm_err(&xe->drm, cond_expr);
> > 	+ return -EINVAL;
> > 	+ }
> > 
> > Francois Dugast (1):
> >   drm/xe: Remove uses of BUG_ON
> > 
> >  drivers/gpu/drm/xe/xe_bo.c                  | 105 ++++++++++++----
> >  drivers/gpu/drm/xe/xe_bo_evict.c            |  10 +-
> >  drivers/gpu/drm/xe/xe_execlist.c            |  22 +++-
> >  drivers/gpu/drm/xe/xe_force_wake.c          |  10 +-
> >  drivers/gpu/drm/xe/xe_gt_clock.c            |   5 +-
> >  drivers/gpu/drm/xe/xe_gt_debugfs.c          |   5 +-
> >  drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c |  31 ++++-
> >  drivers/gpu/drm/xe/xe_guc.c                 |  32 +++--
> >  drivers/gpu/drm/xe/xe_guc_ads.c             |  35 ++++--
> >  drivers/gpu/drm/xe/xe_guc_hwconfig.c        |   5 +-
> >  drivers/gpu/drm/xe/xe_guc_submit.c          |  95 +++++++++++----
> >  drivers/gpu/drm/xe/xe_huc.c                 |   5 +-
> >  drivers/gpu/drm/xe/xe_migrate.c             |  61 ++++++++--
> >  drivers/gpu/drm/xe/xe_sched_job.c           |   9 +-
> >  drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c      |  10 +-
> >  drivers/gpu/drm/xe/xe_vm.c                  | 125 +++++++++++++++-----
> >  drivers/gpu/drm/xe/xe_wopcm.c               |  45 +++++--
> >  17 files changed, 482 insertions(+), 128 deletions(-)
> > 
> > -- 
> > 2.34.1
> > 


More information about the Intel-xe mailing list