[Intel-xe] [PATCH v4 2/3] drm/xe: Introduce Xe assert macros

Michal Wajdeczko michal.wajdeczko at intel.com
Tue Sep 12 15:11:13 UTC 2023



On 12.09.2023 16:21, Rodrigo Vivi wrote:
> On Tue, Sep 12, 2023 at 03:48:01PM +0200, Michal Wajdeczko wrote:
>>
>>
>> On 12.09.2023 15:34, Jani Nikula wrote:
>>> On Tue, 12 Sep 2023, Michal Wajdeczko <michal.wajdeczko at intel.com> wrote:
>>>> On 12.09.2023 13:35, Jani Nikula wrote:
>>>>> On Tue, 12 Sep 2023, Francois Dugast <francois.dugast at intel.com> wrote:
>>>>>> From: Michal Wajdeczko <michal.wajdeczko at intel.com>
>>>>>>
>>>>>> As we are moving away from the controversial XE_BUG_ON macro,
>>>>>> relying just on WARN_ON or drm_err does not cover the cases
>>>>>> where we want to annotate functions with additional detailed
>>>>>> debug checks to assert that all prerequisites are satisfied,
>>>>>> without paying footprint or performance penalty on non-debug
>>>>>> builds, where all misuses introduced during code integration
>>>>>> were already fixed.
>>>>>>
>>>>>> Introduce family of Xe assert macros that try to follow classic
>>>>>> assert() utility and can be compiled out on non-debug builds.
>>>>>>
>>>>>> Macros are based on drm_WARN, but unlikely to origin, disallow
>>>>>> use in expressions since we will compile that code out.
>>>>>>
>>>>>> As we are operating on the xe pointers, we can print additional
>>>>>> information about the device, like tile or GT identifier, that
>>>>>> is not available from generic WARN report:
>>>>>>
>>>>>> [ ] xe 0000:00:02.0: [drm] Assertion `true == false` failed!
>>>>>>     platform: 1 subplatform: 1
>>>>>>     graphics: Xe_LP 12.00 step B0
>>>>>>     media: Xe_M 12.00 step B0
>>>>>>     display: enabled step D0
>>>>>>     tile: 0 VRAM 0 B
>>>>>>     GT: 0 type 1
>>>>>>
>>>>>> [ ] xe 0000:b3:00.0: [drm] Assertion `true == false` failed!
>>>>>>     platform: 7 subplatform: 3
>>>>>>     graphics: Xe_HPG 12.55 step A1
>>>>>>     media: Xe_HPM 12.55 step A1
>>>>>>     display: disabled step **
>>>>>>     tile: 0 VRAM 14.0 GiB
>>>>>>     GT: 0 type 1
>>>>>>
>>>>>> [ ] WARNING: CPU: 0 PID: 2687 at drivers/gpu/drm/xe/xe_device.c:281 xe_device_probe+0x374/0x520 [xe]
>>>>>> [ ] RIP: 0010:xe_device_probe+0x374/0x520 [xe]
>>>>>> [ ] Call Trace:
>>>>>> [ ]  ? __warn+0x7b/0x160
>>>>>> [ ]  ? xe_device_probe+0x374/0x520 [xe]
>>>>>> [ ]  ? report_bug+0x1c3/0x1d0
>>>>>> [ ]  ? handle_bug+0x42/0x70
>>>>>> [ ]  ? exc_invalid_op+0x14/0x70
>>>>>> [ ]  ? asm_exc_invalid_op+0x16/0x20
>>>>>> [ ]  ? xe_device_probe+0x374/0x520 [xe]
>>>>>> [ ]  ? xe_device_probe+0x374/0x520 [xe]
>>>>>> [ ]  xe_pci_probe+0x6e3/0x950 [xe]
>>>>>> [ ]  ? lockdep_hardirqs_on+0xc7/0x140
>>>>>> [ ]  pci_device_probe+0x9e/0x160
>>>>>> [ ]  really_probe+0x19d/0x400
>>>>>>
>>>>>> v2: use lowercase names
>>>>>> v3: apply xe coding style
>>>>>>
>>>>>> Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
>>>>>> Cc: Oded Gabbay <ogabbay at kernel.org>
>>>>>> Cc: Jani Nikula <jani.nikula at intel.com>
>>>>>> Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
>>>>>> Cc: Matthew Brost <matthew.brost at intel.com>
>>>>>> Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>>>>>> Cc: Matt Roper <matthew.d.roper at intel.com>
>>>>>> Reviewed-by: Lucas De Marchi <lucas.demarchi at intel.com>
>>>>>> Acked-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
>>>>>> ---
>>>>>>  drivers/gpu/drm/xe/xe_assert.h | 177 +++++++++++++++++++++++++++++++++
>>>>>>  1 file changed, 177 insertions(+)
>>>>>>  create mode 100644 drivers/gpu/drm/xe/xe_assert.h
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/xe/xe_assert.h b/drivers/gpu/drm/xe/xe_assert.h
>>>>>> new file mode 100644
>>>>>> index 000000000000..b2d3c9b82b31
>>>>>> --- /dev/null
>>>>>> +++ b/drivers/gpu/drm/xe/xe_assert.h
>>>>>> @@ -0,0 +1,177 @@
>>>>>> +/* SPDX-License-Identifier: MIT */
>>>>>> +/*
>>>>>> + * Copyright © 2023 Intel Corporation
>>>>>> + */
>>>>>> +
>>>>>> +#ifndef _XE_ASSERT_H_
>>>>>> +#define _XE_ASSERT_H_
>>>>>> +
>>>>>> +#include <linux/string_helpers.h>
>>>>>> +
>>>>>> +#include <drm/drm_print.h>
>>>>>> +
>>>>>> +#include "xe_device_types.h"
>>>>>> +#include "xe_step.h"
>>>>>> +
>>>>>> +/**
>>>>>> + * DOC: Xe ASSERTs
>>>>>> + *
>>>>>> + * While Xe driver aims to be simpler than legacy i915 driver it is still
>>>>>> + * complex enough that some changes introduced while adding new functionality
>>>>>> + * could break the existing code.
>>>>>> + *
>>>>>> + * Adding &drm_WARN or &drm_err to catch unwanted programming usage could lead
>>>>>> + * to undesired increased driver footprint and may impact production driver
>>>>>> + * performance as this additional code will be always present.
>>>>>> + *
>>>>>> + * To allow annotate functions with additional detailed debug checks to assert
>>>>>> + * that all prerequisites are satisfied, without worrying about footprint or
>>>>>> + * performance penalty on production builds where all potential misuses
>>>>>> + * introduced during code integration were already fixed, we introduce family
>>>>>> + * of Xe assert macros that try to follow classic assert() utility:
>>>>>> + *
>>>>>> + *  * &xe_assert
>>>>>> + *  * &xe_tile_assert
>>>>>> + *  * &xe_gt_assert
>>>>>> + *
>>>>>> + * These macros are implemented on top of &drm_WARN, but unlikely to the origin,
>>>>>> + * warning is triggered when provided condition is false. Additionally all above
>>>>>> + * assert macros cannot be used in expressions or as a condition, since
>>>>>> + * underlying code will be compiled out on non-debug builds.
>>>>>> + *
>>>>>> + * Note that these macros are not intended for use to cover known gaps in the
>>>>>> + * implementation; for such cases use regular &drm_WARN or &drm_err and provide
>>>>>> + * valid safe fallback.
>>>>>> + *
>>>>>> + * Also in cases where performance or footprint is not an issue, developers
>>>>>> + * should continue to use the regular &drm_WARN or &drm_err to ensure that bug
>>>>>> + * reports from production builds will contain meagningful diagnostics data.
>>>>>> + *
>>>>>> + * Below code shows how asserts could help in debug to catch unplanned use::
>>>>>> + *
>>>>>> + *	static void one_igfx(struct xe_device *xe)
>>>>>> + *	{
>>>>>> + *		xe_assert(xe, xe->info.is_dgfx == false);
>>>>>> + *		xe_assert(xe, xe->info.tile_count == 1);
>>>>>> + *	}
>>>>>> + *
>>>>>> + *	static void two_dgfx(struct xe_device *xe)
>>>>>> + *	{
>>>>>> + *		xe_assert(xe, xe->info.is_dgfx);
>>>>>> + *		xe_assert(xe, xe->info.tile_count == 2);
>>>>>> + *	}
>>>>>> + *
>>>>>> + *	void foo(struct xe_device *xe)
>>>>>> + *	{
>>>>>> + *		if (xe->info.dgfx)
>>>>>> + *			return two_dgfx(xe);
>>>>>> + *		return one_igfx(xe);
>>>>>> + *	}
>>>>>> + *
>>>>>> + *	void bar(struct xe_device *xe)
>>>>>> + *	{
>>>>>> + *		if (drm_WARN_ON(xe->drm, xe->info.tile_count > 2))
>>>>>> + *			return;
>>>>>> + *
>>>>>> + *		if (xe->info.tile_count == 2)
>>>>>> + *			return two_dgfx(xe);
>>>>>> + *		return one_igfx(xe);
>>>>>> + *	}
>>>>>> + */
>>>>>> +
>>>>>> +#if IS_ENABLED(CONFIG_DRM_XE_DEBUG)
>>>>>> +#define __xe_assert_msg(xe, condition, msg, arg...) ({						\
>>>>>> +	(void)drm_WARN(&(xe)->drm, !(condition), "[" DRM_NAME "] Assertion `%s` failed!\n" msg,	\
>>>>>> +		       __stringify(condition), ## arg);						\
>>>>>> +})
>>>>>> +#else
>>>>>> +#define __xe_assert_msg(xe, condition, msg, arg...) ({						\
>>>>>> +	typecheck(struct xe_device *, xe);							\
>>>>>> +	BUILD_BUG_ON_INVALID(condition);							\
>>>>>> +})
>>>>>> +#endif
>>>>>> +
>>>>>> +/**
>>>>>> + * xe_assert - warn if condition is false when debugging.
>>>>>> + * @xe: the &struct xe_device pointer to which &condition applies
>>>>>> + * @condition: condition to check
>>>>>> + *
>>>>>> + * xe_assert() uses &drm_WARN to emit a warning and print additional information
>>>>>> + * that could be read from the &xe pointer if provided &condition is false.
>>>>>> + *
>>>>>> + * Contrary to &drm_WARN, xe_assert() is effective only on debug builds
>>>>>> + * (&CONFIG_DRM_XE_DEBUG must be enabled) and cannot be used in expressions
>>>>>> + * or as a condition.
>>>>>> + *
>>>>>> + * See `Xe ASSERTs`_ for general usage guidelines.
>>>>>> + */
>>>>>> +#define xe_assert(xe, condition) xe_assert_msg((xe), condition, "")
>>>>>> +#define xe_assert_msg(xe, condition, msg, arg...) ({						\
>>>>>> +	struct xe_device *__xe = (xe);								\
>>>>>> +	__xe_assert_msg(__xe, condition,							\
>>>>>> +			"platform: %d subplatform: %d\n"					\
>>>>>> +			"graphics: %s %u.%02u step %s\n"					\
>>>>>> +			"media: %s %u.%02u step %s\n"						\
>>>>>> +			"display: %s step %s\n"							\
>>>>>> +			msg,									\
>>>>>> +			__xe->info.platform, __xe->info.subplatform,				\
>>>>>> +			__xe->info.graphics_name,						\
>>>>>> +			__xe->info.graphics_verx100 / 100,					\
>>>>>> +			__xe->info.graphics_verx100 % 100,					\
>>>>>> +			xe_step_name(__xe->info.step.graphics),					\
>>>>>> +			__xe->info.media_name,							\
>>>>>> +			__xe->info.media_verx100 / 100,						\
>>>>>> +			__xe->info.media_verx100 % 100,						\
>>>>>> +			xe_step_name(__xe->info.step.media),					\
>>>>>> +			str_enabled_disabled(__xe->info.enable_display),			\
>>>>>> +			xe_step_name(__xe->info.step.display),					\
>>>>>> +			## arg);								\
>>>>>> +})
>>>>>
>>>>> I guess I have missed this huge splat all along... Why is it necessary?
>>>>> If you print the device id, all the information should be there already,
>>>>> right?
>>>>
>>>> somewhere in the dmesg (if someone/CI was clever enough) maybe yes
>>>>
>>>> but in bug reports usually only the WARN is included, so exposing some
>>>> basic info here for quicker triage
>>>
>>> I just think it's unnecessary duplication. 
>>
>> but what if assert() fires before driver print welcome messages ;)
>>
>> IMO we should try to collect whatever is possible and comes almost at no
>> extra cost (you don't need to type anything, macro takes care of that)
>>
>>> Most likely this will only be
>>> enabled in CI only anyway.
>>
>> I assume asserts will fire mostly during pre-merge testing (either on CI
>> or on tested manually on DUT machines)
> 
> I'm with Jani here. In all of these cases you already know the PCI ID of
> that specific machine, so all this extra information should already be known.
> 
> The only benefit would be to get user reports when we get just that debug
> msg. But also asking for a lspci -nn | grep VGA was never that complicated.

but lspci will not show you the GMDID, so how you will know IP version?
likely same problem with VRAM size as you will see just BAR size, no?
and in the near future how to distinguish between native and PF mode?

> 
> My recommendation is to remove the duplication of this information here.

honestly I don't understand why are we aiming to have totally silent
driver that makes any debug just harder and clueless - maybe it's time
to help early adopters and feature enablers by providing more, even if
duplicated, data which is still better than nothing - like in case when
the driver decides to fail the probe ... you may just start guessing why

> But this is not a blocker in my opinion and we could go with this patch as is.
> 
> Jani, do you see this as a blocker and a new version needed?
> 
> maybe a follow-up later so we don't block this entire series?
> 
> Thanks,
> Rodrigo.
> 
>>
>>>
>>>>
>>>>>
>>>>> This also makes it impossible to use xe_assert() with a NULL xe device
>>>>> pointer in contexts where you don't have the device available.
>>>>
>>>> there was no such requirement (but we can add that if needed)
>>>>
>>>> note that code is based on drm_WARN() which also doesn't work with NULL
>>>
>>> Ah, true. The drm_dbg and friends do.
>>
>> true, so do you want xe_assert() and family to also accept NULL ?
>>
>>>
>>>>
>>>> Michal
>>>>
>>>>>
>>>>> BR,
>>>>> Jani.
>>>>>
>>>>>
>>>>>> +
>>>>>> +/**
>>>>>> + * xe_tile_assert - warn if condition is false when debugging.
>>>>>> + * @tile: the &struct xe_tile pointer to which &condition applies
>>>>>> + * @condition: condition to check
>>>>>> + *
>>>>>> + * xe_tile_assert() uses &drm_WARN to emit a warning and print additional
>>>>>> + * information that could be read from the &tile pointer if provided &condition
>>>>>> + * is false.
>>>>>> + *
>>>>>> + * Contrary to &drm_WARN, xe_tile_assert() is effective only on debug builds
>>>>>> + * (&CONFIG_DRM_XE_DEBUG must be enabled) and cannot be used in expressions
>>>>>> + * or as a condition.
>>>>>> + *
>>>>>> + * See `Xe ASSERTs`_ for general usage guidelines.
>>>>>> + */
>>>>>> +#define xe_tile_assert(tile, condition) xe_tile_assert_msg((tile), condition, "")
>>>>>> +#define xe_tile_assert_msg(tile, condition, msg, arg...) ({					\
>>>>>> +	struct xe_tile *__tile = (tile);							\
>>>>>> +	char __buf[10];										\
>>>>>> +	xe_assert_msg(tile_to_xe(__tile), condition, "tile: %u VRAM %s\n" msg,			\
>>>>>> +		      __tile->id, ({ string_get_size(__tile->mem.vram.actual_physical_size, 1,	\
>>>>>> +				     STRING_UNITS_2, __buf, sizeof(__buf)); __buf; }), ## arg);	\
>>>>>> +})
>>>>>> +
>>>>>> +/**
>>>>>> + * xe_gt_assert - warn if condition is false when debugging.
>>>>>> + * @gt: the &struct xe_gt pointer to which &condition applies
>>>>>> + * @condition: condition to check
>>>>>> + *
>>>>>> + * xe_gt_assert() uses &drm_WARN to emit a warning and print additional
>>>>>> + * information that could be safetely read from the &gt pointer if provided
>>>>>> + * &condition is false.
>>>>>> + *
>>>>>> + * Contrary to &drm_WARN, xe_gt_assert() is effective only on debug builds
>>>>>> + * (&CONFIG_DRM_XE_DEBUG must be enabled) and cannot be used in expressions
>>>>>> + * or as a condition.
>>>>>> + *
>>>>>> + * See `Xe ASSERTs`_ for general usage guidelines.
>>>>>> + */
>>>>>> +#define xe_gt_assert(gt, condition) xe_gt_assert_msg((gt), condition, "")
>>>>>> +#define xe_gt_assert_msg(gt, condition, msg, arg...) ({						\
>>>>>> +	struct xe_gt *__gt = (gt);								\
>>>>>> +	xe_tile_assert_msg(gt_to_tile(__gt), condition, "GT: %u type %d\n" msg,			\
>>>>>> +			   __gt->info.id, __gt->info.type, ## arg);				\
>>>>>> +})
>>>>>> +
>>>>>> +#endif
>>>>>
>>>


More information about the Intel-xe mailing list