[Intel-xe] [PATCH 1/3] drm/xe: Use packed bitfields for xe->info feature flags

Matt Roper matthew.d.roper at intel.com
Tue Apr 11 17:27:04 UTC 2023


On Tue, Apr 11, 2023 at 10:01:01AM -0700, Lucas De Marchi wrote:
> On Tue, Apr 11, 2023 at 08:22:59AM -0700, Matt Roper wrote:
> > On Tue, Apr 11, 2023 at 10:56:37AM +0300, Jani Nikula wrote:
> > > On Tue, 11 Apr 2023, Lucas De Marchi <lucas.demarchi at intel.com> wrote:
> > > > On Mon, Apr 10, 2023 at 11:39:08AM -0700, Matt Roper wrote:
> > > >>Replace 'bool' fields with single bits to allow the various device
> > > >>feature flags to pack more tightly.
> > > >
> > > > Reviewed-by: Lucas De Marchi <lucas.demarchi at intel.com>
> > > >
> > > > but a digression:
> > > >
> > > > for structs like the descriptors in xe_pci.c, this justification is
> > > > enough as we will be maintaining several of those as they extend to
> > > > each platform. Also the access to struct members is contained in
> > > > one place.
> > > >
> > > > However this reasoning can't be generalized to structs like xe_device,
> > > > which has only one instance allocated per device. The gain should be
> > > > very minimal if any at all.
> > > >
> > > > 1) Each place accessing one of these fields will have more
> > > > instructions generated to use the right bit although in most cases
> > > > the compiler should swap a cmpb with testb since we are likely just
> > > > checking if the feature is available or not.
> > > >
> > > > 2) It also limits the ability to pass them by address.
> > > >
> > > > 3) We also need to be careful when changing
> > > > bool -> u8 as they don't have the same semantics in cases like
> > > > `b = true; b++;`, which may not be obvious in some cases.
> > > >
> > > > for (1), the pro/con should be really small, (2) we should get a
> > > > compiler error if we tried. But for (3)... we need to check all the
> > > > fields we are converting to make sure this doesn't introduce bugs.  I'm
> > > > on the fence on the need for this change, but I'm ok with it.  I double
> > > > checked all the members in the struct and didn't find use cases that
> > > > would introduce a bug, hence my r-b above.
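A minimal sketch of point (3) above, in userspace C rather than kernel code. The struct and function names are hypothetical, and `u8` is a local typedef standing in for the kernel type. It assumes a compiler such as GCC or Clang that accepts `unsigned char` bit-fields. For point (2), taking `&i.flag` on the bit-field version would be rejected at compile time, so it only appears in a comment.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned char u8;	/* stand-in for the kernel's u8 */

struct info_bool { bool flag; };	/* layout before the patch */
struct info_bits { u8 flag:1; };	/* layout after the patch */

/* bool: any nonzero value converts to 1 on assignment. */
int store_bool(int v)
{
	struct info_bool i = { .flag = false };

	i.flag = v;
	return i.flag;
}

/* u8:1: only the lowest bit of the value is kept. */
int store_bit(int v)
{
	struct info_bits i = { .flag = 0 };

	i.flag = v;
	/* u8 *p = &i.flag;  would not compile: address of a bit-field */
	return i.flag;
}

/* Same effect as `b = true; b++;`: a bool stays true. */
int inc_bool(void)
{
	bool b = true;

	b = b + 1;
	return b;
}

/* The 1-bit field wraps around to 0 instead. */
int inc_bit(void)
{
	struct info_bits i = { .flag = 1 };

	i.flag = i.flag + 1;
	return i.flag;
}

/* Collects the differences in one place. */
void flag_semantics_selftest(void)
{
	assert(store_bool(2) == 1);
	assert(store_bit(2) == 0);
	assert(inc_bool() == 1);
	assert(inc_bit() == 0);
}
```

Any conversion site that assigns something other than 0 or 1, or that increments the flag, is where the two layouts would diverge.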
> > > 
> > > bloat-o-meter results with both might be interesting, for code and
> > > data. Does the data saving matter if we bloat code more?
> > 
> > bloat-o-meter indicates that the code size reduces:
> > 
> >        Total: Before=330210, After=325432, chg -1.45%
> > 
> > Does packing these let the compiler combine multiple feature flag tests
> > into a single instruction (e.g., HAS_FOO(xe) && HAS_BAR(xe))?
> 
> yes, that is possible.
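A source-level sketch of why packing makes a combined test possible. The names here (`FLAG_FOO`, `FLAG_BAR`, the structs and functions) are hypothetical and only model the `HAS_FOO(xe) && HAS_BAR(xe)` case from the question. Explicit mask bits are used instead of real bit-fields because bit-field layout is implementation-defined. Whether a given compiler actually emits a single test instruction is not shown or guaranteed by this code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unpacked layout: each flag lives in its own byte. */
struct flags_unpacked {
	bool has_foo;
	bool has_bar;
};

/* Two separate loads and two separate tests. */
bool both_unpacked(const struct flags_unpacked *f)
{
	return f->has_foo && f->has_bar;
}

/* Packed layout: both flags share one byte. */
#define FLAG_FOO 0x1u
#define FLAG_BAR 0x2u

/* One load, one mask, one compare. */
bool both_packed(uint8_t flags)
{
	const uint8_t mask = FLAG_FOO | FLAG_BAR;

	return (flags & mask) == mask;
}

/* Builds both layouts from the same inputs and checks they agree. */
bool layouts_agree(bool foo, bool bar)
{
	struct flags_unpacked u = { .has_foo = foo, .has_bar = bar };
	uint8_t p = (foo ? FLAG_FOO : 0) | (bar ? FLAG_BAR : 0);

	return both_unpacked(&u) == both_packed(p);
}
```

The two forms return the same answer for every input; the packed form simply gives the compiler the option of testing both bits at once.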
> 
> > 
> > size:
> > 
> >   text    data     bss     dec     hex filename
> > 467248   66607     880  534735   828cf drivers/gpu/drm/xe/xe.ko.orig
> > 458855   63394     880  523129   7fb79 drivers/gpu/drm/xe/xe.ko
> 
> but this is really surprising. 8K of text?? That doesn't match what I see here with
> this patch applied, but my config has more debug enabled (and had
> display enabled, which I disabled to be closer to yours):
> 
> 	$ ./scripts/bloat-o-meter build64/drivers/gpu/drm/xe/xe.ko.old build64/drivers/gpu/drm/xe/xe.ko
> 	add/remove: 0/0 grow/shrink: 34/6 up/down: 320/-257 (63)
> 	Function                                     old     new   delta
> 	xe_info_init                                1991    2039     +48
> 	...
> 	xe_guc_init                                 1426    1232    -194
> 	Total: Before=477778, After=477841, chg +0.01%
> 
> 	$ size build64/drivers/gpu/drm/xe/xe.ko.old build64/drivers/gpu/drm/xe/xe.ko
> 	   text    data     bss     dec     hex filename
> 	 775924  174199    3584  953707   e8d6b build64/drivers/gpu/drm/xe/xe.ko.old
> 	 775896  174199    3584  953679   e8d4f build64/drivers/gpu/drm/xe/xe.ko
> 
> This is more in line with what I expected: very small changes everywhere.
> Not sure why they disagree wrt overall increase/decrease though, maybe
> alignment.

CONFIG_UBSAN apparently.  Comparing a handful of .s files I see a bunch
of UBSAN stuff in different spots between before/after builds.  If I
turn off that config option and rebuild I get numbers more like yours:

    add/remove: 0/0 grow/shrink: 13/5 up/down: 181/-108 (73)
    ...
    Total: Before=300434, After=300507, chg +0.02%


Matt

> Which would mean the change is basically lost in the noise.
> 
> Lucas De Marchi
> 
> > 
> > 
> > Matt
> > 
> > > 
> > > BR,
> > > Jani.
> > > 
> > > 
> > > 
> > > >
> > > >
> > > > Lucas De Marchi
> > > >
> > > >>
> > > >>Signed-off-by: Matt Roper <matthew.d.roper at intel.com>
> > > >>---
> > > >> drivers/gpu/drm/xe/xe_device_types.h | 21 +++++++++++----------
> > > >> 1 file changed, 11 insertions(+), 10 deletions(-)
> > > >>
> > > >>diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> > > >>index f5399b284e3b..9ce6e348dd29 100644
> > > >>--- a/drivers/gpu/drm/xe/xe_device_types.h
> > > >>+++ b/drivers/gpu/drm/xe/xe_device_types.h
> > > >>@@ -67,8 +67,6 @@ struct xe_device {
> > > >> 		u32 media_verx100;
> > > >> 		/** @mem_region_mask: mask of valid memory regions */
> > > >> 		u32 mem_region_mask;
> > > >>-		/** @is_dgfx: is discrete device */
> > > >>-		bool is_dgfx;
> > > >> 		/** @platform: XE platform enum */
> > > >> 		enum xe_platform platform;
> > > >> 		/** @subplatform: XE subplatform enum */
> > > >>@@ -87,22 +85,25 @@ struct xe_device {
> > > >> 		u8 tile_count;
> > > >> 		/** @vm_max_level: Max VM level */
> > > >> 		u8 vm_max_level;
> > > >>+
> > > >>+		/** @is_dgfx: is discrete device */
> > > >>+		u8 is_dgfx:1;
> > > >> 		/** @supports_usm: Supports unified shared memory */
> > > >>-		bool supports_usm;
> > > >>+		u8 supports_usm:1;
> > > >> 		/** @has_asid: Has address space ID */
> > > >>-		bool has_asid;
> > > >>+		u8 has_asid:1;
> > > >> 		/** @enable_guc: GuC submission enabled */
> > > >>-		bool enable_guc;
> > > >>+		u8 enable_guc:1;
> > > >> 		/** @has_flat_ccs: Whether flat CCS metadata is used */
> > > >>-		bool has_flat_ccs;
> > > >>+		u8 has_flat_ccs:1;
> > > >> 		/** @has_4tile: Whether tile-4 tiling is supported */
> > > >>-		bool has_4tile;
> > > >>+		u8 has_4tile:1;
> > > >> 		/** @has_range_tlb_invalidation: Has range based TLB invalidations */
> > > >>-		bool has_range_tlb_invalidation;
> > > >>+		u8 has_range_tlb_invalidation:1;
> > > >> 		/** @has_link_copy_engines: Whether the platform has link copy engines */
> > > >>-		bool has_link_copy_engine;
> > > >>+		u8 has_link_copy_engine:1;
> > > >> 		/** @enable_display: display enabled */
> > > >>-		bool enable_display;
> > > >>+		u8 enable_display:1;
> > > >>
> > > >> #if IS_ENABLED(CONFIG_DRM_XE_DISPLAY)
> > > >> 		struct xe_device_display_info {
> > > >>--
> > > >>2.39.2
> > > >>
> > > 
> > > --
> > > Jani Nikula, Intel Open Source Graphics Center
> > 
> > -- 
> > Matt Roper
> > Graphics Software Engineer
> > Linux GPU Platform Enablement
> > Intel Corporation

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

