[Intel-xe] [PATCH 1/3] drm/xe: Use packed bitfields for xe->info feature flags
Lucas De Marchi
lucas.demarchi at intel.com
Tue Apr 11 17:01:01 UTC 2023
On Tue, Apr 11, 2023 at 08:22:59AM -0700, Matt Roper wrote:
>On Tue, Apr 11, 2023 at 10:56:37AM +0300, Jani Nikula wrote:
>> On Tue, 11 Apr 2023, Lucas De Marchi <lucas.demarchi at intel.com> wrote:
>> > On Mon, Apr 10, 2023 at 11:39:08AM -0700, Matt Roper wrote:
>> >>Replace 'bool' fields with single bits to allow the various device
>> >>feature flags to pack more tightly.
>> >
>> > Reviewed-by: Lucas De Marchi <lucas.demarchi at intel.com>
>> >
>> > but a digression:
>> >
>> > for structs like the descriptors in xe_pci.c, this justification is
>> > enough as we will be maintaining several of those as they extend to
>> > each platform. Also the access to struct members is contained in
>> > one place.
>> >
>> > However this reasoning can't be generalized to structs like xe_device,
>> > that has one allocated per device. The gain should be very minimal if
>> > any at all.
>> >
>> > 1) Each place accessing one of these fields will have more
>> > instructions generated to use the right bit although in most cases
>> > the compiler should swap a cmpb with testb since we are likely just
>> > checking if the feature is available or not.
>> >
>> > 2) It also limits the ability to pass them by address.
>> >
>> > 3) We also need to be careful when changing
>> > bool -> u8 as they don't have the same semantics in cases like
>> > `b = true; b++;`, which may not be obvious in some cases.
>> >
>> > for (1), the pro/con should be really small, (2) we should get a
>> > compiler error if we tried. But for (3)... we need to check all the
>> > fields we are converting to make sure this doesn't introduce bugs. I'm
>> > on the fence on the need for this change, but I'm ok with it. I double
>> > checked all the members in the struct and didn't find use cases that
>> > would introduce a bug, hence my r-b above.
>>
>> bloat-o-meter results with both might be interesting, for code and
>> data. Does the data saving matter if we bloat code more?
>
>bloat-o-meter indicates that the code size reduces:
>
> Total: Before=330210, After=325432, chg -1.45%
>
>Does packing these let the compiler combine multiple feature flag tests
>into a single instruction (e.g., HAS_FOO(xe) && HAS_BAR(xe))?
yes, that is possible.
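As a rough sketch of what I mean (the struct, field names and bit positions below are hypothetical stand-ins, not the real xe->info layout, and whether the compiler actually merges the two tests depends on the compiler, optimization level and the ABI's bit-field layout): when two flags live in the same byte, a check like HAS_FOO(xe) && HAS_BAR(xe) can be done as a single mask-and-compare instead of two separate byte loads and compares.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the packed flags; NOT the real xe->info. */
struct demo_info {
	uint8_t has_foo:1;
	uint8_t has_bar:1;
};

/* What the driver code would look like: two logical flag tests. */
static bool has_foo_and_bar(const struct demo_info *info)
{
	return info->has_foo && info->has_bar;
}

/*
 * Hand-written equivalent of the combined test, on an explicit raw byte
 * so the example does not depend on the compiler's bit-field layout.
 * The bit positions are assumptions made for illustration only.
 */
#define DEMO_FOO 0x1u
#define DEMO_BAR 0x2u

static bool has_foo_and_bar_masked(uint8_t raw)
{
	const uint8_t mask = DEMO_FOO | DEMO_BAR;

	/* One AND plus one compare covers both flags. */
	return (raw & mask) == mask;
}
```

With separate bool members each flag sits in its own byte, so the equivalent check needs two loads unless the compiler can prove the bytes are adjacent and merge them.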
>
>size:
>
>   text    data     bss     dec     hex filename
> 467248   66607     880  534735   828cf drivers/gpu/drm/xe/xe.ko.orig
> 458855   63394     880  523129   7fb79 drivers/gpu/drm/xe/xe.ko
but this is really surprising. 8K of text?? That doesn't match what I see here
with this patch applied, but my config has more debug enabled (and had
display enabled, which I disabled to be closer to yours):
$ ./scripts/bloat-o-meter build64/drivers/gpu/drm/xe/xe.ko.old build64/drivers/gpu/drm/xe/xe.ko
add/remove: 0/0 grow/shrink: 34/6 up/down: 320/-257 (63)
Function                                     old     new   delta
xe_info_init                                1991    2039     +48
...
xe_guc_init                                 1426    1232    -194
Total: Before=477778, After=477841, chg +0.01%

$ size build64/drivers/gpu/drm/xe/xe.ko.old build64/drivers/gpu/drm/xe/xe.ko
   text    data     bss     dec     hex filename
 775924  174199    3584  953707   e8d6b build64/drivers/gpu/drm/xe/xe.ko.old
 775896  174199    3584  953679   e8d4f build64/drivers/gpu/drm/xe/xe.ko
This is more in line with what I expected: very small changes everywhere.
Not sure why the two tools disagree wrt overall increase/decrease though
(bloat-o-meter reports +63 bytes, size reports 28 bytes less text), maybe
alignment. Which would mean the change is basically lost in the noise.
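
For reference, the semantic difference behind point (3) in my earlier mail can be sketched like this. The structs and helpers are made up for illustration and are not taken from the driver. A bool normalizes any non-zero value to true, while a 1-bit field only keeps bit 0, so a value such as 2 (which is also what `b = true; b++` computes before the store) or a masked register bit like 0x10 silently becomes 0 unless it is normalized first.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical structs for illustration only; NOT the real xe_device. */
struct info_bool { bool has_foo; };
struct info_bits { uint8_t has_foo:1; };

/* bool member: any non-zero value converts to true. */
static bool store_bool(uint32_t val)
{
	struct info_bool info = { .has_foo = false };

	info.has_foo = val;
	return info.has_foo;
}

/* 1-bit member: the value is truncated, only bit 0 of val survives. */
static bool store_bit(uint32_t val)
{
	struct info_bits info = { .has_foo = 0 };

	info.has_foo = val;
	return info.has_foo;
}

/* Normalizing with !! restores the bool-like behavior. */
static bool store_bit_normalized(uint32_t val)
{
	struct info_bits info = { .has_foo = 0 };

	info.has_foo = !!val;
	return info.has_foo;
}

/*
 * Point (2): taking the address of a bit-field member, e.g.
 * &info.has_foo on struct info_bits, is rejected by the compiler,
 * so that class of problem shows up at build time.
 */
```

So store_bool(0x10) gives true while store_bit(0x10) gives false, which is the kind of assignment that needs checking when converting a field.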
Lucas De Marchi
>
>
>Matt
>
>>
>> BR,
>> Jani.
>>
>>
>>
>> >
>> >
>> > Lucas De Marchi
>> >
>> >>
>> >>Signed-off-by: Matt Roper <matthew.d.roper at intel.com>
>> >>---
>> >> drivers/gpu/drm/xe/xe_device_types.h | 21 +++++++++++----------
>> >> 1 file changed, 11 insertions(+), 10 deletions(-)
>> >>
>> >>diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
>> >>index f5399b284e3b..9ce6e348dd29 100644
>> >>--- a/drivers/gpu/drm/xe/xe_device_types.h
>> >>+++ b/drivers/gpu/drm/xe/xe_device_types.h
>> >>@@ -67,8 +67,6 @@ struct xe_device {
>> >> u32 media_verx100;
>> >> /** @mem_region_mask: mask of valid memory regions */
>> >> u32 mem_region_mask;
>> >>- /** @is_dgfx: is discrete device */
>> >>- bool is_dgfx;
>> >> /** @platform: XE platform enum */
>> >> enum xe_platform platform;
>> >> /** @subplatform: XE subplatform enum */
>> >>@@ -87,22 +85,25 @@ struct xe_device {
>> >> u8 tile_count;
>> >> /** @vm_max_level: Max VM level */
>> >> u8 vm_max_level;
>> >>+
>> >>+ /** @is_dgfx: is discrete device */
>> >>+ u8 is_dgfx:1;
>> >> /** @supports_usm: Supports unified shared memory */
>> >>- bool supports_usm;
>> >>+ u8 supports_usm:1;
>> >> /** @has_asid: Has address space ID */
>> >>- bool has_asid;
>> >>+ u8 has_asid:1;
>> >> /** @enable_guc: GuC submission enabled */
>> >>- bool enable_guc;
>> >>+ u8 enable_guc:1;
>> >> /** @has_flat_ccs: Whether flat CCS metadata is used */
>> >>- bool has_flat_ccs;
>> >>+ u8 has_flat_ccs:1;
>> >> /** @has_4tile: Whether tile-4 tiling is supported */
>> >>- bool has_4tile;
>> >>+ u8 has_4tile:1;
>> >> /** @has_range_tlb_invalidation: Has range based TLB invalidations */
>> >>- bool has_range_tlb_invalidation;
>> >>+ u8 has_range_tlb_invalidation:1;
>> >> /** @has_link_copy_engines: Whether the platform has link copy engines */
>> >>- bool has_link_copy_engine;
>> >>+ u8 has_link_copy_engine:1;
>> >> /** @enable_display: display enabled */
>> >>- bool enable_display;
>> >>+ u8 enable_display:1;
>> >>
>> >> #if IS_ENABLED(CONFIG_DRM_XE_DISPLAY)
>> >> struct xe_device_display_info {
>> >>--
>> >>2.39.2
>> >>
>>
>> --
>> Jani Nikula, Intel Open Source Graphics Center
>
>--
>Matt Roper
>Graphics Software Engineer
>Linux GPU Platform Enablement
>Intel Corporation