[Intel-gfx] [PATCH 1/2] drm/i915: introduce for_each_engine(), rework for_each_ring()
Dave Gordon
david.s.gordon at intel.com
Mon Jan 4 07:42:47 PST 2016
On 29/12/15 19:18, Chris Wilson wrote:
> On Tue, Dec 29, 2015 at 06:59:57PM +0000, Dave Gordon wrote:
>> In the discussions around commit 373701b1 (Jani Nikula)
>> drm: fix potential dangling else problems in for_each_ macros
>> Daniel Vetter mooted the idea of a for_each_engine(ring, dev_priv)
>> macro that didn't use or rely on having an integer index variable.
>>
>> So here it is; and, for backwards compatibility, an updated version
>> of for_each_ring() implemented using the same internals. As an example
>> of the use of this new macro, the functions in intel_uncore.c have
>> been converted to use it in preference to for_each_ring() wherever
>> possible. A subsequent patch will convert many of the uses in other
>> source files.
>>
>> Cc: Daniel Vetter <daniel at ffwll.ch>
>> Cc: Jani Nikula <jani.nikula at intel.com>
>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>> Signed-off-by: Dave Gordon <david.s.gordon at intel.com>
>> ---
>> drivers/gpu/drm/i915/i915_drv.h | 32 ++++++++++++++++++++++++++++----
>> drivers/gpu/drm/i915/intel_uncore.c | 5 ++---
>> 2 files changed, 30 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index cf7e0fc..a6457ee 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -1968,10 +1968,34 @@ static inline struct drm_i915_private *guc_to_i915(struct intel_guc *guc)
>> return container_of(guc, struct drm_i915_private, guc);
>> }
>>
>> -/* Iterate over initialised rings */
>> -#define for_each_ring(ring__, dev_priv__, i__) \
>> - for ((i__) = 0; (i__) < I915_NUM_RINGS; (i__)++) \
>> - for_each_if ((((ring__) = &(dev_priv__)->ring[(i__)]), intel_ring_initialized((ring__))))
>> +/* Helpers for the for_each_* macros below */
>> +#define __INIT_ENGINE_PTR(engine, dev_priv) \
>> + ({ \
>> + struct intel_engine_cs *ep = (dev_priv)->ring; \
>> + (engine) = --ep; \
>> + })
>> +#define __NEXT_ACTIVE_ENGINE(engine, dev_priv, nr) \
>> + ({ \
>> + struct intel_engine_cs *end = &(dev_priv)->ring[nr]; \
>> + struct intel_engine_cs *ep = (engine); \
>> + bool in_range; \
>> + do { \
>> + in_range = ++(ep) < end; \
>> + } while (in_range && !intel_ring_initialized(ep)); \
>> + (engine) = ep; \
>> + in_range; \
>> + })
>> +
>> +/* Iterate over initialised engines */
>> +#define for_each_engine(engine, dev_priv) \
>> + for (__INIT_ENGINE_PTR(engine, dev_priv); \
>> + __NEXT_ACTIVE_ENGINE(engine, dev_priv, I915_NUM_RINGS); )
>> +
>> +/* Backwards compatibility: iterate over initialised "rings" */
>> +#define for_each_ring(engine, dev_priv, ring_id) \
>> + for (__INIT_ENGINE_PTR(engine, dev_priv); \
>> + __NEXT_ACTIVE_ENGINE(engine, dev_priv, I915_NUM_RINGS) && \
>> + ((ring_id) = (engine)->id, true); )
>
> No, that is hard to read and bloats the kernel for no benefit. Even more
> so compared against just
Well I agree that __NEXT_ACTIVE_ENGINE() takes a bit of thinking about,
but at least it uses clearly named variables that tell you what they
stand for, and can be formatted without any line breaks within
statements, which IMHO makes it much EASIER to read than then mess of
(parentheses), __underscores__, & random->punctuation that we had.
And breaking the details out into two helpers makes the actual end-user
macro so much clearer; after all, the complicated innards only have to
be checked and verified once, here on the mailing list, and then
everyone can see what the for_each_engine() macro is doing without
having to worry about the GCC magic inside.
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 578f954dba1d..739569458bb7 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1967,8 +1967,8 @@ static inline struct drm_i915_private *guc_to_i915(struct intel_guc *guc)
>
> /* Iterate over initialised rings */
> #define for_each_ring(ring__, dev_priv__, i__) \
> - for ((i__) = 0; (i__) < I915_NUM_RINGS; (i__)++) \
> - for_each_if ((((ring__) = &(dev_priv__)->ring[(i__)]), intel_engine_initialized((ring__))))
> + for ((ring__) = &(dev_priv__)->ring[0]; (ring__) < &(dev_priv__)->ring[I915_NUM_RINGS]; (ring__)++) \
> + for_each_if (intel_engine_initialized((ring__)))
>
> enum hdmi_force_audio {
> HDMI_AUDIO_OFF_DVI = -2, /* no aux data for HDMI-DVI converter */
>
> (which is a net win over the current code, presumably solely due to
> eliminating dead locals)
> -Chris
Well actually it isn't.
First off, that doesn't even compile, it leaves lots of "variable used
uninitialised" type errors in users of the macro, because those users
actually require the index, which is why my patches converted only those
that didn't.
With the above (with the addition of "i__ = 0" ... "(i__)++") used as
the definition of for_each_ring(), and also (but without the parameter
"i__") as the definition of for_each_engine(), plus my second patch that
converts as many as possible, we find the following:
-rw-r--r-- 30866475 Jan 4 14:55 i915-drm.ko # baseline
-rw-r--r-- 30871447 Jan 4 14:56 i915-cw.ko # this version
-rw-r--r-- 30877524 Jan 4 14:56 i915-dsg.ko # my version
so it looks like this increases the size, and mine increases it even
more. But actually, when we look at CODE size rather than file size:
text data bss dec hex filename
1092416 24233 512 1117161 110be9 i915-drm.ko # baseline
1092820 24233 512 1117565 110d7d i915-cw.ko # this version
1092244 24233 512 1116989 110b3d i915-dsg.ko # my version
we find just the opposite. This makes the code ~400b larger, whereas my
patch made it ~200b smaller. I think it's because GCC knows how to alias
variables with different scopes to the same registers, so caching
intermediate results in very-local-variables is generally a win. It's
definitely not "kernel bloat".
.Dave.
More information about the Intel-gfx
mailing list