[Intel-gfx] [PATCH 1/2] drm/i915: introduce for_each_engine(), rework for_each_ring()

Mon Jan 4 07:42:47 PST 2016

On 29/12/15 19:18, Chris Wilson wrote:
> On Tue, Dec 29, 2015 at 06:59:57PM +0000, Dave Gordon wrote:
>> In the discussions around commit 373701b1 (Jani Nikula)
>>      drm: fix potential dangling else problems in for_each_ macros
>> Daniel Vetter mooted the idea of a for_each_engine(ring, dev_priv)
>> macro that didn't use or rely on having an integer index variable.
>>
>> So here it is; and, for backwards compatibility, an updated version
>> of for_each_ring() implemented using the same internals. As an example
>> of the use of this new macro, the functions in intel_uncore.c have
>> been converted to use it in preference to for_each_ring() wherever
>> possible. A subsequent patch will convert many of the uses in other
>> source files.
>>
>> Cc: Daniel Vetter <daniel at ffwll.ch>
>> Cc: Jani Nikula <jani.nikula at intel.com>
>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>> Signed-off-by: Dave Gordon <david.s.gordon at intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h     | 32 ++++++++++++++++++++++++++++----
>>   drivers/gpu/drm/i915/intel_uncore.c |  5 ++---
>>   2 files changed, 30 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index cf7e0fc..a6457ee 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -1968,10 +1968,34 @@ static inline struct drm_i915_private *guc_to_i915(struct intel_guc *guc)
>>   	return container_of(guc, struct drm_i915_private, guc);
>>   }
>>
>> -/* Iterate over initialised rings */
>> -#define for_each_ring(ring__, dev_priv__, i__) \
>> -	for ((i__) = 0; (i__) < I915_NUM_RINGS; (i__)++) \
>> -		for_each_if ((((ring__) = &(dev_priv__)->ring[(i__)]), intel_ring_initialized((ring__))))
>> +/* Helpers for the for_each_* macros below */
>> +#define	__INIT_ENGINE_PTR(engine, dev_priv)				\
>> +	({								\
>> +	 	struct intel_engine_cs *ep = (dev_priv)->ring;		\
>> +		(engine) = --ep;					\
>> +	})
>> +#define	__NEXT_ACTIVE_ENGINE(engine, dev_priv, nr)			\
>> +	({								\
>> +		struct intel_engine_cs *end = &(dev_priv)->ring[nr];	\
>> +		struct intel_engine_cs *ep = (engine);			\
>> +		bool in_range;						\
>> +		do {							\
>> +			in_range = ++(ep) < end;			\
>> +		} while (in_range && !intel_ring_initialized(ep));	\
>> +	 	(engine) = ep;						\
>> +		in_range;						\
>> +	})
>> +
>> +/* Iterate over initialised engines */
>> +#define for_each_engine(engine, dev_priv)				\
>> +	for (__INIT_ENGINE_PTR(engine, dev_priv);			\
>> +	     __NEXT_ACTIVE_ENGINE(engine, dev_priv, I915_NUM_RINGS); )
>> +
>> +/* Backwards compatibility: iterate over initialised "rings" */
>> +#define for_each_ring(engine, dev_priv, ring_id)			\
>> +	for (__INIT_ENGINE_PTR(engine, dev_priv);			\
>> +	     __NEXT_ACTIVE_ENGINE(engine, dev_priv, I915_NUM_RINGS) &&	\
>> +	     ((ring_id) = (engine)->id, true); )
>
> No, that is hard to read and bloats the kernel for no benefit. Even more
> so compared against just

Well I agree that __NEXT_ACTIVE_ENGINE() takes a bit of thinking about, 
but at least it uses clearly named variables that tell you what they 
stand for, and can be formatted without any line breaks within 
statements, which IMHO makes it much EASIER to read than then mess of 
(parentheses), __underscores__, & random->punctuation that we had.
And breaking the details out into two helpers makes the actual end-user 
macro so much clearer; after all, the complicated innards only have to 
be checked and verified once, here on the mailing list, and then 
everyone can see what the for_each_engine() macro is doing without 
having to worry about the GCC magic inside.

> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 578f954dba1d..739569458bb7 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1967,8 +1967,8 @@ static inline struct drm_i915_private *guc_to_i915(struct intel_guc *guc)
>
>   /* Iterate over initialised rings */
>   #define for_each_ring(ring__, dev_priv__, i__) \
> -       for ((i__) = 0; (i__) < I915_NUM_RINGS; (i__)++) \
> -               for_each_if ((((ring__) = &(dev_priv__)->ring[(i__)]), intel_engine_initialized((ring__))))
> +       for ((ring__) = &(dev_priv__)->ring[0]; (ring__) < &(dev_priv__)->ring[I915_NUM_RINGS]; (ring__)++) \
> +               for_each_if (intel_engine_initialized((ring__)))
>
>   enum hdmi_force_audio {
>          HDMI_AUDIO_OFF_DVI = -2,        /* no aux data for HDMI-DVI converter */
>
> (which is a net win over the current code, presumably solely due to
> eliminating dead locals)
> -Chris

Well actually it isn't.

First off, that doesn't even compile, it leaves lots of "variable used 
uninitialised" type errors in users of the macro, because those users 
actually require the index, which is why my patches converted only those 
that didn't.

With the above (with the addition of "i__ = 0" ... "(i__)++") used as 
the definition of for_each_ring(), and also (but without the parameter 
"i__") as the definition of for_each_engine(), plus my second patch that 
converts as many as possible, we find the following:

-rw-r--r-- 30866475 Jan 4 14:55 i915-drm.ko	# baseline
-rw-r--r-- 30871447 Jan 4 14:56 i915-cw.ko	# this version
-rw-r--r-- 30877524 Jan 4 14:56 i915-dsg.ko	# my version

so it looks like this increases the size, and mine increases it even 
more. But actually, when we look at CODE size rather than file size:

    text	   data	    bss	    dec	    hex	filename
1092416   24233     512 1117161  110be9 i915-drm.ko	# baseline
1092820	  24233	    512	1117565	 110d7d	i915-cw.ko	# this version
1092244	  24233	    512	1116989	 110b3d	i915-dsg.ko	# my version

we find just the opposite. This makes the code ~400b larger, whereas my 
patch made it ~200b smaller. I think it's because GCC knows how to alias 
variables with different scopes to the same registers, so caching 
intermediate results in very-local-variables is generally a win. It's 
definitely not "kernel bloat".

.Dave.