[Intel-gfx] [PATCH 01/12] drm: Inline drm_color_lut_extract()

Kazlauskas, Nicholas Nicholas.Kazlauskas at amd.com
Thu Nov 7 15:47:24 UTC 2019


On 2019-11-07 10:43 a.m., Ville Syrjälä wrote:
> On Thu, Nov 07, 2019 at 03:31:28PM +0000, Kazlauskas, Nicholas wrote:
>> On 2019-11-07 10:17 a.m., Ville Syrjala wrote:
>>> From: Ville Syrjälä <ville.syrjala at linux.intel.com>
>>>
>>> This thing can get called several thousand times per LUT
>>> so seems like we want to inline it to:
>>> - avoid the function call overhead
>>> - allow constant folding
>>>
>>> A quick synthetic test (w/o any hardware interaction) with
>>> a ridiculously large LUT size shows about 50% reduction in
>>> runtime on my HSW and BSW boxes. Slightly less with more
>>> reasonable LUT size but still easily measurable in tens
>>> of microseconds.
>>>
>>> Signed-off-by: Ville Syrjälä <ville.syrjala at linux.intel.com>
>>
>> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas at amd.com>
>>
>> Seems reasonable to me. It would probably make sense to even split this
>> further into two functions, one for high precision and one for low
>> precision so it's purely a calculation and not hitting any branches.
> 
> Constant folding gets rid of it.

I realized after sending that email that inlining is probably what lets
the compiler optimize the branch out and gives you that large speedup in
the first place, though branch prediction probably helped cut down on
the cost even when it wasn't inline.
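
To illustrate the constant-folding point, here is a rough userspace
sketch (not kernel code). lut_extract() mirrors the helper from the
patch, with a plain comparison standing in for clamp_val(), and the two
wrapper names are made up for illustration. Once the helper is inlined
into a caller that passes a literal bit_precision, the compiler can
typically resolve the "bit_precision < 16" test and the shift amounts at
compile time, so a hand-written split into two functions should not be
needed:

```c
#include <stdint.h>

/*
 * Userspace stand-in for drm_color_lut_extract(); same arithmetic as
 * the patch. The lower clamp bound of 0 is a no-op for an unsigned
 * value, so only the upper bound is checked here.
 */
static inline uint32_t lut_extract(uint32_t user_input, int bit_precision)
{
	uint32_t val = user_input;
	uint32_t max = 0xffff >> (16 - bit_precision);

	/* Round only if we're not using full precision. */
	if (bit_precision < 16) {
		val += 1UL << (16 - bit_precision - 1);
		val >>= 16 - bit_precision;
	}

	return val > max ? max : val;
}

/*
 * Hypothetical callers with a constant precision. After inlining, the
 * condition above is known at compile time: the 10-bit caller keeps
 * only the add/shift/clamp, the 16-bit caller keeps only the clamp.
 */
static uint32_t lut_extract_10bit(uint32_t user_input)
{
	return lut_extract(user_input, 10);
}

static uint32_t lut_extract_16bit(uint32_t user_input)
{
	return lut_extract(user_input, 16);
}
```

For example, lut_extract_10bit(0xffff) is clamped to 0x3ff, and
lut_extract_16bit(0x1234) returns 0x1234 unchanged. Whether the branch
really disappears depends on the compiler and optimization level, so it
is worth checking the generated assembly.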

This is fine as is then, thanks.

Nicholas Kazlauskas

> 
>>
>> Nicholas Kazlauskas
>>
>>> ---
>>>    drivers/gpu/drm/drm_color_mgmt.c | 24 ------------------------
>>>    include/drm/drm_color_mgmt.h     | 23 ++++++++++++++++++++++-
>>>    2 files changed, 22 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/drm_color_mgmt.c b/drivers/gpu/drm/drm_color_mgmt.c
>>> index 4ce5c6d8de99..19c5f635992a 100644
>>> --- a/drivers/gpu/drm/drm_color_mgmt.c
>>> +++ b/drivers/gpu/drm/drm_color_mgmt.c
>>> @@ -108,30 +108,6 @@
>>>     * 	standard enum values supported by the DRM plane.
>>>     */
>>>    
>>> -/**
>>> - * drm_color_lut_extract - clamp and round LUT entries
>>> - * @user_input: input value
>>> - * @bit_precision: number of bits the hw LUT supports
>>> - *
>>> - * Extract a degamma/gamma LUT value provided by user (in the form of
>>> - * &drm_color_lut entries) and round it to the precision supported by the
>>> - * hardware.
>>> - */
>>> -uint32_t drm_color_lut_extract(uint32_t user_input, uint32_t bit_precision)
>>> -{
>>> -	uint32_t val = user_input;
>>> -	uint32_t max = 0xffff >> (16 - bit_precision);
>>> -
>>> -	/* Round only if we're not using full precision. */
>>> -	if (bit_precision < 16) {
>>> -		val += 1UL << (16 - bit_precision - 1);
>>> -		val >>= 16 - bit_precision;
>>> -	}
>>> -
>>> -	return clamp_val(val, 0, max);
>>> -}
>>> -EXPORT_SYMBOL(drm_color_lut_extract);
>>> -
>>>    /**
>>>     * drm_crtc_enable_color_mgmt - enable color management properties
>>>     * @crtc: DRM CRTC
>>> diff --git a/include/drm/drm_color_mgmt.h b/include/drm/drm_color_mgmt.h
>>> index d1c662d92ab7..069b21d61871 100644
>>> --- a/include/drm/drm_color_mgmt.h
>>> +++ b/include/drm/drm_color_mgmt.h
>>> @@ -29,7 +29,28 @@
>>>    struct drm_crtc;
>>>    struct drm_plane;
>>>    
>>> -uint32_t drm_color_lut_extract(uint32_t user_input, uint32_t bit_precision);
>>> +/**
>>> + * drm_color_lut_extract - clamp and round LUT entries
>>> + * @user_input: input value
>>> + * @bit_precision: number of bits the hw LUT supports
>>> + *
>>> + * Extract a degamma/gamma LUT value provided by user (in the form of
>>> + * &drm_color_lut entries) and round it to the precision supported by the
>>> + * hardware.
>>> + */
>>> +static inline u32 drm_color_lut_extract(u32 user_input, int bit_precision)
>>> +{
>>> +	u32 val = user_input;
>>> +	u32 max = 0xffff >> (16 - bit_precision);
>>> +
>>> +	/* Round only if we're not using full precision. */
>>> +	if (bit_precision < 16) {
>>> +		val += 1UL << (16 - bit_precision - 1);
>>> +		val >>= 16 - bit_precision;
>>> +	}
>>> +
>>> +	return clamp_val(val, 0, max);
>>> +}
>>>    
>>>    void drm_crtc_enable_color_mgmt(struct drm_crtc *crtc,
>>>    				uint degamma_lut_size,
>>>
>>
> 


