[[PATCH][RESENT] 1/3] Replace i2f() in r600_blit.c with an optimized version.

Tue Aug 7 00:37:14 PDT 2012

On Mon, 2012-08-06 at 16:11 -0700, Steven Fuerst wrote: 
> We use __fls() to find the most significant bit.  Using that, the
> loop can be avoided.  A second trick is to use the mod(32)
> behaviour of the rotate instructions on x86 to expand the range
> of the unsigned int to float conversion to the full 32 bits.
> 
> The routine is now exact up to 2^24.  Above that, we truncate which
> is equivalent to rounding towards zero.
> 
> Signed-off-by: Steven Fuerst <svfuerst at gmail.com>
> ---
>  drivers/gpu/drm/radeon/r600_blit.c |   52 +++++++++++++++++++++---------------
>  1 file changed, 30 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/gpu/drm/radeon/r600_blit.c b/drivers/gpu/drm/radeon/r600_blit.c
> index 3c031a4..f0ce441 100644
> --- a/drivers/gpu/drm/radeon/r600_blit.c
> +++ b/drivers/gpu/drm/radeon/r600_blit.c
> @@ -489,29 +489,37 @@ set_default_state(drm_radeon_private_t *dev_priv)
>  	ADVANCE_RING();
>  }
>  
> -static uint32_t i2f(uint32_t input)
> +/* 23 bits of float fractional data */
> +#define I2F_FRAC_BITS	23
> +#define I2F_MASK ((1 << I2F_FRAC_BITS) - 1)
> +
> +/*
> + * Converts unsigned integer into 32-bit IEEE floating point representation.
> + * Will be exact from 0 to 2^24.  Above that, we round towards zero
> + * as the fractional bits will not fit in a float.  (It would be better to
> + * round towards even as the fpu does, but that is slower.)
> + * This routine depends on the mod(32) behaviour of the rotate instructions
> + * on x86.

The radeon driver also runs on architectures other than x86. Judging
from the description, and from ror32() in include/linux/bitops.h, this
change would break those, which is a no-go.
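
To make the concern concrete, here is a minimal sketch of the rotate
count the patch computes. It assumes the generic ror32() is built from
two plain C shifts, which is how it appears to be written in
include/linux/bitops.h. The helper names are hypothetical and not part
of the patch.

```c
#include <stdint.h>

#define I2F_FRAC_BITS 23

/*
 * Rotate count as the patch passes it to ror32().  For msb < 23 the
 * unsigned subtraction wraps around to a value far above 31.
 */
static uint32_t raw_rotate_count(uint32_t msb)
{
	return msb - I2F_FRAC_BITS;
}

/* The same count reduced mod 32, which is always in the range 0..31. */
static uint32_t masked_rotate_count(uint32_t msb)
{
	return (msb - I2F_FRAC_BITS) & 31;
}
```

For msb = 0 the raw count is 0xFFFFFFE9, while the masked count is 9.
Shifting a 32-bit value by 32 or more is undefined in C, so a
shift-based rotate only gives the intended result if the hardware
happens to reduce the count mod 32, as the x86 rotate instructions do.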

> +	/*
> +	 * Use a rotate instead of a shift because that works both leftwards
> +	 * and rightwards due to the mod(32) behaviour.  This means we don't
> +	 * need to check to see if we are above 2^24 or not.
> +	 */
> +	fraction = ror32(x, msb - I2F_FRAC_BITS) & I2F_MASK;

Seems like you could write this as

fraction = ror32(x, (msb - I2F_FRAC_BITS) & 31) & I2F_MASK;

to avoid that dependency, and then remove the mentions of relying on
the mod(32) behaviour.
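
For illustration, a rough userspace sketch of the whole conversion with
the masked rotate count. This is not the patch itself: msb_index() and
ror32_portable() are hypothetical stand-ins for the kernel's __fls() and
ror32(), and i2f_portable() is just a name chosen for this example.

```c
#include <stdint.h>

/* 23 bits of float fractional data */
#define I2F_FRAC_BITS	23
#define I2F_MASK	((1u << I2F_FRAC_BITS) - 1)

/* Index of the most significant set bit; x must be non-zero. */
static unsigned int msb_index(uint32_t x)
{
	unsigned int n = 0;

	if (x >> 16) { x >>= 16; n += 16; }
	if (x >> 8)  { x >>= 8;  n += 8; }
	if (x >> 4)  { x >>= 4;  n += 4; }
	if (x >> 2)  { x >>= 2;  n += 2; }
	if (x >> 1)  { n += 1; }
	return n;
}

/* Rotate right; both shift counts are kept within 0..31. */
static uint32_t ror32_portable(uint32_t word, unsigned int shift)
{
	shift &= 31;
	return (word >> shift) | (word << ((32 - shift) & 31));
}

/* Unsigned integer to IEEE single-precision bit pattern, rounding towards zero. */
static uint32_t i2f_portable(uint32_t x)
{
	uint32_t msb, exponent, fraction;

	/* Zero is special */
	if (!x)
		return 0;

	msb = msb_index(x);

	/* Masking the count gives a left rotate for msb < 23 and a right rotate above. */
	fraction = ror32_portable(x, (msb - I2F_FRAC_BITS) & 31) & I2F_MASK;
	exponent = (127 + msb) << I2F_FRAC_BITS;

	return fraction + exponent;
}
```

With this sketch, i2f_portable(1) gives 0x3F800000 (1.0f), and
i2f_portable(0xFFFFFFFF) gives 0x4F7FFFFF, i.e. the low bits are
truncated rather than rounded to even.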

-- 
Earthling Michel Dänzer           |                   http://www.amd.com
Libre software enthusiast         |          Debian, X and DRI developer