[[PATCH][RESENT] 1/3] Replace i2f() in r600_blit.c with an optimized version.
Michel Dänzer
michel at daenzer.net
Tue Aug 7 00:37:14 PDT 2012
On Mon, 2012-08-06 at 16:11 -0700, Steven Fuerst wrote:
> We use __fls() to find the most significant bit. Using that, the
> loop can be avoided. A second trick is to use the mod(32)
> behaviour of the rotate instructions on x86 to expand the range
> of the unsigned int to float conversion to the full 32 bits.
>
> The routine is now exact up to 2^24. Above that, we truncate which
> is equivalent to rounding towards zero.
>
> Signed-off-by: Steven Fuerst <svfuerst at gmail.com>
> ---
> drivers/gpu/drm/radeon/r600_blit.c | 52 +++++++++++++++++++++---------------
> 1 file changed, 30 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/r600_blit.c b/drivers/gpu/drm/radeon/r600_blit.c
> index 3c031a4..f0ce441 100644
> --- a/drivers/gpu/drm/radeon/r600_blit.c
> +++ b/drivers/gpu/drm/radeon/r600_blit.c
> @@ -489,29 +489,37 @@ set_default_state(drm_radeon_private_t *dev_priv)
> ADVANCE_RING();
> }
>
> -static uint32_t i2f(uint32_t input)
> +/* 23 bits of float fractional data */
> +#define I2F_FRAC_BITS 23
> +#define I2F_MASK ((1 << I2F_FRAC_BITS) - 1)
> +
> +/*
> + * Converts unsigned integer into 32-bit IEEE floating point representation.
> + * Will be exact from 0 to 2^24. Above that, we round towards zero
> + * as the fractional bits will not fit in a float. (It would be better to
> + * round towards even as the fpu does, but that is slower.)
> + * This routine depends on the mod(32) behaviour of the rotate instructions
> + * on x86.
The radeon driver is also used on architectures other than x86. It sounds
(and, judging by ror32() in include/linux/bitops.h, looks) like this change
will break those, which is a no-go.
> + /*
> + * Use a rotate instead of a shift because that works both leftwards
> + * and rightwards due to the mod(32) behaviour. This means we don't
> + * need to check to see if we are above 2^24 or not.
> + */
> + fraction = ror32(x, msb - I2F_FRAC_BITS) & I2F_MASK;
Seems like you could write this as
fraction = ror32(x, (msb - I2F_FRAC_BITS) & 31) & I2F_MASK;
to avoid relying on that, and remove the mentions of the mod(32)
behaviour from the comments.
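
For illustration, here is a minimal user-space sketch of the masked variant. It is not the patch itself: rotr32_sketch(), msb_index_sketch() and i2f_sketch() are hypothetical stand-ins for ror32(), __fls() and i2f(), and the rotate is written so that neither shift count can reach 32, which is an assumption about what a portable version would need.

```c
#include <stdint.h>

#define FRAC_BITS 23u                        /* fraction bits of an IEEE single */
#define FRAC_MASK ((1u << FRAC_BITS) - 1u)

/* Hypothetical portable rotate: both shift counts stay within 0..31. */
static uint32_t rotr32_sketch(uint32_t word, unsigned int shift)
{
	shift &= 31u;
	return (word >> shift) | (word << ((32u - shift) & 31u));
}

/* Stand-in for __fls(): index of the most significant set bit (x != 0). */
static unsigned int msb_index_sketch(uint32_t x)
{
	unsigned int msb = 0;

	while ((x >>= 1) != 0)
		msb++;
	return msb;
}

/* Unsigned integer to IEEE single bit pattern, truncating above 2^24. */
static uint32_t i2f_sketch(uint32_t x)
{
	unsigned int msb;
	uint32_t fraction, exponent;

	if (!x)
		return 0;	/* zero is special: no bit is set */

	msb = msb_index_sketch(x);

	/*
	 * The unsigned subtraction wraps when msb < 23; masking with 31
	 * turns that into the equivalent rotate count in 0..31, so the
	 * result no longer depends on how the CPU treats large counts.
	 */
	fraction = rotr32_sketch(x, (msb - FRAC_BITS) & 31u) & FRAC_MASK;
	exponent = (127u + msb) << FRAC_BITS;

	return fraction + exponent;
}
```

With this, i2f_sketch(1) gives 0x3f800000 (the bit pattern of 1.0f), and an input of 2^24 + 1 gives the same result as 2^24, matching the round-towards-zero behaviour described in the patch.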
--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Debian, X and DRI developer