[Mesa-dev] [PATCH] gallivm: optimize lp_build_minify for sse

Tue Nov 5 11:20:00 PST 2013

On 11/05/2013 11:22 AM, sroland at vmware.com wrote:
> From: Roland Scheidegger <sroland at vmware.com>
>
> SSE can't handle true vector shifts (with a per-element variable shift count),
> so LLVM turns them into a mess of extracts, scalar shifts and inserts.
> It is however possible to emulate them in lp_build_minify with float muls,
> which should be way faster (saves over 20 instructions per 8-wide
> lp_build_minify). This wouldn't work for "generic" 32-bit shifts though,
> since we've got only 24 bits of mantissa (actually for left shifts it would
> work by using SSE4.1 int mul instead of float mul, but not for right shifts).
> Note that this has very limited scope for now, since this is only used with
> per-pixel LOD (otherwise we're avoiding the non-constant shift count by doing
> per-quad shifts manually), and only 1D textures even then (though the latter
> should change).
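
For anyone reading along, here is a rough scalar sketch in C of the idea
described above: replace a right shift by a variable count with a float
multiply by 2^-level, then truncate back to int. This is not the code from
the patch. The function names (pow2_neg, minify_shift, minify_fmul) are made
up for illustration, and it assumes IEEE-754 single precision, a base size
below 2^24 (so it is exactly representable as a float) and a level well
below 127.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build the float 2^-level directly from its
 * IEEE-754 bit pattern (biased exponent 127 - level, zero mantissa).
 * Assumes level < 127 so the result stays a normal number. */
static float pow2_neg(uint32_t level)
{
    uint32_t bits = (127u - level) << 23;
    float f;
    memcpy(&f, &bits, sizeof f);   /* bitcast without aliasing problems */
    return f;
}

/* Reference version: max(base_size >> level, 1) with an integer shift. */
static uint32_t minify_shift(uint32_t base_size, uint32_t level)
{
    uint32_t size = base_size >> level;
    return size > 1u ? size : 1u;
}

/* Emulated version: multiply by 2^-level in float and truncate.
 * Only exact while base_size < 2^24, i.e. it fits the 24-bit mantissa. */
static uint32_t minify_fmul(uint32_t base_size, uint32_t level)
{
    float scaled = (float)base_size * pow2_neg(level);
    uint32_t size = (uint32_t)scaled;   /* truncation == floor for values >= 0 */
    return size > 1u ? size : 1u;
}
```

The multiply only changes the exponent, so for sizes that fit the mantissa
the truncated result matches the integer shift. In a vectorized form every
step here (integer subtract, constant shift, float mul, float-to-int convert)
maps to an operation that takes no per-element shift count, which is what
makes the trick attractive on SSE.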

LGTM.

Reviewed-by: Brian Paul <brianp at vmware.com>