[Mesa-dev] GLSL swizzle lowering

Mon Aug 30 19:28:34 PDT 2010

On Tue, Aug 31, 2010 at 2:14 AM, Ian Romanick <idr at freedesktop.org> wrote:

> While I was trying to get one of the Humus demos working today, it
> occurred to me that we can possibly do better than
> ir_vec_index_to_cond_assign to lower variable indexing of vectors.  In
> addition to using conditional assignment, we can also use a dot-product
> to pick a single element out of a vector.  The variable index operation
> becomes:
>
> const vec4 gl_vec_selector[4] =
>    vec4[4](vec4(1.0, 0.0, 0.0, 0.0),
>            vec4(0.0, 1.0, 0.0, 0.0),
>            vec4(0.0, 0.0, 1.0, 0.0),
>            vec4(0.0, 0.0, 0.0, 1.0));
>
> ...
>
> float f = dot(v, gl_vec_selector[i]);
>
> This potentially replaces a big pile of instructions with three:
>
>  1. Load the address register.
>  2. Do the dot-product.
>  3. Re-load the address register.
>
> This means we'd also want to add support to ir_algebraic to convert
> dot(v, vec3(0.0, 1.0, 0.0)) to v.y.
>
> The down-side is that it uses constant slots.  Architectures that lack
> the ability to do real vector indexing also tend to be starved for both
> instructions and constant slots.  R500 may be an exception here, but
> R300 and i915 are definitely in this category.  Are there cases where
> this optimization could cause a shader to not fit in hardware limits
> when it would have otherwise?
>

Neither r300 nor r500 supports the ARL opcode in fragment shaders (it's a
D3D10 feature), so indexing the selector array through the address register
is a no-go there. I suggest building the selector with SEQ instead:

bvec4 selector = equal(vec4(i), vec4(0,1,2,3));
float f = dot(v, vec4(selector));

which should end up being just SEQ followed by DP4.
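
To make the equivalence concrete, here is a minimal CPU-side sketch in C that
emulates both lowerings and a direct index for comparison. It is not Mesa
code: the vec4 struct and the names dot4, seq4, index_via_seq,
index_via_table and index_direct are hypothetical, chosen only for this
illustration, and dot4/seq4 merely mimic what DP4 and SEQ are assumed to
compute per component.

```c
#include <assert.h>

/* Hypothetical stand-in for a GLSL vec4 (not a Mesa type). */
typedef struct { float x, y, z, w; } vec4;

/* Emulates DP4: four-component dot product. */
static float dot4(vec4 a, vec4 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

/* Emulates SEQ: per component, 1.0 where a == b, otherwise 0.0. */
static vec4 seq4(vec4 a, vec4 b)
{
    vec4 r = {
        a.x == b.x ? 1.0f : 0.0f,
        a.y == b.y ? 1.0f : 0.0f,
        a.z == b.z ? 1.0f : 0.0f,
        a.w == b.w ? 1.0f : 0.0f
    };
    return r;
}

/* v[i] as SEQ followed by DP4: compare a splat of i against (0,1,2,3),
 * then dot the resulting 0/1 mask with v. No array indexing is involved. */
float index_via_seq(vec4 v, int i)
{
    vec4 splat = { (float)i, (float)i, (float)i, (float)i };
    vec4 lanes = { 0.0f, 1.0f, 2.0f, 3.0f };
    return dot4(v, seq4(splat, lanes));
}

/* v[i] as a dot product with an indexed selector table. The table lookup
 * is the step that would need the address register on real hardware.
 * The caller must keep i in [0, 3]. */
float index_via_table(vec4 v, int i)
{
    static const vec4 selector[4] = {
        { 1.0f, 0.0f, 0.0f, 0.0f },
        { 0.0f, 1.0f, 0.0f, 0.0f },
        { 0.0f, 0.0f, 1.0f, 0.0f },
        { 0.0f, 0.0f, 0.0f, 1.0f }
    };
    return dot4(v, selector[i]);
}

/* Reference: plain component indexing. The caller must keep i in [0, 3]. */
float index_direct(vec4 v, int i)
{
    const float c[4] = { v.x, v.y, v.z, v.w };
    return c[i];
}
```

In this sketch an out-of-range i makes index_via_seq return 0.0, since no
lane matches. One caveat that applies to any dot-product selection: if an
unselected component of v is Inf or NaN, multiplying it by 0.0 yields NaN, so
the result can differ from a true index in that case.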

Marek