<div class="gmail_quote">On Tue, Aug 31, 2010 at 7:20 PM, Ian Romanick <span dir="ltr"><<a href="mailto:idr@freedesktop.org">idr@freedesktop.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div class="im">
</div><div class="im">Marek Olšák wrote:<br>
> On Tue, Aug 31, 2010 at 2:14 AM, Ian Romanick <<a href="mailto:idr@freedesktop.org">idr@freedesktop.org</a>> wrote:<br>
</div><div class="im">
><br>
> While I was trying to get one of the Humus demos working today, it<br>
> occurred to me that we can possibly do better than<br>
> ir_vec_index_to_cond_assign to lower variable indexing of vectors. In<br>
> addition to using conditional assignment, we can also use a dot-product<br>
> to pick a single element out of a vector. The variable index operation<br>
> becomes:<br>
><br>
> const vec4 gl_vec_selector[4] =<br>
> vec4[4](vec4(1.0, 0.0, 0.0, 0.0),<br>
> vec4(0.0, 1.0, 0.0, 0.0),<br>
> vec4(0.0, 0.0, 1.0, 0.0),<br>
> vec4(0.0, 0.0, 0.0, 1.0));<br>
><br>
> ...<br>
><br>
> float f = dot(v, gl_vec_selector[i]);<br>
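<br>A minimal CPU-side sketch in Python (not shader code) of the dot-product selection above, only to check the arithmetic: the one-hot row chosen by i zeroes every component except v[i]. The function names are hypothetical; on the GPU, indexing the constant array with a variable i is the step that needs ARL.<br>
<pre>
```python
# CPU model of selecting v[i] with a dot product against a one-hot row.
# Names are hypothetical; this checks the arithmetic, not GPU code generation.
GL_VEC_SELECTOR = (
    (1.0, 0.0, 0.0, 0.0),
    (0.0, 1.0, 0.0, 0.0),
    (0.0, 0.0, 1.0, 0.0),
    (0.0, 0.0, 0.0, 1.0),
)


def dot4(a, b):
    """Same arithmetic as GLSL dot() / DP4 on 4-component vectors."""
    return sum(x * y for x, y in zip(a, b))


def select_component(v, i):
    """Models: float f = dot(v, gl_vec_selector[i]);"""
    # On the GPU, this variable-index table lookup is the part that needs ARL.
    return dot4(v, GL_VEC_SELECTOR[i])
```
</pre>
For example, select_component((10.0, 20.0, 30.0, 40.0), 2) returns 30.0.<br>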
<br>
</div>[snip]<br>
<div class="im"><br>
> Neither r300 nor r500 supports the ARL opcode in fragment shaders (it's<br>
<br>
</div>Meh. I always forget about that asymmetry.<br>
<div class="im"><br>
> a D3D10 feature), which kind of makes this optimization a no-go. I<br>
> suggest using SEQ instead:<br>
><br>
> bvec4 selector = equal(vec4(i), vec4(0,1,2,3));<br>
> float f = dot(v, vec4(selector));<br>
><br>
> which should end up being just SEQ followed by DP4.<br>
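<br>The SEQ-based variant can be modeled on the CPU the same way: a per-component equality test against (0, 1, 2, 3) builds the one-hot mask (standing in for SEQ), and a dot product applies it (standing in for DP4), so no constant-array lookup is needed. This is a Python sketch with hypothetical names, not what any driver emits.<br>
<pre>
```python
# CPU model of: bvec4 selector = equal(vec4(i), vec4(0,1,2,3));
#               float f = dot(v, vec4(selector));
# Names are hypothetical.
INDICES = (0.0, 1.0, 2.0, 3.0)


def seq4(a, b):
    """Per-component SEQ: 1.0 where a == b, else 0.0."""
    return tuple(1.0 if x == y else 0.0 for x, y in zip(a, b))


def dot4(a, b):
    """Same arithmetic as DP4."""
    return sum(x * y for x, y in zip(a, b))


def select_component_seq(v, i):
    selector = seq4((float(i),) * 4, INDICES)  # one-hot mask, no array indexing
    return dot4(v, selector)
```
</pre>
In this model an index outside 0..3 yields an all-zero mask, so the result is 0.0 rather than some other component.<br>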
<br>
</div>SEQ isn't part of the ARB_fragment_program / ARB_vertex_program<br>
instruction set either. Does R300 support that? I wouldn't be surprised<br>
if i915 doesn't. Of course, it doesn't support the ARL-based<br>
optimization either.<br></blockquote><div><br>R500 vertex shaders support SEQ natively.<br><br>For R300 vertex shaders, SEQ is lowered to the opcode sequence SGE, SGE, MUL, because the R300 vertex hardware doesn't support CMP.<br><br>For R300-R500 fragment shaders, SEQ is lowered to the opcode sequence ADD, CMP, because the fragment hardware doesn't support SGE.<br>
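<br>One way an ADD, CMP pair can implement SEQ, sketched as a Python model: ADD computes the difference, and CMP (which selects its second operand where the first is negative, its third otherwise) is applied to -|difference|. This assumes negate and absolute-value source modifiers are available; the sequence the r300 compiler actually emits may differ, and the function names are hypothetical.<br>
<pre>
```python
# CPU model of a hypothetical lowering of SEQ dst, a, b:
#   ADD t, a, -b
#   CMP dst, -|t|, 0.0, 1.0
def add4(a, b):
    return tuple(x + y for x, y in zip(a, b))


def cmp4(a, b, c):
    """CMP semantics: b where a is negative, else c, per component."""
    return tuple(y if x < 0.0 else z for x, y, z in zip(a, b, c))


def seq4_add_cmp(a, b):
    t = add4(a, tuple(-y for y in b))      # t = a - b
    neg_abs_t = tuple(-abs(x) for x in t)  # negative exactly where a != b
    return cmp4(neg_abs_t, (0.0,) * 4, (1.0,) * 4)
```
</pre>
With a = i.xxxx and b = {0, 1, 2, 3}, the result is the one-hot selector that DP4 then consumes.<br>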
<br>SEQ is probably the best "high-level" instruction here, but I am ok with anything other than ARL.<br><br>Marek<br><br></div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<br>
SEQ is itself lowered to a sequence of instructions:<br>
<br>
SGE t0, i.xxxx, { 0, 1, 2, 3}; # t0 = (i >= k)<br>
SGE t1, -i.xxxx, {-0, -1, -2, -3}; # t1 = (-i >= -k), i.e. (i <= k)<br>
MUL selector, t0, t1; # 1.0 only where i == k<br>
DP4 result, v, selector;<br>
<br>
So, that probably still would be better than the mess that gets<br>
generated today.<br>
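<br>A Python model of the SGE, SGE, MUL lowering, useful for checking that the product is 1.0 only in the lane where i equals the constant. For the second SGE to test i <= k, its first operand must be the negated index, -i; comparing +i against the negated constants would test i >= -k, which is always true for non-negative i. Function names are hypothetical.<br>
<pre>
```python
# CPU model of:
#   SGE t0, i.xxxx, {0, 1, 2, 3};        t0 = (i >= k)
#   SGE t1, -i.xxxx, {-0, -1, -2, -3};   t1 = (-i >= -k), i.e. (i <= k)
#   MUL selector, t0, t1;                1.0 only where i == k
#   DP4 result, v, selector;
K = (0.0, 1.0, 2.0, 3.0)


def sge4(a, b):
    """SGE: 1.0 where a >= b, else 0.0, per component."""
    return tuple(1.0 if x >= y else 0.0 for x, y in zip(a, b))


def mul4(a, b):
    return tuple(x * y for x, y in zip(a, b))


def dot4(a, b):
    return sum(x * y for x, y in zip(a, b))


def select_component_sge(v, i):
    i_xxxx = (float(i),) * 4
    t0 = sge4(i_xxxx, K)
    t1 = sge4(tuple(-x for x in i_xxxx), tuple(-k for k in K))
    selector = mul4(t0, t1)
    return dot4(v, selector)
```
</pre>
Dropping the negation on the second SGE makes t1 all ones for non-negative i, so the "selector" would keep every lane with k <= i instead of just one.<br>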
</blockquote></div><br>