[Mesa-dev] a newbie asking newbie questions

Tue Sep 17 10:35:44 PDT 2013

"Rogovin, Kevin" <kevin.rogovin at intel.com> writes:

> Hello,
>
>  Thank you for the very fast answers, some more questions:
>
>
>> It's not a preference question.  The registers are 8 floats wide.
>> Vertex shaders get invoked 2 vertices at a time, with a register
>> containing these values:
>>
>> .   +------+------+------+------+------+------+------+------+
>> .   | v0.x | v0.y | v0.z | v0.w | v1.x | v1.y | v1.z | v1.w |
>> .   +------+------+------+------+------+------+------+------+
>
> This seems best to me: run two vertices in each invocation with the
> hopes that the shader compiler will merge (multiple) float, vec2 and
> maybe even vec3 operations into vec4 operations (does it)?

This is the worst, actually, since you're wasting channels that could
have got some work done if you're dealing with things smaller than vec4
(and you do that a lot).  That is, unless running fewer shader instances
at a time happens to prevent register spilling.  But it's the hardware's
choice, not ours.
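
To put rough numbers on the wasted channels, here is a minimal sketch. It is an illustrative model only, not Mesa code, and it assumes the compiler does not merge narrower operations into wider ones. The names REGISTER_WIDTH, channels_used and utilization are made up for this example.

```python
# Illustrative model only, not Mesa code.
REGISTER_WIDTH = 8            # floats per register, as stated in the thread
VERTICES_PER_INVOCATION = 2   # each vertex owns 4 adjacent channels


def channels_used(components):
    """Channels doing useful work for one op on `components`-wide values."""
    if not 1 <= components <= 4:
        raise ValueError("components must be between 1 and 4")
    return components * VERTICES_PER_INVOCATION


def utilization(components):
    """Fraction of the register that carries useful data for that op."""
    return channels_used(components) / REGISTER_WIDTH


if __name__ == "__main__":
    for n, name in enumerate(["float", "vec2", "vec3", "vec4"], start=1):
        print(f"{name}: {channels_used(n)} of {REGISTER_WIDTH} channels busy")
```

Under that assumption only a full vec4 operation keeps all 8 channels busy; a plain float operation uses 2 of them.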

>> while these 8 pixels in screen space:
>> 
>> .   +----+----+----+----+
>> .   | p0 | p1 | p2 | p3 |
>> .   +----+----+----+----+
>> .   | p4 | p5 | p6 | p7 |
>> .   +----+----+----+----+
>>
>> are loaded in fragment shader registers as:
>>
>> .   +------+------+------+------+------+------+------+------+
>> .   | p0.x | p1.x | p4.x | p5.x | p2.x | p3.x | p6.x | p7.x |
>> .   +------+------+------+------+------+------+------+------+
>>
>> Note how one register just holds a single channel ('.x' here) of a
>> vector.  A vec4 would take up 4 registers, and to do value0.xyzw *
>> value1.xyzw, you'd emit 4 MULs.
>
> This is exactly what I was trying to ask/say about the fragment shader
> running, i.e. n-fragments are processed with 1 n-SIMD command (for
> i965, n=8), sighs my e-mail communications leave something to be
> desired.  Some questions: 1) do the fragments need to be in a 4x2
> block, or can it be two separate 2x2 blocks?  2) for tiny triangles
> for fragment shaders that do not require dFdx, dFdy or fwidth, can the
> fragments be totally scattered?

This is all the hardware's choice, not ours.  And of course, any normal
texturing at all requires the implicit calculation of derivatives to
determine LOD, so it's always 2x2 subspans.
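
As a rough sketch of the fragment layout described in this thread (a toy model, not Mesa code): the channel order p0, p1, p4, p5, p2, p3, p6, p7 puts one 2x2 subspan in each half of the register, a vec4 multiply costs one 8-wide MUL per component, and a horizontal derivative can be formed from neighbours inside a subspan. The dfdx below is a plain finite difference chosen for illustration; I am not claiming it is what the hardware computes. All names here are made up for the example.

```python
# Illustrative model only, not Mesa code.
# Screen-space position (column, row) of p0..p7 in the 4x2 block.
PIXEL_POS = {f"p{i}": (i % 4, i // 4) for i in range(8)}

# Channel order within one fragment shader register, as given in the thread.
CHANNEL_ORDER = ["p0", "p1", "p4", "p5", "p2", "p3", "p6", "p7"]


def subspans():
    """Split the 8 channels into two groups of 4 consecutive channels."""
    return [CHANNEL_ORDER[0:4], CHANNEL_ORDER[4:8]]


def is_2x2(block):
    """True if the 4 pixels form a 2x2 square of adjacent pixels."""
    xs = sorted({PIXEL_POS[p][0] for p in block})
    ys = sorted({PIXEL_POS[p][1] for p in block})
    return (len(block) == 4 and len(xs) == 2 and len(ys) == 2
            and xs[1] - xs[0] == 1 and ys[1] - ys[0] == 1)


def mul_vec4(a, b):
    """Multiply two vec4 values stored one component per register.

    a, b: dict mapping 'x', 'y', 'z', 'w' to a list of 8 floats.
    Returns the product and the number of 8-wide MULs it took.
    """
    result = {}
    mul_count = 0
    for comp in "xyzw":
        result[comp] = [u * v for u, v in zip(a[comp], b[comp])]
        mul_count += 1  # one SIMD8 MUL per component
    return result, mul_count


def dfdx(register):
    """Finite-difference d/dx per subspan row (an assumption, see above)."""
    out = []
    for base in (0, 4):
        # Within a subspan: top-left, top-right, bottom-left, bottom-right.
        tl, tr, bl, br = register[base:base + 4]
        out += [tr - tl, tr - tl, br - bl, br - bl]
    return out


if __name__ == "__main__":
    print([is_2x2(s) for s in subspans()])
    x_coords = [PIXEL_POS[p][0] for p in CHANNEL_ORDER]
    print(dfdx(x_coords))
```

Feeding each pixel's own column index through dfdx is a quick sanity check that the channel order really does keep horizontal neighbours together.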