[Mesa-dev] [PATCH 00/10] glsl: Implement varying packing.
maraeo at gmail.com
Wed Dec 12 12:53:01 PST 2012
On Wed, Dec 12, 2012 at 9:21 PM, Eric Anholt <eric at anholt.net> wrote:
> Marek Olšák <maraeo at gmail.com> writes:
>> On Wed, Dec 12, 2012 at 5:06 PM, Paul Berry <stereotype441 at gmail.com> wrote:
>>> On 11 December 2012 23:49, Aras Pranckevicius <aras at unity3d.com> wrote:
>>>> Not sure if relevant for Mesa, but e.g. on PowerVR SGX it's really bad to
>>>> pack two vec2 texture coordinates into a single vec4. That's because var.xy
>>>> texture read can be "prefetched", whereas var.zw texture read is not
>>>> prefetched (essentially treated as a dependent texture read), and often
>>>> causes stalls in the shader execution.
>>> Interesting--I had not thought of that possibility. On i965 all texture
>>> reads have to be done explicitly by the fragment shader (there is no
>>> prefetching IIRC), so this penalty doesn't apply. Does anyone know if a
>>> penalty like this exists in any of Mesa's other back-ends? If so that might
>>> suggest some good experiments to try. I'm open to revising my opinion if
>>> someone measures a significant performance degradation, particularly with a
>>> real-world app.
>> R300 and R400 support 4 texture indirections (as defined by
>> ARB_fragment_program). Adding ALU instructions before the first TEX
>> instruction increases the number of texture indirections by 1, which
>> might make some shaders not be executable on the hardware at all.
>> I think this optimization should be disabled on drivers where the
>> texture indirection limit is too low.
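To illustrate the counting rule referred to above, here is a rough sketch in C. It is a simplified reading of the ARB_fragment_program rule, not driver code. A program starts with one indirection, and a TEX starts a new one when its coordinate temporary was already written, or its result temporary was already read or written, in the current node. The names struct inst, MAX_TEMPS and count_tex_indirections are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { MAX_TEMPS = 32 };   /* hypothetical temp register file size */

/* Hypothetical one-source instruction: dst/src are temp indices,
 * or -1 when the operand is not a temp (input, constant, output). */
struct inst {
    bool is_tex;
    int dst;
    int src;
};

/* Count texture indirections using the simplified rule described above. */
int count_tex_indirections(const struct inst *prog, size_t n)
{
    bool written[MAX_TEMPS] = { false }; /* written in the current node */
    bool touched[MAX_TEMPS] = { false }; /* read or written in the current node */
    int count = 1;                       /* every program has at least one node */

    for (size_t i = 0; i < n; i++) {
        const struct inst *in = &prog[i];
        assert(in->src < MAX_TEMPS && in->dst < MAX_TEMPS);

        if (in->is_tex) {
            bool coord_dep = in->src >= 0 && written[in->src];
            bool dst_dep = in->dst >= 0 && touched[in->dst];
            if (coord_dep || dst_dep) {
                /* This TEX opens a new node; forget the old node's state. */
                count++;
                memset(written, 0, sizeof(written));
                memset(touched, 0, sizeof(touched));
            }
        }
        if (in->src >= 0)
            touched[in->src] = true;
        if (in->dst >= 0) {
            written[in->dst] = true;
            touched[in->dst] = true;
        }
    }
    return count;
}
```

With this model, a lone TEX reading an interpolated input counts as 1 indirection, while the same TEX fed by a MOV into a temporary (as an unpacking step would produce) counts as 2.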
> And are swizzles of texcoords required to be separate MOVs beforehand
> (like on i915)?
Yes, the TEX instruction doesn't support swizzles, so they must be
lowered. And the lowering sucks, because the only supported 3D source
operand swizzles are .xxx, .yyy, .zzz, .www, .yzw, .zxy, .wzy, .111,
.000, and .HHH (H=0.5), so a single swizzle can take up to 3 MOV
instructions. The 4th channel is handled by a separate scalar
instruction, which is independent of the 3D instruction. (R300 can
execute one 3D and one scalar instruction simultaneously.)
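A rough sketch in C of why the worst case is 3 MOVs. It is not the actual r300 compiler code. The swizzle table is copied from the list above, plus the identity .xyz, which is an assumption. The function name split_swizzle and its interface are made up for the example. Each MOV uses one native swizzle together with a write mask, and the loop greedily picks the native swizzle that matches the most channels still missing.

```c
#include <assert.h>
#include <stddef.h>

/* Native 3-component source swizzles as listed in the message, plus the
 * identity "xyz" (assumed). '1', '0' and 'H' stand for the constants
 * 1.0, 0.0 and 0.5. */
static const char *const native_swz[] = {
    "xyz", "xxx", "yyy", "zzz", "www", "yzw",
    "zxy", "wzy", "111", "000", "HHH",
};
#define NUM_NATIVE (sizeof(native_swz) / sizeof(native_swz[0]))

/* Split the wanted 3-component swizzle (e.g. "yxz") into masked MOVs.
 * picks[i] is an index into native_swz, masks[i] has bit c set when
 * MOV i writes destination channel c. Returns the number of MOVs, or
 * -1 if a wanted channel is not one of x, y, z, w, 1, 0, H. */
int split_swizzle(const char *want, int picks[3], unsigned masks[3])
{
    unsigned remaining = 0x7;   /* destination channels x, y, z still unset */
    int n = 0;

    while (remaining) {
        int best = -1, best_bits = 0;
        unsigned best_mask = 0;

        for (size_t i = 0; i < NUM_NATIVE; i++) {
            unsigned mask = 0;
            int bits = 0;
            for (int c = 0; c < 3; c++) {
                if ((remaining & (1u << c)) && native_swz[i][c] == want[c]) {
                    mask |= 1u << c;
                    bits++;
                }
            }
            if (bits > best_bits) {
                best = (int)i;
                best_bits = bits;
                best_mask = mask;
            }
        }
        if (best < 0)
            return -1;          /* no native swizzle provides this channel */

        assert(n < 3);          /* each pass clears at least one channel */
        picks[n] = best;
        masks[n] = best_mask;
        n++;
        remaining &= ~best_mask;
    }
    return n;
}
```

With this table, .zxy needs a single MOV, .xyw needs two (.xyz masked to xy, then a swizzle supplying w in the third channel), and .yxz needs three because no listed swizzle matches more than one of its channels.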