[Mesa-dev] [PATCH 00/10] glsl: Implement varying packing.
stereotype441 at gmail.com
Wed Dec 12 08:06:08 PST 2012
On 11 December 2012 23:49, Aras Pranckevicius <aras at unity3d.com> wrote:
>> For the initial implementation I've chosen a strategy that operates
>> exclusively at the GLSL IR level, so that it doesn't require the
>> cooperation of the driver back-ends.
> Wouldn't this negatively affect performance of some GPUs?
I'm glad you asked--I've actually had quite a bit of in-person discussion
with Eric and Ian about this.
With the i965 back-end, we're expecting a slight performance improvement,
based on the following reasoning:
- Most of the packing/unpacking operations in the shader will be coalesced
with other operations by optimization passes, so they won't negatively
impact performance. This is especially true in the fragment shader, where
operations are scalarized, so the packing/unpacking should just turn into
simple scalar copies, and those should be completely eliminated by copy
propagation. Most programs spend most of their time in the fragment shader
anyhow, so the performance penalty is already limited to shaders that have
a smaller contribution to execution time.
- The extra operations we are talking about are register-to-register
moves--no memory access is involved, and no ALU resources are tied up. So
there's a pretty small upper limit to the performance penalty even in the
case where optimization can't eliminate the copy.
- Having packed varyings will mean that the vertex shader spends less time
writing its output to the VUE, and the fragment shader spends less time
reading its input from the VUE. We don't know exactly how long these VUE
reads/writes take (it is difficult to measure them because they are part of
the process of starting and terminating threads), but it's very likely that
they take longer than register moves. So the already-small performance
penalty discussed above is probably offset by a larger performance
improvement due to more efficient utilization of the VUE.
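To make the VUE argument concrete, here is a minimal sketch of the slot arithmetic. It is not the actual Mesa linker code: the function names are hypothetical, each varying is modeled only by its float component count, and it assumes an unpacked varying takes a whole vec4 slot while packed varyings are laid end to end. A real linker has more constraints than this, so treat the packed figure as a best case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each varying is described only by its number of
 * float components (1 for float, 2 for vec2, 3 for vec3, 4 for vec4). */

/* Assumption: without packing, every varying occupies one full vec4
 * slot, regardless of how many components it actually uses. */
size_t slots_unpacked(const unsigned *components, size_t count)
{
    for (size_t i = 0; i < count; i++)
        assert(components[i] >= 1 && components[i] <= 4);
    return count;
}

/* Assumption: with ideal packing, components are laid end to end, so
 * the slot count is the total component count divided by 4, rounded up. */
size_t slots_packed(const unsigned *components, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        assert(components[i] >= 1 && components[i] <= 4);
        total += components[i];
    }
    return (total + 3) / 4;
}
```

Under these assumptions, a shader with two vec2s, a vec3 and a float uses 4 slots unpacked and 2 slots packed, so the vertex shader writes half as much data and the fragment shader reads half as much.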
I can't speak with authority on the inner workings of the other GPUs
supported by Mesa, but it seems like most of the arguments above are
general enough to apply to most GPU architectures, not just i965.
Of course, there could be some important factor that I'm missing that makes
all of this analysis completely wrong and causes varying packing to carry a
huge penalty on some architectures. If that's the case, I think the best
way to address the problem is to find an application that is slowed down by
varying packing and run experiments to understand why.
If worse comes to worst, we could of course modify the varying packing code
so that it only takes effect when there are so many varyings that
there is no alternative. But that would carry two disadvantages: it
would complicate the linker (especially the handling of transform feedback)
to have to handle both packed and unpacked varying formats, and it would
reduce test coverage of varying packing to almost nil (since most of our
piglit tests use a small number of varyings). Because of those
disadvantages, and the fact that our current understanding leads us to
expect a performance improvement, I'd like to save this strategy for a last
resort.
> Not sure if relevant for Mesa, but e.g. on PowerVR SGX it's really bad to
> pack two vec2 texture coordinates into a single vec4. That's because var.xy
> texture read can be "prefetched", whereas var.zw texture read is not
> prefetched (essentially treated as a dependent texture read), and often
> causes stalls in the shader execution.
Interesting--I had not thought of that possibility. On i965 all texture
reads have to be done explicitly by the fragment shader (there is no
prefetching IIRC), so this penalty doesn't apply. Does anyone know if a
penalty like this exists in any of Mesa's other back-ends? If so that
might suggest some good experiments to try. I'm open to revising my
opinion if someone measures a significant performance degradation,
particularly with a real-world app.