[Mesa-dev] [PATCH 00/10] glsl: Implement varying packing.
Paul Berry
stereotype441 at gmail.com
Tue Dec 11 15:09:06 PST 2012
This patch series adds varying packing to Mesa, so that we can handle
varyings composed of things other than vec4's without using up extra
varying components.
For the initial implementation I've chosen a strategy that operates
exclusively at the GLSL IR level, so that it doesn't require the
cooperation of the driver back-ends. This means that varying packing
should be immediately useful for all drivers. However, some types of
varying packing can't be done using GLSL IR alone (for example,
packing a "noperspective" varying and a "smooth" varying together) but
should be possible on some drivers with a small amount of back-end
work. I'm deferring that work to a later patch series.
Also, packing of floats and ints together into the same "flat varying"
should be possible for drivers that implement
ARB_shader_bit_encoding--I'm also deferring that for a later patch
series.
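To illustrate why bit encoding makes float/int packing lossless, here is a minimal Python sketch that mimics the floatBitsToInt() and intBitsToFloat() built-ins from ARB_shader_bit_encoding. The Python function names are illustrative only and are not part of Mesa; the sketch assumes 32-bit IEEE 754 floats and avoids NaN bit patterns.

```python
import struct

def float_bits_to_int(x: float) -> int:
    """Reinterpret the bit pattern of a 32-bit float as a signed 32-bit int."""
    return struct.unpack("<i", struct.pack("<f", x))[0]

def int_bits_to_float(i: int) -> float:
    """Reinterpret the bit pattern of a signed 32-bit int as a 32-bit float."""
    return struct.unpack("<f", struct.pack("<i", i))[0]
```

Because the conversion only reinterprets bits, a float stored in an int-typed flat varying can be recovered exactly on the other side, e.g. int_bits_to_float(float_bits_to_int(0.5)) gives back 0.5.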
The strategy is as follows:
- Before assigning locations to varyings, we sort them into "packing
classes" based on base type and interpolation mode (this is to
ensure that we don't try to pack floats with ints, or smooth with
flat, for example).
- Within each packing class, we sort the varyings based on the number
of vector elements. Vec4's (as well as matrices and arrays composed
of vec4's) are packed first, then vec2's, then scalars, since this
allows us to align them all to their natural alignment boundary, so
we avoid the performance penalty of "double parking" a varying
across two varying slots. Vec3's are packed last, double parking
them if necessary.
- For any varying slot that doesn't contain exactly one vec4, we
generate GLSL IR to manually pack/unpack the varying in the shader.
For instance, the following fragment shader:
    varying vec2 a;
    varying vec2 b;
    varying vec3 c;
    varying vec3 d;

    void main()
    {
      ...
    }

would get rewritten as follows:
    varying vec4 packed0;
    varying vec4 packed1;
    varying vec4 packed2;
    vec2 a;
    vec2 b;
    vec3 c;
    vec3 d;

    void main()
    {
      a = packed0.xy;
      b = packed0.zw;
      c = packed1.xyz;
      d.x = packed1.w; // d is "double parked" across slots 1 and 2
      d.yz = packed2.xy;
      ...
    }

This GLSL IR is generated by a lowering pass, so that in the future
we will have the option of disabling it for driver back-ends that
are capable of natively understanding the packed varying format.
- Finally, the linker code to handle transform feedback is modified to
account for varying packing (e.g. by feeding back just a subset of
the components of a varying slot rather than the entire varying
slot). Fortunately transform feedback already has the
infrastructure necessary to do this, since it was needed in order to
implement gl_ClipDistance.
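The sorting and location-assignment steps above can be sketched roughly as follows. This is a simplified Python model, not the linker code from the patches: it handles only scalar and vector varyings (no arrays or matrices), the pack_varyings name and the packing-class strings are hypothetical, and it assumes each packing class starts on a fresh slot, since different classes can't share one.

```python
from itertools import groupby

# Packing order by component count: vec4, vec2, scalar, then vec3 last.
PACK_ORDER = {4: 0, 2: 1, 1: 2, 3: 3}
SWIZZLE = "xyzw"

def pack_varyings(varyings):
    """varyings: list of (name, packing_class, components) tuples.

    Returns {name: [(slot, swizzle), ...]}. Two entries for one name
    mean the varying is "double parked" across two slots.
    """
    # Sort by packing class first, then by the packing order above.
    ordered = sorted(varyings, key=lambda v: (v[1], PACK_ORDER[v[2]]))
    layout = {}
    next_component = 0  # fractional location, counted in components
    for _, group in groupby(ordered, key=lambda v: v[1]):
        # Assumption: round up so each packing class begins a new slot.
        next_component = (next_component + 3) // 4 * 4
        for name, _, size in group:
            pieces = []
            first, last = next_component, next_component + size
            while first < last:
                slot, comp = divmod(first, 4)
                count = min(4 - comp, last - first)
                pieces.append((slot, SWIZZLE[comp:comp + count]))
                first += count
            layout[name] = pieces
            next_component = last
    return layout
```

Fed the four varyings from the example (a and b as vec2, c and d as vec3, all in one class), this model places a in slot 0 .xy, b in slot 0 .zw, c in slot 1 .xyz, and splits d into slot 1 .w and slot 2 .xy, matching the rewritten shader.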
I believe this is enough to be useful for the vast majority of
programs, and to get us passing the GLES3 conformance tests.
Additional improvements, which I'm planning to defer to later patch
series, include:
- Allow uints and ints to be packed together in the same varying slot.
This should be possible on all back-ends, since ints and uints may
be interconverted without losing information.
- On back-ends that support ARB_shader_bit_encoding, allow floats and
ints to be packed together in the same varying slot, since
ARB_shader_bit_encoding allows floating-point values to be encoded
into ints without losing information.
- On back-ends that can mix interpolation modes within a single
varying slot, allow additional packing, with help from the driver
back-end. For instance, i965 gen6 and above can in principle mix
together all interpolation modes except for "flat" within a single
varying slot, if we do a hopefully small amount of back-end work.
- Allow a driver back-end to advertise a larger number of varying
components to the linker than it advertises to the client
program--this will allow us to ensure that varying packing *never*
fails. For example, on i965 gen6 and above, after the above
improvements are made, we should be able to pack any possible
combination of varyings with a maximum waste of 3 varying
components. That means, for example, that if the i965 driver
advertises 17 varying slots to the linker (== 68 varying
components), but advertises only 64 varying components to the
client program, then varying packing will always succeed.
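The slot arithmetic in the last item can be checked with a short Python sketch. The slots_needed helper is hypothetical, and the bound of 3 wasted components is taken from the text above rather than derived here.

```python
COMPONENTS_PER_SLOT = 4

def slots_needed(advertised_components: int, max_waste: int) -> int:
    """Slots the linker must have available so that packing cannot fail,
    assuming at most max_waste components are ever lost to padding."""
    worst_case = advertised_components + max_waste
    # Ceiling division: round the worst case up to whole slots.
    return -(-worst_case // COMPONENTS_PER_SLOT)
```

With 64 components advertised to the client and a worst-case waste of 3, the linker needs room for 67 components, which rounds up to 17 slots (68 components).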
Note: I also have a new piglit test that exercises this code; I'll be
publishing that to the Piglit list ASAP.
[PATCH 01/10] glsl/lower_clip_distance: Update symbol table.
[PATCH 02/10] glsl/linker: Always invalidate shader ins/outs, even in corner cases.
[PATCH 03/10] glsl/linker: Make separate ir_variable field to mean "unmatched".
[PATCH 04/10] glsl: Create a field to store fractional varying locations.
[PATCH 05/10] glsl/linker: Defer recording transform feedback locations.
[PATCH 06/10] glsl/linker: Subdivide the first phase of varying assignment.
[PATCH 07/10] glsl/linker: Sort varyings by packing class, then vector size.
[PATCH 08/10] glsl: Add a lowering pass for packing varyings.
[PATCH 09/10] glsl/linker: Pack within compound varyings.
[PATCH 10/10] glsl/linker: Pack between varyings.