[Mesa-dev] [PATCH 00/10] glsl: Implement varying packing.

Tue Dec 11 15:09:06 PST 2012

This patch series adds varying packing to Mesa, so that we can handle
varyings composed of things other than vec4's without using up extra
varying components.

For the initial implementation I've chosen a strategy that operates
exclusively at the GLSL IR level, so that it doesn't require the
cooperation of the driver back-ends.  This means that varying packing
should be immediately useful for all drivers.  However, there are some
types of varying packing that can't be done using GLSL IR alone (for
example, packing a "noperspective" varying and a "smooth" varying
together), but should be possible on some drivers with a small amount
of back-end work.  I'm deferring that work for a later patch series.
Also, packing of floats and ints together into the same "flat varying"
should be possible for drivers that implement
ARB_shader_bit_encoding--I'm also deferring that for a later patch
series.

The strategy is as follows:

- Before assigning locations to varyings, we sort them into "packing
  classes" based on base type and interpolation mode (this is to
  ensure that we don't try to pack floats with ints, or smooth with
  flat, for example).

- Within each packing class, we sort the varyings based on the number
  of vector elements.  Vec4's (as well as matrices and arrays composed
  of vec4's) are packed first, then vec2's, then scalars, since this
  allows us to align them all to their natural alignment boundary, so
  we avoid the performance penalty of "double parking" a varying
  across two varying slots.  Vec3's are packed last, double parking
  them if necessary.

- For any varying slot that doesn't contain exactly one vec4, we
  generate GLSL IR to manually pack/unpack the varying in the shader.
  For instance, the following fragment shader:

  varying vec2 a;
  varying vec2 b;
  varying vec3 c;
  varying vec3 d;
  void main()
  {
    ...
  }

  would get rewritten as follows:

  varying vec4 packed0;
  varying vec4 packed1;
  varying vec4 packed2;
  vec2 a;
  vec2 b;
  vec3 c;
  vec3 d;
  void main()
  {
    a = packed0.xy;
    b = packed0.zw;
    c = packed1.xyz;
    d.x = packed1.w; // d is "double parked" across slots 1 and 2
    d.yz = packed2.xy;
    ...
  }

  This GLSL IR is generated by a lowering pass, so that in the future
  we will have the option of disabling it for driver back-ends that
  are capable of natively understanding the packed varying format.

- Finally, the linker code to handle transform feedback is modified to
  account for varying packing (e.g. by feeding back just a subset of
  the components of a varying slot rather than the entire varying
  slot).  Fortunately, transform feedback already has the
  infrastructure necessary to do this, since it was needed in order to
  implement gl_ClipDistance.

I believe this is enough to be useful for the vast majority of
programs, and to get us passing the GLES3 conformance tests.

Additional improvements, which I'm planning to defer to later patch
series, include:

- Allow uints and ints to be packed together in the same varying slot.
  This should be possible on all back-ends, since ints and uints may
  be interconverted without losing information.

- On back-ends that support ARB_shader_bit_encoding, allow floats and
  ints to be packed together in the same varying slot, since
  ARB_shader_bit_encoding allows floating-point values to be encoded
  into ints without losing information.

- On back-ends that can mix interpolation modes within a single
  varying slot, allow additional packing, with help from the driver
  back-end.  For instance, i965 gen6 and above can in principle mix
  together all interpolation modes except for "flat" within a single
  varying slot, if we do a hopefully small amount of back-end work.

- Allow a driver back-end to advertise a larger number of varying
  components to the linker than it advertises to the client
  program--this will allow us to ensure that varying packing *never*
  fails.  For example, on i965 gen6 and above, after the above
  improvements are made, we should be able to pack any possible
  combination of varyings with a maximum waste of 3 varying
  components.  That means, for example, that if the i965 driver
  advertises 17 varying slots to the linker (== 68 varying
  components), but advertises only 64 varying components to the
  client program, then varying packing will always succeed.

Note: I also have a new piglit test that exercises this code; I'll be
publishing that to the Piglit list ASAP.

[PATCH 01/10] glsl/lower_clip_distance: Update symbol table.
[PATCH 02/10] glsl/linker: Always invalidate shader ins/outs, even in corner cases.
[PATCH 03/10] glsl/linker: Make separate ir_variable field to mean "unmatched".
[PATCH 04/10] glsl: Create a field to store fractional varying locations.
[PATCH 05/10] glsl/linker: Defer recording transform feedback locations.
[PATCH 06/10] glsl/linker: Subdivide the first phase of varying assignment.
[PATCH 07/10] glsl/linker: Sort varyings by packing class, then vector size.
[PATCH 08/10] glsl: Add a lowering pass for packing varyings.
[PATCH 09/10] glsl/linker: Pack within compound varyings.
[PATCH 10/10] glsl/linker: Pack between varyings.