[Mesa-dev] [PATCH 00/15] i965/gen6+: Support 128 varying components.

Paul Berry stereotype441 at gmail.com
Tue Sep 3 16:18:12 PDT 2013


GL 3.2 requires us to support 128 varying components for geometry
shader outputs and fragment shader inputs, and 64 varying components
otherwise.  But there's no hardware limitation that restricts us to 64
varying components, and core Mesa doesn't currently allow different
stages to have different maximum values, so I've gone ahead and
enabled 128 varying components for all stages.  This has the advantage
of increased test coverage, since piglit already has a number of tests
to validate that the maximum advertised number of varying components
can be exchanged between VS and FS.  I've also gone ahead and
increased the limit for gen6 as well as gen7, since it required very
little extra work.

Previously, on gen6+, we relied on the SF/SBE stage of the pipeline to
reorder the outputs from the GS (or VS) to match the input ordering
required by the FS.  This allowed us to determine the order of FS
inputs solely based on the FS, so we avoided recompiles when separate
shader objects were in use.  But there's a problem with that: the
SF/SBE stage can't arbitrarily reorder more than 16 VUE slots (1 slot
= 4 varying components).  To avoid introducing additional recompiles
with previously-supported shaders, I've taken a hybrid approach to
choosing the FS input ordering: if the FS uses 16 or fewer input
varying slots, then it orders them solely based on its own
requirements.  If it uses more than 16 input varying slots, then it
orders them according to the GS (or VS) output VUE map, so that the
SF/SBE stage doesn't have to do any reordering.
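
The hybrid rule above can be sketched in a few lines of C. This is only an illustration under simplifying assumptions, not the code from the patches: the names choose_fs_input_order, count_slots, NUM_VARYING_SLOTS and SBE_REORDER_LIMIT are hypothetical, slots are modelled as bits in a 64-bit mask, and the VUE header slots are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_VARYING_SLOTS 64   /* assumption: one bit per slot in a 64-bit mask */
#define SBE_REORDER_LIMIT 16   /* from the text: SF/SBE can reorder at most 16 slots */

/* Count the set bits in a 64-bit mask. */
static int count_slots(uint64_t mask)
{
   int n = 0;
   while (mask) {
      mask &= mask - 1;   /* clear the lowest set bit */
      n++;
   }
   return n;
}

/*
 * Fill urb_setup[slot] with the FS input index for each varying slot,
 * or -1 if the slot gets no index.  Returns the number of indices assigned.
 *
 * With 16 or fewer FS inputs the layout depends only on the FS, so SF/SBE
 * must reorder.  With more, the layout mirrors the previous stage's outputs,
 * so SF/SBE can pass the slots through unchanged.
 */
static int choose_fs_input_order(uint64_t fs_inputs,
                                 uint64_t prev_stage_outputs,
                                 int urb_setup[NUM_VARYING_SLOTS])
{
   bool follow_prev_stage = count_slots(fs_inputs) > SBE_REORDER_LIMIT;
   uint64_t layout = follow_prev_stage ? prev_stage_outputs : fs_inputs;
   int next = 0;

   for (int slot = 0; slot < NUM_VARYING_SLOTS; slot++) {
      if (layout & (UINT64_C(1) << slot))
         urb_setup[slot] = next++;
      else
         urb_setup[slot] = -1;
   }
   assert(next <= NUM_VARYING_SLOTS);
   return next;
}
```

In this sketch the second case reserves an input index for every slot the previous stage writes, including ones the FS never reads. That is the cost of skipping the reordering step, and it is also why the FS layout (and hence the compiled FS) then depends on the previous stage's outputs.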

Patches 1-3 modify the FS so that it exposes the order of input
varyings it needs via prog_data.

Patches 4-6 modify the SF/SBE setup so that it consults the FS
prog_data when choosing how to re-order varyings (previously, it
implicitly assumed an order that happened to match the order the FS
was using).

Patch 7 is a minor optimization made possible by patches 1-6: now that
the SF/SBE setup no longer makes implicit assumptions about the order
of the FS inputs, the FS no longer has to have dummy input slots for
gl_FragCoord and gl_FrontFacing.

Patch 8 tweaks the VUE map slightly so that it is uniquely determined
by a single 64-bit bitfield.  This will allow us to store the bitfield
in the FS program key rather than the entire VUE map.
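
To illustrate why a uniquely determined VUE map lets a program key carry only the bitfield, here is a minimal sketch in C. The struct and function names (vue_map_sketch, compute_vue_map_sketch) are hypothetical, and the layout rule (slots assigned in ascending bit order, no header slots) is an assumption made for brevity rather than the driver's actual layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_VARYING_SLOTS 64   /* assumption: one bit per varying */

/* Hypothetical, simplified stand-in for a VUE map. */
struct vue_map_sketch {
   int8_t varying_to_slot[NUM_VARYING_SLOTS];   /* -1 if the varying is absent */
   int8_t slot_to_varying[NUM_VARYING_SLOTS];   /* -1 if the slot is unused */
   int num_slots;
};

/*
 * Build the map as a pure function of slots_valid.  Because no other input
 * is consulted, two stages holding the same 64-bit value always derive the
 * same map, so a key only needs to store the 8-byte bitfield.
 */
static void compute_vue_map_sketch(struct vue_map_sketch *map,
                                   uint64_t slots_valid)
{
   memset(map->varying_to_slot, -1, sizeof(map->varying_to_slot));
   memset(map->slot_to_varying, -1, sizeof(map->slot_to_varying));
   map->num_slots = 0;

   for (int varying = 0; varying < NUM_VARYING_SLOTS; varying++) {
      if (slots_valid & (UINT64_C(1) << varying)) {
         map->varying_to_slot[varying] = (int8_t) map->num_slots;
         map->slot_to_varying[map->num_slots] = (int8_t) varying;
         map->num_slots++;
      }
   }
   assert(map->num_slots <= NUM_VARYING_SLOTS);
}
```

Any extra input to the map computation (such as the userclip_active flag that patch 8 removes) would break this property, since the key would then have to record that input as well.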

Patch 9 is a minor optimization made possible by patch 8: now that the
VUE map is uniquely determined by a single 64-bit bitfield, we no
longer have to store the entire VUE map in the GS program key.

Patches 10-11 modify the FS to order its inputs according to the GS
(or VS) output VUE map when there are more than 16 input slots in use.

Patch 12 adjusts the VS and GS code so that it can output all 32
varyings to the VUE, even if it requires more than two URB writes to
do so.
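
The idea of generating URB writes in a loop can be sketched as follows. This is a hedged illustration only: plan_urb_writes and urb_write_sketch are hypothetical names, and the per-write slot limit is passed in as a parameter because the real hardware limit is not stated in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of one URB write covering a run of VUE slots. */
struct urb_write_sketch {
   int offset;   /* first VUE slot covered by this write */
   int length;   /* number of VUE slots written */
   bool last;    /* true for the final write of the VUE */
};

/*
 * Split num_slots VUE slots into as many writes as needed, each covering at
 * most max_slots_per_write slots.  Returns the number of writes planned.
 */
static int plan_urb_writes(int num_slots, int max_slots_per_write,
                           struct urb_write_sketch *writes, int max_writes)
{
   assert(num_slots > 0 && max_slots_per_write > 0);

   int count = 0;
   int offset = 0;
   while (offset < num_slots) {
      assert(count < max_writes);
      int remaining = num_slots - offset;
      int length = remaining < max_slots_per_write ? remaining
                                                   : max_slots_per_write;
      writes[count].offset = offset;
      writes[count].length = length;
      writes[count].last = (offset + length == num_slots);
      offset += length;
      count++;
   }
   return count;
}
```

For example, with an assumed limit of 14 slots per write, a 34-slot VUE would be split into three writes covering slots 0-13, 14-27 and 28-33, which is the "more than two URB writes" case that a fixed one-or-two-write scheme cannot express.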

Patches 13-14 make some minor gen6-specific adjustments to allow for
the larger URB entries needed for 32 varyings: the gen6 transform
feedback code sometimes needs to do 2 URB writes instead of 1, and an
incorrect assertion in the gen6 URB setup needs to be fixed.

Patch 15 increases the value of MaxVarying from 16 to 32 (that is,
from 64 to 128 varying components) for gen6+.

The series is available on branch "increase-max-varyings" of
https://github.com/stereotype441/mesa.git.  I've piglit tested it on
gen5, gen6, and gen7.

[PATCH 01/15] i965/fs: Expose "urb_setup" as part of brw_wm_prog_data.
[PATCH 02/15] i965/fs: Change brw_wm_prog_data::urb_read_length to num_varying_inputs.
[PATCH 03/15] i965/fs: Consult brw_wm_prog_data::num_varying_inputs when setting up WM state.
[PATCH 04/15] i965/sf: Use BRW_SF_URB_ENTRY_READ_OFFSET rather than hardcoded values.
[PATCH 05/15] i965/sf: Consolidate common code for setting up gen6-7 attribute overrides.
[PATCH 06/15] i965/sf: Consult brw_wm_prog_data when setting up SF/SBE state.
[PATCH 07/15] i965/fs: Stop wasting input attribute space on gl_FragCoord and gl_FrontFacing.
[PATCH 08/15] i965/gen6+: Remove VUE map dependency on userclip_active.
[PATCH 09/15] i965/gs: Stop storing an input VUE map in the GS program key.
[PATCH 10/15] i965/fs: Simplify computation of key.input_slots_valid during precompile.
[PATCH 11/15] i965/fs: When >64 input components, order them to match prev pipeline stage.
[PATCH 12/15] i965/vec4: Generate URB writes using a loop.
[PATCH 13/15] i965/gen6: Fix assertions on VS/GS URB size.
[PATCH 14/15] i965/ff_gs: Generate URB writes using a loop.
[PATCH 15/15] i965/gen6+: Support 128 varying components.

