[Mesa-dev] [PATCH 0/8] i965/gs: Use DUAL_INSTANCED mode to ease register pressure.
stereotype441 at gmail.com
Thu Oct 17 17:14:03 CEST 2013

Previously, i965 geometry shaders always operated in DUAL_OBJECT mode,
which is similar to vertex shader operation in that two independent
sets of inputs get dispatched to a single SIMD4x2 geometry shader
thread, which executes them both in parallel.

When register usage is tight, we need to switch to a mechanism that
uses fewer registers. In an ideal world we'd fall back to SINGLE
mode, in which a single set of inputs is dispatched to a SIMD4x1
geometry shader thread. Effectively this makes twice as many
registers available, since it allows independent data to be
interleaved into the lower and upper halves of each register.

Unfortunately, we don't yet have the infrastructure in the vec4
back-end to support interleaving all the registers. So we do the next
best thing, which is to use DUAL_INSTANCED dispatch mode. In this
mode, a single set of geometry shader inputs is delivered to the
shader in interleaved fashion (as would happen in SINGLE mode), but
the shader operates as a SIMD4x2 shader (so all other registers are
non-interleaved). If the geometry shader is instanced, then up to two
instances may be dispatched to the geometry shader at once; otherwise,
each geometry shader invocation runs in its own thread, with the
execution mask set appropriately. Since we don't support instanced
geometry shaders yet, DUAL_INSTANCED and SINGLE modes are equivalent
for all intents and purposes, except that DUAL_INSTANCED requires much
less register interleaving work in the back-end.
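
To make the register saving concrete, here is a rough sketch of how
input attributes might map onto register halves in the two layouts. The
names (reg_slot, slot_dual_object, slot_interleaved, regs_needed) and
the exact even/odd packing are illustrative assumptions, not code from
the i965 back-end; the only thing taken from the description above is
that interleaving puts independent data in the lower and upper halves
of each register.

```c
#include <assert.h>

/* Hypothetical model: one register has a lower and an upper half,
 * each wide enough for one vec4. */
struct reg_slot {
    int reg;   /* register index within the input payload */
    int half;  /* 0 = lower half, 1 = upper half */
};

/* Non-interleaved (DUAL_OBJECT-style) layout: attribute i lives in
 * register i, and the two halves hold the two independent objects. */
static struct reg_slot slot_dual_object(int attr, int object)
{
    assert(attr >= 0 && (object == 0 || object == 1));
    struct reg_slot s = { attr, object };
    return s;
}

/* Interleaved layout (assumed packing): a single set of inputs, with
 * consecutive attributes sharing a register, even attributes in the
 * lower half and odd attributes in the upper half. */
static struct reg_slot slot_interleaved(int attr)
{
    assert(attr >= 0);
    struct reg_slot s = { attr / 2, attr % 2 };
    return s;
}

/* Registers consumed by num_attrs vec4 inputs in each layout. */
static int regs_needed(int num_attrs, int interleaved)
{
    assert(num_attrs >= 0);
    return interleaved ? (num_attrs + 1) / 2 : num_attrs;
}
```

Under this model, 16 vec4 inputs occupy 16 registers when
non-interleaved but only 8 when interleaved, which is the factor of two
described above.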

The compilation strategy for choosing between DUAL_INSTANCED and
DUAL_OBJECT modes is similar to what we do for 8-wide vs. 16-wide
fragment shaders. First we try compiling the shader in DUAL_OBJECT
mode with register spilling disabled. If that fails, we fall back to
DUAL_INSTANCED mode and compile with register spilling enabled.
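
The fallback described above can be sketched roughly as follows. This
is a toy model, not the driver code: toy_shader, toy_compile,
compile_gs, and TOY_REG_BUDGET are all hypothetical names, and the
"compiler" here just compares a register count against a budget to
decide whether spilling would be needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical stand-ins, not i965 APIs. */
enum gs_dispatch_mode { GS_DUAL_OBJECT, GS_DUAL_INSTANCED };

struct gs_compile_result {
    bool ok;                     /* did we end up with a program? */
    enum gs_dispatch_mode mode;  /* mode of the last attempt */
    bool spilled;                /* did that attempt need spilling? */
};

/* Toy shader: registers it would need in each dispatch mode. */
struct toy_shader {
    int regs_dual_object;
    int regs_dual_instanced;
};

#define TOY_REG_BUDGET 128  /* assumed register file size for the toy */

/* Toy back-end: fails if the shader would spill and spilling is
 * not allowed; otherwise succeeds and reports whether it spilled. */
static bool toy_compile(const struct toy_shader *s,
                        enum gs_dispatch_mode mode,
                        bool allow_spill, bool *spilled)
{
    int need = (mode == GS_DUAL_OBJECT) ? s->regs_dual_object
                                        : s->regs_dual_instanced;
    *spilled = need > TOY_REG_BUDGET;
    return !*spilled || allow_spill;
}

static struct gs_compile_result compile_gs(const struct toy_shader *s)
{
    assert(s != NULL);
    struct gs_compile_result r = { false, GS_DUAL_OBJECT, false };

    /* First attempt: DUAL_OBJECT with register spilling disabled. */
    r.ok = toy_compile(s, GS_DUAL_OBJECT, false, &r.spilled);
    if (r.ok)
        return r;

    /* Fallback: DUAL_INSTANCED with register spilling enabled. */
    r.mode = GS_DUAL_INSTANCED;
    r.ok = toy_compile(s, GS_DUAL_INSTANCED, true, &r.spilled);
    return r;
}
```

In this model a shader that fits in DUAL_OBJECT mode never pays for the
fallback, and one that does not fit gets DUAL_INSTANCED, spilling only
if the reduced input footprint is still not enough.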

Unfortunately, even when using DUAL_INSTANCED mode we still can't
support 128 geometry shader input components, due to other limitations
in our vec4 back-end code. So the final patch of the series reduces
gl_MaxGeometryInputComponents to 64, the minimum required by the spec.

This series needs to be applied atop "vbo: Make
vbo_sw_primitive_restart optionally count primitives." and "i965/gs:
Fix gl_PrimitiveIDIn when using SW primitive restart.", which are on
the mailing list but haven't been reviewed yet. To see the series in
context, please check out branch "gs-phase-6" from

[PATCH 1/8] i965/vec4: Add the ability for attributes to be interleaved.
[PATCH 2/8] i965/vec4: if register allocation fails, don't try to schedule.
[PATCH 3/8] i965/vec4: Add the ability to suppress register spilling.
[PATCH 4/8] i965/gs: Add the ability to compile a DUAL_INSTANCED geometry shader.
[PATCH 5/8] i965/gs: Fix up gl_PointSize input swizzling for DUAL_INSTANCED gs.
[PATCH 6/8] i965/gs: fix up primitive ID workaround for DUAL_INSTANCE dispatch.
[PATCH 7/8] i965/gs: If a DUAL_OBJECT gs would spill, fall back to DUAL_INSTANCED.
[PATCH 8/8] i965: Reduce gl_MaxGeometryInputComponents to 64.