[Mesa-dev] Loop- and conditional-safe TGSI register merging
tomastrnka at gmx.com
Sun Jun 29 08:05:00 PDT 2014

Following is a patch fixing Mesa's inability to run several complex GLSL
shader benchmarks. I had never been able to successfully run the Pixmark
Piano and Volplosion tests from the GpuTest benchmark suite on my RV670 card.
They would always fail to render anything and report that "translation from
TGSI failed" (and using the LLVM backend wouldn't help, either).

After some investigation I've found out that the problem is due to the quite
complex main GLSL fragment shader programs that essentially render the whole
scene. The GLSL code is not that difficult to grasp, but after compilation to
TGSI together with heavy function inlining the resulting TGSI program is
essentially a single loop with a very long body (for Volplosion) or a
structure of several nested loop levels with intermixed conditionals (Piano).
In both cases there are hundreds of registers used within the loop, even
though each register is usually written once and then read a few instructions
later, never to be touched again.

There is a register merging pass following the translation to TGSI; however,
the algorithm there completely avoids trying to optimize anything inside
loops. Thus the whole abomination using >300 GPRs is passed directly to the
r600g driver, which obviously can't handle it (as the HW has only 128 GPRs).

I've rewritten the core of the register merging algorithm to be able to cope
in the presence of (almost) arbitrarily nested loop and conditional
structures. With this patch (tested on master), the Pixmark tests finally
work just fine and some of the shadertoy demos have started working for me,
too. I have been using this patch on top of Mesa 10.1.x and 10.2.x for several
weeks now in daily use (including 3D gaming) and have seen no regressions.
I'm unfortunately unable to run piglit at all as it fails the sanity check
due to some mysterious z-buffer readback inaccuracy.
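
To illustrate the general idea of loop-safe merging (this is not the code from
the patch, only a minimal Python sketch under simplifying assumptions), the
example below computes live intervals for temporaries and then reuses
registers greedily. The instruction tuples and the names Interval,
live_intervals and merge_registers are hypothetical. The sketch handles loops
only; conditionals, which the patch also covers, would need extra care and are
deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    first: int        # index of the first access (read or write)
    last: int         # index of the last access
    read_first: bool  # True if the first access in program order is a read


def live_intervals(program):
    """Compute loop-safe live intervals for every temporary.

    `program` is a list of (opcode, dst, srcs) tuples (hypothetical format);
    dst is a temp number or None, srcs is a tuple of temp numbers.
    """
    intervals = {}
    loops = []       # (begin, end) pairs, innermost loops first
    open_loops = []  # stack of BGNLOOP positions
    for idx, (opcode, dst, srcs) in enumerate(program):
        if opcode == "BGNLOOP":
            open_loops.append(idx)
        elif opcode == "ENDLOOP":
            loops.append((open_loops.pop(), idx))
        # Sources are read before the destination is written.
        for temp, is_read in [(s, True) for s in srcs] + [(dst, False)]:
            if temp is None:
                continue
            if temp in intervals:
                intervals[temp].last = idx
            else:
                intervals[temp] = Interval(idx, idx, is_read)

    # A temp that crosses a loop boundary, or that is read before being
    # written inside the loop (loop-carried), must stay live for the whole
    # loop. Temps written and then read within one iteration stay short.
    for begin, end in loops:
        for iv in intervals.values():
            inside = begin < iv.first and iv.last < end
            overlaps = iv.first <= end and iv.last >= begin
            encloses = iv.first < begin and iv.last > end
            crosses = overlaps and not inside and not encloses
            if crosses or (inside and iv.read_first):
                iv.first = min(iv.first, begin)
                iv.last = max(iv.last, end)
    return intervals


def merge_registers(intervals):
    """Greedily map temps to registers; a register is reused only after
    the interval that held it has ended."""
    mapping, active, free, next_reg = {}, [], [], 0
    order = sorted(intervals.items(), key=lambda kv: (kv[1].first, kv[0]))
    for temp, iv in order:
        still_active = []
        for last, reg in active:
            if last < iv.first:
                free.append(reg)
            else:
                still_active.append((last, reg))
        active = still_active
        if free:
            reg = min(free)
            free.remove(reg)
        else:
            reg = next_reg
            next_reg += 1
        mapping[temp] = reg
        active.append((iv.last, reg))
    return mapping


# Five temps; four of them live only briefly inside the loop body.
EXAMPLE = [
    ("MOV", 0, ()),          # 0: t0 initialised before the loop
    ("BGNLOOP", None, ()),   # 1
    ("ADD", 1, (0,)),        # 2
    ("MUL", 2, (1,)),        # 3
    ("ADD", 3, (2,)),        # 4
    ("MUL", 4, (3,)),        # 5
    ("MOV", 0, (4,)),        # 6: t0 carries the result to the next iteration
    ("ENDLOOP", None, ()),   # 7
    ("MOV", None, (0,)),     # 8: final read of t0
]

if __name__ == "__main__":
    print(merge_registers(live_intervals(EXAMPLE)))
```

In this example the short-lived temporaries inside the loop share registers,
while t0, which is live across the loop, keeps its own. A pass that refused to
touch anything inside the loop would instead need one register per temporary.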

I welcome any comments and suggestions (as this is my first Mesa
contribution), but please CC me as I'm not subscribed to the list.

Actually, there's a surprising catch to the r600g failure, as the reason is
not that straightforward: the translation fails in check_and_set_bank_swizzle,
not because there are too few GPRs, but because the code in
check_vector (r600_asm.c) silently treats anything with index > 128 as a
constant file and it then runs out of available cfile read ports. If I'm not
misunderstanding this completely, it also means that programs using just
around 130 GPRs will compile with no errors, silently trying to use the first
few constants instead of the last GPRs. Maybe the error checking there could
be improved a little.

Tomáš Trnka (1):
  glsl_to_tgsi: Loop- and conditional-safe TGSI register merging

 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 136
 1 file changed, 108 insertions(+), 28 deletions(-)