[Bug 89580] Implement a NIR -> vec4 pass

Thu Sep 3 08:19:37 PDT 2015

https://bugs.freedesktop.org/show_bug.cgi?id=89580

--- Comment #35 from Eduardo Lima Mitev <elima at igalia.com> ---
About time for another update:

The NIR-vec4 backend is now the default. Although it still has a considerable
number of code-quality regressions compared to the vec4_visitor backend, it was
made the default already so that it hits the next release window and gets
proper testing. For reference, here is where we stand now compared to the
non-NIR path:

total instructions in shared programs: 1853747 -> 1801527 (-2.82%)
instructions in affected programs:     1694180 -> 1641960 (-3.08%)
helped:                                6913
HURT:                                  10932
GAINED:                                0
LOST:                                  0
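
For readers unfamiliar with these reports, the percentages follow directly from
the two instruction counts on each line. A minimal sketch of that arithmetic
(the helper name pct_change is hypothetical and is not part of shader-db):

```python
def pct_change(before: int, after: int) -> float:
    """Relative change in instruction count, in percent, rounded to 2 places."""
    return round((after - before) / before * 100, 2)


# Counts copied from the report above.
shared = pct_change(1853747, 1801527)
affected = pct_change(1694180, 1641960)
print(f"shared: {shared}%  affected: {affected}%")
```

Both lines share the same absolute delta (52220 instructions); only the
denominator differs, which is why the "affected" percentage is larger.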

At this point, the vast majority of regressions come from the lack of register
coalescing. To illustrate with a simple example, chunks like this are what NIR
is giving to backends:

r2 = fdot4 r0, r1
r3.x = imov r2.x

When we could simply have:

r3.x = fdot4 r0, r1

This is mostly nir_lower_vec_to_movs's fault.
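
To make the example concrete, here is a toy sketch of this kind of coalescing
on a flat instruction list. It is not Mesa code and not how
opt_register_coalesce is implemented; the Instr type, the coalesce_movs
function and the string-based register naming are all assumptions made for
illustration only.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Instr:              # hypothetical toy IR, not a NIR or i965 structure
    dest: str             # destination register, e.g. "r2"
    mask: str             # write mask, e.g. "x"
    op: str               # opcode name, e.g. "fdot4"
    srcs: Tuple[str, ...]  # sources, optionally swizzled, e.g. "r2.x"


def reg_of(src: str) -> str:
    """Strip the swizzle: "r2.x" -> "r2"."""
    return src.split(".")[0]


def coalesce_movs(prog: List[Instr]) -> List[Instr]:
    """Fold 'rB.c = imov rA.x' into the single instruction defining rA."""
    out = list(prog)
    i = 0
    while i < len(out):
        mov = out[i]
        if mov.op == "imov" and len(mov.srcs) == 1 and len(mov.mask) == 1:
            tmp = reg_of(mov.srcs[0])
            defs = [j for j, ins in enumerate(out) if ins.dest == tmp]
            uses = [s for ins in out for s in ins.srcs if reg_of(s) == tmp]
            # Conservative: one earlier scalar def, and the mov is the only read.
            if len(defs) == 1 and defs[0] < i and len(uses) == 1:
                j = defs[0]
                d = out[j]
                between = out[j + 1:i]
                clobbered = any(
                    ins.dest == mov.dest
                    or mov.dest in [reg_of(s) for s in ins.srcs]
                    for ins in between
                )
                if (len(d.mask) == 1
                        and mov.srcs[0] == f"{tmp}.{d.mask}"
                        and not clobbered):
                    # Retarget the defining instruction and drop the mov.
                    out[j] = replace(d, dest=mov.dest, mask=mov.mask)
                    del out[i]
                    continue
        i += 1
    return out


prog = [
    Instr("r2", "x", "fdot4", ("r0", "r1")),
    Instr("r3", "x", "imov", ("r2.x",)),
]
result = coalesce_movs(prog)
```

On the two-instruction example above, result contains a single fdot4 that
writes r3.x directly. If r2 had any other reader, the sketch leaves the
program untouched.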

While this behavior can and should be corrected by the opt_register_coalesce
pass, we have been trying to mitigate the problem at the NIR level too, for two
reasons: 1) to give backends better, more optimized code that uses fewer
registers, and 2) since NIR is shared by different backends, it is desirable to
provide an optimization pass that more backends could potentially benefit from.

At this point, we have written a new NIR pass, which we called
'nir_lower_vec_and_coalesce'. It takes the NIR shader right after SSA and
tries to coalesce registers by propagating the destination components of each
vecN instruction to the instructions that define its sources. The pass then
emits new, reduced vecN instructions containing only the channels that were not
propagated. Hence,
this pass is compatible with lower_vec_to_movs(), and if in the future we
decide to disable it, we can still use lower_vec_and_coalesce transparently.
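
The idea described above can be sketched on a toy IR as follows. This is only
an illustration under stated assumptions, not the code in the branch: the Alu
and Vec types, the count_uses helper and the eligibility rule (exactly one
scalar definition, exactly one use) are hypothetical, and the input is assumed
to be SSA-like, so the vecN destination is not touched before the vecN itself.

```python
from typing import Dict, List, NamedTuple, Tuple, Union


class Alu(NamedTuple):       # hypothetical scalar-producing instruction
    dest: str                # e.g. "r2"
    mask: str                # channel written, "x" for a fresh scalar
    op: str
    srcs: Tuple[str, ...]    # plain register names in this toy model


class Vec(NamedTuple):       # hypothetical vecN instruction
    dest: str
    chans: Dict[str, str]    # destination channel -> scalar source register


Ins = Union[Alu, Vec]


def count_uses(prog: List[Ins], reg: str) -> int:
    """Count every read of reg, by ALU sources or by vecN channels."""
    n = 0
    for ins in prog:
        reads = ins.chans.values() if isinstance(ins, Vec) else ins.srcs
        n += sum(1 for r in reads if r == reg)
    return n


def lower_vec_and_coalesce(prog: List[Ins]) -> List[Ins]:
    out = list(prog)
    for vi, vec in enumerate(out):
        if not isinstance(vec, Vec):
            continue
        remaining = {}
        for chan, src in vec.chans.items():
            defs = [j for j, ins in enumerate(out) if ins.dest == src]
            ok = (
                len(defs) == 1
                and defs[0] < vi
                and isinstance(out[defs[0]], Alu)
                and out[defs[0]].mask == "x"
                and count_uses(out, src) == 1   # the vecN is the only reader
            )
            if ok:
                # Propagate: the defining instruction writes vec.dest.chan.
                out[defs[0]] = out[defs[0]]._replace(dest=vec.dest, mask=chan)
            else:
                remaining[chan] = src           # keep channel in reduced vecN
        out[vi] = Vec(vec.dest, remaining)
    # A vecN with no channels left is dropped entirely.
    return [i for i in out if not (isinstance(i, Vec) and not i.chans)]


prog = [
    Alu("r2", "x", "fdot4", ("r0", "r1")),
    Alu("r3", "x", "fmul", ("r0", "r1")),
    Alu("r4", "x", "fadd", ("r3", "r3")),
    Vec("r5", {"x": "r2", "y": "r3", "z": "r4"}),
]
lowered = lower_vec_and_coalesce(prog)
```

Here r2 and r4 are each read only by the vecN, so their definitions are
retargeted to r5.x and r5.z. r3 has other readers, so it stays in a reduced
vecN that carries only the y channel, which a later lowering step could still
turn into a mov.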

Right now, the pass is very conservative about the conditions in which it will
propagate register components. The reason is that at this point we are very
interested in feedback about the whole idea, before we add more complexity to
the pass.

The branch is here:
https://github.com/Igalia/mesa/commits/elima/nir-vec4-quality

There are only 2 patches: one that adds the pass to NIR, and a second that just
activates it on i965 for non-scalar shaders. Only the first one is interesting.

With this pass enabled, we get these shader-db VS results against vec4_visitor:

total instructions in shared programs: 1853747 -> 1762126 (-4.94%)
instructions in affected programs:     1681255 -> 1589634 (-5.45%)
helped:                                7751
HURT:                                  9344
GAINED:                                0
LOST:                                  0

And these against NIR-vec4 as in current master:

total instructions in shared programs: 1801527 -> 1762126 (-2.19%)
instructions in affected programs:     1156923 -> 1117522 (-3.41%)
helped:                                10283
HURT:                                  1281
GAINED:                                0
LOST:                                  0

This is not very impressive, but there are still a lot of clear opportunities
left out, which would add some complexity that is maybe not worth it at this
point.

No piglit or dEQP regressions have been observed.

What do you think?
