NIR: is_used_once breaks multi-pass rendering

Thu Jan 20 09:39:33 UTC 2022

There are precise rules for when calculations in GL must return the
same result, which are laid out in Appendix A ("Invariance"). The
relevant rule here:

"Rule 4 The same vertex or fragment shader will produce the same result
when run multiple times with the same input. The wording ‘the same
shader’ means a program object that is populated with the same source
strings, which are compiled and then linked, possibly multiple times,
and which program object is then executed using the same GL state
vector. Invariance is relaxed for shaders with side effects, such as
accessing atomic counters (see section A.5)."

The key part is "using the same GL state vector." In particular, that
includes which buffers are attached. The entire program object also has
to be the same, which leaves room for link-time optimizations that
differ from one program to another. The intent was probably exactly
that: to permit optimizations that depend on what is using a value,
along with aggressive cross-stage optimizations. This means that in
your example the app is broken and should be using
"invariant gl_Position;" (i.e. what the app workaround does).

There's a question of whether apps are broken often enough that we
ought to enable vs_position_always_invariant by default. That will
obviously come with some performance cost, although maybe it's
acceptable enough in practice.

Connor

On Thu, Jan 20, 2022 at 9:31 AM Marek Olšák <maraeo at gmail.com> wrote:
>
> Hi,
>
> "is_used_once" within an inexact transformation in nir_opt_algebraic can lead to geometry differences with multi-pass rendering, causing incorrect output. Here's an example to prove this:
>
> Let's assume there is a pass that writes out some intermediate value from the position calculation as a varying. Let's assume there is another pass that does the same thing, but only draws to the depth buffer, so varyings are eliminated. The second pass would get "is_used_once" because there is just the position, and let's assume there is an inexact transformation with "is_used_once" that matches that. On the other hand, the first pass wouldn't get "is_used_once" because there is the varying. Now the same position calculation is different for each pass, causing depth test functions commonly used in multi-pass rendering such as EQUAL to fail.
>
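
To make the mismatch above concrete, here is a minimal sketch in Python.
It is not NIR or driver code. It assumes the inexact transformation is
FMA fusing, uses CPU double precision as a stand-in for GPU float
arithmetic, and all function and variable names are hypothetical. The
fused variant is emulated by computing the exact rational result and
rounding once.

```python
from fractions import Fraction


def mul_add_unfused(a, b, c):
    # Two roundings: the product is rounded, then the sum is rounded.
    # Stands in for the pass where the intermediate value has a second
    # use (the varying), so "is_used_once" does not match.
    return a * b + c


def mul_add_fused(a, b, c):
    # One rounding: compute a*b + c exactly, then round to a float once.
    # Stands in for the depth-only pass where the product is used once
    # and the inexact fused transformation is applied.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Inputs chosen so that the exact product (1 - 2**-60) is not
# representable and rounds to 1.0 when computed on its own.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

pass_with_varying = mul_add_unfused(a, b, c)
depth_only_pass = mul_add_fused(a, b, c)

# An EQUAL-style comparison between the two passes fails.
print(pass_with_varying, depth_only_pass, pass_with_varying == depth_only_pass)
```

With these inputs the unfused path yields 0.0 and the fused path yields
-2**-60, so a depth test of EQUAL between the two passes would not match
even though the source expression is identical.
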
> The application might even use the exact same shader for both passes, and the driver might just look for depth-only rendering and remove the varyings based on that. The driver can also introduce more "is_used_once" cases via uniform inlining. From the app's point of view, the positions should be identical between both passes if it's the exact same shader.
>
> The workaround we have for this issue is called "vs_position_always_invariant", which was added for inexact FMA fusing, but it works with all inexact transformations containing "is_used_once".
>
> This issue could be exacerbated by future optimizations.
>
> Some of the solutions are:
> - Remove "is_used_once" (safe)
> - Enable vs_position_always_invariant by default (not helpful if the data flow is shader->texture->shader->position)
> - Always suppress inexact transformations containing "is_used_once" for all instructions contributing to the final position value (less aggressive than vs_position_always_invariant; it needs a proof that it's equivalent to vs_position_always_invariant in terms of invariance, not behavior)
> - Continue using app workarounds.
>
> Just some food for thought.
>
> Marek