[Mesa-dev] [PATCH 2/6] glsl: Implement any(v) as any_nequal(v, false).

Ian Romanick idr at freedesktop.org
Tue Dec 1 13:24:36 PST 2015


On 12/01/2015 11:28 AM, Ilia Mirkin wrote:
> On Tue, Dec 1, 2015 at 2:13 PM, Ian Romanick <idr at freedesktop.org> wrote:
>> On 11/30/2015 04:41 PM, Ilia Mirkin wrote:
>>> On Mon, Nov 30, 2015 at 7:15 PM, Matt Turner <mattst88 at gmail.com> wrote:
>>>> On Mon, Nov 30, 2015 at 3:51 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>>>> On Mon, Nov 30, 2015 at 6:38 PM, Matt Turner <mattst88 at gmail.com> wrote:
>>>>>> On Mon, Nov 30, 2015 at 3:34 PM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>>>>>> On Mon, Nov 30, 2015 at 6:32 PM, Matt Turner <mattst88 at gmail.com> wrote:
>>>>>>>> ---
>>>>>>>>  src/glsl/builtin_functions.cpp | 15 ++++++++++++++-
>>>>>>>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/src/glsl/builtin_functions.cpp b/src/glsl/builtin_functions.cpp
>>>>>>>> index 88f9a71..2f21ffc 100644
>>>>>>>> --- a/src/glsl/builtin_functions.cpp
>>>>>>>> +++ b/src/glsl/builtin_functions.cpp
>>>>>>>> @@ -538,6 +538,7 @@ private:
>>>>>>>>     ir_variable *in_var(const glsl_type *type, const char *name);
>>>>>>>>     ir_variable *out_var(const glsl_type *type, const char *name);
>>>>>>>>     ir_constant *imm(float f, unsigned vector_elements=1);
>>>>>>>> +   ir_constant *imm(bool b, unsigned vector_elements=1);
>>>>>>>>     ir_constant *imm(int i, unsigned vector_elements=1);
>>>>>>>>     ir_constant *imm(unsigned u, unsigned vector_elements=1);
>>>>>>>>     ir_constant *imm(double d, unsigned vector_elements=1);
>>>>>>>> @@ -3000,6 +3001,12 @@ builtin_builder::out_var(const glsl_type *type, const char *name)
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  ir_constant *
>>>>>>>> +builtin_builder::imm(bool b, unsigned vector_elements)
>>>>>>>> +{
>>>>>>>> +   return new(mem_ctx) ir_constant(b, vector_elements);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +ir_constant *
>>>>>>>>  builtin_builder::imm(float f, unsigned vector_elements)
>>>>>>>>  {
>>>>>>>>     return new(mem_ctx) ir_constant(f, vector_elements);
>>>>>>>> @@ -4358,7 +4365,13 @@ builtin_builder::_notEqual(builtin_available_predicate avail,
>>>>>>>>  ir_function_signature *
>>>>>>>>  builtin_builder::_any(const glsl_type *type)
>>>>>>>>  {
>>>>>>>> -   return unop(always_available, ir_unop_any, glsl_type::bool_type, type);
>>>>>>>> +   ir_variable *v = in_var(type, "v");
>>>>>>>> +   MAKE_SIG(glsl_type::bool_type, always_available, 1, v);
>>>>>>>> +
>>>>>>>> +   const unsigned vec_elem = v->type->vector_elements;
>>>>>>>> +   body.emit(ret(expr(ir_binop_any_nequal, v, imm(false, vec_elem))));
>>>>>>>
>>>>>>> This results in an extra comparison generated by the st_glsl_to_tgsi
>>>>>>> code, which can be annoying to get rid of... Why do you need to do
>>>>>>> this?
>>>>>>
>>>>>> Couldn't we fix the st_glsl_to_tgsi to handle that better? I wouldn't
>>>>>> be surprised to find such a thing in the wild.
>>>>>
>>>>> I guess it could special-case the exact IR generated there? The thing
>>>>> is that it just OR's everything together (when you have native integer
>>>>> support... which everything does outside of r300/nv30), which you can
>>>>> only do in a few select cases. If you'd like to implement that, I
>>>>> wouldn't object. As-is, it's a pessimization though.
>>>>
>>>> I'm not familiar with any of this code and I don't know how to test or
>>>> debug it -- sorry.
>>>>
>>>> It should be trivial to optimize if someone cares. I believe all you
>>>> need to do is skip emitting the SNE and instead copy src[0] to temp
>>>> under these circumstances:
>>>>
>>>>    if (ir->operands[0]->type == glsl_type::bool_type &&
>>>>        ir->operands[1]->as_constant() &&
>>>>        ir->operands[1]->as_constant()->is_zero())
>>>
>>> As long as you can build the gallium swrast (and enable debug), you
>>> can add ST_DEBUG=tgsi to get the generated TGSI printed out.
>>
>> Does the generated TGSI really matter?  We don't generally care if a
>> change causes more NIR instructions to be generated, for example...
>> unless it's a massive amount more.
> 
> Agreed.
> 
>>
>> The real question is what actually gets executed on the GPUs.  If we
>> produce more intermediate instructions, but the backends still generate
>> the same (or perhaps even better) code, it should be okay.  Does any GPU
>> actually emit worse code due to this change or the related changes later
>> in the series?
> 
> I full agree with your comments. If there's a sampler shader, I can
> easily test what nouveau does with it... I was going based on how I
> thought it would work though based on my familiarity with the various
> optimizations. I could have gotten it wrong though. The problem is
> that the boolean-ness of the values gets lost, and these optimizations
> become less safe. Or less detected.
> 
> I can guarantee that it'll generate an extra instruction nv30 since
> that doesn't have a compiler, so it'll go from just "DP4" to "SNE;
> DP4". But I don't know if I care *that* much.
> 
> So this is the test I came up with:
> 
> FRAG
> DCL IN[0], GENERIC[0], PERSPECTIVE
> DCL OUT[0], COLOR
> DCL TEMP[0..4], LOCAL
> IMM[0] UINT32 { 0, 0, 0, 0 }
> 0: MOV TEMP[0], IN[0]
> 1: USNE TEMP[0], TEMP[0], IMM[0]
> 2: OR TEMP[0].x, TEMP[0].xxxx, TEMP[0].yyyy
> 3: OR TEMP[0].x, TEMP[0].xxxx, TEMP[0].zzzz
> 4: OR TEMP[0].x, TEMP[0].xxxx, TEMP[0].wwww
> 5: MOV OUT[0], TEMP[0].xxxx
> 9: END

I think the more common case, and certainly the case that Matt was
targeting with the optimization in patch 6, is where the result of
any(v) is used in a conditional:

    if (any(v)) {
        do_this();
    } else {
        do_that();
    }

or

    result = any(v) ? a : b;

Looking at my very out-of-date shader-db, 100% of the uses excluding one
application fit one of these patterns.  That app has a bunch of macros
that do 'any(notEqual(v, vec#(0.0)))', so TGSI is already going to
generate extra instructions.

> So let's see what happens on nv50 (targeted at a G84 to avoid
> potential G80-related idiocy... don't think there would be any here,
> but who knows):
> 
> NV50_PROG_DEBUG=1 ./src/gallium/drivers/nouveau/nouveau_compiler -a 84 slt.tgsi
> 
> which ends up in:
> 
> EMIT: mov u32 $r4 0x00000000 (8)
> EMIT: set u32 $r1 ne $r1 $r4 (8)
> EMIT: set u32 $r2 ne $r2 $r4 (8)
> EMIT: set u32 $r3 ne $r3 $r4 (8)
> EMIT: set u32 $r0 ne $r0 $r4 (8)
> EMIT: or u32 $r1 $r1 $r2 (8)
> EMIT: or u32 $r1 $r1 $r3 (8)
> EMIT: or u32 $r3 $r1 $r0 (8)

It seems like the rotation of (x != 0) || (y != 0) || (z != 0) || (w !=
0) to (x | y | z | w) != 0 would really help here... and should be safe?
 You'd end up with:

EMIT: mov u32 $r4 0x00000000 (8)
EMIT: or u32 $r1 $r1 $r2 (8)
EMIT: or u32 $r1 $r1 $r3 (8)
EMIT: or u32 $r3 $r1 $r0 (8)
EMIT: set u32 $r3 ne $r3 $r4 (8)

> Naturally, those set's all go away if the USNE does. With -a c0 (i.e.
> fermi, which will be ~identical for everything later as well), we get:
> 
>   6: set u8 $p0 ne u32 $r1 $r63 (8)
>   7: set or u8 $p0 ne u32 $r2 $r63 $p0 (8)
>   8: set or u8 $p0 ne u32 $r3 $r63 $p0 (8)
>   9: set or u32 $r3 ne $r0 $r63 $p0 (8)
> 
> [$r63 is always 0]
> 
> And without the USNE it just has the or's:
> 
>   6: or u32 $r1 $r1 $r2 (8)
>   7: or u32 $r1 $r1 $r3 (8)
>   8: or u32 $r3 $r1 $r0 (8)
> 
> [I'm ignoring the unrelated bits, like moving stuff around,
> interpolation, etc]. If nothing else, this is one less instruction.
> 
> Basically the issue is that we don't have the information that the
> input is a *clean* bool, like we would if there were type info, and so

And for the record, I had always opposed NIR being typeless fir this
very reason.  In this case the type is, basically, a primitive form of
range tracking.  Sophisticated range tracking makes it obsolete, but
that's also a lot of work.

> we must "clean" it first. We could try to see if the input comes from
> another set... and we sometimes do, but for a limited quantity of
> scenarios, and I don't think this is one of them.
> 
>   -ilia



More information about the mesa-dev mailing list