[Mesa-dev] [PATCH v2 1/2] nv50/ir: optimize signed integer modulo by pow-of-2

Tobias Klausmann tobias.johannes.klausmann at mni.thm.de
Sun Nov 12 21:34:26 UTC 2017


On 11/12/17 4:16 PM, Ilia Mirkin wrote:
> On Sun, Nov 12, 2017 at 9:09 AM, Tobias Klausmann
> <tobias.johannes.klausmann at mni.thm.de> wrote:
>> On 11/12/17 3:53 AM, Ilia Mirkin wrote:
>>> It's common to use signed int modulo in GLSL. As it happens, the GLSL
>>> specs allow the result to be undefined, but that seems fairly
>>> surprising. It's not that much more effort to get it right, at least for
>>> positive modulo operators.
>>>
>>> Signed-off-by: Ilia Mirkin <imirkin at alum.mit.edu>
>>> ---
>>>
>>> v1 -> v2:
>>>
>>>    - fix SHLADD folding with relaxed isPow2 function
>>>    - use correct FILE_PREDICATE/FILE_FLAGS variant on nv50/nvc0
>>>
>>>    src/gallium/drivers/nouveau/codegen/nv50_ir.cpp    |  8 +------
>>>    .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 28
>>> +++++++++++++++++++---
>>>    2 files changed, 26 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
>>> b/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
>>> index 4076177e56d..657784163b3 100644
>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
>>> @@ -429,13 +429,7 @@ ImmediateValue::isNegative() const
>>>    bool
>>>    ImmediateValue::isPow2() const
>>>    {
>>> -   switch (reg.type) {
>>> -   case TYPE_U8:
>>> -   case TYPE_U16:
>>> -   case TYPE_U32: return util_is_power_of_two(reg.data.u32);
>>> -   default:
>>> -      return false;
>>> -   }
>>> +   return util_is_power_of_two(reg.data.u32);
>>>    }
>>
>>
>> maybe go back to special casing UINT/INT again here, instead of fixing up
>> SHLADD in the next hunk?
> Immediate's type is fairly random and can't be relied on for anything.
> It's the type of the consuming instruction which defines its meaning.
> Until it's associated with an instruction, it's just bits.
>
>>
>>
>>>      void
>>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>> b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>> index 7e4e193e3d2..5028fd71951 100644
>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>> @@ -1054,6 +1054,7 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue
>>> &imm0, int s)
>>>             i->op = OP_ADD;
>>>          } else
>>>          if (s == 1 && !imm0.isNegative() && imm0.isPow2() &&
>>> +          !isFloatType(i->dType) &&
>>>              target->isOpSupported(OP_SHLADD, i->dType)) {
>>>             i->op = OP_SHLADD;
>>>             imm0.applyLog2();
>>> @@ -1163,10 +1164,31 @@ ConstantFolding::opnd(Instruction *i,
>>> ImmediateValue &imm0, int s)
>>>          break;
>>>         case OP_MOD:
>>> -      if (i->sType == TYPE_U32 && imm0.isPow2()) {
>>> +      if (s == 1 && imm0.isPow2()) {
>>>             bld.setPosition(i, false);
>>> -         i->op = OP_AND;
>>> -         i->setSrc(1, bld.loadImm(NULL, imm0.reg.data.u32 - 1));
>>> +         if (i->sType == TYPE_U32) {
>>> +            i->op = OP_AND;
>>> +            i->setSrc(1, bld.loadImm(NULL, imm0.reg.data.u32 - 1));
>>> +         } else if (i->sType == TYPE_S32) {
>>> +            // Do it on the absolute value of the input, and then restore
>>> the
>>> +            // sign. The only odd case is MIN_INT, but that should work
>>> out
>>> +            // as well, since MIN_INT mod any power of 2 is 0.
>>> +            //
>>> +            // Technically we don't have to do any of this since MOD is
>>> +            // undefined with negative arguments in GLSL, but this seems
>>> like
>>> +            // the nice thing to do.
>>> +            Value *abs = bld.mkOp1v(OP_ABS, TYPE_S32, bld.getSSA(),
>>> i->getSrc(0));
>>> +            Value *neg, *v1, *v2;
>>> +            bld.mkCmp(OP_SET, CC_LT, TYPE_S32, (neg = bld.getSSA(1,
>>> prog->getTarget()->nativeFile(FILE_PREDICATE))), TYPE_S32, i->getSrc(0),
>>> bld.loadImm(NULL, 0));
>>> +            Value *mod = bld.mkOp2v(OP_AND, TYPE_U32, bld.getSSA(), abs,
>>> bld.loadImm(NULL, imm0.reg.data.u32 - 1));
>>
>>
>> have a few linebreakes here, but i don't have to tell ya that (still
>> considered experimental? :D)
> Ehh yeah.
>
>>
>>> +            bld.mkOp1(OP_NEG, TYPE_S32, (v1 = bld.getSSA()), mod)
>>> +               ->setPredicate(CC_P, neg);
>>> +            bld.mkOp1(OP_MOV, TYPE_S32, (v2 = bld.getSSA()), mod)
>>> +               ->setPredicate(CC_NOT_P, neg);
>>> +            newi = bld.mkOp2(OP_UNION, TYPE_S32, i->getDef(0), v1, v2);
>>> +
>>> +            delete_Instruction(prog, i);
>>> +         }
>>>          }
>>>          break;
>>>
>>
>>
>> The patch looks fine otherwise, i will test it a bit later and see if i can
>> trigger something else with it!
> OK thanks!
>

Yeah ok, this patch looks fine! Out of couriositiy i ran shader-db:


total instructions in shared programs : 5301838 -> 5304696 (0.05%)
total gprs used in shared programs    : 625594 -> 625437 (-0.03%)
total local used in shared programs   : 19816 -> 20152 (1.70%)
total shared used in shared programs   : 355224 -> 355224 (0.00%)

                 local     shared        gpr       inst      bytes
     helped           0           0          94 6           6
       hurt           0           0           0         117 117

but this includes a newly compilable shader:

nvidia_shaderdb/Civilization VI/1501528914/90.shader_test - type: 1, 
local: 0, shared: 0, gpr: 12, inst: 48, bytes: 512


Reviewed-by & Tested-by: Tobias Klausmann 
<tobias.johannes.klausmann at mni.thm.de>

Greetings,

Tobias



More information about the mesa-dev mailing list