[Mesa-dev] [PATCH] nvc0/ir: use the combined tid special register
Karol Herbst
kherbst at redhat.com
Sat Jul 7 14:59:23 UTC 2018
On Sat, Jul 7, 2018 at 12:56 PM, Rhys Perry <pendingchaos02 at gmail.com> wrote:
> I don't really see how things like RDSV(COMBINED_TID) + EXTBF + EXTBF
> can be improved unless you create separate RDSV instructions (which is
> against the point of this patch) or merge the EXTBF with later
> instructions (and I don't really see how that can be done).
>
> It's only increases the size of gl_LocalInvocationID.xy or
> gl_LocalInvocationID.xyz by at most one instruction and shaders
> typically by 1 or 2 instructions (or rarely 3 or even 4). As a result,
> gl_LocalInvocationID.xy and gl_LocalInvocationID.xyz only creates one
> RDSV, instead of two or three.
>
yeah, you are right. I was looking at the shaders getting more than 1
instructions and it was cases where there were multiple instances of
the rdsv(TID), where all got replaced with its own rdsv(combined_tid),
so this should be fine in the end.
Reviewed-by: Karol Herbst <kherbst at redhat.com>
> On Sat, Jul 7, 2018 at 2:46 AM, Karol Herbst <kherbst at redhat.com> wrote:
>> anyway, I think it might make sense to take a look at the shaders hurt
>> most as I suspect there might be a way to improve the situation a
>> little
>>
>> On Sat, Jul 7, 2018 at 3:38 AM, Karol Herbst <kherbst at redhat.com> wrote:
>>> okay right, if loading those special regs is indeed more expensive
>>> than doing the read + a few extbf then I see the point of this
>>> optimization
>>>
>>> On Sat, Jul 7, 2018 at 2:46 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
>>>> Are they? Fewer special reg loads = better...
>>>>
>>>> On Fri, Jul 6, 2018 at 8:31 PM, Karol Herbst <kherbst at redhat.com> wrote:
>>>>> somehow it doesn't really look like that it is worth the effort as the
>>>>> generated shaders are worse in avg?
>>>>>
>>>>> On Fri, Jul 6, 2018 at 10:32 PM, Rhys Perry <pendingchaos02 at gmail.com> wrote:
>>>>>> This patch doesn't touch NTID since it seems very difficult (or
>>>>>> impossible) to generate. Seemingly because the state tracker or glsl
>>>>>> compiler is turning gl_WorkGroupSize into a immediate. Such a
>>>>>> transformation is not possible with GL_ARB_compute_variable_group_size
>>>>>> but gl_WorkGroupSize is not available when that extension is enabled.
>>>>>>
>>>>>> On Kepler+, it seems to end up being lowered into constant buffer loads anyway.
>>>>>>
>>>>>> On Fri, Jul 6, 2018 at 9:21 PM, Rhys Perry <pendingchaos02 at gmail.com> wrote:
>>>>>>> total instructions in shared programs : 5804448 -> 5804690 (0.00%)
>>>>>>> total gprs used in shared programs : 670065 -> 670065 (0.00%)
>>>>>>> total shared used in shared programs : 548832 -> 548832 (0.00%)
>>>>>>> total local used in shared programs : 21068 -> 21068 (0.00%)
>>>>>>>
>>>>>>> local shared gpr inst bytes
>>>>>>> helped 0 0 0 5 5
>>>>>>> hurt 0 0 0 191 191
>>>>>>>
>>>>>>> Signed-off-by: Rhys Perry <pendingchaos02 at gmail.com>
>>>>>>> ---
>>>>>>> Instead of combining SV_TID reads into a SV_COMBINED_TID read, this has
>>>>>>> SV_TID reads lowered into a SV_COMBINED_TID read which can them be
>>>>>>> combined by CSE and then turned into a SV_TID read by AlgebraicOpt if it
>>>>>>> doesn't create another RDSV.
>>>>>>>
>>>>>>> src/gallium/drivers/nouveau/codegen/nv50_ir.h | 1 +
>>>>>>> .../drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 1 +
>>>>>>> .../drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 1 +
>>>>>>> .../drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 1 +
>>>>>>> .../nouveau/codegen/nv50_ir_lowering_nv50.cpp | 3 ++
>>>>>>> .../nouveau/codegen/nv50_ir_lowering_nvc0.cpp | 12 +++++++
>>>>>>> .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 40 ++++++++++++++++++++++
>>>>>>> .../drivers/nouveau/codegen/nv50_ir_print.cpp | 1 +
>>>>>>> .../nouveau/codegen/nv50_ir_target_nv50.cpp | 1 +
>>>>>>> 9 files changed, 61 insertions(+)
>>>>>>>
>>>>>>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
>>>>>>> index f4f3c70888..0b220cc48d 100644
>>>>>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
>>>>>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
>>>>>>> @@ -453,6 +453,7 @@ enum SVSemantic
>>>>>>> SV_TESS_INNER,
>>>>>>> SV_TESS_COORD,
>>>>>>> SV_TID,
>>>>>>> + SV_COMBINED_TID,
>>>>>>> SV_CTAID,
>>>>>>> SV_NTID,
>>>>>>> SV_GRIDID,
>>>>>>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
>>>>>>> index 647d1a5d0e..2118c3153f 100644
>>>>>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
>>>>>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
>>>>>>> @@ -2297,6 +2297,7 @@ CodeEmitterGK110::getSRegEncoding(const ValueRef& ref)
>>>>>>> case SV_INVOCATION_ID: return 0x11;
>>>>>>> case SV_YDIR: return 0x12;
>>>>>>> case SV_THREAD_KILL: return 0x13;
>>>>>>> + case SV_COMBINED_TID: return 0x20;
>>>>>>> case SV_TID: return 0x21 + SDATA(ref).sv.index;
>>>>>>> case SV_CTAID: return 0x25 + SDATA(ref).sv.index;
>>>>>>> case SV_NTID: return 0x29 + SDATA(ref).sv.index;
>>>>>>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
>>>>>>> index 26826d6360..694d1b10a3 100644
>>>>>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
>>>>>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
>>>>>>> @@ -267,6 +267,7 @@ CodeEmitterGM107::emitSYS(int pos, const Value *val)
>>>>>>> case SV_INVOCATION_ID : id = 0x11; break;
>>>>>>> case SV_THREAD_KILL : id = 0x13; break;
>>>>>>> case SV_INVOCATION_INFO: id = 0x1d; break;
>>>>>>> + case SV_COMBINED_TID : id = 0x20; break;
>>>>>>> case SV_TID : id = 0x21 + val->reg.data.sv.index; break;
>>>>>>> case SV_CTAID : id = 0x25 + val->reg.data.sv.index; break;
>>>>>>> case SV_LANEMASK_EQ : id = 0x38; break;
>>>>>>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
>>>>>>> index d85fdda56f..b6e35dd0ee 100644
>>>>>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
>>>>>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
>>>>>>> @@ -1990,6 +1990,7 @@ CodeEmitterNVC0::getSRegEncoding(const ValueRef& ref)
>>>>>>> case SV_INVOCATION_ID: return 0x11;
>>>>>>> case SV_YDIR: return 0x12;
>>>>>>> case SV_THREAD_KILL: return 0x13;
>>>>>>> + case SV_COMBINED_TID: return 0x20;
>>>>>>> case SV_TID: return 0x21 + SDATA(ref).sv.index;
>>>>>>> case SV_CTAID: return 0x25 + SDATA(ref).sv.index;
>>>>>>> case SV_NTID: return 0x29 + SDATA(ref).sv.index;
>>>>>>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
>>>>>>> index 36ab837f6e..1f0fd466a9 100644
>>>>>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
>>>>>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nv50.cpp
>>>>>>> @@ -1201,6 +1201,9 @@ NV50LoweringPreSSA::handleRDSV(Instruction *i)
>>>>>>> bld.mkMov(def, bld.mkImm(0));
>>>>>>> }
>>>>>>> break;
>>>>>>> + case SV_COMBINED_TID:
>>>>>>> + bld.mkMov(def, tid);
>>>>>>> + break;
>>>>>>> case SV_SAMPLE_POS: {
>>>>>>> Value *off = new_LValue(func, FILE_ADDRESS);
>>>>>>> bld.mkOp1(OP_RDSV, TYPE_U32, def, bld.mkSysVal(SV_SAMPLE_INDEX, 0));
>>>>>>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
>>>>>>> index 597dcdffbe..1410cf26c8 100644
>>>>>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
>>>>>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_nvc0.cpp
>>>>>>> @@ -2576,6 +2576,18 @@ NVC0LoweringPass::handleRDSV(Instruction *i)
>>>>>>> // TGSI backend may use 4th component of TID,NTID,CTAID,NCTAID
>>>>>>> i->op = OP_MOV;
>>>>>>> i->setSrc(0, bld.mkImm((sv == SV_NTID || sv == SV_NCTAID) ? 1 : 0));
>>>>>>> + } else
>>>>>>> + if (sv == SV_TID) {
>>>>>>> + // Help CSE combine TID fetches
>>>>>>> + Value *tid = bld.mkOp1v(OP_RDSV, TYPE_U32, bld.getScratch(),
>>>>>>> + bld.mkSysVal(SV_COMBINED_TID, 0));
>>>>>>> + i->op = OP_EXTBF;
>>>>>>> + i->setSrc(0, tid);
>>>>>>> + switch (sym->reg.data.sv.index) {
>>>>>>> + case 0: i->setSrc(1, bld.mkImm(0x1000)); break;
>>>>>>> + case 1: i->setSrc(1, bld.mkImm(0x0a10)); break;
>>>>>>> + case 2: i->setSrc(1, bld.mkImm(0x061a)); break;
>>>>>>> + }
>>>>>>> }
>>>>>>> if (sv == SV_VERTEX_COUNT) {
>>>>>>> bld.setPosition(i, true);
>>>>>>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>>>>>> index 39177bd044..4dd3a6cdaa 100644
>>>>>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>>>>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
>>>>>>> @@ -1654,6 +1654,7 @@ ModifierFolding::visit(BasicBlock *bb)
>>>>>>> // SLCT(a, b, const) -> cc(const) ? a : b
>>>>>>> // RCP(RCP(a)) -> a
>>>>>>> // MUL(MUL(a, b), const) -> MUL_Xconst(a, b)
>>>>>>> +// EXTBF(RDSV(COMBINED_TID)) -> RDSV(TID)
>>>>>>> class AlgebraicOpt : public Pass
>>>>>>> {
>>>>>>> private:
>>>>>>> @@ -1671,6 +1672,7 @@ private:
>>>>>>> void handleCVT_EXTBF(Instruction *);
>>>>>>> void handleSUCLAMP(Instruction *);
>>>>>>> void handleNEG(Instruction *);
>>>>>>> + void handleEXTBF_RDSV(Instruction *);
>>>>>>>
>>>>>>> BuildUtil bld;
>>>>>>> };
>>>>>>> @@ -2175,6 +2177,41 @@ AlgebraicOpt::handleNEG(Instruction *i) {
>>>>>>> }
>>>>>>> }
>>>>>>>
>>>>>>> +// EXTBF(RDSV(COMBINED_TID)) -> RDSV(TID)
>>>>>>> +void
>>>>>>> +AlgebraicOpt::handleEXTBF_RDSV(Instruction *i)
>>>>>>> +{
>>>>>>> + Instruction *rdsv = i->getSrc(0)->getUniqueInsn();
>>>>>>> + if (rdsv->op != OP_RDSV ||
>>>>>>> + rdsv->getSrc(0)->asSym()->reg.data.sv.sv != SV_COMBINED_TID)
>>>>>>> + return;
>>>>>>> + // Avoid creating more RDSV instructions
>>>>>>> + if (rdsv->getDef(0)->refCount() > 1)
>>>>>>> + return;
>>>>>>> +
>>>>>>> + ImmediateValue imm;
>>>>>>> + if (!i->src(1).getImmediate(imm))
>>>>>>> + return;
>>>>>>> +
>>>>>>> + int index;
>>>>>>> + if (imm.isInteger(0x1000))
>>>>>>> + index = 0;
>>>>>>> + else
>>>>>>> + if (imm.isInteger(0x0a10))
>>>>>>> + index = 1;
>>>>>>> + else
>>>>>>> + if (imm.isInteger(0x061a))
>>>>>>> + index = 2;
>>>>>>> + else
>>>>>>> + return;
>>>>>>> +
>>>>>>> + bld.setPosition(i, false);
>>>>>>> +
>>>>>>> + i->op = OP_RDSV;
>>>>>>> + i->setSrc(0, bld.mkSysVal(SV_TID, index));
>>>>>>> + i->setSrc(1, NULL);
>>>>>>> +}
>>>>>>> +
>>>>>>> bool
>>>>>>> AlgebraicOpt::visit(BasicBlock *bb)
>>>>>>> {
>>>>>>> @@ -2215,6 +2252,9 @@ AlgebraicOpt::visit(BasicBlock *bb)
>>>>>>> case OP_NEG:
>>>>>>> handleNEG(i);
>>>>>>> break;
>>>>>>> + case OP_EXTBF:
>>>>>>> + handleEXTBF_RDSV(i);
>>>>>>> + break;
>>>>>>> default:
>>>>>>> break;
>>>>>>> }
>>>>>>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
>>>>>>> index cbb21f5f72..ee3506fbae 100644
>>>>>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
>>>>>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_print.cpp
>>>>>>> @@ -306,6 +306,7 @@ static const char *SemanticStr[SV_LAST + 1] =
>>>>>>> "TESS_INNER",
>>>>>>> "TESS_COORD",
>>>>>>> "TID",
>>>>>>> + "COMBINED_TID",
>>>>>>> "CTAID",
>>>>>>> "NTID",
>>>>>>> "GRIDID",
>>>>>>> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
>>>>>>> index dc73231394..1ad3467337 100644
>>>>>>> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
>>>>>>> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
>>>>>>> @@ -257,6 +257,7 @@ TargetNV50::getSVAddress(DataFile shaderFile, const Symbol *sym) const
>>>>>>> case SV_NTID:
>>>>>>> return 0x2 + 2 * sym->reg.data.sv.index;
>>>>>>> case SV_TID:
>>>>>>> + case SV_COMBINED_TID:
>>>>>>> return 0;
>>>>>>> case SV_SAMPLE_POS:
>>>>>>> return 0; /* sample position is handled differently */
>>>>>>> --
>>>>>>> 2.14.4
>>>>>>>
>>>>>> _______________________________________________
>>>>>> mesa-dev mailing list
>>>>>> mesa-dev at lists.freedesktop.org
>>>>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>>>> _______________________________________________
>>>>> mesa-dev mailing list
>>>>> mesa-dev at lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
More information about the mesa-dev
mailing list