[Mesa-dev] [PATCH] nouveau: codegen: combineLd/St do not combine indirect loads

Hans de Goede hdegoede at redhat.com
Mon Apr 25 10:05:02 UTC 2016


Hi,

On 24-04-16 23:35, Ilia Mirkin wrote:
> On Fri, Apr 22, 2016 at 7:06 AM, Hans de Goede <hdegoede at redhat.com> wrote:

<snip>

>> folding indirect add into offset ld src 0x2dda328 ind0 0x27ca558 add def0
>> 0x27ca558 2
>> ref value 0x27ca558, indirect -1 -1 file 1 size 4
>> st  u32 # g[%r70+0x0] %r38 (0)
>> ref value 0x27ca558, indirect -1 -1 file 1 size 4
>>
>> Note how the uses unordered_set contains 2 ValueRefs pointing to getDef(0)
>> of the instruction which is not being eliminated as it should.
>>
>> The second reference does not have insn set, and is still present after
>> IndirectPropagation::visit() has done its work (I had a printf there
>> too in an earlier version of my debug printfs).
>>
>> I've a feeling I'm getting close to the problem, but I'm not familiar
>> enough with this bit of the code to figure out where the second ref
>> comes from, or how te remove it.
> .
> OK, so the issue here is that with TGSI there is no way to determine
> what the arguments or return values of a subroutine are. So nouveau
> figures it out implicitly - have a look at the
> Converter::BindArgumentsPass. It basically just takes all the
> registers that are read in, and makes them arguments, and then
> everything that's written are the return values. As such, it
> determines that those offset addresses are return values, and that's
> why you see
>
> SUB:25 (out %r76 %r72)
> BB:0 (20 instructions) - df = { }
>   -> BB:1 (cross)
>    0: rdsv u32 %r34 sv[CTAID:2] (0)
>    1: rdsv u32 %r36 sv[TID:2] (0)
>    2: mad u32 %r37 %r34 c15[0x8] %r36 (0)
>    3: ld  u64 { %r39 %r78 } c15[0x10] (0)
>    4: rdsv u32 %r42 sv[CTAID:1] (0)
>    5: mad u32 %r43 %r37 c15[0x10] %r42 (0)
>    6: rdsv u32 %r48 sv[TID:1] (0)
>    7: mad u32 %r49 %r43 c15[0x4] %r48 (0)
>    8: ld  u32 %r51 c15[0xc] (0)
>    9: rdsv u32 %r54 sv[CTAID:0] (0)
>   10: mad u32 %r55 %r49 c15[0xc] %r54 (0)
>   11: rdsv u32 %r60 sv[TID:0] (0)
>   12: mad u32 %r61 %r55 c15[0x0] %r60 (0)
>   13: mov u32 %r63 0x0000000c (0)
>   14: mad u32 %r68 %r61 %r63 c0[0x4] (0)
>   15: add u32 %r72 %r68 0x00000004 (0)
>   16: st  u64 # g[%r68+0x0] %r51 %r39 (0)
>   17: add u32 %r76 %r68 0x00000008 (0)
>   18: st  u32 # g[%r68+0x8] %r78 (0)
>   19: ret (0)
> BB:1 (0 instructions) - idom = BB:0, df = { }
>
> Note that the outs are %r72 and %r76. (Because they happen to end up
> being written last into TEMP[1].xy .) Not a great system, but within
> the confines of TGSI, pretty much all you can do. Adding an extension
> which adds a calling convention onto subroutines would fix this issue.

Thanks for the input. Exactly what I needed to know, for llvm
generated TGSI we always inline everything (or we should, I've
not tested this yet) the only reason there are multiple subs is
because this piglit test uses a .cl file with multiple kernels
in one file.

So I can fix this by simple declaring all the TEMP registers
as LOCAL, since they actually are local only :)

I've just given this a quick test and I can confirm that it
fixes things.

Regards,

Hans


More information about the mesa-dev mailing list