[Mesa-dev] [PATCH] nouveau: codegen: combineLd/St do not combine indirect loads

Thu Apr 21 15:18:11 UTC 2016

On Thu, Apr 21, 2016 at 11:15 AM, Ilia Mirkin <imirkin at alum.mit.edu> wrote:
> On Thu, Apr 21, 2016 at 10:46 AM, Hans de Goede <hdegoede at redhat.com> wrote:
>> Hi,
>>
>> On 21-04-16 16:28, Ilia Mirkin wrote:
>>>
>>> On Thu, Apr 21, 2016 at 9:55 AM, Hans de Goede <hdegoede at redhat.com>
>>> wrote:
>>>>
>>>> combineLd/St would combine, i.e. :
>>>>
>>>> st  u32 # g[$r2+0x0] $r2
>>>> st  u32 # g[$r2+0x4] $r3
>>>>
>>>> into:
>>>>
>>>> st  u64 # g[$r2+0x0] $r2d
>>>>
>>>> But this is only valid if r2 contains an 8 byte aligned address,
>>>> which is unknown.
>>>>
>>>> This commit checks for src0 dim 0 not being indirect when combining
>>>> loads / stores as combining indirect loads / stores may break alignment
>>>> rules.
>>>
>>>
>>> I believe the assumption is that all indirect addresses are 16-byte
>>> aligned. This works out for GL, I think. Although hm... I wonder what
>>> happens if you have a
>>>
>>> layout (std430) buffer foo {
>>>    int x[16];
>>> }
>>>
>>> And you access x[i], x[i+1], and i == 1. I think we end up doing a ton
>>> of size-based validation which might avoid the problem.
>>>
>>> My concern is that now constbufs will get the same treatment, and for
>>> constbufs the alignment is always 16 :(
>>>
>>> What do you think? Just drop those, or add extra conditionals to allow
>>> it for constbufs?
>>
>>
>> I'm not sure we've the alignment guarantee for constbufs, IIRC we lower
>> const buf accesses to be indirect because we want to provide more then 8
>> UBO-s,
>> right ? So we read the offset from NVC0_CB_AUX_BUF_INFO and then end up with
>> e.g.:
>>
>> ld u64  r2d c7[r1+0x0]
>>
>> Where r1 contains the offset of the user-buf. But what if the user is
>> somehow
>> indirectly accessing the userbuf, then we will have added that indirect
>> offset
>> to r1, and we can no longer assume that we can safely merge the loads
>> without
>> breaking alignment rules.
>>
>> I hope I'm making sense here, I'm still a bit unsure about the details how
>> this all works.
>
> Yes, that can happen, except... it can't. The addresses are 64-bit,
> and this only happens on Kepler+, which limits constbuf loads to
> 64-bit at a time (no 128-bit loads for some unknown reason -- I've
> tried to enable it, and it indeed doesn't work, even though nvdisasm
> is perfectly happy with it). So ... it can't happen. But in principle
> you're right :)

The only way this can happen is if I'm wrong about how std140 packing
works. Which isn't too crazy of a bet to make :) The question is what
happens if you have

layout (std140) uniform buffer {
  int x[16];
}

Is that laid out with a stride of 4, or 16? If 16, we're all good. If
4, then you're right and we can't make that guarantee. I'm pretty sure
the answer is 16, since otherwise I don't see how the TGSI would work
out. And I'm fairly sure we get this sort of thing right. [And std430
packing can only be used on SSBO's, not on UBO's.]

  -ilia