[Mesa-dev] [PATCH 0/5] Volatile and invariant LDS memory ops

Sat Nov 11 03:09:25 UTC 2017

On Fri, Nov 10, 2017 at 9:58 PM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
> On 10.11.2017 19:24, Connor Abbott wrote:
>>
>> On Fri, Nov 10, 2017 at 1:19 PM, Marek Olšák <maraeo at gmail.com> wrote:
>>>
>>> On Fri, Nov 10, 2017 at 6:55 PM, Nicolai Hähnle <nhaehnle at gmail.com>
>>> wrote:
>>>>
>>>> On 10.11.2017 18:43, Marek Olšák wrote:
>>>>>
>>>>>
>>>>> On Fri, Nov 10, 2017 at 2:09 AM, Connor Abbott <cwabbott0 at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On Thu, Nov 9, 2017 at 7:17 PM, Marek Olšák <maraeo at gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Nov 10, 2017 at 12:40 AM, Matt Arsenault <arsenm2 at gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Nov 10, 2017, at 07:41, Marek Olšák <maraeo at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> This fixes the TCS gl_ClipDistance piglit failure that was
>>>>>>>>> uncovered
>>>>>>>>> by a recent LLVM change. The solution is to set volatile on loads
>>>>>>>>> and stores to enforce proper ordering.
>>>>>>>>>
>>>>>>>>> Please review.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Every LDS access certainly should not be volatile. This kills all
>>>>>>>> optimizations, like formation of ds_read2_b32. What ordering issue
>>>>>>>> are you
>>>>>>>> having?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> It might be caused by inttoptr(NULL) that we do to declare LDS. There
>>>>>>> is simply no ordering enforced, which is weird.
>>>>>>
>>>>>>
>>>>>>
>>>>>> As soon as you do inttoptr(NULL), you've generated a poison value (in
>>>>>> LLVM legalese), so LLVM will assume that you never dereference it and
>>>>>> optimize accordingly. I think a GEP instruction without the inbounds
>>>>>> parameter set will get rid of the poison value, although I'm not sure
>>>>>> about the case where the offset is known to be zero. At least, that's
>>>>>> my reading of the langref text for the GEP instruction
>>>>>> (https://llvm.org/docs/LangRef.html#id215). If zero is a valid address
>>>>>> in LDS, then it sounds like LLVM needs to be fixed to disable this
>>>>>> optimization for certain address spaces. On the other hand, if you're
>>>>>> doing inttoptr(NULL) + offset, where "offset" is the result of a
>>>>>> ptrtoint somewhere, you should be doing inttoptr(offset) instead, and
>>>>>> then LLVM should never misbehave.
>>>>>
>>>>>
>>>>>
>>>>> I don't think that using inttoptr before every load and store would be
>>>>> better than volatile. The must be a better solution.
>>>>
>>>>
>>>>
>>>> Can't we just allocate the required LDS memory explicitly like we did
>>>> for
>>>> the LDS-based derivative computations?
>>>>
>>>> It may require shuffling around a bit how/when we calculate the required
>>>> sizes, but it doesn't seem impossible.
>>>
>>>
>>> We want to share the same declaration in TCS main and epilog parts.
>>>
>>> Does LLVM know that LDS declarations are pre-initialized?
>>> Do sized LDS declarations affect SIMD-occupancy-based optimizations?
>>> Because Mesa always declares 64kB of LDS and the real value is
>>> determined at runtime.
>>
>>
>> I don't know about the latter, but for the former, if you declare the
>> LDS variable as having external linkage, LLVM should assume that it
>> might be initialized beforehand -- exactly like a global non-static
>> variable in C.
>
>
> Makes sense.
>
> I don't think LLVM is really looking at LDS size too closely for anything,
> since LDS is per-thread group. But it's been a while since I checked.
>
> So just declaring a 64/32 KB memory block and then potentially not using all
> of it is probably fine and is probably the best short-term solution (if it
> works).
>
> It's a good point though that "shuffling around the computation of the
> required sizes" is potentially much more involved than I was thinking at
> first. It looks like if we wanted to be perfectly honest with LLVM about
> what's going on (and I believe we should, in the long run), we'd have to
> teach it a notion of "per-thread LDS memory". That requires more thought.

We can't tell LLVM the size of LDS, because we don't know it - it's
computed from 2 independent shaders (LS and HS, or ES and GS).

Marek