[Mesa-dev] [PATCH 0/5] Volatile and invariant LDS memory ops

Mon Nov 13 14:50:22 UTC 2017

On Mon, Nov 13, 2017 at 9:31 AM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
> On 11.11.2017 04:09, Marek Olšák wrote:
>>
>> On Fri, Nov 10, 2017 at 9:58 PM, Nicolai Hähnle <nhaehnle at gmail.com>
>> wrote:
>>>
>>> On 10.11.2017 19:24, Connor Abbott wrote:
>>>>
>>>>
>>>> On Fri, Nov 10, 2017 at 1:19 PM, Marek Olšák <maraeo at gmail.com> wrote:
>>>>>
>>>>>
>>>>> On Fri, Nov 10, 2017 at 6:55 PM, Nicolai Hähnle <nhaehnle at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On 10.11.2017 18:43, Marek Olšák wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Nov 10, 2017 at 2:09 AM, Connor Abbott <cwabbott0 at gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Nov 9, 2017 at 7:17 PM, Marek Olšák <maraeo at gmail.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Nov 10, 2017 at 12:40 AM, Matt Arsenault
>>>>>>>>> <arsenm2 at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On Nov 10, 2017, at 07:41, Marek Olšák <maraeo at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> This fixes the TCS gl_ClipDistance piglit failure that was
>>>>>>>>>>> uncovered
>>>>>>>>>>> by a recent LLVM change. The solution is to set volatile on loads
>>>>>>>>>>> and stores to enforce proper ordering.
>>>>>>>>>>>
>>>>>>>>>>> Please review.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Every LDS access certainly should not be volatile. This kills all
>>>>>>>>>> optimizations, like formation of ds_read2_b32. What ordering issue
>>>>>>>>>> are you
>>>>>>>>>> having?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It might be caused by inttoptr(NULL) that we do to declare LDS.
>>>>>>>>> There
>>>>>>>>> is simply no ordering enforced, which is weird.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> As soon as you do inttoptr(NULL), you've generated a poison value
>>>>>>>> (in
>>>>>>>> LLVM legalese), so LLVM will assume that you never dereference it
>>>>>>>> and
>>>>>>>> optimize accordingly. I think a GEP instruction without the inbounds
>>>>>>>> parameter set will get rid of the poison value, although I'm not
>>>>>>>> sure
>>>>>>>> about the case where the offset is known to be zero. At least,
>>>>>>>> that's
>>>>>>>> my reading of the langref text for the GEP instruction
>>>>>>>> (https://llvm.org/docs/LangRef.html#id215). If zero is a valid
>>>>>>>> address
>>>>>>>> in LDS, then it sounds like LLVM needs to be fixed to disable this
>>>>>>>> optimization for certain address spaces. On the other hand, if
>>>>>>>> you're
>>>>>>>> doing inttoptr(NULL) + offset, where "offset" is the result of a
>>>>>>>> ptrtoint somewhere, you should be doing inttoptr(offset) instead,
>>>>>>>> and
>>>>>>>> then LLVM should never misbehave.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I don't think that using inttoptr before every load and store would
>>>>>>> be
>>>>>>> better than volatile. The must be a better solution.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Can't we just allocate the required LDS memory explicitly like we did
>>>>>> for
>>>>>> the LDS-based derivative computations?
>>>>>>
>>>>>> It may require shuffling around a bit how/when we calculate the
>>>>>> required
>>>>>> sizes, but it doesn't seem impossible.
>>>>>
>>>>>
>>>>>
>>>>> We want to share the same declaration in TCS main and epilog parts.
>>>>>
>>>>> Does LLVM know that LDS declarations are pre-initialized?
>>>>> Do sized LDS declarations affect SIMD-occupancy-based optimizations?
>>>>> Because Mesa always declares 64kB of LDS and the real value is
>>>>> determined at runtime.
>>>>
>>>>
>>>>
>>>> I don't know about the latter, but for the former, if you declare the
>>>> LDS variable as having external linkage, LLVM should assume that it
>>>> might be initialized beforehand -- exactly like a global non-static
>>>> variable in C.
>>>
>>>
>>>
>>> Makes sense.
>>>
>>> I don't think LLVM is really looking at LDS size too closely for
>>> anything,
>>> since LDS is per-thread group. But it's been a while since I checked.
>>>
>>> So just declaring a 64/32 KB memory block and then potentially not using
>>> all
>>> of it is probably fine and is probably the best short-term solution (if
>>> it
>>> works).
>>>
>>> It's a good point though that "shuffling around the computation of the
>>> required sizes" is potentially much more involved than I was thinking at
>>> first. It looks like if we wanted to be perfectly honest with LLVM about
>>> what's going on (and I believe we should, in the long run), we'd have to
>>> teach it a notion of "per-thread LDS memory". That requires more thought.
>>
>>
>> We can't tell LLVM the size of LDS, because we don't know it - it's
>> computed from 2 independent shaders (LS and HS, or ES and GS).
>
>
> That's true, but purely as a short-term solution, can't we tell LLVM the
> size of LDS is 64KB anyway instead of the inttoptr(0), even if we're not
> going to use it all? What goes wrong in that approach?

This: "Do sized LDS declarations affect SIMD-occupancy-based optimizations?"

And shader parts should be allowed to (re)declare the same LDS.

BTW, the max LDS size for non-pixel non-compute shaders is 32KB.

Marek