[Mesa-dev] [PATCH 2/2] [RFC] radv: add scratch support for spilling.

Tue Oct 11 13:30:42 UTC 2016

On 11.10.2016 15:21, Nicolai Hähnle wrote:
> On 11.10.2016 07:36, Dave Airlie wrote:
>> On 11 October 2016 at 12:13, Dave Airlie <airlied at gmail.com> wrote:
>>> On 11 October 2016 at 11:42, Dave Airlie <airlied at gmail.com> wrote:
>>>> On 11 October 2016 at 05:50, Dave Airlie <airlied at gmail.com> wrote:
>>>>> On 10 October 2016 at 21:45, Arsenault, Matthew
>>>>> <Matthew.Arsenault at amd.com> wrote:
>>>>>> I don't like adding explicit IR arguments for ABI arguments,
>>>>>> especially this one. Adding a special case for the first index
>>>>>> feels dirty. The rest of LLVM also won't be aware of the
>>>>>> specialness of the argument. It would be problematic because
>>>>>> bugpoint would eliminate the unused argument and then codegen
>>>>>> would have to fail in some way when the argument is missing.
>
> That's a good point, but is there an alternative without burning two
> userdata SGPRs?
>
> One possibility is to define an ABI that says:
>
> 1. SGPR0/1 points to an extra data region; it is reserved independently
> from the shader arguments.
> 2. The first 64 bits of that extra data region point to the scratch buffer.
> 3. The main shader code can retrieve SGPR0/1 using an intrinsic.
>
> This can be made to look somewhat similar to what HSA does.

For radeonsi this might break some of the optimizations that allow LLVM 
to move code around: if RW_BUFFERS are now loaded from a pointer 
returned by a new intrinsic, then the returned value should have the 
same speculability properties as today's argument. That's probably 
possible, but we need to keep it in mind.

Nicolai
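
A minimal sketch of the ABI quoted above (points 1 to 3), written as
host-side C rather than shader code. The struct name, the `other` field,
and both helper functions are hypothetical; only "SGPR0/1 point to a
region whose first 64 bits point to the scratch buffer" comes from the
proposal.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of the extra data region that SGPR0/1 point to.
 * Only the first field is taken from the proposal. */
struct extra_data_region {
    uint64_t scratch_va;  /* first 64 bits: address of the scratch buffer */
    uint32_t other[4];    /* placeholder for other driver data (assumption) */
};

/* Stand-in for the proposed intrinsic: combine the two 32-bit SGPRs
 * into the 64-bit address of the region (SGPR0 = low, SGPR1 = high). */
static uint64_t region_address(uint32_t sgpr0, uint32_t sgpr1)
{
    return ((uint64_t)sgpr1 << 32) | sgpr0;
}

/* Read the scratch address from the first 64 bits of the region. */
static uint64_t load_scratch_va(const void *region)
{
    uint64_t va;
    memcpy(&va, region, sizeof(va));
    return va;
}
```

Because the region is reserved independently of the shader arguments,
nothing in this scheme depends on an explicit IR argument that a tool
like bugpoint could remove.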

>
>
>>>>> We should just hardcode the behaviour and switch both radv/radeonsi
>>>>> over in one go?
>>>>>
>>>>> I'll try to code this up, using the first 64 bits of the first
>>>>> buffer pointed to by userdata 0/1 to store things.
>>>>
>>>> I've looked at doing a dword fetch from the first two words of the
>>>> 0/1 userdata.
>>>>
>>>> It's not optimal for Vulkan, unfortunately. The idea I had was to
>>>> allocate one scratch buffer per command buffer, of the size required
>>>> at the end, and patch it in at the start of the command buffer.
>>>> However, I was going to use the push constants/dynamic buffer in the
>>>> first slot to store the value, and it looks like I need to keep a
>>>> list of every one of these buffers I emit and backpatch them all. It
>>>> might not be too insane, just a slight bump away from keeping it
>>>> simple.
>>>
>>> I'm probably losing the plot here, but I'm considering a double
>>> indirection:
>>>
>>> we load the 64-bit address from the first two dwords, then load the
>>> 64-bit value from that address.
>>>
>>> This saves me allocating scratch BOs for secondary command buffers,
>>> and also having to allocate ever-increasing scratch BOs as shaders
>>> that need more scratch get bound to the pipeline.
>>> I'm not sure how much of an effect this should have for GL though.
>>
>> I've posted a patch to this effect to the LLVM Phabricator.
>>
>> It definitely is cleaner for the radv driver.
>
> I still think it would be nice to have the level of indirection or
> whatever one wants to call it as a function attribute. This would allow
> you to change your mind about e.g. just sticking the scratch pointer
> directly into SGPR0/1. radeonsi and radv don't have to be identical in
> that regard.
>
> Cheers
> Nicolai
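
To make the "level of indirection as a function attribute" idea
concrete, here is a rough host-side sketch. The enum and
`resolve_scratch` are hypothetical names, not an existing LLVM or Mesa
interface, and host pointers stand in for GPU virtual addresses.

```c
#include <stdint.h>

/* Hypothetical knob standing in for the suggested function attribute. */
enum scratch_indirection {
    SCRATCH_DIRECT = 0,  /* SGPR0/1 hold the scratch address itself */
    SCRATCH_SINGLE = 1,  /* SGPR0/1 point to a slot holding the address */
    SCRATCH_DOUBLE = 2   /* SGPR0/1 point to a table whose first entry
                          * points to that slot */
};

/* Follow `levels` 64-bit loads, starting from the value in SGPR0/1. */
static uint64_t resolve_scratch(uint64_t sgpr01,
                                enum scratch_indirection levels)
{
    uint64_t value = sgpr01;
    for (int i = 0; i < (int)levels; i++)
        value = *(const uint64_t *)(uintptr_t)value;
    return value;
}
```

With SCRATCH_DOUBLE, every emitted table can point at one shared slot,
so growing the scratch buffer means rewriting that slot once instead of
backpatching each table. With SCRATCH_DIRECT, the driver pays no extra
loads but has to patch the value wherever it was emitted.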