[Mesa-dev] [PATCH 2/2] [RFC] radv: add scratch support for spilling.

Tue Oct 11 13:21:24 UTC 2016

On 11.10.2016 07:36, Dave Airlie wrote:
> On 11 October 2016 at 12:13, Dave Airlie <airlied at gmail.com> wrote:
>> On 11 October 2016 at 11:42, Dave Airlie <airlied at gmail.com> wrote:
>>> On 11 October 2016 at 05:50, Dave Airlie <airlied at gmail.com> wrote:
>>>> On 10 October 2016 at 21:45, Arsenault, Matthew
>>>> <Matthew.Arsenault at amd.com> wrote:
>>>>> I don't like adding explicit IR arguments for ABI arguments, especially this
>>>>> one. Adding a special case for the first index feels dirty. The rest of llvm
>>>>> also won't be aware of the specialness of the argument. It would be
>>>>> problematic because bugpoint would eliminate the unused argument and then
>>>>> codegen would have to fail in some way when the argument is missing.

That's a good point, but is there an alternative without burning two 
userdata SGPRs?

One possibility is to define an ABI that says:

1. SGPR0/1 points to an extra data region; it is reserved independently 
from the shader arguments.
2. The first 64 bits of that extra data region point to the scratch buffer.
3. The main shader code can retrieve SGPR0/1 using an intrinsic.

This can be made to look somewhat similar to what HSA does.
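
To make the three points concrete, here is a rough host-side sketch in C of
what the shader prologue would do under that ABI. Everything named here is
made up for illustration: struct extra_data, read_sgpr01() (standing in for
the intrinsic) and load_scratch_va() are not existing LLVM or Mesa names, and
I'm assuming SGPR0 holds the low dword of the address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of the extra data region that SGPR0/1 points to.
 * Only the first 64 bits are fixed by the proposal; the rest is an
 * assumption for illustration. */
struct extra_data {
    uint64_t scratch_va;        /* first 64 bits: scratch buffer address */
    uint32_t driver_private[6]; /* assumption: driver-specific payload */
};

/* Stand-in for the proposed intrinsic: returns SGPR0/1 as one 64-bit
 * value, assuming SGPR0 is the low dword. */
static uint64_t read_sgpr01(const uint32_t sgpr[2])
{
    return (uint64_t)sgpr[0] | ((uint64_t)sgpr[1] << 32);
}

/* What the prologue would do: a single 64-bit load from the start of
 * the extra data region. */
static uint64_t load_scratch_va(const uint32_t sgpr[2])
{
    const void *region = (const void *)(uintptr_t)read_sgpr01(sgpr);
    uint64_t va;

    assert(region != NULL);
    memcpy(&va, region, sizeof(va));
    return va;
}
```

The driver side would fill in scratch_va and write the address of the region
into SGPR0/1; the shader arguments themselves stay untouched.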

>>>> We should just hardcode the behaviour and switch both radv/radeonsi
>>>> over in one go?
>>>>
>>>> I'll try to code that up, using the first 64 bits of the first buffer
>>>> pointed to by userdata 0/1 to store things.
>>>
>>> I've looked at doing a dword fetch from the first two words of the 0/1
>>> userdata.
>>>
>>> It's not optimal for Vulkan, unfortunately. The idea I had was to allocate
>>> one scratch buffer per command buffer, of the size required at the end, and
>>> patch it in at the start of the command buffer. However, in the first slot I
>>> was going to use the push constants/dynamic buffer to store the value, and it
>>> looks like I need to keep a list of every one of these buffers I emit and
>>> backpatch them all. It might not be too insane, just a slight bump in keeping
>>> it simple.
>>
>> I'm probably losing the plot here, but I'm considering a double indirection:
>>
>> we load the 64-bit address from the first two dwords, then load the
>> 64-bit value from that address.
>>
>> This saves me allocating scratch BOs for secondary command buffers,
>> and also having to allocate ever-increasing scratch BOs as shaders that
>> need more scratch get bound to the pipeline.
>> I'm not sure how much of an effect this should have for GL though.
>
> I've posted a patch to this effect to the LLVM Phabricator.
>
> It definitely is cleaner for the radv driver.

I still think it would be nice to have the level of indirection or 
whatever one wants to call it as a function attribute. This would allow 
you to change your mind about e.g. just sticking the scratch pointer 
directly into SGPR0/1. radeonsi and radv don't have to be identical in 
that regard.
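
As a rough illustration of what I mean, here is a host-side C sketch where the
number of loads is just a parameter. The enum and the function names are
hypothetical; no such attribute exists in LLVM today as far as this thread is
concerned, so treat the names and values as placeholders.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values for a hypothetical function attribute. */
enum scratch_indirection {
    SCRATCH_DIRECT = 0, /* SGPR0/1 is the scratch pointer itself */
    SCRATCH_SINGLE = 1, /* SGPR0/1 -> 64-bit scratch pointer */
    SCRATCH_DOUBLE = 2, /* SGPR0/1 -> 64-bit address -> scratch pointer */
};

/* Load a 64-bit value from a host address (models one scalar load). */
static uint64_t load_u64(uint64_t addr)
{
    uint64_t v;

    assert(addr != 0);
    memcpy(&v, (const void *)(uintptr_t)addr, sizeof(v));
    return v;
}

/* Follow 'level' loads starting from the value held in SGPR0/1. */
static uint64_t resolve_scratch(uint64_t sgpr01, enum scratch_indirection level)
{
    uint64_t v = sgpr01;

    for (int i = 0; i < (int)level; i++)
        v = load_u64(v);
    return v;
}
```

With something like this, one driver could pick the double indirection and the
other could pick the direct form, and the backend would only differ in how
many loads it emits in the prologue.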

Cheers
Nicolai