[Mesa-dev] [PATCH 2/2] [RFC] radv: add scratch support for spilling.

Tue Oct 11 16:22:41 UTC 2016

On 11.10.2016 16:36, Arsenault, Matthew wrote:
> Why does the IR need access to the scratch pointer at all? I don't
> understand what problems the explicit IR argument is supposed to solve
> or why an intrinsic would be needed. The value is only needed for
> lowering the access to scratch, the user program shouldn't need to be
> aware it exists

The IR doesn't need access to the scratch pointer, but userdata 
registers are a scarce resource and it would be useful to be able to 
avoid wasting two full userdata SGPRs on _only_ the scratch pointer when 
(a) most shaders don't even need it, and (b) we already have a pair of 
SGPRs that points to a table of mostly global data (e.g. tessellation 
rings, streamout, polygon stippling mask, ...). It is perfectly natural 
to stick the scratch buffer into that table, but the IR needs to be able 
to access everything else in the table.
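
As a rough illustration of that table idea, here is a minimal sketch in
Python. It is a simplified model, not the actual radeonsi layout: the slot
names, their indices, and the 64-bit slot size are all assumptions made up
for this example (real entries would be buffer descriptors rather than bare
addresses). It only shows that one pointer pair can serve both the scratch
lowering and ordinary shader loads.

```python
import struct

# Hypothetical slot numbering; the real table layout is driver-defined.
SLOT_SCRATCH = 0
SLOT_TESS_RING = 1
SLOT_STREAMOUT = 2
SLOT_BYTES = 8  # one 64-bit entry per slot in this simplified model


def build_table(entries):
    """Pack 64-bit slot values into a little-endian byte table."""
    return b"".join(struct.pack("<Q", value) for value in entries)


def load_slot(table, slot):
    """Read one 64-bit slot through the single table base pointer."""
    return struct.unpack_from("<Q", table, slot * SLOT_BYTES)[0]


# Made-up addresses standing in for the scratch buffer and two rings.
table = build_table([0x1000, 0x2000, 0x3000])

scratch = load_slot(table, SLOT_SCRATCH)      # needed only by scratch lowering
tess_ring = load_slot(table, SLOT_TESS_RING)  # needed by ordinary shader code
```

The point of the sketch is that both loads start from the same base, so no
extra userdata SGPR pair is spent on the scratch pointer alone.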

Cheers,
Nicolai

> On Tue, Oct 11, 2016 at 9:30 PM +0800, "Nicolai Hähnle"
> <nhaehnle at gmail.com> wrote:
>
> On 11.10.2016 15:21, Nicolai Hähnle wrote:
>> On 11.10.2016 07:36, Dave Airlie wrote:
>>> On 11 October 2016 at 12:13, Dave Airlie <airlied at gmail.com> wrote:
>>>> On 11 October 2016 at 11:42, Dave Airlie <airlied at gmail.com> wrote:
>>>>> On 11 October 2016 at 05:50, Dave Airlie <airlied at gmail.com> wrote:
>>>>>> On 10 October 2016 at 21:45, Arsenault, Matthew
>>>>>> <Matthew.Arsenault at amd.com> wrote:
>>>>>>> I don't like adding explicit IR arguments for ABI arguments,
>>>>>>> especially this one. Adding a special case for the first index
>>>>>>> feels dirty. The rest of LLVM also won't be aware of the
>>>>>>> specialness of the argument. It would be problematic because
>>>>>>> bugpoint would eliminate the unused argument and then codegen
>>>>>>> would have to fail in some way when the argument is missing.
>>
>> That's a good point, but is there an alternative without burning two
>> userdata SGPRs?
>>
>> One possibility is to define an ABI that says:
>>
>> 1. SGPR0/1 points to an extra data region; it is reserved independently
>> from the shader arguments.
>> 2. The first 64 bits of that extra data region point to the scratch buffer.
>> 3. The main shader code can retrieve SGPR0/1 using an intrinsic.
>>
>> This can be made to look somewhat similar to what HSA does.
>
> For radeonsi this might break some of the optimizations that allow LLVM
> to move code around: if RW_BUFFERS are now loaded from a pointer
> returned by a new intrinsic, then the returned value should have the
> same speculability properties as today's argument. That's probably
> possible, but we need to keep it in mind.
>
> Nicolai
>
>>
>>
>>>>>> Should we just hardcode the behaviour and switch both radv/radeonsi
>>>>>> over in one go?
>>>>>>
>>>>>> I'll try to code that up, using the first 64 bits of the first
>>>>>> buffer pointed to by userdata 0/1 to store things.
>>>>>
>>>>> I've looked at doing a dword fetch from the first two words of the
>>>>> 0/1 userdata.
>>>>>
>>>>> Unfortunately it's not optimal for Vulkan. The idea I had was that,
>>>>> per command buffer, I just allocate one scratch buffer of the size
>>>>> required at the end, and patch it in at the start of the command
>>>>> buffer. However, in the first slot I was going to use the push
>>>>> constants/dynamic buffer to store the value, and it looks like I
>>>>> would need to keep a list of every one of these buffers I emit and
>>>>> backpatch them all. It might not be too insane, just a slight
>>>>> departure from keeping it simple.
>>>>
>>>> I'm probably losing the plot here, but I'm considering a double
>>>> indirection:
>>>>
>>>> we load the 64-bit address from the first two dwords, then load the
>>>> 64-bit value from that address.
>>>>
>>>> This saves me allocating scratch BOs for secondary command buffers,
>>>> and also having to allocate ever-increasing scratch BOs as shaders
>>>> that need more scratch get bound to the pipeline.
>>>> I'm not sure how much of an effect this should have for GL though.
>>>
>>> I've posted a patch to this effect to the LLVM Phabricator.
>>>
>>> It definitely is cleaner for the radv driver.
>>
>> I still think it would be nice to have the level of indirection or
>> whatever one wants to call it as a function attribute. This would allow
>> you to change your mind about e.g. just sticking the scratch pointer
>> directly into SGPR0/1. radeonsi and radv don't have to be identical in
>> that regard.
>>
>> Cheers
>> Nicolai
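
The double indirection Dave describes in the quoted text can be sketched as
follows. This is a minimal model under stated assumptions, not the radv
implementation: memory is a flat bytearray, and the addresses
USERDATA_TABLE and INDIRECT_CELL as well as the helper names are
hypothetical. It shows why growing the scratch buffer only needs one cell
to be patched, however many tables point at that cell.

```python
import struct


def read_u64(mem, addr):
    """Load a little-endian 64-bit value from the modelled memory."""
    return struct.unpack_from("<Q", mem, addr)[0]


def write_u64(mem, addr, value):
    """Store a little-endian 64-bit value into the modelled memory."""
    struct.pack_into("<Q", mem, addr, value)


mem = bytearray(64)
USERDATA_TABLE = 0   # hypothetical address held in userdata SGPR0/1
INDIRECT_CELL = 32   # hypothetical cell that holds the current scratch value

write_u64(mem, USERDATA_TABLE, INDIRECT_CELL)  # first two dwords of the table
write_u64(mem, INDIRECT_CELL, 0xA000)          # made-up initial scratch value


def scratch_value(mem, table_addr):
    """First load yields the cell address, second load yields the value."""
    cell = read_u64(mem, table_addr)
    return read_u64(mem, cell)


before = scratch_value(mem, USERDATA_TABLE)
write_u64(mem, INDIRECT_CELL, 0xB000)  # scratch grows: patch the one cell
after = scratch_value(mem, USERDATA_TABLE)
```

The table itself is never rewritten in this model; only the cell changes,
which is the property that avoids backpatching every emitted buffer.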