[Mesa-dev] [PATCH 2/2] [RFC] radv: add scratch support for spilling.

Mon Oct 10 11:45:05 UTC 2016

I don't like adding explicit IR arguments for ABI arguments, especially this one. Adding a special case for the first index feels dirty. The rest of llvm also won't be aware of the specialness of the argument. It would be problematic because bugpoint would eliminate the unused argument and then codegen would have to fail in some way when the argument is missing

_____________________________
From: Nicolai Hähnle <nhaehnle at gmail.com<mailto:nhaehnle at gmail.com>>
Sent: Monday, October 10, 2016 16:07
Subject: Re: [Mesa-dev] [PATCH 2/2] [RFC] radv: add scratch support for spilling.
To: Arsenault, Matthew <matthew.arsenault at amd.com<mailto:matthew.arsenault at amd.com>>, Dave Airlie <airlied at gmail.com<mailto:airlied at gmail.com>>, <mesa-dev at lists.freedesktop.org<mailto:mesa-dev at lists.freedesktop.org>>

On 10.10.2016 05:45, Dave Airlie wrote:
> On 10 October 2016 at 13:25, Dave Airlie <airlied at gmail.com<mailto:airlied at gmail.com>> wrote:
>> From: Dave Airlie <airlied at redhat.com<mailto:airlied at redhat.com>>
>>
>> This is a bit of a hack due to how llvm currently handles
>> spilling in it's shader ABI. Currently llvm amdgpu backend
>> uses relocations to patch the shader with the address of
>> the tmpring. The driver loads the shader and patches the
>> relocations.
>>
>> However for vulkan this doesn't work so well for a few reasons
>> a) when we build/load the shaders we aren't constructing the
>> command stream yet, and the same shader could be using in multiple
>> command streams.
>>
>> b) multiple command execution engines for compute shaders.
>>
>> So ideally we'd fix LLVM to understand the ABI convention, possibly
>> we'd fix it so user sgpr 0,1 are used (this hack uses 10/11).
>>
>> This patch when it gets the shader back from llvm it patches
>> the relocation dword to a nop, and patches to previous mov command
>> to move from SGPR 10 and 11. This works usually as it seems the
>> SGPR loading of the spill stuff is at the start of shaders always
>> so the 10/11 user sgprs haven't been trashed yet. I'm not 100%
>> sure this will work all the time, but for now this should allow
>> us to pass a bunch more CTS tests and make the Sascha computeshader
>> demo to work.
>
> So I found a shader that this doesn't work so well with unfortunately, so
> while I'd like this as a temporary solution I probably need to start
> digging into llvm.
>
> My current plan is to add a flag to llvm to denote ability to spill to
> the address
> in userdata sgpr 0/1, have llvm preload those and use them instead.
>
> Now the other question I have is, should I be killing two user data sgprs for
> this purpose, or should we define a better ABI, so that the first descriptor
> in the buffer pointed to by these is the scratch buffer, and other things
> could be queued after it, (like push constants and dynamic descriptor).
>
> Bas, nha? not sure if Matt is on this list.

Getting rid of the relocations would be nice for OpenGL as well. It
always feels like a bit of a hack.

For OpenGL, it's important that non-monolithic shaders work. In
practice, this means that it must be possible to pass the scratch buffer
pointer through the prolog shader into the main part.

It seems to me the simplest way to ensure this is to add an explicit
function argument. Something like:

define amdgpu_XX TYPE @prolog(i64 inreg %spillptr, ....)
"amdgpu-spill-ptr"="0" {
...
%retval = insertvalue TYPE %prev, i64 %spillptr, 0
ret TYPE %retval
}

amdgpu-spill-ptr would be a function attribute indicating which function
argument (by position) holds the base pointer for spilling.

I also think that loading the spill pointer indirectly makes a lot of
sense. After all, user data is scarce and most shaders don't spill. This
could easily fit in the function attribute scheme. Something like
"amdgpu-spill-ptr=1:16" to indicate that the second function argument is
a 64-bit pointer, and the spill pointer should be loaded from there with
a 16 byte offset.

FWIW, even with the indirect load, I think we should only load a 64-bit
pointer instead of a full descriptor. This is because we should really
set the size limit of the spill descriptor correctly to implement
robustness semantics (this is relevant because the spill descriptor is
also used for temporary arrays; not sure if Vulkan cares, but OpenGL
definitely does).

Cheers,
Nicolai

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20161010/5c940ae6/attachment.html>