[Mesa-dev] [PATCH 0/5] Gallium: Passing drirc options to create_screen() and fixing Rocket League

Roland Scheidegger sroland at vmware.com
Thu Jun 22 16:30:42 UTC 2017


On 22.06.2017 18:22, Marek Olšák wrote:
> On Thu, Jun 22, 2017 at 6:13 PM, Alex Smith <asmith at feralinteractive.com> wrote:
>> On 22 June 2017 at 15:52, Roland Scheidegger <sroland at vmware.com> wrote:
>>> On 22.06.2017 13:09, Nicolai Hähnle wrote:
>>>> On 22.06.2017 10:14, Michel Dänzer wrote:
>>>>> On 22/06/17 04:34 PM, Nicolai Hähnle wrote:
>>>>>> On 22.06.2017 03:38, Rob Clark wrote:
>>>>>>> On Wed, Jun 21, 2017 at 8:15 PM, Marek Olšák <maraeo at gmail.com> wrote:
>>>>>>>> On Wed, Jun 21, 2017 at 10:37 PM, Rob Clark <robdclark at gmail.com>
>>>>>>>> wrote:
>>>>>>>>> On Tue, Jun 20, 2017 at 6:54 PM, Marek Olšák <maraeo at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> This series updates pipe loaders so that flags such as drirc options
>>>>>>>>>> can be passed to create_screen(). I have compile-tested everything
>>>>>>>>>> except clover.
>>>>>>>>>>
>>>>>>>>>> The first pipe_screen flag is a drirc option to fix incorrect grass
>>>>>>>>>> rendering in Rocket League for radeonsi. Rocket League expects
>>>>>>>>>> DirectX
>>>>>>>>>> behavior for partial derivative computations after discard/kill, but
>>>>>>>>>> radeonsi implements the more efficient but stricter OpenGL behavior
>>>>>>>>>> and that will remain our default behavior. The new screen flag
>>>>>>>>>> forces
>>>>>>>>>> radeonsi to use the DX behavior for that game.
>>>>>>>>>>
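For illustration, the shader pattern at issue looks roughly like the
following. This is a minimal GLSL sketch with made-up names (uGrassTex,
vCoverage, vUV), not code taken from the game:

#version 330 core

uniform sampler2D uGrassTex;   // illustrative names only
in vec2 vUV;
in float vCoverage;
out vec4 oColor;

void main()
{
    // Non-uniform discard: only some pixels of a 2x2 quad may be killed.
    if (vCoverage < 0.5)
        discard;

    // texture() uses implicit derivatives of vUV. GLSL leaves them
    // undefined once a neighbouring pixel in the quad has been discarded;
    // D3D keeps killed pixels around as helpers, so the result stays
    // defined.
    oColor = texture(uGrassTex, vUV);
}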
>>>>>>>>>
>>>>>>>>> do we really want this to be a *global* option for the screen?
>>>>>>>>
>>>>>>>> Yes. Shaders are pipe_screen (global) objects in radeonsi, so a
>>>>>>>> compiler option also has to be global. We can't look at the context
>>>>>>>> during the TGSI->LLVM translation.
>>>>>>>
>>>>>>> well, I didn't really mean per-screen vs per-context, as much as
>>>>>>> per-screen vs per-shader (or maybe more per-screen vs
>>>>>>> per-instruction?)
>>>>>>
>>>>>> I honestly don't think it's worth the trouble. Applications that are
>>>>>> properly coded against GLSL can benefit from the relaxed semantics, and
>>>>>> applications that get it wrong in one shader are rather likely to get it
>>>>>> wrong everywhere.
>>>>>>
>>>>>> Since GLSL simply says derivatives are undefined after non-uniform
>>>>>> discard, and this option makes them defined instead, setting this
>>>>>> flag can never break the behavior of a correctly written shader.
>>>>>
>>>>> BTW, how expensive is the radeonsi workaround when it isn't needed?
>>>>>
>>>>> I'm starting to wonder if we shouldn't just make it always safe and call
>>>>> it a day, saving the trouble of identifying broken apps and plumbing the
>>>>> info through the API layers...
>>>>
>>>> As-is, the workaround can be *very* expensive in the worst case. A large
>>>> number of pixels could be disabled by a discard early in the shader, and
>>>> we're now moving the discard down, which means a lot of unnecessary
>>>> texture fetches may be happening.
>>>>
>>>> Also, I think I spoke too soon about this flag not having negative
>>>> effects: if a shader has an image/buffer write after a discard, that
>>>> write is now no longer disabled.
>>>>
>>>> A more efficient workaround can be done at the LLVM level by doing the
>>>> discard early, but then re-enabling WQM "relative to" the new set of
>>>> active pixels. It's a bit involved, especially when the discard itself
>>>> happens in a branch, and still a little more expensive, but it's an option.
>>>>
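At the source level, the workaround being discussed amounts to roughly
the following. This is a hedged sketch with made-up names, not the actual
compiler transformation (which operates on TGSI/LLVM, not GLSL):

#version 330 core

// Illustrative names only; the original early discard looked like:
//     if (vCoverage < 0.5)
//         discard;
//     oColor = texture(uGrassTex, vUV);

uniform sampler2D uGrassTex;
in vec2 vUV;
in float vCoverage;
out vec4 oColor;

void main()
{
    // Remember the discard condition instead of discarding immediately,
    // so the whole quad stays alive and implicit derivatives stay defined.
    bool kill = (vCoverage < 0.5);

    // Every pixel now pays for the fetches it would otherwise have
    // skipped, and an image/buffer write placed here would no longer be
    // suppressed for pixels that should already be dead.
    oColor = texture(uGrassTex, vUV);

    // Discard postponed to the very end of the shader.
    if (kill)
        discard;
}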
>>>
>>> I'm wondering what your driver for the other OS does (AFAIK DX10 is
>>> really the odd one out; GLSL, SPIR-V, even Metal all have undefined
>>> derivatives after non-uniform discards). I'm thinking surely there
>>> must be something clever you could do...
>>
>> I'm wondering the same.
>>
>> This is an issue we come across from time to time, where a game's
>> shaders expect the D3D behaviour of derivatives remaining defined
>> post-discard. For this we usually do essentially what this workaround
>> is doing: just postpone the discard until the very end of the shader.
>>
>> However, it seems like doing this is less performant than the original
>> shaders running on D3D. One case I've seen had a big performance loss
>> against D3D when doing a delayed discard (which was being used early
>> in a complex shader to cull a lot of unneeded pixels), on both AMD and
>> NVIDIA.
>>
>> Given that, I've wondered whether there's something clever that the
>> D3D drivers are doing to optimise this. Maybe, for example, discarding
>> immediately if all pixels in a quad used for derivative calculations
>> get discarded? Is something like that possible on AMD hardware?
> 
> Yes, it's possible but not implemented in LLVM yet.
> 
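A sketch of the per-quad vote being described, written as GLSL using the
GL_KHR_shader_subgroup_quad operations purely for illustration (names are
made up). Those operations postdate this thread, so read this as the check
a compiler backend could emit rather than something the affected
applications could have written:

#version 450 core
#extension GL_KHR_shader_subgroup_quad : require

uniform sampler2D uGrassTex;   // illustrative names only
in vec2 vUV;
in float vCoverage;
out vec4 oColor;

void main()
{
    bool wantKill = (vCoverage < 0.5);

    // Ask the other three pixels of the derivative quad whether they
    // also want to discard (all invocations execute these swaps).
    bool n1 = subgroupQuadSwapHorizontal(wantKill);
    bool n2 = subgroupQuadSwapVertical(wantKill);
    bool n3 = subgroupQuadSwapDiagonal(wantKill);

    // If the whole quad wants to discard, no live pixel depends on this
    // quad's helpers, so it is safe to kill it right away.
    if (wantKill && n1 && n2 && n3)
        discard;

    // Otherwise keep the pixel alive as a helper and discard at the end,
    // as in the deferred-discard workaround above.
    oColor = texture(uGrassTex, vUV);
    if (wantKill)
        discard;
}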

Although if you wanted to do it correctly in the app, I'm not sure how
you could achieve that...

Roland


