[Mesa-dev] [PATCH 0/5] Gallium: Passing drirc options to create_screen() and fixing Rocket League

Marek Olšák maraeo at gmail.com
Thu Jun 22 16:22:03 UTC 2017


On Thu, Jun 22, 2017 at 6:13 PM, Alex Smith <asmith at feralinteractive.com> wrote:
> On 22 June 2017 at 15:52, Roland Scheidegger <sroland at vmware.com> wrote:
>> On 22.06.2017 at 13:09, Nicolai Hähnle wrote:
>>> On 22.06.2017 10:14, Michel Dänzer wrote:
>>>> On 22/06/17 04:34 PM, Nicolai Hähnle wrote:
>>>>> On 22.06.2017 03:38, Rob Clark wrote:
>>>>>> On Wed, Jun 21, 2017 at 8:15 PM, Marek Olšák <maraeo at gmail.com> wrote:
>>>>>>> On Wed, Jun 21, 2017 at 10:37 PM, Rob Clark <robdclark at gmail.com>
>>>>>>> wrote:
>>>>>>>> On Tue, Jun 20, 2017 at 6:54 PM, Marek Olšák <maraeo at gmail.com>
>>>>>>>> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> This series updates pipe loaders so that flags such as drirc options
>>>>>>>>> can be passed to create_screen(). I have compile-tested everything
>>>>>>>>> except clover.
>>>>>>>>>
>>>>>>>>> The first pipe_screen flag is a drirc option to fix incorrect grass
>>>>>>>>> rendering in Rocket League for radeonsi. Rocket League expects DirectX
>>>>>>>>> behavior for partial derivative computations after discard/kill, but
>>>>>>>>> radeonsi implements the more efficient but stricter OpenGL behavior,
>>>>>>>>> and that will remain our default. The new screen flag forces radeonsi
>>>>>>>>> to use the DX behavior for that game.
>>>>>>>>>
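For illustration, a minimal sketch of this kind of plumbing; the type and
field names below are placeholders, not necessarily the ones used in the
patches. The idea is that the pipe loader gathers drirc-derived flags once
and hands them to the driver at screen creation time:

#include <stdbool.h>
#include <stdlib.h>

/* Placeholder names; illustration only. */
struct pipe_screen_config {
   /* Per-app drirc workaround: keep derivatives defined after a
    * non-uniform discard (DX-like behavior). */
   bool dx_discard_behavior;
};

struct toy_screen {
   /* Latched at creation: every shader compiled for this screen sees it,
    * which is why the option is per-screen rather than per-context. */
   bool dx_discard_behavior;
};

struct toy_screen *toy_create_screen(int fd, const struct pipe_screen_config *config)
{
   struct toy_screen *s = calloc(1, sizeof(*s));
   if (!s)
      return NULL;
   (void)fd;
   s->dx_discard_behavior = config && config->dx_discard_behavior;
   return s;
}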
>>>>>>>>
>>>>>>>> do we really want this to be a *global* option for the screen?
>>>>>>>
>>>>>>> Yes. Shaders are pipe_screen (global) objects in radeonsi, so a
>>>>>>> compiler option also has to be global. We can't look at the context
>>>>>>> during the TGSI->LLVM translation.
>>>>>>
>>>>>> well, I didn't really mean per-screen vs per-context, as much as
>>>>>> per-screen vs per-shader (or maybe more per-screen vs
>>>>>> per-instruction?)
>>>>>
>>>>> I honestly don't think it's worth the trouble. Applications that are
>>>>> properly coded against GLSL can benefit from the relaxed semantics, and
>>>>> applications that get it wrong in one shader are rather likely to get it
>>>>> wrong everywhere.
>>>>>
>>>>> Since GLSL simply says derivatives are undefined after non-uniform
>>>>> discard, and this option makes it defined instead, setting this flag can
>>>>> never break the behavior of a correctly written shader.
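
To make that concrete, here is a toy 2x2-quad model (an illustration, not
driver code): derivatives are just differences between neighbouring lanes, so
a lane that stops executing at an early discard leaves its neighbour with an
undefined derivative under the GL rules, while deferring the discard keeps
every lane's value defined.

#include <math.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy 2x2 quad, illustration only: "undefined" is modelled as NaN. */
#define QUAD_SIZE 4

static float run_quad(bool defer_discard)
{
   float coord[QUAD_SIZE];

   for (int lane = 0; lane < QUAD_SIZE; lane++) {
      bool wants_discard = (lane == 0);        /* non-uniform discard */
      if (wants_discard && !defer_discard) {
         coord[lane] = NAN;                    /* lane dead: value undefined */
         continue;
      }
      coord[lane] = lane * 0.25f;              /* would-be texture coordinate */
   }

   /* ddx for lane 1 reads lane 0; it is only defined if lane 0 kept going. */
   return coord[1] - coord[0];
}

int main(void)
{
   printf("early discard (GL rules):    ddx = %f\n", run_quad(false)); /* nan  */
   printf("deferred discard (DX-like):  ddx = %f\n", run_quad(true));  /* 0.25 */
   return 0;
}
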
>>>>
>>>> BTW, how expensive is the radeonsi workaround when it isn't needed?
>>>>
>>>> I'm starting to wonder if we shouldn't just make it always safe and call
>>>> it a day, saving the trouble of identifying broken apps and plumbing the
>>>> info through the API layers...
>>>
>>> As-is, the workaround can be *very* expensive in the worst case. A large
>>> number of pixels could be disabled by a discard early in the shader, and
>>> we're now moving the discard down, which means a lot of unnecessary
>>> texture fetches may be happening.
>>>
>>> Also, I think I spoke too soon about this flag not having negative
>>> effects: if a shader has an image/buffer write after a discard, that
>>> write is now no longer disabled.
>>>
>>> A more efficient workaround can be done at the LLVM level by doing the
>>> discard early, but then re-enabling WQM "relative to" the new set of
>>> active pixels. It's a bit involved, especially when the discard itself
>>> happens in a branch, and still a little more expensive, but it's an option.
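
As a rough illustration of that direction (an assumption about its shape, not
the actual LLVM/radeonsi implementation): track a "live" mask that gates side
effects and a "helper" mask that only keeps lanes running for derivative math;
a discard drops the lane from the live mask immediately, and the quad stays
alive only while it still has a live lane.

#include <stdbool.h>
#include <stdint.h>

/* Rough sketch of the two-mask idea, illustration only. */
struct quad_masks {
   uint8_t live;    /* bit i: lane i still produces visible results */
   uint8_t helper;  /* bit i: lane i runs purely for derivative math */
};

/* Early, non-deferred discard of one lane. */
void discard_lane(struct quad_masks *q, int lane)
{
   q->live &= (uint8_t)~(1u << lane);      /* its image/buffer writes stop here */

   if (q->live == 0)
      q->helper = 0;                       /* no live lane left: the quad can die */
   else
      q->helper |= (uint8_t)(1u << lane);  /* keep it running for neighbours only */
}

bool lane_may_write(const struct quad_masks *q, int lane)
{
   return (q->live >> lane) & 1;           /* helper lanes never write memory */
}

This would also close the image/buffer-write hole mentioned above, since a
discarded lane leaves the live mask at the point of the discard.
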
>>>
>>
>> I'm wondering what your driver for the other OS does (AFAIK DX10 is
>> really the odd man out; GLSL, SPIR-V, and even Metal all have undefined
>> derivatives after non-uniform discards). I'd think surely there must be
>> something clever you could do...
>
> I'm wondering the same.
>
> This is an issue we come across from time to time, where a game's
> shaders expect the D3D behaviour of derivatives remaining defined
> post-discard. For this we usually do essentially what this workaround
> does: postpone the discard until the very end of the shader.
>
> However, doing this seems to be less performant than the original
> shaders running on D3D. One case I've seen had a big performance loss
> against D3D when using a delayed discard (which was being used early
> in a complex shader to cull a lot of unneeded pixels), on both AMD and
> NVIDIA.
>
> Given that, I've wondered whether there's something clever that the
> D3D drivers are doing to optimise this. Maybe, for example, discarding
> immediately if all pixels in a quad used for derivative calculations
> get discarded? Is something like that possible on AMD hardware?
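
A hypothetical sketch of that check, continuing the toy quad-mask picture
above: a discarded lane is only kept around so its 2x2 quad neighbours get
derivatives, so once every lane of the quad wants to discard there is nothing
left to help and the whole quad could stop immediately instead of running to
the end of the shader.

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check, illustration only. */
bool quad_can_terminate_early(uint8_t wants_discard /* bits 0..3, one per lane */)
{
   return (wants_discard & 0xfu) == 0xfu;
}
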

Yes, it's possible but not implemented in LLVM yet.

Marek

