[Mesa-dev] [PATCH 0/5] Gallium: Passing drirc options to create_screen() and fixing Rocket League

Alex Smith asmith at feralinteractive.com
Thu Jun 22 16:13:48 UTC 2017


On 22 June 2017 at 15:52, Roland Scheidegger <sroland at vmware.com> wrote:
> Am 22.06.2017 um 13:09 schrieb Nicolai Hähnle:
>> On 22.06.2017 10:14, Michel Dänzer wrote:
>>> On 22/06/17 04:34 PM, Nicolai Hähnle wrote:
>>>> On 22.06.2017 03:38, Rob Clark wrote:
>>>>> On Wed, Jun 21, 2017 at 8:15 PM, Marek Olšák <maraeo at gmail.com> wrote:
>>>>>> On Wed, Jun 21, 2017 at 10:37 PM, Rob Clark <robdclark at gmail.com>
>>>>>> wrote:
>>>>>>> On Tue, Jun 20, 2017 at 6:54 PM, Marek Olšák <maraeo at gmail.com>
>>>>>>> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> This series updates pipe loaders so that flags such as drirc options
>>>>>>>> can be passed to create_screen(). I have compile-tested everything
>>>>>>>> except clover.
>>>>>>>>
>>>>>>>> The first pipe_screen flag is a drirc option to fix incorrect grass
>>>>>>>> rendering in Rocket League for radeonsi. Rocket League expects
>>>>>>>> DirectX behavior for partial derivative computations after
>>>>>>>> discard/kill, but radeonsi implements the more efficient but stricter
>>>>>>>> OpenGL behavior and that will remain our default behavior. The new
>>>>>>>> screen flag forces radeonsi to use the DX behavior for that game.
>>>>>>>>
>>>>>>>
>>>>>>> do we really want this to be a *global* option for the screen?
>>>>>>
>>>>>> Yes. Shaders are pipe_screen (global) objects in radeonsi, so a
>>>>>> compiler option also has to be global. We can't look at the context
>>>>>> during the TGSI->LLVM translation.
>>>>>
>>>>> well, I didn't really mean per-screen vs per-context, as much as
>>>>> per-screen vs per-shader (or maybe more per-screen vs
>>>>> per-instruction?)
>>>>
>>>> I honestly don't think it's worth the trouble. Applications that are
>>>> properly coded against GLSL can benefit from the relaxed semantics, and
>>>> applications that get it wrong in one shader are rather likely to get it
>>>> wrong everywhere.
>>>>
>>>> Since GLSL simply says derivatives are undefined after non-uniform
>>>> discard, and this option makes them defined instead, setting this flag
>>>> can never break the behavior of a correctly written shader.
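
For anyone following along, the pattern under discussion looks roughly
like the sketch below (hand-written for illustration, not taken from the
game's actual shaders):

    #version 330

    uniform sampler2D tex;
    in vec2 uv;
    in float alpha;
    out vec4 color;

    void main()
    {
        if (alpha < 0.5)
            discard;        // non-uniform discard; on radeonsi this kills
                            // the lane, so its quad neighbours lose the
                            // data they need for implicit derivatives

        // texture() needs implicit derivatives of uv. GLSL leaves them
        // undefined at this point; D3D keeps them defined, which is what
        // the game's shaders rely on.
        color = texture(tex, uv);
    }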
>>>
>>> BTW, how expensive is the radeonsi workaround when it isn't needed?
>>>
>>> I'm starting to wonder if we shouldn't just make it always safe and call
>>> it a day, saving the trouble of identifying broken apps and plumbing the
>>> info through the API layers...
>>
>> As-is, the workaround can be *very* expensive in the worst case. A large
>> number of pixels could be disabled by a discard early in the shader, and
>> we're now moving the discard down, which means a lot of unnecessary
>> texture fetches may be happening.
>>
>> Also, I think I spoke too soon about this flag not having negative
>> effects: if a shader has an image/buffer write after a discard, that
>> write is now no longer disabled.
>>
>> A more efficient workaround can be done at the LLVM level by doing the
>> discard early, but then re-enabling WQM "relative to" the new set of
>> active pixels. It's a bit involved, especially when the discard itself
>> happens in a branch, and still a little more expensive, but it's an option.
>>
>
> I'm wondering what your driver for the other OS does (AFAIK DX10 is
> really the odd man out; GLSL, SPIR-V and even Metal all have undefined
> derivatives after non-uniform discards). Surely there must be something
> clever you could do...

I'm wondering the same.

This is an issue we come across from time to time: a game's shaders
expect the D3D behaviour of derivatives remaining defined after a
discard. When that happens we usually do essentially what this
workaround does, i.e. postpone the discard until the very end of the
shader.
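
In source terms, that transformation is roughly equivalent to the
rewrite below (again only a sketch of the idea; in practice it happens
in the compiler rather than in the GLSL):

    #version 330

    uniform sampler2D tex;
    in vec2 uv;
    in float alpha;
    out vec4 color;

    void main()
    {
        // Remember the discard condition instead of acting on it, so the
        // whole quad keeps executing and derivatives stay well defined.
        bool killed = (alpha < 0.5);

        // Every pixel now pays for this fetch, including those the
        // original shader culled up front; an image/buffer store here
        // would also no longer be masked off.
        color = texture(tex, uv);

        // Only perform the actual discard at the very end of the shader.
        if (killed)
            discard;
    }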

However, doing this seems to be slower than the original shaders
running on D3D. In one case I've seen, delaying a discard that a complex
shader used early on to cull a lot of unneeded pixels caused a big
performance loss against D3D, on both AMD and NVIDIA.

Given that, I've wondered whether the D3D drivers are doing something
clever to optimise this. For example, could they discard immediately
once all the pixels in a quad used for derivative calculations have
been discarded? Is something like that possible on AMD hardware?

Alex

>
> Roland
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

