[Mesa-dev] [PATCH 1/2] radeonsi: enable out-of-order rasterization when possible on VI and GFX9 dGPUs

Nicolai Hähnle nhaehnle at gmail.com
Thu Sep 7 15:25:03 UTC 2017


On 07.09.2017 16:56, Marek Olšák wrote:
>>> +static bool si_out_of_order_rasterization(struct si_context *sctx)
>>> +{
>>> +       struct si_state_blend *blend = sctx->queued.named.blend;
>>> +       struct si_state_dsa *dsa = sctx->queued.named.dsa;
>>> +
>>> +       if (!sctx->screen->has_out_of_order_rast)
>>> +               return false;
>>> +
>>> +       /* PS with memory stores can't run out-of-order. */
>>> +       if (sctx->ps_shader.cso &&
>>> +           sctx->ps_shader.cso->info.writes_memory)
>>> +               return false;
>>
>>
>> I'm actually not sure this is necessary. The spec is quite relaxed about the
>> order of pixel shader invocations and whether they happen at all.
> 
> If DEPTH_BEFORE_SHADER is 1, depth tests have to be done in-order. If
> they are done out-of-order, pixel shader invocations that would
> normally be rejected can be executed needlessly. Does the spec allow
> that?

Good point.


> 
>>
>>
>>
>>> +
>>> +       unsigned colormask = sctx->framebuffer.colorbuf_enabled_4bit &
>>> +                            blend->cb_target_enabled_4bit;
>>> +
>>> +       /* No logic op. */
>>> +       if (colormask && blend->logicop_enable)
>>> +               return false;
>>> +
>>> +       struct r600_texture *zstex =
>>> +               (struct r600_texture*)sctx->framebuffer.state.zsbuf->texture;
>>> +       bool has_stencil = sctx->framebuffer.state.zsbuf &&
>>> +                          zstex->surface.flags & RADEON_SURF_SBUFFER;
>>> +       bool blend_enabled = (colormask & blend->blend_enable_4bit) != 0;
>>> +
>>> +       /* Out-of-order rasterization can be enabled for these cases:
>>> +        *
>>> +        * - color-only rendering:
>>> +        *   + blending must be enabled and commutative
>>> +        *   + only when inexact behavior due to rounding is allowed
>>> +        *
>>> +        * - depth-only rendering:
>>> +        *   + depth must force ordering
>>> +        *
>>> +        * - stencil-only rendering:
>>> +        *   + never --- can we do better here?
>>> +        *
>>> +        * - color rendering with read-only depth:
>>> +        *   + blending must be disabled
>>> +        *   + depth must force ordering
>>> +        *
>>> +        * - color rendering with read-only stencil:
>>> +        *   + blending must be disabled
>>> +        *
>>> +        * - color+depth rendering:
>>> +        *   + blending must be disabled
>>> +        *   + depth must force ordering
>>> +        *   + only when Z-fighting is allowed to result in inexact behavior
>>> +        *
>>> +        * - color+stencil rendering:
>>> +        *   + never --- can we do better here?
>>> +        *
>>> +        * - color+depth+stencil rendering:
>>> +        *   + never --- can we do better here?
>>> +        */
>>
>>
>> I can't quite wrap my head around the logic here.
>>
>> Here's a suggestion for cleaning it up conceptually:
>>
>> - Record in DSA whether DSA *by itself* can run out-of-order or not, meaning
>> that the final result in Z/S is unaffected by out-of-order
>> -- This is trivially the case when there are no Z/S writes
>> -- It is also the case when stencil writes are disabled and Zfunc is NEVER
>> or one of the ordered ones ("depth_forces_ordering", currently)
>> -- It is also the case when depth writes are disabled, Sfunc is ALWAYS, and
>> zpass_op/zfail_op are KEEP, ZERO, REPLACE, INVERT, INCR_WRAP, DECR_WRAP, or
>> Sfunc is NEVER and the same applies to fail_op [I think this allows
>> out-of-order to be enabled for stencil shadow passes]
>>
>> - Record in DSA whether the set of fragments passing DSA is unaffected by
>> out-of-order
>> -- This is trivially the case when there are no Z/S writes
>> -- It is the case when stencil writes are disabled and Zfunc is ALWAYS or
>> NEVER
>> -- It is the case when depth writes are disabled and Sfunc is ALWAYS or
>> NEVER
>>
>> - Record in DSA whether the *last* fragment passing DSA for each sample is
>> unaffected by out-of-order
>> -- This is *never* the case if we're being honest, but we can enable it in
>> an optional "aggressive" mode when stencil writes are disabled, Z writes are
>> enabled and Z func is one of the ordered functions
>>
>> The overall out-of-order enable is then:
>>
>> - if DSA by itself cannot run out-of-order, disable
>> - if color writes are disabled, enable
>> - if logic op is enabled, disable
>> - if blending is enabled:
>>   o disable if non-commutative
>>   o enable if commutative and the set of fragments passing DSA is unaffected
>> by out-of-order
>> - if blending is disabled, enable iff the *last* fragment passing DSA is
>> unaffected
>>
>> Thinking it through in this way, I believe I discovered at least one bug in
>> the patch as-is, in the following configuration
>>
>> - blending disabled and no stencil
>> - depth is enabled, depth writes are disabled, and Zfunc == LESS
>>
>> In this case, dsa->depth_forces_ordering_color_on will be true and
>> out-of-order will be enabled. But that's not correct, because there may be
>> multiple triangles with Z-values less than whatever's in the depth buffer.
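
To make the problem case concrete, here is a toy single-sample model of the
configuration above (no blending, no stencil, Zfunc == LESS). It is purely
illustrative; the struct and function names are made up and nothing here
reflects how the hardware actually processes fragments.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fragment: a depth value and an opaque color id. */
struct fragment {
    float z;
    unsigned color;
};

/* Process fragments for one sample in the given order with Zfunc == LESS
 * and blending disabled. Returns the final color of the sample. */
static unsigned shade_sample(const struct fragment *frags, size_t count,
                             float initial_depth, unsigned initial_color,
                             bool depth_write)
{
    float depth = initial_depth;
    unsigned color = initial_color;

    for (size_t i = 0; i < count; i++) {
        if (frags[i].z < depth) {
            /* Fragment passes: without blending it overwrites the color. */
            color = frags[i].color;
            if (depth_write)
                depth = frags[i].z;
        }
    }
    return color;
}
```

With the depth buffer cleared to 1.0 and two fragments at Z = 0.3 (color 1)
and Z = 0.5 (color 2), both pass when depth writes are disabled, so the final
color is simply whichever fragment was processed last. With depth writes
enabled, the nearer fragment wins in either order.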
>>
>> On second thought, the whole "record in DSA" thing gets a bit more
>> complicated because it interacts with whether Z/S buffers are actually
>> present. The no-Z/S case is easy (the first two bits are "Yes", the last one
>> is "No"), but we need to distinguish whether stencil is present or not.
>> Maybe both of these can be pre-calculated and stored in DSA.
> 
> You seem to be much better at it than I am. If you wanna take a stab
> at it, here are both patches in the reversed order:
> https://cgit.freedesktop.org/~mareko/mesa/log/?h=out-of-order-rast

I'll take a stab at it.

Cheers,
Nicolai


> Marek
> 


-- 
Learn how the world really is,
but never forget how it should be.

