[Mesa-dev] [PATCH V2 10/12] i965/gen6: Enable the features required for GL_ARB_sample_shading

Tue Oct 29 19:14:51 CET 2013

On Mon, Oct 28, 2013 at 7:00 PM, Paul Berry <stereotype441 at gmail.com> wrote:

> On 28 October 2013 18:14, Paul Berry <stereotype441 at gmail.com> wrote:
>
>> On 25 October 2013 16:45, Anuj Phogat <anuj.phogat at gmail.com> wrote:
>>
>>> - Enable GEN6_WM_MSDISPMODE_PERSAMPLE, GEN6_WM_POSOFFSET_SAMPLE,
>>>   GEN6_WM_OMASK_TO_RENDER_TARGET as per extension's specification.
>>> - Only enable one of GEN6_WM_8_DISPATCH_ENABLE or
>>> GEN6_WM_16_DISPATCH_ENABLE
>>>   when GEN6_WM_MSDISPMODE_PERSAMPLE is enabled.
>>>   Refer SNB PRM Vol. 2, Part 1, Page 279 for details.
>>>
>>> V2: - Add a comment explaining why only SIMD8 mode is enabled with
>>>       MSDISPMODE_PERSAMPLE.
>>>     - Use shared function _mesa_get_min_invocations_per_fragment().
>>>     - Use brw_wm_prog_data variables: uses_pos_offset, uses_omask.
>>>
>>> Signed-off-by: Anuj Phogat <anuj.phogat at gmail.com>
>>> ---
>>>  src/mesa/drivers/dri/i965/gen6_wm_state.c | 52
>>> +++++++++++++++++++++++++++++--
>>>  1 file changed, 49 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/gen6_wm_state.c
>>> b/src/mesa/drivers/dri/i965/gen6_wm_state.c
>>> index e3395ce..25ecc11 100644
>>> --- a/src/mesa/drivers/dri/i965/gen6_wm_state.c
>>> +++ b/src/mesa/drivers/dri/i965/gen6_wm_state.c
>>> @@ -30,6 +30,7 @@
>>>  #include "brw_defines.h"
>>>  #include "brw_util.h"
>>>  #include "brw_wm.h"
>>> +#include "program/program.h"
>>>  #include "program/prog_parameter.h"
>>>  #include "program/prog_statevars.h"
>>>  #include "intel_batchbuffer.h"
>>> @@ -153,8 +154,20 @@ upload_wm_state(struct brw_context *brw)
>>>     dw5 |= (brw->max_wm_threads - 1) << GEN6_WM_MAX_THREADS_SHIFT;
>>>
>>>     /* CACHE_NEW_WM_PROG */
>>> +
>>> +   /* In case of non 1x (i.e 4x, 8x) multisampling with
>>> MSDISPMODE_PERSAMPLE,
>>> +    * only one of SIMD8 and SIMD16 should be enabled. So, we have two
>>> options
>>> +    * in above mentioned case:
>>> +    * 'SIMD8 only' dispatch:   allowed on gen6.
>>> +    * 'SIMD16 only' dispatch:  not allowed on gen6.
>>> +    *
>>> +    * So, we enable 'SIMD8 only' dispatch in above case.
>>>
>>
>> I went back and re-read the docs, and I think it's fine to enable just
>> SIMD8, but don't think it's correct to say that "SIMD16 only" is not
>> allowed on gen6.
>>
>> The table in "Graphics BSpec: 3D-Media-GPGPU Engine > 3D Pipeline Stages
>> > Pixel > Pixel Shader Thread Generation > Pixel Grouping (Dispatch Size)
>> Control" shows:
>>
>> SIMD8 only with validity criterion A
>> SIMD16 only with validity criterion B
>> SIMD8+SIMD16 with validity criterion D
>> ...skipping some irrelevant ones...
>> SIMD16+SIMD32 with validity criterion E (note: irrelevant since we don't
>> support SIMD32)
>> ...skipping some more irrelevant ones...
>>
>> (note: the SNB and IVB PRMs show SIMD16 only with validity criterion F,
>> but the end result is the same.)
>>
>> In the SNB and IVB PRM, Criteria A-F are defined right next to the
>> table.  In the current bspec, they're defined in 3DSTATE_WM and
>> 3DSTATE_PS.  They say:
>>
>> A: Valid on all products.
>> B: Valid on [DevCTG+].  Not valid on [DevSNB] if 4x PERPIXEL mode with
>> pixel shader computed depth.
>> C: Valid only on [DevCTG] and [DevIL]
>> D: Valid on all products, except when in non-1x PERSAMPLE mode (applies
>> to [DevSNB] only).  Not valid on [DevSNB] if 4x PERPIXEL mode with pixel
>> shader computed depth.
>> E: Valid on all products, except when in PERSAMPLE mode with number of
>> multisamples >= 8 (applies to [DevIVB+] only).  Not valid on [DevSNB] if 4x
>> PERPIXEL mode with pixel shader computed depth.
>> F: Valid on all products, except not valid on [DevSNB] if 4x PERPIXEL
>> mode with pixel shader computed depth.
>>
>>
>> So the upshot of all this is that in non-1x PERSAMPLE mode:
>>
>> - On gen6, we cannot enable both SIMD8 and SIMD16, but we can enable one
>> or the other.
>> - On gen7, we can enable either SIMD8 or SIMD16 or both.
>>
>> (previously, we thought that 8x MSAA had to be SIMD8 only, but I believe
>> that was a misunderstanding.  The restriction on 8x MSAA is only in
>> validity criterion E, and that criterion only takes effect if we're doing
>> SIMD16+SIMD32, which we never do.)
>>
>> Please correct me if my logic is wrong.  I want to make sure we get to
>> the bottom of this.
>>
>> Assuming your analysis agrees with mine, then the only reason we aren't
>> doing SIMD16 PERSAMPLE shading on Sandy Bridge is because we haven't had a
>> chance to get it working yet.  At a minimum, we need to update the comment
>> to reflect that.  It might be worth adding some of the above analysis to
>> the comment so that the next person to work on this code doesn't get as
>> confused as we did.
>>
>
> Two other notes about this:
>
> 1. Currently we aren't respecting the restrictions that say "Not valid on
> [DevSNB] if 4x PERPIXEL mode with pixel shader computed depth."  I'm happy
> to take care of this once Anuj's series lands.
>
> 2. Once we've verified that SIMD16 code generation works properly on Gen7
> for gl_SampleID, gl_SamplePosition, and gl_SampleMask[], I believe we
> should switch our Sandy Bridge logic so that in non-1x PERSAMPLE mode, we
> do "SIMD16 only" dispatch if a SIMD16 shader was successfully compiled.  I
> believe that in the vast majority of cases that will be more performant
> than "SIMD8 only" dispatch.
>
I agree. This applies to both gen6 and gen7.  gl_SampleMask[] doesn't
require MSDISPMODE_PERSAMPLE. So, both SIMD8 and SIMD16 are already enabled
in case of  gl_SampleMask[].

>
>>
>>> +    */
>>>     dw5 |= GEN6_WM_8_DISPATCH_ENABLE;
>>> -   if (brw->wm.prog_data->prog_offset_16)
>>> +
>>> +     if (brw->wm.prog_data->prog_offset_16 &&
>>> +         !(_mesa_get_min_invocations_per_fragment(ctx,
>>> +
>>>  brw->fragment_program) > 1))
>>>
>>
>> This "if" statement is overly indented.  Also,
>> !(_mesa_get_min_invocations_per_fragment(...) > 1) is confusing.  Let's
>> just do _mesa_get_min_invocations_per_fragment(...) == 1.
>>
>> (Note: this is assuming you agreed with me that we need to add "MAX2" in
>> patch 4.  If we don't do that then "== 1" needs to change to "<= 1".)
>>
>>
>>>        dw5 |= GEN6_WM_16_DISPATCH_ENABLE;
>>>
>>>     /* CACHE_NEW_WM_PROG | _NEW_COLOR */
>>> @@ -183,7 +196,8 @@ upload_wm_state(struct brw_context *brw)
>>>
>>>     /* _NEW_COLOR, _NEW_MULTISAMPLE */
>>>     if (fp->program.UsesKill || ctx->Color.AlphaEnabled ||
>>> -       ctx->Multisample.SampleAlphaToCoverage)
>>> +       ctx->Multisample.SampleAlphaToCoverage ||
>>> +       brw->wm.prog_data->uses_omask)
>>>        dw5 |= GEN6_WM_KILL_ENABLE;
>>>
>>>     if (brw_color_buffer_write_enabled(brw) ||
>>> @@ -191,6 +205,16 @@ upload_wm_state(struct brw_context *brw)
>>>        dw5 |= GEN6_WM_DISPATCH_ENABLE;
>>>     }
>>>
>>> +   /* From the SNB PRM, volume 2 part 1, page 278:
>>> +    * "This bit is inserted in the PS payload header and made available
>>> to
>>> +    * the DataPort (either via the message header or via header bypass)
>>> to
>>> +    * indicate that oMask data (one or two phases) is included in Render
>>> +    * Target Write messages. If present, the oMask data is used to mask
>>> off
>>> +    * samples."
>>> +    */
>>> +    if(brw->wm.prog_data->uses_omask)
>>> +      dw5 |= GEN6_WM_OMASK_TO_RENDER_TARGET;
>>> +
>>>     /* CACHE_NEW_WM_PROG */
>>>     dw6 |= brw->wm.prog_data->num_varying_inputs <<
>>>        GEN6_WM_NUM_SF_OUTPUTS_SHIFT;
>>> @@ -200,12 +224,34 @@ upload_wm_state(struct brw_context *brw)
>>>           dw6 |= GEN6_WM_MSRAST_ON_PATTERN;
>>>        else
>>>           dw6 |= GEN6_WM_MSRAST_OFF_PIXEL;
>>> -      dw6 |= GEN6_WM_MSDISPMODE_PERPIXEL;
>>> +
>>> +      if (_mesa_get_min_invocations_per_fragment(ctx,
>>> brw->fragment_program) > 1)
>>> +         dw6 |= GEN6_WM_MSDISPMODE_PERSAMPLE;
>>> +      else
>>> +         dw6 |= GEN6_WM_MSDISPMODE_PERPIXEL;
>>>     } else {
>>>        dw6 |= GEN6_WM_MSRAST_OFF_PIXEL;
>>>        dw6 |= GEN6_WM_MSDISPMODE_PERSAMPLE;
>>>     }
>>>
>>> +   /* _NEW_BUFFERS, _NEW_MULTISAMPLE */
>>> +   /* From the SNB PRM, volume 2 part 1, page 281:
>>> +    * "If the PS kernel does not need the Position XY Offsets
>>> +    * to compute a Position XY value, then this field should be
>>> +    * programmed to POSOFFSET_NONE."
>>> +    *
>>> +    * "SW Recommendation: If the PS kernel needs the Position Offsets
>>> +    * to compute a Position XY value, this field should match Position
>>> +    * ZW Interpolation Mode to ensure a consistent position.xyzw
>>> +    * computation."
>>> +    * We only require XY sample offsets. So, this recommendation doesn't
>>> +    * look useful at the moment. We might need this in future.
>>> +    */
>>> +   if (brw->wm.prog_data->uses_pos_offset)
>>> +      dw6 |= GEN6_WM_POSOFFSET_SAMPLE;
>>> +   else
>>> +      dw6 |= GEN6_WM_POSOFFSET_NONE;
>>> +
>>>     BEGIN_BATCH(9);
>>>     OUT_BATCH(_3DSTATE_WM << 16 | (9 - 2));
>>>     OUT_BATCH(brw->wm.base.prog_offset);
>>> --
>>> 1.8.1.4
>>>
>>>
>> With those changes, the patch is:
>>
>> Reviewed-by: Paul Berry <stereotype441 at gmail.com>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20131029/83f2a2e4/attachment.html>