[Mesa-dev] [PATCH V2 10/12] i965/gen6: Enable the features required for GL_ARB_sample_shading

Paul Berry stereotype441 at gmail.com
Tue Oct 29 03:00:40 CET 2013


On 28 October 2013 18:14, Paul Berry <stereotype441 at gmail.com> wrote:

> On 25 October 2013 16:45, Anuj Phogat <anuj.phogat at gmail.com> wrote:
>
>> - Enable GEN6_WM_MSDISPMODE_PERSAMPLE, GEN6_WM_POSOFFSET_SAMPLE,
>>   GEN6_WM_OMASK_TO_RENDER_TARGET as per extension's specification.
>> - Only enable one of GEN6_WM_8_DISPATCH_ENABLE or
>> GEN6_WM_16_DISPATCH_ENABLE
>>   when GEN6_WM_MSDISPMODE_PERSAMPLE is enabled.
>>   Refer SNB PRM Vol. 2, Part 1, Page 279 for details.
>>
>> V2: - Add a comment explaining why only SIMD8 mode is enabled with
>>       MSDISPMODE_PERSAMPLE.
>>     - Use shared function _mesa_get_min_invocations_per_fragment().
>>     - Use brw_wm_prog_data variables: uses_pos_offset, uses_omask.
>>
>> Signed-off-by: Anuj Phogat <anuj.phogat at gmail.com>
>> ---
>>  src/mesa/drivers/dri/i965/gen6_wm_state.c | 52
>> +++++++++++++++++++++++++++++--
>>  1 file changed, 49 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/gen6_wm_state.c
>> b/src/mesa/drivers/dri/i965/gen6_wm_state.c
>> index e3395ce..25ecc11 100644
>> --- a/src/mesa/drivers/dri/i965/gen6_wm_state.c
>> +++ b/src/mesa/drivers/dri/i965/gen6_wm_state.c
>> @@ -30,6 +30,7 @@
>>  #include "brw_defines.h"
>>  #include "brw_util.h"
>>  #include "brw_wm.h"
>> +#include "program/program.h"
>>  #include "program/prog_parameter.h"
>>  #include "program/prog_statevars.h"
>>  #include "intel_batchbuffer.h"
>> @@ -153,8 +154,20 @@ upload_wm_state(struct brw_context *brw)
>>     dw5 |= (brw->max_wm_threads - 1) << GEN6_WM_MAX_THREADS_SHIFT;
>>
>>     /* CACHE_NEW_WM_PROG */
>> +
>> +   /* In case of non 1x (i.e 4x, 8x) multisampling with
>> MSDISPMODE_PERSAMPLE,
>> +    * only one of SIMD8 and SIMD16 should be enabled. So, we have two
>> options
>> +    * in above mentioned case:
>> +    * 'SIMD8 only' dispatch:   allowed on gen6.
>> +    * 'SIMD16 only' dispatch:  not allowed on gen6.
>> +    *
>> +    * So, we enable 'SIMD8 only' dispatch in above case.
>>
>
> I went back and re-read the docs, and I think it's fine to enable just
> SIMD8, but don't think it's correct to say that "SIMD16 only" is not
> allowed on gen6.
>
> The table in "Graphics BSpec: 3D-Media-GPGPU Engine > 3D Pipeline Stages >
> Pixel > Pixel Shader Thread Generation > Pixel Grouping (Dispatch Size)
> Control" shows:
>
> SIMD8 only with validity criterion A
> SIMD16 only with validity criterion B
> SIMD8+SIMD16 with validity criterion D
> ...skipping some irrelevant ones...
> SIMD16+SIMD32 with validity criterion E (note: irrelevant since we don't
> support SIMD32)
> ...skipping some more irrelevant ones...
>
> (note: the SNB and IVB PRMs show SIMD16 only with validity criterion F,
> but the end result is the same.)
>
> In the SNB and IVB PRM, Criteria A-F are defined right next to the table.
> In the current bspec, they're defined in 3DSTATE_WM and 3DSTATE_PS.  They
> say:
>
> A: Valid on all products.
> B: Valid on [DevCTG+].  Not valid on [DevSNB] if 4x PERPIXEL mode with
> pixel shader computed depth.
> C: Valid only on [DevCTG] and [DevIL]
> D: Valid on all products, except when in non-1x PERSAMPLE mode (applies to
> [DevSNB] only).  Not valid on [DevSNB] if 4x PERPIXEL mode with pixel
> shader computed depth.
> E: Valid on all products, except when in PERSAMPLE mode with number of
> multisamples >= 8 (applies to [DevIVB+] only).  Not valid on [DevSNB] if 4x
> PERPIXEL mode with pixel shader computed depth.
> F: Valid on all products, except not valid on [DevSNB] if 4x PERPIXEL mode
> with pixel shader computed depth.
>
>
> So the upshot of all this is that in non-1x PERSAMPLE mode:
>
> - On gen6, we cannot enable both SIMD8 and SIMD16, but we can enable one
> or the other.
> - On gen7, we can enable either SIMD8 or SIMD16 or both.
>
> (previously, we thought that 8x MSAA had to be SIMD8 only, but I believe
> that was a misunderstanding.  The restriction on 8x MSAA is only in
> validity criterion E, and that criterion only takes effect if we're doing
> SIMD16+SIMD32, which we never do.)
>
> Please correct me if my logic is wrong.  I want to make sure we get to the
> bottom of this.
>
> Assuming your analysis agrees with mine, then the only reason we aren't
> doing SIMD16 PERSAMPLE shading on Sandy Bridge is because we haven't had a
> chance to get it working yet.  At a minimum, we need to update the comment
> to reflect that.  It might be worth adding some of the above analysis to
> the comment so that the next person to work on this code doesn't get as
> confused as we did.
>

Two other notes about this:

1. Currently we aren't respecting the restrictions that say "Not valid on
[DevSNB] if 4x PERPIXEL mode with pixel shader computed depth."  I'm happy
to take care of this once Anuj's series lands.

2. Once we've verified that SIMD16 code generation works properly on Gen7
for gl_SampleID, gl_SamplePosition, and gl_SampleMask[], I believe we
should switch our Sandy Bridge logic so that in non-1x PERSAMPLE mode, we
do "SIMD16 only" dispatch if a SIMD16 shader was successfully compiled.  I
believe that in the vast majority of cases that will be more performant
than "SIMD8 only" dispatch.


>
>
>> +    */
>>     dw5 |= GEN6_WM_8_DISPATCH_ENABLE;
>> -   if (brw->wm.prog_data->prog_offset_16)
>> +
>> +     if (brw->wm.prog_data->prog_offset_16 &&
>> +         !(_mesa_get_min_invocations_per_fragment(ctx,
>> +                                                  brw->fragment_program)
>> > 1))
>>
>
> This "if" statement is overly indented.  Also,
> !(_mesa_get_min_invocations_per_fragment(...) > 1) is confusing.  Let's
> just do _mesa_get_min_invocations_per_fragment(...) == 1.
>
> (Note: this is assuming you agreed with me that we need to add "MAX2" in
> patch 4.  If we don't do that then "== 1" needs to change to "<= 1".)
>
>
>>        dw5 |= GEN6_WM_16_DISPATCH_ENABLE;
>>
>>     /* CACHE_NEW_WM_PROG | _NEW_COLOR */
>> @@ -183,7 +196,8 @@ upload_wm_state(struct brw_context *brw)
>>
>>     /* _NEW_COLOR, _NEW_MULTISAMPLE */
>>     if (fp->program.UsesKill || ctx->Color.AlphaEnabled ||
>> -       ctx->Multisample.SampleAlphaToCoverage)
>> +       ctx->Multisample.SampleAlphaToCoverage ||
>> +       brw->wm.prog_data->uses_omask)
>>        dw5 |= GEN6_WM_KILL_ENABLE;
>>
>>     if (brw_color_buffer_write_enabled(brw) ||
>> @@ -191,6 +205,16 @@ upload_wm_state(struct brw_context *brw)
>>        dw5 |= GEN6_WM_DISPATCH_ENABLE;
>>     }
>>
>> +   /* From the SNB PRM, volume 2 part 1, page 278:
>> +    * "This bit is inserted in the PS payload header and made available
>> to
>> +    * the DataPort (either via the message header or via header bypass)
>> to
>> +    * indicate that oMask data (one or two phases) is included in Render
>> +    * Target Write messages. If present, the oMask data is used to mask
>> off
>> +    * samples."
>> +    */
>> +    if(brw->wm.prog_data->uses_omask)
>> +      dw5 |= GEN6_WM_OMASK_TO_RENDER_TARGET;
>> +
>>     /* CACHE_NEW_WM_PROG */
>>     dw6 |= brw->wm.prog_data->num_varying_inputs <<
>>        GEN6_WM_NUM_SF_OUTPUTS_SHIFT;
>> @@ -200,12 +224,34 @@ upload_wm_state(struct brw_context *brw)
>>           dw6 |= GEN6_WM_MSRAST_ON_PATTERN;
>>        else
>>           dw6 |= GEN6_WM_MSRAST_OFF_PIXEL;
>> -      dw6 |= GEN6_WM_MSDISPMODE_PERPIXEL;
>> +
>> +      if (_mesa_get_min_invocations_per_fragment(ctx,
>> brw->fragment_program) > 1)
>> +         dw6 |= GEN6_WM_MSDISPMODE_PERSAMPLE;
>> +      else
>> +         dw6 |= GEN6_WM_MSDISPMODE_PERPIXEL;
>>     } else {
>>        dw6 |= GEN6_WM_MSRAST_OFF_PIXEL;
>>        dw6 |= GEN6_WM_MSDISPMODE_PERSAMPLE;
>>     }
>>
>> +   /* _NEW_BUFFERS, _NEW_MULTISAMPLE */
>> +   /* From the SNB PRM, volume 2 part 1, page 281:
>> +    * "If the PS kernel does not need the Position XY Offsets
>> +    * to compute a Position XY value, then this field should be
>> +    * programmed to POSOFFSET_NONE."
>> +    *
>> +    * "SW Recommendation: If the PS kernel needs the Position Offsets
>> +    * to compute a Position XY value, this field should match Position
>> +    * ZW Interpolation Mode to ensure a consistent position.xyzw
>> +    * computation."
>> +    * We only require XY sample offsets. So, this recommendation doesn't
>> +    * look useful at the moment. We might need this in future.
>> +    */
>> +   if (brw->wm.prog_data->uses_pos_offset)
>> +      dw6 |= GEN6_WM_POSOFFSET_SAMPLE;
>> +   else
>> +      dw6 |= GEN6_WM_POSOFFSET_NONE;
>> +
>>     BEGIN_BATCH(9);
>>     OUT_BATCH(_3DSTATE_WM << 16 | (9 - 2));
>>     OUT_BATCH(brw->wm.base.prog_offset);
>> --
>> 1.8.1.4
>>
>>
> With those changes, the patch is:
>
> Reviewed-by: Paul Berry <stereotype441 at gmail.com>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20131028/be2e3243/attachment-0001.html>


More information about the mesa-dev mailing list