[Mesa-dev] [PATCH V2 10/12] i965/gen6: Enable the features required for GL_ARB_sample_shading

Paul Berry stereotype441 at gmail.com
Tue Oct 29 02:14:08 CET 2013


On 25 October 2013 16:45, Anuj Phogat <anuj.phogat at gmail.com> wrote:

> - Enable GEN6_WM_MSDISPMODE_PERSAMPLE, GEN6_WM_POSOFFSET_SAMPLE,
>   GEN6_WM_OMASK_TO_RENDER_TARGET as per extension's specification.
> - Only enable one of GEN6_WM_8_DISPATCH_ENABLE or
> GEN6_WM_16_DISPATCH_ENABLE
>   when GEN6_WM_MSDISPMODE_PERSAMPLE is enabled.
>   Refer SNB PRM Vol. 2, Part 1, Page 279 for details.
>
> V2: - Add a comment explaining why only SIMD8 mode is enabled with
>       MSDISPMODE_PERSAMPLE.
>     - Use shared function _mesa_get_min_invocations_per_fragment().
>     - Use brw_wm_prog_data variables: uses_pos_offset, uses_omask.
>
> Signed-off-by: Anuj Phogat <anuj.phogat at gmail.com>
> ---
>  src/mesa/drivers/dri/i965/gen6_wm_state.c | 52
> +++++++++++++++++++++++++++++--
>  1 file changed, 49 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/gen6_wm_state.c
> b/src/mesa/drivers/dri/i965/gen6_wm_state.c
> index e3395ce..25ecc11 100644
> --- a/src/mesa/drivers/dri/i965/gen6_wm_state.c
> +++ b/src/mesa/drivers/dri/i965/gen6_wm_state.c
> @@ -30,6 +30,7 @@
>  #include "brw_defines.h"
>  #include "brw_util.h"
>  #include "brw_wm.h"
> +#include "program/program.h"
>  #include "program/prog_parameter.h"
>  #include "program/prog_statevars.h"
>  #include "intel_batchbuffer.h"
> @@ -153,8 +154,20 @@ upload_wm_state(struct brw_context *brw)
>     dw5 |= (brw->max_wm_threads - 1) << GEN6_WM_MAX_THREADS_SHIFT;
>
>     /* CACHE_NEW_WM_PROG */
> +
> +   /* In case of non 1x (i.e 4x, 8x) multisampling with
> MSDISPMODE_PERSAMPLE,
> +    * only one of SIMD8 and SIMD16 should be enabled. So, we have two
> options
> +    * in above mentioned case:
> +    * 'SIMD8 only' dispatch:   allowed on gen6.
> +    * 'SIMD16 only' dispatch:  not allowed on gen6.
> +    *
> +    * So, we enable 'SIMD8 only' dispatch in above case.
>

I went back and re-read the docs, and I think it's fine to enable just
SIMD8, but don't think it's correct to say that "SIMD16 only" is not
allowed on gen6.

The table in "Graphics BSpec: 3D-Media-GPGPU Engine > 3D Pipeline Stages >
Pixel > Pixel Shader Thread Generation > Pixel Grouping (Dispatch Size)
Control" shows:

SIMD8 only with validity criterion A
SIMD16 only with validity criterion B
SIMD8+SIMD16 with validity criterion D
...skipping some irrelevant ones...
SIMD16+SIMD32 with validity criterion E (note: irrelevant since we don't
support SIMD32)
...skipping some more irrelevant ones...

(note: the SNB and IVB PRMs show SIMD16 only with validity criterion F, but
the end result is the same.)

In the SNB and IVB PRM, Criteria A-F are defined right next to the table.
In the current bspec, they're defined in 3DSTATE_WM and 3DSTATE_PS.  They
say:

A: Valid on all products.
B: Valid on [DevCTG+].  Not valid on [DevSNB] if 4x PERPIXEL mode with
pixel shader computed depth.
C: Valid only on [DevCTG] and [DevIL]
D: Valid on all products, except when in non-1x PERSAMPLE mode (applies to
[DevSNB] only).  Not valid on [DevSNB] if 4x PERPIXEL mode with pixel
shader computed depth.
E: Valid on all products, except when in PERSAMPLE mode with number of
multisamples >= 8 (applies to [DevIVB+] only).  Not valid on [DevSNB] if 4x
PERPIXEL mode with pixel shader computed depth.
F: Valid on all products, except not valid on [DevSNB] if 4x PERPIXEL mode
with pixel shader computed depth.


So the upshot of all this is that in non-1x PERSAMPLE mode:

- On gen6, we cannot enable both SIMD8 and SIMD16, but we can enable one or
the other.
- On gen7, we can enable either SIMD8 or SIMD16 or both.

(previously, we thought that 8x MSAA had to be SIMD8 only, but I believe
that was a misunderstanding.  The restriction on 8x MSAA is only in
validity criterion E, and that criterion only takes effect if we're doing
SIMD16+SIMD32, which we never do.)

Please correct me if my logic is wrong.  I want to make sure we get to the
bottom of this.

Assuming your analysis agrees with mine, then the only reason we aren't
doing SIMD16 PERSAMPLE shading on Sandy Bridge is because we haven't had a
chance to get it working yet.  At a minimum, we need to update the comment
to reflect that.  It might be worth adding some of the above analysis to
the comment so that the next person to work on this code doesn't get as
confused as we did.


> +    */
>     dw5 |= GEN6_WM_8_DISPATCH_ENABLE;
> -   if (brw->wm.prog_data->prog_offset_16)
> +
> +     if (brw->wm.prog_data->prog_offset_16 &&
> +         !(_mesa_get_min_invocations_per_fragment(ctx,
> +                                                  brw->fragment_program)
> > 1))
>

This "if" statement is overly indented.  Also,
!(_mesa_get_min_invocations_per_fragment(...) > 1) is confusing.  Let's
just do _mesa_get_min_invocations_per_fragment(...) == 1.

(Note: this is assuming you agreed with me that we need to add "MAX2" in
patch 4.  If we don't do that then "== 1" needs to change to "<= 1".)


>        dw5 |= GEN6_WM_16_DISPATCH_ENABLE;
>
>     /* CACHE_NEW_WM_PROG | _NEW_COLOR */
> @@ -183,7 +196,8 @@ upload_wm_state(struct brw_context *brw)
>
>     /* _NEW_COLOR, _NEW_MULTISAMPLE */
>     if (fp->program.UsesKill || ctx->Color.AlphaEnabled ||
> -       ctx->Multisample.SampleAlphaToCoverage)
> +       ctx->Multisample.SampleAlphaToCoverage ||
> +       brw->wm.prog_data->uses_omask)
>        dw5 |= GEN6_WM_KILL_ENABLE;
>
>     if (brw_color_buffer_write_enabled(brw) ||
> @@ -191,6 +205,16 @@ upload_wm_state(struct brw_context *brw)
>        dw5 |= GEN6_WM_DISPATCH_ENABLE;
>     }
>
> +   /* From the SNB PRM, volume 2 part 1, page 278:
> +    * "This bit is inserted in the PS payload header and made available to
> +    * the DataPort (either via the message header or via header bypass) to
> +    * indicate that oMask data (one or two phases) is included in Render
> +    * Target Write messages. If present, the oMask data is used to mask
> off
> +    * samples."
> +    */
> +    if(brw->wm.prog_data->uses_omask)
> +      dw5 |= GEN6_WM_OMASK_TO_RENDER_TARGET;
> +
>     /* CACHE_NEW_WM_PROG */
>     dw6 |= brw->wm.prog_data->num_varying_inputs <<
>        GEN6_WM_NUM_SF_OUTPUTS_SHIFT;
> @@ -200,12 +224,34 @@ upload_wm_state(struct brw_context *brw)
>           dw6 |= GEN6_WM_MSRAST_ON_PATTERN;
>        else
>           dw6 |= GEN6_WM_MSRAST_OFF_PIXEL;
> -      dw6 |= GEN6_WM_MSDISPMODE_PERPIXEL;
> +
> +      if (_mesa_get_min_invocations_per_fragment(ctx,
> brw->fragment_program) > 1)
> +         dw6 |= GEN6_WM_MSDISPMODE_PERSAMPLE;
> +      else
> +         dw6 |= GEN6_WM_MSDISPMODE_PERPIXEL;
>     } else {
>        dw6 |= GEN6_WM_MSRAST_OFF_PIXEL;
>        dw6 |= GEN6_WM_MSDISPMODE_PERSAMPLE;
>     }
>
> +   /* _NEW_BUFFERS, _NEW_MULTISAMPLE */
> +   /* From the SNB PRM, volume 2 part 1, page 281:
> +    * "If the PS kernel does not need the Position XY Offsets
> +    * to compute a Position XY value, then this field should be
> +    * programmed to POSOFFSET_NONE."
> +    *
> +    * "SW Recommendation: If the PS kernel needs the Position Offsets
> +    * to compute a Position XY value, this field should match Position
> +    * ZW Interpolation Mode to ensure a consistent position.xyzw
> +    * computation."
> +    * We only require XY sample offsets. So, this recommendation doesn't
> +    * look useful at the moment. We might need this in future.
> +    */
> +   if (brw->wm.prog_data->uses_pos_offset)
> +      dw6 |= GEN6_WM_POSOFFSET_SAMPLE;
> +   else
> +      dw6 |= GEN6_WM_POSOFFSET_NONE;
> +
>     BEGIN_BATCH(9);
>     OUT_BATCH(_3DSTATE_WM << 16 | (9 - 2));
>     OUT_BATCH(brw->wm.base.prog_offset);
> --
> 1.8.1.4
>
>
With those changes, the patch is:

Reviewed-by: Paul Berry <stereotype441 at gmail.com>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20131028/1e37b8e5/attachment-0001.html>


More information about the mesa-dev mailing list