[Mesa-dev] [PATCH V2 10/12] i965/gen6: Enable the features required for GL_ARB_sample_shading

Anuj Phogat anuj.phogat at gmail.com
Tue Oct 29 03:50:32 CET 2013


On Mon, Oct 28, 2013 at 6:14 PM, Paul Berry <stereotype441 at gmail.com> wrote:

> On 25 October 2013 16:45, Anuj Phogat <anuj.phogat at gmail.com> wrote:
>
>> - Enable GEN6_WM_MSDISPMODE_PERSAMPLE, GEN6_WM_POSOFFSET_SAMPLE,
>>   GEN6_WM_OMASK_TO_RENDER_TARGET as per extension's specification.
>> - Only enable one of GEN6_WM_8_DISPATCH_ENABLE or
>> GEN6_WM_16_DISPATCH_ENABLE
>>   when GEN6_WM_MSDISPMODE_PERSAMPLE is enabled.
>>   Refer SNB PRM Vol. 2, Part 1, Page 279 for details.
>>
>> V2: - Add a comment explaining why only SIMD8 mode is enabled with
>>       MSDISPMODE_PERSAMPLE.
>>     - Use shared function _mesa_get_min_invocations_per_fragment().
>>     - Use brw_wm_prog_data variables: uses_pos_offset, uses_omask.
>>
>> Signed-off-by: Anuj Phogat <anuj.phogat at gmail.com>
>> ---
>>  src/mesa/drivers/dri/i965/gen6_wm_state.c | 52
>> +++++++++++++++++++++++++++++--
>>  1 file changed, 49 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/gen6_wm_state.c
>> b/src/mesa/drivers/dri/i965/gen6_wm_state.c
>> index e3395ce..25ecc11 100644
>> --- a/src/mesa/drivers/dri/i965/gen6_wm_state.c
>> +++ b/src/mesa/drivers/dri/i965/gen6_wm_state.c
>> @@ -30,6 +30,7 @@
>>  #include "brw_defines.h"
>>  #include "brw_util.h"
>>  #include "brw_wm.h"
>> +#include "program/program.h"
>>  #include "program/prog_parameter.h"
>>  #include "program/prog_statevars.h"
>>  #include "intel_batchbuffer.h"
>> @@ -153,8 +154,20 @@ upload_wm_state(struct brw_context *brw)
>>     dw5 |= (brw->max_wm_threads - 1) << GEN6_WM_MAX_THREADS_SHIFT;
>>
>>     /* CACHE_NEW_WM_PROG */
>> +
>> +   /* In case of non 1x (i.e 4x, 8x) multisampling with
>> MSDISPMODE_PERSAMPLE,
>> +    * only one of SIMD8 and SIMD16 should be enabled. So, we have two
>> options
>> +    * in above mentioned case:
>> +    * 'SIMD8 only' dispatch:   allowed on gen6.
>> +    * 'SIMD16 only' dispatch:  not allowed on gen6.
>> +    *
>> +    * So, we enable 'SIMD8 only' dispatch in above case.
>>
>
> I went back and re-read the docs, and I think it's fine to enable just
> SIMD8, but don't think it's correct to say that "SIMD16 only" is not
> allowed on gen6.
>
Yeah, I confused criteria E to be a 'SIMD16 only' dispatch which is
applicable for [DevIVB+] only. It's the criteria B which applies here and
allowed on SNB as well.
I'll update this comment.

>
> The table in "Graphics BSpec: 3D-Media-GPGPU Engine > 3D Pipeline Stages >
> Pixel > Pixel Shader Thread Generation > Pixel Grouping (Dispatch Size)
> Control" shows:
>
> SIMD8 only with validity criterion A
> SIMD16 only with validity criterion B
> SIMD8+SIMD16 with validity criterion D
> ...skipping some irrelevant ones...
> SIMD16+SIMD32 with validity criterion E (note: irrelevant since we don't
> support SIMD32)
> ...skipping some more irrelevant ones...
>
> (note: the SNB and IVB PRMs show SIMD16 only with validity criterion F,
> but the end result is the same.)
>
> In the SNB and IVB PRM, Criteria A-F are defined right next to the table.
> In the current bspec, they're defined in 3DSTATE_WM and 3DSTATE_PS.  They
> say:
>
> A: Valid on all products.
> B: Valid on [DevCTG+].  Not valid on [DevSNB] if 4x PERPIXEL mode with
> pixel shader computed depth.
> C: Valid only on [DevCTG] and [DevIL]
> D: Valid on all products, except when in non-1x PERSAMPLE mode (applies to
> [DevSNB] only).  Not valid on [DevSNB] if 4x PERPIXEL mode with pixel
> shader computed depth.
>
This is what I see in the SNB and IVB spec. Notice "[DevSNB+]":
D:  Valid on all products, except when in non-1x PERSAMPLE mode (applies to
[DevSNB+] only).  Not valid on [DevSNB]if 4x PERPIXEL mode with pixel
shader computed depth.


>  E: Valid on all products, except when in PERSAMPLE mode with number of
> multisamples >= 8 (applies to [DevIVB+] only).  Not valid on [DevSNB] if 4x
> PERPIXEL mode with pixel shader computed depth.
> F: Valid on all products, except not valid on [DevSNB] if 4x PERPIXEL mode
> with pixel shader computed depth.
>
>
> So the upshot of all this is that in non-1x PERSAMPLE mode:
>
> - On gen6, we cannot enable both SIMD8 and SIMD16, but we can enable one
> or the other.
>
right.

- On gen7, we can enable either SIMD8 or SIMD16 or both.
>
No,  we cannot enable both SIMD8 and SIMD16, but we can enable one or the
other. I get this error in simulator on IVB if I attempt to enable both:
If MSAA 4x with MDISPMODE_PERSAMPLE, SIMD8 should not be enabled with
SIMD16 or SIMD32

It hangs the gpu with 8x MSAA.

>
> (previously, we thought that 8x MSAA had to be SIMD8 only, but I believe
> that was a misunderstanding.  The restriction on 8x MSAA is only in
> validity criterion E, and that criterion only takes effect if we're doing
> SIMD16+SIMD32, which we never do.)
>
Right. We can enable 'SIMD16 only' dispatch for both 4x and 8x
multisampling. I'll update my 'TODO' comment in gen7_wm_state.c.
Thanks for finding this Paul.

>
> Please correct me if my logic is wrong.  I want to make sure we get to the
> bottom of this.
>
> Assuming your analysis agrees with mine, then the only reason we aren't
> doing SIMD16 PERSAMPLE shading on Sandy Bridge is because we haven't had a
> chance to get it working yet.  At a minimum, we need to update the comment
> to reflect that.  It might be worth adding some of the above analysis to
> the comment so that the next person to work on this code doesn't get as
> confused as we did.
>
I'll add more details in comment.

>
>
>> +    */
>>     dw5 |= GEN6_WM_8_DISPATCH_ENABLE;
>> -   if (brw->wm.prog_data->prog_offset_16)
>> +
>> +     if (brw->wm.prog_data->prog_offset_16 &&
>> +         !(_mesa_get_min_invocations_per_fragment(ctx,
>> +                                                  brw->fragment_program)
>> > 1))
>>
>
> This "if" statement is overly indented.  Also,
> !(_mesa_get_min_invocations_per_fragment(...) > 1) is confusing.  Let's
> just do _mesa_get_min_invocations_per_fragment(...) == 1.
>
> (Note: this is assuming you agreed with me that we need to add "MAX2" in
> patch 4.  If we don't do that then "== 1" needs to change to "<= 1".)
>
Yes, this definitely looks better.


>
>>        dw5 |= GEN6_WM_16_DISPATCH_ENABLE;
>>
>>     /* CACHE_NEW_WM_PROG | _NEW_COLOR */
>> @@ -183,7 +196,8 @@ upload_wm_state(struct brw_context *brw)
>>
>>     /* _NEW_COLOR, _NEW_MULTISAMPLE */
>>     if (fp->program.UsesKill || ctx->Color.AlphaEnabled ||
>> -       ctx->Multisample.SampleAlphaToCoverage)
>> +       ctx->Multisample.SampleAlphaToCoverage ||
>> +       brw->wm.prog_data->uses_omask)
>>        dw5 |= GEN6_WM_KILL_ENABLE;
>>
>>     if (brw_color_buffer_write_enabled(brw) ||
>> @@ -191,6 +205,16 @@ upload_wm_state(struct brw_context *brw)
>>        dw5 |= GEN6_WM_DISPATCH_ENABLE;
>>     }
>>
>> +   /* From the SNB PRM, volume 2 part 1, page 278:
>> +    * "This bit is inserted in the PS payload header and made available
>> to
>> +    * the DataPort (either via the message header or via header bypass)
>> to
>> +    * indicate that oMask data (one or two phases) is included in Render
>> +    * Target Write messages. If present, the oMask data is used to mask
>> off
>> +    * samples."
>> +    */
>> +    if(brw->wm.prog_data->uses_omask)
>> +      dw5 |= GEN6_WM_OMASK_TO_RENDER_TARGET;
>> +
>>     /* CACHE_NEW_WM_PROG */
>>     dw6 |= brw->wm.prog_data->num_varying_inputs <<
>>        GEN6_WM_NUM_SF_OUTPUTS_SHIFT;
>> @@ -200,12 +224,34 @@ upload_wm_state(struct brw_context *brw)
>>           dw6 |= GEN6_WM_MSRAST_ON_PATTERN;
>>        else
>>           dw6 |= GEN6_WM_MSRAST_OFF_PIXEL;
>> -      dw6 |= GEN6_WM_MSDISPMODE_PERPIXEL;
>> +
>> +      if (_mesa_get_min_invocations_per_fragment(ctx,
>> brw->fragment_program) > 1)
>> +         dw6 |= GEN6_WM_MSDISPMODE_PERSAMPLE;
>> +      else
>> +         dw6 |= GEN6_WM_MSDISPMODE_PERPIXEL;
>>     } else {
>>        dw6 |= GEN6_WM_MSRAST_OFF_PIXEL;
>>        dw6 |= GEN6_WM_MSDISPMODE_PERSAMPLE;
>>     }
>>
>> +   /* _NEW_BUFFERS, _NEW_MULTISAMPLE */
>> +   /* From the SNB PRM, volume 2 part 1, page 281:
>> +    * "If the PS kernel does not need the Position XY Offsets
>> +    * to compute a Position XY value, then this field should be
>> +    * programmed to POSOFFSET_NONE."
>> +    *
>> +    * "SW Recommendation: If the PS kernel needs the Position Offsets
>> +    * to compute a Position XY value, this field should match Position
>> +    * ZW Interpolation Mode to ensure a consistent position.xyzw
>> +    * computation."
>> +    * We only require XY sample offsets. So, this recommendation doesn't
>> +    * look useful at the moment. We might need this in future.
>> +    */
>> +   if (brw->wm.prog_data->uses_pos_offset)
>> +      dw6 |= GEN6_WM_POSOFFSET_SAMPLE;
>> +   else
>> +      dw6 |= GEN6_WM_POSOFFSET_NONE;
>> +
>>     BEGIN_BATCH(9);
>>     OUT_BATCH(_3DSTATE_WM << 16 | (9 - 2));
>>     OUT_BATCH(brw->wm.base.prog_offset);
>> --
>> 1.8.1.4
>>
>>
> With those changes, the patch is:
>
> Reviewed-by: Paul Berry <stereotype441 at gmail.com>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20131028/cd7c3f02/attachment-0001.html>


More information about the mesa-dev mailing list