<div dir="ltr">On 28 October 2013 18:14, Paul Berry <span dir="ltr"><<a href="mailto:stereotype441@gmail.com" target="_blank">stereotype441@gmail.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div><div class="h5">On 25 October 2013 16:45, Anuj Phogat <span dir="ltr"><<a href="mailto:anuj.phogat@gmail.com" target="_blank">anuj.phogat@gmail.com</a>></span> wrote:<br>
</div></div><div class="gmail_extra"><div class="gmail_quote"><div><div class="h5">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">- Enable GEN6_WM_MSDISPMODE_PERSAMPLE, GEN6_WM_POSOFFSET_SAMPLE,<br>
GEN6_WM_OMASK_TO_RENDER_TARGET as per extension's specification.<br>
- Only enable one of GEN6_WM_8_DISPATCH_ENABLE or GEN6_WM_16_DISPATCH_ENABLE<br>
when GEN6_WM_MSDISPMODE_PERSAMPLE is enabled.<br>
Refer SNB PRM Vol. 2, Part 1, Page 279 for details.<br>
<br>
V2: - Add a comment explaining why only SIMD8 mode is enabled with<br>
MSDISPMODE_PERSAMPLE.<br>
- Use shared function _mesa_get_min_invocations_per_fragment().<br>
- Use brw_wm_prog_data variables: uses_pos_offset, uses_omask.<br>
<br>
Signed-off-by: Anuj Phogat <<a href="mailto:anuj.phogat@gmail.com" target="_blank">anuj.phogat@gmail.com</a>><br>
---<br>
src/mesa/drivers/dri/i965/gen6_wm_state.c | 52 +++++++++++++++++++++++++++++--<br>
1 file changed, 49 insertions(+), 3 deletions(-)<br>
<br>
diff --git a/src/mesa/drivers/dri/i965/gen6_wm_state.c b/src/mesa/drivers/dri/i965/gen6_wm_state.c<br>
index e3395ce..25ecc11 100644<br>
--- a/src/mesa/drivers/dri/i965/gen6_wm_state.c<br>
+++ b/src/mesa/drivers/dri/i965/gen6_wm_state.c<br>
@@ -30,6 +30,7 @@<br>
#include "brw_defines.h"<br>
#include "brw_util.h"<br>
#include "brw_wm.h"<br>
+#include "program/program.h"<br>
#include "program/prog_parameter.h"<br>
#include "program/prog_statevars.h"<br>
#include "intel_batchbuffer.h"<br>
@@ -153,8 +154,20 @@ upload_wm_state(struct brw_context *brw)<br>
dw5 |= (brw->max_wm_threads - 1) << GEN6_WM_MAX_THREADS_SHIFT;<br>
<br>
/* CACHE_NEW_WM_PROG */<br>
+<br>
+ /* In case of non 1x (i.e 4x, 8x) multisampling with MSDISPMODE_PERSAMPLE,<br>
+ * only one of SIMD8 and SIMD16 should be enabled. So, we have two options<br>
+ * in above mentioned case:<br>
+ * 'SIMD8 only' dispatch: allowed on gen6.<br>
+ * 'SIMD16 only' dispatch: not allowed on gen6.<br>
+ *<br>
+ * So, we enable 'SIMD8 only' dispatch in above case.<br></blockquote><div><br></div></div></div><div>I went back and re-read the docs, and I think it's fine to enable just SIMD8, but don't think it's correct to say that "SIMD16 only" is not allowed on gen6.<br>
<br></div><div>The table in "Graphics BSpec: 3D-Media-GPGPU Engine > 3D Pipeline Stages > Pixel > Pixel Shader Thread Generation > Pixel Grouping (Dispatch Size) Control" shows:<br>
<br></div><div>SIMD8 only with validity criterion A<br></div><div>SIMD16 only with validity criterion B<br></div><div>SIMD8+SIMD16 with validity criterion D<br></div><div>...skipping some irrelevant ones...<br></div><div>
SIMD16+SIMD32 with validity criterion E (note: irrelevant since we don't support SIMD32)<br></div><div>...skipping some more irrelevant ones...<br></div><div><br></div><div>(note: the SNB and IVB PRMs show SIMD16 only with validity criterion F, but the end result is the same.)<br>
</div><div><br></div><div>In the SNB and IVB PRM, Criteria A-F are defined right next to the table. In the current bspec, they're defined in 3DSTATE_WM and 3DSTATE_PS. They say:<br><br>A: Valid on all products.<br>
B: Valid on [DevCTG+]. Not valid on [DevSNB] if 4x PERPIXEL mode with pixel shader computed depth.<br>C: Valid only on [DevCTG] and [DevIL]<br>D: Valid on all products, except when in non-1x PERSAMPLE mode (applies to [DevSNB] only). Not valid on [DevSNB] if 4x PERPIXEL mode with pixel shader computed depth.<br>
E: Valid on all products, except when in PERSAMPLE mode with number of multisamples >= 8 (applies to [DevIVB+] only). Not valid on [DevSNB] if 4x PERPIXEL mode with pixel shader computed depth.<br>
F: Valid on all products, except not valid on [DevSNB] if 4x PERPIXEL mode with pixel shader computed depth.<br><br></div><div><br></div><div>So the upshot of all this is that in non-1x PERSAMPLE mode:<br>
<br></div><div>- On gen6, we cannot enable both SIMD8 and SIMD16, but we can enable one or the other.<br>- On gen7, we can enable either SIMD8 or SIMD16 or both.<br><br></div><div>(previously, we thought that 8x MSAA had to be SIMD8 only, but I believe that was a misunderstanding. The restriction on 8x MSAA is only in validity criterion E, and that criterion only takes effect if we're doing SIMD16+SIMD32, which we never do.)<br>
</div><div><br></div><div>Please correct me if my logic is wrong. I want to make sure we get to the bottom of this.<br><br></div><div>Assuming your analysis agrees with mine, then the only reason we aren't doing SIMD16 PERSAMPLE shading on Sandy Bridge is because we haven't had a chance to get it working yet. At a minimum, we need to update the comment to reflect that. It might be worth adding some of the above analysis to the comment so that the next person to work on this code doesn't get as confused as we did. <br>
</div></div></div></div></blockquote><div><br></div><div>Two other notes about this:<br><br></div><div>1. Currently we aren't respecting the restrictions that say "Not valid on [DevSNB] if 4x PERPIXEL mode with pixel shader computed depth." I'm happy to take care of this once Anuj's series lands.<br>
<br></div><div>2. Once we've verified that SIMD16 code generation works properly on Gen7 for gl_SampleID, gl_SamplePosition, and gl_SampleMask[], I believe we should switch our Sandy Bridge logic so that in non-1x PERSAMPLE mode, we do "SIMD16 only" dispatch if a SIMD16 shader was successfully compiled. I believe that in the vast majority of cases that will be more performant than "SIMD8 only" dispatch.<br>
</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>
</div><div class="im">
<div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
+ */<br>
dw5 |= GEN6_WM_8_DISPATCH_ENABLE;<br>
- if (brw->wm.prog_data->prog_offset_16)<br>
+<br>
+ if (brw->wm.prog_data->prog_offset_16 &&<br>
+ !(_mesa_get_min_invocations_per_fragment(ctx,<br>
+ brw->fragment_program) > 1))<br></blockquote><div><br></div></div><div>This "if" statement is overly indented. Also, !(_mesa_get_min_invocations_per_fragment(...) > 1) is confusing. Let's just do _mesa_get_min_invocations_per_fragment(...) == 1.<br>
<br></div><div>(Note: this is assuming you agreed with me that we need to add "MAX2" in patch 4. If we don't do that then "== 1" needs to change to "<= 1".)<br></div><div><div class="h5">
<div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
dw5 |= GEN6_WM_16_DISPATCH_ENABLE;<br>
<br>
/* CACHE_NEW_WM_PROG | _NEW_COLOR */<br>
@@ -183,7 +196,8 @@ upload_wm_state(struct brw_context *brw)<br>
<br>
/* _NEW_COLOR, _NEW_MULTISAMPLE */<br>
if (fp->program.UsesKill || ctx->Color.AlphaEnabled ||<br>
- ctx->Multisample.SampleAlphaToCoverage)<br>
+ ctx->Multisample.SampleAlphaToCoverage ||<br>
+ brw->wm.prog_data->uses_omask)<br>
dw5 |= GEN6_WM_KILL_ENABLE;<br>
<br>
if (brw_color_buffer_write_enabled(brw) ||<br>
@@ -191,6 +205,16 @@ upload_wm_state(struct brw_context *brw)<br>
dw5 |= GEN6_WM_DISPATCH_ENABLE;<br>
}<br>
<br>
+ /* From the SNB PRM, volume 2 part 1, page 278:<br>
+ * "This bit is inserted in the PS payload header and made available to<br>
+ * the DataPort (either via the message header or via header bypass) to<br>
+ * indicate that oMask data (one or two phases) is included in Render<br>
+ * Target Write messages. If present, the oMask data is used to mask off<br>
+ * samples."<br>
+ */<br>
+ if(brw->wm.prog_data->uses_omask)<br>
+ dw5 |= GEN6_WM_OMASK_TO_RENDER_TARGET;<br>
+<br>
/* CACHE_NEW_WM_PROG */<br>
dw6 |= brw->wm.prog_data->num_varying_inputs <<<br>
GEN6_WM_NUM_SF_OUTPUTS_SHIFT;<br>
@@ -200,12 +224,34 @@ upload_wm_state(struct brw_context *brw)<br>
dw6 |= GEN6_WM_MSRAST_ON_PATTERN;<br>
else<br>
dw6 |= GEN6_WM_MSRAST_OFF_PIXEL;<br>
- dw6 |= GEN6_WM_MSDISPMODE_PERPIXEL;<br>
+<br>
+ if (_mesa_get_min_invocations_per_fragment(ctx, brw->fragment_program) > 1)<br>
+ dw6 |= GEN6_WM_MSDISPMODE_PERSAMPLE;<br>
+ else<br>
+ dw6 |= GEN6_WM_MSDISPMODE_PERPIXEL;<br>
} else {<br>
dw6 |= GEN6_WM_MSRAST_OFF_PIXEL;<br>
dw6 |= GEN6_WM_MSDISPMODE_PERSAMPLE;<br>
}<br>
<br>
+ /* _NEW_BUFFERS, _NEW_MULTISAMPLE */<br>
+ /* From the SNB PRM, volume 2 part 1, page 281:<br>
+ * "If the PS kernel does not need the Position XY Offsets<br>
+ * to compute a Position XY value, then this field should be<br>
+ * programmed to POSOFFSET_NONE."<br>
+ *<br>
+ * "SW Recommendation: If the PS kernel needs the Position Offsets<br>
+ * to compute a Position XY value, this field should match Position<br>
+ * ZW Interpolation Mode to ensure a consistent position.xyzw<br>
+ * computation."<br>
+ * We only require XY sample offsets. So, this recommendation doesn't<br>
+ * look useful at the moment. We might need this in future.<br>
+ */<br>
+ if (brw->wm.prog_data->uses_pos_offset)<br>
+ dw6 |= GEN6_WM_POSOFFSET_SAMPLE;<br>
+ else<br>
+ dw6 |= GEN6_WM_POSOFFSET_NONE;<br>
+<br>
BEGIN_BATCH(9);<br>
OUT_BATCH(_3DSTATE_WM << 16 | (9 - 2));<br>
OUT_BATCH(brw->wm.base.prog_offset);<br>
<span><font color="#888888">--<br>
1.8.1.4<br>
<br>
</font></span></blockquote></div></div></div><br></div><div class="gmail_extra">With those changes, the patch is:<br><br></div><div class="gmail_extra">Reviewed-by: Paul Berry <<a href="mailto:stereotype441@gmail.com" target="_blank">stereotype441@gmail.com</a>><br>
</div></div>
</blockquote></div><br></div></div>