[Mesa-dev] [PATCH 22/22] meta/blit: Use GL_EXT_shader_samples_identical in MSAA-SS resolve blit

Fri Feb 19 00:13:22 UTC 2016

On 02/18/2016 01:47 PM, Ian Romanick wrote:
> On 02/18/2016 08:44 AM, Neil Roberts wrote:
>> I made a pathological test case (attached) which repeatedly renders into
>> an MSAA FBO and then blits it to the screen and measures the framerate.
>> It checks it with a range of different sample counts. The rendering is
>> done either by rendering two triangles to fill the framebuffer or by
>> calling glClear.
>>
>> The percentage increase in framerate after applying the patch is like
>> this:
>>
>> With triangles to fill buffer:
>> 	
>> 16      62,27%
>> 8       48,59%
>> 4       27,72%
>> 2       5,34%
>> 0       0,58%
>>         
>> With glClear:   
>>         
>> 16      -5,20%
>> 8       -7,08%
>> 4       -2,45%
>> 2       -20,76%
>> 0       3,71%
> 
> I find this result interesting.  In the samples=0 case, the before and
> after shaders should be identical.  I dug into this a bit, and I think
> the problem is the previous patch.  Using do { ... } while (false) in
> the macro makes the compiler generate some really, really bad code.
> Just deleting those two lines makes a huge difference.
> 
> With do { ... } while (false):
> 
> SIMD8 shader: 228 instructions. 4 loops. 1554 cycles. 0:0 spills:fills.
> Promoted 0 constants. Compacted 3648 to 2192 bytes (40%)
> 
> Without:
> 
> SIMD8 shader: 159 instructions. 0 loops. 380 cycles. 0:0 spills:fills.
> Promoted 0 constants. Compacted 2544 to 1600 bytes (37%)
> SIMD16 shader: 159 instructions. 0 loops. 898 cycles. 0:0 spills:fills.
> Promoted 0 constants. Compacted 2544 to 1600 bytes (37%)
> 
> This is for samples=16 and sampler2DMS.  The cycle counts are bogus
> because the 4 loops have their cycle counts statically multiplied by
> 10... even though the loop will only execute once.
> 
> So... I'll try some more experiments.  I also wonder about changing the
> SAMPLES > 1 to SAMPLES > 2.

This is weird, and I'm not sure what to make of it.  I ran your test at
a bunch of commits on my branch.  Getting rid of gl_FragColor was the
big win.  The GL_EXT_shader_samples_identical patch is still the big
loss.  I'm assuming that causes enough register pressure over the
previous commit to lose SIMD16.  The following patch just removes the
"do {" and "} while (false)".  That gets back some of the losses,
but not all of it.

The raw per-commit data is below.

5c7f974 Android: disable unused-parameter warning
With triangles to fill buffer:

8,419.603882
4,633.834045
2,595.628113
0,795.798157

With glClear:

8,450.937958
4,594.000610
2,641.354553
0,767.400818

89b8c0a meta/blit: Replace the gl_FragColor write
With triangles to fill buffer:

8,597.943054
4,803.083862
2,942.862549
0,1129.432983

With glClear:

8,643.500671
4,980.392151
2,982.704407
0,1116.569946

7025d43 meta/blit: Don't dynamically select the MSAA "merge" function
With triangles to fill buffer:

8,599.412598
4,954.745056
2,933.184021
0,1129.305420

With glClear:

8,582.106079
4,965.810303
2,975.419434
0,1169.180420

b895a4d meta/blit: Use a single, master shader template for MSAA-SS blits
With triangles to fill buffer:

8,591.786011
4,932.835815
2,910.498047
0,1072.501099

With glClear:

8,623.441406
4,952.562378
2,941.442261
0,1081.314819

197e4be meta/blit: Use GL_EXT_shader_samples_identical in MSAA-SS resolve blit
With triangles to fill buffer:

8,559.252869
4,795.355103
2,781.494202
0,941.619568

With glClear:

8,577.400513
4,806.321533
2,811.030029
0,957.579224

ec01613 fix loop nonsense derp
With triangles to fill buffer:

8,559.221558
4,788.767944
2,786.287170
0,1078.632324

With glClear:

8,575.771545
4,956.205750
2,813.206482
0,1111.111084

> I will probably also make a patch to fix the damage done by the do { ...
> } while (false).  That's just comedy.
> 
>> It seems like a pretty convincing win for the triangle case but the
>> clear case makes it slightly worse. Presumably this is because we don't
>> do anything to detect the value stored in the MCS buffer when doing a
>> fast clear so the fast path isn't taken and the shader being more
>> complicated makes it slower.
>>
>> Not sure if we want to try and do anything about that because
>> potentially the cleared pixels aren't very common in a framebuffer from
>> a real use case so it might not really matter.
>>
>> Currently we don't use SIMD16 for 16x MSAA because we can't allocate the
>> registers well enough to make it worthwhile. This patch makes that
>> problem a bit more interesting because even if we end up spilling a lot
>> it might still be worth doing SIMD16 because the cases where the spilled
>> instructions are hit would be much less common.
>>
>> - Neil
>>
>> Ian Romanick <idr at freedesktop.org> writes:
>>
>>> From: Ian Romanick <ian.d.romanick at intel.com>
>>>
>>> Somewhat surprisingly, this didn't have any affect on performance in the
>>> benchmarks that Martin tried for me.
>>>
>>> Signed-off-by: Ian Romanick <ian.d.romanick at intel.com>
>>> ---
>>>  src/mesa/drivers/common/meta_blit.c | 10 +++++++++-
>>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/mesa/drivers/common/meta_blit.c b/src/mesa/drivers/common/meta_blit.c
>>> index 28aabd3..c0ec51f 100644
>>> --- a/src/mesa/drivers/common/meta_blit.c
>>> +++ b/src/mesa/drivers/common/meta_blit.c
>>> @@ -530,6 +530,7 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
>>>           fs_source = ralloc_asprintf(mem_ctx,
>>>                                       "#version 130\n"
>>>                                       "#extension GL_ARB_texture_multisample: require\n"
>>> +                                     "#extension GL_EXT_shader_samples_identical: enable\n"
>>>                                       "#define gvec4 %svec4\n"
>>>                                       "uniform %ssampler2DMS%s texSampler;\n"
>>>                                       "in %s texCoords;\n"
>>> @@ -569,7 +570,14 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
>>>                                       "   i%s tc = i%s(texCoords);\n"
>>>                                       "   int i;\n"
>>>                                       "\n"
>>> -                                     "   for (i = 0; i < SAMPLES; i++)\n"
>>> +                                     "   S[0] = texelFetch(texSampler, tc, 0);\n"
>>> +                                     "#if defined(GL_EXT_shader_samples_identical) && SAMPLES > 1\n"
>>> +                                     "   if (textureSamplesIdenticalEXT(texSampler, tc)) {\n"
>>> +                                     "      emit2(S[0]);\n"
>>> +                                     "      return;\n"
>>> +                                     "   }\n"
>>> +                                     "#endif\n"
>>> +                                     "   for (i = 1; i < SAMPLES; i++)\n"
>>>                                       "      S[i] = texelFetch(texSampler, tc, i);\n"
>>>                                       "\n"
>>>                                       "   REDUCE(s16, s32);\n"
>>> -- 
>>> 2.5.0
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev