[Mesa-dev] [PATCH 22/22] meta/blit: Use GL_EXT_shader_samples_identical in MSAA-SS resolve blit
Ian Romanick
idr at freedesktop.org
Fri Feb 19 00:13:22 UTC 2016
On 02/18/2016 01:47 PM, Ian Romanick wrote:
> On 02/18/2016 08:44 AM, Neil Roberts wrote:
>> I made a pathological test case (attached) which repeatedly renders into
>> an MSAA FBO and then blits it to the screen and measures the framerate.
>> It checks it with a range of different sample counts. The rendering is
>> done either by rendering two triangles to fill the framebuffer or by
>> calling glClear.
>>
>> The percentage increase in framerate after applying the patch is like
>> this:
>>
>> With triangles to fill buffer:
>>
>> 16 62,27%
>> 8 48,59%
>> 4 27,72%
>> 2 5,34%
>> 0 0,58%
>>
>> With glClear:
>>
>> 16 -5,20%
>> 8 -7,08%
>> 4 -2,45%
>> 2 -20,76%
>> 0 3,71%
>
> I find this result interesting. In the samples=0 case, the before and
> after shaders should be identical. I dug into this a bit, and I think
> the problem is the previous patch. Using do { ... } while (false) in
> the macro makes the compiler generate some really, really bad code.
> Just deleting those two lines makes a huge difference.
>
> With do { ... } while (false):
>
> SIMD8 shader: 228 instructions. 4 loops. 1554 cycles. 0:0 spills:fills.
> Promoted 0 constants. Compacted 3648 to 2192 bytes (40%)
>
> Without:
>
> SIMD8 shader: 159 instructions. 0 loops. 380 cycles. 0:0 spills:fills.
> Promoted 0 constants. Compacted 2544 to 1600 bytes (37%)
> SIMD16 shader: 159 instructions. 0 loops. 898 cycles. 0:0 spills:fills.
> Promoted 0 constants. Compacted 2544 to 1600 bytes (37%)
>
> This is for samples=16 and sampler2DMS. The cycle counts are bogus
> because the 4 loops have their cycle counts statically multiplied by
> 10... even though the loop will only execute once.
>
> So... I'll try some more experiments. I also wonder about changing the
> SAMPLES > 1 to SAMPLES > 2.
This is weird, and I'm not sure what to make of it. I ran your test at
a bunch of commits on my branch. Getting rid of gl_FragColor was the
big win. The GL_EXT_shader_samples_identical patch is still the big
loss. I'm assuming that causes enough register pressure over the
previous commit to lose SIMD16. The following patch just removes the
"do {" and "} while (false)". That gets back some of the losses,
but not all of it.
The raw per-commit data is below.
5c7f974 Android: disable unused-parameter warning
With triangles to fill buffer:
8,419.603882
4,633.834045
2,595.628113
0,795.798157
With glClear:
8,450.937958
4,594.000610
2,641.354553
0,767.400818
89b8c0a meta/blit: Replace the gl_FragColor write
With triangles to fill buffer:
8,597.943054
4,803.083862
2,942.862549
0,1129.432983
With glClear:
8,643.500671
4,980.392151
2,982.704407
0,1116.569946
7025d43 meta/blit: Don't dynamically select the MSAA "merge" function
With triangles to fill buffer:
8,599.412598
4,954.745056
2,933.184021
0,1129.305420
With glClear:
8,582.106079
4,965.810303
2,975.419434
0,1169.180420
b895a4d meta/blit: Use a single, master shader template for MSAA-SS blits
With triangles to fill buffer:
8,591.786011
4,932.835815
2,910.498047
0,1072.501099
With glClear:
8,623.441406
4,952.562378
2,941.442261
0,1081.314819
197e4be meta/blit: Use GL_EXT_shader_samples_identical in MSAA-SS resolve blit
With triangles to fill buffer:
8,559.252869
4,795.355103
2,781.494202
0,941.619568
With glClear:
8,577.400513
4,806.321533
2,811.030029
0,957.579224
ec01613 fix loop nonsense derp
With triangles to fill buffer:
8,559.221558
4,788.767944
2,786.287170
0,1078.632324
With glClear:
8,575.771545
4,956.205750
2,813.206482
0,1111.111084
> I will probably also make a patch to fix the damage done by the do { ...
> } while (false). That's just comedy.
>
>> It seems like a pretty convincing win for the triangle case but the
>> clear case makes it slightly worse. Presumably this is because we don't
>> do anything to detect the value stored in the MCS buffer when doing a
>> fast clear so the fast path isn't taken and the shader being more
>> complicated makes it slower.
>>
>> Not sure if we want to try and do anything about that because
>> potentially the cleared pixels aren't very common in a framebuffer from
>> a real use case so it might not really matter.
>>
>> Currently we don't use SIMD16 for 16x MSAA because we can't allocate the
>> registers well enough to make it worthwhile. This patch makes that
>> problem a bit more interesting because even if we end up spilling a lot
>> it might still be worth doing SIMD16 because the cases where the spilled
>> instructions are hit would be much less common.
>>
>> - Neil
>>
>> Ian Romanick <idr at freedesktop.org> writes:
>>
>>> From: Ian Romanick <ian.d.romanick at intel.com>
>>>
>>> Somewhat surprisingly, this didn't have any affect on performance in the
>>> benchmarks that Martin tried for me.
>>>
>>> Signed-off-by: Ian Romanick <ian.d.romanick at intel.com>
>>> ---
>>> src/mesa/drivers/common/meta_blit.c | 10 +++++++++-
>>> 1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/mesa/drivers/common/meta_blit.c b/src/mesa/drivers/common/meta_blit.c
>>> index 28aabd3..c0ec51f 100644
>>> --- a/src/mesa/drivers/common/meta_blit.c
>>> +++ b/src/mesa/drivers/common/meta_blit.c
>>> @@ -530,6 +530,7 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
>>> fs_source = ralloc_asprintf(mem_ctx,
>>> "#version 130\n"
>>> "#extension GL_ARB_texture_multisample: require\n"
>>> + "#extension GL_EXT_shader_samples_identical: enable\n"
>>> "#define gvec4 %svec4\n"
>>> "uniform %ssampler2DMS%s texSampler;\n"
>>> "in %s texCoords;\n"
>>> @@ -569,7 +570,14 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
>>> " i%s tc = i%s(texCoords);\n"
>>> " int i;\n"
>>> "\n"
>>> - " for (i = 0; i < SAMPLES; i++)\n"
>>> + " S[0] = texelFetch(texSampler, tc, 0);\n"
>>> + "#if defined(GL_EXT_shader_samples_identical) && SAMPLES > 1\n"
>>> + " if (textureSamplesIdenticalEXT(texSampler, tc)) {\n"
>>> + " emit2(S[0]);\n"
>>> + " return;\n"
>>> + " }\n"
>>> + "#endif\n"
>>> + " for (i = 1; i < SAMPLES; i++)\n"
>>> " S[i] = texelFetch(texSampler, tc, i);\n"
>>> "\n"
>>> " REDUCE(s16, s32);\n"
>>> --
>>> 2.5.0
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
More information about the mesa-dev
mailing list