[Mesa-dev] [PATCH 22/22] meta/blit: Use GL_EXT_shader_samples_identical in MSAA-SS resolve blit
Ian Romanick
idr at freedesktop.org
Thu Feb 18 21:47:04 UTC 2016
On 02/18/2016 08:44 AM, Neil Roberts wrote:
> I made a pathological test case (attached) which repeatedly renders into
> an MSAA FBO and then blits it to the screen and measures the framerate.
> It checks it with a range of different sample counts. The rendering is
> done either by rendering two triangles to fill the framebuffer or by
> calling glClear.
>
> The percentage change in framerate after applying the patch is as
> follows:
>
> With triangles to fill buffer:
>
> samples  framerate change
> 16       62.27%
> 8        48.59%
> 4        27.72%
> 2        5.34%
> 0        0.58%
>
> With glClear:
>
> samples  framerate change
> 16       -5.20%
> 8        -7.08%
> 4        -2.45%
> 2        -20.76%
> 0        3.71%

I find this result interesting. In the samples=0 case, the before and
after shaders should be identical. I dug into this a bit, and I think
the problem is the previous patch. Using do { ... } while (false) in
the macro makes the compiler generate some really, really bad code.
Just deleting those two lines makes a huge difference.

With do { ... } while (false):

    SIMD8 shader: 228 instructions. 4 loops. 1554 cycles. 0:0 spills:fills.
    Promoted 0 constants. Compacted 3648 to 2192 bytes (40%)

Without:

    SIMD8 shader: 159 instructions. 0 loops. 380 cycles. 0:0 spills:fills.
    Promoted 0 constants. Compacted 2544 to 1600 bytes (37%)
    SIMD16 shader: 159 instructions. 0 loops. 898 cycles. 0:0 spills:fills.
    Promoted 0 constants. Compacted 2544 to 1600 bytes (37%)

This is for samples=16 and sampler2DMS. The cycle counts are bogus
because each of the 4 loops has its cycle count statically multiplied
by 10... even though each loop body will only execute once.

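
To make the inflation concrete, here is a toy model in C of a static
estimate that weights every loop body by a fixed assumed trip count.
This is only a sketch of the arithmetic, not the i965 backend's actual
scheduling code; the function name, its parameters, and any numbers fed
to it are hypothetical.

```c
/* Toy static cycle estimate (hypothetical, for illustration only):
 * straight-line cycles plus each loop body multiplied by an assumed
 * trip count, since the real trip count is not known statically. */
static unsigned
estimate_cycles(unsigned straight_cycles,
                const unsigned *loop_body_cycles,
                unsigned num_loops,
                unsigned assumed_trips)
{
   unsigned total = straight_cycles;

   for (unsigned i = 0; i < num_loops; i++)
      total += loop_body_cycles[i] * assumed_trips;

   return total;
}
```

With four made-up 30-cycle loop bodies and 100 straight-line cycles, an
assumed trip count of 10 reports 1300 cycles, while a single trip
through each body would cost 220.
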
So... I'll try some more experiments. I also wonder about changing the
SAMPLES > 1 to SAMPLES > 2.

I will probably also make a patch to fix the damage done by the
do { ... } while (false). That's just comedy.

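
For anyone not familiar with the idiom, here is a sketch in plain C
(not the GLSL that meta_blit.c generates) of a multi-statement macro
wrapped in do { ... } while (0) next to a brace-only version. The macro
names and bodies are hypothetical, since the macro from the previous
patch is not quoted in this message.

```c
/* Hypothetical two-statement macro wrapped in a single-trip loop so
 * that it can be used like one statement followed by a semicolon. */
#define AVG_PAIR_LOOP(dst, a, b) \
   do {                          \
      float sum_ = (a) + (b);    \
      (dst) = sum_ / 2.0f;       \
   } while (0)

/* Same body in a bare block: no loop construct for the compiler to
 * analyze, but the caller must not add a trailing semicolon before an
 * else. */
#define AVG_PAIR_BLOCK(dst, a, b) \
   {                              \
      float sum_ = (a) + (b);     \
      (dst) = sum_ / 2.0f;        \
   }

static float
avg_with_loop(float a, float b)
{
   float r;
   AVG_PAIR_LOOP(r, a, b);
   return r;
}

static float
avg_with_block(float a, float b)
{
   float r;
   AVG_PAIR_BLOCK(r, a, b)
   return r;
}
```

Both functions compute the same value. A C compiler drops the
single-trip loop without trouble; the shader statistics above suggest
the GLSL path keeps it as a real loop.
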
> It seems like a pretty convincing win for the triangle case, but the
> clear case gets slightly worse. Presumably this is because we don't do
> anything to detect the value that a fast clear stores in the MCS
> buffer, so the fast path isn't taken and the more complicated shader
> just runs slower.
>
> Not sure if we want to try and do anything about that because
> potentially the cleared pixels aren't very common in a framebuffer from
> a real use case so it might not really matter.
>
> Currently we don't use SIMD16 for 16x MSAA because we can't allocate the
> registers well enough to make it worthwhile. This patch makes that
> problem a bit more interesting because even if we end up spilling a lot
> it might still be worth doing SIMD16 because the cases where the spilled
> instructions are hit would be much less common.
>
> - Neil
>
>
>
>
> Ian Romanick <idr at freedesktop.org> writes:
>
>> From: Ian Romanick <ian.d.romanick at intel.com>
>>
>> Somewhat surprisingly, this didn't have any effect on performance in the
>> benchmarks that Martin tried for me.
>>
>> Signed-off-by: Ian Romanick <ian.d.romanick at intel.com>
>> ---
>> src/mesa/drivers/common/meta_blit.c | 10 +++++++++-
>> 1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/mesa/drivers/common/meta_blit.c b/src/mesa/drivers/common/meta_blit.c
>> index 28aabd3..c0ec51f 100644
>> --- a/src/mesa/drivers/common/meta_blit.c
>> +++ b/src/mesa/drivers/common/meta_blit.c
>> @@ -530,6 +530,7 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
>> fs_source = ralloc_asprintf(mem_ctx,
>> "#version 130\n"
>> "#extension GL_ARB_texture_multisample: require\n"
>> + "#extension GL_EXT_shader_samples_identical: enable\n"
>> "#define gvec4 %svec4\n"
>> "uniform %ssampler2DMS%s texSampler;\n"
>> "in %s texCoords;\n"
>> @@ -569,7 +570,14 @@ setup_glsl_msaa_blit_shader(struct gl_context *ctx,
>> " i%s tc = i%s(texCoords);\n"
>> " int i;\n"
>> "\n"
>> - " for (i = 0; i < SAMPLES; i++)\n"
>> + " S[0] = texelFetch(texSampler, tc, 0);\n"
>> + "#if defined(GL_EXT_shader_samples_identical) && SAMPLES > 1\n"
>> + " if (textureSamplesIdenticalEXT(texSampler, tc)) {\n"
>> + " emit2(S[0]);\n"
>> + " return;\n"
>> + " }\n"
>> + "#endif\n"
>> + " for (i = 1; i < SAMPLES; i++)\n"
>> " S[i] = texelFetch(texSampler, tc, i);\n"
>> "\n"
>> " REDUCE(s16, s32);\n"
>> --
>> 2.5.0
>>
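
The fast path in the quoted patch can be modelled on the CPU roughly as
follows. This is a sketch under assumptions, not the shader itself: the
function and parameter names are hypothetical, identical_hint stands in
for the result of textureSamplesIdenticalEXT(), and a plain average
replaces the shader's pairwise REDUCE steps. As I read the extension, a
false result does not guarantee that the samples differ, so the slow
path must still give the right answer for identical samples.

```c
#include <stdbool.h>
#include <stddef.h>

/* Resolve one multisampled pixel to a single value.
 *
 * samples        - the per-sample values for this pixel
 * count          - number of samples (must be at least 1)
 * identical_hint - hypothetical stand-in for the shader query; true
 *                  only if all samples are known to be identical
 * fetches        - receives how many samples were actually read
 */
static float
resolve_pixel(const float *samples, size_t count, bool identical_hint,
              size_t *fetches)
{
   /* Sample 0 is always fetched, as in the patched shader. */
   float sum = samples[0];
   *fetches = 1;

   /* Fast path: one fetch is enough when all samples match. */
   if (count > 1 && identical_hint)
      return samples[0];

   /* Slow path: fetch the remaining samples and average them. */
   for (size_t i = 1; i < count; i++) {
      sum += samples[i];
      (*fetches)++;
   }

   return sum / (float) count;
}
```

The fetch counter is only there to show where the saving comes from: a
16x pixel with identical samples costs one fetch instead of sixteen.
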