[Mesa-dev] [PATCH] st/mesa: expose EXT_framebuffer_multisample_blit_scaled if MSAA is supported

Tue Jul 16 18:09:41 PDT 2013

Evergreen and later chipsets can sample from compressed colorbuffers.
Cayman and later chipsets cannot even decompress them. On those
chipsets, the decompression code only converts the CMASK+FMASK combo
to a texturable FMASK.

Marek

On Wed, Jul 17, 2013 at 2:52 AM, Grigori Goronzy <greg at chown.ath.cx> wrote:
> On 17.07.2013 02:05, Marek Olšák wrote:
>>
>> No, it's not faster, but it's not slower either.
>>
>> Now that I think about it, I can't come up with a good shader-based
>> algorithm for the resolve operation.
>>
>> I don't think Christoph's approach that an MSAA texture can be viewed
>> as a larger single-sample texture is correct, because the physical
>> locations of the samples in memory usually do not correspond to the
>> sample locations the 3D engine used for rasterization. So fetching a
>> texel from the larger texture at (x,y) physical coordinates won't
>> always return the closest rasterized sample at those coordinates. Also
>> the bilinear filter would be horrible in this case, because it only
>> takes 4 samples per pixel.
>>
>> Now let's consider implementing the scaled resolve operation in the
>> shader by texelFetch-ing all samples and using a bilinear filter. For
>> Nx MSAA, there would be N*4 texel fetches per pixel; in comparison,
>> separate resolve+blit needs only N+4 texel fetches per pixel. In
>> addition to that, the resolve is a special fixed-function blending
>> operation and the fragment shader is not even executed. See? Separate
>> resolve+blit beats everything.
>>
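For reference, the fetch counts in the comparison above can be tallied with a small Python sketch. The function names and the fixed 4-tap bilinear footprint are illustrative assumptions; nothing here is taken from Mesa or r600g code.

```python
# Hypothetical helpers: count texel fetches per destination pixel for Nx MSAA.
BILINEAR_TAPS = 4  # assumption: a bilinear filter reads a 2x2 footprint


def fetches_single_pass(n_samples: int, taps: int = BILINEAR_TAPS) -> int:
    """Shader-based scaled resolve: each bilinear tap reads all N samples."""
    return n_samples * taps


def fetches_resolve_then_blit(n_samples: int, taps: int = BILINEAR_TAPS) -> int:
    """Separate passes: N sample reads for the resolve, then 4 for the blit."""
    return n_samples + taps


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(f"{n}x MSAA: single pass = {fetches_single_pass(n)}, "
              f"resolve+blit = {fetches_resolve_then_blit(n)}")
```

The gap widens with the sample count: 8 vs 6 fetches at 2x, 16 vs 8 at 4x, and 32 vs 12 at 8x.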
>
> AFAICS the point of the spec is that it allows cheaper approximations that
> don't use all texels and it allows the implementation to avoid writes to a
> temp texture, both to save memory bandwidth. I am not sure if it is
> reasonably possible to do this (without causing aliasing). How does scaled
> blit on Intel hardware perform compared to resolve+blit? Maybe it helps on
> bandwidth-constrained GPU configurations.
>
> In terms of memory bandwidth per pixel, resolve+blit needs N reads and 1
> write for the resolve step and 1 read for the blit step. If we assume 100%
> hit rate for the texture cache, scaled blit needs only N reads and that's
> it. So in theory it may work. OTOH, compressed colorbuffers and fast clear
> that are used by r600g should reduce actual bandwidth requirements for the
> resolve step a lot. And we cannot take advantage of the compression when
> we're sampling from colorbuffers. I probably just answered this myself:
> resolve+blit is easier and better at least on Radeon hardware. :)
>
> Grigori
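
The per-pixel bandwidth estimate quoted above can be written out the same way. This is only a rough sketch of that arithmetic: the function names are made up for illustration, the scaled-blit figure assumes the 100% texture cache hit rate mentioned in the quote, and the model ignores colorbuffer compression and fast clear, which would lower the real cost of the resolve step.

```python
# Hypothetical model: memory transactions per pixel for Nx MSAA.


def traffic_resolve_then_blit(n_samples: int) -> int:
    """N reads + 1 write for the resolve, plus 1 read for the blit."""
    resolve_reads = n_samples
    resolve_writes = 1
    blit_reads = 1
    return resolve_reads + resolve_writes + blit_reads


def traffic_scaled_blit(n_samples: int) -> int:
    """N reads only, assuming every repeated fetch hits the texture cache."""
    return n_samples


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(f"{n}x MSAA: resolve+blit = {traffic_resolve_then_blit(n)}, "
              f"scaled blit = {traffic_scaled_blit(n)}")
```

Under these assumptions the scaled blit saves a constant 2 transactions per pixel, so the relative saving shrinks as the sample count grows.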