[Mesa-dev] [PATCH] i965: Fix buffer overruns in MSAA MCS buffer clearing.

Kenneth Graunke kenneth at whitecape.org
Mon Apr 14 19:54:15 PDT 2014


On 04/14/2014 05:33 PM, Eric Anholt wrote:
> This manifested as rendering failures or sometimes GPU hangs in
> compositors when they accidentally got MSAA visuals due to a bug in the X
> Server.  Today we decided that the problem in compositors was equivalent
> to a corruption bug we'd noticed recently in resizing MSAA-visual
> glxgears, and debugging got a lot easier.
> 
> When we allocate our MCS MT, libdrm takes the size we request, aligns it
> to Y tile size (blowing it up from 300x300=900000 bytes to 384*320=122880
> bytes, 30 pages), then puts it into a power-of-two-sized BO (131072 bytes,
> 32 pages).  Because it's Y tiled, we attach a 384-byte-stride fence to it.
> When we memset by the BO size in Mesa, between bytes 122880 and 131072 the
> data gets stored to the first 20 or so scanlines of each of the 3 tiled
> pages in that row, even though only 2 of those pages were allocated by
> libdrm.

What?

I get that drm_intel_bo_alloc/drm_intel_bo_alloc_tiled might return a
drm_intel_bo where bo->size is larger than what you asked for, due to
the BO cache.  But...what you're saying is, it doesn't actually allocate
enough pages to back the whole bo->size it gives you?  So, if you write
bytes 0..(bo->size - 1), you'll randomly clobber memory in a way that's
really difficult to detect?

I never realized that.  It seems pretty scary.

There are other places where we memset an entire BO using bo->size.  For
example, your INTEL_DEBUG=shader_time code does exactly that (though it
isn't tiled).

Could we change libdrm to set bo->size to the actual usable size of the
buffer, rather than the bucket size?  That seems much safer, and would
prevent mistakes like this in the future.  If libdrm needs to know the
bucket size internally, we could store that as a field in the private
drm_intel_gem_bo structure, and not expose it to consumers of the
library.  I can't imagine users of libdrm want the current value.

> In the glxgears case, the missing 3rd page happened to
> consistently be the static VBO that got mapped right after the first MCS
> allocation, so corruption only appeared once window resize made us throw
> out the old MCS and then allocate the same BO to back the new MCS.
> 
> Instead, just memset the amount of data we actually asked libdrm to
> allocate for, which will be smaller (more efficient) and not overrun.
> Thanks go to Kenneth for doing most of the hard debugging to eliminate a
> lot of the search space for the bug.
> 
> Cc: "10.0 10.1" <mesa-stable at lists.freedesktop.org>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77207
> ---
> 
> Ken: how I eventually figured this out was I thought "well, I'm
> clearing the whole surface at the start of rendering glxgears, so I
> don't *really* need to memset because I'll be initializing the whole
> buffer during fallback glClear() anyway, so let me just drop the whole
> memset step to definitely eliminate that as a potential problem.  Huh,
> the problem's gone."
> 
> The worst is I remember briefly thinking about this code when it was
> landing, and thought "nope, seems safe."
> 
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index 5996a1b..59700ed 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -1219,7 +1219,7 @@ intel_miptree_alloc_mcs(struct brw_context *brw,
>      * Note: the clear value for MCS buffers is all 1's, so we memset to 0xff.
>      */
>     void *data = intel_miptree_map_raw(brw, mt->mcs_mt);
> -   memset(data, 0xff, mt->mcs_mt->region->bo->size);
> +   memset(data, 0xff, mt->mcs_mt->region->height * mt->mcs_mt->region->pitch);
>     intel_miptree_unmap_raw(brw, mt->mcs_mt);
>     mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR;

This does seem to fix the KWin problem, as well as the glxgears problem.

I agree this is the correct amount of data to memset, and even if we
make the libdrm change I suggested, this seems worth doing.  bo->size
may have been rounded up beyond what we need, and memsetting that extra
space is wasteful (even if it did work).

Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

Thanks a ton for your help on this, Eric.  I was really stumped.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: OpenPGP digital signature
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20140414/6d555b0c/attachment-0001.sig>


More information about the mesa-dev mailing list