[Mesa-stable] [Mesa-dev] [PATCH] i965: Fix buffer overruns in MSAA MCS buffer clearing.

Tue Apr 15 09:43:31 PDT 2014

What timing, we were just looking at this exact issue - intermittent
glxgears rendering issues when using multisample buffers.

What's the plan for DRM? Seems like it's broken if writing to the buffer
size indicated (bo->size) causes us to clobber other BOs.

Courtney

On Mon, Apr 14, 2014 at 8:54 PM, Kenneth Graunke <kenneth at whitecape.org>wrote:

> On 04/14/2014 05:33 PM, Eric Anholt wrote:
> > This manifested as rendering failures or sometimes GPU hangs in
> > compositors when they accidentally got MSAA visuals due to a bug in the X
> > Server.  Today we decided that the problem in compositors was equivalent
> > to a corruption bug we'd noticed recently in resizing MSAA-visual
> > glxgears, and debugging got a lot easier.
> >
> > When we allocate our MCS MT, libdrm takes the size we request, aligns it
> > to Y tile size (blowing it up from 300x300=900000 bytes to 384*320=122880
> > bytes, 30 pages), then puts it into a power-of-two-sized BO (131072
> bytes,
> > 32 pages).  Because it's Y tiled, we attach a 384-byte-stride fence to
> it.
> > When we memset by the BO size in Mesa, between bytes 122880 and 131072
> the
> > data gets stored to the first 20 or so scanlines of each of the 3 tiled
> > pages in that row, even though only 2 of those pages were allocated by
> > libdrm.
>
> What?
>
> I get that drm_intel_bo_alloc/drm_intel_bo_alloc_tiled might return a
> drm_intel_bo where bo->size is larger than what you asked for, due to
> the BO cache.  But...what you're saying is, it doesn't actually allocate
> enough pages to back the whole bo->size it gives you?  So, if you write
> bytes 0..(bo->size - 1), you'll randomly clobber memory in a way that's
> really difficult to detect?
>
> I never realized that.  It seems pretty scary.
>
> There are other places where we memset an entire BO using bo->size.  For
> example, your INTEL_DEBUG=shader_time code does exactly that (though it
> isn't tiled).
>
> Could we change libdrm to set bo->size to the actual usable size of the
> buffer, rather than the bucket size?  That seems much safer, and would
> prevent mistakes like this in the future.  If libdrm needs to know the
> bucket size internally, we could store that as a field in the private
> drm_intel_gem_bo structure, and not expose it to consumers of the
> library.  I can't imagine users of libdrm want the current value.
>
> > In the glxgears case, the missing 3rd page happened to
> > consistently be the static VBO that got mapped right after the first MCS
> > allocation, so corruption only appeared once window resize made us throw
> > out the old MCS and then allocate the same BO to back the new MCS.
> >
> > Instead, just memset the amount of data we actually asked libdrm to
> > allocate for, which will be smaller (more efficient) and not overrun.
> > Thanks go to Kenneth for doing most of the hard debugging to eliminate a
> > lot of the search space for the bug.
> >
> > Cc: "10.0 10.1" <mesa-stable at lists.freedesktop.org>
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77207
> > ---
> >
> > Ken: how I eventually figured this out was I thought "well, I'm
> > clearing the whole surface at the start of rendering glxgears, so I
> > don't *really* need to memset because I'll be initializing the whole
> > buffer during fallback glClear() anyway, so let me just drop the whole
> > memset step to definitely eliminate that as a potential problem.  Huh,
> > the problem's gone."
> >
> > The worst is I remember briefly thinking about this code when it was
> > landing, and thought "nope, seems safe."
> >
> >  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> > index 5996a1b..59700ed 100644
> > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> > @@ -1219,7 +1219,7 @@ intel_miptree_alloc_mcs(struct brw_context *brw,
> >      * Note: the clear value for MCS buffers is all 1's, so we memset to
> 0xff.
> >      */
> >     void *data = intel_miptree_map_raw(brw, mt->mcs_mt);
> > -   memset(data, 0xff, mt->mcs_mt->region->bo->size);
> > +   memset(data, 0xff, mt->mcs_mt->region->height *
> mt->mcs_mt->region->pitch);
> >     intel_miptree_unmap_raw(brw, mt->mcs_mt);
> >     mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR;
>
> This does seem to fix the KWin problem, as well as the glxgears problem.
>
> I agree this is the correct amount of data to memset, and even if we
> make the libdrm change I suggested, this seems worth doing.  bo->size
> may have been rounded up beyond what we need, and memsetting that extra
> space is wasteful (even if it did work).
>
> Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
>
> Thanks a ton for your help on this, Eric.  I was really stumped.
>
>
> _______________________________________________
> mesa-stable mailing list
> mesa-stable at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-stable
>
>

-- 
Courtney Goeltzenleuchter
LunarG
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-stable/attachments/20140415/bcdf77dd/attachment-0001.html>