[Mesa-stable] [Mesa-dev] [PATCH] i965: Fix buffer overruns in MSAA MCS buffer clearing.

Tue Apr 15 10:08:54 PDT 2014

I would go farther than requiring the returned bo->size to be covered by
backing pages.
There really should be backing pages for every page mapped by
drm_intel_gem_bo_map.
No mapped addresses should be affecting memory outside of an object's
backing pages.

If tiling can result in access to unallocated memory it might even lead to
corruption of data in pages
used by different processes.  That would break GL_ARB_robustness
requirements.

On Tue, Apr 15, 2014 at 10:43 AM, Courtney Goeltzenleuchter <
courtney at lunarg.com> wrote:

> What timing, we were just looking at this exact issue - intermittent
> glxgears rendering issues when using multisample buffers.
>
> What's the plan for DRM? Seems like it's broken if writing to the buffer
> size indicated (bo->size) causes us to clobber other BOs.
>
> Courtney
>
>
> On Mon, Apr 14, 2014 at 8:54 PM, Kenneth Graunke <kenneth at whitecape.org>wrote:
>
>> On 04/14/2014 05:33 PM, Eric Anholt wrote:
>> > This manifested as rendering failures or sometimes GPU hangs in
>> > compositors when they accidentally got MSAA visuals due to a bug in the
>> X
>> > Server.  Today we decided that the problem in compositors was equivalent
>> > to a corruption bug we'd noticed recently in resizing MSAA-visual
>> > glxgears, and debugging got a lot easier.
>> >
>> > When we allocate our MCS MT, libdrm takes the size we request, aligns it
>> > to Y tile size (blowing it up from 300x300=900000 bytes to
>> 384*320=122880
>> > bytes, 30 pages), then puts it into a power-of-two-sized BO (131072
>> bytes,
>> > 32 pages).  Because it's Y tiled, we attach a 384-byte-stride fence to
>> it.
>> > When we memset by the BO size in Mesa, between bytes 122880 and 131072
>> the
>> > data gets stored to the first 20 or so scanlines of each of the 3 tiled
>> > pages in that row, even though only 2 of those pages were allocated by
>> > libdrm.
>>
>> What?
>>
>> I get that drm_intel_bo_alloc/drm_intel_bo_alloc_tiled might return a
>> drm_intel_bo where bo->size is larger than what you asked for, due to
>> the BO cache.  But...what you're saying is, it doesn't actually allocate
>> enough pages to back the whole bo->size it gives you?  So, if you write
>> bytes 0..(bo->size - 1), you'll randomly clobber memory in a way that's
>> really difficult to detect?
>>
>> I never realized that.  It seems pretty scary.
>>
>> There are other places where we memset an entire BO using bo->size.  For
>> example, your INTEL_DEBUG=shader_time code does exactly that (though it
>> isn't tiled).
>>
>> Could we change libdrm to set bo->size to the actual usable size of the
>> buffer, rather than the bucket size?  That seems much safer, and would
>> prevent mistakes like this in the future.  If libdrm needs to know the
>> bucket size internally, we could store that as a field in the private
>> drm_intel_gem_bo structure, and not expose it to consumers of the
>> library.  I can't imagine users of libdrm want the current value.
>>
>> > In the glxgears case, the missing 3rd page happened to
>> > consistently be the static VBO that got mapped right after the first MCS
>> > allocation, so corruption only appeared once window resize made us throw
>> > out the old MCS and then allocate the same BO to back the new MCS.
>> >
>> > Instead, just memset the amount of data we actually asked libdrm to
>> > allocate for, which will be smaller (more efficient) and not overrun.
>> > Thanks go to Kenneth for doing most of the hard debugging to eliminate a
>> > lot of the search space for the bug.
>> >
>> > Cc: "10.0 10.1" <mesa-stable at lists.freedesktop.org>
>> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77207
>> > ---
>> >
>> > Ken: how I eventually figured this out was I thought "well, I'm
>> > clearing the whole surface at the start of rendering glxgears, so I
>> > don't *really* need to memset because I'll be initializing the whole
>> > buffer during fallback glClear() anyway, so let me just drop the whole
>> > memset step to definitely eliminate that as a potential problem.  Huh,
>> > the problem's gone."
>> >
>> > The worst is I remember briefly thinking about this code when it was
>> > landing, and thought "nope, seems safe."
>> >
>> >  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
>> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
>> > index 5996a1b..59700ed 100644
>> > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
>> > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
>> > @@ -1219,7 +1219,7 @@ intel_miptree_alloc_mcs(struct brw_context *brw,
>> >      * Note: the clear value for MCS buffers is all 1's, so we memset
>> to 0xff.
>> >      */
>> >     void *data = intel_miptree_map_raw(brw, mt->mcs_mt);
>> > -   memset(data, 0xff, mt->mcs_mt->region->bo->size);
>> > +   memset(data, 0xff, mt->mcs_mt->region->height *
>> mt->mcs_mt->region->pitch);
>> >     intel_miptree_unmap_raw(brw, mt->mcs_mt);
>> >     mt->fast_clear_state = INTEL_FAST_CLEAR_STATE_CLEAR;
>>
>> This does seem to fix the KWin problem, as well as the glxgears problem.
>>
>> I agree this is the correct amount of data to memset, and even if we
>> make the libdrm change I suggested, this seems worth doing.  bo->size
>> may have been rounded up beyond what we need, and memsetting that extra
>> space is wasteful (even if it did work).
>>
>> Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
>>
>> Thanks a ton for your help on this, Eric.  I was really stumped.
>>
>>
>> _______________________________________________
>> mesa-stable mailing list
>> mesa-stable at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-stable
>>
>>
>
>
> --
> Courtney Goeltzenleuchter
> LunarG
>
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
>

-- 

 Mike Stroyan - Software Architect
 LunarG, Inc.  - The Graphics Experts
 Cell:  (970) 219-7905
 Email: Mike at LunarG.com
 Website: http://www.lunarg.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-stable/attachments/20140415/25635c49/attachment.html>