[Mesa-stable] [Mesa-dev] [PATCH] i965: Fix buffer overruns in MSAA MCS buffer clearing.

Tue Apr 15 21:06:24 PDT 2014

On Wed, Apr 16, 2014 at 3:18 AM, Eric Anholt <eric at anholt.net> wrote:
> Kenneth Graunke <kenneth at whitecape.org> writes:
>
>> On 04/14/2014 05:33 PM, Eric Anholt wrote:
>>> This manifested as rendering failures or sometimes GPU hangs in
>>> compositors when they accidentally got MSAA visuals due to a bug in the X
>>> Server.  Today we decided that the problem in compositors was equivalent
>>> to a corruption bug we'd noticed recently in resizing MSAA-visual
>>> glxgears, and debugging got a lot easier.
>>>
>>> When we allocate our MCS MT, libdrm takes the size we request, aligns it
>>> to Y tile size (blowing it up from 300x300=900000 bytes to 384*320=122880
>>> bytes, 30 pages), then puts it into a power-of-two-sized BO (131072 bytes,
>>> 32 pages).  Because it's Y tiled, we attach a 384-byte-stride fence to it.
>>> When we memset by the BO size in Mesa, between bytes 122880 and 131072 the
>>> data gets stored to the first 20 or so scanlines of each of the 3 tiled
>>> pages in that row, even though only 2 of those pages were allocated by
>>> libdrm.
>>
>> What?
>>
>> I get that drm_intel_bo_alloc/drm_intel_bo_alloc_tiled might return a
>> drm_intel_bo where bo->size is larger than what you asked for, due to
>> the BO cache.  But...what you're saying is, it doesn't actually allocate
>> enough pages to back the whole bo->size it gives you?  So, if you write
>> bytes 0..(bo->size - 1), you'll randomly clobber memory in a way that's
>> really difficult to detect?
>
> You have that many pages, really.  But you've attached a fence to it, so
> your allocated pages are structured as:
>
> +---+---+---+
> |   |   |   |
> +---+---+---+
> |   |   |   |
> +---+---+---+
> |   |   |   |
> +---+---+---+
> |   |   |
> +---+---+
>
> (except taller in this specific example).  If you hit the pixels in
> those quads, you'll be fine.
>
>>
>> There are other places where we memset an entire BO using bo->size.  For
>> example, your INTEL_DEBUG=shader_time code does exactly that (though it
>> isn't tiled).
>>
>> Could we change libdrm to set bo->size to the actual usable size of the
>> buffer, rather than the bucket size?
>
> The pages containing pixels you asked for go to 122880, and the BO is
> 131072, but the pixels you asked for have a maximum linear address of
> 384*320=115200.  Which value are you thinking is the "actual usable
> size"?  We certainly shouldn't have been memsetting more pixels than
> 115200.
384*320 is 122880.  It  feels like bo->size could be 122880, and
131072 could be stored elsewhere in bo_gem.

With that change assumed, do you think it makes sense to add

        if (tiling_mode != I915_TILING_NONE && bo->size % stride)
           fprintf(stderr, "bo size is not a multiple of stride\n");

to drm_intel_gem_bo_set_tiling_internal?  That is, emit a warning when

  drm_intel_gem_bo_map_gtt(bo);
  memset(bo->virtual, 0, bo->size);

is known to explode.

>
> _______________________________________________
> mesa-stable mailing list
> mesa-stable at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-stable
>

-- 
olv at LunarG.com