[Mesa-dev] [PATCH] i965: Fix buffer overruns in MSAA MCS buffer clearing.

Tue Apr 15 15:45:08 PDT 2014

On 04/15/2014 12:18 PM, Eric Anholt wrote:
> Kenneth Graunke <kenneth at whitecape.org> writes:
> 
>> On 04/14/2014 05:33 PM, Eric Anholt wrote:
>>> This manifested as rendering failures or sometimes GPU hangs in
>>> compositors when they accidentally got MSAA visuals due to a bug in the X
>>> Server.  Today we decided that the problem in compositors was equivalent
>>> to a corruption bug we'd noticed recently in resizing MSAA-visual
>>> glxgears, and debugging got a lot easier.
>>>
>>> When we allocate our MCS MT, libdrm takes the size we request, aligns it
>>> to Y tile size (blowing it up from 300x300=90000 bytes to 384*320=122880
>>> bytes, 30 pages), then puts it into a power-of-two-sized BO (131072 bytes,
>>> 32 pages).  Because it's Y tiled, we attach a 384-byte-stride fence to it.
>>> When we memset by the BO size in Mesa, between bytes 122880 and 131072 the
>>> data gets stored to the first 20 or so scanlines of each of the 3 tiled
>>> pages in that row, even though only 2 of those pages were allocated by
>>> libdrm.
>>
>> What?
>>
>> I get that drm_intel_bo_alloc/drm_intel_bo_alloc_tiled might return a
>> drm_intel_bo where bo->size is larger than what you asked for, due to
>> the BO cache.  But...what you're saying is, it doesn't actually allocate
>> enough pages to back the whole bo->size it gives you?  So, if you write
>> bytes 0..(bo->size - 1), you'll randomly clobber memory in a way that's
>> really difficult to detect?
> 
> You have that many pages, really.  But you've attached a fence to it, so
> your allocated pages are structured as:
> 
> +---+---+---+
> |   |   |   |
> +---+---+---+
> |   |   |   |
> +---+---+---+
> |   |   |   |
> +---+---+---+
> |   |   |
> +---+---+
> 
> (except taller in this specific example).  If you hit the pixels in
> those quads, you'll be fine.
> 
>>
>> There are other places where we memset an entire BO using bo->size.  For
>> example, your INTEL_DEBUG=shader_time code does exactly that (though it
>> isn't tiled).
>>
>> Could we change libdrm to set bo->size to the actual usable size of the
>> buffer, rather than the bucket size?
> 
> The pages containing pixels you asked for go to 122880, and the BO is
> 131072, but the pixels you asked for have a maximum linear address of
> 384*300=115200.  Which value are you thinking is the "actual usable
> size"?  We certainly shouldn't have been memsetting more pixels than
> 115200.

115200, I guess - the maximum linear address.  Because if I map it
through a fence, and try to access beyond that, the tiling may put it
beyond the page containing pixels I asked for, which is dangerous.
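
To make the arithmetic concrete, here is a rough sketch of where a linear
offset lands when it goes through the fence.  It assumes a Y tile is 128 bytes
wide by 32 rows (one 4096-byte page) and that tiles are laid out row-major;
align_up() and fenced_page() are hypothetical helpers for illustration, not
libdrm or Mesa functions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed Y-tile geometry: 128 bytes wide, 32 rows tall, one 4096-byte page. */
enum { TILE_W = 128, TILE_H = 32, PAGE_BYTES = 4096 };

/* Round v up to the next multiple of a (hypothetical helper). */
static size_t align_up(size_t v, size_t a)
{
    assert(a > 0);
    return (v + a - 1) / a * a;
}

/*
 * Index of the page that a linear offset lands in when the buffer is
 * accessed through a fence with the given pitch (hypothetical helper).
 * Tiles are assumed to be stored row-major, one page per tile.
 */
static size_t fenced_page(size_t offset, size_t pitch)
{
    assert(pitch >= TILE_W && pitch % TILE_W == 0);
    size_t tile_col = (offset % pitch) / TILE_W;  /* which tile across */
    size_t tile_row = (offset / pitch) / TILE_H;  /* which row of tiles */
    return tile_row * (pitch / TILE_W) + tile_col;
}
```

With a 384-byte pitch, the last byte of the 300 requested rows (offset 115199)
lands in page 29, the last of the 30 aligned pages.  Offset 122880 + 383, which
is still below bo->size, lands in page 32, one past the 32 pages (0..31) that
back the BO.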

Maybe it doesn't need to change---really, I'm wondering how to prevent
this kind of problem in the future, since it seems to be an easy mistake
to make, and I know a bunch of us read that code many times and didn't
spot the problem.

Perhaps there's a way to teach valgrind about the maximum linear address
when mapping through a fence, so it can complain about accesses beyond
that?  While that wouldn't prevent us from writing bad code, it would
certainly help detect it.
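
Short of tool support, one cheap guard would be to size clears from the pitch
and the requested height instead of bo->size.  A minimal sketch, assuming the
caller still knows both values; clear_fenced() is a hypothetical helper, not
an existing Mesa or libdrm function:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Clear only the linear range covered by the requested rows, never the
 * whole BO (hypothetical helper).  Returns the number of bytes cleared.
 */
static size_t clear_fenced(void *map, size_t bo_size,
                           size_t pitch, size_t height, int value)
{
    size_t len = pitch * height;   /* maximum linear address */
    assert(len <= bo_size);        /* the range must fit inside the BO */
    memset(map, value, len);
    return len;
}
```

For the case in this thread, pitch 384 and height 300 give a 115200-byte clear
on a 131072-byte BO.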

--Ken
