[Mesa-dev] [Mesa-stable] [PATCH] i965: Fix buffer overruns in MSAA MCS buffer clearing.

Tue Apr 15 15:34:45 PDT 2014

On Tue, Apr 15, 2014 at 2:16 PM, Eric Anholt <eric at anholt.net> wrote:

> Courtney Goeltzenleuchter <courtney at lunarg.com> writes:
>
> > On Tue, Apr 15, 2014 at 1:18 PM, Eric Anholt <eric at anholt.net> wrote:
> >
> >> Kenneth Graunke <kenneth at whitecape.org> writes:
> >>
> >> > On 04/14/2014 05:33 PM, Eric Anholt wrote:
> >> >> This manifested as rendering failures or sometimes GPU hangs in
> >> >> compositors when they accidentally got MSAA visuals due to a bug in the
> >> >> X Server.  Today we decided that the problem in compositors was
> >> >> equivalent to a corruption bug we'd noticed recently in resizing
> >> >> MSAA-visual glxgears, and debugging got a lot easier.
> >> >>
> >> >> When we allocate our MCS MT, libdrm takes the size we request, aligns it
> >> >> to Y tile size (blowing it up from 300x300=90000 bytes to 384*320=122880
> >> >> bytes, 30 pages), then puts it into a power-of-two-sized BO (131072
> >> >> bytes, 32 pages).  Because it's Y tiled, we attach a 384-byte-stride
> >> >> fence to it.  When we memset by the BO size in Mesa, between bytes 122880
> >> >> and 131072 the data gets stored to the first 20 or so scanlines of each
> >> >> of the 3 tiled pages in that row, even though only 2 of those pages were
> >> >> allocated by libdrm.
> >> >
> >> > What?
> >> >
> >> > I get that drm_intel_bo_alloc/drm_intel_bo_alloc_tiled might return a
> >> > drm_intel_bo where bo->size is larger than what you asked for, due to
> >> > the BO cache.  But...what you're saying is, it doesn't actually allocate
> >> > enough pages to back the whole bo->size it gives you?  So, if you write
> >> > bytes 0..(bo->size - 1), you'll randomly clobber memory in a way that's
> >> > really difficult to detect?
> >>
> >> You have that many pages, really.  But you've attached a fence to it, so
> >> your allocated pages are structured as:
> >>
> >> +---+---+---+
> >> |   |   |   |
> >> +---+---+---+
> >> |   |   |   |
> >> +---+---+---+
> >> |   |   |   |
> >> +---+---+---+
> >> |   |   |
> >> +---+---+
> >>
> >> (except taller in this specific example).  If you hit the pixels in
> >> those quads, you'll be fine.
> >>
> >> >
> >> > There are other places where we memset an entire BO using bo->size.  For
> >> > example, your INTEL_DEBUG=shader_time code does exactly that (though it
> >> > isn't tiled).
> >> >
> >> > Could we change libdrm to set bo->size to the actual usable size of the
> >> > buffer, rather than the bucket size?
> >>
> >> The pages containing pixels you asked for go to 122880, and the BO is
> >> 131072, but the pixels you asked for have a maximum linear address of
> >> 384*300=115200.  Which value are you thinking is the "actual usable
> >> size"?  We certainly shouldn't have been memsetting more pixels than
> >> 115200.
> >>
> >
> > Why not? I understand that it's not useful to touch pixels beyond 115200,
> > but from the data structure, we were given 131072 bytes to work with. Why
> > shouldn't memsetting the entire allocated space be a safe operation?
>
> If you drm_intel_bo_map() and write 131072, you'd be fine.  If you
> drm_intel_bo_map_gtt() on the tiled buffer and your fence readdresses
> your writes beyond 131072, you're not fine.
>

I'm curious, what would it have cost to reserve the pages necessary to
cover both cases?
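
For a rough feel of the numbers, here is a small Python sketch of the
arithmetic as Eric describes it above. It is not taken from Mesa or libdrm;
the function names are made up for illustration, and it assumes 4096-byte
pages, a Y tile of 128 bytes x 32 rows (one page), 1 byte per MCS pixel, and
the power-of-two bucket rounding mentioned in the thread. It only tracks which
tile (page) a fenced linear offset lands in, not the byte layout inside a tile.

```python
PAGE = 4096    # bytes per page (assumed)
TILE_W = 128   # bytes per row of one Y tile
TILE_H = 32    # rows per Y tile; 128 * 32 = one page


def align(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def mcs_sizes(width_bytes, height_rows):
    """Return (stride, rows, tiled_size, bucket_size) for a Y-tiled buffer."""
    stride = align(width_bytes, TILE_W)
    rows = align(height_rows, TILE_H)
    tiled_size = stride * rows
    # Simplification: round up to a power of two, as described in the thread.
    bucket_size = 1 << (tiled_size - 1).bit_length()
    return stride, rows, tiled_size, bucket_size


def fenced_page(offset, stride):
    """Page index that a linear offset lands in when viewed through the fence."""
    tiles_per_row = stride // TILE_W
    tile_x = (offset % stride) // TILE_W
    tile_y = (offset // stride) // TILE_H
    return tile_y * tiles_per_row + tile_x


def pages_needed_for_fenced_range(size, stride):
    """Pages required so every linear offset below size hits a backed tile."""
    rows = align(-(-size // stride), TILE_H)  # ceil(size / stride), tile-aligned
    return stride * rows // PAGE


stride, rows, tiled_size, bucket_size = mcs_sizes(300, 300)
touched = sorted({fenced_page(o, stride) for o in range(tiled_size, bucket_size)})
print(stride, rows, tiled_size, bucket_size)   # 384 320 122880 131072
print(touched, bucket_size // PAGE)            # [30, 31, 32] 32
print(pages_needed_for_fenced_range(bucket_size, stride))  # 33
```

Under those assumptions the tail of the memset touches pages 30, 31 and 32,
while the BO only has pages 0 through 31, and covering the whole fenced range
would take 33 pages for this particular buffer.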

The issue caused by this particular overwrite was hard to pin down. In our
test case the failure was intermittent. We could see that memory was
getting corrupted, but nothing else had run between the last check where
things were good and the point where they went bad (building the dlist in
glxgears).

-- 
Courtney Goeltzenleuchter
LunarG