[Intel-gfx] sna: buffer overrun
Mark Kettenis
mark.kettenis at xs4all.nl
Sun Nov 3 23:06:55 CET 2013
> Date: Sun, 3 Nov 2013 19:43:48 +0000
> From: Chris Wilson <chris at chris-wilson.co.uk>
>
> On Sun, Nov 03, 2013 at 01:22:52PM +0100, Mark Kettenis wrote:
> > I ran into a "regression" in xf86-video-intel master. X would spin
> > for several seconds and eventually I'd see a message like:
> >
> > [ 170.724] kgem_bo_write: failed to write 3600 bytes into BO handle=175: 14
> >
> > in Xorg.0.log
> >
> > Bisected it down to the following commit:
> >
> >
> > commit 4f41bf3de059c4e0a03fb161fb2e78d94be69e3f
> > Author: Chris Wilson <chris at chris-wilson.co.uk>
> > Date: Tue Oct 29 09:56:10 2013 +0000
> >
> > sna: Try harder to complete writes
> >
> > Expunge our caches if we fail to write into a bo (presuming that
> > allocation failure is the likely fixable cause).
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >
> >
> > It's obviously trying really really hard to write into that bo now ;).
> > But failing eventually anyway. So I started digging deeper. The
> > pwrite is failing with EFAULT. Some kernel debugging revealed that
> > the fault happens when copying data from user space. Adding some
> > debugging printf's shows that
> >
> > pwrite.data_ptr = 0x1cc19d9831f0
> > pwrite.size = 3648
> >
> > and that the fault happens at address 0x1cc19d984000. This is the
> > same 3600 byte buffer from the message. Obviously pwrite is reading
> > beyond the end of the buffer, running into the next page, which isn't
> > there.
> >
> > Now I'm seeing this on OpenBSD. I'm guessing this is actually a
> > malloc()'ed buffer. And on OpenBSD malloc() is extremely nasty. It
> > tries to align the allocated space such that the end lies right at the
> > end of a page and inserts a guard page. It does this to catch buffer
> > overruns like this. You're much less likely to hit something like
> > this on Linux, unless you're using a special debug malloc library.
> >
> > This problem was introduced with the following commit:
> >
> >
> > commit 95f4da647a4055545b09cae0834df0fa2127a458
> > Author: Chris Wilson <chris at chris-wilson.co.uk>
> > Date: Wed Nov 30 11:59:31 2011 +0000
> >
> > sna: Align pwrite to transfer whole cachelines
> >
> > Daniel claims that this is will be faster, or will be once he has
> > completed rewriting pwrite!
> >
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >
> >
> > I fear that optimization simply isn't safe.
>
> It was. All the local allocations are page-aligned, the issue are the
> couple of places where we upload from other buffers.
>
> diff --git a/src/sna/kgem.c b/src/sna/kgem.c
> index 264a379..1217367 100644
> --- a/src/sna/kgem.c
> +++ b/src/sna/kgem.c
> @@ -482,7 +482,7 @@ bool kgem_bo_write(struct kgem *kgem, struct kgem_bo *bo,
>
> assert(length <= bytes(bo));
> retry:
> - if (gem_write(kgem->fd, bo->handle, 0, length, data)) {
> + if (__gem_write(kgem->fd, bo->handle, 0, length, data)) {
> int err = errno;
>
> assert(err != EINVAL);
Fixed it the same way here, and things seem stable enough. So I guess
that's a:
Tested-by: Mark Kettenis <kettenis at openbsd.org>
Reviewed-by: Mark Kettenis <kettenis at openbsd.org>
Thanks,
Mark
More information about the Intel-gfx
mailing list