[Intel-gfx] sna: buffer overrun
Chris Wilson
chris at chris-wilson.co.uk
Sun Nov 3 20:43:48 CET 2013
On Sun, Nov 03, 2013 at 01:22:52PM +0100, Mark Kettenis wrote:
> I ran into a "regression" in xf86-video-intel master. X would spin
> for several seconds and eventually I'd see a message like:
>
> [ 170.724] kgem_bo_write: failed to write 3600 bytes into BO handle=175: 14
>
> in Xorg.0.log
>
> Bisected it down to the following commit:
>
>
> commit 4f41bf3de059c4e0a03fb161fb2e78d94be69e3f
> Author: Chris Wilson <chris at chris-wilson.co.uk>
> Date: Tue Oct 29 09:56:10 2013 +0000
>
> sna: Try harder to complete writes
>
> Expunge our caches if we fail to write into a bo (presuming that
> allocation failure is the likely fixable cause).
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>
>
> It's obviously trying really really hard to write into that bo now ;).
> But failing eventually anyway. So I started digging deeper. The
> pwrite is failing with EFAULT. Some kernel debugging revealed that
> the fault happens when copying data from user space. Adding some
> debugging printf's shows that
>
> pwrite.data_ptr = 0x1cc19d9831f0
> pwrite.size = 3648
>
> and that the fault happens at address 0x1cc19d984000. This is the
> same 3600 byte buffer from the message. Obviously pwrite is reading
> beyond the end of the buffer, running into the next page, which isn't
> there.
>
> Now I'm seeing this on OpenBSD. I'm guessing this is actually a
> malloc()'ed buffer. And on OpenBSD malloc() is extremely nasty. It
> tries to align the allocated space such that the end lies right at the
> end of a page and inserts a guard page. It does this to catch buffer
> overruns like this. You're much less likely to hit something like
> this on Linux, unless you're using a special debug malloc library.
>
> This problem was introduced with the following commit:
>
>
> commit 95f4da647a4055545b09cae0834df0fa2127a458
> Author: Chris Wilson <chris at chris-wilson.co.uk>
> Date: Wed Nov 30 11:59:31 2011 +0000
>
> sna: Align pwrite to transfer whole cachelines
>
> Daniel claims that this is will be faster, or will be once he has
> completed rewriting pwrite!
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>
>
> I fear that optimization simply isn't safe.
It was. All the local allocations are page-aligned, the issue are the
couple of places where we upload from other buffers.
diff --git a/src/sna/kgem.c b/src/sna/kgem.c
index 264a379..1217367 100644
--- a/src/sna/kgem.c
+++ b/src/sna/kgem.c
@@ -482,7 +482,7 @@ bool kgem_bo_write(struct kgem *kgem, struct kgem_bo *bo,
assert(length <= bytes(bo));
retry:
- if (gem_write(kgem->fd, bo->handle, 0, length, data)) {
+ if (__gem_write(kgem->fd, bo->handle, 0, length, data)) {
int err = errno;
assert(err != EINVAL);
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
More information about the Intel-gfx
mailing list