[Intel-gfx] sna: buffer overrun

Mark Kettenis kettenis at openbsd.org
Sun Nov 3 13:22:52 CET 2013


I ran into a "regression" in xf86-video-intel master.  X would spin
for several seconds and eventually I'd see a message like:

[   170.724] kgem_bo_write: failed to write 3600 bytes into BO handle=175: 14

in Xorg.0.log.

Bisected it down to the following commit:


commit 4f41bf3de059c4e0a03fb161fb2e78d94be69e3f
Author: Chris Wilson <chris at chris-wilson.co.uk>
Date:   Tue Oct 29 09:56:10 2013 +0000

    sna: Try harder to complete writes
    
    Expunge our caches if we fail to write into a bo (presuming that
    allocation failure is the likely fixable cause).
    
    Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>


It's obviously trying really really hard to write into that bo now ;).
But failing eventually anyway.  So I started digging deeper.  The
pwrite is failing with EFAULT (the 14 at the end of the message
above).  Some kernel debugging revealed that the fault happens when
copying data from user space.  Adding some debugging printfs shows
that

pwrite.data_ptr = 0x1cc19d9831f0
pwrite.size = 3648

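and that the fault happens at address 0x1cc19d984000.  This is the
same 3600-byte buffer from the message: 0x1cc19d9831f0 + 3600 =
0x1cc19d984000, so the buffer ends exactly on the page boundary, while
pwrite.size asks for 3648 bytes.  Obviously pwrite is reading 48 bytes
beyond the end of the buffer, running into the next page, which isn't
there.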
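
The numbers above can be checked with a few lines of C.  This is only a
sketch of the arithmetic, not the driver's code: the helper names
round_up() and overrun_bytes() are made up for illustration, and the
64-byte cacheline size is an assumption that happens to be consistent
with 3600 being padded to 3648.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_ASSUMED 64u   /* assumption: 64-byte cachelines */

/* Round len up to the next multiple of align (align must be a power of two). */
static uint64_t round_up(uint64_t len, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}

/*
 * Number of bytes that a read of `len` bytes starting at `ptr`
 * extends past `buf_end` (the first address after the buffer).
 */
static uint64_t overrun_bytes(uint64_t ptr, uint64_t len, uint64_t buf_end)
{
    uint64_t read_end = ptr + len;
    return read_end > buf_end ? read_end - buf_end : 0;
}
```

With the values from the debugging output, round_up(3600,
CACHELINE_ASSUMED) gives 3648, and overrun_bytes(0x1cc19d9831f0, 3648,
0x1cc19d984000) gives 48, i.e. the padded transfer touches the first 48
bytes of the following page.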

Now I'm seeing this on OpenBSD.  I'm guessing this is actually a
malloc()'ed buffer.  And on OpenBSD malloc() is extremely nasty.  It
tries to align the allocated space such that the end lies right at the
end of a page and inserts a guard page.  It does this to catch buffer
overruns like this.  You're much less likely to hit something like
this on Linux, unless you're using a special debug malloc library.

This problem was introduced with the following commit:


commit 95f4da647a4055545b09cae0834df0fa2127a458
Author: Chris Wilson <chris at chris-wilson.co.uk>
Date:   Wed Nov 30 11:59:31 2011 +0000

    sna: Align pwrite to transfer whole cachelines
    
    Daniel claims that this is will be faster, or will be once he has
    completed rewriting pwrite!
    
    Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>


I fear that optimization simply isn't safe.


