[Intel-gfx] [PATCH] drm/i915: Do an optimistic is-busy? check first

Thu Sep 5 18:37:42 CEST 2013

On Thu, Sep 5, 2013 at 6:30 PM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
>
> It is also a confusion that the kernel can't prevent.
>
> lock_mutex
> A checks bo, reports it idle
> unlock_mutex
>
> lock_mutex
> B renders to bo
> unlock_mutex
>
> lock_mutex
> A uses bo, stalls
> unlock_mutex
>
> Whether or not the checking of the bo is locked is irrelevant as it can
> be gazumped at any time between the check and the use.

Nope, I'm talking about a different kind of confusion.

Object A is busy on the RCS with seqno 1.
Some other guy submits a bit of work to the blitter with seqno 2.
Blitter finishes work, so signalled seqno is 2, RCS is still busy.

dri client sends a buffer swap request to the display server with object A.

Display server does pageflip/blit/whatever, just something which will
force the kernel to move object A to the blitter ring. After that
object A is busy on the BLT with seqno 3.

Concurrently our dri client runs the busy ioctl, reads ring ==
blitter, seqno == 1 and concludes that the object is not busy. And this
can happen while the RCS hasn't even finished rendering the original
request from the client for object A. That's broken, and it will be
prevented by locking.

So the scenario I'm talking about is not the client racing the busy
ioctl against _new_ command submission, but the kernel lying to the
client about the completion of old commands which have all been
submitted from the same thread context. We've already had a similar
bug for the last_write_seqno where we updated the ring but used the
old seqno, resulting in mayhem.
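
To make the interleaving concrete, here is a rough userspace sketch of
the torn read. It is not the i915 code: struct bo, signalled[],
move_set_ring(), move_set_seqno() and the two *_busy_check() helpers
are all names made up for illustration. It assumes the move to a new
ring updates the ring and the seqno as two separate stores, and it
replays the unlocked reader by hand instead of using real threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not the real i915 structures. */
enum ring_id { RCS = 0, BLT = 1, NUM_RINGS };

struct bo {
	enum ring_id ring;	/* ring of the last request using the object */
	uint32_t seqno;		/* seqno of that request */
};

/* Last seqno each ring has signalled as completed. */
static uint32_t signalled[NUM_RINGS];

static bool seqno_passed(enum ring_id ring, uint32_t seqno)
{
	assert((unsigned int)ring < NUM_RINGS);
	/* Wrap-safe "signalled >= seqno" comparison. */
	return (int32_t)(signalled[ring] - seqno) >= 0;
}

static bool bo_busy(const struct bo *obj)
{
	return !seqno_passed(obj->ring, obj->seqno);
}

/* Moving the object to another ring, split into its two stores. */
static void move_set_ring(struct bo *obj, enum ring_id ring)
{
	obj->ring = ring;
}

static void move_set_seqno(struct bo *obj, uint32_t seqno)
{
	obj->seqno = seqno;
}

/* Unlocked reader samples the object between the two stores. */
static bool torn_busy_check(void)
{
	struct bo a = { .ring = RCS, .seqno = 1 };
	bool busy;

	signalled[RCS] = 0;	/* RCS still rendering seqno 1 */
	signalled[BLT] = 2;	/* blitter finished the unrelated seqno 2 */

	move_set_ring(&a, BLT);
	busy = bo_busy(&a);	/* sees ring == BLT, seqno == 1 */
	move_set_seqno(&a, 3);

	return busy;
}

/* With the lock held, the reader only sees the finished update. */
static bool locked_busy_check(void)
{
	struct bo a = { .ring = RCS, .seqno = 1 };

	signalled[RCS] = 0;
	signalled[BLT] = 2;

	move_set_ring(&a, BLT);
	move_set_seqno(&a, 3);

	return bo_busy(&a);	/* sees ring == BLT, seqno == 3 */
}
```

In this model torn_busy_check() reports the object as idle although
neither the RCS request (seqno 1) nor the new BLT request (seqno 3) has
completed, while locked_busy_check() reports it as busy.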

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch