[Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete
luca at luca-barbieri.com
Wed Jan 20 12:45:39 PST 2010
> Also note that the delayed delete list is not in fence order but in
> deletion-time order, which perhaps gives room for more optimizations.
You are right.
I think then that ttm_bo_delayed_delete may still need to be changed,
because it stops when ttm_bo_cleanup_refs returns -EBUSY, which
happens when a fence has not been reached.
This means that a buffer will need to wait for all previously deleted
buffers to become unused, even if it is unused itself.
Is this acceptable?
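To make the head-of-line blocking concrete, here is a small userspace model of that loop (the `struct bo` fields and `delayed_delete` are hypothetical stand-ins, not the actual TTM structures): walking the list in deletion-time order and stopping at the first busy buffer means idle buffers queued behind it are not reclaimed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer on the delayed-destroy list. */
struct bo {
    bool fence_signaled;  /* has the GPU finished with this buffer? */
    bool destroyed;
};

/* Models ttm_bo_delayed_delete's behavior: stop at the first buffer
 * whose fence is unsignaled (cleanup returned -EBUSY), so everything
 * after it waits too.  Returns the number of buffers destroyed. */
size_t delayed_delete(struct bo *list, size_t n)
{
    size_t freed = 0;
    for (size_t i = 0; i < n; i++) {
        if (!list[i].fence_signaled)
            break;               /* -EBUSY: later idle buffers stall */
        list[i].destroyed = true;
        freed++;
    }
    return freed;
}
```

With a list of {signaled, unsignaled, signaled}, only the first buffer is freed; the third stays pinned behind the second even though it is already idle.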
What if we get rid of the delayed destroy list, and instead append
buffers to be deleted to their fence object, and delete them when the
fence is signaled?
This also allows doing it more naturally, since the fence object can
just keep a normal reference to the buffers it fences, and unreference
them when it expires.
Then there needs to be no special "delayed destruction" logic, and it
would work as if the GPU were keeping a reference to the buffer
itself, using fences as a proxy to have the CPU do that work for the
GPU.
The delayed work is then no longer "periodically destroy buffers" but
rather "periodically check whether fences have expired", naturally
stopping at the first unexpired one.
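A minimal sketch of that scheme, assuming hypothetical `struct buffer`/`struct fence` types rather than the real TTM/driver ones: the fence holds ordinary references to the buffers it covers and drops them when it is signaled, so a buffer whose only remaining user was the GPU is destroyed at that point with no separate delayed-destroy list.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical refcounted buffer object. */
struct buffer {
    int refcount;
    bool freed;     /* stand-in for actual destruction */
};

/* Hypothetical fence: keeps normal references to the buffers it
 * covers, acting as the GPU's proxy reference holder. */
struct fence {
    struct buffer **bos;
    size_t nbos;
    bool signaled;
};

static void buffer_unref(struct buffer *bo)
{
    if (--bo->refcount == 0)
        bo->freed = true;
}

/* Called when the GPU reaches the fence (from an IRQ handler or a
 * periodic check): dropping the fence's references destroys any
 * buffer nobody else still holds. */
void fence_signal(struct fence *f)
{
    f->signaled = true;
    for (size_t i = 0; i < f->nbos; i++)
        buffer_unref(f->bos[i]);
    f->nbos = 0;
}
```

Because each fence releases only its own buffers, an unexpired fence no longer blocks reclamation of buffers covered by other, already-signaled fences.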
Drivers that support IRQs on fences could also do the work in the
interrupt handler/tasklet instead, avoiding the delay-jiffies magic
number. This may need a NAPI-like interrupt mitigation middle layer
for optimal results, though.
> But isn't an atomic cmpxchg about as costly as a spinlock?
I think it's cheaper on all architectures; otherwise it would be
mostly pointless to have it, since you could emulate it with a spinlock.