[Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

Luca Barbieri luca at luca-barbieri.com
Wed Jan 20 11:22:11 PST 2010


Yes it's fine. I sent your patch to Dave with an expanded commit
comment for merging.

Here is a possible redesign of the mechanism inspired by this issue.
It seems that what we are racing against is buffer eviction, due to
delayed deletion buffers being still kept on the LRU list.

I'm wondering if the delayed deletion design could be changed as follows:
1. Remove to-be-deleted buffers from the LRU list before putting them
on delayed delete
2. Change buffer eviction to first do a delayed deletion pass. This
should be cheap (and cheaper than the current code) because delayed
deletion stops at the first unsignaled fence.
3. Add a new "delayed deletion" lock/semaphore. Then, have
ttm_bo_delayed_delete take it for reading and keep it across the
function.
4. Inside the delayed deletion lock, grab the LRU lock, copy the
delayed delete list head to a local variable, set it to empty and
release the lock.
5. Iterate on the privately held list with list_for_each_safe
6. At the end of ttm_bo_delayed_delete, retake the LRU lock and readd
the remaining part of our private list at the head of the global list

This would prevent uselessly trying to evict delayed-delete buffers
(which should be processed in fence order and not LRU order), and also
prevent concurrent delayed deletions, which should be more harmful
than useful.

Furthermore, it should be possible to get rid of list locking in the
following way:
1. BOs to be delayed-deleted are added to the head of the initial
delayed deletion single linked list, using atomic cmpxchg
2. ttm_bo_delayed_delete takes the delayed deletion lock and grabs the
list with an atomic xchg of the head with zero
3. It reverses the list in place, processes the entries and puts them
at the end of a second single linked list, the recurring delayed
deletion list
4. It processes the recurring delayed deletion list, cleaning up the BOs
5. Finally, the delayed deletion lock is released

This makes adding to the delayed deletion list lockless.

The LRU list instead inherently needs to be doubly linked, so only RCU
could make it lockless, and it seems that may require using an
external list node structure (so readers don't suddenly jump to the
most recent node), and that would not be a win (except with *lots* of
CPUs).
Besides, most DRM drivers (except vmware) are taking the BKL around
all ioctls and (except nouveau) use a single pushbuffer, so this is a
bit moot anyway.

What do you think?

Anyway, this, if done, would be for the next merge window, or later,
while the current fix ought to be merged now.


More information about the Nouveau mailing list