buggy/weird behavior in ttm

Thu Oct 11 11:42:15 PDT 2012

Op 11-10-12 18:57, Thomas Hellstrom schreef:
> On 10/11/2012 04:50 PM, Maarten Lankhorst wrote:
>> I was trying to clean ttm up a little so my changes would be less invasive, and simplify
>> the code for debuggability. During testing I noticed the following weirdnesses:
>> - ttm_mem_evict_first ignores no_wait_gpu if the buffer is on the ddestroy list.
>>    If you follow the code, it will effectively spin in ttm_mem_evict_first if a bo
>>    is on the list and no_wait_gpu is true.
>
> Yes, you're right. This is a bug. At a first glance it looks like the code should
> return unconditionally after kref_put().
>
>
>>
>> I was working on a commit that removes fence_lock since I was killing off the
>> fence lock, but that requires some kind of defined behavior for this. Unless
>> we leave this in place as expected behavior..
>
> I don't quite follow you? If you mean defined behavior for the fence lock it is
> that it protects the bo::sync_obj and bo::sync_obj_arg of *all* buffers. It was
> prevously one lock per buffer. The locking order is that it should be taken before
> the lru lock, but looking at the code it seems it could be quite simplified the other
> way around...
I mean that fence_lock can be killed off if you put slightly more care in
ttm_bo_cleanup_refs and ttm_bo_cleanup_refs_or_queue. Those are the
only 2 functions in my tree that seem to call ttm_bo_wait
without a reservation, so I fixed those up instead.

In ttm_bo_cleanup_refs_or_queue you can simply change the order around,
and abort reservation if ttm_bo_wait fails nonblockingly, without touching lru lists.

ttm_bo_cleanup_refs is slightly trickier, but to preserve current behavior I just did this:
Take a reservation first, try ttm_bo_wait without blocking, if it returns -EBUSY
and no_wait_gpu is unset, take the fence pointer then undo the reservation
without touching the lru lists. If it doesn't return -EBUSY, wait succeeded and we
should retry the function from the start.

So maybe slightly more complexity in ttm_bo_cleanup_refs (not really, the function
was doing behavior similar to this ANYWAY, I just moved things around) and an entire
spinlock gone. :)

> Anyway, if you plan to remove the fence lock and protect it with reserve, you must
> make sure that a waiting reserve is never done in a destruction path. I think this
> mostly concerns the nvidia driver.
Well I don't think any lock should ever be held during destruction time,

>>
>> - no_wait_reserve is ignored if no_wait_gpu is false
>>    ttm_bo_reserve_locked can only return true if no_wait_reserve is true, but
>>    subsequently it will do a wait_unreserved if no_wait_gpu is false.
>> I'm planning on removing this argument and act like it is always true, since
>> nothing on the lru list should fail to reserve currently.
>
> Yes, since all buffers that are reserved are removed from the LRU list, there
> should never be a waiting reserve on them, so no_wait_reserve can be removed
> from ttm_mem_evict_first, ttm_bo_evict and possibly other functions in the call chain.
I suppose there will stay a small race though,

>>
>> - effectively unlimited callchain between some functions that all go through
>>    ttm_mem_evict_first:
>>
>>                                      /------------------------\
>> ttm_mem_evict_first - ttm_bo_evict -                          -ttm_bo_mem_space  - ttm_bo_mem_force_space - ttm_mem_evict_first
>>                                      \ ttm_bo_handle_move_mem /
>> I'm not surprised that there was a deadlock before, it seems to me it would
>> be pretty suicidal to ever do a blocking reserve on any of those lists,
>> lockdep would be all over you for this.
>
> Well, at first this may look worse than it actually is. The driver's eviction memory order determines the recursion depth
> and typically it's 0 or 1, since subsequent ttm_mem_evict_first should never touch the same LRU lists as the first one.
> What would typically happen is that a BO is evicted from VRAM to TT, and if there is no space in TT, another BO is evicted
> to system memory, and the chain is terminated. However a driver could set up any eviction order but that would be
> a BUG.
>
> But in essence, as you say, even with a small recursion depth, a waiting reserve could cause a deadlock.
> But there should be no waiting reserves in the eviction path currently.
Partially true, ttm_bo_cleanup_refs is currently capable of blocking reserve.
Fixing this might mean that ttm_mem_evict_first may need to become more aggressive,
since it seems all the callers of this function assume that ttm_mem_evict_first can only fail
if there is really nothing more to free and blocking nested would really upset lockdep
and leave you open to the same deadlocks.

An unexpected bonus from adding this skipping would be that the atomic lru_list
removal on unreserve guarantee becomes a lot easier to drop while keeping behavior
exactly the same. Only the swapout code would need to be adjusted for to try the whole
list. Maybe ttm_bo_delayed_delete with remove_all = false too would change behavior
slightly too, but this seems to be less likely, as it could only ever happen on delayed
destruction of imported dma-buf's.

I'm still mentally debugging the code so there's no final version yet, I'm trying to ensure
that the changes would not alter existing behavior.

>> Also it seems ttm_bo_move_ttm, ttm_bo_move_memcpy and ttm_bo_move_accel_cleanup
>> don't use some of their arguments, so could those be dropped?
>
> Yes, but they are designed so that they could be plugged into the driver::move interface, so
> if you change the argument list you should do it on driver::move as well, and
> make sure they all have the same argument list.
Not true, ttm_bo_move_memcpy has no interruptible parameter, so it's not compatible with driver.move.
The drivers that do use this function have set up their own alias that calls that function instead,
so I was thinking of dropping those parameters from memcpy since they're unused and the drivers
that could point their move member to it don't do it, so no point in having compatibility...

~Maarten