[PATCH 1/6] drm/ttm: remove lru_lock around ttm_bo_reserve

Maarten Lankhorst maarten.lankhorst at canonical.com
Thu Dec 6 01:55:53 PST 2012


Op 06-12-12 02:19, Jerome Glisse schreef:
> On Fri, Nov 30, 2012 at 7:12 AM, Maarten Lankhorst
> <m.b.lankhorst at gmail.com> wrote:
>> There should no longer be assumptions that reserve will always succeed
>> with the lru lock held, so we can safely break the whole atomic
>> reserve/lru thing. As a bonus this fixes most lockdep annotations for
>> reservations.
>>
>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst at canonical.com>
>> ---
>>  drivers/gpu/drm/ttm/ttm_bo.c           | 54 ++++++++++++++++++++++------------
>>  drivers/gpu/drm/ttm/ttm_execbuf_util.c |  2 +-
>>  include/drm/ttm/ttm_bo_driver.h        | 19 +++---------
>>  3 files changed, 40 insertions(+), 35 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>> index 9028327..61b5cd0 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> @@ -213,14 +213,13 @@ int ttm_bo_del_from_lru(struct ttm_buffer_object *bo)
>>         return put_count;
>>  }
>>
>> -int ttm_bo_reserve_locked(struct ttm_buffer_object *bo,
>> +int ttm_bo_reserve_nolru(struct ttm_buffer_object *bo,
>>                           bool interruptible,
>>                           bool no_wait, bool use_sequence, uint32_t sequence)
>>  {
>> -       struct ttm_bo_global *glob = bo->glob;
>>         int ret;
>>
>> -       while (unlikely(atomic_read(&bo->reserved) != 0)) {
>> +       while (unlikely(atomic_xchg(&bo->reserved, 1) != 0)) {
>>                 /**
>>                  * Deadlock avoidance for multi-bo reserving.
>>                  */
> Regarding the memory barrier discussion, we could add an smp_rmb() here or
> a few lines below, before the read of sequence for -EAGAIN. But it's not
> really important: if a CPU doesn't see the new sequence value, the worst
> case is that the CPU will go to wait again before reaching back this
> section and returning -EAGAIN. So it's just going to waste some CPU cycles;
> I can't see anything bad that could happen.
Ah indeed, no bad thing can happen from what I can tell.

I looked at Documentation/atomic_ops.txt again, and it says:

"If a caller requires memory barrier semantics around an atomic_t
operation which does not return a value, a set of interfaces are
defined which accomplish this:"

Since we check the return value of atomic_xchg, I think that means memory barriers
are implied. This is strengthened by the fact that the arch-specific xchg implementations
I checked (x86 and sparc) clobber memory, which implies at least a compiler barrier;
that should be good enough here. Parisc has no native xchg op and falls back to
spin_lock_irqrestore, so it would definitely be good enough there.

>> @@ -241,26 +240,36 @@ int ttm_bo_reserve_locked(struct ttm_buffer_object *bo,
>>                 if (no_wait)
>>                         return -EBUSY;
>>
>> -               spin_unlock(&glob->lru_lock);
>>                 ret = ttm_bo_wait_unreserved(bo, interruptible);
>> -               spin_lock(&glob->lru_lock);
>>
>>                 if (unlikely(ret))
>>                         return ret;
>>         }
>>
>> -       atomic_set(&bo->reserved, 1);
>>         if (use_sequence) {
>> +               bool wake_up = false;
>>                 /**
>>                  * Wake up waiters that may need to recheck for deadlock,
>>                  * if we decreased the sequence number.
>>                  */
>>                 if (unlikely((bo->val_seq - sequence < (1 << 31))
>>                              || !bo->seq_valid))
>> -                       wake_up_all(&bo->event_queue);
>> +                       wake_up = true;
>>
>> +               /*
>> +                * In the worst case with memory ordering these values can be
>> +                * seen in the wrong order. However since we call wake_up_all
>> +                * in that case, this will hopefully not pose a problem,
>> +                * and the worst case would only cause someone to accidentally
>> +                * hit -EAGAIN in ttm_bo_reserve when they see old value of
>> +                * val_seq. However this would only happen if seq_valid was
>> +                * written before val_seq was, and just means some slightly
>> +                * increased cpu usage
>> +                */
>>                 bo->val_seq = sequence;
>>                 bo->seq_valid = true;
> If we want we could add smp_wmb() here, but see the above comment on the
> usefulness of this.
Yeah, that would be overkill. The wake_up_all takes an irqoff spinlock, which would
implicitly count as a full memory barrier, and the lru lock will always be taken
afterwards, which is definitely a full memory barrier.

~Maarten
