[PATCH 4/6] drm/ttm: use ttm_bo_reserve_slowpath_nolru in ttm_eu_reserve_buffers

Jerome Glisse j.glisse at gmail.com
Thu Dec 6 07:38:03 PST 2012


On Thu, Dec 6, 2012 at 5:52 AM, Maarten Lankhorst
<maarten.lankhorst at canonical.com> wrote:
> Op 06-12-12 02:36, Jerome Glisse schreef:
>> On Fri, Nov 30, 2012 at 7:12 AM, Maarten Lankhorst
>> <m.b.lankhorst at gmail.com> wrote:
>>> This requires re-use of the seqno, which increases fairness slightly.
>>> Instead of spinning with a new seqno every time, we keep the current one
>>> but still drop all other reservations we hold. Only when we succeed do we
>>> try to get back our other reservations again.
>>>
>>> This should increase fairness slightly as well.
>>>
>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst at canonical.com>
>>> ---
>>>  drivers/gpu/drm/ttm/ttm_execbuf_util.c | 14 ++++++++++++--
>>>  1 file changed, 12 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
>>> index c7d3236..c02b2b6 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
>>> @@ -129,13 +129,17 @@ int ttm_eu_reserve_buffers(struct list_head *list)
>>>         entry = list_first_entry(list, struct ttm_validate_buffer, head);
>>>         glob = entry->bo->glob;
>>>
>>> -retry:
>>>         spin_lock(&glob->lru_lock);
>>>         val_seq = entry->bo->bdev->val_seq++;
>>>
>>> +retry:
>> After more thinking while going over the patches to send comments, I
>> think this one is bad, really bad. Here is the bad case I see: we have
>> a list of two bos, A and B. We succeed in reserving A, setting
>> seq_valid to true and val_seq to our val_seq. We fail on B, take the
>> slow path and unreserve A. Some other CPU reserves A, but before it
>> writes either seq_valid or val_seq we wake up, reserve B, and retry
>> reserving A. In reserve_nolru we see A as reserved, enter the while
>> loop, see seq_valid set, test val_seq against our same old val_seq,
>> get ==, and return -EDEADLK, which is definitely not what we want to
>> do. If we incremented val_seq on retry we would never hit this case.
>> So what exactly does not incrementing it give us?
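
For reference, the check that trips in this scenario looks roughly like
this (a simplified sketch of the ttm_bo_reserve_nolru path, not the
exact code):

        while (unlikely(atomic_xchg(&bo->reserved, 1) != 0)) {
                if (use_sequence && bo->seq_valid) {
                        /*
                         * The window described above: another thread has
                         * already grabbed bo->reserved but has not yet
                         * written its own val_seq/seq_valid, so this can
                         * still see the stale sequence left over from our
                         * earlier reservation and return a spurious
                         * -EDEADLK.
                         */
                        if (unlikely(sequence == bo->val_seq))
                                return -EDEADLK;
                }
                /* otherwise back off or wait for the unreserve and retry */
        }
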
> Hm, that would indeed be a valid problem.
>
> Annoyingly, I can't do the conversion to mutexes until the end of the patchset.
> I solved this in the ticket mutexes by making the seqno an atomic, which is unset before unlocking.
> But doing this with the current ttm code would force extra
> wake_up_all() calls inside the reserve path, since we lack the
> slowpath handling that mutexes have.
>
> Nouveau would already run into the same problems with patch 1, so perhaps we could disable
> the -EDEADLK handling and sleep in that case for now? Afaict, it's a driver error if -EDEADLK
> occurs.
>
> If performance is unimportant, unset seq_valid before unreserving
> and add an smp_wmb() between the seqno and seqno_valid assignments.
> wake_up_all() would then be called unconditionally during reservation, but that call is expensive...
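
For clarity, the ordering that would give is roughly this (illustrative
sketch only, not a tested patch):

        /* reserve side */
        bo->val_seq = sequence;
        smp_wmb();      /* make val_seq visible before seq_valid */
        bo->seq_valid = true;

        /* unreserve side */
        bo->seq_valid = false;
        smp_wmb();      /* clear seq_valid before dropping the reservation */
        atomic_set(&bo->reserved, 0);
        /* the unconditional wake_up_all() in the reserve path mentioned
         * above is not shown */

so a concurrent reserver could never observe a stale val_seq with
seq_valid still set.
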
>
>> I think the read and write memory barriers I was talking about in
>> patch 1 should prevent any such thing from happening, but I need to
>> sleep on that and have nightmares about all the things that can go wrong.
> No, it wouldn't help; there is still a race immediately after the xchg call. :-(
>
> ~Maarten

Then just keep the retry label where it is; it doesn't change the
fundamentals of the patch, and it avoids the -EDEADLK scenario.
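
Something like this, against the hunk quoted above (sketch only):

retry:
        spin_lock(&glob->lru_lock);
        val_seq = entry->bo->bdev->val_seq++;

i.e. take a fresh val_seq on every pass, so a stale val_seq left on an
unreserved bo can never compare equal to the sequence a retrying thread
is using.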

Cheers,
Jerome

