[PATCH v6 04/10] gpu: host1x: Remove cancelled waiters immediately

Dmitry Osipenko digetx at gmail.com
Tue Mar 30 01:35:30 UTC 2021


30.03.2021 00:36, Mikko Perttunen wrote:
> On 3/29/21 11:27 PM, Dmitry Osipenko wrote:
>> 29.03.2021 16:38, Mikko Perttunen wrote:
>>> Before this patch, cancelled waiters would only be cleaned up
>>> once their threshold value was reached. Make host1x_intr_put_ref
>>> process the cancellation immediately to fix this.
>>>
>>> Signed-off-by: Mikko Perttunen <mperttunen at nvidia.com>
>>> ---
>>> v6:
>>> * Call schedule instead of cpu_relax while waiting for pending
>>>    interrupt processing
>>> v5:
>>> * Add parameter to flush, i.e. wait for all pending waiters to
>>>    complete before returning. The reason this is not always true
>>>    is that the pending waiter might be the place that is calling
>>>    the put_ref.
>>> ---
>>>   drivers/gpu/host1x/intr.c   | 23 +++++++++++++++++------
>>>   drivers/gpu/host1x/intr.h   |  4 +++-
>>>   drivers/gpu/host1x/syncpt.c |  2 +-
>>>   3 files changed, 21 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
>>> index 9245add23b5d..69b0e8e41466 100644
>>> --- a/drivers/gpu/host1x/intr.c
>>> +++ b/drivers/gpu/host1x/intr.c
>>> @@ -242,18 +242,29 @@ int host1x_intr_add_action(struct host1x *host, struct host1x_syncpt *syncpt,
>>>  	return 0;
>>>  }
>>>  
>>> -void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref)
>>> +void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref,
>>> +			 bool flush)
>>>  {
>>>  	struct host1x_waitlist *waiter = ref;
>>>  	struct host1x_syncpt *syncpt;
>>>  
>>> -	while (atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED) ==
>>> -	       WLS_REMOVED)
>>> -		schedule();
>>> +	atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED);
>>>  
>>> 	syncpt = host->syncpt + id;
>>> -	(void)process_wait_list(host, syncpt,
>>> -				host1x_syncpt_load(host->syncpt + id));
>>> +
>>> +    spin_lock(&syncpt->intr.lock);
>>> +    if (atomic_cmpxchg(&waiter->state, WLS_CANCELLED, WLS_HANDLED) ==
>>> +        WLS_CANCELLED) {
>>> +        list_del(&waiter->list);
>>> +        kref_put(&waiter->refcount, waiter_release);
>>> +    }
>>> +    spin_unlock(&syncpt->intr.lock);
>>
>> Looks like we need to use the IRQ-safe version of the locking here so
>> that we don't race with the interrupt handler(?) and cause a lockup.
> 
> The potential contention is with the syncpt_thresh_work scheduled work,
> and not the actual interrupt handler, so there is no issue.

I see now, thanks.

>> But what real bug does this patch fix? If no real problem is fixed,
>> then maybe it would be better to defer touching this code until we
>> replace it all with proper dma-fence handlers?
>>
> 
> It improves things in that we won't litter the waiter data structures
> with unbounded waiter entries when waits are cancelled. Also, I prefer
> working in steps when possible - next is writing dma_fences on top of
> this (which is already done) and then eventually phasing/refactoring
> code from intr.c to fence.c so eventually only dma_fences remain. In my
> experience that works better than big rewrites.

So this change is a cleanup and not a bugfix, which wasn't clear from
the commit description. In my experience it helps with review if the
commit message explicitly states whether a change is a minor cleanup
or a critical bugfix.

The small cleanups should be okay. It would be better if you could
explicitly separate the fixes from the cleanups, since fixes have a
better chance of being picked up immediately.

I'd also suggest grouping the cleanup changes with the new features
that benefit from them, where possible. That makes the patches look
more logical, rather than like random changes, which helps with review.

I'll try to give this series a test later today.
