[PATCH v6 04/10] gpu: host1x: Remove cancelled waiters immediately

Mikko Perttunen cyndis at kapsi.fi
Mon Mar 29 21:36:11 UTC 2021


On 3/29/21 11:27 PM, Dmitry Osipenko wrote:
> 29.03.2021 16:38, Mikko Perttunen wrote:
>> Before this patch, cancelled waiters would only be cleaned up
>> once their threshold value was reached. Make host1x_intr_put_ref
>> process the cancellation immediately to fix this.
>>
>> Signed-off-by: Mikko Perttunen <mperttunen at nvidia.com>
>> ---
>> v6:
>> * Call schedule instead of cpu_relax while waiting for pending
>>    interrupt processing
>> v5:
>> * Add parameter to flush, i.e. wait for all pending waiters to
>>    complete before returning. The reason this is not always true
>>    is that the pending waiter might be the place that is calling
>>    the put_ref.
>> ---
>>   drivers/gpu/host1x/intr.c   | 23 +++++++++++++++++------
>>   drivers/gpu/host1x/intr.h   |  4 +++-
>>   drivers/gpu/host1x/syncpt.c |  2 +-
>>   3 files changed, 21 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
>> index 9245add23b5d..69b0e8e41466 100644
>> --- a/drivers/gpu/host1x/intr.c
>> +++ b/drivers/gpu/host1x/intr.c
>> @@ -242,18 +242,29 @@ int host1x_intr_add_action(struct host1x *host, struct host1x_syncpt *syncpt,
>>   	return 0;
>>   }
>>   
>> -void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref)
>> +void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref,
>> +			 bool flush)
>>   {
>>   	struct host1x_waitlist *waiter = ref;
>>   	struct host1x_syncpt *syncpt;
>>   
>> -	while (atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED) ==
>> -	       WLS_REMOVED)
>> -		schedule();
>> +	atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED);
>>   
>>   	syncpt = host->syncpt + id;
>> -	(void)process_wait_list(host, syncpt,
>> -				host1x_syncpt_load(host->syncpt + id));
>> +
>> +	spin_lock(&syncpt->intr.lock);
>> +	if (atomic_cmpxchg(&waiter->state, WLS_CANCELLED, WLS_HANDLED) ==
>> +	    WLS_CANCELLED) {
>> +		list_del(&waiter->list);
>> +		kref_put(&waiter->refcount, waiter_release);
>> +	}
>> +	spin_unlock(&syncpt->intr.lock);
> 
> Looks like we need to use IRQ-safe version of the locking here in order
> not to race with the interrupt handler(?), preventing lockup.

The lock is only contended by the syncpt_thresh_work work item, which 
runs in process context, and not by the actual interrupt handler, so 
the non-IRQ-safe locking is fine here.
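
To illustrate, here is a minimal sketch of the locking pattern (the 
example_* names are made up for illustration, this is not the actual 
intr.c code): every path that takes the waiter lock runs in process 
context, and the hard IRQ handler only queues the work item, so the 
plain spin_lock() variant is sufficient.

#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>

struct example_syncpt {
	spinlock_t lock;		/* protects the waiters list */
	struct list_head waiters;
	struct work_struct thresh_work;	/* queued by the hard IRQ handler */
};

/* Process context: work item scheduled by the threshold interrupt. */
static void example_thresh_work(struct work_struct *work)
{
	struct example_syncpt *sp =
		container_of(work, struct example_syncpt, thresh_work);

	spin_lock(&sp->lock);
	/* ... walk sp->waiters and complete expired entries ... */
	spin_unlock(&sp->lock);
}

/* Process context: a caller cancelling its waiter, as in
 * host1x_intr_put_ref() above. */
static void example_cancel_waiter(struct example_syncpt *sp,
				  struct list_head *waiter)
{
	spin_lock(&sp->lock);
	list_del(waiter);
	spin_unlock(&sp->lock);
}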

> 
> But what real bug is fixed by this patch? If no real problem is fixed,
> then maybe will be better to defer touching this code till we will just
> replace it all with a proper dma-fence handlers?
> 

It fixes the waiter list accumulating an unbounded number of stale 
entries when waits are cancelled before their threshold is reached. 
Also, I prefer working in steps when possible - the next step is 
implementing dma_fences on top of this (which is already done), and 
then gradually moving code from intr.c to fence.c until only 
dma_fences remain. In my experience that works better than big 
rewrites.
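
For reference, the rough shape I have in mind for the dma_fence layer 
is something like the below - a hypothetical sketch, not the actual 
fence.c code (the example_* names are made up): the fence gets 
signalled from the existing syncpoint threshold machinery, and 
cancelling a wait maps to host1x_intr_put_ref().

#include <linux/dma-fence.h>
#include <linux/spinlock.h>

struct example_syncpt_fence {
	struct dma_fence base;
	spinlock_t lock;
	void *waiter_ref;	/* ref from the intr code, used for cancellation */
};

static const char *example_fence_driver_name(struct dma_fence *f)
{
	return "host1x";
}

static const char *example_fence_timeline_name(struct dma_fence *f)
{
	return "syncpoint";
}

static const struct dma_fence_ops example_fence_ops = {
	.get_driver_name = example_fence_driver_name,
	.get_timeline_name = example_fence_timeline_name,
};

static void example_fence_init(struct example_syncpt_fence *fence,
			       u64 context, u64 threshold)
{
	spin_lock_init(&fence->lock);
	dma_fence_init(&fence->base, &example_fence_ops, &fence->lock,
		       context, threshold);
}

/* Called from the threshold-reached path (process context, see above). */
static void example_fence_threshold_reached(struct example_syncpt_fence *fence)
{
	dma_fence_signal(&fence->base);
	dma_fence_put(&fence->base);
}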

Mikko

