[PATCH v6 04/10] gpu: host1x: Remove cancelled waiters immediately

Mon Mar 29 20:27:39 UTC 2021

On 29.03.2021 16:38, Mikko Perttunen wrote:
> Before this patch, cancelled waiters would only be cleaned up
> once their threshold value was reached. Make host1x_intr_put_ref
> process the cancellation immediately to fix this.
> 
> Signed-off-by: Mikko Perttunen <mperttunen at nvidia.com>
> ---
> v6:
> * Call schedule instead of cpu_relax while waiting for pending
>   interrupt processing
> v5:
> * Add parameter to flush, i.e. wait for all pending waiters to
>   complete before returning. The reason this is not always true
>   is that the pending waiter might be the place that is calling
>   the put_ref.
> ---
>  drivers/gpu/host1x/intr.c   | 23 +++++++++++++++++------
>  drivers/gpu/host1x/intr.h   |  4 +++-
>  drivers/gpu/host1x/syncpt.c |  2 +-
>  3 files changed, 21 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
> index 9245add23b5d..69b0e8e41466 100644
> --- a/drivers/gpu/host1x/intr.c
> +++ b/drivers/gpu/host1x/intr.c
> @@ -242,18 +242,29 @@ int host1x_intr_add_action(struct host1x *host, struct host1x_syncpt *syncpt,
>  	return 0;
>  }
>  
> -void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref)
> +void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref,
> +			 bool flush)
>  {
>  	struct host1x_waitlist *waiter = ref;
>  	struct host1x_syncpt *syncpt;
>  
> -	while (atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED) ==
> -	       WLS_REMOVED)
> -		schedule();
> +	atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED);
>  
>  	syncpt = host->syncpt + id;
> -	(void)process_wait_list(host, syncpt,
> -				host1x_syncpt_load(host->syncpt + id));
> +
> +	spin_lock(&syncpt->intr.lock);
> +	if (atomic_cmpxchg(&waiter->state, WLS_CANCELLED, WLS_HANDLED) ==
> +	    WLS_CANCELLED) {
> +		list_del(&waiter->list);
> +		kref_put(&waiter->refcount, waiter_release);
> +	}
> +	spin_unlock(&syncpt->intr.lock);

It looks like we need to use the IRQ-safe variant of the locking here so
as not to race with the interrupt handler(?), which could otherwise
cause a lockup.
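
For reference, the two-step state hand-off in the quoted hunk can be
modelled in userspace roughly as below. This is only a sketch under
assumptions: struct model_waiter, model_waiter_init() and
model_put_ref() are hypothetical names, the enum ordering is assumed,
C11 atomics stand in for the kernel's atomic_cmpxchg(), and the lock
around step 2 is left out because the model is single-threaded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* State names mirror the patch; the numeric values are an assumption. */
enum waiter_state { WLS_PENDING, WLS_REMOVED, WLS_CANCELLED, WLS_HANDLED };

/* Hypothetical stand-in for struct host1x_waitlist. */
struct model_waiter {
	atomic_int state;
	bool on_list;	/* stands in for membership in the wait list */
	int refcount;	/* stands in for the kref */
};

static void model_waiter_init(struct model_waiter *w)
{
	atomic_init(&w->state, WLS_PENDING);
	w->on_list = true;
	w->refcount = 1;
}

/* Returns true if this call removed the waiter from the list. */
static bool model_put_ref(struct model_waiter *w)
{
	int expected = WLS_PENDING;

	/* Step 1: PENDING -> CANCELLED; no effect in any other state. */
	atomic_compare_exchange_strong(&w->state, &expected, WLS_CANCELLED);

	/*
	 * Step 2: in the patch this runs under syncpt->intr.lock.
	 * Only the party that wins CANCELLED -> HANDLED drops the entry.
	 */
	expected = WLS_CANCELLED;
	if (atomic_compare_exchange_strong(&w->state, &expected, WLS_HANDLED)) {
		w->on_list = false;
		w->refcount--;
		return true;
	}

	return false;
}
```

In this model a waiter that is already WLS_REMOVED or WLS_HANDLED is
left alone, so whichever side claims it first does the cleanup exactly
once.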

But what real bug does this patch fix? If it doesn't fix a real problem,
then maybe it would be better to defer touching this code until we
replace it all with proper dma-fence handlers?