[PATCH v4] drm/xe/ufence: Signal ufence faster when possible

Cavitt, Jonathan jonathan.cavitt at intel.com
Tue Oct 22 16:06:12 UTC 2024


-----Original Message-----
From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of Nirmoy Das
Sent: Friday, October 18, 2024 8:30 AM
To: intel-xe at lists.freedesktop.org
Cc: Das, Nirmoy <nirmoy.das at intel.com>; Auld, Matthew <matthew.auld at intel.com>; Brost, Matthew <matthew.brost at intel.com>
Subject: [PATCH v4] drm/xe/ufence: Signal ufence faster when possible
> 
> When the backing fence is already signaled, the ufence can be
> signaled right away without queuing on the ordered work queue.
> This should also reduce the load on the Xe ordered_wq and won't
> block signaling of a ufence that doesn't require any serialization.
> 
> v2: fix system_wq typo
> v3: signal immediately instead of queuing in system_wq (Matt B)
> v4: go back to the v2 approach of using a workqueue because of a
>     locking issue and the need to remotely access a different mm
>     struct. Use Xe's unordered_wq, which should be less congested
>     than the global one.
> 
> Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1630
> Cc: Matthew Auld <matthew.auld at intel.com>
> Cc: Matthew Brost <matthew.brost at intel.com>
> Signed-off-by: Nirmoy Das <nirmoy.das at intel.com>

I don't see anything immediately wrong with this. However, given that
v3 had to be reverted, I'd wait for approval from Matt Brost before
proceeding, to make sure the reason for the revert is acceptable.

Reviewed-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
-Jonathan Cavitt

> ---
>  drivers/gpu/drm/xe/xe_sync.c | 24 +++++++++++++++++++++---
>  1 file changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_sync.c b/drivers/gpu/drm/xe/xe_sync.c
> index a90480c6aecf..7a1558c7ce09 100644
> --- a/drivers/gpu/drm/xe/xe_sync.c
> +++ b/drivers/gpu/drm/xe/xe_sync.c
> @@ -92,18 +92,27 @@ static void user_fence_worker(struct work_struct *w)
>  	user_fence_put(ufence);
>  }
>  
> -static void kick_ufence(struct xe_user_fence *ufence, struct dma_fence *fence)
> +static void kick_ufence_ordered(struct xe_user_fence *ufence,
> +				struct dma_fence *fence)
>  {
>  	INIT_WORK(&ufence->worker, user_fence_worker);
>  	queue_work(ufence->xe->ordered_wq, &ufence->worker);
>  	dma_fence_put(fence);
>  }
>  
> +static void kick_ufence_unordered(struct xe_user_fence *ufence,
> +				  struct dma_fence *fence)
> +{
> +	INIT_WORK(&ufence->worker, user_fence_worker);
> +	queue_work(ufence->xe->unordered_wq, &ufence->worker);
> +	dma_fence_put(fence);
> +}
> +
>  static void user_fence_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
>  {
>  	struct xe_user_fence *ufence = container_of(cb, struct xe_user_fence, cb);
>  
> -	kick_ufence(ufence, fence);
> +	kick_ufence_ordered(ufence, fence);
>  }
>  
>  int xe_sync_entry_parse(struct xe_device *xe, struct xe_file *xef,
> @@ -239,7 +248,16 @@ void xe_sync_entry_signal(struct xe_sync_entry *sync, struct dma_fence *fence)
>  		err = dma_fence_add_callback(fence, &sync->ufence->cb,
>  					     user_fence_cb);
>  		if (err == -ENOENT) {
> -			kick_ufence(sync->ufence, fence);
> +			/*
> +			 * Use unordered_wq to schedule it faster and to keep
> +			 * the ordered_wq less loaded, as serialization is not
> +			 * needed when the fence is already signaled.
> +			 *
> +			 * This needs to be done with a wq here to avoid locking
> +			 * issues when a ufence addr is backed by a bo, and also
> +			 * tsk->mm needs to be NULL to call kthread_use_mm().
> +			 */
> +			kick_ufence_unordered(sync->ufence, fence);
>  		} else if (err) {
>  			XE_WARN_ON("failed to add user fence");
>  			user_fence_put(sync->ufence);
> -- 
> 2.46.0
> 
> 
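
For readers following the thread, the dispatch rule in the patch above can be
illustrated with a small userspace sketch. This is not the driver code: every
mock_* type and function below is a hypothetical stand-in, and the only
behavior borrowed from the kernel is that dma_fence_add_callback() returns
-ENOENT when the fence has already signaled. Under those assumptions, the
sketch shows which queue the signaling work would land on in each case.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two Xe workqueues; not kernel types. */
enum mock_wq { MOCK_WQ_NONE, MOCK_WQ_ORDERED, MOCK_WQ_UNORDERED };

/* Records which queue the (mock) signaling work was placed on. */
struct mock_ufence {
	enum mock_wq queued_on;
};

struct mock_fence {
	bool signaled;
	struct mock_ufence *waiter;	/* set while a callback is armed */
};

/* Stand-in for queue_work(): only remembers the chosen queue. */
static void mock_kick(struct mock_ufence *ufence, enum mock_wq wq)
{
	ufence->queued_on = wq;
}

/*
 * Mirrors the one piece of dma_fence_add_callback() behavior the patch
 * relies on: -ENOENT means the fence has already signaled.
 */
static int mock_fence_add_callback(struct mock_fence *fence,
				   struct mock_ufence *ufence)
{
	if (fence->signaled)
		return -ENOENT;
	fence->waiter = ufence;
	return 0;
}

/* Fence signals later: the callback path keeps using the ordered queue. */
static void mock_fence_signal(struct mock_fence *fence)
{
	fence->signaled = true;
	if (fence->waiter) {
		mock_kick(fence->waiter, MOCK_WQ_ORDERED);
		fence->waiter = NULL;
	}
}

/* Shape of the ufence branch in xe_sync_entry_signal() after the patch. */
static int mock_sync_entry_signal(struct mock_ufence *ufence,
				  struct mock_fence *fence)
{
	int err = mock_fence_add_callback(fence, ufence);

	if (err == -ENOENT) {
		/* Already signaled: no ordering needed, use unordered queue. */
		mock_kick(ufence, MOCK_WQ_UNORDERED);
		return 0;
	}
	return err;
}
```

Calling mock_sync_entry_signal() on an already-signaled fence records
MOCK_WQ_UNORDERED. On a pending fence nothing is queued until
mock_fence_signal() runs, at which point MOCK_WQ_ORDERED is recorded.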

