[RFC v1 8/9] drm/xe/hw_engine_group: Resume LR exec queues suspended by dma fence jobs

Wed Jul 17 23:03:35 UTC 2024

On Wed, Jul 17, 2024 at 03:07:29PM +0200, Francois Dugast wrote:
> Submission of a dma fence job leads to suspending the long running exec
> queues of the hw engine group. Work is queued in the resume worker for
> this group and execution is resumed on the attached exec queues in long
> running mode.
> 
> This is another entry point for execution on the hw engine group so the
> execution mode is updated.
> 
> Signed-off-by: Francois Dugast <francois.dugast at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_hw_engine.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c
> index e6c755a04fd8..dd8ef65cbf2d 100644
> --- a/drivers/gpu/drm/xe/xe_hw_engine.c
> +++ b/drivers/gpu/drm/xe/xe_hw_engine.c
> @@ -431,6 +431,26 @@ hw_engine_setup_default_state(struct xe_hw_engine *hwe)
>  	xe_rtp_process_to_sr(&ctx, engine_entries, &hwe->reg_sr);
>  }
>  
> +static void hw_engine_group_resume_work_func(struct work_struct *w)
> +{
> +	struct xe_exec_queue *q;
> +	struct xe_hw_engine_group *group = container_of(w, struct xe_hw_engine_group, resume_work);
> +	int err;
> +
> +	err = xe_hw_engine_group_get_mode(group, EXEC_MODE_LR);
> +	if (err)
> +		return;
> +

I think technically, if we were previously in EXEC_MODE_LR, the resume
loop can be skipped. It is harmless to call resume twice, though.
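A rough userspace model of the suggested early return follows. It assumes a
hypothetical variant of xe_hw_engine_group_get_mode() that also reports the
mode the group was in before the switch. struct group, struct exec_queue,
group_get_mode_prev() and resume_work() are illustrative stand-ins, not the
real xe structs or API, and the mode_sem locking is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the xe types; not the real driver structs. */
enum exec_mode { EXEC_MODE_LR, EXEC_MODE_DMA_FENCE };

struct exec_queue {
	bool lr;      /* models xe_vm_in_lr_mode(q->vm) */
	int resumes;  /* number of times the resume op ran */
};

struct group {
	enum exec_mode mode;
	struct exec_queue *queues;
	int nqueues;
};

/*
 * Hypothetical variant of xe_hw_engine_group_get_mode(): switch the group
 * to new_mode and report the mode it was in before the switch.
 */
int group_get_mode_prev(struct group *g, enum exec_mode new_mode,
			enum exec_mode *prev)
{
	assert(g && prev);
	*prev = g->mode;
	g->mode = new_mode;
	return 0;
}

/* Models hw_engine_group_resume_work_func(); returns queues resumed. */
int resume_work(struct group *g)
{
	enum exec_mode prev;
	int resumed = 0;

	if (group_get_mode_prev(g, EXEC_MODE_LR, &prev))
		return 0;

	/* Already in LR mode, so nothing was suspended: skip the loop. */
	if (prev == EXEC_MODE_LR)
		return 0;

	for (int i = 0; i < g->nqueues; i++) {
		if (!g->queues[i].lr)
			continue;
		g->queues[i].resumes++;  /* models q->ops->resume(q) */
		resumed++;
	}
	return resumed;
}
```

Since resume is idempotent, the early return is only an optimization: a
second run of the worker after the group is already in LR mode resumes
nothing.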

> +	list_for_each_entry(q, &group->exec_queue_list, hw_engine_group_link) {
> +		if (!xe_vm_in_lr_mode(q->vm))
> +			continue;
> +
> +		q->ops->resume(q);
> +	}
> +
> +	xe_hw_engine_group_put(group);
> +}
> +
>  static struct xe_hw_engine_group *
>  hw_engine_group_alloc(struct xe_device *xe)
>  {
> @@ -438,6 +458,8 @@ hw_engine_group_alloc(struct xe_device *xe)
>  
>  	group = kzalloc(sizeof(*group), GFP_KERNEL);
>  	init_rwsem(&group->mode_sem);
> +	group->resume_wq = alloc_ordered_workqueue("xe-resume-lr-jobs-wq", 0);

I think a single non-ordered workqueue for all resumes should be fine. You
will also need to destroy it via drmm on driver unload.
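A userspace model of that lifetime follows: one resume workqueue per device
(placing it on the device is an assumption), with teardown registered as a
managed release action that runs in reverse registration order on unload,
as drmm actions do. In the driver this would presumably be alloc_workqueue()
plus drmm_add_action_or_reset() with a callback calling destroy_workqueue();
struct device, struct workqueue, add_action_or_reset(), device_release() and
resume_wq_init() below are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; not the kernel workqueue/drmm APIs. */
struct workqueue { int live; };

struct device;
typedef void (*release_fn)(struct device *dev, void *data);

#define MAX_ACTIONS 8

struct device {
	struct workqueue *resume_wq;  /* one queue shared by all groups */
	release_fn actions[MAX_ACTIONS];
	void *data[MAX_ACTIONS];
	int nactions;
};

/* Models drmm_add_action_or_reset(): on failure, run the action now. */
int add_action_or_reset(struct device *dev, release_fn fn, void *data)
{
	if (dev->nactions == MAX_ACTIONS) {
		fn(dev, data);
		return -ENOMEM;
	}
	dev->actions[dev->nactions] = fn;
	dev->data[dev->nactions] = data;
	dev->nactions++;
	return 0;
}

/* Models driver unload: managed actions run in reverse order. */
void device_release(struct device *dev)
{
	while (dev->nactions > 0) {
		dev->nactions--;
		dev->actions[dev->nactions](dev, dev->data[dev->nactions]);
	}
}

/* Release action: models destroy_workqueue() on the shared queue. */
static void destroy_resume_wq(struct device *dev, void *data)
{
	free(data);
	dev->resume_wq = NULL;
}

/* Allocate the single resume workqueue once per device, not per group. */
int resume_wq_init(struct device *dev)
{
	dev->resume_wq = calloc(1, sizeof(*dev->resume_wq));
	if (!dev->resume_wq)
		return -ENOMEM;
	dev->resume_wq->live = 1;
	return add_action_or_reset(dev, destroy_resume_wq, dev->resume_wq);
}
```

Each group would then queue its own resume_work on the shared queue, and no
per-group cleanup is needed in hw_engine_group_alloc().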

> +	INIT_WORK(&group->resume_work, hw_engine_group_resume_work_func);
>  	INIT_LIST_HEAD(&group->exec_queue_list);
>  
>  	return group;
> @@ -1235,6 +1257,7 @@ static int xe_hw_engine_group_suspend_lr_jobs(struct xe_hw_engine_group *group)
>  {
>  	int err;
>  	struct xe_exec_queue *q;
> +	bool suspended_queue = false;
>

I think if you call this function, it is implied that at least 1 exec
queue is in LR mode, right?

>  	lockdep_assert_held(&group->mode_sem);
>  
> @@ -1247,8 +1270,13 @@ static int xe_hw_engine_group_suspend_lr_jobs(struct xe_hw_engine_group *group)
>  			return err;
>  
>  		q->ops->suspend_wait(q);
> +
> +		suspended_queue = true;
>  	}
>  
> +	if (suspended_queue)

Given the above, maybe replace the if with xe_assert(suspended_queue);
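A userspace model of the assert variant follows, with plain assert() standing
in for xe_assert(). The loop body is simplified because the full loop is not
visible in this hunk; struct queue, struct lr_group and suspend_lr_jobs() are
illustrative stand-ins for the xe types and
xe_hw_engine_group_suspend_lr_jobs().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the real xe structs or ops. */
struct queue { bool lr; bool suspended; };

struct lr_group {
	struct queue *queues;
	int nqueues;
	int resume_queued;  /* counts queue_work() calls */
};

/* Models xe_hw_engine_group_suspend_lr_jobs() with the assert. */
int suspend_lr_jobs(struct lr_group *g)
{
	bool suspended_queue = false;

	for (int i = 0; i < g->nqueues; i++) {
		if (!g->queues[i].lr)
			continue;
		g->queues[i].suspended = true;  /* suspend + suspend_wait */
		suspended_queue = true;
	}

	/* Callers only get here when at least one LR queue exists. */
	assert(suspended_queue);
	g->resume_queued++;  /* models queue_work(resume_wq, &resume_work) */
	return 0;
}
```

With the invariant asserted, queuing the resume work becomes unconditional,
and a caller that breaks the invariant is caught in debug builds.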

Matt

> +		queue_work(group->resume_wq, &group->resume_work);
> +
>  	return 0;
>  }
>  
> -- 
> 2.43.0
>