[PATCH v7 9/9] drm/xe/vf: Rebase exec queue parallel commands during migration recovery

Wed Jul 30 21:54:43 UTC 2025

On Sat, Jul 19, 2025 at 09:37:45AM +0200, Tomasz Lis wrote:
> Parallel exec queues have an additional command streamer buffer which holds
> a GGTT reference to data within context status. The GGTT references have to
> be fixed after VF migration.
> 
> v2: Properly handle nop entry, verify if parsing goes ok
> 
> Signed-off-by: Tomasz Lis <tomasz.lis at intel.com>
> Cc: Michal Winiarski <michal.winiarski at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_guc_submit.c | 43 +++++++++++++++++++++++++++++-
>  1 file changed, 42 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 84bc352f6f12..9eb0fd863218 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -693,6 +693,44 @@ static void wq_item_append(struct xe_exec_queue *q)
>  	parallel_write(xe, map, wq_desc.tail, q->guc->wqi_tail);
>  }
>  
> +static void wq_items_rebase(struct xe_exec_queue *q)

Should we propagate errors up, and perhaps fail the entire fixup (and
wedge the driver)?
I guess the same applies to the other post-migration fixup places, if
we ever find ourselves in an unexpected situation, with
driver-controlled fields containing undefined values after migration
(as in this case, where we break out of the loop early, before
reaching wqi_tail).
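
To make the question concrete, here is a rough userspace sketch of a walker
that returns an error instead of breaking out silently. Everything prefixed
with model_ or MODEL_ is a hypothetical stand-in (including the ring size,
the bit layout and the choice of -EINVAL), not the driver's real definitions,
and the ring is a plain array rather than an iosys_map.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative values only; the real driver has its own definitions. */
#define MODEL_WQ_SIZE        64u	/* ring size in bytes */
#define MODEL_WQ_DWORDS      (MODEL_WQ_SIZE / 4u)
#define MODEL_TYPE_MASK      0x000000ffu
#define MODEL_LEN_MASK       0x07ff0000u
#define MODEL_LEN_SHIFT      16
#define MODEL_TYPE_NOOP      0x4u
#define MODEL_TYPE_MULTI_LRC 0x5u

struct model_queue {
	uint32_t ring[MODEL_WQ_DWORDS];
	uint32_t head, tail;	/* byte offsets, both below MODEL_WQ_SIZE */
};

/*
 * Walk the items between head and tail and rewrite dword 1 of every
 * multi-LRC item. Returns 0 on success or -EINVAL on any parse problem.
 */
static int model_wq_items_rebase(struct model_queue *q, uint32_t new_desc)
{
	uint32_t i = q->head;
	uint32_t walked = 0;

	while (i != q->tail) {
		uint32_t val = q->ring[i / 4u];
		uint32_t len_dw = (val & MODEL_LEN_MASK) >> MODEL_LEN_SHIFT;
		uint32_t type = val & MODEL_TYPE_MASK;

		if (len_dw >= MODEL_WQ_DWORDS)
			return -EINVAL;	/* corrupt length field */

		walked += (len_dw + 1) * 4u;
		if (walked > MODEL_WQ_SIZE)
			return -EINVAL;	/* went around the ring without hitting tail */

		if (type == MODEL_TYPE_MULTI_LRC)
			q->ring[(i / 4u + 1) % MODEL_WQ_DWORDS] = new_desc;
		else if (type != MODEL_TYPE_NOOP)
			return -EINVAL;	/* unexpected item type */

		i = (i + (len_dw + 1) * 4u) % MODEL_WQ_SIZE;
	}

	return 0;
}

/* Stop at the first failure so the caller can fail the whole fixup. */
static int model_rebase_all(struct model_queue *qs, int count, uint32_t new_desc)
{
	for (int k = 0; k < count; k++) {
		int err = model_wq_items_rebase(&qs[k], new_desc);

		if (err)
			return err;
	}

	return 0;
}
```

With something of this shape, the caller of the real fixup could decide
whether a non-zero return should abort recovery and wedge the device.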

> +{
> +	struct xe_guc *guc = exec_queue_to_guc(q);
> +	struct xe_device *xe = guc_to_xe(guc);
> +	struct iosys_map map = xe_lrc_parallel_map(q->lrc[0]);
> +	int i = q->guc->wqi_head;
> +
> +	/* the ring starts after a header struct */
> +	iosys_map_incr(&map, offsetof(struct guc_submit_parallel_scratch, wq[0]));
> +
> +	while ((i % WQ_SIZE) != (q->guc->wqi_tail % WQ_SIZE)) {
> +		u32 len_dw, type, val;
> +
> +		if (i < 0 || i > 2 * WQ_SIZE)
> +			break;

Perhaps drm_WARN_ON_ONCE?

> +
> +		val = xe_map_rd_ring_u32(xe, &map, i / sizeof(u32) + 0, WQ_SIZE / sizeof(u32));
> +		len_dw = FIELD_GET(WQ_LEN_MASK, val);
> +		type = FIELD_GET(WQ_TYPE_MASK, val);
> +
> +		if (len_dw >= WQ_SIZE / sizeof(u32))
> +			break;

Same here.

> +
> +		if (type == WQ_TYPE_MULTI_LRC) {
> +			val = xe_lrc_descriptor(q->lrc[0]);
> +			xe_map_wr_ring_u32(xe, &map, i / sizeof(u32) + 1,
> +					   WQ_SIZE / sizeof(u32), val);
> +		} else if (type != WQ_TYPE_NOOP) {
> +			break;

And I guess the same applies here.
I know that we print an error at the end if we don't reach the tail,
but that alone doesn't really tell us which check failed.
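
One possible way to get that information, sketched in plain C: classify each
early exit separately so the message can name the failing check. The enum,
the helper names, the message strings and the constants are all made up for
illustration; in the driver this would more likely be a drm_WARN_ON_ONCE or
a distinct xe_gt_err at each break.

```c
#include <stdint.h>

/* Illustrative stand-ins, not the driver's real definitions. */
#define MODEL_WQ_SIZE        64u	/* ring size in bytes */
#define MODEL_TYPE_MASK      0x000000ffu
#define MODEL_LEN_MASK       0x07ff0000u
#define MODEL_LEN_SHIFT      16
#define MODEL_TYPE_NOOP      0x4u
#define MODEL_TYPE_MULTI_LRC 0x5u

enum model_wq_fault {
	MODEL_WQ_OK = 0,
	MODEL_WQ_BAD_OFFSET,
	MODEL_WQ_BAD_LENGTH,
	MODEL_WQ_BAD_TYPE,
};

/* Apply the three checks from the quoted loop to one item header. */
static enum model_wq_fault model_wq_check_item(int64_t offset, uint32_t header)
{
	uint32_t len_dw = (header & MODEL_LEN_MASK) >> MODEL_LEN_SHIFT;
	uint32_t type = header & MODEL_TYPE_MASK;

	if (offset < 0 || offset > 2 * (int64_t)MODEL_WQ_SIZE)
		return MODEL_WQ_BAD_OFFSET;
	if (len_dw >= MODEL_WQ_SIZE / 4u)
		return MODEL_WQ_BAD_LENGTH;
	if (type != MODEL_TYPE_MULTI_LRC && type != MODEL_TYPE_NOOP)
		return MODEL_WQ_BAD_TYPE;

	return MODEL_WQ_OK;
}

/* Map each fault to a message that says what went wrong. */
static const char *model_wq_fault_str(enum model_wq_fault fault)
{
	switch (fault) {
	case MODEL_WQ_OK:
		return "ok";
	case MODEL_WQ_BAD_OFFSET:
		return "offset out of range";
	case MODEL_WQ_BAD_LENGTH:
		return "item length exceeds ring";
	case MODEL_WQ_BAD_TYPE:
		return "unexpected item type";
	}

	return "unknown";
}
```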

> +		}
> +
> +		i += (len_dw + 1) * sizeof(u32);
> +	}
> +
> +	if (i != q->guc->wqi_tail)

Should this compare i % WQ_SIZE instead? We're incrementing "i" beyond
WQ_SIZE, but wqi_tail always stays bounded within that range.
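
A tiny illustration of the difference, with a made-up ring size and
hypothetical helper names:

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_WQ_SIZE 64u	/* illustrative ring size in bytes */

/* Strict comparison, as in the quoted patch. */
static bool model_reached_tail_strict(uint32_t i, uint32_t tail)
{
	return i == tail;
}

/* Comparison reduced modulo the ring size, matching the loop condition. */
static bool model_reached_tail_wrapped(uint32_t i, uint32_t tail)
{
	return (i % MODEL_WQ_SIZE) == (tail % MODEL_WQ_SIZE);
}
```

With the head at byte 48 and a single 32-byte item, the walk ends with
i == 80 while the tail is 16. The strict check reports a parse failure even
though the walk landed exactly on the tail; the wrapped check does not.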

Thanks,
-Michał

> +		xe_gt_err(q->gt, "Exec queue fixups incomplete - wqi parse failed\n");
> +}
> +
>  #define RESUME_PENDING	~0x0ull
>  static void submit_exec_queue(struct xe_exec_queue *q)
>  {
> @@ -2506,7 +2544,10 @@ void xe_guc_contexts_hwsp_rebase(struct xe_guc *guc, void *scratch)
>  	unsigned long index;
>  
>  	mutex_lock(&guc->submission_state.lock);
> -	xa_for_each(&guc->submission_state.exec_queue_lookup, index, q)
> +	xa_for_each(&guc->submission_state.exec_queue_lookup, index, q) {
>  		xe_exec_queue_contexts_hwsp_rebase(q, scratch);
> +		if (xe_exec_queue_is_parallel(q))
> +			wq_items_rebase(q);
> +	}
>  	mutex_unlock(&guc->submission_state.lock);
>  }
> -- 
> 2.25.1
>