[PATCH v6 9/9] drm/xe/vf: Rebase exec queue parallel commands during migration recovery

Lis, Tomasz tomasz.lis at intel.com
Sat Jul 19 06:21:04 UTC 2025


On 16.07.2025 16:27, Michał Winiarski wrote:
> On Fri, Jul 04, 2025 at 11:02:28PM +0200, Tomasz Lis wrote:
>> Parallel exec queues have an additional command streamer buffer which holds
>> a GGTT reference to data within context status. The GGTT references have to
>> be fixed after VF migration.
>>
>> Signed-off-by: Tomasz Lis <tomasz.lis at intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_guc_submit.c | 26 +++++++++++++++++++++++++-
>>   1 file changed, 25 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>> index 37584a064574..96b94f7fcc31 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>> @@ -693,6 +693,27 @@ static void wq_item_append(struct xe_exec_queue *q)
>>   	parallel_write(xe, map, wq_desc.tail, q->guc->wqi_tail);
>>   }
>>   
>> +static void wq_items_rebase(struct xe_exec_queue *q)
>> +{
>> +	struct xe_guc *guc = exec_queue_to_guc(q);
>> +	struct xe_device *xe = guc_to_xe(guc);
>> +	struct iosys_map map = xe_lrc_parallel_map(q->lrc[0]);
>> +	int i = q->guc->wqi_head;
>> +
>> +	/* the ring starts after a header struct */
>> +	iosys_map_incr(&map, offsetof(struct guc_submit_parallel_scratch, wq[0]));
>> +
>> +	while (i != q->guc->wqi_tail) {
>> +		u32 len_dw, val;
>> +
>> +		val = xe_map_rd_ring_u32(xe, &map, i / sizeof(u32) + 0, WQ_SIZE / sizeof(u32));
>> +		len_dw = FIELD_GET(WQ_LEN_MASK, val);
>> +		val = xe_lrc_descriptor(q->lrc[0]);
>> +		xe_map_wr_ring_u32(xe, &map, i / sizeof(u32) + 1, WQ_SIZE / sizeof(u32), val);
>> +		i += len_dw * sizeof(u32);
>> +	}
> This type of construct makes me anxious, as it's relatively easy to turn
> it into an infinite loop. Can we rewrite it as a for loop? Or at least
> something with a fixed number of iterations as a condition (instead of
> != tail)?
> Or at the very least, add some asserts...

I think I got distracted while implementing this; it's clearly
unfinished. Or maybe the emit code looked different when I implemented
the fixups? Either way, I will prepare a correct implementation.
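
To make the concern above concrete, here is a rough standalone userspace
model of a bounded walk. It is only a sketch, not the driver code: every
DEMO_* name and demo_wq_items_rebase() are hypothetical, the ring is a
plain array instead of an iosys_map, and it assumes that the length field
counts the dwords following the header and that the dword right after the
header is the one to rewrite.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model values, not taken from the xe driver. */
#define DEMO_RING_DW	16u			/* ring size in dwords, power of two */
#define DEMO_LEN_SHIFT	16
#define DEMO_LEN_MASK	(0x7ffu << DEMO_LEN_SHIFT)

/*
 * Walk the items between head and tail (dword indices into the ring) and
 * overwrite the dword following each item header with new_desc.
 *
 * Assumption of this sketch: the header's length field holds the number
 * of dwords that follow the header.
 *
 * The loop is bounded by the number of pending dwords, so a corrupted
 * length field cannot make it spin forever. Returns the number of items
 * patched, or -1 on a malformed ring (items before the malformed one may
 * already have been patched).
 */
static int demo_wq_items_rebase(uint32_t *ring, uint32_t head, uint32_t tail,
				uint32_t new_desc)
{
	uint32_t pending, walked = 0;
	int items = 0;

	if (head >= DEMO_RING_DW || tail >= DEMO_RING_DW)
		return -1;

	/* dwords between head and tail, taking wrap-around into account */
	pending = (tail - head) & (DEMO_RING_DW - 1);

	while (walked < pending) {
		uint32_t pos = (head + walked) & (DEMO_RING_DW - 1);
		uint32_t len_dw = (ring[pos] & DEMO_LEN_MASK) >> DEMO_LEN_SHIFT;
		uint32_t item_dw = len_dw + 1;	/* header + payload */

		/* no room for a descriptor, or item runs past the tail */
		if (len_dw == 0 || item_dw > pending - walked)
			return -1;

		ring[(pos + 1) & (DEMO_RING_DW - 1)] = new_desc;
		walked += item_dw;	/* always advances by at least 2 */
		items++;
	}

	return items;
}
```

Each pass consumes at least two dwords of a finite budget, so the walk
either ends exactly at the tail or bails out with an error. In the driver
an assert or warning would likely take the place of the -1 return.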

-Tomasz

> Thanks,
> -Michał
>
>> +}
>> +
>>   #define RESUME_PENDING	~0x0ull
>>   static void submit_exec_queue(struct xe_exec_queue *q)
>>   {
>> @@ -2511,7 +2532,10 @@ void xe_guc_contexts_hwsp_rebase(struct xe_guc *guc, void *scratch)
>>   	unsigned long index;
>>   
>>   	mutex_lock(&guc->submission_state.lock);
>> -	xa_for_each(&guc->submission_state.exec_queue_lookup, index, q)
>> +	xa_for_each(&guc->submission_state.exec_queue_lookup, index, q) {
>>   		xe_exec_queue_contexts_hwsp_rebase(q, scratch);
>> +		if (xe_exec_queue_is_parallel(q))
>> +			wq_items_rebase(q);
>> +	}
>>   	mutex_unlock(&guc->submission_state.lock);
>>   }
>> -- 
>> 2.25.1
>>

