[PATCH] drm/xe: Do not print engine reset message on a killed queue

John Harrison john.c.harrison at intel.com
Fri May 9 19:13:01 UTC 2025


On 5/8/2025 11:03 PM, Matthew Brost wrote:
> On Thu, May 08, 2025 at 04:03:56PM -0700, John Harrison wrote:
>> On 5/8/2025 12:09 PM, Matthew Brost wrote:
>>> When an app is killed (e.g. with ctrl-c), any queues running on the GPU
>>> have their preemption timeout set to the minimum value and scheduling
>>> is disabled. If the queue has something active on the GPU, it is very
>>> likely that the GuC will trigger an engine reset, resulting in the
>>> engine reset message being printed even though this is fully expected.
>>> Do not print the engine reset message on queues which have been killed.
>>>
>>> Reported-by: Paulo Zanoni <paulo.r.zanoni at intel.com>
>>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4904
>>> Signed-off-by: Matthew Brost <matthew.brost at intel.com>
>>> ---
>>>    drivers/gpu/drm/xe/xe_guc_submit.c | 5 +++--
>>>    1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
>>> index 369be36f7dc5..efff462ddd75 100644
>>> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
>>> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
>>> @@ -2005,8 +2005,9 @@ int xe_guc_exec_queue_reset_handler(struct xe_guc *guc, u32 *msg, u32 len)
>>>    	if (unlikely(!q))
>>>    		return -EPROTO;
>>> -	xe_gt_info(gt, "Engine reset: engine_class=%s, logical_mask: 0x%x, guc_id=%d",
>>> -		   xe_hw_engine_class_to_str(q->class), q->logical_mask, guc_id);
>>> +	if (!exec_queue_killed(q))
>>> +		xe_gt_info(gt, "Engine reset: engine_class=%s, logical_mask: 0x%x, guc_id=%d",
>>> +			   xe_hw_engine_class_to_str(q->class), q->logical_mask, guc_id);
>> Maybe make it an xe_gt_dbg in the case of a killed queue? It is still useful
>> to see such messages when triaging CI failures to get an idea of what is
>> going on behind the scenes.
>>
> I had thought about this; it should be fine as long as it isn't spamming
> dmesg on normal production kernels. I would assume xe_gt_dbg would not
> show up. I did the same thing for the job timeout message in this patch -
> just dropped it on killed queues. Maybe that should be an xe_gt_dbg
> message there too?
Yeah, for CI runs it is helpful to see when and why things are being
killed, especially when tracking down issues with context cleanup and such!

And yeah, debug level does not show up by default. CI explicitly bumps
the default log level to make it show up. And that brings in far more
output from the display side than the i915 or Xe drivers ever produce!
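
To illustrate the idea, here is a rough standalone sketch, not driver
code. The struct, enum and function names are made-up stand-ins for the
exec queue, for the info/debug levels behind xe_gt_info / xe_gt_dbg and
for the reset handler; the only point is that a killed queue still
produces the message, just at debug level:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the info/debug log levels. */
enum log_level { LOG_DBG, LOG_INFO };

/* Hypothetical, cut-down stand-in for an exec queue. */
struct fake_queue {
	bool killed;			/* mirrors what exec_queue_killed() reports */
	unsigned int logical_mask;
	int guc_id;
};

/* Killed queues are demoted to debug level; everything else stays at info. */
static enum log_level reset_log_level(const struct fake_queue *q)
{
	return q->killed ? LOG_DBG : LOG_INFO;
}

/* Format the reset message into buf and return the level to log it at. */
static enum log_level format_reset_msg(const struct fake_queue *q,
				       char *buf, size_t len)
{
	snprintf(buf, len, "Engine reset: logical_mask: 0x%x, guc_id=%d",
		 q->logical_mask, q->guc_id);
	return reset_log_level(q);
}
```

In the real handler that would presumably just be an if/else picking
between xe_gt_dbg and xe_gt_info with the same format string.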

John.

>
> Matt
>
>> John.
>>
>>>    	trace_xe_exec_queue_reset(q);


