[PATCH] drm/xe/guc_submit: improve schedule disable error logging

Nirmoy Das nirmoy.das at intel.com
Fri Sep 27 14:10:10 UTC 2024


On 9/27/2024 3:35 PM, Matthew Auld wrote:
> A few things here. Make the two prints consistent (and distinct), print
> the guc_id, and finally dump the CT queues. It should be possible to
> spot the guc_id in the CT queue dump and, for example, see that the host side
> has yet to process the response for the schedule disable, or see that
> GuC is yet to send it, to help narrow things down if we trigger the
> timeout.
>
> References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1638
> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
> Cc: Matthew Brost <matthew.brost at intel.com>
> Cc: Nirmoy Das <nirmoy.das at intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_guc_submit.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 80062e1d3f66..52ed7c0043f9 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -977,7 +977,12 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
>  					 !exec_queue_pending_disable(q) ||
>  					 guc_read_stopped(guc), HZ * 5);
>  		if (!ret) {
> -			drm_warn(&xe->drm, "Schedule disable failed to respond");
> +			struct xe_gt *gt = guc_to_gt(guc);
> +			struct drm_printer p = xe_gt_err_printer(gt);
> +
> +			xe_gt_warn(gt, "%s schedule disable failed to respond guc_id=%d",
> +				   __func__, ge->id);
> +			xe_guc_ct_print(&guc->ct, &p, false);
>  			xe_sched_submission_start(sched);
>  			xe_gt_reset_async(q->gt);
>  			return;
> @@ -1177,8 +1182,14 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
>  					 guc_read_stopped(guc), HZ * 5);
>  		if (!ret || guc_read_stopped(guc)) {
>  trigger_reset:
> -			if (!ret)
> -				xe_gt_warn(guc_to_gt(guc), "Schedule disable failed to respond");
> +			if (!ret) {
> +				struct xe_gt *gt = guc_to_gt(guc);
> +				struct drm_printer p = xe_gt_err_printer(gt);
> +
> +				xe_gt_warn(gt, "%s schedule disable failed to respond guc_id=%d",
> +					   __func__, q->guc->id);
> +				xe_guc_ct_print(&guc->ct, &p, true);
> +			}
>  			set_exec_queue_extra_ref(q);
>  			xe_exec_queue_get(q);	/* GT reset owns this */
>  			set_exec_queue_banned(q);
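
The commit message describes a triage step: find the printed guc_id in the CT dump, then check whether the host has yet to process the schedule-disable response or GuC has yet to send it. Below is a minimal user-space sketch of that decision. It assumes the dump is modeled as an array of G2H (GuC-to-host) messages that the host has not processed yet. All names (`struct ct_msg`, `CT_ACTION_SCHED_DISABLE_DONE`, `triage_schedule_disable()`) are hypothetical. They do not match the real GuC ABI or the output format of `xe_guc_ct_print()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message kinds; NOT the real GuC ABI action codes. */
enum ct_action {
	CT_ACTION_OTHER,              /* unrelated G2H traffic */
	CT_ACTION_SCHED_DISABLE_DONE, /* GuC -> host: schedule disable completed */
};

/* Hypothetical, simplified entry of the dumped G2H (GuC-to-host) queue. */
struct ct_msg {
	enum ct_action action;
	int guc_id;
};

enum triage_result {
	TRIAGE_HOST_PENDING, /* response queued, host has yet to process it */
	TRIAGE_GUC_PENDING,  /* no response queued, GuC has yet to send it */
};

/*
 * Look for a schedule-disable response for @guc_id among the G2H
 * messages the host has not processed yet. Simplified model: it only
 * distinguishes the two cases named in the commit message.
 */
static enum triage_result
triage_schedule_disable(const struct ct_msg *g2h, size_t len, int guc_id)
{
	assert(g2h != NULL || len == 0);

	for (size_t i = 0; i < len; i++) {
		if (g2h[i].action == CT_ACTION_SCHED_DISABLE_DONE &&
		    g2h[i].guc_id == guc_id)
			return TRIAGE_HOST_PENDING;
	}

	return TRIAGE_GUC_PENDING;
}
```

Take an unprocessed queue of `{ CT_ACTION_OTHER, 3 }, { CT_ACTION_SCHED_DISABLE_DONE, 7 }`. A timeout logged with `guc_id=7` points at the host side. A timeout logged with `guc_id=3` points at GuC. This is also why the patch adds the guc_id to both warnings.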
