[PATCH] drm/xe: Skip TLB invalidation time out log if ct is disabled

Matthew Brost matthew.brost at intel.com
Tue Feb 20 15:05:27 UTC 2024


On Tue, Feb 20, 2024 at 02:13:56AM +0000, Shuicheng Lin wrote:
> Suspend may cause the TLB invalidation time out as below log.
> Skip the log print if ct is disabled to make log clean.
> "
> [  228.812266] xe_gt_tlb_invalidation_wait enter
> [  228.812311] xe_gt_suspend enter
> [  228.812782] xe 0000:03:00.0: [drm] GT0: suspended
> [  228.812786] xe_gt_suspend enter
> [  228.813508] xe 0000:03:00.0: [drm] GT1: suspended
> [  229.067007] xe 0000:03:00.0: [drm] *ERROR* TILE0 [GTT] GT0: TLB invalidation time'd out, seqno=321, recv=319
> [  229.067099] xe 0000:03:00.0: [drm] *ERROR* GT0: CT disabled
> "
> 

This doesn't look right for a few reasons:
- The timeout can still race with a suspend followed by a resume
- The xe_guc_ct_enabled check also suppresses the -ETIME return
- I think this message is actually valid
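
To illustrate the second point, here is a minimal user-space model of the
control flow in the quoted diff. It is only a sketch: model_wait_patched and
its parameters are hypothetical names, not driver code. Because the check
wraps the whole error path, a timeout with CT disabled falls through to
"return 0":

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the patched tail of xe_gt_tlb_invalidation_wait().
 * wait_ret == 0 models a wait that timed out; ct_enabled models the
 * result of xe_guc_ct_enabled().
 */
int model_wait_patched(long wait_ret, bool ct_enabled)
{
	if (!wait_ret) {
		if (ct_enabled) {
			/* The error log would be printed here. */
			return -ETIME;
		}
		/* CT disabled: both the log and the error code are skipped. */
	}

	return 0;
}
```

In this model a caller cannot tell a timed-out wait with CT disabled apart
from a completed invalidation.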

What should probably be done is signal all pending TLB invalidations on
suspend. I think we are doing a bit of rework in [1] in this area too.
I'd say let's get [1] to land and, if this is still an issue, fix up the
suspend path to signal all TLB invalidation waiters. Signaling all
waiters on suspend should avoid having this message be printed.
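
The idea of signaling every pending waiter on suspend can be sketched in
plain C as below. This is a user-space model under stated assumptions, not
the xe implementation: struct model_fence, struct model_gt and both
functions are hypothetical, and the locking a real driver would need around
the pending list is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one pending TLB invalidation waiter. */
struct model_fence {
	int seqno;
	bool signaled;
	struct model_fence *next;
};

/* Hypothetical stand-in for the per-GT list of pending invalidations. */
struct model_gt {
	struct model_fence *pending;
};

/* Queue a waiter at the head of the pending list. */
void model_add_pending(struct model_gt *gt, struct model_fence *f)
{
	f->signaled = false;
	f->next = gt->pending;
	gt->pending = f;
}

/*
 * Called from the (modelled) suspend path: complete every pending waiter
 * so that none is left to hit its timeout. Returns how many were signaled.
 */
size_t model_signal_all_on_suspend(struct model_gt *gt)
{
	size_t n = 0;

	while (gt->pending) {
		struct model_fence *f = gt->pending;

		gt->pending = f->next;
		f->next = NULL;
		f->signaled = true;
		n++;
	}

	return n;
}
```

With the list drained before CT is disabled, a waiter sees its fence
signaled instead of timing out, so the error path is never reached.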

Matt

[1] https://patchwork.freedesktop.org/series/129217/

> Signed-off-by: Shuicheng Lin <shuicheng.lin at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> index 7b3a54748b49..8aac12efea84 100644
> --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> @@ -330,11 +330,18 @@ int xe_gt_tlb_invalidation_wait(struct xe_gt *gt, int seqno)
>  	if (!ret) {
>  		struct drm_printer p = xe_gt_err_printer(gt);
>  
> -		xe_tile_report_driver_error(gt_to_tile(gt), XE_TILE_DRV_ERR_GTT,
> -					    "GT%u: TLB invalidation time'd out, seqno=%d, recv=%d",
> -					    gt->info.id, seqno, gt->tlb_invalidation.seqno_recv);
> -		xe_guc_ct_print(&guc->ct, &p, true);
> -		return -ETIME;
> +		/*
> +		 * guc ct may be disabled during the waiting period and lead to the timeout.
> +		 * Such as power suspend just after this tlb invalidation wait.
> +		 * Skip the error log print if ct is disabled.
> +		 */
> +		if (xe_guc_ct_enabled(&guc->ct)) {
> +			xe_tile_report_driver_error(gt_to_tile(gt), XE_TILE_DRV_ERR_GTT,
> +						    "GT%u: TLB invalidation time'd out, seqno=%d, recv=%d",
> +						    gt->info.id, seqno, gt->tlb_invalidation.seqno_recv);
> +			xe_guc_ct_print(&guc->ct, &p, true);
> +			return -ETIME;
> +		}
>  	}
>  
>  	return 0;
> -- 
> 2.25.1
> 

