[PATCH v2] drm/xe/guc: Configure TLB timeout based on CT buffer size
Michal Wajdeczko
michal.wajdeczko at intel.com
Wed Jun 26 14:17:02 UTC 2024
On 26.06.2024 15:42, Nirmoy Das wrote:
> GuC TLB invalidation depends on GuC to process the request from the CT
> queue and then the real time to invalidate TLB. Add a function to return
> overestimated possible time a TLB inval H2G might take which can be used
> as timeout value for TLB invalidation wait time.
>
> v2: Address reviews from Michal.
>
> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1622
> Cc: Matthew Brost <matthew.brost at intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko at intel.com>
> Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> Signed-off-by: Nirmoy Das <nirmoy.das at intel.com>
> ---
> drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 7 ++++++-
> drivers/gpu/drm/xe/xe_guc_ct.c | 13 +++++++++++++
> drivers/gpu/drm/xe/xe_guc_ct.h | 2 ++
> 3 files changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> index e1f1ccb01143..6fc85f711074 100644
> --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
> @@ -17,7 +17,12 @@
> #include "xe_trace.h"
> #include "regs/xe_guc_regs.h"
>
> -#define TLB_TIMEOUT (HZ / 4)
> +/**
> + * TLB inval depends on pending commands in the CT queue and then the real
> + * invalidation time. Double up the time to process full CT queue
> + * just to be on the safe side.
> + */
> +#define TLB_TIMEOUT (xe_ct_full_queue_proc_time_jiffies() * 2)
I'm still not sold on keeping this as a macro, even at the one-time cost
of a slightly bigger diff. How about something like:

static long tlb_timeout_jiffies(struct xe_gt *gt)
{
        /* this reflects what HW/GuC needs to process a TLB inv request */
        const long HW_TLB_TIMEOUT = HZ / 4;

        /* this estimates the actual delay caused by the CTB transport */
        long delay = xe_guc_ct_queue_proc_time_jiffies(&gt->uc.guc.ct);

        return HW_TLB_TIMEOUT + 2 * delay;
}
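
and then the existing TLB_TIMEOUT users would just pass the gt, roughly
like this (call site quoted from memory, only as an illustration):

        ret = wait_event_timeout(guc->ct.wq,
                                 tlb_invalidation_seqno_past(gt, seqno),
                                 tlb_timeout_jiffies(gt));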
>
> static void xe_gt_tlb_fence_timeout(struct work_struct *work)
> {
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> index 873d1bcbedd7..0cdf0d4d20e5 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> @@ -112,6 +112,19 @@ ct_to_xe(struct xe_guc_ct *ct)
> #define CTB_G2H_BUFFER_SIZE (4 * CTB_H2G_BUFFER_SIZE)
> #define G2H_ROOM_BUFFER_SIZE (CTB_G2H_BUFFER_SIZE / 4)
>
> +/**
> + * xe_ct_full_queue_proc_time_jiffies - Return maximum time to process a full CT command queue
> + *
> + * Observation is that A 4KB buffer full of commands takes a little over a second to process.
s/A 4KB/a 4 KiB/
> + * Use that to calculate maximum time to process a full CT command queue.
> + *
> + * Return: Maximum time to process the full CT queue in jiffies.
> + */
while code lines can be as long as 100 columns, we usually wrap comments
at 80 columns
> +long xe_ct_full_queue_proc_time_jiffies(void)
this should have the xe_guc_ct_ prefix to match the component
and likely take a ct pointer, as then we could return a better estimate
of the delay in the CTB transport based on the actual number of requests
in flight, or apply tweaks based on the platform and/or SR-IOV mode
(requests from the VFs might be processed at a lower rate than your
initial observations suggest, and in PF mode some operations could be
quite heavy)
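i.e. roughly (just a sketch, the new parameter can stay unused until we
have a better model for the in-flight requests or platform tweaks):

long xe_guc_ct_queue_proc_time_jiffies(struct xe_guc_ct *ct)
{
        /* keep the simple estimate of ~1 second per 4 KiB of commands for now */
        return (CTB_H2G_BUFFER_SIZE * HZ) / SZ_4K;
}

with the prototype in xe_guc_ct.h updated to match:

long xe_guc_ct_queue_proc_time_jiffies(struct xe_guc_ct *ct);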
> +{
> + return (CTB_H2G_BUFFER_SIZE * HZ) / SZ_4K;
> +}
> +
> static size_t guc_ct_size(void)
> {
> return 2 * CTB_DESC_SIZE + CTB_H2G_BUFFER_SIZE +
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.h b/drivers/gpu/drm/xe/xe_guc_ct.h
> index 105bb8e99a8d..57280f82dc35 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.h
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.h
> @@ -64,4 +64,6 @@ xe_guc_ct_send_block_no_fail(struct xe_guc_ct *ct, const u32 *action, u32 len)
> return xe_guc_ct_send_recv_no_fail(ct, action, len, NULL);
> }
>
> +long xe_ct_full_queue_proc_time_jiffies(void);
> +
> #endif