[PATCH v3 2/2] drm/xe/guc: Scale mmio send/recv timeout for CCS save/restore with smem size

Matthew Brost matthew.brost at intel.com
Wed Aug 6 16:28:17 UTC 2025


On Wed, Aug 06, 2025 at 01:59:10PM +0530, Satyanarayana K V P wrote:
> After VF migration, GUC restores CCS metadata scaled to system memory size.
> The default timeout (50ms) is calibrated for 4GB memory capacity per
> specification. Timeouts for other memory sizes are proportionally derived
> from this baseline.
> 
> This ensures adequate restoration time for CCS metadata across
> different hardware configurations while maintaining spec compliance.
> 
> Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p at intel.com>
> Cc: John Harrison <John.C.Harrison at Intel.com>
> Cc: Matthew Brost <matthew.brost at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_guc.c | 33 ++++++++++++++++++++++++++++++++-
>  1 file changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> index 9e34401e4489..d836ded83491 100644
> --- a/drivers/gpu/drm/xe/xe_guc.c
> +++ b/drivers/gpu/drm/xe/xe_guc.c
> @@ -10,6 +10,7 @@
>  #include <generated/xe_wa_oob.h>
>  
>  #include "abi/guc_actions_abi.h"
> +#include "abi/guc_actions_sriov_abi.h"
>  #include "abi/guc_errors_abi.h"
>  #include "regs/xe_gt_regs.h"
>  #include "regs/xe_gtt_defs.h"
> @@ -1397,6 +1398,36 @@ int xe_guc_auth_huc(struct xe_guc *guc, u32 rsa_addr)
>  	return xe_guc_ct_send_block(&guc->ct, action, ARRAY_SIZE(action));
>  }
>  
> +/*
> + * After VF migration, GUC restores CCS metadata scaled to system memory size.
> + * Default timeout (50ms) is calibrated for 4GB memory capacity per
> + * specification. Timeouts for other memory sizes are proportionally derived
> + * from this baseline.
> + */
> +static u32 guc_mmio_send_recv_timeout(struct xe_guc *guc, const u32 *request)
> +{
> +	struct xe_device *xe = guc_to_xe(guc);
> +	u32 timeout = 50000;

Is this really the upper bound? It seems like it could be significantly
higher if multiple VFs are trying to do things all at the same time.

Scaling the timeout itself does make sense, though.

Matt

> +	u32 action, factor;
> +	struct sysinfo si;
> +	u64 sys_mem_size;
> +
> +	action = FIELD_GET(GUC_HXG_REQUEST_MSG_0_ACTION, request[0]);
> +	if (action != GUC_ACTION_VF2GUC_NOTIFY_RESFIX_DONE || IS_DGFX(xe) ||
> +	    !xe_device_has_flat_ccs(xe))
> +		return timeout;
> +
> +	si_meminfo(&si);
> +	sys_mem_size = si.totalram * si.mem_unit;
> +
> +	if (sys_mem_size <= SZ_4G)
> +		return timeout;
> +
> +	factor = (sys_mem_size + SZ_4G) / SZ_4G;
> +	timeout *= factor;
> +
> +	return timeout;
> +}
> +
>  int xe_guc_mmio_send_recv(struct xe_guc *guc, const u32 *request,
>  			  u32 len, u32 *response_buf)
>  {
> @@ -1439,7 +1470,7 @@ int xe_guc_mmio_send_recv(struct xe_guc *guc, const u32 *request,
>  
>  	ret = xe_mmio_wait32(mmio, reply_reg, GUC_HXG_MSG_0_ORIGIN,
>  			     FIELD_PREP(GUC_HXG_MSG_0_ORIGIN, GUC_HXG_ORIGIN_GUC),
> -			     50000, &reply, false);
> +			     guc_mmio_send_recv_timeout(guc, request), &reply, false);
>  	if (ret) {
>  		/* scratch registers might be cleared during FLR, try once more */
>  		if (!reply && !lost) {
> -- 
> 2.43.0
> 

