[PATCH v6 6/9] drm/xe/vf: Rebase MEMIRQ structures for all contexts after migration

Wed Jul 16 18:32:51 UTC 2025

On Fri, Jul 04, 2025 at 11:02:25PM +0200, Tomasz Lis wrote:
> All contexts require an update of state data, as the data includes
> GGTT references to memirq-related buffers.
> 
> Default contexts need these references updated as well, because they
> are not refreshed when a new context is created from them.
> 
> The way we write to VRAM requires a scratch buffer to be used
> before the whole block is memcopied. Since using kmalloc() within
> specific recovery functions would lead to unintended relations
> between locks, we are allocating the buffer earlier, before
> any locks are taken. The same buffer will be used for other steps
> of the recovery.
> 
> v2: Update addresses by xe_lrc_write_ctx_reg() rather than
>   set_memory_based_intr()
> v3: Renamed parameter, reordered parameters in some functs
> v4: Check if have MEMIRQ, move `xe_gt*` funct to proper file
> v5: Revert back to requiring scratch buffer, but allocate it
>   earlier this time
> 
> Signed-off-by: Tomasz Lis <tomasz.lis at intel.com>
> Cc: Michal Wajdeczko <michal.wajdeczko at intel.com>
> Cc: Michal Winiarski <michal.winiarski at intel.com>
> Acked-by: Satyanarayana K V P <satyanarayana.k.v.p at intel.com> (v3)
> ---
>  drivers/gpu/drm/xe/xe_exec_queue.c  |  7 ++--
>  drivers/gpu/drm/xe/xe_exec_queue.h  |  2 +-
>  drivers/gpu/drm/xe/xe_gt_sriov_vf.c | 14 ++++++++
>  drivers/gpu/drm/xe/xe_gt_sriov_vf.h |  1 +
>  drivers/gpu/drm/xe/xe_guc_submit.c  |  5 +--
>  drivers/gpu/drm/xe/xe_guc_submit.h  |  2 +-
>  drivers/gpu/drm/xe/xe_lrc.c         | 52 +++++++++++++++++++++++++++--
>  drivers/gpu/drm/xe/xe_lrc.h         |  4 +++
>  drivers/gpu/drm/xe/xe_sriov_vf.c    | 20 +++++++++--
>  9 files changed, 97 insertions(+), 10 deletions(-)
> 

[...]

> diff --git a/drivers/gpu/drm/xe/xe_sriov_vf.c b/drivers/gpu/drm/xe/xe_sriov_vf.c
> index 54a6218e48b0..43ac73e432d4 100644
> --- a/drivers/gpu/drm/xe/xe_sriov_vf.c
> +++ b/drivers/gpu/drm/xe/xe_sriov_vf.c
> @@ -13,6 +13,7 @@
>  #include "xe_guc_ct.h"
>  #include "xe_guc_submit.h"
>  #include "xe_irq.h"
> +#include "xe_lrc.h"
>  #include "xe_pm.h"
>  #include "xe_sriov.h"
>  #include "xe_sriov_printk.h"
> @@ -244,6 +245,11 @@ static int vf_get_next_migrated_gt_id(struct xe_device *xe)
>  	return -1;
>  }
>  
> +static size_t post_migration_scratch_size(struct xe_device *xe)
> +{
> +	return xe_lrc_reg_size(xe);
> +}
> +
>  /**
>   * Perform post-migration fixups on a single GT.
>   *
> @@ -260,19 +266,29 @@ static int vf_get_next_migrated_gt_id(struct xe_device *xe)
>  static int gt_vf_post_migration_fixups(struct xe_gt *gt)
>  {
>  	s64 shift;
> +	void *buf;
>  	int err;
>  
> +	buf = kmalloc(post_migration_scratch_size(gt_to_xe(gt)), GFP_KERNEL);
> +	if (!buf)
> +		return -ENOMEM;
> +

I would suggest allocating this temporary buffer at the place where it's
actually needed, rather than dragging a top-level allocation through
the entire call stack. Allocating it once does let us reuse it, but the
reuse doesn't really gain us anything, as the buffer is small and
kmalloc() is relatively fast.
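
A rough sketch of the shape I have in mind, written as plain userspace C
so it can be tried in isolation. malloc()/free() stand in for
kmalloc()/kfree(), and struct fake_lrc and fake_lrc_rebase() are made-up
names for illustration, not the xe API. The locking concern from the
commit message is not modelled here, so treat this as an illustration of
the allocation placement only:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one context's register state. */
struct fake_lrc {
	uint32_t regs[16];
};

/*
 * Hypothetical helper: apply a GGTT-style shift to every entry of one
 * context. The scratch buffer is allocated at the point of use and
 * freed before returning, so no caller has to pass it down.
 */
static int fake_lrc_rebase(struct fake_lrc *lrc, int64_t shift)
{
	size_t size = sizeof(lrc->regs);
	uint32_t *scratch;
	size_t i;

	scratch = malloc(size);	/* would be kmalloc(size, GFP_KERNEL) */
	if (!scratch)
		return -ENOMEM;

	/* Patch the values in the scratch copy first... */
	memcpy(scratch, lrc->regs, size);
	for (i = 0; i < size / sizeof(*scratch); i++)
		scratch[i] = (uint32_t)(scratch[i] + shift);

	/* ...then write the whole block back in one copy. */
	memcpy(lrc->regs, scratch, size);

	free(scratch);
	return 0;
}
```

With this shape the caller just invokes the helper once per context and
checks the return value; the only new failure mode is -ENOMEM from the
helper itself, and no error path in the caller has to free anything.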

Other than that - LGTM.

Reviewed-by: Michał Winiarski <michal.winiarski at intel.com>

Thanks,
-Michał

>  	err = xe_gt_sriov_vf_query_config(gt);
> -	if (err)
> +	if (err) {
> +		kfree(buf);
>  		return err;
> +	}
>  
>  	shift = xe_gt_sriov_vf_ggtt_shift(gt);
>  	if (shift) {
>  		xe_tile_sriov_vf_fixup_ggtt_nodes(gt_to_tile(gt), shift);
> -		xe_guc_contexts_hwsp_rebase(&gt->uc.guc);
> +		xe_gt_sriov_vf_default_lrcs_hwsp_rebase(gt);
> +		xe_guc_contexts_hwsp_rebase(&gt->uc.guc, buf);
>  		/* FIXME: add the recovery steps */
>  		xe_guc_ct_fixup_messages_with_ggtt(&gt->uc.guc.ct, shift);
>  	}
> +
> +	kfree(buf);
>  	return 0;
>  }
>  
> -- 
> 2.25.1
>