[Intel-gfx] [PATCH 11/27] drm/i915/guc: Copy whole golden context, set engine state size of subset

John Harrison john.c.harrison at intel.com
Thu Aug 26 23:33:28 UTC 2021


On 8/25/2021 20:23, Matthew Brost wrote:
> When the GuC does a media reset, it copies a golden context state back
> into the corrupted context's state. The address of the golden context
> and the size of the engine state restore are passed in via the GuC ADS.
> The i915 had a bug where it passed in the whole size of the golden
> context, not the size of the engine state to restore resulting in a
> memory corruption.
>
> Also copy the entire golden context on init rather than just the engine
> state that is restored.
>
> Fixes: 481d458caede ("drm/i915/guc: Add golden context to GuC ADS")
> Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> ---
>   drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c | 28 +++++++++++++++++-----
>   1 file changed, 22 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> index 6926919bcac6..df2734bfe078 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c
> @@ -358,6 +358,11 @@ static int guc_prep_golden_context(struct intel_guc *guc,
>   	u8 engine_class, guc_class;
>   	struct guc_gt_system_info *info, local_info;
>   
> +	/* Skip execlist and PPGTT registers + HWSP */
> +	const u32 lr_hw_context_size = 80 * sizeof(u32);
> +	const u32 skip_size = LRC_PPHWSP_SZ * PAGE_SIZE +
> +		lr_hw_context_size;
> +
>   	/*
>   	 * Reserve the memory for the golden contexts and point GuC at it but
>   	 * leave it empty for now. The context data will be filled in later
> @@ -396,7 +401,18 @@ static int guc_prep_golden_context(struct intel_guc *guc,
>   		if (!blob)
>   			continue;
>   
> -		blob->ads.eng_state_size[guc_class] = real_size;
> +		/*
> +		 * This interface is slightly confusing. We need to pass the
> +		 * base address of the golden context and the engine state size
> +		 * which is not the size of the whole golden context, it is a
> +		 * subset that the GuC uses when doing a watchdog reset. The
> +		 * engine state size must match the size of the golden context
> +		 * minus the first part of the golden context that the GuC does
> +		 * not retore during reset. Currently no real way to verify this
> +		 * other than reading the GuC spec / code and ensuring the
> +		 * 'skip_size' below matches the value used in the GuC code.
> +		 */
> +		blob->ads.eng_state_size[guc_class] = real_size - skip_size;
>   		blob->ads.golden_context_lrca[guc_class] = addr_ggtt;
>   		addr_ggtt += alloc_size;
>   	}
> @@ -437,8 +453,8 @@ static void guc_init_golden_context(struct intel_guc *guc)
>   	u8 *ptr;
>   
>   	/* Skip execlist and PPGTT registers + HWSP */
> -	const u32 lr_hw_context_size = 80 * sizeof(u32);
> -	const u32 skip_size = LRC_PPHWSP_SZ * PAGE_SIZE +
> +	__maybe_unused const u32 lr_hw_context_size = 80 * sizeof(u32);
> +	__maybe_unused const u32 skip_size = LRC_PPHWSP_SZ * PAGE_SIZE +
>   		lr_hw_context_size;
Not sure why the 'maybe unused'? The values are not only used in BUG_ONs 
or such that could vanish.

More importantly, you now have two sets of definitions for these magic 
numbers. That seems like a very bad idea. They should be moved into a 
helper function rather than repeated.

John.


>   
>   	if (!intel_uc_uses_guc_submission(&gt->uc))
> @@ -476,12 +492,12 @@ static void guc_init_golden_context(struct intel_guc *guc)
>   			continue;
>   		}
>   
> -		GEM_BUG_ON(blob->ads.eng_state_size[guc_class] != real_size);
> +		GEM_BUG_ON(blob->ads.eng_state_size[guc_class] !=
> +			   real_size - skip_size);
>   		GEM_BUG_ON(blob->ads.golden_context_lrca[guc_class] != addr_ggtt);
>   		addr_ggtt += alloc_size;
>   
> -		shmem_read(engine->default_state, skip_size, ptr + skip_size,
> -			   real_size - skip_size);
> +		shmem_read(engine->default_state, 0, ptr, real_size);
>   		ptr += alloc_size;
>   	}
>   



More information about the Intel-gfx mailing list