[PATCH] drm/i915/gt: Fix CCS id's calculation for CCS mode setting

Gnattu OC gnattuoc at me.com
Sun May 19 15:34:12 UTC 2024



> On May 17, 2024, at 17:06, Andi Shyti <andi.shyti at linux.intel.com> wrote:
> 
> The whole point of the previous fixes has been to change the CCS
> hardware configuration to generate only one stream available to
> the compute users. We did this by changing the info.engine_mask
> that is set during device probe, reset during the detection of
> the fused engines, and finally reset again when choosing the CCS
> mode.
> 
> We can't use the engine_mask variable anymore, as with the
> current configuration, it imposes only one CCS no matter what the
> hardware configuration is.
> 
> Before changing the engine_mask for the third time, save it and
> use it for calculating the CCS mode.
> 
> After the previous changes, the user reported a performance drop
> to around 1/4. We have tested that the compute operations, with
> the current patch, have improved by the same factor.
> 
> Fixes: 6db31251bb26 ("drm/i915/gt: Enable only one CCS for compute workload")
> Cc: Chris Wilson <chris.p.wilson at linux.intel.com>
> Cc: Gnattu OC <gnattuoc at me.com>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> Cc: Matt Roper <matthew.d.roper at intel.com>
> Tested-by: Jian Ye <jian.ye at intel.com>
> ---
> Hi,
> 
> This ensures that all four CCS engines work properly. However,
> during the tests, Jian detected that the performance during
> memory copy assigned to the CCS engines is negatively impacted.
> 
> I believe this might be expected, considering that based on the
> engines' availability, the media user might decide to reduce the
> copy in multitasking.
> 
> With the upcoming work that will give the user the chance to
> configure the CCS mode, this might improve.
> 
> Gnattu, can I use your kindness to ask for a test on this patch
> and check whether the performance improves on your side as well?
> 
> Thanks,
> Andi
> 
> drivers/gpu/drm/i915/gt/intel_engine_cs.c   | 6 ++++++
> drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 2 +-
> drivers/gpu/drm/i915/gt/intel_gt_types.h    | 8 ++++++++
> 3 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 5c8e9ee3b008..3b740ca25000 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -885,6 +885,12 @@ static intel_engine_mask_t init_engine_mask(struct intel_gt *gt)
> 	if (IS_DG2(gt->i915)) {
> 		u8 first_ccs = __ffs(CCS_MASK(gt));
> 
> +		/*
> +		 * Store the number of active cslices before
> +		 * changing the CCS engine configuration
> +		 */
> +		gt->ccs.cslices = CCS_MASK(gt);
> +
> 		/* Mask off all the CCS engine */
> 		info->engine_mask &= ~GENMASK(CCS3, CCS0);
> 		/* Put back in the first CCS engine */
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
> index 99b71bb7da0a..3c62a44e9106 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
> @@ -19,7 +19,7 @@ unsigned int intel_gt_apply_ccs_mode(struct intel_gt *gt)
> 
> 	/* Build the value for the fixed CCS load balancing */
> 	for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {
> -		if (CCS_MASK(gt) & BIT(cslice))
> +		if (gt->ccs.cslices & BIT(cslice))
> 			/*
> 			 * If available, assign the cslice
> 			 * to the first available engine...
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> index def7dd0eb6f1..cfdd2ad5e954 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> @@ -207,6 +207,14 @@ struct intel_gt {
> 					    [MAX_ENGINE_INSTANCE + 1];
> 	enum intel_submission_method submission_method;
> 
> +	struct {
> +		/*
> +		 * Mask of the non fused CCS slices
> +		 * to be used for the load balancing
> +		 */
> +		intel_engine_mask_t cslices;
> +	} ccs;
> +
> 	/*
> 	 * Default address space (either GGTT or ppGTT depending on arch).
> 	 *
> -- 
> 2.43.0

Hi Andi,

I can confirm that this patch restores most of the performance we had before the CCS change. 
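
To illustrate why the quoted change would recover throughput, here is a simplified, standalone model of the load-balancing loop. It is a sketch under assumptions, not the driver code: MAX_CCS, BIT, keep_first_ccs() and count_assigned_cslices() are hypothetical stand-ins for I915_MAX_CCS, the kernel's BIT(), the engine_mask reduction and the loop in intel_gt_apply_ccs_mode(). The real function builds a register value with one field per cslice, which is not modelled here.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel definitions (assumed values). */
#define MAX_CCS 4u              /* models I915_MAX_CCS on DG2 */
#define BIT(n)  (1u << (n))

/*
 * Model of the engine_mask reduction: keep only the lowest set CCS bit,
 * so a single CCS engine stays visible to compute users.
 */
static uint32_t keep_first_ccs(uint32_t ccs_mask)
{
	return ccs_mask & (~ccs_mask + 1u);
}

/*
 * Count the cslices the load-balancing loop would assign to the first
 * CCS engine when it tests the given mask.
 */
static unsigned int count_assigned_cslices(uint32_t cslice_mask)
{
	unsigned int cslice, assigned = 0;

	for (cslice = 0; cslice < MAX_CCS; cslice++) {
		if (cslice_mask & BIT(cslice))
			assigned++;
	}

	return assigned;
}
```

With all four cslices present (mask 0xF), testing the reduced mask keep_first_ccs(0xF) assigns a single cslice, while testing the mask saved before the reduction assigns all four. That matches the roughly 1/4 performance figure mentioned in the commit message, assuming throughput scales with the number of cslices in use.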

I do notice a reduction in memcpy performance, but it is good enough for our use case, since our video processing pipeline is zero-copy once the video is loaded into VRAM.

Tested-by: Gnattu OC <gnattuoc at me.com>
