[PATCH] drm/i915/gt: Fix CCS id's calculation for CCS mode setting
Gnattu OC
gnattuoc at me.com
Sun May 19 15:34:12 UTC 2024
> On May 17, 2024, at 17:06, Andi Shyti <andi.shyti at linux.intel.com> wrote:
>
> The whole point of the previous fixes has been to change the CCS
> hardware configuration to generate only one stream available to
> the compute users. We did this by changing the info.engine_mask
> that is set during device probe, reset during the detection of
> the fused engines, and finally reset again when choosing the CCS
> mode.
>
> We can't use the engine_mask variable anymore, as with the
> current configuration, it imposes only one CCS no matter what the
> hardware configuration is.
>
> Before changing the engine_mask for the third time, save it and
> use it for calculating the CCS mode.
>
> After the previous changes, the user reported a performance drop
> to around 1/4. We have verified that, with the current patch,
> compute performance improves by the same factor.
>
> Fixes: 6db31251bb26 ("drm/i915/gt: Enable only one CCS for compute workload")
> Cc: Chris Wilson <chris.p.wilson at linux.intel.com>
> Cc: Gnattu OC <gnattuoc at me.com>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> Cc: Matt Roper <matthew.d.roper at intel.com>
> Tested-by: Jian Ye <jian.ye at intel.com>
> ---
> Hi,
>
> This ensures that all four CCS engines work properly. However,
> during the tests, Jian detected that the performance of memory
> copies assigned to the CCS engines is negatively impacted.
>
> I believe this might be expected: depending on the engines'
> availability, the media user might decide to reduce the copy
> work when multitasking.
>
> With the upcoming work that will give the user the chance to
> configure the CCS mode, this might improve.
>
> Gnattu, may I kindly ask you to test this patch and check
> whether the performance improves on your side as well?
>
> Thanks,
> Andi
>
> drivers/gpu/drm/i915/gt/intel_engine_cs.c | 6 ++++++
> drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 2 +-
> drivers/gpu/drm/i915/gt/intel_gt_types.h | 8 ++++++++
> 3 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 5c8e9ee3b008..3b740ca25000 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -885,6 +885,12 @@ static intel_engine_mask_t init_engine_mask(struct intel_gt *gt)
> if (IS_DG2(gt->i915)) {
> u8 first_ccs = __ffs(CCS_MASK(gt));
>
> + /*
> + * Store the mask of the enabled cslices before
> + * changing the CCS engine configuration
> + */
> + gt->ccs.cslices = CCS_MASK(gt);
> +
> /* Mask off all the CCS engine */
> info->engine_mask &= ~GENMASK(CCS3, CCS0);
> /* Put back in the first CCS engine */
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
> index 99b71bb7da0a..3c62a44e9106 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
> @@ -19,7 +19,7 @@ unsigned int intel_gt_apply_ccs_mode(struct intel_gt *gt)
>
> /* Build the value for the fixed CCS load balancing */
> for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {
> - if (CCS_MASK(gt) & BIT(cslice))
> + if (gt->ccs.cslices & BIT(cslice))
> /*
> * If available, assign the cslice
> * to the first available engine...
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> index def7dd0eb6f1..cfdd2ad5e954 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> @@ -207,6 +207,14 @@ struct intel_gt {
> [MAX_ENGINE_INSTANCE + 1];
> enum intel_submission_method submission_method;
>
> + struct {
> + /*
> + * Mask of the non-fused CCS slices
> + * to be used for the load balancing
> + */
> + intel_engine_mask_t cslices;
> + } ccs;
> +
> /*
> * Default address space (either GGTT or ppGTT depending on arch).
> *
> --
> 2.43.0
Hi Andi,
I can confirm that this patch restores most of the performance we had before the CCS change.
I do notice a reduction in memcpy performance, but it is good enough for our use case since our video processing pipeline is zero-copy once the video is loaded into VRAM.
Tested-by: Gnattu OC <gnattuoc at me.com>
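
For readers following the thread, here is a minimal, self-contained sketch of the fixed load-balancing idea the patch relies on: the cslice mask saved before the engine mask is reduced to a single CCS (gt->ccs.cslices) decides which cslices get routed to the one remaining engine, while fused-off cslices are marked unused. The field width, the CSLICE_DISABLED encoding and the helper names below are illustrative assumptions, not the actual i915 register definitions.

/*
 * Hypothetical, standalone illustration (not the i915 code): build a
 * per-cslice routing value from the pre-reduction cslice mask,
 * assigning every present cslice to the first enabled CCS engine and
 * marking fused-off cslices as disabled.
 */
#include <stdint.h>
#include <stdio.h>

#define MAX_CCS			4
#define CSLICE_WIDTH		4	/* assumed bits per cslice field */
#define CSLICE_DISABLED		0xf	/* assumed "no engine" encoding */
#define CSLICE_FIELD(cslice, val) \
	((uint32_t)(val) << ((cslice) * CSLICE_WIDTH))

static uint32_t build_ccs_mode(uint32_t cslices, unsigned int first_ccs)
{
	uint32_t mode = 0;
	unsigned int cslice;

	for (cslice = 0; cslice < MAX_CCS; cslice++) {
		if (cslices & (1u << cslice))
			/* route this cslice to the remaining CCS engine */
			mode |= CSLICE_FIELD(cslice, first_ccs);
		else
			/* fused-off cslice: mark it unused */
			mode |= CSLICE_FIELD(cslice, CSLICE_DISABLED);
	}

	return mode;
}

int main(void)
{
	/* e.g. all four cslices present, CCS0 is the remaining engine */
	printf("mode = 0x%08x\n", (unsigned int)build_ccs_mode(0xf, 0));
	return 0;
}

Compiled as an ordinary userspace program, this prints the mode value that would route all four cslices to CCS0; passing a mask with fused-off bits cleared shows those fields taking the disabled encoding instead.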