<html><head><meta http-equiv="content-type" content="text/html; charset=us-ascii"></head><body style="overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><br id="lineBreakAtBeginningOfMessage"><div><br><blockquote type="cite"><div>On May 17, 2024, at 17:06, Andi Shyti <andi.shyti@linux.intel.com> wrote:</div><br class="Apple-interchange-newline"><div><div>The whole point of the previous fixes has been to change the CCS<br>hardware configuration to generate only one stream available to<br>the compute users. We did this by changing the info.engine_mask<br>that is set during device probe, reset during the detection of<br>the fused engines, and finally reset again when choosing the CCS<br>mode.<br><br>We can't use the engine_mask variable anymore, as with the<br>current configuration, it imposes only one CCS no matter what the<br>hardware configuration is.<br><br>Before changing the engine_mask for the third time, save it and<br>use it for calculating the CCS mode.<br><br>After the previous changes, the user reported a performance drop<br>to around 1/4. We have tested that the compute operations, with<br>the current patch, have improved by the same factor.<br><br>Fixes: 6db31251bb26 ("drm/i915/gt: Enable only one CCS for compute workload")<br>Cc: Chris Wilson <chris.p.wilson@linux.intel.com><br>Cc: Gnattu OC <gnattuoc@me.com><br>Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com><br>Cc: Matt Roper <matthew.d.roper@intel.com><br>Tested-by: Jian Ye <jian.ye@intel.com><br>---<br>Hi,<br><br>This ensures that all four CCS engines work properly. 
However,<br>during the tests, Jian detected that the performance during<br>memory copy assigned to the CCS engines is negatively impacted.<br><br>I believe this might be expected, considering that based on the<br>engines' availability, the media user might decide to reduce the<br>copy in multitasking.<br><br>With the upcoming work that will give the user the chance to<br>configure the CCS mode, this might improve.<br><br>Gnattu, can I use your kindness to ask for a test on this patch<br>and check whether the performance improve on your side as well?<br><br>Thanks,<br>Andi<br><br> drivers/gpu/drm/i915/gt/intel_engine_cs.c | 6 ++++++<br> drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 2 +-<br> drivers/gpu/drm/i915/gt/intel_gt_types.h | 8 ++++++++<br> 3 files changed, 15 insertions(+), 1 deletion(-)<br><br>diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c<br>index 5c8e9ee3b008..3b740ca25000 100644<br>--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c<br>+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c<br>@@ -885,6 +885,12 @@ static intel_engine_mask_t init_engine_mask(struct intel_gt *gt)<br> <span class="Apple-tab-span" style="white-space:pre"> </span>if (IS_DG2(gt->i915)) {<br> <span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>u8 first_ccs = __ffs(CCS_MASK(gt));<br><br>+<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>/*<br>+<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span> * Store the number of active cslices before<br>+<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span> * changing the CCS engine configuration<br>+<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span> 
*/<br>+<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>gt->ccs.cslices = CCS_MASK(gt);<br>+<br> <span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>/* Mask off all the CCS engine */<br> <span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>info->engine_mask &= ~GENMASK(CCS3, CCS0);<br> <span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>/* Put back in the first CCS engine */<br>diff --git a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c<br>index 99b71bb7da0a..3c62a44e9106 100644<br>--- a/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c<br>+++ b/drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c<br>@@ -19,7 +19,7 @@ unsigned int intel_gt_apply_ccs_mode(struct intel_gt *gt)<br><br> <span class="Apple-tab-span" style="white-space:pre"> </span>/* Build the value for the fixed CCS load balancing */<br> <span class="Apple-tab-span" style="white-space:pre"> </span>for (cslice = 0; cslice < I915_MAX_CCS; cslice++) {<br>-<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>if (CCS_MASK(gt) & BIT(cslice))<br>+<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>if (gt->ccs.cslices & BIT(cslice))<br> <span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>/*<br> <span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span> * If available, assign the cslice<br> <span class="Apple-tab-span" 
style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span> * to the first available engine...<br>diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h<br>index def7dd0eb6f1..cfdd2ad5e954 100644<br>--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h<br>+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h<br>@@ -207,6 +207,14 @@ struct intel_gt {<br> <span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span> [MAX_ENGINE_INSTANCE + 1];<br> <span class="Apple-tab-span" style="white-space:pre"> </span>enum intel_submission_method submission_method;<br><br>+<span class="Apple-tab-span" style="white-space:pre"> </span>struct {<br>+<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>/*<br>+<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span> * Mask of the non fused CCS slices<br>+<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span> * to be used for the load balancing<br>+<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span> */<br>+<span class="Apple-tab-span" style="white-space:pre"> </span><span class="Apple-tab-span" style="white-space:pre"> </span>intel_engine_mask_t cslices;<br>+<span class="Apple-tab-span" style="white-space:pre"> </span>} ccs;<br>+<br> <span class="Apple-tab-span" style="white-space:pre"> </span>/*<br> <span class="Apple-tab-span" style="white-space:pre"> </span> * Default address space (either GGTT or ppGTT 
depending on arch).<br> <span class="Apple-tab-span" style="white-space:pre"> </span> *<br>-- <br>2.43.0<br></div></div></blockquote><div><br></div><div><div>Hi Andi,</div><div><br></div><div>I can confirm that this patch restores most of the performance we had before the CCS change.</div><div><br></div><div>I do notice a reduction in memcpy performance, but it is good enough for our use case, since our video processing pipeline is zero-copy once the video is loaded into VRAM.</div><div><br></div></div></div><div>Tested-by: Gnattu OC <<a href="mailto:gnattuoc@me.com">gnattuoc@me.com</a>></div><br></body></html>