[Intel-xe] [PATCH v3 1/3] drm/xe: Enable Fixed CCS mode setting
Niranjana Vishwanathapura
niranjana.vishwanathapura at intel.com
Fri Dec 8 20:45:30 UTC 2023
On Thu, Dec 07, 2023 at 03:45:19PM -0600, Lucas De Marchi wrote:
>I see this is merged, just giving some after-the-fact comments.
>
>On Sun, Dec 03, 2023 at 09:37:07PM -0800, Niranjana Vishwanathapura wrote:
>>Disable dynamic HW load balancing of compute resource assignment
>>to engines and instead enabled fixed mode of mapping compute
>>resources to engines on all platforms with more than one compute
>>engine.
>>
>>By default enable only one CCS engine with all compute slices
>>assigned to it. This is the desired configuration for common
>>workloads.
>>
>>PVC platform supports only the fixed CCS mode (workaround 16016805146).
>>
>>v2: Rebase, make it platform agnostic
>>v3: Minor code refactoring
>>
>>Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura at intel.com>
>>---
>>drivers/gpu/drm/xe/Makefile | 1 +
>>drivers/gpu/drm/xe/regs/xe_gt_regs.h | 14 +++++
>>drivers/gpu/drm/xe/xe_gt.c | 10 ++++
>>drivers/gpu/drm/xe/xe_gt.h | 2 +
>>drivers/gpu/drm/xe/xe_gt_ccs_mode.c | 78 ++++++++++++++++++++++++++++
>>drivers/gpu/drm/xe/xe_gt_ccs_mode.h | 23 ++++++++
>>drivers/gpu/drm/xe/xe_gt_types.h | 8 +++
>>drivers/gpu/drm/xe/xe_guc_ads.c | 3 ++
>>drivers/gpu/drm/xe/xe_hw_engine.c | 20 +++++++
>>9 files changed, 159 insertions(+)
>>create mode 100644 drivers/gpu/drm/xe/xe_gt_ccs_mode.c
>>create mode 100644 drivers/gpu/drm/xe/xe_gt_ccs_mode.h
>>
>>diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
>>index 87f3fca0c0ee..86691f3b9077 100644
>>--- a/drivers/gpu/drm/xe/Makefile
>>+++ b/drivers/gpu/drm/xe/Makefile
>>@@ -70,6 +70,7 @@ xe-y += xe_bb.o \
>> xe_gsc.o \
>> xe_gsc_submit.o \
>> xe_gt.o \
>>+ xe_gt_ccs_mode.o \
>> xe_gt_clock.o \
>> xe_gt_debugfs.o \
>> xe_gt_idle.o \
>>diff --git a/drivers/gpu/drm/xe/regs/xe_gt_regs.h b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>>index d318ec0efd7d..f4f2cf8d9022 100644
>>--- a/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>>+++ b/drivers/gpu/drm/xe/regs/xe_gt_regs.h
>>@@ -401,8 +401,22 @@
>>#define COMP_CKN_IN REG_GENMASK(30, 29)
>>
>>#define RCU_MODE XE_REG(0x14800, XE_REG_OPTION_MASKED)
>>+#define RCU_MODE_FIXED_SLICE_CCS_MODE REG_BIT(1)
>>#define RCU_MODE_CCS_ENABLE REG_BIT(0)
>>
>>+/*
>>+ * Total of 4 cslices, where each cslice is in the form:
>>+ * [0-3] CCS ID
>>+ * [4-6] RSVD
>>+ * [7] Disabled
>>+ */
>
>we don't document fields like this. This should come verbatim from the
>bspec. Eventually we want to autogen this header.
>
Thanks Lucas for the comments.
Ok, will remove it.
>>+#define CCS_MODE XE_REG(0x14804)
>>+#define CCS_MODE_CSLICE_0_3_MASK REG_GENMASK(11, 0) /* 3 bits per cslice */
>>+#define CCS_MODE_CSLICE_MASK 0x7 /* CCS0-3 + rsvd */
>
>these names are odd... aren't then inverted? I.e., should be
>
>+#define CCS_MODE_CSLICE_MASK REG_GENMASK(11, 0)
>+#define CCS_MODE_CSLICE_0_3_MASK 0x7
>
No. CCS_MODE_CSLCE_MASK is the mask is the mask for cslice 0, whereas
CCS_MODE_CSLICE_0_3_MASK is the mask of all 4 slices.
I will define them as below.
#define CCS_MODE_CSLICE_0_3_MASK REG_GENMASK(11, 0)
#define CCS_MODE_CSLICE_0_MASK REG_GENMASK(2, 0)
>>+#define CCS_MODE_CSLICE_WIDTH ilog2(CCS_MODE_CSLICE_MASK + 1)
>>+#define CCS_MODE_CSLICE(cslice, ccs) \
>>+ ((ccs) << ((cslice) * CCS_MODE_CSLICE_WIDTH))
>
>I don't think we need these calculations here. Prefer using the REG_FIELD
>macros instead
>
Ok, will move them to .c file.
Can't use REG_FIELD macros, more on this below.
>>+
>>#define FORCEWAKE_ACK_GT XE_REG(0x130044)
>>#define FORCEWAKE_KERNEL BIT(0)
>>#define FORCEWAKE_USER BIT(1)
>>diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
>>index 8a6fb9641cd6..dc05c7780848 100644
>>--- a/drivers/gpu/drm/xe/xe_gt.c
>>+++ b/drivers/gpu/drm/xe/xe_gt.c
>>@@ -22,6 +22,7 @@
>>#include "xe_force_wake.h"
>>#include "xe_ggtt.h"
>>#include "xe_gsc.h"
>>+#include "xe_gt_ccs_mode.h"
>>#include "xe_gt_clock.h"
>>#include "xe_gt_idle.h"
>>#include "xe_gt_mcr.h"
>>@@ -452,6 +453,12 @@ static int all_fw_domain_init(struct xe_gt *gt)
>> if (err)
>> goto err_force_wake;
>>
>>+ /* Configure default CCS mode of 1 engine with all resources */
>>+ if (xe_gt_ccs_mode_enabled(gt)) {
>>+ gt->ccs_mode = 1;
>
>mode? or count?
ccs_mode is defined as,
"@ccs_mode: Number of compute engines enabled."
Here ccs_mode can be 1/2/4.
I preferred the name ccs_mode as that is how it is commonly referred to as.
>
>>+ xe_gt_apply_ccs_mode(gt);
>>+ }
>>+
>> err = xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL);
>> XE_WARN_ON(err);
>> xe_device_mem_access_put(gt_to_xe(gt));
>>@@ -558,6 +565,9 @@ static int do_gt_restart(struct xe_gt *gt)
>> xe_reg_sr_apply_whitelist(hwe);
>> }
>>
>>+ /* Get CCS mode in sync between sw/hw */
>>+ xe_gt_apply_ccs_mode(gt);
>>+
>> return 0;
>>}
>>
>>diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h
>>index caded203a8a0..a818cc9c8fd0 100644
>>--- a/drivers/gpu/drm/xe/xe_gt.h
>>+++ b/drivers/gpu/drm/xe/xe_gt.h
>>@@ -17,6 +17,8 @@
>> for_each_if(((hwe__) = (gt__)->hw_engines + (id__)) && \
>> xe_hw_engine_is_valid((hwe__)))
>>
>>+#define CCS_MASK(gt) (((gt)->info.engine_mask & XE_HW_ENGINE_CCS_MASK) >> XE_HW_ENGINE_CCS0)
>
>this define seems out of place in this header
>
Ok, will move it to xe_gt_ccs_mode.h file.
>>+
>>#ifdef CONFIG_FAULT_INJECTION
>>extern struct fault_attr gt_reset_failure;
>>static inline bool xe_fault_inject_gt_reset(void)
>>diff --git a/drivers/gpu/drm/xe/xe_gt_ccs_mode.c b/drivers/gpu/drm/xe/xe_gt_ccs_mode.c
>>new file mode 100644
>>index 000000000000..541c44c70a84
>>--- /dev/null
>>+++ b/drivers/gpu/drm/xe/xe_gt_ccs_mode.c
>>@@ -0,0 +1,78 @@
>>+// SPDX-License-Identifier: MIT
>>+/*
>>+ * Copyright © 2023 Intel Corporation
>>+ */
>>+
>>+#include "regs/xe_gt_regs.h"
>>+#include "xe_assert.h"
>>+#include "xe_gt.h"
>>+#include "xe_gt_ccs_mode.h"
>>+#include "xe_mmio.h"
>>+
>>+static void __xe_gt_apply_ccs_mode(struct xe_gt *gt, u32 num_engines)
>>+{
>>+ u32 mode = CCS_MODE_CSLICE_0_3_MASK; /* disable all by default */
>>+ int num_slices = hweight32(CCS_MASK(gt));
>>+ struct xe_device *xe = gt_to_xe(gt);
>>+ int width, cslice = 0;
>>+ u32 config = 0;
>>+
>>+ xe_assert(xe, xe_gt_ccs_mode_enabled(gt));
>>+
>>+ xe_assert(xe, num_engines && num_engines <= num_slices);
>>+ xe_assert(xe, !(num_slices % num_engines));
>>+
>>+ /*
>>+ * Loop over all available slices and assign each a user engine.
>>+ * For example, if there are four compute slices available, the
>>+ * assignment of compute slices to compute engines would be,
>>+ *
>>+ * With 1 engine (ccs0):
>>+ * slice 0, 1, 2, 3: ccs0
>>+ *
>>+ * With 2 engines (ccs0, ccs1):
>>+ * slice 0, 2: ccs0
>>+ * slice 1, 3: ccs1
>>+ *
>>+ * With 4 engines (ccs0, ccs1, ccs2, ccs3):
>>+ * slice 0: ccs0
>>+ * slice 1: ccs1
>>+ * slice 2: ccs2
>>+ * slice 3: ccs3
>>+ */
>>+ for (width = num_slices / num_engines; width; width--) {
>>+ struct xe_hw_engine *hwe;
>>+ enum xe_hw_engine_id id;
>>+
>>+ for_each_hw_engine(hwe, gt, id) {
>>+ if (hwe->class != XE_ENGINE_CLASS_COMPUTE)
>>+ continue;
>>+
>>+ if (hwe->logical_instance >= num_engines)
>>+ break;
>>+
>>+ config |= BIT(hwe->instance) << XE_HW_ENGINE_CCS0;
>>+
>>+ /* If a slice is fused off, leave disabled */
>>+ while ((CCS_MASK(gt) & BIT(cslice)) == 0)
>>+ cslice++;
>>+
>>+ mode &= ~CCS_MODE_CSLICE(cslice, CCS_MODE_CSLICE_MASK);
>>+ mode |= CCS_MODE_CSLICE(cslice, hwe->instance);
>
>REG_FIELD_SET() would be more appropriate.
>
Can't use REG_FIELD_PREP() here as it expects __mask to be a constant, etc.
CCS_MODE register has 3 bits for each compute slice. So defining a mask
for each of them and using it cumbersome and not scalable.
>>+ cslice++;
>>+ }
>>+ }
>>+
>>+ xe_mmio_write32(gt, CCS_MODE, mode);
>>+
>>+ xe_gt_info(gt, "CCS_MODE=%x config:%08x, num_engines:%d, num_slices:%d\n",
>>+ mode, config, num_engines, num_slices);
>>+}
>>+
>>+void xe_gt_apply_ccs_mode(struct xe_gt *gt)
>
>in xe we try to keep things into namespaces defined by the header, so
>this should had been xe_gt_ccs_mode_apply()
>
Ok, will fix.
>>+{
>>+ if (!gt->ccs_mode)
>>+ return;
>>+
>>+ __xe_gt_apply_ccs_mode(gt, gt->ccs_mode);
>
>what's the point of the split between xe_gt_apply_ccs_mode() and
>__xe_gt_apply_ccs_mode()? At most it would have been called
>apply_ccs_mode() rather than __xe_gt_apply_ccs_mode(). But it could as
>well just had been 1 function as there are no other callers and no
>clear role split between one and the other. Bothw of them operate on "gt"
>
Well, it is partly left over from previous design, but I kept it as
during gt reset this function gets called for all platforms and we
only want to configure ccs_mode if gt->ccs_mode is set (which currently
is only on PVC).
But I see this is not correct interface anymore and I will get rid of
this wrapper.
>
>>+}
>>diff --git a/drivers/gpu/drm/xe/xe_gt_ccs_mode.h b/drivers/gpu/drm/xe/xe_gt_ccs_mode.h
>>new file mode 100644
>>index 000000000000..e8766879f6ec
>>--- /dev/null
>>+++ b/drivers/gpu/drm/xe/xe_gt_ccs_mode.h
>>@@ -0,0 +1,23 @@
>>+/* SPDX-License-Identifier: MIT */
>>+/*
>>+ * Copyright © 2023 Intel Corporation
>>+ */
>>+
>>+#ifndef _XE_GT_CCS_MODE_H_
>>+#define _XE_GT_CCS_MODE_H_
>>+
>>+#include "xe_device_types.h"
>>+#include "xe_gt.h"
>>+#include "xe_gt_types.h"
>>+#include "xe_platform_types.h"
>
>why do you need all these headers?
>
left over from a previous design, will remove.
Niranjana
>Lucas De Marchi
More information about the Intel-xe
mailing list