[Intel-xe] [PATCH v2] drm/xe: Skip XY_FAST_COLOR instruction on link copy engines
Balasubramani Vivekanandan
balasubramani.vivekanandan at intel.com
Wed Mar 8 06:28:15 UTC 2023
On 07.03.2023 01:06, Lucas De Marchi wrote:
> On Tue, Mar 07, 2023 at 01:39:16PM +0530, Balasubramani Vivekanandan wrote:
> > Link copy engines doesn't support the XY_FAST_COLOR instruction.
> > Currently this instruction is used only at one place to clear a ttm
> > resource while migrating a BO.
> > A new device_info member is created to know if a platform has link copy
> > engine. If it supports, then instead of using XY_FAST_COLOR instruction,
> > MEM_SET is used which is available both in main and link copy engines.
> >
> > BSpec: 68433
> >
> > Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan at intel.com>
> > ---
> > drivers/gpu/drm/xe/regs/xe_gpu_commands.h | 9 ++++
> > drivers/gpu/drm/xe/xe_device_types.h | 2 +
> > drivers/gpu/drm/xe/xe_migrate.c | 65 ++++++++++++++++-------
> > drivers/gpu/drm/xe/xe_pci.c | 4 ++
> > 4 files changed, 60 insertions(+), 20 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> > index 288576035ce3..df9ed4fbf2bf 100644
> > --- a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> > +++ b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> > @@ -6,6 +6,8 @@
> > #ifndef _XE_GPU_COMMANDS_H_
> > #define _XE_GPU_COMMANDS_H_
> >
> > +#include "regs/xe_reg_defs.h"
>
> for the uses we are making here, it should suffice to
> include linux/bitmap.h
This file is using the macro REG_BIT(), definition of which is included
via the file regs/xe_reg_defs.h . So I had to include regs/xe_reg_defs.h
Though REG_BIT() was preexisting in xe_gpu_commands.h, this patch
uncovered the missing inclusion of regs/xe_reg_defs.h
>
> > +
> > #define INSTR_CLIENT_SHIFT 29
> > #define INSTR_MI_CLIENT 0x0
> > #define __INSTR(client) ((client) << INSTR_CLIENT_SHIFT)
> > @@ -56,6 +58,13 @@
> > #define GEN9_XY_FAST_COPY_BLT_CMD (2 << 29 | 0x42 << 22)
> > #define BLT_DEPTH_32 (3<<24)
> >
> > +#define PVC_MEM_SET_CMD (2 << 29 | 0x5b << 22)
> > +#define PVC_MEM_SET_CMD_LEN_DW 7
> > +#define PVC_MS_MATRIX REG_BIT(17)
> > +/* Bspec lists field as [6:0], but index alone is from [6:1] */
> > +#define PVC_MS_MOCS_INDEX_MASK GENMASK(6, 1)
> > +#define PVC_MS_DATA_FIELD GENMASK(31, 24)
>
> wrong order. Should be defining the most significative bits first,
> like they appear in bspec.
I will reorder the definitions in the order of dword, followed by most
significant bits of the dword.
>
> > +
> > #define GFX_OP_PIPE_CONTROL(len) ((0x3<<29)|(0x3<<27)|(0x2<<24)|((len)-2))
> > #define PIPE_CONTROL_TILE_CACHE_FLUSH (1<<28)
> > #define PIPE_CONTROL_AMFS_FLUSH (1<<25)
> > diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> > index 199bd37fce9a..a73c5e1d7503 100644
> > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > @@ -95,6 +95,8 @@ struct xe_device {
> > bool has_4tile;
> > /** @has_range_tlb_invalidation: Has range based TLB invalidations */
> > bool has_range_tlb_invalidation;
> > + /** @has_link_copy_engines: Whether the platform has link copy engines */
> > + bool has_link_copy_engine;
> > /** @enable_display: display enabled */
> > bool enable_display;
> >
> > diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> > index bc69ec17d5ad..59fd588a1faf 100644
> > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > @@ -750,32 +750,57 @@ static int emit_clear(struct xe_gt *gt, struct xe_bb *bb, u64 src_ofs,
> > u32 size, u32 pitch, u32 value, bool is_vram)
> > {
> > u32 *cs = bb->cs + bb->len;
> > - u32 len = XY_FAST_COLOR_BLT_DW;
> > + u32 len;
> > u32 mocs = xe_mocs_index_to_value(gt->mocs.uc_index);
> > + struct xe_device *xe = gt_to_xe(gt);
> >
> > - if (GRAPHICS_VERx100(gt->xe) < 1250)
> > - len = 11;
> > -
> > - *cs++ = XY_FAST_COLOR_BLT_CMD | XY_FAST_COLOR_BLT_DEPTH_32 |
> > - (len - 2);
> > - *cs++ = FIELD_PREP(XY_FAST_COLOR_BLT_MOCS_MASK, mocs) |
> > - (pitch - 1);
> > - *cs++ = 0;
> > - *cs++ = (size / pitch) << 16 | pitch / 4;
> > - *cs++ = lower_32_bits(src_ofs);
> > - *cs++ = upper_32_bits(src_ofs);
> > - *cs++ = (is_vram ? 0x0 : 0x1) << XY_FAST_COLOR_BLT_MEM_TYPE_SHIFT;
> > - *cs++ = value;
> > - *cs++ = 0;
> > - *cs++ = 0;
> > - *cs++ = 0;
> > -
> > - if (len > 11) {
> > - *cs++ = 0;
> > + if (xe->info.has_link_copy_engine) {
>
> I wonder if instead of doing this we have here one example of function
> hook that could provide some value. +Francois.
>
> And take a look at the discussion in 20230303145015.1018870-1-francois.dugast at intel.com
>
> I wouldn't block this series on that though.
I am very much interested in the proposal. I wish to have such a change
in the driver. Like you said, we can take it up as an improvment.
Regards,
Bala
>
>
> > + /* MEM_SET command supports setting only 8-bit value.
>
> leave a blank line:
>
> /*
> * MEM_SET ...
>
> is the more usual style.
>
> > + * This function is currently used only to clear the address
> > + * range. So the value agrument is not really used. Need to
> > + * have a better handling when there is a need to actually set
> > + * a value. Print a warning if a value bigger than 8-bit is
> > + * passed
> > + */
> > + XE_WARN_ON(value > U8_MAX);
> > +
> > + len = PVC_MEM_SET_CMD_LEN_DW;
> > +
> > + *cs++ = PVC_MEM_SET_CMD | PVC_MS_MATRIX |
> > + (PVC_MEM_SET_CMD_LEN_DW - 2);
> > + *cs++ = pitch - 1;
> > + *cs++ = (size / pitch) - 1;
> > + *cs++ = pitch - 1;
> > + *cs++ = lower_32_bits(src_ofs);
> > + *cs++ = upper_32_bits(src_ofs);
> > + *cs++ = FIELD_PREP(PVC_MS_DATA_FIELD, value) |
> > + FIELD_PREP(PVC_MS_MOCS_INDEX_MASK, mocs);
> > + } else {
> > + len = XY_FAST_COLOR_BLT_DW;
> > + if (GRAPHICS_VERx100(gt->xe) < 1250)
> > + len = 11;
> > +
> > + *cs++ = XY_FAST_COLOR_BLT_CMD | XY_FAST_COLOR_BLT_DEPTH_32 |
> > + (len - 2);
> > + *cs++ = FIELD_PREP(XY_FAST_COLOR_BLT_MOCS_MASK, mocs) |
> > + (pitch - 1);
> > *cs++ = 0;
> > + *cs++ = (size / pitch) << 16 | pitch / 4;
> > + *cs++ = lower_32_bits(src_ofs);
> > + *cs++ = upper_32_bits(src_ofs);
> > + *cs++ = (is_vram ? 0x0 : 0x1) << XY_FAST_COLOR_BLT_MEM_TYPE_SHIFT;
> > + *cs++ = value;
> > *cs++ = 0;
> > *cs++ = 0;
> > *cs++ = 0;
> > +
> > + if (len > 11) {
> > + *cs++ = 0;
> > + *cs++ = 0;
> > + *cs++ = 0;
> > + *cs++ = 0;
> > + *cs++ = 0;
> > + }
> > }
> >
> > XE_BUG_ON(cs - bb->cs != len + bb->len);
> > diff --git a/drivers/gpu/drm/xe/xe_pci.c b/drivers/gpu/drm/xe/xe_pci.c
> > index c4d9fd2e7b2b..e555f13395ab 100644
> > --- a/drivers/gpu/drm/xe/xe_pci.c
> > +++ b/drivers/gpu/drm/xe/xe_pci.c
> > @@ -72,6 +72,8 @@ struct xe_device_desc {
> > bool has_4tile;
> > bool has_range_tlb_invalidation;
> > bool has_asid;
> > +
> > + bool has_link_copy_engine;
>
> I'm not sure if we should tie the availability of
> MEM_SET instruction to the fact that the platform has link copy engines.
> It seems to be an instruction to carry forward regardless of the type of
> the copy engine.
>
> +Matt Roper for opinion here
>
> Lucas De Marchi
>
> > };
> >
> > __diag_push();
> > @@ -224,6 +226,7 @@ static const struct xe_device_desc pvc_desc = {
> > .vm_max_level = 4,
> > .supports_usm = true,
> > .has_asid = true,
> > + .has_link_copy_engine = true,
> > };
> >
> > #define MTL_MEDIA_ENGINES \
> > @@ -413,6 +416,7 @@ static int xe_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> > xe->info.has_flat_ccs = desc->has_flat_ccs;
> > xe->info.has_4tile = desc->has_4tile;
> > xe->info.has_range_tlb_invalidation = desc->has_range_tlb_invalidation;
> > + xe->info.has_link_copy_engine = desc->has_link_copy_engine;
> >
> > spd = subplatform_get(xe, desc);
> > xe->info.subplatform = spd ? spd->subplatform : XE_SUBPLATFORM_NONE;
> > --
> > 2.34.1
> >
More information about the Intel-xe
mailing list