[Intel-xe] [PATCH 3/4] drm/xe/xe2: Set tile y bits in XY_FAST_COPY_BLT

Matt Roper matthew.d.roper at intel.com
Fri Sep 29 21:01:34 UTC 2023


On Fri, Sep 29, 2023 at 03:42:28PM -0500, Lucas De Marchi wrote:
> On Fri, Sep 29, 2023 at 09:10:22AM -0700, Matt Roper wrote:
> > On Thu, Sep 28, 2023 at 09:43:50PM -0700, Lucas De Marchi wrote:
> > > From: Haridhar Kalvala <haridhar.kalvala at intel.com>
> > > 
> > > Set bits 30 and 31 of XY_FAST_COPY_BLT for xe2 platforms. Its setting
> > > Tile-Y for source and destination.
> > 
> > This explanation is actually the opposite of what we're doing.  Setting
> > these bits marks the Y-major tiling format as Tile4 instead of legacy
> > TileY (the setting of "0" for these bits to select legacy TileY isn't
> > supported anymore).
> > 
> > We're also missing the explanation that we're seemingly supposed to set
> > these bits even in cases when the actual surface is linear or x-tiled.
> 
> Yeah... I think rewording it like in the bspec would be appropriate:
> 
> Destination or source being TileY is selected on dword0 and there's
> nothing to set on dword1. According to the bspec, "Behavior is undefined
> when programmed the value 0". So force those bits to 1 in xe2.
> 
> > 
> > And this patch also highlights that we're still setting these bits
> > incorrectly on DG2, PVC, and MTL --- on those platforms it may not be
> > mandatory to set these bits even when the surface is linear or X-tile,
> > but when we're working with a Tile4 surface we definitely should be
> > setting them (bspec 47982).  The simplest fix is probably to just change
> > the version condition here to start from 12.50 instead of 20.
> 
> Before xe2 it's doesn't have the comment that is illegal to select
> "legacy tile-y", although applying a platform filter does filter the
> value 0 out.
> 
> Humn... can we go even simpler and always set the value to 0b11
> regardless of platform? Even for platforms like TGL or ADL, I'd expect
> these fields wouls simply ignored if linear is selected in dword0.

We can try and see what CI thinks.  My worry would be whether the
hardware teams were already laying some of the preliminary groundwork
for Tile4 (which only got finished and exposed on Xe_HP).  So there's a
chance that we might get some random/incorrect behavior if the hardware
actually does have some kind of half-baked Tile4 support already
silently linked up to those bits.


Matt

> 
> Lucas De Marchi
> 
> > 
> > 
> > Matt
> > 
> > > 
> > > Bspec: 57567
> > > Cc: Matt Roper <matthew.d.roper at intel.com>
> > > Signed-off-by: Haridhar Kalvala <haridhar.kalvala at intel.com>
> > > Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/regs/xe_gpu_commands.h | 2 ++
> > >  drivers/gpu/drm/xe/xe_migrate.c           | 9 ++++++++-
> > >  2 files changed, 10 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> > > index 1fdf2e4f1c9f..cc7b56763f10 100644
> > > --- a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> > > +++ b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> > > @@ -57,6 +57,8 @@
> > > 
> > >  #define XY_FAST_COPY_BLT_CMD		(2 << 29 | 0x42 << 22)
> > >  #define   XY_FAST_COPY_BLT_DEPTH_32	(3<<24)
> > > +#define   XY_FAST_COPY_BLT_D1_SRC_TILE4	REG_BIT(31)
> > > +#define   XY_FAST_COPY_BLT_D1_DST_TILE4	REG_BIT(30)
> > > 
> > >  #define	PVC_MEM_SET_CMD		(2 << 29 | 0x5b << 22)
> > >  #define   PVC_MEM_SET_CMD_LEN_DW	7
> > > diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> > > index 2e6ce2daf414..b682d34bc1e5 100644
> > > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > > @@ -543,12 +543,19 @@ static void emit_copy(struct xe_gt *gt, struct xe_bb *bb,
> > >  		      u64 src_ofs, u64 dst_ofs, unsigned int size,
> > >  		      unsigned int pitch)
> > >  {
> > > +	struct xe_device *xe = gt_to_xe(gt);
> > > +
> > >  	xe_gt_assert(gt, size / pitch <= S16_MAX);
> > >  	xe_gt_assert(gt, pitch / 4 <= S16_MAX);
> > >  	xe_gt_assert(gt, pitch <= U16_MAX);
> > > 
> > >  	bb->cs[bb->len++] = XY_FAST_COPY_BLT_CMD | (10 - 2);
> > > -	bb->cs[bb->len++] = XY_FAST_COPY_BLT_DEPTH_32 | pitch;
> > > +	if (GRAPHICS_VER(xe) >= 20)
> > > +		bb->cs[bb->len++] = XY_FAST_COPY_BLT_DEPTH_32 | pitch |
> > > +				    XY_FAST_COPY_BLT_D1_SRC_TILE4 |
> > > +				    XY_FAST_COPY_BLT_D1_DST_TILE4;
> > > +	else
> > > +		bb->cs[bb->len++] = XY_FAST_COPY_BLT_DEPTH_32 | pitch;
> > >  	bb->cs[bb->len++] = 0;
> > >  	bb->cs[bb->len++] = (size / pitch) << 16 | pitch / 4;
> > >  	bb->cs[bb->len++] = lower_32_bits(dst_ofs);
> > > --
> > > 2.40.1
> > > 
> > 
> > -- 
> > Matt Roper
> > Graphics Software Engineer
> > Linux GPU Platform Enablement
> > Intel Corporation

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


More information about the Intel-xe mailing list