[Intel-xe] [PATCH 00/26] Separate GT and tile

Matt Roper matthew.d.roper at intel.com
Mon May 15 18:11:05 UTC 2023


On Mon, May 15, 2023 at 03:08:24PM +0200, Thomas Hellström wrote:
> Hi, Matt,
> 
> On Wed, 2023-05-10 at 20:46 -0700, Matt Roper wrote:
> > A 'tile' is not the same thing as a 'GT.'  For historical reasons,
> > i915 attempted to use a single 'struct intel_gt' to represent both
> > concepts, although this design hasn't worked out terribly well.  For
> > Xe we have the opportunity to design the driver in a way that more
> > accurately reflects the real hardware behavior.
> > 
> > Different vendors use the term "tile" a bit differently, but in the
> > Intel world, a 'tile' is pretty close to what most people would
> > think of as being a complete GPU.  When multiple GPUs are placed
> > behind a single PCI device, that's what we refer to as a "multi-tile
> > device."  In such cases, pretty much all hardware is replicated
> > per-tile, although certain responsibilities like PCI communication,
> > reporting of interrupts to the OS, etc. are handled solely by the
> > "root tile."  A multi-tile platform takes care of tying the tiles
> > together such that interrupt notifications from remote tiles are
> > forwarded to the root tile, the per-tile VRAM is combined into a
> > single address space, etc.
> > 
> > In contrast, a "GT" (which officially stands for "Graphics
> > Technology") is the subset of a GPU/tile that is responsible for
> > implementing graphics and/or media operations.  The GT is where a
> > lot of the driver implementation happens since it's where the
> > hardware engines, the execution units, and the GuC all reside.
> > 
> > Historically most Intel devices were single-tile devices that
> > contained a single GT.  PVC is currently the only released Intel
> > platform built on a multi-tile design (i.e., multiple GPUs behind a
> > single PCI device); each PVC tile only has a single GT.  In
> > contrast, platforms like MTL that have separate chips for render
> > and media IP are still only a single logical GPU, but the graphics
> > and media IP blocks are each exposed as a separate GT within that
> > single GPU.  This is important from a software perspective because
> > multi-GT platforms like MTL only replicate a subset of the GPU
> > hardware and behave differently than multi-tile platforms like PVC,
> > where nearly everything is replicated.
> > 
> > This series separates tiles from GTs in a manner that more closely
> > matches the hardware behavior.  We now consider a PCI device
> > (xe_device) to contain one or more tiles (struct xe_tile).  Each
> > tile will contain one or two GTs (struct xe_gt).  Although we don't
> > have any platforms yet that are multi-tile *and* contain more than
> > one GT per tile, that may change in the future.  This driver
> > redesign splits functionality as follows:
> > 
> > Per-tile functionality (shared by all GTs within the tile):
> >  - Complete 4MB MMIO space (containing SGunit/SoC registers, GT
> >    registers, display registers, etc.)
> >  - Global GTT
> >  - VRAM (if discrete)
> >  - Interrupt flows
> >  - Migration context
> >  - kernel batchbuffer pool
> >  - Primary GT
> >  - Media GT (if media version >= 13)
> > 
> > Per-GT functionality:
> >  - GuC
> >  - Hardware engines
> >  - Programmable hardware units (subslices, EUs)
> >  - GSI subset of registers (multiple copies of these registers
> >    reside within the complete MMIO space provided by the tile, but
> >    at different offsets --- 0 for render, 0x380000 for media)
> >  - Multicast register steering
> >  - TLBs to cache page table translations
> >  - Reset capability
> >  - Low-level power management (e.g., C6)
> >  - Clock frequency
> >  - MOCS and PAT programming
> > 
> 
> With that detailed cover-letter description, I think this makes sense.
> 
> I figure pagetables will need to be per tile with this split-up? What

Yeah, the GGTT moves into the xe_tile in this series.  I thought I had a
specific patch for that, but it looks like I might have accidentally
squashed it into the VRAM patch; I'll split those back out into two
patches in the next revision of the series.

> about per-tile resources, like VRAM, that are accessible from all
> tiles but with separate throughput / latencies depending on which
> tile they are accessed from? Should those perhaps be per device with
> a per-tile pointer to "preferred VRAM" and a map [tile][memory_type]
> of access cost?

I kept the VRAM inside the tile in this series, but we could definitely
promote it up to the device level if we think that makes sense (e.g., if
we suspect that future platforms might not have a 1:1 relationship
between GPU/tile and VRAM).  That would probably be worth doing as a
follow-up series, though; since VRAM was already inside the xe_gt,
moving it to the xe_tile (which matches the reality of how PVC works)
seemed like the natural first step.


Matt

> 
> /Thomas
> 
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation

