[Intel-xe] [PATCH 00/26] Separate GT and tile

Mon May 15 13:08:24 UTC 2023

Hi, Matt,

On Wed, 2023-05-10 at 20:46 -0700, Matt Roper wrote:
> A 'tile' is not the same thing as a 'GT.'  For historical reasons, i915
> attempted to use a single 'struct intel_gt' to represent both concepts,
> although this design hasn't worked out terribly well.  For Xe we have
> the opportunity to design the driver in a way that more accurately
> reflects the real hardware behavior.
> 
> Different vendors use the term "tile" a bit differently, but in the
> Intel world, a 'tile' is pretty close to what most people would think of
> as being a complete GPU.  When multiple GPUs are placed behind a single
> PCI device, that's what we refer to as a "multi-tile device."  In such
> cases, pretty much all hardware is replicated per-tile, although certain
> responsibilities like PCI communication, reporting of interrupts to the
> OS, etc. are handled solely by the "root tile."  A multi-tile platform
> takes care of tying the tiles together in a way such that interrupt
> notifications from remote tiles are forwarded to the root tile, the
> per-tile vram is combined into a single address space, etc.
> 
> In contrast, a "GT" (which officially stands for "Graphics Technology")
> is the subset of a GPU/tile that is responsible for implementing
> graphics and/or media operations.  The GT is where a lot of the driver
> implementation happens since it's where the hardware engines, the
> execution units, and the GuC all reside.
> 
> Historically most Intel devices were single-tile devices that contained
> a single GT.  PVC is currently the only released Intel platform built on
> a multi-tile design (i.e., multiple GPUs behind a single PCI device);
> each PVC tile only has a single GT.  In contrast, platforms like MTL
> that have separate chips for render and media IP are still only a single
> logical GPU, but the graphics and media IP blocks are each exposed as a
> separate GT within that single GPU.  This is important from a software
> perspective because multi-GT platforms like MTL only replicate a subset
> of the GPU hardware and behave differently than multi-tile platforms
> like PVC where nearly everything is replicated.
> 
> This series separates tiles from GTs in a manner that more closely
> matches the hardware behavior.  We now consider a PCI device (xe_device)
> to contain one or more tiles (struct xe_tile).  Each tile will contain
> one or two GTs (struct xe_gt).  Although we don't have any platforms yet
> that are multi-tile *and* contain more than one GT per tile, that may
> change in the future.  This driver redesign splits functionality as
> follows:
> 
> Per-tile functionality (shared by all GTs within the tile):
>  - Complete 4MB MMIO space (containing SGunit/SoC registers, GT
>    registers, display registers, etc.)
>  - Global GTT
>  - VRAM (if discrete)
>  - Interrupt flows
>  - Migration context
>  - kernel batchbuffer pool
>  - Primary GT
>  - Media GT (if media version >= 13)
> 
> Per-GT functionality:
>  - GuC
>  - Hardware engines
>  - Programmable hardware units (subslices, EUs)
>  - GSI subset of registers (multiple copies of these registers reside
>    within the complete MMIO space provided by the tile, but at different
>    offsets --- 0 for render, 0x380000 for media)
>  - Multicast register steering
>  - TLBs to cache page table translations
>  - Reset capability
>  - Low-level power management (e.g., C6)
>  - Clock frequency
>  - MOCS and PAT programming
> 
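
For reference, the containment described above could be sketched roughly
as below. This is only an illustration, not code from the series: every
sketch_* name and field is an assumption (the cover letter only names
xe_device, struct xe_tile and struct xe_gt), and treating the GSI offset
as a plain addend on top of the tile's MMIO space is my reading of the
"0 for render, 0x380000 for media" note.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_TILES        2          /* assumption, for illustration only */
#define SKETCH_MEDIA_GSI_OFFSET 0x380000u  /* media offset quoted in the cover letter */

struct sketch_tile;

/* Per-GT state: GuC, engines, TLBs, etc. would hang off this. */
struct sketch_gt {
	struct sketch_tile *tile;  /* back-pointer to the owning tile */
	uint32_t gsi_offset;       /* 0 for the primary GT, 0x380000 for media */
	int present;               /* nonzero if this GT exists on the platform */
};

/* Per-tile state, shared by all GTs within the tile. */
struct sketch_tile {
	uint8_t id;
	uint64_t vram_size;          /* 0 when not discrete */
	struct sketch_gt primary_gt;
	struct sketch_gt media_gt;   /* present only if media version >= 13 */
};

/* A PCI device contains one or more tiles. */
struct sketch_device {
	uint8_t tile_count;
	struct sketch_tile tiles[SKETCH_MAX_TILES];
};

/* Set up a tile and wire both GTs back to it. */
static void sketch_tile_init(struct sketch_tile *tile, uint8_t id,
			     unsigned int media_ver)
{
	tile->id = id;
	tile->vram_size = 0;
	tile->primary_gt = (struct sketch_gt){
		.tile = tile, .gsi_offset = 0, .present = 1,
	};
	tile->media_gt = (struct sketch_gt){
		.tile = tile,
		.gsi_offset = SKETCH_MEDIA_GSI_OFFSET,
		.present = media_ver >= 13,
	};
}

/* Translate a GSI register offset into an offset in the tile's MMIO space. */
static uint32_t sketch_gt_mmio_offset(const struct sketch_gt *gt,
				      uint32_t gsi_reg)
{
	assert(gt != NULL && gt->present);
	return gt->gsi_offset + gsi_reg;
}
```

With this layout, anything GT-specific reaches the shared tile state
through gt->tile, and the same GSI register resolves to a different
MMIO offset depending on which GT it is accessed through.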

With that detailed cover-letter description, I think this makes sense.

I figure page tables will need to be per-tile with this split? What
about per-tile resources, like VRAM, that are accessible from all tiles
but with different throughput / latency depending on which tile they
are accessed from? Should those perhaps be per-device, with a per-tile
pointer to "preferred VRAM" and a map [tile][memory_type] of access
cost?
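
To make the question concrete, here is a minimal sketch of what I have
in mind. All names are hypothetical and nothing here exists in the
driver; the cost values are unitless placeholders, and I use an index
rather than a pointer for the "preferred VRAM" just to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_TILES 2  /* assumption, for illustration only */

/* Hypothetical memory types; one VRAM region per tile plus system memory. */
enum sketch_mem_type {
	SKETCH_MEM_SYSTEM,
	SKETCH_MEM_VRAM0,
	SKETCH_MEM_VRAM1,
	SKETCH_MEM_TYPE_COUNT,
};

struct sketch_mem_region {
	uint64_t size;  /* bytes available in this region */
};

/* Memory regions owned by the device rather than by any single tile. */
struct sketch_device_mem {
	struct sketch_mem_region regions[SKETCH_MEM_TYPE_COUNT];
	/* Per-tile "preferred VRAM", stored as an index into regions[]. */
	enum sketch_mem_type preferred_vram[SKETCH_MAX_TILES];
	/* Relative cost of accessing each region from each tile; lower is cheaper. */
	uint32_t access_cost[SKETCH_MAX_TILES][SKETCH_MEM_TYPE_COUNT];
};

/*
 * Return the cheapest region, as seen from 'tile', that holds at least
 * 'size' bytes, or -1 if no region is large enough.  Ties go to the
 * lower-numbered memory type.
 */
static int sketch_pick_region(const struct sketch_device_mem *mem,
			      unsigned int tile, uint64_t size)
{
	uint32_t best_cost = UINT32_MAX;
	int best = -1;
	int type;

	assert(mem != NULL && tile < SKETCH_MAX_TILES);

	for (type = 0; type < SKETCH_MEM_TYPE_COUNT; type++) {
		if (mem->regions[type].size < size)
			continue;
		if (best < 0 || mem->access_cost[tile][type] < best_cost) {
			best_cost = mem->access_cost[tile][type];
			best = type;
		}
	}
	return best;
}
```

The point is just that placement decisions would consult the cost map
from the accessing tile's point of view instead of assuming that each
tile only ever uses its own VRAM.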

/Thomas