[Intel-xe] [PATCH 00/26] Separate GT and tile
Matt Roper
matthew.d.roper at intel.com
Thu May 11 03:46:56 UTC 2023
A 'tile' is not the same thing as a 'GT.' For historical reasons, i915
attempted to use a single 'struct intel_gt' to represent both concepts,
although this design hasn't worked out terribly well. For Xe we have
the opportunity to design the driver in a way that more accurately
reflects the real hardware behavior.
Different vendors use the term "tile" a bit differently, but in the
Intel world, a 'tile' is pretty close to what most people would think of
as being a complete GPU. When multiple GPUs are placed behind a single
PCI device, that's what we refer to as a "multi-tile device." In such
cases, pretty much all hardware is replicated per-tile, although certain
responsibilities like PCI communication, reporting of interrupts to the
OS, etc. are handled solely by the "root tile." A multi-tile platform
takes care of tying the tiles together in a way such that interrupt
notifications from remote tiles are forwarded to the root tile, the
per-tile vram is combined into a single address space, etc.
In contrast, a "GT" (which officially stands for "Graphics Technology")
is the subset of a GPU/tile that is responsible for implementing
graphics and/or media operations. The GT is where a lot of the driver
implementation happens since it's where the hardware engines, the
execution units, and the GuC all reside.
Historically most Intel devices were single-tile devices that contained
a single GT. PVC is currently the only released Intel platform built on
a multi-tile design (i.e., multiple GPUs behind a single PCI device);
each PVC tile only has a single GT. In contrast, platforms like MTL
that have separate chips for render and media IP are still only a single
logical GPU, but the graphics and media IP blocks are each exposed as a
separate GT within that single GPU. This is important from
a software perspective because multi-GT platforms like MTL only
replicate a subset of the GPU hardware and behave differently than
multi-tile platforms like PVC where nearly everything is replicated.
This series separates tiles from GTs in a manner that more closely
matches the hardware behavior. We now consider a PCI device (xe_device)
to contain one or more tiles (struct xe_tile). Each tile will contain
one or two GTs (struct xe_gt). Although we don't have any platforms yet
that are multi-tile *and* contain more than one GT per tile, that may
change in the future. This driver redesign splits functionality as
follows:
Per-tile functionality (shared by all GTs within the tile):
- Complete 4MB MMIO space (containing SGunit/SoC registers, GT
registers, display registers, etc.)
- Global GTT
- VRAM (if discrete)
- Interrupt flows
- Migration context
- kernel batchbuffer pool
- Primary GT
- Media GT (if media version >= 13)
Per-GT functionality:
- GuC
- Hardware engines
- Programmable hardware units (subslices, EUs)
- GSI subset of registers (multiple copies of these registers reside
within the complete MMIO space provided by the tile, but at different
offsets --- 0 for render, 0x380000 for media)
- Multicast register steering
- TLBs to cache page table translations
- Reset capability
- Low-level power management (e.g., C6)
- Clock frequency
- MOCS and PAT programming
At the moment I've left USM / pagefault handling at the GT level,
although I'm not familiar enough with that specific feature to know
whether it's truly correct or not.
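
To make the containment described above concrete, here is a minimal
standalone sketch of the device -> tile -> GT hierarchy. It is not the
code from this series: every sketch_* name, the two-tile limit, and the
choice of fields are assumptions made for illustration only. The one
value taken from the text above is the 0x380000 GSI offset for media.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_TILES 2	/* hypothetical limit, for this sketch only */

struct sketch_tile;

/* Per-GT state (GuC, engines, GSI register offset, ...). */
struct sketch_gt {
	struct sketch_tile *tile;	/* backpointer to the owning tile */
	uint32_t gsi_offset;		/* 0 for primary, 0x380000 for media */
};

/* Per-tile state shared by every GT inside the tile. */
struct sketch_tile {
	uint8_t id;
	struct sketch_gt *primary_gt;
	struct sketch_gt *media_gt;	/* NULL if there is no media GT */
};

/* One PCI device containing one or more tiles. */
struct sketch_device {
	uint8_t tile_count;
	struct sketch_tile tiles[SKETCH_MAX_TILES];
};

/* Walk every tile of a device; tile__ is set on each iteration. */
#define sketch_for_each_tile(tile__, dev__, id__) \
	for ((id__) = 0; \
	     (id__) < (dev__)->tile_count && \
	     ((tile__) = &(dev__)->tiles[(id__)]); \
	     (id__)++)

/* Count the GTs on a device by visiting each tile's GT slots. */
static inline unsigned int sketch_count_gts(struct sketch_device *dev)
{
	struct sketch_tile *tile;
	unsigned int id, n = 0;

	assert(dev->tile_count <= SKETCH_MAX_TILES);

	sketch_for_each_tile(tile, dev, id)
		n += (tile->primary_gt != NULL) + (tile->media_gt != NULL);

	return n;
}
```

With this layout a PVC-like device (two tiles, one primary GT each) and
an MTL-like device (one tile, primary plus media GT) both report two
GTs, but they reach that count through different shapes, which is the
distinction the series is trying to capture.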
The first patch in this series temporarily drops MTL media GT support.
The driver doesn't load properly on MTL today, largely due to the
mishandling of GT vs tile; dropping support completely makes the
necessary driver redesign easier to carry out. The media GT is
re-enabled (properly this time) near the end of the series and this
allows the driver to load successfully without error on MTL for the
first time. There are still issues when submitting workloads to MTL
after driver load (e.g., CAT errors), but those seem to be separate
platform-specific issues unrelated to the GT/tile work in this series
that will need to be debugged and fixed separately.
This series leaves a few open questions and FIXMEs:
- Unlike i915, the Xe driver has chosen to expose GTs to userspace
rather than keeping them a hidden implementation detail. With the
separation of xe_tile and xe_gt, we need to decide whether we also
want to expose tiles (in addition to GTs), whether we want to _only_
expose tiles (and keep the primary vs media GT separation a hidden
internal detail), or something else.
- How should GTs be numbered? Today it's straightforward --- PVC
assigns GT IDs 0 and 1 to the primary GT of each tile. MTL assigns
GT IDs 0 and 1 to the primary and media GTs of its sole tile. But if
we have a platform in the future that has multiple tiles _and_
multiple GTs per tile, how should we handle the numbering in that
case?
- Xe's (mis)design used xe_gt as the target of all MMIO operations (i.e.,
xe_mmio_*()). This really doesn't make sense, especially since
there are a lot of MMIO accesses that are completely unrelated to GT
(e.g., sgunit registers, display registers, etc.). i915 used
'intel_uncore' as the MMIO target, although that wasn't really an
accurate reflection of the hardware either. What we really want is
something that combines the MMIO register space (stored in the tile)
with the GSI offset (stored in the GT). My current plan is to
introduce an "xe_mmio_view" (name may change) in a future series that
will serve as a target for register operations. There will be
sensible APIs to obtain an xe_mmio_view appropriate to the type of
register access being performed (and that will also be able to do
some range sanity checking in debug drivers to help catch misuse).
That's a somewhat large/invasive change, so I'm saving that for a
follow-up series after this one is completed.
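
As a rough illustration of the "MMIO register space from the tile plus
GSI offset from the GT" idea in the last item, here is a standalone
sketch. Since the xe_mmio_view work is only planned, nothing here
reflects a real API: all sketch_* names, the field layout, and the way
the range check is expressed are assumptions. Only the 4MB tile MMIO
size and the 0x380000 media GSI offset come from the text above.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TILE_MMIO_SIZE	0x400000u	/* complete 4MB per-tile space */
#define SKETCH_MEDIA_GSI_OFFSET	0x380000u	/* media GT's GSI copy */

/* Hypothetical register-access target: tile mapping + per-GT adjustment. */
struct sketch_mmio_view {
	volatile uint8_t *regs;	/* base of the tile's MMIO mapping */
	uint32_t adj_offset;	/* added to every offset; 0 for non-GSI use */
	uint32_t limit;		/* offsets at or above this are misuse */
};

static inline struct sketch_mmio_view
sketch_view_init(volatile uint8_t *regs, uint32_t adj_offset, uint32_t limit)
{
	struct sketch_mmio_view v = { regs, adj_offset, limit };

	return v;
}

/* Translate a view-relative register offset into a tile MMIO offset. */
static inline uint32_t
sketch_view_offset(const struct sketch_mmio_view *v, uint32_t reg)
{
	/* Stand-in for the debug-only range sanity checks mentioned above. */
	assert(reg < v->limit);
	assert(reg + v->adj_offset <= SKETCH_TILE_MMIO_SIZE - sizeof(uint32_t));

	return reg + v->adj_offset;
}

static inline uint32_t
sketch_view_read32(const struct sketch_mmio_view *v, uint32_t reg)
{
	return *(volatile uint32_t *)(v->regs + sketch_view_offset(v, reg));
}

static inline void
sketch_view_write32(const struct sketch_mmio_view *v, uint32_t reg,
		    uint32_t val)
{
	*(volatile uint32_t *)(v->regs + sketch_view_offset(v, reg)) = val;
}
```

The point of the sketch is that callers would pass the same register
offset regardless of which GT they are talking to; the view decides
where in the tile's MMIO space that access actually lands.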
Cc: Matthew Brost <matthew.brost at intel.com>
Cc: Lucas De Marchi <lucas.demarchi at intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
Cc: Michael J. Ruhl <michael.j.ruhl at intel.com>
Cc: Nirmoy Das <nirmoy.das at intel.com>
Matt Roper (26):
drm/xe/mtl: Disable media GT
drm/xe: Introduce xe_tile
drm/xe: Add backpointer from gt to tile
drm/xe: Add for_each_tile iterator
drm/xe: Move register MMIO into xe_tile
drm/xe: Move VRAM from GT to tile
drm/xe: Memory allocations are tile-based, not GT-based
drm/xe: Move migration from GT to tile
drm/xe: Clarify 'gt' retrieval for primary tile
drm/xe: Drop vram_id
drm/xe: Drop extra_gts[] declarations and XE_GT_TYPE_REMOTE
drm/xe: Allocate GT dynamically
drm/xe: Add media GT to tile
drm/xe: Move display IRQ postinstall out of GT function
drm/xe: Interrupts are delivered per-tile, not per-GT
drm/xe/irq: Handle ASLE backlight interrupts at same time as display
drm/xe/irq: Actually call xe_irq_postinstall()
drm/xe/irq: Ensure primary GuC won't clobber media GuC's interrupt mask
drm/xe/irq: Untangle postinstall functions
drm/xe: Replace xe_gt_irq_postinstall with xe_irq_enable_hwe
drm/xe: Invalidate TLB on all affected GTs during GGTT updates
drm/xe/tlb: Obtain forcewake when doing GGTT TLB invalidations
drm/xe: Allow GT looping and lookup on standalone media
drm/xe: Update query uapi to support standalone media
drm/xe: Reinstate media GT support
drm/xe: Clarify source of GT log messages
drivers/gpu/drm/i915/display/intel_dsb.c | 5 +-
drivers/gpu/drm/i915/display/intel_fbc.c | 3 +-
drivers/gpu/drm/i915/display/intel_fbdev.c | 7 +-
drivers/gpu/drm/xe/Makefile | 1 +
.../drm/xe/compat-i915-headers/intel_uncore.h | 2 +-
drivers/gpu/drm/xe/display/ext/i915_irq.c | 2 +-
drivers/gpu/drm/xe/display/xe_fb_pin.c | 13 +-
drivers/gpu/drm/xe/display/xe_plane_initial.c | 8 +-
drivers/gpu/drm/xe/regs/xe_gt_regs.h | 8 +
drivers/gpu/drm/xe/tests/xe_bo.c | 8 +-
drivers/gpu/drm/xe/tests/xe_migrate.c | 15 +-
drivers/gpu/drm/xe/xe_bb.c | 5 +-
drivers/gpu/drm/xe/xe_bo.c | 104 ++---
drivers/gpu/drm/xe/xe_bo.h | 20 +-
drivers/gpu/drm/xe/xe_bo_evict.c | 22 +-
drivers/gpu/drm/xe/xe_bo_types.h | 4 +-
drivers/gpu/drm/xe/xe_device.c | 12 +-
drivers/gpu/drm/xe/xe_device.h | 49 ++-
drivers/gpu/drm/xe/xe_device_types.h | 107 ++++-
drivers/gpu/drm/xe/xe_engine.c | 2 +-
drivers/gpu/drm/xe/xe_ggtt.c | 45 +-
drivers/gpu/drm/xe/xe_ggtt.h | 6 +-
drivers/gpu/drm/xe/xe_ggtt_types.h | 2 +-
drivers/gpu/drm/xe/xe_gt.c | 191 ++-------
drivers/gpu/drm/xe/xe_gt.h | 8 +-
drivers/gpu/drm/xe/xe_gt_debugfs.c | 8 +-
drivers/gpu/drm/xe/xe_gt_mcr.c | 2 +-
drivers/gpu/drm/xe/xe_gt_pagefault.c | 16 +-
drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 4 +-
drivers/gpu/drm/xe/xe_gt_types.h | 87 ++--
drivers/gpu/drm/xe/xe_guc.c | 11 +-
drivers/gpu/drm/xe/xe_guc_ads.c | 5 +-
drivers/gpu/drm/xe/xe_guc_ct.c | 5 +-
drivers/gpu/drm/xe/xe_guc_hwconfig.c | 5 +-
drivers/gpu/drm/xe/xe_guc_log.c | 6 +-
drivers/gpu/drm/xe/xe_guc_pc.c | 5 +-
drivers/gpu/drm/xe/xe_hw_engine.c | 6 +-
drivers/gpu/drm/xe/xe_irq.c | 393 +++++++++---------
drivers/gpu/drm/xe/xe_irq.h | 3 +-
drivers/gpu/drm/xe/xe_lrc.c | 13 +-
drivers/gpu/drm/xe/xe_lrc_types.h | 4 +-
drivers/gpu/drm/xe/xe_migrate.c | 76 ++--
drivers/gpu/drm/xe/xe_migrate.h | 9 +-
drivers/gpu/drm/xe/xe_mmio.c | 92 ++--
drivers/gpu/drm/xe/xe_mmio.h | 21 +-
drivers/gpu/drm/xe/xe_mocs.c | 14 +-
drivers/gpu/drm/xe/xe_pci.c | 92 ++--
drivers/gpu/drm/xe/xe_pt.c | 150 ++++---
drivers/gpu/drm/xe/xe_pt.h | 14 +-
drivers/gpu/drm/xe/xe_query.c | 32 +-
drivers/gpu/drm/xe/xe_res_cursor.h | 2 +-
drivers/gpu/drm/xe/xe_sa.c | 13 +-
drivers/gpu/drm/xe/xe_sa.h | 4 +-
drivers/gpu/drm/xe/xe_tile.c | 89 ++++
drivers/gpu/drm/xe/xe_tile.h | 16 +
drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c | 4 +-
drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 16 +-
drivers/gpu/drm/xe/xe_ttm_vram_mgr.h | 4 +-
drivers/gpu/drm/xe/xe_ttm_vram_mgr_types.h | 6 +-
drivers/gpu/drm/xe/xe_uc_fw.c | 5 +-
drivers/gpu/drm/xe/xe_vm.c | 156 ++++---
drivers/gpu/drm/xe/xe_vm.h | 2 +-
drivers/gpu/drm/xe/xe_vm_types.h | 22 +-
include/uapi/drm/xe_drm.h | 4 +-
64 files changed, 1108 insertions(+), 957 deletions(-)
create mode 100644 drivers/gpu/drm/xe/xe_tile.c
create mode 100644 drivers/gpu/drm/xe/xe_tile.h
--
2.40.0