Difference GART/GTT and related questions

Alex Deucher alexdeucher at gmail.com
Mon Oct 10 14:05:09 UTC 2022


On Sat, Oct 8, 2022 at 7:14 AM Peter Maucher <bellosilicio at gmail.com> wrote:
>
> Hi dri-devel,
>
> what is the difference between GTT and GART for AMD GPUs?
> From what I gathered when looking through the mailing list archives and
> the freedesktop docs [1] as well as Wikipedia [2],
> these terms seem to be synonymous, but that cannot be the whole truth
> (different sizes in the dmesg log, different kernel parameters in
> amdgpu/radeon, ...).
>
> As far as I understand it currently,
> the size of the GART depends on some HW/ASIC functionality [3].
> On the other hand, I was able to increase the size of the
> GART mapping(?) from 512 MB to 1024 MB by booting with
> amdgpu.gartsize=1024 on my RX 6600.
>
> GTT, on the other hand, is the maximum amount of system memory visible
> to the GPU, shared between all processes connected to the GPU.
> As I understand it, using GPUVM, each process can have one or more GARTs
> for mapping?
> Apparently, there is also something called a GART table window;
> what's up with that?
>
> Also, according to what I found in the mailing list archives,
> the GPUVM functionality "replaces" old GART with new GART features,
> so what is the difference and what exactly is GPUVM?
> If I understood correctly, GPUVM is an MMU using page tables on the GPU?
>
> Additionally, the addresses translated by the GART(s) are
> optionally translated once more by the PCIe IOMMU,
> as the former is located on the GPU and the latter is in the CPU's PCIe
> root complex?
> Wikipedia mentions something about (another?) GART in an AMD IOMMU...
>
> Lastly, do any of these numbers influence what the longest contiguous
> mapping is for one buffer to the GPU?
> As in: can I map 95% or so of the available (GART/GTT?) space into one
> buffer and have the GPU work on it?

Modern AMD GPUs have multiple GPU virtual address spaces (GPU Virtual
Memory -- GPUVM) that can be active at any given time.  Each address
space is designated by a token called a VMID (Virtual Memory ID).  The
kernel driver uses one of these IDs for its own memory management.
The others are used dynamically for user processes.  It's pretty much
like virtual memory on a CPU, but because the GPU has really deep
pipelines, we have the concept of multiple address spaces being active
at any given time, denoted by the VMIDs.  The GPU also has lots of
asynchronous engines (graphics, compute, transfer, media, etc.). E.g.,
if you have two user applications using the GPU at the same time and
the GPU kernel driver is moving some memory, you'd have 3 different
virtual address spaces that need to be active at the same time.  For
example, when the kernel driver submits a user job to the gfx engine,
it tells the engine which VMID (and hence GPUVM address space) it
should use for that job.
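
To make the VMID idea concrete, here's a rough C sketch (all of the
names below are made up for illustration; they are not actual amdgpu
structures):

/* Hypothetical model of the hardware described above: each in-flight
 * job carries the VMID of the address space it should execute in, and
 * the VMID selects which set of page tables the engine walks. */
#define NUM_VMIDS 16  /* e.g., 16 concurrently active address spaces */

struct gpuvm_space {
    unsigned long long page_table_base;  /* root of this space's tables */
};

struct gpu_job {
    unsigned int vmid;          /* which active address space to use */
    unsigned long long cmd_va;  /* GPU virtual address of the commands */
};

static struct gpuvm_space vm_spaces[NUM_VMIDS];  /* one is the kernel's */

/* Conceptually what an engine does per job: pick the page tables
 * named by the job's VMID and translate its virtual addresses. */
static unsigned long long pt_root_for_job(const struct gpu_job *job)
{
    return vm_spaces[job->vmid].page_table_base;
}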

GART defines the amount of platform address space (system memory or
MMIO space) that the GPU kernel driver can have mapped into the GPU's
virtual address space used by the kernel driver.  The kernel driver
generally doesn't need that much system space to be mapped at any
given time so we keep the GART pretty small to minimize GPU page table
size.

GTT defines the amount of platform address space that can be mapped
into the GPU virtual address space used by user processes.
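
If your kernel exposes the amdgpu module parameters under sysfs (they
may be root-only readable), you can read the configured GART and GTT
sizes back; a minimal sketch:

#include <stdio.h>

/* Read the amdgpu gartsize/gttsize module parameters (values in MiB;
 * -1 means the driver chose a default).  Assumes the usual
 * /sys/module/amdgpu/parameters/ layout. */
static void print_param(const char *name)
{
    char path[128], buf[32];
    FILE *f;

    snprintf(path, sizeof(path), "/sys/module/amdgpu/parameters/%s", name);
    f = fopen(path, "r");
    if (f) {
        if (fgets(buf, sizeof(buf), f))
            printf("%s = %s", name, buf);
        fclose(f);
    }
}

int main(void)
{
    print_param("gartsize");
    print_param("gttsize");
    return 0;
}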

TTM (the kernel memory management infrastructure that the driver uses)
imposes a default limit of 50% of system memory due to the way the OOM
handler works on Linux.  Memory allocated via a kernel driver on
behalf of a user process does not currently get counted towards the
application that allocated it.  This is a complex problem to fix, so
it has persisted for a while.
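
As a quick sanity check of that default (illustrative only; the real
policy lives in TTM/amdgpu, and amdgpu.gttsize can override it):

#include <stdio.h>
#include <sys/sysinfo.h>

int main(void)
{
    struct sysinfo si;

    if (sysinfo(&si) != 0)
        return 1;
    /* TTM's default GTT limit is roughly half of total system memory. */
    unsigned long long total = (unsigned long long)si.totalram * si.mem_unit;
    printf("RAM: %llu MiB -> default GTT limit: ~%llu MiB\n",
           total >> 20, (total / 2) >> 20);
    return 0;
}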

The IOMMU provides virtualization for device access to the system
address space (system memory and MMIO space), so the DMA addresses the
GPU driver gets from the Linux DMA subsystem and uses in the
driver are actually IO virtual addresses (IOVAs); i.e., they are IOMMU
virtual addresses.  So when the GPU kernel driver sets up the GPUVM
page tables, the "physical" addresses will actually be IOVAs when an
IOMMU is present.
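
In kernel-driver terms, the flow looks roughly like this (a fragment,
not buildable code; gpuvm_set_pte() is a made-up stand-in for the
driver's real page-table update path):

/* The address the DMA API hands back is what gets programmed into the
 * GPUVM page tables as the "physical" address; with an IOMMU enabled
 * it is an IOVA rather than a raw physical address. */
dma_addr_t addr = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);

if (dma_mapping_error(dev, addr))
    return -ENOMEM;
gpuvm_set_pte(vm, gpu_va, addr, pte_flags);  /* hypothetical helper */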

GPUVM provides a 48-bit GPU virtual address space, so each process
using the GPU can have up to 48 bits of virtual GPU address space
mapped.  This address space can map a combination of on-device memory
(VRAM), system address space allocated through the driver (GTT), and
user pointer memory (e.g., malloc()ed memory from the application).
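
On the userspace side, one way to map something into that per-process
GPUVM space is through libdrm's amdgpu helpers; here is a trimmed
sketch (error handling and cleanup omitted, and the render node path
may differ on your system):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <amdgpu.h>
#include <amdgpu_drm.h>

int main(void)
{
    uint32_t major, minor;
    amdgpu_device_handle dev;
    int fd = open("/dev/dri/renderD128", O_RDWR);  /* adjust if needed */

    if (fd < 0 || amdgpu_device_initialize(fd, &major, &minor, &dev))
        return 1;

    /* Allocate 1 MiB of system memory in the GTT domain... */
    struct amdgpu_bo_alloc_request req = {
        .alloc_size = 1 << 20,
        .phys_alignment = 4096,
        .preferred_heap = AMDGPU_GEM_DOMAIN_GTT,
    };
    amdgpu_bo_handle bo;
    if (amdgpu_bo_alloc(dev, &req, &bo))
        return 1;

    /* ...carve out a GPU virtual address range for it... */
    uint64_t va;
    amdgpu_va_handle va_handle;
    if (amdgpu_va_range_alloc(dev, amdgpu_gpu_va_range_general,
                              req.alloc_size, 4096, 0, &va, &va_handle, 0))
        return 1;

    /* ...and map the buffer at that address in this process's GPUVM. */
    if (amdgpu_bo_va_op(bo, 0, req.alloc_size, va, 0, AMDGPU_VA_OP_MAP))
        return 1;

    printf("mapped 1 MiB of GTT at GPU VA 0x%llx\n", (unsigned long long)va);
    return 0;
}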

Alex

>
> Thanks, Peter
>
> [1] https://dri.freedesktop.org/wiki/GART/
> [2] https://en.wikipedia.org/wiki/Graphics_address_remapping_table
> [3] https://www.kernel.org/doc/html/v6.0/gpu/amdgpu/module-parameters.html
>

