VM on GPUs
Jan Vesely
jan.vesely at rutgers.edu
Fri Feb 20 16:21:09 PST 2015
Hi,
thank you for the exhaustive answer. I have a few more
questions/clarifications:
Is the DMA address used to access system pages further translated by an
IOMMU (if present), or are GPUs treated specially?

I have only seen references to a TLB flush, so I guess invalidating
individual entries is not supported? Does that mean that a complete VMID
TLB flush is required whenever a page needs to be moved/migrated?

I was a bit surprised to find out about PCIe cache snooping, since the
work I have seen before assumes DMA is not cache coherent. I guess there
is a latency penalty for using it; do you have any idea how much worse
it gets (relative to non-snooped access)?
thanks again,
jan
On Fri, 2015-02-20 at 17:19 -0500, Alex Deucher wrote:
> On Fri, Feb 20, 2015 at 12:35 PM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
> > Hello radeon devs,
> >
> > I have been trying to find out more about the VM implementation on SI+
> > hw, but unfortunately I could not find much in the public documents [0].
> >
> > The SI ISA manual suggests that there is a limited form of privileged
> > mode on these chips, so I wondered if it could be used for VM
> > management too (the docs only deal with numerical exceptions). Or does
> > it always have to be handled by the host (driver)?
>
> These are related to the trap/exception privilege, e.g. for debugging.
> I'm not that familiar with how that stuff works. It's unrelated to
> GPUVM.
>
> >
> > One of the older patches [1] mentions different page sizes, is there any
> > public documentation on things like page table format, and GPU MMU
> > hierarchy? I could only get limited picture going through the code and
> > comments.
>
> There is not any public documentation on the VM hardware other than
> what is available in the driver. I can try and give you an overview
> of how it works. There are 16 VM contexts (8 on cayman/TN/RL) on the
> GPU that can be active at any given time. GPUVM supports a 40-bit
> address space. Each context has an id; we call them vmids. vmid 0 is
> a bit special. It's called the system context and behaves a bit
> differently from the other ones. It's designed to be the kernel
> driver's view of GPU-accessible memory. I can go into further detail
> if you want, but I don't think it's critical for this discussion.
> Just think of it as the context used by the kernel driver. So that
> leaves 15 contexts (7 on cayman and TN/RL) available for use by user
> clients. vmid 0 has one set of configuration registers and vmids 1-15
> share the same configuration (other than the page tables); e.g.,
> contexts 1-15 all have to use either single-level or two-level page
> tables. You select which VM context is used for a particular command
> buffer by a field in the command buffer packet. Some engines (e.g.,
> UVD or the display hardware) do not support VM so they always use vmid
> 0. Right now only the graphics, compute, and DMA engines support VM.
>
> With single level page tables, you just have a big array of page table
> entries (PTEs) that represent the entire virtual address space. With
> multi-level page tables, the address space is represented by an array
> of page directory entries (PDEs) that point to page table blocks
> (PTBs) which are arrays of PTEs.
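>
> To make the two-level case concrete, here is a rough sketch of the
> lookup. This is not taken from the driver; read_qword() and the
> 512-entry PTB assumption (block size 0, described further down) are
> purely illustrative:
>
>     #include <stdint.h>
>
>     /* Walk a 2-level GPUVM page table: the virtual page number selects a
>      * PDE, which points to the PTB holding the PTE for that page. */
>     static uint64_t walk_2level(uint64_t pd_base, uint64_t vaddr,
>                                 uint64_t (*read_qword)(uint64_t gpu_addr))
>     {
>             const unsigned ptes_per_ptb = 512;     /* 4k PTB, block size 0 */
>             uint64_t page = vaddr >> 12;           /* 4k pages */
>             uint64_t pde = read_qword(pd_base + (page / ptes_per_ptb) * 8);
>             uint64_t ptb = pde & 0xFFFFFFF000ULL;  /* PTB address, bits 39:12 */
>             /* the caller is expected to check the valid bits */
>             return read_qword(ptb + (page % ptes_per_ptb) * 8);
>     }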
>
> PTEs and PDEs are 64 bits per entry.
>
> PDE:
> 39:12 - PTB address
> 0 - PDE valid (the entry is valid)
>
> PTE:
> 39:12 - page address
> 11:7 - fragment
> 6 - write
> 5 - read
> 2 - CPU cache snoop (for accessing cached system memory)
> 1 - system (page is in system memory rather than vram)
> 0 - PTE valid (the entry is valid)
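>
> In code, the encoding above could look roughly like this. These are
> hypothetical helpers that only mirror the bit layout listed here, not
> the driver's actual implementation:
>
>     #include <stdint.h>
>
>     #define GPUVM_PDE_VALID      (1ULL << 0)
>     #define GPUVM_PTE_VALID      (1ULL << 0)
>     #define GPUVM_PTE_SYSTEM     (1ULL << 1)   /* page is in system memory */
>     #define GPUVM_PTE_SNOOPED    (1ULL << 2)   /* CPU cache snoop */
>     #define GPUVM_PTE_READABLE   (1ULL << 5)
>     #define GPUVM_PTE_WRITEABLE  (1ULL << 6)
>     #define GPUVM_ADDR_MASK      0xFFFFFFF000ULL /* bits 39:12 of 40-bit space */
>
>     static inline uint64_t gpuvm_pde(uint64_t ptb_addr)
>     {
>             return (ptb_addr & GPUVM_ADDR_MASK) | GPUVM_PDE_VALID;
>     }
>
>     static inline uint64_t gpuvm_pte(uint64_t page_addr, unsigned fragment,
>                                      uint64_t flags)
>     {
>             /* bits 11:7 hold the fragment field */
>             return (page_addr & GPUVM_ADDR_MASK) |
>                    ((uint64_t)(fragment & 0x1f) << 7) |
>                    flags | GPUVM_PTE_VALID;
>     }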
>
> Fragment needs some explanation. The logical/physical fragment size in
> bytes = 2 ^ (12 + fragment). A fragment value of 0 means 4k, 1 means
> 8k, etc. The logical address must be aligned to the fragment size and
> the memory backing it must be contiguous and at least as large as the
> fragment size. Larger fragment sizes reduce the pressure on the TLB
> since fewer entries are required for the same amount of memory.
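>
> As a quick sanity check of the formula (again just a sketch, not driver
> code):
>
>     /* fragment size in bytes = 2 ^ (12 + fragment) */
>     static inline uint64_t gpuvm_fragment_size(unsigned fragment)
>     {
>             return 1ULL << (12 + fragment);  /* 0 -> 4k, 1 -> 8k, 4 -> 64k, ... */
>     }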
>
> For system pages, the page address is the dma address of the page.
> The system bit must be set and the snoop bit can optionally be set
> depending on whether you are using cacheable memory.
>
> For vram pages, the address is the GPU physical address of vram
> (starts at 0 on dGPUs, starts at MC_VM_FB_OFFSET (dma address of
> "vram" carve out) on APUs).
>
> You can also adjust the page table block size, which controls the
> number of pages mapped per PTB and therefore how many PDEs you need to
> cover the address space. E.g., if you set the block size to 0, each
> PTB is 4k, which holds 512 PTEs; if you set it to 1, each PTB is 8k,
> which holds 1024 PTEs, etc.
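>
> In other words (illustrative helpers only, assuming the 8-byte entries
> and 4k pages described above):
>
>     /* number of PTEs that fit in one PTB for a given block size */
>     static inline unsigned gpuvm_ptes_per_ptb(unsigned block_size)
>     {
>             return (4096u << block_size) / 8;  /* 0 -> 512, 1 -> 1024, ... */
>     }
>
>     /* number of PDEs needed to cover a VM of vm_size bytes */
>     static inline uint64_t gpuvm_num_pdes(uint64_t vm_size, unsigned block_size)
>     {
>             uint64_t bytes_per_ptb =
>                     (uint64_t)gpuvm_ptes_per_ptb(block_size) * 4096;
>             return (vm_size + bytes_per_ptb - 1) / bytes_per_ptb;
>     }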
>
> GPUVM is only concerned with memory management and protection. There
> are other protection features in other hw blocks for things beyond
> memory. For example, on CI and newer asics, the CP and SDMA blocks
> execute command buffers in a secure mode that limits them to accessing
> only registers that are relevant for those blocks (e.g., gfx or
> compute state registers, but not display registers) or only executing
> certain packets.
>
> I hope this helps. Let me know if you have any more questions.
>
> Alex
>
> >
> >
> > thank you,
> > Jan
> >
> > [0]http://developer.amd.com/resources/documentation-articles/developer-guides-manuals/
> > [1]http://lists.freedesktop.org/archives/dri-devel/2014-May/058858.html
> >
> >
> > --
> > Jan Vesely <jan.vesely at rutgers.edu>
--
Jan Vesely <jan.vesely at rutgers.edu>