<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> <meta name="Generator" content="Microsoft Word 15 (filtered medium)"> <style></style> </head> <body lang="EN-CA" link="blue" vlink="purple" style="word-wrap:break-word"> <div class="WordSection1"> <o:p> </o:p> <o:p> </o:p> <div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt"> <div> <div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm"> From: Christian König <christian.koenig@amd.com> Sent: Tuesday, February 27, 2024 1:54 AM To: Zeng, Oak <oak.zeng@intel.com>; Danilo Krummrich <dakr@redhat.com>; Dave Airlie <airlied@redhat.com>; Daniel Vetter <daniel@ffwll.ch>; Felix Kuehling <felix.kuehling@amd.com>; jglisse@redhat.com Cc: Welty, Brian <brian.welty@intel.com>; dri-devel@lists.freedesktop.org; intel-xe@lists.freedesktop.org; Bommu, Krishnaiah <krishnaiah.bommu@intel.com>; Ghimiray, Himal Prasad <himal.prasad.ghimiray@intel.com>; Thomas.Hellstrom@linux.intel.com; Vishwanathapura, Niranjana <niranjana.vishwanathapura@intel.com>; Brost, Matthew <matthew.brost@intel.com>; Gupta, saurabhg <saurabhg.gupta@intel.com> Subject: Re: Making drm_gpuvm work across gpu devices<o:p></o:p> </div> </div> <o:p> </o:p> Hi Oak,<o:p></o:p> <div> On 23.02.24 at 21:12, Zeng, Oak wrote:<o:p></o:p> </div> <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt"> Hi Christian,<o:p></o:p> <o:p></o:p> I am going back to this old email to ask a question.<o:p></o:p> </blockquote> Sorry, I totally missed that one. 
<o:p></o:p> <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt"> <o:p></o:p> Quote from your email:<o:p></o:p> “Those ranges can then be used to implement the SVM feature required for higher level APIs and not something you need at the UAPI or even inside the low level kernel memory management.”<o:p></o:p> “SVM is a high level concept of OpenCL, Cuda, ROCm etc.. This should not have any influence on the design of the kernel UAPI.”<o:p></o:p> <o:p></o:p> There are two categories of SVM:<o:p></o:p> <ol style="margin-top:0cm" start="1" type="1"> <li class="MsoListParagraph" style="margin-left:0cm;mso-list:l1 level1 lfo3">Driver SVM allocator: this is implemented in user space, e.g., cudaMallocManaged (CUDA), zeMemAllocShared (L0) or clSVMAlloc (OpenCL). Intel already has gem_create/vm_bind in xekmd, and our UMD implements clSVMAlloc and zeMemAllocShared on top of gem_create/vm_bind. Range A..B of the process address space is mapped into a range C..D of the GPU address space, exactly as you said.<o:p></o:p></li><li class="MsoListParagraph" style="margin-left:0cm;mso-list:l1 level1 lfo3">System SVM allocator: this doesn’t introduce an extra driver API for memory allocation. Any valid CPU virtual address can be used directly and transparently in a GPU program without any extra driver API call. Quote from the kernel's Documentation/vm/hmm.rst: “Any application memory region (private anonymous, shared memory, or regular file backed memory) can be used by a device transparently” and “to share the address space by duplicating the CPU page table in the device page table so the same address points to the same physical memory for any valid main memory address in the process address space”. With the system SVM allocator, we don’t need that A..B to C..D mapping.<o:p></o:p></li></ol> <o:p></o:p> It looks like you were talking about 1). 
Were you?<o:p></o:p> </blockquote> No, even when you fully mirror the whole address space from a process into the GPU you still need to enable this somehow with an IOCTL. And while enabling this you absolutely should specify to which part of the address space this mirroring applies and where it maps to.<o:p></o:p> <o:p> </o:p> <o:p> </o:p> Let's say we have a hardware platform where both the CPU and the GPU support a 57-bit virtual address range. How do you decide “which part of the address space this mirroring applies”? You have to mirror the whole address space (0~2^57-1), don't you? As you designed it, the gigantic vm_bind/mirroring happens at process initialization time, and at that time you don’t know which part of the address space will be used by GPU programs. <o:p></o:p> I see the system svm allocator as just a special case of the driver allocator where not fully backed buffer objects are allocated, but rather sparse ones which are filled and migrated on demand.<o:p></o:p> <o:p> </o:p> The above statement is true to me. We don’t have a BO for the system SVM allocator. It is a sparse one, as we don’t map the whole vma to the GPU. Our migration policy decides which pages, and how much of the vma, are migrated/mapped into the GPU page table.<o:p></o:p> <o:p> </o:p> The difference between your view and mine is that you want a gigantic vma (created during the gigantic vm_bind) to be sparsely populated to the GPU, whereas I think of a vma (xe_vma in the xekmd code) as a place to store memory attributes (such as caching, user-preferred placement, etc.). All those memory attributes are range based, i.e., the user can specify that range1 is cached while range2 is uncached. So I don’t see how you can manage that with the gigantic vma.<o:p></o:p> <o:p> </o:p> Regards,<o:p></o:p> Oak <o:p></o:p> Regards, Christian. <o:p></o:p> <o:p> </o:p> </div> </div> </body> </html>