<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body>
<div><br>
</div>
<div dir="auto" id="mail-editor-reference-message-container"><br>
<hr style="display:inline-block;width:98%">
<div id="divRplyFwdMsg" style="font-size: 11pt;"><strong>From:</strong> Intel-xe <intel-xe-bounces@lists.freedesktop.org> on behalf of Ghimiray, Himal Prasad <himal.prasad.ghimiray@intel.com><br>
<strong>Sent:</strong> Friday, February 14, 2025 2:38:10 PM<br>
<strong>To:</strong> Thomas Hellström <thomas.hellstrom@linux.intel.com>; Demi Marie Obenour <demi@invisiblethingslab.com>; Brost, Matthew <matthew.brost@intel.com>; intel-xe@lists.freedesktop.org <intel-xe@lists.freedesktop.org>; dri-devel@lists.freedesktop.org
<dri-devel@lists.freedesktop.org><br>
<strong>Cc:</strong> apopple@nvidia.com <apopple@nvidia.com>; airlied@gmail.com <airlied@gmail.com>; simona.vetter@ffwll.ch <simona.vetter@ffwll.ch>; felix.kuehling@amd.com <felix.kuehling@amd.com>; dakr@kernel.org <dakr@kernel.org><br>
<strong>Subject:</strong> Re: [PATCH v5 00/32] Introduce GPU SVM and Xe SVM implementation<br>
</div>
<br>
<div dir="auto">k,</div>
<div dir="auto">asx.ddk</div>
<div dir="auto"><br>
</div>
<div dir="auto">please ignore</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size: 11pt; color: rgb(0, 0, 0);"><b>From:</b> Thomas Hellström <thomas.hellstrom@linux.intel.com><br>
<b>Sent:</b> Friday, February 14, 2025 2:17:13 PM<br>
<b>To:</b> Demi Marie Obenour <demi@invisiblethingslab.com>; Brost, Matthew <matthew.brost@intel.com>; intel-xe@lists.freedesktop.org <intel-xe@lists.freedesktop.org>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org><br>
<b>Cc:</b> Ghimiray, Himal Prasad <himal.prasad.ghimiray@intel.com>; apopple@nvidia.com <apopple@nvidia.com>; airlied@gmail.com <airlied@gmail.com>; simona.vetter@ffwll.ch <simona.vetter@ffwll.ch>; felix.kuehling@amd.com <felix.kuehling@amd.com>; dakr@kernel.org
<dakr@kernel.org><br>
<b>Subject:</b> Re: [PATCH v5 00/32] Introduce GPU SVM and Xe SVM implementation</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt">
<div class="PlainText">Hi<br>
<br>
On Thu, 2025-02-13 at 16:23 -0500, Demi Marie Obenour wrote:<br>
> On Wed, Feb 12, 2025 at 06:10:40PM -0800, Matthew Brost wrote:<br>
> > Version 5 of GPU SVM. Thanks to everyone (especially Sima, Thomas,<br>
> > Alistair, Himal) for their numerous reviews on revision 1, 2, 3 and<br>
> > for helping to address many design issues.<br>
> > <br>
> > This version has been tested with IGT [1] on PVC, BMG, and LNL. Also<br>
> > tested with level0 (UMD) PR [2].<br>
> <br>
> What is the plan to deal with not being able to preempt while a page<br>
> fault is pending? This seems like an easy DoS vector. My understanding<br>
> is that SVM is mostly used by compute workloads on headless systems.<br>
> Recent AMD client GPUs don't support SVM, so programs that want to run<br>
> on client systems should not require SVM if they wish to be portable.<br>
> <br>
> Given the potential for abuse, I think it would be best to require<br>
> explicit administrator opt-in to enable SVM, along with possibly<br>
> having a timeout to resolve a page fault (after which the context is<br>
> killed). Since I expect most uses of SVM to be in the datacenter space<br>
> (for the reasons mentioned above), I don't believe this will be a<br>
> major limitation in practice. Programs that wish to run on client<br>
> systems already need to use explicit memory transfer or pinned<br>
> userptr, and administrators of compute clusters should be willing to<br>
> enable this feature because only one workload will be using a GPU at a<br>
> time.<br>
<br>
While this doesn't directly address the potential DoS issue you<br>
mention, there is an associated deadlock possibility that stems from<br>
not being able to preempt a pending pagefault: a dma-fence job may<br>
require the same resources held up by the pending pagefault, while<br>
servicing that pagefault in turn depends, in one way or another, on<br>
that dma-fence being signaled.<br>
<br>
That deadlock is handled by allowing only one of the two job types,<br>
page-faulting jobs or dma-fence jobs, at a time on a resource (hw<br>
engine or hw engine group) that can be used by both, blocking<br>
synchronously in the exec IOCTL until the resource is available for<br>
the job type. That means LR jobs wait for all dma-fence jobs to<br>
complete, and dma-fence jobs wait for all LR jobs to preempt. So a<br>
dma-fence job wait could easily mean "wait for all outstanding<br>
pagefaults to be serviced".<br>
<br>
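As a purely illustrative sketch (this is not Xe driver code, and names<br>
such as engine_group_gate, gate_admit() and gate_release() are invented<br>
for this example), the per-engine-group gating described above could be<br>
modeled in userspace roughly like this, with each job class blocking<br>
until the other class has drained:<br>
<br>
<pre>
/*
 * Hypothetical model of per-engine-group admission control: dma-fence
 * jobs and long-running (page-faulting) jobs are mutually exclusive on
 * the group, and an exec blocks until no job of the other class is
 * active.  Illustrative only; names are not taken from the Xe driver.
 */
#include <pthread.h>

enum job_class { JOB_DMA_FENCE, JOB_LONG_RUNNING };

struct engine_group_gate {
        pthread_mutex_t lock;
        pthread_cond_t drained;
        int active[2];                  /* active jobs per class */
};

/* Exec path: block until the other job class has drained. */
static void gate_admit(struct engine_group_gate *g, enum job_class c)
{
        enum job_class other = (c == JOB_DMA_FENCE) ?
                JOB_LONG_RUNNING : JOB_DMA_FENCE;

        pthread_mutex_lock(&g->lock);
        while (g->active[other])        /* wait for complete / preempt */
                pthread_cond_wait(&g->drained, &g->lock);
        g->active[c]++;
        pthread_mutex_unlock(&g->lock);
}

/* Job completed (dma-fence) or preempted (long-running). */
static void gate_release(struct engine_group_gate *g, enum job_class c)
{
        pthread_mutex_lock(&g->lock);
        if (--g->active[c] == 0)
                pthread_cond_broadcast(&g->drained);
        pthread_mutex_unlock(&g->lock);
}
</pre>
In such a model, a dma-fence exec calling gate_admit() blocks until<br>
every LR job on the group has been preempted, i.e. until all<br>
outstanding pagefaults have been serviced, which is the wait described<br>
above.<br>
<br>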
Whether, on the other hand, that is a real DoS we need to care about<br>
is probably a topic for debate. The direction we've had so far is that<br>
it's not: nothing is held up indefinitely, whatever is held up can be<br>
Ctrl-C'd by the user, and core mm memory management is not blocked,<br>
since mmu_notifiers can execute to completion and shrinkers / eviction<br>
can run while a pagefault is pending.<br>
<br>
Thanks,<br>
Thomas<br>
<br>
</div>
</span></font></div>
<br>
</div>
</body>
</html>