<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body>
<div><br>
</div>
<div dir="auto" id="mail-editor-reference-message-container"><br>
<hr style="display:inline-block;width:98%">
<div id="divRplyFwdMsg" style="font-size: 11pt;"><strong>From:</strong> Intel-xe <intel-xe-bounces@lists.freedesktop.org> on behalf of Ghimiray, Himal Prasad <himal.prasad.ghimiray@intel.com><br>
<strong>Sent:</strong> Friday, February 14, 2025 2:38:10 PM<br>
<strong>To:</strong> Thomas Hellström <thomas.hellstrom@linux.intel.com>; Demi Marie Obenour <demi@invisiblethingslab.com>; Brost, Matthew <matthew.brost@intel.com>; intel-xe@lists.freedesktop.org <intel-xe@lists.freedesktop.org>; dri-devel@lists.freedesktop.org
<dri-devel@lists.freedesktop.org><br>
<strong>Cc:</strong> apopple@nvidia.com <apopple@nvidia.com>; airlied@gmail.com <airlied@gmail.com>; simona.vetter@ffwll.ch <simona.vetter@ffwll.ch>; felix.kuehling@amd.com <felix.kuehling@amd.com>; dakr@kernel.org <dakr@kernel.org><br>
<strong>Subject:</strong> Re: [PATCH v5 00/32] Introduce GPU SVM and Xe SVM implementation<br>
</div>
<br>
<div dir="auto">k,</div>
<div dir="auto">asx.ddk</div>
<div dir="auto"><br>
</div>
<div dir="auto">please ignore</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size: 11pt; color: rgb(0, 0, 0);"><b>From:</b> Thomas Hellström <thomas.hellstrom@linux.intel.com><br>
<b>Sent:</b> Friday, February 14, 2025 2:17:13 PM<br>
<b>To:</b> Demi Marie Obenour <demi@invisiblethingslab.com>; Brost, Matthew <matthew.brost@intel.com>; intel-xe@lists.freedesktop.org <intel-xe@lists.freedesktop.org>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org><br>
<b>Cc:</b> Ghimiray, Himal Prasad <himal.prasad.ghimiray@intel.com>; apopple@nvidia.com <apopple@nvidia.com>; airlied@gmail.com <airlied@gmail.com>; simona.vetter@ffwll.ch <simona.vetter@ffwll.ch>; felix.kuehling@amd.com <felix.kuehling@amd.com>; dakr@kernel.org
<dakr@kernel.org><br>
<b>Subject:</b> Re: [PATCH v5 00/32] Introduce GPU SVM and Xe SVM implementation</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt">
<div class="PlainText">Hi<br>
<br>
On Thu, 2025-02-13 at 16:23 -0500, Demi Marie Obenour wrote:<br>
> On Wed, Feb 12, 2025 at 06:10:40PM -0800, Matthew Brost wrote:<br>
> > Version 5 of GPU SVM. Thanks to everyone (especially Sima, Thomas,<br>
> > Alistair, Himal) for their numerous reviews on revision 1, 2, 3 and<br>
> > for helping to address many design issues.<br>
> > <br>
> > This version has been tested with IGT [1] on PVC, BMG, and LNL. Also<br>
> > tested with level0 (UMD) PR [2].<br>
> <br>
> What is the plan to deal with not being able to preempt while a page<br>
> fault is pending? This seems like an easy DoS vector. My understanding<br>
> is that SVM is mostly used by compute workloads on headless systems.<br>
> Recent AMD client GPUs don't support SVM, so programs that want to run<br>
> on client systems should not require SVM if they wish to be portable.<br>
> <br>
> Given the potential for abuse, I think it would be best to require<br>
> explicit administrator opt-in to enable SVM, along with possibly<br>
> having a timeout to resolve a page fault (after which the context is<br>
> killed). Since I expect most uses of SVM to be in the datacenter space<br>
> (for the reasons mentioned above), I don't believe this will be a<br>
> major limitation in practice. Programs that wish to run on client<br>
> systems already need to use explicit memory transfer or pinned<br>
> userptr, and administrators of compute clusters should be willing to<br>
> enable this feature because only one workload will be using a GPU at a<br>
> time.<br>
<br>
While this doesn't directly address the potential DoS issue you<br>
mention, there is an associated deadlock possibility that stems from<br>
not being able to preempt a pending pagefault: a dma-fence job may<br>
require the same resources held up by the pending pagefault, while<br>
servicing that pagefault in turn depends, in one way or another, on<br>
that dma-fence being signaled.<br>
<br>
That deadlock is handled by allowing only one of the two job types,<br>
page-faulting jobs or dma-fence jobs, at a time on a resource (hw<br>
engine or hw engine group) that can be used by both, blocking<br>
synchronously in the exec IOCTL until the resource is available for<br>
the job type. That means LR jobs wait for all dma-fence jobs to<br>
complete, and dma-fence jobs wait for all LR jobs to preempt. So a<br>
dma-fence job wait could easily mean "wait for all outstanding<br>
pagefaults to be serviced".<br>
<br>
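As a purely illustrative sketch (this is not Xe driver code, and names<br>
such as engine_group_gate, gate_admit() and gate_release() are invented<br>
for this example), the per-engine-group gating described above could be<br>
modeled in userspace roughly like this, with each job class blocking<br>
until the other class has drained:<br>
<br>
<pre>
/*
 * Hypothetical model of per-engine-group admission control: dma-fence
 * jobs and long-running (page-faulting) jobs are mutually exclusive on
 * the group, and an exec blocks until no job of the other class is
 * active.  Illustrative only; names are not taken from the Xe driver.
 */
#include <pthread.h>

enum job_class { JOB_DMA_FENCE, JOB_LONG_RUNNING };

struct engine_group_gate {
        pthread_mutex_t lock;
        pthread_cond_t drained;
        int active[2];                  /* active jobs per class */
};

/* Exec path: block until the other job class has drained. */
static void gate_admit(struct engine_group_gate *g, enum job_class c)
{
        enum job_class other = (c == JOB_DMA_FENCE) ?
                JOB_LONG_RUNNING : JOB_DMA_FENCE;

        pthread_mutex_lock(&g->lock);
        while (g->active[other])        /* wait for complete / preempt */
                pthread_cond_wait(&g->drained, &g->lock);
        g->active[c]++;
        pthread_mutex_unlock(&g->lock);
}

/* Job completed (dma-fence) or preempted (long-running). */
static void gate_release(struct engine_group_gate *g, enum job_class c)
{
        pthread_mutex_lock(&g->lock);
        if (--g->active[c] == 0)
                pthread_cond_broadcast(&g->drained);
        pthread_mutex_unlock(&g->lock);
}
</pre>
In such a model, a dma-fence exec calling gate_admit() blocks until<br>
every LR job on the group has been preempted, i.e. until all<br>
outstanding pagefaults have been serviced, which is the wait described<br>
above.<br>
<br>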
Whether, on the other hand, that is a real DoS we need to care about<br>
is probably a topic for debate. The direction we've had so far is that<br>
it's not: nothing is held up indefinitely, whatever is held up can be<br>
Ctrl-C'd by the user, and core mm memory management is not blocked,<br>
since mmu_notifiers can execute to completion and shrinkers / eviction<br>
can run while a pagefault is pending.<br>
<br>
Thanks,<br>
Thomas<br>
<br>
</div>
</span></font></div>
<br>
</div>
</body>
</html>