Design session notes: GPU acceleration in Xen

Tue Jun 18 14:12:11 UTC 2024

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On Tue, Jun 18, 2024 at 08:33:38AM +0200, Christian König wrote:
> Am 18.06.24 um 02:57 schrieb Demi Marie Obenour:
> > On Mon, Jun 17, 2024 at 10:46:13PM +0200, Marek Marczykowski-Górecki
> > wrote:
> > > On Mon, Jun 17, 2024 at 09:46:29AM +0200, Roger Pau Monné wrote:
> > >> On Sun, Jun 16, 2024 at 08:38:19PM -0400, Demi Marie Obenour wrote:
> > >>> In both cases, the device physical
> > >>> addresses are identical to dom0’s physical addresses.
> > >>
> > >> Yes, but a PV dom0 physical address space can be very scattered.
> > >>
> > >> IIRC there's an hypercall to request physically contiguous memory for
> > >> PV, but you don't want to be using that every time you allocate a
> > >> buffer (not sure it would support the sizes needed by the GPU
> > >> anyway).
> > 
> > > Indeed that isn't going to fly. In older Qubes versions we had PV
> > > sys-net with PCI passthrough for a network card. After some uptime it
> > > was basically impossible to restart and still have enough contagious
> > > memory for a network driver, and there it was about _much_ smaller
> > > buffers, like 2M or 4M. At least not without shutting down a lot more
> > > things to free some more memory.
> > 
> > Ouch!  That makes me wonder if all GPU drivers actually need physically
> > contiguous buffers, or if it is (as I suspect) driver-specific. CCing
> > Christian König who has mentioned issues in this area.
> 
> Well GPUs don't need physical contiguous memory to function, but if they
> only get 4k pages to work with it means a quite large (up to 30%)
> performance penalty.

The status quo is "no GPU acceleration at all", so 70% of bare metal
performance would be amazing right now.  However, the implementation
should not preclude eliminating this performance penalty in the future.

What size pages do GPUs need for good performance?  Is it the same as
CPU huge pages?  PV dom0 doesn't get huge pages at all, but PVH and HVM
guests do, and the goal is to move away from PV guests as they have lots
of unrelated problems.

> So scattering memory like you described is probably a very bad idea if you
> want any halve way decent performance.

For an initial prototype a 30% performance penalty is acceptable, but
it's good to know that memory fragmentation needs to be avoided.

> Regards,
> Christian

Thanks for the prompt response!
- -- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEdodNnxM2uiJZBxxxsoi1X/+cIsEFAmZxlbsACgkQsoi1X/+c
IsG+WhAA00y83cU94MMJCuDMqTCSOgJraPchvQHLBuMIB0cJkIbVxhA2T4yuvVZy
Bzg/oVvWJH8B+p47HHo6uyjoPoeO659q8Hyea6zT8yMrKhiwOF8UxFRyxakdYHRs
l793sCwUtMFwkJdsfacTSKjL6sMktWhicvOqX4rA/SIVpwzZh1auFjAIrZ2BENb/
YIRH18Dfl2iEOA2W3TQTNiaqLeT2qtYspDVVLuUeAe7OAFCJVSkeMpAPPR15jCzm
Ou0HP6JP2jH6h7Shd09ns+3UvQK4xaygpvEsj+BwpXPf2CDNgypKHezqgF1WMzCc
HGXK1deGXE35XNH4EL5jgRlF7FmLT54CXuMpPIGbfNWbT2fvpoS2tyrdQPHxwgr8
lqqqfjugZ9qzbqA4v/m+v0cKFclMvSYL8Rzn+tbz8kAFf7VTglypY55RIIStdnSZ
sLYStA6qv8Mcu4NHYvdGeatTS26XR72X+dB5ApTn4dLLttnzbXMAyqDSTys28XQb
jeHnh1uTOLChODJHu5prHJ6bN0MxmISwFuot58gW/iI0spyihRhPNjZ/6E/7BpIm
8AGiT+p96dvaymLB5k6dqj5ruqVPP8HLBibB8zafzJn3JIJpjCZm9HM5YcO7xMQ2
92ZNZ/XOswah+0s6MyWDCsU8jKnhQ87ESnB4JItI5skKj+001Jg=
=ddxn
-----END PGP SIGNATURE-----