Design session notes: GPU acceleration in Xen

Alex Deucher alexdeucher at gmail.com
Wed Jun 19 16:56:35 UTC 2024


On Wed, Jun 19, 2024 at 12:27 PM Christian König
<christian.koenig at amd.com> wrote:
>
> Am 18.06.24 um 16:12 schrieb Demi Marie Obenour:
> > On Tue, Jun 18, 2024 at 08:33:38AM +0200, Christian König wrote:
> > > Am 18.06.24 um 02:57 schrieb Demi Marie Obenour:
> > >> On Mon, Jun 17, 2024 at 10:46:13PM +0200, Marek Marczykowski-Górecki
> > >> wrote:
> > >>> On Mon, Jun 17, 2024 at 09:46:29AM +0200, Roger Pau Monné wrote:
> > >>>> On Sun, Jun 16, 2024 at 08:38:19PM -0400, Demi Marie Obenour wrote:
> > >>>>> In both cases, the device physical
> > >>>>> addresses are identical to dom0’s physical addresses.
> > >>>>
> > >>>> Yes, but a PV dom0's physical address space can be very scattered.
> > >>>>
> > >>>> IIRC there's a hypercall to request physically contiguous memory for
> > >>>> PV, but you don't want to be using that every time you allocate a
> > >>>> buffer (not sure it would support the sizes needed by the GPU
> > >>>> anyway).
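
For reference, on the Linux side that hypercall (XENMEM_exchange) is
wrapped by xen_create_contiguous_region() from include/xen/xen-ops.h,
which swaps the backing frames so a buffer becomes machine-contiguous.
A rough, untested sketch of what a driver would have to do per buffer
(the helper around it and the address_bits value are illustrative, not
existing kernel code):

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/io.h>
#include <xen/xen-ops.h>

/* Illustrative helper, not an existing kernel function. */
static void *alloc_machine_contiguous(unsigned int order,
                                      dma_addr_t *dma_handle)
{
        struct page *page = alloc_pages(GFP_KERNEL, order);

        if (!page)
                return NULL;

        /* Exchange the backing frames with Xen so the region is
         * contiguous in machine address space. */
        if (xen_create_contiguous_region(page_to_phys(page), order,
                                         64 /* address_bits */,
                                         dma_handle)) {
                __free_pages(page, order);
                return NULL;
        }
        return page_address(page);
}

Doing that exchange repeatedly at GPU buffer sizes is exactly the kind
of high-order allocation that stops working once memory fragments,
which matches the Qubes experience below.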
> > >>
> > >>> Indeed that isn't going to fly. In older Qubes versions we had PV
> > >>> sys-net with PCI passthrough for a network card. After some uptime it
> > >>> was basically impossible to restart it and still have enough contiguous
> > >>> memory for the network driver, and that was for _much_ smaller
> > >>> buffers, like 2M or 4M. At least not without shutting down a lot more
> > >>> things to free some more memory.
> > >>
> > >> Ouch!  That makes me wonder if all GPU drivers actually need physically
> > >> contiguous buffers, or if it is (as I suspect) driver-specific. CCing
> > >> Christian König, who has mentioned issues in this area.
> >
> > > Well, GPUs don't need physically contiguous memory to function, but if
> > > they only get 4k pages to work with it means a quite large (up to 30%)
> > > performance penalty.
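
(For intuition on the penalty: with 4 KiB pages a 256-entry TLB covers
only 1 MiB of a buffer, while with 2 MiB pages the same 256 entries
cover 512 MiB, so large scattered buffers thrash the GPU's translation
caches.)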
> >
> > The status quo is "no GPU acceleration at all", so 70% of bare metal
> > performance would be amazing right now.
>
> Well, AMD uses the native context approach in Xen, which delivers over
> 90% of bare metal performance.
>
> Pierre-Eric can tell you more, but we certainly have GPU solutions in
> production with Xen that would suffer greatly if the underlying memory
> were fragmented like this.
>
> >   However, the implementation
> > should not preclude eliminating this performance penalty in the future.
> >
> > What size pages do GPUs need for good performance?  Is it the same as
> > CPU huge pages?
>
> 2MiB are usually sufficient.

Larger pages are helpful for both system memory and VRAM, but they
matter more for VRAM.
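
To illustrate what drivers do with that on bare metal: TTM's page pool
opportunistically tries high-order allocations and falls back to 4 KiB
pages, roughly along these lines (a hedged sketch, not the actual TTM
code; the helper name is made up):

#include <linux/gfp.h>
#include <linux/sizes.h>

/* Illustrative: prefer a 2 MiB block, fall back to a single 4 KiB page. */
static struct page *alloc_gpu_chunk(void)
{
        struct page *page;

        /* Opportunistic 2 MiB attempt; don't retry hard and don't warn,
         * since fragmentation makes failure routine. */
        page = alloc_pages(GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN,
                           get_order(SZ_2M));
        if (page)
                return page;

        return alloc_page(GFP_KERNEL);
}

Under a PV dom0, even the 4 KiB fallback pages land at scattered machine
addresses, so the GPU would never see the 2 MiB case; that is presumably
where the up-to-30% penalty Christian mentions comes from.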

Alex

>
> Regards,
> Christian.
>
> >   PV dom0 doesn't get huge pages at all, but PVH and HVM
> > guests do, and the goal is to move away from PV guests as they have lots
> > of unrelated problems.
> >
> > > So scattering memory like you described is probably a very bad idea
> > > if you want any halfway decent performance.
> >
> > For an initial prototype a 30% performance penalty is acceptable, but
> > it's good to know that memory fragmentation needs to be avoided.
> >
> > > Regards,
> > > Christian
> >
> > Thanks for the prompt response!
>

