[PATCH v1 0/2] udmabuf: Add back support for mapping hugetlb pages

Kasireddy, Vivek vivek.kasireddy at intel.com
Wed Jun 28 08:04:10 UTC 2023


Hi David,

> 
> On 27.06.23 08:37, Kasireddy, Vivek wrote:
> > Hi David,
> >
> 
> Hi!
> 
> sorry for taking a bit longer to reply lately.
No problem.

> 
> [...]
> 
> >>> Sounds right, maybe it needs to go back to the old GUP solution, though,
> >>> as mmu notifiers are also mm-based not fd-based. Or to be explicit, I
> >>> think it'll be pin_user_pages(FOLL_LONGTERM) with the new API. It'll
> >>> also solve the movable pages issue on pinning.
> >>
> >> It really should be pin_user_pages(FOLL_LONGTERM). But I'm afraid we
> >> cannot achieve that without breaking the existing kernel interface ...
> > Yeah, as you suggest, we unfortunately cannot go back to using GUP
> > without breaking udmabuf_create UAPI that expects memfds and file
> > offsets.
> >
> >>
> >> So we might have to implement the same page migration as gup does on
> >> FOLL_LONGTERM here ... maybe there are more such cases/drivers that
> >> actually require that handling when simply taking pages out of the
> >> memfd, believing they can hold on to them forever.
> > IIUC, I don't think just handling the page migration in udmabuf is going to
> > cut it. It might require active cooperation of the Guest GPU driver as well
> > if this is even feasible.
> 
> The idea is that once you extract the page from the memfd and it
> resides somewhere bad (MIGRATE_CMA, ZONE_MOVABLE), you trigger page
> migration. Essentially what migrate_longterm_unpinnable_pages() does:
So, IIUC, it sounds like calling check_and_migrate_movable_pages() at creation
time (udmabuf_create) and when we get notified about something like
FALLOC_FL_PUNCH_HOLE would be all that needs to be done in udmabuf?
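If so, the handling I have in mind would look roughly like the sketch below
(hypothetical and untested -- udmabuf_get_pinnable_page() is a made-up helper
name; shmem_read_mapping_page() and folio_is_longterm_pinnable() exist today,
but the actual migration step would need something like the static
migrate_longterm_unpinnable_pages() from mm/gup.c to be exported or
open-coded):

static struct page *udmabuf_get_pinnable_page(struct address_space *mapping,
					      pgoff_t pgoff)
{
	struct page *page = shmem_read_mapping_page(mapping, pgoff);

	if (IS_ERR(page))
		return page;

	/*
	 * Pages in ZONE_MOVABLE or MIGRATE_CMA must not be held on to
	 * long-term; migrate them to a pinnable region first, similar to
	 * what GUP does for FOLL_LONGTERM.
	 */
	if (!folio_is_longterm_pinnable(page_folio(page))) {
		/* migrate the page out of the movable/CMA region here */
	}

	return page;
}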

> 
> Why would the guest driver have to be involved? It shouldn't care about
> page migration in the hypervisor.
Yeah, it appears that the page migration would be transparent to the Guest
driver.

> 
> [...]
> 
> >> balloon, and then using that memory for communicating with the device]
> >>
> >> Maybe it's all fine with udmabuf because of the way it is set up/torn
> >> down by the guest driver. Unfortunately I can't tell.
> > Here are the functions used by virtio-gpu (Guest GPU driver) to allocate
> > pages for its resources:
> > __drm_gem_shmem_create:
> > https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_gem_shmem_helper.c#L97
> > Interestingly, the comment in the above function says that the pages
> > should not be allocated from the MOVABLE zone.
> 
> It doesn't add __GFP_MOVABLE, so pages don't end up in
> ZONE_MOVABLE/MIGRATE_CMA *in the guest*. But we care about the
> ZONE_MOVABLE/MIGRATE_CMA *in the host*. (what the guest does is right,
> though)
> 
> IOW, what udmabuf does with guest memory on the hypervisor side, not the
> guest driver on the guest side.
Ok, got it.
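(For reference, the guest-side allocation I mentioned boils down to something
like the following -- paraphrased from __drm_gem_shmem_create, not a verbatim
copy:)

	/*
	 * GFP_HIGHUSER, unlike GFP_HIGHUSER_MOVABLE, does not include
	 * __GFP_MOVABLE, so the shmem pages backing the GEM object are
	 * not allocated from ZONE_MOVABLE -- but, as you point out, only
	 * in the guest.
	 */
	mapping_set_gfp_mask(obj->filp->f_mapping, GFP_HIGHUSER);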

> 
> > The pages along with their dma addresses are then extracted and shared
> > with Qemu using these two functions:
> > drm_gem_get_pages:
> > https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_gem.c#L534
> > virtio_gpu_object_shmem_init:
> > https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/virtio/virtgpu_object.c#L135
> 
> ^ so these two target the guest driver as well, right? IOW, there is a
> memfd (shmem) in the guest that the guest driver uses to allocate pages
> from and there is the memfd in the hypervisor to back guest RAM.
> 
> The latter gets registered with udmabuf.
Yes, that's exactly what happens.

> 
> > Qemu then translates the dma addresses into file offsets and creates
> > udmabufs -- but only if blob is set to true, as an optimization to
> > avoid data copies.
> 
> If the guest OS doesn't end up freeing/reallocating that memory while
> it's registered with udmabuf in the hypervisor, then we should be fine.
IIUC, udmabuf does get notified when something like that happens.
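For context, the way Qemu turns a (memfd, offset, size) tuple into a dmabuf
is the existing udmabuf UAPI; a minimal, hypothetical example (error handling
trimmed, values made up, and the memfd must already have F_SEAL_SHRINK set):

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/udmabuf.h>

/* Turn a (memfd, offset, size) tuple into a dmabuf fd; returns -1 on error. */
static int create_udmabuf(int memfd, __u64 offset, __u64 size)
{
	struct udmabuf_create create = {
		.memfd  = memfd,
		.flags  = UDMABUF_FLAGS_CLOEXEC,
		.offset = offset,	/* must be page-aligned */
		.size   = size,		/* must be page-aligned */
	};
	int devfd, buffd;

	devfd = open("/dev/udmabuf", O_RDWR);
	if (devfd < 0)
		return -1;

	buffd = ioctl(devfd, UDMABUF_CREATE, &create);
	close(devfd);
	return buffd;
}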

Thanks,
Vivek

> 
> Because that way, the guest won't end up triggering MADV_DONTNEED by
> "accident".
> 
> --
> Cheers,
> 
> David / dhildenb


