[PATCH v1 0/2] udmabuf: Add back support for mapping hugetlb pages
David Hildenbrand
david at redhat.com
Tue Jun 27 07:10:32 UTC 2023
On 27.06.23 08:37, Kasireddy, Vivek wrote:
> Hi David,
>
Hi!
Sorry for taking a bit longer to reply lately.
[...]
>>> Sounds right, maybe it needs to go back to the old GUP solution, though, as
>>> mmu notifiers are also mm-based not fd-based. Or to be explicit, I think
>>> it'll be pin_user_pages(FOLL_LONGTERM) with the new API. It'll also solve
>>> the movable pages issue on pinning.
>>
>> It really should be pin_user_pages(FOLL_LONGTERM). But I'm afraid we
>> cannot achieve that without breaking the existing kernel interface ...
> Yeah, as you suggest, we unfortunately cannot go back to using GUP
> without breaking the udmabuf_create UAPI, which expects memfds and file
> offsets.
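For reference, that UAPI hands the kernel a memfd plus a page-aligned
offset/size, never a user virtual address, so there is nothing for
pin_user_pages() to operate on. A minimal userspace sketch of the
existing ioctl (error handling omitted, helper name made up):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/udmabuf.h>

/* Minimal sketch: the kernel only ever sees an fd + file offset. */
static int create_udmabuf(size_t size)
{
	/* udmabuf requires a sealable memfd with F_SEAL_SHRINK set. */
	int memfd = memfd_create("guest-ram", MFD_ALLOW_SEALING);
	int devfd = open("/dev/udmabuf", O_RDWR);

	ftruncate(memfd, size);
	fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);

	struct udmabuf_create create = {
		.memfd  = memfd,
		.flags  = UDMABUF_FLAGS_CLOEXEC,
		.offset = 0,		/* page-aligned offset into the memfd */
		.size   = size,		/* page-aligned length */
	};

	/* Returns a dma-buf fd whose pages are taken from the memfd. */
	return ioctl(devfd, UDMABUF_CREATE, &create);
}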
>
>>
>> So we might have to implement the same page migration as gup does on
>> FOLL_LONGTERM here ... maybe there are more such cases/drivers that
>> actually require that handling when simply taking pages out of the
>> memfd, believing they can hold on to them forever.
> IIUC, I don't think just handling the page migration in udmabuf is going to
> cut it. It might require active cooperation of the guest GPU driver as well,
> if that is even feasible.
The idea is that once you extract the page from the memfd and it
resides somewhere bad (MIGRATE_CMA, ZONE_MOVABLE), you trigger page
migration, essentially what migrate_longterm_unpinnable_pages() does
(see the sketch below).

Why would the guest driver have to be involved? It shouldn't care about
page migration in the hypervisor.
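Conceptually, something like this on the udmabuf side. This is a rough
sketch only, not compile-tested against any particular tree; some of
these helpers are mm-internal, and the migrate_pages() callback
signature differs between kernel versions, so treat the names as
illustrative:

#include <linux/migrate.h>
#include <linux/mm.h>

/* Place the copy somewhere pinnable: GFP_USER has no __GFP_MOVABLE. */
static struct folio *udmabuf_alloc_dst(struct folio *src,
				       unsigned long private)
{
	return folio_alloc(GFP_USER | __GFP_NOWARN, folio_order(src));
}

/*
 * Hypothetical helper: after grabbing a folio from the memfd, check
 * whether it may be held on to indefinitely; if not, migrate it out of
 * ZONE_MOVABLE/MIGRATE_CMA, the same policy that
 * migrate_longterm_unpinnable_pages() applies to FOLL_LONGTERM pins.
 */
static int udmabuf_make_folio_pinnable(struct folio *folio)
{
	LIST_HEAD(movable_list);

	/* Already outside ZONE_MOVABLE/CMA: nothing to do. */
	if (folio_is_longterm_pinnable(folio))
		return 0;

	/* Take the folio off the LRU so it can be migrated. */
	if (!folio_isolate_lru(folio))
		return -EBUSY;
	list_add_tail(&folio->lru, &movable_list);

	/*
	 * Copy the contents into the newly allocated folio and free the
	 * old one; the caller must then look the page up in the mapping
	 * again, just like the FOLL_LONGTERM path does.
	 */
	return migrate_pages(&movable_list, udmabuf_alloc_dst, NULL, 0,
			     MIGRATE_SYNC, MR_LONGTERM_PIN, NULL) ? -ENOMEM : 0;
}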
[...]
>> balloon, and then using that memory for communicating with the device]
>>
>> Maybe it's all fine with udmabuf because of the way it is set up/torn
>> down by the guest driver. Unfortunately I can't tell.
> Here are the functions used by virtio-gpu (Guest GPU driver) to allocate
> pages for its resources:
> __drm_gem_shmem_create: https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_gem_shmem_helper.c#L97
> Interestingly, the comment in the above function says that the pages
> should not be allocated from the MOVABLE zone.
It doesn't add __GFP_MOVABLE, so pages don't end up in
ZONE_MOVABLE/MIGRATE_CMA *in the guest*. But we care about
ZONE_MOVABLE/MIGRATE_CMA *in the host*. (What the guest does is right,
though.)

IOW, what matters is what udmabuf does with guest memory on the
hypervisor side, not what the guest driver does on the guest side.
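For reference, the guest-side lines in __drm_gem_shmem_create() look
roughly like this (quoted from memory, see the link above); GFP_HIGHUSER,
unlike GFP_HIGHUSER_MOVABLE, does not include __GFP_MOVABLE:

	/*
	 * Our buffers are kept pinned, so allocating them
	 * from the MOVABLE zone is a really bad idea, and
	 * conflicts with CMA. See comments above new_inode()
	 * why this is required _and_ expected if you're
	 * going to pin these pages.
	 */
	mapping_set_gfp_mask(obj->filp->f_mapping, GFP_HIGHUSER |
			     __GFP_RETRY_MAYFAIL | __GFP_NOWARN);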
> The pages along with their dma addresses are then extracted and shared
> with Qemu using these two functions:
> drm_gem_get_pages: https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_gem.c#L534
> virtio_gpu_object_shmem_init: https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/virtio/virtgpu_object.c#L135
^ So these two target the guest driver as well, right? IOW, there is a
memfd (shmem) in the guest that the guest driver uses to allocate pages
from, and there is the memfd in the hypervisor that backs guest RAM.
The latter gets registered with udmabuf.
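On the hypervisor side, udmabuf's create path then essentially takes
plain page references from that second memfd, with no FOLL_LONGTERM pin
and no movability check. Simplified sketch of the shmem path in
drivers/dma-buf/udmabuf.c (helper name and parameters made up):

#include <linux/err.h>
#include <linux/fs.h>
#include <linux/pagemap.h>
#include <linux/shmem_fs.h>

/*
 * Look up each page at the requested file offsets and keep a plain
 * page reference. Nothing here migrates pages out of
 * ZONE_MOVABLE/MIGRATE_CMA the way FOLL_LONGTERM pinning would.
 */
static int grab_memfd_pages(struct file *memfd, pgoff_t pgoff,
			    pgoff_t pgcnt, struct page **pages)
{
	struct address_space *mapping = file_inode(memfd)->i_mapping;
	pgoff_t pgidx;

	for (pgidx = 0; pgidx < pgcnt; pgidx++) {
		struct page *page;

		page = shmem_read_mapping_page(mapping, pgoff + pgidx);
		if (IS_ERR(page))
			return PTR_ERR(page);
		/* Reference is held until the dma-buf is released. */
		pages[pgidx] = page;
	}
	return 0;
}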
> QEMU then translates the dma addresses into file offsets and creates
> udmabufs as an optimization to avoid data copies, but only if blob is
> set to true.
If the guest OS doesn't end up freeing/reallocating that memory while
it's registered with udmabuf in the hypervisor, then we should be fine,
because that way the guest won't end up triggering MADV_DONTNEED by
"accident".
--
Cheers,
David / dhildenb