vm binding interfaces and parallel with mmap

Mon Aug 22 13:27:25 UTC 2022

On 22.08.22 at 10:34, Bas Nieuwenhuizen wrote:
> On Mon, Aug 22, 2022 at 9:28 AM Dave Airlie <airlied at gmail.com> wrote:
>> On Mon, 22 Aug 2022 at 17:05, Dave Airlie <airlied at gmail.com> wrote:
>>> Hey,
>>>
>>> I've just been looking at the vm bind type interfaces and wanted to at
>>> least document how we think the unmapping API should work. I know I've
>>> talked on irc before about this, but wanted to solidify things a bit
>>> more around what is required vs what is a nice to have.
>>>
>>> My main concerns/thoughts are around the unbind interfaces and how
>>> close to munmap they should be.
>>>
>>> I think the mapping operation is mostly consistent
>>> MAP(bo handle, offset into bo, range, VM offset, VM flags)
>>> which puts the range inside the bo at the given offset in the current VM
>>> (maybe take an optional vm_id).
>>>
>>> Now the simplest unmap I can see is one that parallels munmap:
>>> UNMAP(vmaddr, range);
>>>
>>> But that raises the question of how much the kernel needs to deal
>>> with here: if we support arbitrary vmaddr/range, then we really need to be
>>> able to do everything munmap does for a CPU VMA, which means splitting
>>> ranges, joining ranges, etc.
>>>
>>> like
>>> MAP(1, 0, 0x8000, 0xc0000)
>>> UNMAP(0xc1000, 0x1000)
>>> should that be possible?
>>>
>>> Do we have any API usage (across Vulkan/CL/CUDA/ROCm etc) that
>>> requires this sort of control, or should we be fine with only
>>> unmapping objects exactly like how they were mapped in the first
>>> place, and not have any splitting/joining?
> Vulkan allows for this, though I haven't checked to what extent apps use it.

As far as I know, this is used heavily for partially resident textures
(PRT) under OpenGL.

E.g. you map a range like 1->10 as PRT and then map real textures
at 2, 5 and 7 or something like that.
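
Overlaying a real mapping on part of a larger PRT range, or the quoted
UNMAP(0xc1000, 0x1000) case, means an existing mapping has to be split into
up to two remainders. A minimal sketch in C, assuming a hypothetical
struct va_mapping and helper va_mapping_punch (neither is taken from any
driver):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one GPU VA mapping; not a real kernel struct. */
struct va_mapping {
    uint64_t va;        /* start address in the GPU VM */
    uint64_t size;      /* length in bytes */
    uint64_t bo_offset; /* offset into the backing BO */
};

/*
 * Remove [va, va + size) from *m. The surviving pieces are written to
 * out[0..1]; the return value is how many there are (0, 1 or 2).
 * Assumes the hole is non-empty and lies fully inside the mapping.
 */
static int va_mapping_punch(const struct va_mapping *m, uint64_t va,
                            uint64_t size, struct va_mapping out[2])
{
    uint64_t m_end = m->va + m->size;
    uint64_t end = va + size;
    int n = 0;

    assert(size > 0 && va >= m->va && end <= m_end);

    if (va > m->va) {
        /* Piece in front of the hole keeps the original BO offset. */
        out[n].va = m->va;
        out[n].size = va - m->va;
        out[n].bo_offset = m->bo_offset;
        n++;
    }
    if (end < m_end) {
        /* Piece behind the hole: BO offset advances by the skipped bytes. */
        out[n].va = end;
        out[n].size = m_end - end;
        out[n].bo_offset = m->bo_offset + (end - m->va);
        n++;
    }
    return n;
}
```

For MAP(1, 0, 0x8000, 0xc0000) followed by UNMAP(0xc1000, 0x1000) this
leaves one piece at 0xc0000 of size 0x1000 and one at 0xc2000 of size
0x6000 with BO offset 0x2000.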

That said, functionality to map/enable PRT for a range is necessary
as well. On amdgpu we have a special flag for that, and in this case the
BO to map can be NULL.

> We could technically split all mapping/unmapping to be per single tile
> in the userspace driver, which avoids the need for splitting/merging,
> but that could very much be a pessimization.

That would be pretty much a NAK from my side. A couple of hardware
optimizations require mappings to be as large as possible.

Otherwise we wouldn't be able to use huge/giant (2MiB, 1GiB) pages,
power-of-two TLB reach optimizations (8KiB, 16KiB, 32KiB, ...) or
texture fetcher optimizations.
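
To illustrate why mapping size and alignment matter here, the following is
a minimal sketch in C of picking the largest power-of-two block that is
both aligned at a given address and no larger than the remaining range.
The helper name largest_fragment is hypothetical, and this is not the
actual algorithm of any driver:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest power-of-two block size that is naturally aligned at va and
 * does not exceed size. size must be non-zero.
 */
static uint64_t largest_fragment(uint64_t va, uint64_t size)
{
    uint64_t align, frag;

    assert(size > 0);

    /* Lowest set bit of va is its natural alignment; 0 is aligned to anything. */
    align = va ? (va & (~va + 1)) : (UINT64_C(1) << 63);

    /* Largest power of two that still fits into size. */
    frag = UINT64_C(1) << 63;
    while (frag > size)
        frag >>= 1;

    return frag < align ? frag : align;
}
```

A 4MiB mapping starting at a 2MiB-aligned address can use a 2MiB block,
while the same range split into single 4KiB tiles by userspace never gets
past 4KiB.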

>> I suppose it also asks the question around paralleling
>>
>> fd = open()
>> ptr = mmap(fd,)
>> close(fd)
>> the mapping is still valid.
>>
>> I suppose our equiv is
>> handle = bo_alloc()
>> gpu_addr = vm_bind(handle,)
>> gem_close(handle)
>> Is the gpu_addr still valid? Does the VM hold a reference on the kernel
>> bo internally?
> For Vulkan it looks like this is undefined and the above is not necessary:
>
> "It is important to note that freeing a VkDeviceMemory object with
> vkFreeMemory will not cause resources (or resource regions) bound to
> the memory object to become unbound. Applications must not access
> resources bound to memory that has been freed."
> (32.7.6)

In addition to what was discussed here so far, we need arrays of in and
out drm_syncobjs for both map and unmap.

E.g. to specify when the mapping/unmapping should happen and to signal
when it has completed, etc.
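
Putting the pieces from this thread together, an argument structure could
look roughly like the sketch below. Everything in it (struct vm_bind_op,
the VM_OP_* and VM_FLAG_PRT constants, the helper vm_bind_op_unmap) is
hypothetical and only meant to show the shape, not an existing or proposed
uAPI:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constants; they do not exist in any driver header. */
#define VM_OP_MAP   0u
#define VM_OP_UNMAP 1u
#define VM_FLAG_PRT (1u << 0) /* PRT range; bo_handle may be 0 (no BO) */

/* Hypothetical argument block for one map or unmap operation. */
struct vm_bind_op {
    uint32_t op;            /* VM_OP_MAP or VM_OP_UNMAP */
    uint32_t flags;         /* VM_FLAG_* */
    uint32_t bo_handle;     /* ignored for unmap */
    uint32_t pad;
    uint64_t bo_offset;     /* offset into the BO */
    uint64_t va;            /* VM offset */
    uint64_t range;         /* length in bytes */
    uint64_t in_syncs;      /* user pointer: syncobj handles to wait on */
    uint64_t out_syncs;     /* user pointer: syncobj handles to signal */
    uint32_t num_in_syncs;
    uint32_t num_out_syncs;
};

/* Fill in an unmap of [va, va + range) with optional in/out syncobj arrays. */
static void vm_bind_op_unmap(struct vm_bind_op *op, uint64_t va, uint64_t range,
                             const uint32_t *in, uint32_t num_in,
                             const uint32_t *out, uint32_t num_out)
{
    assert(range > 0);

    memset(op, 0, sizeof(*op));
    op->op = VM_OP_UNMAP;
    op->va = va;
    op->range = range;
    op->in_syncs = (uint64_t)(uintptr_t)in;
    op->num_in_syncs = num_in;
    op->out_syncs = (uint64_t)(uintptr_t)out;
    op->num_out_syncs = num_out;
}
```

The in array says what the operation has to wait for before it happens,
the out array is what gets signaled once it has completed.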

Christian.

>
>
>> Dave.
>>> Dave.