Question about lima kernel MM implementation
Liviu Dudau
liviu at dudau.co.uk
Thu Feb 15 14:21:25 UTC 2018
On Tue, Feb 13, 2018 at 09:34:26PM +0800, Qiang Yu wrote:
> Hi guys,
>
> I'm working on the Lima project for the ARM mali400/450 GPU. Currently the
> lima kernel driver uses CMA for all buffers, but the mali400/450 GPU has an
> MMU for each vertex/fragment shader processor, so I want to extend the lima
> kernel driver to support non-contiguous memory.
>
> After investigating the MM methods currently used by several Linux kernel
> DRM drivers, I can't find one that exactly matches lima's needs. So I'd
> like to hear some advice from you, and to see whether I have misunderstood
> the current MMs or whether there is a better approach. If no existing MM
> fits, I may have to write one for lima.
>
> About Mali400/450 GPU:
> 1. it has separate vertex and fragment shader processors; 1 vertex
>    processor and 1~4 fragment processors are grouped to process an
>    OpenGL draw
> 2. each processor has its own MMU, working independently
> 3. Mali400/450 will work with different display DRM drivers; some
>    support non-contiguous framebuffers and some do not
>
> My requirements:
> 1. support non-contiguous memory allocation for GPU buffers
> 2. also support contiguous memory allocation, for exporting to some
>    display DRM drivers as a framebuffer
> 3. no GPU page faults, for better performance and to avoid handling
>    page faults across multiple MMUs; CPU page faults are OK
> 4. ideally, be able to swap idle buffers to disk when memory is full
>
> Current MM:
> 1. drm_gem_cma_object: only supports contiguous memory
Please note that drm_gem_cma_object only looks at memory after the MMU
has done the mapping. If you have a good IOMMU driver that correctly
registers the dma_ops, then you can allocate memory from anywhere and
still import it into the lima driver via the
drm_gem_cma_prime_import_sg_table() helper attached to the driver's
gem_prime_import_sg_table hook.
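
As a sketch of what this wiring looks like in a driver (the helper and the
hook are real DRM interfaces; the "lima" names and the exact feature flags
are illustrative, and this kernel-space fragment is not standalone-buildable):

```c
/* Sketch: let the CMA GEM layer import scatter-gather tables.
 * With a dma_ops-backed IOMMU, the pages behind the sg_table can be
 * physically scattered yet contiguous in the device's IOVA space.
 */
static struct drm_driver lima_drm_driver = {
	.driver_features = DRIVER_GEM | DRIVER_PRIME,
	.gem_prime_import = drm_gem_prime_import,
	.gem_prime_import_sg_table = drm_gem_cma_prime_import_sg_table,
	/* remaining fops and callbacks elided */
};
```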
> 2. drm_gem_get_pages
>    1) needs to be combined with the CMA method for contiguous memory
>    2) when shrinking is needed, swapping idle buffers to disk and
>       releasing their pages has to be implemented by myself
>    3) an additional shmem layer is introduced
> 3. TTM, with TTM_PL_SYSTEM only
>    1) no contiguous memory support
>    2) too complicated, as we don't need TTM's other functionality
>    3) needs GPU page faults to populate memory?
>    4) no page pool for cached memory
>
> My plan:
> 1. for contiguous memory allocations, use dma_alloc_*
> 2. for non-contiguous memory allocations, use a page pool backed by
>    alloc_page
You should probably try to figure out what your primary memory allocator is.
Most of the time you don't want the GPU driver to allocate the memory;
you want it to come from a library that takes into account all the
constraints of the devices in the chain (GPU + display driver). There is
more to memory allocation for GPUs than contiguity (alignment,
buffer encoding, etc).
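
One common shape for this is a central allocator handing out dma-buf fds
that each DRM device then imports via PRIME, so no single driver owns the
allocation policy. A rough userspace sketch (the ioctl and struct are real
DRM UAPI; `drm_fd` and `dmabuf_fd` are assumed to come from elsewhere, e.g.
a gralloc/ION-style allocator, and this won't run without such a device):

```c
/* Sketch: import an externally allocated dma-buf into a DRM device. */
#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/drm.h>

int import_dmabuf(int drm_fd, int dmabuf_fd, uint32_t *handle)
{
	struct drm_prime_handle args = {
		.fd = dmabuf_fd,	/* allocated by the central allocator */
		.flags = 0,
	};

	if (ioctl(drm_fd, DRM_IOCTL_PRIME_FD_TO_HANDLE, &args))
		return -1;
	*handle = args.handle;	/* GEM handle usable by this device */
	return 0;
}
```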
Best regards,
Liviu
> 3. the buffer is not actually allocated at GEM_CREATE time, but in the
>    CPU page fault handler and in task-submit buffer validation, which
>    ensures there are no GPU page faults
> 4. in the shrinker handler, free unused pages in the pool; if that is
>    still not enough, swap some idle buffers to disk
>
> 3&4 apply to both dma_alloc buffer and alloc_page buffer.
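
The lazy, fault-time allocation in plan items 3 and 4 might look roughly
like this (a kernel-space sketch, not standalone-buildable; `lima_bo` and
its `pages` array are hypothetical, while `alloc_page` and `vm_insert_page`
are real kernel APIs):

```c
/* Sketch: allocate backing pages on first CPU touch instead of at
 * GEM_CREATE time; a shrinker can later drop pages of idle buffers. */
static int lima_gem_fault(struct vm_fault *vmf)
{
	struct lima_bo *bo = vmf->vma->vm_private_data;	/* hypothetical */
	pgoff_t n = vmf->pgoff;

	if (!bo->pages[n]) {
		bo->pages[n] = alloc_page(GFP_KERNEL | __GFP_ZERO);
		if (!bo->pages[n])
			return VM_FAULT_OOM;
	}
	if (vm_insert_page(vmf->vma, vmf->address, bo->pages[n]))
		return VM_FAULT_SIGBUS;
	return 0;
}
```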
>
> Thanks,
> Qiang
> _______________________________________________
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
--
/`\
/ : |
_.._ | '/
/` \ | /
| .-._ '-"` (
|_/ / o o\
| == () ==
\ -- / ______________________________________
/ ---<_ ________| |_______
| \\ \ | I would like to fix the world but | /
| | \\__ \ | no one gives me the source code. | /
/ ; |.__) / |______________________________________| \
(_/.-. ; /__________) (_________\
{ `| \_/
'-\ / |
| / |
/ \ '-.
\__|-----'