Question about lima kernel MM implementation

Tue Feb 13 13:34:26 UTC 2018

Hi guys,

I'm working on the Lima project for the ARM Mali400/450 GPU. The lima
kernel driver currently uses CMA for all buffers, but the Mali400/450
has an MMU for each vertex/fragment shader processor, so I want to
rework the driver to support non-contiguous memory.

After looking into the MM methods currently used by several Linux
kernel DRM drivers, I can't find one that exactly matches lima's needs.
So I'd like to hear your advice, and to find out whether I have
misunderstood any of the current MMs and whether there is a better
approach. If no existing MM fits, I may have to write one for lima.

About the Mali400/450 GPU:
1. it has separate vertex and fragment shader processors; 1 vertex
processor and 1~4 fragment processors are grouped to process an
OpenGL draw
2. each processor has its own MMU, which works independently
3. Mali400/450 is paired with different display DRM drivers; some of
them support non-contiguous framebuffers and some do not

My requirements:
1. support non-contiguous memory allocation for GPU buffers
2. support contiguous memory allocation too, for exporting buffers to
some display DRM drivers as framebuffers
3. no GPU page faults, for better performance and to avoid handling
page faults from multiple MMUs; CPU page faults are OK
4. preferably, be able to swap buffers to disk when memory is full
Current MMs:
1. drm_gem_cma_object: only supports contiguous memory
2. drm_gem_get_pages
  1) needs to be combined with the CMA method for contiguous memory
  2) when shrinking is needed, swapping idle buffers to disk and
      putting pages is something I would have to implement myself
  3) introduces an additional shmem layer
3. TTM with TTM_PL_SYSTEM only
  1) no contiguous memory support
  2) too complicated, as we don't need TTM's other functions
  3) needs GPU page faults to populate memory?
  4) no page pool for cached memory

My plan:
1. for contiguous memory allocation, use dma_alloc_*
2. for non-contiguous memory allocation, use a page pool fed by
alloc_page
3. a buffer is not actually allocated at GEM_CREATE, but in the CPU
page fault handler and in buffer validation at task submit, which
makes sure there are no GPU page faults
4. in the shrinker handler, free unused pages in the pool; if that is
still not enough, swap some idle buffers to disk

Items 3 and 4 apply to both dma_alloc buffers and alloc_page buffers.
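
To make the plan a bit more concrete, here is a minimal userspace
sketch of the page pool, the deferred allocation and the first shrinker
step. It is a model under stated assumptions, not driver code: struct
page_pool, struct bo_model, POOL_MAX and all function names are
hypothetical, malloc()/free() stand in for alloc_page and page freeing,
and the swap-to-disk step is not modeled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SZ  4096u  /* assumed page size */
#define POOL_MAX 64     /* hypothetical cap on cached free pages */

/* Cache of free pages, standing in for a pool fed by alloc_page. */
struct page_pool {
    void  *pages[POOL_MAX];
    size_t count;
};

/* Buffer object model: created empty, backed only when first needed. */
struct bo_model {
    size_t  npages;
    void  **pages;  /* NULL until populated */
};

/* Take a page from the pool, falling back to a fresh allocation. */
static void *pool_get(struct page_pool *pool)
{
    if (pool->count > 0)
        return pool->pages[--pool->count];
    return malloc(PAGE_SZ);  /* models alloc_page */
}

/* Return a page to the pool, or free it if the pool is full. */
static void pool_put(struct page_pool *pool, void *page)
{
    assert(page != NULL);
    if (pool->count < POOL_MAX)
        pool->pages[pool->count++] = page;
    else
        free(page);
}

/* Plan item 3: creation only records the size, no memory is taken. */
static void bo_create(struct bo_model *bo, size_t npages)
{
    bo->npages = npages;
    bo->pages = NULL;
}

/*
 * Meant to be called from both the CPU fault path and task-submit
 * validation, so the GPU never touches an unpopulated buffer.
 */
static bool bo_populate(struct bo_model *bo, struct page_pool *pool)
{
    if (bo->pages)
        return true;  /* already backed */
    bo->pages = calloc(bo->npages, sizeof(*bo->pages));
    if (!bo->pages)
        return false;
    for (size_t i = 0; i < bo->npages; i++) {
        bo->pages[i] = pool_get(pool);
        if (!bo->pages[i]) {
            /* Give back what was taken so far. */
            while (i--)
                pool_put(pool, bo->pages[i]);
            free(bo->pages);
            bo->pages = NULL;
            return false;
        }
    }
    return true;
}

/* Drop the backing of an idle buffer; its pages go back to the pool. */
static void bo_release(struct bo_model *bo, struct page_pool *pool)
{
    if (!bo->pages)
        return;
    for (size_t i = 0; i < bo->npages; i++)
        pool_put(pool, bo->pages[i]);
    free(bo->pages);
    bo->pages = NULL;
}

/* Plan item 4, first step: free cached pages, return how many. */
static size_t pool_shrink(struct page_pool *pool, size_t nr_to_free)
{
    size_t freed = 0;

    while (pool->count > 0 && freed < nr_to_free) {
        free(pool->pages[--pool->count]);
        freed++;
    }
    return freed;
}
```

In this model a shrinker callback would call pool_shrink() first, and
only if that does not free enough would it go on to pick idle buffers
to swap out.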

Thanks,
Qiang