Question about lima kernel MM implementation

Wed Feb 14 12:26:05 UTC 2018

On Tue, Feb 13, 2018 at 8:34 AM, Qiang Yu <yuq825 at gmail.com> wrote:
> Hi guys,
>
> I'm working on the Lima project for the ARM Mali400/450 GPU. The lima
> kernel driver currently uses CMA for all buffers, but the Mali400/450
> GPU has an MMU for each vertex/fragment shader processor, so I want to
> rework the lima kernel driver to support non-contiguous memory.
>
> After some investigation of the MM methods currently used by
> several Linux kernel DRM drivers, I can't find one that exactly matches
> lima. So I'd like to hear some advice from you, to see whether I have
> misunderstood the current MMs and whether there's a better approach.
> If I can't use an existing MM, I may have to write one for lima.
>
> About the Mali400/450 GPU:
> 1. it has separate vertex and fragment shader processors; 1 vertex
> processor and 1~4 fragment processors are grouped to process an
> OpenGL draw
> 2. each processor has an MMU that works independently
> 3. Mali400/450 will work with different display DRM drivers; some
> display DRM drivers support non-contiguous framebuffers and some
> do not
>
> My requirements:
> 1. support non-contiguous memory allocation for GPU buffers
> 2. support contiguous memory allocation too, for exporting to some
> display DRM drivers as framebuffers

btw, I think etnaviv deals w/ contiguous scanout buffer by just
importing the scanout buffer from the other display drm driver.  So I
think you could avoid having to allocate these buffers.

(iirc, etnaviv does need contiguous buffers internally for a few
things, like cmdstream (?) and mmu pagetables)

BR,
-R

> 3. no GPU page faults, for better performance and to avoid handling
> page faults from multiple MMUs; CPU page faults are OK
> 4. preferably, support swapping buffers to disk when memory is full
>
> Current MM options:
> 1. drm_gem_cma_object: only supports contiguous memory
> 2. drm_gem_get_pages
>   1) needs to be combined with the CMA method for contiguous memory
>   2) when shrinking is needed, swapping idle buffers to disk and
>       putting pages has to be implemented by myself
>   3) introduces an additional shmem layer
> 3. TTM, using TTM_PL_SYSTEM only
>   1) no contiguous memory support
>   2) too complicated, as we don't need the other functions of TTM
>   3) needs GPU page faults to populate memory?
>   4) no page pool for cached memory
>
> My plan:
> 1. for contiguous memory allocation, use dma_alloc_*
> 2. for non-contiguous memory allocation, use a page pool filled from
> alloc_page
> 3. a buffer is not actually allocated at GEM_CREATE, but in the CPU
> page fault handler and in task-submit buffer validation, which makes
> sure there is no GPU page fault
> 4. in the shrinker handler, free unused pages in the pool; if that is
> still not enough, swap some idle buffers to disk
>
> 3 & 4 apply to both dma_alloc buffers and alloc_page buffers.
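
For illustration only, plan items 2 to 4 could be modelled roughly as
below. This is a plain userspace C sketch, not kernel code and not what
lima does: malloc()/free() stand in for alloc_page()/__free_page(), and
every name in it (page_pool, buffer, pool_get, pool_put, buffer_create,
buffer_populate, buffer_release, pool_shrink) is hypothetical. Swapping
idle buffers to disk is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SZ  4096
#define POOL_MAX 64

/* Hypothetical pool of free pages (stand-in for pages from alloc_page()). */
struct page_pool {
    void  *free[POOL_MAX];
    size_t nfree;
};

/* Hypothetical buffer object: backing pages are attached lazily. */
struct buffer {
    void  **pages;   /* NULL until the buffer is populated */
    size_t  npages;
};

/* Take a page from the pool, falling back to a fresh allocation. */
static void *pool_get(struct page_pool *p)
{
    if (p->nfree > 0)
        return p->free[--p->nfree];
    return malloc(PAGE_SZ);
}

/* Return a page to the pool, or free it if the pool is full. */
static void pool_put(struct page_pool *p, void *page)
{
    if (p->nfree < POOL_MAX)
        p->free[p->nfree++] = page;
    else
        free(page);
}

/* GEM_CREATE step: only record the size, allocate nothing yet. */
static void buffer_create(struct buffer *b, size_t npages)
{
    b->pages = NULL;
    b->npages = npages;
}

/*
 * Would be called from the CPU fault path and from submit-time
 * validation, so the GPU never sees an unpopulated buffer.
 */
static bool buffer_populate(struct buffer *b, struct page_pool *p)
{
    if (b->pages)
        return true;

    void **pages = calloc(b->npages, sizeof(*pages));
    if (!pages)
        return false;

    for (size_t i = 0; i < b->npages; i++) {
        pages[i] = pool_get(p);
        if (!pages[i]) {
            /* Undo the partial allocation. */
            while (i--)
                pool_put(p, pages[i]);
            free(pages);
            return false;
        }
    }
    b->pages = pages;
    return true;
}

/* Drop the backing pages of an idle buffer back into the pool. */
static void buffer_release(struct buffer *b, struct page_pool *p)
{
    if (!b->pages)
        return;
    for (size_t i = 0; i < b->npages; i++)
        pool_put(p, b->pages[i]);
    free(b->pages);
    b->pages = NULL;
}

/* Shrinker step: free up to nr pooled pages, return how many were freed. */
static size_t pool_shrink(struct page_pool *p, size_t nr)
{
    size_t freed = 0;

    while (p->nfree > 0 && freed < nr) {
        free(p->free[--p->nfree]);
        freed++;
    }
    return freed;
}
```

In this model a caller runs buffer_create() at creation time,
buffer_populate() before the buffer is first used, and pool_shrink()
under memory pressure. A real driver would additionally need locking
and the swap-out path from item 4.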
>
> Thanks,
> Qiang