[PATCH RFC 00/24] Lima DRM driver

Thu May 24 07:25:48 UTC 2018

On Thu, May 24, 2018 at 8:27 AM, Christian König
<christian.koenig at amd.com> wrote:
> Am 24.05.2018 um 02:31 schrieb Qiang Yu:
>>
>> On Wed, May 23, 2018 at 11:44 PM, Daniel Vetter <daniel at ffwll.ch> wrote:
>>>
>>> On Wed, May 23, 2018 at 3:52 PM, Qiang Yu <yuq825 at gmail.com> wrote:
>>>>
>>>> On Wed, May 23, 2018 at 5:29 PM, Christian König
>>>> <ckoenig.leichtzumerken at gmail.com> wrote:
>>>>>
>>>>> Am 18.05.2018 um 11:27 schrieb Qiang Yu:
>>>>>>
>>>>>> Kernel DRM driver for ARM Mali 400/450 GPUs.
>>>>>>
>>>>>> This implementation mainly takes the amdgpu DRM driver as a reference.
>>>>>>
>>>>>> - Mali 4xx GPUs have two kinds of processors: GP and PP. GP is for
>>>>>>     OpenGL vertex shader processing and PP is for fragment shader
>>>>>>     processing. Each processor has its own MMU, so the processors work
>>>>>>     in virtual address space.
>>>>>> - There's only one GP but multiple PPs (max 4 for Mali 400 and 8
>>>>>>     for Mali 450) in the same Mali 4xx GPU. All PPs are grouped
>>>>>>     together to handle a single fragment shader task, divided by
>>>>>>     FB output tiled pixels. The Mali 400 user space driver is
>>>>>>     responsible for assigning target tiled pixels to each PP, but Mali
>>>>>>     450 has a HW module called DLBU to dynamically balance each
>>>>>>     PP's load.
>>>>>> - The user space driver allocates buffer objects and maps them into
>>>>>>     GPU virtual address space, uploads the command stream and draw data
>>>>>>     through a CPU mmap of the buffer object, then submits a task to
>>>>>>     GP/PP with a register frame indicating where the command stream
>>>>>>     is, along with misc settings.
>>>>>> - There's no command stream validation/relocation because each user
>>>>>>     process has its own GPU virtual address space. The GP/PP MMUs switch
>>>>>>     virtual address space before running two tasks from different
>>>>>>     user processes. Buggy or malicious user space code just gets an MMU
>>>>>>     fault or a GP/PP error IRQ, after which the HW/SW is recovered.
>>>>>> - Use TTM as MM. TTM_PL_TT type memory is used as the content of
>>>>>>     a lima buffer object and is allocated from the TTM page pool. All
>>>>>>     lima buffer objects get pinned with TTM_PL_FLAG_NO_EVICT at
>>>>>>     allocation, so there's no buffer eviction or swap for now. We
>>>>>>     need reverse engineering to see if and how GP/PP support MMU
>>>>>>     fault recovery (continuing execution). Otherwise we have to
>>>>>>     pin/unpin each involved buffer at task creation/deletion.
>>>>>
>>>>>
>>>>> Well, pinning all memory is usually a no-go for upstreaming. But since
>>>>> you are already using the drm_sched for GPU task scheduling, why do you
>>>>> actually need this?
>>>>>
>>>>> The scheduler should take care of signaling all fences when the
>>>>> hardware is done with its magic, and that is enough for TTM to note
>>>>> that a buffer object is movable again (i.e. to unpin it).
>>>>
>>>> Please correct me if I'm wrong.
>>>>
>>>> One way to implement eviction/swap is like this:
>>>> call validation on each buffer involved in a task. But this won't
>>>> prevent a buffer from being evicted/swapped while the task executes, so
>>>> a GPU MMU fault may happen, and the fault handler needs to bring back
>>>> the evicted/swapped buffer.
>>>>
>>>> Another way is to pin/unpin the involved buffers at task creation/free.
>>>>
>>>> The first way is better when memory load is low and the second way is
>>>> better when memory load is high. The first way also needs less memory.
>>>>
>>>> So I'd prefer the first way, but since the GPU MMU fault handling HW
>>>> operation still needs reverse engineering, I have to pin all buffers
>>>> for now. Once the HW operation is clear, I can choose which way to
>>>> implement.
>>>
>>> All the drivers using ttm have something that looks like vram, or a
>>> requirement to move buffers around. Afaiui that includes the virtio drm
>>> driver.
>>
>> Does the virtio drm driver need to move buffers around? amdgpu also
>> has no vram on APUs.

Afaiui APUs have a range of stolen memory which looks like, acts like, and
is managed like discrete vram, including moving buffers around.

>>> From your description you don't have such a requirement, and
>>> then doing what etnaviv has done would be a lot simpler. Everything
>>> that's not related to buffer movement handling is also available
>>> outside of ttm already.
>>
>> Yeah, I could do it like etnaviv, but it's not simpler than using ttm
>> directly, especially if I want some optimizations (like the ttm page pool,
>> ttm_eu_reserve_buffers, ttm_bo_mmap). If I have to (or want to) implement
>> them anyway, why not just use TTM directly with all those helper functions?
>
>
> Well, TTM has some design flaws (e.g. a heavily layered design), but it
> also offers some rather nice functionality.

Yeah, but I still think that for non-discrete drivers, moving more of
the neat ttm functionality into helpers that everyone can use (instead
of the binary ttm y/n decision) would be much better. E.g. the allocator
pool definitely sounds like something gem helpers should be able to do,
same for reserving a pile of buffers or default mmap implementations. A
lot of that also exists already, thanks to lots of efforts from Noralf
Tronnes and others.
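To make "reserving a pile of buffers" concrete, here is a minimal sketch in plain userspace C of the all-or-nothing idea, assuming a hypothetical `struct bo` that carries nothing but a lock. It is not ttm_eu_reserve_buffers: the kernel helper is built on ww_mutex (wound/wait), which resolves contention with acquire tickets, whereas this sketch simply drops every lock it took and lets the caller retry, which can livelock under heavy contention.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer object: only the reservation lock is modelled. */
struct bo {
    pthread_mutex_t lock;
};

/* Try to reserve (lock) every buffer in the list. If any lock is already
 * held, release the ones taken so far and return false, so the caller
 * never ends up holding a partial set. */
static bool reserve_all(struct bo **bos, size_t n)
{
    assert(bos != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (pthread_mutex_trylock(&bos[i]->lock) != 0) {
            /* Back off: unlock indices i-1 down to 0. */
            while (i-- > 0)
                pthread_mutex_unlock(&bos[i]->lock);
            return false;
        }
    }
    return true;
}

/* Release a set previously reserved by reserve_all(). */
static void unreserve_all(struct bo **bos, size_t n)
{
    for (size_t i = 0; i < n; i++)
        pthread_mutex_unlock(&bos[i]->lock);
}
```

A caller would loop on reserve_all() until it succeeds, do its work on the whole set (e.g. submit a task), then call unreserve_all().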

I think ideally the long-term goal would be to modularize ttm concepts
as much as possible, so that drivers can flexibly pick and choose the bits
they need. We're slowly getting there (but definitely not yet there if
you need to manage discrete vram, I think).
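For the pinning question raised earlier in the thread, a minimal sketch in plain C contrasting the two gating rules: the pin/unpin-at-task-create/free scheme Qiang describes, and the fence-based rule Christian describes, where a buffer becomes movable again once the hardware signals the fence of the last task using it. `struct fence`, `struct bo` and all function names are hypothetical. Real TTM tracks fences in the buffer's reservation object and eviction waits on them; this sketch reduces that to a yes/no check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fence: set to signaled once the HW has finished the task. */
struct fence {
    bool signaled;
};

/* Hypothetical buffer object carrying both gating mechanisms. */
struct bo {
    unsigned int pin_count;   /* pin scheme: >0 while a task holds it */
    struct fence *last_fence; /* fence scheme: last task's fence, or NULL */
};

/* Pin scheme: pin every involved buffer when the task is created... */
static void task_create_pin(struct bo **bos, size_t n)
{
    for (size_t i = 0; i < n; i++)
        bos[i]->pin_count++;
}

/* ...and unpin them again when the task is freed. */
static void task_free_unpin(struct bo **bos, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(bos[i]->pin_count > 0);
        bos[i]->pin_count--;
    }
}

/* Fence scheme: on submission, attach the task's fence to every buffer. */
static void task_submit_fence(struct bo **bos, size_t n, struct fence *f)
{
    for (size_t i = 0; i < n; i++)
        bos[i]->last_fence = f;
}

/* A buffer may be moved only if it is unpinned and carries no
 * unsignaled fence. */
static bool bo_evictable(const struct bo *bo)
{
    if (bo->pin_count > 0)
        return false;
    return bo->last_fence == NULL || bo->last_fence->signaled;
}
```

In the fence scheme nothing has to be undone at task free: the buffer becomes movable as soon as the fence signals.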
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch