[PATCH RFC 00/24] Lima DRM driver

Christian König christian.koenig at amd.com
Thu May 24 06:27:59 UTC 2018

On 24.05.2018 at 02:31, Qiang Yu wrote:
> On Wed, May 23, 2018 at 11:44 PM, Daniel Vetter <daniel at ffwll.ch> wrote:
>> On Wed, May 23, 2018 at 3:52 PM, Qiang Yu <yuq825 at gmail.com> wrote:
>>> On Wed, May 23, 2018 at 5:29 PM, Christian König
>>> <ckoenig.leichtzumerken at gmail.com> wrote:
>>>> On 18.05.2018 at 11:27, Qiang Yu wrote:
>>>>> Kernel DRM driver for ARM Mali 400/450 GPUs.
>>>>> This implementation mainly takes the amdgpu DRM driver as a reference.
>>>>> - Mali 4xx GPUs have two kinds of processors, GP and PP. GP is for
>>>>>     OpenGL vertex shader processing and PP is for fragment shader
>>>>>     processing. Each processor has its own MMU, so processors work in
>>>>>     a virtual address space.
>>>>> - There's only one GP but multiple PPs (max 4 for mali 400 and 8
>>>>>     for mali 450) in the same mali 4xx GPU. All PPs are grouped
>>>>>     together to handle a single fragment shader task divided by
>>>>>     FB output tiled pixels. The mali 400 user space driver is
>>>>>     responsible for assigning target tiled pixels to each PP, but mali
>>>>>     450 has a HW module called DLBU to dynamically balance each
>>>>>     PP's load.
>>>>> - The user space driver allocates buffer objects and maps them into
>>>>>     the GPU virtual address space, uploads the command stream and draw
>>>>>     data through a CPU mmap of the buffer object, then submits a task
>>>>>     to GP/PP with a register frame indicating where the command stream
>>>>>     is, plus misc settings.
>>>>> - There's no command stream validation/relocation because each user
>>>>>     process has its own GPU virtual address space. The GP/PP's MMU
>>>>>     switches virtual address spaces before running two tasks from
>>>>>     different user processes. Erroneous or evil user space code just
>>>>>     gets an MMU fault or a GP/PP error IRQ, then the HW/SW is recovered.
>>>>> - TTM is used as the MM. TTM_PL_TT type memory is used as the content
>>>>>     of a lima buffer object, allocated from the TTM page pool. All
>>>>>     lima buffer objects get pinned with TTM_PL_FLAG_NO_EVICT at
>>>>>     allocation (see the sketch below), so there's no buffer eviction
>>>>>     or swap for now. We need reverse engineering to see if and how
>>>>>     GP/PP support MMU fault recovery (continuing execution). Otherwise
>>>>>     we have to pin/unpin each involved buffer at task creation/deletion.
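
For reference, the pinning described above essentially means validating
the buffer into a placement that carries TTM_PL_FLAG_NO_EVICT. A minimal
sketch, assuming the ~v4.17 TTM API (lima_bo_pin is a made-up helper
name; exact TTM structures vary between kernel versions):

/* Sketch: pin a TT buffer at allocation time via NO_EVICT.
 * Hypothetical helper; TTM signatures differ across kernel versions. */
static int lima_bo_pin(struct ttm_buffer_object *tbo)
{
        struct ttm_operation_ctx ctx = { .interruptible = true };
        struct ttm_place place = {
                .flags = TTM_PL_FLAG_TT | TTM_PL_FLAG_CACHED |
                         TTM_PL_FLAG_NO_EVICT,
        };
        struct ttm_placement placement = {
                .num_placement = 1,
                .placement = &place,
                .num_busy_placement = 1,
                .busy_placement = &place,
        };

        /* NO_EVICT keeps the pages resident until the flag is cleared */
        return ttm_bo_validate(tbo, &placement, &ctx);
}
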
>>>> Well pinning all memory is usually a no-go for upstreaming. But since
>>>> you are already using the drm_sched for GPU task scheduling, why do you
>>>> actually need this?
>>>> The scheduler should take care of signaling all fences when the hardware
>>>> is done with its magic, and that is enough for TTM to note that a buffer
>>>> object is movable again (i.e. to unpin it).
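
In sketch form, that flow looks roughly like this, assuming the ~v4.17
drm_sched API (lima_task_kick and to_lima_task are hypothetical names):

/* drm_sched calls run_job(); the driver returns the HW fence, and the
 * scheduler signals the job's finished fence once that fence signals. */
static struct dma_fence *lima_sched_run_job(struct drm_sched_job *job)
{
        /* start the GP/PP task and return its hardware fence */
        return lima_task_kick(to_lima_task(job));
}

At submit time the job's finished fence is attached to every buffer of
the task, e.g.:

ttm_eu_fence_buffer_objects(&ticket, &validated_list,
                            &job->s_fence->finished);

so TTM will not move a buffer before the GPU is done with it.
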
>>> Please correct me if I'm wrong.
>>> One way to implement eviction/swap is like this:
>>> call validation on each buffer involved in a task, but this won't
>>> prevent it from being evicted/swapped while executing, so a GPU MMU
>>> fault may happen, and in the handler we need to recover the
>>> evicted/swapped buffer.
>>> Another way is to pin/unpin the involved buffers at task create/free.
>>> The first way is better when memory load is low and the second way is
>>> better when memory load is high. The first way also needs less memory.
>>> So I'd prefer the first way, but since the GPU MMU fault HW op needs
>>> reverse engineering, I have to pin all buffers for now. Once the HW op
>>> is clear, I can choose which way to implement.
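
The first way in sketch form, assuming the ~v4.17 TTM execbuf-util API
(lima_submit and lima_tt_placement are hypothetical names, error paths
trimmed for brevity):

/* Per-submit validation: reserve all BOs, validate into TT, fence them. */
static int lima_submit_validate(struct lima_submit *submit)
{
        struct ttm_operation_ctx ctx = { .interruptible = true };
        struct ww_acquire_ctx ticket;
        struct ttm_validate_buffer *entry;
        int ret;

        /* deadlock-free reservation of all BOs on submit->validated */
        ret = ttm_eu_reserve_buffers(&ticket, &submit->validated, true, NULL);
        if (ret)
                return ret;

        list_for_each_entry(entry, &submit->validated, head) {
                ret = ttm_bo_validate(entry->bo, &lima_tt_placement, &ctx);
                if (ret)
                        goto err_backoff;
        }

        /* fence the BOs so TTM waits for the task before moving them */
        ttm_eu_fence_buffer_objects(&ticket, &submit->validated,
                                    submit->fence);
        return 0;

err_backoff:
        ttm_eu_backoff_reservation(&ticket, &submit->validated);
        return ret;
}

The open question is then only how to handle a buffer that gets moved
while a task is still running, which depends on how GP/PP MMU fault
recovery behaves.
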
>> All the drivers using ttm have something that looks like vram, or a
>> requirement to move buffers around. Afaiui that includes virtio drm
>> driver.
> Does the virtio drm driver need to move buffers around? amdgpu also
> has no vram on APUs.
>> From your description you don't have such a requirement, and
>> then doing what etnaviv has done would be a lot simpler. Everything
>> that's not related to buffer movement handling is also available
>> outside of ttm already.
> Yeah, I could do it like etnaviv, but that's not simpler than using TTM
> directly, especially if I want some optimizations (like the TTM page
> pool, ttm_eu_reserve_buffers, ttm_bo_mmap). If I have/want to implement
> them, why not just use TTM directly with all those helper functions?

Well, TTM has some design flaws (e.g. the heavily layered design), but
it also offers some rather nice functionality.
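
For example, the ttm_bo_mmap() helper Qiang mentions makes the driver's
mmap almost trivial. A sketch, assuming a hypothetical lima_device that
embeds the ttm_bo_device (~v4.17 API):

/* Forward the DRM file mmap straight to TTM. */
static int lima_mmap(struct file *filp, struct vm_area_struct *vma)
{
        struct drm_file *file_priv = filp->private_data;
        struct lima_device *ldev = file_priv->minor->dev->dev_private;

        return ttm_bo_mmap(vma, &ldev->bdev);
}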


> Regards,
> Qiang
