[PATCH RFC 00/24] Lima DRM driver
Daniel Vetter
daniel at ffwll.ch
Thu May 24 07:25:48 UTC 2018
On Thu, May 24, 2018 at 8:27 AM, Christian König
<christian.koenig at amd.com> wrote:
> Am 24.05.2018 um 02:31 schrieb Qiang Yu:
>>
>> On Wed, May 23, 2018 at 11:44 PM, Daniel Vetter <daniel at ffwll.ch> wrote:
>>>
>>> On Wed, May 23, 2018 at 3:52 PM, Qiang Yu <yuq825 at gmail.com> wrote:
>>>>
>>>> On Wed, May 23, 2018 at 5:29 PM, Christian König
>>>> <ckoenig.leichtzumerken at gmail.com> wrote:
>>>>>
>>>>> Am 18.05.2018 um 11:27 schrieb Qiang Yu:
>>>>>>
>>>>>> Kernel DRM driver for ARM Mali 400/450 GPUs.
>>>>>>
>>>>>> This implementation mainly takes the amdgpu DRM driver as a reference.
>>>>>>
>>>>>> - Mali 4xx GPUs have two kinds of processors, GP and PP. GP is for
>>>>>> OpenGL vertex shader processing and PP is for fragment shader
>>>>>> processing. Each processor has its own MMU, so the processors work
>>>>>> in virtual address spaces.
>>>>>> - There's only one GP but multiple PPs (max 4 for Mali 400 and 8
>>>>>> for Mali 450) in the same Mali 4xx GPU. All PPs are grouped
>>>>>> together to handle a single fragment shader task, which is divided
>>>>>> by FB output tiled pixels. The Mali 400 user space driver is
>>>>>> responsible for assigning target tiled pixels to each PP, but Mali
>>>>>> 450 has a HW module called DLBU to dynamically balance each
>>>>>> PP's load.
>>>>>> - The user space driver allocates buffer objects and maps them into
>>>>>> the GPU virtual address space, uploads the command stream and draw
>>>>>> data through a CPU mmap of the buffer object, then submits a task
>>>>>> to GP/PP with a register frame indicating where the command stream
>>>>>> is, plus misc settings.
>>>>>> - There's no command stream validation/relocation because each user
>>>>>> process has its own GPU virtual address space. The GP/PP MMUs
>>>>>> switch virtual address spaces before running two tasks from
>>>>>> different user processes. Buggy or malicious user space code just
>>>>>> gets an MMU fault or a GP/PP error IRQ, after which the HW/SW is
>>>>>> recovered.
>>>>>> - Use TTM as the MM. TTM_PL_TT type memory is used as the content
>>>>>> of a lima buffer object, which is allocated from the TTM page pool.
>>>>>> All lima buffer objects get pinned with TTM_PL_FLAG_NO_EVICT at
>>>>>> allocation time (see the pinning sketch below), so there's no
>>>>>> buffer eviction and swap for now. We need reverse engineering to
>>>>>> see if and how GP/PP support MMU fault recovery (continuing
>>>>>> execution). Otherwise we have to pin/unpin each involved buffer at
>>>>>> task creation/deletion.
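
[Side note: pinning at allocation time with the 4.17-era TTM API could
look roughly like the sketch below. lima_bo_pin_on_create and the
placement names are invented for illustration, not taken from the
patches.]

#include <drm/ttm/ttm_bo_api.h>
#include <drm/ttm/ttm_bo_driver.h>
#include <drm/ttm/ttm_placement.h>

/* Illustrative only: a single TT placement carrying NO_EVICT, so once
 * the BO has been validated here TTM will never evict or swap it. */
static const struct ttm_place lima_tt_pinned_place = {
	.fpfn = 0,
	.lpfn = 0,
	.flags = TTM_PL_FLAG_TT | TTM_PL_FLAG_CACHED | TTM_PL_FLAG_NO_EVICT,
};

static const struct ttm_placement lima_tt_pinned_placement = {
	.num_placement = 1,
	.placement = &lima_tt_pinned_place,
	.num_busy_placement = 1,
	.busy_placement = &lima_tt_pinned_place,
};

/* Called right after ttm_bo_init() on the freshly created BO. */
static int lima_bo_pin_on_create(struct ttm_buffer_object *tbo)
{
	struct ttm_operation_ctx ctx = { .interruptible = true };
	int ret;

	ret = ttm_bo_reserve(tbo, true, false, NULL);
	if (ret)
		return ret;
	ret = ttm_bo_validate(tbo, &lima_tt_pinned_placement, &ctx);
	ttm_bo_unreserve(tbo);
	return ret;
}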
>>>>>
>>>>>
>>>>> Well pinning all memory is usually a no-go for upstreaming. But
>>>>> since you are already using drm_sched for GPU task scheduling, why
>>>>> do you actually need this?
>>>>>
>>>>> The scheduler should take care of signaling all fences when the
>>>>> hardware is done with its magic, and that is enough for TTM to note
>>>>> that a buffer object is movable again (e.g. to unpin it).
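
[Concretely, TTM's eviction path already waits on the fences in a BO's
reservation object, so if the submit path attaches the scheduler's
"finished" fence there, the buffer stays put while the GPU uses it and
becomes movable again as soon as the fence signals. A minimal sketch
against the 4.17-era API; the function name is made up:]

#include <drm/gpu_scheduler.h>
#include <drm/ttm/ttm_bo_api.h>
#include <linux/reservation.h>

/* Sketch: make the job's completion visible to TTM.  Eviction waits on
 * bo->resv, so once drm_sched signals the finished fence the buffer is
 * movable again without any explicit pinning. */
static void lima_task_add_fence(struct drm_sched_job *job,
				struct ttm_buffer_object *bo)
{
	reservation_object_add_excl_fence(bo->resv, &job->s_fence->finished);
}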
>>>>
>>>> Please correct me if I'm wrong.
>>>>
>>>> One way to implement eviction/swap is like this:
>>>> call validation on each buffer involved in a task, but this won't
>>>> prevent it from being evicted/swapped while executing, so a GPU MMU
>>>> fault may happen, and in the fault handler we need to restore the
>>>> evicted/swapped buffer.
>>>>
>>>> Another way is to pin/unpin the involved buffers when the task is
>>>> created/freed (sketched below).
>>>>
>>>> The first way is better when memory load is low and the second way
>>>> is better when memory load is high. The first way also needs less
>>>> memory.
>>>>
>>>> So I'd prefer the first way, but because the GPU MMU fault HW
>>>> operation still needs reverse engineering, I have to pin all buffers
>>>> for now. Once the HW operation is clear, I can choose one way to
>>>> implement.
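
[For the second option, the unpin side could look roughly like this,
mirroring how e.g. amdgpu drops TTM_PL_FLAG_NO_EVICT and revalidates.
Again an illustration with made-up names, complementing the pinning
sketch above:]

/* Counterpart to the pinning sketch: validate the BO back into a TT
 * placement without NO_EVICT when the task is freed, so TTM may evict
 * or swap it again. */
static const struct ttm_place lima_tt_evictable_place = {
	.flags = TTM_PL_FLAG_TT | TTM_PL_FLAG_CACHED,
};

static const struct ttm_placement lima_tt_evictable_placement = {
	.num_placement = 1,
	.placement = &lima_tt_evictable_place,
	.num_busy_placement = 1,
	.busy_placement = &lima_tt_evictable_place,
};

static void lima_task_unpin_bo(struct ttm_buffer_object *tbo)
{
	struct ttm_operation_ctx ctx = { .interruptible = false };

	if (ttm_bo_reserve(tbo, false, false, NULL))
		return;
	ttm_bo_validate(tbo, &lima_tt_evictable_placement, &ctx);
	ttm_bo_unreserve(tbo);
}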
>>>
>>> All the drivers using ttm have something that looks like vram, or a
>>> requirement to move buffers around. Afaiui that includes the virtio
>>> drm driver.
>>
>> Does the virtio drm driver need to move buffers around? amdgpu also
>> has no vram in the APU case.
Afaiui APUs have a range of stolen memory which looks and acts and is
managed like discrete vram. Including moving buffers around.
>>> From your description you don't have such a requirement, and
>>> then doing what etnaviv has done would be a lot simpler. Everything
>>> that's not related to buffer movement handling is also available
>>> outside of ttm already.
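
[For illustration, the etnaviv-style route is basically a plain GEM
object backed by shmem pages, no TTM involved; a tiny sketch with an
assumed lima_ function name:]

#include <drm/drm_gem.h>

/* Sketch: back a plain drm_gem_object with shmem pages.  There is no
 * eviction support here, which matches the "pin everything" behaviour
 * of the RFC. */
static struct page **lima_gem_get_pages(struct drm_gem_object *obj)
{
	/* drm_gem_get_pages() looks up the shmem pages backing the
	 * object and takes a reference on each of them. */
	return drm_gem_get_pages(obj);
}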
>>
>> Yeah, I could do it like etnaviv, but that's not simpler than using
>> ttm directly, especially if I want some optimizations (like the ttm
>> page pool, ttm_eu_reserve_buffers, ttm_bo_mmap). If I have to/want to
>> implement them anyway, why not just use TTM directly with all those
>> helper functions? (See the submit sketch below.)
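
[For reference, a submit path built on those helpers could look roughly
like this with the 4.17-era ttm_execbuf_util API; the function name and
surrounding structure are assumptions:]

#include <drm/ttm/ttm_execbuf_util.h>
#include <linux/dma-fence.h>

/* Sketch: reserve every BO referenced by a task, do the actual
 * submission, then attach the finished fence so TTM knows when the BOs
 * go idle.  'validated' is a list of struct ttm_validate_buffer. */
static int lima_submit_locked(struct ww_acquire_ctx *ticket,
			      struct list_head *validated,
			      struct dma_fence *finished)
{
	int ret;

	ret = ttm_eu_reserve_buffers(ticket, validated, true, NULL);
	if (ret)
		return ret;

	/* ... validate placements, write the GP/PP register frame and
	 * push the drm_sched job here ... */

	ttm_eu_fence_buffer_objects(ticket, validated, finished);
	return 0;
}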
>
>
> Well TTM has some design flaws (e.g. heavily layered design etc...), but it
> also offers some rather nice functionality.
Yeah, but I still think that for non-discrete drivers just moving a
bunch more of the neat ttm functionality into helpers where
everyone can use them (instead of the binary ttm y/n decision) would
be much better. E.g. the allocator pool definitely sounds like
something gem helpers should be able to do, same for reserving a pile
of buffers or default mmap implementations. A lot of that also exists
already, thanks to lots of efforts from Noralf Tronnes and others.
I think ideally the long-term goal would be to modularize ttm concepts
as much as possible, so that drivers can flexibly pick&choose the bits
they need. We're slowly getting there (but definitely not yet there if
you need to manage discrete vram I think).
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch