[Lima] [PATCH 0/2] Lima DRM driver

Eric Anholt eric at anholt.net
Wed Feb 13 01:00:45 UTC 2019


Rob Herring <robh at kernel.org> writes:

> On Thu, Feb 7, 2019 at 9:51 AM Daniel Vetter <daniel at ffwll.ch> wrote:
>>
>> On Thu, Feb 07, 2019 at 11:21:52PM +0800, Qiang Yu wrote:
>> > On Thu, Feb 7, 2019 at 5:09 PM Daniel Vetter <daniel at ffwll.ch> wrote:
>> > >
>> > > On Wed, Feb 06, 2019 at 09:14:55PM +0800, Qiang Yu wrote:
>> > > > Kernel DRM driver for ARM Mali 400/450 GPUs.
>> > > >
>> > > > Since the last RFC, all feedback has been addressed. Most Mali DTS
>> > > > changes are already upstreamed by SoC maintainers. The kernel
>> > > > driver and user-kernel interface have been quite stable for
>> > > > several months, so I think it's ready to be upstreamed.
>> > > >
>> > > > This implementation mainly takes the amdgpu DRM driver as a reference.
>> > > >
>> > > > - Mali 4xx GPUs have two kinds of processors, GP and PP. GP is for
>> > > >   OpenGL vertex shader processing and PP is for fragment shader
>> > > >   processing. Each processor has its own MMU, so processors work in
>> > > >   a virtual address space.
>> > > > - There's only one GP but multiple PPs (max 4 for Mali 400 and 8
>> > > >   for Mali 450) in the same Mali 4xx GPU. All PPs are grouped
>> > > >   together to handle a single fragment shader task, divided into
>> > > >   tiles of FB output pixels. The Mali 400 user space driver is
>> > > >   responsible for assigning target tiles to each PP, but Mali
>> > > >   450 has a HW module called DLBU to dynamically balance each
>> > > >   PP's load.
>> > > > - The user space driver allocates buffer objects and maps them into
>> > > >   the GPU virtual address space, uploads the command stream and draw
>> > > >   data through a CPU mmap of the buffer object, then submits a task to
>> > > >   GP/PP with a register frame indicating where the command stream is,
>> > > >   along with misc settings.
>> > > > - There's no command stream validation/relocation because each user
>> > > >   process has its own GPU virtual address space. The GP/PP MMUs switch
>> > > >   virtual address space when consecutive tasks come from different
>> > > >   user processes. Buggy or malicious user space code just gets an MMU
>> > > >   fault or a GP/PP error IRQ, and then the HW/SW is recovered.
>> > > > - TTM is used as the MM. TTM_PL_TT memory, allocated from the TTM
>> > > >   page pool, backs each lima buffer object. All lima buffer objects
>> > > >   are pinned with TTM_PL_FLAG_NO_EVICT at allocation, so there's no
>> > > >   buffer eviction or swap for now.
>> > >
>> > > All other render gpu drivers that have unified memory (i.e. memory on the SoC)
>> > > use GEM directly, with some of the helpers we have. So msm, etnaviv, vc4
>> > > (and i915 is kinda the same too really). TTM makes sense if you have some
>> > > discrete memory to manage, but imo not in any other place really.
>> > >
>> > > What's the design choice behind this?
>> > To be honest, it's just because TTM offers more helpers. I did implement
>> > a GEM version with CMA alloc at the beginning. But when implementing paged
>> > mem, I found TTM has mem pool alloc, sync and mmap related helpers which
>> > cover much of my existing code. It's totally possible with GEM, but not as
>> > easy as TTM for me. And virtio-gpu seems to be an example of using TTM
>> > without discrete mem. Isn't TTM a superset of both unified mem and discrete mem?
>>
>> virtio does have fake vram and migration afaiui. And sure, you can use TTM
>> without the vram migration, it's just that most of the complexity of TTM
>> is due to buffer placement and migration and all that stuff. If you never
>> need to move buffers, then you don't need that ever.
>>
>> Wrt lack of helpers, what exactly are you looking for? A big part of these
>> for TTM is that TTM is a bit of a midlayer, so it reinvents a bunch of things
>> provided by e.g. dma-api. It's cleaner to use the dma-api directly. Basing
>> the lima kernel driver on vc4, freedreno or etnaviv (last one is probably
>> closest, since it doesn't have a display block either) would be better I
>> think.
>
> FWIW, I'm working on the panfrost driver and am using the shmem
> helpers from Noralf. It's in the early stages though. I started a patch
> for etnaviv to use it too, but found I need to rework it to sub-class
> the shmem GEM object.
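Rob's note about needing to "sub-class" the shmem GEM object presumably refers to the common kernel C idiom: the driver embeds the generic object inside its own struct and recovers the outer struct with container_of(). Below is a minimal userspace sketch of that idiom. `base_gem_object`, `drv_gem_object`, `to_drv_obj()` and the two callbacks are hypothetical stand-ins, not the real DRM shmem helper API, and container_of() is redefined locally because the kernel's version is not available outside the kernel.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(): given a pointer to a
 * member, step back by the member's offset to reach the enclosing struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical "base class" standing in for a shared helper object such as
 * a shmem GEM object. This is not the real DRM structure. */
struct base_gem_object {
	size_t size;
	void (*free)(struct base_gem_object *obj); /* driver-provided hook */
};

/* Hypothetical driver "subclass": embeds the base and adds private state. */
struct drv_gem_object {
	struct base_gem_object base;
	unsigned long gpu_va; /* driver-private GPU virtual address */
	int freed;            /* set by the driver's free hook */
};

/* Downcast from the base object the helpers see to the driver's object. */
static struct drv_gem_object *to_drv_obj(struct base_gem_object *obj)
{
	return container_of(obj, struct drv_gem_object, base);
}

/* Driver hook: generic code hands the base pointer back to the driver. */
static void drv_gem_free(struct base_gem_object *obj)
{
	to_drv_obj(obj)->freed = 1;
}

/* Generic helper code: knows only about struct base_gem_object. */
static void base_gem_release(struct base_gem_object *obj)
{
	if (obj->free)
		obj->free(obj);
}
```

Helper code only ever handles `struct base_gem_object *`, so the driver can add private fields freely and reach them through `to_drv_obj()`.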

Did you just convert the shmem helpers over to doing alloc_coherent?  If
so, I'd be interested in picking them up for v3d, and that might help
get another patch out of your stack.

I'm particularly interested in the shmem helpers because I should start
doing dynamic binding in and out of the GPU's page table, to avoid
pinning so much memory all the time.
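Binding buffers in and out of the GPU's page table, instead of keeping everything pinned, can be illustrated with a toy model. The sketch below assumes a hypothetical page table with a fixed number of slots: a BO is pinned only while an in-flight job references it, and once the job completes its mapping stays cached but becomes evictable. It does not model shmem page allocation, TLB invalidation or locking, and all names (`GPU_PT_SLOTS`, `bo_bind()`, `bo_unbind()`) are illustrative, not v3d or lima code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: a tiny GPU page table with only a few mapping slots. */
#define GPU_PT_SLOTS 4

struct bo {
	int id;
	int pin_count; /* > 0 while an in-flight job uses this BO */
	int slot;      /* page-table slot index, or -1 when unbound */
};

struct gpu_pt {
	struct bo *slots[GPU_PT_SLOTS]; /* NULL means the slot is free */
};

/*
 * Bind a BO before a job runs. Reuse an existing mapping if there is one;
 * otherwise take a free slot, or evict a bound but idle BO (pin_count == 0).
 * Returns false if every slot is pinned by a running job.
 */
static bool bo_bind(struct gpu_pt *pt, struct bo *bo)
{
	int victim = -1;

	if (bo->slot >= 0) {
		bo->pin_count++;
		return true;
	}

	for (int i = 0; i < GPU_PT_SLOTS; i++) {
		if (pt->slots[i] == NULL) {
			victim = i; /* a free slot always wins */
			break;
		}
		if (victim < 0 && pt->slots[i]->pin_count == 0)
			victim = i; /* remember the first idle BO */
	}

	if (victim < 0)
		return false;

	if (pt->slots[victim] != NULL)
		pt->slots[victim]->slot = -1; /* unmap the evicted idle BO */

	pt->slots[victim] = bo;
	bo->slot = victim;
	bo->pin_count = 1;
	return true;
}

/*
 * Drop a job's pin once it completes. The mapping is left in place so a
 * later job can reuse it cheaply, but it may now be evicted.
 */
static void bo_unbind(struct bo *bo)
{
	if (bo->pin_count > 0)
		bo->pin_count--;
}
```

A job would call bo_bind() on each BO it references before submission and bo_unbind() on each after completion; bo_bind() returning false means every slot is held by a running job, so the submission has to wait.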