[PATCH RFC 00/24] Lima DRM driver

Wed May 23 14:27:12 UTC 2018

On Wed, May 23, 2018 at 10:19 PM, Christian König
<christian.koenig at amd.com> wrote:
> Am 23.05.2018 um 16:13 schrieb Qiang Yu:
>>
>> On Wed, May 23, 2018 at 9:59 PM, Christian König
>> <christian.koenig at amd.com> wrote:
>>>
>>> Am 23.05.2018 um 15:52 schrieb Qiang Yu:
>>>>
>>>> On Wed, May 23, 2018 at 5:29 PM, Christian König
>>>> <ckoenig.leichtzumerken at gmail.com> wrote:
>>>>>
>>>>> Am 18.05.2018 um 11:27 schrieb Qiang Yu:
>>>>>>
>>>>>> Kernel DRM driver for ARM Mali 400/450 GPUs.
>>>>>>
>>>>>> This implementation mainly takes the amdgpu DRM driver as reference.
>>>>>>
>>>>>> - Mali 4xx GPUs have two kinds of processors, GP and PP. GP is for
>>>>>>      OpenGL vertex shader processing and PP is for fragment shader
>>>>>>      processing. Each processor has its own MMU, so the processors
>>>>>>      work in virtual address space.
>>>>>> - There's only one GP but multiple PPs (max 4 for mali 400 and 8
>>>>>>      for mali 450) in the same mali 4xx GPU. All PPs are grouped
>>>>>>      together to handle a single fragment shader task divided by
>>>>>>      FB output tiled pixels. The mali 400 user space driver is
>>>>>>      responsible for assigning target tiled pixels to each PP, but
>>>>>>      mali 450 has a HW module called DLBU to dynamically balance
>>>>>>      each PP's load.
>>>>>> - The user space driver allocates buffer objects and maps them into
>>>>>>      GPU virtual address space, uploads the command stream and draw
>>>>>>      data through a CPU mmap of the buffer object, then submits the
>>>>>>      task to GP/PP with a register frame indicating where the command
>>>>>>      stream is, plus misc settings.
>>>>>> - There's no command stream validation/relocation because each user
>>>>>>      process has its own GPU virtual address space. The GP/PP MMU
>>>>>>      switches virtual address space between two tasks from different
>>>>>>      user processes. Erroneous or malicious user space code just gets
>>>>>>      an MMU fault or a GP/PP error IRQ, then the HW/SW is recovered.
>>>>>> - Use TTM as MM. TTM_PL_TT type memory is used as the content of
>>>>>>      a lima buffer object, which is allocated from the TTM page pool.
>>>>>>      All lima buffer objects get pinned with TTM_PL_FLAG_NO_EVICT at
>>>>>>      allocation, so there's no buffer eviction or swap for now. We
>>>>>>      need reverse engineering to see if and how GP/PP support MMU
>>>>>>      fault recovery (continue execution). Otherwise we have to
>>>>>>      pin/unpin each involved buffer at task creation/deletion.
>>>>>
>>>>>
>>>>> Well, pinning all memory is usually a no-go for upstreaming. But since
>>>>> you are already using the drm_sched for GPU task scheduling, why do you
>>>>> actually need this?
>>>>>
>>>>> The scheduler should take care of signaling all fences when the
>>>>> hardware is done with its magic, and that is enough for TTM to note
>>>>> that a buffer object is movable again (e.g. unpin them).
>>>>
>>>> Please correct me if I'm wrong.
>>>
>>>
>>> Well, you are wrong :)
>>>
>>>> One way to implement eviction/swap is like this:
>>>> call validation on each buffer involved in a task, but this won't
>>>> prevent it from eviction/swap while executing, so a GPU MMU fault may
>>>> happen and in the handler we need to recover the evicted/swapped buffer.
>>>>
>>>> Another way is to pin/unpin the buffers involved when a task is
>>>> created/freed.
>>>>
>>>> The first way is better when memory load is low and the second way is
>>>> better when memory load is high. The first way also needs less memory.
>>>>
>>>> So I'd prefer the first way, but because the GPU MMU fault HW op needs
>>>> reverse engineering, I have to pin all buffers for now. After the HW op
>>>> is clear, I can choose one way to implement.
>>>
>>>
>>> The general approach is:
>>> 1.) Lock all BOs
>>> 2.) Validate all BOs
>>> 3.) Add the fence
>>> 4.) Unlock the BOs
>>
>> This is the task prepare process, right?
>
>
> Yes.
>
>>> BOs can't be evicted while they are locked
>>
>> During the task prepare stage, they're locked, but after the task is
>> queued, they get unlocked and become evictable?
>
>
> Yes, the fence you added to the BO prevents TTM from evicting the BO until
> the fence has signaled.
>
>>
>>> and since you already add the
>>> fence that should be perfectly sufficient to prevent it from being evicted
>>> until your operation is completed.
>>
>> You mean I have to explicitly pin it with TTM_PL_FLAG_NO_EVICT
>> at task creation, or will TTM check the buffer's reservation object and
>> not evict it if it sees a fence?
>
>
> The second. You *don't* have to explicitly pin it with TTM_PL_FLAG_NO_EVICT
> as long as you always add the correct fence with your command submissions.
>
> When evicting something TTM will take a look at the fences assigned to the
> BO and either not evict it at all or wait for all fences to be completed
> before doing so.
>
> When you need to update some internal state or flush caches or stuff like
> that when a BO is evicted TTM also has callbacks for this.

Oh, thanks for clearing this up for me, it makes my life easier.
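
As a reference for the flow discussed above (lock, validate, add the fence,
unlock, with eviction gated on the fence), here is a minimal toy model in
Python. It is not kernel code: the names Fence, BufferObject, submit_task and
try_evict are hypothetical and are not TTM or drm_sched APIs. It also
simplifies eviction to a refusal while a fence is pending, whereas TTM may
instead wait for the fences to complete.

```python
class Fence:
    """Toy stand-in for a fence that signals when the GPU task is done."""

    def __init__(self):
        self.signaled = False

    def signal(self):
        self.signaled = True


class BufferObject:
    """Toy stand-in for a BO; 'resident' means it is backed and mapped."""

    def __init__(self, name):
        self.name = name
        self.locked = False
        self.resident = False
        self.fences = []

    def can_evict(self):
        # Not evictable while locked or while any attached fence is pending.
        return not self.locked and all(f.signaled for f in self.fences)


def submit_task(bos):
    """Task prepare: 1) lock, 2) validate, 3) add the fence, 4) unlock."""
    fence = Fence()
    for bo in bos:
        bo.locked = True             # 1) lock all BOs
    try:
        for bo in bos:
            bo.resident = True       # 2) validate all BOs
        for bo in bos:
            bo.fences.append(fence)  # 3) add the fence
    finally:
        for bo in bos:
            bo.locked = False        # 4) unlock the BOs
    return fence


def try_evict(bo):
    """Evict only if no lock is held and every fence has signaled."""
    if not bo.can_evict():
        return False
    bo.resident = False
    bo.fences.clear()
    return True
```

In this model a BO stays resident after submit_task() returns even though it
is unlocked, because the pending fence blocks try_evict(); once the fence is
signaled (the task has finished) the BO becomes evictable again, with no
explicit pin/unpin step.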

Regards,
Qiang

>
> Regards,
> Christian.
>
>
>>
>> Regards,
>> Qiang
>>
>>> Using the MMU is certainly better in general, but it is usually only
>>> optional and a pain in the ass to get working. We have had that in
>>> amdgpu for quite a while as well now and still don't use it because of
>>> that.
>>>
>>> Regards,
>>> Christian.
>>>
>>>
>>>> Regards,
>>>> Qiang
>>>>
>>>>> Christian.
>>>>>
>>>>>
>>>>>> - Use drm_sched for GPU task scheduling. Each OpenGL context should
>>>>>>      have a lima context object in the kernel to distinguish tasks
>>>>>>      from different users. drm_sched gets tasks from each lima context
>>>>>>      in a fair way.
>>>>>>
>>>>>> Not implemented:
>>>>>> - Dump buffer support
>>>>>> - Power management
>>>>>> - Performance counter
>>>>>>
>>>>>> This patch series just packs a pair of .c/.h files in each patch.
>>>>>> For the whole history of this driver's development, see:
>>>>>> https://github.com/yuq/linux-lima/commits/lima-4.17-rc4
>>>>>>
>>>>>> The Mesa driver is still in development and not ready for daily usage,
>>>>>> but can run some simple tests like kmscube and glmark2, see:
>>>>>> https://github.com/yuq/mesa-lima
>>>>>>
>>>>>> Andrei Paulau (1):
>>>>>>      arm64/dts: add switch-delay for meson mali
>>>>>>
>>>>>> Lima Project Developers (10):
>>>>>>      drm/lima: add mali 4xx GPU hardware regs
>>>>>>      drm/lima: add lima core driver
>>>>>>      drm/lima: add GPU device functions
>>>>>>      drm/lima: add PMU related functions
>>>>>>      drm/lima: add PP related functions
>>>>>>      drm/lima: add MMU related functions
>>>>>>      drm/lima: add GPU virtual memory space handing
>>>>>>      drm/lima: add GEM related functions
>>>>>>      drm/lima: add GEM Prime related functions
>>>>>>      drm/lima: add makefile and kconfig
>>>>>>
>>>>>> Qiang Yu (12):
>>>>>>      dt-bindings: add switch-delay property for mali-utgard
>>>>>>      arm64/dts: add switch-delay for meson mali
>>>>>>      Revert "drm: Nerf the preclose callback for modern drivers"
>>>>>>      drm/lima: add lima uapi header
>>>>>>      drm/lima: add L2 cache functions
>>>>>>      drm/lima: add GP related functions
>>>>>>      drm/lima: add BCAST related function
>>>>>>      drm/lima: add DLBU related functions
>>>>>>      drm/lima: add TTM subsystem functions
>>>>>>      drm/lima: add buffer object functions
>>>>>>      drm/lima: add GPU schedule using DRM_SCHED
>>>>>>      drm/lima: add context related functions
>>>>>>
>>>>>> Simon Shields (1):
>>>>>>      ARM: dts: add gpu node to exynos4
>>>>>>
>>>>>>     .../bindings/gpu/arm,mali-utgard.txt          |   4 +
>>>>>>     arch/arm/boot/dts/exynos4.dtsi                |  33 ++
>>>>>>     arch/arm64/boot/dts/amlogic/meson-gxbb.dtsi   |   1 +
>>>>>>     .../boot/dts/amlogic/meson-gxl-mali.dtsi      |   1 +
>>>>>>     drivers/gpu/drm/Kconfig                       |   2 +
>>>>>>     drivers/gpu/drm/Makefile                      |   1 +
>>>>>>     drivers/gpu/drm/drm_file.c                    |   8 +-
>>>>>>     drivers/gpu/drm/lima/Kconfig                  |   9 +
>>>>>>     drivers/gpu/drm/lima/Makefile                 |  19 +
>>>>>>     drivers/gpu/drm/lima/lima_bcast.c             |  65 +++
>>>>>>     drivers/gpu/drm/lima/lima_bcast.h             |  34 ++
>>>>>>     drivers/gpu/drm/lima/lima_ctx.c               | 143 +++++
>>>>>>     drivers/gpu/drm/lima/lima_ctx.h               |  51 ++
>>>>>>     drivers/gpu/drm/lima/lima_device.c            | 407 ++++++++++++++
>>>>>>     drivers/gpu/drm/lima/lima_device.h            | 136 +++++
>>>>>>     drivers/gpu/drm/lima/lima_dlbu.c              |  75 +++
>>>>>>     drivers/gpu/drm/lima/lima_dlbu.h              |  37 ++
>>>>>>     drivers/gpu/drm/lima/lima_drv.c               | 466 ++++++++++++++++
>>>>>>     drivers/gpu/drm/lima/lima_drv.h               |  77 +++
>>>>>>     drivers/gpu/drm/lima/lima_gem.c               | 459 ++++++++++++++++
>>>>>>     drivers/gpu/drm/lima/lima_gem.h               |  41 ++
>>>>>>     drivers/gpu/drm/lima/lima_gem_prime.c         |  66 +++
>>>>>>     drivers/gpu/drm/lima/lima_gem_prime.h         |  31 ++
>>>>>>     drivers/gpu/drm/lima/lima_gp.c                | 293 +++++++++++
>>>>>>     drivers/gpu/drm/lima/lima_gp.h                |  34 ++
>>>>>>     drivers/gpu/drm/lima/lima_l2_cache.c          |  98 ++++
>>>>>>     drivers/gpu/drm/lima/lima_l2_cache.h          |  32 ++
>>>>>>     drivers/gpu/drm/lima/lima_mmu.c               | 154 ++++++
>>>>>>     drivers/gpu/drm/lima/lima_mmu.h               |  34 ++
>>>>>>     drivers/gpu/drm/lima/lima_object.c            | 120 +++++
>>>>>>     drivers/gpu/drm/lima/lima_object.h            |  87 +++
>>>>>>     drivers/gpu/drm/lima/lima_pmu.c               |  85 +++
>>>>>>     drivers/gpu/drm/lima/lima_pmu.h               |  30 ++
>>>>>>     drivers/gpu/drm/lima/lima_pp.c                | 418 +++++++++++++++
>>>>>>     drivers/gpu/drm/lima/lima_pp.h                |  37 ++
>>>>>>     drivers/gpu/drm/lima/lima_regs.h              | 304 +++++++++++
>>>>>>     drivers/gpu/drm/lima/lima_sched.c             | 497 ++++++++++++++++++
>>>>>>     drivers/gpu/drm/lima/lima_sched.h             | 126 +++++
>>>>>>     drivers/gpu/drm/lima/lima_ttm.c               | 409 ++++++++++++++
>>>>>>     drivers/gpu/drm/lima/lima_ttm.h               |  44 ++
>>>>>>     drivers/gpu/drm/lima/lima_vm.c                | 312 +++++++++++
>>>>>>     drivers/gpu/drm/lima/lima_vm.h                |  73 +++
>>>>>>     include/drm/drm_drv.h                         |  23 +-
>>>>>>     include/uapi/drm/lima_drm.h                   | 195 +++++++
>>>>>>     44 files changed, 5565 insertions(+), 6 deletions(-)
>>>>>>     create mode 100644 drivers/gpu/drm/lima/Kconfig
>>>>>>     create mode 100644 drivers/gpu/drm/lima/Makefile
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_bcast.c
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_bcast.h
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_ctx.c
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_ctx.h
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_device.c
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_device.h
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_dlbu.c
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_dlbu.h
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_drv.c
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_drv.h
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_gem.c
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_gem.h
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_gem_prime.c
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_gem_prime.h
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_gp.c
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_gp.h
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_l2_cache.c
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_l2_cache.h
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_mmu.c
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_mmu.h
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_object.c
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_object.h
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_pmu.c
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_pmu.h
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_pp.c
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_pp.h
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_regs.h
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_sched.c
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_sched.h
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_ttm.c
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_ttm.h
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_vm.c
>>>>>>     create mode 100644 drivers/gpu/drm/lima/lima_vm.h
>>>>>>     create mode 100644 include/uapi/drm/lima_drm.h
>>>>>>
>