[PATCH RFC 00/24] Lima DRM driver
Qiang Yu
yuq825 at gmail.com
Wed May 23 14:27:12 UTC 2018
On Wed, May 23, 2018 at 10:19 PM, Christian König
<christian.koenig at amd.com> wrote:
> On 23.05.2018 at 16:13, Qiang Yu wrote:
>>
>> On Wed, May 23, 2018 at 9:59 PM, Christian König
>> <christian.koenig at amd.com> wrote:
>>>
>>> On 23.05.2018 at 15:52, Qiang Yu wrote:
>>>>
>>>> On Wed, May 23, 2018 at 5:29 PM, Christian König
>>>> <ckoenig.leichtzumerken at gmail.com> wrote:
>>>>>
>>>>> On 18.05.2018 at 11:27, Qiang Yu wrote:
>>>>>>
>>>>>> Kernel DRM driver for ARM Mali 400/450 GPUs.
>>>>>>
>>>>>> This implementation mainly takes the amdgpu DRM driver as a reference.
>>>>>>
>>>>>> - Mali 4xx GPUs have two kinds of processors: GP and PP. GP is for
>>>>>> OpenGL vertex shader processing and PP is for fragment shader
>>>>>> processing. Each processor has its own MMU, so processors work in
>>>>>> virtual address space.
>>>>>> - There's only one GP but multiple PPs (max 4 for Mali 400 and 8
>>>>>> for Mali 450) in the same Mali 4xx GPU. All PPs are grouped
>>>>>> together to handle a single fragment shader task divided by
>>>>>> FB output tiled pixels. The Mali 400 user space driver is
>>>>>> responsible for assigning target tiled pixels to each PP, but Mali
>>>>>> 450 has a HW module called DLBU to dynamically balance each
>>>>>> PP's load.
>>>>>> - The user space driver allocates buffer objects and maps them into
>>>>>> the GPU virtual address space, uploads the command stream and draw
>>>>>> data through a CPU mmap of the buffer object, then submits the task
>>>>>> to GP/PP with a register frame indicating where the command stream
>>>>>> is, plus misc settings.
>>>>>> - There's no command stream validation/relocation because each user
>>>>>> process has its own GPU virtual address space. The GP/PP MMU switches
>>>>>> virtual address space before running two tasks from different
>>>>>> user processes. Erroneous or malicious user space code just gets an
>>>>>> MMU fault or a GP/PP error IRQ, after which the HW/SW is recovered.
>>>>>> - Use TTM as the MM. TTM_PL_TT type memory is used as the content of
>>>>>> a lima buffer object and is allocated from the TTM page pool. All
>>>>>> lima buffer objects get pinned with TTM_PL_FLAG_NO_EVICT at
>>>>>> allocation, so there's no buffer eviction or swap for now. We
>>>>>> need reverse engineering to see if and how GP/PP support MMU
>>>>>> fault recovery (continuing execution). Otherwise we have to
>>>>>> pin/unpin each involved buffer at task creation/deletion.
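The per-process address space point in the cover letter can be illustrated with a small model. This is a hypothetical userspace sketch, not lima code: `gpu_vm`, `task_model`, `mmu_model`, `mmu_load_page_directory()` and `mmu_prepare_task()` are made-up names, and the real page-directory register write and TLB flush are left out. It only shows the decision to reprogram a processor's MMU when the next task comes from a different process than the previous one, which is what lets each task see only its own process's mappings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one process's GPU address space. */
struct gpu_vm {
	uint32_t pd_base;                /* page directory address (made-up value) */
};

/* Hypothetical task: only records which address space it runs in. */
struct task_model {
	const struct gpu_vm *vm;
};

/* Hypothetical per-processor (GP or PP) MMU state. */
struct mmu_model {
	const struct gpu_vm *current_vm; /* address space currently loaded */
	unsigned int switches;           /* number of times it was reprogrammed */
};

/* Stand-in for programming the page directory and flushing the TLB;
 * the real register sequence is hardware specific and not modeled. */
static void mmu_load_page_directory(struct mmu_model *mmu,
				    const struct gpu_vm *vm)
{
	mmu->current_vm = vm;
	mmu->switches++;
}

/* Called before a task is handed to GP/PP: switch address spaces only
 * when the task belongs to a different process than the previous one. */
static void mmu_prepare_task(struct mmu_model *mmu,
			     const struct task_model *task)
{
	assert(task->vm != NULL);        /* every task must have an address space */
	if (mmu->current_vm != task->vm)
		mmu_load_page_directory(mmu, task->vm);
}
```

For example, two back-to-back tasks from the same process cause one switch, and a following task from another process causes a second one.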
>>>>>
>>>>>
>>>>> Well, pinning all memory is usually a no-go for upstreaming. But since
>>>>> you are already using the drm_sched for GPU task scheduling, why do you
>>>>> actually need this?
>>>>>
>>>>> The scheduler should take care of signaling all fences when the
>>>>> hardware is done with its magic, and that is enough for TTM to note
>>>>> that a buffer object is movable again (i.e. effectively unpinned).
>>>>
>>>> Please correct me if I'm wrong.
>>>
>>>
>>> Well, you are wrong :)
>>>
>>>> One way to implement eviction/swap is like this:
>>>> call validation on each buffer involved in a task, but this won't
>>>> prevent it from being evicted/swapped while executing, so a GPU MMU
>>>> fault may happen, and in the handler we need to recover the
>>>> evicted/swapped buffer.
>>>>
>>>> Another way is to pin/unpin the involved buffers at task creation/free.
>>>>
>>>> The first way is better when memory load is low and the second way is
>>>> better when memory load is high. The first way also needs less memory.
>>>>
>>>> So I'd prefer the first way, but because the GPU MMU fault HW op
>>>> still needs reverse engineering, I have to pin all buffers for now.
>>>> Once the HW op is clear, I can choose one way to implement.
>>>
>>>
>>> The general approach is:
>>> 1.) Lock all BOs
>>> 2.) Validate all BOs
>>> 3.) Add the fence
>>> 4.) Unlock the BOs
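Christian's four steps can be modeled in a few lines. This is a simplified, hypothetical sketch: `bo_model`, `fence_model`, `submit_job()` and the other helpers are made-up names, and a plain trylock flag stands in for TTM's reservation locking. In the kernel this sequence is usually built on TTM's execbuf helpers such as `ttm_eu_reserve_buffers()` and `ttm_eu_fence_buffer_objects()`, which also handle the ww_mutex deadlock backoff that is only hinted at here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified models; not TTM's real structures. */
struct fence_model {
	bool signaled;
};

struct bo_model {
	bool locked;                /* stands in for the reservation lock */
	bool resident;              /* true once validated into a GPU-usable placement */
	struct fence_model *fence;  /* last fence added to the BO */
};

static bool bo_trylock(struct bo_model *bo)
{
	if (bo->locked)
		return false;
	bo->locked = true;
	return true;
}

static void bo_unlock(struct bo_model *bo)
{
	bo->locked = false;
}

/* Validation may move the BO back in; here it just marks it resident. */
static void bo_validate(struct bo_model *bo)
{
	assert(bo->locked);         /* placement may only change under the lock */
	bo->resident = true;
}

/* Lock all BOs, validate them, attach the job's fence, unlock.
 * Returns false (leaving every BO untouched) if any BO is already locked. */
static bool submit_job(struct bo_model **bos, size_t n,
		       struct fence_model *job_fence)
{
	size_t i;

	for (i = 0; i < n; i++) {            /* 1.) lock all BOs */
		if (!bo_trylock(bos[i])) {
			while (i--)              /* back off: drop locks taken so far */
				bo_unlock(bos[i]);
			return false;
		}
	}
	for (i = 0; i < n; i++)              /* 2.) validate all BOs */
		bo_validate(bos[i]);
	/* ...the job would be pushed to the scheduler here... */
	for (i = 0; i < n; i++)              /* 3.) add the fence */
		bos[i]->fence = job_fence;
	for (i = 0; i < n; i++)              /* 4.) unlock the BOs */
		bo_unlock(bos[i]);
	return true;
}
```

After `submit_job()` returns, every BO is unlocked again but carries the job's fence; in this model, the fence rather than a pin is what marks the BO as still in use.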
>>
>> This is the task prepare process, right?
>
>
> Yes.
>
>>> BOs can't be evicted while they are locked
>>
>> During the task prepare stage they're locked, but after the task is queued
>> they get unlocked and become evictable?
>
>
> Yes, the fence you added to the BO prevents TTM from evicting the BO until
> the fence is signaled.
>
>>
>>> and since you already add the fence, that should be perfectly
>>> sufficient to prevent it from being evicted until your operation is
>>> completed.
>>
>> You mean I have to explicitly pin it with TTM_PL_FLAG_NO_EVICT
>> at task creation, or TTM will check the buffer's reservation object and
>> won't evict it if it sees a fence?
>
>
> The second. You *don't* have to explicitly pin it with TTM_PL_FLAG_NO_EVICT
> as long as you always add the correct fence with your command submissions.
>
> When evicting something, TTM will take a look at the fences assigned to the
> BO and either not evict it at all or wait for all fences to be completed
> before doing so.
>
> When you need to update some internal state or flush caches or the like
> when a BO is evicted, TTM also has callbacks for this.
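The eviction side described above can be sketched the same way. Again this is a hypothetical model with made-up names (`bo_model`, `fence_model`, `evict_decide()`, `EVICT_*`), not TTM's eviction code: a BO whose lock is held is skipped, a BO with an unsignaled fence is either skipped or waited on depending on whether the caller may block, and an idle BO can be moved immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified models; not TTM's real structures. */
struct fence_model {
	bool signaled;
};

struct bo_model {
	bool locked;                /* stands in for the reservation lock */
	struct fence_model *fence;  /* last fence added to the BO, may be NULL */
};

enum evict_action {
	EVICT_SKIP,  /* leave this BO alone and try another one */
	EVICT_WAIT,  /* wait for the fence to signal, then evict */
	EVICT_NOW,   /* unlocked and idle: safe to move right away */
};

/* Decide what an eviction pass may do with one BO. `no_wait` models a
 * caller that is not allowed to block on the GPU. */
static enum evict_action evict_decide(const struct bo_model *bo, bool no_wait)
{
	assert(bo != NULL);
	if (bo->locked)                         /* a submission is preparing it */
		return EVICT_SKIP;
	if (bo->fence && !bo->fence->signaled)  /* GPU may still be using it */
		return no_wait ? EVICT_SKIP : EVICT_WAIT;
	return EVICT_NOW;
}
```

With a fenced but unlocked BO, a non-blocking caller moves on to another candidate, while a blocking caller waits; once the fence signals, the same BO becomes evictable without ever having been pinned.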
Oh, thanks for clearing this up for me, it makes my life easier.
Regards,
Qiang
>
> Regards,
> Christian.
>
>
>>
>> Regards,
>> Qiang
>>
>>> Using the MMU is certainly better in general, but usually only
>>> optional and a pain in the ass to get working. We have had that in
>>> amdgpu for quite a while now as well and still don't use it because
>>> of that.
>>>
>>> Regards,
>>> Christian.
>>>
>>>
>>>> Regards,
>>>> Qiang
>>>>
>>>>> Christian.
>>>>>
>>>>>
>>>>>> - Use drm_sched for GPU task scheduling. Each OpenGL context should
>>>>>> have a lima context object in the kernel to distinguish tasks
>>>>>> from different users. drm_sched picks tasks from each lima context
>>>>>> in a fair way.
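A minimal sketch of "fair" selection between contexts, assuming plain round-robin: contexts are polled in turn starting after the one served last, so a context with a long queue cannot starve the others. `ctx_queue`, `rr_sched`, `rr_pick()` and `MAX_CTX` are hypothetical names; drm_sched's real entity selection also deals with priorities and job dependencies, which are not modeled here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CTX 8   /* arbitrary limit for this sketch */

/* Hypothetical per-context queue: only the number of pending tasks matters. */
struct ctx_queue {
	unsigned int pending;
};

struct rr_sched {
	struct ctx_queue *ctx[MAX_CTX];
	size_t nr_ctx;
	size_t last;    /* index of the context served most recently */
};

/* Pick the next context with work, starting after the one served last,
 * and consume one of its tasks. Returns the context index, or -1 if
 * every context is idle. */
static int rr_pick(struct rr_sched *s)
{
	size_t i;

	assert(s->nr_ctx <= MAX_CTX);
	for (i = 1; i <= s->nr_ctx; i++) {
		size_t idx = (s->last + i) % s->nr_ctx;

		if (s->ctx[idx]->pending) {
			s->ctx[idx]->pending--;
			s->last = idx;
			return (int)idx;
		}
	}
	return -1;
}
```

With one context holding three tasks and another holding one, the picks alternate while both have work, and the busy context is served back to back only once the other is empty.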
>>>>>>
>>>>>> Not implemented:
>>>>>> - Dump buffer support
>>>>>> - Power management
>>>>>> - Performance counter
>>>>>>
>>>>>> This patch series just packs a pair of .c/.h files into each patch.
>>>>>> For whole history of this driver's development, see:
>>>>>> https://github.com/yuq/linux-lima/commits/lima-4.17-rc4
>>>>>>
>>>>>> The Mesa driver is still in development and not ready for daily use,
>>>>>> but can run some simple tests like kmscube and glmark2, see:
>>>>>> https://github.com/yuq/mesa-lima
>>>>>>
>>>>>> Andrei Paulau (1):
>>>>>> arm64/dts: add switch-delay for meson mali
>>>>>>
>>>>>> Lima Project Developers (10):
>>>>>> drm/lima: add mali 4xx GPU hardware regs
>>>>>> drm/lima: add lima core driver
>>>>>> drm/lima: add GPU device functions
>>>>>> drm/lima: add PMU related functions
>>>>>> drm/lima: add PP related functions
>>>>>> drm/lima: add MMU related functions
>>>>>> drm/lima: add GPU virtual memory space handing
>>>>>> drm/lima: add GEM related functions
>>>>>> drm/lima: add GEM Prime related functions
>>>>>> drm/lima: add makefile and kconfig
>>>>>>
>>>>>> Qiang Yu (12):
>>>>>> dt-bindings: add switch-delay property for mali-utgard
>>>>>> arm64/dts: add switch-delay for meson mali
>>>>>> Revert "drm: Nerf the preclose callback for modern drivers"
>>>>>> drm/lima: add lima uapi header
>>>>>> drm/lima: add L2 cache functions
>>>>>> drm/lima: add GP related functions
>>>>>> drm/lima: add BCAST related function
>>>>>> drm/lima: add DLBU related functions
>>>>>> drm/lima: add TTM subsystem functions
>>>>>> drm/lima: add buffer object functions
>>>>>> drm/lima: add GPU schedule using DRM_SCHED
>>>>>> drm/lima: add context related functions
>>>>>>
>>>>>> Simon Shields (1):
>>>>>> ARM: dts: add gpu node to exynos4
>>>>>>
>>>>>> .../bindings/gpu/arm,mali-utgard.txt | 4 +
>>>>>> arch/arm/boot/dts/exynos4.dtsi | 33 ++
>>>>>> arch/arm64/boot/dts/amlogic/meson-gxbb.dtsi | 1 +
>>>>>> .../boot/dts/amlogic/meson-gxl-mali.dtsi | 1 +
>>>>>> drivers/gpu/drm/Kconfig | 2 +
>>>>>> drivers/gpu/drm/Makefile | 1 +
>>>>>> drivers/gpu/drm/drm_file.c | 8 +-
>>>>>> drivers/gpu/drm/lima/Kconfig | 9 +
>>>>>> drivers/gpu/drm/lima/Makefile | 19 +
>>>>>> drivers/gpu/drm/lima/lima_bcast.c | 65 +++
>>>>>> drivers/gpu/drm/lima/lima_bcast.h | 34 ++
>>>>>> drivers/gpu/drm/lima/lima_ctx.c | 143 +++++
>>>>>> drivers/gpu/drm/lima/lima_ctx.h | 51 ++
>>>>>> drivers/gpu/drm/lima/lima_device.c | 407 ++++++++++++++
>>>>>> drivers/gpu/drm/lima/lima_device.h | 136 +++++
>>>>>> drivers/gpu/drm/lima/lima_dlbu.c | 75 +++
>>>>>> drivers/gpu/drm/lima/lima_dlbu.h | 37 ++
>>>>>> drivers/gpu/drm/lima/lima_drv.c | 466 ++++++++++++++++
>>>>>> drivers/gpu/drm/lima/lima_drv.h | 77 +++
>>>>>> drivers/gpu/drm/lima/lima_gem.c | 459 ++++++++++++++++
>>>>>> drivers/gpu/drm/lima/lima_gem.h | 41 ++
>>>>>> drivers/gpu/drm/lima/lima_gem_prime.c | 66 +++
>>>>>> drivers/gpu/drm/lima/lima_gem_prime.h | 31 ++
>>>>>> drivers/gpu/drm/lima/lima_gp.c | 293 +++++++++++
>>>>>> drivers/gpu/drm/lima/lima_gp.h | 34 ++
>>>>>> drivers/gpu/drm/lima/lima_l2_cache.c | 98 ++++
>>>>>> drivers/gpu/drm/lima/lima_l2_cache.h | 32 ++
>>>>>> drivers/gpu/drm/lima/lima_mmu.c | 154 ++++++
>>>>>> drivers/gpu/drm/lima/lima_mmu.h | 34 ++
>>>>>> drivers/gpu/drm/lima/lima_object.c | 120 +++++
>>>>>> drivers/gpu/drm/lima/lima_object.h | 87 +++
>>>>>> drivers/gpu/drm/lima/lima_pmu.c | 85 +++
>>>>>> drivers/gpu/drm/lima/lima_pmu.h | 30 ++
>>>>>> drivers/gpu/drm/lima/lima_pp.c | 418 +++++++++++++++
>>>>>> drivers/gpu/drm/lima/lima_pp.h | 37 ++
>>>>>> drivers/gpu/drm/lima/lima_regs.h | 304 +++++++++++
>>>>>> drivers/gpu/drm/lima/lima_sched.c | 497 ++++++++++++++++++
>>>>>> drivers/gpu/drm/lima/lima_sched.h | 126 +++++
>>>>>> drivers/gpu/drm/lima/lima_ttm.c | 409 ++++++++++++++
>>>>>> drivers/gpu/drm/lima/lima_ttm.h | 44 ++
>>>>>> drivers/gpu/drm/lima/lima_vm.c | 312 +++++++++++
>>>>>> drivers/gpu/drm/lima/lima_vm.h | 73 +++
>>>>>> include/drm/drm_drv.h | 23 +-
>>>>>> include/uapi/drm/lima_drm.h | 195 +++++++
>>>>>> 44 files changed, 5565 insertions(+), 6 deletions(-)
>>>>>> create mode 100644 drivers/gpu/drm/lima/Kconfig
>>>>>> create mode 100644 drivers/gpu/drm/lima/Makefile
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_bcast.c
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_bcast.h
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_ctx.c
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_ctx.h
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_device.c
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_device.h
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_dlbu.c
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_dlbu.h
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_drv.c
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_drv.h
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_gem.c
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_gem.h
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_gem_prime.c
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_gem_prime.h
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_gp.c
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_gp.h
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_l2_cache.c
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_l2_cache.h
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_mmu.c
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_mmu.h
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_object.c
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_object.h
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_pmu.c
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_pmu.h
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_pp.c
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_pp.h
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_regs.h
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_sched.c
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_sched.h
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_ttm.c
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_ttm.h
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_vm.c
>>>>>> create mode 100644 drivers/gpu/drm/lima/lima_vm.h
>>>>>> create mode 100644 include/uapi/drm/lima_drm.h
>>>>>>
>
More information about the dri-devel mailing list