[PATCH 0/7] Add a DRM driver to support AI Processing Unit (APU)

Fri May 26 15:45:47 UTC 2023


On 5/24/23 12:40, Daniel Vetter wrote:
> On Wed, May 24, 2023 at 01:27:00PM +0300, Oded Gabbay wrote:
>> On Wed, May 24, 2023 at 2:34 AM Kevin Hilman <khilman at baylibre.com> wrote:
>>>
>>> Jeffrey Hugo <quic_jhugo at quicinc.com> writes:
>>>
>>>> On 5/17/2023 8:52 AM, Alexandre Bailon wrote:
>>>>> This adds a DRM driver that implements communication between the CPU and an
>>>>> APU. The driver target embedded device that usually run inference using some
>>>>> prebuilt models. The goal is to provide common infrastructure that could be
>>>>> re-used to support many accelerators. Both kernel, userspace and firmware tries
>>>>> to use standard and existing to leverage the development and maintenance effort.
>>>>> The series implements two platform drivers, one for simulation and another one for
>>>>> the mt8183 (compatible with mt8365).
>>>>
>>>> This looks like the 3 existing Accel drivers.  Why is this in DRM?
>>>
>>> Yes, this belongs in accel.  I think Alex had some issues around the
>>> infra in accel with device nodes not appearing/opening properly, but
>>> I'll let him comment there.  But either way, the right approach should
>>> be to fix any issues in accel and move it there.
>>>
>>> [...]
>>>
>>>>>    .../devicetree/bindings/gpu/mtk,apu-drm.yaml  |  38 ++
>>>>>    drivers/gpu/drm/Kconfig                       |   2 +
>>>>>    drivers/gpu/drm/Makefile                      |   1 +
>>>>>    drivers/gpu/drm/apu/Kconfig                   |  22 +
>>>>>    drivers/gpu/drm/apu/Makefile                  |  10 +
>>>>>    drivers/gpu/drm/apu/apu_drv.c                 | 282 +++++++++
>>>>>    drivers/gpu/drm/apu/apu_gem.c                 | 230 +++++++
>>>>>    drivers/gpu/drm/apu/apu_internal.h            | 205 ++++++
>>>>>    drivers/gpu/drm/apu/apu_sched.c               | 592 ++++++++++++++++++
>>>>>    drivers/gpu/drm/apu/simu_apu.c                | 313 +++++++++
>>>>>    include/uapi/drm/apu_drm.h                    |  81 +++
>>>>
>>>> "apu" seems too generic.  We already have 3 "AI processing units" over
>>>> in drivers/accel already...
>>>
>>> Indeed, it is generic, but that's kind of the point for this driver
>>> since it's targetted at generalizing the interface with "AI processing
>>> units" on a growing number of embedded SoCs (ARM, RISC-V, etc.)  In
>>> addition, the generic naming is intentional because the goal is bigger
>>> than the kernel and is working towards a generic, shared "libAPU"
>>> userspace[1], but also common firmware for DSP-style inference engines
>>> (e.g. analgous Sound Open Firmware for audio DSPs.)
>>>
>>> As usual, the various SoC vendors use different names (APU, NPU, NN
>>> unit, etc.)  but we'd like a generic name for the class of devices
>>> targetted by this driver.  And unfortunately, it looks like the equally
>>> generic "Versatile processing unit" is already taken Intel's
>>> drivers/accel/ivpu. :)
>>>
>>> Maybe since this is more about generalizing the interface between the
>>> CPU running linux and the APU, what about the name apu_if?  But I guess
>>> that applies to the other 3 drivers in drivers/accell also.  Hmmm...
>>>
>>> Naming things is hard[2], so we're definitly open to other ideas.  Any
>>> suggestions?
>> Maybe model it according to the tiny driver in drm display ? You can
>> then call it tiny_apu :-)
>> Disclosure: It was Daniel's suggestion, he can chime in with more
>> details on the tiny driver concept.
> 
> Yeah so maybe a bit more detail on my thoughts:
> 
> First this smells like a need bypass of the entire "we want open userspace
> for accel drivers" rule. The rule isn't quite a strict as for drm gpu
> drivers (not sure we ended up documenting exactly what, but iirc the
> consensus was that for build-time only dependencies we're ok with
> downstream compilers), but it's still there.
What is letting you think that we want to bypass open source requirements ?
Although the neural network firmware and userspace application are not yet
opensource, our intention is to develop a full open source stack.
Currently, we only support Mediatek APU (an Xtensa VP6) and we have to 
use closed source sotfware to execute inferences on the accelerator.
As far I know, there software stack similar to mesa where we could add
support of a new accelerator (this is also true for firmware).
That is actually what we would like to do. But this will take a lot of 
time and we consider this driver as a first (small) step.
> 
> And at least from a quick look apu.ko and libapu just look like a generic
> accel interface, and that's not enough.
> 
> For the big training engines it's more or less "enough to run pytorch, but
> it can be really slow", not sure what the right standard for these
> inference-only drivers should be.
To be honest, I don't know what would be required for training engines.
We only target accelerators for embedded device that usually only run 
inferences. In my opinion, this is 2 different use cases and I don't 
think we could address them in the same way.
> 
> So that's the first reason why I don't like this.
> 
> The other is that I think if we do end up with a pile of tiny accel
> drivers, we should probably look into something like simmpledrm for the
> tiny display drivers. Probably still IP specific ioctls (at least most) so
> that IP specific job knows and all that are easy, but then just pass to a
> framework that simplifies a drm gem driver to "write ptes" and "run job"
> callback, maybe with an optional "create/destroy vm/ctx" for hw which can
> do that.
> 
> So maybe we end up with a drivers/accel/tiny and a bunch more helpers
> around the existing gem ones. The rule we have for drm/tiny is "1 file,
> less than 1kloc", and there's a bunch of them. I do think we can achieve
> the same for tiny accel inference engines (but it's still a bit a road).
> Maybe tiny accel is more like "less than 5kloc" since you need a bit more
> glue for the driver specific ioctl stuff - maybe that's only needed for
> the submit ioctl, maybe also for buffer map/unmap and creation.
This makes sense to me.
> 
> Also note that there's an entire pile of in-flight work for adding new
> helpers to the gem world to make this all easier. Once we have gpuva and
> exec helpers there not much glue left to tie it all together with the
> scheduler.
I wrote this series a long time ago and just rebased it recently.
I will take some time to see the in-flight work and see if that 
something I could start using.
> 
> But the real crux is that an accel inference driver really needs to have
> enough userspace to do an actual inference job with some
> android/cros/whatever framework for inference (there's just too many).
We are currently stuck with closed source fimrware, userspace 
applications and toolchains (works with android and linux).
We are looking for a solution but implementing something will take some 
time.

Alexandre
> -Daniel
> 
>> Oded
>>
>>>
>>> Kevin
>>>
>>> [1] https://gitlab.baylibre.com/baylibre/libapu/libapu
>>>
>>> [2]
>>> "There are 2 hard problems in computer science: cache invalidation,
>>>   naming things and off-by-1 errors."
>>>   -- https://twitter.com/secretGeek/status/7269997868
>>>
>