About upstreaming ArmChina NPU driver

Oded Gabbay oded.gabbay at gmail.com
Wed Apr 3 06:25:47 UTC 2024


On Thu, Mar 28, 2024 at 10:01 AM Dejia Shang <Dejia.Shang at armchina.com> wrote:
>
> Dear Kernel Maintainers,
>
> I am a driver developer and would like to upstream the ArmChina Zhouyi NPU driver ("Zhouyi" is the brand) to the accel subsystem.
>
> The driver is already open source (both UMD and KMD), and the code is available at https://github.com/Arm-China/Compass_NPU_Driver.git.
>
> This driver is responsible for scheduling AI inference tasks to the NPU cores (V1/V2/V3). Specifically, a simplified end-to-end flow is:
>
>         1. A TFLite/ONNX model is transformed to an executable binary file in ELF format by the NN graph compiler (designed by ArmChina)
>         2. An application loads the executable binary file to UMD and provides the input data.
>         3. UMD parses the binary and sends ioctls to KMD (open device, do memory allocation/mmap/free, submit the job descriptor).
>         4. KMD dispatches the job to NPU h/w, handles interrupts and updates the execution status.
>         5. UMD polls the status of the pre-scheduled job.
>         6. The application gets the output results.
>
> So...for the upstreaming,
>
> Q1: do you think our NPU driver is suitable for accel? If the answer is yes, which tree & branch should the patches be based on?
Hi Dejia,
Yes, it definitely sounds like a good fit for the accel subsystem.
Please base your patches on "drm-misc-next" branch in drm-misc repo:
https://anongit.freedesktop.org/git/drm/drm-misc.git

>
> Q2: in thread https://lore.kernel.org/lkml/ec547d33-214f-4952-aa33-c271e9edad63@kernel.org/ showing a similar case, Oded mentioned that:
>
>         "If we would have upstreamed a new driver, the expectation would have been that we would use some drm mechanisms.", and
>         "the minimal requirement is to use GEM/BOs for memory management operations".
>
> I guess those requirements are also applicable for the Zhouyi NPU KMD? Currently, the memory management (MM) in KMD is based on dma-mapping APIs, which handles both reserved CMA region(s) and SMMU mapped buffers, and supports the dma-buf framework. Maybe I should replace the implementations with DRM APIs.
Yes, those requirements definitely apply here.
>
> Q3: if you have looked at the KMD code, do you think I should make any other major change before submitting the first patch series? Thank you!
I took a quick glance. In general, it seems to be ok, but I noticed
two things related to the integration with drm/accel:

1. You use a scheduler for the job submission, which provides the
ability to defer jobs. In that case, I suggest checking whether you
can use drm_sched instead of your own implementation. No point in
re-inventing the wheel.
2. You provide several memory zones for memory allocation. Here I
would suggest looking at TTM as the memory manager instead of
implementing your own.
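
To illustrate what "the ability to defer jobs" in point 1 amounts to,
here is a minimal userspace sketch in C. It is a toy model only: it is
not the drm_sched API and not the Zhouyi KMD code, and every name in it
(toy_job, toy_sched, toy_sched_submit, toy_sched_run_next,
toy_sched_complete) is hypothetical. It assumes a single hardware slot
and at most one dependency per job; a real driver would express
dependencies as fences rather than as plain pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all names here are hypothetical, not kernel APIs. */
enum toy_job_state { TOY_JOB_QUEUED, TOY_JOB_RUNNING, TOY_JOB_DONE };

struct toy_job {
	int id;
	enum toy_job_state state;
	const struct toy_job *dep;	/* must finish first, or NULL */
};

#define TOY_MAX_JOBS 8

struct toy_sched {
	struct toy_job *queue[TOY_MAX_JOBS];	/* submission order */
	size_t count;
	struct toy_job *running;		/* single hardware slot */
};

/* Queue a job; returns false when the queue is full. */
static bool toy_sched_submit(struct toy_sched *s, struct toy_job *job)
{
	assert(s != NULL && job != NULL);
	if (s->count == TOY_MAX_JOBS)
		return false;
	job->state = TOY_JOB_QUEUED;
	s->queue[s->count++] = job;
	return true;
}

/* A job is ready once it is queued and its dependency (if any) is done. */
static bool toy_job_ready(const struct toy_job *job)
{
	return job->state == TOY_JOB_QUEUED &&
	       (job->dep == NULL || job->dep->state == TOY_JOB_DONE);
}

/*
 * Start the first ready job in submission order. Returns the started
 * job, or NULL if the slot is busy or every queued job is deferred.
 */
static struct toy_job *toy_sched_run_next(struct toy_sched *s)
{
	if (s->running != NULL)
		return NULL;
	for (size_t i = 0; i < s->count; i++) {
		if (toy_job_ready(s->queue[i])) {
			s->queue[i]->state = TOY_JOB_RUNNING;
			s->running = s->queue[i];
			return s->running;
		}
	}
	return NULL;
}

/* Stand-in for the completion interrupt of the running job. */
static void toy_sched_complete(struct toy_sched *s)
{
	if (s->running != NULL) {
		s->running->state = TOY_JOB_DONE;
		s->running = NULL;
	}
}
```

In this sketch, if job 2 depends on job 1 but is submitted first,
toy_sched_run_next() skips job 2 and starts job 1. Job 2 stays queued
until toy_sched_complete() marks job 1 as done, and only then becomes
eligible to run. Handling this kind of bookkeeping is the part that a
shared scheduler can take over from the driver.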

And please remove the IMPORTANT NOTICE at the end of your emails. I
would have to refrain from answering further emails if that notice
remains.

Thanks,
Oded

>
> Thanks for your time and look forward to your reply~ 😊
>
> Best Regards,
> Dejia
> IMPORTANT NOTICE: The contents of this email and any attachments may be privileged and confidential. If you are not the intended recipient, please delete the email immediately. It is strictly prohibited to disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. ©Arm Technology (China) Co., Ltd copyright and reserve all rights.


More information about the dri-devel mailing list