[PATCH V2 00/10] AMD XDNA driver

Lizhi Hou lizhi.hou at amd.com
Tue Sep 3 17:50:11 UTC 2024


Hi Jeffrey,


I have completed the code changes based on your comments. And we are 
still working on the documentation.

Should I send out another patch set with only the code changes for your 
to review? Or you would wait for the documentation to review together?


Thanks,

Lizhi

On 8/14/24 13:06, Lizhi Hou wrote:
>
> On 8/14/24 11:49, Jeffrey Hugo wrote:
>> On 8/12/2024 12:16 PM, Lizhi Hou wrote:
>>>
>>> On 8/9/24 08:21, Jeffrey Hugo wrote:
>>>> On 8/5/2024 11:39 AM, Lizhi Hou wrote:
>>>>> This patchset introduces a new Linux Kernel Driver, amdxdna for 
>>>>> AMD NPUs.
>>>>> The driver is based on Linux accel subsystem.
>>>>>
>>>>> NPU (Neural Processing Unit) is an AI inference accelerator 
>>>>> integrated
>>>>> into AMD client CPUs. NPU enables efficient execution of Machine 
>>>>> Learning
>>>>> applications like CNNs, LLMs, etc.  NPU is based on AMD XDNA
>>>>> architecture [1].
>>>>>
>>>>> AMD NPU consists of the following components:
>>>>>
>>>>>    - Tiled array of AMD AI Engine processors.
>>>>>    - Micro Controller which runs the NPU Firmware responsible for
>>>>>      command processing, AIE array configuration, and execution 
>>>>> management.
>>>>>    - PCI EP for host control of the NPU device.
>>>>>    - Interconnect for connecting the NPU components together.
>>>>>    - SRAM for use by the NPU Firmware.
>>>>>    - Address translation hardware for protected host memory access 
>>>>> by the
>>>>>      NPU.
>>>>>
>>>>> NPU supports multiple concurrent fully isolated contexts. Concurrent
>>>>> contexts may be bound to AI Engine array spatially and or 
>>>>> temporarily.
>>>>>
>>>>> The driver is licensed under GPL-2.0 except for UAPI header which is
>>>>> licensed GPL-2.0 WITH Linux-syscall-note.
>>>>>
>>>>> User mode driver stack consists of XRT [2] and AMD AIE Plugin for 
>>>>> IREE [3].
>>>>
>>>> Is there a special branch with the code?  I don't see any of the 
>>>> uAPI in either project when searching for the ioctl codes or ioctl 
>>>> structures.
>>>
>>> Please see git repo: https://github.com/amd/xdna-driver
>>>
>>> This contains the out tree driver and shim code which interact with 
>>> driver. E.g.
>>>
>>> https://github.com/amd/xdna-driver/blob/main/src/shim/bo.cpp#L18
>>
>> Ok, I need to have a look at this.  Long term is the plan to move the 
>> shim to the XRT repo once the driver is merged upstream?
> Yes.
>>
>>>
>>>>
>>>>>
>>>>> The firmware for the NPU is distributed as a closed source binary, 
>>>>> and has
>>>>> already been pushed to the DRM firmware repository [4].
>>>>>
>>>>> [1] https://www.amd.com/en/technologies/xdna.html
>>>>> [2] https://github.com/Xilinx/XRT
>>>>> [3] https://github.com/nod-ai/iree-amd-aie
>>>>> [4] 
>>>>> https://gitlab.freedesktop.org/drm/firmware/-/tree/amd-ipu-staging/amdnpu 
>>>>>
>>>>>
>>>>> Changes since v1:
>>>>> - Remove some inline defines
>>>>> - Minor changes based code review comments
>>>>>
>>>>> Lizhi Hou (10):
>>>>>    accel/amdxdna: Add a new driver for AMD AI Engine
>>>>>    accel/amdxdna: Support hardware mailbox
>>>>>    accel/amdxdna: Add hardware resource solver
>>>>>    accel/amdxdna: Add hardware context
>>>>>    accel/amdxdna: Add GEM buffer object management
>>>>>    accel/amdxdna: Add command execution
>>>>>    accel/amdxdna: Add suspend and resume
>>>>>    accel/amdxdna: Add error handling
>>>>>    accel/amdxdna: Add query functions
>>>>>    accel/amdxdna: Add firmware debug buffer support
>>>>>
>>>>>   MAINTAINERS                                   |   9 +
>>>>>   drivers/accel/Kconfig                         |   1 +
>>>>>   drivers/accel/Makefile                        |   1 +
>>>>>   drivers/accel/amdxdna/Kconfig                 |  15 +
>>>>>   drivers/accel/amdxdna/Makefile                |  22 +
>>>>>   drivers/accel/amdxdna/TODO                    |   4 +
>>>>>   drivers/accel/amdxdna/aie2_ctx.c              | 949 
>>>>> ++++++++++++++++++
>>>>>   drivers/accel/amdxdna/aie2_error.c            | 349 +++++++
>>>>>   drivers/accel/amdxdna/aie2_message.c          | 775 ++++++++++++++
>>>>>   drivers/accel/amdxdna/aie2_msg_priv.h         | 372 +++++++
>>>>>   drivers/accel/amdxdna/aie2_pci.c              | 756 ++++++++++++++
>>>>>   drivers/accel/amdxdna/aie2_pci.h              | 264 +++++
>>>>>   drivers/accel/amdxdna/aie2_psp.c              | 137 +++
>>>>>   drivers/accel/amdxdna/aie2_smu.c              | 112 +++
>>>>>   drivers/accel/amdxdna/aie2_solver.c           | 329 ++++++
>>>>>   drivers/accel/amdxdna/aie2_solver.h           | 156 +++
>>>>>   drivers/accel/amdxdna/amdxdna_ctx.c           | 597 +++++++++++
>>>>>   drivers/accel/amdxdna/amdxdna_ctx.h           | 165 +++
>>>>>   drivers/accel/amdxdna/amdxdna_drm.c           | 172 ++++
>>>>>   drivers/accel/amdxdna/amdxdna_drm.h           | 114 +++
>>>>>   drivers/accel/amdxdna/amdxdna_gem.c           | 700 +++++++++++++
>>>>>   drivers/accel/amdxdna/amdxdna_gem.h           |  73 ++
>>>>>   drivers/accel/amdxdna/amdxdna_mailbox.c       | 582 +++++++++++
>>>>>   drivers/accel/amdxdna/amdxdna_mailbox.h       | 124 +++
>>>>>   .../accel/amdxdna/amdxdna_mailbox_helper.c    |  50 +
>>>>>   .../accel/amdxdna/amdxdna_mailbox_helper.h    |  43 +
>>>>>   drivers/accel/amdxdna/amdxdna_pci_drv.c       | 234 +++++
>>>>>   drivers/accel/amdxdna/amdxdna_pci_drv.h       |  31 +
>>>>>   drivers/accel/amdxdna/amdxdna_sysfs.c         |  58 ++
>>>>>   drivers/accel/amdxdna/npu1_regs.c             |  94 ++
>>>>>   drivers/accel/amdxdna/npu2_regs.c             | 111 ++
>>>>>   drivers/accel/amdxdna/npu4_regs.c             | 111 ++
>>>>>   drivers/accel/amdxdna/npu5_regs.c             | 111 ++
>>>>>   include/trace/events/amdxdna.h                | 101 ++
>>>>>   include/uapi/drm/amdxdna_accel.h              | 456 +++++++++
>>>>>   35 files changed, 8178 insertions(+)
>>>>>   create mode 100644 drivers/accel/amdxdna/Kconfig
>>>>>   create mode 100644 drivers/accel/amdxdna/Makefile
>>>>>   create mode 100644 drivers/accel/amdxdna/TODO
>>>>>   create mode 100644 drivers/accel/amdxdna/aie2_ctx.c
>>>>>   create mode 100644 drivers/accel/amdxdna/aie2_error.c
>>>>>   create mode 100644 drivers/accel/amdxdna/aie2_message.c
>>>>>   create mode 100644 drivers/accel/amdxdna/aie2_msg_priv.h
>>>>>   create mode 100644 drivers/accel/amdxdna/aie2_pci.c
>>>>>   create mode 100644 drivers/accel/amdxdna/aie2_pci.h
>>>>>   create mode 100644 drivers/accel/amdxdna/aie2_psp.c
>>>>>   create mode 100644 drivers/accel/amdxdna/aie2_smu.c
>>>>>   create mode 100644 drivers/accel/amdxdna/aie2_solver.c
>>>>>   create mode 100644 drivers/accel/amdxdna/aie2_solver.h
>>>>>   create mode 100644 drivers/accel/amdxdna/amdxdna_ctx.c
>>>>>   create mode 100644 drivers/accel/amdxdna/amdxdna_ctx.h
>>>>>   create mode 100644 drivers/accel/amdxdna/amdxdna_drm.c
>>>>>   create mode 100644 drivers/accel/amdxdna/amdxdna_drm.h
>>>>>   create mode 100644 drivers/accel/amdxdna/amdxdna_gem.c
>>>>>   create mode 100644 drivers/accel/amdxdna/amdxdna_gem.h
>>>>>   create mode 100644 drivers/accel/amdxdna/amdxdna_mailbox.c
>>>>>   create mode 100644 drivers/accel/amdxdna/amdxdna_mailbox.h
>>>>>   create mode 100644 drivers/accel/amdxdna/amdxdna_mailbox_helper.c
>>>>>   create mode 100644 drivers/accel/amdxdna/amdxdna_mailbox_helper.h
>>>>>   create mode 100644 drivers/accel/amdxdna/amdxdna_pci_drv.c
>>>>>   create mode 100644 drivers/accel/amdxdna/amdxdna_pci_drv.h
>>>>>   create mode 100644 drivers/accel/amdxdna/amdxdna_sysfs.c
>>>>>   create mode 100644 drivers/accel/amdxdna/npu1_regs.c
>>>>>   create mode 100644 drivers/accel/amdxdna/npu2_regs.c
>>>>>   create mode 100644 drivers/accel/amdxdna/npu4_regs.c
>>>>>   create mode 100644 drivers/accel/amdxdna/npu5_regs.c
>>>>>   create mode 100644 include/trace/events/amdxdna.h
>>>>>   create mode 100644 include/uapi/drm/amdxdna_accel.h
>>>>>
>>>>
>>>> No Documentation?
>>>
>>> Is it ok to add a work item to TODO and add documentation in later 
>>> patches?
>>
>> I beleive best practice would be to add Documnetation in the same 
>> patch/series that adds the functionality.  I'm not expecting 
>> Documentation for items not implemented in this series, however I 
>> think describing the product/architecture/other high level topics 
>> would help put the code in context during review.
>>
>> It does seem like the AMD GPU driver had a lot of documentation, 
>> which makes the lack of documentation for the AMD Accel driver 
>> particularly odd.
>
> Ok.  We will work on the document
>
>
> Thanks,
>
> Lizhi
>


More information about the dri-devel mailing list