[PATCH V5 00/10] AMD XDNA driver

Lizhi Hou lizhi.hou at amd.com
Tue Oct 29 15:24:37 UTC 2024


On 10/25/24 15:02, Jeffrey Hugo wrote:
> On 10/25/2024 3:28 PM, Lizhi Hou wrote:
>>
>> On 10/25/24 10:55, Jeffrey Hugo wrote:
>>> On 10/21/2024 10:19 AM, Lizhi Hou wrote:
>>>> This patchset introduces a new Linux kernel driver, amdxdna, for AMD
>>>> NPUs. The driver is based on the Linux accel subsystem.
>>>>
>>>> NPU (Neural Processing Unit) is an AI inference accelerator integrated
>>>> into AMD client CPUs. NPU enables efficient execution of Machine
>>>> Learning applications like CNNs, LLMs, etc. NPU is based on AMD XDNA
>>>> architecture [1].
>>>>
>>>> AMD NPU consists of the following components:
>>>>
>>>>    - Tiled array of AMD AI Engine processors.
>>>>    - Micro Controller which runs the NPU Firmware responsible for
>>>>      command processing, AIE array configuration, and execution
>>>>      management.
>>>>    - PCI EP for host control of the NPU device.
>>>>    - Interconnect for connecting the NPU components together.
>>>>    - SRAM for use by the NPU Firmware.
>>>>    - Address translation hardware for protected host memory access
>>>>      by the NPU.
>>>>
>>>> NPU supports multiple concurrent fully isolated contexts. Concurrent
>>>> contexts may be bound to the AI Engine array spatially and/or temporally.
>>>>
>>>> The driver is licensed under GPL-2.0 except for UAPI header which is
>>>> licensed GPL-2.0 WITH Linux-syscall-note.
>>>>
>>>> User mode driver stack consists of XRT [2] and AMD AIE Plugin for 
>>>> IREE [3].
>>>>
>>>> The firmware for the NPU is distributed as a closed source binary,
>>>> and has already been pushed to the DRM firmware repository [4].
>>>>
>>>> [1] https://www.amd.com/en/technologies/xdna.html
>>>> [2] https://github.com/Xilinx/XRT
>>>> [3] https://github.com/nod-ai/iree-amd-aie
>>>> [4] https://gitlab.freedesktop.org/drm/firmware/-/tree/amd-ipu-staging/amdnpu
>>>>
>>>> Changes since v4:
>>>> - Fix lockdep errors
>>>> - Use __u* structure for struct aie_error
>>>
>>> One nit, when you send the next version would you please either To: 
>>> or Cc: me on the entire series?  I only get pieces in my inbox which 
>>> is mildly annoying on my end.
>> Sure.
>>>
>>> Looks like we are getting close here.  One procedural question I 
>>> have, do you have commit permissions to drm-misc?
>> No, I do not have commit permissions yet.
>
> You should apply for access.  Assuming this series is ready before 
> that goes through, I'll apply it.
>
>>> I applied the series to drm-misc-next and tried to build.  Got the 
>>> following errors -
>>
>> Could you share the build command line? So I can reproduce and verify 
>> my fix.
>
> The command is simple:
> make -j20
>
> The system details, in case it somehow matters:
> Ubuntu 22.04 w/ 5.15 kernel
>
> $ lsb_release -a
> No LSB modules are available.
> Distributor ID: Ubuntu
> Description:    Ubuntu 22.04.3 LTS
> Release:        22.04
> Codename:       jammy
>
> $ uname -a
> Linux jhugo-lnx 5.15.0-89-generic #99-Ubuntu SMP Mon Oct 30 20:42:41 
> UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
>
> The kernel config is probably the relevant piece.  When I first built 
> after applying the series, I was asked to choose what to do with the 
> new config item.  I selected =m.
> .config can be found at 
> https://gist.github.com/quic-jhugo/4cc249b1e3ba127039fbc709a513a432
>
>>
>> I used "make M=drivers/accel/amdxdna" and did not reproduce the error
>> with drm-misc-next. It looks like the build robot did not complain
>> about the patch either.
>>
>> $ git branch
>> * drm-misc-next
>> $ make M=drivers/accel/amdxdna
>>    CC [M]  drivers/accel/amdxdna/aie2_ctx.o
>>    CC [M]  drivers/accel/amdxdna/aie2_error.o
>>    CC [M]  drivers/accel/amdxdna/aie2_message.o
>>    CC [M]  drivers/accel/amdxdna/aie2_pci.o
>>    CC [M]  drivers/accel/amdxdna/aie2_psp.o
>>    CC [M]  drivers/accel/amdxdna/aie2_smu.o
>>    CC [M]  drivers/accel/amdxdna/aie2_solver.o
>>    CC [M]  drivers/accel/amdxdna/amdxdna_ctx.o
>>    CC [M]  drivers/accel/amdxdna/amdxdna_gem.o
>>    CC [M]  drivers/accel/amdxdna/amdxdna_mailbox.o
>>    CC [M]  drivers/accel/amdxdna/amdxdna_mailbox_helper.o
>>    CC [M]  drivers/accel/amdxdna/amdxdna_pci_drv.o
>>    CC [M]  drivers/accel/amdxdna/amdxdna_sysfs.o
>>    CC [M]  drivers/accel/amdxdna/npu1_regs.o
>>    CC [M]  drivers/accel/amdxdna/npu2_regs.o
>>    CC [M]  drivers/accel/amdxdna/npu4_regs.o
>>    CC [M]  drivers/accel/amdxdna/npu5_regs.o
>>    LD [M]  drivers/accel/amdxdna/amdxdna.o
>>    MODPOST drivers/accel/amdxdna/Module.symvers
>>    CC [M]  drivers/accel/amdxdna/amdxdna.mod.o
>>    CC [M]  drivers/accel/amdxdna/.module-common.o
>>    LD [M]  drivers/accel/amdxdna/amdxdna.ko
>> $
>>
>>>
>>>   CC [M]  drivers/accel/amdxdna/aie2_ctx.o
>>>   CC [M]  drivers/accel/amdxdna/aie2_error.o
>>>   CC [M]  drivers/accel/amdxdna/aie2_message.o
>>>   CC [M]  drivers/accel/amdxdna/aie2_pci.o
>>>   CC [M]  drivers/accel/amdxdna/aie2_psp.o
>>>   CC [M]  drivers/accel/amdxdna/aie2_smu.o
>>>   CC [M]  drivers/accel/amdxdna/aie2_solver.o
>>>   CC [M]  drivers/accel/amdxdna/amdxdna_ctx.o
>>>   CC [M]  drivers/accel/amdxdna/amdxdna_gem.o
>>>   CC [M]  drivers/accel/amdxdna/amdxdna_mailbox.o
>>>   CC [M]  drivers/accel/amdxdna/amdxdna_mailbox_helper.o
>>>   CC [M]  drivers/accel/amdxdna/amdxdna_pci_drv.o
>>>   CC [M]  drivers/accel/amdxdna/amdxdna_sysfs.o
>>>   CC [M]  drivers/accel/amdxdna/npu1_regs.o
>>>   CC [M]  drivers/accel/amdxdna/npu2_regs.o
>>>   CC [M]  drivers/accel/amdxdna/npu4_regs.o
>>>   CC [M]  drivers/accel/amdxdna/npu5_regs.o
>>>   AR      drivers/base/firmware_loader/built-in.a
>>>   AR      drivers/base/built-in.a
>>> In file included from drivers/accel/amdxdna/aie2_message.c:19:
>>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
>>> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
>>> declaration of function ‘FIELD_GET’ 
>>> [-Werror=implicit-function-declaration]
>>>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>>>       |                ^~~~~~~~~
>>> In file included from drivers/accel/amdxdna/amdxdna_gem.c:15:
>>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
>>> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
>>> declaration of function ‘FIELD_GET’ 
>>> [-Werror=implicit-function-declaration]
>>>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>>>       |                ^~~~~~~~~
>>> In file included from drivers/accel/amdxdna/aie2_psp.c:11:
>>> drivers/accel/amdxdna/aie2_psp.c: In function ‘psp_exec’:
>>> drivers/accel/amdxdna/aie2_psp.c:62:34: error: implicit declaration 
>>> of function ‘FIELD_GET’ [-Werror=implicit-function-declaration]
>>>    62 | FIELD_GET(PSP_STATUS_READY, ready),
>>>       |                                  ^~~~~~~~~
>>> ./include/linux/iopoll.h:47:21: note: in definition of macro 
>>> ‘read_poll_timeout’
>>>    47 |                 if (cond) \
>>>       |                     ^~~~
>>> drivers/accel/amdxdna/aie2_psp.c:61:15: note: in expansion of macro 
>>> ‘readx_poll_timeout’
>>>    61 |         ret = readx_poll_timeout(readl, PSP_REG(psp, 
>>> PSP_STATUS_REG), ready,
>>>       |               ^~~~~~~~~~~~~~~~~~
>>> drivers/accel/amdxdna/amdxdna_ctx.h: In function 
>>> ‘amdxdna_cmd_set_state’:
>>> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
>>> declaration of function ‘FIELD_PREP’ 
>>> [-Werror=implicit-function-declaration]
>>>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>>>       |                        ^~~~~~~~~~
>>> drivers/accel/amdxdna/amdxdna_ctx.h: In function 
>>> ‘amdxdna_cmd_set_state’:
>>> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
>>> declaration of function ‘FIELD_PREP’ 
>>> [-Werror=implicit-function-declaration]
>>>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>>>       |                        ^~~~~~~~~~
>>> In file included from drivers/accel/amdxdna/aie2_pci.c:22:
>>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
>>> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
>>> declaration of function ‘FIELD_GET’ 
>>> [-Werror=implicit-function-declaration]
>>>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>>>       |                ^~~~~~~~~
>>> In file included from drivers/accel/amdxdna/aie2_ctx.c:18:
>>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
>>> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
>>> declaration of function ‘FIELD_GET’ 
>>> [-Werror=implicit-function-declaration]
>>>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>>>       |                ^~~~~~~~~
>>> drivers/accel/amdxdna/amdxdna_ctx.h: In function 
>>> ‘amdxdna_cmd_set_state’:
>>> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
>>> declaration of function ‘FIELD_PREP’ 
>>> [-Werror=implicit-function-declaration]
>>>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>>>       |                        ^~~~~~~~~~
>>> In file included from drivers/accel/amdxdna/amdxdna_ctx.c:16:
>>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
>>> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
>>> declaration of function ‘FIELD_GET’ 
>>> [-Werror=implicit-function-declaration]
>>>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>>>       |                ^~~~~~~~~
>>> cc1: all warnings being treated as errors
>>> drivers/accel/amdxdna/amdxdna_ctx.h: In function 
>>> ‘amdxdna_cmd_set_state’:
>>> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
>>> declaration of function ‘FIELD_PREP’ 
>>> [-Werror=implicit-function-declaration]
>>>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>>>       |                        ^~~~~~~~~~
>>> drivers/accel/amdxdna/aie2_ctx.c: In function ‘aie2_hwctx_restart’:
>>> drivers/accel/amdxdna/aie2_ctx.c:114:9: error: too few arguments to 
>>> function ‘drm_sched_start’
>>>   114 | drm_sched_start(&hwctx->priv->sched);
>>>       |         ^~~~~~~~~~~~~~~
>>> In file included from ./include/trace/events/amdxdna.h:12,
>>>                  from drivers/accel/amdxdna/aie2_ctx.c:13:
>>> ./include/drm/gpu_scheduler.h:593:6: note: declared here
>>>   593 | void drm_sched_start(struct drm_gpu_scheduler *sched, int 
>>> errno);
>>>       |      ^~~~~~~~~~~~~~~
>>> make[5]: *** [scripts/Makefile.build:229: 
>>> drivers/accel/amdxdna/aie2_psp.o] Error 1
>>> make[5]: *** Waiting for unfinished jobs....
>>> drivers/accel/amdxdna/amdxdna_ctx.h: In function 
>>> ‘amdxdna_cmd_set_state’:
>>> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
>>> declaration of function ‘FIELD_PREP’ 
>>> [-Werror=implicit-function-declaration]
>>>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>>>       |                        ^~~~~~~~~~
>>> In file included from drivers/accel/amdxdna/amdxdna_pci_drv.c:18:
>>> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
>>> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit 
>>> declaration of function ‘FIELD_GET’ 
>>> [-Werror=implicit-function-declaration]
>>>   112 |         return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
>>>       |                ^~~~~~~~~
>>> cc1: all warnings being treated as errors
>>> make[5]: *** [scripts/Makefile.build:229: 
>>> drivers/accel/amdxdna/aie2_ctx.o] Error 1
>>> drivers/accel/amdxdna/amdxdna_ctx.h: In function 
>>> ‘amdxdna_cmd_set_state’:
>>> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit 
>>> declaration of function ‘FIELD_PREP’ 
>>> [-Werror=implicit-function-declaration]
>>>   121 |         cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
>>>       |                        ^~~~~~~~~~
>>> drivers/accel/amdxdna/amdxdna_mailbox.c: In function 
>>> ‘xdna_mailbox_send_msg’:
>>> drivers/accel/amdxdna/amdxdna_mailbox.c:444:26: error: implicit 
>>> declaration of function ‘FIELD_PREP’ 
>>> [-Werror=implicit-function-declaration]
>>>   444 |         header->sz_ver = FIELD_PREP(MSG_BODY_SZ, 
>>> msg->send_size) |
>>>       |                          ^~~~~~~~~~
>>>
>>>
>>> You also have the following checkpatch issues -
>>
>> Could you share the command you used?  I tried to use 'dim
>> checkpatch' and it did not find the misspelling issue.
>
> ./scripts/checkpatch.pl --strict --codespell *.patch
>
> Note, --codespell requires some local setup.  I believe the comments
> in the checkpatch.pl script are fairly straightforward. I use a
> rather recent copy of the database from GitHub.  The Ubuntu distro
> package is really out of date, and I don't think I looked to see if
> there is a Python pip version. Grabbing the one file from the GitHub
> repo seemed simple enough.

I was able to reproduce with your suggestions. Thanks a lot.


Lizhi

>
> -Jeff
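
For reference, the FIELD_GET/FIELD_PREP failures in the build log quoted
above are the kind of error that typically appears when a file uses these
macros without including <linux/bitfield.h> itself and a given .config no
longer pulls that header in indirectly. The following is only a userspace
sketch of what the two macros do, assuming simplified stand-in macros and
a hypothetical header layout (the DEMO_* masks and demo_* names are not
taken from the driver):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified userspace stand-ins for the kernel's FIELD_GET()/FIELD_PREP().
 * The real macros come from <linux/bitfield.h> and add compile-time checks.
 * (mask & -mask) isolates the lowest set bit of the mask, i.e. 1 << shift.
 */
#define FIELD_LSB(mask)       ((uint32_t)(mask) & -(uint32_t)(mask))
#define FIELD_GET(mask, reg)  (((uint32_t)(reg) & (uint32_t)(mask)) / FIELD_LSB(mask))
#define FIELD_PREP(mask, val) (((uint32_t)(val) * FIELD_LSB(mask)) & (uint32_t)(mask))

/* Hypothetical header layout, NOT the driver's real bit positions. */
#define DEMO_CMD_STATE  0x0000000fu /* bits 3:0   */
#define DEMO_CMD_OPCODE 0xf8000000u /* bits 31:27 */

struct demo_cmd {
	uint32_t header;
};

/* Extract the opcode field, shifted down to bit 0. */
static inline uint32_t demo_cmd_get_op(const struct demo_cmd *cmd)
{
	return FIELD_GET(DEMO_CMD_OPCODE, cmd->header);
}

/* Replace the state field without touching the other header bits. */
static inline void demo_cmd_set_state(struct demo_cmd *cmd, uint32_t s)
{
	cmd->header &= ~DEMO_CMD_STATE;
	cmd->header |= FIELD_PREP(DEMO_CMD_STATE, s);
}

/* Round trip: pack an opcode, set a state, read both fields back. */
static inline void demo_cmd_roundtrip_check(void)
{
	struct demo_cmd cmd = { .header = FIELD_PREP(DEMO_CMD_OPCODE, 5) };

	demo_cmd_set_state(&cmd, 4);
	assert(demo_cmd_get_op(&cmd) == 5);
	assert(FIELD_GET(DEMO_CMD_STATE, cmd.header) == 4);
}
```

In kernel code the stand-in macros would not be needed; including
<linux/bitfield.h> directly in each file that uses FIELD_GET() or
FIELD_PREP() avoids relying on indirect includes that differ between
configurations.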

