[PATCH V5 00/10] AMD XDNA driver
Lizhi Hou
lizhi.hou at amd.com
Fri Oct 25 21:28:03 UTC 2024
On 10/25/24 10:55, Jeffrey Hugo wrote:
> On 10/21/2024 10:19 AM, Lizhi Hou wrote:
>> This patchset introduces a new Linux Kernel Driver, amdxdna for AMD
>> NPUs.
>> The driver is based on Linux accel subsystem.
>>
>> NPU (Neural Processing Unit) is an AI inference accelerator integrated
>> into AMD client CPUs. NPU enables efficient execution of Machine
>> Learning
>> applications like CNNs, LLMs, etc. NPU is based on AMD XDNA
>> architecture [1].
>>
>> AMD NPU consists of the following components:
>>
>> - Tiled array of AMD AI Engine processors.
>> - Micro Controller which runs the NPU Firmware responsible for
>> command processing, AIE array configuration, and execution
>> management.
>> - PCI EP for host control of the NPU device.
>> - Interconnect for connecting the NPU components together.
>> - SRAM for use by the NPU Firmware.
>> - Address translation hardware for protected host memory access by
>> the
>> NPU.
>>
>> NPU supports multiple concurrent fully isolated contexts. Concurrent
>> contexts may be bound to AI Engine array spatially and or temporarily.
>>
>> The driver is licensed under GPL-2.0 except for UAPI header which is
>> licensed GPL-2.0 WITH Linux-syscall-note.
>>
>> User mode driver stack consists of XRT [2] and AMD AIE Plugin for
>> IREE [3].
>>
>> The firmware for the NPU is distributed as a closed source binary,
>> and has
>> already been pushed to the DRM firmware repository [4].
>>
>> [1]https://www.amd.com/en/technologies/xdna.html
>> [2]https://github.com/Xilinx/XRT
>> [3]https://github.com/nod-ai/iree-amd-aie
>> [4]https://gitlab.freedesktop.org/drm/firmware/-/tree/amd-ipu-staging/amdnpu
>>
>>
>> Changes since v4:
>> - Fix lockdep errors
>> - Use __u* structure for struct aie_error
>
> One nit, when you send the next version would you please either To: or
> Cc: me on the entire series? I only get pieces in my inbox which is
> mildly annoying on my end.
Sure.
>
> Looks like we are getting close here. One procedural question I have,
> do you have commit permissions to drm-misc?
No, I do not have commit permissions yet.
>
> I applied the series to drm-misc-next and tried to build. Got the
> following errors -
Could you share the build command line? So I can reproduce and verify my
fix.
I used "make M=drivers/accel/amdxdna" and did not reproduce the error
with drm-misc-next. It looks build robot did not complain with the patch
neither.
$ git branch
* drm-misc-next
$ make M=drivers/accel/amdxdna
CC [M] drivers/accel/amdxdna/aie2_ctx.o
CC [M] drivers/accel/amdxdna/aie2_error.o
CC [M] drivers/accel/amdxdna/aie2_message.o
CC [M] drivers/accel/amdxdna/aie2_pci.o
CC [M] drivers/accel/amdxdna/aie2_psp.o
CC [M] drivers/accel/amdxdna/aie2_smu.o
CC [M] drivers/accel/amdxdna/aie2_solver.o
CC [M] drivers/accel/amdxdna/amdxdna_ctx.o
CC [M] drivers/accel/amdxdna/amdxdna_gem.o
CC [M] drivers/accel/amdxdna/amdxdna_mailbox.o
CC [M] drivers/accel/amdxdna/amdxdna_mailbox_helper.o
CC [M] drivers/accel/amdxdna/amdxdna_pci_drv.o
CC [M] drivers/accel/amdxdna/amdxdna_sysfs.o
CC [M] drivers/accel/amdxdna/npu1_regs.o
CC [M] drivers/accel/amdxdna/npu2_regs.o
CC [M] drivers/accel/amdxdna/npu4_regs.o
CC [M] drivers/accel/amdxdna/npu5_regs.o
LD [M] drivers/accel/amdxdna/amdxdna.o
MODPOST drivers/accel/amdxdna/Module.symvers
CC [M] drivers/accel/amdxdna/amdxdna.mod.o
CC [M] drivers/accel/amdxdna/.module-common.o
LD [M] drivers/accel/amdxdna/amdxdna.ko
$
>
> CC [M] drivers/accel/amdxdna/aie2_ctx.o
> CC [M] drivers/accel/amdxdna/aie2_error.o
> CC [M] drivers/accel/amdxdna/aie2_message.o
> CC [M] drivers/accel/amdxdna/aie2_pci.o
> CC [M] drivers/accel/amdxdna/aie2_psp.o
> CC [M] drivers/accel/amdxdna/aie2_smu.o
> CC [M] drivers/accel/amdxdna/aie2_solver.o
> CC [M] drivers/accel/amdxdna/amdxdna_ctx.o
> CC [M] drivers/accel/amdxdna/amdxdna_gem.o
> CC [M] drivers/accel/amdxdna/amdxdna_mailbox.o
> CC [M] drivers/accel/amdxdna/amdxdna_mailbox_helper.o
> CC [M] drivers/accel/amdxdna/amdxdna_pci_drv.o
> CC [M] drivers/accel/amdxdna/amdxdna_sysfs.o
> CC [M] drivers/accel/amdxdna/npu1_regs.o
> CC [M] drivers/accel/amdxdna/npu2_regs.o
> CC [M] drivers/accel/amdxdna/npu4_regs.o
> CC [M] drivers/accel/amdxdna/npu5_regs.o
> AR drivers/base/firmware_loader/built-in.a
> AR drivers/base/built-in.a
> In file included from drivers/accel/amdxdna/aie2_message.c:19:
> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit
> declaration of function ‘FIELD_GET’
> [-Werror=implicit-function-declaration]
> 112 | return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
> | ^~~~~~~~~
> In file included from drivers/accel/amdxdna/amdxdna_gem.c:15:
> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit
> declaration of function ‘FIELD_GET’
> [-Werror=implicit-function-declaration]
> 112 | return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
> | ^~~~~~~~~
> In file included from drivers/accel/amdxdna/aie2_psp.c:11:
> drivers/accel/amdxdna/aie2_psp.c: In function ‘psp_exec’:
> drivers/accel/amdxdna/aie2_psp.c:62:34: error: implicit declaration of
> function ‘FIELD_GET’ [-Werror=implicit-function-declaration]
> 62 | FIELD_GET(PSP_STATUS_READY, ready),
> | ^~~~~~~~~
> ./include/linux/iopoll.h:47:21: note: in definition of macro
> ‘read_poll_timeout’
> 47 | if (cond) \
> | ^~~~
> drivers/accel/amdxdna/aie2_psp.c:61:15: note: in expansion of macro
> ‘readx_poll_timeout’
> 61 | ret = readx_poll_timeout(readl, PSP_REG(psp,
> PSP_STATUS_REG), ready,
> | ^~~~~~~~~~~~~~~~~~
> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit
> declaration of function ‘FIELD_PREP’
> [-Werror=implicit-function-declaration]
> 121 | cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
> | ^~~~~~~~~~
> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit
> declaration of function ‘FIELD_PREP’
> [-Werror=implicit-function-declaration]
> 121 | cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
> | ^~~~~~~~~~
> In file included from drivers/accel/amdxdna/aie2_pci.c:22:
> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit
> declaration of function ‘FIELD_GET’
> [-Werror=implicit-function-declaration]
> 112 | return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
> | ^~~~~~~~~
> In file included from drivers/accel/amdxdna/aie2_ctx.c:18:
> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit
> declaration of function ‘FIELD_GET’
> [-Werror=implicit-function-declaration]
> 112 | return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
> | ^~~~~~~~~
> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit
> declaration of function ‘FIELD_PREP’
> [-Werror=implicit-function-declaration]
> 121 | cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
> | ^~~~~~~~~~
> In file included from drivers/accel/amdxdna/amdxdna_ctx.c:16:
> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit
> declaration of function ‘FIELD_GET’
> [-Werror=implicit-function-declaration]
> 112 | return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
> | ^~~~~~~~~
> cc1: all warnings being treated as errors
> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit
> declaration of function ‘FIELD_PREP’
> [-Werror=implicit-function-declaration]
> 121 | cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
> | ^~~~~~~~~~
> drivers/accel/amdxdna/aie2_ctx.c: In function ‘aie2_hwctx_restart’:
> drivers/accel/amdxdna/aie2_ctx.c:114:9: error: too few arguments to
> function ‘drm_sched_start’
> 114 | drm_sched_start(&hwctx->priv->sched);
> | ^~~~~~~~~~~~~~~
> In file included from ./include/trace/events/amdxdna.h:12,
> from drivers/accel/amdxdna/aie2_ctx.c:13:
> ./include/drm/gpu_scheduler.h:593:6: note: declared here
> 593 | void drm_sched_start(struct drm_gpu_scheduler *sched, int errno);
> | ^~~~~~~~~~~~~~~
> make[5]: *** [scripts/Makefile.build:229:
> drivers/accel/amdxdna/aie2_psp.o] Error 1
> make[5]: *** Waiting for unfinished jobs....
> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit
> declaration of function ‘FIELD_PREP’
> [-Werror=implicit-function-declaration]
> 121 | cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
> | ^~~~~~~~~~
> In file included from drivers/accel/amdxdna/amdxdna_pci_drv.c:18:
> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_get_op’:
> drivers/accel/amdxdna/amdxdna_ctx.h:112:16: error: implicit
> declaration of function ‘FIELD_GET’
> [-Werror=implicit-function-declaration]
> 112 | return FIELD_GET(AMDXDNA_CMD_OPCODE, cmd->header);
> | ^~~~~~~~~
> cc1: all warnings being treated as errors
> make[5]: *** [scripts/Makefile.build:229:
> drivers/accel/amdxdna/aie2_ctx.o] Error 1
> drivers/accel/amdxdna/amdxdna_ctx.h: In function ‘amdxdna_cmd_set_state’:
> drivers/accel/amdxdna/amdxdna_ctx.h:121:24: error: implicit
> declaration of function ‘FIELD_PREP’
> [-Werror=implicit-function-declaration]
> 121 | cmd->header |= FIELD_PREP(AMDXDNA_CMD_STATE, s);
> | ^~~~~~~~~~
> drivers/accel/amdxdna/amdxdna_mailbox.c: In function
> ‘xdna_mailbox_send_msg’:
> drivers/accel/amdxdna/amdxdna_mailbox.c:444:26: error: implicit
> declaration of function ‘FIELD_PREP’
> [-Werror=implicit-function-declaration]
> 444 | header->sz_ver = FIELD_PREP(MSG_BODY_SZ,
> msg->send_size) |
> | ^~~~~~~~~~
>
>
> You also have the following checkpatch issues -
Could you share the command you used? I tried to use 'dim checkpatch'
and it did not find out the misspelling issue.
Thanks,
Lizhi
>
>
> WARNING: 'Disalbe' may be misspelled - perhaps 'Disable'?
> #1646: FILE: drivers/accel/amdxdna/amdxdna_mailbox.c:553:
> + /* Disalbe an irq and wait. This might sleep. */
> ^^^^^^^
>
> WARNING: 'splite' may be misspelled - perhaps 'split'?
> #1695: FILE: drivers/accel/amdxdna/amdxdna_mailbox.h:21:
> + * The mailbox will splite the sending data in to multiple firmware
> message if
> ^^^^^^
>
> WARNING: 'miliseconds' may be misspelled - perhaps 'milliseconds'?
> #1875: FILE: drivers/accel/amdxdna/amdxdna_mailbox_helper.h:9:
> +#define TX_TIMEOUT 2000 /* miliseconds */
> ^^^^^^^^^^^
>
> WARNING: 'miliseconds' may be misspelled - perhaps 'milliseconds'?
> #1876: FILE: drivers/accel/amdxdna/amdxdna_mailbox_helper.h:10:
> +#define RX_TIMEOUT 5000 /* miliseconds */
> ^^^^^^^^^^^
>
> total: 0 errors, 4 warnings, 0 checks, 1916 lines checked
>
> NOTE: For some of the reported defects, checkpatch may be able to
> mechanically convert to the typical style using --fix or
> --fix-inplace.
>
> 0003-accel-amdxdna-Support-hardware-mailbox.patch has style problems,
> please review.
>
>
>
> 0007-accel-amdxdna-Add-command-execution.patch
> ----------------------------------------------
> WARNING: 'miliseconds' may be misspelled - perhaps 'milliseconds'?
> #59: FILE: drivers/accel/amdxdna/aie2_ctx.c:27:
> +#define HWCTX_MAX_TIMEOUT 60000 /* miliseconds */
> ^^^^^^^^^^^
>
> WARNING: 'reverve' may be misspelled - perhaps 'reserve'?
> #612: FILE: drivers/accel/amdxdna/aie2_ctx.c:779:
> + XDNA_WARN(xdna, "Failed to reverve fence, ret %d", ret);
> ^^^^^^^
>
> WARNING: 'Exectuion' may be misspelled - perhaps 'Execution'?
> #1899: FILE: drivers/accel/amdxdna/amdxdna_pci_drv.c:139:
> + /* Exectuion */
> ^^^^^^^^^
>
> WARNING: 'exectuion' may be misspelled - perhaps 'execution'?
> #2113: FILE: include/uapi/drm/amdxdna_accel.h:239:
> + * struct amdxdna_drm_wait_cmd - Wait exectuion command.
> ^^^^^^^^^
>
> total: 0 errors, 10 warnings, 0 checks, 1983 lines checked
>
> NOTE: For some of the reported defects, checkpatch may be able to
> mechanically convert to the typical style using --fix or
> --fix-inplace.
>
> 0007-accel-amdxdna-Add-command-execution.patch has style problems,
> please review.
>
>
> 0008-accel-amdxdna-Add-suspend-and-resume.patch
> -----------------------------------------------
> WARNING: 'miliseconds' may be misspelled - perhaps 'milliseconds'?
> #163: FILE: drivers/accel/amdxdna/amdxdna_pci_drv.c:22:
> +#define AMDXDNA_AUTOSUSPEND_DELAY 5000 /* miliseconds */
> ^^^^^^^^^^^
>
> total: 0 errors, 1 warnings, 0 checks, 302 lines checked
>
> NOTE: For some of the reported defects, checkpatch may be able to
> mechanically convert to the typical style using --fix or
> --fix-inplace.
>
> 0008-accel-amdxdna-Add-suspend-and-resume.patch has style problems,
> please review.
>
>
> -Jeff
More information about the dri-devel
mailing list