[PATCH v2 4/4] drm/nvdla/uapi: Add UAPI of NVDLA driver

Christian König christian.koenig at amd.com
Tue Apr 26 08:29:44 UTC 2022


Am 26.04.22 um 10:23 schrieb Cai Huoqing:
> On 26 4月 22 08:31:05, Christian König wrote:
>> Am 26.04.22 um 08:08 schrieb Cai Huoqing:
>>> The NVIDIA Deep Learning Accelerator (NVDLA) is an open source IP
>>> which is integrated into NVIDIA Jetson AGX Xavier,
>>> so add UAPI of this driver.
>>>
>>> Signed-off-by: Cai Huoqing <cai.huoqing at linux.dev>
>>> ---
>>> v1->v2:
>>> *Rename nvdla_drm.[ch] to nvdla_drv.[ch] and rename nvdla_ioctl.h to nvdla_drm.h,
>>>    move it to uapi.
>>>    comments link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Flkml%2F20bac605-97e6-e5cd-c4e4-83a8121645d8%40amd.com%2F&data=05%7C01%7Cchristian.koenig%40amd.com%7C0777513b15b34d20c30e08da275e235c%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637865582541002548%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=ziQSwKxqhevOLDxq%2FvgfinF8BG3hiAwmUxsH3ZzZF4E%3D&reserved=0
>>>
>>>    include/uapi/drm/nvdla_drm.h | 99 ++++++++++++++++++++++++++++++++++++
>>>    1 file changed, 99 insertions(+)
>>>    create mode 100644 include/uapi/drm/nvdla_drm.h
>>>
>>> diff --git a/include/uapi/drm/nvdla_drm.h b/include/uapi/drm/nvdla_drm.h
>>> new file mode 100644
>>> index 000000000000..984635285525
>>> --- /dev/null
>>> +++ b/include/uapi/drm/nvdla_drm.h
>>> @@ -0,0 +1,99 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */
>>> +/*
>>> + * Copyright (C) 2017-2018 NVIDIA CORPORATION.
>>> + * Copyright (C) 2022 Cai Huoqing
>>> + */
>>> +
>>> +#ifndef __LINUX_NVDLA_IOCTL_H
>>> +#define __LINUX_NVDLA_IOCTL_H
>>> +
>>> +#include <linux/ioctl.h>
>>> +#include <linux/types.h>
>>> +
>>> +#if !defined(__KERNEL__)
>>> +#define __user
>>> +#endif
>>> +
>>> +/**
>>> + * struct nvdla_mem_handle structure for memory handles
>>> + *
>>> + * @handle		handle to DMA buffer allocated in userspace
>>> + * @reserved		Reserved for padding
>>> + * @offset		offset in bytes from start address of buffer
>>> + *
>>> + */
>>> +struct nvdla_mem_handle {
>>> +	__u32 handle;
>>> +	__u32 reserved;
>>> +	__u64 offset;
>>> +};
>>> +
>>> +/**
>>> + * struct nvdla_ioctl_submit_task structure for single task information
>>> + *
>>> + * @num_addresses		total number of entries in address_list
>>> + * @reserved			Reserved for padding
>>> + * @address_list		pointer to array of struct nvdla_mem_handle
>>> + *
>>> + */
>>> +struct nvdla_ioctl_submit_task {
>>> +#define NVDLA_MAX_BUFFERS_PER_TASK (6144)
>>> +	__u32 num_addresses;
>>> +#define NVDLA_NO_TIMEOUT    (0xffffffff)
>>> +	__u32 timeout;
>> What format does that timeout value have?
>>
>> In general it is best practice to have absolute 64bit nanosecond timeouts
>> (to be used with ktime inside the kernel) so that restarting interrupted
>> IOCTLs works smooth.
>>
>>> +	__u64 address_list;
>> Maybe make the comments inline, cause I just wanted to write that you should
>> note that this is pointing to an nvdla_mem_handle array until I saw the
>> comment above.
>>
>>> +};
>>> +
>>> +/**
>>> + * struct nvdla_submit_args structure for task submit
>>> + *
>>> + * @tasks		pointer to array of struct nvdla_ioctl_submit_task
>>> + * @num_tasks		number of entries in tasks
>>> + * @flags		flags for task submit, no flags defined yet
>>> + * @version		version of task structure
>>> + *
>>> + */
>>> +struct nvdla_submit_args {
>>> +	__u64 tasks;
>>> +	__u16 num_tasks;
>>> +#define NVDLA_MAX_TASKS_PER_SUBMIT	24
>>> +#define NVDLA_SUBMIT_FLAGS_ATOMIC	(1 << 0)
>> Well that "no flags defined yet" from the comment above is probably outdated
>> :)
>>
>> A comment what this flag means would also be nice to have.
>>
>> Apart from all those nit picks that looks pretty solid to me. Just one core
>> functionality we usually have seems to be missing here: How is completion
>> signaling implemented?
> Hi,thank for your reply.
>
> Do you mean fence signal? In this driver, IOCTL_SUBMIT is a synchronous call
> which do task submission & wait for done completion. This accerletor deal
> with massive compute operator (Pooling, Conv...), that is different to
> GPU. It's unnecessary to expose fence API to UMD for reducing such less time.

You should probably add that as a comment somewhere here.

Thanks,
Christian.

>
> Thanks,
> Cai
>> Regards,
>> Christian.
>>
>>> +	__u16 flags;
>>> +	__u32 version;
>>> +};
>>> +
>>> +/**
>>> + * struct nvdla_gem_create_args for allocating DMA buffer through GEM
>>> + *
>>> + * @handle		handle updated by kernel after allocation
>>> + * @flags		implementation specific flags
>>> + * @size		size of buffer to allocate
>>> + */
>>> +struct nvdla_gem_create_args {
>>> +	__u32 handle;
>>> +	__u32 flags;
>>> +	__u64 size;
>>> +};
>>> +
>>> +/**
>>> + * struct nvdla_gem_map_offset_args for mapping DMA buffer
>>> + *
>>> + * @handle		handle of the buffer
>>> + * @reserved		reserved for padding
>>> + * @offset		offset updated by kernel after mapping
>>> + */
>>> +struct nvdla_gem_map_offset_args {
>>> +	__u32 handle;
>>> +	__u32 reserved;
>>> +	__u64 offset;
>>> +};
>>> +
>>> +#define DRM_NVDLA_SUBMIT		0x00
>>> +#define DRM_NVDLA_GEM_CREATE	0x01
>>> +#define DRM_NVDLA_GEM_MMAP		0x02
>>> +
>>> +#define DRM_IOCTL_NVDLA_SUBMIT DRM_IOWR(DRM_COMMAND_BASE + DRM_NVDLA_SUBMIT, struct nvdla_submit_args)
>>> +#define DRM_IOCTL_NVDLA_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_NVDLA_GEM_CREATE, struct nvdla_gem_create_args)
>>> +#define DRM_IOCTL_NVDLA_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE + DRM_NVDLA_GEM_MMAP, struct nvdla_gem_map_offset_args)
>>> +
>>> +#endif



More information about the dri-devel mailing list