New KFD ioctls: taking the skeletons out of the closet

Tue Mar 6 22:44:41 UTC 2018

Hi all,

Christian raised two potential issues in a recent KFD upstreaming code
review that are related to the KFD ioctl APIs:

 1. behaviour of -ERESTARTSYS
 2. transactional nature of KFD ioctl definitions, or lack thereof

I appreciate constructive feedback, but I also want to encourage an
open-minded rather than a dogmatic approach to API definitions. So let
me take all the skeletons out of my closet and get these APIs reviewed
in the appropriate forum before we commit to them upstream. See the end
of this email for reference.

The controversial part at this point is kfd_ioctl_map_memory_to_gpu. If
any of the other APIs raise concerns or questions, please ask.

Because of the HSA programming model, KFD memory management APIs are
synchronous. There is no pipelining. Command submission to GPUs through
user mode queues does not involve KFD. This means KFD doesn't know what
memory is used by the GPUs or when it's used. Consequently, when the
map_memory_to_gpu ioctl returns to user mode, all memory mapping
operations must be complete so that the memory can be used by the CPUs
or GPUs immediately.

HSA also uses a shared virtual memory model, so typically memory gets
mapped on multiple GPUs and CPUs at the same virtual address.

The point of contention seems to be the ability to map memory to
multiple GPUs in a single ioctl and the behaviour in failure cases. I'll
discuss two main failure cases:

1: All mappings have been dispatched via SDMA, but a signal interrupts
the wait for completion and we return -ERESTARTSYS.
Documentation/kernel-hacking/hacking.rst only says "[...] you should be
prepared to process the restart, e.g. if you're in the middle of
manipulating some data structure." I think we do that by ensuring that
memory that's already mapped won't be mapped again. So the restart will
become a no-op and just end up waiting for all the previous mappings to
complete.
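
To make the restart argument concrete, here is a minimal user-space
sketch of that idea. It is not the KFD implementation: struct sketch_bo,
sketch_wait, sketch_map and the signal_pending flag are all hypothetical
stand-ins, and ERESTARTSYS is defined locally because it is a
kernel-internal error code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef ERESTARTSYS
#define ERESTARTSYS 512	/* kernel-internal value, defined here for the sketch */
#endif

#define SKETCH_MAX_GPUS 8

/* Hypothetical per-buffer state, not the real KFD data structure */
struct sketch_bo {
	bool mapped[SKETCH_MAX_GPUS];	/* mapping already dispatched on this GPU */
	bool pending;			/* dispatched work not yet waited for */
};

/* Stand-in for an interruptible wait; signal_pending simulates a signal */
static int sketch_wait(struct sketch_bo *bo, bool signal_pending)
{
	if (!bo->pending)
		return 0;		/* nothing outstanding */
	if (signal_pending)
		return -ERESTARTSYS;	/* work stays pending, caller restarts */
	bo->pending = false;
	return 0;
}

/* Dispatch mappings that are not yet present, then wait for completion.
 * *dispatched reports how many new mappings this call issued.
 */
static int sketch_map(struct sketch_bo *bo, const unsigned int *gpus,
		      size_t n, bool signal_pending, unsigned int *dispatched)
{
	size_t i;

	*dispatched = 0;
	for (i = 0; i < n; i++) {
		if (gpus[i] >= SKETCH_MAX_GPUS)
			return -EINVAL;
		if (bo->mapped[gpus[i]])
			continue;	/* already mapped: nothing to redo */
		bo->mapped[gpus[i]] = true;
		bo->pending = true;
		(*dispatched)++;
	}
	return sketch_wait(bo, signal_pending);
}
```

If the first call is interrupted, it returns -ERESTARTSYS with the
mappings already dispatched. The restarted call dispatches nothing new
and only waits for the outstanding work.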

Christian has a stricter requirement, and I'd like to know where that
comes from: "An interrupted IOCTL should never have a visible effect."

2: Failure to map on some but not all GPUs. This comes down to the
question: do all ioctl APIs or system calls in general need to be
transactional? As a counterexample I'd give incomplete read or write
system calls that return how much was actually read or written. Our
current implementation of map_memory_to_gpu doesn't do this, but it
could be modified to return to user mode how many of the mappings, or
which mappings specifically failed or succeeded.
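
As a rough illustration of the "how many succeeded" variant, the sketch
below reports progress in the style of a short write. Everything in it
is hypothetical: struct sketch_map_args and its n_success field are not
part of the structs proposed at the end of this email, and
sketch_fail_on_gpu2 is only a stand-in backend that fails for gpu_id 2
to exercise the error path.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical argument layout; n_success is an assumption for this sketch */
struct sketch_map_args {
	uint64_t handle;		/* buffer handle */
	const uint32_t *gpu_ids;	/* GPUs to map on, in order */
	uint32_t n_devices;		/* number of entries in gpu_ids */
	uint32_t n_success;		/* in: entries already done, out: updated count */
};

typedef int (*sketch_map_one_fn)(uint64_t handle, uint32_t gpu_id);

/* Stand-in backend: pretend GPU 2 is out of memory */
static int sketch_fail_on_gpu2(uint64_t handle, uint32_t gpu_id)
{
	(void)handle;
	return gpu_id == 2 ? -ENOMEM : 0;
}

/* Map on each GPU in order, stop at the first failure and record how
 * many leading entries succeeded. A retry resumes after those entries.
 */
static int sketch_map_partial(struct sketch_map_args *args,
			      sketch_map_one_fn map_one)
{
	uint32_t i;
	int err = 0;

	for (i = args->n_success; i < args->n_devices; i++) {
		err = map_one(args->handle, args->gpu_ids[i]);
		if (err)
			break;
	}
	args->n_success = i;
	return err;
}
```

With gpu_ids = {1, 2, 3} the call fails with -ENOMEM and leaves
n_success at 1, so user mode knows the first mapping exists and can
decide whether to retry the rest or unmap.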

I'd like to know whether such behaviour is acceptable.

The alternative would be to break multi-GPU mappings, and the final wait
for completion, into multiple ioctl calls. That would result in
additional system call overhead. I'd argue that the end result is the
same for user mode, so I don't see why I'd use multiple ioctls over a
single one.

I'm looking forward to your feedback.

Thanks,
  Felix

Reference: After the last rework, these are the ioctls I'm hoping to
upstream in my current patch series (with annotations):

/* Acquire a VM from a DRM render node FD for use by KFD on a specific device
 *
 * @drm_fd: DRM render node file descriptor
 * @gpu_id: device identifier (used throughout the KFD API)
 */
struct kfd_ioctl_acquire_vm_args {
	__u32 drm_fd;	/* to KFD */
	__u32 gpu_id;	/* to KFD */
};

/* Allocation flags: memory types */
#define KFD_IOC_ALLOC_MEM_FLAGS_VRAM		(1 << 0)
#define KFD_IOC_ALLOC_MEM_FLAGS_GTT		(1 << 1)
#define KFD_IOC_ALLOC_MEM_FLAGS_USERPTR		(1 << 2)
#define KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL	(1 << 3)
/* Allocation flags: attributes/access options */
#define KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE	(1 << 31)
#define KFD_IOC_ALLOC_MEM_FLAGS_EXECUTABLE	(1 << 30)
#define KFD_IOC_ALLOC_MEM_FLAGS_PUBLIC		(1 << 29)
#define KFD_IOC_ALLOC_MEM_FLAGS_NO_SUBSTITUTE	(1 << 28)
#define KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM	(1 << 27)
#define KFD_IOC_ALLOC_MEM_FLAGS_COHERENT	(1 << 26)

/* Allocate memory for later SVM (shared virtual memory) mapping.
 *
 * @va_addr:     virtual address of the memory to be allocated
 *               all later mappings on all GPUs will use this address
 * @size:        size in bytes
 * @handle:      buffer handle returned to user mode, used to refer to
 *               this allocation for mapping, unmapping and freeing
 * @mmap_offset: for CPU-mapping the allocation by mmapping a render node
 *               for userptrs this is overloaded to specify the CPU address
 * @gpu_id:      device identifier
 * @flags:       memory type and attributes. See KFD_IOC_ALLOC_MEM_FLAGS above
 */
struct kfd_ioctl_alloc_memory_of_gpu_args {
	__u64 va_addr;		/* to KFD */
	__u64 size;		/* to KFD */
	__u64 handle;		/* from KFD */
	__u64 mmap_offset;	/* to KFD (userptr), from KFD (mmap offset) */
	__u32 gpu_id;		/* to KFD */
	__u32 flags;
};

/* Free memory allocated with kfd_ioctl_alloc_memory_of_gpu
 *
 * @handle: memory handle returned by alloc
 */
struct kfd_ioctl_free_memory_of_gpu_args {
	__u64 handle;		/* to KFD */
};

/* Map memory to one or more GPUs
 *
 * @handle:                memory handle returned by alloc
 * @device_ids_array_ptr:  array of gpu_ids
 * @device_ids_array_size: size of the gpu_ids array
 */
struct kfd_ioctl_map_memory_to_gpu_args {
	__u64 handle;			/* to KFD */
	__u64 device_ids_array_ptr;	/* to KFD */
	__u32 device_ids_array_size;	/* to KFD */
	__u32 pad;
};

/* Unmap memory from one or more GPUs
 *
 * same arguments as for mapping
 */
struct kfd_ioctl_unmap_memory_from_gpu_args {
	__u64 handle;			/* to KFD */
	__u64 device_ids_array_ptr;	/* to KFD */
	__u32 device_ids_array_size;	/* to KFD */
	__u32 pad;
};

-- 
F e l i x   K u e h l i n g
PMTS Software Development Engineer | Vertical Workstation/Compute