[Intel-gfx] [PATCH v3 3/3] drm/doc/rfc: VM_BIND uapi definition
Niranjana Vishwanathapura
niranjana.vishwanathapura at intel.com
Thu Jun 23 14:47:41 UTC 2022
On Thu, Jun 23, 2022 at 09:27:22AM +0100, Tvrtko Ursulin wrote:
>
>On 22/06/2022 17:44, Niranjana Vishwanathapura wrote:
>>On Wed, Jun 22, 2022 at 04:57:17PM +0100, Tvrtko Ursulin wrote:
>>>
>>>On 22/06/2022 16:12, Niranjana Vishwanathapura wrote:
>>>>On Wed, Jun 22, 2022 at 09:10:07AM +0100, Tvrtko Ursulin wrote:
>>>>>
>>>>>On 22/06/2022 04:56, Niranjana Vishwanathapura wrote:
>>>>>>VM_BIND and related uapi definitions
>>>>>>
>>>>>>v2: Reduce the scope to simple Mesa use case.
>>>>>>v3: Expand VM_UNBIND documentation and add
>>>>>> I915_GEM_VM_BIND/UNBIND_FENCE_VALID
>>>>>> and I915_GEM_VM_BIND_TLB_FLUSH flags.
>>>>>>
>>>>>>Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura at intel.com>
>>>>>>---
>>>>>> Documentation/gpu/rfc/i915_vm_bind.h | 243 +++++++++++++++
>>>>>> 1 file changed, 243 insertions(+)
>>>>>> create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>>>>>>
>>>>>>diff --git a/Documentation/gpu/rfc/i915_vm_bind.h b/Documentation/gpu/rfc/i915_vm_bind.h
>>>>>>new file mode 100644
>>>>>>index 000000000000..fa23b2d7ec6f
>>>>>>--- /dev/null
>>>>>>+++ b/Documentation/gpu/rfc/i915_vm_bind.h
>>>>>>@@ -0,0 +1,243 @@
>>>>>>+/* SPDX-License-Identifier: MIT */
>>>>>>+/*
>>>>>>+ * Copyright © 2022 Intel Corporation
>>>>>>+ */
>>>>>>+
>>>>>>+/**
>>>>>>+ * DOC: I915_PARAM_HAS_VM_BIND
>>>>>>+ *
>>>>>>+ * VM_BIND feature availability.
>>>>>>+ * See typedef drm_i915_getparam_t param.
>>>>>>+ */
>>>>>>+#define I915_PARAM_HAS_VM_BIND 57
>>>>>>+
>>>>>>+/**
>>>>>>+ * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>>>>>>+ *
>>>>>>+ * Flag to opt-in for VM_BIND mode of binding during VM creation.
>>>>>>+ * See struct drm_i915_gem_vm_control flags.
>>>>>>+ *
>>>>>>+ * The older execbuf2 ioctl will not support VM_BIND mode of
>>>>>>+ * operation.
>>>>>>+ * For VM_BIND mode, we have a new execbuf3 ioctl which will not
>>>>>>+ * accept any execlist (see struct drm_i915_gem_execbuffer3 for
>>>>>>+ * more details).
>>>>>>+ *
>>>>>>+ */
>>>>>>+#define I915_VM_CREATE_FLAGS_USE_VM_BIND (1 << 0)
>>>>>>+
>>>>>>+/* VM_BIND related ioctls */
>>>>>>+#define DRM_I915_GEM_VM_BIND 0x3d
>>>>>>+#define DRM_I915_GEM_VM_UNBIND 0x3e
>>>>>>+#define DRM_I915_GEM_EXECBUFFER3 0x3f
>>>>>>+
>>>>>>+#define DRM_IOCTL_I915_GEM_VM_BIND	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
>>>>>>+#define DRM_IOCTL_I915_GEM_VM_UNBIND	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_unbind)
>>>>>>+#define DRM_IOCTL_I915_GEM_EXECBUFFER3	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
>>>>>>+
>>>>>>+/**
>>>>>>+ * struct drm_i915_gem_vm_bind_fence - Bind/unbind completion notification.
>>>>>>+ *
>>>>>>+ * A timeline out fence for vm_bind/unbind completion notification.
>>>>>>+ */
>>>>>>+struct drm_i915_gem_vm_bind_fence {
>>>>>>+ /** @handle: User's handle for a drm_syncobj to signal. */
>>>>>>+ __u32 handle;
>>>>>>+
>>>>>>+ /** @rsvd: Reserved, MBZ */
>>>>>>+ __u32 rsvd;
>>>>>>+
>>>>>>+ /**
>>>>>>+ * @value: A point in the timeline.
>>>>>>+ * Value must be 0 for a binary drm_syncobj. A value of 0 for a
>>>>>>+ * timeline drm_syncobj is invalid as it turns the drm_syncobj
>>>>>>+ * into a binary one.
>>>>>>+ */
>>>>>>+ __u64 value;
>>>>>>+};
>>>>>>+
>>>>>>+/**
>>>>>>+ * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
>>>>>>+ *
>>>>>>+ * This structure is passed to the VM_BIND ioctl and specifies
>>>>>>+ * the mapping of a GPU virtual address (VA) range to the section
>>>>>>+ * of an object that should be bound in the device page table of
>>>>>>+ * the specified address space (VM).
>>>>>>+ * The VA range specified must be unique (i.e., not currently
>>>>>>+ * bound) and can be mapped to the whole object or to a section
>>>>>>+ * of the object (partial binding).
>>>>>>+ * Multiple VA mappings can be created to the same section of the
>>>>>>+ * object (aliasing).
>>>>>>+ *
>>>>>>+ * The @start, @offset and @length should be 4K page aligned.
>>>>>>+ * However, DG2 and XEHPSDV have a 64K page size for device
>>>>>>+ * local-memory and have a compact page table. On those platforms,
>>>>>>+ * for binding device local-memory objects, @start should be 2M
>>>>>>+ * aligned, and @offset and @length should be 64K aligned.
>>>>>
>>>>>Should some error codes be documented and has the ability to
>>>>>programmatically probe the alignment restrictions been
>>>>>considered?
>>>>>
>>>>
>>>>Currently what we have internally is that -EINVAL is returned if
>>>>the start, offset or length are not aligned. If the specified
>>>>mapping already exists, we return -EEXIST. If there are conflicts
>>>>in the VA range and the VA range can't be reserved, then -ENOSPC
>>>>is returned. I can add this documentation here. But I am worried
>>>>that there will be more suggestions/feedback about error codes
>>>>while reviewing the code patch series, and we would have to
>>>>revisit it again.
>>>
>>>I'd still suggest documenting those three. It makes sense to
>>>explain to userspace what behaviour they will see if they get it
>>>wrong.
>>>
>>
>>Ok.
>>
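To make the intended behaviour concrete, here is a rough userspace
sketch of that error handling. It is only a sketch: struct
drm_i915_gem_vm_bind and DRM_IOCTL_I915_GEM_VM_BIND are from the RFC
header in this patch and may still change, and the fd/vm_id/bo_handle
setup is assumed to happen elsewhere.

#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include "i915_vm_bind.h" /* the RFC header added by this patch */

/* Try to bind a 64K section of an object at 'va' and decode the
 * error codes discussed above.
 */
static int try_vm_bind(int fd, uint32_t vm_id, uint32_t bo_handle,
		       uint64_t va)
{
	struct drm_i915_gem_vm_bind bind = {
		.vm_id  = vm_id,
		.handle = bo_handle,
		.start  = va,      /* 4K aligned (2M for DG2/XEHPSDV lmem) */
		.offset = 0,       /* 4K aligned (64K for DG2/XEHPSDV lmem) */
		.length = 0x10000, /* 4K aligned (64K for DG2/XEHPSDV lmem) */
	};

	if (ioctl(fd, DRM_IOCTL_I915_GEM_VM_BIND, &bind) == 0)
		return 0;

	if (errno == EINVAL) /* misaligned start/offset/length */
		return -EINVAL;
	if (errno == EEXIST) /* this mapping already exists */
		return -EEXIST;
	if (errno == ENOSPC) /* VA range conflicts, can't be reserved */
		return -ENOSPC;
	return -errno;
}
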
>>>>>>+ * Also, on those platforms, it is not allowed to bind a device
>>>>>>+ * local-memory object and a system memory object in a single 2M
>>>>>>+ * section of the VA range.
>>>>>
>>>>>Text should be clear whether "not allowed" means there will be
>>>>>an error returned, or it will appear to work but bad things
>>>>>will happen.
>>>>>
>>>>
>>>>Yah, error returned, will fix.
>>>>
>>>>>>+ */
>>>>>>+struct drm_i915_gem_vm_bind {
>>>>>>+ /** @vm_id: VM (address space) id to bind */
>>>>>>+ __u32 vm_id;
>>>>>>+
>>>>>>+ /** @handle: Object handle */
>>>>>>+ __u32 handle;
>>>>>>+
>>>>>>+ /** @start: Virtual Address start to bind */
>>>>>>+ __u64 start;
>>>>>>+
>>>>>>+ /** @offset: Offset in object to bind */
>>>>>>+ __u64 offset;
>>>>>>+
>>>>>>+ /** @length: Length of mapping to bind */
>>>>>>+ __u64 length;
>>>>>>+
>>>>>>+ /**
>>>>>>+ * @flags: Supported flags are:
>>>>>>+ *
>>>>>>+ * I915_GEM_VM_BIND_FENCE_VALID:
>>>>>>+ * @fence is valid, needs bind completion notification.
>>>>>>+ *
>>>>>>+ * I915_GEM_VM_BIND_READONLY:
>>>>>>+ * Mapping is read-only.
>>>>>>+ *
>>>>>>+ * I915_GEM_VM_BIND_CAPTURE:
>>>>>>+ * Capture this mapping in the dump upon GPU error.
>>>>>>+ *
>>>>>>+ * I915_GEM_VM_BIND_TLB_FLUSH:
>>>>>>+ * Flush the TLB for the specified range after bind completion.
>>>>>>+ */
>>>>>>+ __u64 flags;
>>>>>>+#define I915_GEM_VM_BIND_FENCE_VALID (1 << 0)
>>>>>>+#define I915_GEM_VM_BIND_READONLY (1 << 1)
>>>>>>+#define I915_GEM_VM_BIND_CAPTURE (1 << 2)
>>>>>>+#define I915_GEM_VM_BIND_TLB_FLUSH (1 << 3)
>>>>>
>>>>>What is the use case for allowing any random user to play with
>>>>>(global) TLB flushing?
>>>>>
>>>>
>>>>I heard it from Daniel on intel-gfx, apparently it is a Mesa
>>>>requirement.
>>>
>>>Okay I think that one needs clarifying.
>>>
>>
>>After chatting with Jason, I think we can remove it for now and
>>we can revisit it later if Mesa thinks it is required.
>
>IRC or some other thread?
On IRC, #intel-gfx.
Niranjana
>
>>>>>>+
>>>>>>+ /** @fence: Timeline fence for bind completion signaling */
>>>>>>+ struct drm_i915_gem_vm_bind_fence fence;
>>>>>
>>>>>As agreed the other day - please document in the main
>>>>>kerneldoc section that all (un)binds are executed
>>>>>asynchronously and out of order.
>>>>>
>>>>
>>>>I have added it in the latest revision of .rst file.
>>>
>>>Right, but I'd say to mention it in the uapi docs.
>>>
>>
>>Ok
>>
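As a concrete illustration of what the uapi docs will say: all
(un)binds execute asynchronously and out of order, so the out fence
is the only completion indication. A minimal sketch, assuming a plain
timeline drm_syncobj via libdrm (drmSyncobjCreate() and
drmSyncobjTimelineWait() are existing libdrm calls; the bind fields
are from this RFC and may still change):

#include <stdint.h>
#include <sys/ioctl.h>
#include <xf86drm.h>
#include "i915_vm_bind.h" /* the RFC header added by this patch */

/* Bind asynchronously, then wait on the out fence. Timeline point 1
 * is arbitrary.
 */
static int bind_and_wait(int fd, uint32_t vm_id, uint32_t bo_handle,
			 uint64_t va, uint64_t length)
{
	uint32_t syncobj;
	uint64_t point = 1;

	if (drmSyncobjCreate(fd, 0, &syncobj))
		return -1;

	struct drm_i915_gem_vm_bind bind = {
		.vm_id  = vm_id,
		.handle = bo_handle,
		.start  = va,
		.offset = 0,
		.length = length,
		.flags  = I915_GEM_VM_BIND_FENCE_VALID,
		.fence  = { .handle = syncobj, .value = point },
	};
	if (ioctl(fd, DRM_IOCTL_I915_GEM_VM_BIND, &bind))
		return -1;

	/* Binds execute asynchronously and out of order with respect
	 * to other (un)binds; the out fence is the only completion
	 * indication.
	 */
	return drmSyncobjTimelineWait(fd, &syncobj, &point, 1,
				      INT64_MAX, 0, NULL);
}
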
>>>>>>+
>>>>>>+ /** @extensions: 0-terminated chain of extensions */
>>>>>>+ __u64 extensions;
>>>>>>+};
>>>>>>+
>>>>>>+/**
>>>>>>+ * struct drm_i915_gem_vm_unbind - VA to object mapping to unbind.
>>>>>>+ *
>>>>>>+ * This structure is passed to the VM_UNBIND ioctl and specifies
>>>>>>+ * the GPU virtual address (VA) range that should be unbound from
>>>>>>+ * the device page table of the specified address space (VM). The
>>>>>>+ * specified VA range must match one of the mappings created with
>>>>>>+ * the VM_BIND ioctl. The TLB is flushed upon unbind completion.
>>>>>>+ * The unbind operation will force unbind the specified
>>>>>
>>>>>Do we want to provide TLB flushing guarantees here and why?
>>>>>(As opposed to leaving them for implementation details.) If
>>>>>there is no implied order in either binds/unbinds, or between
>>>>>the two intermixed, then what is the point of guaranteeing a
>>>>>TLB flush on unbind completion?
>>>>>
>>>>
>>>>I think we ensure that the TLB is flushed before signaling the out
>>>>fence of the vm_unbind call; the user then ensures correctness by
>>>>staging submissions or vm_bind calls after the vm_unbind out fence
>>>>signals.
>>>
>>>I don't see why this is required. The driver does not need to flush
>>>immediately on unbind for correctness/security, nor for the uapi
>>>contract. If there is no subsequent usage/bind then the flush is
>>>pointless. And if the user re-binds to the same VA range, against an
>>>active VM, then perhaps the expectations need to be defined. Is this
>>>supported, or a user error, or what?
>>>
>>
>>After a vm_unbind, the UMD can re-bind to the same VA range against
>>an active VM. Though I am not sure, for the Mesa use case, whether
>>that new mapping is required for a running GPU job or whether it
>>will be for the next submission. But by ensuring the TLB flush upon
>>unbind, the KMD can ensure correctness.
>
>Isn't that their problem? If they re-bind for submitting _new_ work
>then they get the flush as part of the batch buffer preamble.
>
>>Note that on platforms with selective TLB invalidation, it is not
>>as expensive as flushing the whole TLB. On platforms without
>>selective TLB invalidation, we can add some optimizations later, as
>>mentioned in the .rst file.
>>
>>Also note that UMDs can vm_unbind a mapping while the VM is active.
>>By flushing the TLB, we ensure there is no inadvertent access to a
>>mapping that no longer exists. I can add this to the documentation.
>
>This one would surely be their problem. Kernel only needs to flush
>when it decides to re-use the backing store.
>
>To be clear, overall I have reservations about offering strong
>guarantees about the TLB flushing behaviour at the level of these two
>ioctls. If we don't need to offer them, it would be good not to;
>otherwise we limit ourselves on the implementation side and, more
>importantly, add a global performance hit where the majority of
>userspace does not need this guarantee to start with.
>
>I just don't fully remember how that compute use case is supposed to
>work, where new work keeps getting submitted against a running batch.
>Am I missing something there?
>
>>>>>>+ * range from the device page table without waiting for any GPU
>>>>>>+ * job to complete. It is the UMD's responsibility to ensure the
>>>>>>+ * mapping is no longer in use before calling VM_UNBIND.
>>>>>>+ *
>>>>>>+ * The @start and @length must specify a unique mapping bound
>>>>>>+ * with the VM_BIND ioctl.
>>>>>>+ */
>>>>>>+struct drm_i915_gem_vm_unbind {
>>>>>>+ /** @vm_id: VM (address space) id to unbind */
>>>>>>+ __u32 vm_id;
>>>>>>+
>>>>>>+ /** @rsvd: Reserved, MBZ */
>>>>>>+ __u32 rsvd;
>>>>>>+
>>>>>>+ /** @start: Virtual Address start to unbind */
>>>>>>+ __u64 start;
>>>>>>+
>>>>>>+ /** @length: Length of mapping to unbind */
>>>>>>+ __u64 length;
>>>>>>+
>>>>>>+ /**
>>>>>>+ * @flags: Supported flags are:
>>>>>>+ *
>>>>>>+ * I915_GEM_VM_UNBIND_FENCE_VALID:
>>>>>>+ * @fence is valid, needs unbind completion notification.
>>>>>>+ */
>>>>>>+ __u64 flags;
>>>>>>+#define I915_GEM_VM_UNBIND_FENCE_VALID (1 << 0)
>>>>>>+
>>>>>>+ /** @fence: Timeline fence for unbind completion signaling */
>>>>>>+ struct drm_i915_gem_vm_bind_fence fence;
>>>>>
>>>>>I am not sure the simplified ioctl story is super coherent. If
>>>>>everything is now fully async and out of order, but the input
>>>>>fence has been dropped, then how is userspace supposed to
>>>>>handle the address space? It will have to wait (in userspace)
>>>>>for unbinds to complete before submitting subsequent binds
>>>>>which use the same VA range.
>>>>>
>>>>
>>>>Yah, and Mesa apparently will have the support to handle it.
>>>>
>>>>>Maybe that's passable, but then the fact that execbuf3 has no
>>>>>input fence suggests a userspace wait between it and binds. And I am
>>>>>pretty sure historically those were always quite bad for
>>>>>performance.
>>>>>
>>>>
>>>>execbuf3 has the input fence through timeline fence array support.
>>>
>>>I think I confused the field in execbuf3 for the output fence...
>>>So that part is fine, async binds chained with an input fence to
>>>execbuf3. Fire and forget for userspace.
>>>
>>>Although I then don't understand why execbuf3 wouldn't support an
>>>output fence? What mechanism is userspace supposed to use for
>>>that? Export a fence from the batch buffer BO? That would be an
>>>extra ioctl, so if we can avoid it, why not?
>>>
>>
>>execbuf3 supports out fence as well through timeline fence array.
>
>Ah okay, I am uninformed in this topic, sorry.
>
>Regards,
>
>Tvrtko
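
To spell out the VA-reuse flow discussed above: with no in fence on
vm_bind in this version, userspace that wants to re-bind a VA range
must wait for the unbind's out fence first. A rough sketch, with the
same caveats as before (RFC structs and flags that may still change,
existing libdrm syncobj helpers, and a pre-created timeline syncobj
passed in):

#include <stdint.h>
#include <sys/ioctl.h>
#include <xf86drm.h>
#include "i915_vm_bind.h" /* the RFC header added by this patch */

/* Unbind, wait for the out fence, then re-bind the same VA range
 * against the still-active VM.
 */
static int reuse_va(int fd, uint32_t vm_id, uint32_t new_bo,
		    uint64_t va, uint64_t length, uint32_t syncobj)
{
	uint64_t point = 1;

	struct drm_i915_gem_vm_unbind unbind = {
		.vm_id  = vm_id,
		.start  = va,
		.length = length,
		.flags  = I915_GEM_VM_UNBIND_FENCE_VALID,
		.fence  = { .handle = syncobj, .value = point },
	};
	if (ioctl(fd, DRM_IOCTL_I915_GEM_VM_UNBIND, &unbind))
		return -1;

	/* vm_bind has no in fence in this version, so wait for the
	 * unbind to complete before reusing the VA range.
	 */
	if (drmSyncobjTimelineWait(fd, &syncobj, &point, 1,
				   INT64_MAX, 0, NULL))
		return -1;

	struct drm_i915_gem_vm_bind bind = {
		.vm_id  = vm_id,
		.handle = new_bo,
		.start  = va, /* same VA range, now free to reuse */
		.offset = 0,
		.length = length,
	};
	return ioctl(fd, DRM_IOCTL_I915_GEM_VM_BIND, &bind);
}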