[Intel-gfx] [RFC v3 3/3] drm/doc/rfc: VM_BIND uapi definition

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Wed Jun 8 07:34:36 UTC 2022


On 07/06/2022 22:25, Niranjana Vishwanathapura wrote:
> On Tue, Jun 07, 2022 at 11:42:08AM +0100, Tvrtko Ursulin wrote:
>>
>> On 03/06/2022 07:53, Niranjana Vishwanathapura wrote:
>>> On Wed, Jun 01, 2022 at 10:08:35PM -0700, Niranjana Vishwanathapura 
>>> wrote:
>>>> On Wed, Jun 01, 2022 at 11:27:17AM +0200, Daniel Vetter wrote:
>>>>> On Wed, 1 Jun 2022 at 11:03, Dave Airlie <airlied at gmail.com> wrote:
>>>>>>
>>>>>> On Tue, 24 May 2022 at 05:20, Niranjana Vishwanathapura
>>>>>> <niranjana.vishwanathapura at intel.com> wrote:
>>>>>>>
>>>>>>> On Thu, May 19, 2022 at 04:07:30PM -0700, Zanoni, Paulo R wrote:
>>>>>>>> On Tue, 2022-05-17 at 11:32 -0700, Niranjana Vishwanathapura wrote:
>>>>>>>>> VM_BIND and related uapi definitions
>>>>>>>>>
>>>>>>>>> v2: Ensure proper kernel-doc formatting with cross references.
>>>>>>>>>      Also add new uapi and documentation as per review comments
>>>>>>>>>      from Daniel.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Niranjana Vishwanathapura 
>>>>>>> <niranjana.vishwanathapura at intel.com>
>>>>>>>>> ---
>>>>>>>>>   Documentation/gpu/rfc/i915_vm_bind.h | 399 
>>>>>>> +++++++++++++++++++++++++++
>>>>>>>>>   1 file changed, 399 insertions(+)
>>>>>>>>>   create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>>>>>>>>>
>>>>>>>>> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h 
>>>>>>> b/Documentation/gpu/rfc/i915_vm_bind.h
>>>>>>>>> new file mode 100644
>>>>>>>>> index 000000000000..589c0a009107
>>>>>>>>> --- /dev/null
>>>>>>>>> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
>>>>>>>>> @@ -0,0 +1,399 @@
>>>>>>>>> +/* SPDX-License-Identifier: MIT */
>>>>>>>>> +/*
>>>>>>>>> + * Copyright © 2022 Intel Corporation
>>>>>>>>> + */
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * DOC: I915_PARAM_HAS_VM_BIND
>>>>>>>>> + *
>>>>>>>>> + * VM_BIND feature availability.
>>>>>>>>> + * See typedef drm_i915_getparam_t param.
>>>>>>>>> + */
>>>>>>>>> +#define I915_PARAM_HAS_VM_BIND               57
>>>>>>>>> +
>>>>>>>>> +/**
>>>>>>>>> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
>>>>>>>>> + *
>>>>>>>>> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
>>>>>>>>> + * See struct drm_i915_gem_vm_control flags.
>>>>>>>>> + *
>>>>>>>>> + * A VM in VM_BIND mode will not support the older 
>>>>>>> execbuff mode of binding.
>>>>>>>>> + * In VM_BIND mode, execbuff ioctl will not accept any 
>>>>>>> execlist (ie., the
>>>>>>>>> + * &drm_i915_gem_execbuffer2.buffer_count must be 0).
>>>>>>>>> + * Also, &drm_i915_gem_execbuffer2.batch_start_offset and
>>>>>>>>> + * &drm_i915_gem_execbuffer2.batch_len must be 0.
>>>>>>>>> + * DRM_I915_GEM_EXECBUFFER_EXT_BATCH_ADDRESSES extension 
>>>>>>> must be provided
>>>>>>>>> + * to pass in the batch buffer addresses.
>>>>>>>>> + *
>>>>>>>>> + * Additionally, I915_EXEC_NO_RELOC, I915_EXEC_HANDLE_LUT and
>>>>>>>>> + * I915_EXEC_BATCH_FIRST of 
>>>>>>> &drm_i915_gem_execbuffer2.flags must be 0
>>>>>>>>> + * (not used) in VM_BIND mode. I915_EXEC_USE_EXTENSIONS 
>>>>>>> flag must always be
>>>>>>>>> + * set (See struct drm_i915_gem_execbuffer_ext_batch_addresses).
>>>>>>>>> + * The buffers_ptr, buffer_count, batch_start_offset and 
>>>>>>> batch_len fields
>>>>>>>>> + * of struct drm_i915_gem_execbuffer2 are also not used 
>>>>>>> and must be 0.
>>>>>>>>> + */
>>>>>>>>
>>>>>>>> From that description, it seems we have:
>>>>>>>>
>>>>>>>> struct drm_i915_gem_execbuffer2 {
>>>>>>>>         __u64 buffers_ptr;              -> must be 0 (new)
>>>>>>>>         __u32 buffer_count;             -> must be 0 (new)
>>>>>>>>         __u32 batch_start_offset;       -> must be 0 (new)
>>>>>>>>         __u32 batch_len;                -> must be 0 (new)
>>>>>>>>         __u32 DR1;                      -> must be 0 (old)
>>>>>>>>         __u32 DR4;                      -> must be 0 (old)
>>>>>>>>         __u32 num_cliprects; (fences)   -> must be 0 since 
>>>>>>> using extensions
>>>>>>>>         __u64 cliprects_ptr; (fences, extensions) -> 
>>>>>>> contains an actual pointer!
>>>>>>>>         __u64 flags;                    -> some flags must be 0 
>>>>>>>> (new)
>>>>>>>>         __u64 rsvd1; (context info)     -> repurposed field (old)
>>>>>>>>         __u64 rsvd2;                    -> unused
>>>>>>>> };
>>>>>>>>
>>>>>>>> Based on that, why can't we just get drm_i915_gem_execbuffer3 
>>>>>>>> instead
>>>>>>>> of adding even more complexity to an already abused interface? 
>>>>>>>> While
>>>>>>>> the Vulkan-like extension thing is really nice, I don't think what
>>>>>>>> we're doing here is extending the ioctl usage, we're completely
>>>>>>>> changing how the base struct should be interpreted based on 
>>>>>>> how the VM
>>>>>>>> was created (which is an entirely different ioctl).
>>>>>>>>
>>>>>>>> From Rusty Russel's API Design grading, drm_i915_gem_execbuffer2 is
>>>>>>>> already at -6 without these changes. I think after vm_bind we'll 
>>>>>>>> need
>>>>>>>> to create a -11 entry just to deal with this ioctl.
>>>>>>>>
>>>>>>>
>>>>>>> The only change here is removing the execlist support for VM_BIND
>>>>>>> mode (other than natual extensions).
>>>>>>> Adding a new execbuffer3 was considered, but I think we need to 
>>>>>>> be careful
>>>>>>> with that as that goes beyond the VM_BIND support, including any 
>>>>>>> future
>>>>>>> requirements (as we don't want an execbuffer4 after VM_BIND).
>>>>>>
>>>>>> Why not? it's not like adding extensions here is really that 
>>>>>> different
>>>>>> than adding new ioctls.
>>>>>>
>>>>>> I definitely think this deserves an execbuffer3 without even
>>>>>> considering future requirements. Just  to burn down the old
>>>>>> requirements and pointless fields.
>>>>>>
>>>>>> Make execbuffer3 be vm bind only, no relocs, no legacy bits, leave 
>>>>>> the
>>>>>> older sw on execbuf2 for ever.
>>>>>
>>>>> I guess another point in favour of execbuf3 would be that it's less
>>>>> midlayer. If we share the entry point then there's quite a few vfuncs
>>>>> needed to cleanly split out the vm_bind paths from the legacy
>>>>> reloc/softping paths.
>>>>>
>>>>> If we invert this and do execbuf3, then there's the existing ioctl
>>>>> vfunc, and then we share code (where it even makes sense, probably
>>>>> request setup/submit need to be shared, anything else is probably
>>>>> cleaner to just copypaste) with the usual helper approach.
>>>>>
>>>>> Also that would guarantee that really none of the old concepts like
>>>>> i915_active on the vma or vma open counts and all that stuff leaks
>>>>> into the new vm_bind execbuf.
>>>>>
>>>>> Finally I also think that copypasting would make backporting easier,
>>>>> or at least more flexible, since it should make it easier to have the
>>>>> upstream vm_bind co-exist with all the other things we have. Without
>>>>> huge amounts of conflicts (or at least much less) that pushing a pile
>>>>> of vfuncs into the existing code would cause.
>>>>>
>>>>> So maybe we should do this?
>>>>
>>>> Thanks Dave, Daniel.
>>>> There are a few things that will be common between execbuf2 and
>>>> execbuf3, like request setup/submit (as you said), fence handling 
>>>> (timeline fences, fence array, composite fences), engine selection,
>>>> etc. Also, many of the 'flags' will be there in execbuf3 also (but
>>>> bit position will differ).
>>>> But I guess these should be fine as the suggestion here is to
>>>> copy-paste the execbuff code and having a shared code where possible.
>>>> Besides, we can stop supporting some older feature in execbuff3
>>>> (like fence array in favor of newer timeline fences), which will
>>>> further reduce common code.
>>>>
>>>> Ok, I will update this series by adding execbuf3 and send out soon.
>>>>
>>>
>>> Does this sound reasonable?
>>>
>>> struct drm_i915_gem_execbuffer3 {
>>>        __u32 ctx_id;        /* previously execbuffer2.rsvd1 */
>>>
>>>        __u32 batch_count;
>>>        __u64 batch_addr_ptr;    /* Pointer to an array of batch gpu 
>>> virtual addresses */
>>
>> Casual stumble upon..
>>
>> Alternatively you could embed N pointers to make life a bit easier for 
>> both userspace and kernel side. Yes, but then "N batch buffers should 
>> be enough for everyone" problem.. :)
>>
> 
> Thanks Tvrtko,
> Yes, hence the batch_addr_ptr.

Right, but then userspace has to allocate a separate buffer and kernel 
has to access it separately from a single copy_from_user. Pros and cons 
of "this many batches should be enough for everyone" versus the extra 
operations.

Hmm.. for the common case of one batch - you could define the uapi to 
say if batch_count is one then pointer is GPU VA to the batch itself, 
not a pointer to userspace array of GPU VA?

Regards,

Tvrtko

>>>        __u64 flags;
>>> #define I915_EXEC3_RING_MASK              (0x3f)
>>> #define I915_EXEC3_DEFAULT                (0<<0)
>>> #define I915_EXEC3_RENDER                 (1<<0)
>>> #define I915_EXEC3_BSD                    (2<<0)
>>> #define I915_EXEC3_BLT                    (3<<0)
>>> #define I915_EXEC3_VEBOX                  (4<<0)
>>>
>>> #define I915_EXEC3_SECURE               (1<<6)
>>> #define I915_EXEC3_IS_PINNED            (1<<7)
>>>
>>> #define I915_EXEC3_BSD_SHIFT     (8)
>>> #define I915_EXEC3_BSD_MASK      (3 << I915_EXEC3_BSD_SHIFT)
>>> #define I915_EXEC3_BSD_DEFAULT   (0 << I915_EXEC3_BSD_SHIFT)
>>> #define I915_EXEC3_BSD_RING1     (1 << I915_EXEC3_BSD_SHIFT)
>>> #define I915_EXEC3_BSD_RING2     (2 << I915_EXEC3_BSD_SHIFT)
>>
>> I'd suggest legacy engine selection is unwanted, especially not with 
>> the convoluted BSD1/2 flags. Can we just require context with engine 
>> map and index? Or if default context has to be supported then I'd 
>> suggest ...class_instance for that mode.
>>
> 
> Ok, I will be happy to remove it and only support contexts with
> engine map, if UMDs agree on that.
> 
>>> #define I915_EXEC3_FENCE_IN             (1<<10)
>>> #define I915_EXEC3_FENCE_OUT            (1<<11)
>>> #define I915_EXEC3_FENCE_SUBMIT         (1<<12)
>>
>> People are likely to object to submit fence since generic mechanism to 
>> align submissions was rejected.
>>
> 
> Ok, again, I can remove it if UMDs are ok with it.
> 
>>>
>>>        __u64 in_out_fence;        /* previously execbuffer2.rsvd2 */
>>
>> New ioctl you can afford dedicated fields.
>>
> 
> Yes, but as I asked below, I am not sure if we need this or the
> timeline fence arry extension we have is good enough.
> 
>> In any case I suggest you involve UMD folks in designing it.
>>
> 
> Yah.
> Paulo, Lionel, Jason, Daniel, can you comment on these regarding
> what will UMD need in execbuf3 and what can be removed?
> 
> Thanks,
> Niranjana
> 
>> Regards,
>>
>> Tvrtko
>>
>>>
>>>        __u64 extensions;        /* currently only for 
>>> DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES */
>>> };
>>>
>>> With this, user can pass in batch addresses and count directly,
>>> instead of as an extension (as this rfc series was proposing).
>>>
>>> I have removed many of the flags which were either legacy or not
>>> applicable to BM_BIND mode.
>>> I have also removed fence array support (execbuffer2.cliprects_ptr)
>>> as we have timeline fence array support. Is that fine?
>>> Do we still need FENCE_IN/FENCE_OUT/FENCE_SUBMIT support?
>>>
>>> Any thing else needs to be added or removed?
>>>
>>> Niranjana
>>>
>>>> Niranjana
>>>>
>>>>> -Daniel
>>>>> -- 
>>>>> Daniel Vetter
>>>>> Software Engineer, Intel Corporation
>>>>> http://blog.ffwll.ch


More information about the Intel-gfx mailing list