[Intel-gfx] [PATCH v7 00/20] drm/i915/vm_bind: Add VM_BIND functionality
Niranjana Vishwanathapura
niranjana.vishwanathapura at intel.com
Wed Nov 30 03:53:31 UTC 2022
On Fri, Nov 18, 2022 at 03:53:34PM -0800, Zanoni, Paulo R wrote:
>On Sat, 2022-11-12 at 23:57 -0800, Niranjana Vishwanathapura wrote:
>> DRM_I915_GEM_VM_BIND/UNBIND ioctls allow the UMD to bind/unbind GEM
>> buffer objects (BOs), or sections of a BO, at specified GPU virtual
>> addresses on a specified address space (VM). Multiple mappings can map
>> to the same physical pages of an object (aliasing). These mappings (also
>> referred to as persistent mappings) persist across multiple GPU
>> submissions (execbuf calls) issued by the UMD, without the user having
>> to provide a list of all required mappings during each submission (as
>> required by the older execbuf mode).
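For illustration, a minimal bind sketch, assuming the field names of the
proposed struct drm_i915_gem_vm_bind from this series (error handling
omitted):

  /* Map pages [offset, offset + length) of the BO at GPU VA 'start'
   * in address space 'vm_id'; the mapping persists across execbufs. */
  struct drm_i915_gem_vm_bind bind = {
          .vm_id  = vm_id,      /* target address space (VM) */
          .handle = bo_handle,  /* GEM BO to map */
          .start  = gpu_va,     /* GPU virtual address of the mapping */
          .offset = 0,          /* start offset within the BO */
          .length = size,       /* size of the mapping in bytes */
  };
  if (ioctl(fd, DRM_IOCTL_I915_GEM_VM_BIND, &bind))
          /* bind failed */;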
>>
>> This patch series supports VM_BIND version 1, as described by the param
>> I915_PARAM_VM_BIND_VERSION.
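Userspace can probe for this with the usual getparam pattern; a sketch
(use_vm_bind() is just a placeholder):

  int version = 0;
  struct drm_i915_getparam gp = {
          .param = I915_PARAM_VM_BIND_VERSION,
          .value = &version,
  };
  /* version >= 1 means VM_BIND/UNBIND and execbuf3 are usable */
  if (ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp) == 0 && version >= 1)
          use_vm_bind();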
>>
>> Add a new execbuf3 ioctl (I915_GEM_EXECBUFFER3) which only works in
>> vm_bind mode. The vm_bind mode only works with this new execbuf3 ioctl.
>> The new execbuf3 ioctl has no execlist support, and all legacy
>> features such as relocations are removed.
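A minimal submission sketch, again assuming the uapi as proposed in this
series (see the execbuf3 patch for the authoritative field names):

  /* No BO execlist and no relocations: just the batch GPU VA (bound
   * earlier via VM_BIND) plus explicit timeline fences. */
  struct drm_i915_gem_execbuffer3 execbuf = {
          .ctx_id          = ctx_id,        /* context with vm_bind mode enabled */
          .batch_address   = batch_gpu_va,  /* VA of the batch buffer */
          .fence_count     = num_fences,
          .timeline_fences = (__u64)(uintptr_t)fences, /* drm_i915_gem_timeline_fence[] */
  };
  ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER3, &execbuf);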
>>
>> NOTEs:
>> * It is based on the below VM_BIND design+uapi RFC:
>>   Documentation/gpu/rfc/i915_vm_bind.rst
>
>Just as a FYI to everyone, there is a Mesa Iris implementation that
>makes use of this series here:
>
>https://gitlab.freedesktop.org/pzanoni/mesa/-/commits/upstream-vmbind/
>
>Some notes on it:
>
>- Tested on TGL and Alchemist (aka DG2).
>
>- The code still has a lot of TODOs and some FIXMEs.
>
>- It was somewhat tested with the common Linux benchmarks (Dota 2,
>Manhattan, Xonotic, etc.) and survived most of what I threw at it. The
>only problems I have seen so far are:
>
> - Sometimes you get a random GPU hang with Dota 2, but it seems to go
>away if you use INTEL_DEBUG=nofc. I'm investigating it right now.
>
> - Very, very rarely, DRM_IOCTL_SYNCOBJ_HANDLE_TO_FD returns EINVAL and
>we don't properly recover. I am still trying to understand this one;
>it happens once in a blue moon.
>
>- Performance seems to be mostly equivalent to non-vm-bind, but I
>haven't spent a lot of time really gathering numbers since I'm mostly
>debugging things. Using VM_PRIVATE BOs is the key here: you cut Dota's
>performance almost in half if you don't use private BOs. Considering
>how abysmally slow things get, I would assume there's probably
>something the Kernel could do here to handle things a little faster.
>The 'perf' tool shows a lot of i915 fence-related functions wasting CPU
>time when we don't use private BOs.
Thanks Paulo for trying this out.
In the submission path, i915 loops through each non-private BO of the
working set, updating its dma-resv fence list. So, users have to use
private BOs where possible, and when a BO needs to be shared, VM_UNBIND
it when it is no longer required (thus removing it from the working set
for subsequent submissions).
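The unbind side is the reverse sketch, assuming the proposed struct
drm_i915_gem_vm_unbind (VM-private BOs avoid the fence-update loop
entirely because they share the VM's dma-resv object):

  /* Remove the mapping so the BO drops out of the working set that
   * the submission path walks on every execbuf3. */
  struct drm_i915_gem_vm_unbind unbind = {
          .vm_id  = vm_id,
          .start  = gpu_va,  /* must match the original VM_BIND start */
          .length = size,    /* and the original VM_BIND length */
  };
  ioctl(fd, DRM_IOCTL_I915_GEM_VM_UNBIND, &unbind);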
>
>- I am not really sure whether the implicit tracking code actually
>works or not. It doesn't really seem to make much of a difference if I
>remove it entirely, but I'm still planning to spend more time analyzing
>this. If anybody knows of a workload that will absolutely fail if we
>get this wrong, please tell me.
>
>- There's the whole frontbuffer tracking discussion from v6 that's
>still pending resolution.
>
>- The Vulkan implementation will come later. I wanted to sort out all
>the GL details first so I don't make the same mistakes twice.
>
>- I really dislike how we have to handle the lack of implicit tracking
>support. The code is excessive and racy. See export_bo_sync_state(),
>import_bo_sync_state() and their caller in
>src/gallium/drivers/iris/iris_batch.c. Suggestions for Mesa
>improvements are welcome, but I would really prefer to have a way to
>just tell execbuffer3 to handle implicit tracking for these buffers
>for me in an atomic way.
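Presumably the dance is built on the dma-buf sync-file ioctls, roughly
along these lines (a sketch, not the actual iris code):

  /* Non-atomic: pull the BO's current fences out as a sync file, then
   * push our batch fence back in after submission. Fences attached by
   * another process in between can be missed -- hence the race. */
  struct dma_buf_export_sync_file export = { .flags = DMA_BUF_SYNC_READ };
  ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &export);
  /* ... wait on / merge export.fd, then submit via execbuf3 ... */
  struct dma_buf_import_sync_file import = {
          .flags = DMA_BUF_SYNC_WRITE,
          .fd    = batch_sync_file_fd,
  };
  ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &import);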
>
>- I kinda wish execbuffer3 still accepted a pointer to struct
>drm_i915_gem_exec_fence in addition to struct
>drm_i915_gem_timeline_fence, since we already have to keep a list of
>exec_fences for execbuf2, and in the execbuf3 path we convert them to
>the new format. We could also do the opposite and leave execbuf2 with
>the slower path. But I could live without this, no problem.
>
You mean having an array of <handle, value> pairs (as with execbuf3)
vs having a separate handles array and a values array (as with execbuf2)?
I think the former is a much cleaner interface.
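Roughly, the two layouts (the exec-fence struct is existing execbuf2
uapi; the timeline-fence struct is as proposed in this series):

  /* execbuf2 + timeline extension: handles and values in parallel arrays */
  struct drm_i915_gem_exec_fence {
          __u32 handle;  /* drm_syncobj handle */
          __u32 flags;   /* I915_EXEC_FENCE_WAIT / I915_EXEC_FENCE_SIGNAL */
  };
  /* ...plus a separate __u64 values[] array for the timeline points */

  /* execbuf3: one self-contained <handle, value> pair per entry */
  struct drm_i915_gem_timeline_fence {
          __u32 handle;  /* drm_syncobj handle */
          __u32 flags;
          __u64 value;   /* timeline point; 0 for binary syncobjs */
  };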
>- Credits to Ken, Jason, Lionel, Niranjana, Nanley, Daniel and
>everybody else who helped me sort things out here.
>
>
>Is this ready to be merged into the Kernel? Maybe, but I'd like us to
>sort these things out first:
>
>1. Get conclusion regarding the frontbuffer tracking issue first.
>
>2. Get some validation from more experienced people (*winks at Jason*)
>that our approach with implicit tracking is correct here. Or convince
>Niranjana to add a way to pass buffers for implicit tracking, so the
>Kernel can do atomically inside execbuf3 what we're trying to do with 8
>ioctls.
This was discussed during the VM_BIND RFC design, and implicit dependency
tracking was removed. Note that supporting it in execbuf3 is not trivial
with our VM_BIND dma-resv usage model.
Regards,
Niranjana
>
>3. Fix all the Mesa bugs so we're 100% sure they're not design flaws of
>the Kernel.
>
>But that's just my humble opinion.
>
>Thanks,
>Paulo
>
>>
>> * The IGT RFC series is posted as,
>> [PATCH i-g-t v7 0/13] vm_bind: Add VM_BIND validation support
>>
>> v2: Address various review comments
>> v3: Address review comments and other fixes
>> v4: Remove vm_unbind out fence uapi which is not supported yet,
>> replace vm->vm_bind_mode check with i915_gem_vm_is_vm_bind_mode()
>> v5: Render kernel-doc, use PIN_NOEVICT, limit vm_bind support to
>> non-recoverable faults
>> v6: Rebased, minor fixes, add reserved fields to drm_i915_gem_vm_bind,
>> add new patch for async vm_unbind support
>> v7: Rebased, minor cleanups as per review feedback
>>
>> Test-with: 20221113075309.32023-1-niranjana.vishwanathapura at intel.com
>>
>> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura at intel.com>
>>
>> Niranjana Vishwanathapura (20):
>> drm/i915/vm_bind: Expose vm lookup function
>> drm/i915/vm_bind: Add __i915_sw_fence_await_reservation()
>> drm/i915/vm_bind: Expose i915_gem_object_max_page_size()
>> drm/i915/vm_bind: Add support to create persistent vma
>> drm/i915/vm_bind: Implement bind and unbind of object
>> drm/i915/vm_bind: Support for VM private BOs
>> drm/i915/vm_bind: Add support to handle object evictions
>> drm/i915/vm_bind: Support persistent vma activeness tracking
>> drm/i915/vm_bind: Add out fence support
>> drm/i915/vm_bind: Abstract out common execbuf functions
>> drm/i915/vm_bind: Use common execbuf functions in execbuf path
>> drm/i915/vm_bind: Implement I915_GEM_EXECBUFFER3 ioctl
>> drm/i915/vm_bind: Update i915_vma_verify_bind_complete()
>> drm/i915/vm_bind: Expose i915_request_await_bind()
>> drm/i915/vm_bind: Handle persistent vmas in execbuf3
>> drm/i915/vm_bind: userptr dma-resv changes
>> drm/i915/vm_bind: Limit vm_bind mode to non-recoverable contexts
>> drm/i915/vm_bind: Add uapi for user to enable vm_bind_mode
>> drm/i915/vm_bind: Render VM_BIND documentation
>> drm/i915/vm_bind: Async vm_unbind support
>>
>> Documentation/gpu/i915.rst | 78 +-
>> drivers/gpu/drm/i915/Makefile | 3 +
>> drivers/gpu/drm/i915/gem/i915_gem_context.c | 43 +-
>> drivers/gpu/drm/i915/gem/i915_gem_context.h | 17 +
>> drivers/gpu/drm/i915/gem/i915_gem_create.c | 72 +-
>> drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 6 +
>> .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 522 +----------
>> .../gpu/drm/i915/gem/i915_gem_execbuffer3.c | 872 ++++++++++++++++++
>> .../drm/i915/gem/i915_gem_execbuffer_common.c | 671 ++++++++++++++
>> .../drm/i915/gem/i915_gem_execbuffer_common.h | 76 ++
>> drivers/gpu/drm/i915/gem/i915_gem_ioctls.h | 2 +
>> drivers/gpu/drm/i915/gem/i915_gem_object.c | 3 +
>> drivers/gpu/drm/i915/gem/i915_gem_object.h | 2 +
>> .../gpu/drm/i915/gem/i915_gem_object_types.h | 6 +
>> drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 19 +
>> drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h | 30 +
>> .../drm/i915/gem/i915_gem_vm_bind_object.c | 449 +++++++++
>> drivers/gpu/drm/i915/gt/intel_gtt.c | 17 +
>> drivers/gpu/drm/i915/gt/intel_gtt.h | 21 +
>> drivers/gpu/drm/i915/i915_driver.c | 4 +
>> drivers/gpu/drm/i915/i915_drv.h | 2 +
>> drivers/gpu/drm/i915/i915_gem_gtt.c | 39 +
>> drivers/gpu/drm/i915/i915_gem_gtt.h | 3 +
>> drivers/gpu/drm/i915/i915_getparam.c | 3 +
>> drivers/gpu/drm/i915/i915_sw_fence.c | 28 +-
>> drivers/gpu/drm/i915/i915_sw_fence.h | 23 +-
>> drivers/gpu/drm/i915/i915_vma.c | 186 +++-
>> drivers/gpu/drm/i915/i915_vma.h | 68 +-
>> drivers/gpu/drm/i915/i915_vma_types.h | 39 +
>> include/uapi/drm/i915_drm.h | 274 +++++-
>> 30 files changed, 3027 insertions(+), 551 deletions(-)
>> create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer3.c
>> create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.c
>> create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_execbuffer_common.h
>> create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind.h
>> create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_vm_bind_object.c
>>
>