[Intel-xe] [PATCH v5 0/8] Port Xe to use GPUVA and implement NULL VM binds

Thu Apr 6 01:20:48 UTC 2023

On Wed, Apr 05, 2023 at 05:31:15PM +0200, Thomas Hellström wrote:
> Hi,
> 
> On 4/4/23 16:56, Matthew Brost wrote:
> > On Tue, Apr 04, 2023 at 03:20:47PM +0200, Thomas Hellström wrote:
> > > Hi, Matthew,
> > > 
> > > 
> > > On 4/4/23 03:42, Matthew Brost wrote:
> > > > GPUVA is common code written primarily by Danilo with the idea being a
> > > > common place to track GPUVAs (VMAs in Xe) within an address space (VMs
> > > > in Xe), track all the GPUVAs attached to GEMs, and a common way
> > > > implement VM binds / unbinds with MMAP / MUNMAP semantics via creating
> > > > operation lists. All of this adds up to a common way to implement VK
> > > > sparse bindings.
> > > > 
> > > > This series pulls in the GPUVA code written by Danilo plus some small
> > > > fixes by myself into 1 large patch. Once the GPUVA makes it upstream, we
> > > > can rebase and drop this patch. I believe what lands upstream should be
> > > > nearly identical to this patch at least from an API perspective.
> > > > 
> > > > The last three patches port Xe to GPUVA and add support for NULL VM binds
> > > > (writes dropped, read zero, VK sparse support). An example of the
> > > > semantics of this is below.
> > > Going through thew new code in xe_vm.c I'm still concerned about our error
> > > handling. In the cases we attempt to handle, for example an -ENOMEM, the
> > > sync ioctl appears to be doing a fragile unwind whereas the async worker
> > > puts the VM in an error state that can only be exited by a RESTART ioctl
> > > (pls correct me if I'm wrong here). We could of course do the same for sync
> > > ioctls, but I'm still wondering what happens if, for example a MAP operation
> > > on a dual-gt VM fails after binding on the first gt?
> > > 
> > So sync mode we don't allow advanced binds (more than 1 map operation)
> > but I failed to take into account dual-gt VMs, yea that might be an
> > issue foe sync binds. This should just for work async though. Maybe we
> > depreciate the sync mode except for unbinds when in an error state.
> 
> I think we need to separate this from sync/async and what's in the current
> code. I've started to dig into the Nouveau code a bit to get things right in
> the async VM_BIND document, but the error handling should really be agnostic
> as to whether we're async or not. We should be able to run RESTART also
> after a failing sync VM_BIND if the decision is to put the VM in a
> recoverable error state rather than to clean everything up after each
> failure.
> 

Not sure I agree with this but we can hash this out later.

> > 
> > > I think we need to clearly define the desired behaviour in the uAPI and then
> > > make sure we do the necessary changes with that in mind, and Even if I
> > > prefer the !async_worker solution this can be don orthogonally to that
> > > discussion. Ideally I'd see we find a way to *not* put the VM  in an error
> > > state like this, and I figure there are various ways to aid in this, for
> > > example keeping a progress cookie in the user-space data or deferring any
> > > failed bind operations to the next rebind worker or exec, but in the end I
> > > think this boils down to allocating all needed resources up-front for
> > > operations or groups of operations that are not allowed to fail once we
> > > start to modify the page-tables.
> > > 
> > Agree on clearly defined uAPI behavior, I'm not sure if I have answer
> > for the error handling. Personally I like the stop / restart mechanism
> > as it essentially mean binds never can fail with a well behavied user.
> > wrt to Allocating up front, I think that is what Nouveau is doing? We
> > could study that code but allocating everything up front will be a
> > pretty large change.
> > 
> > Obviously we need to do some work here and this is one of our biggest
> > opens left but I don't view this as a true blocker until we remove force
> > probe. GPUVA regardless is a step in the right direction which will make
> > any changes going forward easier + enable VK to get sparse implemented.
> > I'd say let's get GPUVA merged and next focus on these issues you have
> > raised.
> 
> I think we need a pretty major rewrite here, and I think moving over to
> GPUVA would be cleaner with that done and possibilities of a clean error
> recovery in place. I stumbled on these WARN_ON(err) when starting to review,
> when it's IMO perfectly possible for that err to be non-zero, and since
> we're not in POC mode anymore I'm reluctant to Ack something that we know
> might blow up.
>

Everything will be easier if we move to GPUVA first and agree this needs
to be stable, from testing it is. Also GPUVA gets us everything we need
for sparse to which is a plus. Also I want to move some userptr stuff to
GPUVA, move extobj to GPUVA, some locking to GPUVA /w drm exec, and
perhaps preempt fence handling to DRM sched + GPUVA. I'd strongly
advocate for merging ASAP and stage the aforementioned + error handling
fixes behind this merge.

Can you point of the WARN? Is it not hanlding the error from
drm_gpuva_insert? Yea, I'll fix that.

> What I can do, though is to dig more into, Nouveau to see how they handle
> the error unwind, and start implementing a clean unwind also in our driver
> and rebase the GPUVA stuff on top.
>

Lets both dig in.

Matt

> /Thomas
> 
> 
> > 
> > Also another VM bind issue we need address soon:
> > https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/41
> > 
> > Matt
> > 
> > > Thoughts?
> > > 
> > > /Thomas
> > > 
> > > 
> > > 
> > > 
> > > > MAP 0x0000-0x8000 to NULL 	- 0x0000-0x8000 writes dropped + read zero
> > > > MAP 0x4000-0x5000 to a GEM 	- 0x0000-0x4000, 0x5000-0x8000 writes dropped + read zero; 0x4000-0x5000 mapped to a GEM
> > > > UNMAP 0x3000-0x6000		- 0x0000-0x3000, 0x6000-0x8000 writes dropped + read zero
> > > > UNMAP 0x0000-0x8000		- Nothing mapped
> > > > 
> > > > No changins to existing behavior, rather just new functionality./
> > > > 
> > > > v2: Fix CI build failure
> > > > v3: Export mas_preallocate, add patch to avoid rebinds
> > > > v5: Bug fixes, rebase, xe_vma size optimizations
> > > > 
> > > > Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> > > > 
> > > > Danilo Krummrich (2):
> > > >     maple_tree: split up MA_STATE() macro
> > > >     drm: manager to keep track of GPUs VA mappings
> > > > 
> > > > Matthew Brost (6):
> > > >     maple_tree: Export mas_preallocate
> > > >     drm/xe: Port Xe to GPUVA
> > > >     drm/xe: NULL binding implementation
> > > >     drm/xe: Avoid doing rebinds
> > > >     drm/xe: Reduce the number list links in xe_vma
> > > >     drm/xe: Optimize size of xe_vma allocation
> > > > 
> > > >    Documentation/gpu/drm-mm.rst                |   31 +
> > > >    drivers/gpu/drm/Makefile                    |    1 +
> > > >    drivers/gpu/drm/drm_debugfs.c               |   56 +
> > > >    drivers/gpu/drm/drm_gem.c                   |    3 +
> > > >    drivers/gpu/drm/drm_gpuva_mgr.c             | 1890 ++++++++++++++++++
> > > >    drivers/gpu/drm/xe/xe_bo.c                  |   10 +-
> > > >    drivers/gpu/drm/xe/xe_bo.h                  |    1 +
> > > >    drivers/gpu/drm/xe/xe_device.c              |    2 +-
> > > >    drivers/gpu/drm/xe/xe_exec.c                |    4 +-
> > > >    drivers/gpu/drm/xe/xe_gt_pagefault.c        |   29 +-
> > > >    drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c |   14 +-
> > > >    drivers/gpu/drm/xe/xe_guc_ct.c              |    6 +-
> > > >    drivers/gpu/drm/xe/xe_migrate.c             |    8 +-
> > > >    drivers/gpu/drm/xe/xe_pt.c                  |  176 +-
> > > >    drivers/gpu/drm/xe/xe_trace.h               |   10 +-
> > > >    drivers/gpu/drm/xe/xe_vm.c                  | 1947 +++++++++----------
> > > >    drivers/gpu/drm/xe/xe_vm.h                  |   76 +-
> > > >    drivers/gpu/drm/xe/xe_vm_madvise.c          |   87 +-
> > > >    drivers/gpu/drm/xe/xe_vm_types.h            |  276 ++-
> > > >    include/drm/drm_debugfs.h                   |   24 +
> > > >    include/drm/drm_drv.h                       |    7 +
> > > >    include/drm/drm_gem.h                       |   75 +
> > > >    include/drm/drm_gpuva_mgr.h                 |  734 +++++++
> > > >    include/linux/maple_tree.h                  |    7 +-
> > > >    include/uapi/drm/xe_drm.h                   |    8 +
> > > >    lib/maple_tree.c                            |    1 +
> > > >    26 files changed, 4203 insertions(+), 1280 deletions(-)
> > > >    create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
> > > >    create mode 100644 include/drm/drm_gpuva_mgr.h
> > > >