[PATCH] drm/xe: Implement clear VRAM on free

Matthew Brost matthew.brost at intel.com
Wed Jun 11 19:03:51 UTC 2025


On Wed, Jun 11, 2025 at 12:01:10PM -0700, Matthew Brost wrote:

One correction...

> On Wed, Jun 11, 2025 at 12:23:13PM -0600, Summers, Stuart wrote:
> > On Wed, 2025-06-11 at 11:20 -0700, Matthew Brost wrote:
> > > On Wed, Jun 11, 2025 at 12:05:36PM -0600, Summers, Stuart wrote:
> > > > On Wed, 2025-06-11 at 10:57 -0700, Matthew Brost wrote:
> > > > > On Wed, Jun 11, 2025 at 11:04:06AM -0600, Summers, Stuart wrote:
> > > > > > On Wed, 2025-06-11 at 09:46 -0700, Matthew Brost wrote:
> > > > > > > On Wed, Jun 11, 2025 at 10:26:44AM -0600, Summers, Stuart
> > > > > > > wrote:
> > > > > > > > On Tue, 2025-06-10 at 22:42 -0700, Matthew Brost wrote:
> > > > > > > > > Clearing on free should hide latency of BO clears on new
> > > > > > > > > user BO allocations.
> > > > > > > > 
> > > > > > > > Do you have any test results showing the latency improvement
> > > > > > > > here on high volume submission tests? Also how does this
> > > > > > > > impact at very large
> > > > > > > 
> > > > > > > No performance data yet available. Something we definitely
> > > > > > > should look into and would be curious about the results. FWIW,
> > > > > > > AMDGPU implements clear on free in a very similar way to this
> > > > > > > series.
> > 
> > So back to this one, what is the motivation for this change if we don't
> > have data?
> > 
> 
> It is the assumption that in typical cases (no memory pressure) the
> buddy allocator pool can find clean VRAM upon allocation, resulting in
> immediate use of the BO rather than issuing a clear which could delay a
> bind IOCTL, an exec IOCTL, or even mmapping the BO (dma-resv enforces
> this ordering btw).
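> 
> A rough sketch of that ordering (shorthand, not the actual Xe call
> chain; xe_bo_create() here is a stand-in for the real creation path):
> 
>         /* clear-on-alloc: the clear fence gates first use of the BO */
>         bo = xe_bo_create(...);                  /* may get dirty VRAM */
>         fence = xe_migrate_clear(migrate, bo, bo->ttm.resource,
>                                  XE_MIGRATE_CLEAR_FLAG_FULL);
>         dma_resv_add_fence(bo->ttm.base.resv, fence, DMA_RESV_USAGE_KERNEL);
>         /* bind / exec / mmap all sync on DMA_RESV_USAGE_KERNEL fences,
>          * so first use waits for the clear */
> 
>         /* clear-on-free: the buddy allocator hands back an already
>          * cleared block - no clear fence needed, BO usable immediately */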
> 
> Even if there is memory pressure, it is fairly easy to reason that
> clear-on-free creates a pipeline in which the clear is started earlier
> than it would be upon allocation, thus reducing the overall delay before
> the BO can be used after allocation.
> 
> Yes, there might be an odd corner case of a large clear-on-free followed
> by a small allocation where it might be slightly worse, but overall it
> seems to be an easy win - both AMD and Intel have done work to make
> clear-on-free possible.
> 
> > > > > > 
> > > > > > I had a few other comments towards the end of the patch. Basically I
> > > > > 
> > > > > I should slow down and read... Replying to all comments here.
> > > > > 
> > > > > > think it would be nice to be able to configure this. There was
> > > > > > significant testing done here for Aurora in i915 and the
> > > > > > performance benefits were potentially different based on the
> > > > > > types of workloads being used. Having user hints for instance
> > > > > > might allow us to take advantage of those workload-specific use
> > > > > > cases.
> > > > > > 
> > > > > 
> > > > > We can wire this mode to the uAPI for BO creation if needed - this
> > > > > patch does clear-on-free blindly for any user BO. Ofc with uAPI we
> > > > > need a UMD user to upstream. We may just want to start with this
> > > > > patch and do that in a follow-up if needed as it's a minor tweak
> > > > > from a code PoV.
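> > > > > 
> > > > > If we do later wire it up, I'd imagine something like this (flag
> > > > > name and bit are purely hypothetical, and we'd need the UMD user
> > > > > first):
> > > > > 
> > > > >         /* hypothetical uAPI bit in xe_drm.h */
> > > > >         #define DRM_XE_GEM_CREATE_FLAG_CLEAR_ON_FREE    (1 << 3)
> > > > > 
> > > > >         /* then in xe_ttm_bo_release_clear(), gate on the BO flag
> > > > >          * instead of clearing every device BO blindly */
> > > > >         if (!(xe_bo->flags & XE_BO_FLAG_CLEAR_ON_FREE))
> > > > >                 return;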
> > > > 
> > > > Actually it would be great if you can add exactly this to the
> > > > commit message. Basically we are intentionally doing clear-on-free
> > > > here blindly because we don't currently have a user for some kind of
> > > > more complicated (from the user perspective) hint system. (just kind
> > > > of repeating what you said)
> > > > 
> > > > This way we have some documentation of why we chose this route other
> > > > than just because AMD is doing that.
> > > > 
> > > 
> > > Sure.
> > > 
> > > > > 
> > > > > > > 
> > > > > > > > buffer size submissions and mix of large and small buffer
> > > > > > > > submissions with eviction between different processes?
> > > > > > > > 
> > > > > > > 
> > > > > > > Eviction is somewhat orthogonal to this. If an eviction occurs
> > > > > > > and it is a new allocation, a clear would still have to be
> > > > > > > issued before handing the memory over to the new BO; if an
> > > > > > > eviction occurs and the paged-in BO has a backing store (it was
> > > > > > > previously evicted), we'd just copy the contents into the
> > > > > > > allocated memory.
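> > > > > > > 
> > > > > > > I.e., roughly (pseudo-code, not the actual function names):
> > > > > > > 
> > > > > > >         if (new_allocation)
> > > > > > >                 clear(vram);    /* clear still required */
> > > > > > >         else    /* page-in of a previously evicted BO */
> > > > > > >                 copy(backing_store, vram);      /* no clear */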
> > > > > > 
> > > > > > Good point. I still think we should consider the scenario of
> > > > > > having a large buffer workload running and then clearing, then we
> > > > > > have a small buffer workload coming in which now needs to wait on
> > > > > > the clear from that large workload. Whereas with clear-on-alloc
> > > > > > for the small workload, we can get that submission in quickly and
> > > > > > not wait for that larger one.
> > > > > > 
> > > > > 
> > > > > The buddy allocator tries to allocate cleared VRAM first, then
> > > > > falls back to dirty VRAM. Dirty VRAM will still be cleared upon
> > > > > allocation in this patch.
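> > > > > 
> > > > > Roughly what the allocation side looks like (simplified from the
> > > > > drm_buddy flow, not the exact code):
> > > > > 
> > > > >         /* DRM_BUDDY_CLEAR_ALLOCATION prefers cleared blocks and
> > > > >          * falls back to dirty blocks when none are available */
> > > > >         err = drm_buddy_alloc_blocks(mm, start, end, size,
> > > > >                                      min_block_size, &vres->blocks,
> > > > >                                      DRM_BUDDY_CLEAR_ALLOCATION);
> > > > > 
> > > > >         /* dirty blocks in the returned list are detectable via
> > > > >          * drm_buddy_block_is_clear(), which is what
> > > > >          * xe_res_next_dirty() below keys off */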
> > > > > 
> > > > > The scenario you describe could actually result in eviction. If a
> > > > > large BO has a pending clear, it won't be in either buddy allocator
> > > > > pool - it will be in TTM as a ghost object (pending free). If the
> > > > > buddy allocator fails an allocation, eviction is triggered with the
> > > > > ghost objects in TTM being tried for allocation first (I think - if
> > > > > I'm misstating TTM internals, my bad). This would, as you suggest,
> > > > > result in the BO allocation waiting on a potentially large clear to
> > > > > finish even though the allocation is small in size.
> > > > 
> > > > So we have had customers in the past who have wanted explicit
> > > > disabling of eviction. I agree in the current implementation that
> > > > might be the
> > > 
> > > Explicitly disabling eviction is not an upstream option; downstream in
> > > i915 you can do whatever you want as there are no rules. We have
> > > discussed a pin accounting controller via cgroups or something upstream
> > > which may be acceptable. Different discussion though.
> > > 
> > > > case, but in the future we might want such a capability, so there is
> > > > still a chance of blocking. But again, maybe this can be a future
> > > > enhancement as you said.
> > > > 
> > > > > 
> > > > > > I don't think there is necessarily a one-size-fits-all solution
> > > > > > here, which is why I think the hints are important.
> > > > > > 
> > > > > 
> > > > > Jumping to new uAPI + hints immediately might not be the best
> > > > > approach, but as I stated this is a minor thing to add if needed.
> > > > > 
> > > > > > Thanks,
> > > > > > Stuart
> > > > > > 
> > > > > > > 
> > > > > > > Matt
> > > > > > > 
> > > > > > > > > 
> > > > > > > > > Implemented via calling xe_migrate_clear in release notify
> > > > > > > > > and updating the iterator in xe_migrate_clear to skip
> > > > > > > > > cleared buddy blocks. Only user BOs are cleared in release
> > > > > > > > > notify as kernel BOs could still be in use (e.g., PT BOs
> > > > > > > > > need to wait for dma-resv to be idle).
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> > > > > > > > > ---
> > > > > > > > >  drivers/gpu/drm/xe/xe_bo.c           | 47 ++++++++++++++++++++++++++++
> > > > > > > > >  drivers/gpu/drm/xe/xe_migrate.c      | 14 ++++++---
> > > > > > > > >  drivers/gpu/drm/xe/xe_migrate.h      |  1 +
> > > > > > > > >  drivers/gpu/drm/xe/xe_res_cursor.h   | 26 +++++++++++++++
> > > > > > > > >  drivers/gpu/drm/xe/xe_ttm_vram_mgr.c |  5 ++-
> > > > > > > > >  drivers/gpu/drm/xe/xe_ttm_vram_mgr.h |  6 ++++
> > > > > > > > >  6 files changed, 94 insertions(+), 5 deletions(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
> > > > > > > > > index 4e39188a021a..74470f4d418d 100644
> > > > > > > > > --- a/drivers/gpu/drm/xe/xe_bo.c
> > > > > > > > > +++ b/drivers/gpu/drm/xe/xe_bo.c
> > > > > > > > > @@ -1434,6 +1434,51 @@ static bool xe_ttm_bo_lock_in_destructor(struct ttm_buffer_object *ttm_bo)
> > > > > > > > >         return locked;
> > > > > > > > >  }
> > > > > > > > >  
> > > > > > > > > +static void xe_ttm_bo_release_clear(struct ttm_buffer_object *ttm_bo)
> > > > > > > > > +{
> > > > > > > > > +       struct xe_device *xe = ttm_to_xe_device(ttm_bo->bdev);
> > > > > > > > > +       struct dma_fence *fence;
> > > > > > > > > +       int err, idx;
> > > > > > > > > +
> > > > > > > > > +       xe_bo_assert_held(ttm_to_xe_bo(ttm_bo));
> > > > > > > > > +
> > > > > > > > > +       if (ttm_bo->type != ttm_bo_type_device)
> > > > > > > > > +               return;
> > > > > > > > > +
> > > > > > > > > +       if (xe_device_wedged(xe))
> > > > > > > > > +               return;
> > > > > > > > > +
> > > > > > > > > +       if (!ttm_bo->resource || !mem_type_is_vram(ttm_bo->resource->mem_type))
> > > > > > > > > +               return;
> > > > > > > > > +
> > > > > > > > > +       if (!drm_dev_enter(&xe->drm, &idx))
> > > > > > > > > +               return;
> > > > > > > > > +
> > > > > > > > > +       if (!xe_pm_runtime_get_if_active(xe))
> > > > > > > > > +               goto unbind;
> > > > > > > > > +
> > > > > > > > > +       err = dma_resv_reserve_fences(&ttm_bo->base._resv, 1);
> > > > > > > > > +       if (err)
> > > > > > > > > +               goto put_pm;
> > > > > > > > > +
> > > > > > > > > +       fence = xe_migrate_clear(mem_type_to_migrate(xe, ttm_bo->resource->mem_type),
> > > > > > > > > +                                ttm_to_xe_bo(ttm_bo), ttm_bo->resource,
> > > > > > > > > +                                XE_MIGRATE_CLEAR_FLAG_FULL |
> > > > > > > > > +                                XE_MIGRATE_CLEAR_NON_DIRTY);
> > > > > > > > > +       if (XE_WARN_ON(IS_ERR(fence)))
> > > > > > > > > +               goto put_pm;
> > > > > > > > > +
> > > > > > > > > +       xe_ttm_vram_mgr_resource_set_cleared(ttm_bo->resource);
> > > > > > > > > +       dma_resv_add_fence(&ttm_bo->base._resv, fence,
> > > > > > > > > +                          DMA_RESV_USAGE_KERNEL);
> > > > > > > > > +       dma_fence_put(fence);
> > > > > > > > > +
> > > > > > > > > +put_pm:
> > > > > > > > > +       xe_pm_runtime_put(xe);
> > > > > > > > > +unbind:
> > > > > > > > > +       drm_dev_exit(idx);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > >  static void xe_ttm_bo_release_notify(struct ttm_buffer_object *ttm_bo)
> > > > > > > > >  {
> > > > > > > > >         struct dma_resv_iter cursor;
> > > > > > > > > @@ -1478,6 +1523,8 @@ static void xe_ttm_bo_release_notify(struct ttm_buffer_object *ttm_bo)
> > > > > > > > >         }
> > > > > > > > >         dma_fence_put(replacement);
> > > > > > > > >  
> > > > > > > > > +       xe_ttm_bo_release_clear(ttm_bo);
> > > > > > > > > +
> > > > > > > > >         dma_resv_unlock(ttm_bo->base.resv);
> > > > > > > > >  }
> > > > > > > > >  
> > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> > > > > > > > > index 8f8e9fdfb2a8..39d7200cb366 100644
> > > > > > > > > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > > > > > > > > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > > > > > > > > @@ -1063,7 +1063,7 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m,
> > > > > > > > >         struct xe_gt *gt = m->tile->primary_gt;
> > > > > > > > >         struct xe_device *xe = gt_to_xe(gt);
> > > > > > > > >         bool clear_only_system_ccs = false;
> > > > > > > > > -       struct dma_fence *fence = NULL;
> > > > > > > > > +       struct dma_fence *fence = dma_fence_get_stub();
> > > > > > > > >         u64 size = bo->size;
> > > > > > > > >         struct xe_res_cursor src_it;
> > > > > > > > >         struct ttm_resource *src = dst;
> > > > > > > > > @@ -1075,10 +1075,13 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m,
> > > > > > > > >         if (!clear_bo_data && clear_ccs && !IS_DGFX(xe))
> > > > > > > > >                 clear_only_system_ccs = true;
> > > > > > > > >  
> > > > > > > > > -       if (!clear_vram)
> > > > > > > > > +       if (!clear_vram) {
> > > > > > > > >                 xe_res_first_sg(xe_bo_sg(bo), 0, bo->size, &src_it);
> > > > > > > > > -       else
> > > > > > > > > +       } else {
> > > > > > > > >                 xe_res_first(src, 0, bo->size, &src_it);
> > > > > > > > > +               if (!(clear_flags & XE_MIGRATE_CLEAR_NON_DIRTY))
> > > > > > > > > +                       size -= xe_res_next_dirty(&src_it);
> > > > > > > > > +       }
> > > > > > > > >  
> > > > > > > > >         while (size) {
> > > > > > > > >                 u64 clear_L0_ofs;
> > > > > > > > > @@ -1125,6 +1128,9 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m,
> > > > > > > > >                         emit_pte(m, bb, clear_L0_pt, clear_vram, clear_only_system_ccs,
> > > > > > > > >                                  &src_it, clear_L0, dst);
> > > > > > > > >  
> > > > > > > > > +               if (clear_vram && !(clear_flags & XE_MIGRATE_CLEAR_NON_DIRTY))
> > > > > > > > > +                       size -= xe_res_next_dirty(&src_it);
> > > > > > > > > +
> > > > > > > > >                 bb->cs[bb->len++] = MI_BATCH_BUFFER_END;
> > > > > > > > >                 update_idx = bb->len;
> > > > > > > > >  
> > > > > > > > > @@ -1146,7 +1152,7 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m,
> > > > > > > > >                 }
> > > > > > > > >  
> > > > > > > > >                 xe_sched_job_add_migrate_flush(job, flush_flags);
> > > > > > > > > -               if (!fence) {
> > > > > > > > > +               if (fence == dma_fence_get_stub()) {
> > > > > > > > >                         /*
> > > > > > > > >                          * There can't be anything userspace related at this
> > > > > > > > >                          * point, so we just need to respect any potential move
> > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_migrate.h b/drivers/gpu/drm/xe/xe_migrate.h
> > > > > > > > > index fb9839c1bae0..58a7b747ef11 100644
> > > > > > > > > --- a/drivers/gpu/drm/xe/xe_migrate.h
> > > > > > > > > +++ b/drivers/gpu/drm/xe/xe_migrate.h
> > > > > > > > > @@ -118,6 +118,7 @@ int xe_migrate_access_memory(struct xe_migrate *m, struct xe_bo *bo,
> > > > > > > > >  
> > > > > > > > >  #define XE_MIGRATE_CLEAR_FLAG_BO_DATA          BIT(0)
> > > > > > > > >  #define XE_MIGRATE_CLEAR_FLAG_CCS_DATA         BIT(1)
> > > > > > > > > +#define XE_MIGRATE_CLEAR_NON_DIRTY             BIT(2)
> > > > > > > > >  #define XE_MIGRATE_CLEAR_FLAG_FULL     (XE_MIGRATE_CLEAR_FLAG_BO_DATA | \
> > > > > > > > >                                         XE_MIGRATE_CLEAR_FLAG_CCS_DATA)
> > > > > > > > >  struct dma_fence *xe_migrate_clear(struct xe_migrate *m,
> > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_res_cursor.h b/drivers/gpu/drm/xe/xe_res_cursor.h
> > > > > > > > > index d1a403cfb628..630082e809ba 100644
> > > > > > > > > --- a/drivers/gpu/drm/xe/xe_res_cursor.h
> > > > > > > > > +++ b/drivers/gpu/drm/xe/xe_res_cursor.h
> > > > > > > > > @@ -315,6 +315,32 @@ static inline void xe_res_next(struct xe_res_cursor *cur, u64 size)
> > > > > > > > >         }
> > > > > > > > >  }
> > > > > > > > >  
> > > > > > > > > +/**
> > > > > > > > > + * xe_res_next_dirty - advance the cursor to next dirty buddy block
> > > > > > > > > + *
> > > > > > > > > + * @cur: the cursor to advance
> > > > > > > > > + *
> > > > > > > > > + * Move the cursor until a dirty buddy block is found.
> > > > > > > > > + *
> > > > > > > > > + * Return: Number of bytes the cursor has been advanced
> > > > > > > > > + */
> > > > > > > > > +static inline u64 xe_res_next_dirty(struct xe_res_cursor *cur)
> > > > > > > > > +{
> > > > > > > > > +       struct drm_buddy_block *block = cur->node;
> > > > > > > > > +       u64 bytes = 0;
> > > > > > > > > +
> > > > > > > > > +       XE_WARN_ON(cur->mem_type != XE_PL_VRAM0 &&
> > > > > > > > > +                  cur->mem_type != XE_PL_VRAM1);
> > > > > > > > 
> > > > > > > > What if we have more than just these two? Maybe check against
> > > > > > > > the mask instead.
> > > > > 
> > > > > Sure. I think we do this test in a couple of other places in the
> > > > > driver too; we can clean this up in a separate patch first.
> > > > 
> > > > No problem, just wanted to call it out. We can do this later..
> > > > 
> > > > > 
> > > > > > > > 
> > > > > > > > > +
> > > > > > > > > +       while (cur->remaining && drm_buddy_block_is_clear(block)) {
> > > > > > > > > +               bytes += cur->size;
> > > > > > > > > +               xe_res_next(cur, cur->size);
> > > > > > > > > +               block = cur->node;
> > > > > > > > > +       }
> > > > > > > > > +
> > > > > > > > > +       return bytes;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > >  /**
> > > > > > > > >   * xe_res_dma - return dma address of cursor at current position
> > > > > > > > >   *
> > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> > > > > > > > > index 9e375a40aee9..120046941c1e 100644
> > > > > > > > > --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> > > > > > > > > +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
> > > > > > > > > @@ -84,6 +84,9 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man,
> > > > > > > > >         if (place->fpfn || lpfn != man->size >> PAGE_SHIFT)
> > > > > > > > >                 vres->flags |= DRM_BUDDY_RANGE_ALLOCATION;
> > > > > > > > >  
> > > > > > > > > +       if (tbo->type == ttm_bo_type_device)
> > > > > > > > > +               vres->flags |= DRM_BUDDY_CLEAR_ALLOCATION;
> > > > > > > > > +
> > > > > > > > >         if (WARN_ON(!vres->base.size)) {
> > > > > > > > >                 err = -EINVAL;
> > > > > > > > >                 goto error_fini;
> > > > > > > > > @@ -187,7 +190,7 @@ static void xe_ttm_vram_mgr_del(struct ttm_resource_manager *man,
> > > > > > > > >         struct drm_buddy *mm = &mgr->mm;
> > > > > > > > >  
> > > > > > > > >         mutex_lock(&mgr->lock);
> > > > > > > > > -       drm_buddy_free_list(mm, &vres->blocks, 0);
> > > > > > > > > +       drm_buddy_free_list(mm, &vres->blocks, vres->flags);
> > > > > > > > >         mgr->visible_avail += vres->used_visible_size;
> > > > > > > > >         mutex_unlock(&mgr->lock);
> > > > > > > > >  
> > > > > > > > > diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
> > > > > > > > > index cc76050e376d..dfc0e6890b3c 100644
> > > > > > > > > --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
> > > > > > > > > +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.h
> > > > > > > > > @@ -36,6 +36,12 @@ to_xe_ttm_vram_mgr_resource(struct ttm_resource *res)
> > > > > > > > >         return container_of(res, struct xe_ttm_vram_mgr_resource, base);
> > > > > > > > >  }
> > > > > > > > >  
> > > > > > > > > +static inline void
> > > > > > > > > +xe_ttm_vram_mgr_resource_set_cleared(struct ttm_resource *res)
> > > > > > > > > +{
> > > > > > > > > +       to_xe_ttm_vram_mgr_resource(res)->flags |= DRM_BUDDY_CLEARED;
> > > > > > > > 
> > > > > > > > I see the AMD driver is doing the same thing here. Maybe we
> > > > > > > > can pull this in based on the flags given at the resource
> > > > > > > > initialization so we can potentially tweak this for different
> > > > > > > > scenarios (when to free) and converge what we're doing here
> > > > > > > > and what AMD is doing into the common code.
> > > > > 
> > > > > AMD has a flag which amounts to the same test as 'type ==
> > > > > ttm_bo_type_device' in this series. AMD blindly sets this flag on
> > > > > user BO creation, but as I stated above we could wire this to uAPI
> > > > > if needed.
> > > > > 
> > > > > Also I think there are some kernel BOs for which clear-on-free is
> > > > > safe too (e.g., I think only PT BOs need their contents to stick
> > > > > around after final put until dma-resv is clean of fences). Would
> > > > > have to fully test for a definitive answer though.
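> > > > > 
> > > > > For reference, I believe the AMD flag is this one (from
> > > > > amdgpu_drm.h - worth double checking the exact bit):
> > > > > 
> > > > >         /* Flag that BO will be cleared on release */
> > > > >         #define AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE  (1 << 9)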
> > > > 
> > > > In most cases the kernel operations are going to be lower priority
> > > > than
> > > 
> > > It is the inverse of that - kernel operations are always higher
> > > priority as without them a user app can't function (think any page
> > > table memory allocation or memory allocation for an LRC). This is one
> > > of the reasons we just pin kernel memory in Xe compared to evictable
> > > kernel memory in i915 (also evicting kernel memory is a huge pain as
> > > we have to idle the GuC state which could use that memory).
> > 
> > We're talking about kernel operations attached to different processes
> > though. The incoming kernel work would be for a new user process
> > (prepping the buffer for submission). The existing user process IMO
> > should have higher priority.
> > 
> 
> You still have to wait for the user operation to complete in dma-resv
> mode before stealing memory.
> 
> In LR preempt fence mode, we'd wait until LR VM is preempted off the
> hardware.
> 
> In LR fault mode, we can steal the memory immediately.
> 

s/immediately/after TLB invalidation completes

Matt

> There is no difference though if we are trying to get kernel memory or
> user memory - this is just how eviction works.
> 
> Also if we are in LR mode, we have no idea how long the user processes
> from which we are stealing memory will take to complete (e.g., we do not
> install fences in dma-resv attached to jobs as the fences could run
> forever, breaking the rules of fences - or in other words, breaking
> reclaim).
> 
> This is where pinning of user memory could come into play with some
> accounting controller, likely cgroups. LR mode could pin some memory to
> ensure fast forward progress, or pin buffers it really doesn't want
> moved around (e.g., buffers exported over a fast network interconnect).
> 
> Matt
> 
> > Thanks,
> > Stuart
> > 
> > > 
> > > > the user operations (with the exception of maybe page faults). We
> > > > had looked at things in i915 like preempting kernel operations when
> > > > a user op comes in (the user clear-on-alloc preempts the kernel
> > > > clear-on-free/alloc). This probably falls in the same optimization
> > > > category you
> > > 
> > > We definitely won't do that as this would require multiple migration
> > > queues, and preemption via the GuC is so expensive I doubt you'd ever
> > > win. I guess we could have a convoluted algorithm in the migration
> > > queue to jump to higher priority jobs, but we would need some really
> > > strong data indicating this is needed. Also with dma-resv basically
> > > the migration queue needs to be idle before anything from the user
> > > can run.
> > > 
> > > Matt
> > > 
> > > > mentioned, although it doesn't necessarily need a user hint.
> > > > 
> > > > Thanks,
> > > > Stuart
> > > > 
> > > > > 
> > > > > Matt
> > > > > 
> > > > > > > > Thanks,
> > > > > > > > Stuart
> > > > > > > > 
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > >  static inline struct xe_ttm_vram_mgr *
> > > > > > > > >  to_xe_ttm_vram_mgr(struct ttm_resource_manager *man)
> > > > > > > > >  {
> > > > > > > > 
> > > > > > 
> > > > 
> > 

