[Intel-gfx] [PATCH v2 03/37] drm/i915/region: support basic eviction

Daniel Vetter daniel at ffwll.ch
Thu Aug 15 16:23:14 UTC 2019


On Thu, Aug 15, 2019 at 5:26 PM Chris Wilson <chris at chris-wilson.co.uk> wrote:
>
> Quoting Matthew Auld (2019-08-15 11:48:04)
> > On Tue, 30 Jul 2019 at 17:26, Daniel Vetter <daniel at ffwll.ch> wrote:
> > >
> > > On Thu, Jun 27, 2019 at 09:55:59PM +0100, Matthew Auld wrote:
> > > > Support basic eviction for regions.
> > > >
> > > > Signed-off-by: Matthew Auld <matthew.auld at intel.com>
> > > > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > > > Cc: Abdiel Janulgue <abdiel.janulgue at linux.intel.com>
> > >
> > > So from a very high level this looks like it was largely modelled after
> > > i915_gem_shrink.c and not i915_gem_evict.c (our other "make room, we're
> > > running out of stuff" code). Any specific reasons?
> >
> > IIRC it was originally based on the patches from a few years ago that
> > exposed stolen memory to userspace.
> >
> > >
> > > I think i915_gem_evict is a much closer match for what we want for vram (it
> > > started out to manage the severely limited GTT on gen2/3/4) after all. With
> > > the complication that we'll have to manage physical memory with multiple
> > > virtual mappings of it on top, we unfortunately can't just reuse the
> > > locking pattern Chris has come up with in his struct_mutex-removal branch.
> > > But at least conceptually it should be a lot closer.
> >
> > When you say make it more like i915_gem_evict, what does that mean?
> > Are you talking about the eviction roster stuff, or the
> > placement/locking of the eviction logic, with it being deep down in
> > get_pages?
>
> The biggest difference would be the lack of region coalescing; the
> eviction code only tries to free what would result in a successful
> allocation. The order in which objects are fed into the scanner is
> somewhat relevant, but in practice fragmentation makes the range search
> slow and we much prefer random replacement -- while harmful, it is
> not biased as to who it harms, and so is a consistent overhead. However,
> since you don't need to find a slot inside a small range within a few
> million objects, I would expect LRU or even MRU (recently used objects
> in games tend to be more ephemeral and so make good eviction targets, at
> least according to John Carmack back in the day) to require fewer major
> faults.
> https://github.com/ESWAT/john-carmack-plan-archive/blob/master/by_day/johnc_plan_20000307.txt
>
> You would need a very similar scanner to keep a journal of the potential
> frees from which to track the coalescing (slightly more complicated due
> to the disjoint nature of the buddy merges). One suspects that adding
> the scanner would shape the buddy_nodes more towards drm_mm_nodes.
>
> This is also a case where real world testing of a thrashing load beats
> simulation.  So just make sure the eviction doesn't stall the entire GPU
> and submission pipeline and you will be forgiven most transgressions.
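
For anyone not familiar with the roster/scanner Chris is describing, the
existing i915_gem_evict/drm_mm flavour is roughly the loop below. This is
only a sketch of that pattern, not code from this series: the object
struct, the lru/evict_link fields and evict_obj() are made up for
illustration, only the drm_mm_scan_* calls and the list helpers are the
real kernel API.

struct hypothetical_obj {                       /* stand-in for a vma/bo */
        struct drm_mm_node node;
        struct list_head lru_link;              /* on the region's LRU */
        struct list_head evict_link;            /* scratch link for the scan */
};

static int evict_for_hole(struct drm_mm *mm, struct list_head *lru,
                          u64 size, u64 alignment)
{
        struct hypothetical_obj *obj, *next;
        struct drm_mm_scan scan;
        LIST_HEAD(eviction_list);
        bool found = false;

        /*
         * Build a roster of candidates, oldest first, until evicting
         * the roster would open a hole of the right size/alignment.
         */
        drm_mm_scan_init(&scan, mm, size, alignment, 0, DRM_MM_INSERT_BEST);
        list_for_each_entry(obj, lru, lru_link) {
                list_add(&obj->evict_link, &eviction_list);
                if (drm_mm_scan_add_block(&scan, &obj->node)) {
                        found = true;
                        break;
                }
        }

        /*
         * Every block added to the scan must be removed again, in
         * reverse order; only the ones the scan still wants (return
         * value true) are actually evicted.
         */
        list_for_each_entry_safe(obj, next, &eviction_list, evict_link) {
                if (drm_mm_scan_remove_block(&scan, &obj->node) && found)
                        evict_obj(obj);         /* made-up unbind + free */
        }

        return found ? 0 : -ENOSPC;
}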

Yeah, the fancy roster is definitely not on the wishlist until we have
all of this optimized. And even then it's probably better not to be
fancy, since we don't really need a contiguous block for pretty much
anything.
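
By contrast, the simple thing is essentially the loop below -- again just
a sketch with made-up names (hypothetical_region, region_alloc(),
evict_one_object()), not the actual patch: keep evicting from the
region's object list and retry the buddy allocation until it stops
failing with -ENOSPC.

static int region_get_blocks(struct hypothetical_region *mem, u64 size,
                             struct list_head *blocks)
{
        int err;

        for (;;) {
                /* made-up wrapper around the region's buddy allocator */
                err = region_alloc(mem, size, blocks);
                if (err != -ENOSPC)
                        return err;

                /*
                 * No roster, no coalescing: shoot down the least
                 * recently used object in the region and try again.
                 */
                if (!evict_one_object(mem))     /* made-up helper */
                        return -ENOSPC;
        }
}

Since the freed space never has to be contiguous, the interesting part is
purely which objects to throw out, not where they sit.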
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

