[Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue high priority

Chris Wilson chris at chris-wilson.co.uk
Thu Oct 15 20:32:40 UTC 2020


Quoting Tang, CQ (2020-10-15 21:09:32)
> 
> 
> > -----Original Message-----
> > From: Chris Wilson <chris at chris-wilson.co.uk>
> > Sent: Thursday, October 15, 2020 8:07 AM
> > To: Tang, CQ <cq.tang at intel.com>; intel-gfx at lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue
> > high priority
> > 
> > Quoting Tang, CQ (2020-10-14 00:29:13)
> > > i915_gem_free_object() is called by multiple threads/processes, and they
> > > all add objects onto the same free_list. The free_list processing worker
> > > thread becomes a bottleneck. I see that the work is mostly done by a
> > > single thread (with a particular thread ID), but sometimes multiple
> > > threads are launched to process the 'free_list' work concurrently. Still,
> > > the processing speed is slower than the rate at which the multiple
> > > processes feed the list, so the 'free_list' holds more and more memory.
> > 
> > We can also prune the free_list immediately, if we know we are outside of
> > any critical section. (We do this before the create ioctls, and I thought we
> > did upon close(device) too, but I see that one only covers contexts.)
> > 
> > > The worker launching time is delayed a lot: we call queue_work() when we
> > > add the first object onto the empty 'free_list', but by the time the
> > > worker runs, the 'free_list' has sometimes accumulated 1M objects. Maybe
> > > that is because it has to wait for the currently running worker to finish?
> > 
> > 1M is a lot more than is comfortable, and that's even with a high-priority
> > worker. The problem with objects being freed from any context is that we
> > can't simply put a flush_work() around them. (Not without ridding ourselves
> > of a few mutexes at least.) We could try more than one worker, but it's no
> > more effort to starve 2 CPUs than it is to starve 1.
> > 
> > No, with that much pressure the only option is to apply the backpressure at
> > the point of allocation, a la create_ioctl: find the hog, and look to see if
> > there's a convenient spot before/after it to call
> > i915_gem_flush_free_objects(). Since you highlight the vma-stash as the
> > likely culprit, and free_pt_stash() is unlikely to be inside any critical
> > section, we might as well try flushing from there for starters.
> 
> I have not yet tested it, but I expect that calling i915_gem_flush_free_objects() inside free_pt_stash() will solve the problem that gem_exec_gttfill has, because it will apply some backpressure on the system traffic.

Still, I'm slightly concerned that so many PD objects are being created;
it's not something that shows up in the smem ppgtt tests (or at least it
has been dwarfed by other bottlenecks), and the set of vma (and so the
PDs) is meant to reach a steady state. You would have to be using a
constant set of objects while recycling the vma in order not to hit the
create_ioctl flush. However, it points back to the pressure point being
around the vma bind.
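
(For illustration, the flush being suggested would sit roughly as below.
This is only a sketch: free_pt_stash() and i915_gem_flush_free_objects()
are the functions named above, but the simplified signature, the
vm->i915 plumbing and the exact placement are assumptions, not the
eventual patch.)

static void free_pt_stash(struct i915_address_space *vm,
                          struct i915_vm_pt_stash *stash)
{
        /* ... existing release of the stashed page-table objects ... */

        /*
         * Apply backpressure: reap i915->mm.free_list now, before the
         * caller loops around and allocates another stash. Only valid
         * if this path sits outside any critical section, which is the
         * assumption being tested in this thread.
         */
        i915_gem_flush_free_objects(vm->i915);
}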

> But this is only for the 4K lmem page-table objects allocated/freed by the vma-stash. We might encounter the same situation with userspace-allocated objects.

See gem_exec_create; its mission is to cause memory starvation by
creating as many new objects as it can and releasing them after a nop
batch. That's why we have the freelist flush in create_ioctl.
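
(That create_ioctl flush is essentially the pattern below; a paraphrase
rather than verbatim from the tree, with the object-creation tail
elided.)

int i915_gem_create_ioctl(struct drm_device *dev, void *data,
                          struct drm_file *file)
{
        struct drm_i915_private *i915 = to_i915(dev);

        /*
         * Reap the accumulated free_list before this client is allowed
         * to allocate yet more objects, so a create-heavy hog pays for
         * its own garbage.
         */
        i915_gem_flush_free_objects(i915);

        /* ... object creation proper, unchanged ... */
}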

Now I need to add a pass that tries to create as many vma as possible
from a few objects.

(And similarly why we try to free requests as they are created.)

One problem is that such flushes will catch the client running after
the hog, not necessarily the hog itself.

I'm optimistic we can make freeing the object atomic, even if that means
pushing the pages onto some reclaim list. (The deferral to the free
worker is currently a really nasty drawback; freeing in place was a
trick we lost with the removal of struct_mutex.)
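
(Roughly the shape of that idea, purely illustrative: the
reclaim_link/reclaim_list/reclaim_work names below are invented, and
whether the rest of the teardown really is safe from atomic context is
exactly the open question.)

static void i915_gem_free_object(struct drm_gem_object *gem)
{
        struct drm_i915_gem_object *obj = to_intel_bo(gem);
        struct drm_i915_private *i915 = to_i915(gem->dev);

        /* Tear down the vma/GEM bookkeeping immediately, in this context... */

        /*
         * ...and defer only the backing pages, pushing them onto a
         * lockless reclaim list for a worker (or the next allocator)
         * to drain. reclaim_link/reclaim_list/reclaim_work do not exist
         * today; they are invented names for the idea above.
         */
        llist_add(&obj->mm.reclaim_link, &i915->mm.reclaim_list);
        queue_work(i915->wq, &i915->mm.reclaim_work);
}
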
-Chris

