[Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue high priority

Thu Oct 15 20:09:32 UTC 2020

> -----Original Message-----
> From: Chris Wilson <chris at chris-wilson.co.uk>
> Sent: Thursday, October 15, 2020 8:07 AM
> To: Tang, CQ <cq.tang at intel.com>; intel-gfx at lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue
> high priority
> 
> Quoting Tang, CQ (2020-10-14 00:29:13)
> > i915_gem_free_object() is called by multiple threads/processes, they all
> add objects onto the same free_list. The free_list processing worker thread
> becomes bottle-neck. I see that the worker is mostly a single thread (with
> particular thread ID), but sometimes multiple threads are launched to
> process the 'free_list' work concurrently. But the processing speed is still
> slower than the multiple process's feeding speed, and 'free_list' is holding
> more and more memory.
> 
> We can also prune the free_list immediately, if we know we are outside of
> any critical section. (We do this before create ioctls, and I thought upon
> close(device), but I see that's just contexts.)
> 
> > The worker launching time is delayed a lot, we call queue_work() when we
> add the first object onto the empty 'free_list', but when the worker is
> launched, the 'free_list' has sometimes accumulated 1M objects. Maybe it is
> because of waiting currently running worker to finish?
> 
> 1M is a lot more than is comfortable, and that's even with a high-priority
> worker.  The problem with objects being freed from any context is that we
> can't simply put a flush_work around there. (Not without ridding ourselves of
> a few mutexes at least.) We could try more than worker, but it's no more
> more effort to starve 2 cpus than it is to starve 1.
> 
> No, with that much pressure the only option is to apply the backpressure at
> the point of allocation ala create_ioctl. i.e. find the hog, and look to see if
> there's a convenient spot before/after to call
> i915_gem_flush_free_objects(). Since you highlight the vma-stash as the
> likely culprit, and the free_pt_stash is unlikely to be inside any critical section,
> might as well try flushing from there for starters.

I have not yet tested, but I guess calling i915_gem_flush_free_objects() inside free_pt_stash() will solve the problem that gem_exec_gttfill has, because it will give some back pressure on the system traffic.

But this is only for the page table 4K lmem objects allocated/freed by vma-stash. We might encounter the same situation with user space allocated objects.

--CQ

> 
> Hmm, actually we are tantalizing close to having dropped all mutexes (and
> similar global lock-like effects) from free_objects. That would be a nice
> victory.
> -Chris