[Intel-gfx] [PATCH v2] drm/i915: Shrink the GEM kmem_caches upon idling
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Tue Jan 16 17:25:25 UTC 2018
On 16/01/2018 15:21, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-01-16 15:12:43)
>>
>> On 16/01/2018 13:05, Chris Wilson wrote:
>>> When we finally decide the gpu is idle, that is a good time to shrink
>>> our kmem_caches.
>>>
>>> v2: Comment upon the random sprinkling of rcu_barrier() inside the idle
>>> worker.
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
>>> ---
>>> drivers/gpu/drm/i915/i915_gem.c | 30 ++++++++++++++++++++++++++++++
>>> 1 file changed, 30 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index 335731c93b4a..61b13fdfaa71 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -4716,6 +4716,21 @@ i915_gem_retire_work_handler(struct work_struct *work)
>>> }
>>> }
>>>
>>> +static void shrink_caches(struct drm_i915_private *i915)
>>> +{
>>> + /*
>>> + * kmem_cache_shrink() discards empty slabs and reorders partially
>>> + * filled slabs to prioritise allocating from the mostly full slabs,
>>> + * with the aim of reducing fragmentation.
>>> + */
>>> + kmem_cache_shrink(i915->priorities);
>>> + kmem_cache_shrink(i915->dependencies);
>>> + kmem_cache_shrink(i915->requests);
>>> + kmem_cache_shrink(i915->luts);
>>> + kmem_cache_shrink(i915->vmas);
>>> + kmem_cache_shrink(i915->objects);
>>> +}
>>> +
>>> static inline bool
>>> new_requests_since_last_retire(const struct drm_i915_private *i915)
>>> {
>>> @@ -4803,6 +4818,21 @@ i915_gem_idle_work_handler(struct work_struct *work)
>>> GEM_BUG_ON(!dev_priv->gt.awake);
>>> i915_queue_hangcheck(dev_priv);
>>> }
>>> +
>>> + /*
>>> + * We use magical TYPESAFE_BY_RCU kmem_caches whose pages are not
>>> + * returned to the system immediately but only after an RCU grace
>>> + * period. We want to encourage such pages to be returned and so
>>> + * incorporate an RCU barrier here to provide some rate limiting
>>> + * of the driver and flush the old pages before we free a new batch
>>> + * from the next round of shrinking.
>>> + */
>>> + rcu_barrier();
>>
>> Should this go into the conditional below? I don't think it makes a
>> difference effectively, but may be more logical.
>
> My thinking was to have the check after the sleep as the state is
> subject to change. I'm not concerned about the random unnecessary pauses
> on this wq, since it is subject to struct_mutex delays, so was quite
> happy to think of this as being "we shall only do one idle pass per RCU
> grace period".

The delay doesn't worry me, just that it is random - neither the
appearance of new requests nor the completion of existing ones has
anything to do with one RCU grace period.

The idle worker probably runs several orders of magnitude less often
than RCU grace periods elapse, so I don't think that can be a concern.
Hm..
>>> +
>>> + if (!new_requests_since_last_retire(dev_priv)) {
>>> + __i915_gem_free_work(&dev_priv->mm.free_work);
... you wouldn't want to pull this up under the struct_mutex section? It
would need a different flavour of the function to be called, and some
refactoring of the existing ones.

shrink_caches could be left here under the same check, preceded by the
rcu_barrier().
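
Roughly what I have in mind (just a sketch - the single-pass free helper
is made up and would fall out of the refactoring mentioned above):

	/* still under struct_mutex, while we know we are idle */
	i915_gem_free_objects_once(dev_priv);	/* hypothetical one-pass helper */

	...
	mutex_unlock(&dev_priv->drm.struct_mutex);
	...

	if (!new_requests_since_last_retire(dev_priv)) {
		rcu_barrier();
		shrink_caches(dev_priv);
	}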
>> I thought for a bit about whether re-using the worker from here is
>> completely fine, and I think it is. We expect only one pass when called
>> from here, so need_resched will be correctly neutralized/not relevant
>> on this path.
>
> At present, I was only thinking about the single path. This was meant to
> resemble i915_gem_drain_objects(), without the recursion :)
>
>> Hm, unless we consider mmap_gtt users.. so we could still have new
>> objects appearing on the free_list after the 1st pass. And then
>> need_resched might kick us out. What do you think?
>
> Not just mmap_gtt, any user freeing objects (coupled with RCU grace
> periods). I don't think it matters if we happen to loop until the
> timeslice is consumed as we are doing work that we would be doing
> anyway on this i915->wq.
Yeah, it doesn't matter - I was wondering whether we should explicitly
ignore need_resched when called from the idle worker and only grab the
first batch - whatever is currently on the free list.
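
Something like this, say (sketch only - it mirrors what the body of
__i915_gem_free_work does today, but takes a single snapshot of the
free list instead of looping until need_resched):

	static void i915_gem_free_first_batch(struct drm_i915_private *i915)
	{
		struct llist_node *freed;

		/* Grab only the objects on the free list right now. */
		freed = llist_del_all(&i915->mm.free_list);
		if (freed)
			__i915_gem_free_objects(i915, freed);
	}

Anything freed after the snapshot is left for the next run of the
regular free worker.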
Regards,
Tvrtko