[Intel-gfx] [PATCH 2/3] drm/i915: Return immediately if trylock fails for direct-reclaim
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Fri Nov 9 07:30:34 UTC 2018
On 08/11/2018 16:48, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-11-08 16:23:08)
>>
>> On 08/11/2018 08:17, Chris Wilson wrote:
>>> Ignore trying to shrink from i915 if we fail to acquire the struct_mutex
>>> in the shrinker while performing direct-reclaim. The trade-off being
>>> (much) lower latency for non-i915 clients at an increased risk of being
>>> unable to obtain a page from direct-reclaim without hitting the
>>> oom-notifier. The proviso being that we still keep trying to hard
>>> obtain the lock for oom so that we can reap under heavy memory pressure.
>>>
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>> ---
>>> drivers/gpu/drm/i915/i915_gem_shrinker.c | 24 +++++++++++-------------
>>> 1 file changed, 11 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
>>> index ea90d3a0d511..d461f458f4af 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
>>> @@ -36,7 +36,9 @@
>>> #include "i915_drv.h"
>>> #include "i915_trace.h"
>>>
>>> -static bool shrinker_lock(struct drm_i915_private *i915, bool *unlock)
>>> +static bool shrinker_lock(struct drm_i915_private *i915,
>>> +			  unsigned int flags,
>>> +			  bool *unlock)
>>>  {
>>>  	switch (mutex_trylock_recursive(&i915->drm.struct_mutex)) {
>>>  	case MUTEX_TRYLOCK_RECURSIVE:
>>> @@ -45,15 +47,11 @@ static bool shrinker_lock(struct drm_i915_private *i915, bool *unlock)
>>>
>>>  	case MUTEX_TRYLOCK_FAILED:
>>>  		*unlock = false;
>>> -		preempt_disable();
>>> -		do {
>>> -			cpu_relax();
>>> -			if (mutex_trylock(&i915->drm.struct_mutex)) {
>>> -				*unlock = true;
>>> -				break;
>>> -			}
>>> -		} while (!need_resched());
>>> -		preempt_enable();
>>> +		if (flags & I915_SHRINK_ACTIVE) {
>>
>> So until I915_SHRINK_ACTIVE, which is the last ditch attempt to shrink
>> in the normal case (direct reclaim?) or oom, we bail out on the first
>> sign of struct mutex contention. Doesn't this make our shrinker much
>> less effective at runtime and why is that OK?
>
> As I said, it's a tradeoff between blocking others for _several_
> _seconds_ while making no progress, and returning immediately while
> making no progress. My argument is along the lines of: if direct-reclaim
> is running in another process and something else is engaged in the
> driver, hopefully the driver will be cleaning up as it goes along, or
> else what remains is active and won't be reaped anyway. If direct
> reclaim is failing, the delay before trying the oom path is
> insignificant.
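So if I am reading the hunks right, the failed-trylock path now becomes
roughly the following (hand-assembled from the diff, so the exact
placement of the return and of *unlock = true may differ from what you
have):

	case MUTEX_TRYLOCK_FAILED:
		*unlock = false;
		if (flags & I915_SHRINK_ACTIVE) {
			/* only the active/oom passes block on struct_mutex */
			mutex_lock_nested(&i915->drm.struct_mutex,
					  I915_MM_SHRINKER);
			*unlock = true;
		}
		return *unlock;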
What was the rationale behind the busy loop there in the first place,
btw, compared to an alternative of micro-sleeps and a few retries? I
know that would be the opposite of what this patch is trying to achieve;
I just don't have a good feel for what makes most sense for the
shrinker. Is it better to try a little harder instead of giving up
immediately, but to try harder in a softer way than spinning? Or does
that just end up blocking the callers and have the same effect of making
no progress?
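To illustrate what I mean by the softer option, a sketch only (the
helper name, retry count and sleep range are made up, it needs
<linux/delay.h>, and it is obviously only valid in a context that is
allowed to sleep):

	static bool shrinker_trylock_patiently(struct drm_i915_private *i915)
	{
		int retries = 5;

		do {
			if (mutex_trylock(&i915->drm.struct_mutex))
				return true;
			usleep_range(50, 100);	/* back off instead of spinning */
		} while (--retries);

		return false;
	}

No idea whether the extra latency that adds to reclaim is acceptable,
hence the question.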
>> Or in other words, for what use cases, tests or benchmark was the
>> existing approach of busy looping a problem?
>
> Do something like 'find / -exec cat' while running i915 and see how long
> you have to wait for a khungtaskd :|
I couldn't reproduce anything strange with this. Assuming you meant
something like 'find / -exec cat {} \; > /dev/null'.

In either case, I think explanations like this should go into the commit
message.
> gem_syslatency reports max latencies of over 600s, and I'm sure it's
> pretty much unbounded. It's bad enough that I expect we are the cause of
> significant hitching (mainly in other tasks) on the desktop running at
> memory capacity.
Running gem_syslatency in parallel with the find, or gem_syslatency -b
with or without the find, did not show anything either.

Then I tried gem_shrink, but that doesn't seem to be making any progress
with or without the patch. It's not even hitting i915 that hard, though;
perhaps I need to turn off lockdep, which would be a bummer since I
wanted lockdep enabled to check the next bit.
>>> +			mutex_lock_nested(&i915->drm.struct_mutex,
>>> +					  I915_MM_SHRINKER);
>>
>> _nested is needed since abandoning trylock would otherwise cause lockdep
>> issues? But is it really safe? I don't know.. how can it be? It is
>> guaranteed to be a different process here otherwise the result wouldn't
>> be MUTEX_TRYLOCK_FAILED.
>
> The easy option was to only force the mutex_lock for kswapd. However,
> noting that we do need to forcibly shrink before oom, I opted for the
> more inclusive attempt to take it in both. For the oom approach, my only
> handwaving is we shouldn't be nested under oom serialisation and so
> avoid an ABBA deadlock. It's survived some oom abuse, but the machine
> still becomes unresponsive (but pingable) eventually (as it does before
> the patch).
...which I am afraid I completely failed to understand! :(

When you say kswapd, does that mean mutex->owner != current in the
shrinker, and otherwise it is direct reclaim?

If so, how is the nested annotation correct for the kswapd case?
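Or, to put my question differently, is the "easy option" you mention
essentially the below, keyed off current_is_kswapd() from <linux/swap.h>
rather than the shrink flags? (Just a sketch for my own understanding;
the helper name is made up.)

	static bool shrinker_lock_kswapd_only(struct drm_i915_private *i915,
					      bool *unlock)
	{
		*unlock = false;
		if (!current_is_kswapd())
			return false;	/* direct reclaim gives up on contention */

		mutex_lock_nested(&i915->drm.struct_mutex, I915_MM_SHRINKER);
		*unlock = true;
		return true;
	}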
>
>> Also, I915_MM_SHRINKER is so far only documented to apply to obj->mm.lock.
>
> It suits very nicely, as it is being applied for the same purpose.
Maybe rename it, or at least update the comment where the subclass is
defined?
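If I remember right, it lives in i915_drv.h as something like:

	enum i915_mm_subclass { /* lockdep subclass for obj->mm.lock */
		I915_MM_NORMAL = 0,
		I915_MM_SHRINKER
	};

so at minimum that comment should now mention struct_mutex as well.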
Regards,
Tvrtko