[Intel-gfx] [PATCH] drm/i915: Limit the backpressure for i915_request allocation
Daniel Vetter
daniel at ffwll.ch
Wed Sep 12 14:47:21 UTC 2018
On Wed, Sep 12, 2018 at 3:42 PM, Chris Wilson <chris at chris-wilson.co.uk> wrote:
> Quoting Tvrtko Ursulin (2018-09-12 14:34:16)
>>
>> On 12/09/2018 12:11, Chris Wilson wrote:
>> > If we try and fail to allocate an i915_request, we apply some
>> > backpressure on the clients to throttle the memory allocations coming
>> > from i915.ko. Currently, we wait until completely idle, but this is far
>> > too heavy and leads to some situations where the only escape is to
>> > declare a client hung and reset the GPU. The intent is to only ratelimit
>> > the allocation requests, so we need only wait for a jiffie before using
>> > the normal direct reclaim.
>> >
>> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106680
>> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>> > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
>> > ---
>> > drivers/gpu/drm/i915/i915_request.c | 2 +-
>> > 1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>> > index 09ed48833b54..588bc5a4d18b 100644
>> > --- a/drivers/gpu/drm/i915/i915_request.c
>> > +++ b/drivers/gpu/drm/i915/i915_request.c
>> > @@ -736,7 +736,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>> >  		ret = i915_gem_wait_for_idle(i915,
>> >  					     I915_WAIT_LOCKED |
>> >  					     I915_WAIT_INTERRUPTIBLE,
>> > -					     MAX_SCHEDULE_TIMEOUT);
>> > +					     1);
>> >  		if (ret)
>> >  			goto err_unreserve;
>> >
>> >
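For reference, the hunk above sits in the allocation-failure path of
i915_request_alloc(). An abridged sketch of that path (paraphrased from
memory of the v4.19-era source, so details may differ):

	rq = kmem_cache_alloc(i915->requests,
			      GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
	if (unlikely(!rq)) {
		/* Ratelimit ourselves: previously this waited with
		 * MAX_SCHEDULE_TIMEOUT, i.e. until the GPU was completely
		 * idle; the patch caps the wait at one jiffie. */
		ret = i915_gem_wait_for_idle(i915,
					     I915_WAIT_LOCKED |
					     I915_WAIT_INTERRUPTIBLE,
					     1);
		if (ret)
			goto err_unreserve;

		/* Retry the allocation, now allowing direct reclaim */
		rq = kmem_cache_alloc(i915->requests, GFP_KERNEL);
		if (!rq) {
			ret = -ENOMEM;
			goto err_unreserve;
		}
	}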
>>
>> What is the remaining value of even trying to wait for idle, instead of
>> just calling i915_request_retire and sleeping for a jiffie? The intention
>> would read more clearly, since it is questionable whether there is any
>> relationship between idling and rate-limiting clients. In fact, now that
>> I think of it, waiting for idle is a nice way to starve an unlucky client
>> forever.
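A hypothetical sketch of that alternative (not a tested patch; it assumes
the driver's global retire helper, and the one-jiffie nap is itself the
ratelimit rather than a wait for idle):

	if (unlikely(!rq)) {
		/* Reap whatever work has already completed... */
		i915_retire_requests(i915);

		/* ...then sleep for a jiffie purely as a ratelimit; no
		 * client is left waiting for the GPU to ever go idle. */
		schedule_timeout_interruptible(1);

		rq = kmem_cache_alloc(i915->requests, GFP_KERNEL);
		if (!rq) {
			ret = -ENOMEM;
			goto err_unreserve;
		}
	}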
>
> Better to starve the unlucky client than to allow the entire system to
> grind to a halt.
>
> One caveat to using RCU is that it is our responsibility to apply
> backpressure, as none is applied by the vm.

So instead of one jiffie, should we wait for one RCU grace period? My
understanding is that under very heavy load grace periods can be extended
(since batching them is more effective for throughput, so long as you
don't run out of memory). Just a random comment from the sidelines really :-)
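A hypothetical sketch of that variant (synchronize_rcu() blocks for one
full grace period, so the throttle would automatically stretch out when
RCU is heavily loaded):

	if (unlikely(!rq)) {
		/* Hypothetical: throttle by one RCU grace period instead
		 * of a bounded GPU wait, so the backpressure scales with
		 * how far behind the RCU-deferred frees have fallen. */
		synchronize_rcu();

		rq = kmem_cache_alloc(i915->requests, GFP_KERNEL);
		if (!rq) {
			ret = -ENOMEM;
			goto err_unreserve;
		}
	}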
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch