[Intel-gfx] [PATCH] drm/i915: Limit the backpressure for i915_request allocation

Wed Sep 12 13:53:43 UTC 2018

On 12/09/2018 14:38, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-09-12 14:34:16)
>>
>> On 12/09/2018 12:11, Chris Wilson wrote:
>>> If we try and fail to allocate a i915_request, we apply some
>>> backpressure on the clients to throttle the memory allocations coming
>>> from i915.ko. Currently, we wait until completely idle, but this is far
>>> too heavy and leads to some situations where the only escape is to
>>> declare a client hung and reset the GPU. The intent is to only ratelimit
>>> the allocation requests, so we need only wait for a jiffie before using
>>> the normal direct reclaim.
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106680
>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_request.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>>> index 09ed48833b54..588bc5a4d18b 100644
>>> --- a/drivers/gpu/drm/i915/i915_request.c
>>> +++ b/drivers/gpu/drm/i915/i915_request.c
>>> @@ -736,7 +736,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>>>                ret = i915_gem_wait_for_idle(i915,
>>>                                             I915_WAIT_LOCKED |
>>>                                             I915_WAIT_INTERRUPTIBLE,
>>> -                                          MAX_SCHEDULE_TIMEOUT);
>>> +                                          1);
>>>                if (ret)
>>>                        goto err_unreserve;
>>>    
>>>
>>
>> What is the remaining value of even trying to wait for idle, instead of
>> maybe just i915_request_retire and sleep for a jiffie?
> 
> There's no point in the wait if actually idle? We just want the retire
> and kicking of slabs.

Why there is no point? If allocation failed and we want to apply 
backpressure, lets apply it.

Why would it matter which client first spotted GPU is idle and skipped 
the sleep? What would happen if we added a random back-off in this case 
between some bounds, like Ethernet frames if my memory serves me well.

Don't get me wrong, in practice it is minor point, I am purely thinking 
that the message presence of i915_gem_wait_for_idle is maybe not clear 
or is misleading.

Regards,

Tvrtko

>> The intention
>> would potentially read clearer since it is questionable there is any
>> relationship with idle and rate limiting clients. In fact, now that I
>> think of it, waiting for idle is a nice way to starve an unlucky client
>> forever.
> 
> See i915_gem_shrink. It's a seriously moot point. You can define any
> pathology you'd like, someone has to lose.
> -Chris
>