[Intel-gfx] [PATCH] drm/i915: Mitigate retirement starvation a bit

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Thu Feb 4 13:30:30 UTC 2016

On 04/02/16 12:40, Chris Wilson wrote:
> On Thu, Feb 04, 2016 at 12:25:24PM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>> In execlists mode internal housekeeping of the discarded
>> requests (and so contexts and VMAs) relies solely on the retire
>> worker, which can be prevented from running by just being
>> unlucky when busy clients are hammering on the big lock.
>> Prime example is the gem_close_race IGT, which due to this
>> effect causes internal lists to grow to epic proportions, with
>> a consequence of object VMA traversal growing exponentially
>> and resulting in tens of minutes of test runtime. Memory use is
>> also very high and a limiting factor on some platforms.
>> Since we do not want to run this internal housekeeping more
>> frequently, due to concerns that it may affect performance, and
>> the scenario being statistically not very likely in real
>> workloads, one possible workaround is to run it when new
>> client handles are opened.
>> This will solve the issues with this particular test case,
>> making it complete in tens of seconds instead of tens of
>> minutes, and will not add any run-time penalty to running
>> clients.
>> It can only slightly slow down new client startup, but on a
>> realistically loaded system we expect this not to be
>> significant. Even with heavy rendering in progress we can have
>> perhaps up to several thousands of requests pending retirement,
>> which, with a typical retirement cost of 80ns to 1us per
>> request, is not significant.
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>> Testcase: igt/gem_close_race/gem-close-race
>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> Still doesn't fix actual workloads where this is demonstrably bad, which
> can be demonstrated with a single fd.

Which are those?

> The most effective treatment I found is moving the retire-requests from
> execbuf (which exists for similar reasons) to get-pages.
> http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=breadcrumbs&id=75f4e53f1c9141ba2dd8847396a1bcc8dbeecd55

I struggle to understand how it is OK to stall get-pages, or even
object close, when you objected to those in the past?
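
For scale, the cost estimate in the quoted commit message (several
thousand pending requests at 80ns to 1us each) can be worked through
as below. This is only a rough sketch: retire_cost_ns() and ns_to_us()
are hypothetical helpers written for illustration, not i915 functions,
and the figure of 5000 pending requests used afterwards is an assumed
stand-in for "several thousands", not a measurement.

```c
#include <stdint.h>

/* Hypothetical helper for illustration only; not an i915 function.
 * Returns the total time, in nanoseconds, to retire `pending` requests
 * when each one costs `cost_ns` nanoseconds. */
static uint64_t retire_cost_ns(uint64_t pending, uint64_t cost_ns)
{
	return pending * cost_ns;
}

/* Convert nanoseconds to whole microseconds (truncating). */
static uint64_t ns_to_us(uint64_t ns)
{
	return ns / 1000;
}
```

With the assumed 5000 pending requests, retire_cost_ns(5000, 80) gives
400000 ns (400 us) and retire_cost_ns(5000, 1000) gives 5000000 ns
(5 ms), so under these assumptions the one-off cost at client open
would fall somewhere between 0.4 ms and 5 ms.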
