[Intel-gfx] [PATCH] drm/i915: Mitigate retirement starvation a bit

Chris Wilson chris at chris-wilson.co.uk
Thu Feb 4 13:37:56 UTC 2016


On Thu, Feb 04, 2016 at 01:30:30PM +0000, Tvrtko Ursulin wrote:
> 
> 
> On 04/02/16 12:40, Chris Wilson wrote:
> >On Thu, Feb 04, 2016 at 12:25:24PM +0000, Tvrtko Ursulin wrote:
> >>From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>
> >>In execlists mode internal house keeping of the discarded
> >>requests (and so contexts and VMAs) relies solely on the retire
> >>worker, which can be prevented from running by just being
> >>unlucky when busy clients are hammering on the big lock.
> >>
> >>Prime example is the gem_close_race IGT, which due to this
> >>effect causes internal lists to grow to epic proportions, with
> >>a consequence of object VMA traversal growing exponentially
> >>and resulting in tens of minutes test runtime. Memory use is
> >>also very high and a limiting factor on some platforms.
> >>
> >>Since we do not want to run this internal house keeping more
> >>frequently, due to concerns that it may affect performance, and
> >>the scenario being statistically not very likely in real
> >>workloads, one possible workaround is to run it when new
> >>client handles are opened.
> >>
> >>This will solve the issues with this particular test case,
> >>making it complete in tens of seconds instead of tens of
> >>minutes, and will not add any run-time penalty to running
> >>clients.
> >>
> >>It can only slightly slow down new client startup, but on a
> >>realistically loaded system we are expecting this to be not
> >>significant. Even with heavy rendering in progress we can have
> >>perhaps up to several thousands of requests pending retirement,
> >>which, with a typical retirement cost of 80ns to 1us per
> >>request, is not significant.
> >>
> >>Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >>Testcase: igt/gem_close_race/gem-close-race
> >>Cc: Chris Wilson <chris at chris-wilson.co.uk>
> >
> >Still doesn't fix actual workloads where this is demonstrably bad, which
> >can be demonstrated with a single fd.
> 
> Which are those?

OglDrvCtx and clones.

> >The most effective treatment I found is moving the retire-requests from
> >execbuf (which exists for similar reasons) to get-pages.
> >
> >http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=breadcrumbs&id=75f4e53f1c9141ba2dd8847396a1bcc8dbeecd55
> 
> I struggle to understand how it is OK to stall get pages or even the
> object close when you objected to those in the past?

Benchmarks. Taking a hit here avoids situations that end up invoking the
shrinker.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
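
The cost estimate in the quoted patch description (several thousand
requests pending retirement, at 80ns to 1us each) can be checked with a
little arithmetic. What follows is only a sketch: the helper name
retire_total_ns() and both macro names are made up for illustration and
are not i915 code; only the 80ns and 1us figures come from the patch
description.

```c
#include <assert.h>
#include <stdint.h>

/* Per-request retirement cost bounds quoted in the patch description. */
#define RETIRE_COST_MIN_NS 80ULL    /* 80ns */
#define RETIRE_COST_MAX_NS 1000ULL  /* 1us  */

/*
 * Hypothetical helper (not part of i915): total time, in nanoseconds,
 * needed to retire nr_requests at a fixed per-request cost.
 */
static uint64_t retire_total_ns(uint64_t nr_requests, uint64_t cost_ns)
{
	/* Guard against overflow of the 64-bit product. */
	assert(cost_ns == 0 || nr_requests <= UINT64_MAX / cost_ns);

	return nr_requests * cost_ns;
}
```

Taking 5000 requests as an assumed stand-in for "several thousands",
retire_total_ns(5000, RETIRE_COST_MIN_NS) gives 400000ns (0.4ms) and
retire_total_ns(5000, RETIRE_COST_MAX_NS) gives 5000000ns (5ms). That
is the order of the one-off delay a new client would see at open time
under the proposed workaround.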

