[Intel-gfx] [PATCH 15/27] drm/i915: Split execlist priority queue into rbtree + linked list
Chris Wilson
chris at chris-wilson.co.uk
Mon Apr 24 12:18:04 UTC 2017
On Mon, Apr 24, 2017 at 12:07:47PM +0100, Chris Wilson wrote:
> On Mon, Apr 24, 2017 at 11:28:32AM +0100, Tvrtko Ursulin wrote:
> >
> > On 19/04/2017 10:41, Chris Wilson wrote:
> > Sounds attractive! What workloads show the benefit and how much?
>
> The default will show the best, since everything is priority 0 more or
> less and so we reduce the rbtree search to a single lookup and list_add.
> It's hard to measure the impact of the rbtree though. On the dequeue
> side, the mmio access dominates. On the schedule side, if we have lots
> of requests, the dfs dominates.
>
> I have an idea of how we might stress the rbtree in submit_request - but
> it still requires long queues atypical of most workloads. Still tbd.
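To make the shape of the split concrete, it is roughly an rbtree keyed by
priority where each node carries a plain list of the requests at that level,
so every same-priority submission becomes a single lookup plus
list_add_tail() instead of a per-request rbtree insert. A minimal sketch,
not the patch itself (execlist_priolist, sketch_request and sketch_submit
are illustrative names only):

#include <linux/rbtree.h>
#include <linux/list.h>
#include <linux/slab.h>

struct execlist_priolist {		/* one node per priority level */
	struct rb_node node;
	int priority;
	struct list_head requests;	/* FIFO of requests at this priority */
};

struct sketch_request {		/* stand-in for the request struct */
	struct list_head priolink;
	int priority;
};

/* Find (or create) the priority level, then append: no per-request rb node. */
static void sketch_submit(struct rb_root *root, struct sketch_request *rq)
{
	struct rb_node **p = &root->rb_node, *parent = NULL;
	struct execlist_priolist *pl;

	while (*p) {
		parent = *p;
		pl = rb_entry(parent, struct execlist_priolist, node);
		if (rq->priority == pl->priority) {
			/* common case: priority level already exists */
			list_add_tail(&rq->priolink, &pl->requests);
			return;
		}
		if (rq->priority > pl->priority)
			p = &parent->rb_left;	/* keep highest priority leftmost */
		else
			p = &parent->rb_right;
	}

	pl = kmalloc(sizeof(*pl), GFP_ATOMIC);	/* submit may run in hardirq */
	if (!pl)
		return;	/* error handling elided in this sketch */
	pl->priority = rq->priority;
	INIT_LIST_HEAD(&pl->requests);
	list_add_tail(&rq->priolink, &pl->requests);
	rb_link_node(&pl->node, parent, p);
	rb_insert_color(&pl->node, root);
}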
I have something that does show a difference in that path (which is
potentially in hardirq). Overall time is completely dominated by the
reservation_object (ofc, we'll get back around to its scalability
patches at some point). For a few thousand prio=0 requests inflight, the
difference in execlists_submit_request() is about 6x, and in
intel_lrc_irq_handler() it is about 2x (mostly because I sent a lot of
coalesceable requests, so the win there is the reduction of rb_next to
list_next). Completely synthetic testing; I would be worried if the
rbtree were that tall in practice (request generation >> execution). The
neat part of the split, I think, is that it makes the resubmission of a
gazumped request easier - instead of writing a parallel rbtree sort, we
just put the old request at the head of the plist.
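A sketch of those two paths, reusing the illustrative execlist_priolist
structure above (again not the driver code, and the function names are
made up): dequeue only touches the rbtree when a whole priority level is
exhausted, and a gazumped request just goes back on the head of its
level's list.

/* Peek the next request: first level in the tree, first request in its list. */
static struct sketch_request *sketch_first_request(struct rb_root *root)
{
	struct rb_node *rb = rb_first(root);	/* highest priority is leftmost */
	struct execlist_priolist *pl;

	if (!rb)
		return NULL;

	pl = rb_entry(rb, struct execlist_priolist, node);
	return list_first_entry_or_null(&pl->requests,
					struct sketch_request, priolink);
}

/* Resubmit a gazumped request: head, not tail, of its priority level. */
static void sketch_resubmit_gazumped(struct execlist_priolist *pl,
				     struct sketch_request *rq)
{
	/* it had already been submitted ahead of its same-priority peers */
	list_add(&rq->priolink, &pl->requests);
}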
-Chris
--
Chris Wilson, Intel Open Source Technology Centre