[Intel-gfx] [PATCH 15/27] drm/i915: Split execlist priority queue into rbtree + linked list
Chris Wilson
chris at chris-wilson.co.uk
Mon Apr 24 12:18:04 UTC 2017
On Mon, Apr 24, 2017 at 12:07:47PM +0100, Chris Wilson wrote:
> On Mon, Apr 24, 2017 at 11:28:32AM +0100, Tvrtko Ursulin wrote:
> >
> > On 19/04/2017 10:41, Chris Wilson wrote:
> > Sounds attractive! What workloads show the benefit and how much?
>
> The default will show the best, since everything is priority 0 more or
> less and so we reduce the rbtree search to a single lookup and list_add.
> It's hard to measure the impact of the rbtree though. On the dequeue
> side, the mmio access dominates. On the schedule side, if we have lots
> of requests, the dfs dominates.
>
> I have an idea of how we might stress the rbtree in submit_request - but
> it still requires long queues atypical of most workloads. Still tbd.
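To make the shape of the split concrete, it is roughly an rbtree keyed by
priority where each node carries a plain list of the requests at that level,
so every same-priority submission becomes a single lookup plus
list_add_tail() instead of a per-request rbtree insert. A minimal sketch,
not the patch itself (execlist_priolist, sketch_request and sketch_submit
are illustrative names only):

#include <linux/rbtree.h>
#include <linux/list.h>
#include <linux/slab.h>

struct execlist_priolist {		/* one node per priority level */
	struct rb_node node;
	int priority;
	struct list_head requests;	/* FIFO of requests at this priority */
};

struct sketch_request {		/* stand-in for the request struct */
	struct list_head priolink;
	int priority;
};

/* Find (or create) the priority level, then append: no per-request rb node. */
static void sketch_submit(struct rb_root *root, struct sketch_request *rq)
{
	struct rb_node **p = &root->rb_node, *parent = NULL;
	struct execlist_priolist *pl;

	while (*p) {
		parent = *p;
		pl = rb_entry(parent, struct execlist_priolist, node);
		if (rq->priority == pl->priority) {
			/* common case: priority level already exists */
			list_add_tail(&rq->priolink, &pl->requests);
			return;
		}
		if (rq->priority > pl->priority)
			p = &parent->rb_left;	/* keep highest priority leftmost */
		else
			p = &parent->rb_right;
	}

	pl = kmalloc(sizeof(*pl), GFP_ATOMIC);	/* submit may run in hardirq */
	if (!pl)
		return;	/* error handling elided in this sketch */
	pl->priority = rq->priority;
	INIT_LIST_HEAD(&pl->requests);
	list_add_tail(&rq->priolink, &pl->requests);
	rb_link_node(&pl->node, parent, p);
	rb_insert_color(&pl->node, root);
}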
I have something that does show a difference in that path (which is
potentially in hardirq). Overall time is completely dominated by the
reservation_object (ofc, we'll get back around to its scalability
patches at some point). For a few thousand prio=0 requests inflight, the
difference in execlists_submit_request() is about 6x, and in
intel_lrc_irq_handler() it is about 2x (mostly because I sent a lot of
coalesceable requests, so the win there is the reduction of rb_next to
list_next). Completely synthetic testing; I would be worried if the
rbtree were that tall in practice (request generation >> execution). The
neat part of the split, I think, is that it makes the resubmission of a
gazumped request easier - instead of writing a parallel rbtree sort, we
just put the old request at the head of the plist.
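A sketch of those two paths, reusing the illustrative execlist_priolist
structure above (again not the driver code, and the function names are
made up): dequeue only touches the rbtree when a whole priority level is
exhausted, and a gazumped request just goes back on the head of its
level's list.

/* Peek the next request: first level in the tree, first request in its list. */
static struct sketch_request *sketch_first_request(struct rb_root *root)
{
	struct rb_node *rb = rb_first(root);	/* highest priority is leftmost */
	struct execlist_priolist *pl;

	if (!rb)
		return NULL;

	pl = rb_entry(rb, struct execlist_priolist, node);
	return list_first_entry_or_null(&pl->requests,
					struct sketch_request, priolink);
}

/* Resubmit a gazumped request: head, not tail, of its priority level. */
static void sketch_resubmit_gazumped(struct execlist_priolist *pl,
				     struct sketch_request *rq)
{
	/* it had already been submitted ahead of its same-priority peers */
	list_add(&rq->priolink, &pl->requests);
}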
-Chris
--
Chris Wilson, Intel Open Source Technology Centre