[Intel-gfx] [PATCH v3 11/14] HACK drm/i915/scheduler: emulate a scheduler for guc

Thu Dec 1 13:01:06 UTC 2016

On Thu, Dec 01, 2016 at 12:45:18PM +0000, Tvrtko Ursulin wrote:
> 
> On 01/12/2016 11:18, Chris Wilson wrote:
> >On Thu, Dec 01, 2016 at 10:45:51AM +0000, Tvrtko Ursulin wrote:
> >>
> >>On 14/11/2016 08:57, Chris Wilson wrote:
> >>>+static bool i915_guc_dequeue(struct intel_engine_cs *engine)
> >>>+{
> >>>+	struct execlist_port *port = engine->execlist_port;
> >>>+	struct drm_i915_gem_request *last = port[0].request;
> >>>+	unsigned long flags;
> >>>+	struct rb_node *rb;
> >>>+	bool submit = false;
> >>>+
> >>>+	spin_lock_irqsave(&engine->timeline->lock, flags);
> >>>+	rb = engine->execlist_first;
> >>>+	while (rb) {
> >>>+		struct drm_i915_gem_request *cursor =
> >>>+			rb_entry(rb, typeof(*cursor), priotree.node);
> >>>+
> >>>+		if (last && cursor->ctx != last->ctx) {
> >>
> >>Not sure if GVT comes into the picture here, but it does not sound
> >>like it would harm to use can_merge_ctx here?
> >
> >I wasn't sure what path GVT would take either. So just went with the
> >simple version that looked as similar to the current guc submission as
> >possible. Also offloading the scheduling to the guc via semaphores will
> >likely make this whole chain look completely different.
> 
> Hmm I am not up to speed with that. So you are saying it doesn't
> make sense to unify this?

Just not sure yet. Too much duplication and too much engineering are both
traps we may set for ourselves.
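The question above comes down to which merge predicate the dequeue loop uses. A minimal sketch, assuming a toy userspace model rather than the i915 structures: `struct model_ctx`, `same_ctx()` and `can_merge_ctx_model()` are hypothetical names. The second predicate is only loosely modelled on the execlists `can_merge_ctx()` helper, where a context flagged for single-port submission (as GVT requests) never coalesces, even with itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: a context is reduced to its identity (its address)
 * plus a flag standing in for GVT forcing single-port submission.
 */
struct model_ctx {
	bool force_single_submission;
};

/* The check used in the patch: coalesce only while the context is unchanged. */
static bool same_ctx(const struct model_ctx *prev, const struct model_ctx *next)
{
	return prev == next;
}

/*
 * can_merge_ctx()-style check: same context AND not restricted to a
 * single port, so a GVT-flagged context never shares an ELSP slot.
 */
static bool can_merge_ctx_model(const struct model_ctx *prev,
				const struct model_ctx *next)
{
	assert(prev != NULL && next != NULL);
	if (prev != next)
		return false;
	if (prev->force_single_submission)
		return false;
	return true;
}
```

The two predicates only disagree for a flagged context: `same_ctx()` would let consecutive GVT requests coalesce, while the `can_merge_ctx()`-style check would not.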

> >>>+			if (port != engine->execlist_port)
> >>>+				break;
> >>
> >>It may be overkill for the first version, but I was thinking that
> >>we don't have to limit it to two ports at a time. It would depend on
> >>measurements, of course, but perhaps it would make sense to generalise
> >>the number of supported ports straight away.
> >
> >Definitely. I was just looking at a minimal conversion, hence reusing
> >the existing tracking, and limits.
> 
> Definitely leave it for later, or it definitely makes sense to
> generalise right now? I was just thinking that when someone goes to
> test this and finds that throughput regresses, it might be easier to
> just say "please try i915.guc_submit_ports=8" or something.

It was "definitely not worth it in this patch, and definitely makes sense
to investigate". The returns diminish very rapidly: it comes down to how
many requests will complete within the service time of the first irq.
Any gain will show up in the no-op switching workloads that stress the
driver, rather than in the real workloads that stress the system. The
cheapest typical ping-pong is GL client -> display server -> GL client
(OpenCL may beat that), and for that GL sequence 3 ports would easily
cover us [really 1 active and 2 pending slots].
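The port-count argument can be checked with a toy model. A minimal sketch, assuming requests are reduced to context ids already in priority order and that all ports start idle; `model_dequeue()` and its parameters are hypothetical. It only mimics the coalescing rule of `i915_guc_dequeue()` (same context merges, a new context takes the next port) with the number of ports made a parameter, in the spirit of the hypothetical `i915.guc_submit_ports` knob floated above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the dequeue loop with nports ports.
 * ctx_ids[] stands in for the priority-sorted rbtree walk. Consecutive
 * requests from the same context share a port; a different context
 * needs the next port, and we stop once every port is in use.
 *
 * Returns how many requests were submitted; *ports_used receives the
 * number of ports filled.
 */
static size_t model_dequeue(const int *ctx_ids, size_t nreq, size_t nports,
			    size_t *ports_used)
{
	size_t port = 0, i;

	assert(nports > 0);
	for (i = 0; i < nreq; i++) {
		if (i > 0 && ctx_ids[i] != ctx_ids[i - 1]) {
			if (port + 1 == nports)
				break; /* all ports busy: the rest waits for the next irq */
			port++;
		}
	}
	*ports_used = nreq ? port + 1 : 0;
	return i;
}
```

For the GL client -> display server -> GL client sequence (context ids 1, 2, 1), two ports submit only the first two requests, while three or more submit all three; for this pattern anything beyond three ports buys nothing.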
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre