[Intel-gfx] [PATCH 2/2] drm/i915/guc: default to using GuC submission where possible

Chris Wilson chris at chris-wilson.co.uk
Mon Apr 25 08:29:42 UTC 2016


On Mon, Apr 25, 2016 at 08:31:07AM +0100, Dave Gordon wrote:
> On 22/04/16 19:51, Chris Wilson wrote:
> >On Fri, Apr 22, 2016 at 07:45:15PM +0100, Chris Wilson wrote:
> >>On Fri, Apr 22, 2016 at 07:22:55PM +0100, Dave Gordon wrote:
> >>>This patch simply changes the default value of "enable_guc_submission"
> >>>from 0 (never) to -1 (auto). This means that GuC submission will be
> >>>used if the platform has a GuC, the GuC supports the request submission
> >>>protocol, and any required GuC firmware was successfully loaded. If any
> >>>of these conditions are not met, the driver will fall back to using
> >>>execlist mode.
> >
> >I just remembered something else.
> >
> >  * Work Items:
> >  * There are several types of work items that the host may place into a
> >  * workqueue, each with its own requirements and limitations. Currently only
> >  * WQ_TYPE_INORDER is needed to support legacy submission via GuC, which
> >  * represents an in-order queue. The kernel driver packs ring tail pointer and an
> >  * ELSP context descriptor dword into Work Item.
> >
> >Is this right? You only allocate a single client covering all engines and
> >specify INORDER. We expect parallel execution between engines, is this
> >supported? Empirically it seems like guc is only executing commands in
> >series across engines and not in parallel.
> >-Chris
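
For context, the packing described in that comment amounts to something like
the sketch below. The field names, layout and the WQ_TYPE value here are
illustrative assumptions, not the actual guc_fwif.h definitions:

```c
/*
 * Illustrative sketch only: the real layout lives in the driver's
 * guc_wq_item definition; these field names and the type value are
 * assumptions made for the sake of the example.
 */
#include <assert.h>
#include <stdint.h>

#define WQ_TYPE_INORDER_SKETCH 0x02u /* assumed encoding */

struct wq_item_sketch {
	uint32_t header;       /* work item type + length */
	uint32_t context_desc; /* ELSP context descriptor dword */
	uint32_t ring_tail;    /* ring tail pointer */
};

static struct wq_item_sketch pack_wq_item(uint32_t ctx_desc, uint32_t tail)
{
	struct wq_item_sketch wi = {
		.header = WQ_TYPE_INORDER_SKETCH,
		.context_desc = ctx_desc,
		.ring_tail = tail,
	};

	return wi;
}
```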
> 
> AFAIK, INORDER represents in-order execution of elements in the
> GuC's (internal) submission queue, which is per-engine; i.e. this
> option bypasses the GuC's internal scheduling algorithms and makes
> the GuC behave as a simple dispatcher. It demultiplexes work queue
> items into the multiple submission queues, then executes them in
> order from there.
> 
> Alex can probably confirm this in the GuC code, but I really think
> we'd have noticed if execution were serialised across engines. For a
> start, the validation tests that have one engine busy-spin while
> waiting for a batch on a different engine to update a buffer
> wouldn't ever finish.

That doesn't seem to be the issue; the engines can run in parallel
(a busy-spin on one engine doesn't prevent a write on the second). The
problem appears to be latency: overall execution latency goes up
substantially with GuC, and in this case the second execbuf on the second
ring does not seem to execute until after the first completes.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

