[Intel-gfx] [PATCH 13/15] drm/i915: Integrate GuC-based command submission

Dave Gordon david.s.gordon at intel.com
Fri Jun 19 11:18:22 PDT 2015


On 16/06/15 10:22, Chris Wilson wrote:
> On Mon, Jun 15, 2015 at 07:36:31PM +0100, Dave Gordon wrote:
>> From: Alex Dai <yu.dai at intel.com>
>>
>> GuC-based submission is mostly the same as execlist mode, up to
>> intel_logical_ring_advance_and_submit(), where the context being
>> dispatched would be added to the execlist queue; at this point
>> we submit the context to the GuC backend instead.
>>
>> There are, however, a few other changes also required, notably:
>> 1.  Contexts must be pinned at GGTT addresses accessible by the GuC
>>     i.e. NOT in the range [0..WOPCM_SIZE), so we have to add the
>>     PIN_OFFSET_BIAS flag to the relevant GGTT-pinning calls.
>>
>> 2.  The GuC's TLB must be invalidated after a context is pinned at
>>     a new GGTT address.
>>
>> 3.  GuC firmware uses the one page before Ring Context as shared data.
>>     Therefore, whenever driver wants to get base address of LRC, we
>>     will offset one page for it. LRC_PPHWSP_PN is defined as the page
>>     number of LRCA.
>>
>> 4.  In the work queue used to pass requests to the GuC, the GuC
>>     firmware requires the ring-tail-offset to be represented as an
>>     11-bit value, expressed in QWords. Therefore, the ringbuffer
>>     size must be reduced to the representable range (4 pages).
> 
> I don't like how this sabotages the existing execlists implementation
> in order for i915_guc_submission (an interesting choice of file name,
> since we go i915_gem_execbuffer (API) -> intel_execlists (HW) ->
> i915_guc_submission (HW), not fitting into our, admittedly loose, naming
> convention very well) to share a few functions. Even a couple of which
> are already vfunc.
> -Chris

Not really "sabotages"; big ringbuffers are a waste of space anyway.
Four pages is enough to have at least 64 batchbuffers queued up for the
engine to process (1 second of video, or 0.00012 of a bitcoin). When the
scheduler lands, it will generally reduce the number of batches in the
h/w rings anyway, mostly to improve responsiveness and fair-sharing
among different applications.

I quite agree that the execlists implementation, which is mostly in a
file called intel_lrc.c, should really be split into LRC-related code
(common to execlists and GuC modes) with the rest moved into
intel_execlist_submission.c. Or is that i915_execlist_submission.c?

If we took contexts as primary, rather than "rings" (engines) we could
also share a lot more between GuC/execlists and legacy ringbuffer mode.
At least we should get some improvement with AntiOLR, so we'll have

    request->context<->ringbuffer->engine->submission mechanism :)

.Dave.



More information about the Intel-gfx mailing list