[Intel-gfx] [PATCH] drm/i915: Make wa_tail_dwords flexible for future platforms.

Dave Gordon david.s.gordon at intel.com
Wed Jan 27 04:27:16 PST 2016


On 26/01/16 14:06, Chris Wilson wrote:
> On Tue, Jan 26, 2016 at 01:51:19PM +0000, Rodrigo Vivi wrote:
>>     On Tue, Jan 26, 2016 at 12:30 AM Chris Wilson
>>     <chris at chris-wilson.co.uk> wrote:
>>
>>       On Mon, Jan 25, 2016 at 09:17:15PM +0000, Chris Wilson wrote:
>>       > On Mon, Jan 25, 2016 at 11:29:19AM -0800, Rodrigo Vivi wrote:
>>       > > +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>       > > @@ -764,18 +764,18 @@ intel_logical_ring_advance_and_submit(struct
>>       drm_i915_gem_request *request)
>>       > >  {
>>       > >     struct intel_ringbuffer *ringbuf = request->ringbuf;
>>       > >     struct drm_i915_private *dev_priv = request->i915;
>>       > > +   int i;
>>       > >
>>       > >     intel_logical_ring_advance(ringbuf);
>>       > >     request->tail = ringbuf->tail;
>>       > >
>>       > >     /*
>>       > > -    * Here we add two extra NOOPs as padding to avoid
>>       > > +    * Here we add extra NOOPs as padding to avoid
>>       > >      * lite restore of a context with HEAD==TAIL.
>>       > > -    *
>>       > > -    * Caller must reserve WA_TAIL_DWORDS for us!
>>       > >      */
>>       > > -   intel_logical_ring_emit(ringbuf, MI_NOOP);
>>       > > -   intel_logical_ring_emit(ringbuf, MI_NOOP);
>>       > > +   for (i = 0; i < ringbuf->wa_tail_dwords; i++)
>>       > > +           intel_logical_ring_emit(ringbuf, MI_NOOP);
>>       > > +
>>       > >     intel_logical_ring_advance(ringbuf);
>>       > >
>>       > >     if (intel_ring_stopped(request->ring))
>>       > > @@ -876,6 +876,16 @@ int intel_logical_ring_begin(struct
>>       drm_i915_gem_request *req, int num_dwords)
>>       > >     if (ret)
>>       > >             return ret;
>>       > >
>>       > > +   if (IS_GEN8(req->ring->dev) || IS_GEN9(req->ring->dev))
>>       >
>>       > req->i915
>>       >
>>       > This is atrocious. Just allocate the extra space when required.
>>
>>     by this logic I should just emit the mi_noops when required as well,
>>     right?
>
> Yes, I didn't like the placement of the wa_tail but I went with that to
> avoid the code duplication.
>
>>       Slightly less grumpy this morning.
>>
>>     thanks
>>
>>       1. This is duplicating the reserved-space mechanism, by open-coding the
>>       requirements for execlists. Fine-tuning the reserved space per ring may
>>       be worth it, but probably not. Over reserving space is not a huge issue
>>       (it just effectively reduces the size of the ring), and the granularity
>>       is the size of the average request.
>>
>>     forgive this clueless mind here, but I don't see how I'm duplicating the
>>     reserved-space...
>
> You are extending every begin by the overallocation required to emit
> the tail dwords. We already extend every begin by the overallocation
> required to emit the request (until we come to emit the request, where
> there is no more overallocation applied).
>
>>       2. You are hiding how much space is actually used during request
>>       emission. This makes review impossible, and we depend upon review to
>>       verify that the intel_ring_begin() matches the number of dwords emitted.
>>
>>     but the mi_noops are hidden in the submit and advance... shouldn't we move
>>     them back to the places that allocate the space?
>
> Hence why I stressed that in the comments - but it is a tail call, just
> read it as one function. The important sequence is that
>
> intel_ring_begin(count)
> ...
> count x intel_ring_emit
> ...
> intel_ring_advance()
>
> is clear to the reader. Yes, this breaks that rule by replacing
> intel_ring_advance() with a custom lr_ring_advance_and_submit() and
> perhaps it would be clearer to add lr_ring_begin_for_submit() or
> something to stress the slight discrepancy, but still make the pairing
> clear.
>
>>       3. Is this even the right mechanism considering the number of other ways
>>       of automatically emitting instructions between batches and contexts? We
>>       cannot answer that as this patch is out of context.
>>
>>     yeap, sorry again, I was just going to the easiest path to be able to
>>     avoid the nulls per platform without adding 3 ifs..
>>     But I wonder if you mean on comment "1." that we can live with
>>     WA_TAIL_DWORDS 2 and avoid only the NULLs when needed... Is this the case?
>
> If you want more dwords in the add_request callback, we need to add
> those to the MIN_SPACE_FOR_ADD_REQUEST. If we need to add a lot, then
> making it variable seems fine - but it should just hook into the common
> mechanism i.e. the minimum space should be computed during engine
> initialisation and the reservation applied at i915_gem_request_alloc().
> -Chris

I think the cleanest partitioning of the functionality would be:
     1. The space for the NOOPs should be accounted for in the reserved
        space, because it's just part of the total space required to
        complete an add_request/emit_request(). Since the amount
        reserved is determined in intel_{logical_}ring_reserve_space()
        it could be added only in the LRC path, if we were concerned
        about the extra space (which I don't think we should be).

     2. Callers do begin(N), N*emit(), advance(), add_request(). They
        don't need to know about the extra NOOPs.

     3. gen8_emit_request() shouldn't have to bother with them either, or
        even with claiming the space for them.

     4. advance_and_submit() (which is execlist specific) can do an extra
        begin() just to keep begin/advance balanced -- it can't fail or
        wait, 'cos it's in the reserved space -- and emits the extra
        NOOPs. This is where it can be made conditional on specific GENs,
        if you want that to be explicit, though since the overhead is so
        small I'd be inclined to always enable it here, and only check
        whether to actually apply the TAIL-bump in the ELSP-poking code.

In summary: mostly as Chris had it, but without the extra space being 
added to the begin() call in gen8_emit_request() (as Rodrigo has it).
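
The partitioning above can be sketched with a toy ring. This is only an
illustration of the accounting, not driver code: every toy_* name, the
struct layout, and the ring size are hypothetical, and the real driver
would wait for space where this sketch simply returns an error.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MI_NOOP            0u
#define RING_DWORDS        64  /* toy ring size (hypothetical) */
#define ADD_REQUEST_DWORDS 6   /* dwords emitted by the toy emit_request() */
#define WA_TAIL_DWORDS     2   /* NOOP padding against HEAD==TAIL lite restore */

struct toy_ring {
	uint32_t buf[RING_DWORDS];
	int tail;      /* next dword to write */
	int space;     /* free dwords, including the reserved ones */
	int reserved;  /* dwords set aside for completing the request */
	int begun;     /* dwords claimed by begin() but not yet emitted */
};

static void toy_ring_init(struct toy_ring *r)
{
	memset(r, 0, sizeof(*r));
	r->space = RING_DWORDS;
}

/* Point 1: the NOOPs are part of the reservation, added only on the LRC path. */
static int toy_reserve_space(struct toy_ring *r, int is_lrc)
{
	int need = ADD_REQUEST_DWORDS + (is_lrc ? WA_TAIL_DWORDS : 0);

	if (need > r->space)
		return -1;
	r->reserved = need;
	return 0;
}

/* Point 2: ordinary callers claim N dwords and never touch the reservation. */
static int toy_begin(struct toy_ring *r, int n)
{
	if (r->begun || n > r->space - r->reserved)
		return -1; /* the real driver would wait for space here */
	r->begun = n;
	return 0;
}

/* begin() drawing on the reservation: cannot fail if the accounting is right. */
static void toy_begin_reserved(struct toy_ring *r, int n)
{
	assert(!r->begun && n <= r->reserved);
	r->reserved -= n;
	r->begun = n;
}

static void toy_emit(struct toy_ring *r, uint32_t dw)
{
	assert(r->begun > 0);
	r->buf[r->tail++] = dw;
	r->space--;
	r->begun--;
}

/* advance() only checks that begin(N) was matched by exactly N emits. */
static void toy_advance(struct toy_ring *r)
{
	assert(r->begun == 0);
}

/* Point 4: the extra begin() keeps begin/advance balanced around the NOOPs. */
static int toy_advance_and_submit(struct toy_ring *r)
{
	int request_tail, i;

	toy_advance(r);
	request_tail = r->tail; /* the request ends before the padding */

	toy_begin_reserved(r, WA_TAIL_DWORDS);
	for (i = 0; i < WA_TAIL_DWORDS; i++)
		toy_emit(r, MI_NOOP);
	toy_advance(r);

	return request_tail;
}

/* Point 3: emit_request() claims only its own dwords, nothing for the NOOPs. */
static int toy_emit_request(struct toy_ring *r)
{
	int i;

	toy_begin_reserved(r, ADD_REQUEST_DWORDS);
	for (i = 0; i < ADD_REQUEST_DWORDS; i++)
		toy_emit(r, 0xdead0000u + (uint32_t)i); /* placeholder payload */
	return toy_advance_and_submit(r);
}
```

A caller would do toy_ring_init(), toy_reserve_space(), then any number of
toy_begin(N) / N * toy_emit() / toy_advance() sequences, and finish with
toy_emit_request(). Every begin() count then matches the emits that follow
it, which is the property review depends on.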

.Dave.

