[Intel-gfx] [PATCH v5] drm/i915/icl: Enhanced execution list support

Mon Jan 22 17:32:57 UTC 2018

Quoting Daniele Ceraolo Spurio (2018-01-22 16:09:28)
> 
> 
> On 22/01/18 07:13, Chris Wilson wrote:
> > Quoting Mika Kuoppala (2018-01-22 15:08:16)
> >> Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com> writes:
> >>
> >>> On 19/01/18 05:05, Mika Kuoppala wrote:
> >>>> Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com> writes:
> >>>>
> >>>>> From: Thomas Daniel <thomas.daniel at intel.com>
> >>>>>
> >>>>> Enhanced Execlists is an upgraded version of execlists which supports
> >>>>> up to 8 ports. The lrcs to be submitted are written to a submit queue,
> >>>>> which is then loaded on the HW. When writing to the ELSP register, the
> >>>>> lrcs are written cyclically in the queue from position 0 to position 7.
> >>>>> Alternatively, it is possible to write directly in the individual
> >>>>> positions of the queue using the ELSQ registers. To be able to re-use
> >>>>> all the existing code we're using the latter method and we're currently
> >>>>> limiting ourself to only using 2 elements.
> >>>>>
> >>>>> The preemption flow is sligthly different with enhanced execlists, so
> >>>>> this patch turns preemption off temporarily for Gen11+ while we wait for
> >>>>> the new mechanism to land.
> >>>>>
> >>>>> v2: Rebase.
> >>>>> v3: Switch from !IS_GEN11 to GEN < 11 (Daniele Ceraolo Spurio).
> >>>>> v4: Use the elsq registers instead of elsp. (Daniele Ceraolo Spurio)
> >>>>> v5: Reword commit, rename regs to be closer to specs, turn off
> >>>>>       preemption (Daniele), reuse engine->execlists.elsp (Chris)
> >>>>>
> >>>>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
> >>>>> Signed-off-by: Thomas Daniel <thomas.daniel at intel.com>
> >>>>> Signed-off-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
> >>>>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
> >>>>
> >>>> Was going to adopt this patch from Rodrigo but you were faster.
> >>>>
> >>>> I choose to stash the elsq and use it as a gen11 vs rest toggle:
> >>>>
> >>>> Relevant bits:
> >>>>
> >>>> +static inline void write_port(struct intel_engine_execlists * const execlists,
> >>>> +                             unsigned int n,
> >>>> +                             u64 desc)
> >>>> +{
> >>>> +       if (execlists->elsq)
> >>>> +               gen11_elsq_write(desc, n, execlists->elsq);
> >>>> +       else
> >>>> +               gen8_elsp_write(desc, execlists->elsp);
> >>>> +}
> >>>> +
> >>>> +static inline void submit_ports(struct intel_engine_execlists * const execlists)
> >>>> +{
> >>>> +       /* for gen11+ we need to manually load the submit queue */
> >>>> +       if (execlists->elsq) {
> >>>> +               struct intel_engine_cs *engine =
> >>>> +                       container_of(execlists,
> >>>> +                                    struct intel_engine_cs,
> >>>> +                                    execlists);
> >>>> +               struct drm_i915_private *dev_priv = engine->i915;
> >>>> +
> >>>> +               I915_WRITE_FW(RING_ELCR(engine), ELCR_LOAD);
> >>>> +       }
> >>>> +}
> >>>> +
> >>>>
> >>>
> >>> I was undecided about hiding the code in sub-functions because of the
> >>> pre-emption path. There is no need in gen11 to inject a context to
> >>> preempt to idle,
> > 
> > Really? The preempt-to-idle is so that we can sync the bookkeeping with
> > the pending CS interrupts. The HW doesn't require it currently either,
> > it's the SW that does. If you have a way to avoid that, that should be
> > applicable to the current code as well?
> > -Chris
> > 
> 
> We can't avoid preempt-to-idle, we can do it in a simpler way. There is 
> a bit in RING_EXECLIST_CONTROL that triggers a preemp-to-idle, without 
> the need to do a ctx injection. We'll need to move preemption completion 
> detection to the CSB value instead of the HWSP write, not sure about the 
> impact of that on our bookkeeping.

Ah, shucks :( I was hoping there's a simple way to avoid idling.

I have to ask, is it worth it? If we still have to do a CS interrupt
round trip, what's the difference? Hmm. I wonder if assume that the
preemption is nearly instantaneous (say 10us), we could short-circuit
the interrupt and poll. Falling back to interrupt if longer, and/or
resetting the GPU to guarantee latencies.
-Chris