[Intel-gfx] [PATCH v5] drm/i915/icl: Enhanced execution list support

Mon Jan 22 21:38:01 UTC 2018

On 22/01/18 09:32, Chris Wilson wrote:
> Quoting Daniele Ceraolo Spurio (2018-01-22 16:09:28)
>>
>>
>> On 22/01/18 07:13, Chris Wilson wrote:
>>> Quoting Mika Kuoppala (2018-01-22 15:08:16)
>>>> Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com> writes:
>>>>
>>>>> On 19/01/18 05:05, Mika Kuoppala wrote:
>>>>>> Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com> writes:
>>>>>>
>>>>>>> From: Thomas Daniel <thomas.daniel at intel.com>
>>>>>>>
>>>>>>> Enhanced Execlists is an upgraded version of execlists which supports
>>>>>>> up to 8 ports. The lrcs to be submitted are written to a submit queue,
>>>>>>> which is then loaded on the HW. When writing to the ELSP register, the
>>>>>>> lrcs are written cyclically in the queue from position 0 to position 7.
>>>>>>> Alternatively, it is possible to write directly in the individual
>>>>>>> positions of the queue using the ELSQ registers. To be able to re-use
>>>>>>> all the existing code we're using the latter method and we're currently
>>>>>>> limiting ourself to only using 2 elements.
>>>>>>>
>>>>>>> The preemption flow is sligthly different with enhanced execlists, so
>>>>>>> this patch turns preemption off temporarily for Gen11+ while we wait for
>>>>>>> the new mechanism to land.
>>>>>>>
>>>>>>> v2: Rebase.
>>>>>>> v3: Switch from !IS_GEN11 to GEN < 11 (Daniele Ceraolo Spurio).
>>>>>>> v4: Use the elsq registers instead of elsp. (Daniele Ceraolo Spurio)
>>>>>>> v5: Reword commit, rename regs to be closer to specs, turn off
>>>>>>>        preemption (Daniele), reuse engine->execlists.elsp (Chris)
>>>>>>>
>>>>>>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>>>>>>> Signed-off-by: Thomas Daniel <thomas.daniel at intel.com>
>>>>>>> Signed-off-by: Rodrigo Vivi <rodrigo.vivi at intel.com>
>>>>>>> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
>>>>>>
>>>>>> Was going to adopt this patch from Rodrigo but you were faster.
>>>>>>
>>>>>> I choose to stash the elsq and use it as a gen11 vs rest toggle:
>>>>>>
>>>>>> Relevant bits:
>>>>>>
>>>>>> +static inline void write_port(struct intel_engine_execlists * const execlists,
>>>>>> +                             unsigned int n,
>>>>>> +                             u64 desc)
>>>>>> +{
>>>>>> +       if (execlists->elsq)
>>>>>> +               gen11_elsq_write(desc, n, execlists->elsq);
>>>>>> +       else
>>>>>> +               gen8_elsp_write(desc, execlists->elsp);
>>>>>> +}
>>>>>> +
>>>>>> +static inline void submit_ports(struct intel_engine_execlists * const execlists)
>>>>>> +{
>>>>>> +       /* for gen11+ we need to manually load the submit queue */
>>>>>> +       if (execlists->elsq) {
>>>>>> +               struct intel_engine_cs *engine =
>>>>>> +                       container_of(execlists,
>>>>>> +                                    struct intel_engine_cs,
>>>>>> +                                    execlists);
>>>>>> +               struct drm_i915_private *dev_priv = engine->i915;
>>>>>> +
>>>>>> +               I915_WRITE_FW(RING_ELCR(engine), ELCR_LOAD);
>>>>>> +       }
>>>>>> +}
>>>>>> +
>>>>>>
>>>>>
>>>>> I was undecided about hiding the code in sub-functions because of the
>>>>> pre-emption path. There is no need in gen11 to inject a context to
>>>>> preempt to idle,
>>>
>>> Really? The preempt-to-idle is so that we can sync the bookkeeping with
>>> the pending CS interrupts. The HW doesn't require it currently either,
>>> it's the SW that does. If you have a way to avoid that, that should be
>>> applicable to the current code as well?
>>> -Chris
>>>
>>
>> We can't avoid preempt-to-idle, we can do it in a simpler way. There is
>> a bit in RING_EXECLIST_CONTROL that triggers a preemp-to-idle, without
>> the need to do a ctx injection. We'll need to move preemption completion
>> detection to the CSB value instead of the HWSP write, not sure about the
>> impact of that on our bookkeeping.
> 
> Ah, shucks :( I was hoping there's a simple way to avoid idling.
> 
> I have to ask, is it worth it? If we still have to do a CS interrupt
> round trip, what's the difference? Hmm. I wonder if assume that the
> preemption is nearly instantaneous (say 10us), we could short-circuit
> the interrupt and poll. Falling back to interrupt if longer, and/or
> resetting the GPU to guarantee latencies.
> -Chris
> 

The SW cost shouldn't be too bad, so having 1 less ctx switch should 
give us some some benefits. Trying a poll with a fall-back sounds nice, 
but we'll probably have to wait until we have timing info to see if the 
polling makes sense at all.

Daniele