[Intel-gfx] [PATCH 00/50] Execlists v2

Thu May 15 16:15:44 CEST 2014

> I've done a very cursory read of this, and my original comment from my
> original high-level review on the internal list still stands: I'm freaked out by how
> invasive this is into the existing ring code. All the changes in i915_dma.c look
> very suspicious, since that code is for the legacy ums crap and will _never_ run
> on bdw. Nor even on anything more modern than g4x platforms (gen4).
> 
> Apparently that review has been lost/ignored, so I'll quote it in full:

Back in March 2013 I wasn´t involved on Execlists (or, FWIW, i915) at all so, yes, I´m afraid it got lost. Sorry :(

> "In reading through your patches the big thing which jumped out is how the
> new execlist code is intermingled with the old legacy gen8 framebuffer stuff.
> Imo those two modes don't match at all, and we need to resolve this mismatch
> with another abstraction layer ;-)

IMHO, I have minimized quite a lot the number of conflicts (very few "if execlists enabled" scattered over the legacy code remeining). Rather than "being intermingled", I would say it now "smoothly rides" on the legacy stuff.
If you give it a look with fresh eyes, maybe you will like it?

> "I'm thinking of dev_priv->gt.do_execbuf which takes care of all the lower-level
> request tracking and command submission. Feel free to massively bikeshed the
> name. I'm thinking that we should move everything from
> i915_gem_execbuffer_move_to_gpu to
> i915_gem_execbuffer_retire_commands into that callback. With the current
> code that'd include the active list tracking, maybe we should move that part out
> again. Otoh if we go wild with scheduling and preemption, active bo tracking
> _will_ be rather different from previous platforms. To support execlist we
> might need some more vfuncs in the ringbuffer struct to support execlist
> specific stuff (submit execlist, enable context switch interrupts), but a lot of the
> existing stuff will be redudant.
> At the end (once things settles) we should then document which kind of
> do_execbuf uses which kinds of low-level ring interfaces.

I´m afraid I would still need all the currently existing low-level ring interfaces, plus a bunch of new ones :(
(do_execbuf is not the only user of low-level ring interfaces, see below).

> "With that abstraction:
> - We can separate gen8 execlist from legacy gen8 code, and so should avoid
> regressions (and so blocking mesa).
> - Play around with different execlist approaches (guc, deferred execution,
> whatever, ...) since it'll just be differen copy&pasta.

Ok, yes, I understand what you say. Playing around with other submission approaches will become harder and harder if we reuse too much legacy code. But can´t we do it as a cleanup on top of this code?

> "Finally I think our immediate focus for execlist enabling should be to get multi-
> context execlists going, so that we can validate whether that'll work together
> with mesa/hw contexts. If it doesn't, not much point in bothering.
> The simplest way is to just block in the ->do_execbuf callback if we can't submit
> the new context right away. It'll suck a bit perf-wise, but will get the hw going.

I think this is proved by now (QED!).

> So essentially what I'd prefer is we keep all the existing ringbuffer code as-is,
> and throw in a complete new set (with fairly new datastructures) in for
> execlists. Then only interaction points would be:
> - Ring init either calls into legacy ring init or new fancy execlist ring
>   init.
> - Execbuf calls ring->do_submit with ring/engine, ctx object, batch state
>   and otherwise doesn't care one bit how it will all get submitted.

That´s easy for execbufer, but what about the code that put things directly into the ringbuffer? I refer to constructions like:

intel_ring_begin()
intel_ring_emit()
...
intel_ring_emit()
intel_ring_advance()

And also others like direct calls to i915_add_request outside the execbuffer path.

> - Context state needs to be frobbed a bit so that we create the correct
>   backing object (i.e. legacy hw state or execlist ring+ctx). To make this
>   feasible it's probably best to switch the implicit per-fd ctx to be
>   per-ring. That way we still have the fixed hw-contxt->ring/engine
>   relationship and don't need to play tricks with lazy context allocation
>   (because those beats are so big with execlists).

Sorry, I don´t get this: no more implicit per-fd ctx? so everybody uses the Aliasing PPGTT by default again?