[Intel-gfx] [PATCH 00/50] Execlists v2

Thu May 15 23:09:52 CEST 2014

On Thu, May 15, 2014 at 02:15:44PM +0000, Mateo Lozano, Oscar wrote:
> > I've done a very cursory read of this, and my original comment from my
> > original high-level review on the internal list still stands: I'm freaked out by how
> > invasive this is into the existing ring code. All the changes in i915_dma.c look
> > very suspicious, since that code is for the legacy ums crap and will _never_ run
> > on bdw. Nor even on anything more modern than g4x platforms (gen4).
> > 
> > Apparently that review has been lost/ignored, so I'll quote it in full:
> 
> Back in March 2013 I wasn´t involved on Execlists (or, FWIW, i915) at
> all so, yes, I´m afraid it got lost. Sorry :(

Yeah that seems to have been lost int the too many handovers unfortunately
:(

> > "In reading through your patches the big thing which jumped out is how the
> > new execlist code is intermingled with the old legacy gen8 framebuffer stuff.
> > Imo those two modes don't match at all, and we need to resolve this mismatch
> > with another abstraction layer ;-)
> 
> IMHO, I have minimized quite a lot the number of conflicts (very few "if
> execlists enabled" scattered over the legacy code remeining). Rather
> than "being intermingled", I would say it now "smoothly rides" on the
> legacy stuff.  If you give it a look with fresh eyes, maybe you will
> like it?

Can do again ...

> > "I'm thinking of dev_priv->gt.do_execbuf which takes care of all the lower-level
> > request tracking and command submission. Feel free to massively bikeshed the
> > name. I'm thinking that we should move everything from
> > i915_gem_execbuffer_move_to_gpu to
> > i915_gem_execbuffer_retire_commands into that callback. With the current
> > code that'd include the active list tracking, maybe we should move that part out
> > again. Otoh if we go wild with scheduling and preemption, active bo tracking
> > _will_ be rather different from previous platforms. To support execlist we
> > might need some more vfuncs in the ringbuffer struct to support execlist
> > specific stuff (submit execlist, enable context switch interrupts), but a lot of the
> > existing stuff will be redudant.
> > At the end (once things settles) we should then document which kind of
> > do_execbuf uses which kinds of low-level ring interfaces.
> 
> I´m afraid I would still need all the currently existing low-level ring interfaces, plus a bunch of new ones :(
> (do_execbuf is not the only user of low-level ring interfaces, see below).

Hm I guess if you see value in it, we could keep the intel_ring_begin
interfaces. But I wouldn't mind just a bit of duplicating really - most of
the ring vfuncs seem unecessary in an execlist world.

> > "With that abstraction:
> > - We can separate gen8 execlist from legacy gen8 code, and so should avoid
> > regressions (and so blocking mesa).
> > - Play around with different execlist approaches (guc, deferred execution,
> > whatever, ...) since it'll just be differen copy&pasta.
> 
> Ok, yes, I understand what you say. Playing around with other submission
> approaches will become harder and harder if we reuse too much legacy
> code. But can´t we do it as a cleanup on top of this code?

In my experience as soon as something is marked as feature complete
everyone moves on and the promised cleanups rarely materialize. I know
it's suboptimal for engineers, but I've run a bit out of hands to burn on
this particular stove. So generally rather vary of large scale and
invasive features.

The other point is that code tends to break least if you don't touch it.

> > "Finally I think our immediate focus for execlist enabling should be to get multi-
> > context execlists going, so that we can validate whether that'll work together
> > with mesa/hw contexts. If it doesn't, not much point in bothering.
> > The simplest way is to just block in the ->do_execbuf callback if we can't submit
> > the new context right away. It'll suck a bit perf-wise, but will get the hw going.
> 
> I think this is proved by now (QED!).

Yeah like I've said, that mail is old and actually preceeding silicon ;-)

> > So essentially what I'd prefer is we keep all the existing ringbuffer code as-is,
> > and throw in a complete new set (with fairly new datastructures) in for
> > execlists. Then only interaction points would be:
> > - Ring init either calls into legacy ring init or new fancy execlist ring
> >   init.
> > - Execbuf calls ring->do_submit with ring/engine, ctx object, batch state
> >   and otherwise doesn't care one bit how it will all get submitted.
> 
> That´s easy for execbufer, but what about the code that put things
> directly into the ringbuffer? I refer to constructions like:
> 
> intel_ring_begin()
> intel_ring_emit()
> ...
> intel_ring_emit()
> intel_ring_advance()
> 
> And also others like direct calls to i915_add_request outside the execbuffer path.

Ok, quick survey of intel_ring_begin:
- pageflip in intel_display.c. Imo doesn't hurt to have a new
  gen8_execlist version.
- Request emission in the do_execbuf ioctls, mostly by calling into
  functions in intel_ringbuffer.c
- ppgtt loading in i915_gem_gtt.c: I've thought we need a new fancy dance
  for these with execlists?
- L3-DPF loading: Atm not enabled on gen8, so not a problem.
- hw context switching in i915_gem_context.c: Supperseeded by execlists
  context I guess.

I have to admit that I've forgotten about the pageflips, otoh the plan
seems to be to switch to mmio flips anyway. Everything else is pre-gen8
legacy stuff, so better not to touch it.

> > - Context state needs to be frobbed a bit so that we create the correct
> >   backing object (i.e. legacy hw state or execlist ring+ctx). To make this
> >   feasible it's probably best to switch the implicit per-fd ctx to be
> >   per-ring. That way we still have the fixed hw-contxt->ring/engine
> >   relationship and don't need to play tricks with lazy context allocation
> >   (because those beats are so big with execlists).
> 
> Sorry, I don´t get this: no more implicit per-fd ctx? so everybody uses
> the Aliasing PPGTT by default again?

Actually more implicit per-fd ctx: One per ring object. I think with the
big execlist contexts that's the sanest option we seem to have really. All
these per-ring per-fd implicit ctx would share a per-fd ppgtt - we should
have all the lifetime fun set up correctly already so that n:1 ctx:ppgtt
relationships should work out.

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch