[Intel-gfx] [PATCH 26/53] drm/i915/bdw: New logical ring submission mechanism

Mon Jun 23 21:10:10 CEST 2014

On Mon, Jun 23, 2014 at 07:35:38AM -0700, Mateo Lozano, Oscar wrote:
> > -----Original Message-----
> > From: Chris Wilson [mailto:chris at chris-wilson.co.uk]
> > Sent: Monday, June 23, 2014 2:42 PM
> > To: Mateo Lozano, Oscar
> > Cc: Volkin, Bradley D; intel-gfx at lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH 26/53] drm/i915/bdw: New logical ring
> > submission mechanism
> > 
> > On Mon, Jun 23, 2014 at 01:36:07PM +0000, Mateo Lozano, Oscar wrote:
> > > > -----Original Message-----
> > > > From: Chris Wilson [mailto:chris at chris-wilson.co.uk]
> > > > Sent: Monday, June 23, 2014 2:27 PM
> > > > To: Mateo Lozano, Oscar
> > > > Cc: Volkin, Bradley D; intel-gfx at lists.freedesktop.org
> > > > Subject: Re: [Intel-gfx] [PATCH 26/53] drm/i915/bdw: New logical
> > > > ring submission mechanism
> > > >
> > > > On Mon, Jun 23, 2014 at 01:18:35PM +0000, Mateo Lozano, Oscar
> > wrote:
> > > > > > -----Original Message-----
> > > > > > From: Chris Wilson [mailto:chris at chris-wilson.co.uk]
> > > > > > Sent: Monday, June 23, 2014 2:14 PM
> > > > > > To: Mateo Lozano, Oscar
> > > > > > Cc: Volkin, Bradley D; intel-gfx at lists.freedesktop.org
> > > > > > Subject: Re: [Intel-gfx] [PATCH 26/53] drm/i915/bdw: New logical
> > > > > > ring submission mechanism
> > > > > >
> > > > > > On Mon, Jun 23, 2014 at 01:09:37PM +0000, Mateo Lozano, Oscar
> > > > wrote:
> > > > > > > So far, yes, but that´s only because I artificially made
> > > > > > > intel_lrc.c self-
> > > > > > contained, as Daniel requested. What if we need to execute
> > > > > > commands from somewhere else, like in intel_gen7_queue_flip()?
> > > > > > >
> > > > > > > And this takes me to another discussion: this logical ring vs
> > > > > > > legacy ring split
> > > > > > is probably a good idea (time will tell), but we should provide
> > > > > > a way of sending commands for execution without knowing if
> > > > > > Execlists are enabled or not. In the early series that was easy
> > > > > > because we reused the ring_begin, ring_emit & ring_advance
> > > > > > functions, but this is not the case anymore. And without this,
> > > > > > sooner or later somebody will break legacy or execlists (this
> > > > > > already happened last week, when somebody here was implementing
> > > > > > native sync without knowing
> > > > about Execlists).
> > > > > > >
> > > > > > > So, the questions is: how do you feel about a dev_priv.gt
> > > > > > > vfunc that takes a
> > > > > > context, a ring, an array of DWORDS and a BB length and does the
> > > > > > intel_(logical)_ring_begin/emit/advance based on
> > i915.enable_execlists?

There are 3 cases of non-execbuffer submissions that I can think of: flips,
render state, and clear-buffer (proposed patches on the list). I wonder if the
right approach might be to use batchbuffers with a small wrapper around the
dispatch_execbuffer/emit_bb_start vfuncs. Basically the rule would be to only
touch a ringbuffer from within the intel_engine_cs vfuncs, which always know
which set of functions to use.

For flips, we could use MMIO flips. Render state already uses the existing
dispatch_execbuffer() and add_request(). The clear code could potentially do
the same. There would obviously be some overhead in using a batch buffer for
what could end up being just a few commands. Perhaps the batch buffer pool
code from the command parser would help though.

> > > > > >
> > > > > > I'm still baffled by the design. intel_ring_begin() and friends
> > > > > > should be able to find their context (logical or legacy) from
> > > > > > the ring and
> > > > dtrt.
> > > > > > -Chris
> > > > >
> > > > > Sorry, Chris, I obviously don´t have the same experience with 915 you
> > have:
> > > > how do you propose to extract the right context from the ring?
> > > >
> > > > The rings are a set of buffers and vfuncs that are associated with a
> > context.
> > > > Before you can call intel_ring_begin() you must know what context
> > > > you want to operate on and therefore can pick the right
> > > > logical/legacy ring and interface for RCS/BCS/VCS/etc -Chris
> > >
> > > Ok, but then you need to pass some extra stuff down together with the
> > intel_engine_cs, either intel_context or intel_ringbuffer, right? Because
> > that´s exactly what I did in previous versions, plumbing intel_context
> > everywhere where it was needed (I could have plumbed intel_ringbuffer
> > instead, it really doesn´t matter). This was rejected for being too intrusive
> > and not allowing easy maintenance in the future.
> > 
> > Nope. You didn't redesign the ringbuffers to function as we expected but
> > tacked on extra information and layering violations.

Not sure what your proposed alternative is here Chris. I'll elaborate on what
I had proposed r.e. plumbing intel_ringbuffer instead of intel_context, which
might be along the lines of what you envision. At this point, we're starting
to go in circles, so I don't know if it's worth revisting beyond that.

The earlier versions of the series modified all of the intel_ring_* functions
to accept (engine, context) as parameters. At or near the bottom of the call
chain (e.g. in the engine vfuncs), we called a new function,
intel_ringbuffer_get(engine, context), to return the appropriate ringbuffer
in both legacy and lrc modes. However, a given struct intel_ringbuffer is
only ever used with a particular engine (for both legacy and lrc) and with
a particular context (for lrc only). So I suggested that we:

- Add a back pointer from struct intel_rinbuffer to intel_context (would only
  be valid for lrc mode)
- Move the intel_ringbuffer_get(engine, context) calls up to the callers
- Pass (engine, ringbuf) instead of (engine, context) to intel_ring_* functions
- Have the vfunc implemenations get the context from the ringbuffer where
  needed and ignore it where not

Looking again, we could probably add a back pointer to the intel_engine_cs as
well and then just pass around the ringbuf. In any case, we went back and forth
on this a bit and decided to just stick with passing (engine, context). Which we
then decided was too invasive, and here we are. So is that closer to what you're
thinking of, or did you have something else in mind?

Thanks,
Brad

> > -Chris
> 
> I know is no excuse, but as I said, I don´t know the code as well as you do. Let me explain to you the history of this one and maybe you can help me out discovering where I got it all wrong:
> 
> - The original design I inherited from Ben and Jesse created a completely new "struct intel_ring_buffer" per context, and passed that one on instead of passing one of the &dev_priv->ring[i] ones. When it was time to submit a context to the ELSP, they simply took it from ring->context. The problem with that was that creating an unbound number of "struct intel_ring_buffer" meant there were also an unbound number of things like "struct list_head active_list" or "u32 irq_enable_mask", which made little sense.
> - What we really needed, I naively thought, is to get rid of the concept of ring: there are no rings, there are engines (like the render engine, the vebox, etc...) and there are ringbuffers (as in circular buffer with a head offset, tail offset, and control information). So I went on and renamed the old "intel_ring_buffer" to "intel_engine_cs", then extracting a few things into a new "intel_ringbuffer" struct. Pre-Execlists, there was a 1:1 relationship between the ringbuffers and the engines. With Execlists, however, this 1:1 relationship is between the ringbuffers and the contexts.
> - I remember explaining this problem in a face-to-face meeting in the UK with some OTC people back in February (I think they tried to get you on the phone but they didn´t manage. I do remember they got Daniel though). Here, I proposed that an easy solution (easy, but maybe not right) was to plumb the context down: in Execlists mode, we would retrieve the ringbuffer from the context while, in legacy mode, we would get it from the engine. Everybody seemed to agree with this.
> - I worked on this premise for several versions that I sent to the mailing list for review (first the internal, then intel-gfx). Daniel only complained last month, when he pointed out that he asked, a long long time ago, for a completely separate execution path for Execlists. And this is where we are now...
> 
> And now back to the problem at hand: what do you suggest I do now (other than ritual seppuku, I mean)? I need to know the engine (for a lot of reasons), the ringbuffer (to know where to place commands) and the context (to know what to submit to the ELSP). I can spin it a thousand different ways, but I still need to know those three things. At the same time, I cannot reuse the old intel_ringbuffer code because Daniel won´t approve. What would you do?