[Intel-gfx] [PATCH 00/50] Execlists v2

Daniel Vetter daniel at ffwll.ch
Tue May 13 15:48:24 CEST 2014


On Fri, May 09, 2014 at 01:08:30PM +0100, oscar.mateo at intel.com wrote:
> From: Oscar Mateo <oscar.mateo at intel.com>
> 
> For a description of this patchset, please check the previous cover letter [1].
> 
> Together with this patchset, I'm also submitting an IGT test: gem_execlist [2].
> 
> v2:
> - Use the same context struct for all the different engines (suggested by Brad Volkin).
> - Rename write_tail to submit (suggested by Brad).
> - Simplify hardware submission id creation by using LRCA[31:11] as hwCtxId[18:0].
> - Non-render contexts are only two pages long (suggested by Damien Lespiau).
> - Disable HWSTAM, as no one listens to it anyway (suggested by Damien).
> - Do not write PDPs in the context every time, doing it at context creation time is enough.
> - Various kmalloc changes in gen8_switch_context_queue (suggested by Damien).
> - Module parameter to disable Execlists (as per Damien's patches).
> - Update the HW read pointer in CONTEXT_STATUS_PTR (suggested by Damien).
> - Fixed gpu reset and basic error reporting (verified by the new gem_error_capture test).
> - Fix for unexpected full preemption in some scenarios (instead of lite restore).
> - Ack the context switch interrupts as soon as possible (fix by Bob Beckett).
> - Move default context backing object creation to intel_init_ring.
> - Take into account the second BSD ring.
> - Help out the ctx switch interrupt handler by sharing the burden of squashing requests
>   together.
> 
> What I haven't done in this release:
> 
> - Get the context sizes from the CXT_SIZE registers, as suggested by Damien: the BSpec is full 
>   of holes with regards to the various CXT_SIZE registers, but the hardcoded values seem pretty
>   clear.
> - Allocate the ringbuffer together with the context, as suggested by Damien: now that every
>   context has NUM_RINGS ringbuffers on it, the advantage of this is not clear anymore.
> - Damien pointed out that we are missing the RS context restore, but I don't see any RS values
>   that are needed on the first execution (the first save should take care of these).
> - I have added a comment to clarify how the context population takes place (MI_LOAD_REGISTER_IMM
>   plus <reg,value> pairs) but I haven't provided names for each position (as Jeff Mcgee suggested)
>   or created an OUT_BATCH_REG_WRITE(reg, value) (as Daniel Vetter suggested).
> 
> [1]
> http://lists.freedesktop.org/archives/intel-gfx/2014-March/042563.html
> [2]
> http://lists.freedesktop.org/archives/intel-gfx/2014-May/044846.html

I've done a very cursory read of this, and my original comment from my
original high-level review on the internal list still stands: I'm freaked
out by how invasive this is into the existing ring code. All the changes
in i915_dma.c look very suspicious, since that code is for the legacy ums
crap and will _never_ run on bdw. Nor even on anything more modern than
g4x platforms (gen4).

Apparently that review has been lost/ignored, so I'll quote it in full:

"In reading through your patches the big thing which jumped out
is how the new execlist code is intermingled with the old legacy gen8
framebuffer stuff. Imo those two modes don't match at all, and we need to
resolve this mismatch with another abstraction layer ;-)

"I'm thinking of dev_priv->gt.do_execbuf which takes care of all the
lower-level request tracking and command submission. Feel free to massively
bikeshed the name. I'm thinking that we should move everything from
i915_gem_execbuffer_move_to_gpu to i915_gem_execbuffer_retire_commands into
that callback. With the current code that'd include the active list tracking,
maybe we should move that part out again. Otoh if we go wild with scheduling
and preemption, active bo tracking _will_ be rather different from previous
platforms. To support execlist we might need some more vfuncs in the
ringbuffer struct to support execlist specific stuff (submit execlist, enable
context switch interrupts), but a lot of the existing stuff will be redudant.
At the end (once things settles) we should then document which kind of
do_execbuf uses which kinds of low-level ring interfaces.

"With that abstraction:
- We can separate gen8 execlist from legacy gen8 code, and so should avoid
regressions (and so blocking mesa).
- Play around with different execlist approaches (guc, deferred execution,
whatever, ...) since it'll just be differen copy&pasta.

"Maybe we also need a similar abstraction for a wait_seqno/request interface.
But since has/fulsim can't simulate interrupts properly that discussion is a
bit moot for now.

"Finally I think our immediate focus for execlist enabling should be to get
multi-context execlists going, so that we can validate whether that'll work
together with mesa/hw contexts. If it doesn't, not much point in bothering.
The simplest way is to just block in the ->do_execbuf callback if we can't
submit the new context right away. It'll suck a bit perf-wise, but will get
the hw going.

"In summary execlists looks like a big&invasive feature. My aim here with those
ideas is purely risk mitigation: I want to avoid committing resources for a
(potentially) dead-end design, while still giving us enough flexibility to do
the necessary prototyping to figure out the right answers."

That mail was from Mar 25th, 2013.

So essentially what I'd prefer is we keep all the existing ringbuffer code
as-is, and throw in a complete new set (with fairly new datastructures) in
for execlists. Then only interaction points would be:
- Ring init either calls into legacy ring init or new fancy execlist ring
  init.
- Execbuf calls ring->do_submit with ring/engine, ctx object, batch state
  and otherwise doesn't care one bit how it will all get submitted.
- Context state needs to be frobbed a bit so that we create the correct
  backing object (i.e. legacy hw state or execlist ring+ctx). To make this
  feasible it's probably best to switch the implicit per-fd ctx to be
  per-ring. That way we still have the fixed hw-contxt->ring/engine
  relationship and don't need to play tricks with lazy context allocation
  (because those beats are so big with execlists).

Yes this means that a bunch of things like e.g. seqno emission and flusing
in intel_ringbuffer.c will be duplicated into intel_lrc.c. But imo that's
a feature, not a bug - I would be massively suprised if there's not
subtile differences here we need to be able to cope with.

Cheers, Daniel
> 
> Ben Widawsky (13):
>   drm/i915: s/for_each_ring/for_each_active_ring
>   drm/i915: for_each_ring
>   drm/i915: Extract trivial parts of ring init (early init)
>   drm/i915/bdw: Macro and module parameter for LRCs (Logical Ring
>     Contexts)
>   drm/i915/bdw: Rework init code for Logical Ring Contexts
>   drm/i915/bdw: A bit more advanced context init/fini
>   drm/i915/bdw: Populate LR contexts (somewhat)
>   drm/i915/bdw: Status page for LR contexts
>   drm/i915/bdw: Enable execlists in the hardware
>   drm/i915/bdw: LR context ring init
>   drm/i915/bdw: GEN8 new ring flush
>   drm/i915/bdw: Implement context switching (somewhat)
>   drm/i915/bdw: Print context state in debugfs
> 
> Michel Thierry (1):
>   drm/i915/bdw: Get prepared for a two-stage execlist submit process
> 
> Oscar Mateo (33):
>   drm/i915: Simplify a couple of functions thanks to for_each_ring
>   drm/i915: Extract ringbuffer destroy, make destroy & alloc outside
>     accesible
>   drm/i915: s/intel_ring_buffer/intel_engine
>   drm/i915: Split the ringbuffers and the rings
>   drm/i915: Rename functions that mention ringbuffers (meaning rings)
>   drm/i915: Plumb the context everywhere in the execbuffer path
>   drm/i915: s/__intel_ring_advance/intel_ringbuffer_advance_and_submit
>   drm/i915: Write a new set of context-aware ringbuffer management
>     functions
>   drm/i915: Final touches to ringbuffer and context plumbing and
>     refactoring
>   drm/i915: s/write_tail/submit
>   drm/i915: Introduce one context backing object per engine
>   drm/i915: Make i915_gem_create_context outside accessible
>   drm/i915: Option to skip backing object allocation during context
>     creation
>   drm/i915: Extract context backing object allocation
>   drm/i915/bdw: New file for Logical Ring Contexts and Execlists
>   drm/i915/bdw: Allocate ringbuffer backing objects for default global
>     LRC
>   drm/i915/bdw: Allocate ringbuffer for user-created LRCs
>   drm/i915/bdw: Deferred creation of user-created LRCs
>   drm/i915/bdw: Allow non-default, non-render, user-created LRCs
>   drm/i915/bdw: Execlists ring tail writing
>   drm/i915/bdw: Set the request context information correctly in the LRC
>     case
>   drm/i915/bdw: Always write seqno to default context
>   drm/i915/bdw: Write the tail pointer, LRC style
>   drm/i915/bdw: Don't write PDP in the legacy way when using LRCs
>   drm/i915/bdw: Start queueing contexts to be submitted
>   drm/i915/bdw: Display execlists info in debugfs
>   drm/i915/bdw: Display context backing obj & ringbuffer info in debugfs
>   drm/i915/bdw: Document execlists and logical ring contexts
>   drm/i915/bdw: Avoid non-lite-restore preemptions
>   drm/i915/bdw: Make sure gpu reset still works with Execlists
>   drm/i915/bdw: Make sure error capture keeps working with Execlists
>   drm/i915/bdw: Help out the ctx switch interrupt handler
>   drm/i915/bdw: Enable logical ring contexts
> 
> Thomas Daniel (3):
>   drm/i915/bdw: Add forcewake lock around ELSP writes
>   drm/i915/bdw: LR context switch interrupts
>   drm/i915/bdw: Handle context switch events
> 
>  drivers/gpu/drm/i915/Makefile              |   1 +
>  drivers/gpu/drm/i915/i915_cmd_parser.c     |  16 +-
>  drivers/gpu/drm/i915/i915_debugfs.c        | 180 ++++++-
>  drivers/gpu/drm/i915/i915_dma.c            |  48 +-
>  drivers/gpu/drm/i915/i915_drv.h            |  97 +++-
>  drivers/gpu/drm/i915/i915_gem.c            | 172 ++++---
>  drivers/gpu/drm/i915/i915_gem_context.c    | 220 +++++---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  85 ++--
>  drivers/gpu/drm/i915/i915_gem_gtt.c        |  41 +-
>  drivers/gpu/drm/i915/i915_gem_gtt.h        |   2 +-
>  drivers/gpu/drm/i915/i915_gpu_error.c      |  19 +-
>  drivers/gpu/drm/i915/i915_irq.c            | 102 ++--
>  drivers/gpu/drm/i915/i915_params.c         |   6 +
>  drivers/gpu/drm/i915/i915_reg.h            |  11 +
>  drivers/gpu/drm/i915/i915_trace.h          |  26 +-
>  drivers/gpu/drm/i915/intel_display.c       |  26 +-
>  drivers/gpu/drm/i915/intel_drv.h           |   4 +-
>  drivers/gpu/drm/i915/intel_lrc.c           | 729 ++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_overlay.c       |  12 +-
>  drivers/gpu/drm/i915/intel_pm.c            |  18 +-
>  drivers/gpu/drm/i915/intel_ringbuffer.c    | 792 +++++++++++++++++++----------
>  drivers/gpu/drm/i915/intel_ringbuffer.h    | 196 ++++---
>  22 files changed, 2107 insertions(+), 696 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/intel_lrc.c
> 
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch



More information about the Intel-gfx mailing list