[Intel-gfx] [PATCH 00/53] Execlists v3

Wed Jun 18 23:26:17 CEST 2014

On Fri, Jun 13, 2014 at 04:37:18PM +0100, oscar.mateo at intel.com wrote:
> From: Oscar Mateo <oscar.mateo at intel.com>
> 
> For a description of this patchset, please check the previous cover letters: [1] and [2].
> 
> The main difference with v2 is in how we get to the point of context submission: this time
> around, instead of massaging the legacy ringbuffer submission functions (mostly located in intel_ringebuffer.c), I have effectively created a separate path for Execlists submission
> in intel_lrc.c (even though everybody knows you shouldn't mess with split timelines). The
> alternative path is mostly a clone of the previous, but the idea is that it will differ
> significantly in the (so in exchange for duplicated code, we gain the ability to perform
> big changes without breaking legacy hardware support). This change was a suggestion by
> Daniel Vetter [3].
> 
> I know many patches here will be very controversial, so I would appreciate early feedback
> in the direction this effort is taking.

Ok, I've read through this and I like what I've seen. Some detail polish,
but we can do that later on on top. The only really serious issue is the
add_request vfunc, I'd prefer if we can remove that. Alternatively I'd
like to really understand why it's needed ...

Damien and Brad are signed up to shepherd the series through review. I'd
prefer if we fan that out a bit to the wider global team for better
knowledge osmosis (i.e. pls look for reviewers outside of Oscar's team).

One thing I've noticed though is how you've structured your patch series,
so I'll drop my guidelines here for the next big thing. Jesse might want
to include this into his "big patch series" best practices documentation.

1. Build new code outside-in
----------------------------

The major thing I've noticed is that very often you add something in a
patch which is only used later on, e.g. the entire lrc submission path is
created from the endpoints over almost 20 patches until it's finally
plugged into the execbuf submission path.

My puny brain is by far not good enough to hold that much context, I'll
give up after 1 patch. A better approach is to build up code from the
bottom-up. So in this case:
- Start with extracting the execbuf submit vfunc for legacy code.
- Fill in a bare-bones lrc submission function with mostly TODO comments.
- Fill them out step-by-step by recursing down until you hit the
  low-level ring flush/invalidate/foo/bar functions copy-pasted from
  intel_ringbuffer.c.

Another example would be a new ioctl: Start with a dummy ioctl then fill
it out. The key is to put every new piece of code you add to immediate
use, so that your reviewer can look at it, review it and then forget about
it.

3rd example (much smaller) is the frontbuffer tracking I've just done:
First I added new state with self-checks to make sure it's consistent.
This means the code is functional (so can be reviewed), but the state it
tracks isn't used yet. Then I added events for frontbuffer invalidation,
flushing and flippping, explaining in the cover letter and kerneldoc
comments the semantics I'm aiming for. Again reviewable as a functional,
stand-alone thing. Then I've rewired the psr code to be called from the
new code. At each step something working&self-consistent was added, which
means the details can be forgotten again when moving to the next chunk.

I know that it's tempting to do magic and spring the rabbit in the very
last patch onto your unsuspecting audience. But we try to explain the
tricks here, not showcase them ;-)

2. Always add users for new stuff in the same patch
---------------------------------------------------

Same idea holds at a smaller scale for individual patches: Don't extract
code without using the new functions - otherwise the reviewer has to have
2 patches open (since most don't even apply your patches, so can't check
what the code does when you've changed it).

The only exception where it's ok to split new code from it's first user is
for new register #defines. Reviewing that with Bspec is a task in itself
and many people prefer to have that split out from the actual code. But
besides that try hard to avoid to introduce anything that isn't used right
away in the same patch.

One image I have for a complete patch is a small math proof: You start
with the motivation and overall idea (commit message), then go through a
bunch of technicalities in lemmas (functions) and finally stitch it
together in the big theorem (i.e. the function that uses all the new
code). Especially when adding completely new code and as long as you order
functions to avoid forward declarations this is the natural order in the
resulting diff.

3. Split big patch series into sub-parts
----------------------------------------

Books have chapters and patch series should be split into parts. The big
reason for that is that if you don't you need 1 reviewer to look at
everything. Which doesn't scale and means review will take months. So
split it into different topics which should be as independent as possible.
For this series here we'd have:
- Rework of the context allocation/freeing and associated tracking.
- lrc init/teardown code in general.
- Reworking of execbuf submission.
- lrc submission implementation into the list.
- Handling interrupts and actual low-level submission.

It's hard to tell for sure with all the forward references you have but my
impression from a single read-through is that all these different topics
are smeared around a bit. Which means when the review is split there's a
lot of overlap, which slows things down.

So when designing the final series try to keep this in mind, potentially
reworking the patch split and ordering a bit. The sweet spot for such a
chapter is ime around 10 patches. That usually translates into a
reasonable review load of a few hours, which can be squeezed into ongoing
tasks without too much trouble. If you have bigger chunks it becomes much
harder to find available reviewers.

For submission you can either send chapters as individual threads or
explain the differen parts in the cover letter with a paragraph or so each
(instead of the per-chapter cover letter).

Anyway all very fluffy, but we need to start collecting these best
practices somewhere.

Comments and approches from other people on our team highly welcome.

Cheers, Daniel

> 
> The previous IGT test [4] still applies.
> 
> [1]
> http://lists.freedesktop.org/archives/intel-gfx/2014-March/042563.html
> [2]
> http://lists.freedesktop.org/archives/intel-gfx/2014-May/044847.html
> [3]
> http://lists.freedesktop.org/archives/intel-gfx/2014-May/045139.html
> [4]
> http://lists.freedesktop.org/archives/intel-gfx/2014-May/044846.html
> 
> Ben Widawsky (2):
>   drm/i915/bdw: Implement context switching (somewhat)
>   drm/i915/bdw: Print context state in debugfs
> 
> Michel Thierry (1):
>   drm/i915/bdw: Two-stage execlist submit process
> 
> Oscar Mateo (48):
>   drm/i915: Extract context backing object allocation
>   drm/i915: Rename ctx->obj to ctx->render_obj
>   drm/i915: Add a dev pointer to the context
>   drm/i915: Extract ringbuffer destroy & make alloc outside accesible
>   drm/i915: Move i915_gem_validate_context() to i915_gem_context.c
>   drm/i915/bdw: Introduce one context backing object per engine
>   drm/i915/bdw: New file for Logical Ring Contexts and Execlists
>   drm/i915/bdw: Macro for LRCs and module option for Execlists
>   drm/i915/bdw: Initialization for Logical Ring Contexts
>   drm/i915/bdw: A bit more advanced context init/fini
>   drm/i915/bdw: Allocate ringbuffers for Logical Ring Contexts
>   drm/i915/bdw: Populate LR contexts (somewhat)
>   drm/i915/bdw: Deferred creation of user-created LRCs
>   drm/i915/bdw: Render moot context reset and switch when LRCs are
>     enabled
>   drm/i915/bdw: Don't write PDP in the legacy way when using LRCs
>   drm/i915/bdw: Skeleton for the new logical rings submission path
>   drm/i915/bdw: Generic logical ring init and cleanup
>   drm/i915/bdw: New header file for LRs, LRCs and Execlists
>   drm/i915: Extract pipe control fini & make init outside accesible
>   drm/i915/bdw: GEN-specific logical ring init
>   drm/i915/bdw: GEN-specific logical ring set/get seqno
>   drm/i915: Make ring_space more generic and outside accesible
>   drm/i915: Generalize intel_ring_get_tail
>   drm/i915: Make intel_ring_stopped outside accesible
>   drm/i915/bdw: GEN-specific logical ring submit context (somewhat)
>   drm/i915/bdw: New logical ring submission mechanism
>   drm/i915/bdw: GEN-specific logical ring emit request
>   drm/i915/bdw: GEN-specific logical ring emit flush
>   drm/i915/bdw: Emission of requests with logical rings
>   drm/i915/bdw: Ring idle and stop with logical rings
>   drm/i915/bdw: Interrupts with logical rings
>   drm/i915/bdw: GEN-specific logical ring emit batchbuffer start
>   drm/i915: Extract the actual workload submission mechanism from
>     execbuffer
>   drm/i915: Make move_to_active and retire_commands outside accesible
>   drm/i915/bdw: Workload submission mechanism for Execlists
>   drm/i915: Abstract the workload submission mechanism away
>   drm/i915/bdw: Write the tail pointer, LRC style
>   drm/i915/bdw: Avoid non-lite-restore preemptions
>   drm/i915/bdw: Make sure gpu reset still works with Execlists
>   drm/i915/bdw: Make sure error capture keeps working with Execlists
>   drm/i915/bdw: Help out the ctx switch interrupt handler
>   drm/i915/bdw: Do not call intel_runtime_pm_get() in an interrupt
>   drm/i915/bdw: Display execlists info in debugfs
>   drm/i915/bdw: Display context backing obj & ringbuffer info in debugfs
>   drm/i915: Extract render state preparation
>   drm/i915/bdw: Render state init for Execlists
>   drm/i915/bdw: Document Logical Rings, LR contexts and Execlists
>   drm/i915/bdw: Enable logical ring contexts
> 
> Sourab Gupta (1):
>   !UPSTREAM: drm/i915: Use MMIO flips
> 
> Thomas Daniel (1):
>   drm/i915/bdw: Handle context switch events
> 
>  drivers/gpu/drm/i915/Makefile                |    1 +
>  drivers/gpu/drm/i915/i915_debugfs.c          |  150 +-
>  drivers/gpu/drm/i915/i915_dma.c              |    1 +
>  drivers/gpu/drm/i915/i915_drv.h              |   60 +-
>  drivers/gpu/drm/i915/i915_gem.c              |   70 +-
>  drivers/gpu/drm/i915/i915_gem_context.c      |  242 +++-
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c   |  328 ++---
>  drivers/gpu/drm/i915/i915_gem_gtt.c          |    5 +
>  drivers/gpu/drm/i915/i915_gem_render_state.c |   39 +-
>  drivers/gpu/drm/i915/i915_gpu_error.c        |   16 +-
>  drivers/gpu/drm/i915/i915_irq.c              |   53 +-
>  drivers/gpu/drm/i915/i915_params.c           |   11 +
>  drivers/gpu/drm/i915/i915_reg.h              |    5 +
>  drivers/gpu/drm/i915/intel_display.c         |  148 +-
>  drivers/gpu/drm/i915/intel_drv.h             |    6 +
>  drivers/gpu/drm/i915/intel_lrc.c             | 1902 ++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.h             |   99 ++
>  drivers/gpu/drm/i915/intel_renderstate.h     |   13 +
>  drivers/gpu/drm/i915/intel_ringbuffer.c      |  101 +-
>  drivers/gpu/drm/i915/intel_ringbuffer.h      |   53 +-
>  20 files changed, 2974 insertions(+), 329 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/intel_lrc.c
>  create mode 100644 drivers/gpu/drm/i915/intel_lrc.h
> 
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch