[Intel-gfx] [PATCH 00/66] [v1] Full PPGTT minus soft pin

Tue Jul 2 00:36:13 CEST 2013

On Mon, Jul 01, 2013 at 11:39:30PM +0200, Daniel Vetter wrote:
> Hi Ben
> 
> So first things first: I rather like what the code looks like overall at
> the end. I've done a light read-through (by far not a full review) and
> besides a few bikesheds (all mentioned by mail already) the big thing is
> the 1:1 context:ppgtt address space relationship.
> 
> We've discussed this at length in private irc and agreed that we need to
> changes this two a n:1 relationship, so I'll just reiterate the reasons
> for that on the list:
> 
> - Current userspace expects that different contexts created on the same fd
>   all use the same address space (since there's really only one). So if we
>   want to no add a new ABI (and for testing I really think we want to
>   enable ppgtt on current unchanged userspace) we must keep that promise.
>   Hence we need to be able to point the different contexts created on an
>   fd all at the same (per-fd) address space.
> 
>   If we want we could later on extend this and allow a context to have a
>   private address space all of its own.
> 
> - HW contexts are pretty much the execution/scheduling primitive the hw
>   provides us with. On current platforms that's a bit a stretch, but it's
>   much clearer on future platforms.
> 
>   The equivalent concept on the cpu would be threads, and history
>   unanimously established that having multiple threads in the same process
>   is a useful concept. So I think allowing the same N:1 context:ppgtt
>   relation on gpus is sound. Of course that does not preclude other ways
>   to share individual buffers, which we already support with
>   flink/dma_buf.
> 
> With that big issue resolved there's only the bikesheds left. I'm not
> really worried about those, and in any case we already have some good
> discussions going on about them.

I've discussed this with the Mesa team, and I believe this is what they
want. I'll highlight the important bit for TL;DR people:
>   Hence we need to be able to point the different contexts created on an
>   fd all at the same (per-fd) address space.

If one wants a new address space, they will have to open a new fd.

> 
> So merge plan:
> 
> Since the context<->ppgtt relation needs to be revamped I'd like to punt
> on patches in that area at first and concentrate on merging the
> address_space/vma conversion and all the prep work leading to that first.

The main idea [ack, or nak] is the ppgtt becomes *ppgtt, and the context
will refcount it on context creation/destrution - that way all the
existing tricky context refcounting should still work, and the last
context to down ref a ppgtt will destroy it in it's handler.

> 
> We're already discussing some of the details, so I won't repeat that here
> again. I think for the overall approach I wouldn't bother with rebasing -
> shuffling around such a massive&invasive patch series is a major pain and
> rather error-prone.
> 
> Instead I'd just freeze the current branch as a known-working reference
> point and cherry-pick individual subseries out of it. Also some patches
> will need to be redone, but thanks to the benefit of hindsight I hope that
> v2 will have much less churn. Again I've tossed around a few ideas in
> replies to individual patches.
> 
> For prep work I see a few pieces:
> 
> - drm_mm related prep work, especially drm_mm.c core changes. I think
>   simply reordering all the relevant patches and resubmitting them (with
>   cc: dri-devel) is all we need to get those in (minus the oddball last
>   minute bikeshed).
> 
> - Small static inline functions to ease the pain of the conversion. I
>   think those need to be redone with as much foreshadowing as possible (so
>   that later patches don't suffer from overblown diff sizes). Maybe we
>   should also do the lookup-helpers upfront.
> 
> - Other prep work like killing obj->gtt_offset and stuff I've missed (but
>   which doesn't touch the context<->ppgtt relation).

I think this might be mergeable now. Did you try and have conflicts?

> 
> - I think we can also merge as much of the hw interfacing code as possible
>   up-front (or in paralle), e.g. converting the pde loading to LRI.

I was thinking this as well. I wasn't sure how you'd feel about the
idea. I'd really like that. This should also have no conflicts at
present (that I'm aware of).

> 
> Then we can slurp all the address_space/vma conversion patches. Imo the
> important part is to fledge out the commit message with the hindsight
> insights and explain a bit why certain stuff gets moved and why other
> stuff should stay where it currently is. We also need to review whether
> bisecting isn't broken anywhere, but since we don't yet add a real ppgtt I
> don't expect any issues.
> 
> Once that's all in we can revisit the context vs. ppgtt question and
> figure out how to make it work. I expect that we need to refcount ppgtt
> address spaces. But if we keep the per-fd contexts around (imo a sane idea
> anyway) we should get by when just the contexts hold references onto the
> ppgtt object. Since the context is guaranteed to stay around until the
> last object is inactive again that should ensure that the ppgtt address
> space stays around for long enough, too. And it would avoid the ugliness
> of adding more tricky active refcounting on top of the context/obj
> refcounting we already have.
> 
> Comments?

The high level plan sounds totally fine to me, I still have some issues
with how to break up the huge changes in the VMA/VM conversion since
many interfaces need to change, and that gets messy. In the end I like
what I did where I split out the 6 or 7 major changes for review. Would
you be okay with that again? Maybe if all goes well, it won't be a
problem.

> 
> Cheers, Daniel
> 
> On Thu, Jun 27, 2013 at 04:30:01PM -0700, Ben Widawsky wrote:
> > First, I don't think this whole series is ready for merge yet. It is
> > however ready for a review, and I think a lot of the prep patches in the
> > series could be merged to make my rebasing life a bit easier. I cannot
> > continue ignoring pretty much all emails/bugs as I have for the last
> > month to wrap up this series. The current state is on my IVB, things are
> > pretty stable. I've seen one unexplained hang, but I'm hopeful review
> > might help me uncover/explain.
> > 
> > This patch series introduces the next step in enabling full PPGTT, which
> > is per fd address space/context, and it also contains the previously
> > unmerged patches (some of which have been reworked, modified, or
> > rebased). In regards to the continued VMA changes, I think in these, the
> > delta with regard to the last posting is the bound list was per VM. It
> > is now global. I've also moved the active list to per VM.
> > 
> > Brand new in the series we take the previous series' context per fd
> > (with address space) one step further and actually switch the address
> > spaces when we do context switches. In order to make this happen, the
> > series continues to chip away at removing the notion of an object only
> > ever being bound into one address space via the struct
> > i915_address_space and struct i915_vma data structures which are really
> > abstractions for a page directory, and current mapped ptes respectively.
> > The error state is improved since the last series (though still some
> > work there is probably needed). It also serves to remove the notion of
> > the aliasing PPGTT since in theory everything bound into the GGTT
> > shouldn't benefit from an aliasing PPGTT (fact check).
> > 
> > With every context having it's own address space, and every open DRM fd
> > having it's own context, it's trivial on execbuf to lookup a context and
> > do the pinning in the proper address space. More importantly, it's
> > implicit that a context exists, which made this impossible to do
> > earlier.
> > 
> > *A note on patch ordering:* In order to work this series incrementally, the
> > final patch ordering is admittedly a little bit strange. I'm more than willing
> > to rework these as requested, but I'd really prefer not to do really heavy
> > reordering unless there is a major benefit, or of course to fix bugs.
> > 
> > # What is not in this patch series in the order I think we should handle it
> > (and I acknowledge some of this stuff is non-trivial):
> > 
> > ## Review + QA coverage
> > 
> > ## Porting to HSW
> > 
> > It shouldn't be too much extra work, if any, to add support. I haven't looked
> > into it yet.
> > 
> > ## Better vm/ppgtt info in error state collection
> > 
> > In particular, I want to dump all the PTEs at hang, and at the very least the
> > guilt PTEs.  This isn't difficult, and can be done with copypasta from the
> > existing dumper I have.
> > 
> > ## User space and the implications
> > 
> > Now that contexts are valid on all rings, userspace should begin emitting the
> > context for all rings if it expects both rings to be able to access both
> > objects in the same offset. The mesa change looks to be just one line which
> > simplies emits the context for batch->is_blit, I doubt libva is using contexts,
> > and SNA seems not to. The plan to support mesa will be to do the detection in
> > libdrm, and go ahead with the simple mesa one liner. I've been using the
> > oneliner if mesa for a while now, but we will need to support old user space in
> > the kernel. So there might be a bit of work even on the kernel side here. We
> > also need some IGT tools test updates. I have messy versions of these locally
> > already.
> > 
> > ## Performance data
> > 
> > I think it doesn't preclude preliminary review of the patches since the main
> > goal of PPGTT is really abourt security, correctness, and enabling other
> > things. I will update with some numbers after I work on it a bit more.
> > 
> > 
> > ## Testing on SNB
> > 
> > If our current code is correct, then I think these patches might work on SNB
> > as is, but it's untested. There is currently no way to disconnect contexts +
> > PPGTT from the whole thing; so if this doesn't work - we'll need rework some of
> > the code. I think it should just entail bringing back aliasing ppgtt, and not
> > doing the address space switch when switching contexts (aliasing ppgtt will
> > have a null switch_mm()).
> > 
> > ## Soft pin interface
> > 
> > I'd like to defer the discussion until these patches are merged.
> > 
> > ## On demand page table allocation
> > 
> > This is a potentially very useful optimization for at least the following
> > reasons:
> > * any app using contexts will have an extra set of page tables it isn't using;
> >   wasted memory
> > * Reduce DCLV to reduce pd fetch latency
> > * Allows Better use of a switch to default context for low memory situations
> >   (evicting unused page tables for example)
> > 
> > ## Per VMA cache levels/control
> > 
> > There are situations in the code where we have to flush the GPU pipeline in
> > order to change cache levels.  This should no longer be the case for unaffected
> > VMs (I think). The same may be true with domain tracking.
> > 
> > ## dmabuf/prime integration
> > 
> > I haven't looked into what's missing to support it. If I'm lucky, it just works.
> > 
> > 
> > 
> > With that, if you haven't already moved on, chanting tl;dr - all comments
> > welcome.
> > 
> > ---
> > 
> > Ben Widawsky (65):
> >   drm/i915: Remove extra error state NULL
> >   drm/i915: Extract error buffer capture
> >   drm/i915: make PDE|PTE platform specific
> >   drm/i915: Don't clear gtt with 0 entries
> >   drm/i915: Conditionally use guard page based on PPGTT
> >   drm/i915: Use drm_mm for PPGTT PDEs
> >   drm/i915: cleanup context fini
> >   drm/i915: Do a fuller init after reset
> >   drm/i915: Split context enabling from init
> >   drm/i915: destroy i915_gem_init_global_gtt
> >   drm/i915: Embed PPGTT into the context
> >   drm/i915: Unify PPGTT codepaths on gen6+
> >   drm/i915: Move ppgtt initialization down
> >   drm/i915: Tie context to PPGTT
> >   drm/i915: Really share scratch page
> >   drm/i915: Combine scratch members into a struct
> >   drm/i915: Drop dev from pte_encode
> >   drm/i915: Use gtt shortform where possible
> >   drm/i915: Move fbc members out of line
> >   drm/i915: Move gtt and ppgtt under address space umbrella
> >   drm/i915: Move gtt_mtrr to i915_gtt
> >   drm/i915: Move stolen stuff to i915_gtt
> >   drm/i915: Move aliasing_ppgtt
> >   drm/i915: Put the mm in the parent address space
> >   drm/i915: Move active/inactive lists to new mm
> >   drm/i915: Create a global list of vms
> >   drm/i915: Remove object's gtt_offset
> >   drm: pre allocate node for create_block
> >   drm/i915: Getter/setter for object attributes
> >   drm/i915: Create VMAs (part 1)
> >   drm/i915: Create VMAs (part 2) - kill gtt space
> >   drm/i915: Create VMAs (part 3) - plumbing
> >   drm/i915: Create VMAs (part 3.5) - map and fenceable tracking
> >   drm/i915: Create VMAs (part 4) - Error capture
> >   drm/i915: Create VMAs (part 5) - move mm_list
> >   drm/i915: Create VMAs (part 6) - finish error plumbing
> >   drm/i915: create an object_is_active()
> >   drm/i915: Move active to vma
> >   drm/i915: Track all VMAs per VM
> >   drm/i915: Defer request freeing
> >   drm/i915: Clean up VMAs before freeing
> >   drm/i915: Replace has_bsd/blt with a mask
> >   drm/i915: Catch missed context unref earlier
> >   drm/i915: Add a context open function
> >   drm/i915: Permit contexts on all rings
> >   drm/i915: Fix context fini refcounts
> >   drm/i915: Better reset handling for contexts
> >   drm/i915: Create a per file_priv default context
> >   drm/i915: Remove ring specificity from contexts
> >   drm/i915: Track which ring a context ran on
> >   drm/i915: dump error state based on capture
> >   drm/i915: PPGTT should take a ppgtt argument
> >   drm/i915: USE LRI for switching PP_DIR_BASE
> >   drm/i915: Extract mm switching to function
> >   drm/i915: Write PDEs at init instead of enable
> >   drm/i915: Disallow pin with full ppgtt
> >   drm/i915: Get context early in execbuf
> >   drm/i915: Pass ctx directly to switch/hangstat
> >   drm/i915: Actually add the new address spaces
> >   drm/i915: Use multiple VMs
> >   drm/i915: Kill now unused ppgtt_{un,}bind
> >   drm/i915: Add PPGTT dumper
> >   drm/i915: Dump all ppgtt
> >   drm/i915: Add debugfs for vma info per vm
> >   drm/i915: Getparam full ppgtt
> > 
> > Chris Wilson (1):
> >   drm: Optionally create mm blocks from top-to-bottom
> > 
> >  drivers/gpu/drm/drm_mm.c                   | 134 +++---
> >  drivers/gpu/drm/i915/i915_debugfs.c        | 215 ++++++++--
> >  drivers/gpu/drm/i915/i915_dma.c            |  25 +-
> >  drivers/gpu/drm/i915/i915_drv.c            |  57 ++-
> >  drivers/gpu/drm/i915/i915_drv.h            | 353 ++++++++++------
> >  drivers/gpu/drm/i915/i915_gem.c            | 639 +++++++++++++++++++++--------
> >  drivers/gpu/drm/i915/i915_gem_context.c    | 279 +++++++++----
> >  drivers/gpu/drm/i915/i915_gem_debug.c      |  11 +-
> >  drivers/gpu/drm/i915/i915_gem_evict.c      |  64 +--
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 138 ++++---
> >  drivers/gpu/drm/i915/i915_gem_gtt.c        | 541 ++++++++++++++----------
> >  drivers/gpu/drm/i915/i915_gem_stolen.c     |  87 ++--
> >  drivers/gpu/drm/i915/i915_gem_tiling.c     |  21 +-
> >  drivers/gpu/drm/i915/i915_irq.c            | 197 ++++++---
> >  drivers/gpu/drm/i915/i915_trace.h          |  38 +-
> >  drivers/gpu/drm/i915/intel_display.c       |  40 +-
> >  drivers/gpu/drm/i915/intel_drv.h           |   7 -
> >  drivers/gpu/drm/i915/intel_fb.c            |   8 +-
> >  drivers/gpu/drm/i915/intel_overlay.c       |  26 +-
> >  drivers/gpu/drm/i915/intel_pm.c            |  65 +--
> >  drivers/gpu/drm/i915/intel_ringbuffer.c    |  32 +-
> >  drivers/gpu/drm/i915/intel_sprite.c        |   8 +-
> >  include/drm/drm_mm.h                       | 147 ++++---
> >  include/uapi/drm/i915_drm.h                |   1 +
> >  24 files changed, 2044 insertions(+), 1089 deletions(-)
> > 
> > -- 
> > 1.8.3.1
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx at lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

-- 
Ben Widawsky, Intel Open Source Technology Center