[Intel-gfx] [PATCH] drm/i915: Disable full ppgtt by default

Ben Widawsky ben at bwidawsk.net
Thu Mar 6 22:55:05 CET 2014


On Thu, Mar 06, 2014 at 09:30:01PM +0100, Daniel Vetter wrote:
> On Thu, Mar 06, 2014 at 10:17:12AM -0800, Ben Widawsky wrote:
> > On Thu, Mar 06, 2014 at 12:14:21PM +0100, Daniel Vetter wrote:
> > > There are too many oustanding issues:
> > > 
> > > - Fence handling in the current code is broken. There's a patch series
> > >   from me, but it's blocked on and extended review (which includes
> > >   writing the testcases).
> > > 
> > > - IOMMU mapping handling is broken, we need to properly refcount it -
> > >   currently it gets destroyed when the first vma is unbound, so way
> > >   too early.
> > > 
> > > - There's a pending reset issue on snb. Since Mika's reset work and
> > >   full ppgtt have been pulled in in separate branches and ended up
> > >   intermittingly breaking each another it's unclear who's the exact
> > >   culprit here.
> > > 
> > > - We still have persistent evidince of crazy recursion bugs through
> > >   vma_unbind and ppgtt_relase, e.g.
> > > 
> > >   https://bugs.freedesktop.org/show_bug.cgi?id=73383
> > > 
> > >   This issue (and a few others meanwhile resolved) have blocked our
> > >   performance measuring/tuning group since 3 months.
> > > 
> > > - Secure batch dispatching is broken. This is blocking Brad Volkin's
> > >   command checker work since 3 months.
> > > 
> > > All these issues are confirmed to only happen when full ppgtt is
> > > enabled, falling back to aliasing ppgtt resolves them. But even
> > > aliasing ppgtt itself still has a regression:
> > > 
> > > - We currently unconditionally bind objects into the aliasing ppgtt,
> > >   which means all priviledged objects like ringbuffers are visible to
> > >   unpriviledged access again. On top of that this also breaks the
> > >   command checker for aliasing ppgtt, since it can't hide the
> > >   validated batch any more.
> > > 
> > > Furthermore topic/full-ppgtt has never been reviewed:
> > > 
> > > - Lifetime rules around vma unbinding/release are unclear, resulting
> > >   into this awesome hack called ppgtt_release. Which seems to take the
> > >   blame for most of the recursion fallout.
> > > 
> > > - Context/ring init works different on gpu reset than anywhere else.
> > >   Such differeneces have in the past always lead to really hard to
> > >   track down bugs.
> > > 
> > > - Aliasing ppgtt is treated in a bunch of places as a real address
> > >   space, but it isn't - the real address space is always the global
> > >   gtt in that case. This results in a bit a mess between contexts and
> > >   ppgtt object, further complication the context/ppgtt/vma lifetime
> > >   rules.
> > > 
> > > - We don't have any docs describing the overall concepts introduced
> > >   with full ppgtt. A short, concise overview describing vmas and some
> > >   of the strange bits around them (like the unbound vmas used by
> > >   execbuf, or the new binding rules) really is needed.
> > > 
> > > Note that a lot of the post topic/full-ppgtt merge fallout has already
> > > been addressed, this entire list here of 10 issues really only contains
> > > the still outstanding issues.
> > > 
> > > Finally the 3.15 merge window is approaching and I think we need to
> > > use the remaining time to ensure that our fallback option of using
> > > aliasing ppgtt is in solid shape. Hence I think it's time to throw the
> > > switch. While at it demote the helper from static inline status
> > > because really.
> > > 
> > > Cc: Ben Widawsky <ben at bwidawsk.net>
> > > Cc: Dave Airlie <airlied at gmail.com>
> > > Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch>
> > 
> > [snip]
> > 
> > I want a concise list in the commit message so it's obvious as we fix
> > things if we've achieved the goal or not. If you want to have nice prose
> > describing the reason and/or your feelings, that's fine, but please put
> > it after the concise list.
> > 
> > I'll start what I want, and please fill in as needed. I believe this is
> > all 10 you mentioned.
> > * Fence handling broken: BUG #
> 
> We have patches from me, and Paulo is signed up to do the review+igt
> testcase on our review board.
> 
> > * IOMMU Broken: BUG #
> 
> No bug report thus far. I can create one if people want, but that's more
> work than firing up my damn ivb, enabling dmar again and fixing it. Imo
> the short description above should be good enough to understand the bug.
> 
> > * "Reset issue": Bug #
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=74100
> 
> Mika is working on this afaik.
> 
> > * Secure dispatch: Failing testcase: 
> 
> There's a Jira with full details: VIZ-3490
> 
> > * Bug: https://bugs.freedesktop.org/show_bug.cgi?id=73383
> 
> There's also been reports from Ville and iirc Chris also had pending
> issues, at least compared to dinq. Note that plain igt doesn't seem to be
> sufficient to provoke all these cases :(
> 
> > * Documentation
> 
> The above description is imo sufficient as a task description. There's
> also jira for this with more details: VIZ-3468
> 
> > Then there is fuzzy stuff that you "want" which need more clarification
> > on exactly what will satisfy you.
> > * Lifetime rules: No clear requirement from you.
> 
> Whomever tracks down the recursion bugs will likely have to sort his out.
> Essentially I want ppgtt_release gone, that entire function is just a
> giant hack.

I still don't see a concrete task here. Using the word, "essentially" is
not helpful for describing what your requirements are.

> 
> > * Context/ring init differences: What do you want?
> 
> http://lists.freedesktop.org/archives/intel-gfx/2013-November/036482.html
> 
> I really want this to be address before we start wreaking more havoc in
> this area, i.e. the patch series currently pending internally.
> 

I never agreed with you on this topic. I will try to find time to look
at internal patches that potentially fix it. But again, since I haven't
seen any real issue to fix, and I have had a very large deficit of time
, it would probably be great if someone else can help.

> > * Aliasing PPGTT real address treatment: What do you want?
> 
> i915_gem_context_free in i195_gem_context.c has a comment:
> 
> 	/* We refcount even the aliasing PPGTT to keep the code symmetric */
> 
> That comment is a lie and _not_ refcounting the aliasing ppgtt would allow
> us to ditch a bunch of hard-to-understand (at least for me) cornercase in
> the lifetime rules. At least that was the case when I've reviewed the
> ppgtt branch 3 months ago. Given that we have issues around the lifetimes
> of these suckers it can't help to have as much clarity as possible.
> 
> One really important thing you've dropped is fixing up the ppgtt binding
> rules for aliasing ppgtt. That's a regression compared to 3.14 and will
> render the cmd parser void with aliasing ppgtt. We need this asap both in
> time for the 3.15 merge and to finally unblock Brad Volkin's command
> parser.
> 
> I've signed myself up for this and started working on it.

Well, that's great! It's been so long since I wrote any of the code, and
a good deal has changed since then. It seemed like a good idea at the
time.

> -Daniel

I think you've misunderstood what I was requesting. I just want a
concise list in the commit (preferably at the top) with the issues. My
last email had *most* of what I wanted, sans any existing BUG # or JIRA
tasks you might have excluded. If we know people are working on it,
putting that information would be equally helpful. This will make it
really easy to review the list at a given time and determine when we're
done. If you prefer to put it in a bugzilla and link it at the top of a
commit, I am fine with that as well. In the past, it has been very lossy
when I need to read through long emails to decipher your requests.
That's not blame or criticism, it's likely my fault. Having it in the
commit message allows easy references from future commits, ie. This
addresses issue #3 from commit FOO.

It also probably does make sense to convert any internal JIRA tasks to
FDO tasks, since these are now all public, and your publicly issuing
reasons for disabling the feature that people on the mailing list may
not have access to... However, I agree it's not worth the time to do
this.

-- 
Ben Widawsky, Intel Open Source Technology Center



More information about the Intel-gfx mailing list