[PATCH 1/2] drm: refernce count event->completion

Daniel Vetter daniel at ffwll.ch
Thu Feb 9 17:20:29 UTC 2017


On Thu, Feb 09, 2017 at 04:39:29PM +0200, Jani Nikula wrote:
> On Wed, 04 Jan 2017, Daniel Vetter <daniel at ffwll.ch> wrote:
> > On Wed, Dec 21, 2016 at 12:08:45PM +0100, Maarten Lankhorst wrote:
> >> Op 21-12-16 om 11:36 schreef Chris Wilson:
> >> > On Wed, Dec 21, 2016 at 11:23:30AM +0100, Daniel Vetter wrote:
> >> >> When writing the generic nonblocking commit code I assumed that
> >> >> through clever lifetime management I can assure that the completion
> >> >> (stored in drm_crtc_commit) only gets freed after it is completed. And
> >> >> that worked.
> >> >>
> >> >> I also wanted to make nonblocking helpers resilient against driver
> >> >> bugs, by having timeouts everywhere. And that worked too.
> >> >>
> >> >> Unfortunately taking boths things together results in oopses :( Well,
> >> >> at least sometimes: What seems to happen is that the drm event hangs
> >> >> around forever stuck in limbo land. The nonblocking helpers eventually
> >> >> time out, move on and release it. Now the bug I tested all this
> >> >> against is drivers that just entirely fail to deliver the vblank
> >> >> events like they should, and in those cases the event is simply
> >> >> leaked. But what seems to happen, at least sometimes, on i915 is that
> >> >> the event is set up correctly, but somohow the vblank fails to fire in
> >> >> time. Which means the event isn't leaked, it's still there waiting for
> >> >> evevntually a vblank to fire. That tends to happen when re-enabling the
> >> >> pipe, and then the trap springs and the kernel oopses.
> >> >>
> >> >> The correct fix here is simply to refcount the crtc commit to make
> >> >> sure that the event sticks around even for drivers which only
> >> >> sometimes fail to deliver vblanks for some arbitrary reasons. Since
> >> >> crtc commits are already refcounted that's easy to do.
> >> > Or make the event a part of the atomic state?
> >> > -Chris
> >> >
> >> afaict crtc commit is already taken to wait for completion, so this patch makes sense.
> >> 
> >> There's just a minor typo in the subject. :)
> >> Not sure that release_commit should be a function pointer, regardless..
> >> 
> >> Reviewed-by: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
> >
> > It didn't help the bug reporters against oopses (but the reporters are
> > supremely confusing, I have no idea what's really being tested, the
> > bugzilla is a mess), but I still think the patch is useful for more
> > robuestness, I dropped the cc: stable and applied it to drm-misc.
> 
> Agreed on the bug [1] being a mess. However, the bug has a reliable
> bisect result, the revert was posted by some of the reporters on the
> lists and in the bug, and now something that will not help anyone in
> v4.9 or v4.10 was pushed. :(

Latest report just says that the revert isn't helping either. I suspect
the report is a giantic conflagration of everything ever that kills
various reporters boxes. I still believe that the patch here fixes the
original bug, but there might be a lot more hiding.

It's at least seen quite a pile of testing, so I think it's sounds, and we
could cherry-pick it to dinf with cc: stable for 4.9+. Worst case it's not
going to help for the other problems.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


More information about the dri-devel mailing list