DRM framebuffer refcounting bug... still present

Daniel Vetter daniel at ffwll.ch
Sat Oct 11 12:12:32 PDT 2014


On Sat, Oct 11, 2014 at 10:26:20AM +0100, Russell King - ARM Linux wrote:
> Daniel,
> 
> A while back you had a go at fixing the framebuffer reference counting.
> Unfortunately, I'm still seeing leaks, particularly with the framebuffer
> used with a mode set which is subsequently flipped.  It is only this
> framebuffer which is leaked, flips of an already flipped framebuffer do
> not.

Apparently gmail ate your reply and you have this fixed already. But since
your fixes haven't shown up yet in my inbox I wager a guess at what's off.

> Looking at what happens to this framebuffer, I see these refcount changes:
> 
> ADDFB
> - 0->1 - kref_init
> - 1->2 - drm_framebuffer_reference() creation

Aside: Since

commit 83f45fc360c8e16a330474860ebda872d1384c8c
Author: Daniel Vetter <daniel.vetter at ffwll.ch>
Date:   Wed Aug 6 09:10:18 2014 +0200

    drm: Don't grab an fb reference for the idr

the idr doesn't hold a ref any more, so ADDFB grabs one less ref and RMFB
frees one less. Doesn't change really anything in your analysis though.

> SETCRTC
> - 2->3 - drm_framebuffer_lookup() in drm_mode_setcrtc()
> - 3->4 - drm_framebuffer_reference() in armada_drm_crtc_mode_set()

I think the problem is that you grab a armada-private reference for the
new framebuffer here.

> - 4->5 - drm_framebuffer_reference() in drm_mode_set_config_internal()
> - 5->4 - drm_framebuffer_unreference() in drm_mode_setcrtc()
> * current scanout framebuffer has refcount of 4
> 
> PAGE_FLIP
> - 4->5 - drm_framebuffer_reference() in armada_drm_crtc_page_flip()
> - 5->4 - drm_framebuffer_unreference(old_fb) in drm_mode_page_flip_ioctl()
> - 4->3 - drm_framebuffer_unreference() in armada_drm_unref_work()

But in the page-flip you don't update that reference. The armda-private
refcounting you do is just the temporary reference for the async worker to
make sure the fb object stays alive until that work has completed.

My guess is that if you add the missing refcounting to
armada_drm_crtc_page_flip then this should be resolved, i.e. drop a
reference on the current (old) fb and grab a new one for the new fb that
will get pageflipped to. If I read your approaches below correctly (too
much travel atm so brain isn't working too well ...) I think you haven't
yet tried that one.

Another approach might be to drop the refcounting from
armada_drm_crtc_mode_set and rely on the drm-core for the long-lived
references for the current framebuffer. Then you'd only keep the temporary
references needed for the async cleanup worker (and maybe for clarity you
could push the refcount grabbing down into your finish_fb function). But I
haven't read your async cleanup code too closely, so might have missed
something important.

Cheers, Daniel

> 
> RMFB
> - 3->2 - __drm_framebuffer_unreference()
> - 2->1 - drm_framebuffer_unreference() - *LEAK*
> 
> For the framebuffer being flipped in:
> 
> ADDFB
> - 0->1 - kref_init
> - 1->2 - drm_framebuffer_reference() creation
> 
> PAGE_FLIP (incoming)
> - 2->3 - drm_framebuffer_lookup() in drm_mode_page_flip_ioctl()
> * current scanout framebuffer has refcount of 3 - different from setcrtc.
> 
> PAGE_FLIP (outgoing)
> - 3->4 - drm_framebuffer_reference() in armada_drm_crtc_page_flip()
> - 4->3 - drm_framebuffer_unreference(old_fb) in drm_mode_page_flip_ioctl()
> - 3->2 - drm_framebuffer_unreference() in armada_drm_unref_work()
> 
> RMFB
> - 2->1 - __drm_framebuffer_unreference()
> - 1->0 - drm_framebuffer_unreference() - *free*
> 
> Of course, this isn't going to be nice given appropriate timing of a
> subsequent SETCRTC removing the FB and dropping its refcounts.
> 
> 
> 
> 
> If I get rid of the referencing of the old framebuffer in PAGE_FLIP,
> then:
> 
> ADDFB
> - 0->1 - kref_init
> - 1->2 - drm_framebuffer_reference() creation
> 
> SETCRTC
> - 2->3 - drm_framebuffer_lookup() in drm_mode_setcrtc()
> - 3->4 - drm_framebuffer_reference() in armada_drm_crtc_mode_set()
> - 4->5 - drm_framebuffer_reference() in drm_mode_set_config_internal()
> - 5->4 - drm_framebuffer_unreference() in drm_mode_setcrtc()
> * current scanout framebuffer has refcount of 4
> 
> PAGE_FLIP
> - 4->3 - drm_framebuffer_unreference(old_fb) in drm_mode_page_flip_ioctl()
> - 3->2 - drm_framebuffer_unreference() in armada_drm_unref_work()
> 
> RMFB
> - 2->1 - __drm_framebuffer_unreference()
> - 1->0 - drm_framebuffer_unreference() - *free*
> 
> So that looks good, but let's look at what happens to the framebuffer
> being flipped in:
> 
> ADDFB
> - 0->1 - kref_init
> - 1->2 - drm_framebuffer_reference() creation
> 
> PAGE_FLIP (incoming)
> - 2->3 - drm_framebuffer_lookup() in drm_mode_page_flip_ioctl()
> * current scanout framebuffer has refcount of 3 - different from setcrtc.
> 
> PAGE_FLIP (outgoing)
> - 3->2 - drm_framebuffer_unreference(old_fb) in drm_mode_page_flip_ioctl()
> - 2->1 - drm_framebuffer_unreference() in armada_drm_unref_work()
> 
> RMFB
> - 1->0 - __drm_framebuffer_unreference() - *OOPS*
> - 0->-1 - drm_framebuffer_unreference() - *broken*
> 
> So that doesn't work - and it still isn't good because the framebuffer
> which was flipped in has a different refcount to the one going out.
> 
> 
> 
> 
> If I instead try taking a reference on the new framebuffer and dropping
> the previous ref in PAGE_FLIP:
> 
> ADDFB
> - 0->1 - kref_init
> - 1->2 - drm_framebuffer_reference() creation
> 
> SETCRTC
> - 2->3 - drm_framebuffer_lookup() in drm_mode_setcrtc()
> - 3->4 - drm_framebuffer_reference() in armada_drm_crtc_mode_set()
> - 4->5 - drm_framebuffer_reference() in drm_mode_set_config_internal()
> - 5->4 - drm_framebuffer_unreference() in drm_mode_setcrtc()
> * current scanout framebuffer has refcount of 4
> 
> PAGE_FLIP
> - 4->3 - drm_framebuffer_unreference() in armada_drm_crtc_page_flip()
> - 3->2 - drm_framebuffer_unreference(old_fb) in drm_mode_page_flip_ioctl()
> - 2->1 - drm_framebuffer_unreference() in armada_drm_unref_work()
> 
> RMFB
> - 1->0 - __drm_framebuffer_unreference() - *OOPS*
> - 0->-1 - drm_framebuffer_unreference() - (never happens)
> 
> I can see no way to solve this in driver code, other than tracking where
> the framebuffer came from, and conditionally adjusting the refcount to
> keep things sane.  The real problem here is the different behaviour in
> the DRM core code between the refcounts resulting on the active scanout
> framebuffer.
> 
> Any other thoughts how to solve this without having such refcount hacks
> in drivers?
> 
> Thanks.
> 
> -- 
> FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
> according to speedtest.net.

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


More information about the dri-devel mailing list