vmwgfx leaking bo pins?

Daniel Vetter daniel at ffwll.ch
Mon Mar 15 20:38:57 UTC 2021


On Mon, Mar 15, 2021 at 6:57 PM Zack Rusin <zackr at vmware.com> wrote:
>
> On 3/12/21 5:06 AM, Thomas Hellström (Intel) wrote:
> >
> > On 3/12/21 12:02 AM, Zack Rusin wrote:
> >>
> >>> On Mar 11, 2021, at 17:35, Thomas Hellström (Intel)
> >>> <thomas_os at shipmail.org> wrote:
> >>>
> >>> Hi, Zack
> >>>
> >>> On 3/11/21 10:07 PM, Zack Rusin wrote:
> >>>>> On Mar 11, 2021, at 05:46, Thomas Hellström (Intel)
> >>>>> <thomas_os at shipmail.org> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> I tried latest drm-fixes today and saw a lot of these: Fallout from
> >>>>> ttm rework?
> >>>> Yes, I fixed this in d1a73c641afd2617bd80bce8b71a096fc5b74b7e it was
> >>>> in drm-misc-next in the drm-misc tree for a while but hasn’t been
> >>>> merged for 5.12.
> >>>>
> >>>> z
> >>>>
> >>> Hmm, yes but doesn't that fix trip the ttm_bo_unpin()
> >>> dma_resv_assert_held(bo->base.resv)?
> >> No, doesn’t seem to. TBH I’m not sure why myself, but it seems to be
> >> working fine.
> >>
> >>
> > With CONFIG_PROVE_LOCKING=y I see this:
> >
> > [    7.117145] [drm] FIFO at 0x00000000fe000000 size is 8192 kiB
> > [    7.117284] [drm] VRAM at 0x00000000e8000000 size is 131072 kiB
> > [    7.117291] INFO: trying to register non-static key.
> > [    7.117295] the code is fine but needs lockdep annotation.
> > [    7.117298] turning off the locking correctness validator
> >
> > Which will probably mask that dma_resv_assert_held(bo->base.resv)
> >
>
> Ah, yes, you're right. After fixing that I do see the
> dma_resv_assert_held triggered. Technically trivially fixable with
> ttm_bo_reserve/ttm_bo_unreserve around ttm_bo_unpin but it's a little
> ugly that some places e.g. vmw_bo_unreference will require
> ttm_bo_reserve/ttm_bo_unreserve around ttm_bo_unpin but some won't e.g.
> vmw_mob_destroy won't because its lock is held by ttm_bo_delayed_delete
> without a very clear indication within the function which is which.

Not sure it applies here, but if the refcount is down to 0 we know
we're in destruction code and can't race with anything anymore, so
maybe we can lift the debug check.

Otoh I think at that point we might still be on lru lists, so the
rules become rather tricky whether it's really always legal to skip
the dma_resv_lock. But we could perhaps figure out something if it's
too annoying to have a consistent calling context in drivers.

I'm not a huge fan of dropping the requirement from unpin and
switching to atomic_t for the pin count, since pin/unpin is an
extremely slow path, adding complexity in how we protect stuff for a
function that's maybe called 60/s (for page flipping we pin/unpin)
doesn't strike me as the right balance. And atomic commit helpers are
explicitly designed to allow drivers to grab dma_resv_lock in their
->cleanup_fb hook, since that part is _not_ in the dma_fence
signalling critical path for the atomic flip. But if all else fails I
guess that's an option too.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


More information about the dri-devel mailing list