vmwgfx leaking bo pins?

Zack Rusin zackr at vmware.com
Wed Mar 17 01:51:00 UTC 2021


On 3/15/21 6:35 PM, Thomas Hellström (Intel) wrote:
> 
> On 3/15/21 9:38 PM, Daniel Vetter wrote:
>> On Mon, Mar 15, 2021 at 6:57 PM Zack Rusin <zackr at vmware.com> wrote:
>>> On 3/12/21 5:06 AM, Thomas Hellström (Intel) wrote:
>>>> On 3/12/21 12:02 AM, Zack Rusin wrote:
>>>>>> On Mar 11, 2021, at 17:35, Thomas Hellström (Intel)
>>>>>> <thomas_os at shipmail.org> wrote:
>>>>>>
>>>>>> Hi, Zack
>>>>>>
>>>>>> On 3/11/21 10:07 PM, Zack Rusin wrote:
>>>>>>>> On Mar 11, 2021, at 05:46, Thomas Hellström (Intel)
>>>>>>>> <thomas_os at shipmail.org> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I tried latest drm-fixes today and saw a lot of these: Fallout from
>>>>>>>> ttm rework?
>>>>>>> Yes, I fixed this in d1a73c641afd2617bd80bce8b71a096fc5b74b7e it was
>>>>>>> in drm-misc-next in the drm-misc tree for a while but hasn’t been
>>>>>>> merged for 5.12.
>>>>>>>
>>>>>>> z
>>>>>>>
>>>>>> Hmm, yes but doesn't that fix trip the ttm_bo_unpin()
>>>>>> dma_resv_assert_held(bo->base.resv)?
>>>>> No, doesn’t seem to. TBH I’m not sure why myself, but it seems to be
>>>>> working fine.
>>>>>
>>>>>
>>>> With CONFIG_PROVE_LOCKING=y I see this:
>>>>
>>>> [    7.117145] [drm] FIFO at 0x00000000fe000000 size is 8192 kiB
>>>> [    7.117284] [drm] VRAM at 0x00000000e8000000 size is 131072 kiB
>>>> [    7.117291] INFO: trying to register non-static key.
>>>> [    7.117295] the code is fine but needs lockdep annotation.
>>>> [    7.117298] turning off the locking correctness validator
>>>>
>>>> Which will probably mask that dma_resv_assert_held(bo->base.resv)
>>>>
>>> Ah, yes, you're right. After fixing that I do see the
>>> dma_resv_assert_held triggered. Technically trivially fixable with
>>> ttm_bo_reserve/ttm_bo_unreserve around ttm_bo_unpin but it's a little
>>> ugly that some places e.g. vmw_bo_unreference will require
>>> ttm_bo_reserve/ttm_bo_unreserve around ttm_bo_unpin but some won't e.g.
>>> vmw_mob_destroy won't because its lock is held by ttm_bo_delayed_delete
>>> without a very clear indication within the function which is which.
> 
> It looks like, like Daniel hints below, for the mob pagetable bos since 
> they are pinned and hence not on a LRU list, the parent bo is holding 
> the only reference, which is utilized in vmw_mob_unbind() to make sure 
> the tryreserve always succeeds. (unpin could be called in vmw_mob_unbind 
> for the pagetable bo just after fencing if necessary), 

That's a little tricky because then we'd have to pin on bind, otherwise 
after moves, which unbind, we wouldn't be pinned anymore. Plus bind 
would have to check if the bo is already pinned (i.e. it's the first 
time bind is called on it) since we pin on creation. Or just stop 
pinning on creation and do it explicitly in bind/unbind.

In general we probably should make pinning explicit in vmwgfx like in 
the other drivers because, as is, we sometimes have to treat pin_count 
as a boolean (e.g. in vmw_bo_pin_reserved) because often times the 
object has already been pinned during creation. This will break if we'll 
have drm utilities use pin/unpin.

That's a problem of our own making though, the issue of whether or not 
the bo has already been reserved is a little more muddy because having 
multiple nested pin/unpin sites (as long as they're consistent in their 
matching pin/unpin usage) isn't a problem, but having nested reserved 
calls is a problem. Although this might be a case of an old man yelling 
at the sun because I'm too lazy to remember whether or not each callback 
is already called reserved and instead should just add 
dma_resv_assert_held(bo->resv) in more places to clarify which is which 
or like you mention use tryreserve everywhere.

> But it's constructs like that, that really makes me think we shouldn't 
> need to reserve to unpin.

Yea, that would be pretty convenient.

z


More information about the dri-devel mailing list