[PATCH 4/7] drm/radeon: Pin buffers while they are vmap'ed

Wed Nov 25 10:57:13 UTC 2020

Am 25.11.20 um 11:36 schrieb Daniel Vetter:
> On Wed, Nov 25, 2020 at 11:13:13AM +0100, Christian König wrote:
>> Am 25.11.20 um 09:37 schrieb Thomas Zimmermann:
>>> Hi
>>>
>>> Am 24.11.20 um 15:09 schrieb Daniel Vetter:
>>>> On Tue, Nov 24, 2020 at 02:56:51PM +0100, Thomas Zimmermann wrote:
>>>>> Hi
>>>>>
>>>>> Am 24.11.20 um 14:36 schrieb Christian König:
>>>>>> Am 24.11.20 um 13:15 schrieb Thomas Zimmermann:
>>>>>>> [SNIP]
>>>>>>>>>>> First I wanted to put this into
>>>>>>>>>>> drm_gem_ttm_vmap/vunmap(), but then wondered why
>>>>>>>>>>> ttm_bo_vmap() doe not acquire the lock internally?
>>>>>>>>>>> I'd expect that vmap/vunmap are close together and
>>>>>>>>>>> do not overlap for the same BO.
>>>>>>>>>> We have use cases like the following during command submission:
>>>>>>>>>>
>>>>>>>>>> 1. lock
>>>>>>>>>> 2. map
>>>>>>>>>> 3. copy parts of the BO content somewhere else or patch
>>>>>>>>>> it with additional information
>>>>>>>>>> 4. unmap
>>>>>>>>>> 5. submit BO to the hardware
>>>>>>>>>> 6. add hardware fence to the BO to make sure it doesn't move
>>>>>>>>>> 7. unlock
>>>>>>>>>>
>>>>>>>>>> That use case won't be possible with vmap/vunmap if we
>>>>>>>>>> move the lock/unlock into it and I hope to replace the
>>>>>>>>>> kmap/kunmap functions with them in the near term.
>>>>>>>>>>
>>>>>>>>>>> Otherwise, acquiring the reservation lock would
>>>>>>>>>>> require another ref-counting variable or per-driver
>>>>>>>>>>> code.
>>>>>>>>>> Hui, why that? Just put this into
>>>>>>>>>> drm_gem_ttm_vmap/vunmap() helper as you initially
>>>>>>>>>> planned.
>>>>>>>>> Given your example above, step one would acquire the lock,
>>>>>>>>> and step two would also acquire the lock as part of the vmap
>>>>>>>>> implementation. Wouldn't this fail (At least during unmap or
>>>>>>>>> unlock steps) ?
>>>>>>>> Oh, so you want to nest them? No, that is a rather bad no-go.
>>>>>>> I don't want to nest/overlap them. My question was whether that
>>>>>>> would be required. Apparently not.
>>>>>>>
>>>>>>> While the console's BO is being set for scanout, it's protected from
>>>>>>> movement via the pin/unpin implementation, right?
>>>>>> Yes, correct.
>>>>>>
>>>>>>> The driver does not acquire the resv lock for longer periods. I'm
>>>>>>> asking because this would prevent any console-buffer updates while
>>>>>>> the console is being displayed.
>>>>>> Correct as well, we only hold the lock for things like command
>>>>>> submission, pinning, unpinning etc etc....
>>>>>>
>>>>> Thanks for answering my questions.
>>>>>
>>>>>>>> You need to make sure that the lock is only taken from the FB
>>>>>>>> path which wants to vmap the object.
>>>>>>>>
>>>>>>>> Why don't you lock the GEM object from the caller in the generic
>>>>>>>> FB implementation?
>>>>>>> With the current blitter code, it breaks abstraction. if vmap/vunmap
>>>>>>> hold the lock implicitly, things would be easier.
>>>>>> Do you have a link to the code?
>>>>> It's the damage blitter in the fbdev code. [1] While it flushes
>>>>> the shadow
>>>>> buffer into the BO, the BO has to be kept in place. I already
>>>>> changed it to
>>>>> lock struct drm_fb_helper.lock, but I don't think this is
>>>>> enough. TTM could
>>>>> still evict the BO concurrently.
>>>> So I'm not sure this is actually a problem: ttm could try to
>>>> concurrently
>>>> evict the buffer we pinned into vram, and then just skip to the next
>>>> one.
>>>>
>>>> Plus atm generic fbdev isn't used on any chip where we really care about
>>>> that last few mb of vram being useable for command submission (well atm
>>>> there's no driver using it).
>>> Well, this is the patchset for radeon. If it works out, amdgpu and
>>> nouveau are natural next choices. Especially radeon and nouveau support
>>> cards with low- to medium-sized VRAM. The MiBs wasted on fbdev certainly
>>> matter.
>>>
>>>> Having the buffer pinned into system memory and trying to do a
>>>> concurrent
>>>> modeset that tries to pull it in is the hard failure mode. And holding
>>>> fb_helper.lock fully prevents that.
>>>>
>>>> So not really clear on what failure mode you're seeing here?
>>> Imagine the fbdev BO is in VRAM, but not pinned. (Maybe Xorg or Wayland
>>> is running.) The fbdev BO is a few MiBs and not in use, so TTM would
>>> want to evict it if memory gets tight.
>>>
>>> What I have in mind is a concurrent modeset that requires the memory. If
>>> we do a concurrent damage blit without protecting against eviction,
>>> things go boom. Same for concurrent 3d graphics with textures, model
>>> data, etc.
>> Completely agree.
>>
>> This needs proper lock protection of the memory mapped buffer. Relying on
>> that some other code isn't run because we have some third part locks taken
>> is not sufficient here.
> We are still protected by the pin count in this scenario. Plus, with
> current drivers we always pin the fbdev buffer into vram, so occasionally
> failing to move it out isn't a regression.
>
> So I'm still not seeing how this can go boom.

Well as far as I understand it the pin count is zero for this buffer in 
this case here :)

I might be wrong on this because I don't know the FB code at all, but 
Thomas seems to be pretty clear that this is the shadow buffer which is 
not scanned out from.

Regards,
Christian.

>
> Now long term it'd be nice to cut everything over to dma_resv locking, but
> the issue there is that beyond ttm, none of the helpers (and few of the
> drivers) use dma_resv. So this is a fairly big uphill battle. Quick
> interim fix seems like the right solution to me.
> -Daniel
>
>> Regards,
>> Christian.
>>
>>> Best regards
>>> Thomas
>>>
>>>>> There's no recursion taking place, so I guess the reservation
>>>>> lock could be
>>>>> acquired/release in drm_client_buffer_vmap/vunmap(), or a
>>>>> separate pair of
>>>>> DRM client functions could do the locking.
>>>> Given how this "do the right locking" is a can of worms (and I think
>>>> it's
>>>> worse than what you dug out already) I think the fb_helper.lock hack is
>>>> perfectly good enough.
>>>>
>>>> I'm also somewhat worried that starting to use dma_resv lock in generic
>>>> code, while many helpers/drivers still have their hand-rolled locking,
>>>> will make conversion over to dma_resv needlessly more complicated.
>>>> -Daniel
>>>>
>>>>> Best regards
>>>>> Thomas
>>>>>
>>>>> [1] https://cgit.freedesktop.org/drm/drm-tip/tree/drivers/gpu/drm/drm_fb_helper.c?id=ac60f3f3090115d21f028bffa2dcfb67f695c4f2#n394
>>>>>
>>>>>> Please note that the reservation lock you need to take here is part of
>>>>>> the GEM object.
>>>>>>
>>>>>> Usually we design things in the way that the code needs to take a lock
>>>>>> which protects an object, then do some operations with the object and
>>>>>> then release the lock again.
>>>>>>
>>>>>> Having in the lock inside the operation can be done as well, but
>>>>>> returning with it is kind of unusual design.
>>>>>>
>>>>>>> Sorry for the noob questions. I'm still trying to understand the
>>>>>>> implications of acquiring these locks.
>>>>>> Well this is the reservation lock of the GEM object we are
>>>>>> talking about
>>>>>> here. We need to take that for a couple of different operations,
>>>>>> vmap/vunmap doesn't sound like a special case to me.
>>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>>> Best regards
>>>>>>> Thomas
>>>>>> _______________________________________________
>>>>>> dri-devel mailing list
>>>>>> dri-devel at lists.freedesktop.org
>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>> -- 
>>>>> Thomas Zimmermann
>>>>> Graphics Driver Developer
>>>>> SUSE Software Solutions Germany GmbH
>>>>> Maxfeldstr. 5, 90409 Nürnberg, Germany
>>>>> (HRB 36809, AG Nürnberg)
>>>>> Geschäftsführer: Felix Imendörffer
>>>>
>>>>
>>>>
>>>>
>>>>